Ambari 2.1.2 Reference Guide

Pivotal HD
Chapter 1. Customizing PHD Services

Defining Service Users and Groups for a PHD 3.x Stack

The individual services in Hadoop run under the ownership of their respective Unix accounts, known as service users. These service users belong to a special Unix group. "Smoke Test" is a service user dedicated specifically to running smoke tests on components during installation, using the Services View of the Ambari Web GUI. You can also run service checks as the "Smoke Test" user on demand after installation. You can customize any of these users and groups using the Misc tab during the Customize Services installation step.

[Note]Note

Use the Skip Group Modifications option to prevent Ambari from modifying the Linux groups in the cluster. This option is typically required if your environment manages groups using LDAP rather than on the local Linux machines.

If you choose to customize names, Ambari checks to see if these custom accounts already exist and creates them if they do not. The default accounts are always created during installation, whether or not custom accounts are specified; if you specify custom accounts, the default accounts are not used and can be removed post-install.

[Note]Note

All new service user accounts, and any existing user accounts used as service users, must have a UID >= 1000.
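One way to verify the UID of an existing account you plan to reuse as a service user (an illustrative check; "hdfs" is a placeholder user name):

id -u hdfs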

Service Users

Service*         Components                              Default User Account
Ambari Metrics   Metrics Collector, Metrics Monitor      ams
HBase            HBase Master, RegionServer              hbase
HDFS             NameNode, SecondaryNameNode, DataNode   hdfs
Hive             Hive Metastore, HiveServer2             hive
Knox             Knox Gateway                            knox
MapReduce2       HistoryServer                           mapred
Oozie            Oozie Server                            oozie
PostgreSQL       PostgreSQL (with Ambari Server)         postgres (created as part of installing the default PostgreSQL database with Ambari Server; if you are not using the Ambari PostgreSQL database, this user is not needed)
Ranger           Ranger Admin, Ranger Usersync           ranger
Spark            Spark History Server                    spark
Tez              Tez clients                             tez
WebHCat          WebHCat Server                          hcat
YARN             NodeManager, ResourceManager            yarn
ZooKeeper        ZooKeeper                               zookeeper

*For all components, the Smoke Test user performs smoke tests against cluster services as part of the install process. It can also perform these checks on demand, from the Ambari Web UI. The default user account for the Smoke Test user is ambari-qa.

Service Groups

Service   Components                      Default Group Account
All       All                             hadoop
Knox      Knox Gateway                    knox
Ranger    Ranger Admin, Ranger Usersync   ranger
Spark     Spark History Server            spark

Setting Properties That Depend on Service Usernames/Groups

Some properties must be set to match specific service user names or service groups. If you have set up non-default, customized service user names for the HDFS or HBase service or the Hadoop group name, you must edit the following properties, using Services > Service.Name > Configs > Advanced:

HDFS Settings: Advanced

Property Name                     Value
dfs.permissions.superusergroup    The same as the HDFS username. The default is "hdfs".
dfs.cluster.administrators        A single space followed by the HDFS username.
dfs.block.local-path-access.user  The HBase username. The default is "hbase".

MapReduce Settings: Advanced

Property Name                     Value
mapreduce.cluster.administrators  A single space followed by the Hadoop group name.
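As an illustration, with the default hdfs and hbase users and the hadoop group, the underlying property values would look like the following sketch; set these values through the Ambari Configs pages rather than editing the files directly:

<!-- HDFS Settings: Advanced -->
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hdfs</value>
</property>
<property>
  <name>dfs.cluster.administrators</name>
  <value> hdfs</value> <!-- note the leading space -->
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>

<!-- MapReduce Settings: Advanced -->
<property>
  <name>mapreduce.cluster.administrators</name>
  <value> hadoop</value> <!-- note the leading space -->
</property>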

Chapter 2. Using Custom Host Names

You can customize the agent registration host name and the public host name used for each host in Ambari. Use this capability when "hostname" does not return the public network host name for your machines.

How to Customize the Name of a Host

  1. At the Install Options step in the Cluster Installer wizard, select Perform Manual Registration for Ambari Agents.

  2. Install the Ambari Agents manually on each host, as described in Install the Ambari Agents Manually.

  3. To echo the customized name of the host to which the Ambari agent registers, for every host, create a script like the following example, named /var/lib/ambari-agent/hostname.sh. Be sure to chmod the script so it is executable by the Agent.

    #!/bin/sh
    echo <ambari_hostname>

    where <ambari_hostname> is the host name to use for Agent registration.

  4. Open /etc/ambari-agent/conf/ambari-agent.ini on every host, using a text editor.

  5. Add to the [agent] section the following line:

    hostname_script=/var/lib/ambari-agent/hostname.sh

    where /var/lib/ambari-agent/hostname.sh is the name of your custom echo script.

  6. To generate a public host name for every host, create a script like the following example, named /var/lib/ambari-agent/public_hostname.sh, to show the name for that host in the UI. Be sure to chmod the script so it is executable by the Agent.

    #!/bin/sh
    hostname -f

    This example uses hostname -f to return the fully qualified host name to display for that host in the UI; substitute any command that produces the correct public host name.

  7. Open /etc/ambari-agent/conf/ambari-agent.ini on every host, using a text editor.

  8. Add to the [agent] section the following line:

    public_hostname_script=/var/lib/ambari-agent/public_hostname.sh

  9. If applicable, add the host names to /etc/hosts on every host.

  10. Restart the Agent on every host for these changes to take effect.

    ambari-agent restart

Chapter 3. Moving the Ambari Server

To transfer an Ambari Server that uses the default, embedded, PostgreSQL database from one host to a new host, use the following instructions:

  1. Back up current data - from the original Ambari Server database.

  2. Update all Agents - to point to the new Ambari Server.

  3. Install the New Ambari Server - on the new host and populate databases with information from the original Server.

[Note]Note

If your Ambari Server is using one of the non-default databases (such as MySQL, Oracle, or an existing PostgreSQL instance) then be sure to follow backup, restore, and stop/start procedures that match that database type.

Back up Current Data

  1. On the Ambari Server host, stop the original Ambari Server.

    ambari-server stop

  2. Create a directory to hold the database backups.

    cd /tmp

    mkdir dbdumps/

    cd dbdumps/

  3. Create the database backups.

    pg_dump -U {ambari.db.username} -f ambari.sql

    Password: {ambari.db.password}

    where:

    Variable            Description              Default
    ambari.db.username  The database username.   ambari
    ambari.db.password  The database password.   bigdata

  4. Create a backup of the Ambari Server meta info.

    ambari-server backup

Update all Agents

  1. On each agent host, stop the agent.

    ambari-agent stop

  2. Remove old agent certificates (if any exist).

    rm /var/lib/ambari-agent/keys/*

  3. Using a text editor, edit /etc/ambari-agent/conf/ambari-agent.ini to point to the new host.

    [server]

    hostname={new.ambari.server.fqdn}

    url_port=8440

    secured_url_port=8441

Install the New Ambari Server

  1. Install the new Ambari Server on the new host.

    yum install ambari-server

  2. Run Ambari Server setup, configuring it the same way the original Ambari Server was configured.

    ambari-server setup

  3. Restart the PostgreSQL instance.

    service postgresql restart

  4. Open the PostgreSQL interactive terminal.

    su - postgres

    psql

  5. Using the interactive terminal, drop the "ambari" database created by the new ambari setup and install.

    drop database ambari;

  6. Check to make sure the database has been dropped. The "ambari" database should not be listed.

    \l

  7. Create new "ambari" database to hold the transferred data.

    create database ambari;

  8. Exit the PostgreSQL interactive terminal.

    \q

  9. Copy the saved data (/tmp/dbdumps/ambari.sql) from Back up Current Data to the new Ambari Server host.

  10. Load the saved data into the new database.

    psql -d ambari -f /tmp/dbdumps/ambari.sql

  11. Start the new Server.

    ambari-server start

  12. On each Agent host, start the Ambari Agent.

    ambari-agent start

  13. Open Ambari Web. Point your browser to:

    <new.Ambari.Server>:8080

The new Ambari Server is ready to use.

Chapter 4. Configuring LZO Compression

LZO is a lossless data compression library that favors speed over compression ratio. Ambari neither installs nor enables LZO compression by default. To enable LZO compression in your PHD cluster, you must configure core-site.xml for LZO.

Optionally, you can implement LZO to optimize Hive queries in your cluster for speed. For more information about using LZO compression with Hive, see Running Compression with Hive Queries.

Configure core-site.xml for LZO

  1. Browse to Ambari Web > Services > HDFS > Configs, then expand Advanced core-site.

  2. Find the io.compression.codecs property key.

  3. Append the following value to the io.compression.codecs property: com.hadoop.compression.lzo.LzoCodec

  4. Add a description of the config modification, then choose Save.

  5. Expand the Custom core-site.xml section.

  6. Select Add Property.

  7. Add the following property key and value to Custom core-site.xml (the resulting core-site.xml entries are sketched after this procedure):

    Property Key: io.compression.codec.lzo.class

    Property Value: com.hadoop.compression.lzo.LzoCodec

  8. Choose Save.

  9. Add a description of the config modification, then choose Save.

  10. Restart the HDFS, MapReduce2 and YARN services.

    [Note]Note

    If performing a Restart or a Restart All does not start the required package install, you may need to stop, then start the HDFS service to install the necessary LZO packages. Restart is only available for a service in the "Running" or "Started" state.
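After these steps, the relevant core-site.xml entries look like the following sketch. The io.compression.codecs value is illustrative; your existing codec list will differ, with com.hadoop.compression.lzo.LzoCodec appended to it:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>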

Running Compression with Hive Queries

Running Compression with Hive Queries requires creating LZO files. To create LZO files, use one of the following procedures:

Create LZO Files

  1. Create LZO files as the output of the Hive query.

  2. Use the lzo command utility or your custom Java code to generate lzo.index files for the .lzo files.

Hive Query Parameters

Prefix the query string with these parameters:

SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true

For example:

hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;SET hive.exec.compress.output=true;SET mapreduce.output.fileoutputformat.compress=true;"

Write Custom Java to Create LZO Files

  1. Create text files as the output of the Hive query.

  2. Write custom Java code to

    • convert Hive query generated text files to .lzo files

    • generate lzo.index files for the .lzo files

Hive Query Parameters

Prefix the query string with these parameters:

SET hive.exec.compress.output=false
SET mapreduce.output.fileoutputformat.compress=false

For example:

hive -e "SET hive.exec.compress.output=false;SET mapreduce.output.fileoutputformat.compress=false;<query-string>"

Chapter 5. Using Non-Default Databases

Use the following instructions to prepare a non-default database for Ambari, Hive, or Oozie. You must complete these instructions before you set up the Ambari Server by running ambari-server setup.

[Important]Important

The Microsoft SQL Server and SQL Anywhere database options are not supported.

Using Non-Default Databases - Ambari

The following sections describe how to use Ambari with an existing database, other than the embedded PostgreSQL database instance that Ambari Server uses by default.

[Important]Important

The Microsoft SQL Server and SQL Anywhere database options are not supported.

Using Ambari with Oracle

To set up Oracle for use with Ambari:

  1. On the Ambari Server host, install the appropriate JDBC .jar file.

    1. Download the Oracle JDBC (OJDBC) driver from http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html.

    2. Select Oracle Database 11g Release 2 - ojdbc6.jar.

    3. Copy the .jar file to the Java share directory.

      cp ojdbc6.jar /usr/share/java

    4. Make sure the .jar file has the appropriate permissions - 644 (for example, chmod 644 /usr/share/java/ojdbc6.jar).

  2. Create a user for Ambari and grant that user appropriate permissions.

    For example, using the Oracle database admin utility, run the following commands:

    # sqlplus sys/root as sysdba

    CREATE USER <AMBARIUSER> IDENTIFIED BY <AMBARIPASSWORD> default tablespace "USERS" temporary tablespace "TEMP";

    GRANT unlimited tablespace to <AMBARIUSER>;

    GRANT create session to <AMBARIUSER>;

    GRANT create TABLE to <AMBARIUSER>;

    GRANT create SEQUENCE to <AMBARIUSER>;

    QUIT;

    Where <AMBARIUSER> is the Ambari user name and <AMBARIPASSWORD> is the Ambari user password.

  3. Load the Ambari Server database schema.

    1. You must pre-load the Ambari database schema into your Oracle database using the schema script.

      sqlplus <AMBARIUSER>/<AMBARIPASSWORD> < Ambari-DDL-Oracle-CREATE.sql

    2. Find the Ambari-DDL-Oracle-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed Ambari Server.

  4. When setting up the Ambari Server, select Advanced Database Configuration > Option [2] Oracle and respond to the prompts using the username/password credentials you created in step 2.

Using Ambari with MySQL

To set up MySQL for use with Ambari:

  1. On the Ambari Server host, install the connector.

    1. Install the connector

      RHEL/CentOS

      yum install mysql-connector-java

      SLES

      zypper install mysql-connector-java

    2. Confirm that mysql-connector-java.jar is in the Java share directory.

      ls /usr/share/java/mysql-connector-java.jar

    3. Make sure the .jar file has the appropriate permissions - 644.

  2. Create a user for Ambari and grant it permissions.

    • For example, using the MySQL database admin utility:

      # mysql -u root -p

      CREATE USER '<AMBARIUSER>'@'%' IDENTIFIED BY '<AMBARIPASSWORD>';

      GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'%';

      CREATE USER '<AMBARIUSER>'@'localhost' IDENTIFIED BY '<AMBARIPASSWORD>';

      GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'localhost';

      CREATE USER '<AMBARIUSER>'@'<AMBARISERVERFQDN>' IDENTIFIED BY '<AMBARIPASSWORD>';

      GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'<AMBARISERVERFQDN>';

      FLUSH PRIVILEGES;

    • Where <AMBARIUSER> is the Ambari user name, <AMBARIPASSWORD> is the Ambari user password, and <AMBARISERVERFQDN> is the Fully Qualified Domain Name of the Ambari Server host.

  3. Load the Ambari Server database schema.

    • You must pre-load the Ambari database schema into your MySQL database using the schema script.

      mysql -u <AMBARIUSER> -p

      CREATE DATABASE <AMBARIDATABASE>;

      USE <AMBARIDATABASE>;

      SOURCE Ambari-DDL-MySQL-CREATE.sql;

    • Where <AMBARIUSER> is the Ambari user name and <AMBARIDATABASE> is the Ambari database name.

      Find the Ambari-DDL-MySQL-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed Ambari Server.

  4. When setting up the Ambari Server, select Advanced Database Configuration > Option [3] MySQL and enter the credentials you defined in Step 2 for user name, password, and database name.

Using Ambari with PostgreSQL

To set up PostgreSQL for use with Ambari:

  1. Create a user for Ambari and grant it permissions.

    • Using the PostgreSQL database admin utility:

      # sudo -u postgres psql

      CREATE DATABASE <AMBARIDATABASE>;

      CREATE USER <AMBARIUSER> WITH PASSWORD '<AMBARIPASSWORD>';

      GRANT ALL PRIVILEGES ON DATABASE <AMBARIDATABASE> TO <AMBARIUSER>;

      \connect <AMBARIDATABASE>;

      CREATE SCHEMA <AMBARISCHEMA> AUTHORIZATION <AMBARIUSER>;

      ALTER SCHEMA <AMBARISCHEMA> OWNER TO <AMBARIUSER>;

      ALTER ROLE <AMBARIUSER> SET search_path to '<AMBARISCHEMA>', 'public';

    • Where <AMBARIUSER> is the Ambari user name, <AMBARIPASSWORD> is the Ambari user password, <AMBARIDATABASE> is the Ambari database name, and <AMBARISCHEMA> is the Ambari schema name.

  2. Load the Ambari Server database schema.

    • You must pre-load the Ambari database schema into your PostgreSQL database using the schema script.

      # psql -U <AMBARIUSER> -d <AMBARIDATABASE>

      \connect <AMBARIDATABASE>;

      \i Ambari-DDL-Postgres-CREATE.sql;

    • Find the Ambari-DDL-Postgres-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed Ambari Server.

  3. When setting up the Ambari Server, select Advanced Database Configuration > Option [4] PostgreSQL and enter the credentials you defined in Step 1 for user name, password, and database name.

Troubleshooting Non-Default Databases with Ambari

Use these topics to help troubleshoot any issues you might have installing Ambari with an existing Oracle database.

Problem: Ambari Server Fails to Start: No Driver

Check /var/log/ambari-server/ambari-server.log for the following error:

ExceptionDescription: Configuration error. Class [oracle.jdbc.driver.OracleDriver] not found.

The Oracle JDBC .jar file cannot be found.

Solution

Make sure the file is in the appropriate directory on the Ambari server and re-run ambari-server setup. Review the load database procedure appropriate for your database type in Using Non-Default Databases - Ambari.

Problem: Ambari Server Fails to Start: No Connection

Check /var/log/ambari-server/ambari-server.log for the following error:

The Network Adapter could not establish the connection Error Code: 17002

Ambari Server cannot connect to the database.

Solution

Confirm that the database host is reachable from the Ambari Server and is correctly configured, by reading /etc/ambari-server/conf/ambari.properties:

server.jdbc.url=jdbc:oracle:thin:@oracle.database.hostname:1521/ambaridb
server.jdbc.rca.url=jdbc:oracle:thin:@oracle.database.hostname:1521/ambari
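One way to test connectivity from the Ambari Server host, assuming the Oracle client is installed locally (the host, port, and service name should match the values in ambari.properties above):

sqlplus <AMBARIUSER>/<AMBARIPASSWORD>@oracle.database.hostname:1521/ambaridb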

Problem: Ambari Server Fails to Start: Bad Username

Check /var/log/ambari-server/ambari-server.log for the following error:

Internal Exception: java.sql.SQLException: ORA-01017: invalid username/password; logon denied

You are using an invalid username/password.

Solution

Confirm the user account is set up in the database and has the correct privileges. See Step 2 above.

Problem: Ambari Server Fails to Start: No Schema

Check /var/log/ambari-server/ambari-server.log for the following error:

Internal Exception: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

The schema has not been loaded.

Solution

Confirm you have loaded the database schema. Review the load database schema procedure appropriate for your database type in Using Non-Default Databases - Ambari.

Using Non-Default Databases - Hive

The following sections describe how to use Hive with an existing database, other than the MySQL database instance that Ambari installs by default.

[Important]Important

The Microsoft SQL Server and SQL Anywhere database options are not supported.

Using Hive with Oracle

To set up Oracle for use with Hive:

  1. On the Ambari Server host, stage the appropriate JDBC driver file for later deployment.

    1. Download the Oracle JDBC (OJDBC) driver from http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html.

    2. Select Oracle Database 11g Release 2 - ojdbc6.jar and download the file.

    3. Make sure the .jar file has the appropriate permissions - 644.

    4. Execute the following command, adding the path to the downloaded .jar file:

      ambari-server setup --jdbc-db=oracle --jdbc-driver=/path/to/downloaded/ojdbc6.jar

  2. Create a user for Hive and grant it permissions.

    • Using the Oracle database admin utility:

      # sqlplus sys/root as sysdba

      CREATE USER <HIVEUSER> IDENTIFIED BY <HIVEPASSWORD>;

      GRANT SELECT_CATALOG_ROLE TO <HIVEUSER>;

      GRANT CONNECT, RESOURCE TO <HIVEUSER>;

      QUIT;

    • Where <HIVEUSER> is the Hive user name and <HIVEPASSWORD> is the Hive user password.

  3. Load the Hive database schema.

    [Important]Important

    Ambari sets up the Hive Metastore database schema automatically.

    You do not need to pre-load the Hive Metastore database schema into your Oracle database for a PHD 3.x Stack.

Using Hive with MySQL

To set up MySQL for use with Hive:

  1. On the Ambari Server host, stage the appropriate MySQL connector for later deployment.

    1. Install the connector.

      RHEL/CentOS

      yum install mysql-connector-java*

      SLES

      zypper install mysql-connector-java*

    2. Confirm that mysql-connector-java.jar is in the Java share directory.

      ls /usr/share/java/mysql-connector-java.jar

    3. Make sure the .jar file has the appropriate permissions - 644.

    4. Execute the following command:

      ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

  2. Create a user for Hive and grant it permissions.

    • Using the MySQL database admin utility:

      # mysql -u root -p

      CREATE USER '<HIVEUSER>'@'localhost' IDENTIFIED BY '<HIVEPASSWORD>';

      GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'localhost';

      CREATE USER '<HIVEUSER>'@'%' IDENTIFIED BY '<HIVEPASSWORD>';

      GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'%';

      CREATE USER '<HIVEUSER>'@'<HIVEMETASTOREFQDN>' IDENTIFIED BY '<HIVEPASSWORD>';

      GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'<HIVEMETASTOREFQDN>';

      FLUSH PRIVILEGES;

    • Where <HIVEUSER> is the Hive user name, <HIVEPASSWORD> is the Hive user password and <HIVEMETASTOREFQDN> is the Fully Qualified Domain Name of the Hive Metastore host.

  3. Create the Hive database.

    The Hive database must be created before loading the Hive database schema.

    # mysql -u root -p

    CREATE DATABASE <HIVEDATABASE>;

    Where <HIVEDATABASE> is the Hive database name.

  4. Load the Hive database schema.

    [Important]Important

    Ambari sets up the Hive Metastore database schema automatically.

    You do not need to pre-load the Hive Metastore database schema into your MySQL database for a PHD 3.x Stack.

Using Hive with PostgreSQL

To set up PostgreSQL for use with Hive:

  1. On the Ambari Server host, stage the appropriate PostgreSQL connector for later deployment.

    1. Install the connector.

      RHEL/CentOS

      yum install postgresql-jdbc*

      SLES

      zypper install -y postgresql-jdbc

    2. Confirm that postgresql-jdbc.jar is in the Java share directory.

      ls /usr/share/java/postgresql-jdbc.jar

    3. Change the access mode of the .jar file to 644.

      chmod 644 /usr/share/java/postgresql-jdbc.jar

    4. Execute the following command:

      ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar

  2. Create a user for Hive and grant it permissions.

    • Using the PostgreSQL database admin utility:

      echo "CREATE DATABASE <HIVEDATABASE>;" | psql -U postgres

      echo "CREATE USER <HIVEUSER> WITH PASSWORD '<HIVEPASSWORD>';" | psql -U postgres

      echo "GRANT ALL PRIVILEGES ON DATABASE <HIVEDATABASE> TO <HIVEUSER>;" | psql -U postgres

    • Where <HIVEUSER> is the Hive user name, <HIVEPASSWORD> is the Hive user password and <HIVEDATABASE> is the Hive database name.

  3. Load the Hive database schema.

    [Important]Important

    Ambari sets up the Hive Metastore database schema automatically.

    You do not need to pre-load the Hive Metastore database schema into your PostgreSQL database for a PHD 3.x Stack.

Troubleshooting Non-Default Databases with Hive

Use these entries to help you troubleshoot any issues you might have installing Hive with non-default databases.

Problem: Hive Metastore Install Fails Using Oracle

Check the install log:

cp /usr/share/java/${jdbc_jar_name} ${target}] has failures: true

The Oracle JDBC .jar file cannot be found.

Solution

Make sure the file is in the appropriate directory on the Hive Metastore server and click Retry.

Problem: Install Warning when "Hive Check Execute" Fails Using Oracle

Check the install log:

java.sql.SQLSyntaxErrorException: ORA-01754: a table may contain only one column of type LONG

The Hive Metastore schema was not properly loaded into the database.

Solution

Ignore the warning, and complete the install. Check your database to confirm the Hive Metastore schema is loaded. In the Ambari Web GUI, browse to Services > Hive. Choose Service Actions > Service Check to check that the schema is correctly in place.

Problem: Hive Check Execute may fail after completing an Ambari upgrade to version 1.4.2

For secure and non-secure clusters, with Hive security authorization enabled, the Hive service check may fail. Hive security authorization may not be configured properly.

Solution

Two workarounds are possible. Using Ambari Web, browse to Hive > Configs > Advanced:

  • Disable hive.security.authorization by setting the hive.security.authorization.enabled value to false.

    or

  • Properly configure Hive security authorization by setting the following properties (the equivalent hive-site.xml entries are sketched after this list):

    Hive Security Authorization Settings

    Property                                       Value
    hive.security.authorization.manager            org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
    hive.security.metastore.authorization.manager  org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
    hive.security.authenticator.manager            org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator

    For more information about configuring Hive security, see Metastore Server Security in Hive Authorization and the HCatalog document Storage Based Authorization.
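The equivalent hive-site.xml entries for the settings above are sketched here; in practice, set them through the Ambari Configs UI rather than editing the file by hand:

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
</property>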

Using Non-Default Databases - Oozie

The following sections describe how to use Oozie with an existing database, other than the Derby database instance that Ambari installs by default.

[Important]Important

The Microsoft SQL Server and SQL Anywhere database options are not supported.

Using Oozie with Oracle

To set up Oracle for use with Oozie:

  1. On the Ambari Server host, stage the appropriate JDBC driver file for later deployment.

    1. Download the Oracle JDBC (OJDBC) driver from http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html.

    2. Select Oracle Database 11g Release 2 - ojdbc6.jar.

    3. Make sure the .jar file has the appropriate permissions - 644.

    4. Execute the following command, adding the path to the downloaded .jar file:

      ambari-server setup --jdbc-db=oracle --jdbc-driver=/path/to/downloaded/ojdbc6.jar

  2. Create a user for Oozie and grant it permissions.

    Using the Oracle database admin utility, run the following commands:

    # sqlplus sys/root as sysdba

    CREATE USER <OOZIEUSER> IDENTIFIED BY <OOZIEPASSWORD>;

    GRANT ALL PRIVILEGES TO <OOZIEUSER>;

    GRANT CONNECT, RESOURCE TO <OOZIEUSER>;

    QUIT;

    Where <OOZIEUSER> is the Oozie user name and <OOZIEPASSWORD> is the Oozie user password.

Using Oozie with MySQL

To set up MySQL for use with Oozie:

  1. On the Ambari Server host, stage the appropriate MySQL connector for later deployment.

    1. Install the connector.

      RHEL/CentOS

      yum install mysql-connector-java*

      SLES

      zypper install mysql-connector-java*

    2. Confirm that mysql-connector-java.jar is in the Java share directory.

      ls /usr/share/java/mysql-connector-java.jar

    3. Make sure the .jar file has the appropriate permissions - 644.

    4. Execute the following command:

      ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

  2. Create a user for Oozie and grant it permissions.

    • Using the MySQL database admin utility:

      # mysql -u root -p

      CREATE USER '<OOZIEUSER>'@'%' IDENTIFIED BY '<OOZIEPASSWORD>';

      GRANT ALL PRIVILEGES ON *.* TO '<OOZIEUSER>'@'%';

      FLUSH PRIVILEGES;

    • Where <OOZIEUSER> is the Oozie user name and <OOZIEPASSWORD> is the Oozie user password.

  3. Create the Oozie database.

    • The Oozie database must be created before loading the Oozie database schema.

      # mysql -u root -p

      CREATE DATABASE <OOZIEDATABASE>;

    • Where <OOZIEDATABASE> is the Oozie database name.

Using Oozie with PostgreSQL

To set up PostgreSQL for use with Oozie:

  1. On the Ambari Server host, stage the appropriate PostgreSQL connector for later deployment.

    1. Install the connector.

      RHEL/CentOS

      yum install postgresql-jdbc

      SLES

      zypper install -y postgresql-jdbc

    2. Confirm that postgresql-jdbc.jar is in the Java share directory.

      ls /usr/share/java/postgresql-jdbc.jar

    3. Change the access mode of the .jar file to 644.

      chmod 644 /usr/share/java/postgresql-jdbc.jar

    4. Execute the following command:

      ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar

  2. Create a user for Oozie and grant it permissions.

    • Using the PostgreSQL database admin utility:

      echo "CREATE DATABASE <OOZIEDATABASE>;" | psql -U postgres

      echo "CREATE USER <OOZIEUSER> WITH PASSWORD '<OOZIEPASSWORD>';" | psql -U postgres

      echo "GRANT ALL PRIVILEGES ON DATABASE <OOZIEDATABASE> TO <OOZIEUSER>;" | psql -U postgres

    • Where <OOZIEUSER> is the Oozie user name, <OOZIEPASSWORD> is the Oozie user password and <OOZIEDATABASE> is the Oozie database name.

Troubleshooting Non-Default Databases with Oozie

Use these entries to help you troubleshoot any issues you might have installing Oozie with non-default databases.

Problem: Oozie Server Install Fails Using MySQL

Check the install log:

cp /usr/share/java/mysql-connector-java.jar /usr/lib/oozie/libext/mysql-connector-java.jar has failures: true

The MySQL JDBC .jar file cannot be found.

Solution

Make sure the file is in the appropriate directory on the Oozie server and click Retry.

Problem: Oozie Server Install Fails Using Oracle or MySQL

Check the install log:

Exec[exec cd /var/tmp/oozie && /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie.sql -run ] has failures: true

Oozie was unable to connect to the database or was unable to successfully set up the schema for Oozie.

Solution

Check the database connection settings provided during the Customize Services step in the install wizard by browsing back to Customize Services > Oozie. After confirming and adjusting your database settings, proceed forward with the install wizard.

If the Install Oozie Server wizard continues to fail, get more information by connecting directly to the Oozie server and executing the following command as <OOZIEUSER>:

su oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie.sql -run

Chapter 6. Setting up an Internet Proxy Server for Ambari

If you plan to use the public repositories for installing the Stack, Ambari Server must have Internet access to confirm and validate access to the repositories. If your machine requires a proxy server for Internet access, you must configure Ambari Server to use that proxy server.

How To Set Up an Internet Proxy Server for Ambari

  1. On the Ambari Server host, add proxy settings to the following script: /var/lib/ambari-server/ambari-env.sh.

    -Dhttp.proxyHost=<yourProxyHost> -Dhttp.proxyPort=<yourProxyPort>

  2. Optionally, to exclude some host names from going through the proxy server, define the list of excluded hosts, as follows:

    -Dhttp.nonProxyHosts=<pipe|separated|list|of|hosts>

  3. If your proxy server requires authentication, add the user name and password, as follows:

    -Dhttp.proxyUser=<username> -Dhttp.proxyPassword=<password>

  4. Restart the Ambari Server to pick up this change.
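For example, the combined proxy settings might be appended to the AMBARI_JVM_ARGS variable in /var/lib/ambari-server/ambari-env.sh. This is a sketch; the proxy host, port, and excluded hosts below are placeholders:

export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=3128 -Dhttp.nonProxyHosts=*.example.com|localhost"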

Configuring Ambari to use a proxy server and have Internet access is not required if you use local repositories; in that case, the Ambari Server must have access to your local repositories.

Chapter 7. Configuring Network Port Numbers

This chapter lists port number assignments required to maintain communication between Ambari Server, Ambari Agents, and Ambari Web.

Default Network Port Numbers - Ambari

The following table lists the default ports used by Ambari Server and Ambari Agent services.

Service        Servers                           Default Port  Protocol  Description                                                          End User Access?
Ambari Server  Ambari Server host                8080          http      Interface to Ambari Web and Ambari REST API                          No
Ambari Server  Ambari Server host                8440          https     Handshake port for Ambari Agents to Ambari Server                    No
Ambari Server  Ambari Server host                8441          https     Registration and heartbeat port for Ambari Agents to Ambari Server   No
Ambari Agent   All hosts running Ambari Agents   8670          tcp       Ping port used for alerts to check the health of the Ambari Agent    No

Notes:

  • Port 8080: see Optional: Changing the Default Ambari Server Port for instructions on changing the default port, and see Configure Ambari Server for Authenticated HTTP for instructions on authenticated HTTP.

  • Port 8670: you can change the Ambari Agent ping port in the Ambari Agent configuration.

Optional: Changing the Default Ambari Server Port

By default, Ambari Server uses port 8080 to access the Ambari Web UI and the REST API. To change the port number, you must edit the Ambari properties file.

Ambari Server should not be running when you change port numbers. Edit ambari.properties before you start Ambari Server for the first time, or stop Ambari Server before editing properties.

  1. On the Ambari Server host, open /etc/ambari-server/conf/ambari.properties with a text editor.

  2. Add the client API port property and set it to your desired port value:

    client.api.port=<port_number>

  3. Start or re-start the Ambari Server. Ambari Web is now accessible via the newly configured port:

    http://<your.ambari.server>:<port_number>

Chapter 8. Using Ambari Blueprints

Ambari Blueprints provide an API to perform cluster installations. You can build a reusable “blueprint” that defines which Stack to use, how Service Components should be laid out across a cluster and what configurations to set.

After setting up a blueprint, you can call the API to instantiate the cluster by providing the list of hosts to use. The Ambari Blueprint framework promotes reusability and facilitates automating cluster installations without UI interaction.
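A minimal sketch of the two API calls involved, assuming an Ambari Server at ambari.server:8080 with admin credentials; the blueprint name, cluster name, and JSON file names are illustrative:

# Register a blueprint (blueprint.json names the Stack and lays out host groups)
curl -u admin:admin -H "X-Requested-By:ambari" -X POST -d @blueprint.json http://ambari.server:8080/api/v1/blueprints/my-blueprint

# Instantiate a cluster from the blueprint (hosts.json maps host groups to real hosts)
curl -u admin:admin -H "X-Requested-By:ambari" -X POST -d @hosts.json http://ambari.server:8080/api/v1/clusters/MyCluster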

Learn more about Ambari Blueprints API on the Ambari Wiki.

Chapter 9. Tuning Ambari Performance

For clusters larger than 200 nodes, consider the following tuning options:

  1. Calculate the new, larger cache size, using the following relationship:

    ecCacheSizeValue=60*<cluster_size>

    where <cluster_size> is the number of nodes in the cluster. For example, a 300-node cluster yields ecCacheSizeValue = 60 * 300 = 18000.

  2. On the Ambari Server host, in /etc/ambari-server/conf/ambari.properties, add the following property and value:

    server.ecCacheSize=<ecCacheSizeValue>

    where <ecCacheSizeValue> is the value calculated previously, based on the number of nodes in the cluster.

  3. Add the following properties to adjust the JDBC connection pool settings:

    server.jdbc.connection-pool.acquisition-size=5

    server.jdbc.connection-pool.max-age=0

    server.jdbc.connection-pool.max-idle-time=14400

    server.jdbc.connection-pool.max-idle-time-excess=0

    server.jdbc.connection-pool.idle-test-interval=7200

  4. If using MySQL as the Ambari database, in your MySQL configuration, increase wait_timeout and interactive_timeout to 8 hours (28800 seconds), and increase max_connections from 32 to 128 (a my.cnf sketch follows this list).

    [Important]Important

    The Ambari settings server.jdbc.connection-pool.max-idle-time and server.jdbc.connection-pool.idle-test-interval must be lower than the MySQL wait_timeout and interactive_timeout values. If you choose to decrease these MySQL timeout values, adjust server.jdbc.connection-pool.max-idle-time and server.jdbc.connection-pool.idle-test-interval downward accordingly in the Ambari configuration, so that they remain less than wait_timeout and interactive_timeout.

  5. Restart Ambari Server.

    ambari-server restart

  6. If you are using the Ambari Metrics service, you might want to consider switching from the default embedded mode to distributed mode, as well as other tuning options. See Tuning Ambari Metrics for more information.
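A sketch of the corresponding MySQL server settings, using the values suggested in step 4; the location of the MySQL configuration file (my.cnf) varies by platform:

[mysqld]
# 8 hours, in seconds; keep above the Ambari idle-time and idle-test-interval settings
wait_timeout=28800
interactive_timeout=28800
max_connections=128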

Chapter 10. Tuning Ambari Metrics

Ambari Metrics System ("AMS") is a system for collecting, aggregating and serving Hadoop and system metrics in Ambari-managed clusters. AMS has three primary components: Metrics Collector, Metrics Monitors and Hadoop Sinks.

  • The Metrics Monitors are installed and run on each host in the cluster to collect system-level metrics and publish to the Metrics Collector.

  • The Hadoop Sinks plug into the various Hadoop components to publish Hadoop metrics to the Metrics Collector.

  • The Metrics Collector is a daemon that runs on a specific host in the cluster and receives data from the registered publishers, the Monitors and Sinks.

At a high level, the components of AMS work together as follows to collect metrics and make those metrics available to Ambari:

  1. Metrics Monitors (on each host) send system-level metrics to the Collector.

  2. Hadoop Sinks (on each host) send Hadoop-level metrics to the Collector.

  3. The Metrics Collector stores and aggregates metrics.

  4. Ambari exposes a REST API for metrics retrieval.

  5. The Ambari REST API feeds the Ambari Web UI.

To get optimal performance from the Ambari Metrics System, you should review the following Collector configuration options and the General Guidelines.

  • Collector Modes - The Collector can run in two modes: embedded mode and distributed mode. These modes impact where metrics data is stored and how the Collector process runs. See Collector Modes for more information.

  • Aggregated TTL Settings - The Time To Live settings for aggregated metrics. These impact the amount of data that is stored and how long the data is retained. See Aggregated Metrics TTL for more information.

  • Memory Settings - Memory properties for the Collector components. These settings impact the overall performance of the Collector. See Memory Settings for more information.

Collector Modes

The Metrics Collector is built using Hadoop technologies such as HBase and ATS. The Collector can store metrics data on the local filesystem, referred to as "embedded mode", or use an external HDFS, referred to as "distributed mode". By default, the Collector runs in embedded mode, capturing and writing metrics to the local file system on the host where the Collector is running. In addition, all of the Collector's components run in a single process on that host.

[Important]Important

When running in embedded mode, you should confirm the "hbase.rootdir" and "hbase.tmp.dir" directory configurations in Ambari Metrics > Configs > Advanced > ams-hbase-site are using a sufficiently sized and not heavily utilized partition, such as:

file:///grid/0/var/lib/ambari-metrics-collector/hbase.

Refer to General Guidelines for more information on Disk Space recommendations.

Another critical factor in embedded mode is the TTL settings, which manage how much data will be stored. Refer to Aggregated Metrics TTL for more information on these settings.

When the Collector is configured for distributed mode, the Collector writes metrics to HDFS and the components will run in distributed processes. This mode helps manage CPU and memory consumption.

To switch the Metrics Collector from embedded mode to distributed mode, in Ambari Web, browse to Services > Ambari Metrics > Configs, make the following changes, then restart the Metrics Collector.

  • General: set Metrics Service operation mode (timeline.metrics.service.operation.mode) to distributed. This property designates whether to run in distributed or embedded mode.

  • Advanced ams-hbase-site: set hbase.cluster.distributed to true. This indicates that AMS will run in distributed mode.

  • Advanced ams-hbase-site: set hbase.rootdir to hdfs://$NAMENODE_FQDN:8020/apps/ams/metrics (see note 1). This is the HDFS directory location where metrics will be stored.

Note 1: If your cluster is configured for a highly available NameNode, set the hbase.rootdir value to use the HDFS nameservice, instead of the NameNode host name:

hdfs://hdfsnameservice/apps/ams/metrics

Optionally, existing data can be migrated from the local store to HDFS prior to switching to distributed mode.

  1. Create HDFS directory for ams user. For example:

    su - hdfs -c 'hdfs dfs -mkdir -p /apps/ams/metrics'

  2. Stop Metrics Collector.

  3. Copy the metric data from the AMS local directory to an HDFS directory. This is the value of hbase.rootdir in Advanced ams-hbase-site used when running in embedded mode. For example:

    su - hdfs -c 'hdfs dfs -copyFromLocal /var/lib/ambari-metrics-collector/hbase/* /apps/ams/metrics'

    su - hdfs -c 'hdfs dfs -chown -R ams:hadoop /apps/ams/metrics'

  4. Perform the configuration changes above to switch to distributed mode.

  5. Start the Metrics Collector.

Aggregated Metrics TTL Settings

AMS provides configurable Time To Live (TTL) settings for aggregated metrics. The TTL settings are available in Ambari Metrics > Configs > Advanced ams-site and have the ".ttl" suffix. Each property name is self-explanatory and controls how long metrics are kept at the specified aggregation level before they are purged; the values are set in seconds.

For example, if you are running a single node and want to ensure that no values are stored for more than 7 days, to save on local disk space, set any property ending in ".ttl" that has a value greater than 604800 (7 days in seconds) to 604800. This ensures that a property such as timeline.metrics.cluster.aggregator.daily.ttl, which controls the daily aggregation TTL and by default stores data for 2 years, will keep daily aggregations for only 604800 seconds, or 7 days.

Reducing the TTL values significantly reduces the total amount of storage used for metrics. The TTL settings that matter most for reducing the total amount of disk space used by AMS are:

  • timeline.metrics.cluster.aggregator.minute.ttl - Controls minute level aggregated metrics TTL

  • timeline.metrics.host.aggregator.ttl - Controls host-based precision metrics TTL

It’s important to note that these settings should be set during installation. If these settings need to be changed post-installation, they have to be set using the HBase shell. The HBase shell is used to connect to the embedded HBase instance that is part of AMS. Run the following command from the Collector host.

/usr/lib/ams-hbase/bin/hbase --config /etc/ams-hbase/conf shell

Once connected to HBase, update each of the following tables with the appropriate TTL values you are changing. The table below maps each ".ttl" property in Ambari Metrics > Configs > Advanced ams-site to the corresponding HBase table.

Property Table
timeline.metrics.cluster.aggregator.daily.ttl METRIC_AGGREGATE_DAILY
timeline.metrics.cluster.aggregator.hourly.ttl METRIC_AGGREGATE_HOURLY
timeline.metrics.cluster.aggregator.minute.ttl METRIC_AGGREGATE
timeline.metrics.host.aggregator.daily.ttl METRIC_RECORD_DAILY
timeline.metrics.host.aggregator.hourly.ttl METRIC_RECORD_HOURLY
timeline.metrics.host.aggregator.minute.ttl METRIC_RECORD_MINUTE
timeline.metrics.host.aggregator.ttl METRIC_RECORD

For each table that needs to be updated, alter the TTL value as follows:

hbase(main):000:0> alter 'METRIC_RECORD_DAILY', { NAME => '0', TTL => 604800}
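To confirm the change took effect, you can describe the table from the same HBase shell session; the output lists the TTL for column family '0':

hbase(main):001:0> describe 'METRIC_RECORD_DAILY'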

Memory Settings

Since AMS uses multiple components for metrics storage and query, there are multiple tunable properties that control memory use. The following table lists each memory configuration.

Configuration            Property                      Description
Advanced ams-env         metrics_collector_heapsize    Heap size configuration for the Collector.
Advanced ams-hbase-env   hbase_regionserver_heapsize   Heap size configuration for the single AMS HBase RegionServer.
Advanced ams-hbase-env   hbase_master_heapsize         Heap size configuration for the single AMS HBase Master.
Advanced ams-hbase-env   regionserver_xmn_size         Maximum young generation heap size for the single AMS HBase RegionServer.
Advanced ams-hbase-env   hbase_master_xmn_size         Maximum young generation heap size for the single AMS HBase Master.

(Optional) Enabling HBase Region and Table Metrics

By default, Ambari Metrics does not collect metrics related to HBase regions, tables and RegionServers. These metrics can be numerous and can cause performance issues.

If you want Ambari to collect these metrics, you can do the following. It is highly recommended that you test turning on this option and confirm that your AMS performance is acceptable.

  1. On the Ambari Server host, change to the following directory:

    cd /var/lib/ambari-server/resources/stacks/PHD/3.0.0/hooks/before-START/templates

  2. Edit the hadoop-metrics2.properties.j2 template file.

  3. Comment out (or remove) the following lines:

    *.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter

    hbase.*.source.filter.exclude=*Regions*

  4. Save the file and restart Ambari Server for the change to take effect.

[Important]Important

If you upgrade Ambari to a newer version, you will need to re-apply this change to the template file.

General Guidelines

The operation mode, TTL, memory settings, and disk space requirements for AMS are dependent on the number of nodes in the cluster. The following table lists specific recommendations and tuning guidelines for each.

In Ambari Web, browse to Ambari Metrics > Configs, make the following changes, then restart the Collector.

Cluster Environment: Single-Node
Host Count: 1; Disk Space: 2GB; Collector Mode: embedded; TTL: reduce TTLs to 7 Days
  metrics_collector_heapsize=1024
  hbase_regionserver_heapsize=512
  hbase_master_heapsize=512
  hbase_master_xmn_size=128

Cluster Environment: PoC
Host Count: 1-5; Disk Space: 5GB; Collector Mode: embedded; TTL: reduce TTLs to 30 Days
  metrics_collector_heapsize=1024
  hbase_regionserver_heapsize=512
  hbase_master_heapsize=512
  hbase_master_xmn_size=128

Cluster Environment: Pre-Production
Host Count: 5-20; Disk Space: 20GB; Collector Mode: embedded; TTL: reduce TTLs to 3 Months
  metrics_collector_heapsize=1024
  hbase_regionserver_heapsize=1024
  hbase_master_heapsize=512
  hbase_master_xmn_size=128

Cluster Environment: Production
Host Count: 20-50; Disk Space: 50GB; Collector Mode: embedded; TTL: n.a.
  metrics_collector_heapsize=1024
  hbase_regionserver_heapsize=1024
  hbase_master_heapsize=512
  hbase_master_xmn_size=128

Cluster Environment: Production
Host Count: 50-200; Disk Space: 100GB; Collector Mode: embedded; TTL: n.a.
  metrics_collector_heapsize=2048
  hbase_regionserver_heapsize=2048
  hbase_master_heapsize=2048
  hbase_master_xmn_size=256

Cluster Environment: Production
Host Count: 200-400; Disk Space: 200GB; Collector Mode: embedded; TTL: n.a.
  metrics_collector_heapsize=2048
  hbase_regionserver_heapsize=2048
  hbase_master_heapsize=2048
  hbase_master_xmn_size=512

Cluster Environment: Production
Host Count: 400-800; Disk Space: 200GB; Collector Mode: distributed; TTL: n.a.
  metrics_collector_heapsize=8192
  hbase_regionserver_heapsize=12288
  hbase_master_heapsize=1024
  hbase_master_xmn_size=1024
  regionserver_xmn_size=1024

Cluster Environment: Production
Host Count: 800+; Disk Space: 500GB; Collector Mode: distributed; TTL: n.a.
  metrics_collector_heapsize=12288
  hbase_regionserver_heapsize=16384
  hbase_master_heapsize=16384
  hbase_master_xmn_size=2048
  regionserver_xmn_size=1024

Chapter 11. Moving the Ambari Metrics Collector

Use this procedure to move the Ambari Metrics Collector to a new host. For information and guidelines on tuning the Ambari Metrics Service, refer to Tuning Ambari Metrics in the Ambari Reference Guide.

  1. In Ambari Web, stop the Ambari Metrics service.

  2. Execute the following API call to delete the current Metric Collector component.

    curl -u admin:admin -H "X-Requested-By:ambari" - i -X DELETE http://ambari.server:8080/api/v1/clusters/cluster.name/hosts/metrics.collector.hostname/host_components/METRICS_COLLECTOR

    where ambari.server is the Ambari Server host, cluster.name is your Cluster Name, and metrics.collector.hostname is the host running the Metrics Collector.

  3. Execute the following API call to add Metrics Collector to a new host.

    curl -u admin:admin -H "X-Requested-By:ambari" - i -X POST http://ambari.server:8080/api/v1/clusters/cluster.name/hosts/metrics.collector.hostname/host_components/METRICS_COLLECTOR

    where ambari.server is the Ambari Server host, cluster.name is your Cluster Name, and metrics.collector.hostname is the host that will run the Metrics Collector.

  4. In Ambari Web, go to the Host page for the new Metrics Collector host, and click to install the Metrics Collector component from that Host page.

  5. On every host, point the Metrics Monitors to the new Collector by editing /etc/ambari-metrics-monitor/conf/metric_monitor.ini: set the metrics_server property to the host name of the new Collector host. For example, the following sed command can be used (be sure to replace the host names in the command):

    sed -i 's/old.collector.hostname/new.collector.hostname/' /etc/ambari-metrics-monitor/conf/metric_monitor.ini

  6. In Ambari Web, start the Ambari Metrics service.