Ambari 1.7 Installation Guide

Note: If you are using Ambari 2.1.2, follow the instructions in the Pivotal HD Ambari 2.1.2 Documentation instead of this page.

This document contains the following topics:

1. Prerequisites

1.1 System Requirements

Operating System Requirements

The following operating systems are supported:

  • Red Hat Enterprise Linux (RHEL) 6.4+ (64-bit)

  • CentOS 6.4+ (64-bit)

  • SuSE Linux Enterprise Server (SLES) 11 SP3

The installer uses many packages from the base OS repositories. All machines in the cluster should have access to a complete set of base OS repositories. The repositories can be either installed locally or be proxied / mirrored from another location.

Browser Requirements

The Ambari Cluster Creation Wizard is a web-based tool that runs in a browser. You must have a machine that can run a graphical web browser to use the wizard. This machine can either be the host where the Ambari Server is installed or any machine with network connectivity to it. The following browsers are supported:

  • Windows (Vista, 7)

    • Internet Explorer 9.0 and higher (for Vista + Windows 7)

    • Firefox latest stable release

    • Safari latest stable release

    • Google Chrome latest stable release

  • Mac OS X (10.6 or later)

    • Firefox latest stable release

    • Safari latest stable release

    • Google Chrome latest stable release

  • Linux (RHEL, CentOS, SLES)

    • Firefox latest stable release

    • Google Chrome latest stable release

Software Requirements

The following packages must be installed on all your hosts:

  • yum and rpm (RHEL/CentOS)

  • scp, curl, and wget

  • python (2.6 or later)

JDK Requirement

The following Java runtime environments are supported:

  • Oracle jdk-7u67-linux-x64.tar.gz

Database Requirements

Ambari, Hive/HCatalog and Oozie require their own databases.

  • Ambari: by default, uses a PostgreSQL 8.x server instance installed by Ambari. It is also possible to use an existing instance of PostgreSQL 9.x, MySQL 5.x, or Oracle 11g.
  • Hive/HCatalog: Ambari will install an instance of MySQL on the Hive Metastore host. It is also possible to use an existing instance of PostgreSQL 9.x, MySQL 5.x, or Oracle 11g.
  • Oozie: by default, uses a Derby instance installed by Ambari. It is also possible to use an existing instance of PostgreSQL 9.x, MySQL 5.x, or Oracle 11g.

OpenSSL Requirement

openssl-1.0.1e-16.el6.x86_64 or above is required on all nodes.

1.2 Password-less SSH (Ambari Server to Cluster Hosts)

Ambari requires that the ambari-agent be installed on all cluster hosts. The Ambari Server communicates with the agents to perform the PHD cluster installation and management tasks. The agents can be installed either automatically by the Ambari Server or manually by the system administrator.

To have the Ambari Server install the Ambari Agents automatically on each cluster host, a password-less SSH connection must be established between the Ambari Server host and all other hosts. For a manual install, refer to the "Ambari Agents Manual Install" section.

1.3 Synchronize clocks across all cluster hosts

The clocks on all cluster hosts and on the machine that runs the browser must be synchronized. Enable the NTP service to make sure synchronization happens automatically.
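For example, on RHEL/CentOS you can install and enable the NTP service with the following commands (a minimal sketch; package and service names assume the stock ntp package):

~> yum install ntp
~> chkconfig ntpd on
~> service ntpd start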

1.4 Check DNS settings

All hosts in the cluster must be configured for DNS and reverse DNS. Alternatively, you can manage host resolution in the /etc/hosts file. The hostname command should return the Fully Qualified Domain Name (FQDN) of your host.
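For example, a minimal /etc/hosts entry and a quick hostname check look like this (the host name and IP address are hypothetical):

~> grep phd-master01 /etc/hosts
192.0.2.11   phd-master01.example.com   phd-master01
~> hostname -f
phd-master01.example.com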

1.5 Disable or Configure iptables

A number of network ports need to be open on the cluster hosts so that Ambari can provision and manage them during setup. The easiest way to open the ports is to temporarily disable the iptables service:

~> service iptables stop

The service can be restarted after the setup is complete.
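If a host may be rebooted before the setup is complete, you can also keep iptables from starting at boot (re-enable it afterwards):

~> chkconfig iptables off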

1.6 Disable SELinux and PackageKit and Check umask Value

Temporarily disable SELinux during the Ambari setup:

~> setenforce 0

  • If PackageKit is installed, open /etc/yum/pluginconf.d/refresh-packagekit.conf in a text editor and make this change: enabled=0

  • Make sure umask is set to 022 (see the example below).
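A minimal check and, if needed, a persistent setting for the umask; the use of /etc/profile here is an assumption, adjust for your environment:

~> umask
0022
~> echo "umask 022" >> /etc/profile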

1.7 Disable IPv6

Enter the following commands to disable IPv6:

~> mkdir -p /etc/sysctl.d
~> ( cat > /etc/sysctl.d/99-hadoop-ipv6.conf <<-'EOF'
## Disabled ipv6
## Provided by Ambari Bootstrap
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
    )
~> sysctl -e -p /etc/sysctl.d/99-hadoop-ipv6.conf

1.8 Disable Transparent Huge Pages (THP)

When installing Ambari, one or more host checks may fail if you have not disabled Transparent Huge Pages on all hosts. To disable THP:

  1. Add the following commands to your /etc/rc.local file:

    • RHEL6

      if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
         echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled; fi
      if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag; then
         echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag; fi
      
    • SLES

      if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
         echo never > /sys/kernel/mm/transparent_hugepage/enabled; fi
      if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
         echo never > /sys/kernel/mm/transparent_hugepage/defrag; fi
      
  2. To confirm, reboot the host and then run the command:

    • RHEL6

      $ cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
          always madvise [never]
      $ cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
          always madvise [never]
      
    • SLES

      $ cat /sys/kernel/mm/transparent_hugepage/enabled
          always madvise [never]
      $ cat /sys/kernel/mm/transparent_hugepage/defrag
          always madvise [never]
      

2. Installing Ambari 1.7 Server

2.1 Setup YUM repository server

You need to have a YUM repository server set up to be able to install Ambari. The repository must reside on a host accessible from all the cluster hosts. You can use a dedicated host for that purpose or set up the YUM repository server on the admin host where the Ambari Server will be installed.

To set up a new YUM repository server you need an httpd web server. Make sure the httpd server is running on the host that will serve as the YUM repo:

~> service httpd status

If the service is not running, install and start it:

~> yum install httpd
~> service httpd start

2.2 Create Staging Directory

We recommend you use a staging directory where you extract the tarballs for the Ambari and PHD stacks. Each tarball is an archived YUM repository and contains a setup_repo.sh script that creates a symlink from the document root of the httpd server (/var/www/html) to the directory where the tarball is extracted. The staging directory (and all the directories above it) must be readable and executable by the system user running the httpd process (typically apache); better yet, make the directory readable and executable by everyone. Do not use the /tmp directory for staging, since files there can be removed at any time.

~> mkdir /staging
~> chmod a+rx /staging

2.3 Download and Extract Ambari 1.7 Tarball

Pivotal Ambari RPMs are shipped as an archived YUM repository that should be extracted to the YUM repo server.

On the host used as a YUM repo, download the Pivotal Ambari 1.7.1 tarball from https://network.pivotal.io/products/pivotal-hd into the staging directory you set up previously (e.g. /staging). Ensure that all the parent directories up to and including the staging directory have r+x access for all users, as this directory will be used to stage the local YUM repository. Once downloaded, extract the tarball into the staging directory. For example:

~> tar -xvzf /staging/AMBARI-1.7.1-87-centos6.tar.gz -C /staging/

2.4 Setup local YUM repository

On the host used as a YUM repo, execute the setup_repo.sh helper script shipped as part of the Ambari tarball:

~> /staging/AMBARI-1.7.1/setup_repo.sh

This script assumes that the document root of the YUM repo web server is /var/www/html and creates a symlink there, such as AMBARI-<version>, that points to the extracted Ambari tarball. Verify that the Ambari YUM repo is now available from the YUM web server:

~> curl http://localhost/AMBARI-1.7.1/repodata/repomd.xml

The script also creates an Ambari repo definition and places it in the /etc/yum.repos.d/ambari.repo file. This file must also be present on the admin host where the Ambari Server will be installed; if the YUM repo server is a different host, copy the file there.

At this point the Pivotal Ambari YUM repository should be available for the cluster hosts. Test that you can access the following URL from the admin and cluster hosts: http://<yum.repo.fqdn>/AMBARI-1.7.1
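For example, run the same check used above against the repo server's FQDN from any cluster host (replace <yum.repo.fqdn> with your value):

~> curl http://<yum.repo.fqdn>/AMBARI-1.7.1/repodata/repomd.xml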

2.5 Install Ambari Server

The Ambari Server is installed from RPMs using the yum command:

~> yum install ambari-server

This command installs the Ambari Server which is a web application server listening on port 8080. It also installs a PostgreSQL server instance that listens on port 5432.

2.6 Download Java JDK and JCE files

The Ambari Server is an application server that requires the Java JDK 1.7 to run. It also requires the Java Cryptography Extension (JCE) Unlimited Strength policy files when it manages a Kerberos-enabled cluster.

2.6.1 Download Oracle JDK

Execute this command to download the Oracle JDK 1.7 tarball:

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/7u67-b01/jdk-7u67-linux-x64.tar.gz

If you use a browser to download the file instead, accept the Oracle license and download the Oracle JDK 1.7 tarball from the Oracle site: http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jdk-7u67-oth-JPR (choose jdk-7u67-linux-x64.tar.gz). Do not download the RPM package; the tarball is required. If your browser changes the downloaded file name (for example, Google Chrome), rename the file back to jdk-7u67-linux-x64.tar.gz.

Note: The name of the Java JDK tarball is hardcoded in the Ambari setup script to jdk-7u67-linux-x64.tar.gz, which means you need to download the exact same version from Oracle.

From here on, you have two options:

  1. Rely on Ambari to install and distribute the Oracle JDK to the cluster hosts.

    Copy the downloaded tarball to the /var/lib/ambari-server/resources/ directory (see the example after this list). The Ambari Server setup script will detect the presence of the tarball in this location and install the JDK on the admin host. The Ambari Server will also distribute and install this JDK on all cluster hosts during cluster creation. Ensure that the file has read permission for all users.

  2. Manually install the Oracle JDK on all cluster hosts. If you prefer this option, write down the value of JAVA_HOME for your Java installation, as you will need it when setting up the Ambari Server.
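For option 1, a minimal sketch of staging the JDK tarball (assuming the tarball was downloaded to the current directory):

~> cp jdk-7u67-linux-x64.tar.gz /var/lib/ambari-server/resources/
~> chmod a+r /var/lib/ambari-server/resources/jdk-7u67-linux-x64.tar.gz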

2.6.2 Download JCE archive

Once you have configured the Oracle JDK, download the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files for Java 7 directly from the Oracle site: http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html. Copy the file to the /var/lib/ambari-server/resources/ directory. The Ambari Server setup script will detect the presence of the JCE archive in this location and distribute the files to the cluster hosts automatically. Ensure that the file has read permission for all users.
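A minimal sketch of staging the JCE archive, assuming the downloaded file is named UnlimitedJCEPolicyJDK7.zip (verify the exact file name from the Oracle download):

~> cp UnlimitedJCEPolicyJDK7.zip /var/lib/ambari-server/resources/
~> chmod a+r /var/lib/ambari-server/resources/UnlimitedJCEPolicyJDK7.zip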

2.7 Setup Ambari 1.7 Server

The Ambari Server must be set up before it is functional. If your PostgreSQL instance is configured to listen on the default port number, simply run the following command:

~> ambari-server setup

This command prompts for user input on several decisions:

  • User Account – you can choose to run the ambari-server daemon process on the admin node as a user other than root. If you choose a different user and that user does not yet exist, it will be created.
  • Java JDK – choose the Oracle JDK 1.7 option to use the JDK tarball that you downloaded beforehand into the /var/lib/ambari-server/resources/ directory. If you want to install a custom JDK on all hosts manually, choose the "Custom JDK" option and provide the value of JAVA_HOME. Do not choose the Oracle JDK 1.6 option, as it is no longer supported.
  • Database – if you would like to use an existing database server instead of the new PostgreSQL instance provisioned by Ambari, answer "y" to the "Enter advanced database configuration" question and provide connection parameters for the existing database.

If your PostgreSQL instance is configured to listen on a non-default port number, perform these alternate steps to configure postgres and Ambari:

  1. Open the PostgreSQL /var/lib/pgsql/data/pg_hba.conf configuration file in a text editor. Append the following lines to the end of the file to allow the ambari user to connect to the database:

    local  all  ambari md5
    host  all   ambari 0.0.0.0/0  md5
    host  all   ambari ::/0 md5
    
  2. Open the /etc/sysconfig/pgsql/postgresql file in a text editor and set the non-default port. For example, to use port 10432 the file would need the line:

    PGPORT=10432
    
  3. Restart the PostgreSQL database:

    ~> service postgresql restart
    
  4. Connect to the database as postgres (superuser) and configure the database for Ambari:

    ~> psql -U postgres -p 10432;
    postgres=# CREATE DATABASE ambari;
    postgres=# CREATE USER ambari WITH ENCRYPTED PASSWORD 'bigdata';
    postgres=# \c ambari
    ambari=# CREATE SCHEMA ambari AUTHORIZATION ambari;
    ambari=# ALTER SCHEMA ambari OWNER TO ambari;
    ambari=# ALTER ROLE ambari SET search_path to 'ambari','public';
    ambari=# \q
    
  5. Execute this command to setup Ambari:

    ~> ambari-server setup --database=postgres --databasehost=localhost --databaseport=10432 --databasename=ambari --databaseusername=ambari --databasepassword=bigdata
    

    Note: Use the following command to verify that postgres is listening on the hostname value assigned to --databasehost:

    ~> netstat -anp | egrep <port>
    
  6. Execute the Ambari-DDL-Postgres-CREATE.sql file in PostgreSQL to complete the configuration:

    ~> psql -f /var/lib/ambari-server/resources/Ambari-DDL-Postgres-CREATE.sql -U ambari -p 10432 -d ambari
    

    Note: Enter the password bigdata when prompted.

  7. Continue with the next topic to start the Ambari server.

2.8 Start Ambari 1.7 Server

After the Ambari Server has been set up, you can start it:

~> ambari-server start

To check the status of the server, use the following command:

~> ambari-server status

To stop the server, use the following command:

~> ambari-server stop

3. Installing a Pivotal HD cluster

3.1 Setup YUM repository server

You should have already created a YUM repository server when you installed Ambari (see section 2.1).

3.2 Download and extract PHD stack tarballs

The PHD stack tarballs should be installed on the machine that hosts the YUM server. Unless you’re using a dedicated machine for the YUM repository server, this will be the same admin host you used for installing the Ambari Server.

Download the following tarballs and extract them in a dedicated location (avoid using /tmp directory).

  • Ambari-1.7.1 (download from Pivotal Network): Pivotal Ambari stack; contains RPMs for the Ambari server and Ambari agents.

  • PHD-3.0.1.0 (download from Pivotal Network): Pivotal Hadoop stack; contains RPMs for Hadoop services such as HDFS, YARN, HBase, Hive, Oozie, and ZooKeeper.

  • PADS-1.3.1.0 (download from Pivotal Network): Pivotal Advanced Database Services stack; contains HAWQ (a parallel SQL query engine), PXF, and MADlib.

  • PHD-UTILS-1.1.0.20 (download from Pivotal Network): Pivotal Hadoop Utilities stack; contains optional services and libraries, such as Ganglia and Nagios, used for monitoring and alerting of core cluster services.

Assuming you downloaded the tarballs to the /tmp directory and want to stage them in /staging directory:

~> tar -xzf /tmp/{stack}.tar.gz -C /staging/

3.3 Setup local YUM repositories

PHD stacks are shipped as archived YUM repositories that need to be deployed in your YUM repository server to be accessible by the Ambari Server and all cluster hosts.

Each stack repository contains the setup_repo.sh script that assumes:

  • the YUM repository server is accessible by all hosts in the cluster
  • the document root of your YUM server is /var/www/html/

Each stack’s script creates a symbolic link in the YUM repository server document root to point to the location of the extracted stack tarball and creates a repo definition file in /etc/yum.repos.d/ directory so that your local yum command can find the repository. It is essential that the hostnames in the repo definition files use the Fully Qualified Domain Name (FQDN) of the YUM server host that is accessible from all cluster hosts.

For each stack, run the local repo setup script:

~> /staging/{stack}/setup_repo.sh

If the repository setup was successful, the script prints out the repository URL. Write down the URL, as you will need it later when installing a PHD cluster using the Ambari Server UI.

Note: If your YUM repository server runs on a different host than the admin host where the Ambari Server is installed, copy the generated repository definition files from /etc/yum.repos.d/ to /etc/yum.repos.d/ on the admin host where you installed the Ambari Server.

Test that the repositories are properly configured – run the following command from the admin host:

~> yum repolist 

You should see the repositories for the stacks listed.

3.4 Login to Ambari Server

Once the Ambari Server is started:

  1. Open http://{ambari.server.host}:8080 in the web browser

  2. Log in to the server using the user admin and the password admin. These credentials can be changed later.

3.5 Launch Install Wizard

Once logged into Ambari, click the "Launch Install Wizard" button to enter the cluster creation wizard. The wizard is self-explanatory and guides you through the steps necessary to provision a new PHD cluster. A few actions requiring particular attention are listed below:

3.5.1 Modify YUM repository URLs

In the Select Stack section, click Advanced Repository Options to reveal the list of YUM repositories from which Ambari will get the PHD stack RPMs. The values provided here out of the box need to be replaced with the URLs of the stack repositories you installed previously. Replace http://SET_REPO_URL with the appropriate repository URL that you noted down earlier when you ran the setup_repo.sh script for the stack. If you do not have the links handy, you can always retrieve them from the repo definition files in /etc/yum.repos.d/.

Note: After you deploy the cluster, you can update repositories via the Ambari UI (Admin > Repositories).

3.5.2 Specify host names and SSH key

In the Install Options section, you need to provide the FQDNs of the hosts that will comprise your cluster. You can use a range expression with square brackets; for example, host[01-10].domain describes 10 hosts.

If you want Ambari to automatically provision and register the Ambari Agents on the cluster hosts, you need to provide the private key that you used to set up password-less SSH on your cluster. You can either select a key file or copy and paste the file contents into the form.

Ambari Agents Manual Install

Note: If you do not want to provide the private key or set up password-less SSH, you have to provision and configure the Ambari Agents manually. In this case you have to:

  1. Setup Ambari Repository by copying /etc/yum.repos.d/ambari.repo file from the YUM repository server
  2. Install the Ambari Agent:

    ~> yum install ambari-agent 

  3. Edit the Ambari Agent configuration (/etc/ambari-agent/conf/ambari-agent.ini) to point it to the Ambari Server:

    [server]
    hostname={ambari.server.hostname}
    url_port=8440
    secured_url_port=8441
    

  4. Start the agent:

    ~> ambari-agent start

The agent will register itself with the server when it starts.
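To verify that the agent process is running on a host, you can check its status:

~> ambari-agent status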

3.5.3 Choose Services

You need to choose which PHD services you want to install initially. You must install the HDFS and ZooKeeper services; you can add others later.

Note: You can use Ambari to monitor your cluster if you select the Ganglia and Nagios services for installation. You will get a warning if you do not select these services, but you can ignore it if you plan to monitor your cluster using other tools. You can also add the Ganglia and Nagios services to your cluster later.

3.5.4 Assign Masters

You need to assign “master” service components to your cluster hosts.

Note: The HAWQ Master component MUST NOT reside on the same host that is used for Hive Metastore if Hive Metastore uses the new PostgreSQL database. This is because both these services will attempt to use port 5432. If it is absolutely required to co-locate these components on the same host, provision a PostgreSQL database beforehand on a port other than 5432 and choose “Existing PostgreSQL Database” option for Hive Metastore configuration. The same restriction applies to the admin host – neither HAWQ Master nor Hive Metastore will be able to run on the admin host where Ambari Server is installed.

3.5.5 Assign Slaves and Clients

You need to assign “slave” and “client” service components to your cluster hosts.

Note: The UI panel displaying the list of services to provision on each host is scrollable, but the scroll bar is currently not visible. Scroll the main area of the page to the right to ensure that you configure all the components properly.

Note: If you are installing the HAWQ service, make sure HAWQ Segment components are installed on hosts where DataNode components are installed. You do not have to install a HAWQ Segment on every host that runs a DataNode, but every host that runs a HAWQ Segment must also run a DataNode.

3.5.6 Install, Start and Test

This screen shows the progress of cluster deployment on each host. Each component mapped to a host is installed and started, and a simple test is run to verify that the component is functional.

You can see more details about completed and pending tasks for each host by clicking the link in the Message column.

When the "Successfully installed and started the services" message appears, click Next.

On the Summary page, review the list of accomplished tasks and click Complete to open the cluster Dashboard.

3.6 Cluster Dashboard

The Dashboard is a central place that displays the services you deployed and their status. You can add new services or hosts, stop and start services and components, explore monitoring metrics and perform service specific actions.

Your cluster is ready!

Appendix: Setting Up Password-less SSH

The following instructions help you set up password-less SSH; they need to be executed as the root user:

  1. Generate SSH keys on the Ambari Server host using the ssh-keygen command:

    ~> ssh-keygen

  2. Add the SSH public key ( .ssh/id_rsa.pub ) to the set of authorized keys on all cluster hosts:

    ~> cat .ssh/id_rsa.pub | ssh root@{host} 'cat >> .ssh/authorized_keys' 

  3. Set permissions on SSH directory on all cluster hosts:

    ~> ssh root@{host} 'chmod 700 ~/.ssh ; chmod 600 ~/.ssh/authorized_keys' 

  4. Test the connection between the Ambari Server and each cluster host and make sure you do not have to enter a password to login to the host:

    ~> ssh root@{host}

Note: While it is possible to use a non-root SSH account, that account must be able to execute the sudo command without being prompted for a password.
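A minimal sudoers entry for such an account (the user name ambari-ssh is hypothetical; add the entry with visudo):

ambari-ssh ALL=(ALL) NOPASSWD: ALL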

Known Issues

OpenSSL Issue

Ambari Agents might fail to register with the Ambari Server when older versions of OpenSSL are installed. Upgrade OpenSSL to openssl-1.0.1e-16.el6.x86_64 or above to fix the issue.

Ambari Cannot Locate DataNodes

Ambari might show all DataNodes as up, but the NameNode may be unable to locate them.

Ensure that both forward and reverse hostname lookups are set up correctly.
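A quick way to verify both directions from any cluster host (the host name and IP address are hypothetical):

~> nslookup datanode01.example.com   # forward lookup should return the host's IP
~> nslookup 192.0.2.21               # reverse lookup should return the FQDN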

RHEL 5 Repo URL In the UI

Although the Ambari user interface might show a "RHEL 5.x repo URL" text box, Pivotal does not support RHEL 5.x for Pivotal HD 3.0.

SUSE setup_repo.sh Broken Due to sym-links

The setup_repo.sh script included in the Ambari, PHD, and PHD-UTILS tarballs tries to create a symlink such as /srv/www/htdocs/AMBARI-1.7.1 -> /staging/AMBARI-1.7.1/suse11/1.7.1-62. By default, apache2 on SUSE does not have FollowSymLinks enabled. This causes a 403 Access Denied error when you try to access (or curl) http://<repo-host>/AMBARI-1.7.1.

To fix the above issue, enable FollowSymLinks in apache2 by modifying /etc/apache2/default-server.conf as follows:


 # Possible values for the Options directive are "None", "All",
 # or any combination of:
 # Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
 #
 # Note that "MultiViews" must be named *explicitly* --- "Options All"
 # doesn't give it to you.
 #
 # The Options directive is both complicated and important. Please see
 # httpd.apache.or...re.html#options
 # for more information.
 Options None FollowSymLinks
 # AllowOverride controls what directives may be placed in .htaccess files.
 # It can be "All", "None", or any combination of the keywords:
 # Options FileInfo AuthConfig Limit
 AllowOverride None
 # Controls who can get stuff from this server.
 Order allow,deny
 Allow from all