Pivotal HD Automated Install with Ambari 2.1.2

1. Getting Ready

This section describes the information and materials you should get ready to install a PHD cluster using Ambari. Ambari provides an end-to-end management and monitoring solution for your PHD cluster. Using the Ambari Web UI and REST APIs, you can deploy, operate, manage configuration changes, and monitor services for all nodes in your cluster from a central point.

  • Determine Stack Compatibility

  • Meet Minimum System Requirements

  • Collect Information

  • Prepare the Environment

    1.1 Determine Stack Compatibility

    Ambari 2.1.2 is compatible with Pivotal HD (PHD) 3.0.x releases.

    1.2 Meet Minimum System Requirements

    To run Hadoop, your system must meet the following minimum requirements:

    1.2.1 Operating System Requirements

    The following 64-bit operating systems are supported:

    • Red Hat Enterprise Linux (RHEL) v6.x

    • CentOS v6.x

    • SUSE Linux Enterprise Server (SLES) v11 SP3

    [Important]Important

    The installer pulls many packages from the base OS repositories. If a complete set of base OS repositories is not available to all your machines at the time of installation, you may run into issues.

    If you encounter problems with base OS repositories being unavailable, please contact your system administrator to arrange for these additional repositories to be proxied or mirrored.

    1.2.2 Browser Requirements

    The Ambari Install Wizard runs as a browser-based Web application. You must have a machine capable of running a graphical browser to use this tool. The minimum required browser versions are:

    • Windows (Vista, 7, 8)

      • Internet Explorer 9.0 (deprecated)

      • Firefox 18

      • Google Chrome 26

    • Mac OS X (10.6 or later)

      • Firefox 18

      • Safari 5

      • Google Chrome 26

    • Linux (CentOS, RHEL, SLES)

      • Firefox 18

      • Google Chrome 26

    On any platform, we recommend updating your browser to the latest stable version.

    1.2.3 Software Requirements

    On each of your hosts:

    • yum and rpm (RHEL/CentOS)

    • zypper and php_curl (SLES)

    • scp, curl, unzip, tar, and wget

    • OpenSSL (v1.01, build 16 or later)

    • python v2.6

    [Important]Important
    • The Python version shipped with SUSE 11, 2.6.0-8.12.2, has a critical bug that may cause the Ambari Agent to fail within the first 24 hours. If you are installing on SUSE 11, please update all your hosts to Python version 2.6.8-0.15.1.

    • Python v2.7.9 or later is not supported due to changes in how Python performs certificate validation.

    1.2.4 JDK Requirements

    The following Java runtime environments are supported:

    • Oracle JDK 1.8 64-bit (minimum JDK 1.8_40) (default, but not supported with PHD 3.0.1)

    • Oracle JDK 1.7 64-bit (minimum JDK 1.7_67)

    [Note]Note

    You must choose JDK 1.7 during installation because PHD 3.0.1 is not compatible with JDK 1.8.

    1.2.5 Database Requirements

    Ambari requires a relational database to store information about the cluster configuration and topology. If you install PHD Stack with Hive or Oozie, they also require a relational database. The following table outlines these database requirements:

    Ambari

      Databases: PostgreSQL 8; PostgreSQL 9.1.13+, 9.3; MySQL 5.6; Oracle 11gR2, 12c

      Description: By default, Ambari installs an instance of PostgreSQL on the Ambari Server host. Optionally, you can use an existing instance of PostgreSQL, MySQL, or Oracle. For further information, see Setup Ambari 2.1.2 Server.

    Hive

      Databases: PostgreSQL 8; PostgreSQL 9.1.13+, 9.3; MySQL 5.6; Oracle 11gR2, 12c

      Description: By default (on RHEL/CentOS), Ambari installs an instance of MySQL on the Hive Metastore host. Otherwise, you need to use an existing instance of PostgreSQL, MySQL, or Oracle. See Setup Ambari 2.1.2 Server for more information.

    Oozie

      Databases: PostgreSQL 8; PostgreSQL 9.1.13+, 9.3; MySQL 5.6; Oracle 11gR2, 12c

      Description: By default, Ambari installs an instance of Derby on the Oozie Server host. To use an existing instance of PostgreSQL, MySQL, or Oracle instead, see Setup Ambari 2.1.2 Server for more information.

      The default instance of Derby should not be used for a production environment. If you plan to use Derby for a demo, development, or test environment, note that migration of the Oozie database from Derby to a new database is available only in the community.

    Ranger

      Databases: PostgreSQL 9.1.13+, 9.3; MySQL 5.6; Oracle 11gR2, 12c

      Description: You must have an existing instance of PostgreSQL, MySQL, or Oracle available for Ranger. Refer to Installing Ranger for more information.

    [Important]Important

    For the Ambari database, if you use an existing Oracle database, make sure the Oracle listener runs on a port other than 8080 to avoid conflict with the default Ambari port. Alternatively, refer to the Ambari Reference Guide for information on Changing the Default Ambari Server Port.

    [Important]Important

    The Microsoft SQL Server and SQL Anywhere database options are not supported.

    1.2.6 Memory Requirements

    The Ambari host should have at least 1 GB RAM, with 500 MB free.

    To check available memory on any host, run:

    free -m

    If you plan to install the Ambari Metrics Service (AMS) into your cluster, review the Tuning Ambari Metrics section in the Ambari Reference Guide for guidelines on resource requirements. In general, the host on which you plan to run the Ambari Metrics Collector should have the following memory and disk space available, based on cluster size:

    Number of hosts    Memory Available    Disk Space
    1                  1024 MB             10 GB
    10                 1024 MB             20 GB
    50                 2048 MB             50 GB
    100                4096 MB             100 GB
    300                4096 MB             100 GB
    500                8096 MB             200 GB
    1000               12288 MB            200 GB
    2000               16384 MB            500 GB

    [Note]Note

    The values above are offered as guidelines. Be sure to test for your particular environment. Also refer to Package Size and Inode Count Requirements for more information on package sizes and inode counts.
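As a rough illustration, the sizing guidelines above can be turned into a small lookup. The sketch below assumes that a cluster size falling between two rows is rounded up to the next row; that rounding rule and the function name ams_sizing are illustrative assumptions and are not part of Ambari.

```shell
# ams_sizing HOSTS: print "<memory in MB> <disk in GB>" for the host that runs
# the Ambari Metrics Collector, using the guideline values from the table.
# Assumption: cluster sizes between two table rows are rounded up to the next row.
ams_sizing() {
  hosts="$1"
  if   [ "$hosts" -le 1 ];    then echo "1024 10"
  elif [ "$hosts" -le 10 ];   then echo "1024 20"
  elif [ "$hosts" -le 50 ];   then echo "2048 50"
  elif [ "$hosts" -le 100 ];  then echo "4096 100"
  elif [ "$hosts" -le 300 ];  then echo "4096 100"
  elif [ "$hosts" -le 500 ];  then echo "8096 200"
  elif [ "$hosts" -le 1000 ]; then echo "12288 200"
  elif [ "$hosts" -le 2000 ]; then echo "16384 500"
  else
    echo "no guideline for more than 2000 hosts" >&2
    return 1
  fi
}
```

For example, ams_sizing 75 prints "4096 100", which is the row for 100 hosts.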

    1.2.7 Package Size and Inode Count Requirements

    Component                     Size*     Inodes*
    Ambari Server                 100 MB    5,000
    Ambari Agent                  8 MB      1,000
    Ambari Metrics Collector      225 MB    4,000
    Ambari Metrics Monitor        1 MB      100
    Ambari Metrics Hadoop Sink    8 MB      100
    After Ambari Server Setup     N/A       4,000
    After Ambari Server Start     N/A       500
    After Ambari Agent Start      N/A       200

    * Size and inode values are approximate.

    1.2.8 Check the Maximum Open File Descriptors

    The recommended maximum number of open file descriptors is 10000 or more. To check the current soft and hard limits for open file descriptors, run the following shell commands on each host:

    ulimit -Sn

    ulimit -Hn

    If the output is lower than 10000, run the following command to set it to a suitable default:

    ulimit -n 10000
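The check above can be wrapped in a small helper so that it can be repeated on every host. This is a minimal sketch; the function names fd_limit_ok and report_fd_limits are hypothetical. Keep in mind that ulimit -n changes the limit only for the current shell session and the processes started from it.

```shell
# fd_limit_ok LIMIT [MIN]: succeed when LIMIT is "unlimited" or at least MIN
# (MIN defaults to the recommended 10000).
fd_limit_ok() {
  limit="$1"
  min="${2:-10000}"
  [ "$limit" = "unlimited" ] && return 0
  [ "$limit" -ge "$min" ]
}

# report_fd_limits: print the soft (S) and hard (H) open-file limits of the
# current shell and say whether each one meets the recommendation.
report_fd_limits() {
  for kind in S H; do
    current="$(ulimit -"$kind" -n)"
    if fd_limit_ok "$current"; then
      echo "ulimit -${kind}n = $current (ok)"
    else
      echo "ulimit -${kind}n = $current (below 10000)"
    fi
  done
}
```

Running report_fd_limits on a host prints one line for the soft limit and one for the hard limit.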

    1.3 Collect Information

    Before deploying a PHD cluster, you should collect the following information:

    • The fully qualified domain name (FQDN) of each host in your system. The Ambari install wizard supports using IP addresses. You can use hostname -f to check or verify the FQDN of a host.

      [Note]Note

      Deploying all PHD components on a single host is possible, but is appropriate only for initial evaluation purposes. A minimum cluster typically consists of at least three hosts: one master host and two slaves.

    • A list of components you want to set up on each host.

    • The base directories you want to use as mount points for storing:

      • NameNode data

      • DataNodes data

      • Secondary NameNode data

      • Oozie data

      • YARN data

      • ZooKeeper data, if you install ZooKeeper

      • Various log, pid, and db files, depending on your install type

      [Important]Important

      You must use base directories that provide persistent storage locations for your PHD components and your Hadoop data. Installing PHD components in locations that may be removed from a host may result in cluster failure or data loss. For example, do not use /tmp in a base directory path.

    1.4 Prepare the Environment

    To deploy your Hadoop instance, you need to prepare your deployment environment:

    1.4.1 Set Up Password-less SSH

    To have Ambari Server automatically install Ambari Agents on all your cluster hosts, you must set up password-less SSH connections between the Ambari Server host and all other hosts in the cluster. The Ambari Server host uses SSH public key authentication to remotely access and install the Ambari Agent.

    [Note]Note

    You can choose to manually install the Agents on each cluster host. In this case, you do not need to generate and distribute SSH keys.

    1. Generate public and private SSH keys on the Ambari Server host.

      ssh-keygen

    2. Copy the SSH Public Key (id_rsa.pub) to the root account on your target hosts.

      .ssh/id_rsa

      .ssh/id_rsa.pub

    3. Add the SSH Public Key to the authorized_keys file on your target hosts.

      cat id_rsa.pub >> authorized_keys

    4. Depending on your version of SSH, you may need to set permissions on the .ssh directory (to 700) and the authorized_keys file in that directory (to 600) on the target hosts.

      chmod 700 ~/.ssh

      chmod 600 ~/.ssh/authorized_keys

    5. From the Ambari Server, make sure you can connect to each host in the cluster using SSH, without having to enter a password.

      ssh root@<remote.target.host>

      where <remote.target.host> has the value of each host name in your cluster.

    6. If the following warning message displays during your first connection, enter yes:

      Are you sure you want to continue connecting (yes/no)?

    7. Retain a copy of the SSH Private Key on the machine from which you will run the web-based Ambari Install Wizard.

      [Note]Note

      It is possible to use a non-root SSH account, if that account can execute sudo without entering a password.
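With more than a handful of hosts, the connection test in step 5 is easier to run as a loop. The following is a minimal sketch that assumes a plain text file with one host name per line; the function name check_ssh and the file layout are illustrative assumptions. Hosts whose key has not yet been accepted (step 6) may also be reported as failed, because batch mode does not answer prompts.

```shell
# check_ssh HOSTS_FILE: try a non-interactive SSH login as root to every host
# listed in HOSTS_FILE (one host name per line).
# Returns non-zero if at least one host could not be reached without a password.
check_ssh() {
  failed=0
  while read -r host; do
    [ -z "$host" ] && continue
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # -n keeps ssh from consuming the rest of the host list on stdin.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done < "$1"
  return "$failed"
}
```

Run it on the Ambari Server host, for example as check_ssh cluster_hosts.txt, where cluster_hosts.txt is a hypothetical file containing your host names.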

    1.4.2 Set Up Service User Accounts

    Each PHD service requires a service user account. The Ambari Install wizard creates new and preserves any existing service user accounts, and uses these accounts when configuring Hadoop services. Service user account creation applies to service user accounts on the local operating system and to LDAP/AD accounts.

    For more information about customizing service user accounts for each PHD service, see Defining Service Users and Groups for a PHD 3.x Stack.

    1.4.3 Enable NTP on the Cluster and on the Browser Host

    The clocks of all the nodes in your cluster and the machine that runs the browser through which you access the Ambari Web interface must be able to synchronize with each other.

    To check that the NTP service will be automatically started upon boot, run the following command on each host:

    RHEL/CentOS/Oracle 6

    chkconfig --list ntpd

    RHEL/CentOS/Oracle 7

    systemctl is-enabled ntpd

    To set the NTP service to auto-start on boot, run the following command on each host:

    RHEL/CentOS/Oracle 6

    chkconfig ntpd on

    RHEL/CentOS/Oracle 7

    systemctl enable ntpd

    To start the NTP service, run the following command on each host:

    RHEL/CentOS/Oracle 6

    service ntpd start

    RHEL/CentOS/Oracle 7

    systemctl start ntpd

    1.4.4 Check DNS and NSCD

    All hosts in your system must be configured for both forward and reverse DNS.

    If you are unable to configure DNS in this way, you should edit the /etc/hosts file on every host in your cluster to contain the IP address and Fully Qualified Domain Name of each of your hosts. The following instructions are provided as an overview and cover a basic network setup for generic Linux hosts. Different versions and flavors of Linux might require slightly different commands and procedures. Please refer to the documentation for the operating system(s) deployed in your environment.

    Hadoop relies heavily on DNS and performs many DNS lookups during normal operation. To reduce the load on your DNS infrastructure, it’s highly recommended to use the Name Service Caching Daemon (NSCD) on cluster nodes running Linux. This daemon caches host, user, and group lookups, which improves resolution performance and reduces the load on the DNS infrastructure.

    Edit the Host File

    1. Using a text editor, open the hosts file on every host in your cluster. For example:

      vi /etc/hosts

    2. Add a line for each host in your cluster. The line should consist of the IP address and the FQDN. For example:

      1.2.3.4 <fully.qualified.domain.name>

      [Important]Important

      Do not remove the following two lines from your hosts file. Removing or editing the following lines may cause various programs that require network functionality to fail.

      127.0.0.1 localhost.localdomain localhost

      ::1 localhost6.localdomain6 localhost6
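After editing the hosts file, a quick sanity check can catch entries that list only a short host name. The sketch below treats a name as fully qualified when it contains a dot, which is a simplifying assumption; the function name hosts_without_fqdn is hypothetical.

```shell
# hosts_without_fqdn FILE: print "<ip> <name>" for every entry whose first
# name does not look fully qualified (contains no dot).
# Comment lines, blank lines, and the two loopback entries are skipped.
hosts_without_fqdn() {
  awk '
    /^[ \t]*(#|$)/                   { next }  # comments and blank lines
    $1 == "127.0.0.1" || $1 == "::1" { next }  # loopback entries stay as they are
    $2 !~ /\./                       { print $1, $2 }
  ' "$1"
}
```

Calling hosts_without_fqdn /etc/hosts prints nothing when every cluster entry starts with an FQDN.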

    Set the Hostname

    1. Use the “hostname” command to set the hostname on each host in your cluster. For example:

      hostname <fully.qualified.domain.name>

    2. Confirm that the hostname is set by running the following command:

      hostname -f

      This should return the <fully.qualified.domain.name> you just set.

    Edit the Network Configuration File

    1. Using a text editor, open the network configuration file on every host and set the desired network configuration for each host. For example:

      vi /etc/sysconfig/network

    2. Modify the HOSTNAME property to set the fully qualified domain name.

      NETWORKING=yes

      HOSTNAME=<fully.qualified.domain.name>

    1.4.5 Configuring iptables

    For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports must be open and available. The easiest way to do this is to temporarily disable iptables, as follows:

    RHEL/CentOS 6

    chkconfig iptables off

    /etc/init.d/iptables stop

    RHEL/CentOS 7

    systemctl disable firewalld

    service firewalld stop

    You can restart iptables after setup is complete. If the security protocols in your environment prevent disabling iptables, you can proceed with iptables enabled, if all required ports are open and available. For more information about required ports, see Configuring Network Port Numbers.

    Ambari checks whether iptables is running during the Ambari Server setup process. If iptables is running, a warning displays, reminding you to check that required ports are open and available. The Host Confirm step in the Cluster Install Wizard also issues a warning for each host that has iptables running.

    1.4.6 Disable SELinux and PackageKit and check the umask Value

    1. You must disable SELinux for the Ambari setup to function. On each host in your cluster, run:

      setenforce 0

      [Note]Note

      To permanently disable SELinux, set SELINUX=disabled in /etc/selinux/config. This ensures that SELinux does not turn itself on after you reboot the machine.

    2. On an installation host running RHEL/CentOS with PackageKit installed, open /etc/yum/pluginconf.d/refresh-packagekit.conf using a text editor. Make the following change:

      enabled=0

      [Note]Note

      PackageKit is not enabled by default on SLES systems. Unless you have specifically enabled PackageKit, you may skip this step for an SLES installation host.

    3. UMASK (User Mask or User file creation MASK) sets the default permissions or base permissions granted when a new file or folder is created on a Linux machine. Most Linux distros set 022 as the default umask value. A umask value of 022 results in permissions of 755 for new directories and 644 for new files. A umask value of 027 results in permissions of 750 for new directories and 640 for new files. Ambari supports a umask value of 022 or 027. For example, to set the umask value to 022, open /etc/profile as root on all hosts:

      vi /etc/profile

      Then append the following line:

      umask 022
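To see what a given umask means in practice, the arithmetic can be worked through in the shell. Programs normally request mode 777 for new directories and 666 for new regular files, and the umask bits are then cleared from that request. The helper name perms_for_umask below is hypothetical.

```shell
# perms_for_umask MASK: print the permissions that new directories and new
# regular files receive under MASK (for example 022 or 027).
perms_for_umask() {
  mask="0${1#0}"   # normalise to a leading zero so the shell reads it as octal
  # Clear the mask bits from the usual creation modes (777 and 666).
  printf 'directories %o, files %o\n' "$(( 0777 & ~$mask ))" "$(( 0666 & ~$mask ))"
}
```

For example, perms_for_umask 022 prints "directories 755, files 644", and perms_for_umask 027 prints "directories 750, files 640".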

    2. Installing Ambari 2.1.2 Server

    2.1 Setup YUM repository server

    You need a YUM repository server to be able to install Ambari. The repository must reside on a host that is accessible from all the cluster hosts. You can use a dedicated host for that purpose or set up the YUM repository server on the admin host where the Ambari Server will be installed.

    To set up a new YUM repository server, you need an httpd web server. Make sure the httpd server is running on the host that will serve as the YUM repo:

    ~> service httpd status

    If the service is not running, install and start it:

    ~> yum install httpd
    ~> service httpd start

    2.2 Create Staging Directory

    We recommend that you use a staging directory in which to extract the tarballs for the Ambari and PHD stacks. Each tarball is an archived YUM repository and includes a setup_repo.sh script that creates a symlink from the document root of the httpd server (/var/www/html) to the directory where the tarball is extracted. The staging directory (and all the directories above it) must be readable and executable by the system user running the httpd process (typically apache); preferably, make the directory readable and executable by everyone. Do not use the /tmp directory for staging, because files there can be removed at any time.

    ~> mkdir /staging
    ~> chmod a+rx /staging
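Because every directory above the staging directory must also be readable and executable, it can help to check the whole path at once. The following is a minimal sketch that inspects the mode string printed by ls -ld; the function names world_rx and check_path_rx are hypothetical.

```shell
# world_rx MODE: succeed when an "ls -ld" mode string such as drwxr-xr-x
# grants read and execute permission to "other" users.
world_rx() {
  case "$1" in
    ???????r?[xt]*) return 0 ;;   # 8th char is r, 10th char is x (or t with sticky bit)
    *) return 1 ;;
  esac
}

# check_path_rx DIR: walk from DIR (an absolute path) up to / and report the
# first directory that is not readable and executable by everyone.
check_path_rx() {
  case "$1" in
    /*) dir="$1" ;;
    *) echo "need an absolute path: $1" >&2; return 2 ;;
  esac
  while :; do
    mode="$(ls -ld "$dir" | awk '{ print $1 }')"
    if ! world_rx "$mode"; then
      echo "not readable/executable by everyone: $dir"
      return 1
    fi
    [ "$dir" = "/" ] && break
    dir="$(dirname "$dir")"
  done
  return 0
}
```

For example, check_path_rx /staging prints nothing and returns success when /staging and / are both open to all users.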

    2.3 Download and Extract Ambari 2.1.2 Tarball

    Pivotal Ambari RPMs are shipped as an archived YUM repository that should be extracted to the YUM repo server.

    On the host used as a YUM repo, download the Pivotal Ambari 2.1.2 tarball from https://network.pivotal.io/products/pivotal-hd into the staging directory you set up previously (e.g. /staging). Ensure that all the parent directories up to the staging directory have r+x access for all users, because this directory is used to stage the local YUM repository. Once downloaded, extract the tarball into the staging directory. For example:

    ~> tar -xvzf /staging/AMBARI-2.1.2.2-163-centos6.tar.gz -C /staging/

    2.4 Setup local YUM repository

    On the host used as a YUM repo, run the helper script setup_repo.sh, which is shipped as part of the Ambari tarball:

    ~> /staging/AMBARI-2.1.2.2/setup_repo.sh

    This script assumes that the document root of the YUM repo web server is /var/www/html and creates a symlink there (such as ambari-<version>) that points to the extracted Ambari tarball. Verify that the Ambari YUM repo is now available from the YUM web server:

    
    ~> curl http://localhost/AMBARI-2.1.2.2/repodata/repomd.xml

    The script also creates an Ambari repo definition and places it in the /etc/yum.repos.d/ambari.repo file. This file must be available on the admin host where the Ambari Server will be installed.

    At this point the Pivotal Ambari YUM repository should be available for the cluster hosts. Test that you can access the following URL from the admin and cluster hosts: http://<yum.repo.fqdn>/AMBARI-2.1.2.2
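The URL test can be scripted so that it is easy to repeat on the admin host and on each cluster host. This is a minimal sketch; the function name repo_ok is hypothetical, and it relies only on the repodata/repomd.xml file that the curl command above already fetches.

```shell
# repo_ok BASE_URL: succeed when BASE_URL serves YUM repository metadata.
repo_ok() {
  # -f: fail on HTTP errors, -s: no progress output, -S: still print errors,
  # -o /dev/null: discard the downloaded file.
  curl -fsS -o /dev/null "${1%/}/repodata/repomd.xml"
}
```

For example, run repo_ok http://<yum.repo.fqdn>/AMBARI-2.1.2.2 && echo reachable, substituting the FQDN of your YUM repository server.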

    2.5 Install Ambari Server

    Install the Ambari Server from RPMs using the yum command:

    ~> yum install ambari-server

    This command installs the Ambari Server which is a web application server listening on port 8080. It also installs a PostgreSQL server instance that listens on port 5432.

    2.6 Setup Ambari 2.1.2 Server

    The Ambari Server must be set up before it can be used. If your PostgreSQL instance is configured to listen on the default port number, simply run the following command:

    ~> ambari-server setup

    This command will prompt for user input to make several decisions:

    • User Account – You can choose a user other than root to run the ambari-server daemon process on the admin node. If you choose a different user and that user does not yet exist, it is created.
    • Java JDK – Enter 2 at the prompt to select and download Oracle JDK 1.7. Accept the Oracle JDK license to download the files from Oracle. The JDK is automatically installed during the deploy phase.
    • Database – Enter n at the Enter advanced database configuration prompt to use the default, embedded PostgreSQL database for Ambari. The default PostgreSQL database name is ambari and the default user name/password are ambari/bigdata. If you would like to use an existing PostgreSQL, MySQL, or Oracle database with Ambari instead of the default, enter y at the prompt and provide the connection parameters for the existing database.

    If your PostgreSQL instance is configured to listen on a non-default port number, perform these alternate steps to configure postgres and Ambari:

    1. Open the PostgreSQL /var/lib/pgsql/data/pg_hba.conf configuration file in a text editor. Append the following lines to the end of the file to allow the ambari user to connect to the database:

      local  all  ambari md5
      host  all   ambari 0.0.0.0/0  md5
      host  all   ambari ::/0 md5
      
    2. Open the /etc/sysconfig/pgsql/postgresql file and set the non-default port. For example, to use port 10432, the file needs the line:

      PGPORT=10432
      
    3. Restart the PostgreSQL database:

      ~> service postgresql restart
      
    4. Connect to the database as postgres (superuser) and configure the database for Ambari:

      ~> psql -U postgres -p 10432
      postgres=# CREATE DATABASE ambari;
      postgres=# CREATE USER ambari WITH ENCRYPTED PASSWORD 'bigdata';
      postgres=# \c ambari
      ambari=# CREATE SCHEMA ambari AUTHORIZATION ambari;
      ambari=# ALTER SCHEMA ambari OWNER TO ambari;
      ambari=# ALTER ROLE ambari SET search_path to 'ambari','public';
      ambari=# \q
      
    5. Execute this command to set up Ambari:

      ~> ambari-server setup --database=postgres --databasehost=localhost --databaseport=10432 --databasename=ambari --databaseusername=ambari --databasepassword=bigdata
      

      Note: Use the following command to verify that postgres is listening on the hostname value assigned to --databasehost:

      ~> netstat -anp | egrep <port>
      
    6. Execute the Ambari-DDL-Postgres-CREATE.sql file in PostgreSQL to complete the configuration:

      ~> psql -f /var/lib/ambari-server/resources/Ambari-DDL-Postgres-CREATE.sql -U ambari -p 10432 -d ambari
      

      Note: Enter the password bigdata when prompted.

    7. Continue with the next topic to start the Ambari server.

    2.7 Start Ambari 2.1.2 Server

    After the Ambari Server has been set up, you can start it:

    ~> ambari-server start

    To check the status of the server, use the following command:

    ~> ambari-server status

    To stop the server, use the following command:

    ~> ambari-server stop
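When the start command is part of a script, it can be useful to wait until the web interface actually answers before continuing. The sketch below polls a URL with curl; it assumes that the Ambari Server answers plain HTTP requests on port 8080 once it is up, and the function name wait_for_url and its defaults (30 tries, 5 seconds apart) are illustrative choices.

```shell
# wait_for_url URL [TRIES] [DELAY]: poll URL until it answers successfully or
# TRIES attempts have been used, sleeping DELAY seconds between attempts.
wait_for_url() {
  url="$1"
  tries="${2:-30}"
  delay="${3:-5}"
  while [ "$tries" -gt 0 ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "no answer from $url" >&2
  return 1
}
```

For example, wait_for_url http://localhost:8080 can be run on the admin host right after ambari-server start.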

    3. Installing a Pivotal HD cluster

    3.1 Setup YUM repository server

    You should already have created a YUM repository server when you installed Ambari (see 2.1 Setup YUM repository server).

    3.2 Download and extract PHD stack tarballs

    The PHD stack tarballs should be installed on the machine that hosts the YUM server. Unless you’re using a dedicated machine for the YUM repository server, this will be the same admin host you used for installing the Ambari Server.

    Download the following tarballs and extract them in a dedicated location (avoid using /tmp directory).

    Stack Name            Download URL       Description
    Ambari-2.1.2.2        Pivotal Network    Pivotal Ambari stack; contains RPMs for the Ambari Server and Ambari Agents.
    PHD-3.0.1.0           Pivotal Network    Pivotal Hadoop stack; contains RPMs for Hadoop services such as HDFS, YARN, HBase, Hive, Oozie, and ZooKeeper.
    PADS-1.3.1.1          Pivotal Network    Contains HAWQ (a parallel SQL query engine), PXF, and MADlib.
    PHD-UTILS-1.1.0.20    Pivotal Network    Pivotal Hadoop Utilities stack; contains optional services and libraries such as Ganglia and Nagios, used for monitoring and alerting of core cluster services.

    Assuming you downloaded the tarballs to the /tmp directory and want to stage them in the /staging directory, run the following command for each stack:

    ~> tar -xzf /tmp/{stack}.tar.gz -C /staging/

    3.3 Setup local YUM repositories

    PHD stacks are shipped as archived YUM repositories that need to be deployed in your YUM repository server to be accessible by the Ambari Server and all cluster hosts.

    Each stack repository contains the setup_repo.sh script that assumes:

    • the YUM repository server is accessible by all hosts in the cluster
    • the document root of your YUM server is /var/www/html/

    Each stack’s script creates a symbolic link in the YUM repository server document root to point to the location of the extracted stack tarball and creates a repo definition file in /etc/yum.repos.d/ directory so that your local yum command can find the repository. It is essential that the hostnames in the repo definition files use the Fully Qualified Domain Name (FQDN) of the YUM server host that is accessible from all cluster hosts.

    For each stack, run the local repo setup script:

    ~> /staging/{stack}/setup_repo.sh

    If the repository setup was successful, the script prints the repository URL. Write down the URL, because you will need it later when installing a PHD cluster using the Ambari Server UI.

    Note: If your YUM repository server runs on a different host than the admin host where the Ambari Server is installed, copy the generated repository definition files from /etc/yum.repos.d/ on the YUM repository server to /etc/yum.repos.d/ on the admin host where you installed the Ambari Server.

    Test that the repositories are properly configured – run the following command from the admin host:

    ~> yum repolist 

    You should see the repositories for the stacks listed.

    3.4 Login to Ambari Server

    Once the Ambari Server is started:

    1. Open http://{ambari.server.host}:8080 in the web browser.

    2. Log in to the server using the user name admin and the password admin. These credentials can be changed later.

    3.5 Launch Install Wizard

    Once logged in to Ambari, click the “Launch Install Wizard” button to enter the cluster creation wizard. The wizard is self-explanatory and guides you through the steps necessary to provision a new PHD cluster. A few actions requiring particular attention are listed below:

    3.5.1 Modify YUM repository URLs

    In the Select Stack section, select 3.0 and click Advanced Repository Options to reveal the list of YUM repositories that Ambari searches for PHD stack RPMs. The values provided out of the box need to be replaced with the URLs of the stack repositories you set up previously. Replace http://SET_REPO_URL with the repository URL you noted earlier when you ran the setup_repo.sh script for the stack. If you don’t have the URLs handy, you can always get them from the /etc/yum.repos.d/-.repo file.

    Note: After you deploy the cluster, you can update repositories via the Ambari UI (Admin > Repositories).

    3.5.2 Specify host names and SSH key

    In the Install Options section, you need to provide the FQDNs of the hosts that will make up your cluster. You can use a range expression in square brackets; for example, host[01-10].domain describes 10 hosts. If you are deploying to EC2, use the internal private DNS host names.
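To preview which host names a range expression stands for, the expansion can be reproduced in the shell. This is only an illustration of the notation, not Ambari's own parser; it assumes that numbers are zero-padded to the width of the first bound, and the function name expand_hosts is hypothetical.

```shell
# expand_hosts PATTERN: print one host name per line for a bracketed range
# such as host[01-10].domain. Patterns without a range are printed unchanged.
expand_hosts() {
  pattern="$1"
  case "$pattern" in
    *\[*-*\]*) ;;
    *) echo "$pattern"; return 0 ;;
  esac
  prefix="${pattern%%\[*}"   # text before "["
  rest="${pattern#*\[}"      # text after "["
  range="${rest%%\]*}"       # for example "01-10"
  suffix="${rest#*\]}"       # text after "]"
  first="${range%-*}"
  last="${range#*-}"
  # Assumption: numbers are zero-padded to the width of the first bound.
  awk -v p="$prefix" -v s="$suffix" -v a="$first" -v b="$last" -v w="${#first}" '
    BEGIN {
      fmt = "%s%0" w "d%s\n"
      for (i = a + 0; i <= b + 0; i++) printf fmt, p, i, s
    }'
}
```

For example, expand_hosts 'host[01-10].domain' prints the ten names host01.domain through host10.domain, which can be compared with the output of hostname -f on each host.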

    If you want Ambari to automatically provision and register Ambari Agents on the cluster hosts, you need to provide the private key that you used to set up password-less SSH on your cluster. You can either select the key file or copy and paste its content into the form.

    Ambari Agents Manual Install

    Note: If you do not want to provide the private key or set up password-less SSH, you have to provision and configure the Ambari Agents manually. In this case, do the following on each cluster host:

    1. Set up the Ambari repository by copying the /etc/yum.repos.d/ambari.repo file from the YUM repository server.
    2. Install the Ambari Agent:

      ~> yum install ambari-agent 

    3. Edit the Ambari Agent configuration (/etc/ambari-agent/conf/ambari-agent.ini) to point it to the Ambari Server:

      [server]
      hostname={ambari.server.hostname}
      url_port=8440
      secured_url_port=8441
      

    4. Start the agent:

      ~> ambari-agent start

    The agent will register itself with the server when it starts.
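When agents are installed manually on many hosts, the configuration edit in step 3 can be applied with sed instead of a text editor. The sketch below assumes GNU sed (for the -i option) and the ini layout shown above; the function name set_agent_server is hypothetical.

```shell
# set_agent_server INI_FILE SERVER_FQDN: point the agent at the Ambari Server
# by rewriting the hostname key inside the [server] section only.
set_agent_server() {
  # The address range runs from the [server] header to the next section
  # header, so hostname keys in other sections are left untouched.
  # Assumption: GNU sed, which provides the in-place option -i.
  sed -i "/^\[server\]/,/^\[/ s/^hostname=.*/hostname=$2/" "$1"
}
```

For example, set_agent_server /etc/ambari-agent/conf/ambari-agent.ini {ambari.server.hostname} updates the file before the agent is started.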

    3.5.3 Choose Services

    You must choose the PHD services that you want to install initially. Also refer to Install HAWQ and PXF with Ambari in the HAWQ documentation if you are installing HAWQ and PXF with your PHD cluster.

    3.5.4 Assign Masters

    You need to assign “master” service components to your cluster hosts.

    Note: The HAWQ Master component MUST NOT reside on the same host that is used for Hive Metastore if Hive Metastore uses the new PostgreSQL database. This is because both these services will attempt to use port 5432. If it is absolutely required to co-locate these components on the same host, provision a PostgreSQL database beforehand on a port other than 5432 and choose “Existing PostgreSQL Database” option for Hive Metastore configuration. The same restriction applies to the admin host – neither HAWQ Master nor Hive Metastore will be able to run on the admin host where Ambari Server is installed.

    3.5.5 Assign Slaves and Clients

    You need to assign “slave” and “client” service components to your cluster hosts.

    Note: The UI panel displaying the list of services to provision on each host is scrollable, but the scroll bar is currently not visible. Make sure to scroll the main area of the page to the right to make sure you configure all the components properly.

    Note: If you are installing the HAWQ service, make sure HAWQ Segment components are installed only on hosts where DataNode components are installed. You do not have to install HAWQ Segment on every host that runs DataNode, but every host that runs HAWQ Segment must also run DataNode.

    3.5.6 Install, Start and Test

    The screen shows the progress of cluster deployment on each host. Each component that is mapped to the host will be installed, started and a simple test will be run to verify the component is functional.

    You can see more details about completed and pending tasks for each host if you click on the link in the Message column.

    When the “Successfully installed and started the services” message appears, click Next.

    On the Summary page, review the list of accomplished tasks and click Complete to open the cluster Dashboard.

    3.6 Cluster Dashboard

    The Dashboard is a central place that displays the services you deployed and their status. You can add new services or hosts, stop and start services and components, explore monitoring metrics and perform service specific actions.

    Your cluster is ready!