PHD Reference Guide

PHD 3.0

Configuring Ports

The tables below specify which ports must be opened for which ecosystem components to communicate with each other. Make sure the appropriate ports are opened before you install PHD.
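Before installing, you can confirm from a client machine that a required port is reachable with a simple TCP probe. The lines below are a minimal sketch, assuming the nc (netcat) utility is available; namenode.example.com is a placeholder host, and the ports come from the tables that follow.

    # Probe the NameNode web UI and IPC ports from a client machine.
    # -z: just scan, -v: report the result, -w 5: five-second timeout.
    nc -z -v -w 5 namenode.example.com 50070
    nc -z -v -w 5 namenode.example.com 8020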

HDFS Ports

The following table lists the default ports used by the various HDFS services.

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| NameNode WebUI | Master Nodes (NameNode and any back-up NameNodes) | 50070 | http | Web UI to look at current status of HDFS, explore file system | Yes (Typically admins, Dev/Support teams) | dfs.http.address |
| | | 50470 | https | Secure http service | | dfs.https.address |
| NameNode metadata service | Master Nodes (NameNode and any back-up NameNodes) | 8020/9000 | IPC | File system metadata operations | Yes (All clients who directly need to interact with the HDFS) | Embedded in URI specified by fs.defaultFS |
| DataNode | All Slave Nodes | 50075 | http | DataNode WebUI to access the status, logs, etc. | Yes (Typically admins, Dev/Support teams) | dfs.datanode.http.address |
| | | 50475 | https | Secure http service | | dfs.datanode.https.address |
| | | 50010 | | Data transfer | | dfs.datanode.address |
| | | 50020 | IPC | Metadata operations | No | dfs.datanode.ipc.address |
| Secondary NameNode | Secondary NameNode and any backup Secondary NameNode | 50090 | http | Checkpoint for NameNode metadata | No | dfs.secondary.http.address |
| JournalNode | | 8485 | RPC | JournalNode RPC address | | dfs.journalnode.rpc-address |
| JournalNode | | 8480 | http | JournalNode http address | | dfs.journalnode.http-address |
| JournalNode | | 8481 | https | JournalNode https address | | dfs.journalnode.https-address |
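If a cluster overrides these defaults, the effective values can be read back from the client configuration with hdfs getconf. A short check, assuming the Hadoop client is installed and configured on the machine; dfs.http.address is the older name for this key, and recent Hadoop releases also accept dfs.namenode.http-address.

    # Print the effective NameNode endpoints from the local client configuration.
    hdfs getconf -confKey fs.defaultFS
    hdfs getconf -confKey dfs.namenode.http-address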

MapReduce Ports

The following table lists the default ports used by the various MapReduce services.

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| MapReduce | | 10020 | http | MapReduce JobHistory server address | | mapreduce.jobhistory.address |
| MapReduce | | 19888 | http | MapReduce JobHistory webapp address | | mapreduce.jobhistory.webapp.address |
| MapReduce | | 13562 | http | MapReduce Shuffle Port | | mapreduce.shuffle.port |

YARN Ports

The following table lists the default ports used by the various YARN services.

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| ResourceManager WebUI | Master Nodes (ResourceManager and any back-up ResourceManager node) | 8088 | http | Web UI for ResourceManager | Yes | yarn.resourcemanager.webapp.address |
| ResourceManager | Master Nodes (ResourceManager Node) | 8050 | IPC | For application submissions | Yes (All clients who need to submit YARN applications, including Hive, Hive server, Pig) | Embedded in URI specified by yarn.resourcemanager.address |
| ResourceManager | Master Nodes (ResourceManager Node) | 8025 | http | For application submissions | Yes (All clients who need to submit YARN applications, including Hive, Hive server, Pig) | yarn.resourcemanager.resource-tracker.address |
| ResourceManager | Master Nodes | 9099 | http | ResourceManager proxy server port | | yarn.web-proxy.address |
| Scheduler | Master Nodes (ResourceManager Node) | 8030 | http | Scheduler address | Yes (Typically admins, Dev/Support teams) | yarn.resourcemanager.scheduler.address |
| ResourceManager | Master Nodes (ResourceManager Node) | 8141 | http | Admin address | Yes (Typically admins, Dev/Support teams) | yarn.resourcemanager.admin.address |
| NodeManager | Master Nodes (NodeManager) and Slave Nodes | 45454 | http | NodeManager address | Yes (Typically admins, Dev/Support teams) | yarn.nodemanager.address |
| NodeManager | Slave Nodes | 8040 | | NodeManager localizer port | | yarn.nodemanager.localizer.address |
| NodeManager | Slave Nodes | 8042 | http | NodeManager webapp port | | yarn.nodemanager.webapp.address |
| NodeManager | Slave Nodes | 8044 | https | NodeManager webapp HTTPS port | | yarn.nodemanager.webapp.https.address |
| Timeline Server | Master Nodes | 10200 | http | Timeline Server address | Yes (Typically admins, Dev/Support teams) | yarn.timeline-service.address |
| Timeline Server | Master Nodes | 8188 | http | Timeline Server webapp address | Yes (Typically admins, Dev/Support teams) | yarn.timeline-service.webapp.address |
| Timeline Server | Master Nodes | 8190 | https | Timeline Server webapp HTTPS address | Yes (Typically admins, Dev/Support teams) | yarn.timeline-service.webapp.https.address |
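To verify that the ResourceManager web port is open and the service is answering, you can hit its REST API, which shares port 8088 with the web UI. A minimal check, with resourcemanager.example.com as a placeholder host:

    # Returns cluster metadata as JSON if the ResourceManager is up and port 8088 is open.
    curl -s http://resourcemanager.example.com:8088/ws/v1/cluster/info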

Hive Ports

The following table lists the default ports used by the various Hive services. (Note: Neither of these services is used in a standard PHD installation.)

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| Hive Server | Hive Server machine (usually a utility machine) | 10000 | | Service for programmatically (Thrift/JDBC) connecting to Hive | Yes (Clients who need to connect to Hive either programmatically or through UI SQL tools that use JDBC) | ENV Variable HIVE_PORT |
| Hive Web UI | Hive Server machine (usually a utility machine) | 9999 | http | Web UI to explore Hive schemas | Yes | hive.hwi.listen.port |
| Hive Metastore | | 9933 | http | | Yes (Clients that run Hive, Pig and potentially M/R jobs that use HCatalog) | hive.metastore.uris |
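A quick way to exercise the HiveServer2 port end to end is a Beeline connection over JDBC. A sketch, assuming Beeline is installed and using a placeholder host and user:

    # Connect to HiveServer2 on its default port and run a trivial query.
    beeline -u "jdbc:hive2://hiveserver.example.com:10000" -n hive -e "SHOW DATABASES;"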

HBase Ports

The following table lists the default ports used by the various HBase services.

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| HMaster | Master Nodes (HBase Master Node and any back-up HBase Master node) | 60000 | | | Yes | hbase.master.port |
| HMaster Info Web UI | Master Nodes (HBase Master Node and back-up HBase Master node, if any) | 60010 | http | The port for the HBase Master web UI. Set to -1 if you do not want the info server to run. | Yes | hbase.master.info.port |
| Region Server | All Slave Nodes | 60020 | | | Yes (Typically admins, dev/support teams) | hbase.regionserver.port |
| Region Server | All Slave Nodes | 60030 | http | | Yes (Typically admins, dev/support teams) | hbase.regionserver.info.port |
| HBase REST Server (optional) | All REST Servers | 8080 | http | The port used by HBase REST servers. REST servers are optional, and not installed by default. | Yes | hbase.rest.port |
| HBase REST Server Web UI (optional) | All REST Servers | 8085 | http | The port used by the HBase REST server web UI. REST servers are optional, and not installed by default. | Yes (Typically admins, dev/support teams) | hbase.rest.info.port |
| HBase Thrift Server (optional) | All Thrift Servers | 9090 | | The port used by HBase Thrift servers. Thrift servers are optional, and not installed by default. | Yes | |
| HBase Thrift Server Web UI (optional) | All Thrift Servers | 9095 | | The port used by the HBase Thrift server web UI. Thrift servers are optional, and not installed by default. | Yes (Typically admins, dev/support teams) | hbase.thrift.info.port |
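If the optional REST server is installed, its port can be verified with a plain HTTP request; the other HBase ports can be probed with nc as shown earlier. A sketch with a placeholder host:

    # Ask the REST server for the storage cluster's version over port 8080.
    curl -s http://restserver.example.com:8080/version/cluster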

Oozie Ports

The following table lists the default ports used by Oozie.

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| Oozie | Oozie Server | 11000 | TCP | The port on which the Oozie server runs. | Yes | OOZIE_HTTP_PORT in oozie_env.sh |
| Oozie | Oozie Server | 11001 | TCP | The admin port on which the Oozie server runs. | No | OOZIE_ADMIN_PORT in oozie_env.sh |
| Oozie | Oozie Server | 11443 | TCP | The port on which the Oozie server runs when using HTTPS. | Yes | OOZIE_HTTPS_PORT in oozie_env.sh |
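The Oozie client can confirm that the server port is open and the service healthy in one step. A sketch with a placeholder host:

    # Prints "System mode: NORMAL" when the Oozie server is up on port 11000.
    oozie admin -oozie http://oozieserver.example.com:11000/oozie -status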

ZooKeeper Ports

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| ZooKeeper Server | All ZooKeeper Nodes | 2888 | | Port used by ZooKeeper peers to talk to each other. | No | hbase.zookeeper.peerport |
| ZooKeeper Server | All ZooKeeper Nodes | 3888 | | Port used by ZooKeeper peers for leader election. | No | hbase.zookeeper.leaderport |
| ZooKeeper Server | All ZooKeeper Nodes | 2181 | | Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect. | No | hbase.zookeeper.property.clientPort |
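ZooKeeper answers simple four-letter commands on its client port, which makes the port check and the health check the same operation. A sketch with a placeholder host:

    # A healthy server replies "imok" to "ruok" on the client port.
    echo ruok | nc zookeeper1.example.com 2181

    # "stat" also reports whether this node is the leader or a follower.
    echo stat | nc zookeeper1.example.com 2181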

MySQL Ports

The following table lists the default ports used by the various MySQL services.

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| MySQL | MySQL database server | 3306 | | | | |
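Connectivity to the MySQL port can be verified with the standard client, which exercises authentication as well as the network path. A sketch with a placeholder host and user:

    # Fails fast if port 3306 is blocked or the server is down.
    mysql -h mysql.example.com -P 3306 -u hive -p -e "SELECT 1;"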

Kerberos Ports

The following table lists the default port used by the designated Kerberos KDC.

| Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters |
|---|---|---|---|---|---|---|
| KDC | Kerberos KDC server | 88 | | Port used by the designated KDC | | |
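Kerberos uses port 88 over both UDP and TCP, so a plain TCP probe only covers half the story; requesting a ticket is the more reliable end-to-end check. A sketch, assuming a Kerberos client is installed and user@EXAMPLE.COM is a placeholder principal:

    # TCP probe of the KDC port (UDP probes with nc are unreliable).
    nc -z -v -w 5 kdc.example.com 88

    # The definitive check: obtain and list a ticket for a real principal.
    kinit user@EXAMPLE.COM
    klist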

Controlling PHD Services Manually

Starting PHD Services

Start the Hadoop services in the following order:

  • Knox

  • ZooKeeper

  • HDFS

  • YARN

  • HBase

  • Hive Metastore

  • HiveServer2

  • WebHCat

  • Oozie

Instructions

  • Start Knox. When starting the gateway with the script below, the process runs in the background. The log output is written to /var/log/knox and a PID (process ID) is written to /var/run/knox. Execute this command on the Knox host machine.

    su -l knox -c "/usr/phd/current/knox-server/bin/gateway.sh start"
  • Start ZooKeeper. Execute this command on the ZooKeeper host machine(s):

    su - zookeeper -c "export ZOOCFGDIR=/usr/phd/current/zookeeper-server/conf ; export ZOOCFG=zoo.cfg; source /usr/phd/current/zookeeper-server/conf/zookeeper-env.sh ; /usr/phd/current/zookeeper-server/bin/zkServer.sh start"
  • Start HDFS

    • If you are running NameNode HA (High Availability), start the JournalNodes by executing these commands on the JournalNode host machines:

      su <HDFS_USER>
      /usr/phd/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh start journalnode

      where <HDFS_USER> is the HDFS user. For example, hdfs.

    • Execute this command on the NameNode host machine(s):

      su -l hdfs -c "/usr/phd/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode"
    • If you are running NameNode HA, start the ZooKeeper Failover Controller (ZKFC) by executing the following command on all NameNode machines. The starting sequence of the ZKFCs determines which NameNode will become Active.

      su -l hdfs -c "/usr/phd/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start zkfc"
    • If you are not running NameNode HA, execute the following command on the Secondary NameNode host machine. If you are running NameNode HA, the Standby NameNode takes on the role of the Secondary NameNode.

      su -l hdfs -c "/usr/phd/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start secondarynamenode"
    • Execute these commands on all DataNodes:

      su -l hdfs -c "/usr/phd/current/hadoop-hdfs-datanode/../hadoop/sbin/hadoop-daemon.sh start datanode"
  • Start YARN

    • Execute this command on the ResourceManager host machine(s):

      su -l yarn -c "/usr/phd/current/hadoop-yarn-resourcemanager/sbin/yarn-daemon.sh start resourcemanager"
    • Execute this command on the History Server host machine:

      su -l yarn -c "/usr/phd/current/hadoop-mapreduce-historyserver/sbin/mr-jobhistory-daemon.sh start historyserver"
    • Execute this command on all NodeManagers:

      su -l yarn -c "/usr/phd/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
  • Start HBase

    • Execute this command on the HBase Master host machine:

      su -l hbase -c "/usr/phd/current/hbase-master/bin/hbase-daemon.sh start master; sleep 25"
    • Execute this command on all RegionServers:

      su -l hbase -c "/usr/phd/current/hbase-regionserver/bin/hbase-daemon.sh start regionserver"
  • Start the Hive Metastore. On the Hive Metastore host machine, execute the following commands:

    su $HIVE_USER
    nohup /usr/phd/current/hive-metastore/bin/hive --service metastore >/var/log/hive/hive.out 2>/var/log/hive/hive.log &

    Where $HIVE_USER is the Hive user. For example, hive.

  • Start HiveServer2. On the Hive Server2 host machine, execute the following commands:

    su $HIVE_USER
    nohup /usr/phd/current/hive-server2/bin/hiveserver2 -hiveconf hive.metastore.uris=" " >>/tmp/hiveserver2HD.out 2>>/tmp/hiveserver2HD.log &

    Where $HIVE_USER is the Hive user. For example, hive.

  • Start WebHCat. On the WebHCat host machine, execute the following command:

    su -l hcat -c "/usr/phd/current/hive-webhcat/sbin/webhcat_server.sh start"
  • Start Oozie. Execute the following command on the Oozie host machine:

    su -l oozie -c "/usr/phd/current/oozie-server/bin/oozied.sh start"
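After starting the services, you can confirm that the expected daemons are running on each host with jps, which ships with the JDK. Since jps only lists JVMs owned by the invoking user, run it as each service user; the process names noted below are the usual ones for these daemons.

    # Run on each host; substitute the service users configured for your cluster.
    su -l hdfs -c "jps"    # expect NameNode, DataNode, JournalNode, etc. as appropriate
    su -l yarn -c "jps"    # expect ResourceManager, NodeManager, JobHistoryServer
    su -l hbase -c "jps"   # expect HMaster, HRegionServer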

Stopping PHD services

Before performing any upgrades or uninstalling software, stop all of the Hadoop services in the following order:

  • Knox

  • Oozie

  • WebHCat

  • HiveServer2

  • Hive Metastore

  • HBase

  • YARN

  • HDFS

  • ZooKeeper

Instructions

  • Stop Knox. Execute the following command on the Knox host machine.

    su -l knox -c "/usr/phd/current/knox-server/bin/gateway.sh stop"
  • Stop Oozie. Execute the following command on the Oozie host machine.

    su -l oozie -c "/usr/phd/current/oozie-server/bin/oozied.sh stop"
  • Stop WebHCat. On the WebHCat host machine, execute the following command:

    su -l hcat -c "/usr/phd/current/hive-webhcat/sbin/webhcat_server.sh stop"
  • Stop Hive. Execute this command on the Hive Metastore and Hive Server2 host machine.

    ps aux | awk '{print $1,$2}' | grep hive | awk '{print $2}' | xargs kill >/dev/null 2>&1
  • Stop HBase

    • Execute this command on all RegionServers:

      su -l hbase -c "/usr/phd/current/hbase-regionserver/bin/hbase-daemon.sh stop regionserver"
    • Execute this command on the HBase Master host machine:

      su -l hbase -c "/usr/phd/current/hbase-master/bin/hbase-daemon.sh stop master"
  • Stop YARN

    • Execute this command on all NodeManagers:

      su -l yarn -c "/usr/phd/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh stop nodemanager"
    • Execute this command on the History Server host machine:

      su -l yarn -c "/usr/phd/current/hadoop-mapreduce-historyserver/sbin/mr-jobhistory-daemon.sh stop historyserver"
    • Execute this command on the ResourceManager host machine(s):

      su -l yarn -c "/usr/phd/current/hadoop-yarn-resourcemanager/sbin/yarn-daemon.sh stop resourcemanager"
  • Stop HDFS

    • Execute this command on all DataNodes:

      su -l hdfs -c "/usr/phd/current/hadoop-hdfs-datanode/../hadoop/sbin/hadoop-daemon.sh stop datanode"
    • If you are not running NameNode HA (High Availability), stop the Secondary NameNode by executing this command on the Secondary NameNode host machine:

      su -l hdfs -c "/usr/phd/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh stop secondarynamenode"
    • Execute this command on the NameNode host machine(s):

      su -l hdfs -c "/usr/phd/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh stop namenode"
    • If you are running NameNode HA, stop the ZooKeeper Failover Controllers (ZKFC) by executing this command on the NameNode host machines:

      su -l hdfs -c "/usr/phd/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh stop zkfc"
    • If you are running NameNode HA, stop the JournalNodes by executing these commands on the JournalNode host machines:

      su $HDFS_USER
      /usr/phd/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh stop journalnode

      where $HDFS_USER is the HDFS user. For example, hdfs.

  • Stop ZooKeeper. Execute this command on the ZooKeeper host machine(s):

    su - zookeeper -c "export ZOOCFGDIR=/usr/phd/current/zookeeper-server/conf ; export ZOOCFG=zoo.cfg; source /usr/phd/current/zookeeper-server/conf/zookeeper-env.sh ; /usr/phd/current/zookeeper-server/bin/zkServer.sh stop"

Hadoop Service Accounts

You can configure service accounts as follows:

  • If you are performing a Manual Install of PHD, refer to Getting Ready to Install > Create System Users and Groups in the Installing PHD Manually guide.

  • If you are performing an Ambari Install of PHD, refer to the instructions in the Installing PHD Using Ambari guide.

Supported Database Matrix for Pivotal HD

This section contains certification information on supported databases for Pivotal HD (PHD).

The following table identifies the supported databases for PHD.

| Operating System | Component | PostgreSQL 8.x | PostgreSQL 9.x | MySQL 5.x | Oracle 11gr2 | Other |
|---|---|---|---|---|---|---|
| RHEL/CentOS/Oracle Linux 5.x, RHEL/CentOS/Oracle Linux 6.x, SLES 11, Ubuntu 12 | Hive / HCatalog | Supported | Supported | Default | Supported | |
| | Oozie | Supported | Supported | Supported | Supported | Derby (default) |
| | Hue[1] | Supported | Supported | Supported | Supported | SQLite (default) |
| | Ambari[2] | Default | Supported | Supported | Supported | |

For instructions on configuring a supported database for the Hive or Oozie metastore, see Getting Ready to Install > Meet Minimum System Requirements > Installing and Configuring the Metastore in the Installing PHD Manually guide.

For instructions on configuring a supported database for Hue, see Installing Hue > Configuring Hue for an External Database > Using Hue with PostgreSQL (or MySQL, or Oracle, as appropriate) in the Installing PHD Manually guide.

For more information on database requirements for Ambari, see Preparing to Install a PHD Cluster > Meet Minimum System Requirements > Database Requirements in the Installing PHD Using Ambari guide.
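When pointing the Hive metastore at one of the external databases above, Hive's schematool can create the metastore schema once the database and user exist. A minimal sketch for a MySQL-backed metastore; the path follows this guide's layout, the credentials are placeholders, and the authoritative steps are in the Installing PHD Manually guide.

    # Initialize the Hive metastore schema in a prepared MySQL database.
    /usr/phd/current/hive-metastore/bin/schematool -dbType mysql -initSchema \
        -userName hive -passWord changeme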