Ambari 2.1.2 Troubleshooting Guide
Table of Contents
- 1. Troubleshooting Ambari Deployments
- Reviewing Ambari Log Files
- Resolving Ambari Install and Setup Problems
- Problem: Browser crashed before Install Wizard completes
- Problem: Install Wizard reports that the cluster install has failed
- Problem: Ambari Agents May Fail to Register with Ambari Server.
- Problem: The “yum install ambari-server” Command Fails
- Problem: HDFS Smoke Test Fails
- Problem: yum Fails on Free Disk Space Check
- Resolving Cluster Deployment Problems
- Problem: Trouble Starting Ambari on System Reboot
- Problem: Metrics and Host information display incorrectly in Ambari Web
- Problem: On SUSE 11 Ambari Agent crashes within the first 24 hours
- Problem: Attempting to Start HBase REST server causes either REST server or Ambari Web to fail
- Problem: Multiple Ambari Agent processes are running, causing re-register
- Problem: Ambari stops MySQL database during deployment, causing Ambari Server to crash.
- Problem: Cluster Install Fails with Groupmod Error
- Problem: Host registration fails during Agent bootstrap on SLES due to timeout.
- Problem: Host Check Fails if Transparent Huge Pages (THP) is not disabled.
- Problem: DataNode Fails to Install on RHEL/CentOS 7.
- Problem: When running Ambari Server as non-root, kadmin couldn't open log file.
- Problem: Adding client-only services does not automatically install component dependencies.
- Problem: Automatic Agent Registration with SSH fails for a non-root configuration.
- Resolving Cluster Upgrade Problems
- Resolving General Problems
- Problem: After upgrading to Ambari 2.1.2, you receive File Does Not Exist alerts.
- During Enable Kerberos, the Check Kerberos operation fails.
- Problem: Hive developers may encounter an exception error message during Hive Service Check
- Problem: API calls for PUT, POST, DELETE respond with a "400 - Bad Request"
- Problem: Ambari is checking disk full on non-local disks; causing a high number of auto-mounted home directories
- Problem: Links in pdf documentation not working.
- Problem: kadmin running Ambari Server as non-root, cannot open log file.
The first step in troubleshooting any problem in an Ambari-deployed Hadoop cluster is reviewing the Ambari log files.
Find a recommended solution to a troubleshooting problem in one of the following sections:
Find files that log activity on an Ambari host in the following locations:
Ambari Server logs on the Ambari Server host:
/var/log/ambari-server/ambari-server.log
Ambari Agent logs on any host with an Ambari Agent:
/var/log/ambari-agent/ambari-agent.log
Ambari Agent task logs on any host with an Ambari Agent:
/var/lib/ambari-agent/data/
This location contains logs for all tasks executed on an Ambari Agent host. Each log name includes:
command-N.json - the command file corresponding to a specific task.
output-N.txt - the output from the command execution.
errors-N.txt - error messages.
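As a quick-triage sketch, the matching command, output, and errors files for a task share the same N suffix and can be located together. The snippet below simulates the data directory with mktemp so it runs anywhere; on a real Agent host you would point it at /var/lib/ambari-agent/data instead.

```shell
# Simulated Agent task-log directory (real host: /var/lib/ambari-agent/data).
data=$(mktemp -d)
touch "$data"/command-1.json "$data"/output-1.txt "$data"/errors-1.txt
touch "$data"/command-2.json "$data"/output-2.txt "$data"/errors-2.txt

# Highest task number N among the output-N.txt files.
latest=$(ls "$data"/output-*.txt | sed 's/.*output-\([0-9]*\)\.txt/\1/' | sort -n | tail -1)
echo "latest task: $latest"
# Inspect command-$latest.json, output-$latest.txt, and errors-$latest.txt together.
```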
Note: You can configure the logging level for these log files.
Try the recommended solution for each of the following problems.
Your browser crashes or you accidentally close your browser before the Install Wizard completes.
The response to a browser closure depends on where you are in the process:
- The browser closes before you press the Deploy button: re-launch the same browser and continue the install process. Using a different browser forces you to restart the entire process.
- The browser closes after you press Deploy, while or after the Install, Start, and Test screen opens: re-launch the same browser and continue the process, or log in again using a different browser. When the Install, Start, and Test screen displays, proceed.
The Install, Start, and Test screen reports that the cluster install has failed.
The response to a report of install failure depends on the cause of the failure:
- The failure is due to intermittent network connection errors during software package installs: use the Retry button on the Install, Start, and Test screen.
- The failure is due to misconfiguration or other setup errors:
  1. Use the left navigation bar to go back to the appropriate screen; for example, Customize Services.
  2. Make your changes.
  3. Continue in the normal way.
- The failure occurs during the start/test sequence:
  1. Click Next and Complete, then proceed to the Monitoring Dashboard.
  2. Use the Services View to make your changes.
  3. Re-start the service using Service Actions.
- The failure is due to something else:
  1. Open an SSH connection to the Ambari Server host.
  2. Clear the database. At the command line, type:
     ambari-server reset
  3. Clear your browser cache.
  4. Re-run the Install Wizard.
When deploying PHD using Ambari 1.4.x or later on RHEL/CentOS 6.5, click the “Failed” link on the Confirm Hosts page in the Cluster Install wizard to display the Agent logs. The following log entry indicates that the SSL connection between the Agent and Server failed during registration:
INFO 2014-04-02 04:25:22,669 NetUtil.py:55 - Failed to connect to https://{ambari-server}:8440/cert/ca due to [Errno 1] _ssl.c:492: error:100AE081:elliptic curve routines:EC_GROUP_new_by_curve_name:unknown group
For more detailed information about this OpenSSL issue, see https://bugzilla.redhat.com/show_bug.cgi?id=1025598
In certain recent Linux distributions, such as RHEL/CentOS/Oracle Linux 6.x, the default value of nproc is lower than the value required to deploy the HBase service successfully. If you are deploying HBase, change the value of nproc:
1. Check the OpenSSL library version installed on your host(s):
   rpm -qa | grep openssl
   openssl-1.0.1e-15.el6.x86_64
2. If the output reads openssl-1.0.1e-15.x86_64 (1.0.1 build 15), you must upgrade the OpenSSL library. To upgrade the OpenSSL library, run the following command:
   yum upgrade openssl
3. Verify you have the newer version of OpenSSL (1.0.1 build 16):
   rpm -qa | grep openssl
   openssl-1.0.1e-16.el6.x86_64
4. Restart Ambari Agent(s) and click Retry -> Failed in the wizard user interface.
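The build-number comparison in the steps above can also be done mechanically. The sketch below parses the example package string from step 1; on a real host you would set pkg from the rpm -qa | grep openssl output instead.

```shell
# Example package string from the check above; on a real host use something like:
#   pkg=$(rpm -qa | grep '^openssl-1')
pkg="openssl-1.0.1e-15.el6.x86_64"

# Extract the build (release) number, here 15.
build=$(echo "$pkg" | sed 's/^openssl-[0-9.]*e-\([0-9]*\).*/\1/')
if [ "$build" -lt 16 ]; then
  echo "OpenSSL build $build: run 'yum upgrade openssl'"
else
  echo "OpenSSL build $build: OK"
fi
```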
You are unable to get the initial install command to run.
If your DataNodes are incorrectly configured, the smoke tests fail and you get this error message in the DataNode logs:
DisallowedDataNodeException org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException
If you boot your Hadoop DataNodes with/as a ramdisk, you must disable the free space check for yum before doing the install. If you do not disable the free space check, yum will fail with the following error:
Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install unzip' returned 1. Error Downloading Packages: unzip-6.0-1.el6.x86_64: Insufficient space in download directory /var/cache/yum/x86_64/6/base/packages * free 0 * needed 149 k
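One way to disable the free space check is yum's diskspacecheck directive. A sketch of the relevant /etc/yum.conf fragment follows; verify the option against your yum version before baking it into a ramdisk image:

```
# /etc/yum.conf
[main]
diskspacecheck=0
```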
Try the recommended solution for each of the following problems.
If you reboot your cluster, you must restart the Ambari Server and all the Ambari Agents manually.
Charts appear incorrectly, or not at all, and Host health status is displayed incorrectly.
SUSE 11 ships with Python version 2.6.0-8.12.2 which contains a known defect that causes this crash.
As an option you can start the HBase REST server manually after the install process is complete. It can be started on any host that has the HBase Master or the Region Server installed. If you install the REST server on the same host as the Ambari server, the http ports will conflict.
On a cluster host, ps aux | grep ambari-agent shows more than one agent process running. This causes Ambari Server to get incorrect ids from the host and forces the Agent to restart and re-register.
The Hive service uses MySQL Server by default. If you choose the MySQL server on the Ambari Server host as the managed database for Hive, Ambari stops this database during deployment, causing the Ambari Server to crash.
The cluster fails to install with an error related to running groupmod. This can occur in environments where groups are managed in LDAP, not on local Linux machines. You may see an error message similar to the following one:
Fail: Execution of 'groupmod hadoop' returned 10. groupmod: group 'hadoop' does not exist in /etc/group
When using SLES and performing host registration using SSH, the Agent bootstrap may fail due to a timeout while running the setupAgent.py script. The host on which the timeout occurs will show the following process hanging:

c6401.ambari.apache.org:/etc/ # ps -ef | grep zypper
root 18318 18317 5 03:15 pts/1 00:00:00 zypper -q search -s --match-exact ambari-agent

If you have a repository registered that prompts, via user interaction, to accept keys, you may see the hang and timeout. In this case, run zypper refresh and confirm that all repository keys are accepted, so that the zypper command works without user interaction. Another alternative is to perform manual Agent setup and not use SSH for host registration; this option does not require that Ambari call zypper without user interaction.
When installing Ambari on RHEL/CentOS 6 using the Cluster Installer Wizard at the Host Checks step, one or more host checks may fail if you have not disabled Transparent Huge Pages on all hosts.
Host Checks will warn you when a failure occurs.
Disable THP on all hosts:

1. Add the following commands to your /etc/rc.local file:

   if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
     echo never > /sys/kernel/mm/transparent_hugepage/enabled
   fi
   if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
     echo never > /sys/kernel/mm/transparent_hugepage/defrag
   fi

2. To confirm, reboot the host, then run the following command:

   $ cat /sys/kernel/mm/transparent_hugepage/enabled
   always madvise [never]
During cluster install, DataNode fails to install with the following error:
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install snappy-devel' returned 1.
Error: Package: snappy-devel-1.0.5-1.el6.x86_64 (PHD-UTILS-1.1.0.20)
       Requires: snappy(x86-64) = 1.0.5-1.el6
       Installed: snappy-1.1.0-3.el7.x86_64 (@anaconda/7.1)
                  snappy(x86-64) = 1.1.0-3.el7
       Available: snappy-1.0.5-1.el6.x86_64 (PHD-UTILS-1.1.0.20)
                  snappy(x86-64) = 1.0.5-1.el6
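A commonly applied workaround for this dependency conflict (a hedged sketch, not from this guide; it assumes no other installed package requires the newer snappy-1.1.0) is to remove the pre-installed snappy so the matching el6 pair can install, then retry the DataNode install:

```shell
# Remove the newer pre-installed snappy (assumption: nothing else needs it).
yum remove -y snappy

# Install the matching snappy-devel, which pulls in snappy-1.0.5-1.el6.
yum install -y snappy-devel
```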
When running Ambari Server as non-root and enabling Kerberos, if kadmin fails to authenticate, you will see the following error in ambari-server.log if Ambari cannot access kadmind.log:
STDERR: Couldn't open log file /var/log/kadmind.log: Permission denied
kadmin: GSS-API (or Kerberos) error while initializing kadmin interface
When adding client-only services to a cluster (using Add Service), Ambari does not automatically install dependent client components with the newly added clients.
When using an Agent non-root configuration, if you attempt to register hosts automatically using SSH, the Agent registration will fail.
Try the recommended solution for each of the following problems.
After performing an upgrade and restarting Ambari Server and the Agents, if you browse to Admin > Stack and Versions in Ambari Web, the Versions tab does not display.
After upgrading to Ambari 2.1.2, you receive alerts for "DataNode Unmounted Data Dir" stating that the /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist file does not exist.
The hadoop-env/dfs.datanode.data.dir.mount.file configuration property is no longer customizable from Ambari. The original default value of /etc/hadoop/conf/dfs_data_dir_mount.hist is now /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist, which is not customizable. On Ambari Agent upgrade, Ambari automatically moves the file from /etc/hadoop/conf/dfs_data_dir_mount.hist to /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist. If you have not modified this configuration property, no action is required.
When enabling Kerberos using the wizard, the Check Kerberos operation fails. In /var/log/ambari-server/ambari-server.log, you see a message:

02:45:44,490 WARN [qtp567239306-238] MITKerberosOperationHandler:384 - Failed to execute kadmin:
Check that NTP is running and confirm your hosts and the KDC times are in sync. A time skew as little as 5 minutes can cause Kerberos authentication to fail.
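The 5-minute tolerance can be checked with a small skew calculation. The sketch below uses two hypothetical sample timestamps; on real hosts you would compare date +%s readings taken on the cluster host and on the KDC host.

```shell
# Hypothetical sample clocks (epoch seconds); replace with `date +%s` readings
# from the cluster host and the KDC host.
host_time=1000000000
kdc_time=1000000400

skew=$((host_time - kdc_time))
skew=${skew#-}   # absolute value
if [ "$skew" -gt 300 ]; then
  echo "clock skew ${skew}s exceeds the ~5-minute Kerberos tolerance"
fi
```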
MySQL is the default database used by the Hive metastore. Depending on several factors, such as the version and configuration of MySQL, a Hive developer may see an exception message similar to the following one:
An exception was thrown while adding/validating classes) : Specified key was too long; max key length is 767 bytes
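A workaround often applied for this MySQL 767-byte key limit is to switch the metastore database to the latin1 character set. In this sketch, "hive" is an assumed metastore database name (confirm it against your setup); the statement is written to a file for review before applying it:

```shell
# Write the ALTER statement for review; "hive" is the assumed metastore DB name.
echo "ALTER DATABASE hive CHARACTER SET latin1;" > /tmp/hive_charset.sql
cat /tmp/hive_charset.sql
# Apply (after review) with: mysql -u root -p < /tmp/hive_charset.sql
```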
When attempting to perform a REST API call, you receive a 400 error response. REST API calls require the "X-Requested-By" header.
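For example, a DELETE call succeeds only when the X-Requested-By header is present. In this sketch the host, port, cluster name, target host, and credentials are all placeholders, and the command is echoed for review rather than executed:

```shell
# Build the request; Ambari rejects PUT/POST/DELETE calls without X-Requested-By.
cmd="curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE \
http://ambari.server:8080/api/v1/clusters/MyCluster/hosts/old.host"
echo "$cmd"
# Run the printed command (or drop the echo) to actually send the request.
```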
When Ambari issues its check to detect local disk capacity and use for each Ambari Agent, it uses df by default instead of df -l, which would check only local disks. If you use NFS auto-mounted home directories, this can lead to a high number of home directories being mounted on each host, causing shutdown delays and disk capacity check delays.
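The difference between the two forms can be seen on any Linux host; df alone reports every mounted filesystem (and may trigger NFS automounts), while df -l stays on local filesystems:

```shell
# All mounted filesystems (may touch NFS automounts):
df -kP | sed -n '1,3p'

# Local filesystems only (what a disk check should use on automounted setups):
df -klP | sed -n '1,3p'
```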
Links in the 2.1.1 PDF documentation are not working.
When running Ambari Server as non-root and enabling Kerberos, if kadmin fails to authenticate, you will see the following error in ambari-server.log if Ambari cannot access kadmind.log:

STDERR: Couldn't open log file /var/log/kadmind.log: Permission denied
kadmin: GSS-API (or Kerberos) error while initializing kadmin interface