Pivotal HD 3.0.1 Release Notes



Chapter 1. Pivotal HD 3.0.1 Release Notes

The official Apache versions of all Pivotal HD 3.0.1 components are unchanged from Pivotal HD 3.0. All Pivotal HD 3.0 components listed here are official Apache releases of the most recent stable versions available.

Pivotal provides patches only when necessary to ensure the interoperability of the components. Unless you are directed by Pivotal Support to apply a patch, each of the Pivotal HD 3.0 components must remain at the following package version levels to ensure a certified and supported copy of Pivotal HD 3.0. (A quick way to check the installed package versions on a host follows the list.)

  • Apache Hadoop 2.6.0

  • Apache HBase 0.98.4

  • Apache Hive 0.14.0

  • Hue 2.6.1

  • Apache Knox 0.5.0

  • Apache Oozie 4.1.0

  • Apache Pig 0.14.0

  • Apache Ranger 0.4.0

  • Apache Spark 1.2.1

  • Apache Tez 0.5.2

  • Apache ZooKeeper 3.4.6
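
If you need to confirm the package levels currently installed on a host, a minimal check such as the following can be used. It assumes an RPM-based system; the grep patterns match the package naming used in the upgrade steps later in this document:

  # List the installed Hadoop and ZooKeeper packages on this host
  rpm -qa | grep hadoop
  rpm -qa | grep zookeeper
  # Print the Apache Hadoop version in use
  hadoop version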

As of Pivotal HD 3.0, the following third-party tools are deprecated, and will be removed in a future release:

  • Ganglia 3.5.0

  • Nagios 3.5.0

  • Ganglia Web 3.5.7

Unsupported Apache Features

The following features are shipped as part of Pivotal HD 3.0 HDFS, but are not supported:

The following Apache components are shipped as part of Pivotal HD 3.0 YARN, but are not supported:

  • Fair Scheduler

  • MapReduce Uber AM

  • MapReduce Eclipse Plug-in

The following components are shipped as part of Pivotal HD 3.0 Spark, but are not supported:

  • Spark Standalone

  • BlinkDB

  • GraphX

Behavioral Changes

In Pivotal HD 3.0.1, behavioral changes affect the following Hadoop components:

  • Oozie

    Previous releases of the Pivotal HD Oozie client loaded the server-side configuration file, where JAVA_HOME is specified. The current version of the client does not load that server-side file. The workaround requires the following changes to the client's environment setup (a sample setup follows this list):

    • The Oozie system environment requires JDK 1.6 or greater

    • The machine where the users will run the Oozie command line requires $JAVA_HOME to be set to version 1.6 or greater.
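
For example, a minimal client environment setup could look like the following. The JDK path is an assumption; point it at your own JDK 1.6 or later installation:

  # Hypothetical JDK location; replace with the path to your JDK 1.6+
  export JAVA_HOME=/usr/java/default
  export PATH=$JAVA_HOME/bin:$PATH

  # Verify before running the Oozie command line
  java -version
  oozie version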

Tech Previews in This Release

The following features are included in Pivotal HD 3.0.1 as technical previews:

  • HDFS Transparent Data at Rest Encryption

  • YARN support for Docker

Apache Patch Information

This section describes Apache JIRAs that were addressed in Pivotal HD 3.0.1.

Hadoop Common/HDFS 2.6.0

Pivotal HD 3.0.1 provides Apache Hadoop Core 2.6.0 and the following additional Apache patches:

  • HDFS-3107: Introduce truncate.

  • HDFS-7009: Active NN and standby NN have different live nodes.

  • HDFS-7056: Snapshot support for truncate.

  • HDFS-7058: Tests for truncate CLI

  • HDFS-7263: Snapshot read can reveal future bytes for appended files.

  • HDFS-7425: NameNode block deletion logging uses incorrect appender.

  • HDFS-7443: Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

  • HDFS-7470: SecondaryNameNode need twice memory when calling reloadFromImageFile.

  • HDFS-7489: Incorrect locking in FsVolumeList#checkDirs can hang datanodes.

  • HDFS-7503: Namenode restart after large deletions can cause slow processReport

  • HDFS-7606: Fix potential NPE in INodeFile.getBlocks().

  • HDFS-7634: Disallow truncation of Lazy persist files.

  • HDFS-7638: Small fix and few refinements for FSN#truncate.

  • HDFS-7643: Test case to ensure lazy persist files cannot be truncated.

  • HDFS-7655: Expose truncate API for Web HDFS.

  • HDFS-7659: Truncate should check negative value of the new length

  • HDFS-7676: Fix TestFileTruncate to avoid bug of HDFS-7611.

  • HDFS-7677: DistributedFileSystem#truncate should resolve symlinks.

  • HDFS-7707: Edit log corruption due to delayed block removal again.

  • HDFS-7714: Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.

  • HDFS-7733: NFS: readdir/readdirplus return null directory attribute on failure.

  • HDFS-7738: Revise the exception message for recover lease; add more truncate tests such as truncate with HA setup, negative tests, truncate with other operations and multiple truncates.

  • HDFS-7760: Document truncate for WebHDFS.

  • HDFS-7831: Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks().

  • HDFS-7843: A truncated file is corrupted after rollback from a rolling upgrade.

  • HDFS-7885: Datanode should not trust the generation stamp provided by client.

  • HADOOP-926: Do not fail job history iteration when encountering missing directories.

  • HADOOP-941: Addendum patch.

  • HADOOP-11321: copyToLocal cannot save a file to an SMB share unless the user has Full Control permissions.

  • HADOOP-11368: Fix SSLFactory truststore reloader thread leak in KMSClientProvider.

  • HADOOP-11381: Fix findbugs warnings in hadoop-distcp, hadoop-aws, hadoop-azure, and hadoop-openstack

  • HADOOP-11381: Revert. (Fix findbugs warnings in hadoop-distcp, hadoop-aws, hadoop-azure, and hadoop-openstack.)

  • HADOOP-11412: POMs mention "The Apache Software License" rather than "Apache License".

  • HADOOP-11412: Revert. (POMs mention "The Apache Software License" rather than "Apache License".)

  • HADOOP-11490: Expose truncate API via FileSystem and shell command.

  • HADOOP-11509: change parsing sequence in GenericOptionsParser to parse -D parameters before -files.

  • HADOOP-11510: Expose truncate API via FileContext.

  • HADOOP-11523: StorageException complaining " no lease ID" when updating FolderLastModifiedTime in WASB.

  • HADOOP-11579: Documentation for truncate.

  • HADOOP-11595: Add default implementation for AbstractFileSystem#truncate.

  • MAPREDUCE-6230: Fixed RMContainerAllocator to update the new AMRMToken service name properly.

  • YARN-570: Time strings are formatted in different timezone.

  • YARN-2246: Made the proxy tracking URL always be http(s)://proxy addr:port/proxy/<appId> to avoid duplicate sections.

  • YARN-2571: RM to support YARN registry

  • YARN-2683: registry config options: document and move to core-default

  • YARN-2837: Support TimeLine server to recover delegation token when restarting.

  • YARN-2917: Fixed potential deadlock when system.exit is called in AsyncDispatcher

  • YARN-2964: RM prematurely cancels tokens for jobs that submit jobs (oozie).

  • YARN-3103: AMRMClientImpl does not update AMRM token properly.

  • YARN-3207: Secondary filter matches entities which do not have the key being filtered for.

  • YARN-3227: Timeline renew delegation token fails when RM user's TGT is expired.

  • YARN-3239: WebAppProxy does not support a final tracking url which has query fragments and params.

  • YARN-3251: Fixed a deadlock in CapacityScheduler when computing absoluteMaxAvailableCapacity in LeafQueue.

  • YARN-3269: Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path.

HBase 0.98.4

Pivotal HD 3.0.1 provides HBase 0.98.4 and the following additional Apache patches:

  • HBASE-212: Addendum. Fixes a unit test

  • HBASE-10499: In write-heavy scenario one of the regions does not get flushed causing RegionTooBusyException

  • HBASE-11569: Addendum for not skipping replayed edits for primary region replica

  • HBASE-12238: A few exceptions on startup - PARTIAL BACKPORT

  • HBASE-12533: staging directories are not deleted after secure bulk load

  • HBASE-12575: Sanity check table coprocessor classes are loadable

  • HBASE-12536: Reduce the effective scope of GLOBAL CREATE and ADMIN permission

  • HBASE-12562: Handling memory pressure for secondary region replicas - ADDENDUM for fixing findbug reported issues

  • HBASE-12791: HBase does not attempt to clean up an aborted split when the regionserver shutting down

  • HBASE-12714: RegionReplicaReplicationEndpoint should not set the RPC Codec

  • HBASE-12958: SSH doing hbase:meta get but hbase:meta not assigned

  • HBASE-13120: Allow disabling hadoop classpath and native library lookup

Hive 0.14.0

Pivotal HD 3.0.1 provides Hive 0.14.0 and the following additional Apache patches:

  • HIVE-480: HDFSCleanup thread holds reference to FileSystem

  • HIVE-6468: HiveServer2 (tcp mode, with SASL layer) OOMs when getting a non-thrift message

  • HIVE-6679: HiveServer2 should support TCP Keepalive & Server Socket Timeout on blocking calls

  • HIVE-7175: Provide password file option to beeline

  • HIVE-7270: Serde info is not shown in show create table statement, but shows in the desc table

  • HIVE-8295: Add batch retrieve partition objects for metastore direct sql

  • HIVE-8485: Hive metastore NPE with Oracle DB when there is empty value for string for tblproperties/serdeproperties/etc, table not usable after creation

  • HIVE-8762: HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean

  • HIVE-8791: Hive permission inheritance throws exception S3

  • HIVE-8850: ObjectStore::rollbackTransaction() needs to be looked into further

  • HIVE-8881: Receiving json "could not find job" error when web client tries to fetch all jobs from WebHCat but HDFS does not have the data.

  • HIVE-8888: Hive on Tez query output duplicate rows when there is explode in subqueries for joins

  • HIVE-8891: Another possible cause to NucleusObjectNotFoundException from drops/rollback

  • HIVE-8893: Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode

  • HIVE-8966: Delta files created by hive hcatalog streaming cannot be compacted

  • HIVE-9025: join38.q (without map join) produces incorrect result when testing with multiple reducers

  • HIVE-9038: Join tests fail on Tez

  • HIVE-9055: Tez: union all followed by group by followed by another union all gives error

  • HIVE-9106: improve the performance of null scan optimizer when several table scans share a physical path

  • HIVE-9112: Query may generate different results depending on the number of reducers

  • HIVE-9141: HiveOnTez: mix of union all, distinct, group by generates error

  • HIVE-9155: HIVE_LOCKS uses int instead of bigint hive-txn-schema-0.14.0.mssql.sql

  • HIVE-9205: Change default Tez install directory to use /tmp instead of /user and create the directory if it does not exist

  • HIVE-9234: HiveServer2 leaks FileSystem objects in FileSystem.CACHE

  • HIVE-9235: Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

  • HIVE-9249: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveVarcharWritable cannot be cast to org.apache.hadoop.hive.common.type.HiveVarchar when joining tables

  • HIVE-9278: Cached expression feature broken in one case

  • HIVE-9351: Running Hive Jobs with Tez cause templeton to never report percent complete

  • HIVE-9359: Hive export OOM error when table is huge (32TB data, 4800+ partitions)

  • HIVE-9382: Query got rerun with Global Limit optimization on and Fetch optimization off

  • HIVE-9390: Enhance retry logic wrt DB access in TxnHandler

  • HIVE-9401: SimpleFetchOptimizer for limited fetches without filters

  • HIVE-9404: NPE in org.apache.hadoop.hive.metastore.txn.TxnHandler.determineDatabaseProduct()

  • HIVE-9436: RetryingMetaStoreClient does not retry JDOExceptions

  • HIVE-9446: JDBC DatabaseMetadata.getColumns() does not work for temporary tables

  • HIVE-9473: sql std auth should disallow built-in udfs that allow any java methods to be called

  • HIVE-9593: ORC reader should ignore new/unknown metadata streams.

  • HIVE-9652: STDERR redirection should not use in place updates TEZ UI

  • HIVE-9665: Parallel move task optimization causes race condition

  • HIVE-9673: Set operationhandle in ATS entities for lookups

  • HIVE-9683: Client TCP keep-alive for Thrift as a workaround for THRIFT-2788

  • HIVE-9684: Incorrect disk range computation in ORC because of optional stream kind

  • HIVE-9743: Incorrect result set for vectorized left outer join

  • HIVE-9779: ATSHook does not log the end user if doAs=false (it logs the hs2 server user)

  • HIVE-9836: Hive on Tez: fails when virtual columns are present in the join conditions (for e.g. partition columns)

  • HIVE-9841: IOException thrown by ORC should include the path of processing file

  • HIVE-9886: Tez NPE error when initialize reducer from 2 row_number function on each join side

  • HIVE-9892: HIVE schema failed to upgrade with schematool

  • HIVE-9832: Merge join followed by union and a map join in hive on tez fails.

Knox 0.5.0

Pivotal HD 3.0.1 provides Knox 0.5.0 and the following additional Apache patches:

  • KNOX-492: Support service level replayBufferLimit for Oozie, Hive and HBase.

Oozie 4.1.0

Pivotal HD 3.0.1 provides Apache Oozie 4.1.0 and the following additional Apache patch:

  • OOZIE-208: Adding missing oozie property oozie.service.HadoopAccessorService.hadoop.configurations to oozie install script.

Pig 0.14.0

Pivotal HD 3.0.1 provides Apache Pig 0.14.0 and the following additional Apache patches:

  • PIG-156: Pig command fails because the input line is too long

  • PIG-4334: PigProcessor does not set pig.datetime.default.tz

  • PIG-4342: Pig 0.14 cannot identify the uppercase of DECLARE and DEFAULT

  • PIG-4377: Skewed outer join produce wrong result in some cases (PIG-4377-2.patch)

  • PIG-4377: Skewed outer join produce wrong result in some cases

Ranger 0.4.0

Pivotal HD 3.0.1 provides Ranger 0.4.0 and the following additional Apache patch:

  • RANGER-188: Added LSB headers to Ranger Admin/Usersync init.d scripts

Tez 0.5.2

Pivotal HD 3.0.1 provides Apache Tez 0.5.2 and the following Apache patches:

  • TEZ-1642: TestAMRecovery sometimes fails.

  • TEZ-1775: Allow setting log level per logger.

  • TEZ-1800: Integer overflow in ExternalSorter.getInitialMemoryRequirement()

  • TEZ-1836: Provide better error messages when tez.runtime.io.sort.mb, spill percentage is incorrectly configured.

  • TEZ-1851: FileSystem counters do not differentiate between different FileSystems

  • TEZ-1852: Get examples to work in Local Mode.

  • TEZ-1861: Fix failing test: TestOnFileSortedOutput.

  • TEZ-1878: Task-specific log level override not working in certain conditions

  • TEZ-1924: Tez AM does not register with AM with full FQDN causing jobs to fail in some environments.

  • TEZ-1931: Publish tez version info to Timeline.

  • TEZ-1934: TestAMRecovery may fail due to the execution order is not determined.

  • TEZ-1942: Number of tasks show in Tez UI with auto-reduce parallelism is misleading.

  • TEZ-1949: Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges

  • TEZ-1962: Fix a thread leak in LocalMode.

  • TEZ-2024: Compilation error due to conflict.

  • TEZ-2024: TaskFinishedEvent may not be logged in recovery.

  • TEZ-2037: Should log TaskAttemptFinishedEvent if taskattempt is recovered to KILLED

  • TEZ-2135: ACL checks handled incorrectly in AMWebController.

As part of Pivotal HD 3.0.1, Pivotal provides a Tez Debugging User Interface. This interface does not affect the behavior or function of jobs that leverage Tez, and its use is optional. Patches added to facilitate the Tez Debugging User Interface include:

  • TEZ-1990: Tez UI: DAG details page shows Nan for end time when a DAG is running.

  • TEZ-2031: Tez UI: horizontal scrollbars do not appear in tables, causing them to look truncated.

  • TEZ-2038: TEZ-UI DAG is always running in tez-ui when the app is failed but no DAGFinishedEvent is logged.

  • TEZ-2043: Tez UI: add progress info from am webservice to dag and vertex views.

  • TEZ-2052: Tez UI: log view fixes, show version from build, better handling of ats url config.

  • TEZ-2056: Tez UI: fix VertexID filter, show only tez configs by default, fix appattemptid.

  • TEZ-2063: Tez UI: Flaky log url in tasks table.

  • TEZ-2065: Setting up tez.tez-ui.history-url.base with a trailing slash can result in failures to redirect correctly.

  • TEZ-2068: Tez UI: Dag view should use full window height, disable webuiservice in localmode.

  • TEZ-2069: Tez UI: appId should link to application in dag details view.

  • TEZ-2077: Tez UI: No diagnostics on Task Attempt Details page if task attempt failed

  • TEZ-2078: Tez UI: Task logs url use in-progress url causing various errors.

  • TEZ-2079: Tez UI: trailing slash in timelineBaseUrl in ui should be handled.

  • TEZ-2092: Tez UI history url handler injects spurious trailing slash.

  • TEZ-2098: Tez UI: Dag details should be the default page for dag, fix invalid time entries for failed Vertices.

  • TEZ-2101: Tez UI: Issues on displaying a table.

  • TEZ-2102: Tez UI: DAG view has hidden edges, dragging DAG by holding vertex causes unintended click.

  • TEZ-2106: TEZ UI: Display data load time, and add a refresh button for items that can be refreshed.

  • TEZ-2112: Tez UI: fix offset calculation, add home button to breadcrumbs.

  • TEZ-2114: Tez UI: task/task attempt status is not available when it's running.

  • TEZ-2116: Tez UI: dags page filter does not work if more than one filter is specified.

  • TEZ-2134: TEZ UI: On request failure, display request URL and server name in error bar.

  • TEZ-2136: Some enhancements to the new Tez UI.

  • TEZ-2142: TEZ UI: Breadcrumb border color looks out of place in wrapped mode.

  • TEZ-2158: TEZ UI: Display dag/vertex names, and task/attempt index in breadcrumb

  • TEZ-2160: Tez UI: App tracking URL should support navigation back.

  • TEZ-2165: Tez UI: DAG shows running status if killed by RM in some cases.

ZooKeeper 3.4.6

Pivotal HD 3.0.1 provides ZooKeeper 3.4.6 and the following additional Apache patches:

  • ZOOKEEPER-1506: Re-try DNS hostname -> IP resolution if node connection fails

Upgrading from Pivotal HD 3.0 to Pivotal HD 3.0.1

Pivotal HD 3.0.1 is a maintenance release of Pivotal HD 3.0. If you already have Pivotal HD 3.0 installed, upgrading your cluster to Pivotal HD 3.0.1 means:

  • Keeping the same configuration files you used for Pivotal HD 3.0

  • Keeping the same data and metadata in the same location you used for Pivotal HD 3.0

Before You Begin

  • Before upgrading the stack on your cluster, review all Hadoop services and hosts in your cluster. For example, use the Hosts and Services views in Ambari Web, which summarize and list the components installed on each host. Save this list in a text editor for later use.

  • Make sure you know which PHD components need to be upgraded at your installation.

  • It is highly recommended that you validate the upgrade steps in a test environment to adjust and account for any special configurations for your cluster.

  • It is also recommended that you back up your databases before beginning the upgrade, including the Ambari database, Hive Metastore database, Oozie Server database, Ranger Admin database, and Ranger Audit database.

  • It is recommended that you save your console output and the precise upgrade steps you perform for troubleshooting any issues with Technical Support.

Upgrade Procedure

To upgrade your cluster from PHD 3.0 to PHD 3.0.1, you will need to upgrade Ambari as well as the PHD stack.

Upgrade Ambari from 1.7.1-87 to 1.7.1-88

Before you begin the Hadoop stack upgrade, upgrade your Ambari server and agent components on your cluster. These steps apply to you if you currently have Ambari version 1.7.1-87 installed. You can verify your current version using the following command:

yum list AMBARI*

  1. Clean up the old Ambari repo files from all the nodes. Below are some helper commands that you can use for each of your nodes:

    • CentOS/RHEL

      ssh root@{hostname} mkdir ~/old_repo_files
      ssh root@{hostname} "mv /etc/yum.repos.d/ambari.repo ~/old_repo_files"
    • SLES

      Remove the repository on all of the cluster nodes. Sample command:

      ssh root@hostname zypper rr AMBARI-1.7.1
  2. Download the appropriate Ambari RPM single repository tarball for your OS from Pivotal Network: https://network.pivotal.io/products/pivotal-hd#/releases/473

    You will need Ambari-1.7.1-88 for the PHD-3.0.1 stack. You may set up the local YUM repository in a way similar to instructions provided here: http://pivotalhd.docs.pivotal.io/docs/install-ambari.html#installing-ambari-server

  3. Untar the Ambari tarball and run setup_repo.sh:

    tar zxvf AMBARI-1.7.1-88-<os>.tar.gz
    AMBARI-1.7.1/setup_repo.sh 
  4. Copy ambari.repo to ALL of the cluster nodes:

    • CentOS/RHEL

      scp /etc/yum.repos.d/ambari.repo \
         $cluster_host:/etc/yum.repos.d/ambari.repo
    • SLES

      scp /etc/zypp/repos.d/ambari.repo \
         $cluster_host:/etc/zypp/repos.d/ambari.repo
  5. Upgrade Ambari server and Ambari agent.

    On the Ambari server, run:

    • CentOS/RHEL

      yum clean all
      yum upgrade ambari-server ambari-log4j
    • SLES

      zypper clean
      zypper up ambari-server ambari-log4j
  6. Upgrade the Ambari Agent on ALL hosts. On each Ambari Agent host:

    • CentOS/RHEL

      yum upgrade ambari-agent ambari-log4j
    • SLES

      zypper up ambari-agent ambari-log4j

      After the upgrade process completes, check each host to make sure the new 1.7.1-88 packages have been installed:

      rpm -qa | grep ambari
  7. Start the Ambari Server. At the Ambari Server host:

    ambari-server start
  8. Launch Ambari Web. Point your browser to http://<your.ambari.server>:8080 and log in.

    If you have problems, refresh your browser and clear your browser cache manually, then restart Ambari Server.
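
    If the web UI still does not respond, you can confirm that the Ambari REST API is reachable from the command line. This is a sketch that assumes the default admin account and port; adjust it for your environment:

    # Assumes the default admin/admin credentials and port 8080
    curl -u admin:admin http://<your.ambari.server>:8080/api/v1/clusters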

  9. Finally, if you are using LDAP, review your Ambari LDAP authentication settings.

Your Ambari upgrade is now complete.

Upgrade Cluster from PHD 3.0 to PHD 3.0.1

To upgrade your cluster from Pivotal HD 3.0 to Pivotal HD 3.0.1:

  1. Clean up the old repo files from all the nodes. Below are some helper commands that you can use for each of your nodes.

    • CentOS/RHEL

      ssh root@{hostname} mkdir ~/old_repo_files
      ssh root@{hostname} "mv /etc/yum.repos.d/PHD.repo  ~/old_repo_files"
      ssh root@{hostname} "mv /etc/yum.repos.d/PHD-3*.repo  ~/old_repo_files"
      # Don't use PHD*.repo, as that would also move PHD-UTILS.repo
    • SLES

      Remove the repository on all of the cluster nodes. Example:

      ssh root@{hostname} zypper rr PHD-3*
  2. Download the appropriate PHD RPM single repository tarball for your OS from Pivotal Network. (You don't need the PHD Utils tarball if upgrading from PHD 3.0; this file remains unchanged.)

    https://network.pivotal.io/products/pivotal-hd#/releases/473

    You will need the PHD-3.0.1.0-1 tarball for the PHD-3.0.1 stack. (3.0.1.0 is the PHD version and 1 is the build number.) You may set up the local YUM repository similar to instructions provided here:

    http://pivotalhd.docs.pivotal.io/docs/install-ambari.html#install-cluster

  3. Untar the PHD stack tarball and run setup_repo.sh:

    tar zxf PHD-3.0.1.0-1-<os>.tar.gz
    PHD-3.0.1.0/setup_repo.sh

    You should see output similar to this:

    PHD-3.0.1.0 Repo file successfully created at
       /etc/yum.repos.d/PHD-3.0.1.0.repo.
    Use http://<your.server>/PHD-3.0.1.0 to access the repository.

    Make a note of the PHD repo URL above for use later.

  4. Copy the new PHD.repo file to ALL nodes:

    • CentOS/RHEL

      scp -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
              /etc/yum.repos.d/PHD-3.0.1.0.repo \
              root@${hostname}:/etc/yum.repos.d/
    • SLES

      zypper rr PHD-3.0.0.0
      scp /etc/zypp/repos.d/PHD-3.0.1.0.repo root@${hostname}:/etc/zypp/repos.d
  5. Disable Security.

    If your stack has Kerberos Security turned on, turn it off before performing the upgrade. Choose Ambari Web UI > Admin > Security and click Disable Security. You can turn Kerberos Security on again after performing the upgrade.

  6. Use Ambari Web > Services > Service Actions to stop all services except HDFS and ZooKeeper.
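
    If you prefer to script this step, services can also be stopped through the Ambari REST API. The sketch below stops a single service; the cluster name "mycluster" and the default admin credentials are assumptions, so substitute your own, and repeat the call for each service you need to stop:

    curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
      -d '{"RequestInfo":{"context":"Stop service before upgrade"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
      http://<your.ambari.server>:8080/api/v1/clusters/mycluster/services/HIVE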

  7. Stop any client programs that access HDFS.

  8. Perform steps 9 through 14 on the NameNode host.

    In a highly-available NameNode configuration, execute the following procedure on the primary NameNode.

    To locate the primary NameNode in an Ambari-managed Pivotal HD cluster, browse Ambari Web > Services > HDFS. In Summary, click NameNode. Hosts > Summary displays the host name FQDN.

  9. If HDFS is in a non-finalized state from a prior upgrade operation, you must finalize HDFS before upgrading further. Finalizing HDFS will remove all links to the metadata of the prior HDFS version. Do this only if you do not want to rollback to that prior HDFS version.

    On the NameNode host, as the HDFS user,

    su -l <HDFS_USER>
    hdfs dfsadmin -finalizeUpgrade

    where <HDFS_USER> is the HDFS Service user, for example, hdfs.

  10. Check the NameNode directory to ensure that there is no snapshot of any prior HDFS upgrade. Specifically, using Ambari Web > HDFS > Configs > NameNode, examine the <dfs.namenode.name.dir> or the <dfs.name.dir> directory in the NameNode Directories property. Make sure that only a "current" directory, and no "previous" directory, exists on the NameNode host.
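
    For example, assuming the NameNode Directories property is set to /hadoop/hdfs/namenode (substitute the value from your own configuration), the check could look like this:

    # Hypothetical dfs.namenode.name.dir value; use the directory shown in Ambari
    ls /hadoop/hdfs/namenode
    # Only a "current" directory should be listed; a "previous" directory means a
    # prior upgrade has not been finalized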

  11. Create the following logs and other files. Creating these logs allows you to check the integrity of the file system, post-upgrade.

    As the HDFS user, enter:

    su -l <HDFS_USER> 

    where <HDFS_USER> is the HDFS Service user, for example, hdfs.

    Run fsck with the following flags and send the results to a log. The resulting file contains a complete block map of the file system. You use this log later to confirm the upgrade.

    hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log

    Optional: Capture the complete namespace of the file system.

    The following command does a recursive listing of the root file system:

    hdfs dfs -ls -R / > dfs-old-lsr-1.log

    Create a list of all the DataNodes in the cluster:

    hdfs dfsadmin -report > dfs-old-report-1.log

    Optional: Copy all unrecoverable data stored in HDFS to a local file system or to a backup instance of HDFS.
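
    As a sketch, a single HDFS path can be copied to the local file system as follows; both paths are placeholders:

    # /apps/critical-data and /data/backup are placeholder paths
    hdfs dfs -copyToLocal /apps/critical-data /data/backup/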

  12. Save the namespace.

    You must be the HDFS service user to do this and you must put the cluster in Safe Mode.

    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace

    In a highly-available NameNode configuration, the command hdfs dfsadmin -saveNamespace sets a checkpoint in the first NameNode specified in the configuration, in dfs.ha.namenodes.[nameservice ID]. You can also use the dfsadmin -fs option to specify which NameNode to connect to. For example, to force a checkpoint in namenode 2:

    hdfs dfsadmin -fs hdfs://namenode2-hostname:namenode2-port -saveNamespace
  13. Copy the checkpoint files located in <dfs.name.dir>/current into a backup directory.

    Find the directory using Ambari Web > HDFS > Configs > NameNode > NameNode Directories on your primary NameNode host.

    In a highly-available NameNode configuration, the location of the checkpoint depends on where the saveNamespace command is sent, as defined in the preceding step.

  14. Store the layoutVersion for the NameNode.

    Make a copy of the file at <dfs.name.dir>/current/VERSION, where <dfs.name.dir> is the value of the config parameter NameNode directories. This file will be used later to verify that the layout version is upgraded. (See the sketch below.)
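
    A minimal sketch of steps 13 and 14, assuming the NameNode directory is /hadoop/hdfs/namenode and using /root/nn_backup as the backup location (both paths are assumptions; substitute your own):

    mkdir -p /root/nn_backup
    cp -r /hadoop/hdfs/namenode/current /root/nn_backup/
    cp /hadoop/hdfs/namenode/current/VERSION /root/nn_backup/VERSION.before-upgrade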

  15. Stop HDFS.

  16. Stop ZooKeeper.

  17. Using Ambari Web > Services > <service.name> > Summary, review each service and make sure that all services in the cluster are completely stopped.

  18. At the Hive Metastore database host, stop the Hive metastore service, if you have not done so already. Make sure that the Hive metastore database is running.

  19. If you are upgrading Hive and Oozie, back up the Hive and Oozie metastore databases on the Hive and Oozie database host machines, respectively.

    Table 1.1. Hive Metastore Database Backup and Restore

    MySQL

      Backup:  mysqldump <dbname> > <outputfilename.sql>
               For example: mysqldump hive > /tmp/mydir/backup_hive.sql

      Restore: mysql <dbname> < <inputfilename.sql>
               For example: mysql hive < /tmp/mydir/backup_hive.sql

    PostgreSQL

      Backup:  sudo -u <username> pg_dump <databasename> > <outputfilename.sql>
               For example: sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql

      Restore: sudo -u <username> psql <databasename> < <inputfilename.sql>
               For example: sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql


    Table 1.2. Oozie Metastore Database Backup and Restore

    MySQL

      Backup:  mysqldump <dbname> > <outputfilename.sql>
               For example: mysqldump oozie > /tmp/mydir/backup_oozie.sql

      Restore: mysql <dbname> < <inputfilename.sql>
               For example: mysql oozie < /tmp/mydir/backup_oozie.sql

    PostgreSQL

      Backup:  sudo -u <username> pg_dump <databasename> > <outputfilename.sql>
               For example: sudo -u postgres pg_dump oozie > /tmp/mydir/backup_oozie.sql

      Restore: sudo -u <username> psql <databasename> < <inputfilename.sql>
               For example: sudo -u postgres psql oozie < /tmp/mydir/backup_oozie.sql


  20. Back up the files in the following directory on the Oozie server host and make sure that all files, including *site.xml files, are copied.

    mkdir oozie-conf-bak
    cp -R /etc/oozie/conf/* oozie-conf-bak
  21. Back up Hue. If you are using the embedded SQLite database, you must back up the database before you upgrade Hue to prevent data loss. To make a backup copy of the database, stop Hue, then "dump" the database content to a file, as follows:

    /etc/init.d/hue stop
    su $HUE_USER
    mkdir ~/hue_backup
    cd /var/lib/hue/
    sqlite3 desktop.db .dump > ~/hue_backup/desktop.bak

    For other databases, follow your vendor-specific instructions to create a backup.
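
    If you later need to restore the SQLite backup, a minimal sketch (assuming the same paths as above) is to stop Hue, move the current database aside, and replay the dump:

    /etc/init.d/hue stop
    su $HUE_USER
    cd /var/lib/hue/
    # Keep the old database file in case the restore needs to be retried
    mv desktop.db desktop.db.old
    sqlite3 desktop.db < ~/hue_backup/desktop.bak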

  22. On the Ambari Server host, stop Ambari Server and confirm that it is stopped.

    ambari-server stop
    ambari-server status
  23. On all hosts, clean the yum repository.

    yum clean all
  24. For each host, identify the PHD components installed on that host. Use Ambari Web (as suggested earlier) to view components on each host in your cluster. Some components, like Spark, Ranger, and Hue, are not managed by Ambari 1.7.1, hence you’ll need to list them manually. Based on the PHD components installed, edit the following upgrade commands for each host to upgrade only those components residing on that host.

    For example, if you know that a host has no HBase service or client packages installed, then you can edit the command to not include HBase:

    yum install "hadoop_3_0_1*" "hive_3_0_1*" "zookeeper_3_0_1*" \
       "hadooplzo_3_0_1*" "oozie_3_0_1*" "pig_3_0_1*" \
       "tez_3_0_1*" "knox_3_0_1*" "ranger_3_0_1*" "spark_3_0_1*"

    Sample command to ssh from one server:

    ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
       root@hostname
    yum install -y "hadoop_3_0_1*" "hive_3_0_1*" "zookeeper_3_0_1*" \
       "hbase_3_0_1*" "hadooplzo_3_0_1*" "oozie_3_0_1*" "pig_3_0_1*" \
       "tez_3_0_1*" "knox_3_0_1*" "ranger_3_0_1*" "spark_3_0_1*"

    Install all PHD components that you want to upgrade.

    Note: Ensure that you use straight quotes with your yum install commands, since some text editors may replace them with curly quotes inadvertently.

    RHEL/CentOS/Oracle Linux

    yum install "hadoop_3_0_1*" "hive_3_0_1*" "zookeeper_3_0_1*"
       "hbase_3_0_1*" "hadooplzo_3_0_1*" "oozie_3_0_1*" "pig_3_0_1*"
       "tez_3_0_1*" "knox_3_0_1*" "ranger_3_0_1*" "spark_3_0_1*"

    SLES

    zypper install "hadoop_3_0_1*" "hive_3_0_1*" "zookeeper_3_0_1*"
       "hbase_3_0_1*" "hadooplzo_3_0_1*" "oozie_3_0_1*" "pig_3_0_1*"
       "tez_3_0_1*" "knox_3_0_1*" "ranger_3_0_1*" "spark_3_0_1
  25. On each host in the cluster, use distro-select to switch all services to the PHD 3.0.1 version:

    distro-select set all 3.0.1.0-1

    or

    ssh root@hostname distro-select set all 3.0.1.0-1
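
    To cover every host from a single admin node, a small loop such as the following can be used; it assumes a helper file named hosts.txt listing one cluster hostname per line:

    # hosts.txt is an assumed helper file, one cluster hostname per line
    while read host; do
      # -n keeps ssh from consuming the rest of the host list on stdin
      ssh -n root@${host} "distro-select set all 3.0.1.0-1"
    done < hosts.txt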
  26. Verify that the components were upgraded.

    rpm -qa | grep hdfs && rpm -qa | grep hive && rpm -qa \
      | grep hcatalog && rpm -qa | grep hadoop && rpm -qa \
      | grep zookeeper && rpm -qa | grep hbase && rpm -qa \
      | grep hadooplzo && rpm -qa | grep oozie && rpm -qa \
      | grep pig && rpm -qa | grep tez && rpm -qa | grep knox && rpm -qa \
      | grep ranger && rpm -qa | grep spark && rpm -qa | grep hue
  27. Delete the /usr/phd/3.0.0.0-249 directory from ALL hosts.

    rm -rf /usr/phd/3.0.0.0-249

    or

    ssh root@hostname rm -rf /usr/phd/3.0.0.0-249
  28. Confirm the stack version with the following command:

    distro-select versions

    The result should be '3.0.1.0-1'.

  29. Start Ambari server.

    ambari-server start
    ambari-server status
  30. Update the repository Base URLs in Ambari Server for the PHD 3.0.1 stack:

    Browse to Ambari Web > Admin > Repositories, then update the value for the PHD repository Base URL. Use the local repository Base URL that you configured for the PHD Stack earlier (e.g., http://<your.server>/PHD-3.0.1.0).

  31. Start all PHD 3.0.1 services, in the following order:

    ZooKeeper

    su - zookeeper
    export ZOOCFGDIR=/usr/phd/current/zookeeper-server/conf
    export ZOOCFG=zoo.cfg
    source /usr/phd/current/zookeeper-server/conf/zookeeper-env.sh
    /usr/phd/current/zookeeper-server/bin/zkServer.sh start

    (HA NameNode upgrade only) ZooKeeper Failover Controller Daemons

    /usr/phd/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start zkfc

    (HA NameNode upgrade only) JournalNodes

    su - hdfs
    /usr/phd/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh start journalnode

    HDFS NameNode(s)

    Start the HDFS NameNode(s). Because there is no metadata schema update for this upgrade, start the NameNode(s) in normal mode:

    su - hdfs
    /usr/phd/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode
  32. Start the remaining PHD services.

    On each host in the cluster, start the services that are relevant to that host. Ambari should show a green status check for all services.

You now have an upgraded cluster. Ensure that your workloads run correctly on this upgraded cluster.
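
As one way to use the logs captured in step 11, you can re-run fsck as the HDFS user after the upgrade and compare the new block map against the pre-upgrade log. This is an optional sketch, not a required step:

su -l <HDFS_USER>
hdfs fsck / -files -blocks -locations > dfs-new-fsck-1.log
# Compare against the log captured before the upgrade
diff dfs-old-fsck-1.log dfs-new-fsck-1.log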

Fixed in Pivotal HD 3.0.1

The following features and fixes were contributed back to Apache with the release of Pivotal HD 3.0.1:

Potential Data Loss

Component | Apache JIRA | Key | Summary

HDFS | HADOOP-11482, HDFS-7208 | BUG-34432 | NameNode doesn't schedule replication when a DataNode fails
HDFS | HDFS-7960, HDFS-7575, HDFS-7596 | BUG-34958 | The full block report should prune zombie storages even if they're not empty

Security

Component | Apache JIRA | Key | Summary

HBase | HBASE-13239 | BUG-33070 | HBASE grants at specific column level does not work for Groups
Hive | | BUG-33167 | Hive does not prevent non-whitelisted config params from being set under certain conditions
Hue | | BUG-35674 | Support LDAP authentication via Hue to Hiveserver2
Ranger | | BUG-31949 | In hive plugin, TABLE policy restriction for all tables {{*}} fails when UDF policy for all functions {{*}} is maintained for a database
Ranger | | BUG-33792 | Bundle oraclejdbc.jar with Ranger
Ranger | | BUG-33822 | Do not display the Tomcat SessionId as part of Audit report
Ranger | | BUG-34640 | Admin REST API appears to be open to password-less modifications
Ranger | | BUG-35442 | UserSync process didn't sync the group when groups are added to the user at a later time
YARN | | BUG-23732 | Fix how ZooKeeperSecurity works with ResourceManager

Incorrect Results

Component | Apache JIRA | Key | Summary

YARN | YARN-2906 | BUG-35606 | CapacitySchedulerPage shows HTML tags for a queue's Active Users
Ranger | | BUG-36124 | First name and last name fields should support underscores, dashes and space characters
Ranger | HDFS-8219 | BUG-36123 | setStoragePolicy with folder behavior is different after cluster restart
Tez | TEZ-2397 | BUG-35923 | Tez jobs can fail on a cluster with HDFS HA

Stability

Component | Apache JIRA | Key | Summary

Hadoop Common, HDFS | HADOOP-11333 | BUG-34682 | Fix deadlock in DomainSocketWatcher when the notification pipe is full
Hadoop Common, HDFS | HDFS-4882 | BUG-34388 | Prevent the Namenode's LeaseManager from looping forever in checkLeases
HBase | HBASE-13515, HBASE-13169, HBASE-13469, HBASE-13518 | BUG-36599 | Fixes to region replicas
Hive | HIVE-10208 | BUG-34033 | Hive job via WebHCat now requires /tez-client/conf/tez-site.xml
Hive | HIVE-20085 | BUG-33904 | Lateral view on top of a view throws RuntimeException error
Hive | | BUG-32795 | HIVE schema failed to upgrade with schematool
Hive | | BUG-33402 | HiveServer2 fails to start without kerberos ticket in place before starting
Hive | | BUG-33403 | Tez job submission fails via WebHCat
Tez | TEZ-2334 | BUG-34965 | ContainerManagementProtocolProxy modifies IPC timeout configuration without making a backup copy
YARN | YARN-2340 | BUG-35602 | Fixed NPE when queue is stopped during RM restart
YARN | YARN-2414 | BUG-35604 | RM web UI: app page will crash if app is failed before any attempt has been created
YARN | YARN-2816 | BUG-35600 | NM fail to start with NPE during container recovery
YARN | YARN-2874 | BUG-35601 | Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
YARN | YARN-2905 | BUG-35605 | AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted
YARN | YARN-2992 | BUG-34388 | ZKRMStateStore crashes due to session expiry
YARN | YARN-3393 | BUG-33546 | Getting application(s) goes wrong when app finishes before starting the attempt

Upgrade

Component | Apache JIRA | Key | Summary

Knox | | BUG-33818 | Persisted service registry is not updated to support HA after upgrade
Knox | | BUG-34770 | Persisted service registry is not updated to support HA after upgrade

Usability

Component | Apache JIRA | Key | Summary

YARN | YARN-2096 | BUG-35606 | CapacitySchedulerPage shows HTML tags for a queue's Active Users
Ranger | | BUG-36124 | First name and last name fields should support underscores, dashes and space characters
Ranger | HDFS-8219 | BUG-36123 | setStoragePolicy with folder behavior is different after cluster restart
Tez | TEZ-2397 | BUG-35923 | Tez jobs can fail on a cluster with HDFS HA

Performance

Component | Apache JIRA | Key | Summary

HDFS | HDFS-7531 | BUG-34296 | Improve the concurrent access on FsVolumeList.

Other

Component | Apache JIRA | Key | Summary

Hadoop Common | HADOOP-11710 | BUG-35343 | Make CryptoOutputStream behave like DFSOutputStream with respect to synchronization
Hadoop Common | HADOOP-11730 | BUG-35340 | Running Jobs using S3n results in 'java.net.SocketTimeoutException' when using large >2G data sets
HDFS | HADOOP-11482 | BUG-35643 | Use correct UGI when KMSClientProvider is called by a proxy user
HDFS | HDFS-8219 | BUG-36123 | setStoragePolicy with folder behavior is different after cluster restart
Hive | HIVE-10421 | BUG-35670 | StorageBasedAuthorizationProvider can't drop partitioned table with database prefix
Hive | | BUG-35266 | Tez 2 sub queries with same group by key, when one set group by key empty, outer join on those group-by keys generate different wrong results depending on #reducers of the join
Ranger | | BUG-34504 | Remove condition to copy DB driver jar file
Ranger | | BUG-33342 | Setup Nexus proxy within openstack
Ranger, Windows | | BUG-33815 | Validation of Ranger settings is missing in MSI

Known Issues for Pivotal HD 3.0.1

Key | Apache JIRA | Summary | Component/s

BUG-26944 | | E0803: IO error, null during scheduling feed on a cluster with wire enc on | Oozie
BUG-33764 | | distro-select needs to handle user-created directories in /usr/phd | PHD stack
BUG-33113 | | HBase-HA master fail over immediately | HBase
BUG-31943 | | test_IntegrationTestRegionReplicas region not serving | HBase
BUG-32509 | | test_runIntegrationTestMTTRwithDistributedLogs due to failure in testKillRsHoldingMeta | HBase
BUG-31353 | | IntegrationTestRegionReplicaPerf region not serving | HBase
BUG-29903 | | IntegrationTestMTTR slowness | HBase
BUG-31936 | | test_hbaseBigLinkedList[Loop] TimeoutException | HBase
BUG-27858 | | HBase HA tests fail with Table Namespace Manager not ready yet | HBase
BUG-27509 | HBASE-12464 | test_ReplicaCopyTable_verifyCopyTable[tableone] fails after trying several times for 'Failed to find location, tableName=hbase:meta, row=' | HBase
BUG-27256 | | Create mr based Hbase Long running application for Rollingupgrade | HBase
BUG-25961 | | HBase tests failing with org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out; 10000ms | HBase
BUG-23619 | | On Windows, tests fail complaining about Cannot run program /usr/bin/env | HBase
BUG-23981 | | IOException: Invalid HFile block magic in Windows unit tests | HBase
BUG-22841 | | HBase test failed with Insufficient permissions (user=hrt_qa@EXAMPLE.COM) error | HBase
BUG-21254 | | hbck tool fixMeta flag not able to recreate deleted entries from hbase:meta table when just one column is deleted | HBase
BUG-23897 | | HBase REST server continuously restarting in secure windows env | HBase
BUG-33278 | HBASE-13239 | HBASE grants at specific column level does not work for Groups | HBase
BUG-27656 | HBASE-12472, HBASE-13192 | IntegrationTestBulkLoad fails on mapreduce | HBase
BUG-27756 | | test_IntegrationTestMultiGetRegionReplicas slowness | HBase
BUG-31624 | | Dropping a 2000+ partition table takes over 10 minutes | Hive
BUG-31671 | | HS2 concurr tests intermittently fail with http mode on secure Openstack throwing org.apache.hive.service.auth.HttpAuthenticationException: Kerberos authentication failed | Hive
BUG-30984 | | Hive compaction tests failing | Hive
BUG-33168 | | hive authorization checks on configs should not be done if authorization is disabled | Hive
BUG-28218 | | Hive CLI hangs | Hive
BUG-27636 | | Oracle: intermittent acid_concurrency test failures due to NoSuchLockException | Hive
BUG-26243 | | CBO: Q51 fails in explain with Failed to breakup Windowing invocations into Groups. | Hive
BUG-27507 | | Hive CLI returns 0 even in case of failures | Hive
BUG-29427 | | Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR | Hive
BUG-27582 | | Beeline Client Connections Errors - Obscure errors and exception swallow - 1 | Hive
BUG-25064 | | Hive: Q51 fails in CBO with "Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns" | Hive
BUG-23260 | BUG-22878 | insert after alter table issue | Hive
BUG-21468 | | With CBO enabled queries fail with Failed to breakup Windowing invocations into Groups | Hive
BUG-20856 | | Column names are missing from join expression in Map join with CBO enabled | Hive
BUG-18450 | HIVE-7160 | Vectorization Udf: GenericUDFConcat, is not supported for String+Long | Hive
BUG-33338 | | Hiveserver2 in HTTP mode is not applying auth_to_local rules | Hive
BUG-27094 | | Knox HBase tests skipped for ambari test config | Knox
BUG-27091 | | Knox YarnSysTest skipped in secure mode for ambari test config | Knox
BUG-27089 | | Knox YarnSysTest skipped in secure mode for gsInstaller test config | Knox
BUG-28039 | | Knox Failed Authentication Auditing | Knox
BUG-28031 | | Knox rolling upgrade system test support for wire encryption | Knox
BUG-25049 | | Two WebHdfsHaFuncTest are unstable and should be fixed and enabled | Knox
BUG-23889 | | Windows security scripting should setup Knox to use ActiveDirectory for LDAP | Knox
BUG-32763 | | yum cannot find Xvfb for centos06 | Oozie
BUG-31426 | | Oozie failed to start due to "SA" schema not created | Oozie, RelEng
BUG-32026 | | test_mapred_via_oozie_ha[oozie-nn-rm-5-jobs] FAILED | Oozie
BUG-31349 | | oozie job in running state is causing intermittent timeout failures | Oozie
BUG-33180 | | oozie logging tests fail on MR framework with postgres 9 | Oozie
BUG-28614 | | Oozie property oozie.authentication.kerberos.principal is being used for authentication of webhdfs endpoint | Oozie
BUG-23655 | | Oozie stuck on mapreduce job because workflow state isn't changing from RUNNING to KILLED | Oozie
BUG-23527 | | Oozie config referred to versioned hadoop config | Oozie
BUG-26428 | | XaAdmin and XaAgent jobs for Debian failing on Unix clusters | Ranger
BUG-33056 | | {start,stop}-thriftserver equivalent script on Windows is missing | Spark
BUG-33054 | | Spark examples are packaged differently on Windows | RelEng, Spark
BUG-33335 | | spark make-distribution.sh doesn't evaluate SPARK_HIVE on OSX | Spark
BUG-31522 | | You cannot search by the name of 'Queue' | Tez
BUG-31565 | | When you kill a job in the middle of running it, its Diagnostics shows 'killed/failed due to:null' | Tez
BUG-26419 | | Tez application fails with diff client and AM version | Tez
BUG-27562 | | WordCount on Tez with non-nightly failed with NPE | Tez
BUG-24831 | | TEZ windows test framework needs to respect ${fs.defaultFS} in tez configs | Tez
BUG-25194 | | AM running out of memory for partitioned bucketed tables | Tez
BUG-25380 | | For hive to work with Virtual columns and partition names, we need information available within a record-reader | Tez
BUG-24518 | | Tez needs to support Application Tag | Tez
BUG-27194 | YARN-2821 | Failures encountered on Distributed Shell on windows | YARN
BUG-27590 | | Distributed shell job does not execute python script in Windows | YARN
BUG-22522 | | MR Job failed when run via oozie and NN and RM were being killed | YARN
BUG-27632 | | Add feature of retry in zookeeper client | Zookeeper
BUG-30784 | ZOOKEEPER-1952 | zookeeper.log.file property is not respected; log output goes only to the zookeeper.out | Zookeeper
BUG-19255 | | Uninstalling the ZooKeeper RPM does not remove all ZooKeeper directories | Zookeeper