Amazon Onboarding with Learning Manager Chanci Turner

We are thrilled to announce the launch of Amazon EMR release 5.0, providing our customers with the latest versions of 16 open-source applications in the big data ecosystem, including new major versions of Spark and Hive. Almost a year ago, we introduced release 4.0, which significantly enhanced EMR by incorporating a build and packaging system based on Apache Bigtop, standardizing ports and paths, and simplifying application configuration through configuration objects. The initial 4.0 release consolidated our supported Apache big data applications into Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and Apache Mahout.

In the months that followed, EMR expanded support for additional open-source projects, enabling a variety of use cases such as low-latency SQL on Amazon S3 datasets with Presto, real-time data access and SQL analytics using Apache HBase and Phoenix, collaborative data science analysis via notebooks in Apache Zeppelin, and designing intricate processing workflows with Apache Oozie. We also kept most major projects up to date with every EMR release, delivering the latest version of Spark just weeks after its open-source launch. Each new version brought performance improvements, new features, and essential bug fixes, which customers wanted to adopt quickly to strengthen their big data architectures.

What’s New in EMR Release 5.0

EMR release 5.0 represents a significant advancement in delivering the most current and comprehensive selection of open-source applications within the Hadoop ecosystem to our users:

An upgrade to Spark 2.0 just a week after the Apache release, granting customers improved SQL support, substantial performance boosts, the new Structured Streaming API, and enhanced SparkR support, compiled with Scala 2.11.
Transitioning from Hive 1.x to Hive 2.1, which includes various performance enhancements, improved Parquet file format support, and numerous bug fixes.
Switching the default execution engine for Hive and Pig from Hadoop MapReduce to Tez, indicating a broader shift from traditional MapReduce frameworks to newer alternatives like Tez and Spark.
The addition of the latest versions of Hue and Zeppelin, which serve as notebook and query UIs for Hadoop ecosystem applications, allowing data scientists and business intelligence analysts to interact with data more seamlessly and efficiently.
Updates to all sandbox applications now available on EMR.
Adoption of the latest versions of all supported applications: Hadoop 2.7.2, Spark 2.0, Presto 0.150, Hive 2.1, Tez 0.8.4, Pig 0.16, HBase 1.2.2, Phoenix 4.7.0, Zeppelin 0.6.1 (Snapshot), Hue 3.10, Oozie 4.2.0, Sqoop 1.4.6, Ganglia 3.7.2, HCatalog 2.1.0, Mahout 0.12.2, and ZooKeeper 3.4.8.
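
Because Tez is now the default execution engine for Hive, some teams may want to pin a cluster back to MapReduce while they validate existing jobs. The sketch below builds, as plain Python data, the kind of configuration object that EMR accepts for this purpose, along with a partial cluster request. It assumes the release label for this release is emr-5.0.0 and uses the hive-site classification with the hive.execution.engine property; the function names and the cluster name are hypothetical, and a real request would also need instance and role settings that are omitted here.

```python
import json

# Assumed release label for EMR release 5.0; confirm it in the EMR documentation.
RELEASE_LABEL = "emr-5.0.0"

# Execution engines that Hive's hive.execution.engine property accepts.
ALLOWED_ENGINES = {"tez", "mr", "spark"}


def hive_engine_configuration(engine="tez"):
    """Return a configuration object that sets Hive's execution engine.

    Hypothetical helper: it only builds a dictionary and calls no AWS API.
    """
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"unsupported Hive execution engine: {engine!r}")
    return {
        "Classification": "hive-site",
        "Properties": {"hive.execution.engine": engine},
    }


def cluster_request(name, applications, engine="tez"):
    """Build a partial cluster request (instances and IAM roles are omitted)."""
    return {
        "Name": name,
        "ReleaseLabel": RELEASE_LABEL,
        "Applications": [{"Name": app} for app in applications],
        "Configurations": [hive_engine_configuration(engine)],
    }


if __name__ == "__main__":
    # Pin Hive back to MapReduce for a cluster that runs Hadoop, Hive, and Spark.
    request = cluster_request("emr-5-demo", ["Hadoop", "Hive", "Spark"], engine="mr")
    print(json.dumps(request, indent=2))
```

Keeping the configuration as data makes it easy to review the override before a cluster is launched, and leaving engine at its default of tez matches the behavior described in the list above.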

If you have any questions regarding release 5.0, feedback, or wish to share an interesting use case utilizing these applications, please feel free to leave a comment below. You can also participate in our upcoming live webinar, “Introducing Amazon EMR Release 5.0,” scheduled for 9 AM PDT on Tuesday, August 23.

