Streamlining Amazon EMR on EKS Log Forwarding to Third-Party Providers

Logs from Spark jobs running on Amazon EMR on EKS are essential for troubleshooting, for tuning performance, and for inspecting Spark output. You can reach them through the Amazon EMR virtual cluster console and the Spark History UI, or push them to an Amazon S3 bucket or Amazon CloudWatch Logs; in each case the logs are tied to a specific job. A common practice in log management, especially in DevOps, is to centralize logs by forwarding them to an enterprise log aggregation system such as Splunk or Amazon OpenSearch Service. Centralizing logs lets organizations monitor key trends, detect anomalies, and resolve issues more efficiently.

Spark logs generated in EMR on EKS are also accessible through the Kubernetes API and the kubectl CLI. You could therefore install log forwarding agents in the EKS cluster and forward all Kubernetes logs, Spark logs included, but this becomes expensive at scale because much of what is forwarded is Kubernetes detail that Spark users do not need. There is also a security concern: Spark users may not be allowed to access EKS cluster logs or kubectl. To address both problems, we propose using pod templates to create sidecar containers alongside the Spark job pods. These sidecar containers can read the logs inside the Spark pods and forward them to the designated log aggregator, and because they run only for the lifetime of the Spark job, they consume resources only while the job is active.

Implementing Fluent Bit as a Sidecar Container

Fluent Bit is a lightweight, high-performance log processor and forwarder. It collects data, enriches it, and sends it to a range of destinations, which makes it well suited to cloud and containerized environments. Because its resource requirements are minimal, it is a good choice for forwarding the logs that Spark jobs generate. A job submitted to EMR on EKS involves at least two Spark containers: a Spark driver and one or more Spark executors. By deploying Fluent Bit as a sidecar alongside both the driver and the executors, you can collect their logs and forward them directly to the log aggregator.
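
To make the sidecar's role concrete, the following Python sketch renders a minimal classic-mode Fluent Bit configuration that tails log files and forwards them to Splunk. It is an illustration under stated assumptions, not the configuration used by the scripts in this post: the log path /var/log/spark/user/*/*, the host splunk.example.internal, port 8088, and the SPLUNK_TOKEN environment variable are placeholders, and the helper name render_fluent_bit_config is hypothetical.

```python
def render_fluent_bit_config(log_path, host, port, token):
    """Render a classic-mode Fluent Bit config as a string.

    The config tails files under log_path and forwards each record to a
    Splunk HTTP Event Collector endpoint. All argument values are supplied
    by the caller; nothing here is specific to a real deployment.
    """
    sections = [
        ("SERVICE", {"Flush": "5", "Log_Level": "info"}),
        # tail input: follow the log files the Spark container writes
        ("INPUT", {"Name": "tail", "Path": log_path, "Tag": "spark.*"}),
        # splunk output: send everything tagged spark.* over TLS
        ("OUTPUT", {
            "Name": "splunk",
            "Match": "spark.*",
            "Host": host,
            "Port": str(port),
            "Splunk_Token": token,
            "TLS": "On",
        }),
    ]
    lines = []
    for name, props in sections:
        lines.append(f"[{name}]")
        for key, value in props.items():
            lines.append(f"    {key} {value}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Assumed placeholder values; ${SPLUNK_TOKEN} is resolved by Fluent Bit
# from the sidecar's environment, so the secret is not written to the file.
config = render_fluent_bit_config(
    log_path="/var/log/spark/user/*/*",
    host="splunk.example.internal",
    port=8088,
    token="${SPLUNK_TOKEN}",
)
print(config)
```

The rendered text would typically be baked into the sidecar image or mounted into it as the Fluent Bit configuration file. For Amazon OpenSearch Service, the OUTPUT section would use a different Fluent Bit output plugin with its own keys.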

Pod Templates in EMR on EKS

A Kubernetes pod is a group of one or more containers that share storage and network resources, together with a specification for how to run them. Pod templates are specifications for creating these pods, and they can carry settings that standard Spark configuration does not expose. To use pod templates in EMR on EKS, you set the Spark properties spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile to the locations of the template files; Spark then downloads the templates and uses them to construct the driver and executor pods.
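
As a rough sketch of how these pieces fit together, the Python below builds a pod template that adds a sidecar container and the Spark properties that point at the template files. Several names are assumptions made for illustration: the sidecar image URI, the example-bucket S3 locations, the container name, and the log volume name and mount path (emr-container-application-log-dir at /var/log/spark/user) should all be checked against the EMR on EKS pod template documentation for your release. The template is emitted as JSON, which is also valid YAML.

```python
import json


def build_sidecar_pod_template(sidecar_image,
                               log_volume="emr-container-application-log-dir",
                               log_mount="/var/log/spark/user"):
    """Return a pod template dict that adds one sidecar container.

    The sidecar mounts the volume where the Spark container writes its
    logs (volume name and path are assumptions, see the text above).
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "spec": {
            "containers": [
                {
                    "name": "custom-side-car-container",  # hypothetical name
                    "image": sidecar_image,
                    "volumeMounts": [
                        {"name": log_volume, "mountPath": log_mount}
                    ],
                }
            ]
        },
    }


def to_spark_submit_parameters(conf):
    """Turn a dict of Spark properties into a '--conf k=v ...' string."""
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


# Placeholder image URI; replace with your own Fluent Bit sidecar image.
template = build_sidecar_pod_template(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/fluent-bit-sidecar:latest"
)
template_text = json.dumps(template, indent=2)

# Placeholder S3 locations where the rendered template would be uploaded.
spark_conf = {
    "spark.kubernetes.driver.podTemplateFile":
        "s3://example-bucket/pod-templates/driver.yaml",
    "spark.kubernetes.executor.podTemplateFile":
        "s3://example-bucket/pod-templates/executor.yaml",
}
params = to_spark_submit_parameters(spark_conf)

print(template_text)
print(params)
```

The same template can serve both driver and executor, or you can render two variants if the sidecars need different settings. The resulting parameter string is the kind of value you would pass as the Spark submit parameters when starting the job run.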

Forwarding Logs from Spark Jobs in EMR on EKS

Before you forward any logs, you need a log aggregation system such as Amazon OpenSearch Service or Splunk ready to receive them from Fluent Bit. If you do not have one, the scripts provided in this post can help you launch a log aggregation system on an Amazon EC2 instance.

To set up your environment, create an AWS Cloud9 workspace and configure it to manage your EKS cluster. Then prepare the workspace by cloning the GitHub repository that contains the necessary scripts and installing the components required to manage the EKS cluster. You also need to create an EC2 key pair for the deployment.

Taken together, pod templates and a Fluent Bit sidecar let you forward only the Spark logs you care about to your log aggregator, without forwarding every Kubernetes log and without giving Spark users access to EKS cluster logs or kubectl.
