In game analytics, studios differ in the data they track per user. Some prioritize privacy and choose pipelines that process data anonymously to comply with regulations. Others require unique identifiers for detailed analysis, logging, AI/ML integration, monetization, and data visualization.
The Game Analytics Pipeline solution, available in the AWS Solutions catalog, is designed with privacy in mind and does not track unique identifiers such as user IDs or session IDs by default. When deployed as is, it serves as an excellent option for developers seeking a straightforward analytics solution that ensures user data remains anonymous while relying on a robust, fault-tolerant AWS infrastructure.
This blog post guides you through customizing the schema for the Game Analytics Pipeline solution using the infrastructure as code available in the GitHub repository. You’ll learn how to incorporate personal identifiers like user IDs, client IDs, and session IDs by setting up a custom AWS Glue table and schema for data ingestion in AWS Cloud9, then redeploying the infrastructure as code. Once your custom solution is up and running, you can send data to the pipeline in JSON format, either directly through the Amazon Kinesis Data Streams SDKs or through REST API calls to the Amazon API Gateway endpoint created by the solution.
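As a rough illustration of the JSON payloads mentioned above, the sketch below builds one event that carries a user ID and a session ID. The field names are hypothetical placeholders chosen for this example, not the solution's actual schema, and the helper assumes its arguments contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch: build a JSON analytics event that carries personal identifiers.
# All field names below (user_id, session_id, event_name, event_timestamp)
# are hypothetical placeholders, not the solution's actual schema.

make_event() {
  local user_id="$1" session_id="$2" event_name="$3"
  local ts
  ts="$(date +%s)"   # seconds since the Unix epoch
  # No JSON escaping is performed; inputs are assumed to be plain identifiers.
  printf '{"user_id":"%s","session_id":"%s","event_name":"%s","event_timestamp":%s}\n' \
    "$user_id" "$session_id" "$event_name" "$ts"
}

# Example: print one event for a hypothetical player and session.
make_event "player-123" "session-456" "level_started"
```

The resulting line could then be sent as the record payload through the Kinesis Data Streams SDK or posted to the API Gateway endpoint, once your custom schema accepts those fields.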
Disclaimer: Please note that the code in this blog is intended for tutorial purposes only. The solution is not configured for production environments. Developers must assess their Kinesis Data Streams needs for production and estimate data ingestion volumes.
Prerequisites
Before you begin this tutorial, ensure you have the following:
- An AWS account
- Access to the GitHub repository for the Game Analytics Pipeline solution
- Intermediate understanding of game analytics data schemas
- Approximately 60 minutes
Setting Up an AWS Cloud9 Instance & Cloning the Game Analytics Pipeline
For developers unfamiliar with infrastructure as code or AWS CloudFormation, AWS Cloud9 is an ideal integrated development environment (IDE) to modify these files.
AWS Cloud9 is a cloud-based IDE well suited to writing, running, and debugging CloudFormation infrastructure as code repositories, thanks to its integration with the AWS CLI. Cloud9 offers preconfigured tools and supports JavaScript, Python, and PHP, allowing developers to work from any location. Each Cloud9 environment runs on a managed Amazon EC2 instance, eliminating the need to install or maintain a local IDE.
- Navigate to AWS Cloud9 in the AWS Console and select “Create environment.”
- Assign a unique name and description for your environment, then click “Next step.”
- For Environment type, select “Create a new EC2 instance for environment (direct access).”
- Keep the default selection of t2.micro for Instance type, which is free-tier eligible.
- Under Cost-saving setting, if you need additional time to complete the tutorial or update schemas, consider adjusting the hibernation settings to “After one hour” or later. Otherwise, retain the default setting.
When you select the direct access managed EC2 instance setting, data is stored in an associated 8-GB Amazon Elastic Block Store (EBS) volume. Once coding is finished, the instance automatically hibernates based on the cost-saving setting. Your data is backed up and accessible upon relaunching the Cloud9 instance, ensuring costs are only incurred during active development, not while in hibernation.
- Click “Next step,” review the environment name and settings, and select “Create environment.” The Cloud9 environment will take a few minutes to initialize.
- After initialization, close the welcome screen by clicking the “x” button. Drag the terminal tab (bash-) to configure your window as follows.
- Clone the GitHub repository for the Game Analytics Pipeline into your Cloud9 instance by entering the command:
git clone https://github.com/awslabs/game-analytics-pipeline.git
- Verify that the game-analytics-pipeline directory has cloned successfully. Expand the file structure by clicking the arrow next to the game-analytics-pipeline folder.
- The GitHub repository requires Node.js 12.x and Python 3. Check the versions installed in your Cloud9 environment.
First, check Python:
python --version
Then check Node.js:
node --version
- If your Node.js version is outdated, update it. Additionally, ensure your environment is up to date by running:
sudo yum -y update
(Note: This command may report “No packages marked for update,” meaning your Cloud9 instance dependencies are current.)
- Run the following command to further prepare your Cloud9 environment:
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.0/install.sh | bash
This installs NVM, which you will use to get the latest version of Node in the next step. To activate it in the same terminal window, run:
. ~/.bashrc
- Now, install the latest version of Node:
nvm install node
Confirm that the latest version is installed (e.g., v14.12.0) by running:
node --version
Congratulations! You have successfully cloned the Game Analytics Pipeline repository and updated your Cloud9 environment to the latest Node.js version. You are now prepared to modify the necessary files for your custom schema in the game analytics pipeline.
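The version check from the steps above can also be scripted. The following is a minimal sketch, assuming version strings in the usual `v14.12.0` form; the helper names `node_major` and `meets_minimum` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: check that Node.js meets the 12.x minimum the repository requires.
# node_major and meets_minimum are hypothetical helper names.

# Extract the major version from a string such as "v14.12.0".
node_major() {
  local v="${1#v}"      # strip the leading "v"
  echo "${v%%.*}"       # keep everything before the first dot
}

# Succeed when the version string ($1) is at least the required major ($2).
meets_minimum() {
  local major
  major="$(node_major "$1")"
  [ "$major" -ge "$2" ]
}

if command -v node >/dev/null 2>&1; then
  if meets_minimum "$(node --version)" 12; then
    echo "Node.js $(node --version) is recent enough"
  else
    echo "Node.js $(node --version) is older than 12.x; update it with nvm"
  fi
else
  echo "node is not installed"
fi
```

Comparing only the major version keeps the check simple; it is enough to distinguish an outdated runtime from one that satisfies the 12.x requirement.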
Note: If you are not using Cloud9, managing dependency versions, such as Python and Node.js, via virtual environments is advisable to avoid conflicts. For more details on virtual environments (venv) and examples using Python, see the Python documentation. Another tool to consider for installing and managing dependencies is Homebrew.
Updating the game-analytics-pipeline.template File
The game-analytics-pipeline.template file defines the infrastructure as code for the solution. It includes the AWS Glue Data Catalog table, which serves as the metastore for the solution’s data lake, and its dependencies. As developers create custom schemas, they must update the Glue portion (starting at line 554 of the file).
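Line numbers can shift as the repository changes, so it may help to locate the Glue section by searching for its resource type instead. The sketch below assumes the template declares the table with the standard CloudFormation type `AWS::Glue::Table` and that the repository was cloned into the current directory; the function name `find_glue_tables` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: print the line numbers where a CloudFormation template declares
# AWS Glue tables, so the schema section can be found without relying on
# fixed line numbers. find_glue_tables is a hypothetical helper name.

find_glue_tables() {
  local template="$1"
  # -n prefixes each matching line with its line number.
  grep -n "AWS::Glue::Table" "$template"
}

# Path assumed from the repository layout described in this post.
TEMPLATE="game-analytics-pipeline/deployment/game-analytics-pipeline.template"
if [ -f "$TEMPLATE" ]; then
  find_glue_tables "$TEMPLATE"
else
  echo "Template not found at $TEMPLATE; adjust the path"
fi
```

The column definitions sit below the reported line, inside that resource's table input properties.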
- In the navigation pane, select the file titled game-analytics-pipeline.template in the deployment folder to open it in a Cloud9 tab.
- Scroll to line 605. You will see the following:
- This section specifies the Glue columns and their corresponding data types. Replace lines 605 to 625 with the following code: