On September 3, 2019, we announced an update that improves the performance, scalability, and efficiency of AWS Lambda functions that connect to Amazon VPC networks. For details on these improvements, refer to the original blog post. The update changes how the elastic network interfaces (ENIs) that connect Lambda functions to your VPCs are configured. As a side effect of the new architecture, VPC resources such as subnets, security groups, and the VPCs themselves can fail to be destroyed through HashiCorp Terraform. This post helps you determine whether you are affected and outlines the steps to resolve the issue.
How Can I Identify If I’m Affected?
This issue affects users who rely on HashiCorp Terraform to destroy environments. Terraform AWS Provider v2.30.0 and older are affected. With these versions, you might see errors when destroying environments that contain AWS Lambda functions, VPC subnets, security groups, and Amazon VPCs. Typical error messages include:
- Error deleting subnet: timeout while waiting for state to become ‘destroyed’ (last state: ‘pending’, timeout: 20m0s)
- Error deleting security group: DependencyViolation: resource sg- has a dependent object status code: 400, request id:
Depending on the AWS Regions where the VPC improvements have been implemented, you may face these errors inconsistently across different regions.
How Do I Fix This Issue If I Am Affected?
You have two options for addressing this issue. The preferred solution is to upgrade your Terraform AWS Provider to v2.31.0 or later, which fixes this issue and also makes environment destruction more reliable. For guidance on upgrading the Provider, see the Terraform AWS Provider Version 2 Upgrade Guide. Release notes and source code for the latest AWS Provider releases are available in the provider's repository.
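As a minimal sketch of the upgrade path, assuming a Terraform 0.11 or 0.12 configuration like the one used later in this post, a version constraint in the provider block keeps Terraform from selecting an affected Provider release. The ~> 2.31 constraint is an illustrative choice, not a requirement from the Provider documentation; any constraint that excludes v2.30.0 and older should serve the same purpose.

```hcl
# Sketch only: pin the AWS Provider to v2.31.0 or later within the 2.x series.
# "~> 2.31" means ">= 2.31, < 3.0". This constraint is an assumption made for
# illustration; adjust it to match your own upgrade policy.
# Note: Terraform 0.13 and later prefer declaring this constraint in a
# required_providers block instead of inside the provider block.
provider "aws" {
  version = "~> 2.31"
  region  = "eu-central-1"
}
```

After changing the constraint, running terraform init -upgrade lets Terraform download a Provider release that satisfies it.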
If upgrading the Provider is not feasible, you can implement adjustments to your Terraform configuration to mitigate the problem. You will need to make the following changes:
- Add an explicit dependency with a depends_on argument to the aws_security_group and aws_subnet resources linked to your Lambda functions. This dependency should point to the aws_iam_policy resource associated with the IAM role configured for the Lambda function.
- Increase the delete timeout for all aws_security_group and aws_subnet resources to 40 minutes.
Here’s a configuration file example demonstrating these changes:
provider "aws" {
  region = "eu-central-1"
}

resource "aws_iam_role" "lambda_exec_role" {
  name = "lambda_exec_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

data "aws_iam_policy" "LambdaVPCAccess" {
  arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}

resource "aws_iam_role_policy_attachment" "sto-lambda-vpc-role-policy-attach" {
  role       = "${aws_iam_role.lambda_exec_role.name}"
  policy_arn = "${data.aws_iam_policy.LambdaVPCAccess.arn}"
}

resource "aws_security_group" "allow_tls" {
  name        = "allow_tls"
  description = "Allow TLS inbound traffic"
  vpc_id      = "vpc-<id>"

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
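  # Allow all outbound traffic. With protocol "tcp" and ports 0-0, only TCP
  # port 0 would be allowed; "-1" (all protocols) requires both ports to be 0.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }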

  timeouts {
    delete = "40m"
  }

  depends_on = ["aws_iam_role_policy_attachment.sto-lambda-vpc-role-policy-attach"]
}

resource "aws_subnet" "main" {
  vpc_id     = "vpc-<id>"
  cidr_block = "172.31.68.0/24"

  timeouts {
    delete = "40m"
  }

  depends_on = ["aws_iam_role_policy_attachment.sto-lambda-vpc-role-policy-attach"]
}

resource "aws_lambda_function" "demo_lambda" {
  function_name    = "demo_lambda"
  handler          = "index.handler"
  runtime          = "nodejs10.x"
  filename         = "function.zip"
  source_code_hash = "${filebase64sha256("function.zip")}"
  role             = "${aws_iam_role.lambda_exec_role.arn}"

  vpc_config {
    subnet_ids         = ["${aws_subnet.main.id}"]
    security_group_ids = ["${aws_security_group.allow_tls.id}"]
  }
}
It’s crucial to note the following blocks in both the allow_tls security group and the main subnet resources:
  timeouts {
    delete = "40m"
  }
  depends_on = ["aws_iam_role_policy_attachment.sto-lambda-vpc-role-policy-attach"]
Make these adjustments to your Terraform configuration files before attempting to destroy your environments for the first time.
Can I Remove Resources Left Over After a Failed Destroy Operation?
If you attempt to destroy environments without upgrading the Provider or implementing the configuration changes detailed above, you may encounter failures. This could leave ENIs in your account due to unsuccessful destroy operations. You can manually delete these ENIs a few minutes after the associated Lambda functions have been removed (typically within 40 minutes). Once the ENIs are deleted, you can rerun terraform destroy.