“There’s no compression algorithm for experience. You can’t learn certain lessons without going through the curve.” Andy Jassy, CEO AWS
There’s no substitute for experience; I’m sure this is a lesson we’ve all learned many times. I had a refresher course in this recently while refining a CloudFormation template I’ve been working on. I was experimenting with adding an S3 VPC Endpoint and it wasn’t working as I had anticipated (it was working as it should; the problem was really “user error,” but let’s move on). To resolve the S3 VPC Endpoint problem, I decided to do the following:
- Delete the S3 VPC Endpoint manually
- Tweak my CloudFormation template to fix the S3 VPC Endpoint creation statement
- Perform an update of my CloudFormation stack to deploy the new S3 VPC Endpoint
So I did that, and when I ran the stack update I received an UPDATE_FAILED message because the VPC Endpoint created by the previous stack update no longer existed.
This is a clean blog, so I’m not going to share the first word that came to my mind, but suffice it to say I immediately thought about worst-case scenarios: that I had ruined my stack and everything in it, that I had hosed my EC2 instances, and so on. Granted, this isn’t a production environment, so it wouldn’t be the end of the road if I had to start from scratch, but still, who wants to do that?
The first lesson I learned is that resources created as part of an AWS CloudFormation stack must be managed and modified through stack updates. Some resources, like an IAM role that is tracked by name, can be re-created manually with the exact same name if they are deleted, which gets stack updates working once again. But other resources, like VPC Endpoints, are created with a unique ID, and a resource with a unique ID cannot be manually re-created with that same ID, so the stack is left pointing at something that no longer exists.
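One way to avoid this situation is to check whether a resource belongs to a stack before deleting it by hand. Below is a minimal sketch, assuming a boto3 CloudFormation client is passed in; the helper name `owning_stack` and the endpoint ID in the usage comment are hypothetical. As far as I know, `describe_stack_resources` raises an error when no stack owns the given physical ID, so a real script would want to handle that case.

```python
def owning_stack(cfn, physical_id):
    """Return (stack_name, logical_id) for a physical resource ID.

    `cfn` is assumed to be a boto3 CloudFormation client. When called with
    PhysicalResourceId, describe_stack_resources returns the resources of
    the stack that owns that ID, so we pick out the matching entry.
    Returns None if the response contains no matching resource.
    """
    response = cfn.describe_stack_resources(PhysicalResourceId=physical_id)
    for resource in response["StackResources"]:
        if resource.get("PhysicalResourceId") == physical_id:
            return resource["StackName"], resource["LogicalResourceId"]
    return None


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   print(owning_stack(cfn, "vpce-0123456789abcdef0"))
```

If this returns a stack name, the resource should be changed or removed through a stack update rather than manually.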
Fortunately, I didn’t totally ruin my infrastructure and this failure was pretty easy to recover from.
- Remove references to the VPC Endpoint in your CloudFormation template and run a stack update.
With each new feature added into my CloudFormation template, I’d save the template with a new “version” number by simply adding v## to the end of the file name. In this case, I added the VPC Endpoint in v9 of my template, so I ran a stack update using v8 of my CloudFormation template. Performing this update would Remove the VPC Endpoint resource.
Second Lesson Learned: Keep “versions” of your CloudFormation templates.
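To illustrate why rolling back to the earlier version removes the endpoint, here is a rough local comparison of the `Resources` sections of two template versions. This is only a sketch: it assumes the templates are already loaded as Python dictionaries (for example from JSON), the logical IDs `Vpc` and `S3Endpoint` and the region in the service name are made up for the example, and it is not a substitute for the change preview CloudFormation itself shows.

```python
def resource_changes(deployed, target):
    """Compare the Resources sections of two templates by logical ID.

    `deployed` is the template the stack currently uses and `target` is
    the template you are about to apply. "Modify" only catches resources
    whose definitions differ textually between the two templates.
    """
    old = deployed.get("Resources", {})
    new = target.get("Resources", {})
    return {
        "Add": sorted(new.keys() - old.keys()),
        "Remove": sorted(old.keys() - new.keys()),
        "Modify": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical v8 and v9 templates, trimmed to the parts that matter here.
v8 = {"Resources": {"Vpc": {"Type": "AWS::EC2::VPC"}}}
v9 = {
    "Resources": {
        "Vpc": {"Type": "AWS::EC2::VPC"},
        "S3Endpoint": {
            "Type": "AWS::EC2::VPCEndpoint",
            "Properties": {
                "ServiceName": "com.amazonaws.us-east-1.s3",
                "VpcId": {"Ref": "Vpc"},
            },
        },
    }
}

# The stack is on v9; updating it with v8 drops the endpoint.
print(resource_changes(v9, v8))
```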
- When the update completes, you should see a DELETE_COMPLETE event showing the removal of the VPC Endpoint.
- With the VPC Endpoint properly removed, edit your CloudFormation template (in this case I edited my v9 template) to set the correct VPC Endpoint statements/options and save it. Update the CloudFormation stack with the v9 template to see that a VPC Endpoint will now be added.
- When the stack update completes, a new VPC Endpoint should be created and available.
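The recovery steps above could be scripted roughly as follows. This is a sketch under a few assumptions: `cfn` is a boto3 CloudFormation client, the two templates are passed in as strings, and the function name `recover_deleted_resource` is my own. A stack that takes parameters or creates IAM resources would also need the `Parameters` or `Capabilities` arguments to `update_stack`, which I’ve left out to keep the example short.

```python
def recover_deleted_resource(cfn, stack_name, template_without, template_with):
    """Run the two stack updates used to recover from a manual deletion.

    `template_without` is the earlier template with no reference to the
    deleted resource (v8 in this post); `template_with` is the corrected
    template that defines it again (v9 in this post).
    """
    waiter = cfn.get_waiter("stack_update_complete")

    # Step 1: update with the older template so CloudFormation drops the
    # resource it can no longer find.
    cfn.update_stack(StackName=stack_name, TemplateBody=template_without)
    waiter.wait(StackName=stack_name)

    # Step 2: update with the corrected template so CloudFormation creates
    # a fresh copy of the resource.
    cfn.update_stack(StackName=stack_name, TemplateBody=template_with)
    waiter.wait(StackName=stack_name)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   with open("template-v8.json") as old, open("template-v9.json") as new:
#       recover_deleted_resource(cfn, "my-stack", old.read(), new.read())
```

Waiting for the first update to finish before starting the second matters, because a stack that is still updating will not accept another update.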
In the wake of my mistake, I suggest the following when using CloudFormation to deploy AWS resources:
- Document your CloudFormation templates and share with your team
- Don’t develop and deploy CloudFormation templates without communicating to your team their purpose and what they create. Maybe you are the type who would never delete a resource created by CloudFormation, but you may have team members who will. In this example, when I manually deleted the VPC Endpoint, I wasn’t warned that it was a resource created by CloudFormation and that deleting it could have consequences, so you’ve got to let people know what you’re doing and creating with CloudFormation templates
- This may not even need to be said, but remember to manage and/or modify resources created by CloudFormation with stack updates, not manually
- Use some method of versioning your templates, noting the changes in each version, to make it easier to recover from manual deletions of resources.
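With the simple v## file naming I described earlier, finding the template to roll back to can be automated. The sketch below assumes file names that end in `v` followed by a number before the extension; the file names and the helper names are hypothetical.

```python
import re
from pathlib import Path


def template_version(name):
    """Return the number after a trailing 'v' in the file stem, or None."""
    match = re.search(r"v(\d+)$", Path(name).stem)
    return int(match.group(1)) if match else None


def previous_template(names, current):
    """Return the file with the highest version lower than `current`."""
    current_version = template_version(current)
    if current_version is None:
        raise ValueError(f"no version suffix in {current!r}")
    older = []
    for name in names:
        version = template_version(name)
        if version is not None and version < current_version:
            older.append((version, name))
    # Compare numerically so that v10 sorts after v9.
    return max(older)[1] if older else None


templates = ["stack-v7.json", "stack-v8.json", "stack-v9.json", "stack-v10.json"]
print(previous_template(templates, "stack-v9.json"))
```

Comparing the versions as integers rather than as strings avoids picking the wrong file once the numbering passes v9.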
If you manually delete resources created by CloudFormation, don’t immediately jump to despair and the conclusion that you have just completed a resume-generating event. Take a breath, evaluate the situation, and then execute the solution. In this example, I was able to perform multiple CloudFormation updates without affecting the availability of my EC2 instances.