Automating and Utilizing EKS Clusters to Provide Stable, Autoscaling Container Services in AWS
Stephen Hengeli, Cloud Engineer, STG
Introduction
Scale can be difficult to estimate when launching a new application, especially when the app is planned for more than one client. Consider the following scenario: a client comes to you with a set of API services they want to host as part of a larger backend API they have developed. These services come packaged as separate containerized images that must all run concurrently to provide the functionality the client's solution advertises. Through the combined use of EKS, ECR, CodeBuild/CodeCommit, and a strong DevOps policy, the issue of scale became a thing of the past.
Client Issue with Evaluating Scale Needs
This environment’s application is a backend API the client developed to connect themselves and others to a centralized banking core. The services work together as one, but each runs individually, so the hosting situation, including scalability, had to be evaluated. Running each of these services on its own server would work, but when demand for the app goes up, the server could slow down. If the app saw triple its normal traffic, say on a holiday or during COVID-19 lockdowns, the server would not survive. Hardware costs money, so without proper estimates, the client could have ended up spending more than they needed to. We also had to consider Amdahl’s Law, which shows that adding compute power to an existing server yields diminishing returns, so at some point scaling out with virtualization becomes more cost-effective than scaling up a single machine. To add some perspective: for a workload that is 70% parallelizable, 4 CPUs give a speedup of about 2.105, and doubling to 8 CPUs only brings the speedup to about 2.581 (higher being a better figure here). This is a prime example of when virtualization becomes more effective. Considering this, the solution we opted for was AWS Elastic Kubernetes Service (EKS). This service can automatically autoscale based on CPU or memory needs, so there was no longer a need for usage estimates: the number of pods now increases and decreases on its own depending on demand.
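The Amdahl's Law figures above are easy to verify with a few lines of Python. This is a quick sketch, assuming a workload whose parallelizable fraction is 70% (the fraction that reproduces the numbers quoted above):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's Law: speedup with n CPUs for a workload whose
    parallelizable fraction is p (0 <= p <= 1)."""
    return 1.0 / ((1.0 - p) + p / n)

p = 0.70  # assumed parallel fraction

print(round(amdahl_speedup(p, 4), 3))  # 4 CPUs -> 2.105
print(round(amdahl_speedup(p, 8), 3))  # 8 CPUs -> 2.581
```

Doubling the CPU count from 4 to 8 improves the speedup by barely 23%, which is why adding hardware to one server loses to scaling out across smaller virtualized instances.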
DevOps Comes into Play
In order to properly manage a project of this scale, we had to employ a slew of
DevOps practices that work hand in hand to keep the environment consistent,
free of issues, well maintained, and scalable.
First up is code control. The client develops the application services themselves,
so we have no input on the code itself. To help things along, however, we
suggested the client use private AWS CodeCommit repos, allowing them to keep
their code tightly controlled and versioned. Code additions and changes are
branched according to their ticket numbers, so everything can be correlated to
issue-tracking or feature tickets. When new code is ready to be deployed, we use
CodeBuild to ensure the code is built correctly and to monitor its status.
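The ticket-based branching described above looks roughly like the following (the repo path and ticket ID here are made up for illustration):

```shell
# Scratch repo to demonstrate one-branch-per-ticket naming
# (path and ticket number are hypothetical).
tmp=$(mktemp -d)
git -C "$tmp" init -q

# Each change lives on a branch named for its issue/feature ticket,
# so commits can always be correlated back to the tracker.
git -C "$tmp" checkout -q -b feature/PROJ-123-new-endpoint

git -C "$tmp" symbolic-ref --short HEAD   # prints the active branch name
```

A convention like this makes pull requests and build artifacts traceable to a single ticket without any extra tooling.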
Second is the service builds. When a component of the API is built, it is packaged
as an image and stored in AWS Elastic Container Registry (ECR). All deployments
to EKS come from ECR, which ensures the containers come from a private repo
accessible to the client's network configuration. ECR also integrates quite well
with CodeBuild and EKS, so it made perfect sense to take the path of least
resistance.
The third point, and one of the most crucial, is configuration management. For this
client, we chose to use CloudFormation to manage the templates needed to create
the environment resources. There is a stack specifically for creation of the VPCs, the
RDS instances, the EKS stacks, the EKS nodes, and other resources needed by the
environment. These templates are all controlled by a master template, which
invokes each nested stack in the order it needs to run. Of course, some services
cannot be created with the main stack, for example the API Gateway, which
requires the ARN of the Network Load Balancer (NLB) created in the original
stack. Actions such as this, as well as creating the Route53 subdomain, are
performed after the initial CloudFormation run.
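The master/nested-stack arrangement can be sketched as a CloudFormation fragment. The template URLs, parameter names, and outputs below are hypothetical; the real master template lists every nested stack the environment needs:

```yaml
# Hypothetical excerpt of a master template that runs nested stacks in order.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  VpcStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/example-templates/vpc.yaml
  EksStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: VpcStack            # ordering: VPCs first, then EKS
    Properties:
      TemplateURL: https://s3.amazonaws.com/example-templates/eks.yaml
      Parameters:
        VpcId: !GetAtt VpcStack.Outputs.VpcId
```

`DependsOn` plus cross-stack outputs is what lets one master template drive the whole environment in the right order.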
Fourth is the deployment process. The API is composed of more than one container,
so running them all together would be a hassle without a solid deployment
process behind it. We met that challenge using AWS CodeBuild pipelines,
automating the code-to-container process for every environment in this project
with a buildspec file, a set of instructions for executing command-line actions.
CodeBuild uses this file to Dockerize the code from the CodeCommit repo, publish
the image to ECR, and then deploy it to the EKS cluster. Code is always built in
Dev first, then promoted to the higher environments in succession using pipelines
identical to the one described above, ensuring that the builds are consistent.
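As a rough illustration, a buildspec for one of these pipelines might look like the following. The account ID, region, repository, cluster, and deployment names are placeholders, not the client's actual values:

```yaml
# Hypothetical buildspec.yml: Dockerize, push to ECR, roll out to EKS.
version: 0.2
phases:
  pre_build:
    commands:
      # Authenticate Docker to the private ECR registry.
      - aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 111111111111.dkr.ecr.us-east-1.amazonaws.com
  build:
    commands:
      - docker build -t api-service:latest .
      - docker tag api-service:latest 111111111111.dkr.ecr.us-east-1.amazonaws.com/api-service:latest
  post_build:
    commands:
      - docker push 111111111111.dkr.ecr.us-east-1.amazonaws.com/api-service:latest
      # Point the EKS deployment at the freshly pushed image.
      - aws eks update-kubeconfig --name api-cluster --region us-east-1
      - kubectl set image deployment/api-service api-service=111111111111.dkr.ecr.us-east-1.amazonaws.com/api-service:latest
```

Because the same buildspec runs in every environment's pipeline, a promotion from Dev upward reuses identical steps, which is what keeps the builds consistent.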
The final practice is monitoring. AWS CloudWatch allows us to monitor every part
of the environment so we can react quickly should something happen. We
configured multiple alarms for each EKS pod, including memory and CPU usage.
Alarms also exist for API health checks (CloudWatch Synthetics canaries), EC2
disk usage on the bastions, API Gateway traffic, and RDS storage and CPU usage.
An SNS topic sends email alerts to our ticketing system, so if any alarm fires,
a ticket is created and assigned to an engineer as quickly as the notification
can be received. Dashboards track API traffic for 4XX and 5XX errors as well as
VPN tunnel statuses. These alerts exist for all environments, including Dev, so
if there is an inconsistency anywhere, we know about it and can act on it in a
timely manner.
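The alarm-to-ticket flow can be sketched in CloudFormation. The pod CPU metric here comes from CloudWatch Container Insights; the cluster name, threshold, and intake email address are illustrative placeholders:

```yaml
# Hypothetical alarm: notify the ticketing system when pod CPU runs hot.
Resources:
  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: email
          Endpoint: tickets@example.com   # ticketing-system intake address
  PodCpuAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: EKS pod CPU utilization too high
      Namespace: ContainerInsights
      MetricName: pod_cpu_utilization
      Dimensions:
        - Name: ClusterName
          Value: api-cluster
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic
```

Routing every alarm through one SNS topic means new alerts reach the ticketing system without any per-alarm wiring.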
Conclusion
To sum things up, we were able to create an autoscaling EKS cluster that runs containers from ECR to support the client's API. Using strong DevOps policies, we automated the infrastructure and deployments with CloudFormation and CodeBuild, while CloudWatch and SNS let us keep track of the state of the environment and its resources to ensure everything remains at peak functionality. This in turn allows the client to provide a stable API offering to their own clients. Without the flexibility and availability of AWS and a strong DevOps policy, this project would have been difficult to build and manage, and monitoring would have been a hassle, since a third-party solution would have been needed. In the end, AWS and DevOps allowed us to meet every customer challenge in this environment build-out.