If you run a business, your code is, without question, one of your most valuable assets. It follows that if something happens to that code, it can, and will, have a devastating impact on your business. And the moment there's a degradation in your services, you can bet you'll start disappointing end users.
When deploying new code, it can be hard to prevent disruptions to your end users. At the same time, software development is moving towards shorter development cycles by adopting agile and DevOps methodologies. These shorter cycles and more efficient releases allow organizations to deploy new application features and other update releases faster than ever before.
While this approach brings great advantages, microservices and cloud-native ecosystems have, in parallel, increased the size and complexity of applications. And with increased complexity and speed comes an increased likelihood of defects making their way into production. It's these defects that can impact your end users and cause them to look elsewhere for other, more dependable services.
To ensure there's no downtime or impact on users, pre-production testing is an essential part of the overall testing strategy. Unit tests, smoke tests, and regression tests are critical in enforcing quality standards before new builds ever hit production. However, pre-production testing alone isn't enough to catch all issues.
Most organizations treat staging as a miniature replica of the production environment. In such cases, keeping staging in as similar a state as possible to production becomes a requirement. The fact that staging is usually a much smaller cluster also means that configuration options for every service are going to differ: this applies to the configuration of load balancers, databases, and queues. It also affects how applications behave, in ways that are difficult to reproduce.
A deployment strategy is a methodology for rolling out new application features and other updates, with the goal of implementing the changes without breaking production, so that users aren't impacted by the improvements.
There are a variety of techniques to deploy new applications to production, so choosing the right strategy is an essential decision. This is especially true when considering the techniques in terms of the impact changes may have on the system and on end users. Below, we’ll explore some of the most commonly used strategies, looking at their benefits and things to consider when deciding which path to take in your production environment.
In the Kamikaze technique (also known as a recreate deployment), all nodes within a single environment are updated at the same time with a single new service version: you shut down version A, then deploy version B once version A is fully stopped. (A minimal sketch of the flow follows the list below.)
Benefits:
• Easy to implement
• Fast to deploy
Considerations:
• High risk
• Outages during the switchover; slow rollback
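To make the mechanics concrete, here is a minimal Python sketch of the Kamikaze flow. The `Node` class is a hypothetical stand-in for a real deployment target (a VM, container, or pod), not any particular tool's API:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical stand-in for a deployment target (VM, container, pod)."""
    name: str
    version: str = "A"
    running: bool = True

def kamikaze_deploy(nodes: list[Node], new_version: str) -> None:
    # 1. Take every node down at once -- this is the downtime window.
    for node in nodes:
        node.running = False
    # 2. Bring all nodes back up on the new version simultaneously.
    for node in nodes:
        node.version = new_version
        node.running = True
    # 3. Verify. Rolling back means repeating this whole cycle with the
    #    old version, which is why rollback is slow and the risk is high.
    if not all(node.running for node in nodes):
        raise RuntimeError("new version failed to start cluster-wide")

kamikaze_deploy([Node("web-1"), Node("web-2")], new_version="B")
```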
Blue-green deployments use two deployment configurations, called "blue" (staging) and "green" (production). As you prepare a new release of your software, you do your final stage of testing in the blue environment. Once the software is working in the blue environment, you switch the router so that all incoming requests go to the blue environment. The green one is now idle. (A minimal sketch of the switch follows the list below.)
If necessary, you can roll back by switching the router back to the older, green version.
Benefits:
• Zero downtime - Because you just flip a switch (in the router component), there is no downtime.
• Instant rollback - If anything goes wrong, you switch the router back to your green environment.
• Environment separation
Considerations:
• Cost and operational overhead
• Backward compatibility
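To illustrate why the flip is instantaneous, here is a minimal Python sketch. The `Router` class and `smoke_test` callback are hypothetical abstractions, assuming blue is the idle/staging environment and green is live, as above:

```python
class Router:
    """Routes all incoming traffic to exactly one environment."""
    def __init__(self, active: str) -> None:
        self.active = active

def blue_green_release(router: Router, smoke_test) -> str:
    idle = "blue" if router.active == "green" else "green"
    # Final-stage testing runs in the idle environment while production
    # traffic still flows to the active one -- users notice nothing.
    if not smoke_test(idle):
        raise RuntimeError(f"smoke tests failed in {idle}; no switch made")
    previous, router.active = router.active, idle  # the zero-downtime flip
    return previous  # keep the old environment warm for instant rollback

router = Router(active="green")
rollback_target = blue_green_release(router, smoke_test=lambda env: True)
print(f"live: {router.active}, instant rollback target: {rollback_target}")
```

The cost consideration above falls straight out of this sketch: both environments must stay fully provisioned at all times for the flip (and the rollback) to be instant.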
In a rolling deployment, teams maintain one production environment for a distributed application. It consists of multiple instances, each hosting a copy of the application, usually with a load balancer routing traffic between them.
To deploy a new version of the application, the team staggers the change releases so that the update activates on some servers or instances before others. At that point, some servers run the new version of the application while others continue to host the older one. As traffic comes into the application, some users interact with the new code, while others land on the known-good production version. (The staggered loop is sketched after the list below.)
Benefits:
• No downtime - You direct traffic to the updated deployment targets only after the new version of the application is ready to accept traffic.
• Reduced deployment risk - Any instability in the new version affects only a portion of the users.
Considerations:
• Slow rollback - If the new rollout is unstable, you can terminate the new replicas and redeploy the old version. However, like a rollout, a rollback is a gradual, incremental process.
• Backward compatibility
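Here is a minimal Python sketch of the staggered loop. `Instance`, `batch_size`, and the drain/re-admit steps are illustrative placeholders, not a real orchestrator's API:

```python
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    version: str
    in_rotation: bool = True  # whether the load balancer sends it traffic

def rolling_update(instances: list[Instance], new_version: str,
                   batch_size: int = 1) -> None:
    for start in range(0, len(instances), batch_size):
        for inst in instances[start:start + batch_size]:
            inst.in_rotation = False    # drain the instance from the LB
            inst.version = new_version  # deploy the new build to it
            # A real pipeline would health-check here; an unhealthy batch
            # triggers a rollback that is itself gradual and incremental.
            inst.in_rotation = True     # re-admit traffic once it's ready
        # At this point both versions serve live traffic side by side,
        # which is why changes must remain backward compatible.

fleet = [Instance(f"web-{i}", version="1.0") for i in range(4)]
rolling_update(fleet, new_version="1.1", batch_size=2)
print([(i.name, i.version) for i in fleet])
```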
The canary deployment pattern is similar to a rolling deployment in that the team makes the new release available to some users before others. However, the canary technique targets certain users to receive access to the new application version, rather than certain servers. (One common way to do this, deterministic user bucketing, is sketched after the list below.)
Benefits:
• Ability to test against live production traffic.
• Fast rollback - You can roll back quickly by redirecting the user traffic to the older version of the application.
• Zero downtime
Considerations:
• Slow rollout - Each incremental release requires monitoring for a reasonable period and, as a result, might delay the overall release. Canary tests can often take several hours.
• Observability - A prerequisite to implementing canary tests is the ability to effectively observe and monitor your infrastructure and application stack. Implementing robust monitoring requires substantial effort.
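One common way to target users rather than servers is deterministic bucketing. Here is a minimal Python sketch; the hash-based split is a standard technique, though real canary tooling usually layers targeting rules (region, account tier, and so on) on top:

```python
import hashlib

def assigned_version(user_id: str, canary_percent: int) -> str:
    """Deterministically route a user to the canary or stable version.
    Hashing keeps each user on the same version across requests."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = digest[0] * 100 // 256  # stable value in the range 0..99
    return "canary" if bucket < canary_percent else "stable"

# Expose the new version to 5% of users. Widening the canary -- or
# rolling it back -- is just a change to this one number, which is
# why rollback is fast.
for uid in ("alice", "bob", "carol"):
    print(uid, "->", assigned_version(uid, canary_percent=5))
```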
Today, microservices, and container orchestration platforms like Kubernetes to support them, have become the new norm. With a service mesh like Istio, you can create a robust deployment and release strategy through traffic mirroring. Traffic mirroring lets you implement a similar setup for operational acceptance testing, and it goes the extra mile by enabling you to test with live traffic, without impacting end users.
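In Istio itself, mirroring is configured declaratively on the mesh; as a language-neutral illustration of the underlying idea, here is a minimal Python sketch in which the mirrored copy is fire-and-forget, so the shadow version can never affect the user's response:

```python
import threading

def mirror_safely(service, request) -> None:
    try:
        service(request)   # compare its logs and metrics out of band
    except Exception:
        pass               # shadow failures must stay invisible to users

def handle_request(request, live, shadow) -> str:
    # Send a copy to the new version asynchronously so it adds no latency.
    threading.Thread(target=mirror_safely, args=(shadow, request),
                     daemon=True).start()
    return live(request)   # only the live response reaches the end user

live_v1 = lambda req: f"v1 handled {req}"
shadow_v2 = lambda req: f"v2 handled {req}"
print(handle_request("GET /checkout", live_v1, shadow_v2))
```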
To test configuration changes holistically, you need to take into account that not all configuration is static; on modern platforms, dynamic configuration changes are inevitable.
Testing configuration changes with the same rigor as code changes is rarely done, due to its complexity. But in today's reality, it's something that must be done.
Techniques like integration testing and blue-green deployments can help minimize the risk when deploying applications. But when changing configuration, the blast radius may be much larger than the controlled environment or service you're trying to update.
Let's imagine, for example, a 5,000-line Nginx configuration file that handles a million requests per second at the edge for a major customer. Due to the complex dependencies between resources, even a small configuration change deployed to a small number of users can create an impact that goes well beyond those users.
Here's another example: rolling out Envoy in larger environments requires writing a ton of configuration, and then rolling it out to a huge number of servers. The most common outcome in a situation like this? Many, many DevOps conversations about how to safely roll out configuration, and how to truly understand its behavior and build confidence in configuration files before going to production.
This is why configurations must be verified before they ever reach production. A first line of defense is simple syntax validation in CI, sketched below; but as the examples above show, syntax checks alone can't predict the behavioral blast radius of a change.
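Here is a minimal Python sketch of that CI gate, assuming the `nginx -t` and `envoy --mode validate` checks those projects ship; the file paths are placeholders for your own configs:

```python
import subprocess
import sys

# Placeholder paths -- point these at your real configuration files.
CHECKS = [
    ("nginx", ["nginx", "-t", "-c", "/etc/nginx/nginx.conf"]),
    ("envoy", ["envoy", "--mode", "validate", "-c", "/etc/envoy/envoy.yaml"]),
]

def validate_configs() -> bool:
    ok = True
    for name, cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"[{name}] config rejected:\n{result.stderr}", file=sys.stderr)
            ok = False
    return ok

if __name__ == "__main__":
    # A nonzero exit fails the pipeline before a bad config ever ships.
    sys.exit(0 if validate_configs() else 1)
```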
Lightlytics Continuous Simulation integrates seamlessly into the existing CI/CD pipeline without disturbing the existing workflow. The Lightlytics platform executes a simulation of every configuration change or update to predict possible outcomes before new configurations are deployed to production. It's just what organizations need to move fast and prevent devastating downtime. To learn more about Lightlytics and how it can benefit your organization, get in touch with us today.
We invite you to join our early access program, or contact me directly at or@lightlytics.com.