The traditional approach to deploying a service involves considerable downtime to cover tasks such as deployment, health checks and validation. Rolling back is also complex and time-consuming if the new version of the service fails validation.
For cloud-based services, a recommended way to deploy with ‘ZERO DOWNTIME’ is the Blue Green deployment strategy. With this strategy, we can deploy without downtime and with minimal impact on consumers and end users.
Here we assume you are familiar with writing Spring Boot apps and deploying them on Pivotal Cloud Foundry (PCF), a PaaS cloud infrastructure.
Understanding Blue Green Deployment Strategy
This strategy requires two versions of the service in the production environment. One is the version that is currently LIVE in production (we call this the Blue version). The other is the new version we plan to promote and make LIVE (we call this the Green version).
After deploying the new (Green) version, we run health checks and sanity tests to make sure it is safe to receive LIVE traffic. Once the Green version has been validated, we can switch traffic to it, and from then on the Green version receives all the LIVE traffic.
We can then keep the Blue version as a fallback or discard it. At any point during the Blue Green deployment, if validation of the Green version fails, we can roll back to the previous (Blue) version.
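To make the switch and the rollback concrete, here is a minimal Java sketch, assuming each version can be modeled as an in-process object. Deployment, BlueGreenSwitch and their methods are hypothetical names for illustration; on PCF the switch is performed with routes or a service registry, as described later, not by application code like this.

```java
import java.util.Objects;

class BlueGreenDemo {
    public static void main(String[] args) {
        BlueGreenSwitch router = new BlueGreenSwitch(new Deployment("blue", "v1"));
        System.out.println(router.route("GET /weather")); // blue(v1) served GET /weather

        Deployment green = new Deployment("green", "v1.1");
        router.promote(green, true); // validation passed, so LIVE traffic switches
        System.out.println(router.route("GET /weather")); // green(v1.1) served GET /weather

        router.rollback(); // e.g. a problem shows up after the switch
        System.out.println(router.route("GET /weather")); // blue(v1) served GET /weather
    }
}

// Hypothetical stand-in for one deployed version of the service.
final class Deployment {
    final String name;    // "blue" or "green"
    final String version; // e.g. "v1" or "v1.1"

    Deployment(String name, String version) {
        this.name = Objects.requireNonNull(name);
        this.version = Objects.requireNonNull(version);
    }

    String handle(String request) {
        return name + "(" + version + ") served " + request;
    }
}

// Both versions stay deployed side by side; only the one referenced by `live`
// receives production traffic.
final class BlueGreenSwitch {
    private Deployment live;
    private Deployment previous; // kept so we can roll back; null once discarded

    BlueGreenSwitch(Deployment blue) {
        this.live = Objects.requireNonNull(blue);
    }

    String route(String request) {
        return live.handle(request);
    }

    // Switches LIVE traffic to green only if its validation passed.
    boolean promote(Deployment green, boolean validationPassed) {
        if (!validationPassed) {
            return false; // traffic stays on the current live version
        }
        previous = live;
        live = green;
        return true;
    }

    // Sends traffic back to the previous version, if it has not been discarded.
    boolean rollback() {
        if (previous == null) {
            return false;
        }
        live = previous;
        previous = null;
        return true;
    }

    // Drops the fallback once the new version is trusted; rollback is then impossible.
    void discardPrevious() {
        previous = null;
    }

    String liveName() {
        return live.name;
    }
}
```

Keeping `previous` around is what makes rollback cheap: nothing is redeployed, only the pointer to the live version changes.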
Challenges with Blue Green Deployment Strategy
One challenge with this strategy is keeping the application backward compatible, because the Blue and Green versions run in parallel. If only the application code changes, this is usually not a big deal.
The real challenge comes when the new version requires a database structure change, such as renaming or dropping a column. One workaround is to design database changes in phases: the initial change does not modify existing object attributes but only adds new ones.
Once everything has been tested, a follow-up migration can complete the change (for example, dropping the old column). However, this ties the development strategy to the deployment process, which is one of the challenges that come with Blue Green deployment.
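The phased approach can be illustrated with a column rename. The sketch below is a simplified model, assuming a hypothetical rename of a city column to location and representing rows as Java maps rather than real tables: during this first (additive) phase the Green version writes both columns and reads the new one with a fallback, so rows written by either version stay readable by both.

```java
import java.util.HashMap;
import java.util.Map;

class ExpandPhaseDemo {
    public static void main(String[] args) {
        Map<String, String> writtenByBlue = BlueDao.write("Pune");
        Map<String, String> writtenByGreen = GreenDao.write("Oslo");
        // Each version can read rows written by the other while both are live.
        System.out.println(BlueDao.read(writtenByGreen)); // Oslo
        System.out.println(GreenDao.read(writtenByBlue)); // Pune
    }
}

// Blue (v1) only knows the original "city" column.
final class BlueDao {
    static Map<String, String> write(String city) {
        Map<String, String> row = new HashMap<>();
        row.put("city", city);
        return row;
    }

    static String read(Map<String, String> row) {
        return row.get("city");
    }
}

// Green (v1.1) during the additive phase: the hypothetical "location" column has been
// added next to "city"; nothing has been renamed or dropped yet.
final class GreenDao {
    static Map<String, String> write(String location) {
        Map<String, String> row = new HashMap<>();
        row.put("location", location);
        row.put("city", location); // keep the old column filled while Blue still reads it
        return row;
    }

    static String read(Map<String, String> row) {
        String location = row.get("location");
        return location != null ? location : row.get("city"); // Blue's rows have no "location"
    }
}
```

Only after the Blue version is retired would a later migration drop the old city column, which is the coupling between development and deployment described above.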
Router Based
In the router-based approach, the Cloud Foundry (CF) router controls and switches the traffic flowing to the Live version of the service and to the new version. Let us walk through it step by step.
- Say we have a simple service that gives us the weather for a location. The current version of this service in production is v1. This is the Blue version. Now we want to promote a new version v1.1. This will be the Green version.
The weather API is accessible via the URL weather.demo.com, so any request for the weather API is routed by the CF router to the current Live production version (v1). The new version v1.1, though deployed, is not yet accessible via any URL. Now let us make it accessible via a temporary URL with the following Command Line Interface (CLI) command, where blue and green are the CF app names of v1 and v1.1:
$ cf push green -n weather-green
Any request to the production URL weather.demo.com continues to be routed to the current production version, while the new version is accessible via the temporary URL weather-green.demo.com.
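The routing state at this point can be pictured as a table from hostnames to the apps bound to them. The following is a toy Java model, assuming one hostname per route on the demo.com domain; RouteTable and its methods are hypothetical and only approximate what cf map-route and cf unmap-route change, not how the CF router actually stores routes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class RouteTableDemo {
    public static void main(String[] args) {
        RouteTable router = new RouteTable();
        router.map("blue", "demo.com", "weather");        // existing production route for v1
        router.map("green", "demo.com", "weather-green"); // effect of: cf push green -n weather-green
        System.out.println(router.appsFor("weather.demo.com"));       // [blue]
        System.out.println(router.appsFor("weather-green.demo.com")); // [green]
    }
}

// Toy model: each full hostname maps to the set of apps bound to it.
final class RouteTable {
    private final Map<String, Set<String>> routes = new LinkedHashMap<>();

    // Roughly what `cf map-route APP DOMAIN -n HOST` does: bind APP to HOST.DOMAIN.
    void map(String app, String domain, String host) {
        routes.computeIfAbsent(host + "." + domain, k -> new LinkedHashSet<>()).add(app);
    }

    // Roughly what `cf unmap-route APP DOMAIN -n HOST` does; the route itself still exists.
    void unmap(String app, String domain, String host) {
        Set<String> apps = routes.get(host + "." + domain);
        if (apps != null) {
            apps.remove(app);
        }
    }

    // Read-only view of the apps that would receive requests for this URL.
    Set<String> appsFor(String url) {
        return Collections.unmodifiableSet(routes.getOrDefault(url, Set.of()));
    }
}
```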
Developers and testers can now validate the new version via the temporary URL. If validation succeeds, we also map the original production URL (route) to the new version:
$ cf map-route green demo.com -n weather
The router now load balances the requests for URL weather.demo.com between version v1 and v1.1 of the Weather API.
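How the router spreads requests across the two bound apps is a platform detail; the sketch below simply assumes a round-robin policy and treats each app as a single instance (in CF the router balances across app instances). RoundRobinRoute is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class RoundRobinDemo {
    public static void main(String[] args) {
        // After `cf map-route green demo.com -n weather`, both apps back weather.demo.com.
        RoundRobinRoute weather = new RoundRobinRoute(List.of("blue", "green"));
        List<String> served = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            served.add(weather.next());
        }
        System.out.println(served); // [blue, green, blue, green]
    }
}

// Assumed policy: plain round-robin over the apps bound to one route.
final class RoundRobinRoute {
    private final List<String> backends;
    private int position = 0;

    RoundRobinRoute(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("a route needs at least one backend");
        }
        this.backends = List.copyOf(backends);
    }

    // Returns the backend for the next request and advances the cursor.
    String next() {
        String backend = backends.get(position);
        position = (position + 1) % backends.size();
        return backend;
    }
}
```

While both apps are mapped, roughly half the production requests already reach v1.1, so this step is effectively a live trial of the new version.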
After some time, once we have verified that the new version runs without problems, we unmap the production URL from the Blue version (v1). We can also unmap the temporary route from the new version and, optionally, delete it.
$ cf unmap-route blue demo.com -n weather
$ cf unmap-route green demo.com -n weather-green
This way, we have promoted a new version of the weather API into production without any downtime.
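The order of the commands is what keeps the cutover free of downtime: the production route must never be left without an app behind it. The sketch below replays the walkthrough's steps as map and unmap operations on a hypothetical route model (Java 16+ for the record) and checks that invariant; it does not model app startup or health checks.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CutoverCheck {
    public static void main(String[] args) {
        System.out.println(hasGap(SAFE_ORDER));   // false
        System.out.println(hasGap(UNSAFE_ORDER)); // true
    }

    // One step of the cutover: map or unmap an app on a hostname (domain fixed to demo.com).
    record Step(boolean map, String app, String host) {}

    // The order used in the walkthrough above.
    static final List<Step> SAFE_ORDER = List.of(
        new Step(true, "green", "weather-green"),  // cf push green -n weather-green
        new Step(true, "green", "weather"),        // cf map-route green demo.com -n weather
        new Step(false, "blue", "weather"),        // cf unmap-route blue demo.com -n weather
        new Step(false, "green", "weather-green")  // cf unmap-route green demo.com -n weather-green
    );

    // Same commands, but Blue is unmapped before Green is mapped to production.
    static final List<Step> UNSAFE_ORDER = List.of(
        new Step(true, "green", "weather-green"),
        new Step(false, "blue", "weather"),
        new Step(true, "green", "weather"),
        new Step(false, "green", "weather-green")
    );

    // Replays the steps starting from production (blue on "weather") and reports whether
    // the production hostname was ever left with no app behind it.
    static boolean hasGap(List<Step> steps) {
        Map<String, Set<String>> routes = new HashMap<>();
        routes.computeIfAbsent("weather", h -> new HashSet<>()).add("blue");
        for (Step s : steps) {
            Set<String> apps = routes.computeIfAbsent(s.host(), h -> new HashSet<>());
            if (s.map()) {
                apps.add(s.app());
            } else {
                apps.remove(s.app());
            }
            if (routes.get("weather").isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

Swapping the map-route and unmap-route steps leaves weather.demo.com with no backing app for a moment, which is exactly the downtime the Blue Green strategy is meant to avoid.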
Service Discovery Based
In the service discovery-based approach, services register themselves in a service registry, for example the Netflix Eureka service registry. Consumers of a service do not invoke fixed endpoint URLs directly; instead, they look up the URLs of the services they want to call in the registry and then invoke those URLs.
First, we need to make the service instances discoverable. To do this, we add a discovery client dependency to the Spring Boot project (for example, Spring Cloud's spring-cloud-starter-netflix-eureka-client) and then enable the discovery client by annotating the Spring Boot application's main class with @EnableDiscoveryClient.
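The lookup-by-name idea can be sketched without any framework. The in-memory ServiceRegistry and WeatherClient below are hypothetical stand-ins, assuming a logical service name weather-service and made-up instance URLs; they do not reproduce the Eureka or Spring Cloud APIs, where registration and lookup go through the discovery client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RegistryLookupDemo {
    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.register("weather-service", "https://weather-blue.demo.com");
        WeatherClient client = new WeatherClient(registry);
        System.out.println(client.resolve()); // https://weather-blue.demo.com

        // The registry entry changes; the client code does not.
        registry.register("weather-service", "https://weather-green.demo.com");
        registry.deregister("weather-service", "https://weather-blue.demo.com");
        System.out.println(client.resolve()); // https://weather-green.demo.com
    }
}

// Hypothetical in-memory registry: logical service name -> registered instance URLs.
final class ServiceRegistry {
    private final Map<String, List<String>> instances = new LinkedHashMap<>();

    void register(String serviceName, String url) {
        instances.computeIfAbsent(serviceName, k -> new ArrayList<>()).add(url);
    }

    void deregister(String serviceName, String url) {
        List<String> urls = instances.get(serviceName);
        if (urls != null) {
            urls.remove(url);
        }
    }

    List<String> lookup(String serviceName) {
        return List.copyOf(instances.getOrDefault(serviceName, List.of()));
    }
}

// A consumer that knows only the logical name, never a hard-coded endpoint URL.
final class WeatherClient {
    private final ServiceRegistry registry;

    WeatherClient(ServiceRegistry registry) {
        this.registry = registry;
    }

    // Looks the service up on every call; a real client would also load-balance
    // across instances, this sketch simply takes the first one.
    String resolve() {
        List<String> urls = registry.lookup("weather-service");
        if (urls.isEmpty()) {
            throw new IllegalStateException("no instances registered for weather-service");
        }
        return urls.get(0);
    }
}
```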
To switch traffic between the Blue and Green instances, we register the new version of the service under the same name and unregister the old (Live) version. Consumers keep invoking the service in the same way, relying on the service registry to provide the service URLs. This can be done in stages:
- Deploy the new version of the service without registering it in the service registry. This is the Green version; the Live version is the Blue version. We run validation tests against the Green version independently.
- If the tests pass, we register the Green version under the same app name. Live traffic now goes to both the Blue and Green instances.
- If everything looks normal, we unregister the Blue version, and live traffic goes only to the Green instance.
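The three stages can be traced by watching which instances receive traffic. This Java sketch assumes a hypothetical TrafficModel that combines a registry with a round-robin consumer; instance names such as blue-v1 and green-v1.1 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StagedCutoverDemo {
    public static void main(String[] args) {
        TrafficModel model = new TrafficModel();
        model.register("weather-service", "blue-v1");
        // Stage 1: green-v1.1 is running but unregistered; testers call it directly.
        System.out.println(model.send("weather-service", 2)); // [blue-v1, blue-v1]
        // Stage 2: validation passed, so green registers under the same service name.
        model.register("weather-service", "green-v1.1");
        System.out.println(model.send("weather-service", 2)); // [blue-v1, green-v1.1]
        // Stage 3: blue deregisters; only green receives live traffic.
        model.deregister("weather-service", "blue-v1");
        System.out.println(model.send("weather-service", 2)); // [green-v1.1, green-v1.1]
    }
}

// Hypothetical registry plus a round-robin consumer, used only to show who receives traffic.
final class TrafficModel {
    private final Map<String, Set<String>> registry = new LinkedHashMap<>();

    void register(String service, String instance) {
        registry.computeIfAbsent(service, k -> new LinkedHashSet<>()).add(instance);
    }

    void deregister(String service, String instance) {
        Set<String> instances = registry.get(service);
        if (instances != null) {
            instances.remove(instance);
        }
    }

    // Distributes n consecutive requests round-robin over the currently registered instances.
    List<String> send(String service, int n) {
        List<String> targets = new ArrayList<>(registry.getOrDefault(service, Set.of()));
        if (targets.isEmpty()) {
            throw new IllegalStateException("no instances registered for " + service);
        }
        List<String> served = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            served.add(targets.get(i % targets.size()));
        }
        return served;
    }
}
```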
Coarse Grained Canary
A variant of Blue Green deployment is canary deployment (coarse grained canary). Its main goal is to minimize the impact on users of rolling out a faulty version of the application to production. The steps are as follows:
- Install the application on a server instance that Live production traffic cannot reach.
- After internal validation of the application, we start routing a small subset of the LIVE traffic to the new version. This can be done at the router. For example, we might first allow only internal company users, and then gradually open it to users in a city, state, country and so on.
- If a critical issue is identified at any time during this process, we can roll back the new version.
- If all looks good, we can route all the traffic to the new version and decommission the old version or hold it for some time as a backup.
This is one way to achieve coarse grained canary deployments without any special setup.
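A coarse grained canary rule boils down to a predicate over request attributes. The sketch below assumes the router can see whether a user is internal and which city they are in (hypothetical User fields); CanaryRouter is illustrative only, since real rules would be configured at the router rather than written like this. It needs Java 16+ for the record.

```java
import java.util.function.Predicate;

class CanaryDemo {
    public static void main(String[] args) {
        CanaryRouter router = new CanaryRouter();
        User employee = new User("alice", true, "Pune");
        User customer = new User("bob", false, "Oslo");

        router.setAudience(User::internal);                    // stage 1: internal users only
        System.out.println(router.route(employee));            // v1.1
        System.out.println(router.route(customer));            // v1

        router.setAudience(u -> u.internal() || u.city().equals("Oslo")); // stage 2: one city
        System.out.println(router.route(customer));            // v1.1

        router.rollback();                                      // critical issue found
        System.out.println(router.route(employee));            // v1
    }
}

// Hypothetical request attributes the router could inspect.
record User(String name, boolean internal, String city) {}

// Coarse grained canary: a predicate decides which users see the new version.
final class CanaryRouter {
    private Predicate<User> canaryAudience = u -> false; // nobody sees v1.1 initially

    void setAudience(Predicate<User> audience) {
        this.canaryAudience = audience;
    }

    // Everyone goes back to the old version.
    void rollback() {
        this.canaryAudience = u -> false;
    }

    // Final step: all traffic goes to the new version.
    void promoteAll() {
        this.canaryAudience = u -> true;
    }

    String route(User user) {
        return canaryAudience.test(user) ? "v1.1" : "v1";
    }
}
```

Widening the audience is just replacing the predicate, which is why this style of canary is coarse grained: traffic is split by who the user is, not by an exact percentage.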
PCF Native Rolling App Update (Beta)
PCF 2.4 natively supports a ZERO downtime rolling deployment feature. It is still in Beta, and you need CLI v6.40 or later to use it. Note that this is not a full-featured Blue Green deployment process; rather, it performs a rolling app deployment. The following commands support it:
Deployment (Zero downtime): cf v3-zdt-push APP-NAME
Cancel deployment (No Zero downtime guarantee): cf v3-cancel-zdt-push APP-NAME
Restart (Zero downtime): cf v3-zdt-restart APP-NAME
Because these commands are in beta, they have some usage limitations; refer to the PCF documentation for details before using them.
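The idea behind a rolling update is that a new instance must start and pass its health check before an old instance is stopped, so serving capacity never drops. This generic Java sketch assumes a fixed instance count and a healthCheck predicate standing in for the platform's health check; RollingUpdate is a hypothetical name, and this is not how cf v3-zdt-push is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RollingUpdateDemo {
    public static void main(String[] args) {
        List<String> log = RollingUpdate.run(3, "v1", "v1.1", instance -> true);
        log.forEach(System.out::println);
    }
}

final class RollingUpdate {
    // Replaces `count` instances of oldVersion with newVersion one at a time and returns
    // the running instances after every step. If a new instance fails its health check,
    // it is discarded and the rollout stops with the remaining old instances still serving.
    static List<String> run(int count, String oldVersion, String newVersion,
                            Predicate<String> healthCheck) {
        List<String> running = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            running.add(oldVersion + "#" + i);
        }
        List<String> log = new ArrayList<>();
        log.add(running.toString());
        for (int i = 0; i < count; i++) {
            String fresh = newVersion + "#" + i;
            running.add(fresh);                   // start a new instance first
            if (!healthCheck.test(fresh)) {
                running.remove(fresh);            // unhealthy: throw it away and stop
                log.add("aborted: " + running);
                return log;
            }
            running.remove(oldVersion + "#" + i); // only then stop one old instance
            log.add(running.toString());
        }
        return log;
    }
}
```

For three instances the logged states go from [v1#0, v1#1, v1#2] to [v1.1#0, v1.1#1, v1.1#2], and at no step are fewer than three instances running.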
Native Fine Grained Canary (beta)
PCF is in the process of replacing its Gorouter implementation with a service mesh (Istio) based solution. This will enable many new capabilities, including weighted routing, which natively lets you send a percentage of the traffic to the canary app.
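Percentage-based routing can be sketched with smooth weighted round-robin, the scheme nginx uses for weighted upstreams. This is an assumption for illustration, not necessarily how Istio or its Envoy proxies distribute requests; WeightedRouter and the app names are hypothetical. With weights 90 and 10, every cycle of 100 requests sends exactly 10 to the canary, spread out rather than in one burst.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WeightedRoutingDemo {
    public static void main(String[] args) {
        WeightedRouter router = new WeightedRouter(Map.of("weather-v1", 90, "weather-canary", 10));
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < 100; i++) {
            counts.merge(router.next(), 1, Integer::sum);
        }
        System.out.println(counts.get("weather-canary")); // 10
    }
}

// Smooth weighted round-robin: on each request every backend's running score grows by
// its weight, the highest score wins, and the winner's score drops by the total weight.
final class WeightedRouter {
    private final Map<String, Integer> weights;
    private final Map<String, Integer> current = new LinkedHashMap<>();
    private final int total;

    WeightedRouter(Map<String, Integer> weights) {
        this.weights = new LinkedHashMap<>(weights);
        int sum = 0;
        for (Map.Entry<String, Integer> e : this.weights.entrySet()) {
            if (e.getValue() <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            sum += e.getValue();
            current.put(e.getKey(), 0);
        }
        if (sum == 0) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.total = sum;
    }

    String next() {
        String best = null;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            int score = current.get(e.getKey()) + e.getValue();
            current.put(e.getKey(), score);
            if (best == null || score > current.get(best)) {
                best = e.getKey();
            }
        }
        current.put(best, current.get(best) - total);
        return best;
    }
}
```

Shifting more traffic to the canary is then only a matter of changing the weights, for example to 50 and 50, without touching the apps themselves.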
We will look at these upcoming capabilities in a future article. For related information, check out our webinar.