As more and more workloads move to the cloud, applications need a highly available, scalable design that can handle large variations in workload.
Pivotal Cloud Foundry (PCF) can “auto scale” a deployed application seamlessly, without manual intervention. Autoscaling increases or decreases the number of deployed application instances based on parameters known as ‘auto-scaling rules’.
This article assumes you are familiar with Spring Boot and with deploying an application to PCF using the Cloud Foundry (CF) Command Line Interface (CLI).
Setting Up App Auto Scaler
To enable auto-scaling, the application must be bound to an App Auto Scaler service instance. You can create this instance from Apps Manager or with the CF CLI (the CLI requires the App Auto Scaler plugin to be installed).
An application can be bound to the App Auto Scaler service using:
- The Apps Manager user interface
- The CF command-line interface
Configuring Auto Scaler
Once the application is bound to the App Auto Scaler service, you can configure it with various parameters (that is, auto-scaling rules), which we briefly cover below.
Scaling rules can likewise be configured through the Apps Manager user interface or the CF CLI.
The following self-explanatory CLI commands are useful for configuring auto-scaling:
Much like the `manifest.yml` file, auto-scaling rules can be kept in an App Auto Scaler YAML file, as seen below. The file can have any name.
Here auto-scaling is configured on CPU utilization. If CPU utilization falls below 40%, CF scales the application down, to a minimum of 2 instances; if CPU utilization rises above 70%, CF scales the application up, to a maximum of 4 instances.
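To make the rule concrete, here is a minimal Python sketch that renders the CPU rule above into an App Auto Scaler YAML manifest. The field names (`instance_limits`, `rules`, `rule_type`, `threshold`) reflect the App Autoscaler manifest format as we understand it and should be verified against the documentation for your PCF version. The helper `render_autoscaler_manifest` is hypothetical and writes the YAML by hand so that it needs only the standard library.

```python
def render_autoscaler_manifest(min_instances, max_instances, rules):
    """Render an App Auto Scaler manifest as YAML text.

    `rules` is a list of (rule_type, low_threshold, high_threshold) tuples.
    Field names are assumed from the App Autoscaler manifest format;
    check them against the docs for your PCF version.
    """
    if not 1 <= min_instances <= max_instances:
        raise ValueError("instance limits must satisfy 1 <= min <= max")
    lines = [
        "---",
        "instance_limits:",
        f"  min: {min_instances}",
        f"  max: {max_instances}",
        "rules:",
    ]
    for rule_type, low, high in rules:
        if low >= high:
            raise ValueError(f"{rule_type}: low threshold must be below high")
        lines += [
            f'- rule_type: "{rule_type}"',
            "  threshold:",
            f"    min: {low}",
            f"    max: {high}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # CPU rule from the text: scale down below 40%, up above 70%, 2 to 4 instances.
    print(render_autoscaler_manifest(2, 4, [("cpu", 40, 70)]), end="")
```

The printed text can be saved under any file name and passed to `cf configure-autoscaling`. Validating the limits and thresholds before writing the file catches inverted values early, before they ever reach the platform.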
Suppose we save the scaling rule above in a YAML file named “demoApp-AutoScalar.yml”, in the same directory as our build file. We can then configure auto-scaling for our app, named “DemoMyApp”, with the following command:
```shell
cf configure-autoscaling DemoMyApp demoApp-AutoScalar.yml
```
I highly recommend configuring auto-scaling with a YAML file: it can be maintained alongside the code base, which helps with modern deployment approaches such as blue-green deployment.
How App Auto Scaler Determines When to Scale
The App Auto Scaler service decides whether to scale the application up, scale it down, or keep the current number of instances by averaging the values of each configured metric over the most recent 120 seconds.
App Auto Scaler repeats this evaluation every 35 seconds.
App Auto Scaler scales the apps as follows:
- Increment by one instance when any metric exceeds the High threshold specified
- Decrement by one instance only when all metrics fall below the Low threshold specified
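The evaluation loop described above can be simulated in a few lines of Python. This is a sketch under stated assumptions, not App Auto Scaler's implementation: the `Rule` class, the `decide` function, and the choice to hold steady when a metric has no samples are hypothetical. Only the 120-second averaging window, the 35-second interval, and the any-high/all-low behavior come from the description above; the new count is also clamped to the configured minimum and maximum instance limits.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 120   # metrics are averaged over the most recent 120 s
EVAL_INTERVAL = 35     # a scaling decision is made every 35 s


@dataclass
class Rule:
    metric: str
    low: float    # scale down only when every metric is below its low threshold
    high: float   # scale up when any metric exceeds its high threshold


def window_average(samples, now):
    """Average of (timestamp, value) samples taken in the last WINDOW_SECONDS."""
    recent = [v for t, v in samples if now - WINDOW_SECONDS < t <= now]
    return sum(recent) / len(recent) if recent else None


def decide(rules, samples_by_metric, now, instances, min_inst, max_inst):
    """Return the new instance count, changing it by at most one."""
    averages = [window_average(samples_by_metric.get(r.metric, []), now) for r in rules]
    if any(avg is None for avg in averages):
        return instances  # assumption: hold steady when a metric has no data
    if any(avg > r.high for avg, r in zip(averages, rules)):
        return min(instances + 1, max_inst)
    if all(avg < r.low for avg, r in zip(averages, rules)):
        return max(instances - 1, min_inst)
    return instances


if __name__ == "__main__":
    rules = [Rule("cpu", low=40, high=70)]
    # Hypothetical load: CPU pinned at 85% for five minutes, sampled every 5 s.
    samples = {"cpu": [(t, 85.0) for t in range(0, 300, 5)]}
    instances = 2
    for now in range(EVAL_INTERVAL, 300, EVAL_INTERVAL):
        instances = decide(rules, samples, now, instances, 2, 4)
        print(f"t={now:3d}s instances={instances}")
```

With CPU held at 85%, the app grows from 2 to 3 instances at the first evaluation and to 4 at the second, then stays at the 4-instance maximum. The simulation does not feed the added capacity back into the CPU samples, as a real system would.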
Understanding Auto-Scaling Rules
The table below lists the metrics on which you can base App Auto Scaler rules:
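| Metric | Description |
|---|---|
| CPU Utilization | Average CPU utilization for all instances of the app |
| Memory Utilization | Average memory percentage for all instances of the app |
| HTTP Throughput | Total app requests per second divided by the number of instances |
| HTTP Latency | Average latency of application responses to HTTP requests |
| RabbitMQ Depth | Queue length of the specified queue |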
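Because the HTTP throughput metric is normalized per instance, scaling out lowers it even when total traffic is unchanged, and that feedback is what lets the scaler settle. Here is a minimal Python sketch of the table's definition; the function name and the traffic figure are hypothetical.

```python
def http_throughput_metric(total_rps, instances):
    """HTTP throughput as defined in the table: total app requests per
    second divided by the number of running instances."""
    if instances < 1:
        raise ValueError("at least one instance must be running")
    return total_rps / instances


if __name__ == "__main__":
    # Hypothetical steady load of 1200 requests per second.
    for n in (2, 3, 4):
        print(f"{n} instances -> {http_throughput_metric(1200, n):.0f} req/s per instance")
```

With a hypothetical high threshold of 500 req/s, 2 instances (600 req/s each) would trigger a scale-up, while 3 instances (400 req/s each) would not.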
Understanding application performance is essential before applying scaling rules on HTTP Throughput or HTTP Latency. Consider the following points when applying scaling rules to the throughput or latency of HTTP requests:
- The initial number of application instances.
- The application's performance benchmarking results: at what load performance starts to deteriorate, and how many instances are needed to stay below that load.
- When calculating HTTP latency, account for calls to backend services and databases. If latency rises because a backing service is degrading, adding instances may only increase the load on that service and escalate an already deteriorated situation.
- When setting up rules on HTTP requests, consider the application's peak-time traffic so that auto-scaling is configured efficiently. Your maximum instance count should also be able to absorb the extra traffic that arrives if other data centers hosting your app become unavailable.
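The benchmarking and peak-traffic points reduce to simple arithmetic. Below is a minimal Python sketch that turns a benchmarked per-instance capacity and the peak traffic into a maximum instance count for one data center, allowing for other data centers to be down. The function name, the even traffic split across surviving data centers, and all numbers are hypothetical assumptions for illustration.

```python
import math


def max_instances_needed(peak_rps, rps_per_instance, data_centers, outages=1):
    """Instances one data center needs at peak if `outages` of the
    `data_centers` are down and the survivors share all of the traffic.

    Assumes traffic is split evenly across surviving data centers and that
    `rps_per_instance` is the benchmarked load at which one instance still
    performs acceptably.
    """
    surviving = data_centers - outages
    if surviving < 1:
        raise ValueError("at least one data center must remain available")
    return math.ceil(peak_rps / surviving / rps_per_instance)


if __name__ == "__main__":
    # Hypothetical figures: 3000 req/s at peak, 250 req/s per instance, 2 sites.
    print("all sites up:", max_instances_needed(3000, 250, 2, outages=0))
    print("one site down:", max_instances_needed(3000, 250, 2, outages=1))
```

In this example each site needs only 6 instances at a normal peak, but its maximum instance limit should be at least 12 so that it can absorb the other site's traffic.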
When setting up a RabbitMQ-based scaling rule, `--subtype` is a required field that holds the name of the queue. As seen below, you can also configure rules for more than one RabbitMQ queue.
Newer versions of CF also allow autoscaling based on a combination of multiple metrics, such as those identified below:
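The RabbitMQ subtype and the combination of metrics can be illustrated together. The following Python sketch renders the `rules:` section of an App Auto Scaler manifest that mixes a CPU rule with two RabbitMQ queue rules. The `rule_sub_type` field name reflects our understanding of the manifest format and should be checked against your version's documentation; the function `render_rules`, the queue names `orders` and `payments`, and all thresholds are hypothetical.

```python
def render_rules(rules):
    """Render a `rules:` section for an App Auto Scaler manifest.

    Each rule is a dict with `type`, `low`, `high` and, for RabbitMQ
    rules, a `queue` name that becomes the rule's subtype.
    """
    lines = ["rules:"]
    for rule in rules:
        if rule["type"] == "rabbitmq" and not rule.get("queue"):
            # The subtype (queue name) is required for RabbitMQ rules.
            raise ValueError("rabbitmq rules need a queue name as subtype")
        lines.append(f'- rule_type: "{rule["type"]}"')
        if "queue" in rule:
            lines.append(f'  rule_sub_type: "{rule["queue"]}"')
        lines += ["  threshold:", f"    min: {rule['low']}", f"    max: {rule['high']}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical thresholds and queue names, for illustration only.
    print(render_rules([
        {"type": "cpu", "low": 40, "high": 70},
        {"type": "rabbitmq", "queue": "orders", "low": 100, "high": 1000},
        {"type": "rabbitmq", "queue": "payments", "low": 50, "high": 500},
    ]), end="")
```

Because App Auto Scaler adds an instance when any metric exceeds its high threshold and removes one only when all metrics fall below their low thresholds, a backlog on either queue triggers a scale-up, while a scale-down waits until CPU and both queues are quiet.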
Recent releases of CF also let you create custom metrics and configure auto-scaling for your application based on them.
Scheduling Application Auto Scaler
It is best to set up auto-scaling with multiple rules, including ones for rare scenarios such as an overnight surge in traffic during a holiday season like Thanksgiving. Because such occurrences are known in advance, they can be scheduled ahead of time.
PCF App Auto Scaler lets you schedule auto-scaling changes for rare, known events that may affect application availability or performance.
You can do this from Apps Manager: open your deployed application that is bound to the App Auto Scaler service, select ‘Manage scaling’, and then select ‘Schedule Limit Change’. Below is a sample rule setup:
The above configuration will scale up the application on Nov 14, 2019, at 8 PM and will scale down the application on Nov 15, 2019, at 8 PM.
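As we understand it, scheduled limit changes act like a timeline of instance limits: each change stays in force until the next one. The Python sketch below models that behavior for the Nov 14-15, 2019 example. The sample setup's actual limits are not given here, so the scaled-up limits (6, 10) and the default (2, 4) are hypothetical, the function `active_limits` is illustrative, and the datetimes are naive (time zone ignored).

```python
from datetime import datetime


def active_limits(default, changes, at):
    """Return the (min, max) instance limits in force at time `at`.

    `default` applies until the first change; each change is an
    (executes_at, (min, max)) pair that stays in force until the next one.
    """
    limits = default
    for executes_at, new_limits in sorted(changes):
        if executes_at <= at:
            limits = new_limits
    return limits


if __name__ == "__main__":
    # Hypothetical limits around the Nov 14-15, 2019 window described above.
    changes = [
        (datetime(2019, 11, 14, 20, 0), (6, 10)),  # scale up at 8 PM
        (datetime(2019, 11, 15, 20, 0), (2, 4)),   # scale back down at 8 PM
    ]
    for at in (datetime(2019, 11, 14, 12, 0),
               datetime(2019, 11, 14, 21, 0),
               datetime(2019, 11, 15, 21, 0)):
        print(at, active_limits((2, 4), changes, at))
```

Modeling both the scale-up and the scale-down as explicit entries makes it easy to check that the temporary limits really end when the event is over.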
Challenges While Configuring Auto Scaling
As noted in the official PCF App Autoscaler known-issues documentation, some CLI commands for enabling or disabling autoscaling may not be supported in future versions of the CLI, so for now it is best to use Apps Manager or the App Autoscaler API.
When configuring application auto-scaling, it is very important to select the correct metrics; improper metrics can produce unexpected scaling behavior.
Consider the following scenario. Scaling on HTTP latency may seem like a good idea, since response time looks like a good indicator of when the application is under load and may need to scale. Say your app typically takes 500 ms to respond. Under considerable load, you would expect the response time to go up, but that is not always true. Suppose your app is under a DDoS attack: most of the incoming requests are now invalid, and your app processes them in under 20 ms. Thousands of such requests will actually bring down the average response time, and your app may scale down instead of up. In such scenarios, it may be better to combine multiple metrics, such as CPU, HTTP throughput, and HTTP latency, or to use a custom metric for scaling.
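The arithmetic behind this scenario is easy to check. The Python sketch below computes the request-weighted average latency during a hypothetical attack and applies the any-high/all-low rule described earlier. All traffic figures and thresholds are made up for illustration, and `scaling_step` is a simplified stand-in for App Auto Scaler's decision, not its actual code.

```python
def average_latency(groups):
    """Request-weighted mean latency over (request_count, latency_ms) groups."""
    total = sum(count for count, _ in groups)
    return sum(count * ms for count, ms in groups) / total


def scaling_step(values, thresholds):
    """+1 if any metric is above its high threshold, -1 if all are below
    their low thresholds, else 0 (the rule described earlier)."""
    if any(values[m] > high for m, (low, high) in thresholds.items()):
        return +1
    if all(values[m] < low for m, (low, high) in thresholds.items()):
        return -1
    return 0


if __name__ == "__main__":
    instances = 2
    # Hypothetical attack traffic per second: 50 genuine requests that slow
    # to 600 ms, plus 2000 invalid requests rejected in 20 ms each.
    traffic = [(50, 600), (2000, 20)]
    latency = average_latency(traffic)
    throughput = sum(count for count, _ in traffic) / instances
    print(f"avg latency {latency:.1f} ms, throughput {throughput:.0f} req/s per instance")

    latency_only = {"http_latency": (200, 1000)}
    combined = {"http_latency": (200, 1000), "http_throughput": (10, 100)}
    values = {"http_latency": latency, "http_throughput": throughput}
    print("latency rule only:", scaling_step(values, latency_only))
    print("latency + throughput:", scaling_step(values, combined))
```

The average latency drops to about 34 ms even though genuine requests got slower, so a latency-only rule votes to scale down (-1). Adding a throughput rule flips the decision to scale up (+1), because one metric above its high threshold is enough to add an instance.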
Used properly, application autoscaling is an important tool for ensuring the reliability and availability of your application. For related information, check out our webinar.
Editor's Note: This blog was originally posted in March 2020 and updated in April 2023 for accuracy.