Auto-scaling

Auto-scaling automatically adjusts the number of containers running your application based on CPU and/or memory utilization. This lets your app absorb traffic spikes without manual intervention and scale back down during quiet periods to save resources.

Prerequisites

Your account must have auto-scaling enabled. Contact support to enable this feature.
Auto-scaling is available for application environments only. It is not available for database environments.

How to Configure

Navigate to your app environment
In the resources section, click the Auto-scaling button
Toggle Enable Autoscaling
Configure the scaling parameters (see below)
Apply changes

Configuration Options

Setting	Default	Description
Min Replicas	1	Minimum number of containers. Auto-scaling will never scale below this number.
Max Replicas	3	Maximum number of containers. Defaults to 3, or to the current container count if that is higher.
CPU Target	70%	Target CPU utilization percentage across containers.
Memory Target	Not set (optional)	Target memory utilization percentage.
Scale Up Window	0s	Stabilization window before scaling up. The default of 0s means scale-up is immediate.
Scale Down Window	300s	Stabilization window before scaling down (5 minutes by default).
Increment Step	1	Number of containers to add when scaling up.
Decrement Step	1	Number of containers to remove when scaling down.

Note: You must configure at least one metric target (CPU or Memory) for auto-scaling to work.
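To make the table and note concrete, here is a minimal Python sketch that models these settings with their documented defaults and checks them before you apply them. The class, field, and function names are hypothetical, not the platform's API. Only the "at least one metric target" rule comes from this page; the other checks (Min Replicas not above Max Replicas, positive steps and targets, non-negative windows) are assumptions, and `default_max_replicas` assumes the Max Replicas note means the default is 3 or the current container count, whichever is higher.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AutoscalingConfig:
    # Hypothetical model of the settings table; defaults mirror the documented ones.
    min_replicas: int = 1
    max_replicas: int = 3
    cpu_target: Optional[float] = 70.0      # percent; None means "not set"
    memory_target: Optional[float] = None   # percent; optional, unset by default
    scale_up_window_s: int = 0
    scale_down_window_s: int = 300
    increment_step: int = 1
    decrement_step: int = 1

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config looks usable."""
        errors = []
        # Documented rule: at least one metric target must be configured.
        if self.cpu_target is None and self.memory_target is None:
            errors.append("configure at least one metric target (CPU or memory)")
        # The remaining checks are assumptions about what a sensible config looks like.
        for name in ("cpu_target", "memory_target"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                errors.append(f"{name} must be a positive percentage")
        if self.min_replicas > self.max_replicas:
            errors.append("min_replicas must not exceed max_replicas")
        if self.increment_step < 1 or self.decrement_step < 1:
            errors.append("increment and decrement steps must be at least 1")
        if self.scale_up_window_s < 0 or self.scale_down_window_s < 0:
            errors.append("stabilization windows cannot be negative")
        return errors


def default_max_replicas(current_count: int) -> int:
    # Assumed reading of the table: default is 3, or the current count if higher.
    return max(3, current_count)


# Example: a memory-only configuration (CPU target cleared).
config = AutoscalingConfig(cpu_target=None, memory_target=75.0, max_replicas=5)
print(config.validate())  # []
```

Returning a list of messages rather than raising on the first problem lets you report every issue in one pass.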

How It Works

When auto-scaling is enabled:

The system monitors CPU and/or memory utilization across your containers
If utilization exceeds the target percentage, new containers are added in steps of Increment Step (up to Max Replicas)
If utilization drops below the target, containers are removed in steps of Decrement Step (down to Min Replicas)
Stabilization windows delay scaling actions to prevent rapid back-and-forth changes (flapping)
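The loop above can be sketched as a single decision function. This is a simplified Python model, not the platform's implementation: it moves by Increment Step or Decrement Step and clamps to the replica bounds as the settings table describes, but it ignores stabilization windows. It also assumes that when both targets are set, any metric above its target triggers a scale-up, while scale-down requires every configured metric to be below its target. The function name and parameters are hypothetical.

```python
from typing import Optional


def desired_replicas(
    current: int,
    cpu_util: Optional[float],
    mem_util: Optional[float],
    cpu_target: Optional[float] = 70.0,
    mem_target: Optional[float] = None,
    min_replicas: int = 1,
    max_replicas: int = 3,
    increment_step: int = 1,
    decrement_step: int = 1,
) -> int:
    """Return the next container count for one evaluation (no stabilization windows)."""
    targets = [(cpu_util, cpu_target), (mem_util, mem_target)]
    if all(target is None for _, target in targets):
        raise ValueError("configure at least one metric target (CPU or memory)")

    # Keep only metrics that have both a target and a current reading.
    pairs = [(obs, tgt) for obs, tgt in targets if tgt is not None and obs is not None]
    if not pairs:
        return current  # no readings yet, so hold steady

    if any(obs > tgt for obs, tgt in pairs):
        # Scale up by Increment Step, capped at Max Replicas (never shrink here).
        return max(current, min(current + increment_step, max_replicas))
    if all(obs < tgt for obs, tgt in pairs):
        # Scale down by Decrement Step, floored at Min Replicas (never grow here).
        return min(current, max(current - decrement_step, min_replicas))
    return current  # a metric sits exactly at its target: no change
```

For example, `desired_replicas(2, cpu_util=90.0, mem_util=None)` returns 3. For comparison, the Kubernetes Horizontal Pod Autoscaler computes a proportional replica count from the ratio of observed to target utilization instead of moving one fixed step at a time.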

Stabilization Windows

Stabilization windows add a delay before scaling actions take effect. This prevents the system from rapidly scaling up and down in response to brief spikes or dips in usage.

Scale Up Window (default: 0s) -- by default, scale-up is immediate to handle traffic spikes quickly
Scale Down Window (default: 300s / 5 minutes) -- a 5-minute delay before scaling down, preventing premature removal of containers after a brief dip in usage
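One way to picture the windows is the rule Kubernetes documents for its Horizontal Pod Autoscaler: keep recent replica recommendations, scale down only to the highest recommendation seen within the scale-down window, and scale up only to the lowest recommendation seen within the scale-up window. The Python sketch below applies that rule with this page's default windows. Whether the platform uses exactly this algorithm is an assumption, and the `Stabilizer` class is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class Stabilizer:
    """Smooths raw replica recommendations with scale-up and scale-down windows.

    Hypothetical helper modelled on the Kubernetes HPA stabilization rule;
    the platform's actual behaviour may differ.
    """

    def __init__(self, scale_up_window_s: float = 0, scale_down_window_s: float = 300):
        self.up_window = scale_up_window_s
        self.down_window = scale_down_window_s
        self.history: Deque[Tuple[float, int]] = deque()  # (timestamp_s, recommendation)

    def stabilize(self, now: float, current: int, recommended: int) -> int:
        self.history.append((now, recommended))
        # Forget recommendations older than the longer window.
        horizon = max(self.up_window, self.down_window)
        while now - self.history[0][0] > horizon:
            self.history.popleft()

        if recommended > current:
            # Scale up only as far as the lowest recent recommendation allows;
            # with a 0s window this is just the latest one, so scale-up is immediate.
            recent = [r for t, r in self.history if now - t <= self.up_window]
            return max(current, min(recent))
        if recommended < current:
            # Scale down only to the highest recommendation in the window, so a
            # brief dip keeps containers running until the window has passed.
            recent = [r for t, r in self.history if now - t <= self.down_window]
            return min(current, max(recent))
        return current


# Example with the defaults: 3 containers, and the recommendation drops to 1 from t=60s.
demo = Stabilizer()
for t in (0, 60, 120, 301):
    # prints 3 at t=0, 60 and 120, then 1 at t=301
    print(t, demo.stabilize(t, current=3, recommended=3 if t == 0 else 1))
```

Here the dip that starts at t=60s does not remove containers until more than 300 seconds have passed since the last higher recommendation at t=0.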

Manual Scaling

When auto-scaling is enabled, the manual container count controls are disabled. You can still set an immediate desired count through the auto-scaling configuration, but ongoing scaling is managed automatically.

Best Practices

Start with defaults -- the default settings work well for most applications. Monitor your app before tuning.
Use scale-down stabilization -- keep the default 5-minute scale-down window to prevent flapping. Increase it if your traffic is very spiky.
Set Max Replicas thoughtfully -- consider your account limits and budget when setting the maximum. Auto-scaling can increase costs during sustained traffic.
Monitor after enabling -- use Metrics dashboards to observe how auto-scaling responds to your traffic patterns.
Consider memory targets for memory-intensive apps -- if your app is memory-bound rather than CPU-bound, add a memory target in addition to (or instead of) CPU.
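For the budget point, a quick bound helps: auto-scaling keeps the container count between Min Replicas and Max Replicas, so those two settings bracket your container-hours for a period. The Python sketch below assumes billing is a flat rate per container-hour, which may not match your plan; the rate in the example is hypothetical, as are the function names.

```python
import math
from typing import Tuple


def container_cost_range(
    min_replicas: int, max_replicas: int, hours: float, rate_per_container_hour: float
) -> Tuple[float, float]:
    """Lower and upper cost bounds for a period, assuming a flat per-container-hour rate."""
    # The container count stays between Min Replicas and Max Replicas, so cost stays in this band.
    low = min_replicas * hours * rate_per_container_hour
    high = max_replicas * hours * rate_per_container_hour
    return low, high


def max_affordable_replicas(budget: float, hours: float, rate_per_container_hour: float) -> int:
    """Largest Max Replicas whose worst case (all containers running all period) fits the budget."""
    per_container = hours * rate_per_container_hour
    return math.floor(budget / per_container)


# Example with a hypothetical rate of 0.05 per container-hour over a 730-hour month.
low, high = container_cost_range(1, 3, 730, 0.05)
print(f"between {low:.2f} and {high:.2f}")
print(max_affordable_replicas(100, 730, 0.05))
```

The upper bound is deliberately pessimistic: it is the cost if traffic keeps every container running for the whole period, which is the scenario the Max Replicas setting exists to cap.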

API and MCP Access

You can configure auto-scaling programmatically:

Public API -- use the auto-scaling endpoints to enable, configure, and monitor
MCP tools -- use the update-app-env-autoscaling tool
