Auto-scaling
Auto-scaling adjusts the number of containers running your application based on CPU and/or memory utilization. Your app can handle traffic spikes without manual intervention and scales down during quiet periods to save resources.
Prerequisites
- Your account must have auto-scaling enabled. Contact support to enable this feature.
- Auto-scaling is available for application environments only. It is not available for database environments.
How to Configure
- Navigate to your app environment
- In the resources section, click the Auto-scaling button
- Toggle Enable Autoscaling
- Configure the scaling parameters (see below)
- Apply changes
Configuration Options
| Setting | Default | Description |
|---|---|---|
| Min Replicas | 1 | Minimum number of containers. Auto-scaling will never scale below this number. |
| Max Replicas | 3 | Maximum number of containers. Defaults to 3, or to the current container count if that is higher. |
| CPU Target | 70% | Target CPU utilization percentage across containers. |
| Memory Target | (optional) | Target memory utilization percentage. |
| Scale Up Window | 0s | Stabilization window before scaling up. Default is immediate. |
| Scale Down Window | 300s | Stabilization window before scaling down (5 minutes by default). |
| Increment Step | 1 | Number of containers to add when scaling up. |
| Decrement Step | 1 | Number of containers to remove when scaling down. |
Note: You must configure at least one metric target (CPU or Memory) for auto-scaling to work.
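The settings and the note above can be sketched as a small validation helper. This is an illustration only: the `AutoscalingConfig` class and its field names are hypothetical and not part of the platform's API. The defaults are copied from the table; the range checks on targets and steps are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoscalingConfig:
    """Hypothetical container for the settings in the table (not a platform type)."""
    min_replicas: int = 1
    max_replicas: int = 3
    cpu_target: Optional[int] = 70        # percent; None means no CPU target
    memory_target: Optional[int] = None   # percent; optional
    scale_up_window_s: int = 0
    scale_down_window_s: int = 300
    increment_step: int = 1
    decrement_step: int = 1

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means no problem was found."""
        problems = []
        # From the note: at least one metric target must be configured.
        if self.cpu_target is None and self.memory_target is None:
            problems.append("at least one metric target (CPU or Memory) is required")
        if self.min_replicas > self.max_replicas:
            problems.append("Min Replicas must not exceed Max Replicas")
        # Assumption: targets are percentages between 1 and 100.
        for name, target in (("CPU", self.cpu_target), ("Memory", self.memory_target)):
            if target is not None and not 1 <= target <= 100:
                problems.append(f"{name} Target must be between 1 and 100")
        # Assumption: steps of zero or less would make scaling a no-op.
        if self.increment_step < 1 or self.decrement_step < 1:
            problems.append("Increment Step and Decrement Step must be at least 1")
        return problems
```

Calling `AutoscalingConfig().validate()` with the defaults returns an empty list, while clearing both targets reports the missing-metric problem.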
How It Works
When auto-scaling is enabled:
- The system monitors CPU and/or memory utilization across your containers
- If utilization exceeds the target percentage, new containers are added (up to Max Replicas)
- If utilization drops below the target, containers are removed (down to Min Replicas)
- Stabilization windows prevent rapid scaling changes (flapping)
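The rules above can be sketched as a single decision function. This is a simplified model, not the platform's implementation: the function name `next_replica_count` is hypothetical, it considers one metric only, and it ignores stabilization windows and any tolerance band the real system may apply.

```python
def next_replica_count(
    current: int,
    utilization: float,
    target: float,
    min_replicas: int = 1,
    max_replicas: int = 3,
    increment_step: int = 1,
    decrement_step: int = 1,
) -> int:
    """Return the container count after one scaling decision.

    Assumes min_replicas <= current <= max_replicas.
    """
    if utilization > target:
        # Above target: add containers, but never exceed Max Replicas.
        return min(current + increment_step, max_replicas)
    if utilization < target:
        # Below target: remove containers, but never go below Min Replicas.
        return max(current - decrement_step, min_replicas)
    return current


print(next_replica_count(current=1, utilization=85, target=70))  # 2
print(next_replica_count(current=3, utilization=95, target=70))  # 3 (capped at Max Replicas)
print(next_replica_count(current=2, utilization=40, target=70))  # 1
```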
Stabilization Windows
Stabilization windows add a delay before scaling actions take effect. This prevents the system from rapidly scaling up and down in response to brief spikes.
- Scale Up Window (default: 0s) -- by default, scale-up is immediate to handle traffic spikes quickly
- Scale Down Window (default: 300s / 5 minutes) -- a 5-minute delay before scaling down, preventing premature removal of containers after a brief dip in usage
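One common way to implement a stabilization window is to look back over recent scaling recommendations and act on the most conservative one. The sketch below follows that approach; it is an assumption about the mechanism, not a description of how this platform works internally. The name `stabilized_target` and the `history` format are hypothetical, and `history` is assumed to cover at least the length of the window.

```python
def stabilized_target(
    current: int,
    history: list[tuple[float, int]],
    now: float,
    scale_up_window_s: float = 0,
    scale_down_window_s: float = 300,
) -> int:
    """Pick a container count from recent recommendations.

    history holds (timestamp_seconds, recommended_count) pairs in time order;
    the last entry is the newest recommendation.
    """
    latest = history[-1][1]
    earlier = history[:-1]
    if latest > current:
        # Scale up only as far as every recommendation in the window agrees.
        recent = [n for t, n in earlier if now - t <= scale_up_window_s]
        return max(current, min([latest] + recent))
    if latest < current:
        # Scale down only as far as the highest recommendation in the window.
        recent = [n for t, n in earlier if now - t <= scale_down_window_s]
        return min(current, max([latest] + recent))
    return current


# A dip two minutes after a recommendation of 3 does not remove containers yet.
print(stabilized_target(3, [(0, 3), (60, 1), (120, 1)], now=120))  # 3
# Once the last recommendation of 3 is older than 300s, scale-down goes ahead.
print(stabilized_target(3, [(0, 3), (60, 1), (400, 1)], now=400))  # 1
```

With the default Scale Up Window of 0s, only the newest recommendation is considered, so scale-up happens immediately.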
Manual Scaling
When auto-scaling is enabled, the manual container count controls are disabled. You can still set an immediate desired count through the auto-scaling configuration, but ongoing scaling is managed automatically.
Best Practices
- Start with defaults -- the default settings work well for most applications. Monitor your app before tuning.
- Use scale-down stabilization -- keep the default 5-minute scale-down window to prevent flapping. Increase it if your traffic is very spiky.
- Set Max Replicas thoughtfully -- consider your account limits and budget when setting the maximum. Auto-scaling can increase costs during sustained traffic.
- Monitor after enabling -- use Metrics dashboards to observe how auto-scaling responds to your traffic patterns.
- Consider memory targets for memory-intensive apps -- if your app is memory-bound rather than CPU-bound, add a memory target in addition to (or instead of) CPU.
API and MCP Access
You can configure auto-scaling programmatically:
- Public API -- use the auto-scaling endpoints to enable, configure, and monitor auto-scaling
- MCP tools -- use the `update-app-env-autoscaling` tool