Auto-scaling

Auto-scaling adjusts the number of containers running your application based on CPU and memory utilization. Your app can handle traffic spikes without manual intervention, and it scales down during quiet periods to save resources.

Prerequisites

  • Your account must have auto-scaling enabled. Contact support to enable this feature.
  • Auto-scaling is available for application environments only. It is not available for database environments.

How to Configure

  1. Navigate to your app environment
  2. In the resources section, click the Auto-scaling button
  3. Toggle Enable Autoscaling
  4. Configure the scaling parameters (see below)
  5. Apply changes

Configuration Options

Setting | Default | Description
Min Replicas | 1 | Minimum number of containers. Auto-scaling will never scale below this number.
Max Replicas | 3 | Maximum number of containers (or current container count if higher).
CPU Target | 70% | Target CPU utilization percentage across containers.
Memory Target | (optional) | Target memory utilization percentage.
Scale Up Window | 0s | Stabilization window before scaling up. Default is immediate.
Scale Down Window | 300s | Stabilization window before scaling down (5 minutes by default).
Increment Step | 1 | Number of containers to add when scaling up.
Decrement Step | 1 | Number of containers to remove when scaling down.

Note: You must configure at least one metric target (CPU or Memory) for auto-scaling to work.
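
To make the settings and the note above concrete, here is a minimal Python sketch that models the table as a data structure and checks it before use. The class name, field names, and the extra range checks are assumptions made for illustration; they are not the platform's API schema. Only the defaults and the "at least one metric target" rule come from this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AutoscalingConfig:
    """Hypothetical model of the settings table; field names are illustrative."""

    min_replicas: int = 1
    max_replicas: int = 3
    cpu_target: Optional[int] = 70        # percent; None means no CPU target
    memory_target: Optional[int] = None   # percent; optional per the table
    scale_up_window_s: int = 0
    scale_down_window_s: int = 300
    increment_step: int = 1
    decrement_step: int = 1

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config looks usable."""
        problems = []
        # Rule stated on this page: at least one metric target is required.
        if self.cpu_target is None and self.memory_target is None:
            problems.append("at least one metric target (CPU or memory) is required")
        # Assumed sanity check (not stated on this page).
        if self.min_replicas > self.max_replicas:
            problems.append("min_replicas must not exceed max_replicas")
        # Assumed sanity check: targets are percentages between 1 and 100.
        for name, value in (("cpu_target", self.cpu_target),
                            ("memory_target", self.memory_target)):
            if value is not None and not 1 <= value <= 100:
                problems.append(f"{name} must be between 1 and 100")
        return problems
```

Calling AutoscalingConfig().validate() on the defaults returns an empty list, while AutoscalingConfig(cpu_target=None).validate() reports the missing metric target.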

How It Works

When auto-scaling is enabled:

  1. The system monitors CPU and/or memory utilization across your containers
  2. If utilization exceeds the target percentage, new containers are added (up to Max Replicas)
  3. If utilization drops below the target, containers are removed (down to Min Replicas)
  4. Stabilization windows prevent rapid scaling changes (flapping)
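
The steps above can be sketched as a single decision function. This is a simplified model for one metric, written from the description on this page; the function name and parameters are hypothetical, and the platform's actual algorithm may differ (for example in how it combines CPU and memory targets). Stabilization windows are left out here.

```python
def next_replica_count(
    current: int,
    utilization: float,
    target: float,
    min_replicas: int = 1,
    max_replicas: int = 3,
    increment_step: int = 1,
    decrement_step: int = 1,
) -> int:
    """Return the replica count after one evaluation of a single metric."""
    if utilization > target:
        # Add containers, capped at max_replicas. The outer max() keeps the
        # count from dropping if it is already above max_replicas.
        return max(current, min(current + increment_step, max_replicas))
    if utilization < target:
        # Remove containers, floored at min_replicas. The outer min() keeps
        # the count from rising if it is already below min_replicas.
        return min(current, max(current - decrement_step, min_replicas))
    # Utilization is exactly on target: no change.
    return current
```

With the defaults, one container at 85% CPU against a 70% target grows to two, and three containers at 85% stay at three because Max Replicas is reached.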

Stabilization Windows

Stabilization windows add a delay before scaling actions take effect. This prevents the system from rapidly scaling up and down in response to brief spikes.

  • Scale Up Window (default: 0s) -- by default, scale-up is immediate to handle traffic spikes quickly
  • Scale Down Window (default: 300s / 5 minutes) -- a 5-minute delay before scaling down, preventing premature removal of containers after a brief dip in usage
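
One way to picture a stabilization window is as a gate that only opens once a scaling condition has held for the whole window. The sketch below assumes that interpretation; the class name and the "condition must hold continuously" rule are illustrative assumptions, and the platform's exact semantics are not specified on this page.

```python
from typing import Optional


class StabilizationGate:
    """Allow an action only after its condition has held for window_s seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._since: Optional[float] = None  # time the condition first became true

    def allow(self, condition: bool, now: float) -> bool:
        if not condition:
            # Condition broke: forget the streak and start over next time.
            self._since = None
            return False
        if self._since is None:
            self._since = now
        return now - self._since >= self.window_s


# Scale-down gate with the default 300s window (timestamps in seconds).
scale_down = StabilizationGate(300)
decisions = [
    scale_down.allow(True, 0),     # usage just dropped: wait
    scale_down.allow(True, 120),   # still low, window not yet elapsed
    scale_down.allow(False, 200),  # brief spike resets the window
    scale_down.allow(True, 250),   # low again: the window restarts here
    scale_down.allow(True, 550),   # low for a full 300s: scale-down allowed
]
```

A gate with a 0s window, like the default Scale Up Window, allows the action on the first evaluation where the condition is true.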

Manual Scaling

When auto-scaling is enabled, the manual container count controls are disabled. You can still set an immediate desired count through the auto-scaling configuration, but ongoing scaling is managed automatically.

Best Practices

  1. Start with defaults -- the default settings work well for most applications. Monitor your app before tuning.
  2. Use scale-down stabilization -- keep the default 5-minute scale-down window to prevent flapping. Increase it if your traffic is very spiky.
  3. Set Max Replicas thoughtfully -- consider your account limits and budget when setting the maximum. Auto-scaling can increase costs during sustained traffic.
  4. Monitor after enabling -- use Metrics dashboards to observe how auto-scaling responds to your traffic patterns.
  5. Consider memory targets for memory-intensive apps -- if your app is memory-bound rather than CPU-bound, add a memory target in addition to (or instead of) CPU.

API and MCP Access

You can configure auto-scaling programmatically:

  • Public API -- use the auto-scaling endpoints to enable, configure, and monitor
  • MCP tools -- use the update-app-env-autoscaling tool