Elasticity and CPU-driven scaling

LMI scales asynchronously based on CPU utilisation, not in response to each new request. There are no cold starts in the sense of default Lambda: invocations run in execution environments that are already running on the fleet (see the AWS comparison table).

  • Scale-out: AWS states capacity can roughly double within about five minutes under load.
  • Scale-in: gradual; avoids flapping as CPU drops.
  • scaling_mode: Auto (default); options such as Manual suit workloads that need predictable capacity (check the current AWS docs for your region).
  • max_vcpu_count: hard ceiling on total vCPU across instances in the provider.
Traffic load over time
───────────────────────────────────────────────────────────────────▶ time
Lambda (default):
████░░░░████████░░░████ ← new execution environments on demand; scale to zero at idle
LMI:
████████████████████████ ← baseline capacity; CPU rise triggers more instances
└── scale-out / scale-in driven by CPU, not request spikes alone
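
The scale-out and scale-in behaviour above can be sketched as a toy model. This is not AWS's scaling algorithm: the CPU thresholds, the instance size, the baseline, and the one-instance-per-window scale-in step are all hypothetical values chosen for illustration. Only the "roughly double within about five minutes" step and the max_vcpu_count ceiling come from the text.

```python
from dataclasses import dataclass


@dataclass
class ProviderSketch:
    """Toy capacity provider; every default here is an illustrative assumption."""
    vcpu_per_instance: int = 4    # hypothetical instance size
    max_vcpu_count: int = 64      # hard ceiling on total vCPU in the provider
    scale_out_cpu: float = 0.70   # hypothetical threshold, not an AWS value
    scale_in_cpu: float = 0.30    # hypothetical threshold, not an AWS value
    min_instances: int = 2        # hypothetical baseline capacity

    def max_instances(self) -> int:
        # The vCPU ceiling translates into an instance-count ceiling.
        return self.max_vcpu_count // self.vcpu_per_instance

    def next_instances(self, current: int, cpu: float) -> int:
        """Instance count after one ~5 minute window at average CPU `cpu`."""
        if cpu >= self.scale_out_cpu:
            target = current * 2   # capacity can roughly double per window
        elif cpu <= self.scale_in_cpu:
            target = current - 1   # gradual scale-in to avoid flapping
        else:
            target = current       # inside the band: hold steady
        return max(self.min_instances, min(target, self.max_instances()))


def simulate(provider: ProviderSketch, start: int, cpu_windows: list[float]) -> list[int]:
    """Return the instance count at the start and after each window."""
    counts = [start]
    for cpu in cpu_windows:
        counts.append(provider.next_instances(counts[-1], cpu))
    return counts


if __name__ == "__main__":
    print(simulate(ProviderSketch(), 2, [0.9, 0.9, 0.9, 0.9, 0.5, 0.1, 0.1]))
```

In this sketch, sustained high CPU doubles the fleet each window until the max_vcpu_count ceiling stops it, and a later drop in CPU removes capacity one instance at a time. Request count never appears as an input, which is the point of the comparison with default Lambda.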

Do not conflate EC2 instances with Lambda execution environments:

Layer                    Role
Managed instances        EC2 hosts that Lambda runs for your capacity provider (resiliency across AZs).
Execution environments   Runtimes that execute your function code on that capacity; multi-concurrency packs many invocations per environment.

With no traffic, AWS describes LMI as scaling in to the configured minimum number of execution environments. You still have baseline instance capacity, and environments can sit idle on it. Check the details against the Scaling behavior row in the official comparison table.
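
To make the instance versus execution environment distinction concrete, the arithmetic can be sketched as follows. The function and parameter names (environments_needed, per_env_concurrency, min_envs) are hypothetical and are not LMI settings; how Lambda actually places invocations is up to the service.

```python
import math


def environments_needed(in_flight: int, per_env_concurrency: int, min_envs: int) -> int:
    """Estimate execution environments for a given number of in-flight invocations.

    Assumes each environment handles up to `per_env_concurrency` invocations at
    once (multi-concurrency) and that the count never drops below `min_envs`.
    """
    if per_env_concurrency < 1:
        raise ValueError("per_env_concurrency must be at least 1")
    if in_flight < 0:
        raise ValueError("in_flight must not be negative")
    packed = math.ceil(in_flight / per_env_concurrency)
    return max(min_envs, packed)


if __name__ == "__main__":
    for load in (0, 25, 95):
        print(load, environments_needed(load, per_env_concurrency=10, min_envs=3))
```

With these assumed numbers, zero traffic still leaves three environments idling on the baseline instances, and 95 in-flight invocations pack into ten environments rather than 95.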