
Elasticity and CPU-driven scaling

LMI scales asynchronously in response to CPU utilisation, not in response to “one more request arrived.” There are no cold starts in the default-Lambda sense: invocations run in execution environments that are already running on the fleet (AWS comparison table).

Highlights

Scale-out: AWS states capacity can roughly double within about five minutes under load.
Scale-in: gradual; avoids flapping as CPU drops.
scaling_mode: Auto (default), or an option such as Manual where you need predictable capacity (check the current AWS docs for your region).
max_vcpu_count: hard ceiling on total vCPU across instances in the provider.
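
To make the scale-out and ceiling figures above concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes capacity doubles exactly once every five minutes under sustained load, which is a simplification of AWS's "roughly double within about five minutes", and it treats max_vcpu_count as a hard cap. The function names and the step-wise doubling model are hypothetical illustrations, not part of any AWS API.

```python
import math
from typing import Optional


def projected_vcpus(start_vcpus: int, minutes: float, max_vcpu_count: int,
                    doubling_minutes: float = 5.0) -> int:
    """Optimistic projection of total vCPU after `minutes` of sustained load.

    Assumption: capacity doubles once per full `doubling_minutes` interval.
    Real scale-out is asynchronous and only approximately follows this.
    """
    if start_vcpus <= 0 or doubling_minutes <= 0:
        raise ValueError("start_vcpus and doubling_minutes must be positive")
    doublings = math.floor(minutes / doubling_minutes)
    # max_vcpu_count is the hard ceiling across all instances in the provider.
    return min(start_vcpus * 2 ** doublings, max_vcpu_count)


def minutes_to_reach(start_vcpus: int, target_vcpus: int, max_vcpu_count: int,
                     doubling_minutes: float = 5.0) -> Optional[float]:
    """Minutes until the projection reaches `target_vcpus`, or None if the
    target lies above the configured ceiling and can never be reached."""
    if start_vcpus <= 0 or doubling_minutes <= 0:
        raise ValueError("start_vcpus and doubling_minutes must be positive")
    if target_vcpus > max_vcpu_count:
        return None
    steps, capacity = 0, start_vcpus
    while capacity < target_vcpus:
        capacity *= 2
        steps += 1
    return steps * doubling_minutes


if __name__ == "__main__":
    # Hypothetical provider: 8 vCPU baseline, ceiling of 64 vCPU.
    for t in (0, 5, 10, 15, 30):
        print(t, projected_vcpus(8, t, 64))
    print(minutes_to_reach(8, 64, 64))
    print(minutes_to_reach(8, 100, 64))
```

Under these assumptions, a traffic spike that needs eight times the baseline capacity takes on the order of fifteen minutes to absorb, which is why baseline sizing and the ceiling matter more for LMI than for default Lambda.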

  Traffic load over time
  ───────────────────────────────────────────────────────────────────▶ time

  Lambda (default):
  ████░░░░████████░░░████  ← new execution environments on demand; scale to zero at idle

  LMI:
  ████████████████████████  ← baseline capacity; CPU rise triggers more instances
        └── scale-out / scale-in driven by CPU, not request spikes alone

Instances vs execution environments

Do not conflate EC2 instances with Lambda execution environments:

Layer	Role
Managed instances	EC2 hosts that Lambda runs for your capacity provider (spread across AZs for resiliency).
Execution environments	Runtimes that execute your function code on that capacity; multi-concurrency packs many invocations per environment.

With no traffic, AWS describes LMI as scaling down to the configured minimum number of execution environments. You still have baseline instance capacity, and environments can sit idle on it. Check the details against the Scaling behavior row in the official comparison table.
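
The relationship between in-flight invocations, execution environments, and the idle minimum can be sketched as simple packing arithmetic. This is only an illustration of the layering described above, not AWS's placement or scaling algorithm: actual scale-out is triggered by CPU utilisation, and the parameter names here (per_env_concurrency, min_environments) are hypothetical stand-ins for whatever your function and capacity provider configure.

```python
import math


def environments_needed(in_flight: int, per_env_concurrency: int,
                        min_environments: int) -> int:
    """Environments required to hold `in_flight` invocations when each
    environment packs up to `per_env_concurrency` of them, never dropping
    below the configured minimum (the idle floor)."""
    if in_flight < 0 or min_environments < 0:
        raise ValueError("in_flight and min_environments must be non-negative")
    if per_env_concurrency <= 0:
        raise ValueError("per_env_concurrency must be positive")
    packed = math.ceil(in_flight / per_env_concurrency)
    return max(packed, min_environments)


if __name__ == "__main__":
    # Hypothetical settings: 10 invocations per environment, minimum of 2.
    for load in (0, 5, 95):
        print(load, environments_needed(load, 10, 2))
```

With zero traffic the result stays at the minimum rather than zero, which is the practical difference from default Lambda's scale-to-zero behaviour.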

See also

Concurrency & runtimes — concurrency model
Placement & capacity — where scaling limits are configured
Supported instance families — C / M / R summary and instance type allow/exclude on the capacity provider