Elasticity and CPU-driven scaling
LMI scales asynchronously, driven by CPU utilisation rather than by each additional incoming request. There are no cold starts in the default-Lambda sense: invocations run on execution environments that are already running on the fleet (see the AWS comparison table).
Highlights
- Scale-out: AWS states capacity can roughly double within about five minutes under load.
- Scale-in: gradual; avoids flapping as CPU drops.
- `scaling_mode`: `Auto` (default) vs options such as `Manual` where you need predictable capacity (see current AWS docs for your region).
- `max_vcpu_count`: hard ceiling on total vCPU across instances in the provider.
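
The highlights above can be sketched as a toy model in Python. This is not AWS's scaling algorithm: the CPU thresholds (70% and 30%), the 10% scale-in step, the `min_vcpu` floor, and the function names `next_capacity` and `windows_to_reach` are all assumptions made for illustration. Only the "roughly double in about five minutes" figure and the `max_vcpu_count` ceiling come from the text.

```python
from typing import Optional


def next_capacity(current_vcpu: int, cpu_util: float, max_vcpu_count: int,
                  min_vcpu: int = 2,
                  scale_out_above: float = 0.70,   # assumed threshold
                  scale_in_below: float = 0.30) -> int:  # assumed threshold
    """Return a vCPU target for the next five-minute window (toy model)."""
    if cpu_util > scale_out_above:
        # Scale-out: capacity at most doubles per window.
        target = current_vcpu * 2
    elif cpu_util < scale_in_below:
        # Scale-in: shed about 10% per window so capacity does not flap.
        target = current_vcpu * 9 // 10
    else:
        target = current_vcpu
    # Clamp between the assumed floor and the hard ceiling.
    return max(min_vcpu, min(target, max_vcpu_count))


def windows_to_reach(start_vcpu: int, needed_vcpu: int,
                     max_vcpu_count: int) -> Optional[int]:
    """Count five-minute doubling windows needed to reach needed_vcpu.

    Returns None when the ceiling makes the target unreachable.
    """
    if start_vcpu <= 0:
        raise ValueError("start_vcpu must be positive")
    if needed_vcpu > max_vcpu_count:
        return None
    windows = 0
    capacity = start_vcpu
    while capacity < needed_vcpu:
        capacity = min(capacity * 2, max_vcpu_count)
        windows += 1
    return windows
```

Under these assumptions, going from 8 to 64 vCPU takes three doubling windows (about fifteen minutes), which is why a ceiling and a sensible baseline matter more here than they do for request-driven scaling.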
Traffic load over time
───────────────────────────────────────────────────────────────────▶ time
Lambda (default): ████░░░░████████░░░████   ← new execution environments on demand; scale to zero at idle
LMI:              ████████████████████████  ← baseline capacity; CPU rise triggers more instances
                  └── scale-out / scale-in driven by CPU, not request spikes alone

Instances vs execution environments
Do not conflate EC2 instances with Lambda execution environments:
| Layer | Role |
|---|---|
| Managed instances | EC2 hosts that Lambda runs for your capacity provider, spread across AZs for resiliency. |
| Execution environments | Runtimes that execute your function code on that capacity; multi-concurrency packs many invocations per environment. |
With no traffic, AWS describes LMI as scaling down to the configured minimum number of execution environments. Baseline instance capacity remains, and environments can sit idle on it. Check the details against the Scaling behavior row in the official comparison table.
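
The relationship between the two layers comes down to simple arithmetic, sketched here in Python. The inputs (`invocations_per_env`, `envs_per_instance`, `min_envs`) and the `footprint` helper are hypothetical, not AWS defaults or API names. The sketch only illustrates how multi-concurrency reduces the environment count and why a configured minimum leaves a footprint even with no traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    environments: int  # execution environments in use or held at minimum
    instances: int     # managed instances needed to host those environments


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return (a + b - 1) // b


def footprint(concurrent_invocations: int,
              invocations_per_env: int,   # assumed multi-concurrency setting
              envs_per_instance: int,     # assumed packing density
              min_envs: int) -> Footprint:
    """Estimate environments and instances for a given load (toy model)."""
    if invocations_per_env < 1 or envs_per_instance < 1:
        raise ValueError("per-environment and per-instance sizes must be >= 1")
    if concurrent_invocations < 0 or min_envs < 0:
        raise ValueError("counts must be non-negative")
    # Multi-concurrency: many invocations share one environment.
    needed = _ceil_div(concurrent_invocations, invocations_per_env)
    # The configured minimum applies even when there is no traffic.
    environments = max(needed, min_envs)
    instances = _ceil_div(environments, envs_per_instance)
    return Footprint(environments, instances)
```

For example, `footprint(100, 10, 4, 3)` gives 10 environments on 3 instances, while `footprint(0, 10, 4, 3)` still gives 3 environments on 1 instance: the idle baseline described above.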
See also
- Concurrency & runtimes: concurrency model
- Placement & capacity: where scaling limits are configured
- Supported instance families: C / M / R summary and instance type allow/exclude on the capacity provider