Behavior under overload · Queen 0.16 (pushser) · June 2026

What happens when you push faster than you can consume.

Every real system eventually runs behind: a traffic spike, a slow downstream, a deploy, or simply fan-out amplifying every message N times. The question that matters then isn't "what's the peak rate?" but "what does the queue do when producers outrun consumers?" So we built a realistic application workload: fan-out to two consumer groups, explicit per-message ack, 20 ms handlers, transient failures plus a dead-letter path, and a transactional ack-and-forward pipeline. Then we deliberately drove push above the sustainable consume rate for a full hour on a two-machine setup. This is the over-provisioned-push story.

Headline

Duration	1.0 h	push held above consume
Messages pushed	43.2 M	12k/s, flat the whole hour
Push errors	0	producers never blocked
Lost messages	0	backlog ≠ loss
Delivered (fan-out)	78.3 M	in order, to both groups
Backlog at peak	8.7 M	bounded & self-draining
Broker CPU	~1.3 / 32 cores	a thin pipe under lag
Restarts / crashes	0	no death-spiral

Verdict. Pushed ~1.5k msg/s above what it could drain for a full hour, Queen 0.16 did exactly what infrastructure should do under sustained lag: accepted every push (0 errors), lost nothing, preserved per-partition order, kept the backlog bounded and in memory, and drained itself cleanly once the load eased, all on a single Postgres node with the broker idling at ~1.3 cores. The queue absorbed the lag instead of failing under it.

The scenario: a realistic application, not a pipe

Unlike a raw throughput test, this models how an app actually uses a queue. Every knob below is on at once, on a dedicated broker machine with the load generator on a separate machine over a private network, so the broker+Postgres host has all 32 cores to itself.

Fan-out. Two independent consumer groups, each receiving a full copy of every message, so consume demand is 2× the push rate.
Explicit acknowledgement. Consumers pop, do real work (~20 ms per message, ±50% jitter, plus a 1% slow tail at 200 ms), then ack in a separate round-trip from the pop, which exercises the lease lifecycle.
Per-entity ordering. 5,000 session partitions, each written by a single producer with a monotonic sequence, selected with a realistic Zipf skew (a few hot sessions, a long cold tail).
Failures & DLQ. 1% of deliveries fail transiently and are redelivered; 0.1% are poison messages that always fail and land in the dead-letter queue after 5 attempts.
Transactional pipeline. 5% of the primary group's completions are an atomic ack-and-push into a second queue (exactly-once handoff), itself drained by a third consumer group.
The stress. Push is driven at a closed-loop 12,000 msg/s, chosen to sit just above the sustainable consume ceiling, for a full hour.
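The per-entity ordering knob above (5,000 sessions, Zipf skew, one monotonic sequence per session) can be sketched as follows. This is a minimal illustration and not goload's implementation: the function names, the seed, and the choice of a plain 1/k^s weighting are assumptions made for the example.

```python
import random
from collections import defaultdict
from itertools import accumulate


def zipf_weights(n: int, skew: float) -> list[float]:
    """Normalised Zipf weights: rank k (1-based) is proportional to 1 / k**skew."""
    raw = [1.0 / (k ** skew) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def make_session_picker(n: int, skew: float, seed: int = 0):
    """Return a function that picks a session index in [0, n) with Zipf skew."""
    cum = list(accumulate(zipf_weights(n, skew)))
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability

    def pick() -> int:
        return rng.choices(range(n), cum_weights=cum, k=1)[0]

    return pick


# Scenario values from the text: 5,000 sessions, skew 1.1.
weights = zipf_weights(5000, 1.1)
top10_share = sum(weights[:10])  # fraction of traffic hitting the 10 hottest sessions

# Single writer per session: each message gets the next sequence number
# for its session, so order can be checked per partition later.
pick = make_session_picker(5000, 1.1)
next_seq: dict[int, int] = defaultdict(int)
batch = []
for _ in range(1000):
    session = pick()
    next_seq[session] += 1
    batch.append((session, next_seq[session]))
```

Printing `top10_share` shows how concentrated the traffic is on the hot sessions, which is what makes competing consumers contend on the same partitions.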

What Queen does when it's behind

Sustained over-rate is where many queues misbehave: they push back on producers, drop messages, reorder under contention, or spiral as their storage outgrows RAM. Across the full hour at 2× fan-out demand, Queen 0.16 did none of that:

Producers were never throttled. Push held exactly 12,000 msg/s for the entire hour with 0 push errors. The lag never propagated back to the producer. The broker accepted all 43,200,340 messages.
Nothing was lost. Every one of the ~5,000 partitions kept being delivered to both groups; the growing gap was simply pending work, fully accounted for, not dropped.
Order held. Per-partition FIFO was preserved within each consumer group even under heavy competing-consumer concurrency (independently verified with an offline order-checker on companion runs: every partition's sequence delivered in order, zero gaps).
The backlog stayed bounded, and self-drained. The pending queue grew linearly to ~8.7M and the live table to ~17 GB, but stayed inside Postgres' 24 GB cache the whole time (no cold-disk spiral), and retention reclaimed it back to ~4 GB once producers stopped. Backpressure, not breakage.
No errors, no crashes, no restarts across 43.2M pushes and 78.3M deliveries. The system degraded gracefully and predictably (linear, no cliff).
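The ordering claim above is the kind of property an offline checker can verify from a delivery log. The sketch below is a hypothetical stand-in for such a checker, not the tool used in the companion runs. It assumes the log holds one `(group, partition, seq)` record per completed message, in completion order, and that sequences start at 1.

```python
def check_order(deliveries):
    """Verify per-partition FIFO within each consumer group.

    deliveries: iterable of (group, partition, seq) in completion order.
    Returns a list of (group, partition, expected_seq, actual_seq) violations;
    an empty list means every partition was delivered in order with no gaps.
    """
    last_seen = {}  # (group, partition) -> highest sequence seen so far
    violations = []
    for group, partition, seq in deliveries:
        key = (group, partition)
        expected = last_seen.get(key, 0) + 1
        if seq != expected:
            violations.append((group, partition, expected, seq))
        last_seen[key] = max(last_seen.get(key, 0), seq)
    return violations


# Two groups each see the full stream; interleaving across groups is fine.
clean_log = [
    ("g0", "s1", 1), ("g1", "s1", 1), ("g0", "s1", 2),
    ("g0", "s2", 1), ("g1", "s1", 2),
]
gap_log = [("g0", "s1", 1), ("g0", "s1", 3)]      # sequence 2 is missing
swapped_log = [("g0", "s1", 2), ("g0", "s1", 1)]  # delivered out of order

clean = check_order(clean_log)
gaps = check_order(gap_log)
swapped = check_order(swapped_log)
```

Keying on `(group, partition)` matters under fan-out: each group has its own cursor, so order is only meaningful within a group.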

Throughput & backpressure

Push is closed-loop at the target; consume runs at whatever Postgres can sustain. The difference is the backlog, and it behaves exactly as a durable buffer should.

Push (in)

12.0k/s · Held for 60 min · Total 43,200,340 · Errors 0

Consume (out, 2 groups)

~22.5k/s · Demand (2×) 24.0k/s · Delivered 78,287,947 · Deficit ~1.5k/s

Backlog (the lag)

8.74M · Growth linear, bounded · After load drained · Lost 0

End-to-end latency under lag: queue-wait p50 ~1 s, completion p50 ~4 s, exactly the depth of the buffer you've chosen to let build. The instant producers ease below the consume rate, that drains away. Latency under overload is a tunable (your target rate vs. consumer fleet), not a failure.

Correctness: messy workload, clean books

Volume

43.2M · Pushed 43,200,340 · Delivered 78,287,947 · Groups 2 (fan-out)

Errors

0 · Push 0 · Pop 0 · Ack 0

Reliability paths

DLQ ✓ · Redeliveries 302,643 · Poison → DLQ as designed · Restarts 0

Across 43.2M pushes and 78.3M deliveries, with 1% transient failures and a poison-message DLQ path active, the broker recorded zero push, pop, or ack errors. Failed messages were redelivered (at-least-once), poison messages reached the dead-letter queue, and per-partition order held throughout.
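The retry-then-dead-letter behaviour described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not Queen's delivery logic: it ignores leases, batching, and ordering, and only models the attempt count. The `process` function and the message `kind` field are hypothetical names introduced for the example.

```python
def process(msg, handler, retry_limit=5):
    """Deliver msg until the handler succeeds or retry_limit attempts are used.

    Returns ("completed", attempts) or ("dead-lettered", attempts).
    """
    for attempt in range(1, retry_limit + 1):
        if handler(msg, attempt):
            return "completed", attempt
    # Every attempt failed: the message is treated as poison and parked.
    return "dead-lettered", retry_limit


def handler(msg, attempt):
    """Hypothetical handler: poison always fails, transient fails once."""
    if msg["kind"] == "poison":
        return False
    if msg["kind"] == "transient":
        return attempt > 1  # fails on the first delivery, succeeds on redelivery
    return True


ok_result = process({"kind": "ok"}, handler)
transient_result = process({"kind": "transient"}, handler)
poison_result = process({"kind": "poison"}, handler)
```

A transient failure costs one extra delivery, while a poison message consumes all five attempts before leaving the main queue, so it cannot block the partition indefinitely.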

Storage & self-recovery: backpressure, not a spiral

The classic failure mode for a queue-on-a-database under sustained lag is the live table outgrowing RAM, after which inserts start faulting cold index pages from disk and throughput collapses. Queen 0.16's parallel retention, together with a generous `max_wal_size`, keeps that from happening.

Live table

17 → 4 GB · Peak (1 h) ~17 GB · vs cache < 24 GB · After drain ~4 GB

Host memory

~34 GB of 64 GB · OOM none · Cold reads none

Where the work goes

PG ~18/32 cores · Postgres ~17-20 cores · Broker ~1.3 cores · PG conns 306, steady

The backlog lived entirely in cache and reclaimed itself. The live table tracked the pending backlog up to ~17 GB (under the 24 GB shared_buffers, so reads never went cold) while completed messages were retention-pruned continuously. When producers stopped, the pending-retention window reclaimed the backlog and the table fell back to ~4 GB on its own.

Transactions kept up, even while the queue lagged

Exactly-once handoff is the deliberately heavy path: an atomic ack-and-push has to serialize its partition (advisory lock + commit-ordered stamp) so the ack and the downstream push commit together. That cost is the guarantee.

Atomic ack + push

1.75M · Committed 1,747,085 · Errors 7 (4e-6) · Stage-2 pending 0

The point

Even while the primary queue was running a sustained backlog, the transactional pipeline stayed fully drained (stage-2 pending held at zero) and committed 1.75M exactly-once handoffs with a 0.0004% error rate. The serializing, commit-bound path was the reliable part of the system under stress, not a bottleneck that fell over.

Configuration: exactly what ran

Broker: Queen 0.16 (pushser)

Host	32 vCPU · 64 GiB · dedicated
Image	`smartnessai/queen-mq:0.16`
`NUM_WORKERS`	10
`DB_POOL_SIZE`	50 (+250 sidecar)
`RETENTION_PARALLELISM`	8
Network	private VPC (loader off-host)

PostgreSQL 18

`shared_buffers`	24 GB
`effective_cache_size`	48 GB
`max_wal_size`	96 GB
`max_connections`	400
`work_mem` / `maintenance_work_mem`	32 MB / 1 GB
`synchronous_commit`	on

Loader: goload app-mode

Host	16 vCPU · 32 GiB · separate VM
Target push	12,000 msg/s (closed-loop)
Producers	300 · push-batch 20 · single-writer/session
Partitions	5,000 sessions · Zipf skew 1.1
Consumer groups	2 (fan-out)
Consumers	400 / group (800 total)
Pop	batch 100 · 10 partitions/pop · long-poll
Handler	20 ms ±50% · 1% slow @ 200 ms
Failures	1% transient · 0.1% poison → DLQ (retry 5)
Transactions	5% of group-0 → stage-2 (atomic ack+push)
Payload	256-2,048 B (variable JSON)
Lease / retention	30 s · completed 120 s · pending 3600 s
Duration	3,600 s

Why 12k? For this realistic profile (explicit ack + 2× fan-out), the flat-sustainable push rate on this single Postgres node is ~11k msg/s. The throughput ceiling is Postgres CPU on the pop/ack procedures, not the broker. Running at 12k puts the system deliberately ~10% over that line so we can watch it carry a real backlog. To sustain a higher flat rate, reduce per-message database work (server-side auto-ack to fuse pop+ack, or fewer consumer groups) rather than adding consumers.
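A rough back-of-envelope check shows why extra consumers would not raise the ceiling. The sketch below uses only the loader settings listed above and rests on simplifying assumptions that are not from the source: each consumer handles one message at a time, the ±50% jitter is symmetric so it leaves the mean unchanged, and pop/ack round-trips and retries are ignored.

```python
def mean_handler_ms(base_ms=20.0, slow_pct=1.0, slow_ms=200.0):
    """Mean handler time with a slow tail; symmetric jitter is assumed to cancel out."""
    slow = slow_pct / 100.0
    return (1.0 - slow) * base_ms + slow * slow_ms


def handler_capacity(consumers, mean_ms):
    """Messages per second a group could process if handlers were the only cost."""
    return consumers * 1000.0 / mean_ms


mean_ms = mean_handler_ms()                  # about 21.8 ms
per_group = handler_capacity(400, mean_ms)   # about 18.3k msg/s per group
needed_per_group = 12_000                    # each group receives every message
overload = 12_000 / 11_000 - 1               # about 9%, the "~10% over" in the text

print(f"mean handler: {mean_ms:.1f} ms")
print(f"handler capacity per group: {per_group:,.0f} msg/s")
print(f"push rate over sustainable: {overload:.1%}")
```

Under these assumptions each group of 400 consumers has handler headroom well above the 12k msg/s it must absorb, which is consistent with the limit sitting in per-message database work and not in the consumer fleet.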

Reproduce it

The load generator is goload in app-mode (built on the official Go client), in benchmark-queen/. The exact run:

goload -mode app -url http://<broker>:6632 \
  -queue appq -stage2-queue appq-stage2 \
  -target-rate 12000 -producers 300 -push-batch 20 \
  -sessions 5000 -skew 1.1 -consumer-groups 2 -consumers-per-group 400 \
  -pop-batch 100 -pop-partitions 10 -pop-wait -idle-conns 8192 \
  -process-ms 20 -process-jitter 0.5 -slow-msg-pct 1 -slow-msg-ms 200 \
  -fail-pct 1 -poison-pct 0.1 -retry-limit 5 \
  -tx-pct 5 -stage2-consumers 80 \
  -payload-min 256 -payload-max 2048 -lease-time 30 \
  -completed-retention 120 -pending-retention 3600 \
  -report 30 -duration 3600

Companion benchmark