Same suite, new engine: faster on 10 of 11 cells.
We re-ran the April benchmark matrix on Queen 0.16 (the pushser
architecture) on the same 32 vCPU / 62 GiB host with the same
PostgreSQL tuning and the same autocannon harness, so the only thing that changed is
the broker. Baseline is Queen 0.14.0.alpha.3 from the
April 2026 run. Full numbers and raw data:
benchmark-queen/2026-06-07.
Headline
Push throughput vs 0.14
Push throughput per cell (producer autocannon aggregate × messages-per-push), 300 s per
cell, fresh database each time. Factor = 0.16 msg/s ÷ 0.14 msg/s.
| Cell | Setup | 0.14 msg/s | 0.16 msg/s | 0.16 p50 | Factor |
|---|---|---|---|---|---|
| bp-1 | 50 conns, batch 1 | 5.8k | 17.5k | 2 ms | 3.02× |
| bp-10 | 50 conns, batch 10 | 39.1k | 85.1k | 4 ms | 2.18× |
| bp-100 | 50 conns, batch 100 | 104.4k | 171.6k | 18 ms | 1.64× |
| q-1 | 1 queue, batch 10 | 39.1k | 84.7k | 4 ms | 2.17× |
| q-10 | 10 queues, batch 10 | 40.5k | 86.5k | 4 ms | 2.13× |
| bp-10-cg5 | 5 consumer groups | 26.9k | 68.0k | 5 ms | 2.53× |
| bp-10-cg10 | 10 consumer groups | 17.9k | 52.7k | 7 ms | 2.94× |
| hi-part-10 | 500 conns, 10 part | 27.9k | 24.7k | 23 ms | 0.89× |
| hi-part-100 | 500 conns, 100 part | 26.0k | 33.8k | 13 ms | 1.30× |
| hi-part-1000 | 500 conns, 1k part | 25.0k | 33.4k | 13 ms | 1.33× |
| hi-part-10000 | 500 conns, 10k part | 21.0k | 32.5k | 13 ms | 1.55× |
hi-part-10 (−11%): few partitions under high connection count is the new
engine's single weak spot.
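As a quick check of the arithmetic behind the table, the sketch below recomputes the factor column and the "10 of 11" headline from the published (rounded) msg/s values. The helper names `push_throughput` and `factor` are illustrative, not part of Queen or the benchmark runner, and because the inputs are rounded to 0.1k, a recomputed factor can differ from the table by 0.01.

```python
# Push throughput in k msg/s, copied from the table above: (0.14, 0.16).
CELLS = {
    "bp-1": (5.8, 17.5),
    "bp-10": (39.1, 85.1),
    "bp-100": (104.4, 171.6),
    "q-1": (39.1, 84.7),
    "q-10": (40.5, 86.5),
    "bp-10-cg5": (26.9, 68.0),
    "bp-10-cg10": (17.9, 52.7),
    "hi-part-10": (27.9, 24.7),
    "hi-part-100": (26.0, 33.8),
    "hi-part-1000": (25.0, 33.4),
    "hi-part-10000": (21.0, 32.5),
}


def push_throughput(requests_per_sec: float, messages_per_push: int) -> float:
    """Messages per second = producer requests per second x messages per push."""
    return requests_per_sec * messages_per_push


def factor(old: float, new: float) -> float:
    """Speedup of 0.16 over 0.14; a value below 1.0 is a regression."""
    return new / old


factors = {cell: factor(old, new) for cell, (old, new) in CELLS.items()}
faster = [cell for cell, f in factors.items() if f > 1.0]
slower = [cell for cell, f in factors.items() if f < 1.0]

for cell, f in factors.items():
    print(f"{cell:<14} {f:.2f}x")
print(f"faster on {len(faster)} of {len(factors)} cells; slower: {slower}")
```

The only cell that lands below 1.0 is hi-part-10, which matches the note above.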
Partition scaling: flat where 0.14 declined
Same producer load (500 connections, 1 msg/push), increasing the partition count from 10 to 10,000. In Queen, partitions are logical lanes, so the cost is index size, not state machine size, and the new engine flattens the curve.
[Chart: push throughput vs partition count. Queen 0.16 holds flat; Queen 0.14 declines.]
Consumer-group fan-out
Same producer, increasing the number of consumer groups draining the queue:
| Groups | 0.14 push msg/s | 0.16 push msg/s | Factor |
|---|---|---|---|
| 5 groups | 26.9k | 68.0k | 2.53× |
| 10 groups | 17.9k | 52.7k | 2.94× |
Resource efficiency
CPU as cores (of 32). The broker stays tiny; Postgres is where the work, and the ceiling, lives.
| Cell | msg/s | Queen CPU | Queen RSS | PG CPU | PG RSS |
|---|---|---|---|---|---|
| bp-10 | 85.1k | 2.0 | 46 MB | 8.7 | 13.5 GB |
| bp-100 | 171.6k | 0.9 | 73 MB | 11.2 | 25.0 GB |
| bp-10-cg10 | 52.7k | 2.3 | 101 MB | 15.0 | 9.7 GB |
| hi-part-10000 | 32.5k | 5.4 | 53 MB | 8.5 | 6.1 GB |
bp-100 pushes 171.6k msg/s on ~0.9 broker cores. Big batches mean few HTTP
requests, so the broker spends almost no CPU; PG does the inserting.
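The link between batch size and broker CPU can be made concrete with a rough calculation. The sketch below divides each cell's msg/s by its messages-per-push to estimate HTTP requests per second, then sets that beside the broker cores from the table. Batch sizes are taken from the setup descriptions elsewhere in this report (batch 10, batch 100, and 1 msg/push for the hi-part cells); the function names are illustrative only.

```python
# cell: (msg/s, messages per push, Queen CPU cores), from the tables in this report.
ROWS = {
    "bp-10": (85_100, 10, 2.0),
    "bp-100": (171_600, 100, 0.9),
    "hi-part-10000": (32_500, 1, 5.4),
}


def http_requests_per_sec(msgs_per_sec: float, messages_per_push: int) -> float:
    """Each push is one HTTP request carrying messages_per_push messages."""
    return msgs_per_sec / messages_per_push


def msgs_per_broker_core(msgs_per_sec: float, broker_cores: float) -> float:
    """Messages per second handled per core of broker CPU."""
    return msgs_per_sec / broker_cores


for cell, (msgs, batch, cores) in ROWS.items():
    reqs = http_requests_per_sec(msgs, batch)
    per_core = msgs_per_broker_core(msgs, cores)
    print(f"{cell:<14} {reqs:>8.0f} req/s  {cores:.1f} cores  {per_core:>9.0f} msg/s per core")
```

On these numbers, broker CPU rises with request rate rather than message rate: bp-100 moves the most messages with the fewest requests and the least broker CPU, while hi-part-10000 moves the fewest messages with the most requests and the most broker CPU.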
The config lesson: match push-hold to your concurrency
The pushser push path fuses requests, holding briefly to batch them. At low client
concurrency (this harness's 50 connections) a non-zero hold serializes the push
lane; setting QUEEN_PUSH_MAX_HOLD_MS=0 pipelines it. The same knob, tuned to
the workload, is the difference between 26k and 92k on bp-10:
| Config | push msg/s | p50 | p99 |
|---|---|---|---|
| 0.14 baseline | 39.1k | 11 ms | 38 ms |
| 0.16, default hold | 26.0k | 18 ms | 48 ms |
| 0.16, hold = 2 ms | 34.2k | 13 ms | 39 ms |
| 0.16, hold = 0 | 92.5k | 4 ms | 31 ms |
Low-concurrency harnesses like this one want hold=0 (it pipelines the push path). High-concurrency throughput, like the
24-hour soak with 650 producers, wants a larger
hold so fusion packs many messages per Postgres commit (it sustained ~118k msg/s that
way). All matrix numbers above use hold=0, the right setting for this harness.
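To put the hold table in relative terms, the sketch below computes each 0.16 configuration against the 0.14 baseline and against the default hold. It also shows one way the knob might be passed to a broker process through its environment. `broker_env` is a hypothetical helper, and how Queen is actually launched is not covered in this report; only the variable name `QUEEN_PUSH_MAX_HOLD_MS` comes from the text above.

```python
import os

BASELINE_014 = 39.1  # k msg/s on bp-10, 0.14 baseline
CONFIGS_016 = {      # k msg/s on bp-10, from the table above
    "default hold": 26.0,
    "hold = 2 ms": 34.2,
    "hold = 0": 92.5,
}

# Throughput of each 0.16 configuration relative to the 0.14 baseline.
vs_baseline = {name: rate / BASELINE_014 for name, rate in CONFIGS_016.items()}

# How much the knob alone changes 0.16 on this harness.
hold0_vs_default = CONFIGS_016["hold = 0"] / CONFIGS_016["default hold"]


def broker_env(hold_ms: int) -> dict:
    """Hypothetical helper: copy the current environment with the push hold set."""
    env = dict(os.environ)
    env["QUEEN_PUSH_MAX_HOLD_MS"] = str(hold_ms)
    return env


for name, ratio in vs_baseline.items():
    print(f"{name:<13} {ratio:.2f}x of 0.14")
print(f"hold = 0 vs default hold: {hold0_vs_default:.2f}x")
```

On this harness the default hold leaves 0.16 below the 0.14 baseline, and hold=0 is what moves it above.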
Caveats
- This is the low-concurrency autocannon harness (50–500 connections). It measures per-cell push efficiency, not the engine's sustained ceiling; for that, see the 24-hour soak (~118k msg/s sustained, balanced).
- The metric is push throughput (producer side); pop throughput is recorded per cell in the raw consumer logs.
- q-100 is omitted (no April baseline, and the producer log didn't record cleanly). A handful of April-only cells weren't re-run.
- Errors were negligible across all cells (non2xx ≤ 2 per run, ~3×10⁻⁵%).
0.16 · 24-hour durability soak
10.4 billion messages, ~119k msg/s sustained, flat broker memory, zero loss.
April 2026 archive (0.14)
The original 18-test suite, plus the head-to-head vs Kafka and RabbitMQ.
benchmark-queen/2026-06-07
Per-cell logs, status/analytics JSON, docker stats, and the patched runner.
