Queen MQ

← All benchmarks

Matrix · Queen 0.16 (pushser) vs 0.14 · June 2026

Same suite, new engine: faster on 10 of 11 cells.

We re-ran the April benchmark matrix on Queen 0.16 (the pushser architecture) on the same 32 vCPU / 62 GiB host with the same PostgreSQL tuning and the same autocannon harness, so the only thing that changed is the broker. Baseline is Queen 0.14.0.alpha.3 from the April 2026 run. Full numbers and raw data: benchmark-queen/2026-06-07.

Headline

Cells faster vs 0.14 10 / 11 by 1.3× - 3.0×
Peak push 171.6k msg/s bp-100 · was 104.4k
Partition scaling flat to 10k ~33k msg/s · 0.14 decayed
Fan-out (10 cg) 2.94× vs 0.14 on the same cell
Same hardware, same Postgres, same client, only the broker changed. The pushser engine is faster on every cell but one, and its biggest gains are exactly where it matters: batched producers, high partition counts, and consumer-group fan-out.

Push throughput vs 0.14

Push throughput per cell (producer autocannon aggregate × messages-per-push), 300 s per cell, fresh database each time. factor = 0.16 ÷ 0.14.

CellSetup0.14 msg/s0.16 msg/s0.16 p50Factor
bp-150 conns, batch 15.8k17.5k2 ms3.02×
bp-1050 conns, batch 1039.1k85.1k4 ms2.18×
bp-10050 conns, batch 100104.4k171.6k18 ms1.64×
q-11 queue, batch 1039.1k84.7k4 ms2.17×
q-1010 queues, batch 1040.5k86.5k4 ms2.13×
bp-10-cg55 consumer groups26.9k68.0k5 ms2.53×
bp-10-cg1010 consumer groups17.9k52.7k7 ms2.94×
hi-part-10500 conns, 10 part27.9k24.7k23 ms0.89×
hi-part-100500 conns, 100 part26.0k33.8k13 ms1.30×
hi-part-1000500 conns, 1k part25.0k33.4k13 ms1.33×
hi-part-10000500 conns, 10k part21.0k32.5k13 ms1.55×
Biggest wins are the batched, low-concurrency cells (bp/q/cg: 2-3×), where the pushser push pipeline does the most work per round-trip. The one regression is hi-part-10 (−11%): few partitions under high connection count is the new engine's single weak spot.

Partition scaling: flat where 0.14 declined

Same producer load (500 connections, 1 msg/push), increasing the partition count from 10 to 10,000. In Queen, partitions are logical lanes, so the cost is index size, not state machine size, and the new engine flattens the curve.

Queen 0.16, holds flat

10 part24.7k
10033.8k
1 00033.4k
10 00032.5k

Queen 0.14, declines

10 part27.9k
10026.0k
1 00025.0k
10 00021.0k
This is an architectural win. 0.14 falls 27.9k → 21.0k msg/s as partitions grow 10 → 10,000; 0.16 stays ~32-34k flat. The advantage widens with scale, 1.55× at 10,000 partitions, which is exactly the "one partition per entity" regime Queen is built for.

Consumer-group fan-out

Same producer, increasing the number of consumer groups draining the queue:

Groups0.14 push msg/s0.16 push msg/sFactor
5 groups26.9k68.0k2.53×
10 groups17.9k52.7k2.94×
Fan-out gets relatively better as you add groups (2.53× → 2.94×): the pushser batching amortizes the extra per-group pop work more effectively than 0.14 did.

Resource efficiency

CPU as cores (of 32). The broker stays tiny; Postgres is where the work, and the ceiling, lives.

Cellmsg/sQueen CPUQueen RSSPG CPUPG RSS
bp-1085.1k2.046 MB8.713.5 GB
bp-100171.6k0.973 MB11.225.0 GB
bp-10-cg1052.7k2.3101 MB15.09.7 GB
hi-part-1000032.5k5.453 MB8.56.1 GB
The broker is lean everywhere, 32-106 MB RSS and ≤6 cores across the whole matrix. The headline: bp-100 pushes 171.6k msg/s on ~0.9 broker cores, big batches mean few HTTP requests, so almost no broker CPU; PG does the inserting.

The config lesson: match push-hold to your concurrency

The pushser push path fuses requests, holding briefly to batch them. At low client concurrency (this harness's 50 connections) a non-zero hold serializes the push lane; setting QUEEN_PUSH_MAX_HOLD_MS=0 pipelines it. The same knob, tuned to the workload, is the difference between 26k and 92k on bp-10:

Configpush msg/sp50p99
0.14 baseline39.1k11 ms38 ms
0.16, default hold26.0k18 ms48 ms
0.16, hold = 2 ms34.2k13 ms39 ms
0.16, hold = 092.5k4 ms31 ms
Tune the hold to the workload. Low-concurrency / batched clients want hold=0 (pipelines the push path). High-concurrency throughput, like the 24-hour soak with 650 producers, wants a larger hold so fusion packs many messages per Postgres commit (it sustained ~118k msg/s that way). All matrix numbers above use hold=0, the right setting for this harness.

Caveats

Companion benchmark

0.16 · 24-hour durability soak

10.4 billion messages, ~119k msg/s sustained, flat broker memory, zero loss.

Baseline

April 2026 archive (0.14)

The original 18-test suite, plus the head-to-head vs Kafka and RabbitMQ.

Raw data

benchmark-queen/2026-06-07

Per-cell logs, status/analytics JSON, docker stats, and the patched runner.