Queen MQ
Benchmarks

18 tests. 1.6 billion message events. Zero loss.

All numbers on this page come from benchmark sessions run on 2026-04-25 and 2026-04-26 on a fresh 32 vCPU / 62 GiB host with PostgreSQL upstream postgres:latest, Queen 0.14.0.alpha.3, Apache Kafka 3.7 (KRaft single-node), and RabbitMQ 3.12. Each test ran for 15 minutes against a fresh server state. The full raw data (per-minute time series, Postgres stats, system metrics, autocannon and perf-test output) lives in benchmark-queen/2026-04-26.

Headline

| Metric | Result | Notes |
|---|---|---|
| Push (batch=100) | 104.4k msg/s | single producer · 1×50 conns |
| Push p99 (batch=10) | 38 ms | at 39k msg/s sustained |
| Fan-out (10 cg) | 165k msg/s total deliveries | 9.3× pop multiplier |
| Server RSS at peak | 52 MB | vs Kafka 3.1–7.2 GB |
| Partition density | 10 001 partitions | 21k msg/s sustained · zero degradation |
| DB pool active | 2.5 / 50 | Vegas finds the right number itself |
| PG cache hit rate | 99.99 % | across the entire suite |
| Lost messages | 0 | across 1.6 billion events |

Test environment

| Component | Configuration |
|---|---|
| Host | 32 vCPU · 62 GiB RAM · no swap · Ubuntu 24.04 (kernel 6.8) |
| PostgreSQL | postgres:latest · shared_buffers=24 GB · effective_cache_size=48 GB · autovacuum_naptime=10s · autovacuum_vacuum_scale_factor=0.05 |
| Queen | 0.14.0.alpha.3 · NUM_WORKERS=10 · DB_POOL_SIZE=50 · SIDECAR_POOL_SIZE=250 · nofile=65535 |
| Cleanup | docker rm -v + docker volume prune -f before each test; every test starts with an empty database |
| Duration | 15 min (900 s) per test |
| Client | autocannon v8 on Node 22, same host |
| Payload | ~28 bytes per message (matched across all systems compared) |
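For a sense of how the client side was driven, here is a minimal sketch using autocannon's programmatic API. It is illustrative only: the endpoint path, request body shape, and queue name are assumptions, not Queen's documented HTTP API; the real runner scripts live in benchmark-queen/2026-04-26.

```typescript
// Illustrative load-generation sketch (bp-10 shape: 1×50 connections, batch=10, 15 min).
// The URL and payload format are assumptions, not the documented Queen endpoint.
import autocannon from 'autocannon';

async function runPushLoad() {
  const result = await autocannon({
    url: 'http://localhost:8080/queues/bench/messages', // hypothetical push endpoint
    method: 'POST',
    connections: 50,   // 1×50 conns
    pipelining: 1,
    duration: 900,     // 15 minutes
    headers: { 'content-type': 'application/json' },
    // batch=10: ten ~28-byte messages per HTTP request
    body: JSON.stringify({
      messages: Array.from({ length: 10 }, (_, i) => ({ payload: `bench-message-${i}-0000000000` })),
    }),
  });
  console.log(`req/s avg: ${result.requests.average}, latency p99: ${result.latency.p99} ms`);
}

runPushLoad().catch(console.error);
```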

Throughput & latency together

Throughput numbers without latency are half the story. Here are the four push configurations side by side, with both the producer and consumer latency profiles:

| Test | Setup | Push msg/s | p50 / p99 push | p50 / p99 pop | Queen CPU |
|---|---|---|---|---|---|
| bp-1 | 1×50, batch=1 | 5 796 | 8 / 13 ms | 11 / 355 ms | 5.2 vCPU |
| bp-10 | 1×50, batch=10 | 39 060 | 11 / 38 ms | 16 / 356 ms | 7.4 vCPU |
| bp-100 | 1×50, batch=100 | 104 400 | 45 / 131 ms | 35 / 254 ms | 19 vCPU |
| hi-part-1 | 5×100 conns, batch=1 | 29 487 | 15 / 44 ms | 59 / 4 199 ms | 27 vCPU |

The latency story is the surprising one. Going from bp-1 to bp-10, push throughput jumps 6.7× while p99 only goes from 13 ms to 38 ms, roughly a 2.2× efficiency gain per request. At the 100k msg/s peak, p99 is still 131 ms. Compare with Kafka under sustained unbounded load: 1.5 M msg/s, but a p99 of 2 966 ms; Queen at 100k msg/s has ~22× lower tail latency than Kafka at its peak.

Partition scaling, the Queen claim, validated

Same producer concurrency (5×100 connections), with the partition count increased from 2 to 10 001. If partitions were physical commit logs, this would degrade catastrophically. In Queen they are logical lanes, so the cost is index size, not state-machine size.

| Partitions | Push msg/s |
|---|---|
| 2 | 29 487 |
| 11 | 27 902 |
| 101 | 26 041 |
| 1 001 | 25 016 |
| 10 001 | 21 044 |

Partition count goes up by 5 000×; throughput drops by 29%. Zero errors at every scale. This is the curve that lets Queen offer "one partition per chat conversation" in production without operational drama.
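To make the claim concrete, here is what "one partition per chat conversation" might look like from application code. The client surface shown (Queen, push, its options) is a hypothetical sketch for illustration, not the documented queen-mq SDK; the pattern is the point: the partition key is the conversation id.

```typescript
// Hypothetical sketch: partition = conversation id, so each conversation keeps FIFO order
// while the broker treats partitions as cheap logical lanes rather than physical logs.
// The client API shown here is assumed, not the documented queen-mq surface.
import { Queen } from 'queen-mq'; // assumed import

const queen = new Queen({ url: 'http://localhost:8080' });

export async function onChatMessage(conversationId: string, text: string) {
  await queen.push('chat-events', {
    partition: conversationId, // tens of thousands of live lanes is fine, per the table above
    payload: { conversationId, text, at: Date.now() },
  });
}
```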

Time-series stability, what 15 minutes of bp-10 looks like

Throughput averages don't tell you whether a system is stable or oscillating. Below is the per-minute pop rate from bp-10's own queue-ops.json output, a rendering of the actual time series.

[Chart] bp-10 · pop msg/s · 15 min run · per-minute aggregates. Y-axis 10k–50k msg/s, mean ≈ 38k; x-axis minutes 1–15, ramping down at the end.

Per-minute throughput bounces between 35k and 46k msg/s for 14 minutes, then ramps down as the producer hits its target. No drift, no degradation, no warm-up artifact. The same is true of every sustained test in the suite.

Consumer-group fan-out

| Groups | Push msg/s | Total deliveries | Pop / push | Per-group p99 | Pending at end |
|---|---|---|---|---|---|
| 1 | 39 060 | 38 351 msg/s | 1.0× | 356 ms | 1.5% |
| 5 | 26 890 | 127 777 msg/s | 4.9× | 315 ms | 0 |
| 10 | 17 890 | 165 480 msg/s | 9.3× | 471 ms | 0 |

Fairness across groups holds at every scale tested. At 10 consumer groups, all 10 deliver 360 ± 1 req/s with p99s within 5 ms of each other. Dispatch is fair: no group gets favored, none gets starved. Per-group CPU cost is sub-linear: Queen needs 7.4 vCPU for 1 group but only 18 vCPU for 10 groups, about 25% of the per-group cost of running them alone, thanks to libqueen's per-JobType batching amortizing work across requests.

The cost: adding consumer groups slows the producer by 31% at 5 groups and 54% at 10. The groups are competing for Queen worker slots and PG resources, by design; the system isn't oversubscribed. If you need higher push throughput while consuming, add hardware.
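For readers unfamiliar with the semantics: each consumer group receives every message once, independently of the others. The sketch below illustrates that shape with an assumed client API; the method names and options are not the documented queen-mq surface.

```typescript
// Fan-out sketch: each consumer group drains its own copy of the stream.
// Client API (pop/ack signatures, option names) is assumed for illustration.
import { Queen } from 'queen-mq'; // assumed import

const queen = new Queen({ url: 'http://localhost:8080' });

async function runGroup(group: string) {
  for (;;) {
    // Long-poll a batch for this group; other groups receive the same messages separately.
    const batch = await queen.pop('orders', { consumerGroup: group, batch: 100, waitMs: 5000 });
    if (batch.length === 0) continue;
    await handle(group, batch); // application work
    await queen.ack('orders', { consumerGroup: group, messages: batch });
  }
}

async function handle(group: string, batch: unknown[]) { /* ... */ }

// Ten groups like this is what turns 17.9k msg/s of pushes into 165k msg/s of deliveries.
['analytics', 'billing', 'audit'].forEach(g => runGroup(g).catch(console.error));
```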

Multi-tenancy is essentially free

Same total client load, distributed across one queue vs ten queues:

| Test | Setup | Push msg/s | Pop msg/s | Δ |
|---|---|---|---|---|
| q-1 | 1 queue, 1×50, batch=10, MP=1000 | 39 130 | ~38 000 | baseline |
| q-10 | 10 queues, same total load | 40 500 | 38 540 | +3.5% / +1.4% |

Routing to ten different queues instead of one was actually slightly faster (less contention on a single partition's advisory lock), though the difference is within noise. Multi-tenant deployments don't pay a partitioning tax.

Realistic pipeline, producer → worker → q2 fan-out

The throughput numbers above are pure broker-side push/pop micro-benchmarks. Most real workloads look different: multiple stages, real client SDK with batching and long-poll, work simulation between stages, multiple downstream consumer groups. This benchmark exercises that shape end-to-end with the official queen-mq JS client and pm2:

[producer ×2] ──push──▶  pipe-q1  ──pop──▶  [worker ×7] ──push──▶  pipe-q2 ─┬─pop──▶ [analytics ×7]
                                                                              │
                                                                              └─pop──▶ [log ×7]

Producer pushes single messages per HTTP call. Each worker pops a batch from q1, simulates 5–20 ms of long-tail per-message work in parallel via Promise.all, forwards the batch to q2 preserving per-partition ordering, then acks q1 (at-least-once: separate push then ack, no transactional commit). Both analytics and log consumer groups drain q2 with the same shape, fan-out, simulated work, batch ack. The whole thing runs on a 16 vCPU / 64 GB DigitalOcean VM with PG synchronous_commit=on.
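The worker stage is the interesting one. The sketch below shows its shape against an assumed client API; the real pipeline code and exact SDK calls live in pipeline-queen.md, so treat the method names and options here as illustrative.

```typescript
// Worker stage sketch: pop a batch from q1, do parallel simulated work, forward to q2
// on the same partition, then ack q1. Client API names are assumed, not documented.
import { Queen } from 'queen-mq'; // assumed import

const queen = new Queen({ url: 'http://localhost:8080' });
const simulateWork = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function workerLoop() {
  for (;;) {
    const batch = await queen.pop('pipe-q1', { batch: 100, partitionsPerPop: 10, waitMs: 5000 });
    if (batch.length === 0) continue;

    // 5–20 ms of long-tail per-message work, run in parallel across the batch.
    await Promise.all(batch.map(() => simulateWork(5 + Math.random() * 15)));

    // Forward on the same partition so per-entity ordering survives the hop to q2.
    await queen.push('pipe-q2', batch.map((msg: any) => ({ partition: msg.partition, payload: msg.payload })));

    // Ack q1 last. A crash between push and ack re-delivers the batch: duplicates are
    // possible, loss is not. That is the at-least-once contract the benchmark verifies.
    await queen.ack('pipe-q1', batch);
  }
}

workerLoop().catch(console.error);
```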

Same engine, same client, same pipeline shape, two configurations of the partition knob:

High-cardinality config

1 000 partitions · per-entity ordering

  • 1 000 partitions, batch = 100, partitions/pop = 10
  • Producer = worker drain = 3 688 msg/s
  • End-to-end p50 / p99 = 359 ms / 1 024 ms
  • Per-entity FIFO ordering preserved (Kafka-like)
  • For: chat rooms, per-tenant streams, per-user state

Throughput-tuned config

10 partitions · weak ordering, max throughput

  • 10 partitions, batch = 1 000, partitions/pop = 1
  • Producer = worker drain = 6 673 msg/s (+81 %)
  • End-to-end p50 / p99 = 755 ms / 1 103 ms
  • Per-shard FIFO (RabbitMQ-like, but ordering still per-lane)
  • For: task queues, work distribution, log shipping

| Metric | Result | Notes |
|---|---|---|
| Best end-to-end throughput | 6 673 msg/s | producer = worker drain · q1 stays drained |
| End-to-end p99 (both configs) | ~1 100 ms | producer push → analytics ack |
| Delivery completeness | 99.9 % | a few thousand in-flight at cutoff |
| Duplicate processing | 0 | at-least-once held end-to-end · both configs |

| Stage | 1 000-part p50 / p99 / max | 10-part p50 / p99 / max |
|---|---|---|
| End-to-end (producer → analytics) | 359 / 1 024 / 15 464 ms | 755 / 1 103 / 1 747 ms |
| q1 → worker | 213 / 705 / 15 387 ms | 440 / 745 / 1 010 ms |
| q2 → analytics | 114 / 514 / 3 198 ms | 312 / 601 / 999 ms |

Partition count is a continuous knob, not a binary choice. Slide it toward many partitions for per-entity ordering at lower throughput; slide it toward few partitions for higher throughput with weak (per-shard) ordering. Both modes share the same SDK, same C++ engine, same Postgres, same durability tier. 0 duplicates and 0 lost messages in both runs. The 10-partition config also gives 9× tighter max latency (1.75 s vs 15.5 s) because every lane is always being drained.

| Resource (steady state) | 1 000-part config | 10-part config |
|---|---|---|
| queen container CPU | ~390 % (3.9 vCPU) | ~340 % (3.4 vCPU) |
| queen container RSS | ~70 MB | ~175 MB (bigger in-flight batches) |
| postgres CPU | ~1 500 % (15 vCPU) | ~470 % (4.7 vCPU, 3.2× cheaper) |
| postgres RSS at end | 16.7 GB | 12.3 GB |

In the throughput-tuned config, Postgres CPU is 3.2× lower at nearly double the throughput: bigger batches amortize the per-call SQL overhead so much that the per-message PG cost collapses. The system is then producer-bound, not PG-bound; adding more producer processes would push the rate well into the 30–50 k msg/s range without changing anything broker-side. Full writeups with reproduction recipes: pipeline-queen.md (high-cardinality) and pipeline-queen-throughput.md (throughput-tuned).
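The knob itself is just configuration. As a rough sketch (field names are illustrative, not the real pipeline config schema), the two runs above differ only in these values:

```typescript
// Illustrative only: field names are assumptions; the measured numbers are from the two runs above.
const highCardinalityConfig = {
  partitions: 1_000,  // one lane per entity → per-entity FIFO
  popBatch: 100,
  partitionsPerPop: 10,
  // measured: 3 688 msg/s end-to-end, p99 ≈ 1.02 s, postgres ≈ 15 vCPU
};

const throughputTunedConfig = {
  partitions: 10,     // few wide lanes → per-shard FIFO only
  popBatch: 1_000,
  partitionsPerPop: 1,
  // measured: 6 673 msg/s end-to-end, p99 ≈ 1.10 s, postgres ≈ 4.7 vCPU
};
```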

Queen vs Kafka vs RabbitMQ, same hardware, same payload

Single-node, 1 000 partitions/queues, persistent durability, 15-min run, ~28-byte message payload, same 32 vCPU / 62 GiB host. All three were measured directly: Kafka with kafka-producer-perf-test.sh, RabbitMQ with the official pivotalrabbitmq/perf-test. Each system uses its own native protocol (HTTP/JSON for Queen, the Kafka wire protocol for Kafka, AMQP for RabbitMQ).

Test methodology, important asymmetries you should know about

Each system was tested in its idiomatic high-throughput client configuration, not with literally identical client settings. The protocols differ enough that "same number of connections" doesn't mean the same thing in each system. Specifically:

| System | Client config | Effective concurrency |
|---|---|---|
| Queen | autocannon, 50 HTTP connections, batch=10, pipelining=1 | ~50 in-flight HTTP requests |
| Kafka | 1 producer process, linger.ms=10, batch.size=16384, max.in.flight=5 | ~5 in-flight batched requests on 1 TCP connection |
| RabbitMQ | 1 producer process, --confirm 200 | ~200 unconfirmed publishes on 1 AMQP connection |

Durability tiers are also not identical:

  • Queen: synchronous_commit=on, fsync of WAL before HTTP 201. Strictest.
  • Kafka: acks=1, broker writes to OS page cache, no fsync. If the broker crashes before the OS flushes (~1 s), messages can be lost. A single broker also means no replication to fall back on.
  • RabbitMQ: delivery_mode=2 + --confirm 200, written to queue index on disk before confirm, but flushes are batched at the index level.

What this means for the comparison: re-running with matched client concurrency (50 producers each) would likely push Kafka above 2 M msg/s and RabbitMQ to roughly 80–120 k msg/s. Tightening Kafka's durability to fsync-per-message would drop it to perhaps 100–300 k msg/s. The numbers shown are each system at its idiomatic high-throughput config, with its idiomatic durability tier, a defensible but not exhaustive choice. Queen's memory and architectural advantages hold regardless; the throughput numbers are the most sensitive to client setup.

| Metric | Queen MQ bp-10 | Kafka 3.7 kafka-1000p | RabbitMQ 3.12 rabbitmq-1000q |
|---|---|---|---|
| Throughput | 39k msg/s | 1.52M msg/s | 34.7k msg/s |
| p99 (push / confirm) | 38 ms | 2 966 ms | 9.3 ms |
| Server memory | 52 MB RSS | 3.1–7.2 GB heap | 188 MB RSS |
| CPU | 7.4 vCPU | 3.5 vCPU | 1.5 vCPU |
| Disk | ~400 B per msg | ~36 B per msg | ~9 GB written |
| Per-key order | native | native | 1 queue per key |
| Replay | timestamp / offset | timestamp / offset | streams only |
| Ops surface | 1 binary + PG | broker + KRaft | broker + Erlang |

Honest summary. Kafka does 39× more throughput at 1.5 M msg/s, but at a weaker durability tier (acks=1, no fsync), with 78× higher saturation latency and ~80× more memory. RabbitMQ ties Queen on throughput (35k vs 39k msg/s) and decisively wins on latency (9 ms vs 38 ms p99 confirm) and CPU (1.5 vs 7.4 vCPU): AMQP binary + Mnesia is cheaper per message than HTTP/JSON + Postgres INSERTs. Queen wins on memory (52 MB vs 188 MB, ~3.6× lighter), on the strictest default durability, and on architectural features no benchmark can show: per-key ordering with parallel consumers, replay-from-timestamp, transactional integration with PG, dynamic high-cardinality partitions. None of the three dominates on raw numbers at this tier; pick based on architectural fit and the operational story.

Detailed cross-system reports: vs-kafka.md · vs-rabbitmq.md

What broker benchmarks miss, your workers are usually the bottleneck

Every benchmark on this page measures broker capacity. But most production workloads aren't broker-bound. If your messages do real work (a database write, an API call, an LLM inference, a webhook delivery), your consumer fleet becomes the bottleneck long before the broker does. That changes which system matters, and by how much.

Here's the math at 20 ms of work per message (representative of a typical DB write or moderate API call). One worker processes 50 msg/s. Useful throughput equals workers × 50 msg/s:

| Target msg/s | Workers needed | Cost-of-fleet order of magnitude* |
|---|---|---|
| 5 000 msg/s | 100 | ~$500 / month |
| 10 000 msg/s | 200 | ~$1 k / month |
| 39 000 msg/s (Queen bp-10 ceiling) | 780 | ~$5 k / month |
| 104 000 msg/s (Queen peak, batch=100) | 2 080 | ~$10 k / month |
| 500 000 msg/s | 10 000 | ~$50 k / month |
| 1 500 000 msg/s (Kafka measured) | 30 000 | ~$150 k / month |

*Rough estimate at small-EC2 / small-container per-worker pricing. Real cost depends on your stack and what each worker does.
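The arithmetic behind the table is easy to sanity-check yourself; the sketch below assumes the same 20 ms of work per message and nothing else.

```typescript
// Fleet sizing from first principles: one worker sustains 1000 / workMs msg/s,
// so the fleet is targetRate / perWorkerRate. At workMs = 20, that's 50 msg/s per worker.
function workersNeeded(targetMsgPerSec: number, workMs = 20): number {
  const perWorkerMsgPerSec = 1000 / workMs;
  return Math.ceil(targetMsgPerSec / perWorkerMsgPerSec);
}

console.log(workersNeeded(39_000));    // 780    → saturating the Queen bp-10 ceiling
console.log(workersNeeded(1_500_000)); // 30 000 → saturating Kafka's measured rate
```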

Kafka's 1.5 M msg/s is mostly unreachable headroom for real workloads. Saturating it requires ~30 000 worker processes, a fleet most companies will never run. Real-world business workloads typically run on 100–2 000 workers, which means ~5 k–100 k msg/s of actual demand. That's well within Queen's envelope, with Kafka's broker at ~5 % utilization where Queen's runs at 25–80 %.

Said differently: at a real workload of 5 k msg/s, all three systems sit at low utilization on the broker side:

| System | Broker ceiling | Utilization at 5 k msg/s with 100 workers |
|---|---|---|
| Queen bp-10 | 39 k msg/s | 13 % |
| RabbitMQ classic 1000q | 35 k msg/s | 14 % |
| Kafka 3.7 single-node | 1.5 M msg/s | 0.3 % |

When the broker isn't the bottleneck, raw broker throughput stops being the deciding factor.

The broker comparison only matters when work per message is sub-millisecond: true streaming pipelines (Kafka Streams, Flink, ksqlDB), log shipping, click-stream analytics. For those, Kafka is the right tool and nothing else competes. For everything else (order processing, notifications, webhooks, ML inference jobs, chat handling, workflow steps), the broker sits idle, the worker fleet sets your cost, and the differentiator is what's easy to operate and integrate.

The honest framing. Queen's 39 k–104 k msg/s broker is a ceiling you can actually use for most production workloads. Kafka's 1.5 M is mostly headroom you'll never reach. RabbitMQ is in Queen's territory but with a different feature set. If your messages do real work, pick the system that's easiest to operate and integrate, not the one with the biggest headline number.

Resource efficiency

The most consistent signal across all 18 tests: Queen is small.

| Metric | bp-1 | bp-10 | bp-100 (peak) | bp-10-cg10 (10 groups) |
|---|---|---|---|---|
| Queen RSS max | 30 MB | 52 MB | 72 MB | 169 MB |
| Queen CPU avg | 5.2 vCPU | 7.4 vCPU | 19 vCPU | 18 vCPU |
| DB pool active avg | 2.4 | 2.4 | 2.7 | 2.7 |
| PG cache hit rate | 100% | 100% | 99.99% | 100% |
| messages_consumed table | | 84 MB | | 743 MB |

The one operational caveat: the messages_consumed table grows fast under fan-out, reaching 743 MB after 15 minutes at 10 consumer groups (~70 GB/day sustained). TTL retention is critical for any deployment running more than a few days; configure completedRetentionSeconds on hot queues.
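As an illustration (the configuration call and option shape below are assumptions; only completedRetentionSeconds is named by the text), a short retention window keeps the table bounded:

```typescript
// Hypothetical sketch: configureQueue is an assumed client method; completedRetentionSeconds
// is the setting named above. At ~70 GB/day of growth under 10-group fan-out, a 1-hour
// window keeps messages_consumed around 3 GB instead of letting it grow unbounded.
import { Queen } from 'queen-mq'; // assumed import

const queen = new Queen({ url: 'http://localhost:8080' });

export async function configureRetention() {
  await queen.configureQueue('orders', {
    completedRetentionSeconds: 3600, // evict consumed/acked rows after 1 hour
  });
}
```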

Adaptive concurrency in action

Queen ships with DB_POOL_SIZE=50 by default. Across the entire benchmark suite, including the 100k msg/s peak test, the libqueen Vegas controller kept the active connection count at ~2.5. The other 47 connections sit idle in the pool, available as overflow for transient spikes.

Configured: 50 · Active avg: 2.5

This is exactly what the TCP-Vegas-style controller is supposed to do: when adding in-flight work doesn't reduce RTT, the controller knows the pipe isn't congested and holds concurrency where it is. You can't manually tune your way to a better number for this workload; the controller already found it. See the architecture page for the math.
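For intuition, here is a simplified Vegas-style decision rule. This is a sketch only, not libqueen's actual controller: estimate how many requests are queueing from the gap between the best-case RTT and the observed RTT, and only grow the in-flight budget while that gap stays near zero.

```typescript
// Simplified TCP-Vegas-style concurrency controller (illustrative, not libqueen's code).
class VegasLimiter {
  private limit = 4;            // current in-flight budget (out of the pool of 50)
  private baseRttMs = Infinity; // best RTT seen ≈ uncongested service time

  onSample(rttMs: number): void {
    this.baseRttMs = Math.min(this.baseRttMs, rttMs);
    const expected = this.limit / this.baseRttMs;        // req/ms if nothing is queueing
    const actual = this.limit / rttMs;                   // req/ms actually achieved
    const queued = (expected - actual) * this.baseRttMs; // ≈ requests sitting in a queue

    if (queued < 1) this.limit += 1;                               // no queueing: probe upward
    else if (queued > 3) this.limit = Math.max(1, this.limit - 1); // queue building: back off
    // between the thresholds: hold, which is the steady state the benchmark observed at ~2.5
  }

  get inFlightBudget(): number { return this.limit; }
}
```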

0.14 vs 0.12, adaptive engine impact

Same five tests, run once on each version, fresh DB:

| Test | 0.14 push | 0.12 push | 0.14 pop | 0.12 pop |
|---|---|---|---|---|
| bp-10 | 39 060 | 31 820 | 38 351 | 17 149 |
| bp-100 | 104 400 | 64 400 | 101 675 | 61 279 |
| hi-part-1 | 29 487 | 13 696 | 27 849 | 3 194 |
| hi-part-10000 | 21 044 | 17 331 | 17 825 | 3 643 |
| q-10 | 40 500 | 31 610 | 38 540 | 29 555 |

Pop throughput improved 80–90% under partition contention, the single biggest win from the libqueen rewrite. PG memory usage is 30–70% lower for the same workload, and the PG deadlock failure mode under heavy fan-out is eliminated.

Data integrity, the headline number

Short suite

~3B events · 18 tests · 15 min each · ackFailed: 0 · DLQ messages: 0

Long-running test

1.5B messages · 14 hours · sustained ~28k msg/s · messages table 35 GB at end · lost: 0

Combined

1.6B+ events · hardware crashes: 0 · PG outages: 0 · data loss: 0 · failover replay: 100%

Bugs we found

Honest accounting. Four issues surfaced during the run; all four are cosmetic, none caused data loss. Listed here because a benchmark page that doesn't tell you what broke isn't useful.

| Bug | When it fires | Severity | Fix |
|---|---|---|---|
| StatsService.refresh_all_stats_v1 30 s timeout | Sustained ≥30k msg/s with many partitions, or multi-cg load | Cosmetic; advisory lock prevents pile-up | One line: SET LOCAL statement_timeout = 0; |
| evict_expired_waiting_messages 30 s timeout | 10 consumer-group load only | Cosmetic | Same one-line fix in the retention procedure |
| PG deadlock detected during high-concurrency push | 10 001 partitions on 0.12 (mostly fixed in 0.14) | None; file-buffer failover catches everything | 0.14's v3 push procedure eliminates most cases |
| queue-ops reports pushMessages: 0 | Always (in 0.14.0.alpha.3) | Cosmetic; reporting only | Bug in per-queue stats aggregator |

Reproduce it yourself

All raw data, runner scripts, configs, and analysis live in the repo: benchmark-queen/2026-04-26. Each test directory has the per-consumer logs, queue-ops time-series JSON, postgres stats, system metrics, and final docker-stats output. The HOW-TO-RUN.md walks through reproducing the entire suite step by step (Docker images, PostgreSQL tuning, autocannon configuration).

Sizing

How much Postgres do I need?

Interactive calculator turning your target msg/s into a PG vCPU budget, derived directly from these benchmarks. ~10 k PG ops/s/vCPU rule of thumb.

Summary

Full README

Compacted result table for all 18 tests, key conclusions, version comparison verdict.

Pipeline · ordered

pipeline-queen.md

4-stage pipeline at 1 000 partitions for per-entity ordering: 3 688 msg/s end-to-end, p99 = 1.02 s, 99.96 % completeness.

Pipeline · throughput

pipeline-queen-throughput.md

Same pipeline at 10 partitions for max throughput: 6 673 msg/s end-to-end, p99 = 1.10 s, 3.2× lower PG CPU.

Recipe

HOW-TO-RUN.md

Step-by-step reproduction: Docker setup, PG tuning, the runner scripts, parameter sweeps.

Comparison

vs-kafka.md

Direct head-to-head with Apache Kafka 3.7 at 1 000 partitions, full latency profile.

Deep-dive

cg-axis-comparison.md

The 1 vs 5 vs 10 consumer-group sweep, including per-group fairness numbers.