How much Postgres do I need?
Queen's broker container is small (~70 MB RSS, 3.5 vCPU under sustained pipeline load). The Postgres backing it is what you actually need to size for. The numbers on this page are derived directly from the benchmark suite: 6 hours of sustained production load, three multi-stage pipeline runs, and the bp-* push/pop sweep. No vendor folklore, no synthetic numbers.
The headline rule
The Queen container itself stays out of your way: across every measured benchmark, the queen process consumed only 30 to 50 percent as much vCPU as Postgres did. Size for Postgres and queen will sit comfortably below it.
Calculator
Pick a target message rate and your workload shape. The result is the Postgres vCPU budget you need to reserve. Numbers update as you type.
The estimate uses the headline rule plus your operation-per-message multiplier and the chosen safety headroom. The result is rounded up to the nearest whole vCPU. The colour reflects how close you are to the single-instance ceiling.
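If you want to run the same arithmetic outside this page, the sketch below reproduces the estimate under stated assumptions: it takes the conservative ~8 700 PG ops/s per vCPU figure from the long-running row in the table further down as its baseline, and a 50% default headroom. The embedded calculator may use different constants (for example a batch-size-dependent baseline), so treat this as an approximation, not the page's exact formula.

```python
import math

# Assumed baseline: ~8,700 PG ops/s per vCPU, taken from the 6-hour
# long-running row in the throughput table below. The embedded calculator
# on this page may use a different baseline.
OPS_PER_VCPU = 8_700

# Ops-per-message multipliers from the "Operations per message" table.
WORKLOAD_OPS = {
    "push_only": 1,      # push-only firehose
    "simple_queue": 3,   # push + pop + ack
    "fan_out_x2": 5,     # push + 2 × (pop + ack)
    "pipeline": 7,       # multi-stage producer -> worker -> fan-out × 2
}

def pg_vcpu_budget(msg_per_sec: float, workload: str, headroom: float = 1.5) -> int:
    """Estimate the Postgres vCPU budget to reserve, rounded up.

    headroom=1.5 means 50% safety margin; use 2.0 for bursty traffic.
    """
    ops_per_sec = msg_per_sec * WORKLOAD_OPS[workload]
    return math.ceil(ops_per_sec * headroom / OPS_PER_VCPU)

# Example: 10,000 msg/s through a simple push + pop + ack queue
# -> 30,000 PG ops/s, 50% headroom -> 45,000 / 8,700 ≈ 5.2 -> 6 vCPU.
print(pg_vcpu_budget(10_000, "simple_queue"))  # 6
```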
Operations per message
A "message" in your application maps to several PG row touches, depending on the journey. This is the conversion factor between "msg/s" and "PG ops/s":
| Workload shape | PG ops per message | Reasoning |
|---|---|---|
| Push-only firehose | 1 | One push call, one row inserted. No consumer reading. |
| Simple queue: push + pop + ack | 3 | Producer pushes, worker pops, worker acks. One consumer group. |
| Fan-out × 2: push + 2 consumer groups | 5 | One push, then each of the 2 groups does a pop and an ack. Each consumer group's cursor is independent. |
| Multi-stage pipeline: producer → worker → fan-out × 2 | 7 | q1 push, q1 pop, q1 ack, q2 push, q2 pop × 2, q2 ack × 2. Each stage costs PG operations on its own queue. |
If your workload doesn't match any of these exactly, count: 1 for the push, plus (1 pop + 1 ack) for every consumer group that reads each queue. A worker that pushes to a downstream queue contributes another push to the count for that queue.
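As a concrete restatement of that counting rule, here is a minimal sketch. The topology description (pushes and consumer groups per queue) is a hypothetical input shape for illustration, not a Queen API.

```python
from typing import Dict, List

def ops_per_message(queues: List[Dict[str, int]]) -> int:
    """Count PG operations per end-to-end message across a queue topology.

    Each entry describes one queue: how many pushes it receives per message
    and how many consumer groups read it. Rule from the text above:
    1 op per push, plus (1 pop + 1 ack) per consumer group.
    Hypothetical helper, not a Queen API.
    """
    total = 0
    for q in queues:
        total += q["pushes"]               # each push is one PG op
        total += 2 * q["consumer_groups"]  # pop + ack per consumer group
    return total

# Simple queue from the table: one push, one consumer group -> 3 ops.
print(ops_per_message([{"pushes": 1, "consumer_groups": 1}]))  # 3
# Fan-out × 2: one push, two consumer groups -> 1 + 2 × 2 = 5 ops.
print(ops_per_message([{"pushes": 1, "consumer_groups": 2}]))  # 5
```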
Measured throughput per Postgres vCPU
All numbers below are during-run averages from the run artifacts (not post-run docker-stats snapshots, which show the idle state after the test). Source for each row is linked.
| Benchmark | Push msg/s | Pop msg/s | PG vCPU avg | msg/s per PG vCPU | PG ops/s per vCPU |
|---|---|---|---|---|---|
| Long-running (6h sustained) | 29 081 | 28 283 | 6.6 | 8 692 in+out | ~8 700 |
| bp-10 (push batch=10, pop batch=100) | 39 060 | 38 351 | ~6.0 | 12 902 in+out | ~13 000 |
| bp-100 (push batch=100, pop batch=100) | 104 400 | 101 675 | ~15 | 13 738 in+out | ~14 000 |
| Pipeline ordered (1 000 partitions, batch=100) | 3 688 e2e (× 7 ops) | | 15 | 246 e2e | ~1 475 |
| Pipeline throughput-tuned (10 partitions, batch=1 000) | 6 673 e2e (× 7 ops) | | 4.7 | 1 420 e2e | ~8 519 |
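The per-vCPU columns are straight divisions of the during-run averages. As a quick check (not an additional measurement), the long-running row works out like this:

```python
# Long-running (6h sustained) row: during-run averages from the artifacts.
push_msgs, pop_msgs, pg_vcpu = 29_081, 28_283, 6.6

# "msg/s per PG vCPU" counts traffic in both directions ("in+out").
print(round((push_msgs + pop_msgs) / pg_vcpu))  # 8692
```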
When to bail out of single-instance Postgres
| PG vCPU range | Status | What to expect |
|---|---|---|
| 1 to 4 | Comfortable | The default Postgres config will work. shared_buffers = 25% RAM and you're done. |
| 5 to 16 | Standard | Tune autovacuum aggressively, shared_buffers = 25 to 40%, and check WAL throughput at peak. Single PG instance handles this comfortably. |
| 17 to 32 | Tunable | Approaching the single-instance ceiling. You'll need to tune actively: bigger maintenance_work_mem, more aggressive vacuum thresholds, possibly partitioning the messages table by month. Watch HOT-update efficiency and WAL bytes/sec. |
| 33 and up | Past the sweet spot | You are out of Queen's design envelope. Options: shard your queues across multiple Queen instances, move analytics-style workloads to a read replica, or consider Kafka if your throughput requirements are going to stay at this level. Don't fight a single Postgres past this point. |
Beyond that, the `messages` table indexes and the `partition_lookup` working set start to dominate. Queen's adaptive concurrency control (TCP Vegas) holds up gracefully under contention, but it can't manufacture PG capacity that doesn't exist. If your sizing math points above 32 vCPU on a single PG, the right answer is architectural, not "make PG bigger".
Caveats and things this calculator is not
- This is steady-state throughput, not peak burst. If your traffic spikes 5× over baseline for 10 minutes, you need to size for the peak (use the 50% or 2× headroom option in the calculator).
- The "ops/s per vCPU" number assumes good Postgres tuning. The reference numbers come from a host with `shared_buffers=24 GB`, `effective_cache_size=48 GB`, `autovacuum_naptime=10s`, `autovacuum_vacuum_scale_factor=0.05`. With default Postgres settings you'll get half of this. The `HOW-TO-RUN.md` has the full tuning we use.
- Disk I/O can become the bottleneck before CPU. At `synchronous_commit=on` (the durability tier we recommend), every push waits for fsync. On slow SSDs (less than ~30 k IOPS sustained), the WAL becomes the bottleneck before vCPU does. NVMe is the safe default.
- Message size matters less than you'd expect. All these numbers are at ~28 byte payloads. Jumping to 1 KB payloads adds maybe 10 to 20 percent CPU; jumping to 100 KB adds significant TOAST and disk cost and you should re-benchmark. Don't extrapolate above ~10 KB without measuring.
- The "consumer-group fan-out" multiplier compounds with batch quality. If your fan-out groups have different consumption rates or different partition assignments, the slow group can cause q2 backlog and indirectly impact PG efficiency on the q1 side. The pipeline benchmarks show clean fan-out at 2× because both downstream groups drain at the same rate; mixed-rate fan-out is more nuanced.
