How much Postgres do I need?
Queen's broker container is small (~70 MB RSS, 3.5 vCPU under sustained pipeline load). The Postgres backing it is what you actually need to size for. The numbers on this page are derived directly from the benchmark suite: 6 hours of sustained production load, three multi-stage pipeline runs, and the bp-* push/pop sweep. No vendor folklore, no synthetic numbers.
The headline rule
The Queen container itself stays out of your way: across every measured benchmark, the queen process consumed only 30 to 50 percent as much vCPU as Postgres did. Size for Postgres and queen will sit comfortably below it.
Calculator
Pick a target message rate and your workload shape. The result is the Postgres vCPU budget you need to reserve. Numbers update as you type.
The estimate uses the headline rule plus your operation-per-message multiplier and the chosen safety headroom. The result is rounded up to the nearest whole vCPU. The colour reflects how close you are to the single-instance ceiling.
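If you want to run the same arithmetic outside this page, the sketch below reproduces the estimate under stated assumptions: it takes the conservative ~8 700 PG ops/s per vCPU figure from the long-running row in the table further down as its baseline, and a 50% default headroom. The embedded calculator may use different constants (for example a batch-size-dependent baseline), so treat this as an approximation, not the page's exact formula.

```python
import math

# Assumed baseline: ~8,700 PG ops/s per vCPU, taken from the 6-hour
# long-running row in the throughput table below. The embedded calculator
# on this page may use a different baseline.
OPS_PER_VCPU = 8_700

# Ops-per-message multipliers from the "Operations per message" table.
WORKLOAD_OPS = {
    "push_only": 1,      # push-only firehose
    "simple_queue": 3,   # push + pop + ack
    "fan_out_x2": 5,     # push + 2 × (pop + ack)
    "pipeline": 7,       # multi-stage producer -> worker -> fan-out × 2
}

def pg_vcpu_budget(msg_per_sec: float, workload: str, headroom: float = 1.5) -> int:
    """Estimate the Postgres vCPU budget to reserve, rounded up.

    headroom=1.5 means 50% safety margin; use 2.0 for bursty traffic.
    """
    ops_per_sec = msg_per_sec * WORKLOAD_OPS[workload]
    return math.ceil(ops_per_sec * headroom / OPS_PER_VCPU)

# Example: 10,000 msg/s through a simple push + pop + ack queue
# -> 30,000 PG ops/s, 50% headroom -> 45,000 / 8,700 ≈ 5.2 -> 6 vCPU.
print(pg_vcpu_budget(10_000, "simple_queue"))  # 6
```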
Operations per message
A "message" in your application maps to several PG row touches, depending on the journey. This is the conversion factor between "msg/s" and "PG ops/s":
| Workload shape | PG ops per message | Reasoning |
|---|---|---|
| Push-only firehose | 1 | One push call, one row inserted. No consumer reading. |
| Simple queue: push + pop + ack | 3 | Producer pushes, worker pops, worker acks. One consumer group. |
| Fan-out × 2: push + 2 consumer groups | 5 | One push, then each of the 2 groups does a pop and an ack. Each consumer group's cursor is independent. |
| Multi-stage pipeline: producer → worker → fan-out × 2 | 7 | q1 push, q1 pop, q1 ack, q2 push, q2 pop × 2, q2 ack × 2. Each stage costs PG operations on its own queue. |
If your workload doesn't match any of these exactly, count: 1 for the push, plus (1 pop + 1 ack) for every consumer group that reads each queue. A worker that pushes to a downstream queue contributes another push to the count for that queue.
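As a concrete restatement of that counting rule, here is a minimal sketch. The topology description (pushes and consumer groups per queue) is a hypothetical input shape for illustration, not a Queen API.

```python
from typing import Dict, List

def ops_per_message(queues: List[Dict[str, int]]) -> int:
    """Count PG operations per end-to-end message across a queue topology.

    Each entry describes one queue: how many pushes it receives per message
    and how many consumer groups read it. Rule from the text above:
    1 op per push, plus (1 pop + 1 ack) per consumer group.
    Hypothetical helper, not a Queen API.
    """
    total = 0
    for q in queues:
        total += q["pushes"]               # each push is one PG op
        total += 2 * q["consumer_groups"]  # pop + ack per consumer group
    return total

# Simple queue from the table: one push, one consumer group -> 3 ops.
print(ops_per_message([{"pushes": 1, "consumer_groups": 1}]))  # 3
# Fan-out × 2: one push, two consumer groups -> 1 + 2 × 2 = 5 ops.
print(ops_per_message([{"pushes": 1, "consumer_groups": 2}]))  # 5
```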
Measured throughput per Postgres vCPU
All numbers below are during-run averages from the run artifacts (not post-run docker-stats snapshots, which show the idle state after the test). Source for each row is linked.
| Benchmark | Push msg/s | Pop msg/s | PG vCPU avg | msg/s per PG vCPU | PG ops/s per vCPU |
|---|---|---|---|---|---|
| Long-running (6h sustained) | 29 081 | 28 283 | 6.6 | 8 692 in+out | ~8 700 |
| bp-10 (push batch=10, pop batch=100) | 39 060 | 38 351 | ~6.0 | 12 902 in+out | ~13 000 |
| bp-100 (push batch=100, pop batch=100) | 104 400 | 101 675 | ~15 | 13 738 in+out | ~14 000 |
| Pipeline ordered (1 000 partitions, batch=100) | 3 688 e2e (× 7 ops) | | 15 | 246 e2e | ~1 475 |
| Pipeline throughput-tuned (10 partitions, batch=1 000) | 6 673 e2e (× 7 ops) | | 4.7 | 1 420 e2e | ~8 519 |
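The per-vCPU columns are straight divisions of the during-run averages. As a quick check (not an additional measurement), the long-running row works out like this:

```python
# Long-running (6h sustained) row: during-run averages from the artifacts.
push_msgs, pop_msgs, pg_vcpu = 29_081, 28_283, 6.6

# "msg/s per PG vCPU" counts traffic in both directions ("in+out").
print(round((push_msgs + pop_msgs) / pg_vcpu))  # 8692
```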
When to bail out of single-instance Postgres
| PG vCPU range | Status | What to expect |
|---|---|---|
| 1 to 4 | Comfortable | The default Postgres config will work. shared_buffers = 25% RAM and you're done. |
| 5 to 16 | Standard | Tune autovacuum aggressively, shared_buffers = 25 to 40%, and check WAL throughput at peak. Single PG instance handles this comfortably. |
| 17 to 32 | Tunable | Approaching the single-instance ceiling. You'll need to tune actively: bigger maintenance_work_mem, more aggressive vacuum thresholds, possibly partitioning the messages table by month. Watch HOT-update efficiency and WAL bytes/sec. |
| 33 and up | Past the sweet spot | You are out of Queen's design envelope. Options: shard your queues across multiple Queen instances, move analytics-style workloads to a read replica, or consider Kafka if your throughput requirements are going to stay at this level. Don't fight a single Postgres past this point. |
Beyond that, the `messages` table indexes and the `partition_lookup` working set start to dominate. Queen's adaptive concurrency control (TCP Vegas) holds up gracefully under contention, but it can't manufacture PG capacity that doesn't exist. If your sizing math points above 32 vCPU on a single PG, the right answer is architectural, not "make PG bigger".
Caveats and things this calculator is not
- This is steady-state throughput, not peak burst. If your traffic spikes 5× over baseline for 10 minutes, you need to size for the peak (use the 50% or 2× headroom option in the calculator).
- The "ops/s per vCPU" number assumes good Postgres tuning. The reference numbers come from a host with `shared_buffers=24 GB`, `effective_cache_size=48 GB`, `autovacuum_naptime=10s`, `autovacuum_vacuum_scale_factor=0.05`. With default Postgres settings you'll get half of this. The `HOW-TO-RUN.md` has the full tuning we use.
- Disk I/O can become the bottleneck before CPU. At `synchronous_commit=on` (the durability tier we recommend), every push waits for fsync. On slow SSDs (less than ~30 k IOPS sustained), the WAL becomes the bottleneck before vCPU does. NVMe is the safe default.
- Message size matters less than you'd expect. All these numbers are at ~28 byte payloads. Jumping to 1 KB payloads adds maybe 10 to 20 percent CPU; jumping to 100 KB adds significant TOAST and disk cost and you should re-benchmark. Don't extrapolate above ~10 KB without measuring.
- The "consumer-group fan-out" multiplier compounds with batch quality. If your fan-out groups have different consumption rates or different partition assignments, the slow group can cause q2 backlog and indirectly impact PG efficiency on the q1 side. The pipeline benchmarks show clean fan-out at 2× because both downstream groups drain at the same rate; mixed-rate fan-out is more nuanced.
