Queen MQ
Visual tour 7 animated diagrams

How Queen works,
in seven moving pictures.

Read top-to-bottom for the full story: partitions and fan-out, the two side-by-side comparisons, atomic ack & push, replay from any timestamp, lease/retry/DLQ, and finally the disk-failover path.

01 · The pipeline

One queue, ordered lanes per agent session, fan-out per group.

Producers push tasks into agent-tasks. Each agent session is its own FIFO partition, created on first push, no preallocation. Consumer groups agent runner and tracer each consume the same stream with their own offset. A 28-second tool call on sess:9c1d stalls only itself; the other 9,999 sessions keep flowing.

Four producers (chat ui, scheduler, webhook, replay) push into one queue (agent-tasks). The queue is split into five visible ordered partitions per agent session plus 9,995 more. Two consumer groups (agent runner with lag 0, tracer with lag 12) each see every message independently. One slow session running a 28-second tool call (sess:9c1d) stalls only its own partition.
No head-of-line blocking · FIFO per partition · fan-out per group. Read more in partitions, consumer groups.
10,000+ partitions normal 104k msg/s peak push (batch=100) 0 rebalance protocol
02 · Comparison

Queen vs Kafka: head-of-line blocking at high entity cardinality.

Kafka shards into a fixed number of physical partitions and routes entities by hash. With 12 entities and 4 partitions, three unrelated entities share each lane. When user:003 goes slow, user:007 and user:011 are blocked because they share p2. Queen gives every entity its own logical lane — one slow entity stalls only itself.

Side-by-side comparison: 12 entities feed Kafka (left) and Queen (right). Kafka hash-mods them into 4 fixed partitions; user:007 and user:011 are blocked when user:003 in p2 goes slow. Queen gives every entity its own logical lane, so user:003 stalls only itself; the other 11 lanes keep flowing.
Same input, different blast radius. 2 of 12 unrelated entities blocked on Kafka; 0 on Queen. Read the deeper argument in the partition-scaling benchmark.
03 · Comparison

Queen vs RabbitMQ: 10,000 ordered streams, not 10,000 Erlang processes.

To get per-entity ordering in RabbitMQ you allocate one queue per entity. With 10,000 entities, that's 10,000 Erlang processes, ~2.4 GB of process heap, and 10,000+ file handles. Queen gets the same per-entity ordering with one queue whose partitions are index rows in PostgreSQL: 52 MB of broker memory, 0 extra processes, and a btree on (queue_id, partition) that stays flat to a million-plus partitions.

RabbitMQ requires one queue per entity for per-entity ordering: 10,000 Erlang processes and around 2.4 GB of heap RAM. Queen achieves the same per-entity ordering with a single queue containing 10,000 logical partitions, costing 52 MB of broker memory and zero extra processes. The right side shows partition pills inside one queue card; the left shows a wall of separate queue-per-entity cards each with its own process heartbeat.
~46× less memory, no per-entity scheduler load. Both diagrams use the same partition vocabulary, so you can read them as one artifact.
04 · Atomic pipelines

Atomic ack + push, in one PostgreSQL transaction.

A worker pops a message from raw-events, transforms it, and pushes the derived message into processed-events. The ack of the input and the insert of the output are wrapped in one PG BEGIN…COMMIT with a single WAL record. Either both rows commit, or neither does — no orphan output, no duplicate ack, no separate outbox table.

A worker pops a message from queue A (raw-events), transforms it inside a glowing BEGIN/COMMIT envelope, and pushes the derived message into queue B (processed-events). The ack and the insert are in the same PostgreSQL transaction. The bottom inset shows the actual SQL: BEGIN; UPDATE messages_consumed SET status='completed'; INSERT INTO messages (...); COMMIT.
Transactional outbox, but as a primitive. Read the API in transactions.
05 · Replay

Replay any window, without disturbing live consumers.

PostgreSQL is the source of truth, so every message ever pushed is queryable by every consumer group. The agent-runner group is at the head; a new report-fall-2024 group joins with subscriptionFrom: 2024-09-01 and replays the historical window while the live group keeps running unaffected. No log retention to manage; PG retention policies decide how far back you can go.

A long horizontal tape of messages from September 2024 to NOW. The agent-runner consumer group cursor sits at the head with lag 0. A new report-fall-2024 group materialises at September 2024 with subscriptionFrom set, and sweeps fast toward the head, replaying the historical window. Below the tape, three subscription mode cards show 'all', 'new', and 'from:timestamp'.
Per-group offset · replay any timestamp · independent progress. See subscription modes for the full table.
06 · Reliability

Lease · retry · dead-letter queue.

A popped message is leased to one consumer for leaseTime. If the consumer crashes or fails, the lease expires and the message returns to the head of the partition with the retry counter incremented. Order is preserved. After retryLimit retries it routes to the per-queue DLQ with the original payload, last error, and retry trace — ready to browse, filter, and replay from the dashboard.

State diagram: a queue feeds a worker holding a 30-second lease. Status flows through three branches. On success, the message goes to a 'completed' card (top right). On failure or lease expiry, the message returns to the head of the partition and the retry counter increments; this loops up to retryLimit times. After retryLimit (3), the message routes to the dead-letter queue (bottom right) with the original payload and last error; the DLQ message count ticks up.
No phantom workers · order preserved on retry · DLQ replay from dashboard. See leases & retries and DLQ.
07 · Durability

Zero message loss across PostgreSQL outages.

When PostgreSQL becomes unreachable, pushes don't fail. The broker writes them to a local FILE_BUFFER on disk and acknowledges to the client. When PostgreSQL recovers, a background scanner streams the buffered events back into the database in order. Verified across 1.6 billion events in our benchmark suite with zero loss and zero duplicates.

Three-phase loop. Phase 1: producers push to broker, broker commits to PostgreSQL, status 'all healthy'. Phase 2: PostgreSQL becomes unreachable, the PG card border flashes red, broker spills new pushes to the FILE_BUFFER on local disk, buffered counter grows from 0 to 312, status 'postgres unreachable, spilling pushes to local FILE_BUFFER, clients still get HTTP 201'. Phase 3: PostgreSQL recovers, the broker drains the buffer back into the database in order, status 'postgres recovered, draining file buffer back to db'. The 'messages lost' counter remains 0 throughout.
Push path always succeeds · order preserved across recovery · verified across 1.6B events. See disk failover.
What now?

Pick a path.

Spin it up locally with two docker run commands, browse the reference docs, or read the comparison benchmarks against Kafka and RabbitMQ.