How Queen works under the hood.
Queen is three layers stacked end-to-end. PostgreSQL stores everything. A C++ engine called libqueen fuses many HTTP requests into batched stored-procedure calls. A uWebSockets HTTP server distributes work across worker threads and routes responses back to the right connection. This page is the mental model: enough to predict how the system behaves under any workload, without reading the source.
The three layers
Read the system top-down. Each layer has one job.
The trick that makes this fast: HTTP threads never touch libpq directly.
They drop a request into libqueen's per-type queue and return immediately. libqueen
collects requests of the same type, fuses them into one batched stored-procedure call
on a non-blocking PostgreSQL connection, then disperses the per-row results back to
each waiting HTTP request via a cross-thread defer().
Data model, PostgreSQL is the source of truth
Everything the system knows about, every queue, every message, every consumer offset, lives in PostgreSQL rows. There is no in-memory queue state on the server that survives a restart. The schema is intentionally small:
| Table | What it holds |
|---|---|
| `queues` | Per-queue configuration: `lease_time`, `retry_limit`, retention windows, encryption flag, DLQ flags, priority. |
| `partitions` | Named lane inside a queue. Logical only; Queen does not use PostgreSQL native table partitioning. |
| `messages` | The payload. The dedupe key is `(partition_id, transaction_id)`, not just `transaction_id`: idempotency is partition-scoped. |
| `partition_consumers` | The lease + offset cursor for each `(partition_id, consumer_group)` pair. Holds `last_consumed_id`, `lease_expires_at`, `worker_id`, `batch_size`, `acked_count`, `batch_retry_count`. This single row is what makes "queue mode" and "consumer-group mode" share the same code path. |
| `consumer_groups_metadata` | Per-named-group subscription mode (all / new / from timestamp). |
| `partition_lookup` | "Head of stream" snapshot per partition: `last_message_id`, `last_message_created_at`, `updated_at`. Lets pop decide whether a partition has anything new without scanning `messages`. |
| `consumer_watermarks` | Wildcard-pop optimization: `last_empty_scan_at` per (queue, group). Caught-up consumers skip stale partitions. |
| `dead_letter_queue` | Failed messages that exceeded the retry limit, plus the original error message. |
| `traces`, `messages_consumed`, `stats`, `system_metrics` | Append-only logs and rollups powering the dashboard. |
Two indexes carry the hot path
- `messages (partition_id, created_at, id)`, the cursor scan that pop walks to find the next batch of unconsumed messages on a claimed partition.
- `UNIQUE (partition_id, transaction_id)` on `messages`, which enforces dedupe via `ON CONFLICT DO NOTHING` in push.
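In DDL form, roughly the following. The index names match what the storage-cost section later on this page reports; the authoritative definitions live in `lib/schema/`:

```sql
-- Cursor-scan index: pop walks (created_at, id) within one claimed partition.
CREATE INDEX idx_messages_partition_created
    ON queen.messages (partition_id, created_at, id);

-- Dedupe index: push inserts with ON CONFLICT DO NOTHING against this.
CREATE UNIQUE INDEX messages_partition_transaction_unique
    ON queen.messages (partition_id, transaction_id);
```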
Update-heavy tables (partition_lookup, partition_consumers,
stats) ship with FILLFACTOR=50 and tuned autovacuum knobs to
keep them HOT-update-friendly under sustained write load.
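A sketch of that tuning; the `fillfactor` is the documented 50, while the autovacuum values here are illustrative placeholders, not the shipped settings:

```sql
ALTER TABLE queen.partition_consumers SET (
    fillfactor = 50,                        -- leave room on each page for HOT updates
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum early on update-heavy tables
    autovacuum_vacuum_cost_delay = 0        -- don't throttle those vacuums
);
```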
Two coordination primitives
Both live inside the procedures, not the schema:
- POP wildcard claim: `pg_try_advisory_xact_lock(hashtextextended(partition_id::text, …))`. Non-blocking try; the first request to claim a partition wins it for the duration of its transaction.
- ACK serialization: `pg_advisory_xact_lock(...md5(partition_id || consumer_group)...)`. Blocking lock; prevents two concurrent acks from racing on the same cursor row.
These are the entire concurrency story. There are no application-level mutexes, no external coordination services. PostgreSQL is the lock manager.
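A minimal illustration of the two idioms. The key derivation here (hash seed, md5-to-bigint conversion) is a sketch; the exact inputs live in the procedure source:

```sql
BEGIN;
-- POP wildcard claim: returns false immediately if another pop holds partition 42.
SELECT pg_try_advisory_xact_lock(hashtextextended('42', 0));

-- ACK serialization: blocks until the previous ack on this cursor commits.
SELECT pg_advisory_xact_lock(
    ('x' || substr(md5('42' || 'billing-workers'), 1, 16))::bit(64)::bigint);
COMMIT;  -- both locks release here, with the transaction
```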
The __QUEUE_MODE__ trick
When a consumer pops without specifying a group, every code path internally uses the literal string `'__QUEUE_MODE__'` as the consumer-group value. Same `partition_consumers` row mechanism, same offset tracking, same procedures, just a special name. This is why the behavioral difference between "queue mode" and "consumer-group mode" is small enough to coexist in one stored procedure.
Stored procedures, the hot-path API
Every hot-path operation is a named, versioned PostgreSQL procedure. They all take a JSON array (the batch from libqueen) and return a JSON result. This is what makes request fusion safe: many client requests collapse into one procedure call, one network round trip, one transaction.
| Procedure | What it does |
|---|---|
| `queen.push_messages_v3` | Upserts queue + partition rows, deduplicates within the batch and against existing rows via `ON CONFLICT`, inserts messages, returns inserted ids and the per-partition `partition_updates` the server then writes to `partition_lookup`. |
| `queen.pop_unified_batch_v4` | One procedure for both queue mode and consumer-group mode, both fixed-partition and wildcard. Scans `partition_lookup` + `consumer_watermarks`, claims partitions with an advisory try-lock, takes the lease in `partition_consumers`, fetches messages by cursor. v4 supports a per-call `max_partitions` field: the wildcard branch can drain up to N partitions in a single call (one shared lease across all of them, `batch_size` as the global message cap). v3 is preserved for emergency revert by flipping the dispatched SQL string in `lib/queen/pending_job.hpp`. |
| `queen.ack_messages_v2` | For each ack: serialize on (partition, group), validate the lease, advance the cursor (on `completed`) or release + retry/DLQ-route (on `failed`). Clears the lease when `acked_count` hits `batch_size`. |
| `queen.execute_transaction_v2` | Inlines simplified ack + push operations in one PL/pgSQL function, i.e. one PG transaction. Either everything commits or nothing does. This is what makes "exactly-once across queues" actually true. |
| `queen.renew_lease_v2` | Extends `lease_expires_at` on rows where `worker_id` matches and the lease hasn't already expired. Idempotent and batchable. |
| `queen.has_pending_messages` | Cheap `EXISTS` with the same predicate as pop's candidate scan. Used by long-poll readiness checks without claiming anything. |
| `queen.update_partition_lookup_v1` | Monotonic UPSERT into `partition_lookup`. Called by the server after every push commit, replacing what used to be a row trigger on `messages`. |
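Because every procedure speaks JSON-array-in / JSON-out, you can exercise the hot path from `psql` without the server. The field names below are inferred from the push walkthrough later on this page, not a documented contract; check the procedure source before relying on them:

```sql
-- Hypothetical two-message batch, the same shape libqueen would send.
SELECT queen.push_messages_v3('[
  {"queue": "orders", "partition": "Default",
   "transactionId": "ord-1001", "payload": {"total": 99.50}},
  {"queue": "orders", "partition": "Default",
   "transactionId": "ord-1002", "payload": {"total": 12.00}}
]'::jsonb);
```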
How push actually works
Push deserves a closer look: it is the highest-throughput hot path. `push_messages_v3` is three statements with no temp tables. In sketch form (simplified, with illustrative column names; the real bodies live in `lib/schema/procedures/`):
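```sql
-- 1. Upsert queue rows for every queue named in the batch.
INSERT INTO queen.queues (name)
SELECT DISTINCT item->>'queue'
FROM jsonb_array_elements($1::jsonb) AS item
ON CONFLICT (name) DO NOTHING;

-- 2. Upsert the referenced partitions.
INSERT INTO queen.partitions (queue_name, name)
SELECT DISTINCT item->>'queue', item->>'partition'
FROM jsonb_array_elements($1::jsonb) AS item
ON CONFLICT (queue_name, name) DO NOTHING;

-- 3. Insert the messages; the unique index silently absorbs duplicate
--    (partition_id, transaction_id) pairs.
INSERT INTO queen.messages (partition_id, transaction_id, payload)
SELECT p.id, item->>'transactionId', item->'payload'
FROM jsonb_array_elements($1::jsonb) AS item
JOIN queen.partitions p
  ON p.queue_name = item->>'queue' AND p.name = item->>'partition'
ON CONFLICT (partition_id, transaction_id) DO NOTHING;
```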
Three SQL statements, one network round trip from libqueen, hundreds of pushed messages per call under load. The `partition_updates` aggregate in the response is what the server replays into `update_partition_lookup_v1` after commit, so the wildcard pop path always finds the latest writes.
How pop finds the next message
Pop is the most intricate procedure because it covers four shapes (queue / consumer group × fixed / wildcard partition) in one function. The wildcard-by-consumer-group path is the interesting one:
- Read the `last_empty_scan_at` watermark from `consumer_watermarks`.
- Scan `partition_lookup` for partitions where `updated_at >= watermark - 2 minutes` (caught-up consumers skip the rest), the lease is free, and the cursor is behind the head.
- For each candidate, try `pg_try_advisory_xact_lock` on the partition id. The first successful claim wins; the loop exits.
- UPSERT a row in `partition_consumers`, then UPDATE it to take the lease and read the cursor.
- Fetch the batch: `SELECT … FROM messages WHERE partition_id = $ AND (created_at, id) > (cursor) ORDER BY (created_at, id) LIMIT batch_size`.
If no partition was claimed, the watermark is bumped and the procedure returns `no_available_partition`. libqueen turns this into a long-poll backoff rather than an error response.
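Put together, the candidate scan and the claim collapse into one filtered query, roughly like this sketch (column names assumed; the try-lock in the WHERE clause is what lets losing consumers skip a claimed partition for free):

```sql
-- Sketch: find one claimable partition for ($1 = queue, $2 = group, $3 = watermark).
SELECT pl.partition_id
FROM queen.partition_lookup pl
LEFT JOIN queen.partition_consumers pc
       ON pc.partition_id = pl.partition_id
      AND pc.consumer_group = $2
WHERE pl.queue_name = $1
  AND pl.updated_at >= $3 - interval '2 minutes'                    -- watermark filter
  AND (pc.lease_expires_at IS NULL OR pc.lease_expires_at < now())  -- lease is free
  AND COALESCE(pc.last_consumed_id, 0) < pl.last_message_id         -- cursor behind head
  AND pg_try_advisory_xact_lock(hashtextextended(pl.partition_id::text, 0))
LIMIT 1;
```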
libqueen, the request-fusion engine
libqueen is a per-HTTP-worker C++ library that owns its own libuv event loop and a pool of non-blocking PostgreSQL connections. Its job: take many HTTP requests of the same kind and turn them into one batched stored-procedure call.
The JobType enum
Each JobType has its own queue, its own batch policy, and its own concurrency controller. They run in parallel: a high-load pop doesn't block push, and vice versa.
BatchPolicy, when to fire
Three knobs per type, settable via `QUEEN_<TYPE>_*` env vars. The rule they implement: fire when enough requests are queued, or when the oldest one has waited too long. So latency is bounded by `max_hold_ms` when load is low, and throughput compounds via batching when load is high. Defaults (batch size / max hold): PUSH 50/20 ms, POP 20/5 ms (latency-sensitive), ACK 50/20 ms, TRANSACTION 1/0 (never fuse).
ConcurrencyController, adaptive in-flight cap
Two implementations, switched by `QUEEN_CONCURRENCY_MODE`:

- Static: a fixed cap, an atomic counter against `max_concurrent`. Predictable and boring.
- Vegas (default): TCP-Vegas-inspired adaptive limit. The controller computes `queue_load = in_flight × (1 − rtt_min ⁄ rtt_recent)`. If `queue_load < α` (default 3), grow the limit; if `queue_load > β` (default 12), shrink. `rtt_min` is a 30-second sliding minimum, so a single fast outlier doesn't pin it. The limit adjusts at most once per second to prevent thrashing.

A worked example: with `rtt_min` = 2 ms and `rtt_recent` = 4 ms at 10 batches in flight, `queue_load = 10 × (1 − 2⁄4) = 5`, between α and β, so the limit holds. If PostgreSQL queueing pushes `rtt_recent` to 10 ms with 20 in flight, `queue_load = 20 × (1 − 2⁄10) = 16 > β`, and the limit shrinks.

This is why `QUEEN_PUSH_MAX_CONCURRENT` ships at 24 even though Vegas typically converges to ~17 in steady state: Vegas needs headroom to grow into when load rises. The cap is a safety ceiling, not an operating point.
DrainOrchestrator, the heart
All scheduling runs on libqueen's dedicated libuv thread. The drain loop wakes on four sources:
| Wake source | Mechanism |
|---|---|
| New job submitted | uv_async_send from any thread |
| Batch completed | Slot freed in the connection pool |
| Hold timer fired | uv_timer_t re-armed at the earliest front_enqueue_time + max_hold_ms |
| Long-poll backoff ready | Earliest pop next_check deadline |
One drain pass: process expired long-polls; round-robin over the six JobTypes (with a
rotating start index so no type starves); for each type, while the BatchPolicy says
fire and the ConcurrencyController has a slot and there's a free DB connection, take
up to max_batch jobs, merge their JSON arrays, and call
PQsendQueryParams on a non-blocking connection. Re-arm the safety-net
timer.
libpq async, integrated with libuv
Each PostgreSQL connection is wrapped in a DBConnection with the libpq
socket fd and a uv_poll_t. When a batch fires, libqueen starts polling
for UV_WRITABLE | UV_READABLE:
- On writable → `PQflush` until the query is sent.
- On readable → `PQconsumeInput` + `PQisBusy` loop, then `PQgetResult` reads the JSON response.
- The procedure returns one result row containing one JSONB array; libqueen splits it back into per-job results and invokes each `PendingJob::callback(result_str)`.
- The slot returns to the free pool and kicks the orchestrator again, which is why throughput stays high when many slots become free at once.
Long polling
When pop returns `no_available_partition`, libqueen tracks a per-request `next_check` with exponential backoff (initial 100 ms, multiplier 2.0, cap 1000 ms) and a `wait_deadline` from the client's timeout parameter. The safety-net timer fires the request again at `next_check` without holding any DB connection in between; this is what makes "30k idle long-pollers" actually fine.
HTTP shell, uWebSockets, acceptor + workers
The HTTP layer is small and uniform. main_acceptor.cpp is 76 lines;
acceptor_server.cpp wires everything together.
Acceptor / worker pattern
- One acceptor thread listens on `:6632`. When a TCP connection arrives, it `adoptSocket()`s the socket onto one of the worker `uWS::App` event loops. The acceptor never serves a request itself.
- N worker threads (default 10), each running its own `uWS::App` and its own `Queen` instance. Each worker has its own `ResponseRegistry`, so cross-worker response delivery has no shared mutex.
- Worker 0 bootstraps the schema and starts the background services: `MetricsCollector`, `RetentionService`, `EvictionService`, `StatsService`, `PartitionLookupReconcileService`, and `SharedStateManager`. The other workers wait on a condvar until worker 0 signals ready.
Cross-thread response delivery
This is the trickiest part of the C++ code, but the model is simple. When a route
handler calls queen->submit(job, callback), the callback runs on
libqueen's libuv thread, not the uWS thread that owns the
HttpResponse*. Touching that pointer from the wrong thread is undefined
behavior. So:
`uWS::Loop::defer()` queues the lambda on the worker's event loop. If the client disconnected in the meantime, `res->onAborted` has already invalidated the registry entry, so late deliveries become no-ops.
End-to-end walkthrough
One push, in detail
- Client sends `POST /api/v1/push` with `{items: [...]}`.
- The acceptor adopts the socket onto worker N.
- Auth middleware validates the JWT and stamps `producerSub` from the verified `sub` claim. Client-supplied `producerSub` is ignored; this is what closes the impersonation hole.
- If the queue has encryption enabled, `EncryptionService` replaces each payload with `{encrypted, iv, authTag}` (AES-256-GCM).
- The route registers the response in the per-worker `ResponseRegistry` and gets back a `request_id`.
- It stashes the items in `push_failover_storage[request_id]` in case the DB call fails.
- It builds a `JobRequest{op_type=PUSH, params=[items_json]}` and calls `queen->submit()`. The HTTP thread is unblocked.
- libqueen's drain orchestrator collects this push along with others queued nearby. When BatchPolicy says fire, it merges their item arrays and calls `PQsendQueryParams("SELECT queen.push_messages_v3($1::jsonb)", ...)`.
- The procedure runs three SQL statements (queues upsert, partitions upsert, messages insert) and returns `{items: [...], partition_updates: [...]}`.
- libqueen reads the response, splits it back into per-request results, and invokes each callback.
- The callback `defer()`s onto the uWS worker loop. There, the route writes the response. It also fires a UDP `MESSAGE_AVAILABLE` packet to peer instances and writes `partition_updates` via `update_partition_lookup_v1`.
A transactional pipeline
The path is the same up to `queen->submit`, but the JobType is TRANSACTION with `max_concurrent=1, max_batch=1`. libqueen runs the request alone, calling `execute_transaction_v2`. Inside the procedure, ack and push are inlined in one PL/pgSQL function, i.e. one PostgreSQL transaction. Any RAISE in either operation rolls back the whole bundle: the input message stays unacked, the output message never lands. On commit, both become durable at the same commit record. That is exactly-once across queues.
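In call form, a hypothetical bundle (the field names here are illustrative, not the documented contract; consult the procedure source):

```sql
-- Ack one message from 'orders' and push its successor to 'invoices',
-- atomically: if either half raises, neither happens.
SELECT queen.execute_transaction_v2('{
  "acks":   [{"queue": "orders", "partition": "Default",
              "messageId": 8812, "status": "completed"}],
  "pushes": [{"queue": "invoices", "partition": "Default",
              "transactionId": "inv-8812", "payload": {"orderId": 8812}}]
}'::jsonb);
```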
Failover, multi-instance sync, encryption
Disk failover for push
If PostgreSQL is unreachable mid-push, the route handler's callback parses
{success: false, error: …} and replays the items from
push_failover_storage into the FileBufferManager, a
rotating JSON-Lines file in FILE_BUFFER_DIR. A background scanner reads
the file every FILE_BUFFER_FLUSH_MS (default 100ms) and replays buffered
events into the database once it's reachable again. On startup, worker 0 runs a
recovery pass over leftover files before accepting traffic.
Pop and ack do not have a file-buffer path; they require the database to be live. The disk buffer is about not losing inbound writes, not about serving reads from disk. For full HA, the database itself must be HA (Patroni, RDS Multi-AZ, Cloud SQL, etc.); Queen just reconnects when the upstream reappears.
Multi-instance UDP sync
Queen servers are stateless above PostgreSQL: you can run as many as you want pointed at the same DB. Two optional UDP layers make horizontal scale fast:
- Peer notifications: when a server commits a push, it broadcasts a `MESSAGE_AVAILABLE` packet to peers. libqueen's POP backoff tracker hears this and resets the affected long-polls to fire immediately. Sub-millisecond cross-instance wake-up.
- Distributed cache (UDPSYNC): heartbeats + a shared-state cache for queue config, the partition-id LRU, and server-health tracking. Heartbeats authenticate via HMAC-SHA256 (`QUEEN_SYNC_SECRET`).
The cluster works fine without UDP; long-poll consumers just fall back to a slightly longer wake-up interval.
At-rest encryption
When a queue has `encryptionEnabled=true` and `QUEEN_ENCRYPTION_KEY` is set (32 bytes, hex-encoded), the push route encrypts each payload with AES-256-GCM before sending it to PostgreSQL. The stored payload becomes `{encrypted, iv, authTag}`. Pop transparently decrypts on the way out. The key never leaves the server process; PostgreSQL never sees plaintext.
Authentication
Optional JWT, supporting HS256, RS256, and EdDSA. Routes are gated by access level (read-only / read-write / admin) derived from a configurable role claim. When auth is enabled, the server stamps the validated `sub` claim onto every pushed message as `producerSub`; clients can't set this field, so even a compromised JWT cannot impersonate another producer in the message log.
Why this design
PostgreSQL is the queue
Every queue, partition, message, lease, and offset lives in PostgreSQL rows. There is no in-memory state on the server that survives a restart. The system is as durable as your database.
Locks live in Postgres
No ZooKeeper, no Raft cluster, no etcd. Two advisory-lock conventions inside the procedures handle all coordination: one transaction-scoped try-lock for partition claims, one blocking lock per (partition, group) for ack serialization.
One JobType, one procedure
Each batchable hot-path operation maps to exactly one named, versioned PostgreSQL procedure. libqueen fuses many concurrent requests into one procedure call: fewer round trips, better commit amortization, a simpler operational story.
Vegas instead of fixed pool size
Concurrency caps are ceilings, not operating points. The TCP-Vegas-inspired controller grows under stable RTT and shrinks when PostgreSQL queueing rises, so the system finds its own throughput sweet spot without manual tuning.
HTTP threads never block on libpq
Each worker has its own uWS event loop and its own libqueen libuv loop. Submit returns immediately; results hop back via a deferred callback. uWS responses are written from the loop that owns them, no shared mutex on the response path.
Disk buffer for inbound writes
If PostgreSQL is unreachable, pushes spill to a local file buffer and replay on recovery. Pop and ack require the database to be live, the disk buffer is about not losing inbound writes, not about serving reads from disk.
What we traded away
Every architectural choice has a counter-cost. Queen's choices buy strong durability, ordering at high partition cardinality, and operational simplicity, and they pay for it in six real ways. None of these are bugs; they're consequences of being a Postgres-backed message queue speaking HTTP/JSON. Knowing them upfront helps you decide whether Queen is the right tool for what you're building.
One Postgres is the limit
We measured Queen at ~104k msg/s peak on a 32-vCPU host with a well-tuned PostgreSQL. Past that, you start running into PG's own scaling limits (WAL throughput, autovacuum cycles, checkpoint pressure), not Queen's. There is no Queen-native multi-broker design. If your workload genuinely needs >200k msg/s sustained, Kafka is the right tool. Sharding Queen across multiple Postgres instances would work, but you'd be doing it at the application layer, not the queue layer.
HTTP/JSON tax on the hot path
We picked HTTP/JSON because every language has an HTTP client and the operational story is dead simple. The cost: ~5× more CPU per msg/s than RabbitMQ's AMQP, and ~80× more than Kafka's binary protocol. At 10k msg/s you don't notice. At 100k msg/s, the protocol cost is most of Queen's CPU budget. A future binary protocol would help here; today HTTP is what we ship.
PG logical replication is your only path
Kafka has MirrorMaker, RabbitMQ has federation. Queen has whatever you can build on top of `pg_logical`: async, eventually consistent, with the usual gotchas (lag during heavy writes, sequence collisions, no automatic failover). For multi-region active-active, Queen is the wrong tool unless you can accept the complexity of running it on top of a multi-region PG topology yourself.
messages_consumed needs aggressive retention
Each consumer-group delivery writes a row in `queen.messages_consumed`. We measured ~743 MB after 15 minutes at 10 consumer groups, extrapolated to ~70 GB/day on a sustained workload. Queen has retention controls (`completedRetentionSeconds`) but you have to set them: out of the box, a busy multi-tenant deployment will outgrow its disk in a week. This is operational, not architectural, but you have to know about it.
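A size check you can run before and after setting retention; these are stock PostgreSQL functions, nothing Queen-specific:

```sql
SELECT pg_size_pretty(pg_total_relation_size('queen.messages_consumed'))
    AS delivery_log_size;
```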
Default 60s for failed-consumer takeover
When a consumer dies mid-batch, the partition's lease has to expire before another
consumer can take over. Default is leaseTime=60 seconds, tunable
per-queue. Tighter leases mean faster recovery but more overhead from lease
renewals on long-running jobs. Compare to Kafka's consumer group rebalance
(typically 1–10s with modern static membership). If sub-second recovery from
consumer crashes is critical, this is a real constraint.
2× storage cost on the messages table
We measured `queen.messages` at 22 GB of indexes vs 13 GB of heap after 14 hours of sustained 28k msg/s. The indexes (`messages_pkey`, `idx_messages_partition_created`, `messages_partition_transaction_unique`) are what make per-partition pop fast, but they roughly double your storage cost compared to a single-index heap. Plan capacity accordingly.
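The heap-vs-index split is easy to watch with stock PostgreSQL size functions:

```sql
SELECT pg_size_pretty(pg_relation_size('queen.messages')) AS heap,
       pg_size_pretty(pg_indexes_size('queen.messages'))  AS indexes;
```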
These trade-offs are real, but they're scoped: they only matter if your workload sits at the edges of Queen's envelope. For most business workloads (<100k msg/s, single-region, single-Postgres), none of this is a problem. The benchmarks page has the full picture with measured numbers and side-by-side comparisons against Kafka and RabbitMQ.
Where to read more
lib/schema/procedures/
The hot-path stored procedures, one file per concept. Heavily commented.
lib/queen.hpp + lib/queen/
libqueen, the request-fusion engine. queen.hpp is the main header; queen/ has the per-component pieces.
server/src/
The HTTP shell, the routes, the background services, the file buffer, the auth middleware.
ENV_VARIABLES.md
Every configuration knob, with rationale.
cdocs/LIBQUEEN_IMPROVEMENTS.md
The 2026 rewrite of libqueen: adaptive batching, concurrency, scheduling. Has the perf data behind the choices.
Benchmarks
Where this architecture lands on real hardware.
