Queen MQ
v0.14 · partitions on PostgreSQL · Apache 2.0

Per-entity ordering
without preallocating partitions.

Queen MQ is a message queue backed by PostgreSQL with the partition count as a continuous knob. Slide it toward thousands and you have Kafka-shaped per-entity FIFO ordering without preallocating anything: one chat per partition, one user per partition, one workflow per partition. Slide it toward ten and you have RabbitMQ-shaped competing consumers at higher throughput, with per-shard ordering as a free bonus. Same engine, same SDK, same durability tier (synchronous_commit=on). One Docker container plus the Postgres you already run. Sized for workloads where one beefy Postgres is enough.

6.7k msg/s end-to-end pipeline · p99 ≈ 1 s · 0 duplicates · 104k msg/s peak push (batch=100) · 100 000 active partitions, no degradation · 52 MB broker RAM · single container

Numbers from our benchmark page, including a 4-stage real-client pipeline (producer → worker → fan-out × 2). Use the sizing calculator to convert your target msg/s into a Postgres vCPU budget. Above ~200k msg/s sustained, or for multi-region replication, you want Kafka.

Why Queen exists

One partition per thing, not per shard.

Kafka partitions are physical shards. They're brilliant for log-shipping, replication, and shovelling huge volumes through a streaming pipeline. They're awkward when you want one ordered lane per business entity: one chat per partition, one user per partition, one workflow per partition. Kafka makes you preallocate a fixed number and hash-mod into them, which means unrelated entities share a partition and one slow consumer can hold up many.

Queen reframes the partition as a logical ordering scope in a PostgreSQL-backed queue. You can have tens of thousands per queue, created on first push, with no preallocation. Slow processing on one chat doesn't slow another. It's not a Kafka replacement; it's a different shape, suited to a different range of workloads.
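The contrast can be sketched in a few lines of plain JavaScript (a conceptual illustration, not Queen or Kafka code; the hash function and entity names are made up). With a fixed shard count, unrelated entities collide by hash-mod; with per-entity partitions, every key gets its own lane.

```javascript
// Conceptual sketch: why hash-mod partitioning couples unrelated entities.
// Plain JavaScript, no Queen APIs involved.

// A toy string hash (djb2), standing in for a broker's key hashing.
function hash(key) {
  let h = 5381;
  for (const ch of key) h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  return h;
}

// Kafka-style: 4 preallocated partitions, entities share them by hash-mod.
const kafkaPartition = (entityId) => hash(entityId) % 4;

// Queen-style: the entity id *is* the partition, created on first push.
const queenPartition = (entityId) => entityId;

const chats = ['chat-17', 'chat-42', 'chat-99', 'chat-1337', 'chat-7'];
const shardMap = new Map();
for (const c of chats) {
  const p = kafkaPartition(c);
  shardMap.set(p, (shardMap.get(p) ?? []).concat(c));
}
// Five entities in four shards: by pigeonhole, at least two chats share a
// shard, so a slow consumer on one chat can delay an unrelated one.
console.log(shardMap);
console.log(chats.map(queenPartition)); // five independent ordered lanes
```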

Born in production

Queen was built at Smartness to power Smartchat, an AI guest-messaging platform: 100k+ concurrent chat conversations, AI translation steps, agent replies. With one ordered partition per chat, one slow translation no longer freezes the others.

The power of Queen

One example. Five primitives. Production-grade.

The script below is examples/base.js from the repo. It creates a queue, pushes a message, consumes it with a consumer group, then atomically acknowledges the input and pushes a derived message into a second queue, exactly-once across both operations.

examples/base.js, verified against http://localhost:6632
import { Queen } from 'queen-mq'

const queen = new Queen('http://localhost:6632')

// 1) Configure queue: lease, retries, retention, encryption.
await queen.queue('test-queue').delete()
await queen.queue('test-queue')
  .config({ leaseTime: 10, retryLimit: 3, retentionSeconds: 86400 })
  .create()

// 2) Push to a partition. All messages on partition 'p1' are FIFO-ordered.
await queen.queue('test-queue')
  .partition('p1')
  .push([{ transactionId: '123', data: { message: 'Hello, world!' } }])

// 3) Consume in a consumer group with manual ack.
//    Messages stay leased to this consumer; auto-renew every 2s.
await queen.queue('test-queue')
  .group('analytics')
  .renewLease(true, 2000)
  .autoAck(false)
  .batch(10)
  .each()
  .consume(async (message) => {
    // your logic here
    return { nextPartition: 'p2' }
  })
  .onSuccess(async (message, result) => {
    // 4) Atomically ack the input AND push a derived message
    //    onto another queue. Either both happen, or neither.
    await queen
      .transaction()
      .ack(message, 'completed', { consumerGroup: 'analytics' })
      .queue('trx-queue').partition(result.nextPartition)
      .push([{ data: result, transactionId: message.transactionId + '-next' }])
      .commit()
  })
  .onError(async (message, error) => {
    await queen.ack(message, false, { group: 'analytics' })
  })

1 · Queue

Configurable container

Lease time, retry limit, retention, optional encryption, optional DLQ. One POST /api/v1/configure call.

2 · Partition

Ordered lane, no quota

Push to partition('p1') and every message in that lane is processed in order. Lanes are cheap: make one per user, tenant, or chat.

3 · Consumer group

Kafka-style fan-out

Each group has its own offset. Workers in a group share the load; separate groups all see every message.
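The offset semantics can be pictured in plain JavaScript (a conceptual sketch, not Queen internals; group names are illustrative). Each group keeps its own cursor over the same partition, so one group's progress never steals messages from another.

```javascript
// Conceptual sketch of per-group offsets (not the Queen implementation).
const partition = ['m1', 'm2', 'm3', 'm4'];
const offsets = new Map(); // group name -> next index to read

function pop(group, n = 1) {
  const at = offsets.get(group) ?? 0;
  const batch = partition.slice(at, at + n);
  offsets.set(group, at + batch.length); // advance only this group's cursor
  return batch;
}

pop('analytics', 3);            // analytics has consumed m1..m3
const late = pop('billing', 2); // billing still starts from m1
console.log(late); // ['m1', 'm2'] — every group sees every message
```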

4 · Lease + retry

No phantom workers

Messages are leased to one consumer at a time. Renew leases for long jobs; failed ones retry, then go to the DLQ.
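A minimal sketch of the lease-and-retry idea in plain JavaScript (illustrative only; Queen tracks leases in Postgres, and the retry limit here is an example value): a message is visible to exactly one consumer until its lease expires or it is acked, and failures count toward a limit before dead-lettering.

```javascript
// Illustrative lease/retry model (not Queen's actual implementation).
const RETRY_LIMIT = 3;
const msg = { id: 'm1', leaseUntil: 0, retries: 0, dead: false };
const dlq = [];

function lease(now, leaseMs) {
  if (msg.dead || now < msg.leaseUntil) return null; // someone else holds it
  msg.leaseUntil = now + leaseMs;
  return msg;
}

function fail(now) {
  msg.retries += 1;
  msg.leaseUntil = now; // release immediately for redelivery
  if (msg.retries >= RETRY_LIMIT) { msg.dead = true; dlq.push(msg.id); }
}

let now = 0;
lease(now, 10_000);              // worker A takes the lease
console.log(lease(now, 10_000)); // null — worker B is locked out
now += 10_000;                   // lease expired without an ack
fail(now); fail(now); fail(now);
console.log(dlq);                // ['m1'] once the retry limit is hit
```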

5 · Transaction

Ack + push, atomic

ack(input).push([output]).commit(). Wired straight into PostgreSQL's transaction, exactly-once across queues.

+ Failover

Disk buffer if PG goes down

Pushes are written to a local file buffer when PostgreSQL is unreachable, then replayed automatically when it recovers. No lost messages.
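The control flow looks roughly like this (a conceptual sketch: Queen spills to an on-disk file buffer, while this uses an in-memory array and a fake DB flag purely to show the ordering guarantee):

```javascript
// Conceptual sketch of push failover and replay (not Queen's file buffer).
const buffer = [];
const db = { up: false, rows: [] };

function push(message) {
  if (!db.up) {            // Postgres unreachable: spill locally
    buffer.push(message);
    return { buffered: true };
  }
  db.rows.push(message);
  return { buffered: false };
}

function onRecovery() {    // replay buffered pushes in arrival order
  while (buffer.length) db.rows.push(buffer.shift());
}

push({ id: 1 });
push({ id: 2 });           // both spilled while the DB is down
db.up = true;
onRecovery();
console.log(db.rows.map(m => m.id)); // [1, 2] — nothing lost, order kept
```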

Same grammar, five languages.

The example above is JavaScript, but every Queen client speaks the same fluent verbs: queue(name).partition(p).group(g).push() / .pop() / .consume(). All shipping today, not on a roadmap: JavaScript (Node 22+ & browser, npm install queen-mq) · Python (3.8+, pip install queen-mq) · Go (1.24+) · PHP / Laravel (8.3+, composer require smartpricing/queen-mq) · C++ (header-only, C++17). Or skip SDKs entirely and call the raw HTTP API. See the full matrix →

Feature surface

What you get out of the box.

Ordering

FIFO partitions at high cardinality

Per-user, per-tenant, per-chat, tens of thousands of ordered lanes per queue. Slow processing on one lane doesn't slow another.

Fan-out

Consumer groups & subscriptions

Process from beginning, only new messages, or replay from a timestamp. Each group has its own offset.
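The three subscription modes amount to choosing where a new group's cursor starts; a plain-JavaScript sketch (conceptual, not the Queen API; mode names here are illustrative):

```javascript
// Conceptual sketch: where a new consumer group's cursor starts.
const log = [
  { id: 'm1', ts: 100 },
  { id: 'm2', ts: 200 },
  { id: 'm3', ts: 300 },
];

function startIndex(mode, fromTs = 0) {
  if (mode === 'beginning') return 0;            // full replay
  if (mode === 'new') return log.length;         // only future messages
  const i = log.findIndex(m => m.ts >= fromTs);  // replay from a timestamp
  return i === -1 ? log.length : i;
}

console.log(log.slice(startIndex('timestamp', 200)).map(m => m.id)); // ['m2', 'm3']
```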

Atomicity

Transactional pipelines

Atomic ack + push across queues, in one PostgreSQL transaction. Exactly-once between Queen operations; downstream effects are still your responsibility.

Latency

Long polling, no busy loops

Server holds the connection until a message arrives. Inter-instance UDP wakeup keeps fan-out fast.

Reliability

Dead-letter queue

Configurable retry limit. Failed messages land in the DLQ with the error message and full payload.

Durability

Disk failover

If PostgreSQL is unreachable, pushes spill to a local file buffer and replay on recovery.

Observability

End-to-end tracing

Trace a message across queues, transformations, retries, and consumer groups. Visualize timelines in the dashboard.

Observability

Prometheus & Grafana

Native /metrics/prometheus exposition with per-queue, per-worker, and DLQ series. One DB call per scrape; ready for Grafana out of the box.

Security

JWT auth + role gates

HS256, RS256, and EdDSA. Read-only / read-write / admin role tiers. Per-message producerSub stamped from the JWT.

Operate

Vue 3 dashboard

Real-time queues, message browser, analytics, trace explorer, DLQ management, all served by the same C++ binary.

Postgres health

Bloat & vacuuming, addressed by design

Update-heavy tables ship with FILLFACTOR=50 and tuned autovacuum knobs to stay HOT-update-friendly. Advisory locks replace row contention on the hot path. Retention windows on messages_consumed keep the message log bounded.

High availability

Single container, or K8s StatefulSet

One Docker container is enough for most setups. For HA, run a StatefulSet behind a headless service: pods coordinate via UDP peer wake-ups and the UDPSYNC shared-state cache, with affinity routing in the clients.

Honest positioning

Where Queen fits, and where it doesn't.

Queen is good at a specific shape of workload, not at every queue use case. Picking infrastructure honestly matters more than picking the “winner” of every benchmark. Here's the truth, in three columns:

Use Queen

~80% of business workloads

  • You need per-key ordering with parallel consumers: one chat, one user, one workflow per partition.
  • You're under ~100k msg/s sustained throughput.
  • You already run PostgreSQL and want one less thing to operate.
  • You want transactional integration: insert business data and queue a message in the same PG transaction.
  • You need replay from a timestamp for new consumer groups.
  • You have high-cardinality ordering keys (10k–1M+ entities, each needing its own ordered lane).
Use Kafka

The streaming tier

  • You need >200k msg/s sustained on a single broker, or millions of msg/s across a cluster.
  • You're building a log-as-source-of-truth system with multi-day or unbounded retention.
  • You need multi-region active-active replication out of the box (MirrorMaker).
  • You want the streaming ecosystem: Kafka Streams, ksqlDB, Flink connectors, Schema Registry.
  • Your team already operates Kafka and the operational cost is sunk.
Use RabbitMQ

AMQP-required & rich routing

  • You need a specific wire protocol Queen doesn't speak (AMQP 0.9.1, MQTT, STOMP) because of compliance, existing tooling, or IoT clients.
  • You need complex routing rules: topic exchanges with wildcard patterns (orders.*.created), headers exchanges, message priorities, alternate exchanges for unroutable messages.
  • You're already operating RabbitMQ in production and migrating is more expensive than the operational savings.
  • You want federation / shovel tooling for cross-cluster forwarding with translation rules.

Note: for the typical RabbitMQ workload (competing consumers with persistent messages and acks), Queen at a low partition count gets you the same shape at higher throughput, lower RAM (~70 MB vs 2–5 GB), and a simpler protocol. The reasons left to pick RabbitMQ are mostly protocol and ecosystem, not raw queue mechanics.

A note on the “~80%” claim

Most production message-queue workloads at most companies are below 100k msg/s and need ordering of some sort. Queen targets that median workload, not Kafka's upper tail and not RabbitMQ's specific routing strengths. Our benchmarks show Queen, Kafka, and RabbitMQ head-to-head on identical hardware so you can decide for yourself, with numbers we measured rather than wishful thinking.

Why broker throughput numbers can mislead

If your messages do real work (a DB write, an API call, an LLM inference), then your worker fleet is the bottleneck, not the broker. At 20 ms of work per message, one worker handles 50 msg/s. Saturating Kafka's 1.5M msg/s requires 30 000 worker processes (~$150k/month of consumer fleet). Most companies have 100–2 000 workers and run at 5k–100k msg/s of actual demand, well within Queen's broker envelope. In that regime, the broker comparison stops mattering, and Queen wins on operational cost, durability semantics, and Postgres-transactional integration. The benchmarks page has the full math.
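The arithmetic in the paragraph above is easy to reproduce (numbers taken directly from it):

```javascript
// Worker-fleet math: how many workers it takes to saturate a fast broker.
const workMs = 20;                       // real work per message
const perWorker = 1000 / workMs;         // messages per second per worker
const kafkaPeak = 1_500_000;             // msg/s to saturate
const workersNeeded = kafkaPeak / perWorker;
console.log(perWorker, workersNeeded);   // 50 msg/s each, 30000 workers
```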

Get started

Two containers. Three commands.

PostgreSQL + Queen, then push a message and pop it back. No SDK required.

shell
docker network create queen
docker run --name qpg --network queen \
  -e POSTGRES_PASSWORD=postgres -p 5433:5432 -d postgres
docker run -p 6632:6632 --network queen \
  -e PG_HOST=qpg -e PG_PASSWORD=postgres -e NUM_WORKERS=2 \
  smartnessai/queen-mq:latest

curl -X POST http://localhost:6632/api/v1/push \
  -H 'Content-Type: application/json' \
  -d '{"items":[{"queue":"demo","payload":{"hello":"world"}}]}'

curl 'http://localhost:6632/api/v1/pop/queue/demo?autoAck=true'