Requirements
2 million users. 50,000 notification events per second at peak. Delivery to connected users in under 500 ms. At-least-once delivery, with client-side deduplication to absorb redeliveries.
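At-least-once delivery means a client may see the same notification more than once. A minimal sketch of client-side deduplication, assuming every notification carries a unique string ID; the class name, the ID format, and the capacity of 1000 are illustrative and not taken from the production client:

```typescript
// Hypothetical client-side deduplicator: remembers the most recent N notification IDs.
class NotificationDeduper {
  private readonly seen = new Set<string>();
  private readonly capacity: number;

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  // Returns true the first time an ID is seen, false for a redelivery.
  accept(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    if (this.seen.size > this.capacity) {
      // A Set iterates in insertion order, so the first value is the oldest ID.
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }
}
```

The bound keeps memory constant on long-lived clients. The trade-off is that a redelivery arriving after its ID has been evicted would be shown again, so the capacity should comfortably cover the redelivery window.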
Stage 1: Event Production
Every service publishes notification events to Redis Streams. We chose Streams for three properties: entries persist across restarts, consumer groups support parallel processing, and delivery acknowledgement is built in. A single Redis node handled 50,000 writes per second comfortably in our load testing.
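A sketch of what a producer-side publish might look like. The StreamClient interface is a stand-in shaped like node-redis's xAdd(key, id, fields), so the example runs without a Redis server; the stream name, the event fields, and the function names are assumptions for illustration:

```typescript
// Minimal stand-in for a Redis client, shaped like node-redis's xAdd(key, id, fields).
interface StreamClient {
  xAdd(key: string, id: string, fields: Record<string, string>): Promise<string>;
}

// Hypothetical event shape; the real schema is not described in this document.
interface NotificationEvent {
  eventId: string; // assumed producer-generated ID, reusable for client-side deduplication
  userId: string;
  type: string;
  payload: Record<string, unknown>;
}

const STREAM_KEY = "notifications:events"; // hypothetical stream name

// Stream entries are flat field/value string pairs, so the payload is JSON-encoded.
function toStreamFields(event: NotificationEvent): Record<string, string> {
  return {
    eventId: event.eventId,
    userId: event.userId,
    type: event.type,
    payload: JSON.stringify(event.payload),
  };
}

// Appends the event to the stream; "*" asks Redis to generate the entry ID.
async function publishEvent(client: StreamClient, event: NotificationEvent): Promise<string> {
  return client.xAdd(STREAM_KEY, "*", toStreamFields(event));
}
```

Keeping the serialization in a pure function makes the wire format testable without a Redis connection.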
Stage 2: Stream Processing
A pool of Node.js worker processes reads from Redis Streams, enriches events with user notification preferences (opted in? quiet hours active?), and writes filtered notifications to PostgreSQL and a Redis fanout queue.
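The preference check in this stage can be sketched as a pure function. The preference shape below, with quiet hours stored as minutes since midnight in the user's local time, is an assumption; the document does not describe the real schema, nor what happens to a notification that arrives during quiet hours:

```typescript
// Hypothetical preference record; field names are illustrative.
interface UserPreferences {
  optedIn: boolean;
  // Minutes since local midnight; start is inclusive, end is exclusive.
  quietHours?: { startMinute: number; endMinute: number };
}

function inQuietHours(prefs: UserPreferences, minuteOfDay: number): boolean {
  const q = prefs.quietHours;
  if (!q || q.startMinute === q.endMinute) return false; // no window configured
  if (q.startMinute < q.endMinute) {
    // Window within a single day, e.g. 13:00 to 14:00.
    return minuteOfDay >= q.startMinute && minuteOfDay < q.endMinute;
  }
  // Window wraps past midnight, e.g. 22:00 to 07:00.
  return minuteOfDay >= q.startMinute || minuteOfDay < q.endMinute;
}

// True when the notification should pass the filter for immediate delivery.
function shouldDeliver(prefs: UserPreferences, minuteOfDay: number): boolean {
  return prefs.optedIn && !inQuietHours(prefs, minuteOfDay);
}
```

The wrap-around branch matters because the most common quiet-hours window spans midnight; a plain range check would treat 22:00 to 07:00 as empty.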
Stage 3: Delivery
A Node.js cluster manages WebSocket connections behind sticky-session load balancing. A connection registry in Redis maps each connected user to the process that holds their WebSocket, so any process can route a notification to the right one. Disconnected users receive push notifications via FCM and APNs.
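The routing decision described here can be sketched as follows. The in-memory map stands in for the Redis registry so the example is self-contained; the class, the process IDs, and the Route type are hypothetical, and the mechanism for forwarding between processes is left out because the document does not specify it:

```typescript
// Possible outcomes when routing one notification.
type Route =
  | { kind: "local" }                      // this process holds the user's WebSocket
  | { kind: "forward"; processId: string } // another process holds it
  | { kind: "push" };                      // no live connection: fall back to FCM/APNs

// owner is the process ID recorded for the user, or null if the user is not connected.
function decideRoute(owner: string | null, selfId: string): Route {
  if (owner === null) return { kind: "push" };
  if (owner === selfId) return { kind: "local" };
  return { kind: "forward", processId: owner };
}

// In-memory stand-in for the Redis connection registry (userId -> processId).
class InMemoryRegistry {
  private readonly owners = new Map<string, string>();

  register(userId: string, processId: string): void {
    this.owners.set(userId, processId);
  }

  unregister(userId: string, processId: string): void {
    // Remove only if this process still owns the entry, so a reconnect
    // that landed on another process is not clobbered by a late disconnect.
    if (this.owners.get(userId) === processId) this.owners.delete(userId);
  }

  lookup(userId: string): string | null {
    return this.owners.get(userId) ?? null;
  }
}
```

For example, decideRoute(registry.lookup("u1"), "p1") yields a local route when process p1 registered u1, and a push route once u1 has disconnected and been unregistered.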
Results
50,000 events per second sustained over 4 hours during a flash sale. P99 WebSocket delivery latency: 180ms. Zero notification events dropped in 6 months of production operation.
