How to Design a Real-Time Chat Application like Discord
Handling 10 million concurrent active users on a chat application isn't just about WebSockets. Discover how to architect the Presence Service, Message Fan-out, and NoSQL storage engines.
TL;DR (Recommended Architecture): For a real-time chat application, the optimal architecture uses a WebSocket API Gateway for stateful persistent connections, Redis Pub/Sub or Kafka for message routing between nodes, and Apache Cassandra (or DynamoDB) for high-throughput, sequential write-heavy message storage.
When you type a message in Discord, Slack, or WhatsApp, you expect it to appear on your friend's screen in less than 100 milliseconds. Building a system that achieves this latencies for millions of concurrent users requires a radical departure from traditional stateless HTTP request-response patterns.
Back-of-the-envelope Estimation
Let's define the scale for a platform like Discord or Slack.
| Metric | Value | Justification |
|---|---|---|
| Concurrent Users | 10,000,000 | Real-time persistent connections (WebSockets). |
| Write QPS (Messages) | 200,000/s | Users typing messages in busy global channels. |
| Read QPS (Messages) | 500,000/s | Opening the app and fetching history (infinite scroll). |
| Storage per Year | ~1.5 PB | 200k msg/s * 365 days * 250 bytes per msg. Long-term cold storage required. |
High-Level Architecture
The architecture consists of three core independent clusters: The Stateful Gateway, the Stateless API, and the Presence Service.
Rendering architecture diagram...
Deep Dive: The Toughest Challenges
1. Managing 10 Million Persistent Connections (The C10M Problem)
Traditional servers spin up a Thread for every HTTP request. If you attempt this with WebSockets, a server with 64GB of RAM will crash at around 10k connections due to thread context-switching overhead.
The Solution: You must use Asynchronous Event-Driven WebSockets (e.g., Node.js, Go's goroutines, or Java Netty).
Even with optimized servers, a single machine can safely hold about 500,000 connections. To handle 10 Million, you need a Cluster of at least 20-30 WebSocket Gateways behind a Layer 4 Network Load Balancer (TCP balancing).
2. Message Routing: How does Node A talk to Node B?
Assume Alice is connected to WebSocket Node 1. Bob is connected to WebSocket Node 42.
When Alice sends a message to Bob, Node 1 receives it. But Node 1 doesn't have a connection to Bob. How does the message get to Bob?
The Solution:
- Node 1 accepts Alice's message via WebSocket.
- Node 1 queries a fast Redis Cache (or ZooKeeper) that maps:
User -> Node IP. It finds Bob is on Node 42. - Node 1 pushes the message to an internal Message Broker (like Redis Pub/Sub or Kafka) targeting a channel subscribed to by Node 42.
- Node 42 receives the internal event, finds Bob's active WebSocket connection in its local memory pool, and pushes the raw payload down the socket.
3. The Database Choice: Why Cassandra?
Why not Postgres? Chat applications possess a unique data access pattern:
- Insane Write Throughput: Billions of tiny inserts per day.
- Sequential Reads: Users only fetch the latest 50 messages, then page backwards (infinite scroll).
- No Complex Joins: Messages are bound strictly to a
Channel_ID.
Apache Cassandra (or its C++ rewrite, ScyllaDB) is the industry standard here (used by Discord). It utilizes an LSM-Tree (Log-Structured Merge Tree) under the hood, turning random writes into sequential disk I/O.
By designing our Partition Key as Channel_ID and Clustering Key as Message_ID (TimeUUID), we guarantee that querying the last 50 messages requires scanning a single contiguous block on disk.
Ready to test these skills?
Practice this exact system design scenario with our AI interviewer and get graded on your architecture choices.
