Back to Blog
ArchitectureMarch 4, 202614 min read

How to Design a Real-Time Chat Application like Discord

Handling 10 million concurrent active users on a chat application isn't just about WebSockets. Discover how to architect the Presence Service, Message Fan-out, and NoSQL storage engines.

TL;DR (Recommended Architecture): For a real-time chat application, the optimal architecture uses a WebSocket API Gateway for stateful persistent connections, Redis Pub/Sub or Kafka for message routing between nodes, and Apache Cassandra (or DynamoDB) for high-throughput, sequential write-heavy message storage.

When you type a message in Discord, Slack, or WhatsApp, you expect it to appear on your friend's screen in less than 100 milliseconds. Building a system that achieves this latencies for millions of concurrent users requires a radical departure from traditional stateless HTTP request-response patterns.

Back-of-the-envelope Estimation

Let's define the scale for a platform like Discord or Slack.

MetricValueJustification
Concurrent Users10,000,000Real-time persistent connections (WebSockets).
Write QPS (Messages)200,000/sUsers typing messages in busy global channels.
Read QPS (Messages)500,000/sOpening the app and fetching history (infinite scroll).
Storage per Year~1.5 PB200k msg/s * 365 days * 250 bytes per msg. Long-term cold storage required.

High-Level Architecture

The architecture consists of three core independent clusters: The Stateful Gateway, the Stateless API, and the Presence Service.

Rendering architecture diagram...

Deep Dive: The Toughest Challenges

1. Managing 10 Million Persistent Connections (The C10M Problem)

Traditional servers spin up a Thread for every HTTP request. If you attempt this with WebSockets, a server with 64GB of RAM will crash at around 10k connections due to thread context-switching overhead.

The Solution: You must use Asynchronous Event-Driven WebSockets (e.g., Node.js, Go's goroutines, or Java Netty). Even with optimized servers, a single machine can safely hold about 500,000 connections. To handle 10 Million, you need a Cluster of at least 20-30 WebSocket Gateways behind a Layer 4 Network Load Balancer (TCP balancing).

2. Message Routing: How does Node A talk to Node B?

Assume Alice is connected to WebSocket Node 1. Bob is connected to WebSocket Node 42. When Alice sends a message to Bob, Node 1 receives it. But Node 1 doesn't have a connection to Bob. How does the message get to Bob?

The Solution:

  1. Node 1 accepts Alice's message via WebSocket.
  2. Node 1 queries a fast Redis Cache (or ZooKeeper) that maps: User -> Node IP. It finds Bob is on Node 42.
  3. Node 1 pushes the message to an internal Message Broker (like Redis Pub/Sub or Kafka) targeting a channel subscribed to by Node 42.
  4. Node 42 receives the internal event, finds Bob's active WebSocket connection in its local memory pool, and pushes the raw payload down the socket.

3. The Database Choice: Why Cassandra?

Why not Postgres? Chat applications possess a unique data access pattern:

  • Insane Write Throughput: Billions of tiny inserts per day.
  • Sequential Reads: Users only fetch the latest 50 messages, then page backwards (infinite scroll).
  • No Complex Joins: Messages are bound strictly to a Channel_ID.

Apache Cassandra (or its C++ rewrite, ScyllaDB) is the industry standard here (used by Discord). It utilizes an LSM-Tree (Log-Structured Merge Tree) under the hood, turning random writes into sequential disk I/O. By designing our Partition Key as Channel_ID and Clustering Key as Message_ID (TimeUUID), we guarantee that querying the last 50 messages requires scanning a single contiguous block on disk.


Ready to test these skills?

Practice this exact system design scenario with our AI interviewer and get graded on your architecture choices.

Start Mock Interview