Dr Milan Milanović (@techworldwithmilan)

𝗛𝗼𝘄 𝗢𝗽𝗲𝗻𝗔𝗜 𝘀𝗰𝗮𝗹𝗲𝗱 𝗣𝗼𝘀𝘁𝗴𝗿𝗲𝗦𝗤𝗟 𝘁𝗼 𝟴𝟬𝟬 𝗺𝗶𝗹𝗹𝗶𝗼𝗻 𝘂𝘀𝗲𝗿𝘀 𝘄𝗶𝘁𝗵 𝗮 𝘀𝗶𝗻𝗴𝗹𝗲 𝗶𝗻𝘀𝘁𝗮𝗻𝗰𝗲?

If you work with databases at scale, this might surprise you. OpenAI serves 800 million active ChatGPT users from a single PostgreSQL primary instance. No sharding, no exotic database technology. Just one writer and about 50 read replicas handle over a million queries per second.

Here is how they did it and what problems they had.

𝗧𝗵𝗲 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲

OpenAI uses Azure Database for PostgreSQL with a simple setup: one primary instance handles all writes, while nearly 50 read replicas handle read traffic globally. PgBouncer manages connection pooling. With pooling in place, average connection time reportedly dropped from 50 ms to 5 ms.

So the basic principle was to protect the primary at all costs by keeping as much load off it as possible.

𝟭. 𝗥𝗲𝗮𝗱 𝗧𝗿𝗮𝗳𝗳𝗶𝗰. They offload read traffic to replicas whenever possible. To prevent high-priority requests from being interfered with by low-priority queries, they created dedicated replicas for high-priority requests. Some reads must remain on the primary since they are part of write transactions.
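
The routing rule described here can be sketched in a few lines of Python. This is only an illustration, not OpenAI's code: the host names, the Query fields, and the round-robin choice are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Query:
    is_write: bool = False
    in_write_transaction: bool = False  # a read that belongs to a write transaction
    high_priority: bool = False


class Router:
    """Pick a database host for each query (all host names are hypothetical)."""

    def __init__(self, primary, priority_replicas, general_replicas):
        self.primary = primary
        # Simple round-robin inside each replica pool (an assumption).
        self._priority = cycle(priority_replicas)
        self._general = cycle(general_replicas)

    def route(self, query: Query) -> str:
        # Writes, and reads inside a write transaction, must use the primary.
        if query.is_write or query.in_write_transaction:
            return self.primary
        # High-priority reads get dedicated replicas so that heavy
        # low-priority queries cannot interfere with them.
        if query.high_priority:
            return next(self._priority)
        return next(self._general)


router = Router("primary", ["hp-replica-1", "hp-replica-2"],
                ["replica-1", "replica-2", "replica-3"])
print(router.route(Query(high_priority=True)))
```

Everything that is not a write, or part of one, stays off the primary, which is the whole point of the setup.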

𝟮. 𝗪𝗿𝗶𝘁𝗲 𝗧𝗿𝗮𝗳𝗳𝗶𝗰. Shardable, write-heavy workloads were moved to a sharded system such as Azure Cosmos DB. Application code was also optimized so that it does not write when it does not have to.

Creating new tables in PostgreSQL is no longer permitted; new workloads go to a sharded system by default.
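
One way to avoid writing when nothing needs to be written is to compare the incoming values with the stored row and skip the UPDATE when nothing changed. This matters in PostgreSQL because an UPDATE that sets a column to its current value still creates a new row version. The sketch below is an assumption about how that could look, not OpenAI's implementation; the table name, the id column, and the %s placeholder style (as used by psycopg) are illustrative.

```python
def changed_fields(current: dict, incoming: dict) -> dict:
    """Return only the fields whose values differ from the stored row."""
    return {
        col: value
        for col, value in incoming.items()
        if col not in current or current[col] != value
    }


def build_update(table: str, row_id: int, current: dict, incoming: dict):
    """Return (sql, params) for an UPDATE, or None when nothing changed.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    delta = changed_fields(current, incoming)
    if not delta:
        return None  # no statement is sent, so the primary sees no write
    assignments = ", ".join(f"{col} = %s" for col in delta)
    sql = f"UPDATE {table} SET {assignments} WHERE id = %s"
    return sql, [*delta.values(), row_id]


stored = {"name": "Ada", "plan": "free"}
print(build_update("users", 7, stored, {"name": "Ada", "plan": "free"}))
print(build_update("users", 7, stored, {"plan": "pro"}))
```

The first call returns None because the row already holds those values; the second produces an UPDATE that touches only the changed column.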

𝗧𝗵𝗲 𝗣𝗼𝘀𝘁𝗴𝗿𝗲𝗦𝗤𝗟 𝗠𝗩𝗖𝗖 𝗣𝗿𝗼𝗯𝗹𝗲𝗺

Andy Pavlo from CMU calls PostgreSQL's MVCC implementation "the worst among major relational databases." Here is why it creates challenges at scale:

𝟯. 𝗪𝗿𝗶𝘁𝗲 𝗔𝗺𝗽𝗹𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻. When you update one column, PostgreSQL copies the entire row. Update a single field in a table with 1000 columns, and you create a new version with 999 untouched columns. MySQL and Oracle store compact deltas instead.

𝟰. 𝗜𝗻𝗱𝗲𝘅 𝗕𝗹𝗼𝗮𝘁. Every row version can create index entries. Five versions of a single row could mean five index entries. This bloats indexes, consumes memory, and slows queries.

𝟱. 𝗗𝗲𝗮𝗱 𝗧𝘂𝗽𝗹𝗲𝘀. Old row versions stay in the table until autovacuum removes them. Under heavy writes, dead rows accumulate faster than autovacuum can clean them up. A table with 10 million live rows might therefore also carry 40 million dead rows.

𝟲. 𝗔𝘂𝘁𝗼𝘃𝗮𝗰𝘂𝘂𝗺 𝗕𝗹𝗼𝗰𝗸𝘀. Long-running transactions keep autovacuum from removing dead tuples that they might still need to see. More dead tuples lead to stale statistics, slower queries, and even longer-running transactions. A vicious cycle.
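
The four effects above can be seen together in a toy model. This is a deliberately simplified sketch, not how PostgreSQL is implemented: it keeps versions in a Python list, counts one index entry per version (ignoring optimizations such as HOT updates), and treats "oldest active transaction id" as the only thing that limits vacuum. All names are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    row: dict                   # full copy of every column, changed or not
    xmin: int                   # transaction id that created this version
    xmax: Optional[int] = None  # transaction id that replaced it; None means live


@dataclass
class ToyTable:
    versions: list = field(default_factory=list)
    index_entries: int = 0      # one entry per version in this model
    columns_written: int = 0    # column values written, including unchanged ones

    def _store(self, row: dict, xid: int) -> None:
        self.versions.append(Version(dict(row), xmin=xid))
        self.index_entries += 1
        self.columns_written += len(row)

    def insert(self, row: dict, xid: int) -> None:
        self._store(row, xid)

    def update(self, row_id: int, changes: dict, xid: int) -> None:
        live = next(v for v in self.versions
                    if v.xmax is None and v.row["id"] == row_id)
        live.xmax = xid                            # old version becomes a dead tuple
        self._store({**live.row, **changes}, xid)  # the whole row is copied

    def dead_tuples(self) -> int:
        return sum(1 for v in self.versions if v.xmax is not None)

    def vacuum(self, oldest_active_xid: int) -> int:
        """Remove dead versions that no active transaction can still see."""
        keep = [v for v in self.versions
                if v.xmax is None or v.xmax >= oldest_active_xid]
        removed = len(self.versions) - len(keep)
        self.versions = keep
        self.index_entries -= removed
        return removed


table = ToyTable()
table.insert({"id": 1, "name": "a", "plan": "free", "visits": 0}, xid=1)
for xid in range(2, 6):  # four single-column updates
    table.update(1, {"visits": xid}, xid)
print(table.dead_tuples(), table.index_entries, table.columns_written)
print(table.vacuum(oldest_active_xid=3), table.dead_tuples())
```

Four one-column updates leave five versions, five index entries, and twenty column values written for four real changes. With a transaction still open at id 3, vacuum can reclaim only one of the four dead versions; the rest wait until that transaction ends.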

𝗛𝗼𝘄 𝗢𝗽𝗲𝗻𝗔𝗜 𝗦𝗼𝗹𝘃𝗲𝗱 𝗧𝗵𝗲𝘀𝗲

To address connection overhead, they introduced PgBouncer for connection pooling, which resolved the issue. They also reviewed the SQL generated by their ORMs; any complex ORM-generated query had to be rewritten as raw SQL.

In addition, they configured idle_in_transaction_session_timeout so that connections left idle inside an open transaction cannot block autovacuum.
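
As a rough sketch of what this setting targets: the helper below builds an ALTER ROLE statement for the timeout and picks out sessions that have been idle in a transaction for too long. The role name, the 30-second value, and the sample rows are assumptions for illustration; the source does not say which value OpenAI uses. The query relies on the pid, state, and state_change columns of pg_stat_activity.

```python
from datetime import datetime, timedelta

# Sessions currently sitting inside an open transaction without doing work.
IDLE_IN_TX_SQL = """
SELECT pid, state_change
FROM pg_stat_activity
WHERE state = 'idle in transaction'
"""


def timeout_statement(role: str, seconds: int) -> str:
    """Build the statement that sets the timeout for one role."""
    if not role.isidentifier():
        raise ValueError(f"unexpected role name: {role!r}")
    return (f"ALTER ROLE {role} "
            f"SET idle_in_transaction_session_timeout = '{seconds}s'")


def overdue(rows, now: datetime, limit: timedelta) -> list:
    """rows are (pid, state_change) pairs, as returned by IDLE_IN_TX_SQL."""
    return [pid for pid, since in rows if now - since > limit]


# Made-up sample rows standing in for a real query result.
now = datetime(2025, 1, 1, 12, 0, 0)
rows = [(101, now - timedelta(seconds=5)), (102, now - timedelta(minutes=10))]
print(timeout_statement("app_user", 30))
print(overdue(rows, now, timedelta(seconds=30)))
```

With the timeout set, PostgreSQL terminates such sessions itself; the overdue check is only useful for seeing how often it would have fired.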

For schema changes, they impose strict constraints:

- No tables are added

- Column additions must complete within 5 seconds

- No table rewriting is allowed

- Indexes must be created or dropped with CONCURRENTLY
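
Rules like these are easy to check mechanically before a migration runs. The sketch below is a hypothetical linter, not a tool OpenAI has described: the pattern matching is intentionally crude, using statement_timeout to enforce the 5-second limit is an assumption, and a column type change stands in for "table rewrite" because it is one common trigger.

```python
import re

STATEMENT_TIMEOUT = "5s"  # mirrors the 5-second rule (mechanism is assumed)


def check_ddl(statement: str) -> list:
    """Return the rule violations found in one DDL statement."""
    sql = " ".join(statement.upper().split())
    problems = []
    if sql.startswith("CREATE TABLE"):
        problems.append("new tables are not allowed")
    if re.match(r"(CREATE|DROP)\s+(UNIQUE\s+)?INDEX\b", sql) \
            and "CONCURRENTLY" not in sql:
        problems.append("index changes must use CONCURRENTLY")
    if sql.startswith("ALTER TABLE") and re.search(
            r"\bALTER COLUMN \S+ (SET DATA )?TYPE\b", sql):
        problems.append("column type changes may rewrite the table")
    return problems


def guarded(statement: str) -> list:
    """Return the statements to run, or raise if a rule is violated."""
    problems = check_ddl(statement)
    if problems:
        raise ValueError("; ".join(problems))
    if statement.upper().lstrip().startswith("ALTER TABLE"):
        # Fail fast instead of holding locks on a busy table.
        return [f"SET statement_timeout = '{STATEMENT_TIMEOUT}'", statement]
    return [statement]


print(check_ddl("CREATE INDEX idx ON t (a)"))
print(guarded("ALTER TABLE t ADD COLUMN note text"))
```

The timeout is applied only to ALTER TABLE here because a concurrent index build on a large table is expected to run far longer than 5 seconds.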

𝗧𝗵𝗲 𝗧𝗿𝗮𝗱𝗲-𝗼𝗳𝗳

OpenAI's PostgreSQL cannot grow organically anymore. Any new functionality that requires additional database tables must use Cosmos DB. Even sharding their current implementation will take months or years, and will require changes to hundreds of their application endpoints.

Read more about it in the text in the comments.

Image: OpenAI.
