Postgres load balancing is secretly broken: The cancellation problem

On certain workloads, a single Postgres server cannot deliver the desired performance. If the traffic is read-heavy, you can already solve this by adding read replicas. If the traffic is write-heavy, the upcoming Citus 11 release comes to the rescue. In both cases, your application's queries need to be sent to a randomly chosen Postgres server so that the servers share the load. A TCP load balancer can do this easily. However, such a load balancer has a hidden downside: when you try to cancel a query through it, you'll notice that cancellation works only some of the time. This talk will explain why this problem with cancellations occurs, and it will show a few ways to work around it, including changes that I proposed to Postgres and PgBouncer.

If any of the following topics sound interesting, this talk is for you:
1. Postgres read replicas
2. Scaling writes with Citus 11
3. Running PgBouncer on multiple CPU cores
4. Implementation details of query cancellations
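
As a rough illustration (a sketch, not material from the talk): in the Postgres frontend/backend protocol, a client cancels a running query by opening a new connection and sending a 16-byte CancelRequest that carries the backend's process ID and secret key. A TCP load balancer routes that new connection independently of the original one, so it can land on a server that never issued that key, and that server silently ignores the request. The Python below simulates this. FakeServer and simulate are hypothetical names, and uniform random routing is an assumption about the load balancer.

```python
from __future__ import annotations

import random
import struct

# CancelRequest code from the Postgres protocol docs: (1234 << 16) | 5678.
CANCEL_REQUEST_CODE = 80877102


def build_cancel_request(pid: int, secret_key: int) -> bytes:
    """Encode a CancelRequest: Int32 length (16), Int32 code, Int32 PID, Int32 key.

    All fields are big-endian (network byte order)."""
    return struct.pack("!iiii", 16, CANCEL_REQUEST_CODE, pid, secret_key)


class FakeServer:
    """Hypothetical stand-in for one Postgres server behind the load balancer.

    It only knows the (pid, secret_key) pairs of connections opened to it."""

    def __init__(self) -> None:
        self.backends: set[tuple[int, int]] = set()

    def handle_cancel(self, packet: bytes) -> bool:
        length, code, pid, key = struct.unpack("!iiii", packet)
        # A real server sends no reply; here we just report whether a query would be cancelled.
        return length == 16 and code == CANCEL_REQUEST_CODE and (pid, key) in self.backends


def simulate(num_servers: int, trials: int, seed: int = 0) -> float:
    """Fraction of cancel requests that reach the right server under random TCP balancing."""
    rng = random.Random(seed)
    servers = [FakeServer() for _ in range(num_servers)]
    successes = 0
    for i in range(trials):
        # The query's connection lands on a randomly chosen server...
        query_server = rng.choice(servers)
        pid, key = 1000 + i, rng.getrandbits(31)
        query_server.backends.add((pid, key))
        # ...but the cancel opens a new connection, which is balanced independently.
        cancel_server = rng.choice(servers)
        if cancel_server.handle_cancel(build_cancel_request(pid, key)):
            successes += 1
    return successes / trials
```

With three servers, simulate(3, 10_000) should return a value close to 1/3, which matches the "works only some of the time" behavior: with N servers and uniform random routing, a cancel reaches the right server with probability 1/N.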


Jelte Fennema

Senior Engineer @ Microsoft

Currently, I'm working on Citus and Postgres at Microsoft. Before that, I was a big-time Postgres user at Stream, where I worked on low-latency APIs for chat and social timelines. I studied at the University of Amsterdam, where I earned my BSc in Computer Science and an MSc in System and Network Engineering.


Uptime brings together developers, architects, data engineers, DevOps professionals, and anyone who wants to learn about open source data tools.