Discussion
Downtime Caused by the Postgres Transaction ID Wraparound Problem
jffry: tl;dr: autovacuum was seen to be active during an earlier incident, assumed to be at fault, and was disabled. It was never re-enabled. The long-term implications of disabling autovacuum were not actively considered.
throwatdem12311: TL;DR Don’t turn off autovacuum, and periodically tune your write-heavy tables so they are vacuumed often enough that this never happens.
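For illustration, per-table tuning of this kind is typically done through table storage parameters. What follows is only a minimal sketch that builds the SQL string and does not connect to a database. The table name `events`, the chosen values, and the helper names `quote_ident` and `per_table_autovacuum_sql` are hypothetical; the storage parameters `autovacuum_vacuum_scale_factor` and `autovacuum_vacuum_threshold` are standard PostgreSQL ones.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def per_table_autovacuum_sql(table: str, scale_factor: float, threshold: int) -> str:
    """Build an ALTER TABLE statement that makes autovacuum trigger sooner on one table.

    Autovacuum vacuums a table once its dead tuples exceed roughly
    threshold + scale_factor * (number of rows), so a smaller scale_factor
    means a large, write-heavy table is vacuumed more often.
    """
    if scale_factor < 0 or threshold < 0:
        raise ValueError("scale_factor and threshold must be non-negative")
    return (
        f"ALTER TABLE {quote_ident(table)} SET ("
        f"autovacuum_vacuum_scale_factor = {scale_factor}, "
        f"autovacuum_vacuum_threshold = {threshold})"
    )


# Hypothetical example: vacuum "events" once about 1% of its rows are dead
# (the PostgreSQL default scale factor is 0.2, i.e. 20%).
print(per_table_autovacuum_sql("events", 0.01, 1000))
```

The right values depend on the workload; the numbers here are placeholders, not recommendations.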
rastignack: Just monitor it and you’re done. I’ve delivered and maintained hundreds of pg instances and never faced this issue. There is so much literature about it that at some point no one even slightly skilled will face it.
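As a rough sketch of what "monitor it" can mean in practice: track `age(datfrozenxid)` per database and alert well before it approaches the roughly 2^31-transaction wraparound horizon. The query below uses standard PostgreSQL catalog columns but is not executed here. The function names, the 50% and 80% alert fractions, and the sample rows are all assumptions made up for this example, not values from the article or from PostgreSQL itself.

```python
# Query to run against PostgreSQL (shown for reference, not executed here).
XID_AGE_QUERY = (
    "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC"
)

# Transaction IDs are 32-bit and compared circularly, so about 2^31
# transactions can separate the oldest unfrozen XID from the current one.
WRAPAROUND_LIMIT = 2**31


def classify_xid_age(xid_age: int,
                     warn_fraction: float = 0.5,
                     crit_fraction: float = 0.8) -> str:
    """Classify a database's XID age against hypothetical alert thresholds."""
    used = xid_age / WRAPAROUND_LIMIT
    if used >= crit_fraction:
        return "critical"
    if used >= warn_fraction:
        return "warning"
    return "ok"


def databases_needing_attention(rows):
    """Return (datname, xid_age, level) for every row that is not 'ok'.

    `rows` is an iterable of (datname, xid_age) pairs, the shape that
    XID_AGE_QUERY would return.
    """
    flagged = []
    for name, xid_age in rows:
        level = classify_xid_age(xid_age)
        if level != "ok":
            flagged.append((name, xid_age, level))
    return flagged


# Made-up sample rows standing in for a real query result.
sample = [("app", 150_000_000), ("analytics", 1_200_000_000), ("legacy", 1_900_000_000)]
for name, xid_age, level in databases_needing_attention(sample):
    print(f"{level}: {name} has XID age {xid_age}")
```

In a real setup the rows would come from running the query through a database driver, and the thresholds would be set relative to `autovacuum_freeze_max_age` for the instance.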
plasticeagle: AI;DR. Which is why it's TL;DR. Boring shit article about obvious problem.
fmajid: It's not as obvious as you think, GitLab was hit by this a few years ago. But yes, low-quality article and the SQL Server plug is in poor taste.
johnbarron: >> Just monitor it and you’re done.
This is just anecdote colliding with documented database behavior, one that is not an issue on Oracle, SQL Server, or IBM DB2. PostgreSQL explicitly documents xid wraparound as a failure mode that can lead to catastrophic data loss and says vacuuming is required to prevent it. Near exhaustion, it will refuse commands.
Small sample of known outages:
- Sentry: "Transaction ID Wraparound in Postgres" https://blog.sentry.io/transaction-id-wraparound-in-postgres...
- Mailchimp / Mandrill: "What We Learned from the Recent Mandrill Outage" https://mailchimp.com/what-we-learned-from-the-recent-mandri...
- Joyent / Manta: "Challenges deploying PostgreSQL (9.2) for high availability" https://www.davepacheco.net/blog/2024/challenges-deploying-p...
- BattleMetrics: "March 27, 2022 Postgres Transacton ID Wraparound" https://learn.battlemetrics.com/article/64-march-27-2022-pos...
- Duffel: "concurrency control & vacuuming in PostgreSQL" https://duffel.com/blog/understanding-outage-concurrency-vac...
- Figma: "Postmortem: Service disruption on January 21–22, 2020" https://www.figma.com/blog/post-mortem-service-disruption-on...
Even AWS updated their recommendation as recently as Feb 2025, and it is an issue in Aurora Postgres as well as Postgres: "Prevent transaction ID wraparound by using postgres_get_av_diag() for monitoring autovacuum" https://aws.amazon.com/blogs/database/prevent-transaction-id...
ozten: Many SEV-1s are “obvious”. Still feels like a kick in the stomach if you’re the one who was responsible LOLz.