June 03, 2017
Learning the Hard Way: Postgres Recovery from the File System
There are some lessons in life that are perhaps better learned the hard way. That you should have a proper backup system for your data is not one of them. And it's especially not a lesson to learn the hard way when you have been building and working with computers for nearly two decades. But so it goes.
Earlier this week, I had a hard disk fail in one of my servers that doubles as my main workstation. It wasn't the first time I have had a disk fail (or even the first time one has failed in this particular machine), but it was by far the most damaging instance. For reasons that now escape me, when I upgraded the system a couple of years back I never bought a second hard drive to set up a RAID 1 mirror (which I normally always do for this machine), nor did I ever get my regular backup system running again after the upgrade. Even more inexplicably, during that same period, I went from using that machine solely for development to using it to host my definitive photo database (described here).
So you can imagine how I felt when the computer refused to boot after an update. After several consecutive nights of staying up into the wee hours, I can say that I've almost brought things back to normal. No part of the process was pleasant, but by far the worst of it was trying to recover the Postgres database cluster from the file system, because my database backups generated by pg_dump were too old.
Given the significant complexity involved with this chore, I thought I would document it here for posterity. Read on for more detail (if you must).
Background
The machine in question was running Postgres 9.4 on FreeBSD. The disk failure I mentioned above resulted in several sectors across multiple file systems becoming unreadable. After some effort, I was able to boot into the system with an old kernel and then, using rsync, transfer much of the data to a new drive I had purchased and installed. Some files and directories couldn't be transferred due to bad blocks and sectors on the disk, however, which is where I first saw that my Postgres database was going to be affected.
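The transfer itself was nothing exotic; it amounted to something like the following, where /mnt/olddisk and /mnt/newdisk are placeholders rather than my actual mount points:
# Copy whatever is still readable, preserving permissions, ownership, and
# hard links; files sitting on bad sectors simply fail with I/O errors.
root@autocrat % rsync -aHv /mnt/olddisk/ /mnt/newdisk/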
While most of the data directory for the cluster was salvaged, the directory data/global was notably all unreadable. To make matters worse, running fsck on the damaged file system resulted in a bunch of files in the lost+found directory without file names. After looking at a clean install of the Postgres data directory and realizing that almost all the file names in the global directory are arbitrary numbers (I think mapping to the Postgres OID of the system object), I almost gave up. Trying to map the unnamed binary files to arbitrary numbers in the global directory seemed like an intractable problem.
After studying the Postgres internals, however, I realized that my databases were probably intact because they are stored in the base folder; if I could get enough of the global directory restored, perhaps I could get the server to run for long enough to dump the data.
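For reference, each subdirectory of base is one database, named by that database's OID, so a quick way to gauge the damage was simply to confirm those directories were still readable on the salvaged copy. Something along these lines, run from the data directory:
# One numbered subdirectory per database; if these list and read cleanly,
# the table data itself probably survived.
pgsql@autocrat % ls base
pgsql@autocrat % du -sh base/*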
pg_filedump to the Rescue
After several hours of research, I came across this article that was on point (albeit for an older version of Postgres) and discovered the pg_filedump utility, which prints out Postgres binary files in a (somewhat) human-readable format. Unfortunately, the port for this utility on FreeBSD had fallen into disrepair, so I had to compile the program myself.
That ended up being a chore in and of itself, but luckily, I (still) have enough C knowledge to debug compiler errors. This is what I had to do to the Makefile to make it compile on FreeBSD:
CC=clang    # Changed from gcc
CFLAGS=-g -O -Wall -Wmissing-prototypes -Wmissing-declarations -I/usr/local/include    # Added -I/usr/local/include to pull in libintl.h
...
PGSQL_INCLUDE_DIR=/usr/local/include/postgresql/server
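With the Makefile patched, the build itself was unremarkable; roughly the following (gmake comes from the devel/gmake port, which I trust more than BSD make for a GNU-style Makefile), though your paths may differ:
# Build against the installed 9.4 server headers and put the binary on the PATH.
root@autocrat % gmake
root@autocrat % cp pg_filedump /usr/local/bin/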
With pg_filedump installed and working, I went to work on the files in the lost+found directory.
Mapping Lost and Found Files to the data/global Directory
The next step, mapping Postgres system catalog files back into the data/global directory, was painstakingly manual and took several hours. Each iteration consisted of the following:
First, try to start a Postgres server in single-user mode on the data directory.
pgsql@autocrat % pwd
/usr/local/pgsql/data
pgsql@autocrat % postgres --single -P -D '.' postgres
That would yield some sort of missing file error, such as:
ERROR: could not open file "global/12447": No such file or directory
Second, run pg_filedump on a fresh install of the data directory (I used initdb to create one so I could see what the global directory had in it):
root@autocrat % ./pg_filedump -f -i ../fresh-install/data/global/12447
Third, if the missing file is blank in the fresh install, then create a blank file:
touch global/12340
Fourth, if the missing file wasn't blank, cycle through the files in the lost+found directory and identify any that look similar based on the dump. A few things were helpful in this process: (a) checking the number of Attributes reported by pg_filedump against the description of the Postgres system catalogs; and (b) checking the file sizes of the corresponding files in the fresh install. For instance, I think 12447 probably maps to the pg_database system catalog, as the dump had database names in clear text and 13 attributes (the same as pg_database, sans the hidden OID attribute).
root@autocrat % ./pg_filedump -f -i /mnt/lost+found/#08832096
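For point (b), a plain directory listing was enough to shortlist candidates by size before bothering with full dumps; something like this, with the lost+found names being whatever fsck assigned on my system:
# Compare the catalog file's size in the fresh install against the
# recovered-but-nameless files, sorted largest first.
root@autocrat % ls -l ../fresh-install/data/global/12447
root@autocrat % ls -lS /mnt/lost+found/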
Fifth, once you have found the file that matches, copy the file from the lost+found directory to the data/global directory, and go back to the first step.
root@autocrat % cp /mnt/lost+found/#08832096 global/12447
After iterating through that process a number of times, the Postgres server finally started.
Rebuild the Indices
After getting the server running, I then logged into each database I wanted to recover (as well as the postgres, template0, and template1 system databases) and rebuilt the indices:
pgsql@autocrat % postgres --single -P -D '.' riot
PostgreSQL stand-alone backend 9.4.12
backend> reindex system riot
Dump the Database (and Rejoice!)
Once I had rebuilt the system indices, I was able to dump most of the databases in the cluster normally using pg_dump.
pgsql@autocrat % postgres -P -D '.'
pgsql@autocrat % pg_dump -v --format=c --file=riot-2017-06-03 riot
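(For completeness, restoring one of those custom-format dumps into a clean cluster is just a createdb followed by pg_restore, something like the following; the database name here is only an example.)
pgsql@autocrat % createdb riot
pgsql@autocrat % pg_restore -v --dbname=riot riot-2017-06-03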
With two of the databases in the cluster, I got errors while dumping that I wasn't able to resolve (a corrupted pg_toast index), but luckily those were development databases.
Lesson(s) Learned
Don't be an idiot like me. Regular, automated backups are important for the whole system, but especially so for Postgres databases.
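Even a trivial nightly dump job driven by cron would have spared me most of this. A minimal sketch of the sort of thing I mean, with the backup directory and database selection being assumptions rather than anything I actually run:
#!/bin/sh
# Nightly logical backup sketch: dump cluster-wide objects (roles, tablespaces)
# plus a custom-format dump of every non-template database.
BACKUP_DIR=/backups/pgsql
DATE=$(date +%Y-%m-%d)

pg_dumpall --globals-only > "$BACKUP_DIR/globals-$DATE.sql"

for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate" postgres); do
    pg_dump --format=c --file="$BACKUP_DIR/$db-$DATE.dump" "$db"
done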
I frankly got really lucky that this disk corruption affected the system tables, which were relatively easy to identify by comparing against a fresh install, and not the databases themselves. The next time, I probably won't be so fortunate, and I am not planning to find out.
Also, would it hurt Postgres to use names instead of numbers to identify system files on the file system? Or at least embed the file number in the file header so it can be seen with a utility like pg_filedump? What a clusterfuck.
Damn. lucky duck. also, still continually impressed that a lawyer did this - even though I know your background :)
I almost lost a few years of photos in college when a disk died, and since then I've gone full paranoid and have them (and other data) backed up to no fewer than 4 places (computer, external disk, cloud, off-site disk at parent's house).
You seem to have gotten luckier than GitLab: http://checkyourbackups.work
Posted by Streeter | June 07, 2017 06:51:59 -0700 | Permalink