Rohit's Realm

June 03, 2017

Learning the Hard Way: Postgres Recovery from the File System

There are some lessons in life that are perhaps better learned the hard way. That you should have a proper backup system for your data is not one of them. And it's especially not a lesson to learn the hard way when you have been building and working with computers for nearly two decades. But so it goes.

Earlier this week, I had a hard disk fail in one of my servers, which doubles as my main workstation. It wasn't the first time I have had a disk fail (or even the first time one has failed in this particular machine), but it was by far the most damaging instance. For reasons that now escape me, I never bought a second hard drive to set up a RAID 1 mirror when I upgraded the system a couple years back (which I normally do for this machine), nor did I ever get my regular backup system running again after the upgrade. Even more inexplicably, during that same period, I went from using the machine solely for development to using it to host my definitive photo database (described here).

So you can imagine how I felt when the computer refused to boot after an update. After several consecutive nights of staying up into the wee hours, I can say that I've almost brought things back to normal. No part of the process was pleasant, but by far the worst of it was trying to recover the Postgres database cluster from the file system because my database backups generated by pg_dump were too old.

Given the significant complexity involved with this chore, I thought I would document it here for posterity. Read on for more detail (if you must).

Background

The machine in question was running Postgres 9.4 on FreeBSD. The disk failure I mentioned above resulted in several sectors across multiple file systems becoming unreadable. After some effort, I was able to boot into the system with an old kernel and then, using rsync, transfer much of the data to a new drive I had purchased and installed. Some files and directories couldn't be transferred due to bad blocks and sectors on the disk, however, which is when I first realized that my Postgres database was going to be affected.
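
The transfer itself was nothing exotic; it amounted to something like the following (the mount points here are hypothetical, and --partial keeps whatever rsync manages to read before hitting a bad sector, so reruns can pick up where they left off):

root@autocrat % rsync -av --partial /mnt/old-disk/ /mnt/new-disk/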

While most of the data directory for the cluster was salvaged, the data/global directory was entirely unreadable. To make matters worse, running fsck on the damaged file system left a bunch of files in the lost+found directory without file names. After looking at a clean install of the Postgres data directory and realizing that almost all the file names in the global directory are arbitrary numbers (I think they map to the Postgres OIDs of the corresponding system objects), I almost gave up. Trying to map the unnamed binary files to arbitrary numbers in the global directory seemed like an intractable problem.

After studying the Postgres internals, however, I realized that my databases were probably intact because they are stored in the base directory; if I could get enough of the global directory restored, perhaps I could get the server to run long enough to dump the data.
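
For anyone unfamiliar with the layout, a 9.4 data directory looks roughly like this (simplified, with illustrative OIDs); per-database files live under base, while global holds the cluster-wide system catalogs:

data/
  base/            # one subdirectory per database, named by the database's OID
    1/             # template1
    16384/         # a user database (OID is illustrative)
  global/          # cluster-wide system catalogs: pg_database, pg_authid, ...
  pg_clog/         # transaction commit status
  pg_xlog/         # write-ahead log
  postgresql.conf, pg_hba.conf, PG_VERSION, ...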

pg_filedump to the Rescue

After several hours of research, I came across this article that was on point (albeit for an older version of Postgres) and discovered the pg_filedump utility, which prints out Postgres binary files in a (somewhat) human-readable format. Unfortunately, the port for this utility on FreeBSD had fallen into disrepair, so I had to compile the program myself.

That ended up being a chore in and of itself, but luckily, I (still) have enough C knowledge to debug compiler errors. This is what I had to do to the Makefile to make it compile on FreeBSD:

# Changed from gcc
CC=clang
# Added -I/usr/local/include to pull in libintl.h
CFLAGS=-g -O -Wall -Wmissing-prototypes -Wmissing-declarations -I/usr/local/include

...

PGSQL_INCLUDE_DIR=/usr/local/include/postgresql/server
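
From there, the build itself should just be a matter of running make (likely GNU make, i.e., gmake from ports on FreeBSD) and putting the binary somewhere on the PATH; a sketch:

root@autocrat % gmake
root@autocrat % cp pg_filedump /usr/local/bin/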

With pg_filedump installed and working, I went to work on the files in the lost+found directory.

Mapping Lost and Found Files to the data/global Directory

The next step, mapping Postgres system catalog files back into the data/global directory, was painstakingly manual and took several hours. Each iteration consisted of the following:

First, try to start a Postgres server in single-user mode on the data directory.

pgsql@autocrat % pwd
/usr/local/pgsql/data
pgsql@autocrat % postgres --single -P -D '.' postgres

That would yield some sort of missing file error, such as:

ERROR: could not open file "global/12447": No such file or directory

Second, run pg_filedump on the corresponding file in a fresh install of the data directory (I used initdb to create one so I could see what the global directory was supposed to contain):

root@autocrat % ./pg_filedump -f -i ../fresh-install/data/global/12447
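
Creating that reference data directory is a one-liner; something like the following, with ../fresh-install being an arbitrary scratch location:

pgsql@autocrat % initdb -D ../fresh-install/data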

Third, if the missing file is blank in the fresh install, then create a blank file:

touch global/12340

Fourth, if the missing file wasn't blank, cycle through the files in the lost+found directory and identify any that look similar based on the dump (a small loop for doing this in bulk is sketched below). A few things were helpful in this process: (a) checking the number of attributes reported by pg_filedump against the descriptions of the Postgres system catalogs; and (b) checking the file sizes of the corresponding files in the fresh install. For instance, I think 12447 probably maps to the pg_database system catalog, as the dump had database names in clear text and 13 attributes (the same as pg_database, sans the hidden OID attribute).

root@autocrat % ./pg_filedump -f -i /mnt/lost+found/#08832096
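
Rather than running pg_filedump by hand against every candidate, a small throwaway loop can dump everything once so the output can be searched for plausible matches (attribute counts, clear-text strings, and so on); a sketch, with the output path being arbitrary:

# Dump every recovered file once for later searching.
for f in /mnt/lost+found/#*; do
    echo "=== $f ==="
    ./pg_filedump -f -i "$f"
done > /tmp/lostfound-dump.txt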

Fifth, once you have found a file that matches, copy it from the lost+found directory to the data/global directory under the missing name, and go back to the first step.

root@autocrat % cp /mnt/lost+found/#08832096 global/12447

After iterating through that process a number of times, the Postgres server finally started.

Rebuild the Indices

After getting the server running, I then logged into each database I wanted to recover (as well as the postgres, template0, and template1 system databases) and rebuilt the system indices:

pgsql@autocrat % postgres --single -P -D '.' riot
PostgreSQL stand-alone backend 9.4.12
backend> reindex system riot

Dump the Database (and Rejoice!)

Once I had rebuilt the system indices, I was then able to dump most of the databases in the cluster using pg_dump normally.

pgsql@autocrat % postgres -P -D '.'
pgsql@autocrat % pg_dump -v --format=c --file=riot-2017-06-03 riot

For two of the databases in the cluster, I got errors while dumping that I wasn't able to resolve (a corrupted pg_toast index), but luckily those were development databases.
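
Since those dumps are in the custom format, restoring them later into a healthy cluster is just a matter of pg_restore; roughly (database and file names as above):

pgsql@autocrat % createdb riot
pgsql@autocrat % pg_restore -d riot riot-2017-06-03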

Lesson(s) Learned

Don't be an idiot like me. Regular, automated backups are important for the whole system, but especially so for Postgres databases.
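
Even something as simple as a nightly pg_dumpall out of cron would have spared me most of this pain; a minimal sketch (the schedule and backup path are arbitrary, and the backslash before % is required because cron treats a bare % specially):

# crontab for the pgsql user: dump the whole cluster at 03:00, one file per weekday
0 3 * * * /usr/local/bin/pg_dumpall | gzip > /backups/pgsql-$(date +\%a).sql.gz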

Frankly, I got really lucky that the disk corruption affected the system catalogs, which were relatively easy to identify by comparing against a fresh install, and not the databases themselves. The next failure probably won't be so forgiving, and I am not planning to find out.

Also, would it hurt Postgres to use names instead of numbers to identify system files on the file system? Or at least embed the file number in the file header so it can be seen with a utility like pg_filedump? What a clusterfuck.

Comments

Damn. lucky duck. also, still continually impressed that a lawyer did this - even though I know your background :)

I almost lost a few years of photos in college when a disk died, and since then I've gone full paranoid and have them (and other data) backed up to no fewer than 4 places (computer, external disk, cloud, off-site disk at parent's house).

You seem to have gotten luckier than GitLab: http://checkyourbackups.work
