Recovering Firefox PDF cache with Bash

A while ago, I was viewing an online PDF file in my browser of choice, Firefox. I closed the tab, thinking I was done with it, only to find out that I needed to refer to it again. Alas, the PDF had been deleted from the server!


Fortunately, Firefox keeps a cached copy of the PDF. By navigating to about:cache, you’ll be able to view the hex dump of your PDF. In this example, I’m using Apple’s iOS Security Guide PDF.

Press ‘List Cache Entries’ to get started
Sample PDF document cached by Firefox

Save the HTML page to your computer by hitting Ctrl+S or otherwise. Its default name is likely to be Cache entry information.html.

Hex Dump?

Remember that under the hood, your PDF file is a string of 0’s and 1’s. Hexadecimal (‘hex’ for short) is a more compact and readable way to represent the same data. In the diagram above, the left-most column gives the offset of the first byte on each line, itself written in hex. The middle pairs of values are the bytes that make up the PDF we are recovering. The right-most column shows the same bytes as ASCII, which is much more human-readable.
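To make the three columns concrete, here is a minimal sketch that dumps a short string with xxd. It assumes xxd is installed (it usually ships with Vim); the sample text and the variable name sample_dump are mine, purely for illustration.

```shell
# Dump a short string to see the three columns: offset, hex bytes, ASCII.
sample_dump=$(printf 'Hello, PDF!\n' | xxd)
printf '%s\n' "$sample_dump"
```

The line starts with the offset 00000000:, followed by the bytes 4865 6c6c and so on (48 is ‘H’, 65 is ‘e’), and ends with the readable text. Note that xxd groups the bytes in pairs, which is the layout we will need to reproduce later.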

Getting the Hex Dump

The HTML file also contains a lot of cache information, which we don’t need for reconstructing the PDF. I removed it in a text editor and saved the remaining hex dump as a text file, PDF_dump.txt.

Remove the top and bottom HTML tags.
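If you would rather not trim the file by hand, the same step can be sketched with grep. This assumes every dump line starts with an 8-digit hex offset followed by a colon, and that no other line in the page does. The sample page below is a hypothetical stand-in, so the real file’s markup may well differ.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for the saved page; the real markup may differ.
cat > "$workdir/sample_cache_page.html" <<'EOF'
<html><body><pre>
00000000:  25 50 44 46 2d 31 2e 34  0a 25 e2 e3 cf d3 0a 31  %PDF-1.4.%.....1
00000010:  20 30 20 6f 62 6a 0a                              0 obj.
</pre></body></html>
EOF

# Keep only the lines that begin with an 8-digit hex offset and a colon.
grep -E '^[0-9a-fA-F]{8}:' "$workdir/sample_cache_page.html" > "$workdir/sample_PDF_dump.txt"

cat "$workdir/sample_PDF_dump.txt"
```

Only the two dump lines survive; the surrounding tags are dropped.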

Reversing the Hex Dump

There is a very useful command-line tool for making and reversing hex dumps, called xxd (it is a separate program that usually ships with Vim, not part of Bash itself). Unfortunately, the hex dump that Firefox produces contains more spaces than xxd expects.
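As a quick illustration of what “making and reversing” means, here is a small round trip on a throwaway file. The file names are hypothetical, and the sketch assumes xxd is on your PATH.

```shell
workdir=$(mktemp -d)

# A tiny stand-in for a real PDF.
printf '%%PDF-1.4\nhello\n' > "$workdir/original.bin"

# Make a hex dump, then reverse it back into binary with -r.
xxd "$workdir/original.bin" > "$workdir/dump.txt"
xxd -r "$workdir/dump.txt" > "$workdir/restored.bin"

# cmp is silent and exits 0 when the two files are byte-for-byte equal.
cmp "$workdir/original.bin" "$workdir/restored.bin" && echo "round trip OK"
```

The reverse step only works when the dump is laid out the way xxd expects, which is why Firefox’s extra spaces have to go first.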

We are going to remove the spaces and reverse the dump with Bash! I don’t have a lot of experience with Bash, but I managed to cobble together something that works. There is likely to be a more elegant approach, but this is the Bash script I came up with:

In line 6 above, I made a copy of the PDF_dump.txt file. If you compare Firefox’s and xxd’s hex dump, you’ll realise the former can be converted to the latter by repeatedly removing 1 and 2 spaces. To be precise, it is going to be repeated 8 times.

The command sed 's/  / /1' (two spaces in the pattern, one in the replacement) is the part that is responsible for the manipulations. The s command means substitution, and the expected syntax is as follows: s/regular expression/replacement/flags. A number can be given as a flag to select which occurrence is replaced. In our case, the flag is 1, which means that only the first occurrence of a double space in every line gets acted upon.
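The effect of the numeric flag is easiest to see on a toy line. In this small sketch, the sample string and variable names are made up for illustration; the pattern holds two spaces and the replacement holds one.

```shell
line='a  b  c  d'   # every gap here is two spaces wide

# Flag 1: only the first double space on the line is collapsed.
first_only=$(printf '%s\n' "$line" | sed 's/  / /1')

# Flag g, for comparison: every double space is collapsed.
every_one=$(printf '%s\n' "$line" | sed 's/  / /g')

printf '%s\n%s\n' "$first_only" "$every_one"
```

With flag 1, only the gap between a and b shrinks to one space, while g shrinks all three. Running the flag-1 command several times therefore works through a line one gap at a time.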

Finally, the last line of code recovers the PDF.
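For readers who want something to experiment with, here is a minimal end-to-end sketch. It is not the script described above, and it rests on an assumption about Firefox’s layout: two spaces after the offset, single spaces between bytes, and a two-space gap after the eighth byte. The sample dump is hand-made to match that assumption, so check it against your own file first. Rather than regrouping the bytes into pairs, it collapses the first two double spaces on each line and relies on xxd -r accepting single-byte groups.

```shell
workdir=$(mktemp -d)

# Hand-made dump in the ASSUMED Firefox layout (not copied from a real page).
cat > "$workdir/PDF_dump.txt" <<'EOF'
00000000:  25 50 44 46 2d 31 2e 34  0a 25 e2 e3 cf d3 0a 31  %PDF-1.4.%.....1
00000010:  20 30 20 6f 62 6a 0a                              0 obj.
EOF

# Work on a copy so the original dump stays untouched.
cp "$workdir/PDF_dump.txt" "$workdir/PDF_dump_copy.txt"

# Collapse the first double space (after the offset), then the next one
# (the mid-line gap); the wide gap before the ASCII column is left alone.
sed -e 's/  / /' -e 's/  / /' "$workdir/PDF_dump_copy.txt" > "$workdir/PDF_dump_xxd.txt"

# Reverse the cleaned dump back into binary.
xxd -r "$workdir/PDF_dump_xxd.txt" > "$workdir/recovered.pdf"

wc -c < "$workdir/recovered.pdf"
```

The sample encodes 23 bytes, so that is the size the recovered file should have. On a real dump, compare a few cleaned lines with the output of xxd -g1 on any file before trusting the result.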

Recovered PDF

Let’s verify the recovered PDF!

Now you can’t really do this if the recovered PDF is all you have, but since I have access to the original document, let’s do a check to make sure that the original and our recovered PDF are indeed identical!
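One simple way to run such a check is cmp, which compares two files byte by byte. The sketch below uses two throwaway files with hypothetical names in place of the real documents.

```shell
workdir=$(mktemp -d)

# Stand-ins for the original and the recovered document.
printf '%%PDF-1.4\nsample\n' > "$workdir/original.pdf"
cp "$workdir/original.pdf" "$workdir/recovered.pdf"

# cmp -s prints nothing; its exit status is 0 only if the files are identical.
if cmp -s "$workdir/original.pdf" "$workdir/recovered.pdf"; then
  verdict="identical"
else
  verdict="different"
fi
echo "$verdict"
```

Comparing checksums of the two files would serve the same purpose.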


Firefox keeps a cached copy of your PDF and shows it as a hex dump, from which the file can be easily recovered. While there may be easier ways to recover the PDF, one way to do so is with Bash. A Bash script removes the extra spaces, giving the format that xxd expects. Finally, the recovered PDF is verified to be identical to the original PDF.

If you have any thoughts, comments, or ideas for recovering the PDF more easily, please be sure to share them in the comment area below!
