Recovering Firefox PDF cache with Bash

A while ago, I was viewing an online PDF file in my browser of choice, Firefox. I closed the tab, thinking that I was done with it, only to find out that I needed to refer to it again. Alas, the PDF had been deleted from the server!


Fortunately, Firefox keeps a cached copy of the PDF. By navigating to about:cache and opening the entry for the file, you can view a hex dump of your PDF. In the image below, I’m using Apple’s iOS Security Guide PDF.

Press ‘List Cache Entries’ to get started
Sample PDF document cached by Firefox

Save the HTML page to your computer by hitting Ctrl+S or otherwise. Its default name is likely to be Cache entry information.html.

Hex Dump?

Remember that under the hood, your PDF file is a string of 0’s and 1’s. Hexadecimal (‘hex’ for short) is a more compact and readable way to represent the same data: each pair of hex digits stands for one byte. In the diagram above, the left-most column is the offset of the first byte on each line, which works much like a line number. The middle pairs of values are the data that make up the PDF we are recovering. The right-most column shows the same data as ASCII, which is much more human-readable.
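To make the idea concrete, here is a small sketch that prints the hex pairs for a short string. The helper name bytes_to_hex is made up for this illustration, and it relies on od, a standard Unix tool, rather than on anything Firefox-specific.

```shell
# Print every byte of a string as a two-digit hexadecimal pair.
# bytes_to_hex is a hypothetical helper name used only for this example.
bytes_to_hex() {
  # od -A n -t x1: no offset column, one hex pair per byte.
  # tr strips the whitespace that od puts between the pairs.
  printf '%s' "$1" | od -A n -t x1 | tr -d '[:space:]'
  echo
}

bytes_to_hex '%PDF'   # prints 25504446
```

A PDF file starts with the characters %PDF, so the first line of the dump should begin with the bytes 25 50 44 46. That is a quick way to check you are looking at the right cache entry.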

Getting the Hex Dump

The HTML file also shows a lot of cache information, which we don’t need for reconstructing the PDF. I removed it in a text editor and saved the remaining hex dump as a text file, PDF_dump.txt.

Remove the top and bottom HTML tags.
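The clean-up could probably be scripted instead of done by hand. The following is only a sketch under an assumption about the saved page: that every dump line starts with an 8-digit hex offset followed by a colon, possibly preceded by HTML tags such as <pre>. The helper name extract_dump_lines is hypothetical.

```shell
# Keep only the hex dump lines from the saved page.
# Assumption: a dump line looks like "00000000:  25  50 ..." and may be
# preceded by HTML tags on the same line. Everything else is dropped.
extract_dump_lines() {
  # -n with the p flag prints only the lines where the substitution matched.
  sed -n -E 's/^(<[^>]*>)*([0-9a-fA-F]{8}:.*)$/\2/p' "$1"
}
```

It could be used as extract_dump_lines 'Cache entry information.html' > PDF_dump.txt. It is worth checking the first and last lines of the result by eye, since the exact markup may differ between Firefox versions.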

Reversing the Hex Dump

A very useful command for making and reversing hex dumps, available from the Bash prompt, is xxd. Unfortunately, the hex dump produced by Firefox contains more spaces than xxd expects.

We are going to remove the spaces and reverse the dump with Bash! I don’t have a lot of experience with Bash, but I managed to cobble together something that works. There is likely to be a more elegant approach, but this is the Bash script I came up with:

In line 6 above, I made a copy of the PDF_dump.txt file. If you compare Firefox’s and xxd’s hex dumps, you’ll realise the former can be converted to the latter by repeatedly removing one or two spaces between the byte values. To be precise, this is repeated 8 times.

The command sed 's/  / /1' is the part that is responsible for the manipulations. The s command means substitution, and the expected syntax is as follows: s/regular expression/replacement/flags. A number can be set as a flag to select which occurrence on the line is replaced. In our case, the flag is 1, which means that only the first occurrence of a double space in every line is acted upon.
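The effect of the numeric flag is easiest to see on a short sample line. The snippet below is an illustration of the idea and not the script from this post; the sample bytes and the helper name squeeze_pairs are made up for the example.

```shell
sample='25  50  44  46'   # sample bytes separated by double spaces

# Flag 1: only the first double space on the line is touched.
printf '%s\n' "$sample" | sed 's/  //1'    # prints: 2550  44  46

# Flag 2: only the second double space on the line is touched.
printf '%s\n' "$sample" | sed 's/  / /2'   # prints: 25  50 44  46

# squeeze_pairs is a hypothetical helper. Each round deletes one double
# space and shrinks the next one to a single space, which pairs the bytes
# up the way xxd prints them.
squeeze_pairs() {
  text=$1
  rounds=$2
  i=0
  while [ "$i" -lt "$rounds" ]; do
    text=$(printf '%s\n' "$text" | sed -e 's/  //1' -e 's/  / /1')
    i=$((i + 1))
  done
  printf '%s\n' "$text"
}

squeeze_pairs "$sample" 2   # prints: 2550 4446
```

With four bytes, two rounds are enough. A full line of 16 bytes would need 8 rounds, which matches the number of repetitions mentioned above.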

Finally, the last line of code recovers the PDF.
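For comparison, here is a minimal sketch of a different route to the same result. Rather than squeezing spaces until the layout matches xxd’s default format, it throws away everything except the hex digits and uses xxd’s plain mode (-r -p). It assumes each line of PDF_dump.txt has the layout offset, colon, up to 16 two-digit bytes, then the ASCII column. The function names are hypothetical.

```shell
# Turn a Firefox-style hex dump into one long string of hex digits.
# Assumed line layout:
#   00000000:  25  50  44  46  ...  %PDF...
dump_to_plain_hex() {
  # 1st expression: drop the offset column up to and including the colon.
  # 2nd expression: keep only the leading run of at most 16 hex bytes,
  #                 which discards the ASCII column on the right.
  # tr then deletes all remaining whitespace, including newlines.
  sed -E -e 's/^[[:space:]]*[0-9a-fA-F]+:[[:space:]]*//' \
         -e 's/^(([0-9a-fA-F]{2}[[:space:]]+){1,16}).*$/\1/' "$1" \
    | tr -d '[:space:]'
}

# Rebuild the binary file: xxd -r -p reverses a plain hex string into bytes.
recover_file() {
  dump_to_plain_hex "$1" | xxd -r -p > "$2"
}
```

It would be called as recover_file PDF_dump.txt recovered.pdf, where recovered.pdf is a placeholder name. One caveat: on a line with fewer than 16 bytes, an ASCII column that happens to start with two hex digits followed by a space would be mistaken for data, so the last line deserves a quick look.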

Recovered PDF

Let’s verify the recovered PDF!

Now you can’t really do this if the recovered PDF is all you have, but since I have access to the original document, let’s do a check to make sure that the original and our recovered PDF are indeed identical!
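One simple way to run such a check is a byte-by-byte comparison with cmp. This is a minimal sketch and not necessarily the tool used for the check in this post; the helper name files_identical and the file names in the usage note are placeholders.

```shell
# files_identical is a hypothetical helper: it succeeds (exit status 0)
# only if both files contain exactly the same bytes.
files_identical() {
  # -s keeps cmp silent, so only the exit status reports the result.
  cmp -s "$1" "$2"
}
```

For example, files_identical original.pdf recovered.pdf && echo "identical" prints a message only when the two files match.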


Firefox keeps a cached copy of your PDF and can display it as a hex dump, from which the file can be recovered. While there may be easier ways to recover the PDF, one way to do so is with Bash. A Bash script removes the extra spaces, giving the format that xxd expects, and xxd then reverses the dump. Finally, the recovered PDF is verified to be identical to the original PDF.

If you have any thoughts, comments, or ideas for recovering the PDF more easily, please share them in the comment area below!

Bypassing primitive PDF DRM

When I downloaded a PDF from my university’s online library, I quickly noticed that it came with some form of DRM. The file was a direct download, not delivered through Adobe Digital Editions (a common platform for sharing copyrighted content). I wanted to see how it works and what it takes to break the protection.


ITB Kendo Club online system

As some of you may know, I am currently the president of the Kendo Club at my university. There is actually a lot more work and less glamour than it sounds.

Our student council, MPP, requests updated details of our club members every quarter, and it is a pain to get my members to fill in the forms. As university students, we are more often than not busy with coursework and revision, so co-curricular activities are viewed as stress relievers that get second-rate attention, and that’s the way it should be. Unfortunately, that also means my members might not be able to fill in the forms before the deadline.

I’m fed up. As a Computer Science student, I know what to do.