Lunchtime Semaphore

from and to 372433 143758 48 N

Pdf diff ?

May 13th, 2008 by

Situation

You’re exchanging PDF files with an editor for a journal publication. Each time, you send a list of modifications to be applied, and the modified PDF comes back to you, usually with some additional modifications that you didn’t ask for. How can you spot these changes easily and make sure they are relevant?

Problem

The solution that comes to mind is to use a PDF diff tool, similar to what you usually use for text files or source code: diff, diff3, kdiff3 and meld are such examples. However, a quick search on the internet didn’t reveal any tool fulfilling the conditions (being free and open source is mandatory).

Solution

The easy trick is to first convert the PDFs to text files and then use the usual text comparison programs:

pdftotext file1.pdf
pdftotext file2.pdf
kdiff3 file1.txt file2.txt

and that’s it: you are now able to see what has been modified between the two files. Don’t count on this to perform a merge, though!
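If you do this often, the three commands can be wrapped in a small shell function. The following is only a minimal sketch: the name `pdfdiff` is made up for this example (it is not an existing tool), and it assumes `pdftotext` from poppler-utils or xpdf and a `diff` that accepts `-u` and `-L` (GNU and BSD diff do). It prints a unified diff to the terminal instead of opening kdiff3.

```shell
#!/bin/sh
# pdfdiff: hypothetical helper name chosen for this sketch.
# Converts two PDFs to text and prints a unified diff of the result.
# Exit status: 0 = same text, 1 = text differs, 2 = usage or conversion error.
pdfdiff() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdfdiff old.pdf new.pdf" >&2
        return 2
    fi

    pdfdiff_old=$(mktemp) || return 2
    pdfdiff_new=$(mktemp) || { rm -f "$pdfdiff_old"; return 2; }

    # "-" as the output name makes pdftotext write to stdout;
    # -layout keeps the physical layout, which helps with columns.
    if ! pdftotext -layout "$1" - > "$pdfdiff_old" ||
       ! pdftotext -layout "$2" - > "$pdfdiff_new"; then
        echo "pdfdiff: text extraction failed" >&2
        rm -f "$pdfdiff_old" "$pdfdiff_new"
        return 2
    fi

    # Label the diff with the PDF names rather than the temp file names.
    pdfdiff_status=0
    diff -u -L "$1" -L "$2" "$pdfdiff_old" "$pdfdiff_new" || pdfdiff_status=$?

    rm -f "$pdfdiff_old" "$pdfdiff_new"
    return "$pdfdiff_status"
}
```

After sourcing this from your shell startup file, `pdfdiff file1.pdf file2.pdf` shows the differences directly, and the intermediate text files are cleaned up for you. As before, this only compares the extracted text, not the layout.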

This entry was posted on Tuesday, May 13th, 2008 at 01:56 UTC and is filed under Linux.


6 responses about “Pdf diff ?”

  1. Mathieu Malaterre said:

    Or even better in your .vimrc:

    autocmd BufReadPre *.pdf set ro
    autocmd BufReadPre *.pdf set hlsearch!
    autocmd BufReadPost *.pdf silent %!pdftotext -layout -nopgbrk "%" - |fmt -csw78

    It saves you from the two extra steps (manual pdftotext).

  2. Melaneum said:

    Thanks, that’s a nice one for the vi aficionados!
    For those who want to try this, just be careful when copying the "%": the quotes must be plain straight quotes, not curly ones.
    Next, you can try calling
    vimdiff file1.pdf file2.pdf

  3. Jon Grant said:

    Hi, can you change your text?
    this is barely readable:

    pdftotext file1.pdf
    pdftotext file2.pdf
    kdiff3 file1.txt file2.txt

    How about 12pt?

  4. Whatnick said:

    Well, apparently PDF files generated from .odt can differ in a binary diff. Do you have any ideas for this: http://stackoverflow.com/questions/2903774/reliable-and-fast-way-to-convert-a-zillion-odt-files-in-pdf ?

  5. Melaneum said:

    Use the Google Docs API: upload all the .odt files and download all the PDFs. That’s processing in the cloud ;-)
    Never tried, but seems possible!

  6. Ed said:

    Even better solution: use pdiff (for mac) to see all diffs highlighted in the pdfs. Much easier to track all changes in the actual layout than in plain text output of pdftotext. Saved me a lot of trouble! Ed
