4 Step 2: process

Once you have one PNG file per page of the original scanned PDF document, you can use ImageMagick tools to process the individual images. To make the processed PDF file small and quick to load, the following command is a starting point:

mogrify -density 300x300 -threshold 70% -monochrome single-001.png

Let’s take a look at this command:

-density 300x300 writes the DPI resolution into the file. PNG is one of the image formats that can store DPI information. The value should match the resolution at which the pages were scanned.

-threshold 70% turns every pixel brighter than 70% of full intensity white and every other pixel black. The lower this number, the more white pixels you get. You have to experiment with a few “questionable” pages to find the right setting. While you are experimenting, use convert (and specify an output file name) instead of mogrify, because the latter overwrites the original with the converted version!
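One way to run that experiment is a small shell function that writes one preview per candidate threshold and leaves the original page alone. This is only a sketch: the function name preview_thresholds and the preview-NN- file prefix are made up for illustration, and it assumes the convert command is on your PATH.

```shell
# Write one preview file per threshold value into the current directory.
# The original page is only read, never overwritten.
# (preview_thresholds and the "preview-NN-" prefix are illustrative names.)
preview_thresholds() {
    page=$1
    shift
    for t in "$@"; do
        convert "$page" -density 300x300 -threshold "${t}%" -monochrome \
            "preview-${t}-$(basename "$page")"
    done
}

# Example: preview_thresholds single-001.png 50 60 70 80
```

Comparing the previews side by side usually makes it clear where text starts to break up (threshold too low) or background noise starts to turn black (threshold too high).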

I suggest that you use a quick previewing tool to scan through the pages and find the ones that have the least contrast. Test the “threshold” setting on these low-contrast pages to make sure the result is satisfactory.
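If there are too many pages to check by eye, a rough automated shortlist may help. The sketch below ranks pages by the standard deviation of their pixel values, which is only a proxy for contrast and no substitute for looking at the pages. The function name rank_by_contrast is hypothetical, and the sketch assumes your ImageMagick build supports the %[standard-deviation] format escape of identify.

```shell
# Print "<standard deviation> <file name>" for each page, lowest value first.
# A low standard deviation suggests a flat, low-contrast page.
# (rank_by_contrast is an illustrative name, not an ImageMagick command.)
rank_by_contrast() {
    identify -format '%[standard-deviation] %f\n' "$@" | sort -n
}

# Example: rank_by_contrast single-*.png | head -n 5
```

The pages at the top of the list are reasonable candidates for testing the threshold value.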

-monochrome specifies that the output be black-and-white only. This significantly saves space!

single-001.png is the file name of a single scanned page.

When you are satisfied with the parameters, you can run mogrify with a file name pattern such as single-*.png.
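Because mogrify overwrites its input, it may be worth running the batch on a copy of the pages. The following is a minimal sketch of that idea; the function name process_pages and the directory name processed are assumptions for illustration, not part of ImageMagick.

```shell
# Copy all pages into a separate directory and convert the copies there,
# so the original scans stay untouched.
# (process_pages and the output directory name are illustrative.)
process_pages() {
    threshold=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    cp single-*.png "$outdir"/ || return 1
    mogrify -density 300x300 -threshold "${threshold}%" -monochrome \
        "$outdir"/single-*.png
}

# Example: process_pages 70 processed
```

If the result turns out to be unsatisfactory, you can delete the copy and run the function again with a different threshold.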