Difference between revisions of "PDF"
Line 33: | Line 33: | ||
== Extract images from PDF files == |
== Extract images from PDF files == |
||
=== Recipe 1 === |
|||
Platforms: Linux, MacOS X [not confirmed]<br /> |
Platforms: Linux, MacOS X [not confirmed]<br /> |
||
Requirements: [http://poppler.freedesktop.org/ Poppler] |
Requirements: [http://poppler.freedesktop.org/ Poppler] |
||
Line 55: | Line 56: | ||
INPUT-999.jpg |
INPUT-999.jpg |
||
=== Recipe 2 === |
|||
Platforms: Linux, Unix, MacOS X, Windows<br /> |
|||
Requires: [http://imagemagick.org ImageMagick], [http://www.ghostscript.com/ Ghostscript] |
|||
ImageMagick can help here as well and is particularly handy when you want to convert to a format that the <tt>pdfimages</tt> command does not offer. Always use the <tt>-density</tt> option; otherwise ImageMagick renders the pages at its default resolution of 72 dpi. When converting to JPEG, set the <tt>-quality</tt> option to get the picture quality you desire. |
|||
In the examples below we set the resolution to 300 dpi. |
|||
 convert -density 300x300 MULTIPAGE.pdf OUTPUT-%d.png |
|||
or |
|||
 convert -density 300x300 -quality 100 MULTIPAGE.pdf OUTPUT-%d.jpg |
|||
== References == |
== References == |
Revision as of 08:48, 5 May 2012
Reduce the size of a PDF file consisting of scanned images
Platforms: Linux, Unix, MacOS X, Windows
Requires: ImageMagick, Ghostscript
It is wise to scan documents at the highest resolution possible, as downsampling can be done at any point. The fastest way I have found so far is to use the tools of the ImageMagick suite or Ghostscript.
Downsampling to 150 dpi without changing the image compression type (that is, without the -compress option) is done as follows.
convert -density 150 INPUT.pdf OUTPUT.pdf
Downsampling a PDF with images scanned at a high resolution to 150 dpi while converting the internal images to JPEG at a quality setting of 80. Useful for sending by mail.
convert -density 150 -compress jpeg -quality 80 INPUT.pdf OUTPUT.pdf
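When several scanned PDFs need the same treatment, the command can be wrapped in a small loop. The following is only a sketch: the helper names small_name and shrink_all and the "-small" suffix are assumptions made for illustration, and only a lowercase ".pdf" extension is handled. It uses the -compress option together with -density and -quality.

```shell
# Hypothetical helper: derive the output name for the reduced copy.
# INPUT.pdf -> INPUT-small.pdf
small_name() {
    printf '%s-small.pdf\n' "${1%.pdf}"
}

# Hypothetical helper: downsample every PDF passed as an argument
# to 150 dpi with JPEG compression at quality 80.
shrink_all() {
    for f in "$@"; do
        convert -density 150 -compress jpeg -quality 80 "$f" "$(small_name "$f")"
    done
}

# Usage (assumes ImageMagick and Ghostscript are installed):
#   shrink_all scan01.pdf scan02.pdf
```

The originals are left untouched, so the result can be compared against the source before anything is deleted.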
Convert images to PDF documents
Platforms: Linux, Unix, MacOS X, Windows
Requires: ImageMagick, Ghostscript
PDFs from scans are a very common occurrence these days. Depending on the purpose, conversion is sometimes required. It helps to understand how PDFs store the raster data internally in order to choose the best option for the task at hand.
An overview can be found at Wikipedia. In short, the examples below produce either an embedded JPEG or TIFF.
Assuming you have a few images lying around that need to be converted to a PDF file, the general form of the command is:
convert -repage <format> -compress <algorithm> [-quality <quality in %>] INPUT.tif OUTPUT.pdf
Creating an A4 PDF with lossy JPEG compression at a quality of 80%. A higher quality value yields a clearer image but requires more disk space. Note: the -quality option is optional, but if you want to retain full control over the outcome I suggest you use it.
convert -repage a4 -compress jpeg -quality 80 INPUT01.tif INPUT02.tif INPUT03.tif OUTPUT.pdf
For lossless A4 PDFs using the TIFF format for storage there are two options: LZW compression or ZIP. ZIP seems to be a bit more efficient. Note that the -quality option is not required.
convert -repage a4 -compress lzw INPUT01.tif INPUT02.tif INPUT03.tif OUTPUT.pdf
or
convert -repage a4 -compress zip INPUT01.tif INPUT02.tif INPUT03.tif OUTPUT.pdf
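The choice between the lossy and lossless variants can be captured in a small wrapper. This is a minimal sketch under stated assumptions: compress_args and images_to_pdf are hypothetical helper names, the default quality of 80 is taken from the JPEG example, and the page size option is left out for brevity (add it as in the commands above).

```shell
# Hypothetical helper: map an algorithm name to the matching convert options.
# jpeg takes an optional quality (default 80); lzw and zip are lossless
# and need no -quality option.
compress_args() {
    case "$1" in
        jpeg)    printf '%s\n' "-compress jpeg -quality ${2:-80}" ;;
        lzw|zip) printf '%s\n' "-compress $1" ;;
        *)       echo "unknown algorithm: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: images_to_pdf ALGORITHM OUTPUT.pdf IMAGE...
images_to_pdf() {
    algo=$1
    out=$2
    shift 2
    opts=$(compress_args "$algo") || return 1
    # $opts is deliberately unquoted so that it splits into separate arguments.
    convert $opts "$@" "$out"
}

# Usage (assumes ImageMagick is installed):
#   images_to_pdf zip OUTPUT.pdf INPUT01.tif INPUT02.tif INPUT03.tif
```

Rejecting unknown algorithm names up front avoids producing a PDF with an unintended default compression.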
Extract images from PDF files
Recipe 1
Platforms: Linux, MacOS X [not confirmed]
Requirements: Poppler
Getting all the images out of a PDF file can be quite a task. The Poppler library comes with some handy tools that can be of tremendous help. To extract images from a PDF, the pdfimages tool is easy to use.
pdfimages INPUT.pdf INPUT
will result in files with the names
INPUT-000.ppm INPUT-001.ppm . . . INPUT-999.ppm
A better way is to preserve embedded JPEG images as such. Assuming a PDF with only embedded JPEGs we use the -j option.
pdfimages -j INPUT.pdf INPUT
will yield the following files
INPUT-000.jpg INPUT-001.jpg . . . INPUT-999.jpg
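For more than one PDF it can help to keep each file's images in their own directory. The sketch below is one possible way to do that; image_prefix and extract_all are hypothetical helper names, and the one-directory-per-PDF layout is an assumption. Note that with -j only embedded JPEG images are written as .jpg; other images are still written as PPM or PBM files.

```shell
# Hypothetical helper: derive the image prefix from the PDF file name.
# /some/dir/INPUT.pdf -> INPUT
image_prefix() {
    basename "$1" .pdf
}

# Hypothetical helper: extract the images of every PDF passed as an
# argument into a directory named after the PDF.
extract_all() {
    for f in "$@"; do
        p=$(image_prefix "$f")
        mkdir -p "$p"
        # Produces files such as INPUT/INPUT-000.jpg
        pdfimages -j "$f" "$p/$p"
    done
}

# Usage (assumes the Poppler tools are installed):
#   extract_all INPUT.pdf OTHER.pdf
```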
Recipe 2
Platforms: Linux, Unix, MacOS X, Windows
Requires: ImageMagick, Ghostscript
ImageMagick can help here as well and is particularly handy when you want to convert to a format that the pdfimages command does not offer. Always use the -density option; otherwise ImageMagick renders the pages at its default resolution of 72 dpi. When converting to JPEG, set the -quality option to get the picture quality you desire.
In the examples below we set the resolution to 300 dpi.
convert -density 300x300 MULTIPAGE.pdf OUTPUT-%d.png
or
convert -density 300x300 -quality 100 MULTIPAGE.pdf OUTPUT-%d.jpg
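To get a feeling for what a given -density value means, the pixel size of a rendered page can be estimated from the page size in PostScript points (1/72 inch). The helper below is a rough sketch: page_pixels is a hypothetical name, it uses integer arithmetic, and the size ImageMagick actually produces may differ by a pixel because of rounding. An A4 page is roughly 595 x 842 points.

```shell
# Hypothetical helper: estimate the pixel dimensions of a rendered page.
# $1 = page width in points, $2 = page height in points, $3 = density in dpi
page_pixels() {
    echo "$(( $1 * $3 / 72 ))x$(( $2 * $3 / 72 ))"
}

# Usage:
#   page_pixels 595 842 300   # prints 2479x3508
#   page_pixels 595 842 72    # prints 595x842
```

At 300 dpi an A4 page therefore holds roughly 17 times as many pixels as at the 72 dpi default, which is why the -density option matters for legibility.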