Discussion:
convert pdf -> with mixture of pages & spreads
Michael Howard
2011-01-21 20:25:06 UTC
My issues relate to convert PDF -> JPG

SUMMARY

Q: Given a .pdf file with a mixture of individual pages and
2-page-spreads, how can I use ImageMagick convert to generate .jpg
files of individual pages?


BACKGROUND

I have an archive (college alumni magazines) of .pdf files that I need
to convert to .jpg page images.

I don't know what tools were used to create these .pdf files. In Adobe
Reader the .pdf properties say PDF Producer:Creo Normalize jTP

The magazines are essentially 8.5x11. I would like to use ImageMagick
convert to generate individual .jpg files of the individual page
images.

The colorspace on the .pdf files is CMYK, which makes me wonder/assume
that these are the actual files which were sent to the print shop for
printing the paper copies of the magazines. I was able to use
-colorspace RGB to correct the colorspace on the output .jpg files.


PROBLEM

Some of the .pdf files contain a mixture of 8.5x11 portrait pages and
11x17 landscape 2-page-spreads. I can confirm this by opening the
files in Adobe Reader.

When I run them through "convert" these 2-page-spreads are getting
compressed horizontally into a single 8.5x11 portrait page.

I assume that convert is seeing the 8.5x11 size on the first page and
is making all subsequent pages fit that size.

The command line that I am using is:

convert -quality 70 -density 300 -colorspace RGB TLN20100801.pdf \
    300dpi70/tulane_magazine_201008_%04d.jpg
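
If the spreads do come out at their full landscape size, one possible way to get single pages is to split every landscape JPEG into a left and a right half after the conversion. This is only a sketch: it assumes that identify and convert are on the PATH, that anything wider than tall is a 2-page spread, and the function names and output naming are made up for illustration.

```shell
# is_spread WIDTH HEIGHT: succeed when the image is wider than it is tall.
# Treating "landscape" as "2-page spread" is an assumption of this sketch.
is_spread() {
    [ "$1" -gt "$2" ]
}

# split_spreads FILE...: for each landscape JPEG, write the left and right
# halves as FILE-0.jpg and FILE-1.jpg. Portrait files are left untouched.
split_spreads() {
    for f in "$@"; do
        # %w and %h are identify's width and height escapes
        dims=$(identify -format '%w %h' "$f") || return 1
        w=${dims% *}
        h=${dims#* }
        if is_spread "$w" "$h"; then
            # -crop 50%x100% cuts the image into two side-by-side tiles;
            # +repage discards the canvas offsets left over from the crop
            convert "$f" -crop 50%x100% +repage "${f%.jpg}-%d.jpg"
        fi
    done
}

# Example (run after the conversion above):
#   split_spreads 300dpi70/tulane_magazine_201008_*.jpg
```

Splitting an existing JPEG recompresses it, so writing the pages to a lossless format first and converting to .jpg only at the end would avoid a second round of JPEG loss.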


Advice/recommendations from the IM gurus would be greatly appreciated.


Michael
Wolfgang Hugemann
2011-01-28 07:51:39 UTC
Post by Michael Howard
When I run them through "convert" these 2-page-spreads are getting
compressed horizontally into a single 8.5x11 portrait page.
I could not verify this behaviour. I have just converted a PDF
containing a mixture of DIN A4 and DIN A3 pages (which is Europe's
equivalent to the US formats you mentioned) and the JPEGs of the larger
DIN A3 pages have exactly double the pixels of those of the smaller DIN
A4 pages.
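
For the US sizes discussed in this thread, the pixel dimensions to expect at a given density can be worked out ahead of time and compared with the files that convert writes. A small sketch, assuming a POSIX shell with awk; the function name is hypothetical:

```shell
# pixels WIDTH_IN HEIGHT_IN DPI: print the raster size "WxH" in pixels.
# awk is used because shell arithmetic cannot handle 8.5 inches.
pixels() {
    awk -v w="$1" -v h="$2" -v d="$3" \
        'BEGIN { printf "%dx%d\n", int(w * d + 0.5), int(h * d + 0.5) }'
}

pixels 8.5 11 300   # portrait page:    prints 2550x3300
pixels 17 11 300    # landscape spread: prints 5100x3300
```

A spread that comes out at 2550x3300 instead of 5100x3300 has been squeezed into the single-page size.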
Post by Michael Howard
I assume that convert is seeing the 8.5x11 size on the first page and
is making all subsequent pages fit that size.
Actually, ImageMagick sees nothing directly, as it uses Ghostscript
as the delegate for PDF. I am using GPL Ghostscript 8.70.

Possibly, this behaviour has something to do with your PDF input files.
Could you place a smaller one somewhere where I can download it?

Greetings from Germany
Wolfgang Hugemann
Michael Howard
2011-02-02 18:44:51 UTC
Wolfgang,

Thank you for your response. There must have been some type of mailing
list / email problem; your message is dated Fri 28 Jan but I received
it several days later.
Post by Wolfgang Hugemann
Post by Michael Howard
When I run them through "convert" these 2-page-spreads are getting
compressed horizontally into a single 8.5x11 portrait page.
I could not verify this behaviour. I have just converted a PDF
containing a mixture of DIN A4 and DIN A3 pages (which is Europe's
equivalent to the US formats you mentioned) and the JPEGs of the larger
DIN A3 pages have exactly double the pixels of those of the smaller DIN
A4 pages.
I have made a number of configuration changes to my production system
over the past few days.
Post by Wolfgang Hugemann
Post by Michael Howard
I assume that convert is seeing the 8.5x11 size on the first page and
is making all subsequent pages fit that size.
Actually, ImageMagick sees nothing directly, as it uses Ghostscript
as the delegate for PDF. I am using GPL Ghostscript 8.70.
Possibly, this behaviour has something to do with your PDF input files.
Could you place a smaller one somewhere where I can download it?
At this point I cannot reproduce the problem. I have changed
everything ... shell scripts + imagemagick + ghostscript + os.

Thank you very much for your offers of assistance. I apologize for the
inconvenience.


Michael

Tyson Boellstorff
2011-02-02 14:53:31 UTC
Post by Michael Howard
My issues relate to convert PDF -> JPG
PROBLEM
Some of the .pdf files contain a mixture of 8.5x11 portrait pages and
11x17 landscape 2-page-spreads. I can confirm this by opening the
files in Adobe Reader.
Advice/recommendations from the IM gurus would be greatly appreciated.
I currently use PerlMagick to automatically convert pdf files that have a wide
mixture of orientations/formats into something that will not make a printer
blow chunks.

What you are dealing with is familiar in part. My goal is to read them,
identify any rotated pages/outsized pages, and rotate/resize/drop colors. (I
have spreadsheets, oversized maps, color photos, you name it in these files,
and the images will all be faxed out, so I don't want high color or anything
other than 8.5x11.)

I do this in several passes -- at first, I read the entire image into an
object, and iterate through each page to cope with each orientation. If that
fails, then I use convert to write each page out as a bmp, and do the same. If
it still fails, I then use ghostscript directly to write each page out as a
bitmap.

Part of the reason this works/fails the way it does is a lack of space in
/tmp, and there's nothing I can do about that. The second iteration uses a
little less /tmp space, and the third writes to a filesystem with plenty of
space -- but this is dog-slow, so it is never my first choice. (Some of these
pdf files are over 1,000 pages, and at 8 MB/page for 300x300 dpi they can
overrun my filesystem -- I can run df repeatedly from another terminal and
watch space fill up until it hits 100%, and then the fun stops.) Another
reason has to do with how different versions of ghostscript handle things.
That's not ImageMagick's problem.

I suggest that rather than go directly to jpeg, you try writing to bitmap and
convert them to jpeg, but if that fails, try a ghostscript command like:

gs -q -dNOPAUSE -dBATCH -sDEVICE=bmp256 -r200x200 -sOutputFile=foo-%d.bmp \
    bar.pdf

and look at the bitmaps to see what you have.

ymmv.

Vary the device and resolution to suit.

Try the jpeg device if you want, but if you can't get raw bitmaps, then you're
wasting your time trying to get jpegs.
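
To make it easier to vary the device and resolution, the ghostscript command above can be wrapped in a small shell function. This is a sketch only: the function name, the argument order, and the DRY_RUN switch are made up for illustration, and the output files are named after the input pdf rather than foo-%d.bmp.

```shell
# render_pages PDF [DEVICE] [DPI] [EXT]: write one image per page using
# ghostscript directly. Defaults match the bmp256 / 200 dpi command above.
# With DRY_RUN set to a non-empty value, the command is printed, not run.
render_pages() {
    pdf=$1
    device=${2:-bmp256}
    dpi=${3:-200}
    ext=${4:-bmp}
    prefix=${pdf%.pdf}
    # Build the argument list once so it can be printed or executed as-is
    set -- gs -q -dNOPAUSE -dBATCH "-sDEVICE=$device" "-r${dpi}x${dpi}" \
        "-sOutputFile=${prefix}-%d.${ext}" "$pdf"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Examples:
#   render_pages bar.pdf                 # raw 256-colour bitmaps at 200 dpi
#   render_pages bar.pdf jpeg 300 jpg    # ghostscript's jpeg device at 300 dpi
```

Running with DRY_RUN=1 first shows exactly which gs command would be executed for a given device and resolution.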

Also, does anybody know how to point my Image::Magick object to a different
filesystem for temp file usage?
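
On the temp file question: ImageMagick reads the MAGICK_TMPDIR environment variable to decide where its temporary files go, so one option is to export it in the shell before the Perl script starts. A minimal sketch, assuming a POSIX shell; the function name, the /bigdisk path, and the script name are placeholders, not anything from this thread:

```shell
# use_scratch_dir DIR: create DIR if needed and point ImageMagick's
# temporary files at it through the MAGICK_TMPDIR environment variable.
use_scratch_dir() {
    mkdir -p "$1" || return 1
    MAGICK_TMPDIR=$1
    export MAGICK_TMPDIR
}

# Example (hypothetical path and script name):
#   use_scratch_dir /bigdisk/im-tmp && perl convert_faxes.pl
```

Whether this also relocates the files that the ghostscript delegate writes is something to verify on the system in question, for example by watching df while a large pdf is being read.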