How to automatically figure out the optimal resolution for a PDF file?

Discussion:

Peng Yu

2011-04-21 14:03:39 UTC

Hi,

I'm trying to convert a PDF without explicitly specifying the density,
as Acrobat can automatically figure out the best resolution. But it
seems that convert cannot do so correctly; it uses a far too low
resolution. I'm wondering if there is a way to automatically determine
the best resolution. (Let us assume the PDF was created by converting
from a TIFF file, so the optimal resolution is just the resolution of
the original TIFF file.)

convert image.pdf +adjoin image_%02d.tif

--
Regards,
Peng

Anthony Thyssen

2011-04-23 07:33:36 UTC

And how would Acrobat do that? That is the question!

PDFs (or PostScript) are not supposed to have an 'ideal density' or
resolution. But if they contain raster images, those raster images may
have an ideal resolution. However, a PDF could easily have multiple
rasters, each with a different ideal resolution.

So how any program can pick an ideal resolution is beyond me!

Anthony Thyssen ( System Programmer ) <***@griffith.edu.au>
--------------------------------------------------------------------------
To think is human, to compute, divine.
--------------------------------------------------------------------------
Anthony's Castle http://www.ict.griffith.edu.au/anthony/

Peng Yu

2011-04-25 01:28:37 UTC

On Sat, Apr 23, 2011 at 2:33 AM, Anthony Thyssen wrote:
> So how any program can pick an ideal resolution is beyond me!

I think that you may not need to consider very complex cases. The
simplest case is that each PDF page has only one image (as it
originally came from a TIFF file). The best resolution for any page is
just the resolution of the original image. Would this be very difficult
to add to convert?
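
To make the single-image case concrete: a PDF page is sized in points
(1/72 inch), so when one image fills the page, the density that
reproduces it pixel for pixel is the image's pixel width divided by the
page width in inches. A minimal Python sketch follows. The function
names and the example numbers (a 2550-pixel-wide scan on a 612-point,
8.5-inch page) are illustrative assumptions, and the pixel width and
page size still have to be read from the PDF by some other tool.

```python
def native_density(pixel_width, page_width_pt):
    """Return the DPI at which an image filling the page maps 1:1 to pixels.

    PDF page dimensions are expressed in points, with 72 points per inch.
    """
    if page_width_pt <= 0:
        raise ValueError("page width must be positive")
    return pixel_width * 72.0 / page_width_pt


def convert_command(pdf_path, density, pattern="image_%02d.tif"):
    """Build (but do not run) a convert call with an explicit -density."""
    # -density is placed before the input file so it applies while the
    # PDF is being rasterized.
    return ["convert", "-density", str(round(density)), pdf_path,
            "+adjoin", pattern]


if __name__ == "__main__":
    dpi = native_density(2550, 612)
    print(dpi)
    print(" ".join(convert_command("image.pdf", dpi)))
```

This only covers the one-image-per-page case; a page with several
rasters of different resolutions has no single answer.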

--
Regards,
Peng

Anthony Thyssen

2011-04-25 02:52:17 UTC

Actually convert (AKA ImageMagick) does not deal with PDF files itself.
It passes them to Ghostscript. However, the original problem of how
to determine the 'ideal resolution' remains.

Anthony Thyssen ( System Programmer ) <***@griffith.edu.au>
--------------------------------------------------------------------------
"You must realize that the computer has it in for you. The irrefutable
proof of this is that the computer always does what you tell it to do."
--------------------------------------------------------------------------
Anthony's Castle http://www.ict.griffith.edu.au/anthony/

Wolfgang Hugemann

2011-04-25 18:54:16 UTC

Theoretically, a PDF might contain raster images of various resolutions.
Practically speaking, however, this will seldom be the case. In the
easiest case, a PDF will just bind (scanned) TIFFs of one and the same
resolution. If the PDF is produced by a printer driver or distilled from
a PostScript file, one will generally define fixed resolutions for
raster images and black and white images.

So I guess there generally *is* something like an ideal resolution for a
PDF when being converted to a raster image. But this is more of a
Ghostscript problem than a problem of ImageMagick.

If your PDF just binds raster images, you should give pdfimages from
Xpdf a try, which losslessly extracts them from the PDF by extracting
the corresponding streams.
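
For reference, a pdfimages call might look like the sketch below,
written in Python. It assumes the pdfimages binary from Xpdf is on the
PATH; the file name scans.pdf and the output prefix page are made up
for illustration. To my knowledge the -j option writes JPEG (DCT)
streams out unchanged as .jpg files, while other images are written as
PBM/PGM/PPM.

```python
import shutil
import subprocess


def pdfimages_command(pdf_path, image_root, keep_jpeg=True):
    """Build the argument list: pdfimages [-j] PDF-file image-root."""
    cmd = ["pdfimages"]
    if keep_jpeg:
        cmd.append("-j")  # save DCT-encoded images as JPEG files
    cmd += [pdf_path, image_root]
    return cmd


def extract_images(pdf_path, image_root):
    """Run pdfimages if it is installed; return True when it was run."""
    if shutil.which("pdfimages") is None:
        return False  # tool not found, nothing is extracted
    subprocess.run(pdfimages_command(pdf_path, image_root), check=True)
    return True


if __name__ == "__main__":
    print(pdfimages_command("scans.pdf", "page"))
```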

Wolfgang Hugemann

Anthony Thyssen

2011-04-26 23:28:37 UTC

I did not know that Xpdf can extract the images from PDF losslessly.
Hmmm the package does not even require the use of ghostscript!
I wonder how it is processing PDF files?

Can you give us an example of extracting exact images from a PDF?

Anthony Thyssen ( System Programmer ) <***@griffith.edu.au>
--------------------------------------------------------------------------
At 300 dpi you can tell she's wearing a swimsuit.
At 600 dpi you can tell it's wet.
At 1200 dpi you can tell it's painted on.
I suppose at 2400 dpi you can tell if the paint is giving her a rash.
-- Joshua R. Poulson
--------------------------------------------------------------------------
Anthony's Castle http://www.ict.griffith.edu.au/anthony/

Tei

2011-04-27 01:18:55 UTC

Post by Anthony Thyssen
I did not know that Xpdf can extract the images from PDF losslessly.
Hmmm the package does not even require the use of ghostscript!
I wonder how it is processing PDF files?

Xpdf is a complete PDF engine.

Don't quote me on that, but there are probably fewer than 6 complete
PDF engines: Xpdf, Evince, Foxit, Adobe, ... (If that is a myth, it is
a pretty fun myth.) Supposedly the PDF spec is a horrible and bloated
document, so few people have actually tried to complete the task of
making an engine for it. Most reuse code to retain sanity, or make a
lightweight version of it.

--
End of message.

Wolfgang Hugemann

2011-04-26 23:28:55 UTC

Post by Anthony Thyssen
I did not know that Xpdf can extract the images from PDF losslessly.
Hmmm the package does not even require the use of ghostscript!
I wonder how it is processing PDF files?

We do it about once a day over here; it works reliably. I don't know the details, but PDF is ASCII-based, so you can even extract the streams yourself by opening the PDF in an ASCII editor, looking for a stream that you know to be, say, a JPEG, and extracting it. At http://www.unfallrekonstruktion.de/imagemagick/BMW.pdf you'll find a PDF I have just generated.

You can extract the part between "stream" and "endstream" and save it as a JPEG -- it is as simple as this (at least on my Windows computer); I've just tried.

Wolfgang Hugemann

Anthony Thyssen

2011-04-27 01:53:32 UTC

Hmmm, that may be how PDF is storing your scanned JPEGs, but I tried it
on a more normal PDF document (mostly text with a few images) and could
not locate the stream with the image.

Also, what if the embedded image contains a '<CR>endstream<CR>' character
sequence? It is unlikely, but possible. There will need to be some
escaping sequence involved.

In other words, your method may work well for your PDF-wrapped JPEGs
but it fails in other situations. It is however a nice technique to
know, just not general.
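
A hedged Python sketch of the manual approach, for PDFs that embed
JPEGs directly: it looks for stream dictionaries that mention
/DCTDecode and copies the bytes after the stream keyword. As far as I
understand the format, each stream dictionary carries a /Length entry
with the byte count, so a reader does not have to scan for endstream,
which would take care of the escaping worry. The sketch only handles a
/Length written as a plain integer and falls back to searching for
endstream otherwise. It ignores object streams, filter chains and
encryption, so it is not a general extractor. The function name is made
up for illustration.

```python
import re

# The "stream" keyword (not "endstream") followed by CRLF or LF.
STREAM_KW = re.compile(rb"(?<!end)stream\r?\n")
# A direct length such as "/Length 1234"; "/Length 12 0 R" is rejected.
DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def extract_jpeg_streams(data):
    """Return the raw bytes of every /DCTDecode stream found in data."""
    images = []
    pos = 0
    while True:
        m = STREAM_KW.search(data, pos)
        if m is None:
            break
        # The dictionary sits between the preceding "obj" and "stream".
        obj_at = data.rfind(b"obj", pos, m.start())
        header_start = obj_at if obj_at != -1 else pos
        header = data[header_start:m.start()]
        start = m.end()
        length = DIRECT_LENGTH.search(header)
        if length:
            end = start + int(length.group(1))
        else:
            # Fallback: keyword scan. It can be fooled by image data and
            # keeps the end-of-line that precedes "endstream".
            end = data.find(b"endstream", start)
            if end == -1:
                break
        if b"/DCTDecode" in header:
            images.append(data[start:end])
        pos = end  # skip the payload so its bytes are never scanned
    return images


if __name__ == "__main__":
    with open("BMW.pdf", "rb") as fh:  # file name assumed for illustration
        for i, jpeg in enumerate(extract_jpeg_streams(fh.read())):
            with open("image_%02d.jpg" % i, "wb") as out:
                out.write(jpeg)
```

Images stored with other filters (Flate-compressed rasters, for
example) are skipped by this sketch, which may be why a stream is hard
to spot by eye in a text-heavy PDF.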

So the question again becomes how to extract EXACT images from PDF files!

Anthony Thyssen ( System Programmer ) <***@griffith.edu.au>
--------------------------------------------------------------------------
"All I can say is, enthusiasm, sincerity, genuine compassion,
humour, can carry you through any lack of experience with...
higher numerical values!" -- Ivoniva's Theory of Relationships
Babylon 5, "Sic Transit Vir"
--------------------------------------------------------------------------
Anthony's Castle http://www.ict.griffith.edu.au/anthony/

Wolfgang Hugemann

2011-04-27 06:25:19 UTC

Post by Anthony Thyssen
In other words your method may work well for your PDF wrapped JPEG's
but it fails for other situations. It is however a nice technique to
know, just not general.

This may hold for my simple, manual method, but pdfimages works on any PDF file. E-mail "your" PDF file to me and I will give it a try. I am quite sure that it will work, unless the file is protected, which will of course inhibit extraction.

Extracting the streams is not limited to simple PDFs which only wrap images. It works on very complex PDFs, too -- I've tried that several times.

Wolfgang Hugemann
