Channel: Aspose.Pdf Product Family

PDF image extraction issues

Hi Aspose Support.

 

- We have a PDF (attached).

- We extract images from the PDF exactly as described on your webpage (http://www.aspose.com/docs/display/pdfjava/Extract+Images+from+the+PDF+File):

 

pageCollection = asposePdfDocument.getPages();

for (Object pageObject : pageCollection) {
    Page page = (Page) pageObject;
    XImageCollection imageCollection = page.getResources().getImages();
    for (Object imageObject : imageCollection) {
        images.add(processImage((XImage) imageObject, page.getNumber()));
    }
}

private ResolvedImageData processImage(XImage image, int pageNumber) {
    String imageName = image.getName();
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    image.save(outputStream);
}
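The processImage method above stops right after image.save(...). For anyone trying to reproduce the report, the following is a minimal sketch of one way the method could be completed and the extracted images counted per page. It is not the code from the post: ImageSource is a hypothetical stand-in for XImage (only getName and save are modelled) so the sketch compiles without Aspose.PDF, and the fields of ResolvedImageData are assumptions, since the real class is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ImageExtractionSketch {

    /** Hypothetical stand-in for XImage: only the two calls used in the post. */
    interface ImageSource {
        String getName();
        void save(OutputStream out);
    }

    /** Assumed shape of ResolvedImageData; the real class is not shown in the post. */
    static final class ResolvedImageData {
        final String name;
        final int pageNumber;
        final byte[] bytes;

        ResolvedImageData(String name, int pageNumber, byte[] bytes) {
            this.name = name;
            this.pageNumber = pageNumber;
            this.bytes = bytes;
        }
    }

    /** Saves the image into memory and keeps its name, page number and raw bytes. */
    static ResolvedImageData processImage(ImageSource image, int pageNumber) {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        image.save(outputStream);
        return new ResolvedImageData(image.getName(), pageNumber, outputStream.toByteArray());
    }

    /** Counts extracted images per page, sorted by page number. */
    static Map<Integer, Integer> countPerPage(List<ResolvedImageData> images) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (ResolvedImageData image : images) {
            counts.merge(image.pageNumber, 1, Integer::sum);
        }
        return counts;
    }
}
```

Logging the result of countPerPage next to the number of images visible on each page makes it easy to state exactly which pages come up short.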

 

The problem is that almost no images are extracted:

1.       There are 6 identical pictures on page 1. Aspose.PDF found and extracted only two.

2.       The right-hand chart on page 3 has Chinese city names saved as images. Aspose.PDF did not extract any of them.

3.       The same behaviour occurs with the top-left chart on page 4: no image containing text was extracted.

The behaviour is the same on both Windows and Linux.

 

We use: 

- Aspose.PDF for Java 4.6.0 

- Windows XP

- Red Hat 4.4.7-3, JBoss 5.2.0.1 

 

The original file is attached.

 

Could you please advise whether we need to do anything specific to run the image extraction, or whether this is a bug in Aspose.PDF 4.6.0?

 

Thanks in advance,
Kind Regards,
Michał Wasiak
