Hi Aspose Support.
- We have a PDF file (attached).
- We extract images from PDF exactly as you describe it on your webpage (http://www.aspose.com/docs/display/pdfjava/Extract+Images+from+the+PDF+File):
    pageCollection = asposePdfDocument.getPages();
    for (Object pageObject : pageCollection) {
        Page page = (Page) pageObject;
        XImageCollection imageCollection = page.getResources().getImages();
        for (Object imageObject : imageCollection) {
            images.add(processImage((XImage) imageObject, page.getNumber()));
        }
    }
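    private ResolvedImageData processImage(XImage image, int pageNumber) {
        String imageName = image.getName();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        // Write the image bytes into the in-memory stream.
        image.save(outputStream);
        // (rest of the method, which builds and returns the ResolvedImageData, is omitted here)
    }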
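For reference, processImage does roughly the following with each image. This is only a minimal sketch that runs without Aspose: ImageSource is a hypothetical stand-in for the two XImage calls we use (getName and save), and the ResolvedImageData shown here is a simplified assumption about our holder class, not the real one.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Hypothetical stand-in for the two XImage methods used in our code.
interface ImageSource {
    String getName();
    void save(OutputStream out);
}

// Simplified, assumed shape of our result holder (the real class is not shown in this post).
final class ResolvedImageData {
    final String name;
    final int pageNumber;
    final byte[] bytes;

    ResolvedImageData(String name, int pageNumber, byte[] bytes) {
        this.name = name;
        this.pageNumber = pageNumber;
        this.bytes = bytes;
    }
}

final class ImageProcessingSketch {
    // Saves the image into memory and returns its name, page number and raw bytes.
    static ResolvedImageData processImage(ImageSource image, int pageNumber) {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        image.save(outputStream);
        return new ResolvedImageData(image.getName(), pageNumber, outputStream.toByteArray());
    }
}
```

So for every image the library hands us, we keep one entry; nothing is filtered out on our side.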
The problem is that almost no images are extracted.
1. There are 6 identical pictures on page 1. Aspose.PDF found and extracted only two.
2. The right chart on page 3 has Chinese city names saved as images. Aspose.PDF did not extract any of them.
3. The top-left chart on page 4 shows the same behaviour: no image with text was extracted.
The behaviour is the same on both Windows and Linux.
We use:
- Aspose.PDF for Java 4.6.0
- Windows XP
- Red Hat 4.4.7-3, JBoss 5.2.0.1
The original file is attached.
Can you please advise whether we need to do anything specific to run the image extraction, or whether this is a bug in Aspose.PDF 4.6.0?
Thanks in advance,
Kind Regards,
Michał Wasiak