A customer of ours has a single-page PDF whose one page contains a fairly large drawing. The file itself is a fairly modest 15 MB in size.
When a TextAbsorber is used to extract text from this PDF, the call never returns: CPU usage goes to maximum and memory consumption keeps growing until an OutOfMemoryException is eventually thrown.
The code being used to extract the text is:
Aspose.Pdf.Document doc = new Aspose.Pdf.Document(path);
TextAbsorber textAbsorber = new TextAbsorber();
for (int i = 1; i <= doc.Pages.Count; i++)
{
    // The call below does not return and eventually causes an OutOfMemoryException
    textAbsorber.Visit(doc.Pages[i]);
}
We are currently using Aspose.PDF.dll version 7.9, but I also ran a test with the current version, 9.4, and got the same behaviour.
Can I supply you with the PDF through a private channel so that the problem can be investigated, please?