I am trying to extract the text from a PDF file using the TextAbsorber class; here is the code I am using:
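// Dispose the file stream as well as the document when done
using (Stream stream = File.OpenRead("test.pdf"))
using (Document pdfDocument = new Document(stream)) {
    // Raw mode: extract the text without applying layout formatting
    TextAbsorber textAbsorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
    // Run the absorber over every page of the document
    pdfDocument.Pages.Accept(textAbsorber);
    File.WriteAllText("pdfContent.txt", textAbsorber.Text);
}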
I would expect pdfContent.txt to contain "Pdf test" multiple times; see the attached pdfContent.txt file for the actual content. (Note: I am not using an Aspose licence here, to keep things simple.)
Can you tell me what's wrong with the PDF file?