Channel: Aspose.Pdf Product Family

Strange text extracted using TextAbsorber

I am trying to extract the text from a PDF file using the TextAbsorber class; here is the code I am using:

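    using System.IO;
    using Aspose.Pdf;
    using Aspose.Pdf.Text;

    // Open the PDF (the stream is disposed together with the document)
    using (Stream stream = File.OpenRead("test.pdf"))
    using (Document pdfDocument = new Document(stream)) {
      TextAbsorber textAbsorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));

      // Visit every page; the absorber accumulates the extracted text
      pdfDocument.Pages.Accept(textAbsorber);
      File.WriteAllText("pdfContent.txt", textAbsorber.Text);
    }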

I would expect pdfContent.txt to contain "Pdf test" several times; see the attached pdfContent.txt for the actual content. (Note: I am not applying an Aspose licence here, to keep things simple.)

Can you tell me what's wrong with the PDF file?
