Channel: Aspose.Pdf Product Family

System.OutOfMemoryException when extracting text from some PDF files.

Hello,

We would like to report a problem we encounter when extracting text from PDF files using Aspose.PDF in .NET. Some PDF files cause memory usage to grow until .NET throws a System.OutOfMemoryException. This is a serious problem because we use this functionality on a server that runs many other components and has a lot of memory available.

We have tried the newest version currently available (9.8.0).

A PDF file that causes this issue is attached.

The issue can be reproduced using this code sample:

[C#]
using Aspose.Pdf;
using Aspose.Pdf.Devices;
using Aspose.Pdf.InteractiveFeatures.Annotations;
using Aspose.Pdf.Text.TextOptions;
using System.IO;
using System.Text;

namespace AsposePdfTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Document pdfDocument = new Document("Example1.pdf");

            // Accumulate the text of every page; assigning with "=" would keep only the last page.
            StringBuilder extractedText = new StringBuilder();

            foreach (Page pdfPage in pdfDocument.Pages)
            {
                using (MemoryStream textStream = new MemoryStream())
                {
                    TextDevice textDevice = new TextDevice();

                    // Extract text in pure (unformatted) mode.
                    TextExtractionOptions textExtOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);

                    textDevice.ExtractionOptions = textExtOptions;

                    // Write this page's text into the stream.
                    textDevice.Process(pdfPage, textStream);

                    textStream.Close();

                    // ToArray() remains valid after the MemoryStream is closed.
                    extractedText.Append(Encoding.Unicode.GetString(textStream.ToArray()));
                }
            }

            File.WriteAllText("ExtractedText.txt", extractedText.ToString(), Encoding.Unicode);
        }
    }
}

Regards,
Christoffer

