Hello,
We would like to report a problem we encountered when extracting text from PDF files using Aspose.PDF for .NET. Some PDF files cause memory usage to grow until .NET throws an OutOfMemoryException. This is a serious problem for us, because we use this functionality on a server that runs many other components and has a lot of memory available.
We have tried the newest version currently available (9.8.0).
The PDF file that causes this issue is attached.
The issue can be reproduced using this code sample:
[C#]
using Aspose.Pdf;
using Aspose.Pdf.Devices;
using Aspose.Pdf.Text.TextOptions;
using System.IO;
using System.Text;

namespace AsposePdfTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Document pdfDocument = new Document("Example1.pdf");
            // Collect the text of every page (the original sample kept only the last page)
            StringBuilder extractedText = new StringBuilder();

            foreach (Page pdfPage in pdfDocument.Pages)
            {
                using (MemoryStream textStream = new MemoryStream())
                {
                    // Extract this page's text in "pure" formatting mode
                    TextDevice textDevice = new TextDevice();
                    TextExtractionOptions textExtOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
                    textDevice.ExtractionOptions = textExtOptions;
                    textDevice.Process(pdfPage, textStream);

                    // MemoryStream.ToArray() still works after the stream has been closed
                    textStream.Close();
                    extractedText.Append(Encoding.Unicode.GetString(textStream.ToArray()));
                }
            }

            File.WriteAllText("ExtractedText.txt", extractedText.ToString(), Encoding.Unicode);
        }
    }
}
Regards,
Christoffer