Re: How to limit memory usage when extracting text from large PDFs?

March 28, 2014, 5:43 am

Has there been any change in this issue? I am facing exactly the same problem.

↧

Accented Characters in PDF Files

April 24, 2014, 7:21 am

Hi,

When I try to generate a PDF with the word "Książęce" in it, it comes out with spaces in the place of certain characters.

Here is a complete, reproducible code listing:

using System.IO;
using Aspose.Pdf.Generator;

class Program
{
    static void Main(string[] args)
    {
        string testStr = "Książęce";
        System.IO.File.WriteAllBytes("TestFile.pdf", SavePdf(testStr));
    }

    /// <summary>
    /// Saves the PDF.
    /// </summary>
    /// <param name="inputHtml">The input HTML.</param>
    /// <returns>A byte array representing the generated PDF</returns>
    public static byte[] SavePdf(string inputHtml)
    {
        Pdf pdf = new Pdf();
        Section section = pdf.Sections.Add();
        section.PageInfo.Margin.Top = 5;
        section.PageInfo.Margin.Left = 5;
        section.PageInfo.Margin.Bottom = 5;
        section.PageInfo.Margin.Right = 5;
        section.PageInfo.PageHeight = Aspose.Pdf.Generator.PageSize.A4Height;
        section.PageInfo.PageWidth = Aspose.Pdf.Generator.PageSize.A4Width;

        Text text = new Text(section, inputHtml);
        text.IsHtmlTagSupported = true;
        section.Paragraphs.Add(text);
        text.IsFitToPage = true;

        byte[] pdfBytes;
        using (MemoryStream s = new MemoryStream())
        {
            pdf.Save(s);
            pdfBytes = s.ToArray();
        }
        return pdfBytes;
    }
}

I have attached a generated PDF to this post.

This was reproduced using Aspose.Pdf.dll 8.4.0.0

Many Thanks,

James
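
One way to narrow this down is to check which characters of the test word fall outside a single-byte Western encoding. The sketch below is plain Java and does not touch the Aspose API; the class and method names are made up for illustration, and it is only an assumption (not confirmed in this thread) that the missing glyphs are related to the encoding or to the font in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of any Aspose library.
final class NonLatin1Finder {
    /** Returns, in order, the characters of text that the named charset cannot encode. */
    static String unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }
}
```

For "Książęce" and ISO-8859-1 this reports ą, ż and ę, so those are the characters that need a font and encoding with Central European coverage.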

↧

Remove Image Stamp from PDF

April 26, 2014, 6:10 am

Hi Team,

I need to add a QR code to a PDF and later remove it from the file.

I use the code below to add the QR code image to the PDF.

//Open document
Document pdfDocument = new Document("input.pdf");

//Create image stamp
ImageStamp imageStamp = new ImageStamp("aspose-logo.jpg");
imageStamp.Background = true;
imageStamp.XIndent = 100;
imageStamp.YIndent = 100;
imageStamp.Height = 300;
imageStamp.Width = 300;
imageStamp.Rotate = Rotation.on270;
imageStamp.Opacity = 0.5;

//Add stamp to particular page
pdfDocument.Pages[1].AddStamp(imageStamp);

//Save output document
pdfDocument.Save("output.pdf");

Now I want to remove the image stamps that I have added. How can I remove the image stamps from PDF?

↧

need support about aspose pdf

April 26, 2014, 12:55 am

How do I create a table with a dynamic number of rows?

This message was posted using Banckle Live Chat 2 Forum

↧

PDF password protection not working

April 24, 2014, 11:02 am

I have the Aspose.Total license (pdf.dll v9.1.0.0).

I am using the PDF password protection API, but when I open the PDF file, the owner and the user see the same menu options. What I mean is that the owner also sees the menu options disabled.

Here's the code I am using:

Aspose.Pdf.Permissions _Permissions = new Aspose.Pdf.Permissions(); // nothing enabled
Aspose.Pdf.CryptoAlgorithm _CryptoAlgorithm = Aspose.Pdf.CryptoAlgorithm.AESx128;

doc.Encrypt(UserPassword,
                            OwnerPassword,
                            _Permissions,
                            _CryptoAlgorithm);

↧

HTML with multibyte characters don't convert to PDF properly

April 24, 2014, 12:30 pm

When I create a PDF document from an HTML string, multibyte characters are missing in the saved document. See the code below:

var pdf = new Aspose.Pdf.Generator.Pdf();

pdf.SetUnicode();

string html1 = " <div >TEST_中文文档资料_TEXT</div>";

pdf.ParseToPdf(html1);

//AsposePdfCreator.ConvertToWordDoc(pdf, html1, pdftype);

pdf.SetUnicode();

pdf.Save(@"c:\test2.pdf");

SetUnicode() does not help.

I am using Aspose.Pdf version 9.1.

Am I doing something wrong?

Thank you,

Alexei

↧

Saving as TIF issue

April 27, 2014, 2:46 pm

The resulting TIF is not viewable in TIFF Viewer (Windows); it says the file is corrupted.

Code below:

Aspose.Pdf.Document pdfDoc = new Aspose.Pdf.Document("HelloWorld.pdf");

pdfDoc.Save("HelloWorld.tif");

I am attaching the two files.

Thanks

↧

split pdf file to individual pages get this error - value can not be null in java

April 10, 2014, 4:18 am

Hi all,

I tried splitting a PDF with Aspose for Java and this error occurred:

Caused by: class com.aspose.ms.System.g: Value cannot be null.

Parameter name: path1

com.aspose.ms.System.e.ae.bp(Unknown Source)

com.aspose.g.i.r.aNo(Unknown Source)

com.aspose.g.m.h.bfO(Unknown Source)

com.aspose.pdf.Document.<clinit>(Unknown Source)

Here's my code:

Document pdf = new Document(myDir + myFile); // the error occurs on this line

for (int i = 1; i <= pdf.getPages().size(); i++) {
    Document document = new Document();
    document.getPages().add(pdf.getPages().get_Item(i));
    document.save(myDir + outputName + "_" + i + ".pdf");
}

The PDF file is attached.

Thanks for your suggestions.

↧

Wrong PageInfo for landscape ?

April 28, 2014, 1:04 am

The pages of my PDF are actually in landscape mode, but how can I detect this? My goal is simply to initialize a PngDevice in order to export pages with their original ratio.

Below is the PageInfo I obtain. According to http://stackoverflow.com/questions/12050743/how-to-find-whether-pdf-has-landscape-orientation-or-portrait, the IsLandscape property (and, in my opinion, PageInfo.Width/Height) does not seem to be correctly inferred by Aspose. I understand that all those Rects are not simple to "integrate" into one "simple final Rect for the Page", but I thought that PageInfo should hold the result of this subtle computation.

PageInfo:
    Height: 842.0
    IsLandscape: false
    Margin: {Aspose.Pdf.MarginInfo}
    Width: 595.0
Rect: {0,0,841,595}
TrimBox: {0,0,841,595}
MediaBox: {0,0,841,595}
ArtBox: {0,0,841,595}
BleedBox: {0,0,841,595}
CropBox: {0,0,841,595}

Which properties should I consider to correctly obtain the Rect to use to configure the PngDevice?

Regards,

Olivier
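
Since every box reported above is 841 wide and 595 high, the orientation can be derived from a page box directly rather than from PageInfo. The following is a minimal sketch in plain Java, assuming the box corners and the page rotation (a multiple of 90 degrees) have already been read from the document by some means; the class and method names are hypothetical and it does not call the Aspose API.

```java
// Hypothetical helper, not part of any Aspose library.
final class PageOrientation {
    /**
     * Decides orientation from a page box given by its lower-left and
     * upper-right corners (in points) and the page rotation in degrees.
     */
    static boolean isLandscape(double llx, double lly, double urx, double ury,
                               int rotationDegrees) {
        double width = Math.abs(urx - llx);
        double height = Math.abs(ury - lly);
        // Normalize to 0..359 so that negative rotations are handled too.
        int normalized = ((rotationDegrees % 360) + 360) % 360;
        // A rotation of 90 or 270 degrees swaps the displayed width and height.
        if (normalized == 90 || normalized == 270) {
            double tmp = width;
            width = height;
            height = tmp;
        }
        return width > height;
    }
}
```

With the MediaBox {0,0,841,595} and no rotation, this returns true, and the same width and height give the aspect ratio to use for the export.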

↧

converting text to PDF doesn't finish

April 6, 2014, 3:38 am

Hi.

I have a text file that I am trying to convert to PDF, but the save command runs for hours and doesn't complete.

I am using the latest PDF DLL, 9.0.0, for .NET Framework 3.5.

Here is the example code:

Aspose.Pdf.Generator.Pdf pdf1 = new Aspose.Pdf.Generator.Pdf();
Aspose.Pdf.Generator.Section sec1 = pdf1.Sections.Add();

System.IO.TextReader objReader = new System.IO.StreamReader("c:\\file_10MB.txt", true);

// Add one Text paragraph per line until the end of the file is reached
do
{
    Aspose.Pdf.Generator.Text t2 = new Aspose.Pdf.Generator.Text(objReader.ReadLine());
    sec1.Paragraphs.Add(t2);
} while (objReader.Peek() != -1);

pdf1.Save("c:/text.pdf");

I'm also attaching the file.

thanks.

↧

Problem signing a PDF file

April 28, 2014, 5:18 am

We are migrating the obsolete Aspose.Pdf.Kit library to Aspose.Pdf version 9.0.0 and experience difficulties with signing a document.

The example code in the JavaDoc gives the same compilation error as we experience, so this can be used as a reproduction for the problem:

 1  String inFile = TestPath + "example1.pdf";
 2  String outFile = TestPath + "signature.pdf";
 3  PKCS1 sig = new PKCS1("certificate.pfx", "password");
 4  sig.setReason ("Some reason");
 5  sig.setContact ("Smith");
 6  sig.setLocation ("New York");
 7  PdfFileSignature pdfSign = new PdfFileSignature(inFile, outFile);
 8  Rectangle rect = new Rectangle(100, 100, 200, 100);
 9  pdfSign.setSignatureAppearance ( TestPath + "butterfly.jpg");
10  pdfSign.sign(2, true, rect, sig);
11  pdfSign.save();

Line 3 does not compile and gives the following error:

- com.aspose.pdf.engine.security cannot be resolved to a type
- The type com.aspose.pdf.b.d cannot be resolved. It is indirectly referenced from required .class files

↧

How to according to the number of PDF pages to convert.

April 28, 2014, 3:28 am

How can I convert only certain pages of a PDF rather than the entire file?

For example:

A PDF file includes 5 pages.

How do I convert page 1 and page 5 to Word?

Thanks!

↧

how to add watermark with rotate 45 to pdf (aspose java)

April 28, 2014, 12:29 am

Hi friends,

I have tried to add a watermark to a PDF, but there are only 3 rotation options: on90, on180 and on270. I want to add it at 45 degrees, from the bottom-left corner to the top-right corner. Is this possible, and if so, how can I do it?

Thanks for all suggestions,

Best regards.
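
For reference, the geometry behind a diagonal watermark can be sketched in plain Java. This is only the math, under the assumption that the watermark is ultimately placed with a PDF transformation matrix of the form [a b c d e f]; whether the Aspose stamp classes accept an arbitrary angle is not confirmed in this thread, and the class and method names below are hypothetical.

```java
// Hypothetical helper, not part of any Aspose library.
final class DiagonalWatermark {
    /** Angle, in degrees, of the diagonal running from bottom-left to top-right. */
    static double diagonalAngleDegrees(double pageWidth, double pageHeight) {
        return Math.toDegrees(Math.atan2(pageHeight, pageWidth));
    }

    /**
     * PDF-style matrix [a b c d e f] that rotates counter-clockwise by the
     * given angle about the origin and then translates by (tx, ty).
     */
    static double[] rotationMatrix(double angleDegrees, double tx, double ty) {
        double radians = Math.toRadians(angleDegrees);
        double cos = Math.cos(radians);
        double sin = Math.sin(radians);
        return new double[] { cos, sin, -sin, cos, tx, ty };
    }
}
```

Note that the corner-to-corner diagonal is exactly 45 degrees only on a square page; on a portrait A4 page (595 x 842 points) it is about 55 degrees, so "45 degrees" and "corner to corner" are two slightly different requests.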

↧

Using custom font from custom folder in generator.pdf

April 22, 2014, 9:11 am

Hi,

I have a problem with specifying a non-default font folder when creating a PDF using the generator.

The scenario is as follows:

I have a custom folder that contains TrueType fonts.

I want to create a PDF from a plain text file with one of these fonts.

The TextInfo class allows me to set FontName; this is fine, I know that.

But a font that is not in the default (system) folder is not found and used.

I can also specify TruetypeFontFileName, which partially solves this issue.

The problem is that I don't know the font file name.

The FontRepository does not provide useful functionality for this, because the font it finds does not contain a file name or path.

Sample:

Pdf pdf = new Pdf();
var section = pdf.Sections.Add();

var text = new Text(File.ReadAllText(inputFile, Encoding.UTF8));
text.TextInfo.FontName = plainTextFont;
text.TextInfo.TruetypeFontFileName = Path.Combine(fontFolder, fontFile);
section.Paragraphs.Add(text);

pdf.SetUnicode();
pdf.Save(outputFile);

I need to find a convenient way to do this without knowing fontFile.

Thanks for your answer.

Ondrej Boruvka, Y Soft
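
One generic workaround is to build the name-to-file lookup yourself by scanning the folder once. The sketch below shows the idea in plain Java using the JDK's own font loader (java.awt.Font.createFont); it is not Aspose code, the class and method names are hypothetical, and the same approach would have to be ported to .NET for the sample above.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any Aspose library.
final class FontFolderIndex {
    /** Maps each font family name found in the folder to a .ttf file that provides it. */
    static Map<String, File> index(File folder) {
        Map<String, File> byFamily = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        File[] files = folder.listFiles(
                (dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".ttf"));
        if (files == null) {
            return byFamily; // not a directory, or not readable
        }
        for (File file : files) {
            try {
                Font font = Font.createFont(Font.TRUETYPE_FONT, file);
                // Keep the first file seen for a family (regular, bold, etc. share one name).
                byFamily.putIfAbsent(font.getFamily(), file);
            } catch (FontFormatException | IOException e) {
                // Skip files that are not valid TrueType fonts.
            }
        }
        return byFamily;
    }
}
```

Looking up the desired family name in the returned map then gives the file whose path can be passed on as the font file name.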

↧

Strange text extracted using TextAbsorber

April 10, 2014, 6:59 am

I am trying to extract the text from a PDF file using the TextAbsorber class; here is the code I am using:

Stream stream = File.OpenRead("test.pdf");

using (Document pdfDocument = new Document(stream))
{
    // Extract the text of all pages in raw formatting mode
    TextAbsorber textAbsorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
    pdfDocument.Pages.Accept(textAbsorber);

    File.WriteAllText("pdfContent.txt", textAbsorber.Text);
}

I would expect the pdfContent.txt file to contain "Pdf test" multiple times; see the attached pdfContent.txt file for the actual content. (Note: I am not using an Aspose license here, to keep things simple.)

Can you tell me what's wrong with the PDF file?

↧

PDF convert to HTML - some words overlap

August 2, 2013, 2:30 pm

Hello,

We are evaluating Aspose.pdf/kit (we have licensed Aspose.Words for Java for a long time now). The primary thing we're interested in is converting PDF docs to HTML format (text only).

I downloaded aspose-pdf-kit-4.6.1-java and gave it a try. The resulting HTML doc contains the text very nicely formatted, but there are numerous places where the end of one phrase overlaps the beginning of the next phrase. The original PDF doc and the resulting HTML doc are attached. You can see the occurrences starting at the top of the HTML doc with the phone number overlapping the following word 'Email', and then the first bullet under Summary has 'Java' and 'J2EE' overlapping, etc. Looking at the HTML source it seems that the absolute left positioning of the phrases may not be correct in some cases.

Will this problem be fixed?

When will it be fixed?

My test code is below.

Thanks in advance-

Becky McElroy

______________________

public static void main(String[] args)
{
    if (args.length < 1)
    {
        System.out.println("Enter arg: path to pdf file");
        System.exit(1);
    }

    try
    {
        // create PdfExtractor object
        PdfExtractor extractor = new PdfExtractor();

        // bind input PDF file
        File f = new File(args[0]);
        extractor.bindPdf(new FileInputStream(f));

        // extract text
        extractor.extractText();

        // save extracted text as HTML (same path, "pdf" extension replaced by "html")
        extractor.extractTextAsHTML(args[0].substring(0, args[0].length() - 3) + "html");

        // close PdfExtractor object
        extractor.close();
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}

↧

Convert PDF to TIFF

April 21, 2014, 6:24 pm

We are trying to convert PDF to TIFF. If the PDF is just text and can be converted using CCITT4 compression the resulting TIFF is fine and easily read. However, if the PDF includes color images and text then the text in the resulting TIFF file cannot be easily read and the size of the file is large. I am using the following which is from another forum post.

//create PdfConverter object and bind input PDF file
Aspose.Pdf.Facades.PdfConverter pdfConverter = new Aspose.Pdf.Facades.PdfConverter();

// create Resolution object with 300 as an argument
Aspose.Pdf.Devices.Resolution resolution = new Aspose.Pdf.Devices.Resolution(SetSaveOptionResolution());

// specify the resolution value for PdfConverter object - default is 150
pdfConverter.Resolution = resolution;

// bind the source PDF file
pdfConverter.BindPdf(cnvFileData.CnvFileName);

// start the conversion process
pdfConverter.DoConvert();

//create TiffSettings object, set Compression and ColorDepth
Aspose.Pdf.Devices.TiffSettings tiffSettings = new Aspose.Pdf.Devices.TiffSettings();

if (BlackAndWhite ||
    CheckOverrideCompression(fileExt) ||
    Compression.Equals("Group4FaxEncoding", StringComparison.CurrentCultureIgnoreCase))
    tiffSettings.Compression = Aspose.Pdf.Devices.CompressionType.CCITT4;
else
    tiffSettings.Compression = Aspose.Pdf.Devices.CompressionType.LZW;

retFileName = System.IO.Path.ChangeExtension(cnvFileData.CnvFileName, format.ToLower());

pdfConverter.SaveAsTIFF(retFileName, tiffSettings);

pdfConverter.Close();

This is being done in a console app or service. Are there settings that will give better results for color documents, in terms of both file size and legibility of text? Can we query attributes of the PDF in order to make a better decision on the compression and resolution? For instance, if the PDF is all text but saved as color, can we determine that it is all text and can be saved as CCITT4?

File to be converted is attached.
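
One generic way to make that decision is to look at the pixels of a rendered page and see whether nearly all of them are close to black or white. The sketch below is plain Java and assumes the page has already been rendered to a BufferedImage by some means not shown here; the class name, the method name and the cut-off values 40 and 215 are arbitrary choices for illustration, not anything taken from Aspose.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of any Aspose library.
final class BilevelCheck {
    /**
     * Returns true when at least the given fraction (0..1) of pixels is
     * near-black or near-white, i.e. the page would survive 1-bit encoding.
     */
    static boolean isEffectivelyBlackAndWhite(BufferedImage image, double fraction) {
        long total = (long) image.getWidth() * image.getHeight();
        long bilevel = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int max = Math.max(r, Math.max(g, b));
                int min = Math.min(r, Math.min(g, b));
                boolean nearBlack = max <= 40;   // all channels dark
                boolean nearWhite = min >= 215;  // all channels bright
                if (nearBlack || nearWhite) {
                    bilevel++;
                }
            }
        }
        return bilevel >= fraction * total;
    }
}
```

Anti-aliased text leaves some gray pixels along glyph edges, so a fraction somewhat below 1.0 is more realistic than requiring every pixel to qualify. Pages that pass could go to CCITT4 and the rest to LZW, in line with the branch in the code above.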

↧

Insert image into PDF at specified coordinates - placement, scaling?

April 28, 2014, 3:04 pm

Hello there,

I need to insert an image (PNG or JPEG) at specific coordinates of a page in the PDF file.

The web application gets the file from the document library and renders it into individual PNG files using this method http://www.aspose.com/docs/display/pdfnet/Convert+PDF+pages+to+JPEG+images, like this:

// Open PDF file
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(inputFilePath);

// Measure it
int pageCount = pdfDocument.Pages.Count;

try
{
    for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
    {
        string pageName = String.Format("Page{0}.png", pageNumber);
        string pagePath = Path.Combine(dataDirectory, pageName);
        using (FileStream imageStream = new FileStream(pagePath, FileMode.Create))
        {
            //create PNG device with specified attributes
            //Width, Height, Resolution, Quality
            //Quality [0-100], 100 is Maximum
            //create Resolution object
            Resolution resolution = new Resolution(100);

            PngDevice pngDevice = new PngDevice(new PageSize(PageSize.A4.Width, PageSize.A4.Height), resolution);

            //convert a particular page and save the image to stream
            pngDevice.Process(pdfDocument.Pages[pageNumber], imageStream);

            //close stream
            imageStream.Close();
        }
    }
}
catch (IndexOutOfRangeException ex)
{
    if (ex.Source == "Aspose.Pdf")
    {
        // At most 4 elements (for any collection) can be viewed in evaluation mode.
        // Eval mode in Aspose throws it

        Logger.Instance.Log(Logger.SERVICE_TRACE_SOURCE, System.Diagnostics.TraceEventType.Error, String.Format(
            "DocumentManager.convertPDFFileToPageImages: Aspose.Pdf in evaluation mode"));
    }
    else
    {
        throw;
    }
}

The user reviews the pages in the web browser and clicks where they want the image to be inserted.

Then I use the method described here http://www.aspose.com/docs/display/pdfnet/Add+Image+in+existing+PDF+file to add an image at the specified location, like this:

// For each image selected
foreach (ImageLocation imageLocation in imageLocations)
{
    // Add images to the right pages
    Page page = pdfDocument.Pages[imageLocation.PageNumber];
    using (FileStream imageStream = new FileStream(imageLocation.FilePath, FileMode.Open))
    {
        page.Resources.Images.Add(imageStream);
    }

    XImage xImage = page.Resources.Images[page.Resources.Images.Count];

    Logger.Instance.Log(Logger.SERVICE_TRACE_SOURCE, System.Diagnostics.TraceEventType.Verbose, String.Format(
        "DocumentManager.InsertImagesIntoFileInStorage: Added '{0}' file to '{1}' page as XImage.Name='{2}', width='{3}', height='{4}''",
        imageLocation.FilePath,
        imageLocation.PageNumber,
        xImage.Name,
        xImage.Width,
        xImage.Height));

    // Insert images into right location

    //using GSave operator: this operator saves current graphics state
    page.Contents.Add(new Operator.GSave());
    //create Rectangle and Matrix objects
    Aspose.Pdf.Rectangle rectangle = new Aspose.Pdf.Rectangle(imageLocation.XPosition, imageLocation.YPosition, imageLocation.XPosition + xImage.Width, imageLocation.YPosition + xImage.Height);
    Aspose.Pdf.DOM.Matrix matrix = new Aspose.Pdf.DOM.Matrix(new double[] { rectangle.URX - rectangle.LLX, 0, 0, rectangle.URY - rectangle.LLY, rectangle.LLX, rectangle.LLY });
    //using ConcatenateMatrix (concatenate matrix) operator: defines how image must be placed
    page.Contents.Add(new Operator.ConcatenateMatrix(matrix));
    //using Do operator: this operator draws image
    page.Contents.Add(new Operator.Do(xImage.Name));
    //using GRestore operator: this operator restores graphics state
    page.Contents.Add(new Operator.GRestore());
}

I am having difficulty placing the image at the exact location I specified and having it be the same size I specified.

Attached is the document. It already contained a 100x100 PNG with yellow background and "2" upside down in the bottom-right corner, at the bottom left of the first page.

The user inserted a 100x100 PNG with orange background and "1" the proper way up in the upper left corner at x=74 and y=176 coordinates as measured from the bottom left of the page, just above the original image and aligned to its left side.

Added 'D:\Projects\\Documents\21\2_Stamp1.png' file to '1' page as XImage.Name='Im1', width='100', height='100'
Aspose.Pdf.Rectangle LLX='74', LLY='176', URX='174', URY='276', Width='100', Height='100'
Aspose.Pdf.DOM.Matrix A='100', B='0', C='0', D='100', E='74', F='176'

The original image measures 1.04" in size (as it was when I inserted it).

The second image is displayed visually offset from where I want it to be. It also measures 1.78" in size, and 1.04/1.78 happens to be pretty close to 0.75%, which makes me think some sort of scaling is going on.

Question: How do I insert the image at the right location and scale in the page?

I am using Aspose.PDF 9.1.0.0.

Thank you
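
One thing worth checking is the unit conversion between the rendered page image and the page itself. The values in a placement matrix are in PDF user-space units, which are 1/72 inch by default, so a 100 x 100 box is about 1.39 inches, while a click in the browser is measured in image pixels from the top-left corner. The sketch below is plain Java with hypothetical names, assuming the pixel size of the rendered image and the page size in points are both known; it does not call the Aspose API.

```java
// Hypothetical helper, not part of any Aspose library.
final class PixelToPoint {
    /**
     * Converts a click at (pxX, pxY), measured in pixels from the top-left of
     * the rendered page image, to PDF coordinates in points measured from the
     * bottom-left of the page.
     */
    static double[] clickToPdf(double pxX, double pxY,
                               int imageWidthPx, int imageHeightPx,
                               double pageWidthPt, double pageHeightPt) {
        double pointsPerPixelX = pageWidthPt / imageWidthPx;
        double pointsPerPixelY = pageHeightPt / imageHeightPx;
        double x = pxX * pointsPerPixelX;
        // The PDF y axis points up, the image y axis points down.
        double y = pageHeightPt - pxY * pointsPerPixelY;
        return new double[] { x, y };
    }

    /** Matrix [a b c d e f] that draws an image into a box of the given size at (x, y). */
    static double[] placementMatrix(double x, double y, double widthPt, double heightPt) {
        return new double[] { widthPt, 0, 0, heightPt, x, y };
    }
}
```

The same scale factors apply to the size of the inserted image: its pixel width and height would be multiplied by the points-per-pixel values before they go into the matrix, and the click position would be lowered by the image height in points if the click marks the image's top-left corner.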

↧

Could not find any font error on linus

April 28, 2014, 5:07 pm

My web app runs well on Windows (dev machine), but it stops working in the CentOS environment. I got these error messages:

com.aspose.ms.System.e: Could not find any font. Use Document.addLocalFontPath(String path) to set correct path for your fonts location.

at com.aspose.pdf.b.c.g.c.p.a(Unknown Source)

at com.aspose.pdf.b.c.g.c.p.populateMaps(Unknown Source)

at com.aspose.pdf.b.c.g.c.p.dcS(Unknown Source)

at com.aspose.pdf.b.c.g.c.p.dcR(Unknown Source)

at com.aspose.pdf.ADocument.preSave(Unknown Source)

at com.aspose.pdf.ADocument.saveInternal(Unknown Source)

at com.aspose.pdf.Document.saveInternal(Unknown Source)

at com.aspose.pdf.ADocument.save(Unknown Source)

at com.aspose.pdf.Document.save(Unknown Source)

Can anyone help me?

Regards,

Green

↧

tif to pdf convert error: Unable to read values for Xresolution tag

April 29, 2014, 4:02 am

Hi friends,

I use Aspose.Pdf for Java 9.0. When I try to convert a TIF file to PDF, I get an error like this:

Exception in thread "main" java.lang.IllegalStateException: Unable to read values for Xresolution tag. Message : Cannot read 8 bytes from the stream.

at com.aspose.ms.c.g.at.c(Unknown Source)

at com.aspose.ms.c.g.ce.e(Unknown Source)

at com.aspose.ms.c.g.ce.d(Unknown Source)

at com.aspo...

...a(Unknown Source)

at aspose.pdf.Pdf.save(Unknown Source)

at com.aspose.Test.SplitTiff2Pdf.tif2Pdf(SplitTiff2Pdf.java:76)

at com.aspose.Test.SplitTiff2Pdf.main(SplitTiff2Pdf.java:87)

The error occurs on the pdf1.save() line. I use Win7 x64 and JDK 1.7; the editor is Spring Tool Suite. How can I resolve this? Could you help me, please? My Java class and TIF file are attached.

Thanks for all your support,

Best regards.

↧