Channel: Aspose.Pdf Product Family

Aspose.Pdf introduces spaces when the PDF document has Chinese characters

Hi, 

I am trying to extract text from a PDF document that contains some Chinese characters. When I use TextAbsorber, the extracted text has unnecessary spaces inserted between the characters.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import com.aspose.pdf.Document;
import com.aspose.pdf.TextAbsorber;

private String extractText(byte[] inputContent) throws IOException {
    // Load the PDF from the in-memory bytes
    ByteArrayInputStream stream = new ByteArrayInputStream(inputContent);
    Document pdfDocument = new Document(stream);

    // Create TextAbsorber object to extract text
    TextAbsorber textAbsorber = new TextAbsorber();

    // Accept the absorber for all the pages
    pdfDocument.getPages().accept(textAbsorber);

    // Get the extracted text
    String extractedText = textAbsorber.getText();
    return extractedText;
}
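
Until the extraction itself behaves correctly, one possible stopgap is to post-process the string returned by extractText. The sketch below is not part of Aspose.Pdf and uses only the Java standard library; the class and method names (CjkSpaceCleaner, removeSpacesBetweenHan) are hypothetical. It assumes the unwanted spaces occur only between two adjacent Han characters, so it would also remove any space that legitimately belongs there.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an Aspose.Pdf API: strips horizontal whitespace
// that sits directly between two Han (Chinese) characters.
class CjkSpaceCleaner {

    // Space, tab, no-break space, or ideographic space, with a Han character
    // immediately before and immediately after. Line breaks are left alone.
    private static final Pattern GAP_BETWEEN_HAN =
            Pattern.compile("(?<=\\p{IsHan})[ \\t\\u00A0\\u3000]+(?=\\p{IsHan})");

    static String removeSpacesBetweenHan(String text) {
        return GAP_BETWEEN_HAN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Four Han characters separated by spaces, followed by Latin text.
        // Unicode escapes are used so the source compiles under any file encoding.
        String extracted = "\u4E2D \u6587 \u6587 \u672C and English text";
        System.out.println(removeSpacesBetweenHan(extracted));
    }
}
```

Spaces next to Latin text are kept, because the pattern only matches when both neighbours are Han characters. It could be applied as removeSpacesBetweenHan(textAbsorber.getText()).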


I am also attaching the Java code and the PDF file.

