Hi.
I am using Aspose.Pdf for Java (version 9.7.1) to convert a PDF to HTML.
My first problem is that some of the text is converted to HTML with each character placed in its own <div>. I think it happens where there is a style change, such as a switch to an italic font. This is a critical issue for me because it affects the search results.
The second issue is that the rest of the text is converted with each line in its own <div>, but every 2-3 words end up in a separate <span>.
(Note that I got both issues with the Java version of Aspose, but with the .NET version the first issue did not occur, only the second one.)
I have attached a test PDF (copied from Wikipedia).
My Java code for converting the attached PDF file to HTML is:
--------------------------------------------------------------------------------
// createNewHtmlFile() is my own helper; inStream holds the source PDF
File mainHtmlFile = createNewHtmlFile();
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(inStream);
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
// Embed all parts into the single output HTML file
options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
// Position letters using em units, compensating rounding errors in CSS
options.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
// Save raster images as embedded parts of the PNG page background
options.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
pdfDocument.save(mainHtmlFile.getAbsolutePath(), options);
---------------------------------------------------------------------------------
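To make the first issue easier to reproduce and measure, here is a rough sketch of how the generated HTML could be checked for one-character <div> elements. It is not part of the Aspose API: the class name SingleCharDivCounter, the method countSingleCharDivs, and the regular expression are my own assumptions about what the fragmented markup looks like (a <div> whose whole content is one character or one HTML entity, optionally wrapped in a single <span>). The real output may be nested differently, so the pattern may need adjusting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose: counts <div> elements that hold a single character.
public class SingleCharDivCounter {

    // Assumed shape of a fragmented element: <div ...>, an optional <span ...>,
    // exactly one character or one HTML entity, then the matching closing tags.
    private static final Pattern SINGLE_CHAR_DIV = Pattern.compile(
            "<div\\b[^>]*>\\s*(?:<span\\b[^>]*>)?\\s*(?:&[#\\w]+;|[^<\\s])\\s*(?:</span>)?\\s*</div>",
            Pattern.CASE_INSENSITIVE);

    // Returns how many single-character <div> elements appear in the given HTML text.
    public static int countSingleCharDivs(String html) {
        Matcher matcher = SINGLE_CHAR_DIV.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up samples that imitate the two kinds of output described above.
        String fragmented = "<div>H</div><div>i</div><div class=\"x\"><span>!</span></div>";
        String perLine = "<div><span>Hello</span> <span>world</span></div>";

        System.out.println(countSingleCharDivs(fragmented)); // prints 3
        System.out.println(countSingleCharDivs(perLine));    // prints 0
    }
}
```

Running countSingleCharDivs on the contents of the saved HTML file should give a quick number for comparing the Java and .NET output on the same PDF.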
Can you help me fix these two issues, or at least the first one?
Thanks
Tami