Channel: Aspose.Pdf Product Family

PDF to HTML - Retrieve raw HTML and Handle Images

I am attempting to save a PDF to HTML, retrieve the resulting HTML, and override the logic that saves the images, because I need to handle them in a very specific manner.

1. I need to get the raw HTML produced when the PDF is converted to HTML.

For this first part: in your other products, such as Aspose.Words, I can save to a MemoryStream and then do whatever I need with the result. It doesn't have to be a MemoryStream, but I need to be able to read the raw HTML without writing to disk. I keep getting errors when saving a PDF to a MemoryStream in HTML format. Is this feasible at all?
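One possible way to get the markup without a file, sketched under assumptions: newer releases of Aspose.Pdf for .NET expose a `CustomHtmlSavingStrategy` property on `Aspose.Pdf.HtmlSaveOptions` that hands the generated markup to a callback as a stream. The sketch assumes that property, the `HtmlSaveOptions.HtmlPageMarkupSavingStrategy` delegate and the `SplitIntoPages` option exist in the version in use; they may be missing in older builds. `PdfHtmlCapture`, `CreateOptions`, `ConvertToHtmlString` and the `placeholder.html` name are hypothetical. Images, fonts and CSS are not handled here and may still be written next to the placeholder path.

```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;

// Hypothetical helper: collects the HTML markup produced by Aspose.Pdf in memory.
// ASSUMPTION: HtmlSaveOptions.CustomHtmlSavingStrategy and the
// HtmlSaveOptions.HtmlPageMarkupSavingStrategy delegate exist in your Aspose.Pdf version.
public static class PdfHtmlCapture
{
    public static HtmlSaveOptions CreateOptions(StringBuilder sink)
    {
        var options = new HtmlSaveOptions();
        options.SplitIntoPages = false; // ask for one markup document rather than one per page
        options.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy(info =>
        {
            // The markup arrives as a stream; read it into the sink instead of a file.
            // leaveOpen: true so the library keeps ownership of its own stream.
            using (var reader = new StreamReader(info.ContentStream, Encoding.UTF8, true, 4096, true))
            {
                sink.Append(reader.ReadToEnd());
            }
        });
        return options;
    }

    public static string ConvertToHtmlString(string pdfPath)
    {
        var sink = new StringBuilder();
        using (var pdf = new Document(pdfPath))
        {
            // The file name is a placeholder; with the custom strategy set, the markup is
            // expected to go to the callback above (verify this for your version).
            pdf.Save("placeholder.html", CreateOptions(sink));
        }
        return sink.ToString();
    }
}
```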

If the above is not possible, how do I convert a PDF to a Word document instead? I already have code in place that produces the HTML this way from a Word document.
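A sketch of the Word route, assuming `Aspose.Pdf.Document.Save(Stream, Aspose.Pdf.SaveFormat)` accepts `SaveFormat.DocX` in the version in use: the PDF is converted to DOCX inside a MemoryStream, loaded into Aspose.Words, and saved to HTML with the same `ImageSavingCallback` hook as the existing Words code. `PdfViaWords` and `ConvertToHtmlString` are hypothetical names, and the HTML may differ from a direct PDF-to-HTML conversion because it passes through the DOCX step.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: PDF -> DOCX (Aspose.Pdf) -> HTML (Aspose.Words), all in memory.
// ASSUMPTION: Aspose.Pdf.SaveFormat.DocX is supported by Document.Save(Stream, SaveFormat).
// Types are fully qualified because both libraries define Document, SaveFormat and HtmlSaveOptions.
public static class PdfViaWords
{
    public static string ConvertToHtmlString(string pdfPath, Aspose.Words.Saving.IImageSavingCallback imageCallback)
    {
        using (var docx = new MemoryStream())
        {
            using (var pdf = new Aspose.Pdf.Document(pdfPath))
            {
                pdf.Save(docx, Aspose.Pdf.SaveFormat.DocX); // PDF -> DOCX, no disk involved
            }
            docx.Position = 0; // rewind before Aspose.Words reads the stream

            var wordDoc = new Aspose.Words.Document(docx);
            var options = new Aspose.Words.Saving.HtmlSaveOptions(Aspose.Words.SaveFormat.Html);
            options.ImageSavingCallback = imageCallback; // same hook as the existing Words code

            using (var html = new MemoryStream())
            {
                wordDoc.Save(html, options);
                // Aspose.Words writes UTF-8 HTML by default.
                return Encoding.UTF8.GetString(html.ToArray());
            }
        }
    }
}
```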

2. I need to control where any embedded images are saved. In Aspose.Words this is handled through HtmlSaveOptions with an ImageSavingCallback. The code below works in the Aspose.Words namespace; is there an equivalent for saving from Aspose.Pdf that I'm missing?

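//Aspose.Words method - working great!
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

MemoryStream writeStream = new MemoryStream();
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
options.ImageSavingCallback = new AsposeImageSavingCallback(); // my IImageSavingCallback implementation
doc.Save(writeStream, options); // doc is an Aspose.Words.Document loaded earlier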
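For comparison, this is roughly what an Aspose.Words `IImageSavingCallback` looks like. The original `AsposeImageSavingCallback` is not shown, so `FolderImageSavingCallback` and its folder parameter are hypothetical: the sketch keeps the file name Aspose.Words proposes in `ImageSavingArgs.ImageFileName` and redirects the image bytes by assigning `ImageSavingArgs.ImageStream`.

```csharp
using System.IO;
using Aspose.Words.Saving;

// Hypothetical stand-in for AsposeImageSavingCallback: writes every image into a chosen folder.
public class FolderImageSavingCallback : IImageSavingCallback
{
    private readonly string _folder;

    public FolderImageSavingCallback(string folder)
    {
        _folder = folder;
        Directory.CreateDirectory(folder); // make sure the target folder exists
    }

    public void ImageSaving(ImageSavingArgs args)
    {
        // Keep the proposed file name, but write the bytes to our own location.
        string target = Path.Combine(_folder, args.ImageFileName);
        args.ImageStream = new FileStream(target, FileMode.Create);
        args.KeepImageStreamOpen = false; // let Aspose.Words close the stream after writing
    }
}
```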

If nothing like the above is possible, how can I control how the images are saved and how the <img> tags are generated in the resulting file?
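A possible Aspose.Pdf counterpart, sketched under assumptions: some releases of `Aspose.Pdf.HtmlSaveOptions` have a `CustomResourceSavingStrategy` property that takes a `HtmlSaveOptions.ResourceSavingStrategy` delegate. The delegate receives a `SaveOptions.ResourceSavingInfo` (assumed to carry `ResourceType`, `SupposedFileName`, `ContentStream` and `CustomProcessingCancelled`) and returns the URL that ends up in the markup. Verify these names against the API reference for the version in use. `PdfImageHandling`, `imageFolder` and `imageUrlPrefix` are hypothetical. Depending on options such as `RasterImagesSavingMode`, images may be embedded in page backgrounds rather than emitted as separate `<img>` tags.

```csharp
using System.IO;
using Aspose.Pdf;

// Hypothetical helper: chooses where Aspose.Pdf writes each image and what URL the HTML references.
// ASSUMPTION: HtmlSaveOptions.CustomResourceSavingStrategy, HtmlSaveOptions.ResourceSavingStrategy
// and SaveOptions.ResourceSavingInfo / NodeLevelResourceType exist in your Aspose.Pdf version.
public static class PdfImageHandling
{
    public static HtmlSaveOptions CreateOptions(string imageFolder, string imageUrlPrefix)
    {
        Directory.CreateDirectory(imageFolder);
        var options = new HtmlSaveOptions();
        options.CustomResourceSavingStrategy = new HtmlSaveOptions.ResourceSavingStrategy(info =>
        {
            if (info.ResourceType != SaveOptions.NodeLevelResourceType.Image)
            {
                // Not an image (e.g. a font): hand it back to default processing (assumed flag meaning).
                info.CustomProcessingCancelled = true;
                return string.Empty;
            }

            // Write the image bytes to the folder we control.
            string fileName = Path.GetFileName(info.SupposedFileName);
            using (var file = File.Create(Path.Combine(imageFolder, fileName)))
            {
                info.ContentStream.CopyTo(file);
            }

            // The returned string is the reference the library writes into the markup.
            return imageUrlPrefix + fileName;
        });
        return options;
    }
}
```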


(I found these two links to be somewhat helpful, but not entirely what I need)
