Channel: Aspose.Pdf Product Family

hOCR and searchable PDF

Hi,

I am trying to convert a PDF into a searchable PDF (Aspose.Pdf 11.9.0) using hOCR generated by Tesseract 3.0.4.
 - With HTML hOCR: nothing happens. The PDF is the same as it was before the conversion.
 - With XHTML hOCR: the Convert method throws a FormatException.

You can reproduce the issue using the attached project.



Here is a sample of my code:

public void Save(Func<int, Stream> getStream)
{
    using (var s = getStream(0))
    {
        // Pass the hOCR callback to Convert, then save the document to the stream.
        this.asposeDoc.Convert(hocrTesseract);
        this.asposeDoc.Save(s);
    }
}

// Callback: returns the hOCR text that Tesseract produces for the given image.
private string hocrTesseract(System.Drawing.Image img)
{
    using (var ocr = new TesseractEngine(@"(...)", "fra", EngineMode.Default))
    using (var bitmap = new Bitmap(img))
    using (var page = ocr.Process(bitmap))
    {
        return page.GetHOCRText(0);
    }
}


