Channel: Aspose.Pdf Product Family

Issue extracting title text from a PDF

Hello,
I am trying to use Aspose.Pdf for .NET to extract the title of each page, but I have run into some problems. How can I extract the title of each page of a PDF, or failing that, the first paragraph? Does Aspose have an interface or property for extracting title text? Thank you very much!

I tried extracting the text from a particular region of each page, but it doesn't work well.

 // Requires: using System.IO;
 // filePath, imgPath, fileName and MergeSpace are defined elsewhere.
 Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(filePath);
 int pageCount = pdf.Pages.Count;
 string txtPath = imgPath + fileName + ".txt";
 if (File.Exists(txtPath))
 {
     File.Delete(txtPath);
 }

 string extractedText = "";
 // Aspose.Pdf page numbers are 1-based.
 for (int index = 1; index <= pageCount; index++)
 {
     Aspose.Pdf.Page page = pdf.Pages[index];
     Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();
     textAbsorber.TextSearchOptions.LimitToPageBounds = true;

     // Search the current page's own rectangle (not always page 1's),
     // skipping 55 units at the top and bottom.
     textAbsorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(
         page.Rect.LLX,
         page.Rect.LLY + 55,
         page.Rect.URX,
         page.Rect.URY - 55);

     page.Accept(textAbsorber);
     string pageText = textAbsorber.Text.Trim();

     // Keep only the first line as the candidate title.
     int lineEnd = pageText.IndexOfAny(new[] { '\r', '\n' });
     if (lineEnd != -1)
     {
         pageText = pageText.Substring(0, lineEnd);
     }
     pageText = MergeSpace(pageText);
     extractedText = extractedText + index.ToString() + "." + pageText + "\r\n";
 }

 // The using block closes the writer even if the write throws.
 using (TextWriter tw = new StreamWriter(txtPath))
 {
     tw.WriteLine(extractedText);
 }
}
