Channel: Aspose.Pdf Product Family

How to extract every word of text and their position coordinates

Hello,

I'm testing Aspose.PDF for .NET (with VB.NET).
I'm trying to extract every word from a PDF along with its position coordinates.
I have the following code and two questions:
#########################################################
Dim pdfDocument As Aspose.Pdf.Document
' Apply the license before loading the document.
Dim license As Aspose.Pdf.License = New Aspose.Pdf.License()
license.SetLicense("Aspose.Pdf.lic")
license.Embedded = True
pdfDocument = New Aspose.Pdf.Document("c:\test.pdf")
' Page indexes start at 1.
For pageNo As Integer = 1 To pdfDocument.Pages.Count
    ' With no search phrase, the absorber collects all text on the page.
    Dim textFragmentAbsorber As New Aspose.Pdf.Text.TextFragmentAbsorber()
    pdfDocument.Pages(pageNo).Accept(textFragmentAbsorber)
    Dim textFragmentCollection As Aspose.Pdf.Text.TextFragmentCollection = textFragmentAbsorber.TextFragments
    For Each textFragment As Aspose.Pdf.Text.TextFragment In textFragmentCollection
        For Each textSegment As Aspose.Pdf.Text.TextSegment In textFragment.Segments
            ' Show the text of each segment and its position.
            MsgBox("Word=" & textSegment.Text & vbCrLf & "Position=" & textSegment.Position.XIndent & "," & textSegment.Position.YIndent)
        Next textSegment
    Next textFragment
Next pageNo
pdfDocument.Dispose()
#########################################################
But it doesn't return each word separately. For instance, it returns "Hello World" as a single segment.
1) I would like it to return "Hello" first and then "World".
2) I need the coordinates of every word, and its width and height too.

Thanks.

Toni Jiménez
