Hello,
I'm testing Aspose.PDF for .NET (VB.NET).
I'm trying to extract every word along with its position coordinates.
I have the following code and two questions:
#########################################################
' Apply the license before opening any document
Dim license As Aspose.Pdf.License = New Aspose.Pdf.License()
license.SetLicense("Aspose.Pdf.lic")
license.Embedded = True

Dim pdfDocument As Aspose.Pdf.Document = New Aspose.Pdf.Document("c:\test.pdf")

' Pages are 1-based in Aspose.PDF
For pageNo As Integer = 1 To pdfDocument.Pages.Count
    ' Collect all text fragments on the current page
    Dim textFragmentAbsorber As New Aspose.Pdf.Text.TextFragmentAbsorber()
    pdfDocument.Pages(pageNo).Accept(textFragmentAbsorber)
    Dim textFragmentCollection As Aspose.Pdf.Text.TextFragmentCollection = textFragmentAbsorber.TextFragments

    For Each textFragment As Aspose.Pdf.Text.TextFragment In textFragmentCollection
        For Each textSegment As Aspose.Pdf.Text.TextSegment In textFragment.Segments
            ' Show the text of each segment and its position
            MsgBox("Word=" & textSegment.Text & vbCrLf & "Position=" & textSegment.Position.XIndent & "," & textSegment.Position.YIndent)
        Next textSegment
    Next textFragment
Next pageNo

pdfDocument.Dispose()
#########################################################
But it doesn't return individual words. For instance, it returns "Hello World" as a single item.
1) I would like it to return "Hello" first and then "World".
2) I need the coordinates of every word, along with its width and height.
Thanks.
Toni Jiménez