Hi,
I am developing a project in Java that reads data from a PDF in Marathi (an Indian regional language) and formats it, i.e. only the required fields will be stored in the database, e.g.
voter name, address and age (we can use split() or another String method for this).
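Once the page text is extracted, pulling out the fields with split() could look roughly like this. It is a sketch under an assumption: that each voter appears on one line as `name,address,age`. The real layout of the PDF is not shown here, so the delimiter, the field order and the `VoterRecord` class are all hypothetical. If addresses can contain commas, a plain `split(",")` will cut them apart.

```java
import java.util.Arrays;

public class VoterRecordParser {

    // Hypothetical holder for the three fields we want to keep
    static final class VoterRecord {
        final String name;
        final String address;
        final int age;

        VoterRecord(String name, String address, int age) {
            this.name = name;
            this.address = address;
            this.age = age;
        }
    }

    // Parses one line in the ASSUMED format "name,address,age"
    static VoterRecord parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 comma-separated fields, got " + Arrays.toString(parts));
        }
        String name = parts[0].trim();
        String address = parts[1].trim();
        // Integer.parseInt accepts any Unicode decimal digits, so Devanagari "४५" parses as 45
        int age = Integer.parseInt(parts[2].trim());
        return new VoterRecord(name, address, age);
    }

    public static void main(String[] args) {
        VoterRecord r = parse("राम पाटील, पुणे, ४५");
        System.out.println(r.name + " | " + r.address + " | " + r.age);
    }
}
```

The source file itself must be saved and compiled as UTF-8 for the Devanagari literal in `main` to survive.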
When a user searches by name, all of that voter's information should be displayed. I tried reading the data from the PDF using UTF-8. It produces output, but not in the proper format:
some Marathi words appear with stray characters between them. I want to store clean Marathi data in MySQL and retrieve it as well.
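For the storage and search steps, here is a sketch of a JDBC round trip that keeps Devanagari intact, assuming MySQL with the Connector/J driver on the classpath. The points that matter are a `utf8mb4` table character set and the `useUnicode`/`characterEncoding` connection properties; the database name `election`, the table `voter`, its columns, the helper `insertAndFind` and the credentials are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MarathiMySqlDemo {

    // Connector/J URL; host and database name are placeholders
    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + ":3306/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Hypothetical table; utf8mb4 covers every Unicode character, Devanagari included
    static final String CREATE_TABLE =
            "CREATE TABLE IF NOT EXISTS voter ("
            + " id INT AUTO_INCREMENT PRIMARY KEY,"
            + " name VARCHAR(200) NOT NULL,"
            + " address VARCHAR(500),"
            + " age INT"
            + ") CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";

    // Inserts one voter, then looks voters up by name
    static void insertAndFind(Connection con, String name, String address, int age)
            throws SQLException {
        try (PreparedStatement ins = con.prepareStatement(
                "INSERT INTO voter (name, address, age) VALUES (?, ?, ?)")) {
            ins.setString(1, name);
            ins.setString(2, address);
            ins.setInt(3, age);
            ins.executeUpdate();
        }
        try (PreparedStatement q = con.prepareStatement(
                "SELECT name, address, age FROM voter WHERE name = ?")) {
            q.setString(1, name);
            try (ResultSet rs = q.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + " | "
                            + rs.getString("address") + " | " + rs.getInt("age"));
                }
            }
        }
    }

    public static void main(String[] args) {
        // Requires a running MySQL server; "user"/"password" are placeholders
        try (Connection con = DriverManager.getConnection(
                     jdbcUrl("localhost", "election"), "user", "password");
             Statement st = con.createStatement()) {
            st.execute(CREATE_TABLE);
            insertAndFind(con, "\u0930\u093E\u092E", "\u092A\u0941\u0923\u0947", 45);
        } catch (SQLException e) {
            System.err.println("Database error: " + e.getMessage());
        }
    }
}
```

Because `setString`/`getString` exchange Java strings with the driver, no manual `getBytes()` conversion is needed anywhere on this path.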
As a first step, I tried the following code to display the Marathi data in the console. After that I will store it in MySQL and then display it. But the output shows only some Marathi words mixed with symbols.
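Before looking at the database or the console, it may help to check which characters iText actually returned. This sketch prints each code point with its Unicode block. As a rule of thumb (not a guarantee): if most of the page falls in the `DEVANAGARI` block, extraction worked and only the display is wrong; if it is mostly Latin or other symbols, the PDF probably uses a font without a Unicode mapping, which no output-encoding setting can repair. `CodePointDump` is a hypothetical helper.

```java
public class CodePointDump {

    // Counts code points that belong to the Devanagari block (U+0900..U+097F)
    static int countDevanagari(String text) {
        return (int) text.codePoints()
                .filter(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.DEVANAGARI)
                .count();
    }

    // Lists every code point as "U+XXXX BLOCK", one per line
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp ->
                sb.append(String.format("U+%04X %s%n", cp, Character.UnicodeBlock.of(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        // In the real program, pass the string returned by PdfTextExtractor.getTextFromPage
        String sample = "\u092E\u0930\u093E\u0920\u0940"; // "मराठी"
        System.out.print(describe(sample));
        System.out.println("Devanagari code points: " + countDevanagari(sample)
                + " of " + sample.codePointCount(0, sample.length()));
    }
}
```

Printing only ASCII (`U+092E ...`) sidesteps the console encoding, so the result is readable even when Devanagari itself is not.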
Also, the project requires a Marathi keyboard, i.e. the user will enter data in Marathi and get Marathi output.
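For the Marathi keyboard input, a minimal sketch: decode the input stream explicitly as UTF-8 (assuming the terminal or IDE console actually sends UTF-8 bytes) and normalize both the typed and the stored name to Unicode NFC with `java.text.Normalizer` before comparing, since the same letter can be encoded as different code point sequences. The class name and the stored sample value are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class MarathiInput {

    // Wraps a byte stream in a reader that decodes explicitly as UTF-8
    static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // NFC: canonically equivalent spellings become the same code point sequence
    static String normalize(String s) {
        return Normalizer.normalize(s.trim(), Normalizer.Form.NFC);
    }

    public static void main(String[] args) throws IOException {
        // Create the reader once; re-wrapping System.in on each read can lose buffered input
        BufferedReader reader = utf8Reader(System.in);
        System.out.println("Enter a voter name:");
        String typed = reader.readLine();
        if (typed == null) {
            return; // end of input
        }
        // Hypothetical stored value; in the real project it would come from MySQL
        String stored = "\u0958\u093E\u0926\u0930"; // starts with precomposed QA (U+0958)
        System.out.println("Match: " + normalize(typed).equals(normalize(stored)));
    }
}
```

For example, U+0958 and the pair U+0915 U+093C (KA plus NUKTA) differ with plain `equals` but are equal after `normalize`.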
Note: I also changed the default encoding in Eclipse (Ctrl+Enter) to UTF-8.
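To confirm which encoding the running JVM actually uses, regardless of what the IDE settings say, one can print it. This matters for the code below: `new String(byte[])` decodes with this default charset, so whenever it is not UTF-8, `new String(page.getBytes("UTF-8"))` garbles the text. `EncodingCheck` is a hypothetical helper.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // Reports the encoding the running JVM picked up
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", default charset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```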
Following is the code I tried as a first step:
---------------------------------------------------------------------------------------------------------------------
import java.io.IOException;
import java.io.PrintStream;
//iText imports
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
public class iTextReadDemo {
    public static void main(String[] args) {
        try {
            // Print through a UTF-8 stream instead of the platform default encoding
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            PdfReader reader = new PdfReader("D://Vikram//Workspace//Projects//Election//List.pdf");
            int pageCount = reader.getNumberOfPages();
            out.println("This PDF has " + pageCount + " pages.");
            for (int pageNo = 1; pageNo <= pageCount; pageNo++) {
                // Extract the text of the current page
                String page = PdfTextExtractor.getTextFromPage(reader, pageNo);
                // page is already a Java String; a getBytes()/new String() round trip
                // is at best a no-op and otherwise corrupts it
                out.println("Page Content:\n\n" + page + "\n\n");
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
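If the extracted text checks out but the console still cannot render it, one common workaround, assuming the console is the weak link, is to write the text to a UTF-8 file and open it in an editor with a Devanagari-capable font. A sketch using `java.nio.file.Files`; the class, its helpers and the output file name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class Utf8PageDump {

    // Writes each page's text, followed by a line separator, to a UTF-8 file
    static void writePages(Path target, List<String> pages) {
        try {
            Files.write(target, pages, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back, decoding it as UTF-8
    static List<String> readLines(Path source) {
        try {
            return Files.readAllLines(source, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In the real program each entry would be the text of one PDF page
        List<String> pages = List.of("\u092E\u0930\u093E\u0920\u0940");
        Path out = Paths.get("pages.txt"); // hypothetical output file
        writePages(out, pages);
        System.out.println("Wrote " + pages.size() + " page(s) to " + out.toAbsolutePath());
    }
}
```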