I’m working on an Android app that needs to extract text from images. I want to avoid using the Tesseract library and instead use Google Drive’s OCR capabilities.
I followed the Google Drive quickstart guide and successfully uploaded an image to Google Docs. The upload works fine and I get a success message with the filename.
For retrieving the extracted text, I implemented this method:
    public void extractTextFromFile(File document) {
        String textExportUrl = document.getExportLinks().get("text/plain");
        HttpClient httpClient = new DefaultHttpClient();
        HttpGet getRequest = new HttpGet(textExportUrl);
        HttpResponse httpResponse;
        StringBuilder textBuilder = new StringBuilder();
        BufferedReader reader = null;
        try {
            httpResponse = httpClient.execute(getRequest);
            reader = new BufferedReader(new InputStreamReader(httpResponse.getEntity().getContent()));
            String line;
            while ((line = reader.readLine()) != null) {
                textBuilder.append(line);
            }
            Log.d("Extracted Text", textBuilder.toString());
            reader.close();
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
However, instead of the text extracted from my uploaded image, the response contains HTML that looks like the Google Docs welcome page.
I also tried downloading with DriveRequest, but got a ClassCastException when casting the HttpRequest to DriveRequest.
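One possible explanation, not verified here, is that the GET request to the export link carries no OAuth credentials, so the server answers with a sign-in or landing page instead of the exported text. The sketch below shows what an authenticated fetch might look like using only `HttpURLConnection`. The class name `ExportTextFetcher`, its two methods, and the `accessToken` parameter are hypothetical; the token is assumed to come from whatever credential object authorized the upload.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Drive client library.
final class ExportTextFetcher {

    // Reads the whole stream as UTF-8 text and keeps line breaks,
    // closing the reader even if reading fails.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Requests the export URL with an OAuth 2.0 bearer token attached.
    // Assumption: accessToken is a valid token with access to the file.
    static String fetchExportedText(String exportUrl, String accessToken) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(exportUrl).openConnection();
        conn.setRequestProperty("Authorization", "Bearer " + accessToken);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Export request failed with HTTP " + status);
            }
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }
}
```

On Android this would have to run off the main thread, and a call such as `ExportTextFetcher.fetchExportedText(textExportUrl, accessToken)` would replace the body of the `try` block in the method above.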
How can I properly retrieve the OCR text results from the image I uploaded to Google Drive? Any help would be appreciated.