Saturday, January 18, 2025

How To Build Your Own OCR API in Python


Extracting text from images has long been a common problem in software engineering. Optical Character Recognition (OCR) is the technology most widely used to solve it. By transforming images that contain text into machine-readable data, OCR has changed how many industries work, from document processing automation to language translation.

While commercial OCR solutions exist, building your own OCR API in Python, a versatile and powerful programming language, offers several advantages, including customization, control over data privacy, and the potential for cost savings.

This guide will walk you through creating your own OCR API using Python. It explores the necessary libraries, techniques, and considerations for developing an effective OCR API, empowering you to harness the power of OCR for your applications.

Prerequisites

To follow along, you need a basic understanding of Python and Flask, and a local copy of Python installed on your system.

Creating the OCR API

In this guide, you build a Flask application that lets users upload images through a POST endpoint. The application loads each uploaded image using Pillow and processes it using the PyTesseract wrapper (for the Tesseract OCR engine). Finally, it returns the extracted text as the response to the request.

You can further customize this API to provide options such as template-based classification (extracting line items from invoices, inputs in tax forms, etc.) or OCR engine choices (you can find more OCR engines here).
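As a rough illustration of the template-based idea, the sketch below pulls line items out of OCR text with a regular expression. The invoice layout (description, quantity, and unit price on one line), the function name extract_line_items, and the sample text are all assumptions made up for this example. They are not part of the API built in this guide, and a real invoice would need a pattern tuned to its own layout.

```python
import re

# Hypothetical layout: "<description> <quantity> <unit price>", e.g. "Blue widget 2 9.99"
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)\s+(?P<quantity>\d+)\s+(?P<unit_price>\d+\.\d{2})$"
)


def extract_line_items(text):
    """Return a list of dicts, one for every line that matches the assumed layout."""
    items = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match is None:
            continue  # Skip headers, totals, and OCR noise
        items.append({
            "description": match.group("description"),
            "quantity": int(match.group("quantity")),
            "unit_price": float(match.group("unit_price")),
        })
    return items


# Made-up text standing in for what the OCR engine might return
sample = "ACME Invoice\nBlue widget 2 9.99\nLarge gear 10 120.50\nTotal 260.98\n"
items = extract_line_items(sample)
for item in items:
    print(item)
```

A helper like this could run on the extracted text before the API builds its response, so the endpoint returns structured fields alongside the raw text.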

To start off, create a new directory for your project. Then, set up a new virtual environment in the folder by running the following commands:

python3 -m venv env
source env/bin/activate

Next, install Flask, PyTesseract, Gunicorn, and Pillow by running the following command:

pip3 install pytesseract flask pillow gunicorn

Once these are installed, you need to install the Tesseract OCR engine on your host machine. The installation instructions for Tesseract will vary according to your host operating system. You can find the appropriate instructions here.

For instance, on macOS, you can install Tesseract using Homebrew by running the following command:

brew install tesseract

Once this is done, the PyTesseract wrapper will be able to communicate with the OCR engine and process OCR requests.
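If you want to confirm the installation before writing the app, a quick check along these lines may help. It is a minimal sketch that uses only the standard library and assumes the tesseract binary is on your PATH, which is where PyTesseract looks for it by default. The function name tesseract_version_line is made up for this example.

```python
import shutil
import subprocess


def tesseract_version_line():
    """Return the first line of `tesseract --version`, or None if the binary is not on PATH."""
    binary = shutil.which("tesseract")
    if binary is None:
        return None
    result = subprocess.run([binary, "--version"], capture_output=True, text=True)
    # Some Tesseract builds print the version to stderr, so check both streams
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


version = tesseract_version_line()
print(version or "Tesseract was not found on PATH")
```

If the check reports that Tesseract was not found, revisit the installation step for your operating system before continuing.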

Now, you are ready to write the Flask application. Create a new directory named ocrapi and a new file in this directory with the name main.py. Save the following contents in it:

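from flask import Flask, request, jsonify
from PIL import Image, UnidentifiedImageError
import pytesseract

app = Flask(__name__)

@app.route('/ocr', methods=['POST'])
def ocr_process():
    # The route only accepts POST, so no explicit method check is needed.
    image_file = request.files.get('image')
    if image_file is None:
        return jsonify({'status': 'error', 'message': 'No file in the "image" field'}), 400

    try:
        # Pillow can read the uploaded file object directly
        image_data = Image.open(image_file)
    except UnidentifiedImageError:
        return jsonify({'status': 'error', 'message': 'The uploaded file is not a readable image'}), 400

    # Perform OCR using PyTesseract
    text = pytesseract.image_to_string(image_data)

    response = {
        'status': 'success',
        'text': text
    }

    return jsonify(response)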

The code above creates a basic Flask app that has one endpoint: /ocr. When you send a POST request to this endpoint with an image file in the image form field, the app reads the file from the request, opens it with Pillow, passes it to PyTesseract's image_to_string() function to perform OCR, and sends back the extracted text as part of a JSON response.

Create a wsgi.py file in the same ocrapi directory and save the following contents in it:

from ocrapi.main import app as application

if __name__ == "__main__":
    application.run()

You can now run the app from the project root (the directory that contains ocrapi) using the following command:

gunicorn ocrapi.wsgi

Your basic OCR API is ready, and it’s time to test it!
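One way to try the endpoint is a small client script. The sketch below uses only the standard library to build a multipart/form-data request with the file in the image field, which is the field name the Flask code reads. It assumes Gunicorn is listening on its default address, 127.0.0.1:8000. The helper names build_multipart and ocr_image, and the file name sample.png, are placeholders for this example.

```python
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def ocr_image(path, url="http://127.0.0.1:8000/ocr"):
    """POST the image at `path` to the API and return the raw JSON response text."""
    with open(path, "rb") as f:
        body, header = build_multipart("image", os.path.basename(path), f.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Example call (needs the server running and a real image file):
# print(ocr_image("sample.png"))
```

With the server running, uncommenting the last line and pointing it at a real image should print a JSON document containing the status and text keys.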


