OCR

This project implements the classical OCR pipeline using opencv-python.

The code is written in Python and uses the following libraries:

opencv-python for image processing functions
numpy for manipulating images as arrays
sklearn for the SVM classifier
os for directory manipulation
pickle for saving and loading the SVM model
scipy for loading the EMNIST-letters dataset from a .mat file
subprocess for opening the output text file in Notepad
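
As an illustration of how pickle can persist a trained classifier, here is a minimal sketch. The helper names save_model and load_model, the file name svm_model.pkl, and the toy training data are assumptions made for this example and are not taken from the repository.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC


def save_model(model, path):
    """Serialize a trained classifier to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a previously pickled classifier from disk."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Toy, linearly separable data standing in for real character features.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])
clf = SVC(kernel="linear").fit(X, y)

# Round trip through a temporary directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "svm_model.pkl")
    save_model(clf, model_path)
    restored = load_model(model_path)
```

Only unpickle model files from a trusted source, since pickle can execute arbitrary code while loading.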

The code takes an image of text as input and writes a .txt file containing the extracted text in editable form. The datasets used are:

EMNIST-letters
Chars74K

The images from the Chars74K dataset are first converted into an .npy file following the MNIST dataset format and then used to train a linear-kernel SVM classifier.
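
The conversion and training step described above might look roughly like the sketch below. It assumes 28x28 grayscale images flattened to 784 values per row, which is the usual MNIST layout. The function images_to_mnist_arrays, the make_bar helper, and the synthetic bar images are hypothetical stand-ins for the real dataset, and the .npy round trip uses an in-memory buffer instead of a file.

```python
import io

import numpy as np
from sklearn.svm import SVC


def images_to_mnist_arrays(images, labels):
    """Flatten 28x28 grayscale images into rows of 784 floats in [0, 1]."""
    data = np.stack(
        [np.asarray(img, dtype=np.float32).reshape(-1) / 255.0 for img in images]
    )
    return data, np.asarray(labels, dtype=np.int64)


def make_bar(vertical, offset):
    """Synthetic 28x28 image: one vertical or horizontal bar (illustration only)."""
    img = np.zeros((28, 28), dtype=np.uint8)
    if vertical:
        img[:, 12 + offset:16 + offset] = 255
    else:
        img[12 + offset:16 + offset, :] = 255
    return img


# Two toy classes: vertical bars (label 0) and horizontal bars (label 1).
images = [make_bar(True, o) for o in range(-2, 3)]
images += [make_bar(False, o) for o in range(-2, 3)]
labels = [0] * 5 + [1] * 5

X, y = images_to_mnist_arrays(images, labels)

# Save and reload in .npy format via an in-memory buffer.
buffer = io.BytesIO()
np.save(buffer, X)
buffer.seek(0)
X_loaded = np.load(buffer)

# Train a linear-kernel SVM on the flattened pixel rows.
clf = SVC(kernel="linear")
clf.fit(X_loaded, y)
```

With real data, the same arrays would be written to an .npy file on disk and the labels would be character classes instead of the two toy classes used here.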

This project uses spatial-domain image processing techniques to segment the input text image into lines, words, and finally characters. The segmented characters are passed to the SVM classifier for recognition. The recognised characters are then reassembled into text and saved to a text file.
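
One common spatial-domain way to perform this kind of segmentation is the projection profile: sum the ink pixels along rows to find lines, then along columns to find characters or words. The README does not state which technique the repository uses, so the sketch below is an assumption, and the names find_runs, segment_lines, segment_columns, and min_gap are hypothetical. It expects a binarized image in which text pixels are 1 and background pixels are 0.

```python
import numpy as np


def find_runs(mask):
    """Return (start, end) pairs, end exclusive, for each run of True in a 1-D mask."""
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]


def segment_lines(binary):
    """Find text lines as runs of rows that contain at least one ink pixel."""
    row_has_ink = binary.sum(axis=1) > 0
    return find_runs(row_has_ink)


def segment_columns(line_img, min_gap=1):
    """Find column runs with ink, merging runs separated by fewer than min_gap blank columns.

    min_gap=1 keeps every run separate (characters); a larger value
    merges neighbouring characters into words.
    """
    col_has_ink = line_img.sum(axis=0) > 0
    merged = []
    for start, end in find_runs(col_has_ink):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Toy page: two text lines; the first has two close characters and one distant one.
page = np.zeros((12, 20), dtype=np.uint8)
page[1:4, 1:3] = 1
page[1:4, 4:6] = 1
page[1:4, 10:12] = 1
page[6:9, 2:5] = 1

lines = segment_lines(page)
top, bottom = lines[0]
characters = segment_columns(page[top:bottom], min_gap=1)
words = segment_columns(page[top:bottom], min_gap=3)
```

Touching or overlapping characters and skewed lines defeat a plain projection profile, which is one reason skew correction and noise removal usually come before segmentation.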

Note: The code is not free of errors. Poor-quality input images can cause exceptions.

Folders and files

test image
README.md
_config.yml
character_detector.py
character_recognition.py
documentation.docx
image_to_np_data.py
line_detector.py
mnist_loader.py
module_interface.py
noise_removal.py
output_file_opener.py
pre_recognition_processing.py
resize.py
skew_correction.py
svm.py
text_generator.py
word_detector.py

saurabhojha/OCR-