This repository contains an Optical Character Recognition (OCR) system built with transfer learning on pre-trained deep learning models. The goal of the project is to extract text from images efficiently and with high accuracy.

Download the ZIP file of sample input images to test and run the OCR app quickly without preparing your own data.

- ✅ Utilizes Transfer Learning for faster training and better accuracy
- ✅ Supports printed and handwritten text recognition
- ✅ Preprocessing pipeline for noise removal and image enhancement
- ✅ Model training, validation, and evaluation modules included
- ✅ Easy to deploy and extend

| Component | Technology/Library |
|---|---|
| Framework | TensorFlow / PyTorch |
| Transfer Model | CNN, CRNN, or Transformer |
| Image Processing | OpenCV, PIL |
| OCR Engine | Deep Learning-based Model |
| Language Support | English (extendable) |

A clear breakdown of the folder structure:

    📂 OCR-Project/
    ├── 📁 artifacts/ # Stores trained OCR models (pickle files) generated from screenshot dataset
    ├── 📁 src/ # Core Python scripts for model building, training, and OCR prediction
    │ ├── model.py # Model architecture / transfer learning implementation
    │ ├── preprocessing.py # Image preprocessing and utilities
    │ └── inference.py # Script to load model and perform prediction
    ├── 📁 static/ # Frontend assets
    │ ├── css/ # Styling files
    │ ├── js/ # JavaScript logic
    │ └── images/ # UI images/assets
    ├── 📁 templates/ # Flask HTML templates
    │ ├── base.html # Reusable main layout
    │ ├── home.html # Home page (text extraction UI)
    │ └── login.html # User authentication page
    ├── app.py # Main Flask application integrating frontend and backend
    ├── requirements.txt # Dependency file to install required libraries
    └── README.md # Project documentation

Installation:

    # Clone the repository
    $ git clone https://github.com/tusharkolekar24/OCR
    $ cd OCR
    # Create virtual environment (optional)
    $ python -m venv venv
    $ source venv/bin/activate # Linux/Mac
    $ venv\Scripts\activate # Windows
    # Install dependencies
    $ pip install -r requirements.txt

You can use any OCR dataset, such as IAM, MNIST OCR, or a custom dataset.

- Image Preprocessing – Resize, grayscale conversion, noise reduction
- Model Training – Using transfer learning from a pre-trained network
- Prediction – Model outputs recognized text from the image
- Evaluation – Accuracy and CER (Character Error Rate) calculation
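
The Character Error Rate mentioned in the evaluation step is commonly defined as the character-level edit distance between the predicted and reference text, divided by the reference length. The sketch below assumes that definition; the function names `levenshtein` and `cer` are illustrative and are not taken from this repository's code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One substitution (e -> a) and one deletion (o): 2 edits over 11 characters.
print(cer("hello world", "hallo wrld"))
```

A CER of 0 means the prediction matches the reference exactly. Values can exceed 1 when the prediction is much longer than the reference.
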

Train the model:

    python src/train.py --epochs 20 --batch_size 32

Run inference on an image:

    python src/infer.py --image_path sample.jpg

- ✅ Accuracy: 98%
- ✅ Inference Speed: 2 s/image
- ✅ Precision: 95%
- ✅ Recall: 92%
- ✅ F1 Score: 93%
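
As a quick consistency check on the figures above, the F1 score can be recomputed from precision and recall. This sketch assumes F1 is the usual harmonic mean of the two and that both were measured on the same evaluation set; `f1_score` is an illustrative helper, not part of this repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed above, expressed as fractions.
f1 = f1_score(0.95, 0.92)
print(f"F1 = {f1:.3f}")  # about 0.935, consistent with the listed 93%
```
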

You can deploy this OCR model via:
- Flask/FastAPI web app (`app.py`)
- Docker container

Future improvements:
- Add support for multiple languages
- Improve accuracy using Attention-based Transformers
- Deploy on cloud platforms (AWS, Azure)

Contributions are welcome! Feel free to submit pull requests or open issues.

This project is licensed under the MIT License. See the LICENSE file for more details.

Developed by Tushar Kolekar