Skip to main content

OCR for Engineering Mechanical Drawings

Project description

eDOCr

eDOCr is a packaged version of keras-ocr that facilitates end-to-end digitization of mechanical EDs. Developed for Windows OS and using Python as the primary programming language.

Getting Started

Installation

eDOCr supports Python >= 3.6 and TensorFlow >= 2.0.0. Install your prefered distribution platform, Anaconda is recommended. Open Anaconda Prompt and type the following commands:

conda create -n edocr -ypip
conda activate edocr

# To install from PyPi
conda install pip
pip install eDOCr

# To install from Source
cd path/to/your/folder
git clone https://github.com/javvi51/eDOCr
cd eDOCr
pip install -r requirements.txt

Using

There are two ways of using eDOCr: from terminal and from your own python file.

From Terminal:

We need to locate the ocr_it.py file in our system. If you have installed using pip, it will probably come in C:\Users\YOUR_USER\.conda\envs\edocr\Lib\site-packages\eDOCr\ocr_it.py. If you have installed from source, it will be at your selected folder. All you need to do is:

python PATH/TO/YOUR/FOLDER/eDOCr/ocr_it.py PATH/TO/YOUR/DRAWING/my_drawing.pdf

Additional commands you can use are:

# Specify the destination path. By default, it is the same as your drawing.
--dest-folder PATH/TO/YOUR/DESTINATION/FOLDER
# Does the drawing have watermark you want to remove? By default, it is not considered.
--water
# Advance Setting: Set a custom threshold distance (in px.) for grouping detections. Default is 20px.
--cluster 25

From your own python file

More customization is possible using your own python file, such as selecting a different model, alphabet or changing colors.

# Importing packages
import os
from eDOCr import tools
import cv2
import string
from skimage import io

# Loading image and destination file
dest_DIR = 'tests/test_Results'
file_path = 'tests/test_samples/Candle_holder.jpg'
filename = os.path.splitext(os.path.basename(file_path))[0]
img = cv2.imread(file_path)

# Selecting alphabet and model (Note that alphabet and alphabet model need to match)
GDT_symbols = '⏤⏥○⌭⌒⌓⏊∠⫽⌯⌖◎↗⌰'
FCF_symbols = 'ⒺⒻⓁⓂⓅⓈⓉⓊ'
Extra = '(),.+-±:/°"⌀'

alphabet_dimensions = string.digits + 'AaBCDRGHhMmnx' + Extra
model_dimensions = 'eDOCr/keras_ocr_models/models/recognizer_dimensions.h5'
alphabet_infoblock = string.digits+string.ascii_letters+',.:-/'
model_infoblock = 'eDOCr/keras_ocr_models/models/recognizer_infoblock.h5'
alphabet_gdts = string.digits + ',.⌀ABCD' + GDT_symbols
model_gdts = 'eDOCr/keras_ocr_models/models/recognizer_gdts.h5'

# Selecting personalized color palette and cluster setting
color_palette = {'infoblock': (180, 220, 250), 'gdts': (94, 204, 243), 'dimensions': (93, 206, 175), 'frame': (167, 234, 82), 'flag': (241, 65, 36)}
cluster_t = 20

# eDOCr functions
class_list, img_boxes = tools.box_tree.findrect(img)
boxes_infoblock, gdt_boxes, cl_frame, process_img = tools.img_process.process_rect(class_list, img)
io.imsave(os.path.join(dest_DIR, filename + '_process.jpg'), process_img)

infoblock_dict = tools.pipeline_infoblock.read_infoblocks(boxes_infoblock, img, alphabet_infoblock, model_infoblock)
gdt_dict = tools.pipeline_gdts.read_gdtbox1(gdt_boxes, alphabet_gdts, model_gdts, alphabet_dimensions, model_dimensions)
 
process_img = os.path.join(dest_DIR, filename + '_process.jpg')

dimension_dict = tools.pipeline_dimensions.read_dimensions(process_img, alphabet_dimensions, model_dimensions, cluster_t)
mask_img = tools.output.mask_the_drawing(img, infoblock_dict, gdt_dict, dimension_dict, cl_frame, color_palette)

# Record the results
io.imsave(os.path.join(dest_DIR, filename + '_boxes.jpg'), img_boxes)
io.imsave(os.path.join(dest_DIR, filename + '_mask.jpg'), mask_img)
tools.output.record_data(dest_DIR, filename, infoblock_dict, gdt_dict, dimension_dict)

example of labeled image

Training a model on a custom alphabet

Fonts are not loaded if installing from pip. To train new models, please install from source.

To train a model in a custom alphabet, a python file is provided, so that the only steps needed are:

# Importing Packages
import os
import string
from eDOCr import keras_ocr
from eDOCr.keras_ocr_models import train_recognizer

# Fixing paths and alphabet
DIR = os.getcwd()
recognizer_basepath = os.path.join(DIR, 'eDOCr/Keras_OCR_models/models')
data_dir = './tests'
alphabet = string.digits + 'AaBCDRGHhMmnx' + '().,+-±:/°"⌀'

# Number of autogenerated samples
samples = 10000

# Load white backgrounds and fonts
backgrounds = []
for i in range(0, samples):
    backgrounds.append(os.path.join('./eDOCr/Keras_OCR_models/backgrounds/0.jpg'))

fonts = []
for i in os.listdir(os.path.join(DIR, 'eDOCr/Keras_OCR_models/fonts')):
    fonts.append(os.path.join('./eDOCr/Keras_OCR_models/fonts', i))

# Choose a pretrained model if you like

pretrained_model = None 
#pretrained_model = os.path.join(recognizer_basepath,'recognizer_dimensions.h5')

# Start Training 
train_recognizer.generate_n_train(alphabet, backgrounds, fonts, recognizer_basepath=recognizer_basepath, pretrained_model=pretrained_model)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

eDOCr-0.0.4.tar.gz (55.6 kB view details)

Uploaded Source

Built Distribution

eDOCr-0.0.4-py3-none-any.whl (61.4 kB view details)

Uploaded Python 3

File details

Details for the file eDOCr-0.0.4.tar.gz.

File metadata

  • Download URL: eDOCr-0.0.4.tar.gz
  • Upload date:
  • Size: 55.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for eDOCr-0.0.4.tar.gz
Algorithm Hash digest
SHA256 929e85807560d1c525fb9e6d3435b89f64855a6c137e993945cf0ff0e435661d
MD5 297b75272bf850b346f4d9a3efefd1b3
BLAKE2b-256 94022f614b4379456d4c89a38e366433b50d1f5449686923c1dc7307549cb729

See more details on using hashes here.

File details

Details for the file eDOCr-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: eDOCr-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 61.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for eDOCr-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 d072b6c04986fa7f429e9566828e2d4199a737c1fdbb895641e975496409652a
MD5 3389f4a82c0868a1a238cdde7951e8dd
BLAKE2b-256 03ec93c784a366fb790c50d36e6e7020cdb91d3f4904d2d89dfb2bbc0ebc018f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page