Skip to main content

An open-source Python3 tool for Optical Character Recognition (OCR) and LaTeX expression extraction from images, a Free Alternative to Mathpix

Project description

中文 | English

Pix2Text

Update 2024.01.10: Release V0.3

Major Changes:

  • Support for recognizing 80+ languages; for a detailed list of supported languages, see List of Supported Languages;
  • Added domestic sites for automatic model downloads;
  • Optimized the logic for merging detection boxes.

Update 2023.07.03: Released V0.2.3

Major changes:

  • Trained a new formula recognition model for P2T Online Service to use. The new model has higher accuracy, especially for handwritten formulas and multi-line formulas. See: New Formula Recognition Model for Pix2Text | Breezedeus.com.
  • Optimized the sorting logic of detected boxes and the processing logic of mixed images to make the final recognition results more intuitive.
  • Optimized the merging logic of recognition results to automatically determine line breaks and paragraph breaks.

See more at: RELEASE.md .

Pix2Text aims to be a free and open-source Python alternative to Mathpix. It can already complete the core functionalities of Mathpix. Starting from V0.2, Pix2Text (P2T) supports recognizing mixed images containing both text and formulas, with output similar to Mathpix. The core principles of P2T are shown below (text recognition supports both Chinese and English):

Pix2Text workflow

P2T uses the open-source tool CnSTD to detect formula regions in the image. The formulas are then fed into LaTeX-OCR to recognize their LaTeX expressions. The remaining text regions are recognized by CnOCR. Finally, P2T merges all results to get the full recognized texts. Thanks to these great open-source projects!

For beginners who are not familiar with Python, we also provide the free-to-use P2T Online Service. Just upload your image and it will output the P2T parsing results. The online service uses the latest models and works better than the open-source ones.

If interested, please scan the QR code below to add the assistant WeChat account, and send p2t to get invited to the P2T user group. The group shares the latest updates of P2T and related tools:

WeChat Group QR Code

The author also maintains Planet of Knowledge P2T/CnOCR/CnSTD Private Group, welcome to join. The Planet of Knowledge Private Group will release some P2T/CnOCR/CnSTD related private materials one after another, including non-public models, discount for paid models, answers to problems encountered during usage, etc. This group also releases the latest research materials related to VIE/OCR/STD.

List of Supported Languages

The text recognition engine of Pix2Text supports 80+ languages, including English, Simplified Chinese, Traditional Chinese, Vietnamese, etc. Among these, English and Simplified Chinese recognition utilize the open-source OCR tool CnOCR, while recognition for other languages employs the open-source OCR tool EasyOCR. Special thanks to the respective authors.

List of Supported Languages and Language Codes are shown below:

↓↓↓ Click to show details ↓↓↓
Language Code Name
Abaza abq
Adyghe ady
Afrikaans af
Angika ang
Arabic ar
Assamese as
Avar ava
Azerbaijani az
Belarusian be
Bulgarian bg
Bihari bh
Bhojpuri bho
Bengali bn
Bosnian bs
Simplified Chinese ch_sim
Traditional Chinese ch_tra
Chechen che
Czech cs
Welsh cy
Danish da
Dargwa dar
German de
English en
Spanish es
Estonian et
Persian (Farsi) fa
French fr
Irish ga
Goan Konkani gom
Hindi hi
Croatian hr
Hungarian hu
Indonesian id
Ingush inh
Icelandic is
Italian it
Japanese ja
Kabardian kbd
Kannada kn
Korean ko
Kurdish ku
Latin la
Lak lbe
Lezghian lez
Lithuanian lt
Latvian lv
Magahi mah
Maithili mai
Maori mi
Mongolian mn
Marathi mr
Malay ms
Maltese mt
Nepali ne
Newari new
Dutch nl
Norwegian no
Occitan oc
Pali pi
Polish pl
Portuguese pt
Romanian ro
Russian ru
Serbian (cyrillic) rs_cyrillic
Serbian (latin) rs_latin
Nagpuri sck
Slovak sk
Slovenian sl
Albanian sq
Swedish sv
Swahili sw
Tamil ta
Tabassaran tab
Telugu te
Thai th
Tajik tjk
Tagalog tl
Turkish tr
Uyghur ug
Ukranian uk
Urdu ur
Uzbek uz
Vietnamese vi

Ref: Supported Languages .

Usage

Pix2Text is very simple to use and the following is an example:

from pix2text import Pix2Text, merge_line_texts

img_fp = './docs/examples/formula.jpg'
p2t = Pix2Text()
outs = p2t(img_fp, resized_shape=608)  # # can also use `p2t.recognize(img_fp)`
print(outs)
# To get just the text contents, use: 
only_text = merge_line_texts(outs, auto_line_break=True)
print(only_text)

The returned outs is a dict where position gives the box coordinates, type the predicted type, and text the recognized texts. See API Interfaces for details.

Examples

English

Recognition Results:

Pix2Text Recognizing English

Recognition Command:

$ p2t predict -l en --use-analyzer -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --latex-ocr-model-fp ~/.pix2text/0.3/formula/p2t-mfr-20230702.pth --resized-shape 768 --save-analysis-res out_tmp.jpg --text-ocr-config '{"rec_model_name": "doc-densenet_lite_666-gru_large"}' --auto-line-break -i docs/examples/en1.jpg

Note ⚠️: The above command uses premium models. A free version of the models can also be used as follows, although the results may be slightly inferior:

$ p2t predict -l en --use-analyzer -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i docs/examples/en1.jpg

Simplified Chinese

Recognition Results:

Pix2Text Recognizing Simplified Chinese

Recognition Command:

$ p2t predict -l en,ch_sim --use-analyzer -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --latex-ocr-model-fp ~/.pix2text/0.3/formula/p2t-mfr-20230702.pth --resized-shape 768 --save-analysis-res out_tmp.jpg --text-ocr-config '{"rec_model_name": "doc-densenet_lite_666-gru_large"}' --auto-line-break -i docs/examples/mixed.jpg

Note ⚠️: The above command uses premium models. A free version of the models can also be used as follows, although the results may be slightly inferior:

$ p2t predict -l en,ch_sim --use-analyzer -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i docs/examples/mixed.jpg

Traditional Chinese

Recognition Results:

Pix2Text Recognizing Traditional Chinese

Recognition Command:

$ p2t predict -l en,ch_tra --use-analyzer -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --latex-ocr-model-fp ~/.pix2text/0.3/formula/p2t-mfr-20230702.pth --resized-shape 768 --save-analysis-res out_tmp.jpg -i docs/examples/ch_tra.jpg

Note ⚠️: The above command uses premium models. A free version of the models can also be used as follows, although the results may be slightly inferior:

$ p2t predict -l en,ch_tra --use-analyzer -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i docs/examples/ch_tra.jpg

Vietnamese

Recognition Results:

Pix2Text Recognizing Vietnamese

Recognition Command:

$ p2t predict -l en,vi --use-analyzer -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --latex-ocr-model-fp ~/.pix2text/0.3/formula/p2t-mfr-20230702.pth --resized-shape 768 --save-analysis-res out_tmp.jpg -i docs/examples/vietnamese.jpg

Note ⚠️: The above command uses premium models. A free version of the models can also be used as follows, although the results may be slightly inferior:

$ p2t predict -l en,vi --use-analyzer -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i docs/examples/vietnamese.jpg

Model Download

Free Open-source Models

After installing Pix2Text, the system will automatically download the model files and store them in ~/.pix2text directory when you use Pix2Text for the first time (the default path under Windows is C:\Users\<username>\AppData\Roaming\pix2text).

Note

If you have successfully run the above example, the model has completed its automatic download and you can ignore the subsequent contents of this section.

For LaTeX-OCR, the system will also try to automatically download its model files into ~/.pix2text/0.3/formula. If failed, you need to download them from Baidu Cloud and put them under ~/.pix2text/0.3/formula; code: bdbd.

Paid Models

In addition to the above free open-source models, we also trained higher-accuracy formula detection and recognition models for P2T. They are used by the P2T Online Service on which you can try the performance. These models are not free (sorry open-source developers need coffee too🥤). See Pix2Text (P2T) | Breezedeus.com for details.

Install

Well, one line of command is enough if it goes well.

pip install pix2text

If you need to recognize languages other than English and Simplified Chinese, please use the following command to install additional packages:

pip install pix2text[multilingual]

If the installation is slow, you can specify a domestic installation source, such as using the Aliyun source:

pip install pix2text -i https://mirrors.aliyun.com/pypi/simple

If it is your first time to use OpenCV, then probably the installation will not be very easy. Bless.

Pix2Text mainly depends on CnOCR>=2.2.2 , and LaTeX-OCR. If you encounter problems with the installation, you can also refer to their installation instruction documentations.

Warning

If you have never installed the PyTorch, OpenCV python packages before, you may encounter a lot of problems during the first installation, but they are usually common problems that can be solved by Baidu/Google.

API Interfaces

Class Initializer

Main class called Pix2Text , with initialization function:

class Pix2Text(object):

    def __init__(
        self,
        *,
        languages: Union[str, Sequence[str]] = ('en', 'ch_sim'),
        analyzer_config: Dict[str, Any] = None,
        text_config: Dict[str, Any] = None,
        formula_config: Dict[str, Any] = None,
        device: str = 'cpu',  # ['cpu', 'cuda', 'gpu']
        **kwargs,
    ):

The parameters are described as follows:

  • languages (str or Sequence[str]): Sequence of language codes for text recognition; default is ('en', 'ch_sim'), which means it can recognize English and Simplified Chinese;

  • analyzer_config (dict): Configuration for the classifier model. Default to None meaning using default config (MFD Analyzer):

    {
        'model_name': 'mfd' # can be 'mfd' or 'layout'
    }
    
  • text_config (dict): Configuration for the general recognizer. Default to None meaning using default:

    {}
    
  • formula_config (dict): Configuration for the formula recognizer. Default to None meaning using default:

    {}
    
  • device (str): Device for running the code, can be ['cpu', 'cuda', 'gpu']. Default: 'cpu'

  • **kwargs (): Other reserved parameters. Currently not used.

Class Function for Recognition

The text or Latex recognition of one specified image is done by invoking the class function .recognize() of class Pix2Text. The class function .recognize() is described as follows.

    def recognize(
        self, img: Union[str, Path, Image.Image], **kwargs
    ) -> List[Dict[str, Any]]:

where the input parameters are described as follows.

  • img (str or Image.Image): the path of the image to be recognized, or the image Image that has been read by using Image.open().
  • **kwargs: Can contain:
    • resized_shape: Resize image width to this before processing. Default: 700.
    • save_analysis_res: Save analysis visualization to this file/dir. Default: None meaning not saving.
    • embed_sep: LaTeX delimiter for embedded formulas. Only useful with MFD. Default: (' $', '$ ').
    • isolated_sep: LaTeX delimiter for isolated formulas. Only useful with MFD. Default: ('$$\n', '\n$$').
    • det_bbox_max_expand_ratio (float): Expand the height of the detected text bounding box (bbox). This value represents the maximum expansion ratio above and below relative to the original bbox height; default value is 0.2.

It returns a list of dict, each dict contains:

  • type: The category of the recognized image;

    • For MFD Analyzer (Mathematical Formula Detection), the values can be text (pure text), isolated (mathematical formulas in isolated lines), or embedding (mathematical formulas embedded in lines).
    • For Layout Analyzer (Layout Analysis), the values correspond to the categories of layout analysis results.
  • text: Recognized text or latex.

  • position: Detected box coordinates, np.ndarray, with shape [4, 2].

  • line_number: Exists only when using MFD Analyzer. Indicates the line number (starting from 0) of the box. Boxes with the same line_number are on the same line.

    Note: This is new since P2T v0.2.3. Not in previous versions.

The Pix2Text class also implements the __call__() function, which does exactly the same thing as the .recognize() function. So you can call it like:

from pix2text import Pix2Text, merge_line_texts

img_fp = './docs/examples/formula.jpg'
p2t = Pix2Text(analyzer_config=dict(model_name='mfd'))
outs = p2t(img_fp, resized_shape=608) # Equal to p2t.recognize()
print(outs)
# To get just the text contents, use: 
only_text = merge_line_texts(outs, auto_line_break=True)
print(only_text)

Recognizing Pure Text Images

The class method .recognize_text() of the class Pix2Text is used to perform text recognition on specified images. In this case, Pix2Text provides general text recognition functionality. The class function .recognize_text() is described as follows:

    def recognize_text(
        self, img: Union[str, Path, Image.Image], **kwargs
    ) -> str:

The input parameters are explained as follows:

  • img (str or Image.Image): The path of the image to be recognized, or an Image object already read using Image.open().
  • kwargs: Other parameters passed to the text recognition interface.

The return result is the recognized text string.

Recognizing Pure Formula Images

The class method .recognize_formula() of the class Pix2Text is used to recognize mathematical formulas in specified images and convert them into Latex representation. The class function .recognize_formula() is described as follows:

    def recognize_formula(self, img: Union[str, Path, Image.Image]) -> str:

The input parameters are explained as follows:

  • img (str or Image.Image): The path of the image to be recognized, or an Image object already read using Image.open().

The return result is the recognized Latex representation string.

Script Usage

P2T includes the following command-line tools.

Recognizing a single image or all images in a directory

Use the p2t predict command to predict a single image or all images in a directory. Below is the usage guide:

$ p2t predict -h
Usage: p2t predict [OPTIONS]

  Use Pix2Text (P2T) to predict the text information in an image

Options:
  --use-analyzer / --no-use-analyzer
                                  Whether to use MFD (Mathematical Formula
                                  Detection) or Layout Analysis  [default:
                                  use-analyzer]
  -l, --languages TEXT            Languages for Text-OCR to recognize,
                                  separated by commas  [default: en,ch_sim]
  -a, --analyzer-name [mfd|layout]
                                  Which Analyzer to use, either MFD or Layout
                                  Analysis  [default: mfd]
  -t, --analyzer-type TEXT        Which model to use for the Analyzer,
                                  'yolov7_tiny' or 'yolov7'  [default:
                                  yolov7_tiny]
  --analyzer-model-fp TEXT        File path for the Analyzer detection model.
                                  Default: `None`, meaning using the default
                                  model
  --latex-ocr-model-fp TEXT       File path for the Latex-OCR mathematical
                                  formula recognition model. Default: `None`,
                                  meaning using the default model
  --text-ocr-config TEXT          Configuration information for Text-OCR
                                  recognition, in JSON string format. Default:
                                  `None`, meaning using the default
                                  configuration
  -d, --device TEXT               Choose to run the code using `cpu`, `gpu`,
                                  or a specific GPU like `cuda:0`  [default:
                                  cpu]
  --resized-shape INTEGER         Resize the image width to this size before
                                  processing  [default: 608]
  -i, --img-file-or-dir TEXT      File path of the input image or the
                                  specified directory  [required]
  --save-analysis-res TEXT        Save the analysis results to this file or
                                  directory (If '--img-file-or-dir' is a
                                  file/directory, then '--save-analysis-res'
                                  should also be a file/directory). Set to
                                  `None` for not saving
  --rec-kwargs TEXT               kwargs for calling .recognize(), in JSON
                                  string format
  --auto-line-break / --no-auto-line-break
                                  Whether to automatically determine to merge
                                  adjacent line results into a single line
                                  result  [default: no-auto-line-break]
  --log-level TEXT                Log Level, such as `INFO`, `DEBUG`
                                  [default: INFO]
  -h, --help                      Show this message and exit.

This command can be used to print detection and recognition results for the specified image. For example, run:

$ p2t predict --use-analyzer -a mfd --resized-shape 608 -i docs/examples/en1.jpg --save-analysis-res output-en1.jpg

The above command prints the recognition results, and it will also store the detection results in the output-en1.jpg file, similar to the effect below:

P2T Mathematical Formula Detection Effect Image

HTTP Server

Pix2Text adds the FastAPI-based HTTP server. The server requires the installation of several additional packages, which can be installed using the following command.

> pip install pix2text[serve]

Once the installation is complete, the HTTP server can be started with the following command (-p followed by the port, which can be adjusted as needed).

> p2t serve -p 8503

p2t serve command usage guide:

$ p2t serve -h
Usage: p2t serve [OPTIONS]

  Start the HTTP service.

Options:
  -H, --host TEXT     server host  [default: 0.0.0.0]
  -p, --port INTEGER  server port  [default: 8503]
  --reload            whether to reload the server when the codes have been
                      changed
  -h, --help          Show this message and exit.

After the service starts, you can call the service in the following ways.

Command Line

As an example, if the file to be recognized is docs/examples/mixed.jpg, use curl to invoke the server:

$ curl -F image=@docs/examples/mixed.jpg --form 'use_analyzer=true' --form 'resized_shape=600' http://0.0.0.0:8503/pix2text

Python

To call the service, refer to the following method in the file scripts/try_service.py:

import requests

url = 'http://0.0.0.0:8503/pix2text'

image_fp = 'docs/examples/mixed.jpg'
data = {
    "use_analyzer": True,
    "resized_shape": 608,
    "embed_sep": " $,$ ",
    "isolated_sep": "$$\n, \n$$"
}
files = {
    "image": (image_fp, open(image_fp, 'rb'))
}

r = requests.post(url, data=data, files=files)

outs = r.json()['results']
only_text = '\n'.join([out['text'] for out in outs])
print(f'{only_text=}')

Other Language

Please refer to the curl format for your own implementation.

Use Script

Script scripts/screenshot_daemon.py automatically invokes Pixe2Text to recognize formulas or texts on screenshot images. How does this work?

Here's the process (please install Pix2Text first):

  1. Find one favorite screenshot tool that supports storing screenshot images in a specified folder. For example, the free Xnip for Mac works very well.

  2. In addition to installing Pix2Text, you need to install an additional Python package pyperclip, which you can use to copy the recognition results into the system clipboard:

    $ pip install pyperclip
    
  3. Download the script file scripts/screenshot_daemon.py to your computer, edit the line where "SCREENSHOT_DIR" is located (line 17) and change the path to the directory where your screenshot images are stored.

  4. Run this script.

    $ python scripts/screenshot_daemon.py
    

Alright, now give it a shot using your screenshot software. Once you've taken the screenshot, the recognition results will be written to your computer's clipboard. Simply press Ctrl-V / Cmd-V to paste and use it.

For a more detailed introduction, please refer to the video: "Pix2Text: A Free Python Open Source Tool to Replace Mathpix".

A cup of coffee for the author

It is not easy to maintain and evolve the project, so if it is helpful to you, please consider offering the author a cup of coffee 🥤.


Official code base: https://github.com/breezedeus/pix2text. Please cite it properly.

For more information on Pix2Text (P2T), visit: https://www.breezedeus.com/pix2text.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pix2text-0.3.tar.gz (56.6 kB view hashes)

Uploaded Source

Built Distribution

pix2text-0.3-py3-none-any.whl (57.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page