An Open-Source Python3 tool for Optical Character Recognition (OCR) and LaTeX expression extraction from images; a Free Alternative to Mathpix


Pix2Text

Update 2024.02.26: V1.0 Released

Main Changes:

The Mathematical Formula Recognition (MFR) model employs a new architecture and has been trained on a new dataset, achieving state-of-the-art (SOTA) accuracy. For detailed information, please see: Pix2Text V1.0 New Release: The Best Open-Source Formula Recognition Model | Breezedeus.com.

Update 2024.01.10: V0.3 Released

Major Changes:

Support for recognizing 80+ languages; for a detailed list of supported languages, see List of Supported Languages;
Added mirror sites in mainland China for automatic model downloads;
Optimized the logic for merging detection boxes.

Update 2023.07.03: V0.2.3 Released

Major changes:

Trained a new formula recognition model for P2T Online Service to use. The new model has higher accuracy, especially for handwritten formulas and multi-line formulas. See: New Formula Recognition Model for Pix2Text | Breezedeus.com.
Optimized the sorting logic of detected boxes and the processing logic of mixed images to make the final recognition results more intuitive.
Optimized the merging logic of recognition results to automatically determine line breaks and paragraph breaks.

See more at: RELEASE.md .

Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it already covers Mathpix's core functionality. Starting from V0.2, P2T supports recognizing mixed images containing both text and formulas, with output similar to Mathpix. The core principles of P2T are shown below (text recognition supports both Chinese and English):

P2T utilizes the open-source tool CnSTD to detect the locations of mathematical formulas in images. These detected areas are then processed by P2T's own formula recognition engine (LatexOCR) to recognize the LaTeX representation of each mathematical formula. The remaining parts of the image are processed by a text recognition engine (CnOCR or EasyOCR) for text detection and recognition. Finally, P2T merges all recognition results to obtain the final image recognition outcome. Thanks to these great open-source projects!

For beginners who are not familiar with Python, we also provide the free-to-use P2T Online Service. Just upload your image and it will output the P2T parsing results. The online service uses the latest models and works better than the open-source ones.

If interested, please scan the QR code below to add the assistant WeChat account, and send p2t to get invited to the P2T user group. The group shares the latest updates of P2T and related tools:

The author also maintains the Planet of Knowledge P2T/CnOCR/CnSTD Private Group, which you are welcome to join. The group gradually releases private P2T/CnOCR/CnSTD materials, including non-public models, discounts on paid models, and answers to problems encountered during usage. It also shares the latest research materials related to VIE/OCR/STD.

List of Supported Languages

The text recognition engine of Pix2Text supports 80+ languages, including English, Simplified Chinese, Traditional Chinese, Vietnamese, etc. Among these, English and Simplified Chinese recognition utilize the open-source OCR tool CnOCR, while recognition for other languages employs the open-source OCR tool EasyOCR. Special thanks to the respective authors.

The supported languages and their language codes are listed below:


Language	Code Name
Abaza	abq
Adyghe	ady
Afrikaans	af
Angika	ang
Arabic	ar
Assamese	as
Avar	ava
Azerbaijani	az
Belarusian	be
Bulgarian	bg
Bihari	bh
Bhojpuri	bho
Bengali	bn
Bosnian	bs
Simplified Chinese	ch_sim
Traditional Chinese	ch_tra
Chechen	che
Czech	cs
Welsh	cy
Danish	da
Dargwa	dar
German	de
English	en
Spanish	es
Estonian	et
Persian (Farsi)	fa
French	fr
Irish	ga
Goan Konkani	gom
Hindi	hi
Croatian	hr
Hungarian	hu
Indonesian	id
Ingush	inh
Icelandic	is
Italian	it
Japanese	ja
Kabardian	kbd
Kannada	kn
Korean	ko
Kurdish	ku
Latin	la
Lak	lbe
Lezghian	lez
Lithuanian	lt
Latvian	lv
Magahi	mah
Maithili	mai
Maori	mi
Mongolian	mn
Marathi	mr
Malay	ms
Maltese	mt
Nepali	ne
Newari	new
Dutch	nl
Norwegian	no
Occitan	oc
Pali	pi
Polish	pl
Portuguese	pt
Romanian	ro
Russian	ru
Serbian (cyrillic)	rs_cyrillic
Serbian (latin)	rs_latin
Nagpuri	sck
Slovak	sk
Slovenian	sl
Albanian	sq
Swedish	sv
Swahili	sw
Tamil	ta
Tabassaran	tab
Telugu	te
Thai	th
Tajik	tjk
Tagalog	tl
Turkish	tr
Uyghur	ug
Ukrainian	uk
Urdu	ur
Uzbek	uz
Vietnamese	vi

Ref: Supported Languages .
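As a minimal sketch of the engine split described above, the helper below checks whether a set of language codes goes beyond English and Simplified Chinese, which is the case where the extra multilingual packages from the Install section are needed. The function names are hypothetical and not part of Pix2Text; how Pix2Text routes a mixed request such as `en,vi` internally is not stated here, so the sketch only reports which codes fall outside the CnOCR pair.

```python
from typing import Iterable, List

# Per the text above: English and Simplified Chinese are handled by CnOCR.
CNOCR_CODES = {"en", "ch_sim"}


def other_languages(codes: Iterable[str]) -> List[str]:
    """Return the codes that are neither English nor Simplified Chinese."""
    return [code for code in codes if code not in CNOCR_CODES]


def needs_multilingual_extra(codes: Iterable[str]) -> bool:
    """True if at least one requested language relies on EasyOCR."""
    return bool(other_languages(codes))


print(other_languages(["en", "ch_sim", "vi"]))
print(needs_multilingual_extra(["en", "ch_sim"]))
```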

Usage

Recognizing Mixed Images with Both Text and Formulas

For mixed images containing both text and mathematical formulas, use the .recognize() function to identify the text and mathematical formulas in the image. For example, for the following image (docs/examples/en1.jpg):

The method is as follows:

from pix2text import Pix2Text, merge_line_texts

img_fp = './docs/examples/en1.jpg'
p2t = Pix2Text()
outs = p2t.recognize(img_fp, resized_shape=608)  # You can also use `p2t(img_fp)` to get the same result
print(outs)
# If you only need the recognized texts and LaTeX representations, use the following line of code to merge all results
only_text = merge_line_texts(outs, auto_line_break=True)
print(only_text)

The returned result outs is a list of dicts, one per detected box. In each dict, the key position holds the box location, type indicates the category, and text holds the recognition result. For more details, see API Interfaces.
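The API Interfaces section describes the return value of .recognize() as a list of dicts with type, text and position keys. Assuming that shape, the sketch below pulls only the LaTeX strings out of a result. The sample data is hand-written for illustration and is not real model output, and extract_formulas is a hypothetical helper, not a Pix2Text function.

```python
from typing import Any, Dict, List

# Hand-written sample shaped like the documented output of `.recognize()`.
sample_outs: List[Dict[str, Any]] = [
    {"type": "text", "text": "The area is",
     "position": [[0, 0], [90, 0], [90, 20], [0, 20]]},
    {"type": "embedding", "text": "\\pi r^2",
     "position": [[95, 0], [140, 0], [140, 20], [95, 20]]},
    {"type": "isolated", "text": "E = mc^2",
     "position": [[0, 30], [140, 30], [140, 60], [0, 60]]},
]

# Category names used by the MFD Analyzer for formula boxes.
FORMULA_TYPES = {"embedding", "isolated"}


def extract_formulas(outs: List[Dict[str, Any]]) -> List[str]:
    """Keep only the LaTeX strings of boxes classified as formulas."""
    return [box["text"] for box in outs if box["type"] in FORMULA_TYPES]


print(extract_formulas(sample_outs))
```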

Recognizing Pure Formula Images

For images containing only mathematical formulas, the function .recognize_formula() can be used to recognize the mathematical formula as a LaTeX expression. For example, for the following image (docs/examples/math-formula-42.png):

The method is as follows:

from pix2text import Pix2Text

img_fp = './docs/examples/math-formula-42.png'
p2t = Pix2Text()
outs = p2t.recognize_formula(img_fp)
print(outs)

The returned result is a string, which is the corresponding LaTeX expression. For more details, see API Interfaces.

Recognizing Pure Text Images

For images that contain only text and no mathematical formulas, the function .recognize_text() can be used to recognize the text in the image. In this case, Pix2Text acts as a general text OCR engine. For example, for the following image (docs/examples/general.jpg):

The method is as follows:

from pix2text import Pix2Text

img_fp = './docs/examples/general.jpg'
p2t = Pix2Text()
outs = p2t.recognize_text(img_fp)
print(outs)

The returned result is a string, which is the corresponding sequence of text. For more details, see API Interfaces.

Examples

English

Recognition Results:

Pix2Text Recognizing English

Recognition Command:

$ p2t predict -l en -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --formula-ocr-config '{"model_name":"mfr-pro","model_backend":"onnx"}' --resized-shape 768 --save-analysis-res out_tmp.jpg --text-ocr-config '{"rec_model_name": "doc-densenet_lite_666-gru_large"}' --auto-line-break -i docs/examples/en1.jpg

Note ⚠️: The above command uses premium models. A free version of the models can also be used as follows, although the results may be slightly inferior:
$ p2t predict -l en -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i docs/examples/en1.jpg

Simplified Chinese

Recognition Results:

Pix2Text Recognizing Simplified Chinese

Recognition Command:

$ p2t predict -l en,ch_sim -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --formula-ocr-config '{"model_name":"mfr-pro","model_backend":"onnx"}' --resized-shape 768 --save-analysis-res out_tmp.jpg --text-ocr-config '{"rec_model_name": "doc-densenet_lite_666-gru_large"}' --auto-line-break -i docs/examples/mixed.jpg

Note ⚠️: The above command uses premium models. A free version of the models can also be used as follows, although the results may be slightly inferior:
$ p2t predict -l en,ch_sim -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i docs/examples/mixed.jpg

Traditional Chinese

Recognition Results:

Pix2Text Recognizing Traditional Chinese

Recognition Command:

$ p2t predict -l en,ch_tra -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --formula-ocr-config '{"model_name":"mfr-pro","model_backend":"onnx"}' --resized-shape 768 --save-analysis-res out_tmp.jpg -i docs/examples/ch_tra.jpg

Note ⚠️: The above command uses premium models. A free version of the models can also be used as follows, although the results may be slightly inferior:
$ p2t predict -l en,ch_tra -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg -i docs/examples/ch_tra.jpg

Vietnamese

Recognition Results:

Pix2Text Recognizing Vietnamese

Recognition Command:

$ p2t predict -l en,vi -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --formula-ocr-config '{"model_name":"mfr-pro","model_backend":"onnx"}' --resized-shape 768 --save-analysis-res out_tmp.jpg -i docs/examples/vietnamese.jpg

Note ⚠️: The above command uses premium models. A free version of the models can also be used as follows, although the results may be slightly inferior:
$ p2t predict -l en,vi -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg -i docs/examples/vietnamese.jpg

Model Download

Free Open-source Models

After installing Pix2Text, the system will automatically download the model files and store them in ~/.pix2text/1.0 directory when you use Pix2Text for the first time (the default path under Windows is C:\Users\<username>\AppData\Roaming\pix2text\1.0).
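To check whether the models have already been downloaded, one can look at the directory named above. This is a small sketch that rebuilds the two default paths from the text; default_model_dir is a hypothetical helper, and it assumes that %APPDATA% points to AppData\Roaming on Windows. Pix2Text's own lookup logic may differ.

```python
import os
import sys
from pathlib import Path


def default_model_dir(version: str = "1.0") -> Path:
    """Rebuild the default model directory described in the text above."""
    if sys.platform.startswith("win"):
        # %APPDATA% normally resolves to C:\Users\<username>\AppData\Roaming
        appdata = os.environ.get(
            "APPDATA", str(Path.home() / "AppData" / "Roaming")
        )
        return Path(appdata) / "pix2text" / version
    return Path.home() / ".pix2text" / version


model_dir = default_model_dir()
state = "exists" if model_dir.exists() else "not downloaded yet"
print(f"{model_dir}: {state}")
```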

Note

If you have successfully run the above example, the model has completed its automatic download and you can ignore the subsequent contents of this section.

Paid Models

In addition to the free open-source models above, we have also trained higher-accuracy formula detection and recognition models for P2T. The P2T Online Service uses them, so you can try their performance there. These models are not free (sorry, open-source developers need coffee too 🥤). See Pix2Text (P2T) | Breezedeus.com for details.

Install

If all goes well, a single command is enough.

pip install pix2text

If you need to recognize languages other than English and Simplified Chinese, please use the following command to install additional packages:

pip install pix2text[multilingual]

If the installation is slow, you can specify a mirror in mainland China, such as the Aliyun source:

pip install pix2text -i https://mirrors.aliyun.com/pypi/simple

If this is your first time using OpenCV, the installation may not go smoothly. Good luck.

Pix2Text mainly depends on CnOCR>=2.2.2 and transformers>=4.37.0. If you run into installation problems, you can also refer to their installation documentation.

Warning

If you have never installed the PyTorch or OpenCV Python packages before, you may run into a number of problems during the first installation, but these are usually common issues that can be resolved with a Baidu/Google search.
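When an installation misbehaves, a quick first check is whether the two main dependencies meet the minimum versions stated above. The sketch below uses only the standard library; the helper names are hypothetical, the distribution names cnocr and transformers are assumed to match the PyPI package names, and the version parser is deliberately simple (it compares only the leading numeric parts).

```python
from importlib import metadata
from typing import Optional, Tuple

# Minimum versions taken from the dependency note above.
MINIMUMS = {"cnocr": "2.2.2", "transformers": "4.37.0"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse leading numeric parts, e.g. '4.37.0' -> (4, 37, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(installed: Optional[str], minimum: str) -> bool:
    """True if the package is installed and at least the minimum version."""
    if installed is None:
        return False
    return parse_version(installed) >= parse_version(minimum)


for name, minimum in MINIMUMS.items():
    found = installed_version(name)
    status = "ok" if meets_minimum(found, minimum) else "missing or too old"
    print(f"{name}: installed={found}, required>={minimum} -> {status}")
```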

API Interfaces

Class Initializer

The main class is Pix2Text, with the following initializer:

class Pix2Text(object):

    def __init__(
        self,
        *,
        languages: Union[str, Sequence[str]] = ('en', 'ch_sim'),
        analyzer_config: Dict[str, Any] = None,
        text_config: Dict[str, Any] = None,
        formula_config: Dict[str, Any] = None,
        device: str = None,
        **kwargs,
    ):

The parameters are described as follows:

languages (str or Sequence[str]): Sequence of language codes for text recognition; default is ('en', 'ch_sim'), which means it can recognize English and Simplified Chinese;
analyzer_config (dict): Configuration for the analyzer model. Defaults to None, which uses the default config (MFD Analyzer):
```
{
    'model_name': 'mfd' # can be 'mfd' or 'layout'
}
```
text_config (dict): Configuration for the general text recognizer. Defaults to None, which uses the default config:
```
{}
```
formula_config (dict): Configuration for the formula recognizer. Defaults to None, which uses the default config:
```
{}
```
device (str): Specifies the computing resource to be used. Supports options like ['cpu', 'cuda', 'gpu', 'mps']; the default is None, which indicates automatic selection of the device.
**kwargs: Other reserved parameters; currently unused.
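The parameters above can be assembled as in the following sketch. build_init_kwargs and create_p2t are hypothetical helpers, not part of Pix2Text; the validation simply mirrors the option values listed above and assumes that device strings such as `cuda:0` are acceptable because they start with a listed option. Pix2Text is imported lazily so the helper for building arguments can be used on its own.

```python
from typing import Any, Dict, Optional, Sequence, Union

VALID_ANALYZERS = {"mfd", "layout"}          # from the analyzer_config comment
VALID_DEVICE_PREFIXES = ("cpu", "cuda", "gpu", "mps")


def build_init_kwargs(
    languages: Union[str, Sequence[str]] = ("en", "ch_sim"),
    analyzer: str = "mfd",
    device: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the options and return keyword arguments for Pix2Text()."""
    if analyzer not in VALID_ANALYZERS:
        raise ValueError(f"unknown analyzer: {analyzer!r}")
    if device is not None and not device.startswith(VALID_DEVICE_PREFIXES):
        raise ValueError(f"unknown device: {device!r}")
    if isinstance(languages, str):
        languages = (languages,)
    return {
        "languages": tuple(languages),
        "analyzer_config": {"model_name": analyzer},
        "device": device,  # None lets Pix2Text pick the device itself
    }


def create_p2t(**options: Any):
    """Instantiate Pix2Text; requires the pix2text package to be installed."""
    from pix2text import Pix2Text  # imported lazily on purpose

    return Pix2Text(**build_init_kwargs(**options))


print(build_init_kwargs(languages=("en", "vi"), device="cpu"))
```

With pix2text installed, `p2t = create_p2t(languages="en", device="cpu")` would then give an instance ready for the recognition calls described below.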

Class Function for Recognition

Recognizing Mixed Images containing both Text and Formulas

Text and LaTeX recognition for a single image is done by calling the .recognize() method of the Pix2Text class, which is described as follows.

    def recognize(
        self, img: Union[str, Path, Image.Image], **kwargs
    ) -> List[Dict[str, Any]]:

where the input parameters are described as follows.

img (str, Path, or Image.Image): the path of the image to be recognized, or an Image object that has already been loaded with Image.open().
**kwargs: Can contain:
- resized_shape: Resize image width to this before processing. Default: 700.
- save_analysis_res: Save analysis visualization to this file/dir. Default: None meaning not saving.
- embed_sep: LaTeX delimiter for embedded formulas. Only useful with MFD. Default: (' $', '$ ').
- isolated_sep: LaTeX delimiter for isolated formulas. Only useful with MFD. Default: ('$$\n', '\n$$').
- det_bbox_max_expand_ratio (float): Expand the height of the detected text bounding box (bbox). This value represents the maximum expansion ratio above and below relative to the original bbox height; default value is 0.2.
- mfr_batch_size (int): The batch size used for MFR (Mathematical Formula Recognition) prediction; the default value is 1.

It returns a list of dicts; each dict contains:

type: The category of the recognized image;
- For MFD Analyzer (Mathematical Formula Detection), the values can be text (pure text), isolated (mathematical formulas in isolated lines), or embedding (mathematical formulas embedded in lines).
- For Layout Analyzer (Layout Analysis), the values correspond to the categories of layout analysis results.
text: Recognized text or LaTeX.
position: Detected box coordinates, np.ndarray, with shape [4, 2].
line_number: Exists only when using MFD Analyzer. Indicates the line number (starting from 0) of the box. Boxes with the same line_number are on the same line.

Note: line_number is new in P2T v0.2.3 and is not present in earlier versions.
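To illustrate how these fields relate, the sketch below rebuilds text lines from a result by grouping boxes on line_number and ordering them by their left edge. It assumes each row of position is an (x, y) point, which the text above does not state explicitly, and it uses hand-written sample data. group_into_lines is a hypothetical helper; in practice merge_line_texts from pix2text is the intended way to merge results.

```python
from collections import defaultdict
from typing import Any, Dict, List


def left_edge(box: Dict[str, Any]) -> float:
    """Smallest x coordinate of the box (assumes rows are (x, y) points)."""
    return min(float(point[0]) for point in box["position"])


def group_into_lines(outs: List[Dict[str, Any]]) -> List[str]:
    """Join box texts per line_number, ordered left to right."""
    lines = defaultdict(list)
    for box in outs:
        lines[box["line_number"]].append(box)
    merged = []
    for number in sorted(lines):
        ordered = sorted(lines[number], key=left_edge)
        merged.append(" ".join(box["text"] for box in ordered))
    return merged


# Illustrative data only, not real model output.
sample = [
    {"type": "embedding", "text": "x^2", "line_number": 0,
     "position": [[60, 0], [90, 0], [90, 20], [60, 20]]},
    {"type": "text", "text": "Let", "line_number": 0,
     "position": [[0, 0], [50, 0], [50, 20], [0, 20]]},
    {"type": "text", "text": "be given.", "line_number": 1,
     "position": [[0, 30], [80, 30], [80, 50], [0, 50]]},
]

print(group_into_lines(sample))
```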

The Pix2Text class also implements the __call__() function, which does exactly the same thing as the .recognize() function. So you can call it like:

from pix2text import Pix2Text, merge_line_texts

img_fp = './docs/examples/formula.jpg'
p2t = Pix2Text(analyzer_config=dict(model_name='mfd'))
outs = p2t.recognize(img_fp, resized_shape=608) # Equal to p2t(img_fp, resized_shape=608)
print(outs)
# To get just the text contents, use: 
only_text = merge_line_texts(outs, auto_line_break=True)
print(only_text)

Recognizing Pure Text Images

The class method .recognize_text() of the class Pix2Text is used to perform text recognition on specified images. In this case, Pix2Text provides general text recognition functionality. The class function .recognize_text() is described as follows:

    def recognize_text(
        self,
        imgs: Union[str, Path, Image.Image, List[str], List[Path], List[Image.Image]],
        **kwargs,
    ) -> Union[str, List[str]]:

The input parameters are explained as follows:

imgs (Union[str, Path, Image.Image, List[str], List[Path], List[Image.Image]]): The path of the image(s) to be recognized, or Image objects that have been loaded with Image.open(). Supports a single image or a list of multiple images.
kwargs: Other parameters passed to the text recognition interface.

The return result is the recognized text string (when the input is multiple images, a list of the same length is returned).
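Because the return type depends on whether one image or a list was passed, calling code often normalizes it first. A minimal sketch, with hypothetical helper names that are not part of Pix2Text:

```python
from typing import Dict, List, Sequence, Union


def as_list(result: Union[str, List[str]]) -> List[str]:
    """Wrap a single-image result so callers can always iterate."""
    return [result] if isinstance(result, str) else list(result)


def pair_with_inputs(
    inputs: Sequence[str], result: Union[str, List[str]]
) -> Dict[str, str]:
    """Map each input path to its recognized text."""
    texts = as_list(result)
    if len(inputs) != len(texts):
        raise ValueError("number of inputs and results differ")
    return dict(zip(inputs, texts))


# Stand-in strings in place of real recognition output.
print(pair_with_inputs(["a.jpg", "b.jpg"], ["first text", "second text"]))
print(pair_with_inputs(["a.jpg"], "only text"))
```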

Recognizing Pure Formula Images

The class method .recognize_formula() of the class Pix2Text is used to recognize mathematical formulas in specified images and convert them into their LaTeX representation. The class function .recognize_formula() is described as follows:

    def recognize_formula(
        self,
        imgs: Union[str, Path, Image.Image, List[str], List[Path], List[Image.Image]],
        batch_size: int = 1,
        **kwargs,
    ) -> Union[str, List[str]]:

The input parameters are explained as follows:

imgs (Union[str, Path, Image.Image, List[str], List[Path], List[Image.Image]]): The path of the image(s) to be recognized, or Image objects that have been loaded with Image.open(). Supports a single image or a list of multiple images.
batch_size: The batch size for processing.
kwargs: Additional parameters to be passed to the formula recognition interface.

The return result is the recognized LaTeX representation string (when the input is multiple images, a list of the same length is returned).
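The returned LaTeX strings carry no delimiters of their own in this sketch's assumption, so a common follow-up step is wrapping them for display. The example below reuses the default isolated_sep value documented for .recognize() as its delimiter pair; to_display_math and to_markdown are hypothetical helpers, and whether a given renderer accepts `$$` blocks is outside what this document covers.

```python
from typing import Iterable, Tuple

# Same pair as the documented default of `isolated_sep` in `.recognize()`.
ISOLATED_SEP: Tuple[str, str] = ("$$\n", "\n$$")


def to_display_math(latex: str, sep: Tuple[str, str] = ISOLATED_SEP) -> str:
    """Wrap one LaTeX expression in display-math delimiters."""
    return f"{sep[0]}{latex.strip()}{sep[1]}"


def to_markdown(formulas: Iterable[str]) -> str:
    """Join several wrapped formulas, separated by blank lines."""
    return "\n\n".join(to_display_math(formula) for formula in formulas)


# Stand-in strings in place of real `.recognize_formula()` output.
print(to_markdown(["E = mc^2", "a^2 + b^2 = c^2"]))
```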

Script Usage

P2T includes the following command-line tools.

Recognizing a single image or all images in a directory

Use the p2t predict command to predict a single image or all images in a directory. Below is the usage guide:

$ p2t predict -h
Usage: p2t predict [OPTIONS]

  Use Pix2Text (P2T) to predict the text information in an image

Options:
  -l, --languages TEXT            Language Codes for Text-OCR to recognize,
                                  separated by commas  [default: en,ch_sim]
  -a, --analyzer-name [mfd|layout]
                                  Which Analyzer to use, either MFD or Layout
                                  Analysis  [default: mfd]
  -t, --analyzer-type TEXT        Which model to use for the Analyzer,
                                  'yolov7_tiny' or 'yolov7'  [default:
                                  yolov7_tiny]
  --analyzer-model-fp TEXT        File path for the Analyzer detection model.
                                  Default: `None`, meaning using the default
                                  model
  --formula-ocr-config TEXT       Configuration information for the Latex-OCR
                                  mathematical formula recognition model.
                                  Default: `None`, meaning using the default
                                  configuration
  --text-ocr-config TEXT          Configuration information for Text-OCR
                                  recognition, in JSON string format. Default:
                                  `None`, meaning using the default
                                  configuration
  -d, --device TEXT               Choose to run the code using `cpu`, `gpu`,
                                  or a specific GPU like `cuda:0`  [default:
                                  cpu]
  --image-type [mixed|formula|text]
                                  Which image type to process, either 'mixed',
                                  'formula' or 'text'  [default: mixed]
  --resized-shape INTEGER         Resize the image width to this size before
                                  processing  [default: 608]
  -i, --img-file-or-dir TEXT      File path of the input image or the
                                  specified directory  [required]
  --save-analysis-res TEXT        Save the analysis results to this file or
                                  directory (If '--img-file-or-dir' is a
                                  file/directory, then '--save-analysis-res'
                                  should also be a file/directory). Set to
                                  `None` for not saving
  --rec-kwargs TEXT               kwargs for calling .recognize(), in JSON
                                  string format
  --auto-line-break / --no-auto-line-break
                                  Whether to automatically determine to merge
                                  adjacent line results into a single line
                                  result  [default: no-auto-line-break]
  --log-level TEXT                Log Level, such as `INFO`, `DEBUG`
                                  [default: INFO]
  -h, --help                      Show this message and exit.

This command can be used to print detection and recognition results for the specified image. For example, run:

$ p2t predict -a mfd --resized-shape 608 -i docs/examples/en1.jpg --save-analysis-res output-en1.jpg

The above command prints the recognition results, and it will also store the detection results in the output-en1.jpg file, similar to the effect below:
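When the command is launched from a script, building the argument list programmatically avoids shell-quoting mistakes, especially for the JSON-valued options. The sketch below uses only flags that appear in the help text above; build_predict_command is a hypothetical helper and covers just a subset of the options.

```python
import json
import shlex
from typing import Any, Dict, List, Optional, Sequence


def build_predict_command(
    image: str,
    languages: Sequence[str] = ("en", "ch_sim"),
    analyzer_name: str = "mfd",
    resized_shape: int = 608,
    save_analysis_res: Optional[str] = None,
    formula_ocr_config: Optional[Dict[str, Any]] = None,
    auto_line_break: bool = False,
) -> List[str]:
    """Build an argv list for `p2t predict` from a few documented options."""
    cmd = [
        "p2t", "predict",
        "-l", ",".join(languages),
        "-a", analyzer_name,
        "--resized-shape", str(resized_shape),
    ]
    if formula_ocr_config is not None:
        # The option expects a JSON string.
        cmd += ["--formula-ocr-config", json.dumps(formula_ocr_config)]
    if save_analysis_res is not None:
        cmd += ["--save-analysis-res", save_analysis_res]
    if auto_line_break:
        cmd.append("--auto-line-break")
    cmd += ["-i", image]
    return cmd


cmd = build_predict_command(
    "docs/examples/en1.jpg", save_analysis_res="output-en1.jpg"
)
print(shlex.join(cmd))
# To run it (requires pix2text to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```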

P2T Mathematical Formula Detection Effect Image

HTTP Server

Pix2Text includes a FastAPI-based HTTP server. The server requires several additional packages, which can be installed with the following command.

$ pip install pix2text[serve]

Once the installation is complete, the HTTP server can be started with the following command (use -p to set the port; the default is 8503).

$ p2t serve -l en,ch_sim -a mfd

p2t serve command usage guide:

$ p2t serve -h
Usage: p2t serve [OPTIONS]

  Start the HTTP service.

Options:
  -l, --languages TEXT            Language Codes for Text-OCR to recognize,
                                  separated by commas  [default: en,ch_sim]
  -a, --analyzer-name [mfd|layout]
                                  Which Analyzer to use, either MFD or Layout
                                  Analysis  [default: mfd]
  -t, --analyzer-type TEXT        Which model to use for the Analyzer,
                                  'yolov7_tiny' or 'yolov7'  [default:
                                  yolov7_tiny]
  --analyzer-model-fp TEXT        File path for the Analyzer detection model.
                                  Default: `None`, meaning using the default
                                  model
  --formula-ocr-config TEXT       Configuration information for the LatexOCR
                                  mathematical formula recognition model.
                                  Default: `None`, meaning using the default
                                  configuration
  --text-ocr-config TEXT          Configuration information for Text-OCR
                                  recognition, in JSON string format. Default:
                                  `None`, meaning using the default
                                  configuration
  -d, --device TEXT               Choose to run the code using `cpu`, `gpu`,
                                  or a specific GPU like `cuda:0`  [default:
                                  cpu]
  -H, --host TEXT                 server host  [default: 0.0.0.0]
  -p, --port INTEGER              server port  [default: 8503]
  --reload                        whether to reload the server when the codes
                                  have been changed
  --log-level TEXT                Log Level, such as `INFO`, `DEBUG`
                                  [default: INFO]
  -h, --help                      Show this message and exit.
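Loading the models can take a while, so a script that starts the server and then calls it may want to wait until the port accepts connections. This is a generic standard-library sketch, not something Pix2Text provides; wait_for_port is a hypothetical helper, and port 8503 is the default from the help text above.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Poll until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (after running `p2t serve` in another process):
#   if wait_for_port("127.0.0.1", 8503):
#       print("server is accepting connections")
```

An open port only shows that the process is listening; it does not guarantee that every model has finished loading.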

After the service starts, you can call the service in the following ways.

Python

To call the service, refer to the following method in the file scripts/try_service.py:

import requests

url = 'http://0.0.0.0:8503/pix2text'

image_fp = 'docs/examples/mixed.jpg'
data = {
    "image_type": "mixed",  # "mixed": Mixed image; "formula": Pure formula image; "text": Pure text image
    "resized_shape": 768,  # Effective only when image_type=="mixed"
    "embed_sep": " $,$ ",  # Effective only when image_type=="mixed"
    "isolated_sep": "$$\n, \n$$"  # Effective only when image_type=="mixed"
}
files = {
    "image": (image_fp, open(image_fp, 'rb'))
}

r = requests.post(url, data=data, files=files)

outs = r.json()['results']
if isinstance(outs, str):
    only_text = outs
else:
    only_text = '\n'.join([out['text'] for out in outs])
print(f'{only_text=}')
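The same call can be wrapped in a small function that closes the image file, sets a timeout, and fails loudly on HTTP errors. This is a sketch built on the request shape shown above (the /pix2text path, the image form field, and the results key); call_service and results_to_text are hypothetical names, and 127.0.0.1 is used here on the assumption that the server runs on the same machine.

```python
from typing import Any, Dict, List, Union


def results_to_text(results: Union[str, List[Dict[str, Any]]]) -> str:
    """Flatten the `results` field: a plain string or a list of boxes."""
    if isinstance(results, str):
        return results
    return "\n".join(item["text"] for item in results)


def call_service(
    image_fp: str,
    url: str = "http://127.0.0.1:8503/pix2text",
    image_type: str = "mixed",
    timeout: float = 60.0,
) -> str:
    """Send one image to a running P2T server and return the merged text."""
    import requests  # third-party; imported lazily

    with open(image_fp, "rb") as fh:  # the file is closed after the request
        response = requests.post(
            url,
            data={"image_type": image_type},
            files={"image": (image_fp, fh)},
            timeout=timeout,
        )
    response.raise_for_status()
    return results_to_text(response.json()["results"])


print(results_to_text([{"text": "line one"}, {"text": "line two"}]))
```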

Curl

Use curl to call the service:

$ curl -F image=@docs/examples/mixed.jpg --form 'image_type=mixed' --form 'resized_shape=768' http://0.0.0.0:8503/pix2text

Other Languages

Please refer to the curl format for your own implementation.

A cup of coffee for the author

It is not easy to maintain and evolve the project, so if it is helpful to you, please consider offering the author a cup of coffee 🥤.

Official code base: https://github.com/breezedeus/pix2text. Please cite it properly.

For more information on Pix2Text (P2T), visit: https://www.breezedeus.com/pix2text.
