Python parser to extract data from PDF invoices

Maintainers

bosd manuelriel

invoice2data

Data extractor for PDF invoices - invoice2data

A command line tool and Python library that automates the extraction of key information from invoices to support your accounting process. The library is very flexible and can be used on other types of business documents as well.

In essence, invoice2data simplifies getting data from invoices by:

Automating text extraction — no more manual copying and pasting.
Using templates for structure — handles different invoice layouts.
Providing structured output — data ready for analysis or further processing.

This makes it a valuable tool for businesses and developers dealing with a large volume of invoices, saving time and reducing manual-entry errors. It:

extracts text from PDF files with a pluggable, cascading backend — pdfium (default, no system deps), pdftotext, text, pdfminer, pdfplumber, or OCR (tesseract, ocrmypdf, docTR, paddleocr, gvision).
searches the extracted text with regular expressions defined in a YAML- or JSON-based template system (with an optional AI fallback).
saves results as CSV, JSON or XML, or renames PDF files to match the content.
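
The steps listed above can be sketched in plain Python. This is only an illustration of the idea, not invoice2data's implementation: the sample text, the `FIELD_PATTERNS` mapping and the `extract_fields` helper are all hypothetical, and the text-extraction step is replaced by a hard-coded string.

```python
import json
import re

# Stand-in for the text a PDF backend would return (hypothetical sample).
SAMPLE_TEXT = """QualityHosting AG
Invoice No. 30064443
Date: 07.05.2014
Total EUR 34.73
"""

# Hypothetical field -> regex mapping; each pattern has one capture group.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice No\.\s+(\d+)",
    "date": r"Date:\s+(\d{2}\.\d{2}\.\d{4})",
    "amount": r"Total\s+EUR\s+(\d+\.\d{2})",
}


def extract_fields(text, patterns):
    """Return the first captured group for every pattern that matches."""
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields


# Structured output: a plain dict that can be dumped as JSON.
fields = extract_fields(SAMPLE_TEXT, FIELD_PATTERNS)
print(json.dumps(fields, indent=2))
```

The values stay as strings here; converting them to dates and numbers is left out to keep the sketch short.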

With the flexible template system you can:

precisely match the content of PDF files
use plugins to match line items and tables
define static fields that are the same for every invoice
define custom fields needed in your organisation or process
define multiple regexes per field (in case the layout or wording changes)
define currency
extract invoice-items using the lines-plugin developed by Holger Brunn
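
To make a few of these template features concrete (static fields, several regexes per field, a fixed currency), here is a minimal sketch. The `TEMPLATE` dict, its key names and the `apply_template` helper are hypothetical and chosen for illustration; the real template format is YAML or JSON and is described in the template creation guide.

```python
import re

# Hypothetical template, written as a dict for illustration only.
TEMPLATE = {
    # The template applies only if every keyword occurs in the text.
    "keywords": ["Amazon Web Services"],
    # Static fields are copied into every result unchanged.
    "static": {"issuer": "Amazon Web Services", "currency": "USD"},
    "fields": {
        # Several patterns per field: the first one that matches wins.
        "invoice_number": [r"Invoice Number:\s+(\d+)", r"Invoice #\s*(\d+)"],
        "amount": [
            r"TOTAL AMOUNT DUE.*?\$(\d+\.\d{2})",
            r"Total:\s+\$(\d+\.\d{2})",
        ],
    },
}


def apply_template(text, template):
    """Return the extracted fields, or None if the template does not apply."""
    if not all(keyword in text for keyword in template["keywords"]):
        return None
    result = dict(template["static"])
    for name, patterns in template["fields"].items():
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                result[name] = match.group(1)
                break
    return result


# Two made-up layouts of the same invoice, worded differently.
old_layout = "Amazon Web Services\nInvoice Number: 42183017\nTotal: $4.11\n"
new_layout = (
    "Amazon Web Services\nInvoice # 42183017\n"
    "TOTAL AMOUNT DUE ON AUGUST 3 $4.11\n"
)

print(apply_template(old_layout, TEMPLATE))
print(apply_template(new_layout, TEMPLATE))
```

Both layouts yield the same fields because each field falls back to its second pattern when the first one does not match.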

Go from PDF files to this:

{'issuer': 'QualityHosting', 'amount': 34.73, 'date': datetime.datetime(2014, 5, 7, 0, 0), 'invoice_number': '30064443', 'currency': 'EUR', 'desc': 'Invoice 30064443 from QualityHosting', 'template_name': 'com.qualityhosting.yml'}
{'issuer': 'Amazon EU', 'amount': 35.24, 'date': datetime.datetime(2014, 6, 4, 0, 0), 'invoice_number': 'EUVINS1-OF5-DE-120725895', 'currency': 'EUR', 'desc': 'Invoice EUVINS1-OF5-DE-120725895 from Amazon EU'}
{'issuer': 'Amazon Web Services', 'amount': 4.11, 'date': datetime.datetime(2014, 8, 3, 0, 0), 'invoice_number': '42183017', 'currency': 'USD', 'desc': 'Invoice 42183017 from Amazon Web Services'}
{'issuer': 'Envato', 'amount': 101.0, 'date': datetime.datetime(2015, 1, 28, 0, 0), 'invoice_number': '12429647', 'currency': 'USD', 'desc': 'Invoice 12429647 from Envato'}
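
Result dicts like the ones above contain `datetime` objects, which the standard `json` module cannot serialise on its own. The sketch below shows one way to write such dicts to JSON and CSV using only the standard library. It is not the tool's own output writer; the helper names `to_json` and `to_csv` and the choice of ISO 8601 dates are assumptions made for this example.

```python
import csv
import io
import json
from datetime import datetime

# Two of the result dicts shown above.
results = [
    {
        "issuer": "Amazon Web Services",
        "amount": 4.11,
        "date": datetime(2014, 8, 3, 0, 0),
        "invoice_number": "42183017",
        "currency": "USD",
        "desc": "Invoice 42183017 from Amazon Web Services",
    },
    {
        "issuer": "Envato",
        "amount": 101.0,
        "date": datetime(2015, 1, 28, 0, 0),
        "invoice_number": "12429647",
        "currency": "USD",
        "desc": "Invoice 12429647 from Envato",
    },
]


def to_json(rows):
    """Serialise rows to JSON, writing datetimes as ISO 8601 strings."""

    def encode(value):
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"Cannot serialise {type(value).__name__}")

    return json.dumps(rows, default=encode, indent=2)


def to_csv(rows):
    """Serialise rows to CSV text with one column per key of the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the calendar date in the CSV column.
        writer.writerow({**row, "date": row["date"].date().isoformat()})
    return buffer.getvalue()


print(to_json(results))
print(to_csv(results))
```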

Quickstart

pip install invoice2data
invoice2data invoice.pdf                          # extract -> CSV
invoice2data --output-format json invoice.pdf     # or JSON / XML

As a Python library:

from invoice2data import extract_data

result = extract_data("invoice.pdf")

No system libraries are required by default — the pdfium backend bundles its own engine. Optional backends and extras (poppler, OCR, AI, ...) are covered in the installation guide.
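
For more than one file, the library call can be wrapped in a small loop. The helper below is a sketch, not part of invoice2data: `extract_folder` is a hypothetical name, and it takes the extractor as a parameter so that it does not depend on any particular library version. Treating a falsy return value as "nothing extracted" is an assumption that should be checked against the usage documentation.

```python
from pathlib import Path


def extract_folder(folder, extract):
    """Run `extract` on every PDF in `folder`, in file-name order.

    `extract` is any callable that takes a file path and returns a dict.
    A falsy return value is treated as "nothing extracted" (assumption).
    Returns a (extracted, skipped) pair: a dict keyed by file name and
    a list of the file names that produced no result.
    """
    extracted = {}
    skipped = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        result = extract(str(pdf))
        if result:
            extracted[pdf.name] = result
        else:
            skipped.append(pdf.name)
    return extracted, skipped


# Intended use, with "invoices" as a placeholder folder name:
#
#     from invoice2data import extract_data
#     extracted, skipped = extract_folder("invoices", extract_data)
```

Keeping the list of skipped files makes it easy to see which invoices still need a template.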

Documentation

Full documentation: https://invoice2data.readthedocs.io/

How it works — the extraction pipeline
Installation — backends, OCR and optional extras
Usage — all CLI options and common tasks
Template creation — write templates for your invoices
Recommended fields — the canonical output schema
AI features — optional LLM fallback & template generation
FAQ — including a comparison with other tools

Development

If you are interested in improving this project, have a look at our contributor guide to get you started quickly.

Roadmap and open tasks

integrate with online OCR?
try to 'guess' parameters for new invoice formats
apply machine learning to guess new parameters / template creation
data cleanup per field
advanced table parsing with pypdf_table_extraction

Contributors and Credits

Harshit Joshi: contributed as a Google Summer of Code student.
Holger Brunn: added support for parsing invoice items.

Contributions are very welcome. To learn more, see the Contributor Guide.

Used By

Odoo, OCA module account_invoice_import_invoice2data

Related Projects

OCR-Invoice (FOSS | C#)
DeepLogic AI (Commercial | SaaS)
Docparser (Commercial | Web Service)
A-PDF (Commercial)
PDFdeconstruct (Commercial)
CVision (Commercial)

Release history

1.0.0

Jun 23, 2026

0.5.0

May 23, 2026

0.4.7

May 22, 2026

0.4.6

May 22, 2026

0.4.5

Nov 26, 2023

0.4.4

Apr 8, 2023

0.4.3

Mar 31, 2023

0.4.2

Feb 11, 2023

0.4.1

Feb 6, 2023

0.4.0

Dec 12, 2022

0.3.6

Jun 21, 2021

0.3.5

Aug 21, 2019

0.3.4

Jun 21, 2019

0.3.3

Jan 30, 2019

0.3.2

Nov 28, 2018

0.3.1

Nov 9, 2018

0.2.103

Nov 9, 2018

0.2.101

Sep 8, 2018

0.2.100

Aug 17, 2018

0.2.99

Aug 9, 2018

0.2.98

Jun 8, 2018

0.2.97

May 27, 2018

0.2.96

May 27, 2018

0.2.95

May 27, 2018

0.2.94

May 27, 2018

0.2.93

May 24, 2018

0.2.92

May 22, 2018

0.2.91

May 21, 2018

0.2.90

May 21, 2018

0.2.89

May 20, 2018

0.2.88

May 15, 2018

0.2.87

May 15, 2018

0.2.86

May 14, 2018

0.2.85

May 13, 2018

0.2.84

May 6, 2018

0.2.83

May 2, 2018

0.2.82

Apr 20, 2018

0.2.81

Mar 20, 2018

0.2.80

Mar 19, 2018

0.2.79

Mar 18, 2018

0.2.78

Mar 15, 2018

0.2.77

Mar 14, 2018

0.2.76

Feb 26, 2018

0.2.75

Feb 26, 2018

0.2.74

Feb 17, 2018

0.2.73

Feb 16, 2018

0.2.72

Feb 15, 2018

0.2.71

Feb 15, 2018

0.2.70

Jan 23, 2018

0.2.69

Jan 10, 2018

0.2.67

Dec 1, 2017

0.2.66

Nov 7, 2017

0.2.65

Oct 3, 2017

0.2.64

Sep 29, 2017

0.2.63

Sep 29, 2017

0.2.62

Sep 26, 2017

0.2.61

Aug 31, 2017

0.2.59

Jul 4, 2017

0.2.58

Jun 20, 2017

0.2.56

Jun 14, 2017

0.2.55

May 31, 2017

0.2.54

May 24, 2017

0.2.53

May 18, 2017

0.2.51

Mar 29, 2017

0.2.49

Mar 23, 2017

0.2.47

Mar 8, 2017

0.2.45

Mar 8, 2017

0.2.44

Mar 8, 2017

0.2.43

Feb 3, 2017

0.2.42

Jan 23, 2017

0.2.41

Jan 4, 2017

0.2.40

Dec 29, 2016

0.2.39

Dec 16, 2016

0.2.38

Nov 13, 2016

0.2.36

Oct 6, 2016

0.2.34

Oct 4, 2016

0.2.33

Sep 30, 2016

0.2.31

Sep 30, 2016

0.2.30

Sep 28, 2016

0.2.29

Jun 25, 2016

0.2.28

Jun 7, 2016

0.2.27

May 25, 2016

0.2.26

May 14, 2016

0.2.25

May 14, 2016

0.2.24

May 14, 2016

0.2.22

May 14, 2016

0.2.21

May 14, 2016

0.2.20

May 14, 2016

0.2.19

May 14, 2016

0.2.18

May 14, 2016

0.2.17

May 14, 2016

0.2.16

May 14, 2016

0.2.15

May 14, 2016

0.2.14

Apr 3, 2016

0.2.13

Apr 2, 2016

0.2.10

Apr 2, 2016

0.2.9

Apr 2, 2016

0.2.8

Apr 2, 2016

0.2.5

Mar 30, 2016

0.2.4

Mar 30, 2016

0.2.3

Mar 30, 2016

0.2.2

Mar 30, 2016

0.2.1

Mar 30, 2016

0.2.0

Jan 23, 2016

0.1.2

Jan 2, 2016

0.0.1

Dec 26, 2015

Download files

Source Distribution

File	Size	Uploaded
invoice2data-1.0.0.tar.gz	171.6 kB	Jun 23, 2026

Built Distributions

All wheels were uploaded on Jun 23, 2026.

File	Size	Python	Platform
invoice2data-1.0.0-cp313-cp313-win_amd64.whl	314.0 kB	CPython 3.13	Windows x86-64
invoice2data-1.0.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl	444.9 kB	CPython 3.13	manylinux: glibc 2.17+ x86-64; manylinux: glibc 2.28+ x86-64
invoice2data-1.0.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl	438.8 kB	CPython 3.13	manylinux: glibc 2.17+ ARM64; manylinux: glibc 2.28+ ARM64
invoice2data-1.0.0-cp313-cp313-macosx_11_0_arm64.whl	356.1 kB	CPython 3.13	macOS 11.0+ ARM64
invoice2data-1.0.0-cp313-cp313-macosx_10_13_x86_64.whl	360.9 kB	CPython 3.13	macOS 10.13+ x86-64
invoice2data-1.0.0-cp312-cp312-win_amd64.whl	314.4 kB	CPython 3.12	Windows x86-64
invoice2data-1.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl	447.6 kB	CPython 3.12	manylinux: glibc 2.17+ x86-64; manylinux: glibc 2.28+ x86-64
invoice2data-1.0.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl	441.7 kB	CPython 3.12	manylinux: glibc 2.17+ ARM64; manylinux: glibc 2.28+ ARM64
invoice2data-1.0.0-cp312-cp312-macosx_11_0_arm64.whl	357.1 kB	CPython 3.12	macOS 11.0+ ARM64
invoice2data-1.0.0-cp312-cp312-macosx_10_13_x86_64.whl	361.8 kB	CPython 3.12	macOS 10.13+ x86-64
invoice2data-1.0.0-cp311-cp311-win_amd64.whl	313.5 kB	CPython 3.11	Windows x86-64
invoice2data-1.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl	438.8 kB	CPython 3.11	manylinux: glibc 2.17+ x86-64; manylinux: glibc 2.28+ x86-64
invoice2data-1.0.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl	434.4 kB	CPython 3.11	manylinux: glibc 2.17+ ARM64; manylinux: glibc 2.28+ ARM64
invoice2data-1.0.0-cp311-cp311-macosx_11_0_arm64.whl	356.3 kB	CPython 3.11	macOS 11.0+ ARM64
invoice2data-1.0.0-cp311-cp311-macosx_10_9_x86_64.whl	359.5 kB	CPython 3.11	macOS 10.9+ x86-64
invoice2data-1.0.0-cp310-cp310-win_amd64.whl	313.5 kB	CPython 3.10	Windows x86-64
invoice2data-1.0.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl	437.7 kB	CPython 3.10	manylinux: glibc 2.17+ x86-64; manylinux: glibc 2.28+ x86-64
invoice2data-1.0.0-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl	432.9 kB	CPython 3.10	manylinux: glibc 2.17+ ARM64; manylinux: glibc 2.28+ ARM64
invoice2data-1.0.0-cp310-cp310-macosx_11_0_arm64.whl	357.3 kB	CPython 3.10	macOS 11.0+ ARM64
invoice2data-1.0.0-cp310-cp310-macosx_10_9_x86_64.whl	360.4 kB	CPython 3.10	macOS 10.9+ x86-64

File details

Details for the file invoice2data-1.0.0.tar.gz.

File metadata

Download URL: invoice2data-1.0.0.tar.gz
Upload date: Jun 23, 2026
Size: 171.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for invoice2data-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`1f3cb559b7e7aa7ee4f7d7dc061e4d45cfd84de47c0f34c45cd5e7b10fce018e`
MD5	`84c990f125acfbc4cbe6d526ed9b1571`
BLAKE2b-256	`12704a4fc5d618b49abed3ac557d3375c22d2f131c83a02d1d06d3afc377a575`

Algorithm	Hash digest
SHA256	`a3596826e2852b800589859ed0cd0cac7adc5605ac0c853f2f7fffb8b87cc1a8`
MD5	`4d206b9e44cbd336cdbe3c03dbe75896`
BLAKE2b-256	`290acfd1fe3aa1697e0068d08144793a680a81596e1b68d25f2fde6df33f322f`

Algorithm	Hash digest
SHA256	`191dfc4a605e578b017b0246b60c70e348a9b9c52525e26dbd28793c46bd79ae`
MD5	`1ccb6b68d86b768a2c552e582245fd3b`
BLAKE2b-256	`3d16fcceb43868eb19aec0d9513fe19985e92757b3f18d104b8aacb6723775bd`

Algorithm	Hash digest
SHA256	`a47ba6fcaf8a9d1be60dd6cee12561467138590c5ba6eb44721ab2bebf4300b1`
MD5	`c125b53368692ab64b04745ae4a23d19`
BLAKE2b-256	`3bcb48f546da3826284b3836bef6d5946740b1ca881c4c9cfe89e5d7ad67b9b7`

Algorithm	Hash digest
SHA256	`83dfce30860de002224e9f6d6566aa6876a87efeb70bace8db0e6b4b314b279a`
MD5	`48f776be110026d9b560236a1f141821`
BLAKE2b-256	`c97db8ac9b19af3bd55bf4cd5187631f62575ab8b8c9f1b987b40e2ea95fd554`

Algorithm	Hash digest
SHA256	`83f39cc77f14880d2a957b73acf99b55689cb9f6e13ab2b6335226be8397b678`
MD5	`22c01672ed83312bd85f4209c1e6e8a0`
BLAKE2b-256	`ac578684082c03cf8636ff3334469ef38bfda8d84c0dc619a9507f28fc4d515a`

Algorithm	Hash digest
SHA256	`20d555f2b2da954060efa43db95b6a7b9daa5f761a9f3df5623fae889f0299f3`
MD5	`7263434f6c219230979c74aae10b8c20`
BLAKE2b-256	`e171d6ac86514b386e927aec621b98cf48adefefebaea511321e630c2817f5e1`

Algorithm	Hash digest
SHA256	`5cade2be43bbc23822347c252c16b465402135ecf8095bc073c2474a93c42edf`
MD5	`3d5ec6046a9574cb838a52eb0f33527e`
BLAKE2b-256	`8f4bba17065390edad15f5a4e32f0fc126a362b32b5f4ae741f6a359e23d7120`

Algorithm	Hash digest
SHA256	`0836b757d55284166f407930a49bac1e05baf11281a18330cbf7bd69369ee44d`
MD5	`a023052c245f822af640b545bfaa1a8f`
BLAKE2b-256	`1362a50de4e903224b7d2b1b957c6a5e282f7133531360436d4efcd84ba22672`

Algorithm	Hash digest
SHA256	`694df92fe3bafc904ff05c4180d22d58b706a755a5ffab94564da5cdb5025737`
MD5	`6fc152f63dba3b473c41cba5d6412313`
BLAKE2b-256	`fa22c0a102ceb2eb21743679e42f47cdf96ba5ec1d5b4945bdbcb3b01597ff7d`

Algorithm	Hash digest
SHA256	`d7b336690a7c9ed920bfacf2ff9cb288ec44b76e39a6c5bc8c3a591b00e27934`
MD5	`93d65c72e02e1d4eed36ce7ac1c38dae`
BLAKE2b-256	`6a444ce0c5cfd54c0ff7fc1bb5168a740689102632de8a268ef11ad22ed79257`

Algorithm	Hash digest
SHA256	`f199261ca203fb7a4246b22cc56cb628ad8a6118af30f7b752c7ff8dda9097fd`
MD5	`b32ffc53a6367fd4f00f77aa7dce0f51`
BLAKE2b-256	`452d29808d54957c05abb351dac0685087c382187c07871d04425d0219d04f03`

Algorithm	Hash digest
SHA256	`aad1932561c61a51506f6e8e332ef6a36c620decf43c5ddb2603f9ec9118210c`
MD5	`23ca9ee77cbbce49ef6ee186195b8c33`
BLAKE2b-256	`6d8b5926b9c3a1422cae327675871fadf84c193ed2b1b5801e2cb0ee7d88b9bd`

Algorithm	Hash digest
SHA256	`45eba6b095aa9c79bff72ebca9b32af3eaf16ffb99babf81a7d4585b4669c09b`
MD5	`16401988b32323a3b1b9fcc05d469b03`
BLAKE2b-256	`b3025d093a0b2ff9545e1247d1cfe7ef059e4f41074fc0a6c637f5ba1a61207d`

Algorithm	Hash digest
SHA256	`ba04391c3159815ef8074d81bed850635235f6a10e9d3f0fa4219ba3ba9b15d1`
MD5	`f4ff74b02309cb5a92c7945ffa8fafdf`
BLAKE2b-256	`fcae4ae5541e5f1294cd6f8e755baa9773b77b274cc63581e61a8ab4fb5b1b46`

Algorithm	Hash digest
SHA256	`834b853271d5d360014d7e31f45c0c28824a041fbb13d104cde9dc5b2bcf82a4`
MD5	`3b6545fc5a0710da8336b8cf8d133ff5`
BLAKE2b-256	`7cbc5d11b8045a0f19195a2d2156bc252a0f040ba515f0068060420dcb290f89`

Algorithm	Hash digest
SHA256	`2cb584008acfb0bdab7da83d397e7dd12b769c0abf5a0ca7c737625ee5cf5726`
MD5	`73b39b0b4ccad3a5190531eeef70c452`
BLAKE2b-256	`98571121cde8a05c8d957a2101fa9605b31bd60fbc7a3ae4c845cdf4d052aa4d`

Algorithm	Hash digest
SHA256	`4029ad09cc945f49c2b3220ce3b99f900504cb8ce9984093b141a044e9965578`
MD5	`e6e86b0f333ea86e326bb317926c8a72`
BLAKE2b-256	`00449c794123f06ea41ee0bf3964940f7fdc06487d90c3cabe6a4c0462aa67a7`

Algorithm	Hash digest
SHA256	`52a2e3e74b66ed0ee694f450855b208d7eeca4e9d7ae59902df801a10e9a1593`
MD5	`a08132ebbe41b7dc358a202d639c0275`
BLAKE2b-256	`c1656825920847447c890b89b2e3b30d6905f1961c6faa595caea362fc4c247a`

Algorithm	Hash digest
SHA256	`689a487d3f99b0b02b738ef027587a93e478caa3f6a52fc1eb11d3a825d3bbc8`
MD5	`26ecef343f7776b40d03cb505af3b366`
BLAKE2b-256	`45cbb3c2af1bc906b8d2bd3fe3b8d82e9fce4632834beb5c9e40d0502d00d361`

Algorithm	Hash digest
SHA256	`0c38d74a43f537fa9fe7dbcfe6b97779891b6635658650048d14752e0e85e686`
MD5	`655b1f72b10bb13c6f8c9515c61ff364`
BLAKE2b-256	`9321d8576027759e2bd916ea15b225882f79ceb79426f82602795e163174bb41`
