
Package to decode and extract invoice metadata from an AFIP CAE qr code link

AFIP invoice pdf qr CAE extract and decode

This is a Python package that uses pdf2image to convert the first page of an AFIP invoice PDF into an image, then runs qreader on that image to locate and decode the AFIP CAE QR code and extract relevant invoice metadata, such as:

  • Invoice date
  • CUIT of invoice creator
  • AFIP electronic invoice point of sale (Punto de venta)
  • Invoice number
  • Amount
  • Currency
  • CUIT of invoice recipient

And other less important properties.
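The render-then-decode flow described above can be sketched roughly as follows. This is not the package's actual source: the function names `first_decoded` and `decode_first_page_qr` are made up for illustration, and the sketch assumes that pdf2image (which itself needs Poppler installed), numpy, and qreader are available. The third-party imports are deferred into the function so the pure helper can be used without them.

```python
from typing import Iterable, Optional


def first_decoded(results: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first non-empty decoded string, or None if nothing decoded."""
    for text in results:
        if text:
            return text
    return None


def decode_first_page_qr(pdf_path: str) -> Optional[str]:
    """Render page 1 of a PDF and return the first QR payload found on it.

    Hypothetical helper: it mirrors the steps described in this README,
    not the package's real implementation.
    """
    # Imported lazily so first_decoded() works without these packages.
    import numpy as np
    from pdf2image import convert_from_path
    from qreader import QReader

    # Rasterize only the first page of the invoice.
    pages = convert_from_path(pdf_path, first_page=1, last_page=1)
    if not pages:
        return None

    # qreader expects an RGB numpy array.
    image = np.array(pages[0].convert("RGB"))

    # detect_and_decode returns one entry per detected QR code;
    # entries that could not be decoded are None.
    return first_decoded(QReader().detect_and_decode(image=image))
```

In practice you would call `get_cae_metadata` from this package instead; the sketch only shows where each dependency fits.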

Why qreader instead of pyzbar

Initially this library used only pyzbar. However, we came across some QR codes that pyzbar alone could not decode successfully.

qreader depends on pyzbar, but uses a pre-trained AI model to detect and segment QR codes. Using the information extracted by this model, it applies several image preprocessing techniques that greatly increase pyzbar's decoding rate.

Example Usage and notes about metadata

Using the included sample file for demonstration (run from the repository root):

from afipcaeqrdecode import get_cae_metadata

invoice_metadata = get_cae_metadata('./tests/sample_files/2000005044986390.pdf')

Here, invoice_metadata will evaluate to:

{
    "ver":1,
    "fecha":"2023-02-10", #I've found this field to be missing in some decodes
    "cuit":30710145764,
    "ptoVta":4,
    "tipoCmp":1,
    "nroCmp":25399,
    "importe":2460,
    "moneda":"PES",
    "ctz":1,
    "tipoDocRec":80,
    "nroDocRec":30717336905,
    "tipoCodAut":"E",
    "codAut":73064176949471
}

# The actual output will not be pretty-printed; it will be stripped of all whitespace and formatting characters.
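Because fields such as "fecha" can be missing, consumer code is safer reading the metadata with defaults than indexing it directly. The sketch below is one way to do that. It assumes the metadata is available as a Python dict (parse it with json.loads first if you have a string). The helper `summarize_invoice` and the English key names are illustrative only, and the mapping between QR keys and the properties listed earlier is my reading of the sample output.

```python
from typing import Any, Dict, Optional


def summarize_invoice(meta: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Map QR keys to descriptive names; keys absent from meta become None.

    Hypothetical helper, not part of afipcaeqrdecode.
    """
    return {
        "invoice_date": meta.get("fecha"),
        "issuer_cuit": meta.get("cuit"),
        "point_of_sale": meta.get("ptoVta"),
        "invoice_number": meta.get("nroCmp"),
        "amount": meta.get("importe"),
        "currency": meta.get("moneda"),
        "recipient_document": meta.get("nroDocRec"),
    }


# Same values as the sample above, but with "fecha" left out on purpose.
sample = {
    "ver": 1,
    "cuit": 30710145764,
    "ptoVta": 4,
    "tipoCmp": 1,
    "nroCmp": 25399,
    "importe": 2460,
    "moneda": "PES",
    "ctz": 1,
    "tipoDocRec": 80,
    "nroDocRec": 30717336905,
    "tipoCodAut": "E",
    "codAut": 73064176949471,
}

summary = summarize_invoice(sample)
if summary["invoice_date"] is None:
    print("Invoice date missing from the QR payload")
```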

Salvaging bad QR code z-indexing on invoices, bad AFIP CAE URLs, and bad JSON

Some bad PDFs have other images overlapping the AFIP CAE QR code, so we implemented a second-run code path that uses PyMuPDF to extract all images embedded in the invoice and then runs qreader on them.

When the AFIP CAE QR URL was constructed incorrectly or has parts missing, we try to decode it anyway.
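To illustrate what tolerant decoding can look like, here is a standard-library sketch. It assumes the QR link format published by AFIP, in which the invoice JSON is base64-encoded into the `p` query parameter of `https://www.afip.gob.ar/fe/qr/`. The helpers `extract_payload` and `decode_payload` are hypothetical and are not the package's own code. They cope with three common kinds of damage: a missing URL prefix, stripped `=` padding, and `+` characters turned into spaces.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit


def extract_payload(text: str) -> str:
    """Return the base64 payload from a full QR URL or from a bare payload."""
    text = text.strip()
    values = parse_qs(urlsplit(text).query).get("p")
    if values:
        # parse_qs decodes a literal "+" as a space; undo that for base64.
        return values[0].replace(" ", "+")
    # No "p" parameter found: assume the text is already the bare payload.
    return text


def decode_payload(payload: str) -> dict:
    """Decode a base64 JSON payload, tolerating missing padding."""
    # Accept both the standard and the URL-safe base64 alphabets.
    normalized = payload.replace("-", "+").replace("_", "/")
    # Drop whatever padding survived, then restore the correct amount.
    normalized = normalized.rstrip("=")
    normalized += "=" * (-len(normalized) % 4)
    return json.loads(base64.b64decode(normalized))


# Build a small demonstration payload (values taken from the sample above).
sample = {"ver": 1, "cuit": 30710145764, "nroCmp": 25399}
encoded = base64.b64encode(
    json.dumps(sample, separators=(",", ":")).encode("utf-8")
).decode("ascii")

full_url = "https://www.afip.gob.ar/fe/qr/?p=" + encoded
damaged = encoded.rstrip("=")  # bare payload with its padding lost

print(decode_payload(extract_payload(full_url)))
print(decode_payload(extract_payload(damaged)))
```

Payloads that are truncated beyond repair still raise an exception here, so callers should be prepared to catch `ValueError` (both `binascii.Error` and `json.JSONDecodeError` derive from it).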

We came across many decoded metadata payloads with bad JSON that had to be repaired in the consumer application. With this in mind, we included json-repair (https://pypi.org/project/json-repair/) and turned it on by default.

System Dependencies and their installation

This package depends on qreader, which depends on pyzbar, which in turn depends on the system library ZBar.

Check your OS documentation on what package to install to get ZBar working with pyzbar.

On Linux (Ubuntu 22.04):

sudo apt-get install libzbar0

On Mac OS X:

brew install zbar
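After installing, a quick way to check whether a ZBar shared library is visible to Python is the standard library's `ctypes.util.find_library`. This is only an approximation: pyzbar has its own loading logic, so a result of None here does not strictly prove that pyzbar will fail, nor does a match guarantee it will load. The function name `zbar_library_name` is illustrative.

```python
from ctypes.util import find_library
from typing import Optional


def zbar_library_name() -> Optional[str]:
    """Return the name or path of the ZBar shared library, or None if not found."""
    return find_library("zbar")


found = zbar_library_name()
if found is None:
    print("ZBar was not found; install it with your OS package manager.")
else:
    print(f"ZBar found: {found}")
```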

Installation using pip

After installing the system dependencies, install the package from PyPI:

pip install afipcaeqrdecode

First run notice

On first run, qreader will download the weights for its QR detector AI model; program operation then resumes automatically.

WARNING

This is an experimental package, USE IN PRODUCTION AT YOUR OWN RISK.

It is barely tested; I'm sharing it so that I can import it as a PyPI package in another project that consumes it.

Credits

All the other library authors this package depends on. Facundo Mainere for helping with JWT decode.

Author: Emiliano Mesquita.

License

GNU LGPLv3.
