Mr. Franz Cucaracha will be glad to assist you with your document analysis and processing routine

cucaracha 🪳

cucaracha is inspired by Gregor Samsa, the famous character of Franz Kafka's The Metamorphosis: an ordinary man who wakes up to find himself transformed into a cockroach. Here, cucaracha embodies the metaphor of a tireless, sometimes bureaucratic helper working in the background. In the digital age, Mr. Cucaracha is here to assist you with the complex and often tedious tasks of document processing and analysis.

Meet Mr. Cucaracha: Your Assistant for Digital Document Processing and Analysis

cucaracha is an open-source library crafted to help with digital document analysis and processing. It provides a toolkit for working with both structured and unstructured data, allowing users to collect, transform, and interpret textual content from various document formats, including PDFs and images.

Key Features

  • Text Extraction: Efficiently retrieve text from PDFs and image files, transforming them into usable data.
  • Content Structuring: Process extracted text into structured formats, aiding in more organized data handling and downstream applications.
  • Context Recognition: Perform contextual analysis to interpret and label document content based on intended usage.
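The three features above form a pipeline: extract text, structure it, then label it. The sketch below illustrates only the last two stages on text that has already been extracted, using the Python standard library. It is not cucaracha's API: the names `structure_text`, `recognize_context`, `process`, `StructuredDocument`, and the keyword lists are all hypothetical, and the keyword matching stands in for the ML models the project actually uses.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword lists, chosen only for this illustration.
LABEL_KEYWORDS = {
    "invoice": ("invoice", "amount due", "bill to"),
    "contract": ("agreement", "party", "hereby"),
}


@dataclass
class StructuredDocument:
    label: str                                   # result of context recognition
    fields: dict = field(default_factory=dict)   # "key: value" pairs found in the text
    lines: list = field(default_factory=list)    # cleaned, non-empty lines


def structure_text(raw_text):
    """Content structuring: clean the lines and collect 'Key: value' fields."""
    lines = [ln.strip() for ln in raw_text.splitlines() if ln.strip()]
    fields = {}
    for ln in lines:
        match = re.match(r"^([A-Za-z ]+):\s*(.+)$", ln)
        if match:
            fields[match.group(1).strip().lower()] = match.group(2).strip()
    return lines, fields


def recognize_context(lines):
    """Context recognition: pick the label whose keywords occur most often."""
    text = " ".join(lines).lower()
    scores = {
        label: sum(text.count(keyword) for keyword in keywords)
        for label, keywords in LABEL_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def process(raw_text):
    """Run structuring and labeling on already-extracted text."""
    lines, fields = structure_text(raw_text)
    return StructuredDocument(label=recognize_context(lines), fields=fields, lines=lines)


if __name__ == "__main__":
    sample = "INVOICE\n\nBill To: Gregor Samsa\nAmount Due: 42.00\n"
    print(process(sample))
```

The text extraction step itself (reading PDFs or images) is left out on purpose, since it depends on the library's own functions; see the full documentation for those.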

The main objective of this project is to offer an accessible, open-source alternative for processing document files, with analysis algorithms that simplify tasks that would traditionally be time-consuming or challenging to automate.

Check out all the public datasets and ML models used in this project at Kaggle - Cucaracha Project.

Why cucaracha?

The name cucaracha reflects the tireless, behind-the-scenes nature of the tool. Like Kafka's transformed character, Mr. Cucaracha deals with the mundanity and bureaucracy often present in document processing tasks. He's designed to tackle these repetitive and complex tasks with minimal oversight, ensuring efficient and structured data extraction without the typical hurdles of document handling.

Getting Started

Check out the full documentation for detailed instructions on how to use, implement, and keep up with updates to cucaracha.

Contributing to cucaracha

We welcome contributions to cucaracha! To get involved, take a look at the open issues and join us in enhancing Mr. Cucaracha's capabilities. Whether you're here to fix bugs, suggest features, or work on documentation, your input is valuable to the project.

Happy document processing with Mr. Cucaracha! 🪳

How to install

The quickest way to install is via pip, as follows:

Note: The installation requires Python 3.9 or higher.

pip install cucaracha
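To confirm the environment after installing, a small check like the following may help. It is a minimal sketch that uses only the standard library (`sys` and `importlib.metadata`); the helper names `python_is_supported` and `installed_version` are made up for this example and are not part of cucaracha.

```python
import sys
from importlib import metadata

# Minimum interpreter version stated in the note above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("cucaracha version:", installed_version("cucaracha"))
```

If the second line prints `None`, pip installed the package into a different interpreter or environment than the one running the script.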
