
Doxstractor extracts structured data from text in an easily configurable way.

Project description

Doxstractor 📄➡️📊

Doxstractor is a modular library for extracting structured data from documents using LLMs.

There are many situations where you want to extract data such as numbers, text or categories from a bunch of documents. Doxstractor was created with M&A due diligence in mind. When a company is sold, the prospective buyer receives a data room containing everything from key employment contracts to real estate leases.

People then need to go through all these documents and extract key information, such as "How many stock options have been granted?" or "Does this lease contain a break clause?". This data is first compiled into spreadsheets and then written up in the due diligence report. It is tedious for the people doing it and expensive for the people buying the report.
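To make the workflow above concrete, here is a minimal sketch of the "ask each document a question, then compile a spreadsheet" loop. It does not use the Doxstractor API: the functions `count_stock_options`, `has_break_clause`, `compile_rows` and `to_csv`, the sample documents, and the column names are all hypothetical, and simple string matching stands in for the LLM call.

```python
import csv
import io
import re
from typing import Dict, List, Optional


# Hypothetical stand-ins for LLM-backed extractors. NOT the Doxstractor API.
def count_stock_options(text: str) -> Optional[int]:
    """Return the first number that directly precedes 'stock options', if any."""
    match = re.search(r"(\d[\d,]*)\s+stock options", text)
    return int(match.group(1).replace(",", "")) if match else None


def has_break_clause(text: str) -> bool:
    """Return True if the text mentions a break clause."""
    return "break clause" in text.lower()


# Made-up example documents standing in for a data room.
documents = {
    "employment_contract.txt": "The employee is granted 10,000 stock options.",
    "office_lease.txt": "The tenant may terminate early under the break clause.",
}


def compile_rows(docs: Dict[str, str]) -> List[dict]:
    """Ask every document the same questions and collect one row per document."""
    return [
        {
            "document": name,
            "stock_options_granted": count_stock_options(text),
            "has_break_clause": has_break_clause(text),
        }
        for name, text in docs.items()
    ]


def to_csv(rows: List[dict]) -> str:
    """Serialise the rows as CSV text, ready to open in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["document", "stock_options_granted", "has_break_clause"],
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(compile_rows(documents)))
```

In a real setup the two extractor functions would be replaced by calls to a language model, while the surrounding loop and the spreadsheet export would stay much the same.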

Tutorial

Install using pip: pip install doxstractor
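After installing, you may want to confirm that the package is visible to your Python environment. The sketch below uses only the standard library's `importlib.metadata` and the distribution name `doxstractor` from the pip command above. The helper name `installed_version` is illustrative and not part of Doxstractor.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("doxstractor")
    if found is None:
        print("doxstractor is not installed in this environment")
    else:
        print(f"doxstractor {found} is installed")
```

This checks the installed distribution metadata rather than importing the package, so it makes no assumption about the name of the importable module.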

Download files


Source Distribution

doxstractor-0.0.5.tar.gz (12.0 kB)

Uploaded Source

Built Distribution

doxstractor-0.0.5-py3-none-any.whl (14.9 kB)

Uploaded Python 3
