Doxstractor extracts structured data from text in an easily configurable way.
Doxstractor 📄➡️📊
Doxstractor is a modular library for extracting structured data from documents using LLMs.
There are many situations where you want to extract data such as numbers, text or categories from a large set of documents. Doxstractor was created with M&A due diligence in mind. When a company is sold, the prospective buyer receives a data room containing everything from key employment contracts to real estate leases.
People then need to go through all these documents and extract key information, such as "How many stock options have been granted?" or "Does this lease contain a break clause?". This data is first compiled into spreadsheets and then written up in the due diligence report. The work is tedious for the people doing it and expensive for the people buying the report.
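To make the workflow above concrete, here is a minimal sketch of the "ask each document the same questions, then compile a spreadsheet" pattern. It does not use the Doxstractor API: every name in it (`Field`, `extract_rows`, `to_csv`, the two answer functions) is hypothetical, and simple string matching stands in for the LLM call so that the example runs without any model or network access.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# All names below are hypothetical illustrations, NOT the doxstractor API.


@dataclass
class Field:
    name: str                                # column header in the spreadsheet
    question: str                            # what a reviewer asks of each document
    answer: Callable[[str], Optional[str]]   # stand-in for an LLM-backed extractor


def stock_options_granted(text: str) -> Optional[str]:
    # Numeric extraction: take the number directly before "stock options".
    match = re.search(r"([\d,]+)\s+stock options", text, flags=re.IGNORECASE)
    return match.group(1).replace(",", "") if match else None


def has_break_clause(text: str) -> Optional[str]:
    # Categorical (yes/no) extraction based on a keyword check.
    return "yes" if "break clause" in text.lower() else "no"


FIELDS = [
    Field("stock_options_granted",
          "How many stock options have been granted?",
          stock_options_granted),
    Field("break_clause",
          "Does this lease contain a break clause?",
          has_break_clause),
]


def extract_rows(documents: Dict[str, str], fields: List[Field]) -> List[Dict[str, str]]:
    # One row per document, one column per field; missing answers become "".
    rows = []
    for doc_name, text in documents.items():
        row = {"document": doc_name}
        for field in fields:
            value = field.answer(text)
            row[field.name] = value if value is not None else ""
        rows.append(row)
    return rows


def to_csv(rows: List[Dict[str, str]], fields: List[Field]) -> str:
    # Serialize the rows as CSV text, ready to open in a spreadsheet.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["document"] + [f.name for f in fields])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


documents = {
    "employment_contract.txt": "The employee is granted 10,000 stock options vesting over four years.",
    "office_lease.txt": "The tenant may terminate under the break clause after year five.",
}

rows = extract_rows(documents, FIELDS)
print(to_csv(rows, FIELDS))
```

In a real pipeline the two answer functions would be replaced by calls to a language model. The point of the sketch is only the shape of the data: a fixed list of typed questions applied to every document, yielding one spreadsheet row per document.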
Tutorial
Install using pip:
pip install doxstractor
Hashes for doxstractor-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3f133cbed241f892071c3d46ac1aeb45f2746d23f450cdd21c2efa41195db776
MD5 | ebb996b17dcd9ddc3abaab5531451a33
BLAKE2b-256 | 4c46248fb501aba2de27740ff066001dad9131dcbcb11a6ec5af68d84bfef0f6