Enzyme lineage analysis and sequence extraction package
DEBase
DEBase is a Python package that extracts and analyzes enzyme lineage data from scientific papers, using the Google Gemini API to parse them.
Features
- Extract enzyme variant lineages from PDF documents
- Parse protein and DNA sequences with mutation annotations
- Extract reaction performance metrics: yield, total turnover number (TTN), and enantiomeric excess (ee)
- Extract and organize substrate scope data
- Match enzyme variants across different data sources using AI
- Generate structured CSV outputs for downstream analysis
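To make "mutation annotations" concrete, here is a minimal sketch of how a point-mutation list can be applied to a parent sequence. It assumes the common notation of wild-type residue, 1-based position, new residue (for example `A2V`). The helper `apply_mutations` is hypothetical and is not part of the DEBase API; the annotation format DEBase emits may differ.

```python
import re
from typing import Sequence

# Hypothetical helper, not part of DEBase. Assumes point mutations written as
# <wild-type residue><1-based position><new residue>, e.g. "A2V".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(parent: str, mutations: Sequence[str]) -> str:
    """Return the variant sequence produced by applying point mutations to parent."""
    residues = list(parent)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"Unrecognized mutation format: {mutation!r}")
        wild_type, new_residue = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # convert 1-based position to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"Position out of range in {mutation!r}")
        if parent[index] != wild_type:
            raise ValueError(
                f"{mutation!r} expects {wild_type} at position {index + 1}, "
                f"but the parent has {parent[index]}"
            )
        residues[index] = new_residue
    return "".join(residues)


variant = apply_mutations("MAKLV", ["A2V", "L4F"])
print(variant)  # MVKFV
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are a frequent source of mismatches between a reported mutation list and the parent sequence.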
Installation
pip install debase
Quick Start
# Run the complete pipeline
debase --manuscript paper.pdf --si supplementary.pdf --output results.csv
# Enable debug mode to save Gemini prompts and responses
debase --manuscript paper.pdf --si supplementary.pdf --output results.csv --debug-dir ./debug_output
# Individual components with debugging
python -m debase.enzyme_lineage_extractor --manuscript paper.pdf --output lineage.csv --debug-dir ./debug_output
python -m debase.reaction_info_extractor --manuscript paper.pdf --lineage-csv lineage.csv --output reactions.csv --debug-dir ./debug_output
python -m debase.substrate_scope_extractor --manuscript paper.pdf --lineage-csv lineage.csv --output substrate_scope.csv --debug-dir ./debug_output
python -m debase.lineage_format -r reactions.csv -s substrate_scope.csv -o final.csv -v
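The individual component commands above can also be driven from a script. The sketch below only assembles the same command lines and flags shown in this section and runs them in order with `subprocess`; the function names are illustrative and not part of DEBase. The flags suggest that the lineage CSV feeds the two extractors that follow it, so the order is kept as listed.

```python
import shlex
import subprocess
import sys
from typing import List


def build_pipeline_commands(manuscript: str, debug_dir: str) -> List[List[str]]:
    """Assemble the four component commands from the Quick Start, in order."""
    python = sys.executable
    debug = ["--debug-dir", debug_dir]
    return [
        [python, "-m", "debase.enzyme_lineage_extractor",
         "--manuscript", manuscript, "--output", "lineage.csv", *debug],
        [python, "-m", "debase.reaction_info_extractor",
         "--manuscript", manuscript, "--lineage-csv", "lineage.csv",
         "--output", "reactions.csv", *debug],
        [python, "-m", "debase.substrate_scope_extractor",
         "--manuscript", manuscript, "--lineage-csv", "lineage.csv",
         "--output", "substrate_scope.csv", *debug],
        # lineage_format takes -v for verbose output rather than --debug-dir
        [python, "-m", "debase.lineage_format",
         "-r", "reactions.csv", "-s", "substrate_scope.csv", "-o", "final.csv", "-v"],
    ]


def run_pipeline(manuscript: str, debug_dir: str) -> None:
    """Run each step and stop at the first failure (check=True raises on non-zero exit)."""
    for command in build_pipeline_commands(manuscript, debug_dir):
        subprocess.run(command, check=True)


# Print the commands instead of running them, so this is safe to try without an API key.
for command in build_pipeline_commands("paper.pdf", "./debug_output"):
    print(shlex.join(command))
```

Call `run_pipeline("paper.pdf", "./debug_output")` to execute the steps once DEBase is installed and `GEMINI_API_KEY` is set.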
Debugging
Use the --debug-dir flag to save all Gemini API prompts and responses for debugging:
- Location extraction prompts
- Sequence extraction prompts (can be very large, up to 150K characters)
- Enzyme matching prompts
- All API responses with timestamps
Note: lineage_format.py uses -v for verbose output instead of --debug-dir.
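Because some saved prompts can be very large, it can help to see which files in the debug directory take the most space. The sketch below makes no assumption about file names or layout inside the directory (neither is documented here); it simply walks every file and sorts by size. `summarize_debug_dir` is a hypothetical helper, not part of DEBase.

```python
from pathlib import Path
from typing import List, Tuple


def summarize_debug_dir(debug_dir: str) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every file, largest first."""
    root = Path(debug_dir)
    if not root.is_dir():
        return []  # nothing saved yet, or the directory does not exist
    entries = [
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    ]
    # Sort by size descending, then by name so ties have a stable order.
    entries.sort(key=lambda entry: (-entry[1], entry[0]))
    return entries


for name, size in summarize_debug_dir("./debug_output"):
    print(f"{size:>10d}  {name}")
```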
Requirements
- Python 3.8+
- Google Gemini API key (set as GEMINI_API_KEY environment variable)
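A quick preflight check before a long run can confirm both requirements. This is a minimal sketch using only the standard library; `check_requirements` is an illustrative name, and DEBase may perform its own checks.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def check_requirements(
    environ: Optional[Mapping[str, str]] = None,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    environ = os.environ if environ is None else environ
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []
    if version < (3, 8):
        problems.append(f"Python 3.8+ is required, found {version[0]}.{version[1]}")
    if not environ.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY environment variable is not set")
    return problems


for problem in check_requirements():
    print(f"Requirement not met: {problem}")
```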
Version
0.4.4
License
MIT License
Authors
DEBase Team - Caltech
Contact
Download files
Source Distribution
debase-0.5.0.tar.gz (148.1 kB)
Built Distribution
debase-0.5.0-py3-none-any.whl (150.0 kB)
File details
Details for the file debase-0.5.0.tar.gz.
File metadata
- Download URL: debase-0.5.0.tar.gz
- Upload date:
- Size: 148.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81927a43c1a58df7cfd32445ff981a9edcb339c4f1e9a866bc63631bcb0017cc |
| MD5 | 17b38a7b00e380a81f6c6f7047e8e9b5 |
| BLAKE2b-256 | 6df09d88225e1f2012339e906098c9c409dc089164a504ed87c1809152bef037 |
File details
Details for the file debase-0.5.0-py3-none-any.whl.
File metadata
- Download URL: debase-0.5.0-py3-none-any.whl
- Upload date:
- Size: 150.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b1d0c688746b80c2c7d5e932ad23c8b2760433ea18d261f8ae63d767778dc381 |
| MD5 | 4ea1c5dcd90032e44990998895edccd0 |
| BLAKE2b-256 | a9df58e9a8dfefc2fa66b489f3727f923f34a89e93744f66a36d84eccac5ece6 |