
MetaBeeAI LLM Pipeline for PDF processing and data extraction


MetaBeeAI Literature Review Pipeline

A comprehensive pipeline for extracting, analyzing, and benchmarking structured information from scientific literature using Large Language Models and Vision AI.


Required API Accounts

Before starting, you need to set up the following API accounts:

Service         | Purpose                             | Sign Up              | Cost
OpenAI          | LLM processing and evaluation       | platform.openai.com  | Pay-per-use (model dependent)
LandingLens API | PDF text extraction with vision AI  | landing.ai           | US $0.03 per page

Setting Up API Keys

Create a .env file in the project root:

# Copy the example file
cp env.example .env

# Edit .env and add your keys:
OPENAI_API_KEY=sk-proj-...your_key_here
LANDING_AI_API_KEY=...your_key_here

The .env file is automatically excluded from git for security.
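
To confirm the keys are actually visible to Python before running anything expensive, a minimal sketch (assuming the python-dotenv package is installed) is:

# Minimal check that the values from .env are loaded into the environment.
# Assumes python-dotenv is installed: pip install python-dotenv
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for key in ("OPENAI_API_KEY", "LANDING_AI_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")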


Quick Start

1. Install Dependencies

Choose the option that best fits your workflow.

Option A – Install from PyPI (recommended for using the CLI/package directly):

# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate        # Mac/Linux
# Or: venv\Scripts\activate     # Windows

# Install the published package
pip install metabeeai

# Verify the CLI is available
metabeeai --help

Option B – Install for development (editable install from the repo):

# Clone the repository if you have not already
git clone https://github.com/MetaBeeAI/MetaBeeAI.git
cd MetaBeeAI

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate        # Mac/Linux
# Or: venv\Scripts\activate     # Windows

# Install in editable mode with project dependencies
pip install -e .
# (Alternatively, pip install -r requirements.txt if you prefer)

2. Prepare Your PDFs

Organize papers under a directory that contains a papers/ subfolder, e.g. YOURDATABASE/papers/ (ideally located outside the repo checkout):

YOURDATABASE/papers/
├── 4YD2Y4J8/
│   └── 4YD2Y4J8_main.pdf
├── 76DQP2DC/
│   └── 76DQP2DC_main.pdf
└── ...

Each paper should be in its own folder with a unique alphanumeric ID.
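
If your PDFs currently sit in one flat folder, a short helper along these lines can create that layout; the source path and the use of each file's name as its paper ID are assumptions to adapt to your own naming:

# Hypothetical helper: copy loose PDFs into YOURDATABASE/papers/{ID}/{ID}_main.pdf,
# using each file's stem as the paper ID (adjust to your own ID scheme).
import shutil
from pathlib import Path

src = Path("/path/to/loose_pdfs")
papers = Path("/path/to/YOURDATABASE/papers")

for pdf in src.glob("*.pdf"):
    paper_id = pdf.stem  # e.g. 4YD2Y4J8
    dest = papers / paper_id / f"{paper_id}_main.pdf"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(pdf, dest)
    print(f"{pdf.name} -> {dest}")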

3. Run the Pipeline

See the Complete Workflow section below for the full step-by-step process.


Pipeline Overview

The repository is now published as the metabeeai Python package. Every stage can be run directly from the command line via the metabeeai CLI (installed automatically with the package). The end-to-end flow is:

PDFs → Document Processing → LLM Extraction → Human Review → Benchmarking → Analysis

Stage | Purpose | CLI command(s) | Python entry point
1. Document processing | Convert PDFs into structured JSON chunks using Vision AI + merging + deduplication | metabeeai process-pdfs | metabeeai.process_pdfs.process_all
2. LLM question answering | Ask the configured questions over the processed chunks and store answers | metabeeai llm | metabeeai.metabeeai_llm.llm_pipeline
3. Human review | Launch the BeeGUI application to validate and rate answers | metabeeai review | metabeeai.llm_review_software.beegui
4. Benchmarking & QA | Prepare benchmark data, run DeepEval, plot metrics, inspect edge cases | metabeeai prep-benchmark, metabeeai benchmark, metabeeai plot-metrics, metabeeai edge-cases (or metabeeai benchmark-all for the whole sequence) | metabeeai.llm_benchmarking.*
5. Downstream analysis | Run domain-specific analyses (trends, networks, investigations) | Python scripts in metabeeai.query_database (see docs) | metabeeai.query_database.*

Each command accepts additional flags (see metabeeai --help). You can also run the underlying Python modules directly if you prefer to script the workflow, but the CLI provides the recommended interface for both production use and local experimentation.
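
If you would rather drive the stages from a script than from an interactive shell, one simple option is to invoke the CLI itself with subprocess; a minimal sketch using only the documented commands and flags:

# Minimal sketch: run the documented CLI stages from Python.
import subprocess

papers_dir = "/path/to/YOURDATABASE/papers"

for args in (
    ["metabeeai", "process-pdfs", "--dir", papers_dir],
    ["metabeeai", "llm", "--dir", papers_dir],
):
    subprocess.run(args, check=True)  # check=True stops on the first failing stage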

For more detailed walkthroughs, refer to the documentation in docs/guide/ (particularly pipeline_overview.rst, workflow.rst, and benchmarking.rst).


Complete Workflow (CLI-first)

The metabeeai CLI exposes every stage. All commands automatically load configuration from config.py / environment variables unless you override them with flags.

1. Process PDFs to JSON

# Basic usage – processes every folder under your data directory
metabeeai process-pdfs

# Only merge/deduplicate previously processed chunks (skip expensive steps)
metabeeai process-pdfs --merge-only

# Process a subset of folders (alphabetical range) and skip API calls
metabeeai process-pdfs --start af20101 --end b2050e6 --skip-api

# Change root directory
metabeeai process-pdfs --dir /path/to/YOURDATABASE/papers

Purpose: Split PDFs, send pages to Vision AI, merge/deduplicate chunks
Output: YOURDATABASE/papers/{paper_id}/pages/merged_v2.json
Key options: --dir, --start/--end, --merge-only, --skip-split, --skip-api, --skip-merge, --skip-deduplicate, --filter-chunk-type, --pages
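
After a run, it can be worth confirming which paper folders actually produced the merged output before moving on to the LLM stage; a quick sketch using the documented output path:

# Quick check: list paper folders that are still missing pages/merged_v2.json.
from pathlib import Path

papers = Path("/path/to/YOURDATABASE/papers")
missing = [p.name for p in sorted(papers.iterdir())
           if p.is_dir() and not (p / "pages" / "merged_v2.json").exists()]

print(f"{len(missing)} paper(s) missing merged_v2.json: {missing}")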


2. Extract Information with the LLM

# Default run (balanced settings from pipeline_config.py)
metabeeai llm

# Process specific paper folders
metabeeai llm --papers 4YD2Y4J8 76DQP2DC

# Override model choices
metabeeai llm --relevance-model openai/gpt-4o-mini --answer-model openai/gpt-4o

# Point to a different data directory
metabeeai llm --dir /path/to/YOURDATABASE/papers

Purpose: Run questions.yml against processed chunks and store answers
Output: YOURDATABASE/papers/{paper_id}/answers.json
Key options: --dir, --papers, --overwrite, --relevance-model, --answer-model, --config
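
The exact schema of answers.json is documented with the LLM pipeline; for a quick look at what was written for one paper, a structure-agnostic sketch (assuming the file is a JSON object keyed by question) is:

# Peek at answers.json for a single paper; adjust the path and iteration to your schema.
import json
from pathlib import Path

path = Path("/path/to/YOURDATABASE/papers/4YD2Y4J8/answers.json")
answers = json.loads(path.read_text(encoding="utf-8"))

for question_key, value in answers.items():
    preview = json.dumps(value)[:120]  # short preview of each answer entry
    print(f"{question_key}: {preview}")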


3. Human Review

# Launch BeeGUI (requires desktop/Qt)
metabeeai review

Purpose: Inspect LLM answers alongside PDFs, capture reviewer edits/ratings
Output: YOURDATABASE/papers/{paper_id}/answers_extended.json
Notes: GUI allows filtering, editing, rating, adding reviewer notes


4. Benchmarking & Evaluation

This stage has individual commands plus a “run everything” helper.

4.1 Prepare benchmarking data

# Generate benchmark_data_gui.json (default paths)
metabeeai prep-benchmark

# Custom locations
metabeeai prep-benchmark --papers-dir /path/to/YOURDATABASE/papers \
                         --questions-yml /path/to/questions.yml \
                         --output /path/to/benchmark_data_gui.json

Purpose: Collate GUI-reviewed answers + LLM answers + retrieval context
Output: YOURDATABASE/benchmark_data_gui.json (nested structure)
Key options: --papers-dir, --questions-yml, --output

4.2 Run DeepEval benchmarking

# Basic run over all questions
metabeeai benchmark

# Filter by question and sample size, list available keys
metabeeai benchmark --list-questions
metabeeai benchmark --question bee_species --limit 5

# Adjust runtime/cost knobs
metabeeai benchmark --batch-size 10 --max-retries 3
metabeeai benchmark --use-retrieval-only
metabeeai benchmark --model gpt-4o-mini --max-context-length 150000

Purpose: Evaluate LLM answers vs reviewer answers using 5 metrics
Output: YOURDATABASE/deepeval_results/combined_results_{question}_{timestamp}.json(.jsonl)
Key options: --question, --input, --limit, --batch-size, --max-retries, --model, --max-context-length, --use-retrieval-only, --list-questions

4.3 Visualize metrics

# Create per-metric plots + summary plot
metabeeai plot-metrics

# Alternate locations
metabeeai plot-metrics --results-dir /custom/results --output-dir /custom/results

Purpose: Generate bar charts (mean ± standard error) per metric and a summary plot
Output: YOURDATABASE/deepeval_results/plots/{metric}.png and summary_metrics.png
Key options: --results-dir, --output-dir

4.4 Identify edge cases

# Default: bottom 3 per question
metabeeai edge-cases --num-cases 3

# Contextual-only run with custom directories and OpenAI key override
metabeeai edge-cases --contextual-only \
    --results-dir /custom/deepeval_results \
    --output-dir /custom/edge_cases \
    --openai-api-key sk-... \
    --model gpt-4o-mini

Purpose: Surface lowest-scoring papers, generate reports and optional LLM summaries
Output: YOURDATABASE/edge_cases/ (combined JSON, markdown report, summaries)
Key options: --num-cases, --results-dir, --output-dir, --openai-api-key, --model, --generate-summaries-only, --contextual-only, --generate-contextual-summaries-only

4.5 Run the entire benchmarking pipeline

# Prep + benchmark + plot + edge cases (default settings)
metabeeai benchmark-all

# Skip certain stages or pass through flags
metabeeai benchmark-all --skip-prep --skip-edge-cases --question bee_species --limit 5

This wrapper simply forwards the relevant options to the commands above. Use it when you want the full workflow in one go; use the individual commands for finer control.


5. Data Analysis

cd query_database

# Analyze trends
python trend_analysis.py

# Network analysis
python network_analysis.py

# Investigate specific topics
python investigate_bee_species.py
python investigate_pesticides.py

Output: query_database/output/ (plots, reports, JSON data)


Project Structure

The repository is packaged under src/metabeeai. Key directories:

metabeeai/
├── pyproject.toml              # Packaging, dependencies, CLI entrypoints
├── README.md                   # This file
├── docs/                       # Sphinx documentation (see docs/guide/*)
├── examples/                   # Sample configs / usage snippets
├── tests/                      # CLI and unit tests
├── src/
│   └── metabeeai/
│       ├── __init__.py
│       ├── cli.py              # `metabeeai` console entrypoint
│       ├── config.py           # Shared helpers for locating data dirs
│       ├── process_pdfs/       # Stage 1 modules (process_all, split_pdf, etc.)
│       ├── metabeeai_llm/      # Stage 2 modules (llm_pipeline, questions.yml, …)
│       ├── llm_review_software/# Stage 3 GUI (beegui, annotator)
│       ├── llm_benchmarking/   # Stage 4 tools (prep_benchmark_data, deepeval, plots, edge cases, run_benchmarking)
│       └── query_database/     # Stage 5 analysis scripts (trend_analysis, network_analysis, investigations)
└── data/ (optional)            # Local data tree (see below)

Where to store your data

We recommend keeping paper data outside the repo checkout, e.g.:

YOURDATABASE/
└── papers/
    ├── 4YD2Y4J8/
    │   ├── 4YD2Y4J8_main.pdf
    │   ├── pages/merged_v2.json
    │   ├── answers.json
    │   └── answers_extended.json
    └── ...

Point the CLI at this directory via config.py, environment variables, or per-command --dir/--papers-dir flags.


Common Use Cases

Use Case 1: Process New Papers

# 1. Add PDFs to YOURDATABASE/papers/{paper_id}/
# 2. Run the document pipeline
metabeeai process-pdfs --dir /path/to/YOURDATABASE/papers

# 3. Extract answers with the balanced preset
metabeeai llm --dir /path/to/YOURDATABASE/papers --config balanced

Result: Each paper folder now contains an answers.json file with structured outputs


Use Case 2: Review LLM Answers

metabeeai review

Features:

  • View PDF alongside LLM answers
  • Edit and validate answers
  • Rate answer quality
  • Navigate between papers
  • Saves reviewer responses side-by-side with the original LLM output in YOURDATABASE/papers/{paper_id}/answers_extended.json

Use Case 3: Benchmark LLM Performance

# 1. Ensure reviewer confirmations exist (answers_extended.json per paper from BeeGUI)

# 2. Create benchmark dataset from GUI reviewer answers
metabeeai prep-benchmark \
  --papers-dir /path/to/YOURDATABASE/papers \
  --output /path/to/YOURDATABASE/benchmark_data_gui.json

# 3. Run evaluation (all questions or filtered)
metabeeai benchmark \
  --input /path/to/YOURDATABASE/benchmark_data_gui.json \
  --question bee_species

# 4. Visualize metrics
metabeeai plot-metrics \
  --results-dir /path/to/YOURDATABASE/deepeval_results

# 5. Find problem papers (edge cases - can specify how many to include)
metabeeai edge-cases --num-cases 5 \
  --results-dir /path/to/YOURDATABASE/deepeval_results \
  --output-dir /path/to/YOURDATABASE/edge_cases

Result:

  • Performance metrics across 5 dimensions
  • Comparison plots
  • Edge case analysis

Use Case 4: Analyze Extracted Data

cd query_database

# Analyze trends
python trend_analysis.py

# Analyze relationships between variables
python network_analysis.py

Result: Plots and reports in query_database/output/


Question Definitions (questions.yml)

All question logic lives in src/metabeeai/metabeeai_llm/questions.yml. Each entry under the top-level QUESTIONS: key defines how the LLM should extract a specific piece of information. A typical block looks like this:

QUESTIONS:
  some_question_id:
    question: "Natural language prompt to send to the model"
    instructions:
      - "Step-by-step guidance on what to include/exclude"
      - "Each bullet is enforced before the answer."
    output_format: "Human-readable description of the expected formatting"
    example_output:
      - "Example answer 1"
      - "Example answer 2"
    bad_example_output:
      - "Examples of what NOT to return"
    no_info_response: "Fallback text when nothing is found"
    max_chunks: 5                 # (optional) throttle retrieval depth per question
    description: "Short note about retrieval threshold/purpose"

Field descriptions

Field | Purpose
question | The actual prompt sent to the LLM. Treats retrieved chunks as context.
instructions | Ordered list of constraints/checklists. The LLM sees these before answering.
output_format | Plain-language description of the formatting you expect (e.g., numbered list, JSON-like bullets).
example_output | One or more positive examples showing ideal answers.
bad_example_output | (Optional) Counter-examples to discourage common mistakes.
no_info_response | Exact string returned when the pipeline cannot find relevant information.
max_chunks | (Optional) The maximum number of retrieval chunks passed to the LLM for this question.
description | (Optional) Human-readable comment about retrieval strictness, priority, etc.

You can add, remove, or edit question blocks to suit new projects (e.g., different species, stressors, experimental outputs). The LLM pipeline will automatically pick up any question_key listed under QUESTIONS as long as it has the required fields above. After editing questions.yml, rerun metabeeai llm (and downstream benchmarking if needed) to populate the new fields in each answers.json.
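
When editing questions.yml it is easy to drop a required field; a small validation sketch (assuming PyYAML is installed) that checks for the non-optional fields listed in the table above:

# Sanity-check questions.yml: every question block should carry the required fields.
# Assumes PyYAML is installed: pip install pyyaml
import yaml

REQUIRED = ("question", "instructions", "output_format", "example_output", "no_info_response")

with open("src/metabeeai/metabeeai_llm/questions.yml", encoding="utf-8") as fh:
    questions = yaml.safe_load(fh)["QUESTIONS"]

for question_id, block in questions.items():
    missing = [field for field in REQUIRED if field not in block]
    print(f"{question_id}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")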


Model Selection

The LLM pipeline exposes model selection through the metabeeai llm CLI. Choose from presets or override models directly.

Preset configurations (recommended)

metabeeai llm --config fast      # gpt-4o-mini for relevance + answers
metabeeai llm --config balanced  # gpt-4o-mini for relevance, gpt-4o for answers
metabeeai llm --config quality   # gpt-4o for relevance + answers

Custom model override

metabeeai llm \
  --relevance-model openai/gpt-4o-mini \
  --answer-model openai/gpt-4o

Configuration | Relevance model | Answer model | Primary goal
fast | openai/gpt-4o-mini | openai/gpt-4o-mini | High throughput / low cost
balanced | openai/gpt-4o-mini | openai/gpt-4o | Default mix of speed + accuracy
quality | openai/gpt-4o | openai/gpt-4o | Maximum fidelity, slower

Configuration

Global Configuration (config.py)

Centralized configuration for all pipeline components:

from config import get_papers_dir, get_data_dir

# Get configured directories
papers_dir = get_papers_dir()  # Default: data/papers
data_dir = get_data_dir()      # Default: data

Environment Variables (set in .env):

  • METABEEAI_DATA_DIR - Base data directory (default: data)
  • OPENAI_API_KEY - OpenAI API key
  • LANDING_AI_API_KEY - LandingLens API key

Question Configuration (metabeeai_llm/questions.yml)

Define questions with:

  • Question text
  • Instructions for LLM
  • Expected output format
  • Examples (good and bad)
  • Retrieval parameters (max_chunks, min_score)

Benchmarking Metrics

The pipeline evaluates LLM performance using 5 metrics:

Standard DeepEval Metrics (3)

  1. Faithfulness (0-1, higher is better)

    • Measures if LLM answer contradicts source text
    • Perfect score: No hallucinations or contradictions
  2. Contextual Precision (0-1, higher is better)

    • Evaluates if relevant chunks are ranked highly
    • Perfect score: Most relevant chunks retrieved first
  3. Contextual Recall (0-1, higher is better)

    • Checks if expected answer is supported by retrieval
    • Perfect score: All key points have source support

G-Eval Metrics (2)

  1. Completeness (0-1, threshold: 0.5)

    • Assesses if answer covers all key points
    • Uses GPT-4o to evaluate against reviewer answer
  2. Accuracy (0-1, threshold: 0.5)

    • Evaluates information accuracy
    • Uses GPT-4o to compare LLM vs reviewer answers

Typical Performance (based on 10 primate welfare papers):

  • Standard metrics: 0.7-1.0 (good)
  • G-Eval metrics: 0.4-0.5 (moderate)

Cost Estimates

Based on typical usage with GPT-4o:

Task | Papers | Questions | Cost
LLM Extraction | 10 | 3 per paper | ~$2-3
Benchmarking | 10 | 3 questions | ~$0.95
Edge Case Analysis | 3 bottom papers | All questions | ~$0.05
TOTAL | 10 papers | Full pipeline | ~$3-4
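
For budgeting a larger run, the LandingLens page cost is fixed at US $0.03 per page and the LLM figures above scale roughly with paper count; a back-of-envelope sketch (the page count per paper is an assumption to adjust for your corpus):

# Rough cost projection scaled from the per-10-paper figures above; treat as an estimate only.
n_papers = 50
avg_pages_per_paper = 12                               # assumption: adjust for your PDFs
landinglens = n_papers * avg_pages_per_paper * 0.03    # US $0.03 per page
llm_extraction = n_papers * (2.5 / 10)                 # ~$2-3 per 10 papers with GPT-4o
benchmarking = n_papers * (0.95 / 10)                  # ~$0.95 per 10 papers

print(f"LandingLens:    ${landinglens:.2f}")
print(f"LLM extraction: ${llm_extraction:.2f}")
print(f"Benchmarking:   ${benchmarking:.2f}")
print(f"Total estimate: ${landinglens + llm_extraction + benchmarking:.2f}")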

Cost Reduction Options:

  • Use --config fast instead of --config quality (3-5x cheaper)
  • Use --config balanced for optimal cost/quality trade-off
  • Process fewer papers initially for testing

Detailed Documentation

Each component has detailed documentation:

Component | Documentation
PDF Processing | process_pdfs/README.md
LLM Pipeline | metabeeai_llm/README.md
Benchmarking | llm_benchmarking/README.md
Data Analysis | query_database/README.md

Tutorial: Process Your First 3 Papers

Complete Example

# 1. Set up environment (one-time)
python -m venv venv
source venv/bin/activate
pip install metabeeai
cp env.example .env  # fill in API keys

# 2. Add 3 PDFs under YOURDATABASE/papers/
cp your_paper.pdf /path/to/YOURDATABASE/papers/PAPER001/PAPER001_main.pdf
# Repeat for PAPER002, PAPER003

# 3. Process PDFs → merged_v2.json
metabeeai process-pdfs --dir /path/to/YOURDATABASE/papers

# 4. Run LLM extraction (balanced preset recommended)
metabeeai llm --dir /path/to/YOURDATABASE/papers --config balanced
# Output: answers.json per paper

# 5. Review answers (saves answers_extended.json)
metabeeai review

# 6. Create benchmark dataset from GUI reviews
metabeeai prep-benchmark \
  --papers-dir /path/to/YOURDATABASE/papers \
  --output /path/to/YOURDATABASE/benchmark_data_gui.json

# 7. Run evaluation (choose a question or all)
metabeeai benchmark \
  --input /path/to/YOURDATABASE/benchmark_data_gui.json \
  --question bee_species

# 8. Visualize metrics
metabeeai plot-metrics \
  --results-dir /path/to/YOURDATABASE/deepeval_results

# 9. Find problem papers
metabeeai edge-cases --num-cases 5 \
  --results-dir /path/to/YOURDATABASE/deepeval_results \
  --output-dir /path/to/YOURDATABASE/edge_cases

Expected time:

  • PDF processing: ~1-5 min per paper
  • LLM extraction: ~2-3 min per paper
  • Evaluation: <1 min per question

Troubleshooting

Common Issues

Issue: Module not found errors

# Solution: Activate virtual environment
source venv/bin/activate

Issue: API key errors

# Solution: Check .env file exists and has valid keys
cat .env

Issue: "Context too long" warnings

# Solution: Use faster models or reduce batch size
metabeeai llm --config fast

Issue: Empty GUI window

# Solution: Check folder names are alphanumeric (not just numeric)
# The GUI now accepts folders like: 4YD2Y4J8, 76DQP2DC, etc.

Issue: UTF-8 BOM in CSV

# Solution: Scripts automatically handle BOM with utf-8-sig encoding
# If you see '\ufeff' in column names, the script handles this
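
If you hit the BOM issue in your own analysis code, opening the file with the utf-8-sig codec strips it; for example (results.csv is a placeholder name):

# Reading with utf-8-sig drops a leading BOM so the first column name is clean.
import csv

with open("results.csv", newline="", encoding="utf-8-sig") as fh:
    reader = csv.DictReader(fh)
    print(reader.fieldnames)  # no '\ufeff' prefix on the first header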

Best Practices

1. Start Small

  • Test with 3-5 papers first
  • Use --limit flags to test scripts
  • Verify outputs before scaling up

2. Version Control

  • Results are timestamped (no overwrites)
  • Keep original answers.json files unchanged
  • Reviewer answers go in separate files

3. Cost Management

  • Use --config fast for initial testing
  • Use --config balanced for production runs
  • Test with specific papers using --papers before full runs

4. Quality Assurance

  • Review edge cases to identify patterns
  • Check low-scoring papers manually
  • Validate LLM answers with GUI tool

Additional Resources

Documentation

  • Full docs (installation, pipeline, API): https://metabeeai.readthedocs.io
  • Module-specific references remain in docs/ and the Read the Docs site (LLM benchmarking, PDF processing, LLM pipeline, review software, query database)


Project: MetaBeeAI
Last Updated: November 14, 2025
Written by: Rachel Parkinson, Shuxiang Cao, Mikael Mieskolainen, Alasdair Wilson
Contact: Rachel Parkinson (r.parkinson@qmul.ac.uk)
