Open-source tool for fast, accurate data extraction from scientific literature, combining LLMs with human-in-the-loop validation.

Extralit

Extract structured data from scientific literature with human validation

Extralit is an open-source platform that transforms how researchers extract structured data from scientific literature. Want to get started? Check out our documentation.

Why use Extralit?

Accelerate Scientific Data Collection

Manual data extraction from research papers is slow and error-prone, often taking 6-12 months for systematic reviews. Extralit combines AI-powered extraction with human validation to reduce this to weeks while maintaining research-grade accuracy.

Take Control of Your Research Data

Most scientific data extraction tools are inflexible black boxes. Extralit is different: it is open source and puts you in control. Define custom extraction schemas, validate results, and integrate with your existing research workflows.

Scale Your Literature Reviews

Whether you're conducting a systematic review, meta-analysis, or building a scientific knowledge base, Extralit helps you efficiently process hundreds of papers. Our platform handles complex tables, figures, and relationships while preserving scientific rigor.

🏘️ Community

We're an open-source project built for researchers, by researchers. Here's how to get involved:

  • Slack Community: Connect with other researchers and developers
  • Documentation: Learn how to use and contribute to Extralit
  • Roadmap: See what we're building and share your ideas

Real-World Impact

Extralit is already accelerating research at leading institutions:

  • Gates Foundation: Reduced systematic review time for malaria intervention studies from 6 months to 6 weeks
  • Life Science Research: Streamlined extraction of clinical trial endpoints, genetic markers, and intervention protocols
  • Meta-Analysis: Enabled rapid synthesis of evidence across hundreds of papers while maintaining rigorous validation

👨‍💻 Getting Started

Installation

Install Extralit using pip:

pip install extralit
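
To confirm that the install succeeded, one option is to ask the Python standard library which version is present. This is a small sketch, not something Extralit ships; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "extralit") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"extralit {found}" if found else "extralit is not installed")
```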

Initialize the client:

import extralit as ex

client = ex.Extralit(
    api_url="https://your-deployment-url",  # base URL of your Extralit deployment
    api_key="your-api-key",  # API key issued by that deployment
)
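
If you would rather not hard-code credentials, a common pattern is to read them from environment variables. The sketch below assumes two variable names, `EXTRALIT_API_URL` and `EXTRALIT_API_KEY`, which are chosen for this example and are not documented Extralit settings; `load_credentials` is likewise a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical variable names used only in this sketch.
REQUIRED_VARS = ("EXTRALIT_API_URL", "EXTRALIT_API_KEY")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the api_url/api_key keyword arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_url": env["EXTRALIT_API_URL"],
        "api_key": env["EXTRALIT_API_KEY"],
    }
```

With both variables set, the result can be unpacked into the constructor shown above, for example `ex.Extralit(**load_credentials())`.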

Create an extraction schema

Define what data you want to extract:

schema = ex.Schema(
    name="clinical_trial",
    fields=[
        ex.TextField(name="intervention", required=True),
        ex.NumericField(name="sample_size", required=True),
        ex.TextField(name="outcome_measure"),
        ex.TableField(name="results_table")
    ]
)

project = client.create_project(
    name="trial_extraction",
    schema=schema
)
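
To make the meaning of `required=True` concrete, here is a plain-Python illustration that does not use Extralit at all. `FieldSpec` and `missing_required` are hypothetical names for this sketch, and the check shown is one plausible reading of "required" (the field must be present and not `None`), not a description of how Extralit validates records internally.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class FieldSpec:
    """Stand-in for a schema field: a name plus a required flag."""
    name: str
    required: bool = False


# Mirrors the field names and required flags of the clinical_trial schema above.
CLINICAL_TRIAL_FIELDS = [
    FieldSpec("intervention", required=True),
    FieldSpec("sample_size", required=True),
    FieldSpec("outcome_measure"),
    FieldSpec("results_table"),
]


def missing_required(record: Dict[str, Any], fields: List[FieldSpec]) -> List[str]:
    """Return names of required fields that are absent or None in the record."""
    return [f.name for f in fields if f.required and record.get(f.name) is None]


if __name__ == "__main__":
    partial = {"intervention": "bed nets"}
    print(missing_required(partial, CLINICAL_TRIAL_FIELDS))
```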

Add documents and start extraction

# Add PDFs to extract from
project.add_documents("path/to/papers/*.pdf")

# Start extraction
extractions = project.extract()

# Review and validate results
validated_data = project.validate(extractions)
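
Before adding documents, it can help to check which files a pattern such as `path/to/papers/*.pdf` actually matches. The sketch below uses only the standard library; `list_pdfs` is a hypothetical helper, and how Extralit itself expands the pattern is not specified here.

```python
from pathlib import Path
from typing import List


def list_pdfs(directory: str) -> List[Path]:
    """Return the PDF files directly inside `directory`, sorted by name."""
    return sorted(p for p in Path(directory).glob("*.pdf") if p.is_file())


if __name__ == "__main__":
    pdfs = list_pdfs("path/to/papers")
    print(f"{len(pdfs)} PDF file(s) found")
    for pdf in pdfs:
        print(" ", pdf.name)
```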

Need more help? Check out our detailed tutorials.

🥇 Contributors

Want to contribute? Great! Check out our contribution guide or join our Slack community.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

extralit-0.4.0.tar.gz (235.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

extralit-0.4.0-py3-none-any.whl (312.1 kB view details)

Uploaded Python 3

File details

Details for the file extralit-0.4.0.tar.gz.

File metadata

  • Download URL: extralit-0.4.0.tar.gz
  • Upload date:
  • Size: 235.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.24.1 CPython/3.13.3 Linux/6.11.0-1012-azure

File hashes

Hashes for extralit-0.4.0.tar.gz
Algorithm    Hash digest
SHA256       7de276aff15f2039c8bc503f7efe830c51237ea2a0176779daf2b7a4c796ea6a
MD5          e9afd7af97a096e57e4ff8a759a58432
BLAKE2b-256  75d4db1e0b00f8c016bf8cd47f94504311f20d450f134d33337999596469ffca
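
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: `sha256_of` and `matches_digest` are hypothetical helper names, and the file path in the usage example assumes the archive was saved to the current directory.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    import os

    archive = "extralit-0.4.0.tar.gz"  # assumed download location
    expected = "7de276aff15f2039c8bc503f7efe830c51237ea2a0176779daf2b7a4c796ea6a"
    if os.path.exists(archive):
        print("digest OK" if matches_digest(archive, expected) else "digest MISMATCH")
    else:
        print(f"{archive} not found in the current directory")
```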

File details

Details for the file extralit-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: extralit-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 312.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.24.1 CPython/3.13.3 Linux/6.11.0-1012-azure

File hashes

Hashes for extralit-0.4.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       b2210d3f6aafad4dfa80653be2ee27263865a061403f4337ca5594343b6d67be
MD5          cd5cd07df4094433ce464e21dddf3a19
BLAKE2b-256  80792bcfed6ea9db76f7a29ff644a8e392084b121941a4630fc8ae22216fefd8
