Open-source tool for fast, accurate data extraction from scientific literature, combining LLMs with human-in-the-loop validation.

Extralit Server

Extract structured data from scientific literature with human validation

This repository contains developer information about the backend server components. For general usage, please refer to our main repository or our documentation.

Source Code Structure

The server components are split into two main services:

/extralit_server
  /api # Core extraction API endpoints
    /handlers # FastAPI request handlers 
    /schemas # Data models and validation
    /services # Business logic services
    /utils # Helper utilities
  /ml # Machine learning components
    /extractors # Document extraction models
    /ocr # OCR processing
    /pipeline # Extraction pipeline orchestration
  /storage # Data persistence layer
    /models # Database models
    /search # Search engine integration
    /vector # Vector store
/argilla_server 
  /api # Annotation UI API endpoints
    /handlers
    /schemas 
  /models # Database models
  /auth # Authentication
  /tasks # Background jobs
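
The split between /schemas, /services, and /handlers in the tree above suggests a layered design. The following is a minimal sketch of that layering in plain Python, not the project's actual code: `ExtractionRequest`, `ExtractionResponse`, `ExtractionService`, `handle_extract`, and all field names are hypothetical, and an in-memory dict stands in for the storage layer.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


# schemas: validated request/response shapes (hypothetical fields)
@dataclass(frozen=True)
class ExtractionRequest:
    document_id: str
    fields: Tuple[str, ...]

    def __post_init__(self) -> None:
        if not self.document_id:
            raise ValueError("document_id must not be empty")
        if not self.fields:
            raise ValueError("at least one field is required")


@dataclass(frozen=True)
class ExtractionResponse:
    document_id: str
    values: Dict[str, Optional[str]]


# services: business logic, independent of the web framework
class ExtractionService:
    def __init__(self, documents: Dict[str, Dict[str, str]]) -> None:
        # Assumption: a dict stands in for the real storage layer.
        self._documents = documents

    def extract(self, request: ExtractionRequest) -> ExtractionResponse:
        record = self._documents.get(request.document_id, {})
        # Fields missing from the record are reported as None.
        values = {name: record.get(name) for name in request.fields}
        return ExtractionResponse(request.document_id, values)


# handlers: thin adapter from a raw payload to schema to service
def handle_extract(payload: dict, service: ExtractionService) -> dict:
    request = ExtractionRequest(payload["document_id"], tuple(payload["fields"]))
    response = service.extract(request)
    return {"document_id": response.document_id, "values": response.values}
```

Keeping the handler thin means the service can be tested without starting a web server, which is one common reason for this kind of separation.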

Development Environment

The development environment uses Docker Compose to run all required services. Key commands:

# Start all services
docker-compose up -d

# Run server in dev mode
pdm run dev

# Run tests
pdm test

# Format and lint
pdm format
pdm lint

# Run all checks
pdm all

Key Components

FastAPI Servers

  • Extraction Server: Handles document processing, extraction pipeline, and ML model serving
  • Annotation Server: Manages UI, data validation workflow, and user collaboration

Databases

  • PostgreSQL: Main database for user data, annotations, and metadata
  • Elasticsearch: Vector store for semantic search and document indexing
  • Weaviate: Vector database for table and section embeddings

Background Processing

Celery runs asynchronous tasks such as:

  • Document OCR and preprocessing
  • ML model inference
  • Batch extraction jobs
  • Data export
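
The general pattern behind these jobs is to queue work off the request path and collect results later. The sketch below illustrates that pattern with the standard library's `ThreadPoolExecutor` standing in for a Celery worker, so it runs without a broker. It is not the project's task code; `ocr_document`, `submit_batch`, and `collect` are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


def ocr_document(document_id: str) -> dict:
    """Hypothetical stand-in for an OCR/preprocessing task."""
    return {"document_id": document_id, "status": "processed"}


def submit_batch(executor: ThreadPoolExecutor, document_ids: List[str]) -> Dict[str, Future]:
    """Queue one task per document and return handles without blocking."""
    return {doc_id: executor.submit(ocr_document, doc_id) for doc_id in document_ids}


def collect(handles: Dict[str, Future], timeout: float = 5.0) -> Dict[str, dict]:
    """Wait for each queued task and gather its result."""
    return {doc_id: future.result(timeout=timeout) for doc_id, future in handles.items()}


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        handles = submit_batch(pool, ["doc-1", "doc-2"])
        print(collect(handles))
```

With a real task queue the handles would be task IDs stored by the server, and a separate worker process would execute the jobs.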

CLI Commands

Key management commands:

# Database management
python -m extralit_server db migrate
python -m extralit_server db create-user

# Start servers
python -m extralit_server start
python -m argilla_server start

# Run workers
python -m extralit_server worker

See full CLI documentation in our developer docs.
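
For readers unfamiliar with nested subcommands such as `db migrate`, here is a minimal sketch of how a CLI with the same command shape could be laid out using `argparse`. This is an illustration only: the real CLI may use a different framework and accept options not shown here, and `build_parser` and `describe` are hypothetical helpers.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="extralit_server")
    commands = parser.add_subparsers(dest="command", required=True)

    # "db" groups its own nested subcommands.
    db = commands.add_parser("db", help="database management")
    db_commands = db.add_subparsers(dest="db_command", required=True)
    db_commands.add_parser("migrate", help="apply pending migrations")
    db_commands.add_parser("create-user", help="create a user account")

    commands.add_parser("start", help="start the server")
    commands.add_parser("worker", help="run a background worker")
    return parser


def describe(argv: List[str]) -> str:
    """Parse argv and return a label for the selected command."""
    args = build_parser().parse_args(argv)
    if args.command == "db":
        return f"db:{args.db_command}"
    return args.command


if __name__ == "__main__":
    print(describe(["db", "migrate"]))
```

In a real entry point, each label would dispatch to the function that performs the migration, starts the server, or launches a worker.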

Contributing

Check our contribution guide and join our Slack community.

Roadmap

See our development roadmap and share your ideas!
