Production-quality LLM fine-tuning, RAG, and RAFT library with comprehensive safety, audit, and traceability features.

PlatformX

Enterprise-Grade AI Library for Pharmaceutical & Life Sciences

Features | Installation | Quick Start | Documentation | Examples


Overview

PlatformX is a production-ready Python library specifically designed for building accurate, auditable, and safety-conscious AI applications in the pharmaceutical and life sciences domains.

Whether you're building RAG systems for clinical trial data, fine-tuning models on regulatory documents, or generating training data with RAFT, PlatformX provides the tools you need with built-in compliance and traceability.

Why PlatformX?

Pharma-Focused: Built specifically for regulated industries
Audit-First: Complete provenance tracking and structured logging
Safety-Built-In: PII detection, content filtering, confidence scoring
Production-Ready: Type-safe, tested, and documented
Flexible: Modular architecture with pluggable components
Compliant: Designed for regulatory review and validation


  • Audit logging: Complete training lineage for compliance

PlatformX

PlatformX is a modular, extensible platform for building, evaluating, and deploying retrieval-augmented generation (RAG) pipelines and AI safety solutions. It provides a unified interface for data indexing, retrieval, model fine-tuning, safety filtering, and audit logging, enabling rapid prototyping and robust deployment of advanced AI systems.

Features

  • Modular RAG Pipeline: Easily build and customize RAG pipelines with interchangeable components for data loading, retrieval, generation, and safety filtering.
  • AI Safety: Integrated safety modules for content filtering, bias detection, and audit logging.
  • Model Fine-tuning: Tools for fine-tuning and evaluating language models on custom datasets.
  • Extensible API: Unified API for interacting with all platform components.
  • CLI Tools: Command-line utilities for common tasks and workflows.

Installation

PlatformX requires Python 3.8+.

pip install platformx

Or install from source:

git clone https://github.com/fiscaloxai/platformx.git
cd platformx
pip install -e .

Quick Start

See examples/README.md for usage examples.

from platformx import api

# Index data
api.index_data("my_corpus", ["Document 1", "Document 2"])

# Run a RAG pipeline
response = api.rag_query("my_corpus", "What is PlatformX?")
print(response)

Documentation

Full documentation is available in docs/index.md and at https://fiscaloxai.github.io/platformx/

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

PlatformX is licensed under the Apache 2.0 License.

With All Features

pip install "platformx[retrieval,training,documents,openai,anthropic]"

From Source

git clone https://github.com/fiscaloxai/platformx.git
cd platformx
pip install -e ".[dev]"

See INSTALL.md for detailed installation instructions.


Architecture Overview

PlatformX is organized into seven core modules:

platformx/
├── data/          # Dataset loading, schema, registry
├── retrieval/     # Indexing, embeddings, query engine
├── model/         # Fine-tuning, adapters, inference
├── training/      # RAFT generation, dataset builders
├── safety/        # Filters, confidence, refusal logic
├── audit/         # Structured logging, compliance
└── api/           # High-level user-friendly API

Module Details

  • data: Load datasets from various formats with automatic text extraction and provenance tracking
  • retrieval: Index documents and perform semantic search with configurable backends
  • model: Fine-tune models using LoRA/PEFT with full audit logging
  • training: Generate RAFT samples for retrieval-aware model training
  • safety: Filter content, detect PII, assess confidence, generate refusals
  • audit: Log all operations with correlation IDs for traceability
  • api: Simple one-liner functions for common workflows

For detailed API reference, see docs/api.md.


Quick Start

1. Index Clinical Trial Documents

import platformx.api as pfx

# Index a directory of clinical trial documents
result = pfx.index_documents(
    source="./clinical_trials/",
    dataset_id="trials-2024-q1",
    index_path="./index/trials/",
    chunk_size=200,
    embedding_backend="tfidf",
    domain="clinical"
)

print(f"Indexed {result['chunk_count']} chunks")
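
To make the chunk_size and embedding_backend="tfidf" parameters above more concrete, here is a minimal, library-independent sketch of word-based chunking followed by TF-IDF weighting. It is an illustration only: whether PlatformX measures chunk_size in words, tokens, or characters is not stated here, and chunk_words and tfidf_vectors are hypothetical helper names, not part of the PlatformX API.

```python
import math
from collections import Counter


def chunk_words(text, chunk_size=200):
    """Split text into consecutive chunks of at most chunk_size words (assumed unit)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def tfidf_vectors(chunks):
    """Return one {term: weight} dict per chunk, using tf * log(N / df)."""
    tokenized = [chunk.lower().split() for chunk in chunks]
    n_chunks = len(tokenized)
    # Document frequency: the number of chunks in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_chunks / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy input: 21 words split into chunks of at most 6 words each.
chunks = chunk_words("adverse events were mild " * 3 + "dosing was weekly " * 3, chunk_size=6)
vectors = tfidf_vectors(chunks)
print(len(chunks), sorted(vectors[0]))
```

A term that occurs in every chunk gets weight zero under this weighting, which is why TF-IDF favours terms that distinguish one chunk from the others.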

2. Run RAG Query with Safety

# Query with automatic safety filtering
response = pfx.rag_query(
    query="What are the adverse events in pediatric trials?",
    index_path="./index/trials/",
    top_k=5,
    safety_check=True,
    min_confidence="medium"
)

# Check results
if response['safety_result']['decision'] == 'allow':
    for i, result in enumerate(response['results'], 1):
        print(f"{i}. [{result['score']:.3f}] {result['text'][:100]}...")
else:
    print(f"Query blocked: {response['safety_result']['reason']}")
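
The min_confidence="medium" argument suggests an ordered threshold. The sketch below shows one way such a gate could be written on the caller's side. It assumes three levels ordered low < medium < high and a response dictionary that carries both the safety_result and confidence keys used in this README's examples; gate_results and CONFIDENCE_ORDER are hypothetical names, and this is not a description of how PlatformX implements the check internally.

```python
# Assumed ordering of confidence levels (not confirmed by the library docs).
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def meets_min_confidence(level, min_confidence="medium"):
    """True when level is at or above the requested minimum."""
    return CONFIDENCE_ORDER[level] >= CONFIDENCE_ORDER[min_confidence]


def gate_results(response, min_confidence="medium"):
    """Return results only if the query was allowed and confidence is sufficient."""
    if response["safety_result"]["decision"] != "allow":
        return []
    if not meets_min_confidence(response["confidence"]["level"], min_confidence):
        return []
    return response["results"]


# Hand-written response in the assumed shape, for illustration only.
example = {
    "safety_result": {"decision": "allow", "reason": ""},
    "confidence": {"level": "medium"},
    "results": [{"score": 0.42, "text": "Example chunk text"}],
}
print(gate_results(example, min_confidence="medium"))
```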

3. Generate RAFT Training Samples

# Generate training samples from indexed data
samples = pfx.generate_raft_samples(
    dataset_ids=["trials-2024-q1", "trials-2024-q2"],
    index_path="./index/trials/",
    samples_per_dataset=100,
    positive_fraction=0.6,
    include_reasoning=True,
    output_path="./training_data/raft_samples.json"
)

print(f"Generated {len(samples)} RAFT samples")
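
For readers new to RAFT, the following sketch shows the general shape of a RAFT-style sample: a question, a context made of distractor documents that sometimes also contains the "oracle" document holding the answer, and the answer itself. It assumes positive_fraction means the share of samples whose context includes the oracle document, which matches the usual RAFT recipe but is not confirmed for PlatformX. build_raft_samples, num_distractors, and the sample keys are hypothetical.

```python
import random


def build_raft_samples(qa_pairs, corpus, positive_fraction=0.6, num_distractors=2, seed=42):
    """Build RAFT-style samples.

    qa_pairs: list of (question, oracle_doc, answer) tuples.
    corpus:   list of all documents, used as the distractor pool.
    """
    rng = random.Random(seed)  # local RNG so the output is reproducible
    samples = []
    for question, oracle_doc, answer in qa_pairs:
        pool = [doc for doc in corpus if doc != oracle_doc]
        context = rng.sample(pool, min(num_distractors, len(pool)))
        # Include the oracle document in roughly positive_fraction of samples.
        has_oracle = rng.random() < positive_fraction
        if has_oracle:
            context.append(oracle_doc)
        rng.shuffle(context)
        samples.append({
            "question": question,
            "context": context,
            "answer": answer,
            "has_oracle": has_oracle,
        })
    return samples


corpus = ["doc A", "doc B", "doc C", "doc D"]
qa_pairs = [("What does A say?", "doc A", "answer A"), ("What does B say?", "doc B", "answer B")]
for sample in build_raft_samples(qa_pairs, corpus):
    print(sample["question"], sample["has_oracle"], sample["context"])
```

Samples without the oracle document are what push a fine-tuned model to recognise when the retrieved context does not support an answer.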

4. Fine-Tune with Compliance Logging

# Fine-tune a model with full audit trail
report = pfx.finetune(
    base_model="meta-llama/Llama-2-7b-hf",
    dataset_path="./training_data/raft_samples.json",
    output_dir="./models/pharma-qa-v1",
    num_epochs=3,
    learning_rate=2e-4,
    lora_r=16,
    seed=42
)

print(f"Model fine-tuned: {report['adapter_id']}")
print(f"Training datasets: {report['training_dataset_ids']}")
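
The lora_r=16 setting controls the rank of the low-rank update that LoRA trains in place of a full weight matrix. As a rough, library-independent illustration, the arithmetic below compares trainable parameter counts for a single 4096 x 4096 projection (4096 is the hidden size of Llama-2-7B). Which layers PlatformX actually adapts is not stated here, so treat this as a back-of-the-envelope sketch; lora_trainable_params is a hypothetical helper.

```python
def lora_trainable_params(d_out, d_in, r):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


# One square projection matrix at Llama-2-7B's hidden size.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, r=16)
print(f"full: {full:,}  lora: {lora:,}  fraction: {lora / full:.6f}")
```

At rank 16 the adapter for such a matrix holds 1/128 of the parameters of the full matrix, which is why LoRA fine-tuning fits on much smaller hardware than full fine-tuning.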

5. Full Platform Setup

import platformx as pfx

# Initialize platform with configuration
config = pfx.PlatformConfig(
    project_name="pharma_qa_system",
    data_dir="./data",
    logging_level="INFO",
    reproducible=True,
    seed=42
)

platform = pfx.Platform(config)

# Register a dataset
dataset = platform.register_dataset(
    "clinical_protocols.pdf",
    {
        "dataset_id": "protocols-001",
        "domain": "clinical",
        "intended_use": "retrieval"
    }
)

# Index for retrieval
chunk_ids = platform.index_dataset("protocols-001")

print(f"Registered and indexed {len(chunk_ids)} chunks")

Use Cases

1. Clinical Trial Q&A System

# Build a Q&A system over clinical trial documents
import platformx.api as pfx

# Step 1: Index trial documents (index_path matches the path queried in Step 2)
pfx.index_documents(
    source="./trials/",
    dataset_id="clinical-trials-2024",
    index_path="./index/",
    domain="clinical"
)

# Step 2: Query with safety
result = pfx.rag_query(
    "What is the efficacy rate in Phase 3 trials?",
    index_path="./index/",
    safety_check=True
)

# Step 3: Surface the top result only when confidence is high
if result['confidence']['level'] == 'high' and result['results']:
    print(f"Answer: {result['results'][0]['text']}")
else:
    print("Confidence below high or no results - review required")

2. Regulatory Document Analysis

# Analyze FDA submissions and guidance documents
from platformx import Platform, PlatformConfig
from platformx.safety import create_default_filter_chain

config = PlatformConfig(
    project_name="regulatory_analysis",
    data_dir="./fda_docs"
)
platform = Platform(config)

# Load regulatory documents
platform.register_dataset("fda_guidance.pdf", {
    "dataset_id": "fda-guidance-001",
    "domain": "regulatory",
    "intended_use": "retrieval"
})

# Index the registered dataset for retrieval
platform.index_dataset("fda-guidance-001")

# Check a query against the default pharma filter chain
chain = create_default_filter_chain("pharma")
query_result = chain.check("What are the requirements?", {})

3. Fine-Tune Domain-Specific Models

# Train a model specifically for pharma Q&A
import platformx.api as pfx

# Generate RAFT samples from your documents
samples = pfx.generate_raft_samples(
    dataset_ids=["protocols", "trials", "guidance"],
    index_path="./index/",
    samples_per_dataset=200
)

# Fine-tune with audit logging
pfx.finetune(
    base_model="microsoft/phi-2",
    datasets=samples,
    output_dir="./models/pharma-phi-2",
    num_epochs=5
)

Documentation

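Comprehensive documentation is available in docs/index.md and at https://fiscaloxai.github.io/platformx/; the API reference is in docs/api.md.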

Examples

Explore the examples/ directory:

  1. 01_basic_indexing.py - Document indexing basics
  2. 02_rag_pipeline.py - Complete RAG workflow
  3. 03_raft_generation.py - RAFT sample generation
  4. 04_safety_filtering.py - Safety configuration
  5. 05_quick_start.py - Quick start demo

Design Principles

Reproducibility

  • Deterministic workflows with seed control
  • Dataset and model fingerprinting
  • Version tracking for all artifacts
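
As an illustration of dataset fingerprinting, the sketch below derives a SHA-256 digest from a list of records using only the standard library. It is a generic technique, not PlatformX's own fingerprint format; fingerprint_records is a hypothetical name, and making the digest independent of record order is a design choice made here for the example.

```python
import hashlib
import json


def fingerprint_records(records):
    """Return a SHA-256 hex digest that is stable under record and key ordering."""
    # Canonicalise each record, then sort so input order does not matter.
    canonical = sorted(json.dumps(r, sort_keys=True, ensure_ascii=False) for r in records)
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # separator so record boundaries affect the hash
    return digest.hexdigest()


records = [
    {"id": 1, "text": "Protocol synopsis"},
    {"id": 2, "text": "Adverse event table"},
]
print(fingerprint_records(records))
```

Storing such a digest next to a trained adapter lets a reviewer confirm later that the training data has not changed.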

Transparency

  • Structured audit logs for all operations
  • Complete provenance tracking
  • Traceable model and dataset lineage
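
One common way to obtain structured, traceable logs is to emit each event as a JSON object that carries a shared correlation ID. The sketch below does this with Python's standard logging module. It does not reproduce PlatformX's audit module; JsonFormatter, make_audit_logger, audit, and the field names are assumptions chosen for the example.

```python
import json
import logging
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
            "details": getattr(record, "details", {}),
        }
        return json.dumps(entry, sort_keys=True)


def make_audit_logger(name="audit_sketch"):
    """Create (or reuse) a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def audit(logger, event, correlation_id, **details):
    """Log one event; `extra` attaches the fields to the log record."""
    logger.info(event, extra={"correlation_id": correlation_id, "details": details})


logger = make_audit_logger()
correlation_id = str(uuid.uuid4())  # one ID shared by every step of a workflow
audit(logger, "index_started", correlation_id, dataset_id="trials-2024-q1")
audit(logger, "index_finished", correlation_id, dataset_id="trials-2024-q1", chunk_count=3)
```

Filtering the log by correlation_id then reconstructs every step of a single workflow run.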

Extensibility

  • Plugin architecture for adapters and backends
  • Custom policy injection points
  • Flexible compliance controls
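
A plugin architecture for backends is often built on a small registry that maps names to callables. The sketch below shows that pattern in isolation, assuming an embedding backend is a function from a list of texts to a list of vectors. register_backend, get_backend, and the toy "length" backend are hypothetical and are not PlatformX's extension API.

```python
from typing import Callable, Dict, List

EmbeddingBackend = Callable[[List[str]], List[List[float]]]
_BACKENDS: Dict[str, EmbeddingBackend] = {}


def register_backend(name: str):
    """Decorator that registers an embedding backend under a unique name."""
    def decorator(fn: EmbeddingBackend) -> EmbeddingBackend:
        if name in _BACKENDS:
            raise ValueError(f"Backend {name!r} is already registered")
        _BACKENDS[name] = fn
        return fn
    return decorator


def get_backend(name: str) -> EmbeddingBackend:
    """Look up a backend by name, listing the alternatives on failure."""
    try:
        return _BACKENDS[name]
    except KeyError:
        raise KeyError(f"Unknown backend {name!r}; available: {sorted(_BACKENDS)}") from None


@register_backend("length")
def length_backend(texts: List[str]) -> List[List[float]]:
    """Toy backend: embeds each text as [character count, word count]."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


print(get_backend("length")(["adverse events", "dosing"]))
```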

Safety

  • Built-in PII detection and content filtering
  • Confidence scoring and refusal logic
  • Domain-specific safety policies
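
To give a feel for what pattern-based PII detection involves, here is a deliberately small sketch using regular expressions for email addresses, US Social Security numbers, and US phone numbers. Real PII detection needs far broader coverage (names, addresses, patient identifiers), and nothing here reflects the rules PlatformX ships; PII_PATTERNS, find_pii, and redact_pii are hypothetical.

```python
import re

# Illustrative patterns only; they cover a narrow set of formats.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text):
    """Return a list of (kind, matched_text) pairs found in text."""
    return [
        (kind, match.group())
        for kind, pattern in PII_PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def redact_pii(text):
    """Replace every match with a placeholder such as [EMAIL]."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{kind.upper()}]", text)
    return text


sample = "Contact jane.doe@example.com or 555-123-4567; SSN 123-45-6789."
print(find_pii(sample))
print(redact_pii(sample))
```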

Performance & Benchmarks

PlatformX is designed for production use:

  • Indexing: ~1000 documents/minute (TF-IDF backend)
  • Retrieval: <100ms for top-10 queries on 10K documents
  • Fine-tuning: Supports models up to 70B parameters with quantization
  • Memory: <2GB RAM for indexing 10K documents

See benchmarks/ for detailed performance metrics.

Quick Start for Contributors

# Clone and setup
git clone https://github.com/fiscaloxai/platformx.git
cd platformx
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/
mypy src/

# Format code
black src/

License

This project is licensed under the MIT License - see the LICENSE file for details.

