An AI powered scientific literature search engine

ScienceAI

An AI-Powered Research Assistant for Systematic Literature Analysis

ScienceAI is a Python application that transforms how researchers analyze scientific literature. Unlike a standard LLM chatbot, ScienceAI is designed to handle complex, multi-paper research tasks through an agent-based architecture that supports GPT-5.2, Claude, and Gemini models.

🎯 Why ScienceAI vs. a Regular LLM Chatbot?

Standard LLM Chatbot	ScienceAI
Single conversation context	Multi-agent system with specialized analyst agents
Manual upload of each document excerpt	Automatic processing of hundreds of PDFs
Limited by context window (~200K tokens)	Processes entire paper collections regardless of size
Requires you to extract data manually	Automated data extraction with structured schemas
One-off responses	Persistent analysis with downloadable results
No systematic validation	Built-in validation and provenance tracking
Generic responses	Evidence-based answers with source citations

The Key Difference: Agentic Architecture

ScienceAI employs a Principal Investigator (PI) that:

Breaks down your research question into manageable sub-tasks
Creates specialized Analyst Agents for each sub-task
Coordinates parallel data extraction across your entire paper collection
Synthesizes findings from multiple analysts
Provides comprehensive, evidence-backed answers

This means you can ask: "Extract healing times, sample sizes, and intervention types from all these papers" and ScienceAI will automatically create the right analysts, define extraction schemas, process all papers, and return structured CSV data—something impossible with a standard chatbot.

🚀 Main Features

📚 Automated Paper Processing: Upload PDFs and let ScienceAI extract text, figures, tables, and metadata automatically
🤖 AI-Driven Multi-Agent Analysis: The PI delegates tasks to specialized Analyst Agents that work autonomously
📊 Structured Data Extraction: Define data schemas and extract information systematically across all papers
💬 Interactive Research Discussion: Ask complex research questions and receive evidence-backed answers
🔍 Provenance Tracking: Every extracted data point includes source quotes and derivation explanations
📈 Export & Visualization: Download extracted data as CSV, export papers with metadata, view analysis results in an interactive interface
🌙 Dark Mode: Full dark mode support for working comfortably in low-light environments, including specialized styling for data viewers
💾 Project Management: Save and resume research projects with full checkpoint support

📦 Installation

Requirements: Python 3.11+ and an OpenAI API key

pip install scienceai-llm

🎬 Getting Started

1. Launch ScienceAI

scienceai

This starts a local web server. Open your browser to:

http://localhost:4242

You will be prompted to enter your OpenAI API key. This key is used to authenticate requests to the OpenAI API. You can find your API key in your OpenAI account settings.

Enter your project name and click "Start" to create a new project or load an existing one.

Tip: You can switch between OpenAI, Anthropic (Claude), and Google (Gemini) models using the "LLM Provider" card in the main menu once started. See Configuration for setup details.

2. Understanding the Interface

Papers Panel - Your Literature Library

Papers Panel (Left Side): This is your literature library showing all uploaded PDFs with:

Search Bar at the top to filter papers by title, author, or keywords
Automatically Detected Metadata: Author, Date, Title, Journal
Paper IDs: Each paper gets a unique identifier
Analyst Tracking: Shows which analysts have processed each paper
Add Papers Button: Upload additional PDFs to your project at any time

You can upload PDFs individually or as a zip folder during project creation, or add more later via the "Add Papers" button.

3. Chatting with the Principal Investigator

Science Discussion - Your Research Conversation

Science Discussion Panel (Right Side): This is where you interact with the Principal Investigator (PI). The PI:

Understands complex research questions
Plans multi-step analysis strategies
Creates and manages Analyst Agents to accomplish your goals
Presents synthesized findings with evidence

Key Features:

Message Status: Messages show "Processed" (waiting for your input) or "Pending" (PI is working)
"Show work..." Links: Click to see detailed tool calls and PI reasoning (see below)
Timestamps: Track when each interaction occurred
Brain Indicator 🧠: A floating emoji shows real-time context (memory) usage. It turns yellow ⚠️ or red 🔴 as the model's memory fills up.

🔍 Transparency: "Show work..." Feature

Messages from the PI include a "Show work..." link. This transparency feature lets you see exactly what the PI is doing behind the scenes.

Click "Show work..." to reveal:

Tool Calls: Every function the PI called (e.g., read_paper_chunks, create_analyst, search_database)
Arguments: The exact parameters passed to each tool
Outputs: Results returned from each operation
Reasoning: The PI's step-by-step decision-making process

This is invaluable for:

Understanding how ScienceAI processes your requests
Debugging unexpected results
Learning how to phrase better questions
Building trust through complete transparency

Click "Hide work..." to collapse the details again.

Example Questions to Ask:

"Extract sample sizes, intervention types, and outcomes from all studies"
"Which papers found significant effects for [specific intervention]?"
"Create a summary table comparing study methodologies"
"What are the outcome measures used across these papers?"

🔄 Resetting the Conversation

If you wish to start fresh while keeping your uploaded papers, use the Reset Conversation button (or the undo arrow icon in the chat interface). This will:

Clear the chat history
Reset the Principal Investigator's memory
Fix any potential database locks
Keep your uploaded papers and extracted data collections

4. Working with Analyst Agents

Analysis Panel - Your Data Extraction Agents

Analysis Panel (Bottom Section): When you request data extraction or specific analyses, the PI creates specialized Analyst Agents. This panel shows:

Analyst Categories: Different types of analysts (e.g., "Study Categorization & Eligibility Analyst", "Nonunion and Union Status Analyst")
Data Collections: Each analyst creates structured data collections with names like "NonunionSmokingData2"
Load Button: Click to view the extracted data in a table format
Download Button: Export data as CSV for analysis in Excel, R, or Python

Each analyst autonomously:

Defines an extraction schema based on your request
Processes all relevant papers
Validates extracted data for accuracy
Provides results with source citations

5. Viewing Extracted Data

Data Tables: Click "Load" on any data collection to see the extracted data in a structured table. Each row represents data from one paper, with columns showing:

Standard Fields: Data you requested (e.g., smoking status, healing time, sample size)
Provenance Metadata: Automatically added by ScienceAI
- _source_quote: The exact text from the paper supporting this data
- _derivation: Explanation of how calculated/inferred values were determined
- _source_location: Where in the paper this data was found

Key Features:

Sortable Columns: Click headers to sort
Download CSV: Click the download button to export for further analysis
Source Verification: Every data point links back to the original paper text
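
The provenance columns can also be checked in a script after export. The sketch below assumes a downloaded CSV in which each requested field has a companion column ending in _source_quote; the exact column naming in a real export may differ, and the sample rows and the helper rows_missing_quotes are invented for illustration.

```python
import csv
import io

# Hypothetical export content; a real file would come from the Download button.
SAMPLE_CSV = """paper_id,sample_size,sample_size_source_quote,sample_size_source_location
p1,27,"27 patients were enrolled",Methods
p2,40,,Results
"""

def rows_missing_quotes(csv_text):
    """Return (paper_id, column) pairs where a *_source_quote cell is empty."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column.endswith("_source_quote") and not value.strip():
                missing.append((row["paper_id"], column))
    return missing

print(rows_missing_quotes(SAMPLE_CSV))
```

Rows flagged this way are candidates for manual review against the original paper.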

👁️ Viewing Raw Data: JSON and CSV Viewers

In the Analysis Panel, each data collection offers multiple view formats:

JSON Viewer with Syntax Highlighting

JSON Data Eye Icon (👁️): Click the eye icon next to "JSON Data" to open an interactive JSON viewer featuring:

Syntax Highlighting: Easy-to-read colored formatting
Collapsible Sections: Expand/collapse nested objects and arrays
Copy Button: Copy the entire JSON to clipboard
Raw Format: See the exact data structure as stored

CSV Viewer with Data Grid

CSV Data Eye Icon (👁️): Click the eye icon next to "CSV Data" to open a spreadsheet-style viewer with:

Grid Layout: See your data in familiar rows and columns
Quick Preview: View data without downloading
Inspect Format: Check CSV structure before exporting

These viewers help you:

Verify data quality before export
Debug extraction issues by inspecting raw values
Choose the best format (JSON vs CSV) for your workflow
Inspect data structure and field types

Click the Close button or press Esc to dismiss the viewer.

6. Exporting Your Work

Export Menu - Download Papers and Data

Export Button (📦): Located in the bottom control panel, this opens the Export Papers menu where you can:

Select Papers to Export:

All: Export every paper in your project
User Defined Tag: Filter by custom tags you've applied

Customize Filenames with detected metadata:

Choose which fields to include: DOI, Date, First Author, Title, Journal, Tags
Set the order of fields in the filename
Choose separator (underscore, dash, space)
Preview: 2023_Smith_ImplantFailureRates_JBJS.pdf
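
The filename options above amount to joining selected metadata fields in a chosen order with a chosen separator. A minimal sketch of that idea follows; build_filename and the metadata dictionary are hypothetical and only illustrate the naming scheme, not ScienceAI's own export code.

```python
import re

def build_filename(metadata, fields, separator="_"):
    """Join the chosen metadata fields, in order, into a PDF filename."""
    parts = []
    for field in fields:
        value = str(metadata.get(field, "")).strip()
        if value:
            # Drop whitespace and characters that are unsafe in filenames.
            parts.append(re.sub(r"[^\w-]", "", value))
    return separator.join(parts) + ".pdf"

# Hypothetical metadata for one paper.
paper = {"Date": "2023", "First Author": "Smith",
         "Title": "Implant Failure Rates", "Journal": "JBJS"}
print(build_filename(paper, ["Date", "First Author", "Title", "Journal"]))
```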

Bottom Control Panel Buttons:

💾 Checkpoints: Download auto-generated checkpoint saves that allow you to resume your project at the last saved state or share it with others
📦 Export: Export papers with custom filenames
📊 Extracted Data: Combines all extracted data into a single CSV file for analysis and for checking extraction quality. Column names can be very long, so you may want to rename them.
❌ Close: Return to project selection screen
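
Long column names in the combined CSV can be shortened with a few lines of standard-library Python. This is a sketch under the assumption that the export is an ordinary comma-separated file with one header row; shorten_headers is a hypothetical helper, and the numeric suffix it adds only handles simple collisions.

```python
import csv
import io

def shorten_headers(csv_text, max_len=30):
    """Truncate headers to max_len, adding a numeric suffix on collisions."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    seen = {}
    new_header = []
    for name in rows[0]:
        short = name[:max_len]
        count = seen.get(short, 0)
        seen[short] = count + 1
        new_header.append(short if count == 0 else f"{short}_{count}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(new_header)
    writer.writerows(rows[1:])  # data rows are copied unchanged
    return out.getvalue()

# Invented two-column example with an 8-character limit.
print(shorten_headers("abcdefgh_one,abcdefgh_two,id\n1,2,3\n", max_len=8))
```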

💡 Example Use Cases

1. Systematic Literature Reviews

Upload 100+ papers, ask the PI to categorize them by intervention type, extract study characteristics, and generate summary tables—all automatically.

2. Meta-Analysis Data Extraction

Request extraction of effect sizes, sample sizes, and study parameters. ScienceAI handles the schema definition, extraction, validation, and CSV export.

3. Research Gap Analysis

Ask "What methodologies are under-represented?" and let analysts scan all papers to identify patterns and gaps.

4. Evidence Synthesis

"Summarize all findings related to [X]" triggers analysts to extract relevant sections, synthesize findings, and provide citations.

🐍 Python Library Usage

ScienceAI can also be used as a Python library to integrate its capabilities into your own scripts and applications.

Initialization

from scienceai.client import ScienceAI

# Initialize the client (starts backend automatically)
client = ScienceAI(project_name="MyResearchProject")

Ingesting Papers

You can upload papers programmatically and trigger preprocessing.

# Upload papers and wait for preprocessing to complete
client.upload_papers(["/path/to/paper1.pdf", "/path/to/paper2.pdf"])

# Or upload without immediate preprocessing
client.upload_papers(["/path/to/paper3.pdf"], trigger_preprocess=False)

# Manually trigger preprocessing later
client.preprocess()

Chatting with the PI

Interact with the Principal Investigator to ask questions or request analyses.

import time  # used by the polling loop below

# Send a message and wait for the response (blocking)
response = client.chat("Summarize the findings of the uploaded papers.")
print(response)

# Non-blocking chat
client.chat_background("Extract sample sizes from all papers.")

# Poll for status
while True:
    result = client.poll()
    if result:
        print("Response received:", result)
        break
    print("Working...")
    time.sleep(1)

# Get full history
history = client.history()
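
The polling loop above runs until a result arrives. A bounded variant is sketched below; wait_for_result is a hypothetical helper, not part of ScienceAI, and it assumes only that client.poll() returns a falsy value while the PI is still working, as the loop above suggests.

```python
import time

def wait_for_result(poll, timeout=300.0, interval=1.0):
    """Call poll() until it returns a truthy result or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = poll()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("no response before the timeout")
        time.sleep(interval)

# Stand-in for client.poll so the sketch runs without a ScienceAI backend.
_answers = iter([None, None, "done"])
print(wait_for_result(lambda: next(_answers), timeout=5, interval=0.01))
```

With a real client this would be called as wait_for_result(client.poll, timeout=600).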

🏗️ How It Works: Architecture Overview

The Principal Investigator (PI)

Your main interface—a conversational AI that:

Understands research objectives
Plans analysis strategies
Creates and manages Analyst Agents
Synthesizes multi-agent findings
Communicates results clearly

Analyst Agents

Specialized workers created on-demand:

Each has a focused research goal
Autonomously defines data schemas
Extracts, validates, and exports data
Provides evidence-backed conclusions

Data Extraction Engine

Flexible Schemas: Support for numbers, dates, text blocks, categorical data, and more
Derivation Support: Extract calculated or inferred values with explanations
Automatic Provenance: Every data point links to source location and quotes
Validation: Built-in error checking and re-extraction on failure
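
The "re-extraction on failure" behavior can be pictured as a validate-and-retry loop. The sketch below is an illustration of that pattern, not ScienceAI's code: extract_with_retry, fake_extract, and check_types are hypothetical names, and the stand-in extractor replaces what would be an LLM call.

```python
def extract_with_retry(extract, validate, paper_text, max_attempts=3):
    """Run extract() until validate() accepts the record or attempts run out."""
    errors = []
    for _ in range(max_attempts):
        record = extract(paper_text, errors)
        problems = validate(record)
        if not problems:
            return record
        errors = problems  # fed back so the next attempt can correct them
    raise ValueError(f"extraction failed after {max_attempts} attempts: {errors}")

def fake_extract(text, errors):
    # Stand-in extractor: returns a string first, an integer after feedback.
    return {"sample_size": 27 if errors else "27"}

def check_types(record):
    # Returns a list of problems; an empty list means the record is valid.
    if isinstance(record["sample_size"], int):
        return []
    return ["sample_size must be a number"]

print(extract_with_retry(fake_extract, check_types, "27 patients were enrolled"))
```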

Database & Storage

Persistent project storage
Efficient paper and metadata management
Data collection tracking
Checkpoint and export functionality

🔧 Configuration

LLM Provider Selection

ScienceAI supports multiple LLM providers with flexible authentication options:

Supported Providers

OpenAI (GPT-4, GPT-5, o4-mini): Default provider
Anthropic (Claude Sonnet/Opus 4.5): Via direct API or Google Vertex AI
Google (Gemini 3 Pro): Via API key or Vertex AI service account

Setting Up Providers

OpenAI (Required for Default Setup)

# Method 1: Interactive setup
scienceai --setup-keys

# Method 2: Direct key setting
scienceai --set-key openai YOUR_OPENAI_API_KEY

# Method 3: Environment variable
export OPENAI_API_KEY="sk-..."

Anthropic Claude (Optional)

# Direct API (recommended for personal use)
scienceai --set-key anthropic YOUR_ANTHROPIC_API_KEY

# Or via environment variable
export ANTHROPIC_API_KEY="sk-ant-..."

Google Gemini (Optional)

# Standard API key (simple setup)
scienceai --set-key google YOUR_GOOGLE_API_KEY

# Or via environment variable
export GOOGLE_API_KEY="..."
# or
export GEMINI_API_KEY="..."

GCP Service Account for Production/Enterprise

For production deployments or enterprise use, you can use a GCP service account for both Gemini and Claude on Vertex AI:

Setup:

scienceai --gcp-service-account /path/to/service-account.json

This will:

Validate your service account file
Extract the project ID automatically

Prompt you interactively:

✓ Valid service account file for project: my-project-123
  This service account can be used for:
    1. Google Gemini (native GCP models)
    2. Claude on Vertex AI (Anthropic partner models)

Use this service account for Claude on Vertex AI? (y/n):

Ask for your preferred Vertex AI region:

Common Vertex AI regions:
  - us-east5 (US East)
  - us-central1 (US Central)
  - europe-west1 (Europe West)
Enter Vertex AI region (default: us-east5):

Save the configuration

Remove GCP Configuration:

scienceai --remove-gcp-config

This command allows you to selectively remove Gemini and/or Claude Vertex configurations, reverting to API key authentication.

Priority Order:

If both GCP service account AND API key are configured for a provider:
1. GCP Service Account takes priority (recommended for production)
2. API Key is used as fallback

This design allows smooth transitions between development (API key) and production (service account) environments.
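
The priority rule can be expressed as a small lookup. The sketch below assumes a configuration dictionary shaped like the example in Configuration Files; resolve_credentials is a hypothetical function written for illustration, and the key values are placeholders.

```python
def resolve_credentials(config, api_key_name, gcp_name):
    """Prefer the GCP service account if configured, else use the API key."""
    gcp = config.get(gcp_name)
    if gcp and gcp.get("service_account_path"):
        return ("service_account", gcp["service_account_path"])
    key = config.get(api_key_name)
    if key:
        return ("api_key", key)
    return None  # provider is not configured at all

# Placeholder configuration: Google has both, Anthropic has only an API key.
config = {
    "google": "AIza-placeholder",
    "google_gcp": {"service_account_path": "/path/to/sa.json"},
    "anthropic": "sk-ant-placeholder",
}
print(resolve_credentials(config, "google", "google_gcp"))
print(resolve_credentials(config, "anthropic", "anthropic_vertex"))
```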

Provider Switching

Switch between providers via the LLM Provider card in the menu UI. Select:

OpenAI (GPT models)
Claude (Anthropic direct API)
Claude on Vertex (via GCP - if configured)
Gemini (Google models)

Unavailable providers (missing API keys) are grayed out.

Validate Your Configuration

Test all configured API keys:

scienceai --validate-keys

Output:

Validating configured API keys...

  ✓ openai: Valid (gpt-5.2 accessible)
  ✓ anthropic: Valid (claude-sonnet-4-5 accessible)
  ✗ google: Invalid (API key expired)

⚠ Some keys failed validation

CLI Options Reference

# API Key Management
scienceai --setup-keys                    # Interactive key setup
scienceai --set-key PROVIDER KEY         # Set a specific key
scienceai --validate-keys                # Validate all keys

# GCP Service Account
scienceai --gcp-service-account PATH     # Configure service account
scienceai --remove-gcp-config            # Remove service account config

# Provider Selection
scienceai --provider anthropic           # Start with specific provider

# Server Options
scienceai --port 8080                    # Custom port (default: 4242)
scienceai --skip-validation              # Skip startup key validation

# Logging
scienceai -v                             # Verbose (INFO level)
scienceai --debug                        # Debug logging
scienceai --log-level WARNING            # Specific log level

Configuration Files

API keys and GCP configuration are stored in:

~/Documents/ScienceAI/scienceai-keys.json

Example structure:

{
  "openai": "sk-...",
  "anthropic": "sk-ant-...",
  "google": "AIza...",
  "google_gcp": {
    "service_account_path": "/path/to/sa.json",
    "project_id": "my-project-123",
    "region": "us-east5"
  },
  "anthropic_vertex": {
    "service_account_path": "/path/to/sa.json",
    "project_id": "my-project-123",
    "region": "us-east5"
  }
}
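
To check what is configured without printing secrets, the file can be read with the standard json module. This is a sketch that assumes the path and structure shown above; load_keys and summarize_keys are hypothetical helpers, not ScienceAI functions.

```python
import json
from pathlib import Path

# Location taken from the section above; adjust if your setup differs.
DEFAULT_PATH = Path.home() / "Documents" / "ScienceAI" / "scienceai-keys.json"

def load_keys(path=DEFAULT_PATH):
    """Read the keys file and return its contents as a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def summarize_keys(data):
    """Describe each entry without exposing full secrets."""
    summary = {}
    for name, value in data.items():
        if isinstance(value, dict):
            summary[name] = (f"service account, project {value.get('project_id')}, "
                             f"region {value.get('region')}")
        else:
            summary[name] = f"API key ending in ...{str(value)[-4:]}"
    return summary
```

Calling summarize_keys(load_keys()) prints nothing sensitive beyond the last four characters of each key.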

📚 Detailed Documentation

🧠 Principal Investigator (PI)

The Principal Investigator (src/scienceai/principal_investigator.py) is the central orchestrator of the system. It uses an LLM-driven reasoning loop to:

Plan Research: Decomposes user queries into sub-tasks.
Delegate: Spawns Analyst Agents using delegate_research() to handle specific data extraction or analysis tasks.
Execute Code: Uses run_python_code() to perform statistical analysis, generate plots, or manipulate data using Python (pandas, matplotlib, etc.).
Synthesize: Aggregates results from multiple analysts using reflect_on_delegations() to provide a cohesive answer.
Transparency: All PI actions are recorded and visible via the "Show work..." feature in the UI, exposing tool calls, arguments, and internal reasoning.
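
The plan, delegate, and synthesize steps, together with the recorded tool calls, can be sketched as follows. This is a simplified illustration and not the contents of principal_investigator.py: the Investigator and ToolCall classes are hypothetical, the lambdas stand in for LLM-backed tools, and "delegate_research" appears only as a log label.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    arguments: dict
    output: object

@dataclass
class Investigator:
    log: list = field(default_factory=list)

    def call(self, name, func, **arguments):
        """Run a tool and record its name, arguments, and output."""
        output = func(**arguments)
        self.log.append(ToolCall(name, arguments, output))
        return output

    def answer(self, question, plan, delegate, synthesize):
        """Plan sub-tasks, delegate each one, then combine the findings."""
        sub_tasks = self.call("plan", plan, question=question)
        findings = [self.call("delegate_research", delegate, goal=task)
                    for task in sub_tasks]
        return self.call("synthesize", synthesize, findings=findings)

pi = Investigator()
result = pi.answer(
    "healing times and sample sizes",
    plan=lambda question: [part.strip() for part in question.split(" and ")],
    delegate=lambda goal: f"findings for {goal}",
    synthesize=lambda findings: "; ".join(findings),
)
print(result)
print([call.name for call in pi.log])
```

The log list plays the role of the "Show work..." view: every call is kept with its arguments and output.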

🕵️ Analyst Agents

Analyst Agents (src/scienceai/analyst.py) are specialized, autonomous workers created by the PI. Each analyst has a specific goal (e.g., "Extract patient demographics") and follows this workflow:

Paper Selection: Identifies relevant papers using get_all_papers() or filters by criteria.
Schema Generation: Automatically generates a JSON schema for data extraction based on its goal.
Concurrent Extraction: Runs extract_data() across all selected papers in parallel.
Validation: Uses reflect_on_evidence() to verify that extracted data is supported by the source text.
Data Collection: Saves structured results into a named collection (e.g., DemographicsData) which becomes available to the PI and the user.
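
Concurrent extraction across papers can be illustrated with a thread pool from the standard library. The sketch below is an assumption about the general pattern, not the analyst's actual implementation; extract_from_all and count_patients are hypothetical, and the stand-in extractor replaces an LLM-backed call.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_from_all(papers, extract_one, max_workers=4):
    """Apply extract_one to every paper in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(extract_one, papers.values()))
    return dict(zip(papers.keys(), results))

def count_patients(text):
    # Stand-in for an LLM-backed extraction call.
    return {"sample_size": int(text.split()[2])}

# Invented paper snippets keyed by paper ID.
papers = {"p1": "n = 27 patients", "p2": "n = 40 patients"}
print(extract_from_all(papers, count_patients))
```

Threads suit this workload because each extraction mostly waits on network responses rather than using the CPU.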

⛏️ Data Extraction Engine

The Data Extraction Engine (src/scienceai/data_extractor.py) is the core NLP component responsible for turning unstructured PDF text into structured data.

Supported Types: number, date, text_block, categorical, boolean, array, object.
Provenance Injection: Automatically adds metadata to every extracted field:
- _source_quote: The verbatim text from the paper supporting the data.
- _source_location: Page number and context.
- _derivation: Logic used to calculate values (e.g., "Calculated as 15 males + 12 females").
Reflection & Validation: The reflect_on_data_extraction() function acts as a critic, comparing the extracted JSON against the paper's text to catch hallucinations or errors before saving.
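
One simple check a critic step could perform is confirming that each source quote really occurs in the paper. The sketch below is a deliberately reduced stand-in for the LLM-based reflection described above: unsupported_fields is hypothetical, and the record layout (a value plus provenance keys per field) is an assumption for illustration.

```python
import re

def _normalize(text):
    """Lowercase and collapse whitespace so PDF line breaks do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()

def unsupported_fields(record, paper_text):
    """Return field names whose _source_quote is absent from the paper text."""
    haystack = _normalize(paper_text)
    bad = []
    for name, entry in record.items():
        quote = _normalize(entry.get("_source_quote", ""))
        if not quote or quote not in haystack:
            bad.append(name)
    return bad

# Invented example: the second quote does not appear in the text.
paper_text = "The cohort included 15 males and 12 females."
record = {
    "sample_size": {
        "value": 27,
        "_source_quote": "15 males and\n12 females",
        "_derivation": "Calculated as 15 males + 12 females",
    },
    "healing_time": {"value": 12, "_source_quote": "healed within 12 weeks"},
}
print(unsupported_fields(record, paper_text))
```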

💾 Database & Storage

Managed by DatabaseManager (src/scienceai/database_manager.py), the system uses a file-based storage approach for portability and simplicity.

Paper Ingestion: PDFs are hashed (sha256) to prevent duplicates. Text, tables, and figures are extracted and stored.
Storage Format: Uses dictdatabase to store project state, chat history, and data collections as JSON files.
Checkpoints: The system supports full project checkpointing. The save_database() function creates a zip archive of the project directory, allowing users to backup, share, or resume their work at any time.
Export: Data can be exported as CSVs, and papers can be renamed/exported based on their metadata.
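
Hash-based deduplication and zip checkpoints can both be sketched with the standard library. The code below illustrates the two ideas under stated assumptions and is not the DatabaseManager implementation: file_sha256, add_paper, and make_checkpoint are hypothetical names, and the in-memory index stands in for the project database.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_sha256(path):
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def add_paper(index, path):
    """Register path under its hash; return False if the content is a duplicate."""
    key = file_sha256(path)
    if key in index:
        return False
    index[key] = str(path)
    return True

def make_checkpoint(project_dir, archive_base):
    """Zip the project directory and return the archive path."""
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(project_dir))

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    project.mkdir()
    (project / "a.pdf").write_bytes(b"%PDF-1.4 same")
    (project / "b.pdf").write_bytes(b"%PDF-1.4 same")
    index = {}
    print(add_paper(index, project / "a.pdf"))  # first copy is stored
    print(add_paper(index, project / "b.pdf"))  # identical bytes are rejected
    print(Path(make_checkpoint(project, Path(tmp) / "checkpoint")).name)
```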

🤝 Contributing

We welcome contributions! Here's how:

Report Bugs: Open an issue on GitHub with reproduction steps
Feature Requests: Suggest new capabilities or improvements
Pull Requests: Fork, develop, and submit PRs for review

📄 License

See LICENSE file for details.

🆘 Troubleshooting

Papers not processing? Check that PDFs are valid and not password-protected.

API errors? Verify that your API key or service account is valid and has available credits.

Analyst not completing? Check the chat panel for error messages—the PI will explain any issues.

Cannot download data? Ensure analysts have completed their data collections before exporting.

"Context Limit Reached" Warning? This means the conversation has exceeded the LLM's memory. ScienceAI will automatically compress older messages to free up space. You can also use the Reset Conversation feature to clear the history while keeping your uploaded papers.

Ready to transform your literature review workflow? Install ScienceAI and start asking research questions!

pip install scienceai-llm
scienceai

Release history

Version	Release date	Status
0.4.7	Mar 21, 2026
0.4.6	Dec 28, 2025
0.4.5	Dec 27, 2025
0.4.4	Dec 27, 2025	Yanked (gemini with api key not working)
0.4.3 (this version)	Dec 19, 2025	Yanked (io folder missing)
0.4.2	Dec 19, 2025
0.4.1	Dec 9, 2025
0.3.1	Nov 25, 2025
0.1.6	Jun 21, 2024
0.1.5	Jun 20, 2024
0.1.4	Jun 18, 2024	Yanked (Windows Incompatible)
0.1.3	Jun 18, 2024	Yanked (Windows Incompatible)
0.1.2	Jun 18, 2024	Yanked (Windows Incompatible)
0.1.1	Jun 18, 2024	Yanked (Windows Incompatible)

Download files

Source Distribution

scienceai_llm-0.4.3.tar.gz (260.1 kB)

Uploaded Dec 19, 2025 Source

Built Distribution

scienceai_llm-0.4.3-py3-none-any.whl (255.3 kB)

Uploaded Dec 19, 2025 Python 3

File details

Details for the file scienceai_llm-0.4.3.tar.gz.

File metadata

Download URL: scienceai_llm-0.4.3.tar.gz
Upload date: Dec 19, 2025
Size: 260.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scienceai_llm-0.4.3.tar.gz
Algorithm	Hash digest
SHA256	`6b0ed57b4ba21d218782459ee844a0544e4805e83e859357bb4b57cb6be56fd5`
MD5	`440ec31407465c583cba6ccec631187b`
BLAKE2b-256	`ab2dd23d2ae90eda65ee102d79f5dcb9a0aebeef914fd528794b57e17958427c`

File details

Details for the file scienceai_llm-0.4.3-py3-none-any.whl.

File metadata

Download URL: scienceai_llm-0.4.3-py3-none-any.whl
Upload date: Dec 19, 2025
Size: 255.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scienceai_llm-0.4.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`580871121017efc004fd34c01cd04c0301889ecb11fcca77949178dcdb223c2f`
MD5	`0297fa0c37f2fa4fb36197540b70599c`
BLAKE2b-256	`461f5a3a9a6802a53e60ef15abeab2832e13bfa134bd0ca00e9c21a2d02e8b8e`
