An AI-powered scientific literature search engine

Reason this release was yanked:

Windows Incompatible

Project description

ScienceAI

ScienceAI is a Python package that acts as an AI-powered scientific literature search engine. It uses large language models (LLMs) to process and analyze research papers, so users can ask complex questions and receive answers supported by evidence from the literature. An analysis can include hundreds of papers, and no metadata is needed for the uploaded papers; just provide the files.

Main Features

  • Automated Paper Processing: Automatically extract text, figures, tables, and metadata from research papers (PDFs).
  • AI-Driven Analysis: Utilize LLMs to summarize papers, interpret figures and tables, and extract relevant data points based on user-defined schemas.
  • Analyst Agents: Created and managed by the top-level AI to address specific research goals autonomously.
  • Interactive Discussion: Engage in a conversational interface with ScienceAI, asking questions and receiving detailed responses.
  • Data Management: Robust database system for efficient data retrieval and management.
  • Visualization and Export: Interactive, tree-like structure for exploring analysis results, with options to download extracted data, analysis summaries, and individual papers.
  • Export Capabilities: Export the data extracted by the AI in CSV format, and export all or subsets of the papers with meaningful file names and metadata detected by the system.

Installation

ScienceAI requires Python 3.11 or higher and an OpenAI API key. To install the package, you can use pip:

pip install scienceai-llm
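
Before installing, it can help to confirm that the interpreter meets the stated minimum version. The following is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of ScienceAI.

```python
import sys

# Minimum version stated in the installation notes above.
MINIMUM = (3, 11)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if meets_minimum():
    print("Python version is sufficient for ScienceAI.")
else:
    print("ScienceAI requires Python 3.11 or higher.")
```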

Usage

ScienceAI is designed to be used through its user interface. After installation, start the application and use the web interface to upload PDF files and manage projects.

  1. Start the ScienceAI application:

    scienceai
    
  2. Open your web browser and navigate to http://localhost:4242.

  3. Create a New Project:

    • Enter a project name and click "Start".
    • Upload individual PDFs or a ZIP archive of PDFs for analysis.
    • Click "Create Project" to begin the analysis.

  4. Analyze Papers:

    • View the list of uploaded papers with their metadata.
    • Use the "Science Discussion" panel to interact with the analysis framework.
    • View and download extracted data in JSON and CSV formats.
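
If you prefer to script the first two steps, they can be sketched as below. This assumes the `scienceai` command is on your PATH and serves on port 4242, as described above; the helper names `port_is_open`, `wait_for_port`, and `launch` are illustrative and not part of ScienceAI.

```python
import socket
import subprocess
import time
import webbrowser

URL = "http://localhost:4242"  # address given in step 2


def port_is_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=20, delay=0.5):
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


def launch():
    """Start the application (step 1), then open the browser once it responds (step 2)."""
    server = subprocess.Popen(["scienceai"])
    if wait_for_port("localhost", 4242):
        webbrowser.open(URL)
    return server
```

Calling `launch()` returns the server process, so the caller can wait on it or terminate it when finished.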

Example Use Case

An example use case for ScienceAI is performing ad hoc literature reviews. A researcher can direct the AI to extract data from hundreds of papers simultaneously, which would be cumbersome using a simple chat interface. The researcher can upload a large set of PDFs, specify the data to be extracted, and let the AI handle the complex analysis. The extracted data can then be exported in CSV format for further investigation.
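
Once the data is exported, further investigation can start with the standard library alone. The following is a minimal sketch that assumes a hypothetical export with `paper` and `sample_size` columns; the real column names depend on the data you asked ScienceAI to extract, and the rows are placeholders.

```python
import csv
import io
from statistics import mean

# Stand-in for an exported file; column names and rows are hypothetical.
EXPORTED_CSV = """paper,sample_size
paper_a.pdf,120
paper_b.pdf,
paper_c.pdf,80
"""


def numeric_column(csv_text, column):
    """Collect the values of `column` that parse as numbers, skipping blanks."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = (row.get(column) or "").strip()
        if not cell:
            continue  # no value was extracted for this paper
        try:
            values.append(float(cell))
        except ValueError:
            continue  # ignore cells that are not numeric
    return values


sizes = numeric_column(EXPORTED_CSV, "sample_size")
print(f"{len(sizes)} papers report a sample size; mean = {mean(sizes):.1f}")
```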

Detailed Documentation

Database Management

The database_manager module is the backbone of ScienceAI's data handling. It's responsible for:

  • Ingesting Papers: Adding research papers (PDFs) to the database.
  • Processing Papers: Extracting text, figures, tables, and metadata, and generating summaries.
  • Storing Data: Persisting processed information in a structured format.
  • Retrieving Data: Providing access to papers, data extractions, and analysis results.
  • Managing Analyst Agents: Creating, storing, and retrieving Analyst Agent data.
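
The module's internals are not documented here, so the following only illustrates the ingest, store, and retrieve responsibilities using SQLite. The class name `PaperStore`, its methods, and the table layout are all assumptions, not ScienceAI's actual API.

```python
import hashlib
import sqlite3


class PaperStore:
    """Hypothetical store; the real database_manager may work differently."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS papers ("
            "paper_id TEXT PRIMARY KEY, filename TEXT, summary TEXT)"
        )

    def ingest(self, filename, pdf_bytes):
        """Register a paper, keyed by a hash of its content so duplicates collapse."""
        paper_id = hashlib.sha256(pdf_bytes).hexdigest()[:16]
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR IGNORE INTO papers (paper_id, filename) VALUES (?, ?)",
                (paper_id, filename),
            )
        return paper_id

    def store_summary(self, paper_id, summary):
        """Persist a processing result for an already ingested paper."""
        with self.conn:
            self.conn.execute(
                "UPDATE papers SET summary = ? WHERE paper_id = ?",
                (summary, paper_id),
            )

    def get(self, paper_id):
        """Return (filename, summary) for a paper, or None if it is unknown."""
        return self.conn.execute(
            "SELECT filename, summary FROM papers WHERE paper_id = ?",
            (paper_id,),
        ).fetchone()
```

Keying on a content hash is one way to let the same PDF be uploaded twice without creating a second record.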

Principal Investigator (PI)

The principle_investigator module represents the main AI persona you interact with. The PI:

  • Delegates Research: Creates and manages Analyst Agents to address specific research questions.
  • Interacts with User: Conducts the conversation with the user, understanding their goals and relaying information from the Analyst Agents.
  • Oversees Analysis: Monitors the progress of Analyst Agents and ensures the research process is effective.

Analyst Agents

The analyst module defines Analyst Agents, which are created by the top-level AI (the Principal Investigator). Each Analyst:

  • Has a Goal: Is assigned a specific research question or objective by the top-level AI.
  • Requests Data: Directs the creation of data extraction schemas to gather relevant information from the papers.
  • Analyzes Data: Processes the extracted data to answer its assigned goal.
  • Provides Evidence: Presents the answer to its goal with supporting evidence from the research papers.
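
The relationship between the PI and its Analyst Agents can be pictured as a simple delegation pattern. The sketch below is an illustration only: the class and method names are hypothetical, and no LLM calls are made.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Analyst:
    """Hypothetical stand-in for an Analyst Agent with a single goal."""
    goal: str
    answer: Optional[str] = None
    evidence: List[str] = field(default_factory=list)

    def report(self, answer, evidence):
        """Record the answer together with the papers that support it."""
        self.answer = answer
        self.evidence = list(evidence)

    @property
    def done(self):
        return self.answer is not None


@dataclass
class PrincipalInvestigator:
    """Hypothetical stand-in for the top-level AI that delegates goals."""
    analysts: List[Analyst] = field(default_factory=list)

    def delegate(self, goal):
        """Create an Analyst for one research goal and keep track of it."""
        analyst = Analyst(goal=goal)
        self.analysts.append(analyst)
        return analyst

    def pending_goals(self):
        """Goals whose analysts have not reported yet."""
        return [a.goal for a in self.analysts if not a.done]
```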

Data Extraction

The data_extractor module handles the process of extracting structured data from papers based on user-defined schemas:

  • Data Types: Offers various data types (number, date, text_block, etc.) for flexible data extraction.
  • Schema Generation: Assists in generating a schema that outlines the data to be extracted.
  • Data Extraction: Uses LLMs to extract the specified data from research papers.
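
To make the idea of a typed schema concrete, here is a minimal sketch that maps field names to the three data types named above and coerces raw extracted strings accordingly. The schema format, the field names, and `coerce_record` are assumptions for illustration; ScienceAI's own schema representation may differ.

```python
from datetime import date

# One coercion function per data type named above.
COERCERS = {
    "number": float,
    "date": date.fromisoformat,
    "text_block": str,
}

# Hypothetical schema: field name -> data type.
SCHEMA = {
    "sample_size": "number",
    "publication_date": "date",
    "main_finding": "text_block",
}


def coerce_record(raw, schema=SCHEMA):
    """Convert raw extracted values to typed ones; missing or unparseable fields become None."""
    record = {}
    for name, type_name in schema.items():
        value = raw.get(name)
        try:
            record[name] = None if value is None else COERCERS[type_name](value)
        except (ValueError, TypeError):
            record[name] = None
    return record
```

Mapping failures to `None` rather than raising keeps one malformed value from discarding the rest of a paper's record.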

Language Model Interaction

The llm module manages interactions with the OpenAI API:

  • API Calls: Handles requests to the OpenAI API for tasks such as text generation and function calling.
  • Token Management: Tracks token usage to stay within API limits.
  • Error Handling: Provides error handling for API requests.
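
Token tracking and error handling of this kind are often implemented as a running budget plus a retry wrapper. The sketch below shows that general pattern without calling the OpenAI API; `TokenBudget` and `call_with_retry` are hypothetical names, and the exponential backoff policy is an assumption rather than ScienceAI's documented behavior.

```python
import time


class TokenBudget:
    """Hypothetical running total of tokens against a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def record(self, tokens):
        """Add the tokens consumed by one request."""
        self.used += tokens

    def can_afford(self, tokens):
        """Return True if a request of this size still fits under the limit."""
        return self.used + tokens <= self.limit


def call_with_retry(request, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call request(); on failure wait base_delay * 2**n and retry, re-raising the last error."""
    for attempt in range(attempts):
        try:
            return request()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter lets the wrapper be exercised without real delays.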

Configuring the OpenAI API Key

ScienceAI requires an OpenAI API key to interact with the OpenAI language models. Follow these steps to configure your API key:

  1. Obtain an API Key: Sign up for an API key from OpenAI if you don't already have one.
  2. Enter the API Key: The first time you run ScienceAI, you will be prompted to enter your OpenAI API key. Paste your API key into the prompt.
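
How ScienceAI stores the key after the first prompt is not described here. The following is a minimal sketch of the general pattern, assuming the key may also be supplied through the `OPENAI_API_KEY` environment variable (the name the OpenAI Python client reads by default), with an interactive prompt as the fallback. `resolve_api_key` is a hypothetical helper, not part of ScienceAI.

```python
import os
from getpass import getpass


def resolve_api_key(environ=os.environ, prompt=getpass):
    """Return the key from OPENAI_API_KEY if set, otherwise ask for it once."""
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed key so it does not appear on screen.
        key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("An OpenAI API key is required.")
    return key
```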

Contributing

We welcome contributions to ScienceAI! Here's how you can get involved:

  • Report Bugs: If you find any issues or bugs, please open an issue on our GitHub repository.
  • Feature Requests: Have an idea for a new feature? Submit a feature request on GitHub.
  • Pull Requests: Want to contribute code? Fork the repository, make your changes, and submit a pull request.

Download files

Source Distribution

scienceai_llm-0.1.2.tar.gz (65.1 kB)

Uploaded Source

Built Distribution

scienceai_llm-0.1.2-py3-none-any.whl (72.1 kB)

Uploaded Python 3

File details

Details for the file scienceai_llm-0.1.2.tar.gz.

File metadata

  • Download URL: scienceai_llm-0.1.2.tar.gz
  • Upload date:
  • Size: 65.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for scienceai_llm-0.1.2.tar.gz
Algorithm Hash digest
SHA256 6b44e126b4d6148b7de125a449fbf366fc989a9438d6c05829d369758bfb56d6
MD5 31530bb01d17c575e720af544e12fd0e
BLAKE2b-256 60eee6020b4ce5274df777c2b5185e4eef8aef1a2ad32e688002b67e31f0b392

See more details on using hashes here.

File details

Details for the file scienceai_llm-0.1.2-py3-none-any.whl.

File hashes

Hashes for scienceai_llm-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6b9c9c705a4d3daa61b6c62ff8961ad37390e9ab4578f077fe64f5e7578ecb9b
MD5 cee96c984e09d84d184c354350090906
BLAKE2b-256 e2ef1df973a6c5534b523782ef155feb50780e50a76f8e80c8d4099af99372c5
