coreason-prism

The Scientific Eye / Multi-Modal Encoder

coreason-prism is the specialized processing engine for scientific data types (Chemistry and Vision) within the CoReason AI ecosystem. It acts as the "Scientific Eye" and "Multi-Modal Encoder", transforming fragile string representations and static images into robust, mathematical graphs and vectors.

Core Philosophy:

"A Molecule is a Graph, not a String. A Chart is Data, not an Image."


Features

Derived from the Product Requirements Document:

  • Cheminformatic Grounding (The Chemist):

    • Treats molecules as mathematical graphs, not strings.
    • Normalizes and sanitizes chemical structures (SMILES/InChI) using datamol.
    • Converts SMILES to SELFIES, a representation in which every string decodes to a valid molecule, so generative output is valid by construction.
    • Computes fingerprints (Morgan/ECFP) for structural similarity search.
    • Calculates key properties: Molecular Weight, LogP, TPSA, Lipinski Violations.
  • Visual De-Plotting (The Analyst):

    • Extracts raw data from scientific figures (e.g., Kaplan-Meier curves) using DePlot.
    • Digitizes charts into linearized tables/DataFrames.
    • Enables meta-analysis of data locked in PDF images.
  • Bio-Image Segmentation (The Biologist):

    • Segments and classifies medical images (e.g., Histology) using MedSAM.
    • Detects ROIs and computes metrics like cell counts and tumor area.
  • Multi-Modal Embedding (The Embedder):

    • Generates joint embeddings for text, molecules, and images using BioCLIP.
    • Enables multi-modal retrieval (e.g., searching for histology slides via text description).
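
To make the property and similarity bullets above more concrete, here is a minimal, library-free sketch. It is not coreason-prism's implementation: the function names `lipinski_violations` and `tanimoto` are hypothetical, and the aspirin descriptor values are approximate, hand-entered illustrations rather than output from the package. The thresholds follow the standard Lipinski Rule of Five, and Tanimoto is a common similarity measure for binary fingerprints such as Morgan/ECFP.

```python
from typing import Sequence


def lipinski_violations(mol_weight: float, logp: float,
                        h_donors: int, h_acceptors: int) -> int:
    """Count Rule-of-Five violations (0 means fully compliant)."""
    checks = [
        mol_weight > 500.0,   # molecular weight above 500 Da
        logp > 5.0,           # too lipophilic
        h_donors > 5,         # too many hydrogen-bond donors
        h_acceptors > 10,     # too many hydrogen-bond acceptors
    ]
    return sum(checks)


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto similarity of two equal-length 0/1 fingerprint vectors."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0


# Approximate, hand-entered descriptors for aspirin (illustrative only).
aspirin_violations = lipinski_violations(180.16, 1.3, 1, 4)

# Toy 8-bit fingerprints; real Morgan/ECFP vectors are usually far longer.
fp_x = [1, 1, 0, 0, 1, 0, 1, 0]
fp_y = [1, 0, 0, 0, 1, 1, 1, 0]
similarity = tanimoto(fp_x, fp_y)

print(aspirin_violations)
print(round(similarity, 2))
```

In a similarity search, a score like this would typically be computed between a query fingerprint and each candidate, with the highest-scoring candidates returned first.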

Installation

pip install coreason-prism

Or install from source:

git clone https://github.com/CoReason-AI/coreason_prism.git
cd coreason_prism
pip install .

Usage

Here is a concise snippet showing how to initialize and use the library:

from pathlib import Path
from coreason_prism.interface import Prism, PrismMode

# Initialize Prism (The Facade)
prism = Prism(light_mode=False)  # Set light_mode=True to skip heavy models (DePlot/BioCLIP)

# 1. Process a Molecule (SMILES -> Graph/SELFIES + Properties)
molecule_result = prism.process_molecule("CC(=O)Oc1ccccc1C(=O)O")  # Aspirin
if molecule_result.status == "VALID":
    print(f"Canonical SMILES: {molecule_result.canonical_smiles}")
    print(f"SELFIES: {molecule_result.selfies_string}")
    print(f"LogP: {molecule_result.logp}")
    print(f"Fingerprint (first 10 bits): {molecule_result.fingerprint_vector[:10]}")

# 2. Process a Chart Image (Extract Data)
# Ensure you have an image file at the specified path
chart_path = Path("tests/data/kaplan_meier.png")
if chart_path.exists():
    chart_result = prism.process_image(
        image_path=chart_path,
        source_document_id="doc_123",
        mode=PrismMode.CHART
    )
    print(f"Figure Type: {chart_result.figure_type}")
    print(f"Extracted Data: {chart_result.data_series}")
    if chart_result.metadata:
        print(f"Median Survival: {chart_result.metadata.get('median_survival')}")

# 3. Process a Bio-Image (Segmentation)
bio_path = Path("tests/data/histology_slide.jpg")
if bio_path.exists():
    bio_result = prism.process_image(
        image_path=bio_path,
        source_document_id="doc_456",
        mode=PrismMode.BIO
    )
    if bio_result.metadata:  # Guard against missing metadata, as in the chart example
        print(f"Cell Count: {bio_result.metadata.get('cell_count')}")
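
Once a Kaplan-Meier curve has been digitized, downstream analysis can work on plain numbers. The sketch below is an illustration only: it assumes, hypothetically, that one extracted series can be reduced to a list of `(time, survival_probability)` pairs, which may not match the actual shape of `data_series`, and the helper `median_survival` is not part of coreason-prism.

```python
from typing import List, Optional, Tuple


def median_survival(points: List[Tuple[float, float]]) -> Optional[float]:
    """Return the earliest time at which survival drops to 0.5 or below.

    Returns None when the curve never reaches 0.5 (median not reached).
    """
    for time, survival in sorted(points):
        if survival <= 0.5:
            return time
    return None


# Made-up step-curve points: (months, survival probability).
curve = [(0, 1.0), (6, 0.85), (12, 0.62), (18, 0.48), (24, 0.30)]
print(median_survival(curve))
```

Returning `None` for curves that never cross 0.5 keeps "median not reached" distinct from a numeric result, which matters when pooling several studies.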
