A comprehensive Python library for scientific computing and data analysis

SciTeX

A Python framework for scientific research that makes the entire research pipeline more standardized, structured, and reproducible by automating repetitive processes.

Part of the fully open-source SciTeX project: https://scitex.ai

📦 Installation

pip install scitex # ~600 MB, Core + utilities
pip install "scitex[dl,ml,jupyter,neuro,web,scholar,writer,dev]" # ~2-5 GB, Complete toolkit (quotes keep shells such as zsh from expanding the brackets)

Optional Groups:

Group     Packages                                                  Size Impact
dl        PyTorch, transformers                                     +2-4 GB
ml        scikit-image, catboost, optuna, OpenAI, Anthropic, Groq   ~200 MB
jupyter   JupyterLab, papermill                                     ~100 MB
neuro     MNE, obspy (EEG/MEG analysis)                             ~200 MB
web       FastAPI, Flask, Streamlit                                 ~50 MB
scholar   Selenium, PDF tools, paper management                     ~150 MB
writer    LaTeX compilation tools                                   ~10 MB
dev       Testing, linting (dev only)                               ~100 MB
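
Because the optional groups are installed separately, a script may want to check what is available before importing heavy dependencies. The sketch below uses only the standard library; the mapping from group to "probe" module is an assumption made for illustration and is not read from SciTeX's package metadata.

```python
import importlib.util

# Hypothetical mapping from an extras group to one module it is expected to
# provide. These names are illustrative, not taken from SciTeX's metadata.
GROUP_PROBE_MODULES = {
    "dl": "torch",
    "jupyter": "jupyterlab",
    "neuro": "mne",
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def missing_groups(groups=GROUP_PROBE_MODULES):
    """List the extras groups whose probe module cannot be found."""
    return [group for group, module in groups.items() if not is_installed(module)]


if __name__ == "__main__":
    for group in missing_groups():
        print(f'Group "{group}" looks absent; try: pip install "scitex[{group}]"')
```

Checking with `find_spec` avoids actually importing a large package just to learn whether it exists.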

📦 Module Overview

SciTeX is organized into focused modules for different aspects of scientific computing:

🔧 Core Utilities

Module Description
scitex.gen Project setup, session management, and experiment tracking
scitex.io Universal I/O for 30+ formats (CSV, JSON, HDF5, Zarr, pickle, etc.)
scitex.path Path manipulation and project structure utilities
scitex.logging Structured logging with color support and context

📊 Data Science & Statistics

Module Description
scitex.stats 16 statistical tests, effect sizes, power analysis, multiple corrections
scitex.plt Enhanced matplotlib with auto-export and scientific captions
scitex.pd Pandas extensions for research workflows
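
To make "multiple corrections" concrete, here is a minimal standard-library sketch of two common p-value adjustments, Bonferroni and Benjamini-Hochberg. The function names are chosen for this example; it illustrates the general technique and is not the scitex.stats implementation or API.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate.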

🧠 AI & Machine Learning

Module Description
scitex.ai GenAI (7 providers), classification, training utilities
scitex.torch PyTorch training loops, metrics, and utilities
scitex.nn Custom neural network layers

🌊 Signal Processing

Module Description
scitex.dsp Filtering, spectral analysis, wavelets, PAC, ripple detection

📚 Literature Management

Module Description
scitex.scholar Paper search, PDF download, BibTeX enrichment with IF/citations

🌐 Web & Browser

Module Description
scitex.browser Playwright automation with debugging, PDF handling, popups

🗄️ Data Management

Module Description
scitex.db SQLite3 and PostgreSQL abstractions

🛠️ Utilities

Module Description
scitex.decorators Function decorators for caching, timing, validation
scitex.rng Reproducible random number generation
scitex.resource System resource monitoring (CPU, memory, GPU)
scitex.dict Dictionary manipulation and nested access
scitex.str String utilities for scientific text processing
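
As an illustration of what a timing decorator does, the following standard-library sketch records how long each call takes. The name `timed` and the `last_elapsed` attribute are inventions of this example, not part of scitex.decorators.

```python
import functools
import time


def timed(func):
    """Wrap func so the duration of its latest call is stored on the wrapper."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even when func raises.
            wrapper.last_elapsed = time.perf_counter() - start

    wrapper.last_elapsed = None
    return wrapper


@timed
def slow_square(x):
    time.sleep(0.01)  # stand-in for real work
    return x * x


result = slow_square(3)
```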

🚀 Quick Start

Use Case 1: Data Analysis with Statistics

import scitex as stx

# Load data
data = stx.io.load("experiment_data.csv")
control = data[data['group'] == 'control']['response']
treatment = data[data['group'] == 'treatment']['response']

# Statistical comparison
from scitex.stats.tests.parametric import ttest_ind
from scitex.stats.effect_sizes import cohens_d

result = ttest_ind(control, treatment)
effect = cohens_d(treatment, control)

print(f"{result['formatted']}")  # "t(58) = 2.45, p = 0.017*"
print(f"Cohen's d = {effect['d']:.2f} ({effect['interpretation']})")

# Visualization
fig, ax = stx.plt.subplots()
ax.boxplot([control, treatment], labels=['Control', 'Treatment'])
stx.io.save(fig, "comparison.png")  # Saves figure + data as CSV
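
For readers who want to see what the effect size measures, here is a standard-library sketch of Cohen's d using the pooled standard deviation. It is an independent illustration under that textbook definition; the helper name `cohens_d_pooled` is hypothetical and the result format of SciTeX's own `cohens_d` is not reproduced here.

```python
import math
import statistics


def cohens_d_pooled(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 in the denominator)
    var_b = statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Toy numbers for illustration only, not experimental data.
toy_treatment = [2.0, 4.0, 6.0]
toy_control = [1.0, 3.0, 5.0]
d = cohens_d_pooled(toy_treatment, toy_control)
```

The sign of d depends on argument order: passing the treatment group first gives a positive value when the treatment mean is higher.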

Use Case 2: Signal Processing Pipeline

import scitex as stx

# Load EEG/neural data
signal = stx.io.load("neural_recording.h5")  # (n_channels, n_epochs, n_timepoints)
fs = 1000  # Sampling rate

# Preprocessing
from scitex.dsp import filt, psd, wavelet

# Filter to theta band (4-8 Hz)
theta = filt.bandpass(signal, fs, bands=[[4, 8]])

# Power spectral density
freqs, power = psd(signal, fs)

# Time-frequency analysis
import numpy as np
tf_freqs = np.logspace(np.log10(1), np.log10(100), 50)
wavelet_coeffs = wavelet(signal, fs, freqs=tf_freqs)

# Save results
stx.io.save(theta, "processed/theta_filtered.npy")
stx.io.save(power, "processed/psd_results.h5")
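
To show what a power spectrum conveys, the sketch below computes a naive discrete Fourier transform of a synthetic 6 Hz sine and locates its peak. It uses only the standard library and a simple, unnormalized periodogram; it is not how scitex.dsp computes `psd`, and the function name is made up for this example.

```python
import cmath
import math


def power_spectrum(samples, fs):
    """Return (freqs, power) for the non-negative frequencies of a real signal."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        # Direct O(n^2) DFT; fine for a short demo, too slow for real recordings.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


fs = 200  # sampling rate in Hz (synthetic example)
sine = [math.sin(2 * math.pi * 6 * t / fs) for t in range(fs)]  # 1 s of a 6 Hz sine
freqs, power = power_spectrum(sine, fs)
peak_freq = freqs[power.index(max(power))]
in_theta_band = 4 <= peak_freq <= 8
```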

Use Case 3: Literature Management

import scitex as stx

# Search and download academic papers
scholar = stx.scholar.Scholar(project="my_research")

# Enrich BibTeX with citations and impact factors
papers = scholar.load_bibtex("references.bib")
enriched = scholar.enrich_papers(papers)

# Filter high-impact papers
high_impact = enriched.filter(
    year_min=2020,
    min_citations=50,
    min_impact_factor=5.0
)

# Download PDFs (requires institutional access)
import asyncio
dois = [p.doi for p in high_impact if p.doi]
asyncio.run(scholar.download_pdfs_from_dois_async(dois))

# Export results
scholar.save_papers_as_bibtex(high_impact, "high_impact_papers.bib")
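
The filtering step can be pictured as a plain predicate over paper records. The sketch below is an assumption-laden illustration: `PaperRecord` and `filter_papers` are hypothetical names, the records are placeholders rather than real papers, and the real Scholar objects may behave differently (for example in how missing impact factors are treated).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperRecord:
    """Hypothetical stand-in for an enriched bibliography entry."""

    title: str
    year: int
    citations: int
    impact_factor: Optional[float] = None
    doi: Optional[str] = None


def filter_papers(papers, year_min=None, min_citations=None, min_impact_factor=None):
    """Keep papers that satisfy every threshold that was given."""
    kept = []
    for paper in papers:
        if year_min is not None and paper.year < year_min:
            continue
        if min_citations is not None and paper.citations < min_citations:
            continue
        if min_impact_factor is not None:
            # In this sketch a missing impact factor counts as failing the threshold.
            if paper.impact_factor is None or paper.impact_factor < min_impact_factor:
                continue
        kept.append(paper)
    return kept


# Placeholder records for illustration only.
records = [
    PaperRecord("Placeholder A", 2021, 120, 7.5, "10.0000/example-a"),
    PaperRecord("Placeholder B", 2018, 300, 9.0, "10.0000/example-b"),
    PaperRecord("Placeholder C", 2022, 60, None, None),
]
selected = filter_papers(records, year_min=2020, min_citations=50, min_impact_factor=5.0)
selected_dois = [p.doi for p in selected if p.doi]
```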

Use Case 4: Machine Learning Workflow

import scitex as stx
import numpy as np

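# Load and prepare data
X_train = stx.io.load("features_train.npy")
y_train = stx.io.load("labels_train.npy")
X_test = stx.io.load("features_test.npy")
y_test = stx.io.load("labels_test.npy")

# Hold out the last 20% of the training set for validation (illustrative split)
split = int(0.8 * len(X_train))
X_train, X_val = X_train[:split], X_train[split:]
y_train, y_val = y_train[:split], y_train[split:]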

# Train model
from scitex.ai import ClassificationReporter, EarlyStopping

model = YourModel()  # Your PyTorch/sklearn model
early_stopper = EarlyStopping(patience=10)

# Training loop
for epoch in range(100):
    train_loss = train_epoch(model, X_train, y_train)
    val_loss = validate(model, X_val, y_val)

    early_stopper(val_loss, model)
    if early_stopper.early_stop:
        break

# Evaluate with comprehensive metrics
reporter = ClassificationReporter(save_dir="./results")
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)

reporter.calc_metrics(y_test, y_pred, y_prob, labels=['class0', 'class1'])
reporter.summarize()  # Prints confusion matrix, ROC, PR curves
reporter.save()  # Saves all metrics and plots
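
The early-stopping logic in the training loop can be sketched in a few lines. The class below is a simplified illustration written for this README, not the `scitex.ai.EarlyStopping` implementation: it takes only the validation loss (no model checkpointing), and `min_delta` is an assumed parameter.

```python
class SimpleEarlyStopping:
    """Signal a stop once the loss has not improved for `patience` checks in a row."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.counter = 0
        self.early_stop = False

    def __call__(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the patience counter.
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        return self.early_stop


# Made-up loss curve for illustration.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.69]
stopper = SimpleEarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper(loss):
        stopped_at = epoch
        break
```

With a patience of 3, training stops after three consecutive epochs without improvement, so the final, better loss in the list is never reached.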

Use Case 5: Complete Research Script

#!/usr/bin/env python3
import scitex as stx
import sys
import matplotlib.pyplot as plt

def main(args):
    # Load experimental data
    data = stx.io.load("data.csv")

    # Preprocess
    processed = preprocess_data(data)

    # Statistical analysis
    results = perform_statistical_tests(processed)

    # Generate publication-quality figures
    fig, axes = stx.plt.subplots(2, 2, figsize=(12, 10))
    plot_results(axes, results)
    stx.io.save(fig, "results/figure1.png")  # Auto-exports data as CSV

    # Save results
    stx.io.save(results, "results/statistical_results.json")

    return 0

if __name__ == '__main__':
    # Initialize SciTeX session (logging, reproducibility, etc.)
    CONFIG, sys.stdout, sys.stderr, plt, CC, rng = stx.session.start(
        sys, plt,
        file=__file__,
        verbose=True
    )

    # Run main analysis
    exit_status = main(None)

    # Cleanup and finalize
    stx.session.close(CONFIG, exit_status=exit_status)

Common Patterns

import scitex as stx

# Universal I/O - format auto-detected
data = stx.io.load("data.csv")       # → pandas DataFrame
array = stx.io.load("data.npy")      # → numpy array
model = stx.io.load("model.pth")     # → PyTorch state dict
config = stx.io.load("config.yaml")  # → dict

# Caching expensive operations
@stx.io.cache(cache_dir=".cache")
def expensive_computation(x):
    return process_large_dataset(x)

# Reproducible random numbers
rng = stx.rng.get_rng(seed=42)
random_data = rng.normal(0, 1, size=1000)

# Path management
project_root = stx.path.find_git_root()
data_dir = project_root / "data"
latest_results = stx.path.find_latest("results/experiment_v*.csv")
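
Format auto-detection of this kind is typically a dispatch on the file extension. The following standard-library sketch shows the idea for three formats; the loader table and function names are assumptions for illustration, and SciTeX's own `stx.io.load` supports many more formats and returns richer types (for example a pandas DataFrame for CSV).

```python
import csv
import json
import tempfile
from pathlib import Path


def _load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _load_csv(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))  # one dict per row, values kept as strings


def _load_text(path):
    return Path(path).read_text(encoding="utf-8")


# Extension -> loader function; extend this table to support more formats.
LOADERS = {".json": _load_json, ".csv": _load_csv, ".txt": _load_text}


def load(path):
    """Pick a loader from the file extension and return the parsed content."""
    suffix = Path(path).suffix.lower()
    try:
        loader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
    return loader(path)


# Small round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "config.json"
    json_path.write_text('{"alpha": 0.05}', encoding="utf-8")
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("group,response\ncontrol,1.5\n", encoding="utf-8")
    config = load(json_path)
    rows = load(csv_path)
```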

import argparse

def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    import scitex as stx
    parser = argparse.ArgumentParser(description='')
    args = parser.parse_args()
    return args

def run_main() -> None:
    """Initialize scitex framework, run main function, and cleanup."""
    global CONFIG, CC, sys, plt, rng

    import sys
    import matplotlib.pyplot as plt
    import scitex as stx

    args = parse_args()

    # Start a session, which:
    #   Collects configs defined in ./config/*yaml
    #   Prepares the runtime directory as /path/to/script_out/RUNNING/YYYY_MMDD_mmss_<4-random-digit>/
    #   Starts logging to <runtime_directory>/logs/{stdout.log,stderr.log}
    #   Sets up the matplotlib wrapper for saving plotted data as CSV
    #   CC: custom colors for plotting
    #   rng: fixes random seeds for common packages to 42
    CONFIG, sys.stdout, sys.stderr, plt, CC, rng = stx.session.start(
        sys,
        plt,
        args=args,
        file=__FILE__,
        sdir_suffix=None,
        verbose=False,
        agg=True,
    )

    # Check the runtime status at the end
    exit_status = main(args)

    # Close the session, which:
    #   Routes all logs and outputs created by the session to RUNNING
    #   Sends a notification to the user (requires setup)
    stx.session.close(
        CONFIG,
        verbose=False,
        notify=False,
        message="",
        exit_status=exit_status,
    )
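
The logging step described in the comments above (copying stdout into a log file under the runtime directory) can be approximated with a small "tee" object. This is a sketch of the general idea using the standard library, not the mechanism `stx.session.start` actually uses; the `Tee` class and the temporary directory layout are assumptions.

```python
import contextlib
import io
import tempfile
from pathlib import Path


class Tee:
    """Minimal file-like object that forwards writes to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


with tempfile.TemporaryDirectory() as run_dir:
    log_path = Path(run_dir) / "logs" / "stdout.log"
    log_path.parent.mkdir(parents=True)
    console = io.StringIO()  # stands in for the real terminal in this demo
    with open(log_path, "w", encoding="utf-8") as log_file:
        with contextlib.redirect_stdout(Tee(console, log_file)):
            print("analysis started")
    logged = log_path.read_text(encoding="utf-8")
```

Everything printed inside the `redirect_stdout` block reaches both the console stand-in and the log file, which is why a session can keep a full record without changing the analysis code.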

Recommended Python Script Template for a SciTeX Project

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Timestamp: "2024-11-03 10:33:13 (ywatanabe)"
# File: placeholder.py

__FILE__ = "placeholder.py"

"""
Functionalities:
  - Does XYZ
  - Does XYZ
  - Does XYZ
  - Saves XYZ

Dependencies:
  - scripts:
    - /path/to/script1
    - /path/to/script2
  - packages:
    - package1
    - package2
IO:
  - input-files:
    - /path/to/input/file.xxx
    - /path/to/input/file.xxx

  - output-files:
    - /path/to/output/file.xxx
    - /path/to/output/file.xxx

(Remove me: Please fill in the docstring above, keeping the bullet-point style, and remove this instruction line)
"""

"""Imports"""
import os
import sys
import argparse
import scitex as stx
from scitex import logging

logger = logging.getLogger(__name__)

"""Warnings"""
# stx.pd.ignore_SettingWithCopyWarning()
# warnings.simplefilter("ignore", UserWarning)
# with warnings.catch_warnings():
#     warnings.simplefilter("ignore", UserWarning)

"""Parameters"""
# CONFIG = stx.io.load_configs()

"""Functions & Classes"""
def main(args):
    return 0

def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(description='')
    # parser.add_argument(
    #     "--var",
    #     "-v",
    #     type=int,
    #     choices=None,
    #     default=1,
    #     help="(default: %(default)s)",
    # )
    # parser.add_argument(
    #     "--flag",
    #     "-f",
    #     action="store_true",
    #     default=False,
    #     help="(default: %(default)s)",
    # )
    args = parser.parse_args()
    return args

def run_main() -> None:
    """Initialize scitex framework, run main function, and cleanup."""
    global CONFIG, CC, sys, plt, rng

    import sys
    import matplotlib.pyplot as plt
    import scitex as stx

    args = parse_args()

    CONFIG, sys.stdout, sys.stderr, plt, CC, rng = stx.session.start(
        sys,
        plt,
        args=args,
        file=__FILE__,
        sdir_suffix=None,
        verbose=False,
        agg=True,
    )

    exit_status = main(args)

    stx.session.close(
        CONFIG,
        verbose=False,
        notify=False,
        message="",
        exit_status=exit_status,
    )

if __name__ == '__main__':
    run_main()

# EOF
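
As a sketch of how the commented-out arguments in the template behave once enabled, the example below builds the same two options with argparse and parses explicit argument lists. The `build_parser` helper and the description text are additions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two example options from the template."""
    parser = argparse.ArgumentParser(description="Example analysis script")
    parser.add_argument(
        "--var",
        "-v",
        type=int,
        default=1,
        help="(default: %(default)s)",  # argparse substitutes the default value
    )
    parser.add_argument(
        "--flag",
        "-f",
        action="store_true",
        default=False,
        help="(default: %(default)s)",
    )
    return parser


parser = build_parser()
defaults = parser.parse_args([])  # no command-line arguments
custom = parser.parse_args(["--var", "3", "-f"])
help_text = parser.format_help()
```

Passing an explicit list to `parse_args` keeps the parser testable without touching `sys.argv`.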

📖 Documentation

Online Documentation

Local Resources

Key Tutorials

  1. I/O Operations: Essential file handling (start here!)
  2. Plotting: Publication-ready visualizations
  3. Statistics: Research-grade statistical analysis
  4. Scholar: Literature management with impact factors
  5. AI/ML: Complete machine learning toolkit

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📄 License

This project is licensed under the MIT License.

📧 Contact

Yusuke Watanabe (ywatanabe@scitex.ai)
