
A comprehensive Python library for scientific computing and data analysis

Project description

SciTeX

A Python framework for scientific research that makes the entire research pipeline more standardized, structured, and reproducible by automating repetitive processes.

Part of the fully open-source SciTeX project: https://scitex.ai


📦 Installation

pip install scitex # ~600 MB, Core + utilities
pip install "scitex[dl,ml,jupyter,neuro,web,gui,scholar,writer,dev]" # ~2-5 GB, Complete toolkit (quotes keep shells such as zsh from expanding the brackets)

Arial Font Setup

# Ubuntu
sudo apt update
sudo apt-get install ttf-mscorefonts-installer
sudo DEBIAN_FRONTEND=noninteractive \
    apt install -y ttf-mscorefonts-installer
# Copy Arial from a Windows partition mounted at /mnt/c (system-wide install)
sudo mkdir -p /usr/share/fonts/truetype/custom
sudo cp /mnt/c/Windows/Fonts/arial*.ttf /usr/share/fonts/truetype/custom/
sudo fc-cache -fv
rm -rf ~/.cache/matplotlib  # clear matplotlib's font cache

# WSL
mkdir -p ~/.local/share/fonts/windows
cp /mnt/c/Windows/Fonts/arial*.ttf ~/.local/share/fonts/windows/
fc-cache -fv ~/.local/share/fonts/windows
rm -rf ~/.cache/matplotlib  # clear matplotlib's font cache

# Check (Python)
import matplotlib as mpl
import matplotlib.font_manager as fm
import matplotlib.pyplot as plt

print(mpl.rcParams["font.family"])

# List the Arial font files that matplotlib can see
fonts = fm.findSystemFonts()
arial_fonts = [f for f in fonts if "arial" in f.lower()]
print("Arial found:", bool(arial_fonts))
print(arial_fonts[:5])

mpl.rcParams["font.family"] = "Arial"
mpl.rcParams["font.sans-serif"] = ["Arial"]  # just in case

# Render a test image to confirm that Arial is actually used
fig, ax = plt.subplots(figsize=(3, 2))
ax.text(0.5, 0.5, "Arial Test", fontsize=32, ha="center", va="center")
ax.set_axis_off()

fig.savefig("arial_test.png", dpi=300)
plt.close(fig)

Optional Groups:

| Group | Packages | Size Impact |
|-------|----------|-------------|
| dl | PyTorch, transformers | +2-4 GB |
| ml | scikit-image, catboost, optuna, OpenAI, Anthropic, Groq | ~200 MB |
| jupyter | JupyterLab, papermill | ~100 MB |
| neuro | MNE, obspy (EEG/MEG analysis) | ~200 MB |
| web | FastAPI, Flask, Streamlit | ~50 MB |
| gui | Flask, DearPyGui, PyQt6 (multi-backend figure editors) | ~100 MB |
| scholar | Selenium, PDF tools, paper management | ~150 MB |
| writer | LaTeX compilation tools | ~10 MB |
| dev | Testing, linting (dev only) | ~100 MB |
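
To see which optional groups are usable in the current environment, one option is to probe for a representative import from each group. The sketch below is only an illustration: the import names (`torch`, `optuna`, `jupyterlab`, `mne`, `fastapi`, `PyQt6`, `selenium`) are assumptions derived from the table, not SciTeX's own dependency manifest, and `writer` and `dev` are left out because the table does not name their packages.

```python
from importlib.util import find_spec

# Hypothetical mapping: one representative import name per optional group.
# These names are assumptions based on the table above, not SciTeX metadata.
GROUP_PROBES = {
    "dl": "torch",
    "ml": "optuna",
    "jupyter": "jupyterlab",
    "neuro": "mne",
    "web": "fastapi",
    "gui": "PyQt6",
    "scholar": "selenium",
}


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def available_groups(probes=GROUP_PROBES):
    """Map each group name to whether its probe module is importable."""
    return {group: is_importable(name) for group, name in probes.items()}


if __name__ == "__main__":
    for group, ok in available_groups().items():
        print(f"{group:8s} {'installed' if ok else 'missing'}")
```

A missing probe only suggests that the group was not installed; it does not prove that every package in the group is absent.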

🚀 Quick Start

The SciTeX Advantage: 70% Less Code

Compare these two implementations that produce identical research outputs:

With SciTeX (57 Lines of Code)

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Timestamp: "2025-11-18 09:34:36 (ywatanabe)"
# File: /home/ywatanabe/proj/scitex-code/examples/demo_session_plt_io.py


"""Minimal Demonstration for scitex.{session,io,plt}"""

import numpy as np
import scitex as stx


def demo(filename, verbose=False):
    """Show metadata without QR code (just embedded)."""

    # matplotlib.pyplot wrapper.
    fig, ax = stx.plt.subplots()

    t = np.linspace(0, 2, 1000)
    signal = np.sin(2 * np.pi * 5 * t) * np.exp(-t / 2)

    ax.plot_line(t, signal)  # Original plot for automatic CSV export
    ax.set_xyt(
        "Time (s)",
        "Amplitude",
        "Clean Figure (metadata embedded, no QR overlay)",
    )

    # Saving: stx.io.save(obj, rel_path, **kwargs)
    stx.io.save(
        fig,
        filename,
        metadata={"exp": "s01", "subj": "S001"},  # with metadata embedding
        symlink_to="./data",  # Symlink for centralized outputs
        verbose=verbose,  # Automatic terminal logging (no manual print())
    )
    fig.close()

    # Loading: stx.io.load(path)
    ldir = __file__.replace(".py", "_out")
    img, meta = stx.io.load(
        f"{ldir}/{filename}",
        verbose=verbose,
    )


@stx.session
def main(filename="demo.jpg", verbose=True):
    """Run demo for scitex.{session,plt,io}."""

    demo(filename, verbose=verbose)

    return 0


if __name__ == "__main__":
    main()

Without SciTeX (188 Lines of Code)

<details>
<summary>Click to see the pure Python equivalent requiring 3.3× more code</summary>

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Timestamp: "2025-11-18 09:34:51 (ywatanabe)"
# File: /home/ywatanabe/proj/scitex-code/examples/demo_session_plt_io_pure_python.py

"""Minimal Demonstration - Pure Python Version"""

import argparse
import json
import logging
import os
import shutil
import sys
from datetime import datetime
from pathlib import Path
import random
import string

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from PIL.PngImagePlugin import PngInfo


def generate_session_id():
    """Generate unique session ID."""
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    random_suffix = ''.join(random.choices(string.ascii_uppercase + string.digits, k=4))
    return f"{timestamp}_{random_suffix}"


def setup_logging(log_dir):
    """Set up logging infrastructure."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)

    stdout_handler = logging.FileHandler(log_dir / "stdout.log")
    stderr_handler = logging.FileHandler(log_dir / "stderr.log")
    console_handler = logging.StreamHandler(sys.stdout)

    formatter = logging.Formatter('%(levelname)s: %(message)s')
    stdout_handler.setFormatter(formatter)
    stderr_handler.setFormatter(formatter)
    console_handler.setFormatter(formatter)

    logger.addHandler(stdout_handler)
    logger.addHandler(stderr_handler)
    logger.addHandler(console_handler)

    return logger


def save_plot_data_to_csv(fig, output_path):
    """Extract and save plot data."""
    csv_path = output_path.with_suffix('.csv')
    data_lines = ["ax_00_plot_line_0_line_x,ax_00_plot_line_0_line_y"]

    for ax in fig.get_axes():
        for line in ax.get_lines():
            x_data = line.get_xdata()
            y_data = line.get_ydata()
            for x, y in zip(x_data, y_data):
                data_lines.append(f"{x},{y}")

    csv_path.write_text('\n'.join(data_lines))
    return csv_path, csv_path.stat().st_size / 1024


def embed_metadata_in_image(image_path, metadata):
    """Embed metadata into image file."""
    img = Image.open(image_path)

    if image_path.suffix.lower() in ['.png']:
        pnginfo = PngInfo()
        for key, value in metadata.items():
            pnginfo.add_text(key, str(value))
        img.save(image_path, pnginfo=pnginfo)
    elif image_path.suffix.lower() in ['.jpg', '.jpeg']:
        # JPEG: store the metadata in a sidecar JSON file
        json_path = image_path.with_suffix(image_path.suffix + '.meta.json')
        json_path.write_text(json.dumps(metadata, indent=2))
        img.save(image_path, quality=95)


def save_figure(fig, output_path, metadata=None, symlink_to=None, logger=None):
    """Save figure with metadata and symlink."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    if metadata is None:
        metadata = {}
    metadata['url'] = 'https://scitex.ai'

    if logger:
        logger.info(f"Saving figure with metadata to: {output_path}")
        logger.info(f"  • Embedded metadata: {metadata}")

    csv_path, csv_size = save_plot_data_to_csv(fig, output_path)
    if logger:
        logger.info(f"✅ Saved to: {csv_path} ({csv_size:.1f} KiB)")

    fig.savefig(output_path, dpi=150, bbox_inches='tight')
    embed_metadata_in_image(output_path, metadata)

    if symlink_to:
        symlink_dir = Path(symlink_to)
        symlink_dir.mkdir(parents=True, exist_ok=True)
        symlink_path = symlink_dir / output_path.name
        if symlink_path.exists() or symlink_path.is_symlink():
            symlink_path.unlink()
        symlink_path.symlink_to(output_path.resolve())


def demo(output_dir, filename, verbose=False, logger=None):
    """Generate, plot, and save signal."""
    fig, ax = plt.subplots(figsize=(8, 6))

    t = np.linspace(0, 2, 1000)
    signal = np.sin(2 * np.pi * 5 * t) * np.exp(-t / 2)

    ax.plot(t, signal)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    ax.set_title("Damped Oscillation")
    ax.grid(True, alpha=0.3)

    output_path = output_dir / filename
    save_figure(fig, output_path, metadata={"exp": "s01", "subj": "S001"},
                symlink_to=output_dir.parent / "data", logger=logger)
    plt.close(fig)

    return 0


def main():
    """Run demo - Pure Python Version."""
    parser = argparse.ArgumentParser(description="Run demo - Pure Python Version")
    parser.add_argument('-f', '--filename', default='demo.jpg')
    # type=bool would treat any non-empty string (even "False") as True,
    # so use a real boolean flag instead (--verbose / --no-verbose, Python 3.9+).
    parser.add_argument('-v', '--verbose', action=argparse.BooleanOptionalAction, default=True)
    args = parser.parse_args()

    session_id = generate_session_id()
    script_path = Path(__file__).resolve()
    output_base = script_path.parent / (script_path.stem + "_out")
    running_dir = output_base / "RUNNING" / session_id
    logs_dir = running_dir / "logs"
    config_dir = running_dir / "CONFIGS"

    logger = setup_logging(logs_dir)

    print("=" * 40)
    print(f"Pure Python Demo")
    print(f"{session_id} (PID: {os.getpid()})")
    print(f"\n{script_path}")
    print(f"\nArguments:")
    print(f"    filename: {args.filename}")
    print(f"    verbose: {args.verbose}")
    print("=" * 40)

    config_dir.mkdir(parents=True, exist_ok=True)
    config_data = {
        'ID': session_id,
        'FILE': str(script_path),
        'SDIR_OUT': str(output_base),
        'SDIR_RUN': str(running_dir),
        'PID': os.getpid(),
        'ARGS': vars(args)
    }
    (config_dir / "CONFIG.json").write_text(json.dumps(config_data, indent=2))

    try:
        result = demo(output_base, args.filename, args.verbose, logger)
        success_dir = output_base / "FINISHED_SUCCESS" / session_id
        success_dir.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(running_dir), str(success_dir))
        logger.info(f"\n✅ Script completed: {success_dir}")
        return result
    except Exception as e:
        error_dir = output_base / "FINISHED_ERROR" / session_id
        error_dir.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(running_dir), str(error_dir))
        logger.error(f"\n❌ Error: {e}", exc_info=True)
        raise


if __name__ == "__main__":
    sys.exit(main())
```

</details>

### What You Get With `@stx.session`

Both implementations produce **identical outputs**, but SciTeX eliminates 131 lines of boilerplate:
```bash
demo_session_plt_io_out/
├── demo.csv              # Auto-extracted plot data
├── demo.jpg              # With embedded metadata
└── FINISHED_SUCCESS/
    └── 2025Y-11M-18D-09h12m03s_HmH5-main/
        ├── CONFIGS/
        │   ├── CONFIG.pkl    # Python object
        │   └── CONFIG.yaml   # Human-readable
        └── logs/
            ├── stderr.log
            └── stdout.log
```

What SciTeX Automates:

  • ✅ Session ID generation and tracking
  • ✅ Output directory management (RUNNING/ → FINISHED_SUCCESS/)
  • ✅ Argument parsing with auto-generated help
  • ✅ Logging to files and console
  • ✅ Config serialization (YAML + pickle)
  • ✅ CSV export from matplotlib plots
  • ✅ Metadata embedding in images
  • ✅ Symlink management for centralized outputs
  • ✅ Error handling and directory cleanup
  • ✅ Global variable injection (CONFIG, plt, COLORS, logger, rng_manager)
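
To make the directory-management item concrete, here is a minimal plain-Python sketch of the RUNNING/ → FINISHED_SUCCESS/ pattern. It is not SciTeX's `@stx.session` implementation: `simple_session`, the `session_dir` keyword, and the timestamp format are hypothetical names chosen for this illustration, and `FINISHED_ERROR` is borrowed from the pure Python version above.

```python
import functools
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def simple_session(base_dir):
    """Illustrative decorator: run a function inside a tracked session directory.

    A sketch of the RUNNING -> FINISHED_* pattern only; hypothetical helper,
    not the real @stx.session.
    """
    base_dir = Path(base_dir)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            session_id = datetime.now().strftime("%Y-%m-%d_%H-%M-%S-%f")
            running = base_dir / "RUNNING" / session_id
            running.mkdir(parents=True, exist_ok=True)
            status = "FINISHED_ERROR"  # assume failure until func returns
            try:
                result = func(*args, session_dir=running, **kwargs)
                status = "FINISHED_SUCCESS"
                return result
            finally:
                # Move the session directory to its final location either way.
                final = base_dir / status / session_id
                final.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(running), str(final))

        return wrapper

    return decorator


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())

    @simple_session(out)
    def main(session_dir):
        (session_dir / "note.txt").write_text("hello")
        return 0

    main()
    print([str(p.relative_to(out)) for p in sorted(out.rglob("note.txt"))])
```

Using `try`/`finally` means the session directory is moved exactly once, whether the wrapped function returns or raises, and the exception still propagates to the caller.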

Research Benefits:

  • 📊 Figures + data always together - CSV auto-exported from every plot
  • 🔄 Perfect reproducibility - Every run tracked with unique session ID
  • Universal format - CSV data readable anywhere
  • Zero manual work - Metadata embedded automatically
  • 🎯 3.3× less code - Focus on research, not infrastructure
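
Because the plotted data travels with the figure as CSV, it can be read back without SciTeX or matplotlib. The sketch below uses only the standard library; the column names are taken from the header written by the pure Python version (`ax_00_plot_line_0_line_x`, `ax_00_plot_line_0_line_y`), and whether SciTeX's own export always uses exactly this layout is an assumption. `load_xy` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def load_xy(
    csv_path,
    x_col="ax_00_plot_line_0_line_x",
    y_col="ax_00_plot_line_0_line_y",
):
    """Read one x/y column pair from an exported plot CSV as lists of floats."""
    xs, ys = [], []
    with Path(csv_path).open(newline="") as f:
        for row in csv.DictReader(f):
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    return xs, ys


if __name__ == "__main__":
    # Assumed location, following the output tree of the demo script.
    path = Path("demo_session_plt_io_out/demo.csv")
    if path.exists():
        t, signal = load_xy(path)
        print(f"{len(t)} samples, t from {t[0]} to {t[-1]}")
    else:
        print(f"{path} not found; run the demo first")
```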

Try It Yourself

pip install scitex
python ./examples/demo_session_plt_io.py
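
After a few runs it can be handy to list which sessions finished and how. This is a small sketch under the assumption that the output directory follows the layout shown above (`RUNNING/`, `FINISHED_SUCCESS/`), with `FINISHED_ERROR/` taken from the pure Python version; `list_sessions` is a hypothetical helper, not part of SciTeX.

```python
from pathlib import Path

# Status directory names as they appear in this README (assumed complete).
STATUSES = ("RUNNING", "FINISHED_SUCCESS", "FINISHED_ERROR")


def list_sessions(out_dir):
    """Return {status: [session_id, ...]} for a SciTeX-style output directory."""
    out_dir = Path(out_dir)
    sessions = {}
    for status in STATUSES:
        status_dir = out_dir / status
        if status_dir.is_dir():
            sessions[status] = sorted(
                p.name for p in status_dir.iterdir() if p.is_dir()
            )
        else:
            sessions[status] = []
    return sessions


if __name__ == "__main__":
    for status, ids in list_sessions("examples/demo_session_plt_io_out").items():
        print(f"{status}: {len(ids)} session(s)")
        for session_id in ids:
            print(f"  {session_id}")
```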

📦 Module Overview

SciTeX is organized into focused modules for different aspects of scientific computing:

🔧 Core Utilities

| Module | Description |
|--------|-------------|
| scitex.gen | Project setup, session management, and experiment tracking |
| scitex.io | Universal I/O for 30+ formats (CSV, JSON, HDF5, Zarr, pickle, etc.) |
| scitex.path | Path manipulation and project structure utilities |
| scitex.logging | Structured logging with color support and context |
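
The idea behind a "universal" `save`/`load` pair is dispatch on the file extension. The following is a minimal sketch of that idea for three stdlib formats; it does not reproduce `scitex.io`, and the `_HANDLERS` table and function bodies are assumptions made for illustration.

```python
import csv
import json
import pickle
from pathlib import Path


def _save_json(obj, path):
    path.write_text(json.dumps(obj, indent=2))


def _load_json(path):
    return json.loads(path.read_text())


def _save_pickle(obj, path):
    path.write_bytes(pickle.dumps(obj))


def _load_pickle(path):
    return pickle.loads(path.read_bytes())


def _save_csv(rows, path):
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)


def _load_csv(path):
    with path.open(newline="") as f:
        return [row for row in csv.reader(f)]


# Hypothetical registry: file suffix -> (saver, loader)
_HANDLERS = {
    ".json": (_save_json, _load_json),
    ".pkl": (_save_pickle, _load_pickle),
    ".csv": (_save_csv, _load_csv),
}


def _handlers_for(path):
    try:
        return _HANDLERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported format: {path.suffix!r}") from None


def save(obj, path):
    """Save obj in the format implied by the file extension; return the path."""
    path = Path(path)
    saver, _ = _handlers_for(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    saver(obj, path)
    return path


def load(path):
    """Load an object using the loader implied by the file extension."""
    path = Path(path)
    _, loader = _handlers_for(path)
    return loader(path)
```

Adding a format then only means registering one more `(saver, loader)` pair, which is presumably how a library can grow to dozens of formats behind the same two calls.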

📊 Data Science & Statistics

| Module | Description |
|--------|-------------|
| scitex.stats | 16 statistical tests, effect sizes, power analysis, multiple corrections |
| scitex.plt | Enhanced matplotlib with auto-export and scientific captions |
| scitex.pd | Pandas extensions for research workflows |
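
For readers unfamiliar with the terms, "effect sizes" and "multiple corrections" refer to calculations such as Cohen's d and the Bonferroni adjustment. The sketch below shows the arithmetic in plain Python; it does not use or mirror the `scitex.stats` API, and the function names are hypothetical.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) under Bonferroni correction."""
    m = len(p_values)
    adjusted = [min(p * m, 1.0) for p in p_values]
    return adjusted, [p <= alpha for p in adjusted]


if __name__ == "__main__":
    # Both groups have sample variance 4, and the means differ by 1, so d = 0.5.
    print(cohens_d([2, 4, 6], [1, 3, 5]))
    print(bonferroni([0.01, 0.04, 0.5]))
```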

🧠 AI & Machine Learning

| Module | Description |
|--------|-------------|
| scitex.ai | GenAI (7 providers), classification, training utilities |
| scitex.torch | PyTorch training loops, metrics, and utilities |
| scitex.nn | Custom neural network layers |

🌊 Signal Processing

| Module | Description |
|--------|-------------|
| scitex.dsp | Filtering, spectral analysis, wavelets, PAC, ripple detection |
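
As a small illustration of what spectral analysis means here, the sketch below finds the dominant frequency of the damped 5 Hz sine from the Quick Start demo using a naive DFT from the standard library. It is not `scitex.dsp`; the 100 Hz sampling rate and the function names are assumptions, and real code would normally use an FFT such as `numpy.fft.rfft`.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the one-sided DFT (bins 0..N//2) of a real sequence."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(abs(acc))
    return mags


def dominant_frequency(x, fs):
    """Frequency in Hz of the largest non-DC bin."""
    mags = dft_magnitudes(x)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k * fs / len(x)


if __name__ == "__main__":
    fs = 100  # sampling rate in Hz (assumed for this sketch)
    t = [i / fs for i in range(2 * fs)]  # 2 seconds of samples
    signal = [math.sin(2 * math.pi * 5 * ti) * math.exp(-ti / 2) for ti in t]
    print(dominant_frequency(signal, fs))
```

With 2 seconds of data the frequency resolution is 0.5 Hz, so the 5 Hz component falls exactly on a bin and the slow exponential decay only broadens the peak slightly.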

📚 Literature Management

| Module | Description |
|--------|-------------|
| scitex.scholar | Paper search, PDF download, BibTeX enrichment with IF/citations |

Web & Browser

| Module | Description |
|--------|-------------|
| scitex.browser | Playwright automation with debugging, PDF handling, popups |

🗄️ Data Management

| Module | Description |
|--------|-------------|
| scitex.db | SQLite3 and PostgreSQL abstractions |

🛠️ Utilities

| Module | Description |
|--------|-------------|
| scitex.decorators | Function decorators for caching, timing, validation |
| scitex.rng | Reproducible random number generation |
| scitex.resource | System resource monitoring (CPU, memory, GPU) |
| scitex.dict | Dictionary manipulation and nested access |
| scitex.str | String utilities for scientific text processing |
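
The utilities listed above correspond to patterns that are easy to show in plain Python. The sketch below gives stdlib analogues of a timing decorator, a reproducible random generator, and nested dictionary access; `timed`, `get_nested`, and `sample` are hypothetical names and are not the SciTeX functions.

```python
import functools
import random
import time


def timed(func):
    """Decorator that records the wall-clock duration of the most recent call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_elapsed = time.perf_counter() - start

    wrapper.last_elapsed = None
    return wrapper


def get_nested(mapping, dotted_key, default=None):
    """Look up 'a.b.c' in nested dicts; return default if any level is missing."""
    current = mapping
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


@timed
def sample(seed, n=3):
    """Draw n floats from a locally seeded generator, so results repeat per seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    print(sample(42) == sample(42))  # same seed, same draws
    print(f"last call took {sample.last_elapsed:.6f} s")
    print(get_nested({"exp": {"subj": "S001"}}, "exp.subj"))
```

Seeding a local `random.Random` instance rather than the module-level generator keeps one component's draws from being disturbed by random calls elsewhere in the program.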

📖 Documentation

Online Documentation

Local Resources

Key Tutorials

  1. I/O Operations: Essential file handling (start here!)
  2. Plotting: Publication-ready visualizations
  3. Statistics: Research-grade statistical analysis
  4. Scholar: Literature management with impact factors
  5. AI/ML: Complete machine learning toolkit

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📄 License

AGPL-3.0.

📧 Contact

Yusuke Watanabe (ywatanabe@scitex.ai)
