
A Python toolkit for scientific computing and data analysis, featuring parallel processing, batch OpenAI API calls, statistical analysis, and common research utilities.


SciKuFu


SciKuFu is a Python toolkit that bundles the utilities I use most often in my personal research workflow. It aims to boost productivity and simplify common scientific computing and data analysis tasks.

Features

  • Parallel Processing: High-performance parallel computing with threading, multiprocessing, and asyncio backends
  • OpenAI Integration: Batch processing of OpenAI API calls with caching and structured output parsing
  • File I/O Operations: Unified text, JSON, and JSON Lines file operations with encoding support
  • Statistical Analysis: Comprehensive statistical methods including t-tests with normality checks and visualization
  • Clean Architecture: Modular design with optional dependencies for lightweight core usage

Installation

Basic Installation

pip install scikufu

With Optional Features

# Install with parallel processing and OpenAI support
pip install scikufu[parallel,parallel-openai]

# Install with statistical analysis support
pip install scikufu[stats]

# Install with all features
pip install scikufu[parallel,parallel-openai,stats]

From Source

git clone https://github.com/Mars160/scikufu.git
cd scikufu
pip install -e .

Quick Start

Parallel Processing

from scikufu.parallel import run_in_parallel

def process_item(item):
    return item * 2

items = [1, 2, 3, 4, 5]
results = run_in_parallel(
    func=process_item,
    tasks=items,
    n_jobs=4,
    backend="threading"  # or "multiprocessing", "asyncio"
)
print(results)  # [2, 4, 6, 8, 10]
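Conceptually, the threading backend behaves like a bounded worker pool mapped over the task list. This stdlib-only sketch shows the equivalent behaviour (it is not SciKuFu's actual implementation), including the input-order guarantee:

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(item):
    return item * 2

items = [1, 2, 3, 4, 5]

# A pool of 4 worker threads maps the function over the tasks;
# executor.map yields results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_item, items))

print(results)  # [2, 4, 6, 8, 10]
```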

OpenAI API Batch Processing

from scikufu.parallel.openai import Client

client = Client(api_key="your-api-key")
messages = [
    [{"role": "user", "content": "What is Python?"}],
    [{"role": "user", "content": "What is JavaScript?"}],
]

# Simple chat completion
results = client.chat_completion(
    messages=messages,
    model="gpt-4",
    n_jobs=4,
    with_tqdm=True,
    temperature=0.7
)

# Structured output parsing with Pydantic
from pydantic import BaseModel

class Answer(BaseModel):
    language: str
    description: str

structured_results = client.chat_completion_parse(
    messages=messages,
    model="gpt-4",
    response_model=Answer,
    n_jobs=4
)
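An `n_jobs`-style bound on concurrent requests is typically implemented with an asyncio semaphore around each call. Here is a self-contained sketch of that pattern with a stand-in coroutine (`fake_completion` is a placeholder for illustration, not part of SciKuFu or the OpenAI SDK):

```python
import asyncio

async def fake_completion(prompt: str) -> str:
    # Stand-in for a real API call; sleeps to simulate network latency.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def batch(prompts, limit=4):
    # The semaphore caps the number of in-flight requests at `limit`.
    sem = asyncio.Semaphore(limit)

    async def one(prompt):
        async with sem:
            return await fake_completion(prompt)

    # gather preserves the order of the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))

results = asyncio.run(batch(["What is Python?", "What is JavaScript?"]))
print(results)
```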

File I/O Operations

from scikufu.file import text, json, jsonl

# Text file operations
text.write("hello.txt", "Hello, World!")
content = text.read("hello.txt", encoding="utf-8")

# JSON file operations
data = {"name": "SciKuFu", "version": "0.1.0"}
json.write("config.json", data, indent=4)
loaded_data = json.read("config.json")

# JSON Lines operations
records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
jsonl.write("data.jsonl", records)
for record in jsonl.read("data.jsonl"):
    print(record)
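For reference, JSON Lines is simply one JSON document per line, so the stdlib equivalent of the `jsonl` helpers looks roughly like this (a sketch of the format, not SciKuFu's actual code):

```python
import json
import os
import tempfile

def write_jsonl(path, records):
    # One JSON document per line; ensure_ascii=False keeps Unicode readable.
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

def read_jsonl(path):
    # Yield records lazily, one line at a time (memory-efficient).
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
write_jsonl(path, [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
records = list(read_jsonl(path))
print(records)
```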

Statistical Analysis

from scikufu.stats.ttest import t_test
import numpy as np

# Generate sample data
group1 = np.random.normal(100, 15, 30)
group2 = np.random.normal(105, 15, 30)

# Comprehensive t-test with visualization
t_stat, p_value, significant = t_test(
    data=(group1, group2),
    alpha=0.05,
    show_plot=True,
    save_path="./t_test_plot.png",
    test_type="welch"  # or "student"
)

print(f"t-statistic: {t_stat}")
print(f"p-value: {p_value}")
print(f"Significant: {significant}")
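To see what the Welch variant actually computes, here is a stdlib-only sketch of the t-statistic and the Welch–Satterthwaite degrees of freedom (the p-value additionally requires the t distribution's CDF, which in practice comes from SciPy):

```python
import math
from statistics import mean, variance  # variance() is the sample variance

def welch_t(a, b):
    # Welch's t-statistic and Welch-Satterthwaite degrees of freedom,
    # which do not assume equal variances between the two groups.
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)
    n1, n2 = len(a), len(b)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 4))  # -1.8974 5.8824
```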

Modules

🚀 Parallel Processing (scikufu.parallel)

  • Core Functions: run_in_parallel(), run_async_in_parallel()
  • Backends: Threading, Multiprocessing, AsyncIO
  • Features: Disk-based caching, retry mechanisms, progress tracking
  • Use Case: CPU-bound tasks, I/O operations, concurrent API calls
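The retry mechanism listed above is, in essence, a backoff loop wrapped around each task. A minimal sketch of that pattern (illustrative only; SciKuFu's parameter names may differ):

```python
import time

def with_retries(func, *args, attempts=3, base_delay=0.01):
    # Re-run the function on failure, doubling the delay each attempt;
    # re-raise the last exception once attempts are exhausted.
    for attempt in range(attempts):
        try:
            return func(*args)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

def flaky(x):
    # Fails twice, then succeeds -- simulates a transient error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return x * 2

result = with_retries(flaky, 21)
print(result)  # 42 (succeeds on the third attempt)
```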

🤖 OpenAI Integration (scikufu.parallel.openai)

  • Client Class: Wrapper for OpenAI async API
  • Features: Batch processing, structured output parsing, caching
  • Use Case: Large-scale language model inference, data processing

📁 File I/O (scikufu.file)

  • Text Operations: text.read(), text.write(), text.append()
  • JSON Operations: json.read(), json.write(), json.append()
  • JSONL Operations: jsonl.read(), jsonl.write(), jsonl.append()
  • Features: Unicode support, automatic directory creation, memory efficiency

📊 Statistical Analysis (scikufu.stats)

  • T-Test: Comprehensive statistical testing with visualization
  • Features: Normality checks, effect size calculation, PP/QQ plots
  • Input Formats: Tuples, pandas DataFrames, numpy arrays
  • Export: Multiple plot formats, detailed statistical reports
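A common effect-size choice to report alongside a t-test is Cohen's d with a pooled standard deviation. Whether this is exactly what `scikufu.stats` computes is an assumption, but the formula itself is standard:

```python
import math
from statistics import mean, variance  # variance() is the sample variance

def cohens_d(a, b):
    # Cohen's d: mean difference divided by the pooled standard deviation.
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

d = cohens_d([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(d, 2))  # -1.2
```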

Optional Dependencies

# Parallel processing features
pip install diskcache tqdm

# OpenAI API integration
pip install openai

# Statistical analysis and visualization
pip install matplotlib numpy pandas scipy

Project Structure

scikufu/
├── src/scikufu/          # Main package source
│   ├── parallel/         # Parallel processing utilities
│   │   └── openai.py     # OpenAI API integration (scikufu.parallel.openai)
│   ├── file/             # File I/O operations
│   ├── stats/            # Statistical analysis
│   └── py.typed          # PEP 561 marker for type-checker support
├── tests/                # Test suite
│   ├── parallel/         # Parallel processing tests
│   ├── file/             # File I/O tests
│   └── stats/            # Statistical tests
└── htmlcov/              # Coverage reports

Requirements

  • Python: 3.12+
  • Core Dependencies: None (lightweight design)
  • Optional Dependencies: Feature-based extras for specific functionality

License

MIT

Contributing

All features are developed based on actual research needs. Suggestions, feedback, and contributions are welcome! Please feel free to open issues or submit pull requests.

Note

This toolkit is designed to be modular and extensible. Each module can be used independently, and the core functionality remains lightweight with optional dependencies for specific features.
