
DSLighting 2.7.9 - Enhanced PyPI README with comprehensive documentation: installation guide, multi-provider setup, usage examples, and custom Agent tutorial

DSLighting

End-to-End Data Science Agent

📚 Full Docs | 🚀 Quick Start | 💻 GitHub | 🐛 Issues


✨ Highlights

  • 🤖 Intelligent Agent Workflows: aide / automind / dsagent / data_interpreter / autokaggle / aflow, etc.
  • 🔍 Discovery API: explore all available prompts and operators
  • 📊 Data Management: unified data loading, task registry, and grading
  • 🔧 Multi-model Support: OpenAI, GLM, DeepSeek, Qwen, and more
  • 🧩 Extensible Architecture: custom tasks, workflows, and operators
  • 📦 Smart Package Context: auto-detect installed packages to avoid incompatible code
  • 🎯 Built-in Datasets: run demos without data preparation
  • 📝 Full Traceability: logs, workspace, and artifacts saved automatically

🚀 Quick Start

1. Install

pip install dslighting python-dotenv

System requirements: Python 3.10+. Using a virtual environment is recommended.

๐ŸŽ macOS note (xgboost)

If you use xgboost, install the OpenMP runtime:

brew install libomp

Otherwise you may see XGBoostError: Library not loaded: libomp.dylib.


2. Configure environment variables

Create a .env file:

# .env

# Default model (required)
LLM_MODEL=glm-4

# Multi-model config (JSON)
LLM_MODEL_CONFIGS='{
  "glm-4": {
    "api_key": ["your-key-1", "your-key-2"],
    "api_base": "https://open.bigmodel.cn/api/paas/v4",
    "temperature": 0.7,
    "provider": "openai"
  },
  "openai/deepseek-ai/DeepSeek-V3": {
    "api_key": ["sk-siliconflow-key-1", "sk-siliconflow-key-2"],
    "api_base": "https://api.siliconflow.cn/v1",
    "temperature": 1.0
  },
  "gpt-4o": {
    "api_key": "sk-your-openai-api-key",
    "api_base": "https://api.openai.com/v1",
    "temperature": 0.7
  }
}'
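
Before running an agent, it can help to confirm that the JSON in LLM_MODEL_CONFIGS parses cleanly. The sketch below uses only the standard library and is not DSLighting's own loader. `load_model_configs` is a hypothetical helper, and normalizing `api_key` to a list is an assumption based on the example above, where the key appears both as a string and as a list.

```python
import json
import os


def load_model_configs(env=None):
    """Parse LLM_MODEL_CONFIGS and normalize every api_key to a list.

    Hypothetical helper for checking a .env file; DSLighting's internal
    parsing may differ.
    """
    env = os.environ if env is None else env
    configs = json.loads(env.get("LLM_MODEL_CONFIGS", "{}"))
    for cfg in configs.values():
        keys = cfg.get("api_key", [])
        # The example config uses both a single string and a list of keys.
        cfg["api_key"] = [keys] if isinstance(keys, str) else list(keys)
    return configs


if __name__ == "__main__":
    for name, cfg in load_model_configs().items():
        print(name, cfg.get("api_base"), f"{len(cfg['api_key'])} key(s)")
```

A `json.JSONDecodeError` here usually points to a trailing comma or an unbalanced quote in the .env value.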

Supported providers:

  • OpenAI (GPT-4 / GPT-3.5)
  • Zhipu AI (GLM-4)
  • SiliconFlow (DeepSeek / Qwen / Kimi, etc.)
  • Any OpenAI-compatible API

💡 Tip: call load_dotenv() before importing dslighting.


🆕 Quick Experience

Option 1: Built-in dataset (zero setup)

from dotenv import load_dotenv
load_dotenv()

import dslighting

# No data prep required
result = dslighting.run_agent(task_id="bike-sharing-demand")
print(f"✅ Done! Score: {result.score}")

Built-in dataset example:

  • bike-sharing-demand (bike demand forecasting)

Option 2: Open-ended API (recommended for beginners)

from dotenv import load_dotenv
load_dotenv()  # load .env before importing dslighting

import dslighting

# Analyze
result = dslighting.analyze(
    data="./data/titanic",
    description="Analyze passenger distribution",
    model="gpt-4o"
)

# Process
result = dslighting.process(
    data="./data/titanic",
    description="Clean missing values and outliers",
    model="gpt-4o"
)

# Model
result = dslighting.model(
    data="./data/titanic",
    description="Train a survival prediction model",
    model="gpt-4o"
)

Option 3: Global config (recommended for multi-task)

from dotenv import load_dotenv
load_dotenv()

import dslighting

# Configure once, reuse everywhere
dslighting.setup(
    data_parent_dir="/path/to/data/competitions",
    registry_parent_dir="/path/to/registry"
)

agent = dslighting.Agent()
result = agent.run(task_id="bike-sharing-demand")

🌱 Beginner Usage

1. One-line demo (built-in dataset)

from dotenv import load_dotenv
load_dotenv()

import dslighting

result = dslighting.run_agent(task_id="bike-sharing-demand")
print(f"✅ Done! Score: {result.score}")

2. Open-ended API trio (Analyze / Process / Model)

from dotenv import load_dotenv
load_dotenv()  # load .env before importing dslighting

import dslighting

# Analyze
_ = dslighting.analyze(
    data="./data/titanic",
    description="Analyze passenger distribution",
    model="gpt-4o"
)

# Process
_ = dslighting.process(
    data="./data/titanic",
    description="Handle missing values and outliers",
    model="gpt-4o"
)

# Model
_ = dslighting.model(
    data="./data/titanic",
    description="Train a survival prediction model",
    model="gpt-4o"
)

3. Check results and workspace

print(result.workspace_path)
print(result.score)

🚀 Advanced Usage

1. Global config + reusable execution

from dotenv import load_dotenv
load_dotenv()  # load .env before importing dslighting

import dslighting

# Configure once, reuse
dslighting.setup(
    data_parent_dir="/path/to/data/competitions",
    registry_parent_dir="/path/to/registry"
)

agent = dslighting.Agent(
    workflow="aide",
    model="gpt-4o",
    max_iterations=5,
    keep_workspace=True
)

result = agent.run(task_id="bike-sharing-demand")

2. Custom task registry (competition-style)

result = agent.run(
    task_id="your-task-name",
    data_dir="/path/to/data/competitions",
    registry_dir="/path/to/registry"
)

3. Custom Agent (Operator / Workflow / Factory)

from dslighting.operators.custom import SimpleOperator

async def summarize(text: str) -> dict:
    return {"summary": text[:200]}

summarize_op = SimpleOperator(func=summarize, name="Summarize")

class MyWorkflow:
    def __init__(self, operators):
        self.ops = operators

    async def solve(self, description, io_instructions, data_dir, output_path):
        _ = await self.ops["summarize"](text=description)

class MyWorkflowFactory:
    def __init__(self, model="openai/gpt-4o"):
        self.model = model

    def create_agent(self):
        return MyWorkflow({"summarize": summarize_op})

agent = MyWorkflowFactory().create_agent()
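
The snippet above builds the workflow but does not call it. The sketch below shows one way to drive `solve` with `asyncio`, without DSLighting installed. `StubOperator` is a hypothetical stand-in for `SimpleOperator`, and it assumes the operator is an awaitable callable that takes keyword arguments, as the `await self.ops["summarize"](text=...)` call suggests. `run_demo` and the placeholder argument values are also illustrative only.

```python
import asyncio


class StubOperator:
    """Hypothetical stand-in for SimpleOperator: wraps an async function."""

    def __init__(self, func, name):
        self.func = func
        self.name = name

    async def __call__(self, **kwargs):
        return await self.func(**kwargs)


async def summarize(text: str) -> dict:
    # Same toy operator as above: keep the first 200 characters.
    return {"summary": text[:200]}


class MyWorkflow:
    def __init__(self, operators):
        self.ops = operators

    async def solve(self, description, io_instructions, data_dir, output_path):
        # Return the operator output so the caller can inspect it.
        return await self.ops["summarize"](text=description)


def run_demo(description: str) -> dict:
    """Run the workflow once with placeholder paths (assumed values)."""
    workflow = MyWorkflow({"summarize": StubOperator(summarize, "Summarize")})
    return asyncio.run(
        workflow.solve(
            description,
            io_instructions="",
            data_dir=".",
            output_path="submission.csv",
        )
    )


if __name__ == "__main__":
    print(run_demo("Predict hourly bike rentals from weather data."))
```

With the real package, `StubOperator` would be replaced by the imported `SimpleOperator`; how DSLighting itself invokes `solve` is not shown in this README.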

📦 Data Preparation

Method 1: MLE-Bench (recommended)

git clone https://github.com/openai/mle-bench.git
cd mle-bench
pip install -e .
python scripts/prepare.py --competition all

# Link data to DSLighting
ln -s ~/mle-bench/data/competitions /path/to/dslighting/data/competitions

Method 2: Custom dataset

data/competitions/
  <competition-id>/
    config.yaml
    prepared/
      public/
      private/


🧭 Discovery API (Explore Components)

import dslighting

# List all prompts / operators
dslighting.explore()

List specific categories:

all_prompts = dslighting.list_prompts()
llm_ops = dslighting.list_operators(category="llm")

Get details:

from dslighting.prompts import get_prompt_info
from dslighting.operators import get_operator_info

print(get_prompt_info("create_improve_prompt"))
print(get_operator_info("PlanOperator"))


🧰 CLI Usage

After installation:

dslighting --help

Common subcommands:

  • dslighting help: help and quick guide
  • dslighting workflows: list all workflows
  • dslighting example <workflow>: show workflow examples
  • dslighting quickstart: detailed quick start
  • dslighting detect-packages: detect packages and write to config.yaml
  • dslighting show-packages: show detected packages
  • dslighting validate-config: validate configuration

🔧 Custom Tasks (Advanced)

your-project/
├── data/competitions/
│   └── your-task-name/
│       └── prepared/
│           ├── public/
│           └── private/
└── registry/
    └── your-task-name/
        ├── config.yaml
        ├── description.md
        └── grade.py

Example config.yaml:

id: your-task-name
name: Your Task Display Name
competition_type: simple
awards_medals: false
description: your-task-name/description.md

dataset:
  answers: your-task-name/prepared/private/test_answer.csv
  sample_submission: your-task-name/prepared/public/sampleSubmission.csv

grader:
  name: rmsle  # or accuracy, f1, mae, etc.

Run a custom task:

result = agent.run(
    task_id="your-task-name",
    data_dir="/path/to/data/competitions",
    registry_dir="/path/to/registry"
)

📈 Checking Results

print(f"Workspace: {result.workspace_path}")
print(f"Score: {result.score}")
print(f"Cost: {result.cost}")

🧪 Web UI (Optional)

The Web UI requires the frontend/backend source. If you installed via pip, clone the repo:

git clone https://github.com/usail-hkust/dslighting.git
cd dslighting

Backend:

pip install -r web_ui/backend/requirements.txt
cd web_ui/backend
python main.py

Frontend:

cd web_ui/frontend
npm install
npm run dev

Open: http://localhost:3000


🎉 Latest Version: 2.7.9

Highlights:

  • Comprehensive PyPI README with detailed documentation
  • Enhanced installation guide with system requirements
  • Multi-provider API setup examples (OpenAI, GLM, DeepSeek)
  • Beginner and advanced usage examples
  • Custom Agent tutorial for expert users
  • Complete CLI and Web UI documentation

📚 Docs


๐Ÿค Contributing

Contributions are welcome!

  1. Fork the repo
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add some AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under AGPL-3.0.


If this project helps you, please give it a ⭐️

Made with ❤️ by USAIL Lab
