DSLighting 2.7.9 - Enhanced PyPI README with comprehensive documentation: installation guide, multi-provider setup, usage examples, and custom Agent tutorial
DSLighting
End-to-End Data Science Agent
Full Docs | Quick Start | GitHub | Issues
Highlights
- Intelligent Agent Workflows: aide / automind / dsagent / data_interpreter / autokaggle / aflow, etc.
- Discovery API: explore all available prompts and operators
- Data Management: unified data loading, task registry, and grading
- Multi-model Support: OpenAI, GLM, DeepSeek, Qwen, and more
- Extensible Architecture: custom tasks, workflows, and operators
- Smart Package Context: auto-detect installed packages to avoid incompatible code
- Built-in Datasets: run demos without data preparation
- Full Traceability: logs, workspace, and artifacts saved automatically
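The package-detection highlight can be pictured with the standard library alone. The sketch below is not DSLighting's implementation; `installed_packages` and `format_package_context` are hypothetical names, and the idea of passing the result into a prompt is an assumption about how such context might be used.

```python
from importlib.metadata import distributions


def installed_packages() -> dict[str, str]:
    """Map each installed distribution name (lowercased) to its version."""
    packages = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            packages[name.lower()] = dist.version
    return packages


def format_package_context(packages: dict[str, str], wanted: list[str]) -> str:
    """Render the versions of the `wanted` packages as plain text, one per line."""
    lines = []
    for name in wanted:
        version = packages.get(name.lower())
        lines.append(f"{name}: {version if version else 'not installed'}")
    return "\n".join(lines)
```

Calling `format_package_context(installed_packages(), ["pandas", "xgboost"])` gives a short summary of what generated code could safely import in the current environment.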
Quick Start
1. Install
pip install dslighting python-dotenv
System requirements: Python 3.10+. Using a virtual environment is recommended.
macOS note (xgboost)
If you use xgboost, install the OpenMP runtime:
brew install libomp
Otherwise you may see XGBoostError: Library not loaded: libomp.dylib.
2. Configure environment variables
Create a .env file:
# .env
# Default model (required)
LLM_MODEL=glm-4
# Multi-model config (JSON)
LLM_MODEL_CONFIGS='{
  "glm-4": {
    "api_key": ["your-key-1", "your-key-2"],
    "api_base": "https://open.bigmodel.cn/api/paas/v4",
    "temperature": 0.7,
    "provider": "openai"
  },
  "openai/deepseek-ai/DeepSeek-V3": {
    "api_key": ["sk-siliconflow-key-1", "sk-siliconflow-key-2"],
    "api_base": "https://api.siliconflow.cn/v1",
    "temperature": 1.0
  },
  "gpt-4o": {
    "api_key": "sk-your-openai-api-key",
    "api_base": "https://api.openai.com/v1",
    "temperature": 0.7
  }
}'
Supported providers:
- OpenAI (GPT-4 / GPT-3.5)
- Zhipu AI (GLM-4)
- SiliconFlow (DeepSeek / Qwen / Kimi, etc.)
- Any OpenAI-compatible API
Tip: call load_dotenv() before importing dslighting.
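Because LLM_MODEL_CONFIGS is a JSON string, a typo in it is easy to miss. The following is a minimal sketch for sanity-checking the two variables before running an agent. The helper names `load_model_configs` and `default_model_is_configured` are hypothetical and not part of DSLighting, and the check that LLM_MODEL must appear as a key in LLM_MODEL_CONFIGS is an assumption drawn from the example above.

```python
import json
import os


def load_model_configs(env=None) -> dict:
    """Parse LLM_MODEL_CONFIGS and normalize every api_key to a list of strings."""
    env = os.environ if env is None else env
    configs = json.loads(env.get("LLM_MODEL_CONFIGS", "{}"))
    if not isinstance(configs, dict):
        raise ValueError("LLM_MODEL_CONFIGS must be a JSON object keyed by model name")
    normalized = {}
    for model_name, cfg in configs.items():
        keys = cfg.get("api_key", [])
        if isinstance(keys, str):  # a single key may be given as a plain string
            keys = [keys]
        normalized[model_name] = {**cfg, "api_key": list(keys)}
    return normalized


def default_model_is_configured(env=None) -> bool:
    """Return True when LLM_MODEL names an entry in LLM_MODEL_CONFIGS."""
    env = os.environ if env is None else env
    return env.get("LLM_MODEL") in load_model_configs(env)
```

Run it after `load_dotenv()` so that `os.environ` already contains the values from `.env`; a `json.JSONDecodeError` here points to malformed JSON in the file.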
Quick Experience
Option 1: Built-in dataset (zero setup)
from dotenv import load_dotenv
load_dotenv()
import dslighting
# No data prep required
result = dslighting.run_agent(task_id="bike-sharing-demand")
print(f"✅ Done! Score: {result.score}")
Built-in dataset example: bike-sharing-demand (bike demand forecasting)
Option 2: Open-ended API (recommended for beginners)
from dotenv import load_dotenv
load_dotenv()
import dslighting
# Analyze
result = dslighting.analyze(
    data="./data/titanic",
    description="Analyze passenger distribution",
    model="gpt-4o"
)
# Process
result = dslighting.process(
    data="./data/titanic",
    description="Clean missing values and outliers",
    model="gpt-4o"
)
# Model
result = dslighting.model(
    data="./data/titanic",
    description="Train a survival prediction model",
    model="gpt-4o"
)
Option 3: Global config (recommended for multi-task)
from dotenv import load_dotenv
load_dotenv()
import dslighting
# Configure once, reuse everywhere
dslighting.setup(
    data_parent_dir="/path/to/data/competitions",
    registry_parent_dir="/path/to/registry"
)
agent = dslighting.Agent()
result = agent.run(task_id="bike-sharing-demand")
Beginner Usage
1. One-line demo (built-in dataset)
from dotenv import load_dotenv
load_dotenv()
import dslighting
result = dslighting.run_agent(task_id="bike-sharing-demand")
print(f"✅ Done! Score: {result.score}")
2. Open-ended API trio (Analyze / Process / Model)
from dotenv import load_dotenv
load_dotenv()
import dslighting
# Analyze
_ = dslighting.analyze(
    data="./data/titanic",
    description="Analyze passenger distribution",
    model="gpt-4o"
)
# Process
_ = dslighting.process(
    data="./data/titanic",
    description="Handle missing values and outliers",
    model="gpt-4o"
)
# Model
_ = dslighting.model(
    data="./data/titanic",
    description="Train a survival prediction model",
    model="gpt-4o"
)
3. Check results and workspace
print(result.workspace_path)
print(result.score)
Advanced Usage
1. Global config + reusable execution
from dotenv import load_dotenv
load_dotenv()
import dslighting
# Configure once, reuse
dslighting.setup(
    data_parent_dir="/path/to/data/competitions",
    registry_parent_dir="/path/to/registry"
)
agent = dslighting.Agent(
    workflow="aide",
    model="gpt-4o",
    max_iterations=5,
    keep_workspace=True
)
result = agent.run(task_id="bike-sharing-demand")
2. Custom task registry (competition-style)
result = agent.run(
    task_id="your-task-name",
    data_dir="/path/to/data/competitions",
    registry_dir="/path/to/registry"
)
3. Custom Agent (Operator / Workflow / Factory)
from dslighting.operators.custom import SimpleOperator

async def summarize(text: str) -> dict:
    return {"summary": text[:200]}

summarize_op = SimpleOperator(func=summarize, name="Summarize")

class MyWorkflow:
    def __init__(self, operators):
        self.ops = operators

    async def solve(self, description, io_instructions, data_dir, output_path):
        _ = await self.ops["summarize"](text=description)

class MyWorkflowFactory:
    def __init__(self, model="openai/gpt-4o"):
        self.model = model

    def create_agent(self):
        return MyWorkflow({"summarize": summarize_op})

agent = MyWorkflowFactory().create_agent()
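To see the control flow of a custom workflow without installing anything, here is a hedged sketch that drives `solve` with `asyncio`. `StandInOperator` is a hypothetical replacement for `SimpleOperator`, written only so the example runs on its own; it assumes, as the snippet above suggests, that an operator is awaited with keyword arguments. Unlike the original, `solve` returns the operator result so it can be inspected.

```python
import asyncio


class StandInOperator:
    """Hypothetical stand-in for an operator: an object awaited like a function."""

    def __init__(self, func, name):
        self.func = func
        self.name = name

    async def __call__(self, **kwargs):
        return await self.func(**kwargs)


async def summarize(text: str) -> dict:
    # Keep only the first 200 characters as a trivial "summary".
    return {"summary": text[:200]}


class MyWorkflow:
    def __init__(self, operators):
        self.ops = operators

    async def solve(self, description, io_instructions, data_dir, output_path):
        # Return the operator result so the caller can inspect it.
        return await self.ops["summarize"](text=description)


workflow = MyWorkflow({"summarize": StandInOperator(summarize, "Summarize")})
result = asyncio.run(
    workflow.solve(
        description="Predict hourly bike rentals",
        io_instructions="",
        data_dir=".",
        output_path="submission.csv",
    )
)
print(result["summary"])
```

The argument values passed to `solve` are placeholders; a real workflow would read from `data_dir` and write its predictions to `output_path`.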
Data Preparation
Method 1: MLE-Bench (recommended)
git clone https://github.com/openai/mle-bench.git
cd mle-bench
pip install -e .
python scripts/prepare.py --competition all
# Link data to DSLighting
ln -s ~/mle-bench/data/competitions /path/to/dslighting/data/competitions
Method 2: Custom dataset
data/competitions/
  <competition-id>/
    config.yaml
    prepared/
      public/
      private/
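The layout above can be created by hand or with a few lines of Python. This is a minimal sketch; `scaffold_competition` is a hypothetical helper, not a DSLighting function, and it leaves `config.yaml` empty for you to fill in.

```python
from pathlib import Path


def scaffold_competition(root, competition_id: str) -> Path:
    """Create <root>/<competition_id>/prepared/{public,private} and an empty config.yaml."""
    comp_dir = Path(root) / competition_id
    for split in ("public", "private"):
        (comp_dir / "prepared" / split).mkdir(parents=True, exist_ok=True)
    (comp_dir / "config.yaml").touch(exist_ok=True)
    return comp_dir
```

For example, `scaffold_competition("data/competitions", "my-task")` creates the folders if they are missing and leaves existing files untouched.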
Discovery API (Explore Components)
import dslighting
# List all prompts / operators
dslighting.explore()
List specific categories:
all_prompts = dslighting.list_prompts()
llm_ops = dslighting.list_operators(category="llm")
Get details:
from dslighting.prompts import get_prompt_info
from dslighting.operators import get_operator_info
print(get_prompt_info("create_improve_prompt"))
print(get_operator_info("PlanOperator"))
CLI Usage
After installation:
dslighting --help
Common subcommands:
- dslighting help: help and quick guide
- dslighting workflows: list all workflows
- dslighting example <workflow>: show workflow examples
- dslighting quickstart: detailed quick start
- dslighting detect-packages: detect packages and write to config.yaml
- dslighting show-packages: show detected packages
- dslighting validate-config: validate configuration
Custom Tasks (Advanced)
your-project/
├── data/competitions/
│   └── your-task-name/
│       └── prepared/
│           ├── public/
│           └── private/
└── registry/
    └── your-task-name/
        ├── config.yaml
        ├── description.md
        └── grade.py
Example config.yaml:
id: your-task-name
name: Your Task Display Name
competition_type: simple
awards_medals: false
description: your-task-name/description.md
dataset:
  answers: your-task-name/prepared/private/test_answer.csv
  sample_submission: your-task-name/prepared/public/sampleSubmission.csv
grader:
  name: rmsle # or accuracy, f1, mae, etc.
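For readers unfamiliar with the `rmsle` grader name, the sketch below implements the standard definition of root mean squared logarithmic error, sqrt(mean((log(1 + pred) - log(1 + true))^2)). It illustrates the metric only and is not necessarily how the DSLighting grader computes it.

```python
import math


def rmsle(y_true, y_pred) -> float:
    """Root mean squared logarithmic error for non-negative targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Lower is better, and a perfect prediction scores 0.0. Because the error is taken on a log scale, the metric penalizes relative rather than absolute differences.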
Run a custom task:
result = agent.run(
    task_id="your-task-name",
    data_dir="/path/to/data/competitions",
    registry_dir="/path/to/registry"
)
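Before launching a run, it can help to confirm that the task folders match the layout shown earlier in this section. The following is a small sketch under that assumption; `missing_task_paths` is a hypothetical helper, and the list of expected paths is taken from the directory tree above rather than from DSLighting's own validation logic.

```python
from pathlib import Path


def missing_task_paths(task_id: str, data_dir: str, registry_dir: str) -> list[str]:
    """Return the expected paths for `task_id` that do not exist yet."""
    data_root = Path(data_dir) / task_id / "prepared"
    registry_root = Path(registry_dir) / task_id
    expected = [
        data_root / "public",
        data_root / "private",
        registry_root / "config.yaml",
        registry_root / "description.md",
        registry_root / "grade.py",
    ]
    return [str(path) for path in expected if not path.exists()]
```

An empty list means the layout is complete; otherwise the returned paths show what still needs to be created before calling `agent.run`.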
Checking Results
print(f"Workspace: {result.workspace_path}")
print(f"Score: {result.score}")
print(f"Cost: {result.cost}")
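To look inside the workspace after a run, a short helper can list what was saved. This sketch assumes `result.workspace_path` points to a directory on disk; `list_artifacts` is a hypothetical name, not part of the DSLighting API.

```python
from pathlib import Path


def list_artifacts(workspace_path) -> list[str]:
    """Return workspace files as sorted POSIX-style paths relative to the workspace root."""
    root = Path(workspace_path)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

For example, `for name in list_artifacts(result.workspace_path): print(name)` prints every saved file, including those in nested folders.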
Web UI (Optional)
The Web UI requires the frontend/backend source. If you installed via pip, clone the repo:
git clone https://github.com/usail-hkust/dslighting.git
cd dslighting
Backend:
pip install -r web_ui/backend/requirements.txt
cd web_ui/backend
python main.py
Frontend:
cd web_ui/frontend
npm install
npm run dev
Open: http://localhost:3000
Latest Version: 2.7.9
Highlights:
- Comprehensive PyPI README with detailed documentation
- Enhanced installation guide with system requirements
- Multi-provider API setup examples (OpenAI, GLM, DeepSeek)
- Beginner and advanced usage examples
- Custom Agent tutorial for expert users
- Complete CLI and Web UI documentation
Docs
- Quick Start: https://luckyfan-cs.github.io/dslighting-web/api/getting-started.html
- Data System: https://luckyfan-cs.github.io/dslighting-web/api/data-system.html
- GitHub: https://github.com/usail-hkust/dslighting
- PyPI: https://pypi.org/project/dslighting/
Contributing
Contributions are welcome!
- Fork the repo
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit changes (git commit -m 'Add some AmazingFeature')
- Push to branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under AGPL-3.0.
If this project helps you, please give it a ⭐️
Made with ❤️ by USAIL Lab