# llama-mlx-pipeline

Data processing pipeline using MLX (scraper, chunker, extractor).

Llama MLX Pipeline (`llama-mlx-pipeline`) provides tools and components for building efficient data processing pipelines, specifically optimized for Apple Silicon using the MLX framework. It focuses on tasks like data scraping, chunking, and feature extraction.
## Key Features

- **MLX Optimization:** Designed to leverage Apple's MLX framework for high performance on M-series chips.
- **Data Scraping:** Components for fetching data from various sources (`scraper.py`).
- **Data Chunking:** Utilities for splitting data into manageable chunks (`chunker.py`).
- **Feature Extraction:** Tools for extracting relevant features or information (`extractor.py`).
- **Pipeline Core:** A central module (`core.py`) that likely orchestrates the pipeline stages.
- **Configurable:** Supports configuration via `config.py`.
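To make the chunking stage concrete, here is a minimal sliding-window splitter. This is an illustrative sketch of the idea, not the actual `chunker.py` API; the function name and parameters are assumptions:

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks with a small overlap.

    Overlapping windows preserve context that would otherwise be
    lost at chunk boundaries.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # advance by the non-overlapping portion
    return chunks
```

With `size=512` and `overlap=64`, each chunk shares its last 64 characters with the start of the next chunk.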
## Installation

```bash
pip install llama-mlx-pipeline
```

Or install directly from GitHub for the latest version:

```bash
pip install git+https://github.com/llamasearchai/llama-mlx-pipeline.git
```
## Usage

(Usage examples demonstrating pipeline creation and execution will be added here.)

```python
# Placeholder for Python client usage
# from llama_mlx_pipeline import PipelineBuilder, MLXConfig
#
# config = MLXConfig.load("path/to/config.yaml")
# builder = PipelineBuilder(config)
# pipeline = (
#     builder.add_scraper(source="web", url="...")
#     .add_chunker(size=512)
#     .add_extractor(model="bert-base")
#     .build()
# )
# results = pipeline.run(input_data="...")
# print(results)
```
## Architecture Overview

```mermaid
graph TD
    A[Input Data Source] --> B{"Scraper (scraper.py)"};
    B --> C{"Chunker (chunker.py)"};
    C --> D{"Extractor (extractor.py)"};
    D --> E[Processed Output];
    F["Pipeline Orchestrator (core.py)"] -- Manages --> B;
    F -- Manages --> C;
    F -- Manages --> D;
    G["Configuration (config.py)"] -- Configures --> F;
    G -- Configures --> B;
    G -- Configures --> C;
    G -- Configures --> D;

    subgraph mlx["MLX Optimized Components"]
        direction LR
        B
        C
        D
    end

    style F fill:#f9f,stroke:#333,stroke-width:2px
```
- **Input:** Data enters the pipeline.
- **Scraper:** Fetches or loads the initial data.
- **Chunker:** Splits data into smaller pieces.
- **Extractor:** Processes chunks to extract features or information.
- **Output:** The final processed data is produced.
- **Orchestrator:** The `core.py` module likely manages the flow and execution of these stages, configured by `config.py`.
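The orchestration described above amounts to threading data through the stages in order. A minimal sketch of that idea, assuming stages are plain callables (the real `core.py` interface may differ):

```python
from typing import Any, Callable, Sequence


class Pipeline:
    """Run a sequence of stages, feeding each stage's output to the next."""

    def __init__(self, stages: Sequence[Callable[[Any], Any]]):
        self.stages = list(stages)

    def run(self, data: Any) -> Any:
        for stage in self.stages:
            data = stage(data)
        return data


# Hypothetical usage: scrape -> chunk -> extract, with stand-in stages.
pipeline = Pipeline([
    lambda url: f"text from {url}",              # stand-in scraper
    lambda text: text.split(),                   # stand-in chunker
    lambda chunks: [c.upper() for c in chunks],  # stand-in extractor
])
```

Keeping stages as simple callables makes each one independently testable and lets the orchestrator stay agnostic about what each stage does.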
## Configuration

(Details on configuring pipeline stages, MLX settings, etc., will be added here.)
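Until those details land, one plausible shape for the configuration object is a dataclass populated from a parsed YAML mapping. The field names below are illustrative assumptions, not the actual `config.py` schema:

```python
from dataclasses import dataclass, fields


@dataclass
class PipelineConfig:
    chunk_size: int = 512
    chunk_overlap: int = 64
    extractor_model: str = "bert-base"

    @classmethod
    def from_dict(cls, raw: dict) -> "PipelineConfig":
        # Ignore unknown keys so extra YAML entries don't break loading.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})
```

Silently ignoring unknown keys keeps older configs loadable as new options are added; a stricter schema could raise on them instead.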
## Development

### Setup

```bash
# Clone the repository
git clone https://github.com/llamasearchai/llama-mlx-pipeline.git
cd llama-mlx-pipeline

# Install in editable mode with development dependencies
pip install -e ".[dev]"
```
### Testing

```bash
pytest tests/
```
## Contributing

Contributions are welcome! Please refer to CONTRIBUTING.md and submit a Pull Request.

## License

This project is licensed under the MIT License; see the LICENSE file for details.
## File Details

### Source distribution: llama_mlx_pipeline-0.1.0.tar.gz

- Download URL: llama_mlx_pipeline-0.1.0.tar.gz
- Upload date:
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e7b537caebebdfa3e5b8b5001a0bbeaa4a1660ed47ee64791c5d09306856621d` |
| MD5 | `228d5108635632a38aefac5d91a81306` |
| BLAKE2b-256 | `a943b6e72105a94e0012e498f8b8b23e625986cacc69c2d205b779e0d04a6bea` |
### Built distribution: llama_mlx_pipeline-0.1.0-py3-none-any.whl

- Download URL: llama_mlx_pipeline-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1c49ca33b6160762ac697e737dd782127c8c73415cbd951c7257ba6fa193b59e` |
| MD5 | `97078f2d37035bf4314c361a5f5a0589` |
| BLAKE2b-256 | `9a551a1bc9fdaf8c6741f37af3d3193398fb8396e887cc0eb5577851ece4019f` |