llama-mlx-pipeline

Data processing pipeline using MLX (scraper, chunker, extractor).

Llama MLX Pipeline (llama-mlx-pipeline) provides tools and components for building efficient data processing pipelines, specifically optimized for Apple Silicon using the MLX framework. It focuses on tasks like data scraping, chunking, and feature extraction.

Key Features

  • MLX Optimization: Designed to leverage Apple's MLX framework for high performance on M-series chips.
  • Data Scraping: Includes components for fetching data from various sources (scraper.py).
  • Data Chunking: Provides utilities for splitting data into manageable chunks (chunker.py).
  • Feature Extraction: Contains tools for extracting relevant features or information (extractor.py).
  • Pipeline Core: A central module (core.py) orchestrates the pipeline stages.
  • Configurable: Supports configuration via config.py.

Installation

pip install llama-mlx-pipeline
# Or install directly from GitHub for the latest version:
# pip install git+https://github.com/llamasearchai/llama-mlx-pipeline.git

Usage

(Full usage examples will be added here. The sketch below illustrates the intended builder-style API; names such as PipelineBuilder and MLXConfig are provisional.)

from llama_mlx_pipeline import PipelineBuilder, MLXConfig

config = MLXConfig.load("path/to/config.yaml")
builder = PipelineBuilder(config)

pipeline = (
    builder.add_scraper(source="web", url="...")
           .add_chunker(size=512)
           .add_extractor(model="bert-base")
           .build()
)

results = pipeline.run(input_data="...")
print(results)
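For intuition, size-based chunking like add_chunker(size=512) can be pictured as a simple character-window splitter. This is an illustrative stand-in only; the library's actual chunking strategy may differ.

```python
def chunk_text(text: str, size: int) -> list[str]:
    """Split text into consecutive windows of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]

# 1100 characters split at size=512 -> chunks of 512, 512, and 76
parts = chunk_text("a" * 1100, 512)
```

The last chunk carries the remainder, so downstream stages should not assume uniform chunk length.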

Architecture Overview

graph TD
    A[Input Data Source] --> B{"Scraper (scraper.py)"};
    B --> C{"Chunker (chunker.py)"};
    C --> D{"Extractor (extractor.py)"};
    D --> E[Processed Output];

    F["Pipeline Orchestrator (core.py)"] -- Manages --> B;
    F -- Manages --> C;
    F -- Manages --> D;

    G["Configuration (config.py)"] -- Configures --> F;
    G -- Configures --> B;
    G -- Configures --> C;
    G -- Configures --> D;

    subgraph "MLX Optimized Components"
        direction LR
        B
        C
        D
    end

    style F fill:#f9f,stroke:#333,stroke-width:2px

  1. Input: Data enters the pipeline.
  2. Scraper: Fetches or loads the initial data.
  3. Chunker: Splits data into smaller pieces.
  4. Extractor: Processes chunks to extract features or information.
  5. Output: The final processed data is produced.
  6. Orchestrator: The core.py module manages the flow and execution of these stages, as configured by config.py.
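The staged flow above can be sketched as plain Python callables composed by a minimal orchestrator. The stage functions and Pipeline class below are hypothetical stand-ins for scraper.py, chunker.py, extractor.py, and core.py, not the library's actual API.

```python
from typing import Callable, List

def scrape(source: str) -> str:
    """Stand-in for scraper.py: fetch or load raw text."""
    return f"raw text loaded from {source}"

def chunk(text: str, size: int = 16) -> List[str]:
    """Stand-in for chunker.py: split text into fixed-size pieces."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def extract(chunks: List[str]) -> List[dict]:
    """Stand-in for extractor.py: derive simple per-chunk features."""
    return [{"chunk": c, "length": len(c)} for c in chunks]

class Pipeline:
    """Stand-in for core.py: run stages in order, feeding each output forward."""
    def __init__(self, stages: List[Callable]):
        self.stages = stages

    def run(self, data):
        for stage in self.stages:
            data = stage(data)
        return data

pipeline = Pipeline([scrape, chunk, extract])
results = pipeline.run("example.txt")
```

Each stage consumes the previous stage's output, which is the same dataflow the diagram describes.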

Configuration

(Details on configuring pipeline stages, MLX settings, etc., will be added here.)
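Pending official documentation, a config file like the path/to/config.yaml referenced in the usage sketch might look like the following. Every key here is hypothetical and shown only to suggest the shape of per-stage configuration.

```yaml
# Hypothetical config.yaml layout; the actual schema is not yet documented.
scraper:
  source: web
  url: https://example.com/data
chunker:
  size: 512
  overlap: 32
extractor:
  model: bert-base
mlx:
  device: gpu
  dtype: float16
```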

Development

Setup

# Clone the repository
git clone https://github.com/llamasearchai/llama-mlx-pipeline.git
cd llama-mlx-pipeline

# Install in editable mode with development dependencies
pip install -e ".[dev]"

Testing

pytest tests/

Contributing

Contributions are welcome! Please refer to CONTRIBUTING.md and submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

