
Native RAG on macOS and Apple Silicon with MLX 🧑‍💻

This repository showcases a Retrieval-augmented Generation (RAG) chat interface with support for multiple open-source models.

[chat_with_mlx demo]

Features

  • Chat with your Data: doc(x), pdf, txt files, and YouTube videos via URL.
  • Multilingual: Chinese 🇨🇳, English 🏴, French 🇫🇷, German 🇩🇪, Hindi 🇮🇳, Italian 🇮🇹, Japanese 🇯🇵, Korean 🇰🇷, Spanish 🇪🇸, Turkish 🇹🇷, and Vietnamese 🇻🇳.
  • Easy Integration: easily integrate any HuggingFace- and MLX-compatible open-source model.

Installation and Usage

Easy Setup

  • Install pip
  • Install the package: pip install chat-with-mlx
  • Note: adding your own models is difficult with this setup (I will add UI support for this later), but it is the fastest way to try the app.

Manual Pip Installation

git clone https://github.com/qnguyen3/chat-with-mlx.git
cd chat-with-mlx
pip install -e .

Manual Conda Installation

git clone https://github.com/qnguyen3/chat-with-mlx.git
cd chat-with-mlx
conda create -n mlx-chat python=3.11
conda activate mlx-chat
pip install -e .

Usage

  • Start the app: chat-with-mlx

Supported Models

  • Google Gemma-7b-it, Gemma-2b-it
  • Mistral-7B-Instruct, OpenHermes-2.5-Mistral-7B, NousHermes-2-Mistral-7B-DPO
  • Mixtral-8x7B-Instruct-v0.1, Nous-Hermes-2-Mixtral-8x7B-DPO
  • Quyen-SE (0.5B), Quyen (4B)
  • StableLM 2 Zephyr (1.6B)
  • Vistral-7B-Chat, VBD-Llama2-7b-chat, vinallama-7b-chat

Add Your Own Models

Solution 1

This solution only requires you to add a simple .yaml config file for your model in chat_with_mlx/models/configs.

example.yaml:

original_repo: google/gemma-2b-it # The original HuggingFace repo; used for display
mlx-repo: mlx-community/quantized-gemma-2b-it # The MLX weights repo; most are available under `mlx-community`
quantize: 4bit # Optional: [4bit, 8bit]
default_language: multi # Optional: [en, es, zh, vi, multi]

After adding the .yaml config, you can load the model inside the app (for now, you need to track the download progress through your Terminal/CLI).
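
For example, here is a minimal sketch that writes such a config for another supported model. The file name and the mlx-community repo below are assumptions for illustration; verify that the repo exists on HuggingFace before using it:

# Hypothetical sketch: generate a model config in chat_with_mlx/models/configs.
# The repo names below are assumptions; check them on HuggingFace first.
import yaml  # pyyaml

config = {
    "original_repo": "mistralai/Mistral-7B-Instruct-v0.2",      # shown in the UI
    "mlx-repo": "mlx-community/Mistral-7B-Instruct-v0.2-4bit",  # assumed MLX weights repo
    "quantize": "4bit",
    "default_language": "en",
}

with open("chat_with_mlx/models/configs/mistral-7b-instruct.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)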

Solution 2

Do the same as Solution 1. Sometimes the snapshot_download method used to download the models is slow, and you may prefer to download them yourself.

After adding the .yaml config, you can download the repo yourself and add it to chat_with_mlx/models/download. The folder name MUST be the same as the original repo name without the username (so google/gemma-2b-it -> gemma-2b-it).
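
If you go this route, a minimal sketch using huggingface_hub's snapshot_download (the download method mentioned above) could look like this; the repo and folder match the gemma-2b-it example:

# Sketch: manually download MLX weights into the app's download folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mlx-community/quantized-gemma-2b-it",
    # Folder name = original repo name without the username.
    local_dir="chat_with_mlx/models/download/gemma-2b-it",
)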

A complete model should have the following files (a quick sanity-check sketch follows this list):

  • model.safetensors
  • config.json
  • merges.txt
  • model.safetensors.index.json
  • special_tokens_map.json - optional, depending on the model
  • tokenizer_config.json
  • tokenizer.json
  • vocab.json
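
As a hedged sanity check before loading the model in the app (the folder path follows the gemma-2b-it example; merges.txt and vocab.json depend on the tokenizer type and weight file names vary with sharding, so only the core files are checked here):

# Sketch: verify the core model files exist in the download folder.
from pathlib import Path

model_dir = Path("chat_with_mlx/models/download/gemma-2b-it")
core_files = ["config.json", "tokenizer_config.json", "tokenizer.json"]
missing = [name for name in core_files if not (model_dir / name).exists()]
print("missing files:", missing or "none")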

Known Issues

  • You HAVE TO unload a model before loading a new one. Otherwise, the app stays stuck on the old model and you have to restart it.
  • When a model is downloading via Solution 1, the only way to stop it is to hit Control+C in your Terminal.
  • If you want to switch files, you have to manually hit STOP INDEXING. Otherwise, the vector database will add the second document to the current database.
  • You have to choose a dataset mode (Document or YouTube) in order for it to work.

Why MLX?

MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research.

Some key features of MLX include:

  • Familiar APIs: MLX has a Python API that closely follows NumPy. MLX also has fully featured C++, C, and Swift APIs, which closely mirror the Python API. MLX has higher-level packages like mlx.nn and mlx.optimizers with APIs that closely follow PyTorch to simplify building more complex models.

  • Composable function transformations: MLX supports composable function transformations for automatic differentiation, automatic vectorization, and computation graph optimization.

  • Lazy computation: Computations in MLX are lazy. Arrays are only materialized when needed.

  • Dynamic graph construction: Computation graphs in MLX are constructed dynamically. Changing the shapes of function arguments does not trigger slow compilations, and debugging is simple and intuitive.

  • Multi-device: Operations can run on any of the supported devices (currently the CPU and the GPU).

  • Unified memory: A notable difference between MLX and other frameworks is the unified memory model. Arrays in MLX live in shared memory. Operations on MLX arrays can be performed on any of the supported device types without transferring data. A short sketch of these features follows this list.
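
Here is a minimal sketch of these ideas in the MLX Python API, using a toy least-squares loss (purely illustrative):

# Sketch: NumPy-like API, a composable grad transform, and lazy evaluation.
import mlx.core as mx

def loss(w, x, y):
    return mx.mean((x @ w - y) ** 2)

x = mx.random.normal((8, 3))
y = mx.random.normal((8,))
w = mx.zeros((3,))

grad_fn = mx.grad(loss)  # transforms loss into a function returning d(loss)/dw
g = grad_fn(w, x, y)     # builds the computation graph lazily
mx.eval(g)               # the array is only materialized here
print(g)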

Acknowledgement

I would like to send many thanks to:

  • The Apple Machine Learning Research team for the amazing MLX library.
  • LangChain and ChromaDB for making the RAG implementation so easy.
  • The people from the Nous, VinBigData, and Qwen teams who helped me during the implementation.

Star History

[Star History Chart]
