DataDM is your private data assistant. Slide into your data's DMs
dataDM 😏💬📊
DataDM is your private data assistant. A conversational interface for your data where you can load, clean, transform, and visualize without a single line of code. DataDM is open source and can be run entirely locally, keeping your juicy data secrets fully private. Slide into your data's DMs tonight.
Demo
https://github.com/approximatelabs/datadm/assets/916073/f15e6ab5-8108-40ea-a6de-c69a1389af84
Note: The demo above uses GPT-4, which sends the conversation to OpenAI's API. To run fully locally, be sure to select starchat-alpha-cuda or starchat-beta-cuda as the model. This uses the StarChat model, which is a bit less capable but runs entirely on your machine.
⚠️ LLMs are known to hallucinate and generate fake results. So, double-check before trusting their results blindly!
Features
- Persistent Jupyter kernel backend for data manipulation during conversation
- Run entirely locally, keeping your data private
- Natural language chat, visualizations/plots, and direct download of data assets
- Easy-to-use Docker images for one-line deployment
- Load multiple tables directly into the chat
- Option to use OpenAI's GPT-3.5 or GPT-4 (requires API key)
- WIP: GGML based mode (CPU only, no GPU required)
- WIP: Roll back kernel state on undo using criu
- TODO: Support for more data sources (e.g. SQL, S3, PySpark)
- TODO: Export a conversation as a notebook or HTML
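To illustrate what a persistent kernel means in practice, here is a toy sketch in plain Python. It is not DataDM's implementation and does not use a real Jupyter kernel; the `PersistentSession` class is a hypothetical stand-in that only shows the idea of one namespace shared across conversation turns, so a variable created in one turn is still there in the next.

```python
class PersistentSession:
    """Toy stand-in for a persistent kernel: one namespace shared by every turn.

    Hypothetical illustration only; a real kernel runs in a separate process.
    """

    def __init__(self):
        self.namespace = {}

    def run(self, code):
        # Execute the snippet against the shared namespace so names persist.
        # exec runs arbitrary code, so only feed it code you trust.
        exec(code, self.namespace)


session = PersistentSession()
session.run("rows = [3, 1, 2]")            # turn 1: load some data
session.run("rows_sorted = sorted(rows)")  # turn 2: reuses `rows` from turn 1
session.run("total = sum(rows_sorted)")    # turn 3: reuses `rows_sorted`
print(session.namespace["rows_sorted"], session.namespace["total"])
```

Because state carries over, a later request such as "now plot it" can refer to a table loaded several messages earlier without reloading it.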
Things you can ask DataDM
- Load data from a URL
- Clean data by removing duplicates, nulls, outliers, etc.
- Join data from multiple tables into a single output table
- Visualize data with plots and charts
- Ask your very own private code interpreter whatever you want
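For a sense of what requests like these might turn into behind the scenes, here is a hedged pandas sketch of loading, cleaning, and joining two tables. It is an illustration, not DataDM's actual generated code: the sample tables, column names, and the 1.5 × IQR outlier rule are all assumptions, and the data is read from in-memory strings so the example runs offline (with a real source you could pass a URL to `pd.read_csv` instead).

```python
import io

import pandas as pd

# Hypothetical sample data standing in for files you would load from a URL.
ORDERS_CSV = """order_id,customer_id,amount
1,10,25.0
2,11,30.0
2,11,30.0
3,12,
4,10,27.5
5,13,5000.0
6,11,22.0
7,12,31.0
"""
CUSTOMERS_CSV = """customer_id,name
10,Ada
11,Grace
12,Linus
13,Edsger
"""

orders = pd.read_csv(io.StringIO(ORDERS_CSV))
customers = pd.read_csv(io.StringIO(CUSTOMERS_CSV))

# Remove exact duplicate rows, then rows with a missing amount.
orders = orders.drop_duplicates().dropna(subset=["amount"])

# Drop outliers with the 1.5 * IQR rule (one common choice, assumed here).
q1, q3 = orders["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = orders["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
orders = orders[in_range]

# Join the cleaned orders with customer names into a single output table.
result = orders.merge(customers, on="customer_id", how="left")
print(result)
```

In the sample data, the duplicated order, the order with no amount, and the 5000.0 order are dropped before the join, leaving five rows.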
Quickstart
You can use Docker, Colab, or a local install.
1. Docker to run locally
docker run -e OPENAI_API_KEY={{YOUR_API_KEY_HERE}} -p 7860:7860 -it ghcr.io/approximatelabs/datadm:latest
For local-mode using StarChat model (requiring a CUDA device with at least 24GB of RAM)
docker run --gpus all -p 7860:7860 -it ghcr.io/approximatelabs/datadm:0.2.1-cuda
2. Colab to run in the cloud
3. Use as a python package
⚠️ datadm used this way runs LLM-generated code in your userspace
For local-data, cloud-model mode (no GPU required) - requires an OpenAI API key
$ pip install datadm
$ datadm
For local-mode using StarChat model (requiring a CUDA device with at least 24GB of RAM)
$ pip install "datadm[cuda]"
$ datadm
Special Thanks
- starchat-beta (starcoder with databricks-dolly and OpenAssistant/oasst1)
- Guidance
- HuggingFace
- OpenAI
Contributions
Contributions are welcome! Feel free to submit a PR or open an issue.
Community
Join the Discord to chat with the team
Check out our other projects: sketch and approximatelabs