Data platform for LLMs - Load, index, retrieve and sync any unstructured data

embedchain

Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data. Using embedchain, you can easily create LLM-powered apps over any data. If you want a JavaScript version, check out embedchain-js.

Community

  • Join the embedchain community on Slack by accepting this invite

🤝 Schedule a 1-on-1 Session

Book a 1-on-1 Session with Taranjeet, the founder, to discuss any issues, provide feedback, or explore how we can improve Embedchain for you.

🔧 Quick install

pip install --upgrade embedchain

🔍 Demo

Try out embedchain in your browser:

Open in Colab

📖 Documentation

The documentation for embedchain can be found at docs.embedchain.ai.

💻 Usage

Embedchain lets you create ChatGPT-like apps on your own dynamic dataset.

Data Types Supported

  • YouTube video
  • PDF file
  • Web page
  • Sitemap
  • Doc file
  • JSON file
  • Code documentation website loader
  • Notion
  • Unstructured file loader and many more

You can find the full list of data types in our documentation.

Queries

For example, you can use Embedchain to create an Elon Musk bot using the following code:

import os
from embedchain import App

# Set your OpenAI API key before creating the app
os.environ["OPENAI_API_KEY"] = "YOUR API KEY"

# Create a bot instance
elon_bot = App()

# Embed online resources
elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
elon_bot.add("https://www.forbes.com/profile/elon-musk")
elon_bot.add("https://www.youtube.com/watch?v=RcYjXbSJBN8")

# Query the bot
answer = elon_bot.query("How many companies does Elon Musk run and name those?")
print(answer)
# Answer: Elon Musk currently runs several companies. As of my knowledge, he is the CEO and lead designer of SpaceX, the CEO and product architect of Tesla, Inc., the CEO and founder of Neuralink, and the CEO and founder of The Boring Company. However, please note that this information may change over time, so it's always good to verify the latest updates.

Examples

Example notebooks are available on Google Colab and Replit for each of the following components.

LLM

  • OpenAI
  • Anthropic
  • Azure OpenAI
  • VertexAI
  • Cohere
  • Hugging Face
  • JinaChat
  • GPT4All
  • Llama2

Embedding model

  • OpenAI
  • VertexAI
  • GPT4All
  • Hugging Face

Vector DB

  • ChromaDB
  • Elasticsearch
  • Opensearch
  • Pinecone

🤝 Contributing

Contributions are welcome! Please check out the issues on the repository, and feel free to open a pull request. For more information, please see the contributing guidelines.

For further reference, see the Development Guide and the Documentation Guide.

Telemetry

We collect anonymous usage metrics to improve the package's quality and user experience. This includes data such as feature usage frequency and system information, but never personal details. The data helps us prioritize improvements and ensure compatibility. To opt out, set app.config.collect_metrics = False in your code. We prioritize data security and do not share this data externally.

Citation

If you use this repository, please consider citing it with:

@misc{embedchain,
  author = {Taranjeet Singh and Deshraj Yadav},
  title = {Embedchain: Data platform for LLMs - load, index, retrieve, and sync any unstructured data},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/embedchain/embedchain}},
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

embedchain-0.0.75.tar.gz (63.7 kB)

Uploaded Source

Built Distribution

embedchain-0.0.75-py3-none-any.whl (103.8 kB)

Uploaded Python 3

File details

Details for the file embedchain-0.0.75.tar.gz.

File metadata

  • Download URL: embedchain-0.0.75.tar.gz
  • Upload date:
  • Size: 63.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.4 Darwin/22.3.0

File hashes

Hashes for embedchain-0.0.75.tar.gz
Algorithm Hash digest
SHA256 de0b546f200c5e954d06ecc90f5f4640d621af8d425428818c8bdc97fb155338
MD5 71aa40052953a28ce74e0c6a384381b7
BLAKE2b-256 7dd01af6587565eff50c61e1e9835a3428811838be5246aeeecd2f661528bba0

See more details on using hashes here.

File details

Details for the file embedchain-0.0.75-py3-none-any.whl.

File metadata

  • Download URL: embedchain-0.0.75-py3-none-any.whl
  • Upload date:
  • Size: 103.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.4 Darwin/22.3.0

File hashes

Hashes for embedchain-0.0.75-py3-none-any.whl
Algorithm Hash digest
SHA256 43cf5564947cd5a97e2438ed61e7e1877cd50cf82aad71f58c9afb12c7ae0b91
MD5 fb973eeff0c19795323d0e5ce2ed423b
BLAKE2b-256 37001c6497ae4369f9c9adbbd0fa2349c1156755063c43373127816ebfd2c212

See more details on using hashes here.
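
The SHA256 digests listed above can be used to check that a downloaded file was not corrupted or altered. The following is a minimal sketch using only the Python standard library. The helper names sha256_of and matches_published are hypothetical and chosen for illustration, and the sketch assumes the downloaded file keeps its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
PUBLISHED_SHA256 = {
    "embedchain-0.0.75.tar.gz":
        "de0b546f200c5e954d06ecc90f5f4640d621af8d425428818c8bdc97fb155338",
    "embedchain-0.0.75-py3-none-any.whl":
        "43cf5564947cd5a97e2438ed61e7e1877cd50cf82aad71f58c9afb12c7ae0b91",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """Return True only if the file name is known and its digest matches."""
    expected = PUBLISHED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

Calling matches_published("embedchain-0.0.75.tar.gz") on a downloaded copy returns True only when the computed digest equals the published one. A file with an unknown name, or with different contents, yields False.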
