Data Lake for Multi-Modal AI Search

Deep Lake: Database for AI

What is Deep Lake?

Deep Lake is a Database for AI powered by a storage format optimized for deep-learning applications. Deep Lake can be used for:

  1. Storing and searching data plus vectors while building LLM applications
  2. Managing datasets while training deep learning models

Deep Lake simplifies the deployment of enterprise-grade LLM-based products by offering storage for all data types (embeddings, audio, text, videos, images, DICOM, PDFs, annotations, and more), querying and vector search, data streaming while training models at scale, data versioning and lineage, and integrations with popular tools such as LangChain, LlamaIndex, Weights & Biases, and many more. Deep Lake works with data of any size, is serverless, and lets you store all of your data in one place in your own cloud. Deep Lake is used by Intel, Bayer Radiology, Matterport, ZERO Systems, Red Cross, Yale, and Oxford.

Deep Lake includes the following features:

  • Multi-Cloud Support (S3, GCP, Azure): Use one API to upload, download, and stream datasets to/from S3, Azure, GCP, Activeloop cloud, local storage, or in-memory storage. Compatible with any S3-compatible storage such as MinIO.
  • Native Compression with Lazy NumPy-like Indexing: Store images, audio, and videos in their native compression. Slice, index, iterate, and interact with your data like a collection of NumPy arrays in your system's memory. Deep Lake lazily loads data only when needed, e.g., when training a model or running queries.
  • Dataloaders for Popular Deep Learning Frameworks: Deep Lake comes with built-in dataloaders for PyTorch and TensorFlow. Train your model with a few lines of code - we even take care of dataset shuffling. :)
  • Integrations with Powerful Tools: Deep Lake has integrations with LangChain and LlamaIndex as a vector store for LLM apps, Weights & Biases for data lineage during model training, MMDetection for training object detection models, and MMSegmentation for training semantic segmentation models.
  • 100+ most-popular image, video, and audio datasets available in seconds: The Deep Lake community has uploaded 100+ image, video, and audio datasets like MNIST, COCO, ImageNet, CIFAR, GTZAN, and others.
  • Instant Visualization Support in the Deep Lake App: Deep Lake datasets are instantly visualized with bounding boxes, masks, annotations, etc. in Deep Lake Visualizer (see below).
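
The lazy, NumPy-like indexing described above can be pictured with a small sketch. This is not Deep Lake's API or implementation: `LazyColumn` and its `decode_count` counter are hypothetical names, and zlib stands in for a media codec such as JPEG. The only point is that samples stay compressed until they are indexed.

```python
import zlib


class LazyColumn:
    """Hypothetical column that keeps samples compressed until they are indexed."""

    def __init__(self, samples):
        # Store each sample in compressed form (zlib stands in for JPEG, MP3, ...).
        self._blobs = [zlib.compress(bytes(s)) for s in samples]
        self.decode_count = 0  # how many samples have been decompressed so far

    def __len__(self):
        return len(self._blobs)

    def __getitem__(self, index):
        # Slices return a list of decoded samples; integers return one sample.
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        self.decode_count += 1
        return list(zlib.decompress(self._blobs[index]))


# 1000 tiny samples; every value fits in one byte.
column = LazyColumn([[i % 200, 1, 2] for i in range(1000)])
first_two = column[:2]  # only two samples are decompressed here
last = column[-1]       # negative indices work as with a list
```

After these two accesses, `column.decode_count` shows that only three of the 1000 stored samples were ever decompressed.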

Visualizer

🚀 How to install Deep Lake

Deep Lake can be installed using pip:

pip install deeplake

To access all of Deep Lake's features, please register in the Deep Lake App.

🧠 Deep Lake Code Examples by Application

Vector Store Applications

Using Deep Lake as a Vector Store for building LLM applications:

- Vector Store Quickstart

- Vector Store Tutorials

- LangChain Integration

- LlamaIndex Integration

- Image Similarity Search with Deep Lake
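
As a rough picture of what storing and searching data plus vectors means, here is a minimal brute-force similarity search in plain Python. It does not use the Deep Lake API; the record layout, the `search` helper, and the three-dimensional embeddings are made up for illustration, and real embeddings would come from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, records, k=2):
    """Return the k records whose embeddings are most similar to the query."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(query, r["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy records: text stored next to its (made-up) 3-dimensional embedding.
records = [
    {"text": "cats purr", "embedding": [0.9, 0.1, 0.0]},
    {"text": "dogs bark", "embedding": [0.8, 0.3, 0.1]},
    {"text": "stocks fell", "embedding": [0.0, 0.2, 0.9]},
]

top = search([1.0, 0.0, 0.0], records, k=2)
top_texts = [r["text"] for r in top]
```

Keeping the raw text next to its embedding means a search hit can return the original data directly, rather than only an identifier.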

Deep Learning Applications

Using Deep Lake for managing data while training Deep Learning models:

- Deep Learning Quickstart

- Tutorials for Training Models

⚙️ Integrations

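Deep Lake offers integrations with other tools to streamline your deep learning workflows. Current integrations include LangChain and LlamaIndex (vector store for LLM apps), Weights & Biases (data lineage during model training), MMDetection (training object detection models), and MMSegmentation (training semantic segmentation models).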

📚 Documentation

Getting started guides, examples, tutorials, API reference, and other useful information can be found on our documentation page.

🎓 For Students and Educators

Deep Lake users can access and visualize a variety of popular datasets through a free integration with Deep Lake's App. Universities can get up to 1TB of data storage and 100,000 monthly queries on the Tensor Database for free. Chat with us on our website to claim access!

👩‍💻 Comparisons to Familiar Tools

Deep Lake vs Chroma

Both Deep Lake & ChromaDB enable users to store and search vectors (embeddings) and offer integrations with LangChain and LlamaIndex. However, they are architecturally very different. ChromaDB is a Vector Database that can be deployed locally or on a server using Docker and will offer a hosted solution shortly. Deep Lake is a serverless Vector Store deployed on the user’s own cloud, locally, or in-memory. All computations run client-side, which enables users to support lightweight production apps in seconds. Unlike ChromaDB, Deep Lake’s data format can store raw data such as images, videos, and text, in addition to embeddings. ChromaDB is limited to light metadata on top of the embeddings and has no visualization. Deep Lake datasets can be visualized and version controlled. Deep Lake also has a performant dataloader for fine-tuning your Large Language Models.

Deep Lake vs Pinecone

Both Deep Lake and Pinecone enable users to store and search vectors (embeddings) and offer integrations with LangChain and LlamaIndex. However, they are architecturally very different. Pinecone is a fully-managed Vector Database that is optimized for highly demanding applications requiring a search for billions of vectors. Deep Lake is serverless. All computations run client-side, which enables users to get started in seconds. Unlike Pinecone, Deep Lake’s data format can store raw data such as images, videos, and text, in addition to embeddings. Deep Lake datasets can be visualized and version controlled. Pinecone is limited to light metadata on top of the embeddings and has no visualization. Deep Lake also has a performant dataloader for fine-tuning your Large Language Models.

Deep Lake vs Weaviate

Both Deep Lake and Weaviate enable users to store and search vectors (embeddings) and offer integrations with LangChain and LlamaIndex. However, they are architecturally very different. Weaviate is a Vector Database that can be deployed in a managed service or by the user via Kubernetes or Docker. Deep Lake is serverless. All computations run client-side, which enables users to support lightweight production apps in seconds. Unlike Weaviate, Deep Lake’s data format can store raw data such as images, videos, and text, in addition to embeddings. Deep Lake datasets can be visualized and version controlled. Weaviate is limited to light metadata on top of the embeddings and has no visualization. Deep Lake also has a performant dataloader for fine-tuning your Large Language Models.

Deep Lake vs DVC

Deep Lake and DVC both offer git-like version control for datasets, but their methods for storing data differ significantly. Deep Lake converts and stores data as chunked compressed arrays, which enables rapid streaming to ML models, whereas DVC operates on top of data stored in less efficient traditional file structures. When datasets are composed of many files (e.g., many images), the Deep Lake format makes dataset versioning significantly easier than the traditional file structures used with DVC. An additional distinction is that DVC primarily uses a command-line interface, whereas Deep Lake is a Python package. Lastly, Deep Lake offers an API to easily connect datasets to ML frameworks and other common ML tools, and it enables instant dataset visualization through Activeloop's visualization tool.
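
To make "chunked compressed arrays" concrete, the sketch below groups samples into fixed-size chunks, compresses each chunk independently, and reads one sample back by decompressing only the chunk that holds it. It is an illustration under simple assumptions (string samples, zlib, a hypothetical `CHUNK_SIZE` of 4), not Deep Lake's actual on-disk format.

```python
import zlib

CHUNK_SIZE = 4  # samples per chunk (tiny, for illustration)


def write_chunks(samples):
    """Group samples into fixed-size chunks and compress each chunk separately."""
    chunks = []
    for start in range(0, len(samples), CHUNK_SIZE):
        group = samples[start:start + CHUNK_SIZE]
        payload = "\n".join(group).encode("utf-8")
        chunks.append(zlib.compress(payload))
    return chunks


def read_sample(chunks, index):
    """Fetch one sample by decompressing only the chunk that contains it."""
    chunk_id, offset = divmod(index, CHUNK_SIZE)
    group = zlib.decompress(chunks[chunk_id]).decode("utf-8").split("\n")
    return group[offset]


samples = [f"sample-{i}" for i in range(10)]
chunks = write_chunks(samples)
ninth = read_sample(chunks, 9)  # touches only the last chunk
```

Because each chunk is a self-contained object, a reader can fetch chunks over the network one at a time, which is what makes this kind of layout convenient for streaming.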

Deep Lake vs MosaicML MDS format

  • Data Storage Format: Deep Lake operates on a columnar storage format, whereas MDS utilizes a row-wise storage approach. This fundamentally impacts how data is read, written, and organized in each system.
  • Compression: Deep Lake offers a more flexible compression scheme, allowing control over both chunk-level and sample-level compression for each column or tensor. This eliminates the need for an additional compression layer such as zstd, which would otherwise cost extra CPU cycles to decompress on top of formats like JPEG.
  • Shuffling: MDS currently offers more advanced shuffling strategies.
  • Version Control & Visualization Support: Deep Lake offers native version control and in-browser data visualization, neither of which is available for the MosaicML data format. These can provide significant advantages in managing, understanding, and tracking different versions of the data.
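
The difference between columnar and row-wise storage in the first bullet can be sketched with two in-memory layouts of the same toy dataset. The field names and helper functions are hypothetical, and neither layout reproduces the real Deep Lake or MDS file format.

```python
# Row-wise layout: one record per sample, all fields stored together.
rows = [
    {"image": b"img0", "label": 3},
    {"image": b"img1", "label": 7},
    {"image": b"img2", "label": 1},
]

# Columnar layout: one sequence per field (column or tensor).
columns = {
    "image": [b"img0", b"img1", b"img2"],
    "label": [3, 7, 1],
}


def labels_from_rows(rows):
    """Visit every record to collect labels; on disk, image bytes sit in between."""
    return [row["label"] for row in rows]


def labels_from_columns(columns):
    """Read only the label column; the image column is never touched."""
    return columns["label"]


def rows_to_columns(rows):
    """Convert a row-wise list of records into a columnar dict of lists."""
    return {key: [row[key] for row in rows] for key in rows[0]}
```

A query that filters on labels alone touches far less data in the columnar layout, while reading whole samples in order is the natural access pattern for the row-wise one.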

Deep Lake vs TensorFlow Datasets (TFDS)

Deep Lake and TFDS seamlessly connect popular datasets to ML frameworks. Deep Lake datasets are compatible with both PyTorch and TensorFlow, whereas TFDS is only compatible with TensorFlow. A key difference between Deep Lake and TFDS is that Deep Lake datasets are designed for streaming from the cloud, whereas TFDS datasets must be downloaded locally prior to use. As a result, with Deep Lake, one can import datasets directly from TensorFlow Datasets and stream them to either PyTorch or TensorFlow. In addition to providing access to popular publicly available datasets, Deep Lake also offers powerful tools for creating custom datasets, storing them on a variety of cloud storage providers, and collaborating with others via a simple API. TFDS is primarily focused on giving the public easy access to commonly available datasets, and management of custom datasets is not the primary focus. A full comparison article can be found here.

Deep Lake vs HuggingFace

Deep Lake and HuggingFace offer access to popular datasets, but Deep Lake primarily focuses on computer vision, whereas HuggingFace focuses on natural language processing. HuggingFace Transformers and other computational tools for NLP are not analogous to features offered by Deep Lake.

Deep Lake vs WebDatasets

Deep Lake and WebDatasets both offer rapid data streaming across networks. They have nearly identical streaming speeds because the underlying network requests and data structures are very similar. However, Deep Lake offers superior random access and shuffling, its simple API is in Python instead of command-line, and Deep Lake enables simple indexing and modification of the dataset without having to recreate it.
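
Why random access matters for shuffling can be shown with a generic sketch that is not taken from either library. A purely sequential stream can only be shuffled approximately through a bounded buffer, whereas a dataset that supports reading any index directly can be fully permuted. The function names, the buffer size, and the fixed seeds are illustrative assumptions.

```python
import random


def buffer_shuffle(stream, buffer_size, seed=0):
    """Approximate shuffle for sequential streams: yield random items from a bounded buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    # Drain what is left once the stream ends.
    rng.shuffle(buffer)
    yield from buffer


def random_access_shuffle(dataset, seed=0):
    """Full shuffle when any index can be read directly: permute the indices."""
    rng = random.Random(seed)
    order = list(range(len(dataset)))
    rng.shuffle(order)
    return [dataset[i] for i in order]


data = list(range(20))
streamed = list(buffer_shuffle(iter(data), buffer_size=5))
permuted = random_access_shuffle(data)
```

With a buffer of 5, an item can never appear in the streamed output more than 4 positions earlier than its original position, so the result stays correlated with the storage order. The index permutation has no such limit.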

Deep Lake vs Zarr

Deep Lake and Zarr both offer storage of data as chunked arrays. However, Deep Lake is primarily designed for returning data as arrays using a simple API, rather than actually storing raw arrays (even though that's also possible). Deep Lake stores data in use-case-optimized formats, such as JPEG or PNG for images, or MP4 for video, and the user treats the data as if it's an array, because Deep Lake handles all the data processing in between. Deep Lake offers more flexibility for storing arrays with dynamic shape (ragged tensors), and it provides several features that are not natively available in Zarr, such as version control, data streaming, and connecting data to ML frameworks.

Community

Join our Slack community to learn more about unstructured dataset management using Deep Lake and to get help from the Activeloop team and other users.

We'd love your feedback! Please complete our 3-minute survey.

As always, thanks to our amazing contributors!

Please read CONTRIBUTING.md to get started with making contributions to Deep Lake.

README Badge

Using Deep Lake? Add a README badge to let everyone know:

[![deeplake](https://img.shields.io/badge/powered%20by-Deep%20Lake%20-ff5a1f.svg)](https://github.com/activeloopai/deeplake)

Disclaimers

Dataset Licenses

Deep Lake users may have access to a variety of publicly available datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the datasets. It is your responsibility to determine whether you have permission to use the datasets under their license.

If you're a dataset owner and do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

Citation

If you use Deep Lake in your research, please cite Activeloop using:

@article{deeplake,
  title = {Deep Lake: a Lakehouse for Deep Learning},
  author = {Hambardzumyan, Sasun and Tuli, Abhinav and Ghukasyan, Levon and Rahman, Fariz and Topchyan, Hrant and Isayan, David and Harutyunyan, Mikayel and Hakobyan, Tatevik and Stranic, Ivo and Buniatyan, Davit},
  url = {https://www.cidrdb.org/cidr2023/papers/p69-buniatyan.pdf},
  booktitle={Proceedings of CIDR},
  year = {2023},
}

Acknowledgment

This technology was inspired by our research work at Princeton University. We would like to thank William Silversmith @SeungLab for his awesome cloud-volume tool.
