Data Lake for Multi-Modal AI Search

Deep Lake: Database for AI

Docs | Get Started | API Reference | LangChain & VectorDBs Course | Blog | Whitepaper | Slack | Twitter

What is Deep Lake?

Deep Lake is a Database for AI powered by a storage format optimized for deep-learning applications. Deep Lake can be used for:

  1. Storing and searching data plus vectors while building LLM applications
  2. Managing datasets while training deep learning models

Deep Lake simplifies the deployment of enterprise-grade LLM-based products by offering storage for all data types (embeddings, audio, text, videos, images, DICOM, PDFs, annotations, and more), querying and vector search, data streaming while training models at scale, data versioning and lineage, and integrations with popular tools such as LangChain, LlamaIndex, Weights & Biases, and many more. Deep Lake works with data of any size, is serverless, and enables you to store all of your data in your own cloud and in one place. Deep Lake is used by Intel, Bayer Radiology, Matterport, ZERO Systems, Red Cross, Yale, and Oxford.

Deep Lake includes the following features:

- Multi-Cloud Support (S3, GCP, Azure): Use one API to upload, download, and stream datasets to/from S3, Azure, GCP, Activeloop cloud, local storage, or in-memory storage. Compatible with any S3-compatible storage such as MinIO.
- Native Compression with Lazy NumPy-like Indexing: Store images, audio, and videos in their native compression. Slice, index, iterate, and interact with your data like a collection of NumPy arrays in your system's memory. Deep Lake lazily loads data only when needed, e.g., when training a model or running queries.
- Dataloaders for Popular Deep Learning Frameworks: Deep Lake comes with built-in dataloaders for PyTorch and TensorFlow. Train your model with a few lines of code; we even take care of dataset shuffling. :)
- Integrations with Powerful Tools: Deep Lake has integrations with LangChain and LlamaIndex as a vector store for LLM apps, Weights & Biases for data lineage during model training, MMDetection for training object detection models, and MMSegmentation for training semantic segmentation models.
- 100+ Most-Popular Image, Video, and Audio Datasets Available in Seconds: The Deep Lake community has uploaded 100+ image, video, and audio datasets such as MNIST, COCO, ImageNet, CIFAR, GTZAN, and others.
- Instant Visualization Support in the Deep Lake App: Deep Lake datasets are instantly visualized with bounding boxes, masks, annotations, etc. in Deep Lake Visualizer (see below).
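To make the idea of lazy, NumPy-like indexing concrete, here is a minimal sketch in plain Python. It is not Deep Lake's implementation or API: `LazyColumn` and `make_loader` are hypothetical names, and each "loader" stands in for a fetch from storage. The point is only that slicing returns a lightweight view and data is loaded when a single sample is actually accessed.

```python
from typing import Callable, Sequence, Union


class LazyColumn:
    """Toy column that materializes a sample only when it is indexed."""

    def __init__(self, loaders: Sequence[Callable[[], bytes]]) -> None:
        self._loaders = list(loaders)
        self.loads = 0  # number of samples this object has actually loaded

    def __len__(self) -> int:
        return len(self._loaders)

    def __getitem__(self, index: Union[int, slice]):
        if isinstance(index, slice):
            # Slicing is cheap: return another lazy view without loading data.
            return LazyColumn(self._loaders[index])
        self.loads += 1
        return self._loaders[index]()


def make_loader(i: int) -> Callable[[], bytes]:
    # Hypothetical stand-in for "fetch sample i from storage".
    return lambda: f"sample-{i}".encode()


column = LazyColumn([make_loader(i) for i in range(1000)])
view = column[10:20]   # nothing is loaded yet
first = view[0]        # exactly one sample is loaded here
```

A real storage layer would fetch whole chunks rather than single samples, but the access pattern seen by the user is the same.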

Visualizer

🚀 How to install Deep Lake

Deep Lake can be installed using pip:

pip install deeplake
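After installing, one way to confirm which version ended up in your environment is to query the package metadata from Python. This is a small sketch using only the standard library; `installed_version` is a helper name chosen for this example and is not part of Deep Lake.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("deeplake")
print(f"deeplake {version}" if version else "deeplake is not installed")
```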

To access all of Deep Lake's features, please register in the Deep Lake App.

🧠 Deep Lake Code Examples by Application

Vector Store Applications

Using Deep Lake as a Vector Store for building LLM applications:

- Vector Store Quickstart

- Vector Store Tutorials

- LangChain Integration

- LlamaIndex Integration

- Image Similarity Search with Deep Lake

Deep Learning Applications

Using Deep Lake for managing data while training Deep Learning models:

- Deep Learning Quickstart

- Tutorials for Training Models

⚙️ Integrations

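Deep Lake offers integrations with other tools in order to streamline your deep learning workflows. Current integrations include LangChain and LlamaIndex (vector store for LLM apps), Weights & Biases (data lineage during model training), MMDetection (training object detection models), and MMSegmentation (training semantic segmentation models).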

📚 Documentation

Getting started guides, examples, tutorials, API reference, and other useful information can be found on our documentation page.

🎓 For Students and Educators

Deep Lake users can access and visualize a variety of popular datasets through a free integration with Deep Lake's App. Universities can get up to 1TB of data storage and 100,000 monthly queries on the Tensor Database for free. Chat with us on our website to claim access!

👩‍💻 Comparisons to Familiar Tools

Deep Lake vs Chroma

Both Deep Lake & ChromaDB enable users to store and search vectors (embeddings) and offer integrations with LangChain and LlamaIndex. However, they are architecturally very different. ChromaDB is a Vector Database that can be deployed locally or on a server using Docker and will offer a hosted solution shortly. Deep Lake is a serverless Vector Store deployed on the user’s own cloud, locally, or in-memory. All computations run client-side, which enables users to support lightweight production apps in seconds. Unlike ChromaDB, Deep Lake’s data format can store raw data such as images, videos, and text, in addition to embeddings. ChromaDB is limited to light metadata on top of the embeddings and has no visualization. Deep Lake datasets can be visualized and version controlled. Deep Lake also has a performant dataloader for fine-tuning your Large Language Models.
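As an illustration of what "all computations run client-side" can mean for vector search, the sketch below runs a brute-force cosine-similarity search in plain Python. It is a conceptual example only, not Deep Lake's search implementation or API: the function names and the three-dimensional embeddings are made up for the example, and a production system would typically use optimized or indexed search.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query: Sequence[float],
    records: Sequence[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Score every record against the query and return the k best matches."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up texts and embeddings, stored side by side as (raw data, vector).
records = [
    ("a photo of a cat", [0.9, 0.1, 0.0]),
    ("a photo of a dog", [0.8, 0.3, 0.1]),
    ("quarterly sales report", [0.0, 0.2, 0.9]),
]
top = search([1.0, 0.0, 0.0], records, k=2)
```

Because the raw text lives next to its embedding, the search result can return the data itself rather than only an ID.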

Deep Lake vs Pinecone

Both Deep Lake and Pinecone enable users to store and search vectors (embeddings) and offer integrations with LangChain and LlamaIndex. However, they are architecturally very different. Pinecone is a fully-managed Vector Database that is optimized for highly demanding applications requiring a search for billions of vectors. Deep Lake is serverless. All computations run client-side, which enables users to get started in seconds. Unlike Pinecone, Deep Lake’s data format can store raw data such as images, videos, and text, in addition to embeddings. Deep Lake datasets can be visualized and version controlled. Pinecone is limited to light metadata on top of the embeddings and has no visualization. Deep Lake also has a performant dataloader for fine-tuning your Large Language Models.

Deep Lake vs Weaviate

Both Deep Lake and Weaviate enable users to store and search vectors (embeddings) and offer integrations with LangChain and LlamaIndex. However, they are architecturally very different. Weaviate is a Vector Database that can be deployed in a managed service or by the user via Kubernetes or Docker. Deep Lake is serverless. All computations run client-side, which enables users to support lightweight production apps in seconds. Unlike Weaviate, Deep Lake’s data format can store raw data such as images, videos, and text, in addition to embeddings. Deep Lake datasets can be visualized and version controlled. Weaviate is limited to light metadata on top of the embeddings and has no visualization. Deep Lake also has a performant dataloader for fine-tuning your Large Language Models.

Deep Lake vs DVC

Deep Lake and DVC both offer dataset version control similar to Git for data, but their methods for storing data differ significantly. Deep Lake converts and stores data as chunked compressed arrays, which enables rapid streaming to ML models, whereas DVC operates on top of data stored in less efficient traditional file structures. When datasets are composed of many files (e.g., many images), the Deep Lake format makes dataset versioning significantly easier than versioning traditional file structures with DVC. An additional distinction is that DVC primarily uses a command-line interface, whereas Deep Lake is a Python package. Lastly, Deep Lake offers an API to easily connect datasets to ML frameworks and other common ML tools and enables instant dataset visualization through Activeloop's visualization tool.
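The following sketch shows, in a simplified form, what storing data as chunked compressed arrays can look like. It is an assumption-laden toy, not the Deep Lake format: `ChunkedStore` is a hypothetical class, `zlib` stands in for whatever codec a real system uses, and samples are plain byte strings. Many small samples are packed into a few compressed chunks, and reading one sample only requires decompressing the chunk that contains it.

```python
import zlib
from typing import List, Tuple


class ChunkedStore:
    """Toy store: every full group of samples is sealed into one compressed chunk."""

    def __init__(self, samples_per_chunk: int = 4) -> None:
        self.samples_per_chunk = samples_per_chunk
        self._chunks: List[bytes] = []                  # compressed, sealed chunks
        self._bounds: List[List[Tuple[int, int]]] = []  # byte range of each sample, per chunk
        self._pending: List[bytes] = []                 # samples not yet sealed

    def __len__(self) -> int:
        return len(self._chunks) * self.samples_per_chunk + len(self._pending)

    def append(self, sample: bytes) -> None:
        self._pending.append(sample)
        if len(self._pending) == self.samples_per_chunk:
            self._seal()

    def _seal(self) -> None:
        # Record where each sample sits inside the chunk, then compress once.
        bounds, offset = [], 0
        for sample in self._pending:
            bounds.append((offset, offset + len(sample)))
            offset += len(sample)
        self._chunks.append(zlib.compress(b"".join(self._pending)))
        self._bounds.append(bounds)
        self._pending = []

    def __getitem__(self, index: int) -> bytes:
        if not 0 <= index < len(self):
            raise IndexError(index)
        chunk_id, local = divmod(index, self.samples_per_chunk)
        if chunk_id == len(self._chunks):
            return self._pending[local]  # still waiting to be sealed
        start, end = self._bounds[chunk_id][local]
        # Only the chunk that holds the requested sample is decompressed.
        return zlib.decompress(self._chunks[chunk_id])[start:end]


store = ChunkedStore(samples_per_chunk=4)
for i in range(10):
    store.append(f"image-bytes-{i}".encode() * 50)

sample = store[5]
```

With ten samples and four samples per chunk, two chunks are sealed and two samples remain pending; a streaming reader would fetch two objects instead of eight separate files.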

Deep Lake vs MosaicML MDS format

  • Data Storage Format: Deep Lake operates on a columnar storage format, whereas MDS utilizes a row-wise storage approach. This fundamentally impacts how data is read, written, and organized in each system.
  • Compression: Deep Lake offers a more flexible compression scheme, allowing control over both chunk-level and sample-level compression for each column or tensor. This feature eliminates the need for additional compression such as zstd, which would otherwise demand more CPU cycles for decompressing on top of formats like JPEG.
  • Shuffling: MDS currently offers more advanced shuffling strategies.
  • Version Control & Visualization Support: A notable feature of Deep Lake is its native version control and in-browser data visualization, neither of which is available for the MosaicML data format. This can provide significant advantages in managing, understanding, and tracking different versions of the data.
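The difference between row-wise and columnar layouts can be sketched with plain Python containers. This is a conceptual illustration only; neither Deep Lake nor MDS stores data as Python lists and dicts, and the helper names and sample values are made up.

```python
from typing import Any, Dict, List

# Row-wise layout: each record keeps all of its fields together.
rows: List[Dict[str, Any]] = [
    {"image": b"<jpeg bytes 0>", "label": 3},
    {"image": b"<jpeg bytes 1>", "label": 7},
    {"image": b"<jpeg bytes 2>", "label": 1},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Columnar layout: one list per field (assumes all records share the same fields)."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Rebuild row-wise records from equally long columns."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*(columns[n] for n in names))]


columns = to_columns(rows)

# Reading only the labels touches a single column and no image bytes.
labels = columns["label"]
```

In the columnar layout, per-column settings such as compression can also be chosen independently, which is the kind of flexibility the Compression bullet refers to.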

Deep Lake vs TensorFlow Datasets (TFDS)

Deep Lake and TFDS seamlessly connect popular datasets to ML frameworks. Deep Lake datasets are compatible with both PyTorch and TensorFlow, whereas TFDS is only compatible with TensorFlow. A key difference between Deep Lake and TFDS is that Deep Lake datasets are designed for streaming from the cloud, whereas TFDS must be downloaded locally prior to use. As a result, with Deep Lake, one can import datasets directly from TensorFlow Datasets and stream them either to PyTorch or TensorFlow. In addition to providing access to popular publicly available datasets, Deep Lake also offers powerful tools for creating custom datasets, storing them on a variety of cloud storage providers, and collaborating with others via a simple API. TFDS is primarily focused on giving the public easy access to commonly available datasets, and management of custom datasets is not the primary focus. A full comparison article can be found here.

Deep Lake vs HuggingFace

Deep Lake and HuggingFace offer access to popular datasets, but Deep Lake primarily focuses on computer vision, whereas HuggingFace focuses on natural language processing. HuggingFace Transformers and other computational tools for NLP are not analogous to features offered by Deep Lake.

Deep Lake vs WebDatasets

Deep Lake and WebDatasets both offer rapid data streaming across networks. They have nearly identical streaming speeds because the underlying network requests and data structures are very similar. However, Deep Lake offers superior random access and shuffling, its simple API is in Python instead of command-line, and Deep Lake enables simple indexing and modification of the dataset without having to recreate it.

Deep Lake vs Zarr

Deep Lake and Zarr both offer storage of data as chunked arrays. However, Deep Lake is primarily designed for returning data as arrays using a simple API, rather than actually storing raw arrays (even though that's also possible). Deep Lake stores data in use-case-optimized formats, such as JPEG or PNG for images, or MP4 for video, and the user treats the data as if it's an array, because Deep Lake handles all the data processing in between. Deep Lake offers more flexibility for storing arrays with dynamic shape (ragged tensors), and it provides several features that are not natively available in Zarr such as version control, data streaming, and connecting data to ML frameworks.
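To show what storing arrays with dynamic shape can involve, here is a small sketch of a ragged column backed by one flat buffer plus an offsets table. It is a generic technique shown for illustration, not a description of how Deep Lake stores ragged tensors; `RaggedColumn` and the sample values are hypothetical.

```python
from typing import List, Sequence


class RaggedColumn:
    """Variable-length samples stored in one flat list plus start/end offsets."""

    def __init__(self) -> None:
        self._values: List[float] = []
        self._offsets: List[int] = [0]  # sample i spans offsets[i]:offsets[i + 1]

    def __len__(self) -> int:
        return len(self._offsets) - 1

    def append(self, sample: Sequence[float]) -> None:
        self._values.extend(sample)
        self._offsets.append(len(self._values))

    def __getitem__(self, index: int) -> List[float]:
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        start, end = self._offsets[index], self._offsets[index + 1]
        return self._values[start:end]


# For example, a different number of box coordinates per image.
boxes = RaggedColumn()
boxes.append([10.0, 20.0, 50.0, 80.0])                        # one box
boxes.append([])                                              # no boxes
boxes.append([5.0, 5.0, 15.0, 25.0, 40.0, 40.0, 60.0, 90.0])  # two boxes

lengths = [len(boxes[i]) for i in range(len(boxes))]
```

No padding is needed, and each sample keeps its own length.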

Community

Join our Slack community to learn more about unstructured dataset management using Deep Lake and to get help from the Activeloop team and other users.

We'd love your feedback! Please complete our 3-minute survey.

As always, thanks to our amazing contributors!

Made with contributors-img.

Please read CONTRIBUTING.md to get started with making contributions to Deep Lake.

README Badge

Using Deep Lake? Add a README badge to let everyone know:

[![deeplake](https://img.shields.io/badge/powered%20by-Deep%20Lake%20-ff5a1f.svg)](https://github.com/activeloopai/deeplake)

Disclaimers

Dataset Licenses

Deep Lake users may have access to a variety of publicly available datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the datasets. It is your responsibility to determine whether you have permission to use the datasets under their license.

If you're a dataset owner and do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

Citation

If you use Deep Lake in your research, please cite Activeloop using:

@article{deeplake,
  title = {Deep Lake: a Lakehouse for Deep Learning},
  author = {Hambardzumyan, Sasun and Tuli, Abhinav and Ghukasyan, Levon and Rahman, Fariz and Topchyan, Hrant and Isayan, David and Harutyunyan, Mikayel and Hakobyan, Tatevik and Stranic, Ivo and Buniatyan, Davit},
  url = {https://www.cidrdb.org/cidr2023/papers/p69-buniatyan.pdf},
  booktitle={Proceedings of CIDR},
  year = {2023},
}

Acknowledgment

This technology was inspired by our research work at Princeton University. We would like to thank William Silversmith @SeungLab for his awesome cloud-volume tool.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

- deeplake-4.6.3-cp313-cp313-manylinux2014_x86_64.whl (45.4 MB), CPython 3.13
- deeplake-4.6.3-cp313-cp313-manylinux2014_aarch64.whl (41.7 MB), CPython 3.13
- deeplake-4.6.3-cp313-cp313-macosx_11_0_arm64.whl (33.3 MB), CPython 3.13, macOS 11.0+ ARM64
- deeplake-4.6.3-cp312-cp312-manylinux2014_x86_64.whl (45.4 MB), CPython 3.12
- deeplake-4.6.3-cp312-cp312-manylinux2014_aarch64.whl (41.7 MB), CPython 3.12
- deeplake-4.6.3-cp312-cp312-macosx_11_0_arm64.whl (33.3 MB), CPython 3.12, macOS 11.0+ ARM64
- deeplake-4.6.3-cp311-cp311-manylinux2014_x86_64.whl (45.4 MB), CPython 3.11
- deeplake-4.6.3-cp311-cp311-manylinux2014_aarch64.whl (41.7 MB), CPython 3.11
- deeplake-4.6.3-cp311-cp311-macosx_11_0_arm64.whl (33.3 MB), CPython 3.11, macOS 11.0+ ARM64
- deeplake-4.6.3-cp310-cp310-manylinux2014_x86_64.whl (45.4 MB), CPython 3.10
- deeplake-4.6.3-cp310-cp310-manylinux2014_aarch64.whl (41.7 MB), CPython 3.10
- deeplake-4.6.3-cp310-cp310-macosx_11_0_arm64.whl (33.3 MB), CPython 3.10, macOS 11.0+ ARM64
- deeplake-4.6.3-cp39-cp39-manylinux2014_x86_64.whl (45.4 MB), CPython 3.9
- deeplake-4.6.3-cp39-cp39-manylinux2014_aarch64.whl (41.7 MB), CPython 3.9
- deeplake-4.6.3-cp39-cp39-macosx_11_0_arm64.whl (33.3 MB), CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file deeplake-4.6.3-cp313-cp313-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for deeplake-4.6.3-cp313-cp313-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ed8d4263d3856fe65d702076a67e7cbd5bc2f0048ae08a6f006e55c9b540cde3
MD5 1730b7fa263365c932bc12fda47e77ae
BLAKE2b-256 a8e4bf1c55df6e5e816514dca5fb600ba3cb92d911bd52d623cc17c2e50d8ddd

See more details on using hashes here.

Hashes for deeplake-4.6.3-cp313-cp313-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d2758e35bff22d81503365cf69e063cba89c2f0a5b9cc456647a91653e38ccab
MD5 79c422b079d19885c44140522adf0aa7
BLAKE2b-256 0b4bb34f47f16cf4635769a2b26d5e030c79d59114248cbda2a256a5be5082e5

Hashes for deeplake-4.6.3-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ca80bb55358bf4b90a3f974e394e03d451c50d652bf8002d5cd9a644f42d308c
MD5 bd4c8fa7153375044d805b0ad3fc6801
BLAKE2b-256 44f2971553c6feb81dcb65662b7f868accd07ffc33dc51bad9d5111a06037cea

Hashes for deeplake-4.6.3-cp312-cp312-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c3bc0834ddb86c70a5dbd75765e22fe13d4c4ef3f5bae1ea9a2c3965a67efdf0
MD5 50d41259965b6eda8a1645d2d653952d
BLAKE2b-256 3d927637a870adc5a95df3866fca2338e89ea5b4e7333d0d5c96a8028c66bd78

Hashes for deeplake-4.6.3-cp312-cp312-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 0e9b27c6c85b56b21188ae41cb4b8730cb2195511e3552e58e1ddb345c414fff
MD5 8215a26e9c909a0c72aa6f77c94eb7d6
BLAKE2b-256 d4ef26ebac4cf8aa5afb9bc0d38f7fa08a26cc859487a23782350f14b534d2ae

Hashes for deeplake-4.6.3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 746d5066d6d0f825331b628be99bba4510591c8b4715f6fe02db125756c73f33
MD5 256723b9de3b2651dec5333406417e3c
BLAKE2b-256 cb48a143763176dc53c82e1f9ac0628017e7287c071248f8f916eab7687fdb0b

Hashes for deeplake-4.6.3-cp311-cp311-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 46b0ab54008aa95d1c51dc87d3961f1c25b8c73cb550492539b15f12cc8762b7
MD5 f23b66582b35c99a550fd31ea76e97fd
BLAKE2b-256 7a81b36a29c6e893de4bc10cf81dc33e3f1a63d44c38a54412273dfcd9dd8c85

Hashes for deeplake-4.6.3-cp311-cp311-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d121ffc68f28e06addfd53ef6568ac1b2c58faa1ed55d0ee2cb01b4aa7d99021
MD5 c4e15b003ea23cf40003c8d9725d81af
BLAKE2b-256 4906ade7794138c384136839b4f979d5a44f4a874b4fb4125f997582c3149f61

Hashes for deeplake-4.6.3-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e88c8570dcee386c0d0e94ccb09234e204ce827d4af6b8649e0ed4e9cbcf85e4
MD5 2fc22eb82b807a24d04436749729470f
BLAKE2b-256 4c178817ae67d3cc088b8f64508bf21796140d179dd8d5db0ffc1b5575deb979

Hashes for deeplake-4.6.3-cp310-cp310-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b4a1c4d6a69dceb36935e54c65efcd1ea66f48ea91248bccde7b733659745c36
MD5 e3911e3d83eeaf231e785fcbd13765da
BLAKE2b-256 3c92fbfe8dadcb9bd00ada966e64badbf2e126eb218029f0948f1f0b326c6b71

Hashes for deeplake-4.6.3-cp310-cp310-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 7874d6e767babac7bc20a36ffb8070fec2f77ffe843bc58ea4e4ec3c0d8d9211
MD5 a376cb5172c83695e24a7dd43e4ff7e5
BLAKE2b-256 52d638a18518fdc6eddf1f9d76ceecc0d18db8e5371e5d18a9411c3e993e6ac7

Hashes for deeplake-4.6.3-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ef07be2ec17e42e335dfabc076aa07c8b9516a5257d98f6cff8d60995393184a
MD5 1eff7a11317c2532fec36a30dfd31983
BLAKE2b-256 669be2965206180d398cc347bff6672e6f04351635226a1b26a5ef42c9c56698

Hashes for deeplake-4.6.3-cp39-cp39-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1cbfa111393e5173c8ce397ca4c56d0bd0be507e73305e79dad21c7a2e9fa1cd
MD5 b53809c51fb9abd9c90e50054d667458
BLAKE2b-256 b0dfbc633a468fe110d3f120ea9e7ddacee309e06a251f529c36cdee56b2517f

Hashes for deeplake-4.6.3-cp39-cp39-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 c4d843c0073d34de40a043f3d8ef0501f8e95ebfefd8952fc8c2610283197438
MD5 b2127e290a0a58f9f04c1e7a97f35de2
BLAKE2b-256 d40cdf4f05198246521af68b7a0bdb0abd3fe826b4e6625fe34f5138e1aa67f8

Hashes for deeplake-4.6.3-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5c50220c523d9f38b4907a86302fbe933b72f419b718735e329b6596a037ecfd
MD5 928f9fe3c73a717371557abf84a3a09e
BLAKE2b-256 4f8548eda6beafbbd7558ae2b0b0459af616a8a672d4d496b4e1d73b23d365b5
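
The SHA256 digests listed above can be used to check a downloaded wheel before installing it. The sketch below uses only the standard library; `sha256_of` and `verify_sha256` are helper names chosen for this example, and the demo hashes a throwaway temporary file rather than a real wheel. For an actual download, pass the wheel's path and the SHA256 digest listed for that exact file name.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary stand-in file (not a real wheel).
with tempfile.TemporaryDirectory() as tmp:
    fake_wheel = Path(tmp) / "example.whl"
    fake_wheel.write_bytes(b"not a real wheel")
    expected = hashlib.sha256(b"not a real wheel").hexdigest()
    ok = verify_sha256(fake_wheel, expected)
    tampered = verify_sha256(fake_wheel, "0" * 64)
```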
