DaiSy: A Library for Scalable Data Series Similarity Search
DaiSy (DAta series sImilarity Search librarY) is a unified library for exact data series similarity search that integrates multiple state-of-the-art algorithms within a single, coherent framework. It is developed at LIPADE, Université Paris Cité. It supports a wide range of approaches tailored to different execution environments, including disk-based, in-memory, GPU-accelerated, and distributed scalable similarity search. DaiSy is implemented in C++ and also offers a Python interface for ease of use and integration with data science workflows.
Important Note: The current version of DaiSy is experimental. The library is under active development, with a particular focus on resolving installation and build issues. We welcome early suggestions and recommendations.
When using DaiSy, please consider citing the following paper:
Coming Soon!
Supported State-of-the-Art algorithms
We currently support several algorithms for exact similarity search, each optimized for specific use cases and environments. The following table summarizes the key features of each algorithm:
| Algorithm | Description |
|---|---|
| Bruteforce | Naive parallel similarity search implementation |
| Lower Bound Bruteforce | Optimized bruteforce with lower bounding for the distance calculations |
| MESSI | In-memory parallel similarity search |
| PARIS | Disk-based parallel similarity search |
| SING | GPU-accelerated in-memory parallel similarity search |
| Odyssey | Distributed and parallel in-memory similarity search |
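To illustrate the difference between the first two rows of the table, here is a minimal pure-Python sketch of exact 1-nearest-neighbor search, first by naive scan and then with lower-bound pruning. It is not DaiSy's implementation or API: the function names are hypothetical, and the choice of a PAA (Piecewise Aggregate Approximation) summary as the lower bound is an assumption made for illustration. The sketch also assumes Euclidean distance and series lengths divisible by the number of segments.

```python
import math
import random


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa(series, segments):
    """PAA summary: the mean of each equal-length segment.

    Assumes len(series) is divisible by `segments`.
    """
    seg_len = len(series) // segments
    return [sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
            for i in range(segments)]


def paa_lower_bound(paa_a, paa_b, seg_len):
    """Distance between PAA summaries; never exceeds the true Euclidean distance."""
    return math.sqrt(seg_len * sum((x - y) ** 2 for x, y in zip(paa_a, paa_b)))


def bruteforce_1nn(query, dataset):
    """Naive exact search: compute the true distance to every series."""
    best = min(range(len(dataset)), key=lambda i: euclidean(query, dataset[i]))
    return best, euclidean(query, dataset[best])


def lower_bound_1nn(query, dataset, summaries, segments):
    """Exact search that skips any series whose lower bound cannot beat the best so far."""
    seg_len = len(query) // segments
    q_paa = paa(query, segments)
    best_idx, best_dist, pruned = -1, math.inf, 0
    for i, (series, summary) in enumerate(zip(dataset, summaries)):
        # If even the lower bound is no better than the best-so-far,
        # the true distance cannot be better either.
        if paa_lower_bound(q_paa, summary, seg_len) >= best_dist:
            pruned += 1
            continue
        d = euclidean(query, series)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, pruned


if __name__ == "__main__":
    random.seed(0)
    data = [[random.gauss(0, 1) for _ in range(64)] for _ in range(1000)]
    query = [random.gauss(0, 1) for _ in range(64)]
    summaries = [paa(s, 8) for s in data]  # precomputed once, reused per query
    print(bruteforce_1nn(query, data))
    print(lower_bound_1nn(query, data, summaries, 8))
```

Both functions return the same nearest neighbor. The second avoids some full distance computations because the summaries are shorter than the raw series and are computed once, ahead of query time.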
Quickstart
Dependencies
- Operating System: Linux, macOS, or Windows
- C++ Compiler: C++14 or higher (GCC 6+, Clang 3.4+, MSVC 2015+)
- CMake: Version 3.15 or higher
Optionally,
- Python: 3.10-3.12
- MPI: Required for the Odyssey distributed algorithm
- CUDA: Required for the SING GPU-accelerated algorithm
Installation
To download DaiSy, use:
git clone https://github.com/MChatzakis/daisy.git
cd daisy
git submodule update --init --recursive
Depending on the available hardware, you can set the following build flags to enable or disable features.
| Flag | Description | Default | Dependencies |
|---|---|---|---|
| BUILD_PYTHON | Enable Python bindings | ON | Python 3.10+ |
| BUILD_BENCHMARK | Build benchmarking tools | ON | GoogleBenchmark |
| BUILD_TESTS | Build test suite | ON | GoogleTest |
| BUILD_DEMO | Build demonstration applications | ON | Core library |
| ODYSSEY_MPI | Enable MPI for distributed computing | ON | OpenMPI/MPICH |
| SING_CUDA | Enable CUDA for GPU acceleration | ON | CUDA Toolkit |
| DEBUG_MSG | Enable debug output | OFF | None |
To compile:
mkdir build && cd build
cmake ..
make
DaiSy with Python
pip install daisy-exact-search
Example Usage
We provide several usage examples in both C++ and Python under demos/, demonstrating how to utilize the library for various similarity search tasks.
About
DaiSy is developed by the diNo research group at LIPADE, Université Paris Cité. It is provided with no warranty, and we encourage contributions from the community to enhance its capabilities and performance. For questions, issues, or contributions, please open an issue or submit a pull request on GitHub. DaiSy is licensed under the MIT License.