
DaiSy: A Library for Scalable Data Series Similarity Search


DaiSy (DAta series sImilarity Search librarY) is a unified library for exact data series similarity search, developed at LIPADE, Université Paris Cité. It integrates multiple state-of-the-art algorithms within a single, coherent framework and supports a wide range of approaches tailored to different execution environments, including disk-based, in-memory, GPU-accelerated, and distributed similarity search. DaiSy is implemented in C++ and also offers a Python interface for ease of use and integration with data science workflows.

Important Note: The current version of DaiSy is experimental. The library is under active development, with a particular focus on resolving installation and build issues. We welcome early suggestions and recommendations.

When using DaiSy, please consider citing the following paper:

Coming Soon!

Supported State-of-the-Art algorithms

We currently support several algorithms for exact similarity search, each optimized for specific use cases and environments. The following table summarizes the key features of each algorithm:

| Algorithm | Description |
|---|---|
| Bruteforce | Naive parallel similarity search implementation |
| Lower Bound Bruteforce | Optimized bruteforce with lower bounding for the distance calculations |
| MESSI | In-memory parallel similarity search |
| PARIS | Disk-based parallel similarity search |
| SING | GPU-accelerated in-memory parallel similarity search |
| Odyssey | Distributed and parallel in-memory similarity search |
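To make the difference between the first two rows concrete, here is a minimal, single-machine Python sketch of exact 1-nearest-neighbor search. It is not DaiSy's implementation or API: it assumes Euclidean distance, equal-length series, and a PAA (Piecewise Aggregate Approximation) lower bound, and all helper names (bruteforce_1nn, lb_bruteforce_1nn, paa, lb_paa, random_walk) are hypothetical. The brute-force version splits the collection into chunks, scans each chunk in a worker, and merges the per-chunk winners; the lower-bound version skips the full distance computation for any series whose lower bound already meets or exceeds the best distance found so far, so the answer stays exact.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bruteforce_1nn(query, series, workers=4):
    """Exact 1-NN by full scan (hypothetical helper): each worker scans one
    chunk of the non-empty collection, then the per-chunk (distance, index)
    winners are merged. Ties resolve to the smallest index."""
    chunk = max(1, math.ceil(len(series) / workers))

    def scan(start):
        end = min(start + chunk, len(series))
        return min((euclidean(query, series[i]), i) for i in range(start, end))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return min(pool.map(scan, range(0, len(series), chunk)))


def paa(x, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-size
    segment. Assumes len(x) is divisible by `segments`."""
    w = len(x) // segments
    return [sum(x[j * w:(j + 1) * w]) / w for j in range(segments)]


def lb_paa(paa_q, paa_s, seg_len):
    """Lower bound on the Euclidean distance computed from PAA means only."""
    return math.sqrt(seg_len * sum((a - b) ** 2 for a, b in zip(paa_q, paa_s)))


def lb_bruteforce_1nn(query, series, segments=8):
    """Exact 1-NN that prunes a candidate whenever its lower bound is already
    >= the best distance so far (hypothetical helper).
    Returns (distance, index, number_of_pruned_candidates)."""
    seg_len = len(query) // segments
    q_paa = paa(query, segments)
    # In a real system these summaries would be computed once, ahead of queries.
    summaries = [paa(s, segments) for s in series]
    best_dist, best_idx, pruned = math.inf, -1, 0
    for i, s in enumerate(series):
        if lb_paa(q_paa, summaries[i], seg_len) >= best_dist:
            pruned += 1  # cannot beat the current best: skip the full distance
            continue
        d = euclidean(query, s)
        if d < best_dist:
            best_dist, best_idx = d, i
    return best_dist, best_idx, pruned


def random_walk(n, rng):
    """Random-walk series, a common synthetic workload for data series search."""
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0, 1)
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    data = [random_walk(64, rng) for _ in range(1000)]
    query = random_walk(64, rng)
    print("bruteforce:", bruteforce_1nn(query, data))
    dist, idx, pruned = lb_bruteforce_1nn(query, data)
    print("lower bound:", (dist, idx), "pruned", pruned, "of", len(data))
```

Because the PAA bound never exceeds the true Euclidean distance (Cauchy-Schwarz applied per segment), both functions return the same nearest neighbor; only the number of full distance computations differs. CPython threads will not speed up this pure-Python loop because of the GIL, so the thread pool here only illustrates the chunk-and-merge structure.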

Quickstart

Dependencies

  • Operating System: Linux, macOS, or Windows
  • C++ Compiler: C++14 or higher (GCC 6+, Clang 3.4+, MSVC 2015+)
  • CMake: Version 3.15 or higher

Optional:

  • Python: 3.10-3.12 (for the Python bindings)
  • MPI: required for the Odyssey distributed algorithm
  • CUDA: required for the SING GPU-accelerated algorithm

Installation

To download DaiSy, use:

git clone https://github.com/MChatzakis/daisy.git

cd daisy
git submodule update --init --recursive

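Depending on the available hardware, you can enable or disable features with the CMake options below, passed at configure time as -D<FLAG>=ON or -D<FLAG>=OFF (for example, cmake -DSING_CUDA=OFF .. on a machine without the CUDA Toolkit).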

| Flag | Description | Default | Dependencies |
|---|---|---|---|
| BUILD_PYTHON | Enable Python bindings | ON | Python 3.10+ |
| BUILD_BENCHMARK | Build benchmarking tools | ON | GoogleBenchmark |
| BUILD_TESTS | Build test suite | ON | GoogleTest |
| BUILD_DEMO | Build demonstration applications | ON | Core library |
| ODYSSEY_MPI | Enable MPI for distributed computing | ON | OpenMPI/MPICH |
| SING_CUDA | Enable CUDA for GPU acceleration | ON | CUDA Toolkit |
| DEBUG_MSG | Enable debug output | OFF | None |

To compile:

mkdir build && cd build

cmake ..
make

DaiSy with Python

pip install daisy-exact-search
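After installing, one way to confirm which release pip resolved is to query the package metadata with the standard library. This sketch relies only on importlib.metadata and the distribution name daisy-exact-search from the command above; the importable module name is not stated here, so the snippet deliberately does not import DaiSy itself, and the helper name installed_daisy_version is hypothetical. As listed under Download files, version 1.0.1 ships a prebuilt wheel only for CPython 3.12 on x86-64 Linux (glibc 2.34+); elsewhere pip falls back to the source distribution, which presumably has to be compiled with the toolchain listed under Dependencies.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_daisy_version(dist_name="daisy-exact-search"):
    """Return the installed distribution's version string, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_daisy_version()
    print(v if v is not None else "daisy-exact-search is not installed")
```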

Example Usage

The demos/ directory contains usage examples in both C++ and Python that show how to use the library for various similarity search tasks.

About

DaiSy is developed by the diNo research group at LIPADE, Université Paris Cité. It is provided with no warranty, and we encourage contributions from the community to enhance its capabilities and performance. For questions, issues, or contributions, please open an issue or submit a pull request on GitHub. DaiSy is licensed under the MIT License.

Download files


Source Distribution

daisy_exact_search-1.0.1.tar.gz (980.0 kB)

Built Distribution


daisy_exact_search-1.0.1-cp312-cp312-manylinux_2_34_x86_64.whl (4.4 MB)

CPython 3.12, manylinux: glibc 2.34+, x86-64

File details

Details for the file daisy_exact_search-1.0.1.tar.gz.

File metadata

  • Download URL: daisy_exact_search-1.0.1.tar.gz
  • Size: 980.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for daisy_exact_search-1.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32a3108ee4fd3777670f97708f4c5fad850b3073ac5dff07493000e3121ec1a7 |
| MD5 | 6528260347ce9bb05c3e8709a9cfdff5 |
| BLAKE2b-256 | a25ed83e9605185690f25f455cfbbaa3917a93ab9e7816fcc3466c8ec51d3f8e |
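These digests let you check a downloaded archive before building or installing it. A minimal sketch using Python's standard hashlib, assuming the archive sits at a local path you pass in; the helper names sha256_of and verify are illustrative, and EXPECTED_SHA256 is the SHA256 digest listed above for the source distribution:

```python
import hashlib

# SHA256 digest published for daisy_exact_search-1.0.1.tar.gz
EXPECTED_SHA256 = "32a3108ee4fd3777670f97708f4c5fad850b3073ac5dff07493000e3121ec1a7"


def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in fixed-size chunks so large archives are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True only if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, verify("daisy_exact_search-1.0.1.tar.gz") returns True only for an intact copy of the 1.0.1 source archive. pip can enforce the same check at install time through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).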


File details

Details for the file daisy_exact_search-1.0.1-cp312-cp312-manylinux_2_34_x86_64.whl.


File hashes

Hashes for daisy_exact_search-1.0.1-cp312-cp312-manylinux_2_34_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71589abd55ed13300a68adbc3821d73ef355898112f8fec3f19100d9a37df0c5 |
| MD5 | 3aaf0f00bb5f5a15b608bd070f40d811 |
| BLAKE2b-256 | 16ddd1f8a432153cd04702d6a426ab1f6043be46f0bca7f51eb72d2e73fef29a |

