DaiSy: A Library for Scalable Data Series Similarity Search

Francesca Del Gaudio, Manos Chatzakis, Gayathiri Ravendirane, Botao Peng, Themis Palpanas

Exact similarity search over large collections of data series is a fundamental operation in modern applications, yet existing solutions are often fragmented, specialized, or tailored to specific execution environments. We present DaiSy (Data series similarity Search library), a unified library for exact data series similarity search that integrates multiple state-of-the-art algorithms within a single, coherent framework. DaiSy is the first library to support exact similarity search across diverse execution environments, with implementations for disk-based, in-memory, GPU-accelerated, and distributed similarity search. The library provides interfaces in both C++ and Python, enabling researchers and practitioners to easily integrate its functionality into a variety of tasks.

ALPHA VERSION: Currently, DaiSy is experimental. The library is still under active development. We welcome suggestions and bug reports.

Supported State-of-the-Art algorithms

We currently support several algorithms for exact similarity search, each optimized for specific use cases and environments. The following table summarizes the key features of each algorithm:

Algorithm               Description
Bruteforce              Naive parallel similarity search implementation
Lower Bound Bruteforce  Optimized bruteforce with lower bounding for the distance calculations
MESSI                   In-memory parallel similarity search
PARIS                   Disk-based parallel similarity search
SING                    GPU-accelerated in-memory parallel similarity search
Odyssey                 Distributed and parallel in-memory similarity search
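
To illustrate the difference between the first two rows, here is a minimal pure-Python sketch of exact 1-nearest-neighbor search with lower-bound pruning. It is not DaiSy's API or implementation. The function names are hypothetical, and the PAA (Piecewise Aggregate Approximation) bound is an assumption chosen for illustration, since this page does not state which lower bound the library uses. The idea is that a cheap bound that never exceeds the true distance lets the search skip candidates without changing the exact answer.

```python
import math


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    seg_len = len(series) // segments
    return [sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
            for i in range(segments)]


def exact_1nn(query, collection, segments=4):
    """Return (index, distance, pruned_count) of the exact nearest neighbor.

    Hypothetical helper for illustration only; not part of DaiSy.
    """
    if len(query) % segments != 0:
        raise ValueError("series length must be divisible by the segment count")
    seg_len = len(query) // segments
    q_paa = paa(query, segments)
    best_idx, best_sq = -1, math.inf
    pruned = 0
    for idx, cand in enumerate(collection):
        # seg_len * sum((q_i - c_i)^2) over PAA means never exceeds the
        # true squared distance, so pruning on it cannot discard the answer.
        lower_bound = seg_len * squared_euclidean(q_paa, paa(cand, segments))
        if lower_bound >= best_sq:
            pruned += 1
            continue
        dist_sq = squared_euclidean(query, cand)
        if dist_sq < best_sq:
            best_idx, best_sq = idx, dist_sq
    return best_idx, math.sqrt(best_sq), pruned


collection = [[0] * 8, [1] * 8, [5] * 8]
query = [1, 1, 1, 1, 1, 1, 1, 2]
print(exact_1nn(query, collection))  # (1, 1.0, 1)
```

In this toy run the third series is skipped without a full distance computation, yet the result matches what a plain scan over all three series would return.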

Quickstart

Dependencies

  • Operating System: Linux, macOS, or Windows
  • C++ Compiler: C++14 or higher (GCC 6+, Clang 3.4+, MSVC 2015+)
  • CMake: Version 3.15 or higher

Optional:

  • Python: 3.10-3.12
  • MPI: Required for Odyssey distributed computing algorithm
  • CUDA: Required for SING GPU acceleration algorithm

Installation

To download DaiSy, use:

git clone https://github.com/MChatzakis/daisy.git

cd daisy
git submodule update --init --recursive

Depending on the available hardware, you can set the following build options to enable or disable features.

Flag             Description                            Default  Dependencies
BUILD_PYTHON     Enable Python bindings                 OFF      Python 3.10+
BUILD_BENCHMARK  Build benchmarking tools               OFF      GoogleBenchmark
BUILD_TESTS      Build test suite                       OFF      GoogleTest
BUILD_DEMO       Build demonstration applications       ON       Core library
BUILD_ODYSSEY    Enable MPI for distributed computing   OFF      OpenMPI/MPICH
BUILD_SING       Enable CUDA for GPU acceleration       OFF      CUDA Toolkit
DEBUG_MSG        Enable debug output                    OFF      None

To compile:

mkdir build && cd build

cmake ..
make
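
The build flags listed above are presumably passed to CMake as cache variables using its standard -D<NAME>=<VALUE> syntax. This page does not spell out the exact invocation, so treat that as an assumption. The small helper below (a hypothetical name, not part of DaiSy) assembles such a configure command from a mapping of flags:

```python
def cmake_configure_args(options, source_dir=".."):
    """Build a cmake configure command from {flag_name: bool}.

    Assumes the flags are ordinary CMake cache variables set with -D.
    """
    args = ["cmake", source_dir]
    for flag, enabled in options.items():
        args.append(f"-D{flag}={'ON' if enabled else 'OFF'}")
    return args


command = cmake_configure_args({"BUILD_PYTHON": True, "BUILD_SING": False})
print(" ".join(command))  # cmake .. -DBUILD_PYTHON=ON -DBUILD_SING=OFF
```

The printed command would be run from inside the build directory in place of the plain cmake .. step.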

DaiSy with Python

If you intend to use only the Python interface, you can install the library directly from PyPI using pip:

pip install daisy-exact-search

If you want to use Odyssey, you also need MPI support; install the package with the mpi extra:

pip install daisy-exact-search[mpi]

Compatibility issues

Please note that we are aware of compatibility issues on ARM processors (e.g., Apple M-series processors). Because the pthread barriers and SIMD instructions that DaiSy relies on are unavailable on these machines, compilation currently fails on ARM. We are working on possible solutions; in the meantime, we recommend using DaiSy on non-ARM machines.
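
As a convenience, a script can check the processor architecture before attempting a build. The sketch below uses only the Python standard library. The helper name and the set of architecture strings it matches are assumptions for illustration, not something DaiSy provides.

```python
import platform

# Common values reported by platform.machine() on 64-bit ARM systems.
ARM_MACHINE_NAMES = {"arm64", "aarch64"}


def is_arm_machine(machine=None):
    """Return True if the given (or current) machine name looks like ARM."""
    name = (machine or platform.machine()).lower()
    return name in ARM_MACHINE_NAMES or name.startswith("arm")


if is_arm_machine():
    print("ARM processor detected: DaiSy may fail to compile on this machine.")
```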

Others

We provide several usage examples in both C++ and Python under demos/, demonstrating how to use the library for various similarity search tasks. The docs/ directory contains troubleshooting guides and additional resources, along with information on how to contribute to the project and how to implement new algorithms.

About

DaiSy is developed by researchers at the diNo research group, LIPADE, Université Paris Cité.

It is provided with no warranty, and we encourage contributions from the community to enhance its capabilities and performance. For questions, issues, or contributions, please open an issue or submit a pull request on GitHub. DaiSy is licensed under the MIT License.

For questions and suggestions by email, you can contact us at manos.chatzaki@gmail.com.

The logo of DaiSy was designed by Eva Chamilaki.
