A fast, scalable and light-weight C++ Fréchet distance library, exposed to python and focused on (k,l)-clustering of polygonal curves.

Project description

Fred

NOW USING PYBIND11 INSTEAD OF BOOST!

NOW AVAILABLE VIA PIP

Ingredients C++ Backend

import Fred.backend as fred

Number of Threads

By default, Fred will automatically determine the number of threads to use. If you want to set an upper limit, call fred.set_maximum_number_threads(number).

Curve

signature: fred.Curve(np.ndarray), fred.Curve(np.ndarray, str name)
properties: fred.Curve.values: curve as np.ndarray, fred.Curve.name: name of the curve, fred.Curve.dimensions: dimension of the curve, fred.Curve.complexity: number of points of the curve

Curves

signature: fred.Curves()
methods: fred.Curves.add(curve): add curve, fred.Curves[i]: get ith curve, len(fred.Curves): number of curves, fred.Curves.simplify(l): return set of simplified curves
properties: fred.Curves.m: maximum complexity of the contained curves, fred.Curves.values: curves as np.ndarray

continuous Fréchet distance

signature: fred.continuous_frechet(curve1, curve2)
returns: fred.Continuous_Frechet_Result with members value, time_bounds: running-time for upper and lower bound, number_searches: number of free space diagrams built, time_searches: running-time for free spaces

continuous Fréchet distance config

approximation error in percent of distance: fred.set_continuous_frechet_error(double percent) with parameter percent, which defaults to 1
rounding: fred.set_continuous_frechet_rounding(round) with parameter round, which defaults to true

discrete Fréchet distance

signature: fred.discrete_frechet(curve1, curve2)
returns: fred.Discrete_Frechet_Result with members value and time
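
For readers unfamiliar with the measure, here is a minimal pure-Python sketch of the textbook dynamic program for the discrete Fréchet distance. It is not Fred's C++ implementation; the function name and the representation of curves as lists of coordinate tuples are assumptions made for illustration.

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two point sequences."""
    n, m = len(p), len(q)
    # dp[i][j]: discrete Fréchet distance between p[:i+1] and q[:j+1]
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                # advance on p, on q, or on both, whichever is cheapest
                cheapest = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(cheapest, d)
    return dp[n - 1][m - 1]

a = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0)]
b = [(1.0, -1.0), (2.0, -2.0), (3.0, -1.0)]
print(discrete_frechet(a, b))  # sqrt(5), about 2.236
```

Only vertex-to-vertex distances are considered, so the result can differ from the continuous Fréchet distance of the same two curves.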

discrete dynamic time warping distance

signature: fred.discrete_dynamic_time_warping(curve1, curve2)
returns: fred.Discrete_Dynamic_Time_Warping_Distance with members value and time
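
As a point of comparison with the discrete Fréchet distance, the following sketch shows the standard dynamic program for discrete dynamic time warping, which sums the matched distances instead of taking their maximum. It assumes Euclidean point distances and curves given as lists of tuples; whether Fred uses exactly this cost (for example, unsquared distances) is not stated above, so treat it as an illustration only.

```python
import math

def discrete_dtw(p, q):
    """Discrete dynamic time warping distance (sum of matched distances)."""
    n, m = len(p), len(q)
    # dp[i][j]: cheapest warping of the first i points of p onto the first j of q
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])
            dp[i][j] = d + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]

a = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0)]
b = [(1.0, -1.0), (2.0, -2.0), (3.0, -1.0)]
print(discrete_dtw(a, b))  # 5.0: the diagonal matching costs 1 + 3 + 1
```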

Curve Simplification

All simplifications are vertex-restricted!

minimum error simplification

graph approach from Polygonal Approximations of a Curve — Formulations and Algorithms
signature: fred.minimum_error_simplification(fred.Curve, int complexity)
returns: fred.Curve that uses the input curve's vertices, has complexity number of vertices and has minimum distance to the input curve
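
The idea behind the graph approach can be sketched as a dynamic program over shortcuts between vertices: choose exactly `complexity` vertices, including the first and last, so that the worst shortcut error is as small as possible. This sketch is an assumption-laden illustration, not Fred's code. In particular, it measures a shortcut's error as the largest distance from a skipped vertex to the shortcut segment, a simpler stand-in for the Fréchet distance, and all names are hypothetical.

```python
import math

def point_segment_distance(x, a, b):
    """Euclidean distance from point x to the segment from a to b."""
    direction = [bi - ai for ai, bi in zip(a, b)]
    offset = [xi - ai for ai, xi in zip(a, x)]
    squared_length = sum(c * c for c in direction)
    if squared_length == 0.0:
        return math.dist(x, a)
    t = sum(u * v for u, v in zip(offset, direction)) / squared_length
    t = max(0.0, min(1.0, t))
    return math.dist(x, [ai + t * c for ai, c in zip(a, direction)])

def shortcut_error(curve, i, j):
    """Stand-in error of replacing curve[i..j] by one segment (assumption)."""
    return max((point_segment_distance(curve[k], curve[i], curve[j])
                for k in range(i + 1, j)), default=0.0)

def minimum_error_simplification(curve, complexity):
    n = len(curve)
    if not 2 <= complexity <= n:
        raise ValueError("complexity must be between 2 and len(curve)")
    # best[c][j]: smallest worst-shortcut error reaching vertex j with c vertices
    best = [[math.inf] * n for _ in range(complexity + 1)]
    parent = [[-1] * n for _ in range(complexity + 1)]
    best[1][0] = 0.0
    for c in range(2, complexity + 1):
        for j in range(1, n):
            for i in range(j):
                if best[c - 1][i] == math.inf:
                    continue
                cost = max(best[c - 1][i], shortcut_error(curve, i, j))
                if cost < best[c][j]:
                    best[c][j], parent[c][j] = cost, i
    # walk the parent pointers back from the last vertex
    path, j = [], n - 1
    for c in range(complexity, 0, -1):
        path.append(j)
        j = parent[c][j]
    path.reverse()
    return [curve[k] for k in path], best[complexity][n - 1]

curve = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 5.0), (4.0, 0.0)]
simplified, error = minimum_error_simplification(curve, 3)
print(simplified, error)
```

The result is vertex-restricted by construction, since only indices of the input curve are ever selected.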

approximate minimum link simplification

algorithm "FS" from Near-Linear Time Approximation Algorithms for Curve Simplification
signature: fred.approximate_minimum_link_simplification(fred.Curve, double error)
returns: fred.Curve that uses the input curve's vertices, is of small complexity and has distance to the input curve at most error

approximate minimum error simplification

binary search on fred.approximate_minimum_link_simplification
signature: fred.approximate_minimum_error_simplification(fred.Curve, int complexity)
returns: fred.Curve that uses the input curve's vertices, has complexity number of vertices and has small distance to the input curve
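
The interplay of the two approximate routines can be sketched as follows: a greedy pass keeps extending each shortcut while its error stays within a bound, and a binary search over that bound looks for a small error at which the greedy pass needs at most `complexity` vertices. This is a simplified illustration and not the "FS" algorithm itself: it scans linearly, uses point-to-segment distance as a stand-in error measure, may return fewer than `complexity` vertices, and all names are hypothetical.

```python
import math

def point_segment_distance(x, a, b):
    """Euclidean distance from point x to the segment from a to b."""
    direction = [bi - ai for ai, bi in zip(a, b)]
    squared_length = sum(c * c for c in direction)
    if squared_length == 0.0:
        return math.dist(x, a)
    t = sum((xi - ai) * c for xi, ai, c in zip(x, a, direction)) / squared_length
    t = max(0.0, min(1.0, t))
    return math.dist(x, [ai + t * c for ai, c in zip(a, direction)])

def shortcut_error(curve, i, j):
    return max((point_segment_distance(curve[k], curve[i], curve[j])
                for k in range(i + 1, j)), default=0.0)

def greedy_link_simplification(curve, error):
    """Indices kept when each shortcut is extended until it would exceed error."""
    kept, i, n = [0], 0, len(curve)
    while i < n - 1:
        j = i + 1
        while j + 1 < n and shortcut_error(curve, i, j + 1) <= error:
            j += 1
        kept.append(j)
        i = j
    return kept

def approximate_minimum_error_simplification(curve, complexity, iterations=50):
    # Every shortcut error is at most the diameter, which is at most this bound,
    # so the initial greedy pass keeps only the two endpoints.
    low, high = 0.0, 2.0 * max(math.dist(curve[0], p) for p in curve)
    best = greedy_link_simplification(curve, high)
    for _ in range(iterations):
        middle = (low + high) / 2.0
        kept = greedy_link_simplification(curve, middle)
        if len(kept) <= complexity:
            best, high = kept, middle
        else:
            low = middle
    return [curve[k] for k in best]

curve = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 5.0), (4.0, 0.0)]
print(approximate_minimum_error_simplification(curve, 3))
```

The trade-off matches the one described above: each greedy pass is cheap, but the resulting error is only small, not guaranteed minimal.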

Clustering

Distance_Matrix

A fred.Distance_Matrix() can be used to speed up consecutive calls of fred.discrete_klcenter and fred.discrete_klmedian. As the name suggests, it stores the distances already computed.
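
The caching idea can be illustrated with a small memoizing wrapper around a symmetric distance function. This is only a conceptual sketch: the class name, its interface and the evaluation counter are hypothetical and say nothing about how fred.Distance_Matrix is implemented internally.

```python
class DistanceCache:
    """Stores each pairwise distance the first time it is requested."""

    def __init__(self, distance):
        self._distance = distance
        self._cache = {}
        self.evaluations = 0  # how often the expensive function really ran

    def __call__(self, items, i, j):
        key = (min(i, j), max(i, j))  # the distance is assumed symmetric
        if key not in self._cache:
            self._cache[key] = self._distance(items[i], items[j])
            self.evaluations += 1
        return self._cache[key]

values = [0.0, 1.5, 4.0]
cache = DistanceCache(lambda x, y: abs(x - y))
first = cache(values, 0, 2)
second = cache(values, 2, 0)  # answered from the cache
print(first, second, cache.evaluations)
```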

discrete (k,l)-center clustering (continuous Fréchet)

from Approximating (k,l)-center clustering for curves
signature: fred.discrete_klcenter(k, l, curves, distances, random_first_center, fast_simplification) with parameters
- k: number of centers
- l: maximum complexity of the centers
- curves: fred.Curves containing the input curves
- distances: fred.Distance_Matrix, defaults to empty fred.Distance_Matrix
- random_first_center: determines if first center is chosen uniformly at random or first curve is used as first center, optional, defaults to true
- fast_simplification: determines whether to use the minimum error simplification or the faster approximate minimum error simplification, defaults to false
returns: fred.Clustering_Result with members
- value: objective value
- time: running-time
- assignment: empty if compute_assignment has not been called
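
The role of random_first_center is easiest to see in the classic farthest-first traversal (Gonzalez' algorithm) for k-center, sketched below on generic items with a user-supplied distance. This is an illustration under assumptions: the step that simplifies each chosen center to complexity l is omitted, and the function name and return values are hypothetical rather than Fred's API.

```python
import random

def farthest_first_k_center(items, k, distance, random_first_center=True, seed=None):
    """Return the indices of k centers and the resulting k-center cost."""
    rng = random.Random(seed)
    first = rng.randrange(len(items)) if random_first_center else 0
    centers = [first]
    # closest[i]: distance from item i to its nearest chosen center so far
    closest = [distance(items[first], x) for x in items]
    while len(centers) < k:
        farthest = max(range(len(items)), key=closest.__getitem__)
        centers.append(farthest)
        closest = [min(c, distance(items[farthest], x))
                   for c, x in zip(closest, items)]
    return centers, max(closest)

points = [0.0, 0.1, 0.2, 10.0, 10.1]
centers, cost = farthest_first_k_center(
    points, 2, lambda x, y: abs(x - y), random_first_center=False)
print(centers, cost)
```

With random_first_center=False the run is deterministic, which is convenient for reproducible experiments.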

discrete (k,l)-median clustering (continuous Fréchet)

Algorithm from section 4.3 in Geometric Approximation Algorithms + simplification
signature: fred.discrete_klmedian(k, l, curves, distances, fast_simplification) with parameters
- k: number of centers
- l: maximum complexity of the centers
- curves: fred.Curves containing the input curves
- distances: fred.Distance_Matrix, defaults to empty fred.Distance_Matrix
- fast_simplification: determines whether to use the minimum error simplification or the faster approximate minimum error simplification, defaults to false
returns: fred.Clustering_Result with members
- value: objective value
- time: running-time
- assignment: empty if compute_assignment has not been called
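
A common way to approximate discrete k-median is local search: start from any k centers and keep swapping a center for a non-center while the sum of distances improves. The sketch below shows that scheme on generic items. It is an assumption that this mirrors the cited algorithm in spirit; the simplification step, the starting solution and the improvement threshold `tau` are illustrative choices, not Fred's.

```python
def k_median_cost(items, centers, distance):
    """Sum over all items of the distance to the nearest center."""
    return sum(min(distance(x, items[c]) for c in centers) for x in items)

def local_search_k_median(items, k, distance, tau=0.0):
    centers = list(range(k))  # arbitrary starting solution
    cost = k_median_cost(items, centers, distance)
    improved = True
    while improved:
        improved = False
        for position in range(k):
            for candidate in range(len(items)):
                if candidate in centers:
                    continue
                trial = centers[:position] + [candidate] + centers[position + 1:]
                trial_cost = k_median_cost(items, trial, distance)
                # accept a swap only if it lowers the cost by a factor (1 - tau)
                if trial_cost < (1.0 - tau) * cost:
                    centers, cost, improved = trial, trial_cost, True
    return centers, cost

points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
centers, cost = local_search_k_median(points, 2, lambda x, y: abs(x - y))
print(sorted(centers), cost)
```

Every accepted swap strictly lowers the cost, so the loop terminates; a larger `tau` trades solution quality for fewer iterations.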

Clustering Result

signature: fred.Clustering_Result
methods: len(fred.Clustering_Result): number of centers, fred.Clustering_Result[i]: get ith center, fred.Clustering_Result.compute_assignment(fred.Curves): assigns every curve to its nearest center
members: value: objective value, time: running-time, assignment: empty if compute_assignment was not called

Cluster Assignment

signature: fred.Cluster_Assignment
methods: len(fred.Cluster_Assignment): number of centers, fred.Cluster_Assignment.count(i): number of curves assigned to center i, fred.Cluster_Assignment.get(i,j): get index of jth curve assigned to center i

Dimension Reduction via Gaussian Random Projection

Section 2 in Random Projections and Sampling Algorithms for Clustering of High Dimensional Polygonal Curves
signature: fred.dimension_reduction(curves, epsilon, empirical_constant) with parameters
- epsilon: (1+epsilon) approximation parameter
- empirical_constant: use constant of empirical study (faster, but less accurate), defaults to True
returns: fred.Curves collection of curves
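
Conceptually, a Gaussian random projection multiplies every vertex of every curve by the same random matrix with independent normal entries, scaled so that distances are preserved in expectation. The following pure-Python sketch shows this for curves given as lists of tuples. Note the assumptions: Fred derives the target dimension from epsilon and the input size, whereas this sketch takes a hypothetical `target_dimension` argument directly.

```python
import math
import random

def gaussian_random_projection(curves, target_dimension, seed=None):
    """Project all curve vertices with one shared Gaussian random matrix."""
    rng = random.Random(seed)
    dimension = len(curves[0][0])
    scale = 1.0 / math.sqrt(target_dimension)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(dimension)]
              for _ in range(target_dimension)]
    return [
        [tuple(sum(r * x for r, x in zip(row, point)) for row in matrix)
         for point in curve]
        for curve in curves
    ]

curves = [
    [(1.0, 0.0, 2.0, 1.0, 0.5), (2.0, 1.0, 0.0, 1.0, 0.5)],
    [(0.0, 0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0, 1.0), (2.0, 0.0, 2.0, 0.0, 2.0)],
]
projected = gaussian_random_projection(curves, 2, seed=0)
print([len(curve) for curve in projected], len(projected[0][0]))
```

The number of curves and their complexities are unchanged; only the dimension of the vertices shrinks.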

Installation

Requirements

The following must be installed:

cmake
git
OpenMP (should be part of your compiler)

That's it!

Installation Procedure

Variant 1: simply run pip install Fred-Frechet
Variant 2: clone the repository and run make to install into your user directory

Test

Just run python py/test.py.

Mini Example

import Fred.backend as fred
import Fred
import numpy as np
import pandas as pd

curve1d = fred.Curve(np.array([1., 2.])) # a Curve stores a polygonal curve with
                                         # at least two points; all points share the
                                         # same dimension, which is at least one

curve2d1 = fred.Curve(np.array([[1., 0.], [2., 1.], [3., 0.]])) # any number of dimensions and points works
curve2d2 = fred.Curve(np.array([[1., -1.], [2., -2.], [3., -1.]]), "optional name, e.g. displayed in plot") 

print(curve2d1)

Fred.plot_curve(curve2d1, curve2d2)
Fred.plot_curve(curve2d2, fred.minimum_error_simplification(curve2d2, 2))

print("distance is {}".format(fred.continuous_frechet(curve2d1, curve2d2).value))

print("download HUGE curves") 

import requests, zipfile, io             # you can use all libraries 
                                         # that work with numpy to read data into fred
                                         
response = requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/00447/data.zip", stream=True)
zf = zipfile.ZipFile(io.BytesIO(response.content))

ps1 = fred.Curve(pd.read_csv(zf.open('PS1.txt'), delimiter="\t", header=None).values[:50], "PS1")
ps2 = fred.Curve(pd.read_csv(zf.open('PS2.txt'), delimiter="\t", header=None).values[:50], "PS2")
ps3 = fred.Curve(pd.read_csv(zf.open('PS3.txt'), delimiter="\t", header=None).values[:50], "PS3")
ps4 = fred.Curve(pd.read_csv(zf.open('PS4.txt'), delimiter="\t", header=None).values[:50], "PS4")
ps5 = fred.Curve(pd.read_csv(zf.open('PS5.txt'), delimiter="\t", header=None).values[:50], "PS5")
ps6 = fred.Curve(pd.read_csv(zf.open('PS6.txt'), delimiter="\t", header=None).values[:50], "PS6")

curves = fred.Curves() # for clustering or if you want to apply dimension reduction
                       # you need to encapsulate your curves in a Curves object
              
curves.add(ps1)
curves.add(ps2)
curves.add(ps3)
curves.add(ps4)
curves.add(ps5)
curves.add(ps6)

Fred.plot_curve(curves)

curves = fred.dimension_reduction(curves, 0.95) # fred is pretty fast but with high dimensional data
                                                # a dimension reduction massively improves running-time
                                                # even for smaller values of epsilon
                                                
Fred.plot_curve(curves)
                                  
# Oneshot clustering - if you already know the value of k
                                  
clustering = fred.discrete_klcenter(2, 10, curves) # fast but coarse
          
clustering = fred.discrete_klmedian(2, 10, curves) # slow but better results

print("clustering cost is {}".format(clustering.value))

for i, center in enumerate(clustering):
    print("center {} is {}".format(i, center))
    
    
Fred.plot_curve(clustering)

# Multiple clustering calls - if you need to find a suitable value for k

dm = fred.Distance_Matrix() # computing the Fréchet distance is costly,
                            # therefore we buffer each distance already computed to
                            # speed up consecutive clustering calls
                            
for k in range(2, 6):
    
    clustering = fred.discrete_klcenter(k, 10, curves, dm)
    print("clustering cost is {}".format(clustering.value))
            
    clustering = fred.discrete_klmedian(k, 10, curves, dm)
    print("clustering cost is {}".format(clustering.value))
    
clustering.compute_assignment(curves)

for i in range(0, len(clustering)):
    for j in range(0, clustering.assignment.count(i)):
        print("{} was assigned to center {}".format(curves[clustering.assignment.get(i,j)].name, clustering[i].name))

Download files

Source Distribution

Fred-Frechet-1.8.1.tar.gz (91.3 kB)

Uploaded Aug 29, 2021 Source

Hashes for Fred-Frechet-1.8.1.tar.gz
Algorithm	Hash digest
SHA256	`54c679b212454659e9bfdf9751854196e00d37b081e5a76f0569a76a0fbc1846`
MD5	`8a43407f28643d8da23967cacb3f7dd1`
BLAKE2b-256	`5a95d9c520b978100f6b8d7aefae90ec3352e7a9c7121768b1356e42fe7e276a`

Fred-Frechet 1.8.1
