
timeseries_analysis

Code for unsupervised clustering of time-correlated data. Refer to INSERT PAPER for further details.

Input data

A one-dimensional timeseries, computed on N particles for T frames. The input file must contain an array with shape (N, T). Supported formats: .npy, .npz, .txt.
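As an illustration, here is a minimal sketch of how such an input array could be prepared and saved with numpy; the array shape, values and file name are hypothetical, not required by the code:

    import numpy as np

    # Hypothetical example: a 1D signal for N = 100 particles over T = 5000 frames.
    N, T = 100, 5000
    rng = np.random.default_rng(0)
    data = rng.normal(size=(N, T))        # replace with your actual timeseries

    np.save("example_signal.npy", data)   # or np.savetxt("example_signal.txt", data)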

Usage

The working directory must contain:

  • A text file called input_parameters.txt, whose format is explained below;
  • A text file called data_directory.txt containing one line with the path to the input data file (including the input data file name).

Examples of these two files are contained in this repository.
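For instance, if the input array were stored in /path/to/data/example_signal.npy (a purely illustrative path), data_directory.txt would contain the single line:

    /path/to/data/example_signal.npy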

From this directory, the code is run with python3 ${PATH_TO_CODE}/main.py.

input_parameters.txt

The keyword and the value must be separated by one tab (see the example after the list below).

  • tau_window (int): the length of the time window (in number of frames).
  • t_smooth (int, optional): the length of the smoothing window (in number of frames) for the moving average. A value of t_smooth = 1 corresponds to no smoothing. Default is 1.
  • t_delay (int, optional): ignores the first t_delay frames of the trajectory. Default is 0.
  • t_conv (int, optional): the conversion factor from number of frames to time units. Default is 1.
  • time_units (str, optional): a string indicating the time units. Default is 'frames'.
  • example_ID (int, optional): plots the trajectory of the molecule with this ID, colored according to the identified states. Default is 0.
  • bins (int, optional): the number of bins used to compute histograms. This should be used only if all the fits fail with the automatic binning.
  • num_tau_w (int, optional): the number of different tau_window values tested. Default is 20.
  • min_tau_w (int, optional): the smallest tau_window value tested. It has to be larger than 1. Default is 2.
  • max_tau_w (int, optional): the largest tau_window value tested. It has to be larger than 2. Default is the largest possible window.
  • min_t_smooth (int, optional): the smallest t_smooth value tested. It has to be larger than 0. Default is 1.
  • max_t_smooth (int, optional): the largest t_smooth value tested. It has to be larger than 0. Default is 5.
  • step_t_smooth (int, optional): the step between the t_smooth values tested. It has to be larger than 0. Default is 1.
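A minimal example of input_parameters.txt, with purely illustrative values (the whitespace between keyword and value stands for a single tab character):

    tau_window    10
    t_smooth      2
    t_delay       0
    t_conv        1
    time_units    frames
    example_ID    0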

Output

The algorithm will attempt to perform the clustering on the input data, using different t_smooth (from 1 frame, i.e. no smoothing, to 5 frames, unless specified otherwise in the input parameters) and different tau_window (logarithmically spaced between 2 frames and the entire trajectory length, unless specified otherwise in the input parameters).

  • number_of_states.txt contains the number of clusters for each combination of tau_window and t_smooth tested.
  • fraction_0.txt contains the fraction of unclassified data points for each combination of tau_window and t_smooth tested.
  • output_figures/Time_resolution_analysis.png plots the two previous quantities, for the case t_smooth = 1.
  • Figures with all the Gaussian fits are saved in the folder output_figures with the naming format t_smooth_tau_window_Fig1_iteration.png.
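If you want to inspect these summary files programmatically, a short sketch like the following may help; it assumes number_of_states.txt and fraction_0.txt are plain whitespace-separated numeric tables, so the loading calls may need adjusting to the actual file layout:

    import numpy as np

    # Load the summary tables written by the time-resolution analysis
    # (assumed here to be whitespace-separated numeric tables).
    number_of_states = np.loadtxt("number_of_states.txt")
    fraction_0 = np.loadtxt("fraction_0.txt")

    print("number_of_states:", number_of_states.shape)
    print("fraction_0:", fraction_0.shape)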

Then, the analysis with the values of tau_window and t_smooth specified in input_parameters.txt will be performed.

  • The file states_output.txt contains information about the recursive fitting procedure, useful for debugging.
  • The file final_states.txt contains the list of the states, with the central value, width and relevance of each.
  • The file final_tresholds.txt contains the list of the thresholds between states.
  • output_figures/Fig0.png plots the raw data.
  • output_figures/Fig1_iteration.png plots the histograms and best fits for each iteration.
  • output_figures/Fig2.png plots the data with the clustering thresholds and Gaussians.
  • output_figures/Fig3.png plots the colored signal for the particle with ID example_ID.
  • output_figures/Fig4.png shows the mean time sequence inside each state, and it's useful for checking the meaningfulness of the results.
  • The file all_cluster_IDs_xyz.dat allows plotting the trajectory with the clustering used for the color coding, although it is not particularly easy to use.
  • If the trajectory from which the signal was computed is present in the working directory and is called trajectory.xyz, a new file colored_trj.xyz will be written, with the atom types assigned according to the clustering. A bit of fine-tuning will be necessary inside the function print_colored_trj_from_xyz() in function.py.

Multivariate time-series version

The main_2d.py algorithm works in a similar fashion, taking 2D or 3D data as input. Each component of the signal has to be loaded from its own input file; just add one line per component with the path to the file in data_directory.txt. Signals are normalized between 0 and 1; changing this normalization can affect the performance of the algorithm, so you may want to try the clustering with different normalizations.
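For reference, the min-max normalization mentioned above corresponds to something like the following sketch (not the package's internal code; the function and array names are illustrative):

    import numpy as np

    def min_max_normalize(signal: np.ndarray) -> np.ndarray:
        """Rescale one (N, T) signal component to the [0, 1] range."""
        s_min, s_max = signal.min(), signal.max()
        return (signal - s_min) / (s_max - s_min)

    # Each component of the multivariate signal is normalized independently:
    # components = [min_max_normalize(c) for c in components]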

Required Python 3 packages

matplotlib, numpy, plotly, scipy.
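These can be installed, for example, with pip:

    pip install matplotlib numpy plotly scipy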

Gaussian fitting procedure

  1. The histogram of the timeseries is computed, using the bins='auto' numpy option (unless a different bins is passed as input parameter).
  2. The histogram is smoothed with a moving average with window_size=3 (unless there are fewer than 50 bins, in which case no smoothing occurs).
  3. The absolute maximum of the histogram is found.
  4. Two Gaussian fits are performed:
  • The first one inside the interval between the two minima surrounding the maximum.
  • The second one inside the interval where the peak around the maximum has its half height.
  5. Both fits, if converged, are evaluated according to the following criteria:
  • mu is contained inside the fit interval;
  • sigma is smaller than the fit interval;
  • the height of the peak is at least half the value of the maximum;
  • the relative uncertainty on the fit parameters is smaller than 0.5.
  6. Finally, the fit with the best score is chosen. If only one of the two converged, that one is chosen. If none of the fits converges, the iterative procedure stops and a warning message is returned.
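To make the procedure concrete, here is a simplified, self-contained sketch of steps 1-6. It is not the package's actual implementation: the synthetic signal, the interval search and the scoring details are illustrative assumptions.

    import numpy as np
    from scipy.optimize import curve_fit

    def gaussian(x, mu, sigma, amplitude):
        """Gaussian model used for the fits."""
        return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

    def fit_and_score(x, y, interval):
        """Fit a Gaussian to (x, y) restricted to `interval` and count
        how many of the four acceptance criteria are satisfied."""
        lo, hi = interval
        mask = (x >= lo) & (x <= hi)
        if mask.sum() < 3:
            return None, -1
        try:
            popt, pcov = curve_fit(
                gaussian, x[mask], y[mask],
                p0=[x[mask][np.argmax(y[mask])], (hi - lo) / 2, y[mask].max()],
            )
        except RuntimeError:
            return None, -1  # the fit did not converge
        mu, sigma, amplitude = popt
        rel_err = np.sqrt(np.diag(pcov)) / np.abs(popt)
        score = sum([
            lo <= mu <= hi,                 # mu inside the fit interval
            abs(sigma) < (hi - lo),         # sigma smaller than the interval
            amplitude > 0.5 * y.max(),      # peak at least half the maximum
            bool(np.all(rel_err < 0.5)),    # relative uncertainties below 0.5
        ])
        return popt, score

    # Synthetic signal, only to make the sketch self-contained.
    rng = np.random.default_rng(0)
    timeseries = rng.normal(1.0, 0.1, size=20000)

    # Steps 1-2: histogram with automatic binning, smoothed with a
    # 3-point moving average unless there are fewer than 50 bins.
    counts, edges = np.histogram(timeseries, bins="auto")
    centers = 0.5 * (edges[:-1] + edges[1:])
    if len(counts) >= 50:
        counts = np.convolve(counts, np.ones(3) / 3, mode="same")

    # Step 3: absolute maximum of the histogram.
    i_max = int(np.argmax(counts))

    # Step 4a: interval between the two minima surrounding the maximum.
    left = i_max
    while left > 0 and counts[left - 1] <= counts[left]:
        left -= 1
    right = i_max
    while right < len(counts) - 1 and counts[right + 1] <= counts[right]:
        right += 1
    interval_minima = (centers[left], centers[right])

    # Step 4b: interval where the peak stays above half its height.
    half_height = counts[i_max] / 2
    left = i_max
    while left > 0 and counts[left - 1] > half_height:
        left -= 1
    right = i_max
    while right < len(counts) - 1 and counts[right + 1] > half_height:
        right += 1
    interval_half = (centers[left], centers[right])

    # Steps 5-6: keep the converged fit with the best score.
    candidates = [fit_and_score(centers, counts, iv)
                  for iv in (interval_minima, interval_half)]
    best_popt, best_score = max(candidates, key=lambda c: c[1])
    if best_score < 0:
        print("Warning: no Gaussian fit converged, stopping the iteration.")
    else:
        mu, sigma, amplitude = best_popt
        print(f"Selected fit: mu = {mu:.3f}, sigma = {sigma:.3f}")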

Acknowledgements

The comments in the code wouldn't have been possible without the help of ChatGPT.
