Packages related to Consistent Weighted Sampling Algorithms
Project description
#cwslib
Introduction
This module contains the following algorithms: the standard MinHash algorithm for binary sets and several Consistent Weighted Sampling algorithms (CWS, ICWS, I2CWS, PCWS, CCWS, 0-bit CWS, SCWS).
Each algorithm converts a data instance (i.e., a vector) into a hash code of the specified length and reports the time taken to encode it.
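To make the idea of turning a weighted vector into a fixed-length hash code concrete, here is a minimal NumPy sketch of ICWS, one of the algorithms listed above. It is an illustration written for this description, not the code shipped in cwslib: the function name `icws_sketch` and its parameters (`weights`, `num_hashes`, `seed`) are hypothetical, and it returns the integer quantization step `t` rather than the value `y = exp(r * (t - beta))`, which carries the same information.

```python
import numpy as np

def icws_sketch(weights, num_hashes, seed=0):
    """Illustrative ICWS sketch (hypothetical helper, not the cwslib API).

    Returns two arrays of length num_hashes: the selected feature index k
    and the integer quantization step t for each hash function.
    """
    weights = np.asarray(weights, dtype=float)
    n_features = weights.shape[0]
    active = np.flatnonzero(weights > 0)
    if active.size == 0:
        raise ValueError("the vector needs at least one positive weight")

    # Random variables are drawn per (hash, feature) from a fixed seed, so
    # every vector of the same dimension sees the same draws (consistency).
    rng = np.random.default_rng(seed)
    r = rng.gamma(2.0, 1.0, size=(num_hashes, n_features))
    c = rng.gamma(2.0, 1.0, size=(num_hashes, n_features))
    beta = rng.uniform(0.0, 1.0, size=(num_hashes, n_features))

    # Restrict to features with positive weight.
    r_a, c_a, beta_a = r[:, active], c[:, active], beta[:, active]
    t = np.floor(np.log(weights[active]) / r_a + beta_a)
    y = np.exp(r_a * (t - beta_a))
    a = c_a / (y * np.exp(r_a))

    # For each hash function, keep the feature with the smallest a.
    best = np.argmin(a, axis=1)
    rows = np.arange(num_hashes)
    return active[best], t[rows, best].astype(np.int64)


# Two weighted vectors over the same five features.
u = [1.0, 2.0, 0.0, 3.0, 0.5]
v = [1.0, 1.0, 1.0, 3.0, 0.0]
k_u, t_u = icws_sketch(u, num_hashes=2000, seed=1)
k_v, t_v = icws_sketch(v, num_hashes=2000, seed=1)

# The fraction of positions where both k and t agree estimates the
# generalized Jaccard similarity sum(min(u, v)) / sum(max(u, v)).
estimate = float(np.mean((k_u == k_v) & (t_u == t_v)))
exact = np.minimum(u, v).sum() / np.maximum(u, v).sum()
print(f"estimate={estimate:.3f} exact={exact:.3f}")
```

Both vectors must be hashed with the same seed and the same number of hash functions; otherwise the sketches are not comparable.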
Installation
pip install cwslib
The homepage of the toolbox is here.
Usage
# Input data: {array-like, sparse matrix}, shape (n_features, n_instances), format='csc'
# A data matrix in which each row is a feature and each column is a data instance.

# Import the necessary libraries/modules
import os
from os.path import basename

from scipy.io import loadmat, savemat
from scipy.sparse import csr_matrix
from cwslib.CWSlib import ConsistentWeightedSampling

# List of MATLAB file paths containing data
mat_files = [···]

# Iterate over each MATLAB file
for mat_file in mat_files:
    # Load data from the MATLAB file
    mat_data = loadmat(mat_file)
    # Extract the 'jaccard' array from the loaded data
    arr = mat_data['jaccard']
    # Convert the array into a Compressed Sparse Row matrix
    data = csr_matrix(arr)

    # Iterate over a range of fingerprint lengths
    for dimension_num in range(10, 100, 10):
        # Apply the chosen algorithm to generate fingerprints.
        # `algorithms-name` is a placeholder: replace it with the method
        # of the algorithm you want to run.
        cws = ConsistentWeightedSampling(data, dimension_num)
        fingerprints_k, fingerprints_y, elapsed = cws.algorithms-name()

        # Print information about the current run
        print(basename(mat_file), 'dimension_num =', dimension_num,
              'algorithms-name-elapsed =', elapsed, 'seconds')

        # Define the directory in which to save the generated MATLAB files
        save_path = "D:\\desktop\\mat\\"
        # Create the directory if it doesn't exist
        os.makedirs(save_path, exist_ok=True)
        # Construct the file name for the saved MATLAB file
        file_name = basename(mat_file) + '-cws-' + str(dimension_num) + '.mat'
        # Combine the directory path and file name
        file_path = os.path.join(save_path, file_name)
        # Save the fingerprints into a new MATLAB file
        savemat(file_path, {'fingerprints_k': fingerprints_k,
                            'fingerprints_y': fingerprints_y})
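The comments at the top of the usage code ask for a matrix of shape (n_features, n_instances), which is the transpose of the common instances-by-features layout. The sketch below shows one way to prepare such an input with SciPy, and one way to compare two fingerprints once they have been produced. Both helper names (`to_feature_major_csc`, `collision_rate`) are hypothetical and not part of cwslib, and the comparison assumes each fingerprint is available as a pair of equal-length 1-D arrays (k, y); how cwslib lays out `fingerprints_k` and `fingerprints_y` is not stated here, so slice them accordingly.

```python
import numpy as np
from scipy.sparse import csc_matrix

def to_feature_major_csc(instances):
    """Convert an (n_instances, n_features) array to a CSC matrix of
    shape (n_features, n_instances). Hypothetical helper."""
    dense = np.asarray(instances, dtype=float)
    return csc_matrix(dense.T)

def collision_rate(k_a, y_a, k_b, y_b):
    """Fraction of hash positions at which two fingerprints agree on
    both components. Hypothetical helper; assumes 1-D inputs of equal
    length, one entry per hash function."""
    k_a, y_a, k_b, y_b = (np.asarray(v).ravel() for v in (k_a, y_a, k_b, y_b))
    return float(np.mean((k_a == k_b) & (y_a == y_b)))


# Two instances described by three non-negative feature weights each.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
data = to_feature_major_csc(X)
print(data.shape, data.format)
```

For consistent weighted sampling schemes that return a (k, y) pair per hash function, this collision rate is the usual estimator of the generalized Jaccard similarity between the two instances.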
Download files
Source Distribution
cwslib-1.0.0.tar.gz (34.9 kB)
File details
Details for the file cwslib-1.0.0.tar.gz.
File metadata
- Download URL: cwslib-1.0.0.tar.gz
- Upload date:
- Size: 34.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 52479b0f0cf315e5277a8b838654640c21c5e9fdd16535a937043afad24ae5cd
MD5 | c6e301e702780c9b1caba72cf7367673
BLAKE2b-256 | 125545df89cdc34813f6bb9d333941ef59a2651ae2423370ec4b16032af0832b