An implementation of Anchor Graph Hashing

These details have not been verified by PyPI

Project links

Homepage

Project description

aghasher

An implementation of the Anchor Graph Hashing algorithm (AGH-1), presented in Hashing with Graphs (Liu et al. 2011).

Dependencies

aghasher supports Python 2.7 and Python 3, with numpy and scipy. These should be linked with a BLAS implementation (e.g., OpenBLAS, ATLAS, Intel MKL). Without being linked to BLAS, numpy/scipy will use a fallback that causes PyAnchorGraphHasher to run over 50x slower.

Installation

aghasher is available on PyPI, the Python Package Index.

$ pip install aghasher

How To Use

To use aghasher, first import the aghasher module.

import aghasher

Training a Model

An AnchorGraphHasher is constructed using the train method, which returns an AnchorGraphHasher and the hash bit embedding for the training data.

agh, H_train = aghasher.AnchorGraphHasher.train(X, anchors, num_bits, nn_anchors, sigma)

AnchorGraphHasher.train takes 5 arguments:

X An n-by-d numpy.ndarray with training data. The rows correspond to n observations, and the columns correspond to d dimensions.
anchors An m-by-d numpy.ndarray with anchors. m is the total number of anchors. Rows correspond to anchors, and columns correspond to dimensions. The dimensionality of the anchors much match the dimensionality of the training data.
num_bits (optional; defaults to 12) Number of hash bits for the embedding.
nn_anchors (optional; defaults to 2) Number of nearest anchors that are used for approximating the neighborhood structure.
sigma (optional; defaults to None) sigma for the Gaussian radial basis function that is used to determine similarity between points. When sigma is specified as None, the code will automatically set a value, depending on the training data and anchors.

Hashing Data with an AnchorGraphHasher Model

With an AnchorGraphHasher object, which has variable name agh in the preceding and following examples, hashing out-of-sample data is done with the object's hash method.

agh.hash(X)

The hash method takes one argument:

X An n-by-d numpy.ndarray with data. The rows correspond to n observations, and the columns correspond to d dimensions. The dimensionality of the data much match the dimensionality of the training data used to train the AnchorGraphHasher.

Since Python does not have a native bit vector data structure, the hash method returns an n-by-r numpy.ndarray, where n is the number of observations in data, and r is the number of hash bits specified when the model was trained. The elements of the returned array are boolean values that correspond to bits.

Testing an AnchorGraphHasher Model

Testing is performed with the AnchorGraphHasher.test method.

precision = AnchorGraphHasher.test(H_train, H_test, y_train, y_test, radius)

AnchorGraphHasher.test takes 5 arguments:

H_train An n-by-r numpy.ndarray with the hash bit embedding corresponding to the training data. The rows correspond to the n observations, and the columns correspond to the r hash bits.
H_test An m-by-r numpy.ndarray with the hash bit embedding corresponding to the testing data. The rows correspond to the m observations, and the columns correspond to the r hash bits.
y_train An n-by-1 numpy.ndarray with the ground truth labels for the training data.
y_test An m-by-1 numpy.ndarray with the ground truth labels for the testing data.
radius (optional; defaults to 2) Hamming radius to use for calculating precision.

Tests

Tests are in tests/.

# Run tests
$ python3 -m unittest discover tests -v

Differences from the Matlab Reference Implementation

The code is structured differently than the Matlab reference implementation.

The Matlab code implements an additional hashing method, hierarchical hashing (referred to as 2-AGH), an extension of 1-AGH that is not implemented here.

There is one functional difference relative to the Matlab code. If sigma is specified (as opposed to being auto-estimated), then for the same value of sigma, the Matlab and Python code will produce different results. They will produce the same results when the Matlab sigma is sqrt(2) times bigger than the manually specified sigma in the Python code. This is because in the Gaussian RBF kernel, the Python code uses a 2 in the denominator of the exponent, and the Matlab code does not. A 2 was included in the denominator of the Python code, as that is the canonical way to use an RBF kernel.

License

aghasher has an MIT License.

See LICENSE.

References

Liu, Wei, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. “Hashing with Graphs.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), edited by Lise Getoor and Tobias Scheffer, 1–8. ICML ’11. New York, NY, USA: ACM.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

0.1.1

Dec 25, 2022

This version

0.1.0

Mar 13, 2020

0.0.2

Nov 7, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aghasher-0.1.0.tar.gz (6.6 kB view details)

Uploaded Mar 13, 2020 Source

Built Distribution

aghasher-0.1.0-py2.py3-none-any.whl (7.8 kB view details)

Uploaded Aug 28, 2020 Python 2Python 3

File details

Details for the file aghasher-0.1.0.tar.gz.

File metadata

Download URL: aghasher-0.1.0.tar.gz
Upload date: Mar 13, 2020
Size: 6.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.4

File hashes

Hashes for aghasher-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`8fdf03da8edad3e5264a011025bd6ac6437eed914726b5d46d949640cd7a48d4`
MD5	`b88c40192a91467c287003d542c9cb1d`
BLAKE2b-256	`6bdb585d37675bbecbd320d20d0a744853bad570ea3b0a7ce69d99ffd66f7feb`

See more details on using hashes here.

File details

Details for the file aghasher-0.1.0-py2.py3-none-any.whl.

File metadata

Download URL: aghasher-0.1.0-py2.py3-none-any.whl
Upload date: Aug 28, 2020
Size: 7.8 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.2

File hashes

Hashes for aghasher-0.1.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`31be1dcbca988acdf9ec66321fd4756094204484c8d5d9bd0f564280d4451fe6`
MD5	`473e2adbf3f4e7335dfd8c3f0a1972ae`
BLAKE2b-256	`702ab2ef0a3e7097ff87934ad9728f7c47344d979f04f8cc788406cb50e94c49`

See more details on using hashes here.

aghasher 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

aghasher

Dependencies

Installation

How To Use

Training a Model

Hashing Data with an AnchorGraphHasher Model

Testing an AnchorGraphHasher Model

Tests

Differences from the Matlab Reference Implementation

License

References

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes