
High-dimensional embedding generation library


HiDi is a library for high-dimensional embedding generation for collaborative filtering applications.

Read the full documentation.

How Do I Use It?

This will get you started.

from hidi import inout, clean, matrix, pipeline


# CSV file with link_id and item_id columns
in_files = ['hidi/examples/data/user-item.csv']

# File to write output data to
outfile = 'embeddings.csv'

transforms = [
    inout.ReadTransform(in_files),      # Read data from disk
    clean.DedupeTransform(),            # Dedupe it
    matrix.SparseTransform(),           # Make a sparse user*item matrix
    matrix.SimilarityTransform(),       # To item*item similarity matrix
    matrix.SVDTransform(),              # Perform SVD dimensionality reduction
    matrix.ItemsMatrixToDFTransform(),  # Make a DataFrame with an index
    inout.WriteTransform(outfile)       # Write results to csv
]

pl = pipeline.Pipeline(transforms)
pl.run()
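To make the data flow above concrete, here is a minimal standard-library sketch of the first few stages (dedupe, user-item grouping, item-item similarity) chained in the same "list of steps" style. It is an illustration only and not HiDi's implementation: `ToyPipeline`, `dedupe`, `to_user_items`, and `to_item_similarity` are hypothetical names, similarity is assumed to be a plain co-occurrence count, and the SVD and I/O stages are left out.

```python
from collections import defaultdict
from itertools import combinations


def dedupe(pairs):
    """Drop repeated (link_id, item_id) pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


def to_user_items(pairs):
    """Group item ids by link id (a dict-of-sets stand-in for a sparse matrix)."""
    user_items = defaultdict(set)
    for link_id, item_id in pairs:
        user_items[link_id].add(item_id)
    return dict(user_items)


def to_item_similarity(user_items):
    """Count, for each item pair, how many link ids contain both items."""
    similarity = defaultdict(int)
    for items in user_items.values():
        for a, b in combinations(sorted(items), 2):
            similarity[(a, b)] += 1
    return dict(similarity)


class ToyPipeline:
    """Hypothetical pipeline: feeds each step's output into the next step."""

    def __init__(self, steps):
        self.steps = list(steps)

    def run(self, data):
        for step in self.steps:
            data = step(data)
        return data


# Toy input: (link_id, item_id) pairs, with one duplicate row.
pairs = [
    ("u1", "a"), ("u1", "b"), ("u1", "a"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
]

toy = ToyPipeline([dedupe, to_user_items, to_item_similarity])
result = toy.run(pairs)
print(result)
```

Each step only needs to accept the previous step's output, which is what lets the stages be reordered or swapped without touching the pipeline itself.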

Setup

Requirements

HiDi is tested against CPython 2.7, 3.4, 3.5, and 3.6. It may work with other versions of CPython, but they are untested.

Installation

To install HiDi, run:

$ pip install hidi
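One way to confirm the install worked is to check that the `hidi` module (the import name used in the example above) can be found by the interpreter. This is a small sketch assuming Python 3, since `importlib.util.find_spec` is not available on 2.7; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("hidi"):
        print("hidi is importable")
    else:
        print("hidi was not found; try 'pip install hidi'")
```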

Run the Tests

$ pip install tox
$ tox


Source Distribution

HiDi-0.0.3.tar.gz (8.3 kB view details)

Uploaded Source

File details

Details for the file HiDi-0.0.3.tar.gz.

File metadata

  • Download URL: HiDi-0.0.3.tar.gz
  • Upload date:
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for HiDi-0.0.3.tar.gz
Algorithm Hash digest
SHA256 2a1edfecc8ffd0afc8eea010e79f4ca1ebb201f7107583edf874cf2fb443c86f
MD5 9fca72240802b408fd5c1bf996f279f1
BLAKE2b-256 1c7768c28a07ce8e0344a2cef0c7c32ebecd58e4eae82592c60244489b6a15a0
