High-dimensional embedding generation library

# HiDi: Pipelines for Embeddings

HiDi is a library for high-dimensional embedding generation for collaborative
filtering applications.

## How Do I Use It?

The following example reads user-item pairs from a CSV file, builds item embeddings, and writes them to disk.

```python
from hidi import inout, clean, matrix, pipeline


# CSV file with link_id and item_id columns
in_files = ['hidi/examples/data/user-item.csv']

# File to write output data to
outfile = 'embeddings.csv'

transforms = [
    inout.ReadTransform(in_files),      # Read data from disk
    clean.DedupeTransform(),            # Dedupe it
    matrix.SparseTransform(),           # Make a sparse user*item matrix
    matrix.SimilarityTransform(),       # To item*item similarity matrix
    matrix.SVDTransform(),              # Perform SVD dimensionality reduction
    matrix.ItemsMatrixToDFTransform(),  # Make a DataFrame with an index
    inout.WriteTransform(outfile)       # Write results to csv
]

pl = pipeline.Pipeline(transforms)
pl.run()
```
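
To make the data flow in the example above more concrete, here is a minimal pure-Python sketch of the general pattern: each step takes the previous step's output and returns something new. It is not HiDi's implementation. The class names (`DedupeStep`, `CooccurrenceStep`, `SimplePipeline`) and the `transform` method are hypothetical, raw co-occurrence counts stand in for a real similarity measure, and the SVD step is left out entirely.

```python
from collections import defaultdict
from itertools import combinations


class DedupeStep:
    """Hypothetical step: drop repeated (link_id, item_id) pairs."""

    def transform(self, rows):
        # dict.fromkeys keeps the first occurrence and preserves order
        return list(dict.fromkeys(rows))


class CooccurrenceStep:
    """Hypothetical step: count how many links contain both items of a pair."""

    def transform(self, rows):
        # Group the items seen under each link_id
        items_by_link = defaultdict(set)
        for link_id, item_id in rows:
            items_by_link[link_id].add(item_id)

        # Every unordered item pair within a link counts once
        counts = defaultdict(int)
        for items in items_by_link.values():
            for a, b in combinations(sorted(items), 2):
                counts[(a, b)] += 1
        return dict(counts)


class SimplePipeline:
    """Hypothetical runner: feed each step's output into the next step."""

    def __init__(self, steps):
        self.steps = steps

    def run(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data


# Toy (link_id, item_id) pairs, made up for illustration
rows = [
    ("u1", "apple"), ("u1", "pear"), ("u1", "apple"),  # duplicate pair
    ("u2", "apple"), ("u2", "pear"), ("u2", "plum"),
]
result = SimplePipeline([DedupeStep(), CooccurrenceStep()]).run(rows)
print(result)
```

In this toy data, `apple` and `pear` appear together under both links, so that pair gets the highest count. A real pipeline would normalize such counts into similarities and then reduce their dimensionality, as the comments on the HiDi transforms describe.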

## Setup

### Requirements

HiDi is tested against CPython 2.7, 3.4, 3.5, and 3.6. It may work with
other versions of CPython.

### Installation

To install HiDi, run:

```sh
$ pip install hidi
```

## Run the Tests

```sh
$ pip install tox
$ tox
```
