Tool for extracting features from receipts issued by Russian stores

Receipt2vec

A framework for converting a product line from a receipt into a vector representation. It is a feature engineering method: users can build their own models on the resulting features, such as churn models, recommendations, and so on. In particular, the features can be exchanged with partners.

Installation

python3 -m venv venv
source venv/bin/activate
pip install receipt2vec

Usage

CLI

Converting a test file in CSV format.

$ receipt2vec --help
usage: receipt2vec [-h] -i INPUT -o OUTPUT [--batch BATCH] [--gpu GPU]
                   [--bpe BPE] [--encoder ENCODER]
                   [--write_header WRITE_HEADER] [--use_columns USE_COLUMNS]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to input file with receipts and prices
  -o OUTPUT, --output OUTPUT
                        Path to output file with results
  --batch BATCH         Batch size [Default 128]
  --gpu GPU             Num of gpu. If cpu use -1 [Default -1]
  --bpe BPE             Path to bpe model file. If None - used default model
                        [Default None]
  --encoder ENCODER     Name of encoder model. If None - used default model
                        [Default None]
  --write_header WRITE_HEADER
                        Write header to the output file [0 or 1. Default 0]
  --use_columns USE_COLUMNS
                        A string of columns separated by ',' from the input
                        file that will be written to the output file

Example input file

The file must contain two columns with headers: receipt[string],price[float]

$ head items_.csv 
receipt,price
"Бутылка 1,0 Литр",8.0
Борщ с фасолью и сметаной,46.0
БЗМЖ СЫР PRETTO МОЦАРЕЛЛА ДЛЯ ,109.9
"БАЛТИКА №3 Пиво свет фильтр паст 4,8",52.99
Аккумулятор холода  800 млLTAK0048,139.9
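
The input format above can be produced from Python with the standard csv module. The following is a minimal sketch, not part of receipt2vec: the helper name write_receipts and the UTF-8 encoding are assumptions, and the rows are copied from the example. Note that receipt strings containing a comma are quoted automatically, as in the sample file.

```python
import csv
import io

# Rows taken from the example input above: (receipt text, price).
rows = [
    ("Бутылка 1,0 Литр", 8.0),
    ("Борщ с фасолью и сметаной", 46.0),
    ("БАЛТИКА №3 Пиво свет фильтр паст 4,8", 52.99),
]


def write_receipts(stream, rows):
    """Write rows as CSV with the receipt,price header (hypothetical helper)."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["receipt", "price"])
    writer.writerows(rows)


# Write to an in-memory buffer so the sketch runs without touching the disk.
buffer = io.StringIO()
write_receipts(buffer, rows)
csv_text = buffer.getvalue()
print(csv_text)

# To produce a file for the CLI instead (UTF-8 is an assumption):
# with open("items_.csv", "w", encoding="utf-8", newline="") as f:
#     write_receipts(f, rows)
```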

Usage

$ receipt2vec -i items_.csv -o items.vec
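
When the conversion is driven from a script, the command line can be assembled from the flags listed in the help output. The sketch below only builds the argument list; the function name build_receipt2vec_command is hypothetical, and the flag semantics are taken from the help text rather than verified against the tool.

```python
import shlex


def build_receipt2vec_command(input_path, output_path, batch=None, gpu=None,
                              write_header=None, use_columns=None):
    """Return an argv list for receipt2vec; None leaves the CLI default in place."""
    argv = ["receipt2vec", "-i", str(input_path), "-o", str(output_path)]
    if batch is not None:
        argv += ["--batch", str(batch)]
    if gpu is not None:
        argv += ["--gpu", str(gpu)]
    if write_header is not None:
        # The help text documents this flag as 0 or 1.
        argv += ["--write_header", str(int(write_header))]
    if use_columns:
        # The help text asks for column names separated by ','.
        argv += ["--use_columns", ",".join(use_columns)]
    return argv


argv = build_receipt2vec_command("items_.csv", "items.vec", batch=256,
                                 write_header=1,
                                 use_columns=["receipt", "price"])
print(shlex.join(argv))

# To actually run it (requires receipt2vec to be installed):
# import subprocess
# subprocess.run(argv, check=True)
```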

Importing the model

>>> from receipt2vec.model import Receipt2vecEncoder
>>> model = Receipt2vecEncoder()
>>> vec = model('БАЛТИКА №3 Пиво свет фильтр паст 4,8', 52.99)
>>> print(vec.shape)
torch.Size([256])
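
One simple way to use such vectors is to compare two receipt lines by cosine similarity. This is an illustrative sketch rather than a feature of receipt2vec: the function cosine_similarity is written here from scratch, and short toy vectors stand in for the 256-dimensional model output so that the code runs without the package installed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# With the package installed, real vectors could be obtained as in the
# session above and converted to plain lists with the tensor's tolist():
# model = Receipt2vecEncoder()
# beer = model('БАЛТИКА №3 Пиво свет фильтр паст 4,8', 52.99).tolist()

# Toy 3-dimensional stand-ins (not real model output).
u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]   # same direction as u
w = [2.0, -1.0, 0.0]  # orthogonal to u

print(cosine_similarity(u, v))
print(cosine_similarity(u, w))
```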

