# SentencePiece Python Wrapper
Python wrapper for SentencePiece with SWIG. This module wraps the `sentencepiece::SentencePieceProcessor` class with the following modifications:

* `Encode` and `Decode` methods are re-defined as `EncodeAsIds`, `EncodeAsPieces`, `DecodeIds` and `DecodePieces` respectively.
* `SentencePieceText` proto is not supported.
* Added `__len__` and `__getitem__` methods. `len(obj)` and `obj[key]` return the vocab size and the vocab id respectively.
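The `__len__` and `__getitem__` additions follow Python's standard container protocol. The sketch below is a pure-Python stand-in, not the SWIG wrapper itself: `ToyProcessor` and its three-piece vocabulary are hypothetical, and only the method names `GetPieceSize`, `PieceToId` and `IdToPiece` are taken from the wrapper.

```python
class ToyProcessor:
    """Hypothetical stand-in that mimics the wrapper's vocabulary accessors."""

    def __init__(self, pieces):
        # The position of a piece in the list plays the role of its vocab id.
        self._pieces = list(pieces)
        self._ids = {piece: i for i, piece in enumerate(self._pieces)}

    def GetPieceSize(self):
        return len(self._pieces)

    def PieceToId(self, piece):
        # Assumption of this toy: unknown pieces raise KeyError.
        return self._ids[piece]

    def IdToPiece(self, idx):
        return self._pieces[idx]

    # The two methods the wrapper adds: len(obj) and obj[key].
    def __len__(self):
        return self.GetPieceSize()

    def __getitem__(self, piece):
        return self.PieceToId(piece)


toy = ToyProcessor(["<unk>", "<s>", "</s>"])  # made-up vocabulary
print(len(toy))     # vocab size
print(toy["</s>"])  # vocab id of the piece
```

With this delegation, `len(obj)` and `obj.GetPieceSize()` always agree, as do `obj[key]` and `obj.PieceToId(key)`, which matches the pairing described in the list above.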
## Build and Install SentencePiece

You need to install SentencePiece before installing this Python wrapper.

You can simply use the pip command to install the SentencePiece Python module.
```
% pip install sentencepiece
```
To install the wrapper manually, try the following commands:

```
% python setup.py build
% sudo python setup.py install
```
If you don’t have write permission to the global site-packages directory or don’t want to install into it, please try:

```
% python setup.py install --user
```
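Whichever install route is used, it can help to confirm that the interpreter actually sees the module. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of SentencePiece.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once the wrapper is visible on this interpreter's search path.
print(is_installed("sentencepiece"))
```

A `False` result after a `--user` install may mean that a different Python interpreter is on the path than the one used to run `setup.py`.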
## Usage
```
% python
>>> import sentencepiece as spm
>>> sp = spm.SentencePieceProcessor()
>>> sp.Load("test/test_model.model")
True
>>> sp.EncodeAsPieces("This is a test")
['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est']
>>> sp.EncodeAsIds("This is a test")
[284, 47, 11, 4, 15, 400]
>>> sp.DecodePieces(['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est'])
'This is a test'
>>> sp.DecodeIds([284, 47, 11, 4, 15, 400])
'This is a test'
>>> sp.GetPieceSize()
1000
>>> sp.IdToPiece(2)
'</s>'
>>> sp.PieceToId('</s>')
2
>>> len(sp)
1000
>>> sp['</s>']
2
```
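The session above encodes a sentence and decodes it back to the same string. A small helper can check that round trip for any object exposing the four wrapper methods. This is a sketch under stated assumptions: `roundtrip_ok` and `WhitespaceStub` are hypothetical names, and the stub splits on spaces instead of using a trained model, so that the example runs without a model file. The byte sequence `\xe2\x96\x81` in the session is the UTF-8 encoding of U+2581, which the stub also uses as its word-boundary marker.

```python
def roundtrip_ok(sp, text):
    """Return True if decoding the encoded text reproduces it exactly."""
    ids = sp.EncodeAsIds(text)
    pieces = sp.EncodeAsPieces(text)
    return sp.DecodeIds(ids) == text and sp.DecodePieces(pieces) == text


class WhitespaceStub:
    """Hypothetical stand-in: one piece per space-separated word, no model."""

    def __init__(self):
        self._vocab = []  # ids are assigned in order of first appearance

    def EncodeAsPieces(self, text):
        # Prefix every word with U+2581 to mark the word boundary.
        return ["\u2581" + word for word in text.split(" ")]

    def EncodeAsIds(self, text):
        ids = []
        for piece in self.EncodeAsPieces(text):
            if piece not in self._vocab:
                self._vocab.append(piece)
            ids.append(self._vocab.index(piece))
        return ids

    def DecodePieces(self, pieces):
        # Turn markers back into spaces and drop the leading one.
        return "".join(pieces).replace("\u2581", " ")[1:]

    def DecodeIds(self, ids):
        return self.DecodePieces([self._vocab[i] for i in ids])


print(roundtrip_ok(WhitespaceStub(), "This is a test"))
```

Passing a loaded `SentencePieceProcessor` instead of the stub should work the same way, since the helper relies only on the four method names listed in this README.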