# SentencePiece Python Wrapper

Python wrapper for SentencePiece, built with SWIG. This module wraps the `sentencepiece::SentencePieceProcessor` class with the following modifications:

* The Encode and Decode methods are redefined as EncodeAsIds, EncodeAsPieces, DecodeIds, and DecodePieces, respectively.
* The SentencePieceText proto is not supported.
* `__len__` and `__getitem__` methods are added. `len(obj)` and `obj[key]` return the vocabulary size and the vocabulary ID of a piece, respectively.
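The last modification can be pictured with a small stand-in class. This is only a sketch: `ToyVocab` and its three-entry piece list are hypothetical and are not the wrapper's implementation. It shows how `len(obj)` and `obj[key]` can delegate to `GetPieceSize` and `PieceToId`, the two method names that appear in the usage session of this README.

```python
class ToyVocab(object):
    """Hypothetical stand-in for the wrapped processor (not the real class)."""

    def __init__(self, pieces):
        # The position of each piece in the list is used as its vocabulary ID.
        self._pieces = list(pieces)
        self._ids = {piece: i for i, piece in enumerate(self._pieces)}

    def GetPieceSize(self):
        return len(self._pieces)

    def PieceToId(self, piece):
        # Assumption: this toy raises KeyError for unknown pieces; it does not
        # model how the real wrapper treats them.
        return self._ids[piece]

    def __len__(self):
        # len(obj) reports the vocabulary size.
        return self.GetPieceSize()

    def __getitem__(self, piece):
        # obj[piece] reports the vocabulary ID of that piece.
        return self.PieceToId(piece)


# Illustrative vocabulary; a real model defines its own pieces and IDs.
vocab = ToyVocab(["<unk>", "<s>", "</s>"])
print(len(vocab))
print(vocab["</s>"])
```

With a real model loaded, `len(sp)` and `sp['</s>']` are used the same way.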

## Build and Install SentencePiece

You need to install SentencePiece before installing this Python wrapper.

You can simply use the pip command to install the SentencePiece Python module.

```
% pip install sentencepiece
```

To install the wrapper manually, try the following commands:

```
% python setup.py build
% sudo python setup.py install
```

If you don't have write permission to the global site-packages directory or don't want to install into it, please try:

```
% python setup.py install --user
```

## Usage

```
% python
>>> import sentencepiece as spm
>>> sp = spm.SentencePieceProcessor()
>>> sp.Load("test/test_model.model")
True
>>> sp.EncodeAsPieces("This is a test")
['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est']
>>> sp.EncodeAsIds("This is a test")
[284, 47, 11, 4, 15, 400]
>>> sp.DecodePieces(['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est'])
'This is a test'
>>> sp.DecodeIds([284, 47, 11, 4, 15, 400])
'This is a test'
>>> sp.GetPieceSize()
1000
>>> sp.IdToPiece(2)
'</s>'
>>> sp.PieceToId('</s>')
2
>>> len(sp)
1000
>>> sp['</s>']
2
```
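The `\xe2\x96\x81` prefix in the pieces above is the UTF-8 encoding of U+2581 (`▁`), the meta symbol SentencePiece uses to mark whitespace. The session prints it as escaped bytes because the pieces are shown as byte strings. The sketch below decodes those bytes and rebuilds the sentence by concatenating the pieces and turning the meta symbol back into spaces. The helper names `bytes_to_pieces` and `join_pieces` are illustrative assumptions, not part of the wrapper, and the joining logic only approximates what `DecodePieces` returns for this example.

```python
# -*- coding: utf-8 -*-
META = u"\u2581"  # U+2581, the whitespace marker seen as \xe2\x96\x81 in UTF-8


def bytes_to_pieces(raw_pieces):
    """Decode UTF-8 byte strings into text pieces (hypothetical helper)."""
    return [piece.decode("utf-8") for piece in raw_pieces]


def join_pieces(pieces):
    """Concatenate pieces and map the meta symbol back to spaces.

    Assumption: the leading space produced by the first marker is dropped.
    """
    return u"".join(pieces).replace(META, u" ").lstrip(u" ")


# The byte strings printed by EncodeAsPieces in the session above.
raw = [b"\xe2\x96\x81This", b"\xe2\x96\x81is", b"\xe2\x96\x81a",
       b"\xe2\x96\x81", b"t", b"est"]

pieces = bytes_to_pieces(raw)
text = join_pieces(pieces)
print(text)  # This is a test
```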
