# SentencePiece Python Wrapper

Python wrapper for SentencePiece, built with SWIG. This module wraps the `sentencepiece::SentencePieceProcessor` class with the following modifications:

* The `Encode` and `Decode` methods are redefined as `EncodeAsIds` and `EncodeAsPieces`, and as `DecodeIds` and `DecodePieces`, respectively.
* The `SentencePieceText` proto is not supported.
* `__len__` and `__getitem__` methods are added: `len(obj)` returns the vocabulary size, and `obj[key]` returns the vocabulary id of the piece `key`.
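The relationship between `len(obj)`, `obj[key]` and the underlying methods can be illustrated with a small pure-Python stand-in. This is only a sketch: `ToyProcessor` is a hypothetical class, not the SWIG-generated one, and its three-piece vocabulary and its handling of unknown pieces are assumptions made for illustration.

```python
class ToyProcessor:
    """Hypothetical stand-in for SentencePieceProcessor (not the real SWIG class)."""

    def __init__(self, pieces):
        self._pieces = list(pieces)
        # Map each piece to its position in the vocabulary.
        self._ids = {piece: i for i, piece in enumerate(self._pieces)}

    def GetPieceSize(self):
        return len(self._pieces)

    def PieceToId(self, piece):
        # Assumption: unknown pieces map to id 0 here, purely for illustration.
        return self._ids.get(piece, 0)

    def __len__(self):
        # len(obj) delegates to the vocabulary size.
        return self.GetPieceSize()

    def __getitem__(self, piece):
        # obj[piece] delegates to the piece-to-id lookup.
        return self.PieceToId(piece)


toy = ToyProcessor(["<unk>", "<s>", "</s>"])
print(len(toy))     # 3
print(toy["</s>"])  # 2
```

With the real wrapper, the same two expressions would be expected to agree with `GetPieceSize()` and `PieceToId()`.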

## Build and Install SentencePiece

You need to install SentencePiece before installing this Python wrapper.

The simplest way to install the SentencePiece Python module is with the pip command:

```
% pip install sentencepiece
```

To install the wrapper manually, try the following commands:

```
% python setup.py build
% sudo python setup.py install
```

If you don't have write permission to the global site-packages directory, or don't want to install into it, try:

```
% python setup.py install --user
```
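After any of the install routes above, one way to check that the module is visible to the current interpreter is to look it up on the import path. This is a minimal sketch for Python 3.4 or later; `is_importable` is a hypothetical helper name, not part of SentencePiece.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once the wrapper is installed for this interpreter.
print(is_importable("sentencepiece"))
```

A `False` result after a `--user` install may mean that a different interpreter is being used than the one that ran `setup.py`.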

## Usage

```
% python
>>> import sentencepiece as spm
>>> sp = spm.SentencePieceProcessor()
>>> sp.Load("test/test_model.model")
True
>>> sp.EncodeAsPieces("This is a test")
['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est']
>>> sp.EncodeAsIds("This is a test")
[284, 47, 11, 4, 15, 400]
>>> sp.DecodePieces(['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est'])
'This is a test'
>>> sp.DecodeIds([284, 47, 11, 4, 15, 400])
'This is a test'
>>> sp.GetPieceSize()
1000
>>> sp.IdToPiece(2)
'</s>'
>>> sp.PieceToId('</s>')
2
>>> len(sp)
1000
>>> sp['</s>']
2
```
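The `\xe2\x96\x81` prefix in the pieces above is the UTF-8 encoding of U+2581 (`▁`), the symbol SentencePiece uses to mark whitespace. The sketch below, written for Python 3, decodes the byte strings from the session and joins them back into text. It only approximates what `DecodePieces` does for this one example and is not the library's implementation.

```python
# Pieces copied from the session above, written as Python 3 bytes literals.
pieces = [b'\xe2\x96\x81This', b'\xe2\x96\x81is', b'\xe2\x96\x81a',
          b'\xe2\x96\x81', b't', b'est']

# Decode each piece from UTF-8; the leading three bytes become U+2581.
decoded = [p.decode('utf-8') for p in pieces]

# Join the pieces, turn the whitespace marker back into spaces,
# and drop the space produced by the first marker.
text = ''.join(decoded).replace('\u2581', ' ').lstrip(' ')

print(decoded[0] == '\u2581This')  # True
print(text)                        # This is a test
```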
