# SentencePiece Python Wrapper

Python wrapper for SentencePiece, built with SWIG. This module wraps the `sentencepiece::SentencePieceProcessor` class with the following modifications:

* The `Encode` and `Decode` methods are redefined as `EncodeAsIds`, `EncodeAsPieces`, `DecodeIds` and `DecodePieces`, respectively.
* The `SentencePieceText` proto is not supported.
* `__len__` and `__getitem__` methods are added: `len(obj)` returns the vocabulary size and `obj[key]` returns the vocabulary id of the piece `key`.
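To make the `__len__` and `__getitem__` convention concrete, here is a minimal sketch using a toy stand-in class. `ToyProcessor` and its three-entry vocabulary are hypothetical and exist only for illustration. This is not the SWIG-generated class, and how the real wrapper treats unknown pieces is not covered here.

```python
class ToyProcessor(object):
    """Hypothetical stand-in for the wrapped processor (illustration only)."""

    def __init__(self, pieces):
        # Assumption: a piece's id is simply its position in the vocabulary list.
        self._pieces = list(pieces)
        self._ids = {piece: i for i, piece in enumerate(self._pieces)}

    def GetPieceSize(self):
        return len(self._pieces)

    def PieceToId(self, piece):
        # The toy raises KeyError for unknown pieces; this is a toy choice.
        return self._ids[piece]

    def __len__(self):
        # len(obj) mirrors GetPieceSize().
        return self.GetPieceSize()

    def __getitem__(self, piece):
        # obj[piece] mirrors PieceToId(piece).
        return self.PieceToId(piece)


toy = ToyProcessor(["<unk>", "<s>", "</s>"])
print(len(toy))     # 3
print(toy["</s>"])  # 2
```

With the real module, the same two expressions appear at the end of the Usage session as `len(sp)` and `sp['</s>']`.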

## Build and Install SentencePiece

You need to install SentencePiece before installing this Python wrapper.

Use the pip command to install the SentencePiece Python module:

    % pip install sentencepiece

You can also install the SentencePiece Python module manually as follows:

    % python setup.py build
    % sudo python setup.py install

If you don’t have write permission to the global site-packages directory or don’t want to install into it, try:

    % python setup.py install --user

## Usage

    % python
    >>> import sentencepiece as spm
    >>> sp = spm.SentencePieceProcessor()
    >>> sp.Load("test/test_model.model")
    True
    >>> sp.EncodeAsPieces("This is a test")
    ['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est']
    >>> sp.EncodeAsIds("This is a test")
    [284, 47, 11, 4, 15, 400]
    >>> sp.DecodePieces(['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est'])
    'This is a test'
    >>> sp.DecodeIds([284, 47, 11, 4, 15, 400])
    'This is a test'
    >>> sp.GetPieceSize()
    1000
    >>> sp.IdToPiece(2)
    '</s>'
    >>> sp.PieceToId('</s>')
    2
    >>> len(sp)
    1000
    >>> sp['</s>']
    2
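The `\xe2\x96\x81` bytes in the pieces above are the UTF-8 encoding of U+2581 (`▁`), which marks where a space preceded the piece. The sketch below decodes those byte strings and rebuilds the sentence by hand. `join_pieces` is a hypothetical helper written for this example. It only illustrates the marker convention and may not match everything `DecodePieces` does internally.

```python
# -*- coding: utf-8 -*-
# Pieces copied from the EncodeAsPieces output above, written as byte strings.
pieces_bytes = [b'\xe2\x96\x81This', b'\xe2\x96\x81is', b'\xe2\x96\x81a',
                b'\xe2\x96\x81', b't', b'est']

# U+2581 (LOWER ONE EIGHTH BLOCK) stands in for a space in each piece.
MARKER = u'\u2581'

# Decode the raw bytes into text so the marker becomes a single character.
pieces = [p.decode('utf-8') for p in pieces_bytes]


def join_pieces(pieces):
    """Hypothetical helper: concatenate pieces and turn markers back into spaces."""
    text = u''.join(pieces).replace(MARKER, u' ')
    # The first piece starts with a marker, so drop the resulting leading space.
    return text.lstrip(u' ')


print(join_pieces(pieces))  # This is a test
```

For real use, prefer `sp.DecodePieces` or `sp.DecodeIds`, as shown in the session above.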
