Morphology-aware BPE tokenizer for Philippine languages (Tagalog)
Filipino Tokenizer
A morphology-aware BPE tokenizer for Philippine languages.
Existing subword tokenizers (SentencePiece, HuggingFace BPE) treat Filipino text as raw character sequences. They have no knowledge of Filipino morphology, so they routinely split words at linguistically meaningless points. A word like pinakamahusay ("the best") gets fragmented into arbitrary substrings instead of its actual morphemes: pinaka- + ma- + husay.
This project addresses that. It combines a rule-based morphological segmenter with a constrained BPE algorithm that never merges across morpheme boundaries. The result is a tokenizer whose tokens line up with Filipino morphemes, at the cost of somewhat more tokens per word than unconstrained tokenizers (see Evaluation).
Before and After
Consider the sentence: Kumain siya ng masarap na pagkain.
A generic BPE tokenizer might produce:
["Ku", "main", " siya", " ng", " mas", "ar", "ap", " na", " pag", "ka", "in", "."]
This tokenizer understands that kumain contains the infix -um- and root kain, and that pagkain is prefix pag- plus the same root kain:
["k", "um", "ain", " ", "siya", " ", "ng", " ", "ma", "sarap", " ", "na", " ", "pag", "kain", "."]
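In pagkain, the root kain is preserved as a single token. In kumain, the infix -um- sits inside the root, so kain surfaces as k + ain on either side of the infix. Splitting both words along morpheme lines exposes their shared root instead of burying it in arbitrary substrings, which gives downstream models a head start on understanding Filipino word formation.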
Installation
pip install filipino-tokenizer
Pre-built wheels are available for Linux (x86-64), macOS (ARM64), and Windows (x86-64) on Python 3.10–3.13, so no compiler or Rust toolchain is required on those platforms.
For HuggingFace Transformers integration:
pip install "filipino-tokenizer[hf]"
To install from source for development (requires Rust via rustup.rs):
git clone https://github.com/JpCurada/filipino-tokenizer.git
cd filipino-tokenizer
pip install -e .
Quick Start
Use the bundled pretrained model
A 32k-vocabulary model trained on Wikitext-TL-39 ships inside the package — no download needed.
from filipino_tokenizer.tagalog import TagalogTokenizer
tok = TagalogTokenizer()
tok.load_pretrained()
ids = tok.encode("Kumain siya ng pagkain.")
print(tok.decode(ids)) # kumain siya ng pagkain.
print(tok.tokenize("Kumain siya ng pagkain."))
# ['k', 'um', 'ain', ' ', 'siya', ' ', 'ng', ' ', 'pag', 'kain', '.']
HuggingFace integration
from filipino_tokenizer.tagalog import TagalogHFTokenizer
tok = TagalogHFTokenizer() # loads bundled model
encoding = tok("Kumain siya ng pagkain.", return_tensors="pt")
Works directly with Trainer, TRL, Axolotl, LlamaFactory, and any other HuggingFace-based training pipeline.
Train a custom model
from filipino_tokenizer.tagalog import TagalogTokenizer
tok = TagalogTokenizer()
tok.train("corpus.txt", vocab_size=32000)
ids = tok.encode("Kumain siya ng pagkain.")
print(tok.decode(ids)) # kumain siya ng pagkain.
tok.save("my_tokenizer/")
tok2 = TagalogTokenizer()
tok2.load("my_tokenizer/")
How It Works
The tokenizer is a three-stage pipeline.
Stage 1: Affix Tables. Four JSON files in data/ define the Filipino prefixes, suffixes, infixes, and circumfixes known to the tokenizer. Each entry is tagged by language (Tagalog, Cebuano, etc.), so the same data files support multiple Philippine languages. Prefixes are sorted longest-first so that greedy matching tries a longer prefix before any shorter prefix it contains.
Stage 2: Morphological Segmenter. The TagalogSegmenter decomposes a word into its constituent morphemes using a multi-pass algorithm:
- Check for frozen/lexicalized forms (e.g., pangalan is a word, not pang- + alan).
- Try circumfix detection (prefix + suffix pairs like ka- -han).
- Strip prefixes, longest match first, with recursion for stacked prefixes.
- Detect infixes (-um- and -in- after the first consonant).
- Strip suffixes, applying phonological rules (-an becomes -han after vowels).
- Validate every candidate root against a dictionary of 30,000+ Tagalog roots.
If no valid segmentation is found, the word is returned whole.
Stage 3: Constrained BPE. The MorphAwareBPE class runs an optimized, incremental byte-pair encoding algorithm (using doubly-linked lists and max-heaps) with one critical constraint: it never merges a pair of symbols that would cross a morpheme boundary marker (▁). Merges that respect this constraint are learned at training time. At inference time, the greedy BPE encoder is implemented in Rust (_bpe_rust.CoreBPE via PyO3) for fast, allocation-efficient encoding.
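The boundary constraint can be illustrated with a deliberately naive trainer. This sketch is an assumption-laden simplification, not MorphAwareBPE itself: it recounts pairs on every iteration instead of using linked lists and heaps, it represents morpheme boundaries as separate list items rather than a ▁ marker, and the function names are hypothetical.

```python
from collections import Counter


def merge_pair(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_constrained_bpe(segmented_words, num_merges):
    """Learn merges without ever pairing symbols from different morphemes."""
    # Each morpheme is its own symbol sequence, so no pair spans a boundary.
    counts = Counter(m for word in segmented_words for m in word)
    sequences = {m: list(m) for m in counts}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for morpheme, symbols in sequences.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += counts[morpheme]
        if not pair_counts:
            break  # every morpheme is already a single symbol
        # Most frequent pair; ties broken lexicographically for determinism.
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best)
        sequences = {m: merge_pair(s, best) for m, s in sequences.items()}
    return merges


def encode(segmented_word, merges):
    """Apply merges in learned order, inside each morpheme only."""
    tokens = []
    for morpheme in segmented_word:
        symbols = list(morpheme)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


corpus = [["k", "um", "ain"], ["pag", "kain"], ["ma", "sarap"], ["pag", "kain"]]
merges = train_constrained_bpe(corpus, num_merges=20)
print(encode(["pag", "kain"], merges))
```

Since pairs are only ever counted inside a morpheme, a cross-boundary pair such as ("g", "k") from pag + kain can never be learned, no matter how frequent pagkain is in the corpus.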
Evaluation
We evaluated our TagalogTokenizer against standard industry tokenizers (GPT-4's cl100k_base and SentencePiece Unigram) on a 5,000-line corpus evaluation split.
=======================================================================
Metric | Ours | GPT-4 | SPM
-----------------------------------------------------------------------
Total Tokens | 645 | 516 | 318
Tokens per Word (Fertility) | 2.34 | 1.87 | 1.15
Morpheme F1 Accuracy | 64.5% | 20.8% | 12.0%
=======================================================================
- Morpheme F1 Accuracy: Our tokenizer scores 64.5% on morpheme-boundary F1, about 3.1x GPT-4's 20.8% and about 5.4x SentencePiece's 12.0%. Its splits fall on actual linguistic boundaries far more often.
- Fertility: Our tokenizer produces more tokens per word (2.34, versus 1.87 for GPT-4 and 1.15 for SentencePiece). This is the expected trade-off: because we strictly prevent merges across morpheme boundaries, frequent but morphologically distinct parts (like pag and kain) are kept separate rather than being memorized as a single unbroken token (pagkain). This is intended to support compositional understanding in downstream models.
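For readers who want to reproduce metrics of this kind, here is one plausible way to compute them. The exact definitions used for the table are not stated above, so this sketch assumes that fertility is tokens divided by words and that morpheme F1 is a micro-averaged F1 over internal split positions; the function names are hypothetical.

```python
def boundaries(morphemes: list[str]) -> set[int]:
    """Character offsets of the internal split points of a segmented word."""
    offsets, position = set(), 0
    for morpheme in morphemes[:-1]:
        position += len(morpheme)
        offsets.add(position)
    return offsets


def boundary_f1(predicted, gold) -> float:
    """F1 over split points, micro-averaged across aligned word pairs."""
    tp = fp = fn = 0
    for pred_word, gold_word in zip(predicted, gold):
        p, g = boundaries(pred_word), boundaries(gold_word)
        tp += len(p & g)   # splits that match a gold boundary
        fp += len(p - g)   # splits with no gold boundary
        fn += len(g - p)   # gold boundaries that were missed
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fertility(tokenized_words) -> float:
    """Average number of tokens per word."""
    return sum(len(w) for w in tokenized_words) / len(tokenized_words)


# Words from the Before and After example, lowercased.
gold = [["k", "um", "ain"], ["pag", "kain"], ["ma", "sarap"]]
morph_aware = [["k", "um", "ain"], ["pag", "kain"], ["ma", "sarap"]]
generic = [["ku", "main"], ["pag", "ka", "in"], ["mas", "ar", "ap"]]

print(boundary_f1(morph_aware, gold), fertility(morph_aware))
print(boundary_f1(generic, gold), fertility(generic))
```

Scoring split positions rather than whole tokens gives partial credit when a tokenizer finds some of a word's boundaries, which matters for words with several affixes.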
Project Structure
filipino-tokenizer/
src/
lib.rs # Rust BPE backend (CoreBPE, PyO3 bindings)
filipino_tokenizer/
base.py # BaseAffixes, BaseRoots, BaseSegmenter, BaseTokenizer
data/
prefix_table.json # Prefix definitions, multi-language
suffix_table.json # Suffix definitions
infix_table.json # Infix definitions
circumfix_table.json # Circumfix definitions
tagalog_roots.json # ~30k Tagalog root words
bisaya_roots.json # Bisaya root words
pretrained/
vocab.json # Bundled 32k vocabulary (Wikitext-TL-39)
merges.txt # Bundled merge rules
tagalog/
__init__.py # Package exports
affixes.py # TagalogAffixes (filters for language="Tagalog")
roots.py # TagalogRoots (loads tagalog_roots.json)
phonology.py # Nasal assimilation, suffix h-insertion
segmenter.py # TagalogSegmenter (multi-pass morpheme decomposition)
bpe.py # MorphAwareBPE (constrained BPE, delegates to Rust)
tokenizer.py # TagalogTokenizer (segmenter + BPE pipeline)
hf_tokenizer.py # TagalogHFTokenizer (PreTrainedTokenizer wrapper)
tests/
test_affixes.py # Affix loading and filtering tests
test_segmenter.py # Morphological segmentation tests
test_tokenizer.py # Full pipeline tests (round-trip, consistency, efficiency)
test_rust_backend.py # Rust extension tests (encode/decode, morpheme boundaries)
examples/
training_tagalog_tokenizer.py # End-to-end training example
demo/
demo_tagalog_tokenizer.ipynb # Usage guide notebook
tokenizer_comparisons.ipynb # Benchmark vs GPT-4 and SentencePiece
tokenizer_comparisons_fil.ipynb # Side-by-side comparison on Filipino sentences
slm_tokenizer_comparison.ipynb # SLM training metrics comparison
slm_training_experiment.ipynb # Full GPT-2 training experiment
Cargo.toml # Rust crate configuration
setup.py # setuptools-rust build hook
pyproject.toml # Package metadata and build system
Running Tests
# All tests
python -m unittest discover tests -v
# Individual test files
python -m unittest tests.test_affixes -v
python -m unittest tests.test_segmenter -v
python -m unittest tests.test_tokenizer -v
python -m unittest tests.test_rust_backend -v
# Rust unit tests (requires cargo)
cargo test
Adding a New Language
The architecture is designed to support multiple Philippine languages from the same data files. To add Bisaya, Ilokano, or another language:
- Add entries to the JSON affix tables in filipino_tokenizer/data/ with the appropriate language field.
- Add a root word list (e.g., filipino_tokenizer/data/bisaya_roots.json).
- Create filipino_tokenizer/<language>/affixes.py subclassing BaseAffixes with super().__init__(language="<Language>").
- Create a roots class subclassing BaseRoots.
- Implement a segmenter subclassing BaseSegmenter with language-specific phonological rules.
- Create a tokenizer class that wires the segmenter to MorphAwareBPE.
References
- Tacorda, A. J., Ignacio, M. J., Oco, N., & Roxas, R. E. (2017). Controlling byte pair encoding for neural machine translation. 2017 International Conference on Asian Language Processing (IALP), 168-171. The core idea behind the boundary-constrained (Controlled) BPE approach used here.
- Cruz, J. C. B., & Cheng, C. (2022). Improving Large-scale Language Models and Resources for Filipino. Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC). Authors of key Filipino NLP datasets and benchmarks, including the TLUnified corpus.
- Miranda, L. J. (2023). calamanCy: A Tagalog Natural Language Processing Toolkit. Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS). SpaCy-based NLP pipeline for Tagalog that informed the morphological analysis approach.
License
MIT License. See LICENSE for details.