Multilingual and diachronic function-word datasets, with modular composition

functionwordsets

Comprehensive multilingual function-word datasets with a simple Python API


Overview

functionwordsets ships ready-to-use function-word lists for several languages and time periods.
Each dataset is a tiny Python module located in functionwordsets/datasets/ and is loaded on demand through a minimal API.

Supported out of the box:

ID         Language / period                  Entries*
fr_21c     French – 21st century              688
en_21c     English – 21st century             390
sp_21c     Spanish – 21st century             481
it_21c     Italian – 21st century             495
nl_21c     Dutch – 21st century               287
gr_5cbc    Ancient Greek – 5th-4th c. BCE     264
oc_13c     Old Occitan – 12th-13th c.         360
la_1cbc    Classical Latin – 1st c. BCE       353

*Number of distinct word-forms in the union of all categories.

You can also add or fork your own datasets: just drop a <id>.py file following the template shown below.
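
Before dropping a new file into the package, it can help to sanity-check the dictionary it defines. The sketch below is not part of functionwordsets: check_dataset is a hypothetical helper, and the only assumption it makes is that a dataset carries the four keys used in the dataset layout of this document (name, language, period, categories).

```python
# Hypothetical helper, not shipped with functionwordsets.
REQUIRED_KEYS = ("name", "language", "period", "categories")

def check_dataset(data):
    """Return a list of problems found in a dataset dictionary (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
    categories = data.get("categories", {})
    if not isinstance(categories, dict):
        return problems + ["'categories' should be a dict"]
    for key, words in categories.items():
        if not all(isinstance(w, str) for w in words):
            problems.append(f"non-string entry in category {key!r}")
        if len(set(words)) != len(words):
            problems.append(f"duplicate entry in category {key!r}")
    return problems

# Illustrative candidate with two deliberate mistakes.
candidate = {
    "name": "Toy dataset",
    "language": "xx",
    "categories": {"articles": ["a", "an", "a"]},
}
print(check_dataset(candidate))
```

An empty list means the dictionary has the expected shape; the toy candidate above is reported as missing its period key and as containing a duplicate article.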


💡 Supported grammatical categories

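Category keys vary by dataset. The examples in this document use articles, prepositions, coord_conj, subord_conj, negations, and the aux_<lemma> keys described in the notes on auxiliary categories.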


Installation

pip install functionwordsets         # from PyPI
# or, from a cloned repo
pip install -e .

Python ≥ 3.8 – zero runtime dependencies – wheel < 20 kB zipped.


Quick start

import functionwordsets as fw

# List available datasets
print(fw.available_ids())            # ['fr_21c', 'en_21c', …]

# Load one set (defaults to fr_21c)
fr = fw.load()                       # same as fw.load('fr_21c')
print(fr.name, len(fr.all))          # French – 21st century 688

# Membership test
if 'ne' in fr.all:
    ...

# Build a custom stop-set: only articles + prepositions
stops = fr.subset(['articles', 'prepositions'])
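
A stop-set like the one built above is typically used to filter tokens. The following is a minimal sketch, assuming that subset returns a set-like collection of word-forms. So that it runs without the package installed, stops is a small hand-written stand-in, and remove_function_words is a hypothetical helper rather than part of the API.

```python
# Hand-written stand-in for fr.subset(['articles', 'prepositions']);
# the real call is assumed to return a set-like collection of word-forms.
stops = {"le", "la", "les", "de", "à", "dans"}

def remove_function_words(tokens, stop_set):
    """Keep only the tokens whose lowercase form is not in stop_set."""
    return [t for t in tokens if t.lower() not in stop_set]

tokens = "Le chat dort dans la maison de Marie".split()
content_words = remove_function_words(tokens, stops)
print(content_words)  # ['chat', 'dort', 'maison', 'Marie']
```

Lowercasing before the lookup assumes the word lists are stored in lowercase; adjust the normalisation if a dataset uses another convention.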

Command-line helpers

# List dataset IDs
fw-list

# Export every French function word to a text file
fw-export fr_21c -o fr.txt

# Export only conjunctions & negations from Spanish as JSON
fw-export sp_21c --include coord_conj subord_conj negations -o sp_stop.json

Dataset layout

Internally each dataset is defined as a small Python dictionary:

data = {
    "name": "English – 21st century",
    "language": "en",
    "period": "21c",
    "categories": {
        "articles": [...],
        "prepositions": [...],
        # …
    }
}

functionwordsets only reads this dictionary and never modifies it, so you can freely edit or extend it in your fork.
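
To make the layout concrete, here is a sketch of a toy dataset and of how the "Entries" count (distinct word-forms in the union of all categories) can be derived from it. The dataset contents and the union_of helper are illustrative assumptions; they show the idea behind all and subset, not the package's actual implementation.

```python
# Hypothetical toy dataset following the layout above; the name and the
# word lists are illustrative and are not shipped with the package.
data = {
    "name": "Toy English – 21st century",
    "language": "en",
    "period": "21c",
    "categories": {
        "articles": ["a", "an", "the"],
        "prepositions": ["in", "on", "to"],
        "particles": ["to", "up"],
    },
}

def union_of(dataset, keys=None):
    """Union the word-forms of the chosen categories (all when keys is None)."""
    cats = dataset["categories"]
    selected = cats if keys is None else keys
    forms = set()
    for key in selected:
        forms.update(cats[key])  # raises KeyError on an unknown category
    return frozenset(forms)

all_forms = union_of(data)
print(len(all_forms))  # 7: "to" appears in two categories but counts once
print(sorted(union_of(data, ["articles", "particles"])))
```

Returning a frozenset keeps the result read-only and makes membership tests fast, which suits the stop-set use case.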


Notes on auxiliary categories

Keys for auxiliary verbs follow the pattern aux_<lemma> (e.g. aux_être, aux_be, aux_ser). They vary by language; see each dataset file for the exact key.
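
Because the auxiliary keys differ from one language to the next, it can be convenient to discover them by prefix instead of hard-coding them. A small sketch over a hand-written categories dictionary: aux_être is taken from the pattern above, while aux_avoir and the word lists are assumptions made for illustration.

```python
# Illustrative categories dictionary; check the dataset file for real keys.
categories = {
    "articles": ["le", "la"],
    "aux_être": ["suis", "es", "est"],
    "aux_avoir": ["ai", "as", "a"],  # assumed key, for illustration only
}

# Collect every key that follows the aux_<lemma> pattern.
aux_keys = sorted(k for k in categories if k.startswith("aux_"))
print(aux_keys)  # ['aux_avoir', 'aux_être']

# Gather all auxiliary forms into one set.
aux_forms = set()
for key in aux_keys:
    aux_forms.update(categories[key])
```

The resulting key list could then be passed to subset when only auxiliaries are needed.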


Enjoy!
