Sparv plugin to collect statistics about the created corpus

sparv-sbx-corpus-statistics

A Sparv plugin to collect statistics about a corpus.

Install

First, install Sparv as suggested,

with pipx:

pipx install sparv

or, with uv-pipx:

uvpipx install sparv

Then install sparv-sbx-corpus-statistics with,

the suggested method:

sparv plugins install sparv-sbx-corpus-statistics

or, if you used pipx above:

pipx inject sparv sparv-sbx-corpus-statistics

or, if you used uv-pipx above:

uvpipx install sparv-sbx-corpus-statistics --inject sparv

Usage

To use this plugin, add sbx_corpus_statistics:stat_highlights under export.default in your config.yaml:

export:
  default:
    - xml_export:pretty
    - sbx_corpus_statistics:stat_highlights
    # - more exports
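
As a quick sanity check before running Sparv, the following minimal Python sketch scans the text of a config file for the export entry. It is a naive line scan rather than a real YAML parser, and the function name `has_stat_highlights_export` is hypothetical; it is not part of Sparv or of this plugin.

```python
EXPORT_NAME = "sbx_corpus_statistics:stat_highlights"


def has_stat_highlights_export(config_text: str) -> bool:
    """Return True if an uncommented list item names the export.

    Naive line scan (an assumption made for brevity): it does not verify
    that the item sits under export.default, only that it appears as a
    YAML list item somewhere in the text.
    """
    for line in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        item = line.split("#", 1)[0].strip()
        if item.startswith("-") and item[1:].strip() == EXPORT_NAME:
            return True
    return False


example = """\
export:
  default:
    - xml_export:pretty
    - sbx_corpus_statistics:stat_highlights
"""
print(has_stat_highlights_export(example))
```

A commented-out entry such as `# - sbx_corpus_statistics:stat_highlights` is deliberately not counted, since Sparv would not see it either.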

Minimum Supported Python Version Policy

The Minimum Supported Python Version is fixed for a given minor (1.x) version. However, it can be increased when bumping minor versions, i.e. going from 1.0 to 1.1 allows us to increase the Minimum Supported Python Version. Users unable to increase their Python version can use an older minor version instead. Below is a list of sparv-sbx-corpus-statistics versions and their Minimum Supported Python Version:

  • v0.1: Python 3.11.

Note, however, that sparv-sbx-corpus-statistics also has dependencies, which might have different Minimum Supported Python Version policies. We try to stick to the above policy when updating dependencies, but this is not always possible.
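
The policy can be checked programmatically. The sketch below is only an illustration: the `MINIMUM_PYTHON` table and the `is_supported` helper are hypothetical names, and the table contains only the entry listed above.

```python
import sys

# Minimum Supported Python Version per plugin minor version.
# Only v0.1 is listed in this README; other versions are intentionally absent.
MINIMUM_PYTHON = {"0.1": (3, 11)}


def is_supported(plugin_minor, python_version=None):
    """Return True if python_version meets the minimum for plugin_minor.

    python_version defaults to the running interpreter's (major, minor).
    Raises KeyError for plugin versions missing from the table.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    return tuple(python_version[:2]) >= MINIMUM_PYTHON[plugin_minor]


print(is_supported("0.1", (3, 10)))
print(is_supported("0.1", (3, 12)))
```

Tuple comparison keeps the check correct for two-digit minor versions, where comparing version strings would not be.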

Changelog

This project keeps a changelog.

License

This repository is licensed under the MIT license.

Development

Development prerequisites

To start developing on this repository:

  • Clone the repo (in one of the ways below):
    • git clone git@github.com:spraakbanken/sparv-sbx-corpus-statistics.git
    • git clone https://github.com/spraakbanken/sparv-sbx-corpus-statistics.git
  • Set up the environment: make dev
  • Install pre-commit hooks: pre-commit install

Do your work.

Tasks to do:

  • Test the code with make test or make test-w-coverage.
  • Lint the code with make lint.
  • Check formatting with make check-fmt.
  • Format the code with make fmt.
  • Type-check the code with make type-check.
  • Test the examples with:
    • make test-example-small-txt

This repo uses conventional commits.

Release a new version

[!NOTE] Requirements:

  • bump-my-version for make bumpversion; install with uv tool install bump-my-version.
  • git-cliff for make prepare-release.
  • sparv-sbx-metadata for make generate-metadata; installed automatically.

  • Prepare the CHANGELOG: make prepare-release.
  • Edit CHANGELOG.md to your liking. Keep the header [unreleased].
  • Add to git: git add --update
  • Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
  • Bump version (depends on bump-my-version):
    • Major: make bumpversion part=major
    • Minor: make bumpversion part=minor
    • Patch: make bumpversion part=patch or make bumpversion
  • Push main and tags to GitHub: git push origin main --tags (assuming the remote is named origin) or make publish
  • Add metadata for Språkbanken's resource
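
To illustrate what the major, minor, and patch parts in the bump step mean, here is a small sketch of conventional semantic-version bumping. It assumes plain MAJOR.MINOR.PATCH version strings; the `bump` function is hypothetical, and the actual behavior of make bumpversion depends on the project's bump-my-version configuration.

```python
def bump(version: str, part: str = "patch") -> str:
    """Return version with the given part incremented.

    Assumes a plain MAJOR.MINOR.PATCH string. Parts to the right of the
    bumped one are reset to zero, as in conventional semantic versioning.
    """
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


for part in ("major", "minor", "patch"):
    print(part, bump("0.2.0", part))
```

Defaulting `part` to "patch" mirrors the list above, where a bare make bumpversion performs a patch bump.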
