Sparv plugin to collect statistics about the created corpus

sparv-sbx-corpus-statistics

A Sparv plugin to collect statistics about a corpus.

Install

First, install Sparv as suggested, with pipx:

pipx install sparv

or, with uv-pipx:

uvpipx install sparv

Then install sparv-sbx-corpus-statistics, using the suggested method:

sparv plugins install sparv-sbx-corpus-statistics

or, if you used pipx above:

pipx inject sparv sparv-sbx-corpus-statistics

or, if you used uv-pipx above:

uvpipx install sparv-sbx-corpus-statistics --inject sparv

Usage

To use this plugin, add sbx_corpus_statistics:stat_highlights under export.default in your config.yaml:

export:
  default:
    - xml_export:pretty
    - sbx_corpus_statistics:stat_highlights
    # - more exports
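
If config.yaml is generated or edited by a script, the same change can be applied to the parsed configuration. The following is a minimal sketch, assuming the file has already been loaded into a plain dict (for example with a YAML library). The helper name ensure_stat_highlights is hypothetical and is not part of Sparv or this plugin.

```python
# Hypothetical helper for illustration; not part of Sparv or this plugin.
EXPORTER = "sbx_corpus_statistics:stat_highlights"


def ensure_stat_highlights(config: dict) -> dict:
    """Return a copy of config with EXPORTER listed under export.default."""
    export = dict(config.get("export") or {})
    default = list(export.get("default") or [])
    if EXPORTER not in default:
        # Append so that existing exports keep their order.
        default.append(EXPORTER)
    export["default"] = default
    return {**config, "export": export}


# Stand-in for a parsed config.yaml that only exports pretty XML so far.
config = {"export": {"default": ["xml_export:pretty"]}}
updated = ensure_stat_highlights(config)
print(updated["export"]["default"])
```

The helper returns a new dict and leaves the input untouched, so calling it twice gives the same result as calling it once.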

Minimum Supported Python Version Policy

The Minimum Supported Python Version is fixed for a given minor (1.x) version. However, it can be increased when bumping minor versions, i.e. going from 1.0 to 1.1 allows us to increase the Minimum Supported Python Version. Users unable to increase their Python version can use an older minor version instead. Below is a list of sparv-sbx-corpus-statistics versions and their Minimum Supported Python Version:

  • v0.1: Python 3.11.

Note, however, that sparv-sbx-corpus-statistics also has dependencies, which might have different Minimum Supported Python Version policies. We try to stick to the above policy when updating dependencies, but this is not always possible.
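
To check an interpreter against the version list above before installing, a small script is enough. This is only a sketch: the table MIN_PYTHON and the function python_is_supported are illustrative names, and the single entry is copied from the list above.

```python
import sys

# Minimum Python version per plugin minor version, copied from the list above.
# MIN_PYTHON and python_is_supported are illustrative names, not plugin API.
MIN_PYTHON = {"0.1": (3, 11)}


def python_is_supported(plugin_minor: str, version: tuple = sys.version_info[:2]) -> bool:
    """Return True if `version` meets the minimum for the given plugin minor version."""
    return tuple(version) >= MIN_PYTHON[plugin_minor]


if not python_is_supported("0.1"):
    print("sparv-sbx-corpus-statistics v0.1 needs Python 3.11 or newer")
```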

Changelog

This project keeps a changelog.

License

This repository is licensed under the MIT license.

Development

Development prerequisites

For starting to develop on this repository:

  • Clone the repo (in one of the ways below):
    • git clone git@github.com:spraakbanken/sparv-sbx-corpus-statistics.git
    • git clone https://github.com/spraakbanken/sparv-sbx-corpus-statistics.git
  • Set up the environment: make dev
  • Install pre-commit hooks: pre-commit install

Do your work.

Tasks to do:

  • Test the code with make test or make test-w-coverage.
  • Lint the code with make lint.
  • Check formatting with make check-fmt.
  • Format the code with make fmt.
  • Type-check the code with make type-check.
  • Test the examples with:
    • make test-example-small-txt

This repo uses conventional commits.
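
As a rough illustration of the commit format, the sketch below checks the first line of a message against a loose approximation of the conventional-commits header, type(scope)!: description. The pattern and the function name are assumptions made for this example; they are not the check used by this repository's tooling.

```python
import re

# Loose approximation of a conventional-commit header line:
#   type(optional scope)optional "!": description
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_header(message: str):
    """Return the named header parts as a dict, or None if the first line does not match."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    return match.groupdict() if match else None


print(parse_header("chore(release): prepare release"))
```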

Release a new version

[!NOTE] Requirements:

  • bump-my-version for make bumpversion; install with uv tool install bump-my-version.
  • git-cliff for make prepare-release.
  • sparv-sbx-metadata for make generate-metadata; installed automatically.

  • Prepare the CHANGELOG: make prepare-release.
  • Edit CHANGELOG.md to your liking. Keep the header [unreleased].
  • Add to git: git add --update
  • Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
  • Bump version (depends on bump-my-version)
    • Major: make bumpversion part=major
    • Minor: make bumpversion part=minor
    • Patch: make bumpversion part=patch or make bumpversion
  • Push main and tags to GitHub: git push origin main --tags (assuming the remote is named origin) or make publish
  • Add metadata for Språkbanken's resource
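
The three bump options in the list above can be illustrated with a short sketch. It assumes the usual MAJOR.MINOR.PATCH scheme, where bumping one part resets the parts to its right. The real behaviour is determined by this repository's bump-my-version configuration, and the function bump below is a hypothetical stand-in, not that tool.

```python
def bump(version: str, part: str = "patch") -> str:
    """Bump one part of a MAJOR.MINOR.PATCH version string (illustrative only)."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


# Mirrors the make targets: part defaults to patch, like plain `make bumpversion`.
for part in ("major", "minor", "patch"):
    print(part, bump("0.1.0", part))
```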
