Sparv plugin to collect statistics about the created corpus
sparv-sbx-corpus-statistics
A Sparv plugin to collect statistics about a corpus.
Install
First, install Sparv as suggested, with pipx:
pipx install sparv
or, with uv-pipx:
uvpipx install sparv
Then install sparv-sbx-corpus-statistics using the suggested method:
sparv plugins install sparv-sbx-corpus-statistics
or, if you used pipx above:
pipx inject sparv sparv-sbx-corpus-statistics
or, if you used uv-pipx above:
uvpipx install sparv-sbx-corpus-statistics --inject sparv
Usage
To use this plugin, add sbx_corpus_statistics:stat_highlights under export.default in your config.yaml:
export:
  default:
    - xml_export:pretty
    - sbx_corpus_statistics:stat_highlights
    # - more exports
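As a rough illustration of the configuration change above, the following sketch adds the export to a configuration that has already been parsed. It assumes the contents of config.yaml have been loaded into a plain Python dict (for example with a YAML library); the helper name `add_export` is hypothetical and is not part of Sparv or this plugin.

```python
from copy import deepcopy

# Export name taken from the usage instructions above.
STAT_HIGHLIGHTS = "sbx_corpus_statistics:stat_highlights"


def add_export(config: dict, export: str = STAT_HIGHLIGHTS) -> dict:
    """Return a copy of `config` with `export` listed under export.default.

    Hypothetical helper for illustration only; it is not part of Sparv.
    """
    updated = deepcopy(config)
    # Create the export section and its default list if they are missing.
    defaults = updated.setdefault("export", {}).setdefault("default", [])
    # Avoid listing the same export twice.
    if export not in defaults:
        defaults.append(export)
    return updated


config = {"export": {"default": ["xml_export:pretty"]}}
print(add_export(config)["export"]["default"])
```

The helper returns a new dict rather than mutating its argument, so the original configuration can still be compared against the updated one.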
Minimum Supported Python Version Policy
The Minimum Supported Python Version is fixed for a given minor (1.x) version. However, it can be increased when bumping the minor version; for example, going from 1.0 to 1.1 allows us to increase the Minimum Supported Python Version. Users unable to increase their Python version can use an older minor version instead. Below is a list of sparv-sbx-corpus-statistics versions and their Minimum Supported Python Version:
- v0.1: Python 3.11.
Note, however, that sparv-sbx-corpus-statistics also has dependencies, which might have different Minimum Supported Python Version policies. We try to stick to the above policy when updating dependencies, but this is not always possible.
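The policy above can be expressed as a small lookup. This is a minimal sketch, assuming the only entry is the one listed above (v0.1 requires Python 3.11); the table `MIN_PYTHON` and the function `python_is_supported` are hypothetical names and are not provided by the plugin.

```python
import sys

# Minimum Supported Python Version per plugin minor series, from the list above.
MIN_PYTHON = {"0.1": (3, 11)}


def python_is_supported(
    plugin_version: str, python_version: tuple = sys.version_info[:2]
) -> bool:
    """Check a Python (major, minor) tuple against the plugin's minimum.

    Raises KeyError for plugin versions missing from MIN_PYTHON.
    """
    # Only major.minor matters: the minimum is fixed within a minor series.
    minor_series = ".".join(plugin_version.lstrip("v").split(".")[:2])
    return tuple(python_version) >= MIN_PYTHON[minor_series]


print(python_is_supported("0.1.0", (3, 10)))  # False
print(python_is_supported("v0.1", (3, 12)))  # True
```

Called with only a plugin version, the function checks the interpreter it is running on.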
Changelog
This project keeps a changelog.
License
This repository is licensed under the MIT license.
Development
Development prerequisites
To start developing on this repository:
- Clone the repo (in one of the ways below):
  git clone git@github.com:spraakbanken/sparv-sbx-corpus-statistics.git
  git clone https://github.com/spraakbanken/sparv-sbx-corpus-statistics.git
- Set up the environment: make dev
- Install pre-commit hooks: pre-commit install
Do your work.
Tasks to do:
- Test the code with make test or make test-w-coverage.
- Lint the code with make lint.
- Check formatting with make check-fmt.
- Format the code with make fmt.
- Type-check the code with make type-check.
- Test the examples with: make test-example-small-txt
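The checks in the task list above can be chained in one go. The sketch below is only one possible way to do that, assuming the make targets behave as named above and that make is run from the repository root; `run_checks` and `CHECK_TARGETS` are hypothetical names, not something the repository provides.

```python
import subprocess
from typing import Callable, Sequence

# Make targets named in the task list above, in the order they are listed.
# `make fmt` is left out because it rewrites files instead of checking them.
CHECK_TARGETS = ("test", "lint", "check-fmt", "type-check", "test-example-small-txt")


def run_checks(
    targets: Sequence[str] = CHECK_TARGETS,
    run: Callable[[list], int] = subprocess.call,
) -> list:
    """Run `make <target>` for each target and return the targets that failed."""
    failed = []
    for target in targets:
        # A non-zero exit code from make marks the target as failed.
        if run(["make", target]) != 0:
            failed.append(target)
    return failed


# Dry run with a stand-in runner that pretends only `lint` fails.
print(run_checks(run=lambda cmd: 1 if cmd[1] == "lint" else 0))
```

Calling `run_checks()` without arguments would invoke make itself, so that form only makes sense inside a checkout where `make dev` has been run.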
This repo uses conventional commits.
Release a new version
[!NOTE] Requirements
- bump-my-version for make bumpversion; install with uv tool install bump-my-version.
- git-cliff for make prepare-release.
- sparv-sbx-metadata for make generate-metadata; installed automatically.
- Prepare the CHANGELOG: make prepare-release.
- Edit CHANGELOG.md to your liking. Keep the header [unreleased].
- Add to git: git add --update
- Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
- Bump version (depends on bump-my-version):
  - Major: make bumpversion part=major
  - Minor: make bumpversion part=minor
  - Patch: make bumpversion part=patch or make bumpversion
- Push main and tags to GitHub: git push main --tags or make publish
  - The GitHub Actions workflow will build, test and publish the package to PyPI.
- Add metadata for Språkbanken's resource:
  - Generate metadata: make generate-metadata
  - Upload the files from assets/metadata/export/sbx_metadata/utility to https://github.com/spraakbanken/metadata/tree/main/yaml/utility.
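To make the effect of the major, minor and patch options in the bump step concrete, here is a minimal sketch of the usual semantic-versioning arithmetic. It assumes plain MAJOR.MINOR.PATCH version strings and is not how bump-my-version is implemented or configured in this repository; the function `bump` is a hypothetical illustration. Its default of "patch" mirrors the list above, where make bumpversion without a part bumps the patch version.

```python
def bump(version: str, part: str = "patch") -> str:
    """Return `version` with the given part incremented and lower parts reset."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("0.1.0"))           # 0.1.1
print(bump("0.1.0", "minor"))  # 0.2.0
print(bump("0.1.0", "major"))  # 1.0.0
```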