docuscospacy: Support for spaCy models trained on DocuScope and the CLAWS7 tagset

The docuscospacy package contains a set of functions that facilitate the processing of corpora tagged with spaCy models trained on DocuScope and the CLAWS7 tagset.

The current version of the package is built on polars.

The package can also convert a corpus to and from:

  • tmtoolkit – a set of tools for text mining and topic modeling

The documentation for docuscospacy is available on docuscospacy.readthedocs.org and the GitHub code repository is on github.com/browndw/docuscospacy.

Requirements and installation

docuscospacy works with Python 3.9 or newer (tested up to Python 3.10). It also requires spacy >= 3.3.
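As a rough pre-install check, the version requirements above can be tested with the standard library alone. This is a minimal sketch, not part of docuscospacy: the helper name `meets_minimum` is hypothetical, and it only compares the leading numeric parts of a dotted version string.

```python
import re
import sys
from importlib import metadata


def meets_minimum(version, minimum):
    """Return True if a dotted version string is at least `minimum` (a tuple of ints)."""
    parts = []
    for piece in version.split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)  # leading digits only, so "10rc1" -> 10
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (len(minimum) - len(parts))  # pad short versions such as "4"
    return tuple(parts) >= tuple(minimum)


# Python itself: compare the (major, minor) pair of the running interpreter.
python_ok = sys.version_info[:2] >= (3, 9)

# spaCy: read the installed distribution's version, if there is one.
try:
    spacy_ok = meets_minimum(metadata.version("spacy"), (3, 3))
except metadata.PackageNotFoundError:
    spacy_ok = False  # spaCy is not installed in this environment

print(f"Python >= 3.9: {python_ok}; spacy >= 3.3: {spacy_ok}")
```

Comparing integer tuples rather than strings matters here: as strings, "3.10" sorts before "3.9".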

The recommended way to install docuscospacy is with pip:

pip install docuscospacy

Note that the command for installing the model depends on your spaCy version. Some versions allow:

pip install https://huggingface.co/browndw/en_docusco_spacy/resolve/main/en_docusco_spacy-any-py3-none-any.whl

Newer versions may require:

pip install "en_docusco_spacy @ https://huggingface.co/browndw/en_docusco_spacy/resolve/main/en_docusco_spacy-any-py3-none-any.whl"
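Whichever form of the command applies, one way to confirm that the installs landed in the active environment is to look for the importable packages. The sketch below assumes the model wheel provides a top-level package named `en_docusco_spacy`, matching the wheel file name; the helper `is_installed` is hypothetical and not part of docuscospacy.

```python
from importlib.util import find_spec


def is_installed(package):
    """Return True if a top-level package can be located without importing it."""
    return find_spec(package) is not None


# Report on the package, its main dependency, and the model.
for name in ("spacy", "docuscospacy", "en_docusco_spacy"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If the model is reported as found, it should be loadable in the usual spaCy way, with `spacy.load("en_docusco_spacy")`.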

Features

Corpus analysis

The docuscospacy package supports the post-tagging generation of:

Outputs can be controlled either by part-of-speech tag or by DocuScope tag. Thus "can" as a noun and "can" as a verb, for example, are counted separately rather than conflated.
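The effect of counting by tag can be illustrated without the package. The following is a minimal sketch using hand-written (token, tag) pairs rather than real model output; it does not use or reproduce the docuscospacy API.

```python
from collections import Counter

# Illustrative (token, tag) pairs written by hand, not real model output.
# In CLAWS7, NN1 is a singular common noun and VM is a modal auxiliary.
tagged = [
    ("open", "VV0"), ("the", "AT"), ("can", "NN1"),
    ("you", "PPY"), ("can", "VM"), ("go", "VVI"),
]

# Counting tokens alone conflates the two uses of "can".
by_token = Counter(token for token, _ in tagged)

# Counting (token, tag) pairs keeps them apart.
by_token_tag = Counter(tagged)

print(by_token["can"])
print(by_token_tag[("can", "NN1")], by_token_tag[("can", "VM")])
```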

Additionally, tagged multi-token sequences are aggregated for analysis. So, for example, where "in spite of" is tagged as a token sequence, its three tokens are combined into a single token.
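The aggregation step can be sketched as follows. This assumes, purely for illustration, that each token carries a marker saying whether it begins a tagged unit ("B") or continues the previous one ("I"); the package's actual internal representation may differ, and `merge_sequences` is a hypothetical helper.

```python
def merge_sequences(tokens):
    """Collapse runs of (text, marker, tag) triples into single (text, tag) units.

    A "B" marker starts a new unit; an "I" marker appends the token text
    to the unit that precedes it.
    """
    merged = []
    for text, marker, tag in tokens:
        if marker == "I" and merged:
            prev_text, prev_tag = merged[-1]
            merged[-1] = (f"{prev_text} {text}", prev_tag)
        else:
            merged.append((text, tag))
    return merged


# Hand-written example; II is the CLAWS7 tag for a general preposition.
tokens = [
    ("in", "B", "II"), ("spite", "I", "II"), ("of", "I", "II"),
    ("the", "B", "AT"), ("rain", "B", "NN1"),
]

units = merge_sequences(tokens)
print(units)
```

After merging, a frequency table built over `units` treats "in spite of" as one item instead of three.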

Other features

  • KWIC tables that locate a node word in a center column with context columns on either side
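The shape of a KWIC (key word in context) table can be sketched in a few lines. This is a generic illustration, not the package's implementation: the function name `kwic` and its `width` parameter are assumptions made for the example.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node, right context) rows for each match of `node`."""
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


tokens = "You can open the can if you can find it".split()
rows = kwic(tokens, "can")

# Right-align the left context so the node word lines up in a center column.
for left, hit, right in rows:
    print(f"{left:>15} | {hit} | {right}")
```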

Limits

  • the model that this package is designed for has only been trained on English

  • all data must reside in memory, i.e. there is no streaming of large data from disk (which Gensim, for example, supports)

License

Code licensed under Apache License 2.0. See LICENSE file.