
Accessing and processing data from the DFG-funded SPP Computational Literary Studies

Project description


Python code for working with the data of the DFG-funded SPP Computational Literary Studies.

  • sppcls.py: the sppcls Python module to access the data:
    • blocking:
    from sppcls import sppcls
    df = sppcls.load_df(work="judenbuche", projects=["keypassages"])
    print(df.describe())
    
    • non-blocking (see the runnable sketch after this list):
    from sppcls import sppcls
    df = await sppcls.load_df_async(work="judenbuche", projects=["keypassages"])
    print(df.describe())
    
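Note that a top-level await only works in async-aware shells such as IPython or Jupyter. In a plain script, the coroutine has to be driven by an event loop; a minimal sketch using the standard library's asyncio (only load_df_async is from the module above, the rest is scaffolding):

import asyncio

from sppcls import sppcls

async def main():
    # Same call as the non-blocking example above, awaited inside a coroutine.
    df = await sppcls.load_df_async(work="judenbuche", projects=["keypassages"])
    print(df.describe())

# Create an event loop, run main() to completion, and close the loop.
asyncio.run(main())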

Installation

PyPI

pip install sppcls

or, with the optional spaCy extra:

pip install sppcls[spacy]

From source

Set up a virtual environment, if necessary:

python3 -m venv env
source env/bin/activate

Install dependencies:

pip install -r requirements.txt
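
Alternatively, assuming the repository ships standard packaging metadata (setup.py or pyproject.toml, an assumption not stated above), an editable install pulls in the dependencies in one step:

pip install -e .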

Note

For tokenisation, the spaCy model de_core_news_lg is required:

python -m spacy download de_core_news_lg

Usage

The package offers a command-line interface: the command sppcls after installing from PyPI, or python -m sppcls.cli.sppclscli when running from source.

usage: sppclscli.py [-h] {tokenise,check} ...

Accessing and processing data from the DFG-funded SPP Computational Literary
Studies

positional arguments:
  {tokenise,check}
    tokenise        Tokenize text file and create output tsv.
    check           Compare two tsv files and check that the structures
                    match.

optional arguments:
  -h, --help        show this help message and exit

tokenise

Tokenise takes a txt file, e.g. work.txt, and produces a tsv file containing the tokenised text, e.g. work.tsv. This base tsv file is then extended by the individual projects.

Note: tokenise only works with spaCy installed (pip install sppcls[spacy])!

usage: sppclscli.py tokenise [-h] input_file output_folder

Tokenize text file and create output tsv.

positional arguments:
  input_file     Path to the input txt file.
  output_folder  Path to the output folder where the output tsv will be saved.

optional arguments:
  -h, --help     show this help message and exit
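
For example, assuming a plain-text work.txt and an output folder out (both names are illustrative):

sppcls tokenise work.txt out

This writes the tokenised tsv, e.g. out/work.tsv, which the individual projects can then extend.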

TODO: fix character offsets to be byte offsets instead

check

check takes two tsv files, e.g. work.tsv and keypassages.tsv, and makes sure that the project tsv file matches the structure of the base work tsv file.

usage: sppclscli.py check [-h] org-tokens-file-path project-tokens-file-path

Compare two tsv files and check that the structures match.

positional arguments:
  org-tokens-file-path  Path to the original tokens tsv file
  project-tokens-file-path
                        Path to the project tokens tsv file

optional arguments:
  -h, --help            show this help message and exit
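
For example, with the base and project files from the description above:

sppcls check work.tsv keypassages.tsv

The command compares the two files and reports whether the project tsv still matches the structure of the base tsv.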


