Accessing and processing data from the DFG-funded SPP Computational Literary Studies

Python code for working with the data of the DFG-funded SPP Computational Literary Studies.

  • sppcls.py: the sppcls Python module to access the data:
    • blocking:
    from sppcls import sppcls
    df = sppcls.load_df(work="judenbuche", projects=["keypassages"])
    print(df.describe())
    
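    • non-blocking (await must run inside a coroutine or an environment that supports top-level await, such as a notebook):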
    from sppcls import sppcls
    df = await sppcls.load_df_async(work="judenbuche", projects=["keypassages"])
    print(df.describe())
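In a plain script there is no running event loop, so the non-blocking call has to be driven from synchronous code. The following is a minimal sketch of that pattern using `asyncio.run`. The coroutine `load_rows_async` is a hypothetical stand-in for `sppcls.load_df_async` (it returns plain dictionaries rather than a DataFrame and does no network access), so the sketch runs without the package installed.

```python
import asyncio


async def load_rows_async(work, projects):
    # Hypothetical stand-in for sppcls.load_df_async: no network access,
    # returns plain rows instead of a pandas DataFrame.
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return [{"work": work, "project": project} for project in projects]


async def main():
    # Inside a coroutine, `await` is allowed.
    return await load_rows_async(work="judenbuche", projects=["keypassages"])


# asyncio.run starts an event loop, runs main() to completion, and closes the loop.
rows = asyncio.run(main())
print(rows)
```

With the real package, the body of `main` would presumably call `sppcls.load_df_async` instead of the stand-in.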
    

Installation

PyPI

pip install sppcls

or, with spaCy support:

pip install sppcls[spacy]

From source

Set up a virtual environment, if necessary:

python3 -m venv env
source env/bin/activate

Install dependencies:

pip install -r requirements.txt

Note

Tokenization requires the spaCy model de_core_news_lg:

python -m spacy download de_core_news_lg

Usage

The package offers a command-line interface. After installing from PyPI, run it with the command sppcls; when running from source, use python -m sppcls.cli.sppclscli.

usage: sppclscli.py [-h] {tokenise,check} ...

Accessing and processing data from the DFG-funded SPP Computational Literary
Studies

positional arguments:
  {tokenise,check}
    tokenise        Tokenize text file and create output tsv.
    check           Compare two tsv files and check that the structures
                    matches.

optional arguments:
  -h, --help        show this help message and exit

tokenise

Tokenise takes a txt file, e.g. work.txt, and produces a tsv file containing the tokenized text, e.g. work.tsv. This base tsv file is then extended by the individual projects.

Note: tokenise only works when spaCy is installed (see pip install sppcls[spacy] above).
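To make the idea of a base tsv file concrete, here is a rough sketch of how one token per row could be written out. It is not the package's implementation: it splits on whitespace instead of using spaCy, and the column names `token_id`, `char_offset` and `token` are assumptions chosen for illustration, not the actual sppcls schema.

```python
import csv
import io


def tokens_to_tsv(text):
    """Write one whitespace-separated token per row with its character offset."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Hypothetical column names; the real output of `tokenise` may differ.
    writer.writerow(["token_id", "char_offset", "token"])
    pos = 0
    for token_id, token in enumerate(text.split()):
        pos = text.index(token, pos)  # find the token at or after the previous one
        writer.writerow([token_id, pos, token])
        pos += len(token)
    return out.getvalue()


tsv = tokens_to_tsv("Friedrich Mergel, geboren 1738")
print(tsv)
```

A project file would then add its own annotation columns next to these base columns, keeping one row per token.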

usage: sppclscli.py tokenise [-h] input_file output_folder

Tokenize text file and create output tsv.

positional arguments:
  input_file     Path to the input txt file.
  output_folder  Path to the output folder where the output tsv will be saved.

optional arguments:
  -h, --help     show this help message and exit

TODO: report byte offsets instead of character offsets
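The TODO does not say how offsets are currently computed, so the following only illustrates why the distinction matters for German text: in UTF-8, characters such as ü occupy two bytes, so character and byte offsets diverge after the first non-ASCII character. The sample string is made up for this sketch.

```python
text = "Hülshoff schrieb"
token = "schrieb"

# Offset counted in Unicode characters (what str.index returns).
char_offset = text.index(token)

# Offset counted in bytes of the UTF-8 encoding of everything before the token.
byte_offset = len(text[:char_offset].encode("utf-8"))

print(char_offset, byte_offset)
```

The two values differ by one here because the single ü before the token takes two bytes.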

check

Check takes two tsv files, e.g. work.tsv and keypassages.tsv, and verifies that the project tsv file matches the structure of the base work tsv file.

usage: sppclscli.py check [-h] org-tokens-file-path project-tokens-file-path

Compare two tsv files and check that the structures matches.

positional arguments:
  org-tokens-file-path  Path to the original tokens tsv file
  project-tokens-file-path
                        Path to the project tokens tsv file

optional arguments:
  -h, --help            show this help message and exit
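As an illustration of what such a structural comparison could involve, the sketch below treats two tsv files as matching when they have the same number of rows and identical values in a shared token column. This is an assumption about what "structure" means, not the actual logic of the check command, and the column name `token` is hypothetical.

```python
import csv
import io


def same_structure(base_tsv, project_tsv, key="token"):
    """Return True if both tsv strings have equal row counts and equal `key` values.

    Both inputs must contain a header row that includes the `key` column.
    """
    base = list(csv.DictReader(io.StringIO(base_tsv), delimiter="\t"))
    project = list(csv.DictReader(io.StringIO(project_tsv), delimiter="\t"))
    if len(base) != len(project):
        return False
    # Compare row by row; extra project columns are ignored.
    return all(b[key] == p[key] for b, p in zip(base, project))


base = "token\nDie\nJudenbuche\n"
project = "token\tkeypassage\nDie\t0\nJudenbuche\t1\n"
truncated = "token\tkeypassage\nDie\t0\n"

print(same_structure(base, project))
print(same_structure(base, truncated))
```

The first comparison succeeds because the project file only adds a column; the second fails because a row is missing.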
