
Accessing and processing data from the DFG-funded SPP Computational Literary Studies

Project description

Python code for working with the data of the DFG-funded SPP Computational Literary Studies.

  • sppcls.py: the sppcls Python module to access the data:
    • blocking:

      from sppcls import sppcls
      df = sppcls.load_df(work="judenbuche", projects=["keypassages"])
      print(df.describe())

    • non-blocking:

      from sppcls import sppcls
      df = await sppcls.load_df_async(work="judenbuche", projects=["keypassages"])
      print(df.describe())
    
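Outside a notebook or other already-running event loop, the non-blocking variant has to be driven explicitly, e.g. with asyncio.run. The sketch below stubs out load_df_async so the calling pattern is runnable on its own; the stub and its return value are placeholders, not the real API, which returns a pandas DataFrame:

```python
import asyncio

# Hypothetical stand-in for sppcls.load_df_async, so this sketch runs
# without the package installed; the real coroutine fetches the project
# data over the network and returns a pandas DataFrame.
async def load_df_async(work, projects):
    await asyncio.sleep(0)  # placeholder for the network round trip
    return {"work": work, "projects": projects}

async def main():
    df = await load_df_async(work="judenbuche", projects=["keypassages"])
    print(df)

asyncio.run(main())
```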

Installation

PyPI

pip install sppcls

or with spacy

pip install sppcls[spacy]

From source

Set up a virtual environment, if necessary:

python3 -m venv env
source env/bin/activate

Install dependencies:

pip install -r requirements.txt

Note

For tokenisation, a spaCy model is required:

python -m spacy download de_core_news_lg

Usage

The package offers a command-line interface: the command sppcls (after installing from PyPI) or python -m sppcls.cli.sppclscli (when running from source).

usage: sppclscli.py [-h] {tokenise,check} ...

Accessing and processing data from the DFG-funded SPP Computational Literary
Studies

positional arguments:
  {tokenise,check}
    tokenise        Tokenize text file and create output tsv.
    check           Compare two tsv files and check that the structures
                    match.

optional arguments:
  -h, --help        show this help message and exit

tokenise

Tokenise takes a txt file, e.g. work.txt, and produces a tsv file containing the tokenized text, e.g. work.tsv. This base tsv file is then extended by the individual projects.

usage: sppclscli.py tokenise [-h] input_file output_folder

Tokenize text file and create output tsv.

positional arguments:
  input_file     Path to the input txt file.
  output_folder  Path to the output folder where the output tsv will be saved.

optional arguments:
  -h, --help     show this help message and exit

TODO: fix character offsets to be byte offsets instead
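As an illustration of the kind of base file this step produces, here is a minimal sketch of a token TSV with character offsets. The column names (token, start, end) and the whitespace tokeniser are assumptions for illustration only; the real tool uses spaCy and its own schema:

```python
import re

def tokenise_to_tsv(text):
    """Sketch of a base token TSV with character offsets.

    Hypothetical schema (token, start, end) and naive whitespace
    tokenisation; the real tokenise command uses spaCy.
    """
    rows = ["token\tstart\tend"]
    for m in re.finditer(r"\S+", text):
        rows.append(f"{m.group(0)}\t{m.start()}\t{m.end()}")
    return "\n".join(rows)

print(tokenise_to_tsv("Die Judenbuche"))
```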

check

check.py takes two tsv files, e.g. work.tsv and keypassages.tsv, and makes sure that the project tsv file matches the structure of the base work tsv file.

usage: sppclscli.py check [-h] org-tokens-file-path project-tokens-file-path

Compare two tsv files and check that the structures match.

positional arguments:
  org-tokens-file-path  Path to the original tokens tsv file
  project-tokens-file-path
                        Path to the project tokens tsv file

optional arguments:
  -h, --help            show this help message and exit
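The check can be pictured as row-by-row comparison of the shared columns: the project file may add its own columns, but the token rows themselves must line up with the base file. A minimal sketch, assuming a hypothetical (token, start, end) schema that the real command may not use:

```python
import csv
import io

def _rows(tsv_text):
    # Parse a TSV string into a list of dicts keyed by the header row.
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

def structures_match(base_tsv, project_tsv, key_columns=("token", "start", "end")):
    """Check that the project file repeats the base file's token rows.

    key_columns is an assumption for illustration; the real check
    command compares whatever schema the tokeniser actually emits.
    """
    base, proj = _rows(base_tsv), _rows(project_tsv)
    if len(base) != len(proj):
        return False
    return all(b.get(c) == p.get(c)
               for b, p in zip(base, proj)
               for c in key_columns)

base = "token\tstart\tend\nDie\t0\t3\nJudenbuche\t4\t14\n"
proj = "token\tstart\tend\tkeypassage\nDie\t0\t3\t1\nJudenbuche\t4\t14\t1\n"
print(structures_match(base, proj))  # → True
```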

Download files

Download the file for your platform.

Source Distribution

sppcls-0.1.0.tar.gz (10.1 kB)


Built Distribution


sppcls-0.1.0-py3-none-any.whl (10.8 kB)


File details

Details for the file sppcls-0.1.0.tar.gz.

File metadata

  • Download URL: sppcls-0.1.0.tar.gz
  • Upload date:
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for sppcls-0.1.0.tar.gz:

  • SHA256: ddff616cc6d93bf2ceb3a85b67a45409569e94935eaa2e929696d1f66df9a2c0
  • MD5: 68afe60210c61d294b5ff69a8c99e026
  • BLAKE2b-256: 7797ea9a6a68c0ca8959551531d78e44fd2fb025643213bf41c111220460ac1f


File details

Details for the file sppcls-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sppcls-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for sppcls-0.1.0-py3-none-any.whl:

  • SHA256: b7d9c31ded91ce5db34bc1208a074061c2008716a070a6543066b5f770ea9e6f
  • MD5: 827bb2685c9300bcf949c30db449b7b0
  • BLAKE2b-256: a500579696f802701ac08d14bf9b762ab38f08c6d65f5d81a412f598535fd707

