
Accessing and processing data from the DFG-funded SPP Computational Literary Studies

Python code for working with the data of the DFG-funded SPP Computational Literary Studies.

  • sppcls.py: the sppcls Python module for accessing the data:
    • blocking:
    from sppcls import sppcls
    df = sppcls.load_df(work="judenbuche", projects=["keypassages"])
    print(df.describe())
    
    • non-blocking (see the asyncio sketch after this list):
    from sppcls import sppcls
    df = await sppcls.load_df_async(work="judenbuche", projects=["keypassages"])
    print(df.describe())
    

Installation

PyPI

pip install sppcls

or with spaCy support:

pip install sppcls[spacy]

From source

Set up a virtual environment, if necessary:

python3 -m venv env
source env/bin/activate

Install dependencies:

pip install -r requirements.txt

Note

For tokenisation, the spaCy model de_core_news_lg is required:

python -m spacy download de_core_news_lg

Usage

The package offers a command line interface, invoked either as sppcls after installing from PyPI, or as python -m sppcls.cli.sppclscli when running from source.

usage: sppclscli.py [-h] {tokenise,check} ...

Accessing and processing data from the DFG-funded SPP Computational Literary
Studies

positional arguments:
  {tokenise,check}
    tokenise        Tokenize text file and create output tsv.
    check           Compare two tsv files and check that the structures
                    match.

optional arguments:
  -h, --help        show this help message and exit

tokenise

Tokenise takes a txt file, e.g. work.txt, and produces a tsv file containing the tokenised text, e.g. work.tsv. This base tsv file is then extended by the individual projects.

Note: tokenise only works with spaCy installed!

usage: sppclscli.py tokenise [-h] input_file output_folder

Tokenize text file and create output tsv.

positional arguments:
  input_file     Path to the input txt file.
  output_folder  Path to the output folder where the output tsv will be saved.

optional arguments:
  -h, --help     show this help message and exit

TODO: fix character offsets to be byte offsets instead

check

check takes two tsv files, e.g. work.tsv and keypassages.tsv, and makes sure that the project tsv file matches the structure of the base work tsv file.

usage: sppclscli.py check [-h] org-tokens-file-path project-tokens-file-path

Compare two tsv files and check that the structures match.

positional arguments:
  org-tokens-file-path  Path to the original tokens tsv file
  project-tokens-file-path
                        Path to the project tokens tsv file

optional arguments:
  -h, --help            show this help message and exit

