Accessing and processing data from the DFG-funded SPP Computational Literary Studies
Python code for working with the data of the DFG-funded SPP Computational Literary Studies.
- sppcls.py: the sppcls Python module to access the data:
- blocking:
from sppcls import sppcls
df = sppcls.load_df(work="judenbuche", projects=["keypassages"])
print(df.describe())
- non blocking:
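import asyncio
from sppcls import sppcls
async def main():  # a bare top-level await only works in a notebook or asyncio REPL
    df = await sppcls.load_df_async(work="judenbuche", projects=["keypassages"])
    print(df.describe())
asyncio.run(main())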
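When several works are needed, the non-blocking variant can be awaited concurrently. The following is a minimal sketch of that pattern and not part of the package: `fake_load_df_async` is a hypothetical stand-in for `sppcls.load_df_async` so that the example runs offline, `load_many` is an illustrative helper, and `another_work` is a placeholder name.

```python
import asyncio


async def fake_load_df_async(work, projects):
    # Hypothetical stand-in for sppcls.load_df_async (assumption): it only
    # simulates waiting on I/O and returns a dict instead of a DataFrame.
    await asyncio.sleep(0.01)
    return {"work": work, "projects": list(projects)}


async def load_many(works, projects, loader=fake_load_df_async):
    # Create one awaitable per work and wait for all of them together.
    pending = [loader(work=w, projects=projects) for w in works]
    results = await asyncio.gather(*pending)
    # gather() preserves input order, so zip pairs each work with its result.
    return dict(zip(works, results))


if __name__ == "__main__":
    frames = asyncio.run(load_many(["judenbuche", "another_work"], ["keypassages"]))
    print(sorted(frames))
```

Passing `loader=sppcls.load_df_async` should give the same behaviour against the real data, assuming the function accepts the `work` and `projects` keyword arguments shown above.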
Installation
PyPI
pip install sppcls
or, with spaCy support:
pip install sppcls[spacy]
From source
Set up a virtual environment, if necessary:
python3 -m venv env
source env/bin/activate
Install dependencies:
pip install -r requirements.txt
Note
Tokenization requires the spaCy model de_core_news_lg:
python -m spacy download de_core_news_lg
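Because spaCy and its model are optional, it can be useful to test for them before calling the tokeniser. This is a small sketch under the assumption that the model was installed with the command above, which makes it importable as a regular Python package; `has_module` and `tokenise_available` are hypothetical helper names, not part of sppcls.

```python
import importlib.util


def has_module(name):
    # find_spec returns None when no importable top-level package has this name.
    return importlib.util.find_spec(name) is not None


def tokenise_available(model="de_core_news_lg"):
    # Both the spaCy library and the German model package must be present.
    return has_module("spacy") and has_module(model)


if __name__ == "__main__":
    if tokenise_available():
        print("spaCy and de_core_news_lg found")
    else:
        print("tokenise will not work: install sppcls[spacy] and the model")
```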
Usage
The package offers a command line interface: use the command sppcls after installing from PyPI, or python -m sppcls.cli.sppclscli when running from source.
usage: sppclscli.py [-h] {tokenise,check} ...

Accessing and processing data from the DFG-funded SPP Computational Literary
Studies

positional arguments:
  {tokenise,check}
    tokenise        Tokenize text file and create output tsv.
    check           Compare two tsv files and check that the structures
                    matches.

optional arguments:
  -h, --help        show this help message and exit
tokenise
Tokenise takes a txt file, e.g. work.txt, and produces a tsv file containing the tokenized text, e.g. work.tsv.
This base tsv file is then extended by the individual projects.
Note: Tokenise only works with spacy installed!
usage: sppclscli.py tokenise [-h] input_file output_folder

Tokenize text file and create output tsv.

positional arguments:
  input_file     Path to the input txt file.
  output_folder  Path to the output folder where the output tsv will be saved.

optional arguments:
  -h, --help     show this help message and exit
TODO: fix character offset to be byte instead
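The TODO above turns on the difference between character and byte offsets, which diverge as soon as the text contains non-ASCII characters such as German umlauts. The sketch below illustrates that difference only: it uses a simple regular expression as a stand-in for the spaCy tokenizer, assumes UTF-8 encoding, and the three-column layout (token, character offset, byte offset) is an assumption, not the actual format written by tokenise.

```python
import re


def token_offsets(text):
    # Regex tokenizer as a stand-in for spaCy (assumption): a token is either
    # a run of word characters or a single punctuation mark.
    rows = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        char_start = match.start()
        # Byte offset = length of the UTF-8 encoding of everything before the token.
        byte_start = len(text[:char_start].encode("utf-8"))
        rows.append((match.group(), char_start, byte_start))
    return rows


if __name__ == "__main__":
    for token, char_start, byte_start in token_offsets("Für Übung."):
        print(f"{token}\t{char_start}\t{byte_start}")
```

For "Für Übung." the second token starts at character 4 but at byte 5, because "ü" occupies two bytes in UTF-8.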
check
check.py takes two tsv files, e.g. work.tsv and keypassages.tsv, and makes sure that the project tsv file matches the structure of the base work tsv file.
usage: sppclscli.py check [-h] org-tokens-file-path project-tokens-file-path

Compare two tsv files and check that the structures matches.

positional arguments:
  org-tokens-file-path  Path to the original tokens tsv file
  project-tokens-file-path
                        Path to the project tokens tsv file

optional arguments:
  -h, --help            show this help message and exit
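To give an idea of what such a structure check can look like, here is a minimal sketch using only the standard library. It does not reproduce the logic of check.py: the rule it applies (same number of rows, identical leading columns) is an assumption, and the column names `token` and `keypassage` as well as the function names are hypothetical.

```python
import csv
import io


def read_tsv(handle):
    # Read tab-separated rows without treating quote characters specially.
    return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def structures_match(base_rows, project_rows, shared_columns=1):
    # Assumed rule: both files have the same number of rows, and the first
    # `shared_columns` columns (here: the token) agree row by row.
    if len(base_rows) != len(project_rows):
        return False
    return all(
        base[:shared_columns] == project[:shared_columns]
        for base, project in zip(base_rows, project_rows)
    )


if __name__ == "__main__":
    base = read_tsv(io.StringIO("token\nDie\nJudenbuche\n"))
    project = read_tsv(io.StringIO("token\tkeypassage\nDie\t0\nJudenbuche\t1\n"))
    print(structures_match(base, project))
```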