Accessing and processing data from the DFG-funded SPP Computational Literary Studies
Python code for working with the data of the DFG-funded SPP Computational Literary Studies.
- sppcls.py: the sppcls Python module to access the data:
- blocking:
    from sppcls import sppcls
    df = sppcls.load_df(work="judenbuche", projects=["keypassages"])
    print(df.describe())
- non-blocking (await must run inside a coroutine, here started with asyncio.run):
    import asyncio
    from sppcls import sppcls
    async def main():
        df = await sppcls.load_df_async(work="judenbuche", projects=["keypassages"])
        print(df.describe())
    asyncio.run(main())
Installation
PyPI
pip install sppcls
From source
Set up a virtual environment, if necessary:
python3 -m venv env
source env/bin/activate
Install dependencies:
pip install -r requirements.txt
python -m spacy download de_core_news_lg
Usage
tokenise.py
tokenise.py takes a txt file, e.g. work.txt, and produces a tsv file containing the tokenized text, e.g. work.tsv.
This base tsv file is then extended by the individual projects.
python tokenise.py path_to_input_txt path_to_output_folder
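The following is a minimal sketch of the txt to tsv step, not the actual tokenise.py. It swaps spaCy's de_core_news_lg for a simple regular-expression tokeniser, and the column names token_id, token, char_start and char_end are hypothetical. It only illustrates the idea of one row per token with character offsets into the original text.

```python
import re

# Stand-in tokeniser (assumption): a run of word characters, or any single
# character that is neither a word character nor whitespace. The real
# tokenise.py presumably relies on spaCy's de_core_news_lg model instead.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Hypothetical column layout of the base tsv file.
HEADER = ("token_id", "token", "char_start", "char_end")


def tokenise(text):
    """Return (token_id, token, char_start, char_end) tuples for every token.

    m.start() and m.end() are character (code point) offsets into text.
    """
    return [
        (i, m.group(), m.start(), m.end())
        for i, m in enumerate(TOKEN_RE.finditer(text))
    ]


def to_tsv(rows):
    """Serialise token rows as tab-separated lines preceded by a header line.

    Tokens never contain tabs or newlines, because TOKEN_RE excludes whitespace.
    """
    lines = ["\t".join(HEADER)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_tsv(tokenise("Die Judenbuche.")), end="")
```

Because `\w` is Unicode-aware in Python 3, words with umlauts or ß stay single tokens. A real German tokeniser additionally handles abbreviations, ordinals and similar cases that this regex splits naively.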
TODO: switch the character offsets to byte offsets.
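Character and byte offsets diverge as soon as a multi-byte character precedes a token. Here is a minimal sketch of the conversion, assuming the text is encoded as UTF-8 and the existing offsets count Python string characters (code points). char_to_byte_offset is a hypothetical helper and is not part of sppcls.

```python
def char_to_byte_offset(text, char_offset, encoding="utf-8"):
    """Convert a character (code point) offset into text to a byte offset.

    The byte offset is the length of the encoded prefix before the position,
    so it grows faster than the character offset whenever a multi-byte
    character (e.g. ä, ö, ü, ß in UTF-8) appears earlier in the text.
    """
    return len(text[:char_offset].encode(encoding))


if __name__ == "__main__":
    text = "Über die Buche"
    start = text.index("die")  # character offset 5
    print(start, char_to_byte_offset(text, start))  # prints: 5 6
```

Slicing and re-encoding the prefix is simple but costs O(n) per call. When converting every token of a long work, it is cheaper to walk the tokens in order and accumulate the encoded lengths of the gaps and tokens.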
check.py
check.py takes two tsv files, e.g. work.tsv and keypassages.tsv, and makes sure that the project tsv file matches the structure of the base work tsv file.
python check.py path_to_work_tsv path_to_project_tsv
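Here is a minimal sketch of such a structural check, not the actual check.py. It assumes that both files are tab-separated with a header row. It also assumes that the project file must have as many rows as the work file and must repeat the work file's token_id and token columns unchanged. Both column names are hypothetical, as is the keypassage column in the demo.

```python
import csv

# Hypothetical: columns the project tsv must copy unchanged from the work tsv.
KEY_COLUMNS = ("token_id", "token")


def read_tsv(lines):
    """Parse tab-separated lines (header first) into a csv.DictReader."""
    return csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)


def check_structure(work_lines, project_lines, key_columns=KEY_COLUMNS):
    """Return mismatch messages; an empty list means the structures agree."""
    work, project = read_tsv(work_lines), read_tsv(project_lines)
    errors = []
    # Both headers must contain every key column.
    for name, reader in (("work", work), ("project", project)):
        missing = [c for c in key_columns if c not in (reader.fieldnames or [])]
        if missing:
            errors.append(f"{name} tsv lacks columns {missing}")
    if errors:
        return errors
    work_rows, project_rows = list(work), list(project)
    if len(work_rows) != len(project_rows):
        errors.append(f"row count differs: {len(work_rows)} vs {len(project_rows)}")
    # Compare the key columns row by row, up to the shorter file's length.
    for i, (w, p) in enumerate(zip(work_rows, project_rows)):
        for col in key_columns:
            if w[col] != p[col]:
                errors.append(f"row {i}: {col} {w[col]!r} != {p[col]!r}")
    return errors


if __name__ == "__main__":
    work_tsv = ["token_id\ttoken", "0\tDie", "1\tJudenbuche"]
    project_tsv = ["token_id\ttoken\tkeypassage", "0\tDie\t0", "1\tJudenbuche\t1"]
    print(check_structure(work_tsv, project_tsv) or "OK")  # prints: OK
```

csv.DictReader accepts any iterable of lines, so open file objects (opened with newline="") can be passed in place of the in-memory lists used in the demo.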