GermaParlPy
The GermaParlPy Python package provides functionality to deserialize, serialize, manage, and query the GermaParlTEI[^1] corpus and derived corpora.
The GermaParlTEI corpus comprises the plenary protocols of the German Bundestag (parliament), encoded in XML according to the TEI standard. The current version covers the first 19 legislative periods, encompassing transcribed speeches from the Bundestag's constituent session on 7 September 1949 to the final sitting of the Angela Merkel era in 2021. This makes it a valuable resource for research in various scientific disciplines.
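The package description mentions derived corpora, that is, subsets of the full corpus. The sketch below illustrates only the general idea and does not use GermaParlPy's own API. It deserializes a small, hypothetical TEI-style fragment with the standard library's `xml.etree.ElementTree`, keeps the speeches of a single party, and serializes the result as a new XML document. The `<sp>` and `<p>` elements follow common TEI usage. The `party` attribute, the overall layout, and the helper name `derive_subcorpus` are assumptions, so check them against the XML structure documentation before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-style fragment; the "party" attribute and the
# element layout are illustrative assumptions, not the verified GermaParlTEI schema.
SAMPLE = """<TEI><text><body>
<sp who="Speaker A" party="SPD"><p>Erste Rede.</p></sp>
<sp who="Speaker B" party="CDU"><p>Zweite Rede.</p></sp>
<sp who="Speaker C" party="SPD"><p>Dritte Rede.</p></sp>
</body></text></TEI>"""


def derive_subcorpus(xml_text, party):
    """Deserialize xml_text, keep only <sp> elements of one party, serialize again."""
    source = ET.fromstring(xml_text)
    # Build a fresh TEI > text > body skeleton for the derived document.
    derived_root = ET.Element("TEI")
    body = ET.SubElement(ET.SubElement(derived_root, "text"), "body")
    for sp in source.iter("sp"):
        if sp.get("party") == party:
            body.append(sp)
    return ET.tostring(derived_root, encoding="unicode")


if __name__ == "__main__":
    print(derive_subcorpus(SAMPLE, "SPD"))
```

Filtering before serializing keeps the derived file self-contained. The result can be written to disk and parsed again like any other protocol.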
For detailed information on the library, visit the official website.
Use Cases
Potential use cases range from investigating research questions in political science, history, or linguistics to compiling training datasets for AI.
In addition, this library makes the GermaParl corpus accessible in Python, so that NLP libraries such as spaCy or gensim can be applied to it. Previously, the corpus could only be accessed through the polmineR package for the R programming language.
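Once speech text has been extracted, passing it to an NLP library is ordinary Python. spaCy and gensim provide their own tokenizers, models, and data structures. As a dependency-free stand-in for that step, the sketch below performs crude regex tokenization and counts term frequencies per speech with `collections.Counter`. This roughly matches the bag-of-words shape that topic-modelling tools work with. The speech texts, the tiny stopword set, and the helper name `bag_of_words` are hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical speech texts standing in for text extracted from the corpus.
SPEECHES = [
    "Die Rente ist sicher. Die Rente bleibt.",
    "Wir brauchen mehr Klimaschutz.",
]

# In Python 3, \w on str patterns also matches non-ASCII letters such as umlauts.
TOKEN = re.compile(r"\w+")

# Toy stopword set for illustration only; a real pipeline would use a proper list.
STOPWORDS = frozenset({"die", "ist", "wir"})


def bag_of_words(text, stopwords=STOPWORDS):
    """Lowercase, tokenize with a simple regex, drop stopwords, and count terms."""
    tokens = [t.lower() for t in TOKEN.findall(text)]
    return Counter(t for t in tokens if t not in stopwords)


if __name__ == "__main__":
    for bow in map(bag_of_words, SPEECHES):
        print(bow.most_common(3))
```

In practice you would replace the regex with spaCy's tokenizer or lemmatizer, or feed the token lists to gensim. The surrounding loop over speeches stays the same.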
Installation
GermaParlPy is available on PyPI:
```
pip install germaparlpy
```
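To confirm that the installation worked and see which release is present (this page describes 1.0.6), you can query the installed distribution with the standard library's `importlib.metadata`, without importing the package itself. This is a minimal sketch, and the helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "germaparlpy") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "germaparlpy is not installed")
```

To reproduce an analysis against a specific release, you can pin it with `pip install germaparlpy==1.0.6`.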
API Reference
Click here for the full API Reference.
XML Structure
Click here to learn more about the XML Structure of the underlying corpus GermaParlTEI[^1].
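As a rough illustration of how a protocol in TEI form can be read without GermaParlPy, the sketch below parses a small, hypothetical fragment with Python's built-in `xml.etree.ElementTree` and groups speech text by party. It makes three assumptions: each speech is an `<sp>` element with `who` and `party` attributes, spoken text sits in `<p>` children, and interjections are `<stage>` elements. These names are illustrative, as is the helper `speeches_by_party`. Check them against the XML structure documentation before use.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily simplified TEI-style fragment; element and attribute
# names are assumptions for illustration, not the verified GermaParlTEI schema.
SAMPLE = """<TEI>
  <text>
    <body>
      <div type="agenda_item">
        <sp who="Speaker A" party="SPD">
          <speaker>Speaker A (SPD):</speaker>
          <p>Meine Damen und Herren!</p>
          <stage>(Beifall bei der SPD)</stage>
          <p>Wir beginnen.</p>
        </sp>
        <sp who="Speaker B" party="CDU">
          <speaker>Speaker B (CDU):</speaker>
          <p>Vielen Dank.</p>
        </sp>
      </div>
    </body>
  </text>
</TEI>"""


def speeches_by_party(xml_text):
    """Map each party attribute to the list of speech texts attributed to it."""
    root = ET.fromstring(xml_text)
    result = defaultdict(list)
    for sp in root.iter("sp"):
        # Only direct <p> children are kept, so <speaker> labels and
        # <stage> interjections are skipped.
        paragraphs = [p.text.strip() for p in sp.findall("p") if p.text]
        result[sp.get("party", "unknown")].append(" ".join(paragraphs))
    return dict(result)


if __name__ == "__main__":
    for party, speeches in speeches_by_party(SAMPLE).items():
        print(party, speeches)
```

Separating spoken paragraphs from stage directions matters for most text analyses, because interjections such as applause would otherwise be counted as part of the speech.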
Tutorials
I have prepared three example scripts that show how to use GermaParlPy and illustrate potential use cases. You can find them in the /example directory or here.
Contributing
Contributions and feedback are welcome! Feel free to open an issue or submit a pull request.
License
The code is licensed under the MIT License.
The GermaParl corpus, which is not part of this repository, is licensed under a CLARIN PUB+BY+NC+SA license.
Credits
Developed by Marlon-Benedikt George.
The underlying dataset, the GermaParl corpus, was compiled and released by Blätte & Leonhardt (2024)[^1]. See also their R package polmineR, developed within the PolMine Project, which served as an inspiration for this library.
[^1]: Blaette, A. and C. Leonhardt. GermaParl corpus of plenary protocols. v2.2.0-rc1, Zenodo, 22 July 2024, doi:10.5281/zenodo.12795193
Download files
File details
Details for the file germaparlpy-1.0.6.tar.gz.
File metadata
- Download URL: germaparlpy-1.0.6.tar.gz
- Upload date:
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 89bdab9863d6397ac5b874a694084c04d9349b04a7a132a418a2ab8af802438b |
| MD5 | 1564934ba131ca44d2b88947828a1847 |
| BLAKE2b-256 | a87eb2ab1d638c08a1cde14457703b3ca4c91ea742ee28f99ef790f80a58e356 |

Provenance
The following attestation bundles were made for germaparlpy-1.0.6.tar.gz:
- Publisher: publish_pypi.yml on Nolram567/GermaParlPy
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: germaparlpy-1.0.6.tar.gz
- Subject digest: 89bdab9863d6397ac5b874a694084c04d9349b04a7a132a418a2ab8af802438b
- Sigstore transparency entry: 969124852
- Sigstore integration time:
- Permalink: Nolram567/GermaParlPy@0464627d2e07a7e22904df39c8ad1c6bcb9460cf
- Branch / Tag: refs/tags/1.0.6
- Owner: https://github.com/Nolram567
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_pypi.yml@0464627d2e07a7e22904df39c8ad1c6bcb9460cf
- Trigger Event: push
File details
Details for the file germaparlpy-1.0.6-py3-none-any.whl.
File metadata
- Download URL: germaparlpy-1.0.6-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0293ca60040677075a093be8055fb7374c042d7fc475a7ade5b9de5f76441e34 |
| MD5 | f00cdd7729db33f0bc6cdfc56a6f5a1b |
| BLAKE2b-256 | 4478adbcfa49ce0a89f39501abe31c5efb34d984bfbc30e4c37af8bf7c845bfe |

Provenance
The following attestation bundles were made for germaparlpy-1.0.6-py3-none-any.whl:
- Publisher: publish_pypi.yml on Nolram567/GermaParlPy
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: germaparlpy-1.0.6-py3-none-any.whl
- Subject digest: 0293ca60040677075a093be8055fb7374c042d7fc475a7ade5b9de5f76441e34
- Sigstore transparency entry: 969124855
- Sigstore integration time:
- Permalink: Nolram567/GermaParlPy@0464627d2e07a7e22904df39c8ad1c6bcb9460cf
- Branch / Tag: refs/tags/1.0.6
- Owner: https://github.com/Nolram567
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_pypi.yml@0464627d2e07a7e22904df39c8ad1c6bcb9460cf
- Trigger Event: push
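The SHA256 digests listed in the file hashes for each distribution let you check a manually downloaded file before installing it. Below is a minimal sketch using the standard library's `hashlib`. The helper names `sha256_of` and `matches` are illustrative, and the path is wherever you saved the download.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    ok = matches(
        "germaparlpy-1.0.6.tar.gz",
        "89bdab9863d6397ac5b874a694084c04d9349b04a7a132a418a2ab8af802438b",
    )
    print("digest OK" if ok else "digest MISMATCH")
```

pip can also enforce digests at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file). Because pip usually prefers the wheel, list the wheel's digest as well as the source archive's.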