A list of ~98,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.
German nouns
A comma-separated list of ~98 thousand German nouns and their grammatical properties (case, number, gender) as a CSV file, plus a module to look up the data and parse compound words. Compiled from WiktionaryDE.
The list can be found here: german_nouns/nouns.csv
If you want to look up nouns or parse compound words, install this package (for Python 3.8+) and follow the instructions below:
Installation
pip install german-nouns
Look up words
from pprint import pprint
from german_nouns.lookup import Nouns
nouns = Nouns()
# Look up a word
word = nouns['Fahrrad']
pprint(word)
# Output:
[{'flexion': {'akkusativ plural': 'Fahrräder',
              'akkusativ singular': 'Fahrrad',
              'dativ plural': 'Fahrrädern',
              'dativ singular': 'Fahrrad',
              'dativ singular*': 'Fahrrade',
              'genitiv plural': 'Fahrräder',
              'genitiv singular': 'Fahrrades',
              'genitiv singular*': 'Fahrrads',
              'nominativ plural': 'Fahrräder',
              'nominativ singular': 'Fahrrad'},
  'genus': 'n',
  'lemma': 'Fahrrad',
  'pos': ['Substantiv']}]
# Parse a compound word
words = nouns.parse_compound('Vermögensbildung')
print(words)
# Output:
['Vermögen', 'Bildung']  # Now look up nouns['Vermögen'] etc.
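A lookup returns a list of entries, each with a 'flexion' dictionary keyed by case and number. The following is a minimal sketch of how a specific inflected form could be pulled out of such a result. The helper name `inflected_forms` is hypothetical and not part of the package, and the sample data is copied from the output shown above so that the snippet runs without the package installed.

```python
# Sample entry copied from the lookup output above, so that this sketch
# runs without the german-nouns package installed.
SAMPLE_ENTRIES = [
    {
        "flexion": {
            "akkusativ plural": "Fahrräder",
            "akkusativ singular": "Fahrrad",
            "dativ plural": "Fahrrädern",
            "dativ singular": "Fahrrad",
            "dativ singular*": "Fahrrade",
            "genitiv plural": "Fahrräder",
            "genitiv singular": "Fahrrades",
            "genitiv singular*": "Fahrrads",
            "nominativ plural": "Fahrräder",
            "nominativ singular": "Fahrrad",
        },
        "genus": "n",
        "lemma": "Fahrrad",
        "pos": ["Substantiv"],
    }
]


def inflected_forms(entries, case, number):
    """Collect the form for a given case and number from each entry.

    Hypothetical helper (not part of the package). It assumes the
    'flexion' keys follow the '<case> <number>' pattern seen in the
    sample output; entries without that key are skipped.
    """
    key = f"{case} {number}"
    forms = []
    for entry in entries:
        form = entry.get("flexion", {}).get(key)
        if form is not None:
            forms.append(form)
    return forms


print(inflected_forms(SAMPLE_ENTRIES, "dativ", "plural"))  # ['Fahrrädern']
```

With the package installed, the same helper should work on a real result, for example `inflected_forms(nouns['Fahrrad'], 'genitiv', 'singular')`, assuming every entry has the structure shown above.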
Compiling the list
To compile the list yourself, you need Python 3.8+ and Poetry installed.
1. Clone the repository and install dependencies with Poetry:
$ git clone https://github.com/gambolputty/german-nouns
$ cd german-nouns
$ poetry install
2. Compile the list of nouns from a Wiktionary XML file:
Download one of the latest XML dump files from https://dumps.wikimedia.org/dewiktionary/latest, then execute:
$ poetry run python -m german_nouns.parse_dump /path-to-xml-dump-file.xml.bz2
The CSV file will be saved here: german_nouns/nouns.csv.
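To check the compiled file, it can be read with Python's standard csv module. This is a minimal sketch that makes no assumption about the column names. It only assumes that the file is UTF-8 encoded and starts with a header row, which this document does not confirm. The function name `summarize_csv` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path, sample_size=3):
    """Return (column names, first rows, total row count) of a CSV file.

    Assumes a UTF-8 encoded file whose first line is a header row.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        sample = []
        total = 0
        for row in reader:
            if total < sample_size:
                sample.append(row)
            total += 1
        return reader.fieldnames, sample, total


if __name__ == "__main__":
    # Path taken from the text above; adjust it if the file is elsewhere.
    csv_path = Path("german_nouns/nouns.csv")
    if csv_path.exists():
        columns, sample, total = summarize_csv(csv_path)
        print("Columns:", columns)
        print("Rows:", total)
        for row in sample:
            print(row)
```

Reading the file row by row keeps memory use low, which may matter for a list of this size.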
Hashes for german_nouns-1.2.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 612f60ae6d9780a700723053eaa63e679fc1de4920e63b69086caa8a36822f6a
MD5 | 4a23272c26b1acc159506438a59a95dd
BLAKE2b-256 | 89d5b971600f526e427b794e7c07f080699abced3b81a99370f5c75281579709