Skip to main content

A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.

Project description

German nouns

A comma seperated list of ~100 thousand German nouns and their grammatical properties (tense, number, gender) as CSV file. Plus a module to look up the data and parse compound words. Compiled from the WiktionaryDE.

The list can be found here: german_nouns/nouns.csv

If you want to look up nouns or parse compound words, install this package (for Python 3.8+) and follow the instructions below:

Installation

pip install german-nouns

Lookup words

from pprint import pprint
from german_nouns.lookup import Nouns

nouns = Nouns()

# Lookup a word
word = nouns['Fahrrad']
pprint(word)

# Output:
[{'flexion': {'akkusativ plural': 'Fahrräder',
              'akkusativ singular': 'Fahrrad',
              'dativ plural': 'Fahrrädern',
              'dativ singular': 'Fahrrad',
              'dativ singular*': 'Fahrrade',
              'genitiv plural': 'Fahrräder',
              'genitiv singular': 'Fahrrades',
              'genitiv singular*': 'Fahrrads',
              'nominativ plural': 'Fahrräder',
              'nominativ singular': 'Fahrrad'},
  'genus': 'n',
  'lemma': 'Fahrrad',
  'pos': ['Substantiv']}]

# parse compound word
words = nouns.parse_compound('Vermögensbildung')
print(words)

# Output:
['Vermögen', 'Bildung'] # Now lookup nouns['Vermögen'] etc.

Compiling the list

To compile the list yourself, you need Python 3.8+ and Poetry installed.

1. Clone the repository and install dependencies with Poetry:

$ git clone https://github.com/gambolputty/german-nouns
$ cd german-nouns
$ poetry install

2. Compile the list of nouns from a Wiktionary XML file:

Find the latest XML-dump files here: https://dumps.wikimedia.org/dewiktionary/latest, for example this one and download it. Then execute:

$ poetry run python -m german_nouns.parse_dump /path-to-xml-dump-file.xml.bz2

The CSV file will be saved here: german_nouns/nouns.csv.

Remove german_nouns/index.txt to let the script recreate the word-index when using the lookup methods.


License: CC BY-SA 4.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

german-nouns-1.2.5.tar.gz (3.1 MB view hashes)

Uploaded Source

Built Distribution

german_nouns-1.2.5-py3-none-any.whl (3.1 MB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page