
A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.

Project description

German nouns

A comma-separated list of ~100,000 German nouns and their grammatical properties (case, number, gender) in a CSV file, plus a module to look up the data and parse compound words. Compiled from the German Wiktionary (WiktionaryDE).

The list can be found here: german_nouns/nouns.csv
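If you only need the raw data, the CSV can also be read without installing the package. The following is a minimal sketch using Python's csv module. It assumes the file is UTF-8 encoded and that its header row has a `lemma` column (the same key that appears in the lookup output in this README); check the actual header before relying on it. `rows_for_lemma` is a hypothetical helper, not part of german_nouns.

```python
import csv
from typing import Dict, Iterable, List


def rows_for_lemma(lines: Iterable[str], lemma: str) -> List[Dict[str, str]]:
    """Return every CSV row whose 'lemma' column equals `lemma`.

    `lines` may be an open text file or any iterable of CSV lines.
    Assumption: the header row contains a column named 'lemma'.
    """
    reader = csv.DictReader(lines)
    return [dict(row) for row in reader if row.get("lemma") == lemma]


# Example usage (hypothetical; path as given in this README):
# with open("german_nouns/nouns.csv", encoding="utf-8", newline="") as f:
#     for row in rows_for_lemma(f, "Fahrrad"):
#         print({key: value for key, value in row.items() if value})
```

This scans the whole file on every call, which is fine for one-off queries; for repeated lookups the package's own index is the better choice.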

If you want to look up nouns or parse compound words, install this package (for Python 3.8+) and follow the instructions below:

Installation

pip install german-nouns

Look up words

from pprint import pprint
from german_nouns.lookup import Nouns

nouns = Nouns()

# Look up a word
word = nouns['Fahrrad']
pprint(word)

# Output:
[{'flexion': {'akkusativ plural': 'Fahrräder',
              'akkusativ singular': 'Fahrrad',
              'dativ plural': 'Fahrrädern',
              'dativ singular': 'Fahrrad',
              'dativ singular*': 'Fahrrade',
              'genitiv plural': 'Fahrräder',
              'genitiv singular': 'Fahrrades',
              'genitiv singular*': 'Fahrrads',
              'nominativ plural': 'Fahrräder',
              'nominativ singular': 'Fahrrad'},
  'genus': 'n',
  'lemma': 'Fahrrad',
  'pos': ['Substantiv']}]

# Parse a compound word
words = nouns.parse_compound('Vermögensbildung')
print(words)

# Output:
['Vermögen', 'Bildung']  # Now look up nouns['Vermögen'], etc.
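The lookup result is a list of dicts shaped like the Fahrrad output above. As a sketch of what can be done with it, the helpers here (hypothetical names, not part of german_nouns) build forms with the definite article and derive a compound's gender from its last part, which is the usual rule for German compounds. Assumptions: besides 'n', the genus codes are 'm' and 'f'; the `lookup` argument returns a list of entries the way `nouns[...]` does in the example.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Assumption: 'm' and 'f' mark masculine and feminine;
# only 'n' (neuter) appears in the example output.
ARTICLES = {"m": "der", "f": "die", "n": "das"}


def with_article(entries: Sequence[Dict]) -> Optional[str]:
    """Return e.g. 'das Fahrrad' for the first entry with a known genus."""
    for entry in entries:
        article = ARTICLES.get(entry.get("genus"))
        if article:
            return f"{article} {entry['lemma']}"
    return None


def plural_with_article(entries: Sequence[Dict]) -> Optional[str]:
    """Return e.g. 'die Fahrräder'; the nominative plural article is always 'die'."""
    for entry in entries:
        plural = entry.get("flexion", {}).get("nominativ plural")
        if plural:
            return f"die {plural}"
    return None


def compound_genus(parts: List[str], lookup: Callable[[str], Sequence[Dict]]) -> Optional[str]:
    """A German compound takes its gender from its last part (the head noun)."""
    if not parts:
        return None
    for entry in lookup(parts[-1]):
        if entry.get("genus"):
            return entry["genus"]
    return None


# Usage with the package (not executed here):
# nouns = Nouns()
# with_article(nouns['Fahrrad'])
# compound_genus(nouns.parse_compound('Vermögensbildung'), lambda word: nouns[word])
```

Passing the lookup as a callable keeps the helpers independent of the package, so they can be tried on plain dicts; how `nouns[...]` behaves for unknown words is not covered here.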

Compiling the list

To compile the list yourself, you need Python 3.8+ and Poetry installed.

1. Clone the repository and install dependencies with Poetry:

$ git clone https://github.com/gambolputty/german-nouns
$ cd german-nouns
$ poetry install

2. Compile the list of nouns from a Wiktionary XML file:

Download an XML dump file from https://dumps.wikimedia.org/dewiktionary/latest (for example, this one). Then run:

$ poetry run python -m german_nouns.parse_dump /path-to-xml-dump-file.xml.bz2
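The dump is a bzip2-compressed MediaWiki XML file. As a rough illustration of how such a file can be read page by page without decompressing it to disk (not necessarily how german_nouns.parse_dump works), here is a standard-library sketch. Assumptions: the usual page/title/revision/text layout of MediaWiki exports, and that German noun entries on WiktionaryDE contain the `{{Wortart|Substantiv|Deutsch}}` template. `iter_pages` and `looks_like_german_noun` are hypothetical names.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple, Union


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that MediaWiki exports put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, IO[bytes]]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> of a .xml.bz2 dump, streaming."""
    with bz2.open(source, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            if _local_name(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = _local_name(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield title, text
            elem.clear()  # drop the page's subtree to limit memory use


def looks_like_german_noun(wikitext: str) -> bool:
    """Assumption: WiktionaryDE marks German noun sections with this template."""
    return "{{Wortart|Substantiv|Deutsch}}" in wikitext


# Usage (path as in the command above):
# for title, text in iter_pages("/path-to-xml-dump-file.xml.bz2"):
#     if looks_like_german_noun(text):
#         print(title)
```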

The CSV file will be saved here: german_nouns/nouns.csv.

Delete german_nouns/index.txt so that the word index is recreated the next time the lookup methods are used.


License: CC BY-SA 4.0



Download files


Source Distribution

german-nouns-1.2.5.tar.gz (3.1 MB)


Built Distribution

german_nouns-1.2.5-py3-none-any.whl (3.1 MB)


File details

Details for the file german-nouns-1.2.5.tar.gz.

File metadata

  • Download URL: german-nouns-1.2.5.tar.gz
  • Upload date:
  • Size: 3.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.10 Darwin/21.4.0

File hashes

Hashes for german-nouns-1.2.5.tar.gz
Algorithm     Hash digest
SHA256        1258f2917db364d3661a651a47d1089c8ef40bddbe061e43cb529dae5f13ce54
MD5           8abf87f430c20d368e5837bb715006eb
BLAKE2b-256   bcb93d803c566f752b6c64bd5bed6438b78e5ca31305bf33037a257101a7ccbf
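To check a downloaded release file against the SHA256 digests listed on this page, a minimal sketch with Python's hashlib could look like this. The function names are hypothetical, and the file is assumed to keep the name it has on this page.

```python
import hashlib
import os
from typing import BinaryIO

# SHA256 digests listed on this page for release 1.2.5.
EXPECTED_SHA256 = {
    "german-nouns-1.2.5.tar.gz": "1258f2917db364d3661a651a47d1089c8ef40bddbe061e43cb529dae5f13ce54",
    "german_nouns-1.2.5-py3-none-any.whl": "0f113c59ea331aae750bc03c7b6c7217bae975c6677b4df8d8932093093f509f",
}


def sha256_of_stream(f: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: f.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Compare a downloaded file with the digest listed for its file name."""
    with open(path, "rb") as f:
        actual = sha256_of_stream(f)
    return actual == EXPECTED_SHA256[os.path.basename(path)]


# Usage (hypothetical path):
# print(verify("german_nouns-1.2.5-py3-none-any.whl"))
```

`verify` raises `KeyError` for a file name not listed above, which surfaces a wrong download instead of silently reporting a mismatch.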


File details

Details for the file german_nouns-1.2.5-py3-none-any.whl.

File metadata

  • Download URL: german_nouns-1.2.5-py3-none-any.whl
  • Upload date:
  • Size: 3.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.10 Darwin/21.4.0

File hashes

Hashes for german_nouns-1.2.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        0f113c59ea331aae750bc03c7b6c7217bae975c6677b4df8d8932093093f509f
MD5           5979a70b393d5ba720da706b74af3c3a
BLAKE2b-256   3fe5466f9559d9b2a413a1f5de3dac398d36ddafd1abfaa942660277234dc8e1

