Access various dictionaries from CDSL (Cologne Digital Sanskrit Lexicon)

PyCDSL is a Python interface to the Cologne Digital Sanskrit Lexicon (CDSL).

Features

  • CDSL Corpus Management (Download, Update, Access)

  • Unified Programmable Interface to access all dictionaries available at CDSL

  • Console Command and REPL Interface for easy dictionary search

  • Extensive support for transliteration using the indic-transliteration module

Install

To install PyCDSL, run this command in your terminal:

$ pip install PyCDSL

Usage

PyCDSL can be used in a Python project, as a console command, or as an interactive REPL.

Using PyCDSL in a Project

Import PyCDSL in a project:

import pycdsl

Create a CDSLCorpus Instance:

# Default installation at ~/cdsl_data
CDSL = pycdsl.CDSLCorpus()

# A custom installation path can be specified with the argument `data_dir`
# e.g. CDSL = pycdsl.CDSLCorpus(data_dir="custom-installation-path")

# Custom transliteration schemes for input and output can be specified
# with the arguments `input_scheme` and `output_scheme`.
# Values should be valid scheme names from `indic-transliteration`.
# If unspecified, `DEFAULT_SCHEME` (`devanagari`) is used.

CDSL = pycdsl.CDSLCorpus(input_scheme="itrans", output_scheme="iast")

Set up the default dictionaries (["MW", "MWE", "AP90", "AE"]):

# Note: Any additional dictionaries that are installed will also be loaded.
CDSL.setup()

# For loading specific dictionaries only,
# a list of dictionary IDs can be passed to the setup function
# e.g. CDSL.setup(["VCP"])

# If the `update` flag is True, an update check is performed for every
# dictionary in `dict_ids` and, if one is available, the updated version is installed
# e.g. CDSL.setup(["MW"], update=True)

Search in a dictionary:

# Any loaded dictionary is accessible using `[]` operator and dictionary ID
# e.g. CDSL["MW"]
results = CDSL["MW"].search("राम")

# Alternatively, they are also accessible like an attribute
# e.g. CDSL.MW, CDSL.MWE etc.
results = CDSL.MW.search("राम")

# Note: Attribute access and Item access both use the `dicts` property
# under the hood to access the dictionaries.
# >>> CDSL.MW is CDSL.dicts["MW"]
# True
# >>> CDSL["MW"] is CDSL.dicts["MW"]
# True
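
The identity checks above follow from both access styles reading the same mapping. The toy class below is a minimal sketch of that pattern and is not PyCDSL's actual code; `ToyCorpus` and its contents are hypothetical names used only for illustration.

```python
class ToyCorpus:
    """Hypothetical container showing item and attribute access over one mapping."""

    def __init__(self, dicts):
        # Mapping of dictionary ID -> dictionary object
        self.dicts = dict(dicts)

    def __getitem__(self, dict_id):
        # Item access, e.g. corpus["MW"]
        return self.dicts[dict_id]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, e.g. corpus.MW
        if name == "dicts":
            # Guard against recursion if `dicts` has not been set yet
            raise AttributeError(name)
        try:
            return self.dicts[name]
        except KeyError:
            raise AttributeError(name) from None


mw = object()  # stand-in for a loaded dictionary
corpus = ToyCorpus({"MW": mw})
print(corpus.MW is corpus.dicts["MW"])     # True
print(corpus["MW"] is corpus.dicts["MW"])  # True
```

Because both paths return the very same object, changing a setting through one access style is visible through the other.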

# `input_scheme` and `output_scheme` can also be passed to the search function.
CDSL.MW.search("kṛṣṇa", input_scheme="iast", output_scheme="itrans")[0]
# <MWEntry: 55142: kRRiShNa = 1. kRRiShNa/ mf(A/)n. black, dark, dark-blue (opposed to shveta/, shukla/, ro/hita, and aruNa/), RV.; AV. &c.>

# Search using a wildcard (i.e. `*`)
# e.g. To search all entries starting with kRRi (i.e. कृ)
CDSL.MW.search("kRRi*", input_scheme="itrans")

# Limit and/or Offset the number of search results, e.g.
# Show the first 10 results
CDSL.MW.search("kṛ*", input_scheme="iast", limit=10)
# Show the next 10 results
CDSL.MW.search("kṛ*", input_scheme="iast", limit=10, offset=10)
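
The `limit` and `offset` arguments shown above can be combined to walk through a large result set page by page. The helper below is a minimal sketch, assuming only that the search callable accepts `limit` and `offset` keywords and returns a list; `iter_pages` and `fake_search` are hypothetical names, and `fake_search` stands in for a real dictionary so the example runs without any downloaded data.

```python
def iter_pages(search, pattern, page_size=10, **kwargs):
    """Yield successive result pages until a page comes back short or empty."""
    offset = 0
    while True:
        page = search(pattern, limit=page_size, offset=offset, **kwargs)
        if not page:
            break
        yield page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch
            break
        offset += page_size


# Stand-in data and search function (not part of PyCDSL)
WORDS = [f"word{i}" for i in range(25)]


def fake_search(pattern, limit=None, offset=0):
    prefix = pattern.rstrip("*")
    matches = [w for w in WORDS if w.startswith(prefix)]
    return matches[offset:offset + limit]


pages = list(iter_pages(fake_search, "word*", page_size=10))
print([len(page) for page in pages])  # [10, 10, 5]
```

With PyCDSL installed and a dictionary loaded, the same helper could presumably be called as `iter_pages(CDSL.MW.search, "kṛ*", input_scheme="iast")`; that call is an untested assumption based on the signature used above.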

Access an entry by ID:

# Access entry by `entry_id` using `[]` operator
entry = CDSL.MW["263938"]

# Alternatively, use `CDSLDict.entry` function
entry = CDSL.MW.entry("263938")

# Note: Access using the `[]` operator calls the `CDSLDict.entry` function.
# The two differ only when an `entry_id` is absent:
# `[]` based access raises a `KeyError`, whereas
# `CDSLDict.entry` returns None and logs a `logging.ERROR` level message.

# >>> entry
# <MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

# Output transliteration scheme can also be provided

CDSL.MW.entry("263938", output_scheme="iast")
# <MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
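
The difference between the two lookup styles matters when an ID may be missing. The sketch below wraps `[]` access in a helper that returns a default instead of raising; `StubDict` is a hypothetical stand-in that only mimics the behaviour described in the note above (the real `CDSLDict.entry` also logs an error), and `lookup` is not part of PyCDSL.

```python
class StubDict:
    """Hypothetical stand-in mimicking the two lookup styles described above."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def entry(self, entry_id):
        # Returns None when the ID is absent
        return self._entries.get(entry_id)

    def __getitem__(self, entry_id):
        # Raises KeyError when the ID is absent
        found = self.entry(entry_id)
        if found is None:
            raise KeyError(entry_id)
        return found


def lookup(dictionary, entry_id, default=None):
    """Return dictionary[entry_id], or `default` when the ID is absent."""
    try:
        return dictionary[entry_id]
    except KeyError:
        return default


stub = StubDict({"1": "first entry"})
print(lookup(stub, "1"))                # first entry
print(lookup(stub, "999", "not found"))  # not found
```

Item access suits code that treats a missing ID as a bug, while `entry()` or a wrapper like this one suits code that expects misses.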

The Entry class also supports transliteration after creation, so any entry fetched through either search() or entry() can be transliterated.

Transliterate a single entry:

CDSL.MW.entry("263938").transliterate("slp1")
# <MWEntry: 263938: hfzIkeSa = lord of the senses (said of Manas), BhP.>

Change transliteration scheme for a dictionary:

CDSL.MW.set_scheme(input_scheme="itrans")
CDSL.MW.search("rAma")

Classes CDSLCorpus and CDSLDict are iterable.

  • Iterating over CDSLCorpus yields loaded dictionary instances.

  • Iterating over CDSLDict yields entries in that dictionary.

# Iteration over a `CDSLCorpus` instance

for cdsl_dict in CDSL:
    print(type(cdsl_dict))
    print(cdsl_dict)
    break

# <class 'pycdsl.lexicon.CDSLDict'>
# CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')

# Iteration over a `CDSLDict` instance
for entry in CDSL.MW:
    print(type(entry))
    print(entry)
    break

# <class 'pycdsl.models.MWEntry'>
# <MWEntry: 1: अ = 1. अ   the first letter of the alphabet>
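
Since a dictionary can hold a very large number of entries, it is often useful to take only the first few items from the iteration rather than using `break` by hand. The sketch below uses `itertools.islice` from the standard library; `first_entries` and `fake_dictionary` are hypothetical names, and the generator stands in for a `CDSLDict` so the example runs without installed data.

```python
from itertools import islice


def first_entries(dictionary, n):
    """Return at most `n` items from any iterable, consuming no more than needed."""
    return list(islice(dictionary, n))


def fake_dictionary():
    # Stand-in generator; a real CDSLDict would yield entry objects instead
    for entry_id in range(1, 1_000_000):
        yield f"entry-{entry_id}"


print(first_entries(fake_dictionary(), 3))
# ['entry-1', 'entry-2', 'entry-3']
```

The same helper should work on any iterable, so `first_entries(CDSL.MW, 3)` is the presumed equivalent for a loaded dictionary.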

Note: Please check the documentation of modules in the PyCDSL Package for more detailed information on available classes and functions.

https://pycdsl.readthedocs.io/en/latest/pycdsl.html

Using Console Interface of PyCDSL

Help for the console interface:

usage: cdsl [-h] [-i] [-s SEARCH] [-p PATH] [-d DICTS [DICTS ...]]
            [-is INPUT_SCHEME] [-os OUTPUT_SCHEME] [-u] [-dbg] [-v]

Access dictionaries from Cologne Digital Sanskrit Lexicon (CDSL)

optional arguments:
-h, --help          show this help message and exit
-i, --interactive   Start in an interactive REPL mode
-s SEARCH, --search SEARCH
                    Search pattern. Ignored if `--interactive` mode is set.
-p PATH, --path PATH  Path to installation
-d DICTS [DICTS ...], --dicts DICTS [DICTS ...]
                    Dictionary IDs
-is INPUT_SCHEME, --input-scheme INPUT_SCHEME
                    Input transliteration scheme
-os OUTPUT_SCHEME, --output-scheme OUTPUT_SCHEME
                    Output transliteration scheme
-u, --update        Update the specified dictionaries.
-dbg, --debug       Turn debug mode on.
-v, --version       Show version and exit.

Note: The arguments for specifying the installation path, dictionary IDs, and input and output transliteration schemes are valid for both the interactive REPL shell and the non-interactive console command.
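
When the console command is driven from a script, building the argument list programmatically avoids quoting mistakes. The helper below is a minimal sketch that uses only the flags listed in the help text above; `build_cdsl_command` is a hypothetical name, and the block only prints the command rather than running it.

```python
import shlex


def build_cdsl_command(search, dicts=(), input_scheme=None, output_scheme=None):
    """Build an argv list for a non-interactive `cdsl` search."""
    argv = ["cdsl", "-s", search]
    if dicts:
        argv += ["-d", *dicts]
    if input_scheme:
        argv += ["-is", input_scheme]
    if output_scheme:
        argv += ["-os", output_scheme]
    return argv


argv = build_cdsl_command(
    "hRRiSIkesha", dicts=["MW"], input_scheme="itrans", output_scheme="iast"
)
print(shlex.join(argv))
# cdsl -s hRRiSIkesha -d MW -is itrans -os iast
```

If PyCDSL is installed, the list can be passed to `subprocess.run(argv)` to execute the search.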

Using REPL Interface of PyCDSL

To start the REPL interface to the Cologne Digital Sanskrit Lexicon (CDSL):

$ cdsl -i

REPL Session Example

Cologne Sanskrit Digital Lexicon (CDSL)
---------------------------------------
Install or load dictionaries by typing `use [DICT_IDS..]` e.g. `use MW`.
Type any keyword to search in the selected dictionaries. (help or ? for list of options)
Loaded 4 dictionaries.

(CDSL::None) help

Documented commands (type help <topic>):
========================================
EOF        dicts  info          output_scheme  show    use
available  exit   input_scheme  search         stats   version
debug      help   limit         shell          update

(CDSL::None) help available
Display a list of dictionaries available in CDSL

(CDSL::None) help search

    Search in the active dictionaries

    Note
    ----
    * Searching in the active dictionaries is also the default action.
    * In general, we do not need to use this command explicitly unless we
      want to search the command keywords, such as, `available` `search`,
      `version`, `help` etc. in the active dictionaries.

(CDSL::None) help dicts
Display a list of dictionaries available locally

(CDSL::None) dicts
CDSLDict(id='AP90', date='1890', name='Apte Practical Sanskrit-English Dictionary')
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
CDSLDict(id='MWE', date='1851', name='Monier-Williams English-Sanskrit Dictionary')
CDSLDict(id='AE', date='1920', name="Apte Student's English-Sanskrit Dictionary")

(CDSL::None) update
Data for dictionary 'AP90' is up-to-date.
Data for dictionary 'MW' is up-to-date.
Data for dictionary 'MWE' is up-to-date.
Data for dictionary 'AE' is up-to-date.

(CDSL::None) use MW
Using 1 dictionaries: ['MW']

(CDSL::MW) हृषीकेश

Found 6 results in MW.

<MWEntry: 263922: हृषीकेश = हृषी-केश a   See below under हृषीक.>
<MWEntry: 263934: हृषीकेश = हृषीकेश b m. (perhaps = हृषी-केश cf. हृषी-वत् above) id. (-त्व n.), MBh.; Hariv. &c.>
<MWEntry: 263935: हृषीकेश = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: हृषीकेश = of a Tīrtha, Cat.>
<MWEntry: 263937: हृषीकेश = of a poet, ib.>
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

(CDSL::MW) show 263938

<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

(CDSL::MW) input_scheme itrans
Input scheme: itrans

(CDSL::MW) hRRiSIkesha

Found 6 results in MW.

<MWEntry: 263922: हृषीकेश = हृषी-केश a   See below under हृषीक.>
<MWEntry: 263934: हृषीकेश = हृषीकेश b m. (perhaps = हृषी-केश cf. हृषी-वत् above) id. (-त्व n.), MBh.; Hariv. &c.>
<MWEntry: 263935: हृषीकेश = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: हृषीकेश = of a Tīrtha, Cat.>
<MWEntry: 263937: हृषीकेश = of a poet, ib.>
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

(CDSL::MW) output_scheme iast
Output scheme: iast

(CDSL::MW) hRRiSIkesha

<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a   See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>

(CDSL::MW) limit 2
Limit: 2

(CDSL::MW) hRRiSIkesha

Found 2 results in MW.

<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a   See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>

(CDSL::MW) info
Total 1 dictionaries are active.
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')

(CDSL::MW) stats
Total 1 dictionaries are active.
---
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
{'total': 287627, 'distinct': 194044, 'top': [('कृष्ण', 50), ('शिव', 46), ('विजय', 46), ('पुष्कर', 45), ('काल', 39), ('सिद्ध', 39), ('योग', 39), ('चित्र', 38), ('शुचि', 36), ('वसु', 36)]}

(CDSL::MW) use WIL

Downloading 'WIL.web.zip' ... (8394727 bytes)
100%|██████████████████████████████████████████████████████████████████████████████████████| 8.39M/8.39M [00:21<00:00, 386kB/s]
Successfully downloaded 'WIL.web.zip' from 'https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2020/downloads/wilweb1.zip'.
Using 1 dictionaries: ['WIL']

(CDSL::WIL)

(CDSL::WIL) use WIL MW
Using 2 dictionaries: ['WIL', 'MW']

(CDSL::WIL,MW) hRRiSIkesha

Found 1 results in WIL.

<WILEntry: 44411: hṛṣīkeśa = hṛṣīkeśa  m. (-śaḥ) KṚṢṆA or VIṢṆU. E. hṛṣīka an organ of sense, and īśa lord.>

Found 6 results in MW.

<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a   See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>

(CDSL::WIL,MW) use MW AP90 MWE AE
Using 4 dictionaries: ['MW', 'AP90', 'MWE', 'AE']

(CDSL::MW+3) use ALL
Using 5 dictionaries: ['AP90', 'MW', 'MWE', 'AE', 'WIL']

(CDSL::AP90+3) use NONE
Using 0 dictionaries: []

(CDSL::None) exit
Bye
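
The `stats` command in the session above reports a dictionary with the keys `total`, `distinct`, and `top`. The sketch below shows one way such a summary could be computed from a list of headwords. It assumes that `total` counts entries, `distinct` counts unique headwords, and `top` lists the most frequent ones; that reading is inferred from the shape of the output and is not confirmed by the source. `headword_stats` is a hypothetical name.

```python
from collections import Counter


def headword_stats(headwords, top_n=10):
    """Summarise headword frequencies in the same shape as the `stats` output."""
    counts = Counter(headwords)
    return {
        "total": sum(counts.values()),      # every entry, repeats included
        "distinct": len(counts),            # unique headwords
        "top": counts.most_common(top_n),   # (headword, count) pairs
    }


sample = ["कृष्ण", "शिव", "कृष्ण", "काल", "शिव", "कृष्ण"]
print(headword_stats(sample, top_n=2))
# {'total': 6, 'distinct': 3, 'top': [('कृष्ण', 3), ('शिव', 2)]}
```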

Credits

This application uses data from Cologne Digital Sanskrit Dictionaries, Cologne University.

History

0.7.0 (2022-03-17)

  • Add the explicit REPL command search

  • Add a REPL command stats

  • Interpret arguments all and none to the REPL command use

  • Add lexicon_id to Entry class

  • Add a placeholder for post-init hook in Entry. If implemented, this will be run after __init__() of Entry

  • Remove model_map from CDSLDict and add to CDSLCorpus

  • Add tests for lexicon initialization, download, setup, transliteration, iteration, getitem, stats, entry, dump

  • Add credits to CDSL website

  • Update documentation

  • Fix bugs

0.6.0 (2022-02-14)

  • Add __getitem__ method to CDSLCorpus to access loaded dictionaries using [] operator with dict_id

  • Add __getitem__ method to CDSLDict to access dictionary entries using [] operator with entry_id

  • Add unit tests and integration tests for pycdsl.utils

  • Add unit tests and integration tests for pycdsl.corpus

  • Update documentation

  • Fix bugs

0.5.0 (2022-02-13)

  • Add model_map argument to CDSLDict.connect for better customization

  • Make CDSLCorpus iterable (iterate over loaded dictionaries)

  • Make CDSLDict iterable (iterate over dictionary entries)

  • Update documentation

0.4.0 (2022-02-12)

  • Add ability to limit and offset the number of search results

  • Add .to_dict() method to Entry class

  • Add multi-dictionary .search() from CDSLCorpus

  • Add support for multiple active dictionaries in REPL

  • Improve code structure (more modular)

  • Improve documentation formatting

  • Update documentation

  • Fix bugs

0.3.0 (2022-02-07)

  • Functional CLI (console command) for dictionary search

  • Integration of existing REPL into the CLI. (--interactive)

  • Extend transliteration support on Corpus, Dictionary, Search and Entry level

  • Make the package Python 3.6 compatible

0.2.0 (2022-02-05)

  • Improve dictionary setup

  • Add a function to dump data

  • Add logging support

  • Add transliteration support using indic-transliteration

0.1.0 (2022-01-28)

  • First release on PyPI.
