sanskrit-parser

Tools for lexical and morphological analysis of Sanskrit

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Intended Audience
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
- Python :: 3.6
Topic
- Text Processing :: Linguistic

Project description

Parsers for Sanskrit / संस्कृतम्

NOTE: This project is still under development. Both over-generation (invalid forms/splits) and under-generation (missing valid forms/splits) are quite likely. Please see the Sanskrit Parser Stack section below for detailed status. Report any issues here.

Please feel free to ping us if you would like to collaborate on this project.

Try it out!

A web interface is available here - https://kmadathil.github.io/sanskrit_parser/ui/index.html

Installation

This project has been tested and developed using Python 3.6.

pip install sanskrit_parser

Usage

See the generated Sphinx docs.
PS: Command-line usage is also documented there.

Deploying REST API server

Run:

sudo mkdir /var/www/.sanskrit_parser
sudo chmod a+rwx /var/www/.sanskrit_parser

Contribution

Generate docs: cd docs; make html

Sanskrit Parser Stack

Stack of parsing tools

Level 0

Sandhi splitting subroutine

Input: Phoneme sequence and the phoneme number to split at
Action: Perform a sandhi split at the given phoneme number
Output: Left and right sequences (multiple options will be output). No semantic validation is performed at this level; that is left to the higher levels.
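As a rough illustration of the Level 0 contract (phoneme sequence and position in, unvalidated left/right options out), here is a minimal sketch. It is not the project's implementation: the rule table `TOY_RULES` and the function `split_at` are hypothetical names, only three vowel-sandhi rules are covered, and each character is assumed to stand for one phoneme.

```python
from typing import Dict, List, Tuple

# Hypothetical toy rule table (NOT the project's rule files):
# surface phoneme -> possible (end of left part, start of right part) pairs.
TOY_RULES: Dict[str, List[Tuple[str, str]]] = {
    "A": [("a", "a"), ("a", "A"), ("A", "a"), ("A", "A")],
    "e": [("a", "i"), ("a", "I"), ("A", "i"), ("A", "I")],
    "o": [("a", "u"), ("a", "U"), ("A", "u"), ("A", "U")],
}


def split_at(phonemes: str, pos: int) -> List[Tuple[str, str]]:
    """Return candidate (left, right) pairs for a split at index pos."""
    if not 0 <= pos < len(phonemes):
        raise IndexError("split position out of range")
    before, here, after = phonemes[:pos], phonemes[pos], phonemes[pos + 1:]
    # Option 1: a plain cut after this phoneme, with no sandhi undone.
    candidates = [(before + here, after)]
    # Option 2: undo every toy rule whose surface form is this phoneme.
    for left_end, right_start in TOY_RULES.get(here, []):
        candidates.append((before + left_end, right_start + after))
    return candidates


# "tatrAsti" split at the long A (index 4); no validation is attempted here.
splits = split_at("tatrAsti", 4)
for left, right in splits:
    print(left, "+", right)
```

Most of the candidates printed are not real words. That is intended at this level: filtering is the job of the levels above.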

Current Status

Module that performs sandhi split/join and convenient rule definition is at parser/sandhi.py.

Rule definitions (human readable!) are at lexical_analyzer/sandhi_rules/*.txt

Use sanskrit_parser tags on the command line

Level 1

From dhatu + lakAra + puruSha + vachana to pada and vice versa
From prAtipadika + vibhakti + vachana to pada and vice versa
Upasarga + dhAtu forms - forward and backwards
nAmadhAtu forms
Krt forms - forwards and backwards
Taddhita forms - forwards and backwards

Current Status

To be done.

However, we have a usable solution: inriaxmlwrapper together with Prof. Gerard Huet’s forms database acts as a queryable form database. That gives us the bare minimum we need from Level 1, so Level 2 can work.
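The "pada and vice versa" requirement can be pictured as a two-way lookup table. The sketch below is only an illustration with a handful of hand-entered forms of bhU; the names `FORMS`, `generate` and `analyze` are hypothetical and do not reflect the inriaxmlwrapper interface.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (dhAtu, lakAra, puruSha, vachana) -> pada. Toy data for illustration only.
Tag = Tuple[str, str, str, str]
FORMS: Dict[Tag, str] = {
    ("bhU", "laT", "prathama", "eka"): "bhavati",
    ("bhU", "laT", "prathama", "dvi"): "bhavataH",
    ("bhU", "laT", "prathama", "bahu"): "bhavanti",
    ("bhU", "laT", "madhyama", "eka"): "bhavasi",
    ("bhU", "laT", "uttama", "eka"): "bhavAmi",
}

# Reverse index: one pada may carry several analyses, so map to a list.
ANALYSES: Dict[str, List[Tag]] = defaultdict(list)
for tag, pada in FORMS.items():
    ANALYSES[pada].append(tag)


def generate(dhatu: str, lakara: str, purusha: str, vachana: str) -> Optional[str]:
    """Forward direction: morphological description -> pada (None if unknown)."""
    return FORMS.get((dhatu, lakara, purusha, vachana))


def analyze(pada: str) -> List[Tag]:
    """Backward direction: pada -> all known descriptions (empty if unknown)."""
    return list(ANALYSES.get(pada, []))


print(generate("bhU", "laT", "prathama", "bahu"))
print(analyze("bhavasi"))
```

Returning a list from `analyze` matters because a single surface form can have more than one valid analysis, and Level 2 needs to see all of them.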

Level 2

Input

Sanskrit sentence

Action

Traverse the sentence, splitting it (or not) at each location to determine all possible valid splits
Traverse from left to right
Using dynamic programming, assemble the results of all choices

To split or not to split at each phoneme

If split, all possible left/right combinations of phonemes that can result

Once split, check if the left section is a valid pada (use level 1 tools to pick pada type and tag morphologically)

If left section is valid, proceed to split the right section

At the end of this step, we will have all possible syntactically valid splits with morphological tags

Output

All semantically valid sandhi split sequences
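The traversal described above can be sketched as memoized recursion, a top-down form of dynamic programming: try every position, keep a split only if the left part is a known pada, then recurse on the right part and cache the result per remainder. Everything named here is a hypothetical stand-in: `LEXICON` replaces the Level 1 form database, `TOY_RULES` replaces the sandhi rule files, tags are omitted, and each character is assumed to be one phoneme.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the real form database and rule files.
LEXICON = {"tatra", "na", "asti", "iti"}
TOY_RULES: Dict[str, List[Tuple[str, str]]] = {
    "A": [("a", "a"), ("a", "A"), ("A", "a"), ("A", "A")],
    "e": [("a", "i"), ("a", "I"), ("A", "i"), ("A", "I")],
}


def candidate_cuts(text: str, pos: int) -> List[Tuple[str, str]]:
    """Toy Level 0: (left, right) options for a split at index pos."""
    before, here, after = text[:pos], text[pos], text[pos + 1:]
    options = [(before + here, after)]
    for left_end, right_start in TOY_RULES.get(here, []):
        options.append((before + left_end, right_start + after))
    return options


@lru_cache(maxsize=None)
def all_splits(text: str) -> Tuple[Tuple[str, ...], ...]:
    """All ways to cut text into lexicon words, cached per remainder."""
    if not text:
        return ((),)  # one way to split nothing: the empty sequence
    results = set()
    for pos in range(len(text)):
        for left, right in candidate_cuts(text, pos):
            # Keep the split only if the left part is a valid pada. Also
            # require the remainder to shrink so the recursion terminates.
            if left in LEXICON and len(right) < len(text):
                for rest in all_splits(right):
                    results.add((left,) + rest)
    return tuple(sorted(results))


print(all_splits("tatranAsti"))
print(all_splits("neti"))
```

The cache is what keeps the search tractable: the same remainder is reached through many different left-hand choices, and it is analyzed only once.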

Current Status

Module at parser/sandhi_analyzer.py

Use sanskrit_parser sandhi on the command line

Level 3

Input

Semantically valid sequence of tagged padas (output of Level 1)

Action

Assemble graphs of morphological constraints

viseShaNa - viseShya

karaka/vibhakti

vachana/puruSha constraints on tiGantas and subantas

Check validity of graphs

Output

Is the input sequence a morphologically valid sentence?
Enhanced sequence of tagged padas, with karakas tagged, and a dependency graph associated
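To make the idea of a constraint graph concrete, here is a deliberately small sketch. It models only two toy constraints (a prathamA-vibhakti subanta can be the kartA of a prathama-puruSha tiGanta when their vachanas agree, and a dvitIyA subanta is attached as karma), and the `Pada` class, `build_edges` and `is_valid` are hypothetical names, not the project's data structures.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Pada:
    text: str
    kind: str                       # "subanta" or "tiGanta"
    vachana: str                    # "eka", "dvi" or "bahu"
    vibhakti: Optional[int] = None  # 1..7, subantas only
    purusha: Optional[str] = None   # tiGantas only


def build_edges(padas: List[Pada]) -> List[Tuple[str, str, str]]:
    """Return (verb, noun, karaka) edges allowed by the toy constraints."""
    edges = []
    verbs = [p for p in padas if p.kind == "tiGanta"]
    nouns = [p for p in padas if p.kind == "subanta"]
    for verb in verbs:
        for noun in nouns:
            if noun.vibhakti == 1:
                # vachana/puruSha agreement between subanta and tiGanta
                if verb.purusha == "prathama" and noun.vachana == verb.vachana:
                    edges.append((verb.text, noun.text, "kartA"))
            elif noun.vibhakti == 2:
                edges.append((verb.text, noun.text, "karma"))
    return edges


def is_valid(padas: List[Pada]) -> bool:
    """Toy check: every prathamA subanta must be the kartA of some verb."""
    attached = {noun for _, noun, label in build_edges(padas) if label == "kartA"}
    return all(p.text in attached for p in padas
               if p.kind == "subanta" and p.vibhakti == 1)


rama = Pada("rAmaH", "subanta", "eka", vibhakti=1)
vanam = Pada("vanam", "subanta", "eka", vibhakti=2)
goes = Pada("gachChati", "tiGanta", "eka", purusha="prathama")
go_pl = Pada("gachChanti", "tiGanta", "bahu", purusha="prathama")

print(build_edges([rama, vanam, goes]))
print(is_valid([rama, vanam, goes]), is_valid([rama, go_pl]))
```

The edge list doubles as a simple dependency graph with karakas tagged, which corresponds to the second output listed above.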

Current Status

Module at parser/vakya_analyzer.py

Limited version available using sanskrit_parser vakya

Seq2Seq based Sanskrit Parser

See: Grammar as a Foreign Language, Vinyals & Kaiser et al., Google, http://arxiv.org/abs/1412.7449

Method: Seq2Seq Neural Network (n? layers)
Input Embedding with word2vec (optional)

Input

Sanskrit sentence

Output

Sentence split into padas with tags

Train/Test data

DCS corpus, converted by Vishvas Vasuki

Current Status

Not begun

Release history

Version	Released
0.2.6	Mar 16, 2023
0.2.5	Oct 18, 2022
0.2.4.post1	Aug 2, 2022
0.2.3.post2	Apr 13, 2021
0.2.3.post1	Apr 8, 2021
0.2.3.post0	Mar 23, 2021
0.2.3	Mar 20, 2021
0.2.2.post0	Mar 15, 2021
0.2.2	Mar 11, 2021
0.2.1	Mar 11, 2021
0.2.0	Mar 10, 2021
0.1.1	Jan 8, 2021
0.1.0.post4	Dec 28, 2020
0.1.0.post3	Jul 7, 2020
0.1.0.post2	Jun 26, 2020
0.1.0.post1 (this version)	Mar 9, 2020
0.1.0.post0	Mar 9, 2020
0.1.0	Mar 9, 2020
0.0.4	May 6, 2019
0.0.3	Apr 15, 2019
0.0.2	Jan 28, 2019
0.0.1.dev6 (pre-release)	Dec 9, 2017
0.0.1.dev5 (pre-release)	Nov 7, 2017
0.0.1.dev4 (pre-release)	Oct 1, 2017
0.0.1.dev3 (pre-release)	Aug 9, 2017
0.0.1.dev2 (pre-release)	Aug 1, 2017
0.0.1.dev1 (pre-release)	Aug 1, 2017

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sanskrit_parser-0.1.0.post1.tar.gz (50.1 kB)

Uploaded Mar 9, 2020 Source

Built Distribution

sanskrit_parser-0.1.0.post1-py2.py3-none-any.whl (94.3 kB)

Uploaded Mar 9, 2020 Python 2 Python 3

Hashes for sanskrit_parser-0.1.0.post1.tar.gz

Algorithm	Hash digest
SHA256	`862fc9fd71fd6e563a78efc67e6cb8e7f35ccff9db2873fecf89e3e80eb6b5be`
MD5	`ece46b950386aa9bba785a526360e696`
BLAKE2b-256	`65192f8eb4de133975786fbaed951260a729e4ebbfc2c6186797ffc73896ae4c`

Hashes for sanskrit_parser-0.1.0.post1-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`3b5dc589005544a991b93991694dd2d0b0ddd821f334e2b931f103189150d727`
MD5	`348f74134f87fba66102a1fd7a95d7e2`
BLAKE2b-256	`305d655a6c3e6cd8a4c92cdb12854d5bd9da672bc92c854f001cd1d3d59449ae`