A Python module with program synthesis techniques for NLP

# PsyNLP

> Program SYnthesis for NLP

PsyNLP is a Python library that aims to handle morphological inflection for any language in the form of an interpretable program. :tada:

### Table of Contents

1. [Installation Guidelines](#installation-guidelines)
2. [Usage](#usage)
3. [Repository structure](#repository-structure)
4. [Running the tests](#running-the-tests)
5. [Contribution Guidelines](#contribution-guidelines)
6. [License](#license)

### Installation Guidelines

1. Clone the repository

$ git clone

2. Go to the cloned repository

$ cd PsyNLP

3. Install the dependencies

$ pip3 install -r requirements.txt

Alternatively, you can install the module directly from pip:

`pip3 install psynlp`

### Usage

A single `argparse`-based script acts as the central entry point for running any of the pipelines, for any language and training data quality.

- Help menu, for more details:

$ python3 -h

usage: [-h] [-p PIPELINE] [-l LANGUAGE] [-q QUALITY] [-v]

Runs one of the pipeline scripts, for a given language and quality.

optional arguments:
  -h, --help            show this help message and exit
  -p PIPELINE, --pipeline PIPELINE
                        Name of the pipeline file (Default: deterministic)
  -l LANGUAGE, --language LANGUAGE
                        Name of the language (Default: english)
  -q QUALITY, --quality QUALITY
                        Size of the training data (Default: low)
  -v, --verbose         Prints verbose output if specified

- Running a pipeline (say, ostia) for a language (say, polish) and training data quality (say, high):

$ python3 -p ostia -l polish -q high

- Getting more detailed, debug-like output by repeating the verbose flag (up to 3 times):

# No verbose, just print the exact word-match accuracy
$ python3

# Verbose 1, print the expected and actual words
$ python3 -v

# Verbose 2, print the paths responsible for computing an inflection
$ python3 -vv

# Verbose 3, print debug details for PAC and OSTIA
$ python3 -vvv
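The options in the help menu map naturally onto a small `argparse` parser. The following is a minimal sketch that reproduces the documented flags and defaults; the function name `build_parser` is hypothetical, and the use of `action="count"` for the verbose flag is an assumption based on the `-v`, `-vv`, `-vvv` behaviour described above, not a statement about the actual script.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options in the help menu (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        description="Runs one of the pipeline scripts, for a given language and quality."
    )
    parser.add_argument("-p", "--pipeline", default="deterministic",
                        help="Name of the pipeline file (Default: deterministic)")
    parser.add_argument("-l", "--language", default="english",
                        help="Name of the language (Default: english)")
    parser.add_argument("-q", "--quality", default="low",
                        help="Size of the training data (Default: low)")
    # Assumption: counting repeated flags lets -v, -vv and -vvv yield 1, 2 and 3.
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="Prints verbose output if specified")
    return parser


# Parse an explicit argument list instead of sys.argv, so the sketch is easy to try out.
args = build_parser().parse_args(["-p", "ostia", "-l", "polish", "-q", "high", "-vv"])
print(args.pipeline, args.language, args.quality, args.verbose)
```

With a counted flag, the pipeline code can compare `args.verbose` against 1, 2 or 3 to decide how much detail to print.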

### Repository structure

- Base classes:

The code for base classes can be found in the `psynlp/core` directory.

- ``: Contains implementations of PAC and other methods related to Formal Concept Analysis
- ``: Contains generic Transducer methods, like states and arcs
- ``: Contains the oracles that are used while computing the PAC basis in ``
- ``: An implementation of the well-known OSTIA algorithm, which uses ``
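To make the idea of "states and arcs" concrete, here is a minimal sketch of a deterministic string transducer of the kind OSTIA learns, where each arc reads one input symbol and emits an output string. The class and method names are hypothetical and do not reflect PsyNLP's actual API.

```python
from typing import Dict, Optional, Tuple


class Transducer:
    """Toy deterministic transducer (hypothetical, not the PsyNLP class)."""

    def __init__(self, initial: int = 0) -> None:
        self.initial = initial
        # (source state, input symbol) -> (output string, target state)
        self.arcs: Dict[Tuple[int, str], Tuple[str, int]] = {}
        # final state -> string emitted when the input ends in that state
        self.finals: Dict[int, str] = {}

    def add_arc(self, source: int, symbol: str, output: str, target: int) -> None:
        self.arcs[(source, symbol)] = (output, target)

    def set_final(self, state: int, output: str = "") -> None:
        self.finals[state] = output

    def transduce(self, word: str) -> Optional[str]:
        """Return the output for `word`, or None if the word is not accepted."""
        state, pieces = self.initial, []
        for symbol in word:
            arc = self.arcs.get((state, symbol))
            if arc is None:
                return None  # no arc for this symbol from the current state
            output, state = arc
            pieces.append(output)
        if state not in self.finals:
            return None  # input ended in a non-final state
        pieces.append(self.finals[state])
        return "".join(pieces)


# A transducer that copies "walk" and appends "ed" at the end of the input.
t = Transducer()
for i, ch in enumerate("walk"):
    t.add_arc(i, ch, ch, i + 1)
t.set_final(4, "ed")
print(t.transduce("walk"))
```

Keying the arcs on `(state, symbol)` keeps the machine deterministic, which is what makes the learned inflection rules readable as a program.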

- Pipelines:

The code for the different pipelines can be found in the `psynlp/pipelines` directory.

- ``: Prediction based on pandas' `groupby` (deterministic clustering) and OSTIA RegExp matching
- ``: Prediction based on just the input-output tapes of OSTIA
- ``: Prediction based on PAC clusters and OSTIA RegExp matching

- Helpers:

The code for the different helpers can be found in the `psynlp/helpers` directory.

- ``: Monkey-patches some required verbose-related builtin functions
- ``: Includes functions that import training and testing data into different structures
- ``: Miscellaneous functions
- ``: Text-related functions such as inflecting, prefix, suffix, edit distance, etc.
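As an illustration of the kind of text helpers listed above, the sketch below implements a common prefix, a common suffix, and a Levenshtein edit distance using only the standard library. The function names are hypothetical and the library's own helpers may differ in both name and behaviour.

```python
def common_prefix(a: str, b: str) -> str:
    """Longest string that both `a` and `b` start with."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def common_suffix(a: str, b: str) -> str:
    """Longest string that both `a` and `b` end with (prefix of the reversed strings)."""
    return common_prefix(a[::-1], b[::-1])[::-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row to keep memory linear in len(b)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute x with y, or match for free
            ))
        previous = current
    return previous[-1]


print(common_prefix("walk", "walked"), edit_distance("walk", "walked"))
```

For a lemma and its inflected form, the common prefix isolates the shared stem, and the remainder of each word is the part an inflection rule has to rewrite.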

- Data:

The `psynlp/data` directory contains all the training and testing data. The files are of the form:

- {language}-train-{quality}
- {language}-dev
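The naming scheme above can be turned into file paths with a couple of small helpers. This is a sketch under stated assumptions: `DATA_DIR`, `train_file` and `dev_file` are hypothetical names, the paths are relative to the repository root, and nothing is assumed about the contents or format of the files.

```python
from pathlib import Path

# Directory named in this README, relative to the repository root.
DATA_DIR = Path("psynlp") / "data"


def train_file(language: str, quality: str, data_dir: Path = DATA_DIR) -> Path:
    """Path of the training file, following the {language}-train-{quality} pattern."""
    return data_dir / f"{language}-train-{quality}"


def dev_file(language: str, data_dir: Path = DATA_DIR) -> Path:
    """Path of the dev file, following the {language}-dev pattern."""
    return data_dir / f"{language}-dev"


print(train_file("polish", "high"))
print(dev_file("polish"))
```

The `language` and `quality` arguments correspond to the `-l` and `-q` options described in the Usage section.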

### Running the tests

1. Basic run to check the results:


2. For debugging:

py.test -s --fulltrace

### Contribution Guidelines

Your contributions are always welcome! Please have a look at the contribution guidelines first. :tada:

### License

MIT License 2018 - Gaurav Sahu and Athitya Kumar

