
A utility library to assist in parsing natural language text.


Zensols Natural Language Parsing


This framework is described in the paper DeepZensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility. It wraps the spaCy framework and creates lightweight features in a class hierarchy that reflects the structure of natural language. The motivation is to generate features from the parsed text in an object-oriented fashion that is fast and easy to pickle.
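
The idea of a pickle-friendly class hierarchy that mirrors language structure can be sketched with plain dataclasses. The `Token`, `Sentence` and `Document` classes below are hypothetical stand-ins written for illustration; they are not the classes `zensols.nlp` provides, and the part-of-speech values are typed in by hand rather than produced by spaCy.

```python
import pickle
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """Hypothetical token holding only plain Python values."""
    norm: str
    pos_: str


@dataclass
class Sentence:
    """Hypothetical sentence: an ordered list of tokens."""
    tokens: List[Token] = field(default_factory=list)

    @property
    def text(self) -> str:
        return ' '.join(t.norm for t in self.tokens)


@dataclass
class Document:
    """Hypothetical document: an ordered list of sentences."""
    sents: List[Sentence] = field(default_factory=list)

    @property
    def tokens(self) -> List[Token]:
        # flatten the sentence level: document > sentence > token
        return [t for s in self.sents for t in s.tokens]


# hand-written features; a real parser would fill these in
doc = Document([Sentence([Token('Obama', 'PROPN'), Token('spoke', 'VERB')])])

# only plain data is stored, so the object graph round-trips through pickle
restored = pickle.loads(pickle.dumps(doc))
print(restored == doc, [t.norm for t in restored.tokens])
```

Keeping each level free of references to heavyweight parser state is what makes such a structure cheap to serialize; the sketch only demonstrates that property on toy data.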

Other features include:


Obtaining / Installing

The easiest way to install the command line program is via the pip installer. Since the package needs at least one spaCy model, the second command downloads the smallest English model.

pip3 install --use-deprecated=legacy-resolver zensols.nlp
python -m spacy download en_core_web_sm

Binaries are also available on PyPI.


A parser using the default configuration can be obtained by:

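from zensols.nlp import FeatureDocumentParser
parser: FeatureDocumentParser = FeatureDocumentParser.default_instance()
doc = parser('Obama was the 44th president of the United States.')
# print the normalized text, part-of-speech and fine-grained tag of each token
for tok in doc.tokens:
    print(tok.norm, tok.pos_, tok.tag_)
# print the named entities found in the document
print(doc.entities)

Obama PROPN NNP
was AUX VBD
the DET DT
44th ADJ JJ
president NOUN NN
of ADP IN
the United States DET DT
. PUNCT .
(<Obama>, <44th>, <the United States>)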

However, minimal effort is needed to configure the parser using a resource library:

from io import StringIO
from zensols.config import ImportIniConfig, ImportConfigFactory
from zensols.nlp import FeatureDocument, FeatureDocumentParser

CONFIG = """
# import the `zensols.nlp` library
[import]
config_file = resource(zensols.nlp): resources/obj.conf

# override the parser to keep only selected token features
[doc_parser]
token_feature_ids = set: ent_, tag_
"""

if (__name__ == '__main__'):
    fac = ImportConfigFactory(ImportIniConfig(StringIO(CONFIG)))
    doc_parser: FeatureDocumentParser = fac('doc_parser')
    sent = 'He was George Washington and first president of the United States.'
    doc: FeatureDocument = doc_parser(sent)
    for tok in doc.tokens:
        # show the features kept for each token
        print(tok.norm, tok.ent_, tok.tag_)

This uses a resource library to import the configuration from this package, so only the overridden options need to be given. More advanced configuration examples are also available.
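
The override pattern in the example above, where packaged defaults are loaded first and a later section replaces individual options, can be illustrated with the standard library alone. This sketch uses `configparser` rather than `zensols.config`; the `DEFAULTS` text, the `lang` option and the `parse_set` helper are hypothetical, and the handling of the `set:` prefix only approximates how such a value might be read.

```python
from configparser import ConfigParser
from typing import Set

# hypothetical stand-in for defaults shipped with a package
DEFAULTS = """
[doc_parser]
lang = en
token_feature_ids = set: norm, ent_, tag_, pos_
"""

# user configuration that overrides a single option
OVERRIDES = """
[doc_parser]
token_feature_ids = set: ent_, tag_
"""


def parse_set(value: str) -> Set[str]:
    """Turn a 'set: a, b' string into a Python set (illustrative only)."""
    prefix, _, items = value.partition(':')
    if prefix.strip() != 'set':
        raise ValueError(f'not a set value: {value!r}')
    return {item.strip() for item in items.split(',') if item.strip()}


config = ConfigParser()
config.read_string(DEFAULTS)   # load the defaults first
config.read_string(OVERRIDES)  # later reads replace matching options

feature_ids = parse_set(config['doc_parser']['token_feature_ids'])
print(sorted(feature_ids), config['doc_parser']['lang'])
```

Options that the override does not mention, such as the hypothetical `lang`, keep their default values, which is why the user-side configuration can stay short.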

See the feature documents for more information.


Certain scores in the scoring module need additional Python packages. These are installed with:

pip install -r src/python/requirements-score.txt


This project, or example code, uses:


If you use this project in your research please use the following BibTeX entry:

    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul  and
      Di Eugenio, Barbara  and
      Caragea, Cornelia",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "",
    pages = "141--146"


An extensive changelog is available here.


Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback and any other input are welcome.


MIT License

Copyright (c) 2020 - 2023 Paul Landes

