A collection of Orange3 widgets to perform natural language processing

orange3-nlp

This add-on provides a collection of Orange3 widgets for natural language processing (NLP).

Installation

In Orange's Add-ons installer, click "Add more..." and type orange3-nlp.

Widgets

Canvas with 8 major widgets provided by the Orange3-NLP package

  • General Widgets
    • Abstractive Summary
    • Extractive Summary
    • Named Entity Recognition
    • POS Tagger
    • POS Viewer
    • Question Answering
    • Reference Library
    • Ollama RAG

Text Splitting Widgets

  • Text Splitting Widgets
    • Text Chunker
    • Tokens to Corpus

Text Embedding Models

  • Text Embedding Models
    • Doc2Vec
    • E5
    • FastText
    • Gemini
    • Nomic
    • OpenAI
    • Sentence Embedder (SBERT)
    • spaCy
    • USE

Training widget for Doc2Vec embedder

  • Training of Text Embedding Widget
    • Train Doc2Vec

Polish sentiment analysis widget, Analiza Sentymentu

  • For Polish Sentiment Analysis
    • Analiza Sentymentu

Summary Widgets

  • Extractive Summary: Selects and joins key sentences or phrases from the original text.

Extractive Summary of The Little Match-Seller

  • Abstractive Summary: Generates new sentences that paraphrase and condense the original content (more similar to how humans summarize).

Abstractive Summary of The Little Match-Seller
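
To make the extractive idea concrete, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top ones in their original order. It is only an illustration of the general technique, not the method the widget uses; the function name `extractive_summary` and its parameters are made up for this example.

```python
import re
from collections import Counter


def extractive_summary(text: str, n_sentences: int = 2) -> str:
    """Return the n highest-scoring sentences, kept in document order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text (lowercased).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore the original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)
```

An abstractive summarizer, by contrast, would generate new wording rather than copy sentences, which typically requires a trained language model.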

Named Entity Recognition

Named Entity Recognition (NER) is a task in NLP that locates and classifies named entities in text into predefined categories such as:

  • PERSON – names of people
  • ORG – organizations
  • GPE – geopolitical entities such as countries, cities, and states
  • DATE, TIME, MONEY, etc.
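
As a rough illustration of the kind of output NER produces (a text span plus a category label), the sketch below tags two of the categories above with hand-written regular expressions. This is not how the widget works, since NER systems typically rely on trained models; `PATTERNS` and `find_entities` are hypothetical names chosen for this example, and the patterns only cover ISO-style dates and simple currency amounts.

```python
import re

# Hypothetical hand-written patterns for two of the categories listed above.
PATTERNS = {
    "MONEY": re.compile(r"[$€£]\d+(?:[.,]\d+)*"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def find_entities(text):
    """Return (start, end, label, surface_text) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)
```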

Part of Speech Tagging

Part-of-speech (POS) tagging assigns grammatical categories to each word in a sentence.

Common POS Tags

| Tag | Meaning     | Example   |
|-----|-------------|-----------|
| NN  | Noun        | cat, city |
| VB  | Verb        | run, is   |
| JJ  | Adjective   | fast, red |
| RB  | Adverb      | quickly   |
| DT  | Determiner  | the, an   |
| IN  | Preposition | on, with  |

POS tagging is essential for syntactic parsing and downstream NLP tasks.
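
To show what tagged output looks like, the following sketch implements a simple lookup tagger built from the example words in the table above. It is a baseline illustration only, assuming whitespace tokenization and a default tag for unknown words; real taggers, including whatever model the widget uses, take context into account. `LEXICON` and `lookup_tag` are names invented for this example.

```python
# Tiny lexicon built from the example column of the table above.
LEXICON = {
    "cat": "NN", "city": "NN",
    "run": "VB", "is": "VB",
    "fast": "JJ", "red": "JJ",
    "quickly": "RB",
    "the": "DT", "an": "DT",
    "on": "IN", "with": "IN",
}


def lookup_tag(sentence, default="NN"):
    """Tag each whitespace-separated token; unknown words get `default`."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in sentence.split()]
```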

Part of Speech Viewer

This widget uses spaCy's displaCy HTML renderer to display the dependency parse of the input text, with each token labeled by its part of speech.

Part of Speech Viewer with parsed Slovenian text.

Question Answering

Question Answering (QA) systems aim to extract or generate answers to user questions from a text or knowledge base.

Question and Answers for "Who Died?" against the Book Excerpts corpus

Text Splitting Widgets

Tokens to Corpus

The Tokens to Corpus widget takes the tokens produced by the Preprocess Text widget and builds a corpus from them.

Tokens to Corpus workflow

Text Chunker

Text Chunker supports two chunking strategies for splitting text: LangChain's RecursiveCharacterTextSplitter and semantic-text-splitter.

Text Chunker widget
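
For readers unfamiliar with recursive character splitting, here is a simplified sketch of the idea: try the coarsest separator first, pack pieces into chunks up to a size limit, and fall back to finer separators only for pieces that are still too long. This is not LangChain's implementation (which, among other things, also supports overlap between chunks); `recursive_split` is a hypothetical helper written for this example.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters (chunk_size > 0)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits in the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: retry with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

For example, `recursive_split("alpha beta gamma delta", 11)` returns `["alpha beta", "gamma delta"]`: no paragraph or line breaks are present, so the split falls through to spaces.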

Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a method of enhancing large language model (LLM) responses by providing external documents as supporting context. Instead of relying solely on the model's training data, RAG:

  • Retrieves relevant snippets from a document collection (knowledge base).
  • Augments the prompt to the LLM by including this retrieved content.
  • Generates a more accurate and grounded answer based on the context.

RAG Workflow
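
The three RAG steps listed above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the widgets' implementation: word-count cosine similarity stands in for a real embedding model, and `generate` is a placeholder for whatever LLM call you use (for example, a model served by Ollama). All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Step 1: return the k documents most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, snippets):
    """Step 2: augment the prompt with the retrieved snippets."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


def answer(question, documents, generate, k=2):
    """Step 3: `generate` is any callable mapping a prompt string to a reply."""
    return generate(build_prompt(question, retrieve(question, documents, k)))
```

Keeping `generate` as a plain callable means the retrieval and prompt-building steps can be tried out without a running model.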

Let's take a look at the Reference Library widget.

Reference Library

Lastly, let's look at the Ollama RAG widget in use.

Ollama RAG Widget: Using the phi Ollama model, and a prompt of "Who were the Munchins and what are they good at?"

Polish Sentiment Analysis

Since Polish sentiment analysis support in Orange was limited, Analiza Sentymentu provides a tuned model.

Polish sentiment analysis workflow
