Pipeline components for scikit-learn to extract relevant features from text data, including tokens, parts of speech, lexicon scores, document-level statistics, and embeddings.
Textplumber
Introduction to Textplumber
The Textplumber library is intended to make it easier to build text classification pipelines with scikit-learn. scikit-learn provides a powerful suite of tools for machine learning, including built-in support for text. Textplumber extends scikit-learn's functionality, leveraging libraries like spaCy and newer feature extraction techniques like Model2Vec, and provides easy access to a range of text feature types.
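For context, here is a minimal plain scikit-learn text classification pipeline. Textplumber components implement the same transformer interface, so they slot into `Pipeline` steps like the `CountVectorizer` below. This sketch uses only standard scikit-learn classes and made-up toy data, not Textplumber's own API.

```python
# A plain scikit-learn text classification pipeline; Textplumber
# components follow the same transformer interface and can be
# used as pipeline steps in the same way.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# toy data for illustration only
texts = ["great film, loved it", "terrible plot, awful acting",
         "loved the cast", "awful, just awful"]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),    # token counts as features
    ("classifier", LogisticRegression()), # linear classifier on top
])
pipeline.fit(texts, labels)
print(pipeline.predict(["loved the film"]))
```

Swapping the vectorizer step for a Textplumber feature extractor is the pattern the rest of this document builds on.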
Development status
Textplumber is in active development and is currently released for beta testing. The GitHub repository may be ahead of the PyPI version, so for the latest functionality install from GitHub (see below); note that the GitHub code is pre-release and may change. For the latest stable release, install from PyPI (pip install textplumber). The documentation reflects the most recent functionality. See the CHANGELOG for notes on releases.
Development Team
The developers of Textplumber are:
- Dr Geoff Ford, Senior Lecturer, Faculty of Arts, University of Canterbury
- Dr Christopher Thomson, Senior Lecturer in English and Digital Humanities, University of Canterbury
- Karin Stahel, PhD Candidate, Data Science, University of Canterbury
Dr Geoff Ford is leading development of Textplumber and is the main contributor to date.
Some Textplumber functionality was created through collaborations between team members to develop teaching resources for DIGI405, Text, Discourses and Data, a course offered through the Digital Humanities and Master of Applied Data Science programmes at the University of Canterbury. The entire team is contributing to testing and will contribute to the development of Textplumber documentation.
Acknowledgements
Dr Ford’s work to create this Python library has been made possible by funding/support from:
- “Mapping LAWS: Issue Mapping and Analyzing the Lethal Autonomous Weapons Debate” (Royal Society of New Zealand’s Marsden Fund Grant 19-UOC-068)
- “Into the Deep: Analysing the Actors and Controversies Driving the Adoption of the World’s First Deep Sea Mining Governance” (Royal Society of New Zealand’s Marsden Fund Grant 22-UOC-059)
- Sabbatical, University of Canterbury, Semester 1 2025.
The development team of Textplumber are researchers with Te Pokapū Aronui ā-Matihiko | UC Arts Digital Lab (ADL). Thanks to the ADL team and to the ongoing support of the University of Canterbury's Faculty of Arts, which make work like this possible.
Installation
Install via pip
You can install Textplumber from PyPI using this command:
$ pip install textplumber
To install the latest development version of Textplumber, which may be ahead of the version on PyPI, you can install from the repository:
$ pip install git+https://github.com/polsci/textplumber.git
Install a language model
Many of Textplumber's pipeline components require a spaCy language model. After installing Textplumber, install a model. Here's an example of how to install spaCy's small English model:
python -m spacy download en_core_web_sm
If you are working with a different language or want to use a different 'en' model, check the spaCy models documentation for the relevant model name.
Using Textplumber
A good place to start is the quick introduction and an example notebook, which allows you to use Textplumber with different datasets and different kinds of text classification problems.
The documentation site provides a reference for Textplumber functionality and examples of how to use the various components. The current Textplumber components are listed below.
| Component | Functionality | Requires Preprocessor |
|---|---|---|
| TextCleaner | Cleans text data | - |
| SpacyPreprocessor | Preprocessor, uses spaCy | - |
| NLTKPreprocessor | Preprocessor, uses NLTK | - |
| TokensVectorizer | Extract individual tokens or token ngram features | Yes |
| POSVectorizer | Extract individual part-of-speech or POS ngram features | Yes |
| TextstatsTransformer | Extract document-level statistics | Yes |
| LexiconCountVectorizer | Extract features based on lexicons (i.e. counts of lists of words) | Yes |
| VaderSentimentExtractor | Extract sentiment features using VADER | - |
| VaderSentimentEstimator | Predict sentiment using VADER | - |
| Model2VecEmbedder | Extract embeddings using Model2Vec | - |
| CharNgramVectorizer | Extract character ngrams | - |
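A common pattern with components like these is to combine several feature types in a single pipeline. The sketch below shows that pattern with scikit-learn's `FeatureUnion` and standard scikit-learn vectorizers standing in for, e.g., `TokensVectorizer` and `CharNgramVectorizer`; the Textplumber component constructors are not shown here, so consult the documentation site for their actual parameters.

```python
# Combining word-ngram and character-ngram features in one
# pipeline, using plain scikit-learn vectorizers as stand-ins
# for Textplumber's token and character ngram components.
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

features = FeatureUnion([
    ("word_ngrams", CountVectorizer(ngram_range=(1, 2))),
    ("char_ngrams", CountVectorizer(analyzer="char_wb",
                                    ngram_range=(2, 3))),
])

pipeline = Pipeline([
    ("features", features),                            # stacked feature matrix
    ("classifier", LogisticRegression(max_iter=1000)), # classifier over all features
])

# toy data for illustration only
texts = ["a cheerful review", "a gloomy review",
         "cheerful and bright", "gloomy and dark"]
labels = [1, 0, 1, 0]
pipeline.fit(texts, labels)
```

The "Requires Preprocessor" column above indicates which Textplumber components must be preceded by a preprocessing step (e.g. SpacyPreprocessor or NLTKPreprocessor) in the pipeline.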
Textplumber also provides other helpful functionality for working with text pipelines:
| Function/Class | Functionality |
|---|---|
| preview_dataset | Output information about a Hugging Face dataset |
| plot_confusion_matrix | SVG confusion matrix with counts, row-wise proportions, and appropriate labels |
| plot_logistic_regression_features_from_pipeline | Plot the most discriminative features for a logistic regression classifier |
| plot_decision_tree_from_pipeline | Plot the decision tree of the classifier from a pipeline using SuperTree |
| preview_pipeline_features | Output the features at each step in a pipeline |
| SentimentIntensityInterpreter | Functionality to aid interpretation of VADER scoring |
| sentiment_wordcloud | Visualize the salience of VADER lexicon words across multiple texts |
Developer Guide
The instructions below are only relevant if you want to contribute to Textplumber. The nbdev library is being used for development. If you are new to using nbdev, here are some useful pointers to get you started (or visit the nbdev website).
Install textplumber in Development mode
# make sure textplumber package is installed in development mode
$ pip install -e .
# make changes under nbs/ directory
# ...
# compile to have changes apply to textplumber
$ nbdev_prepare