Wordview is a Python package for text analysis.

Wordview (Work In Progress)

Wordview is a Python package for Exploratory Data Analysis (EDA) and Feature Extraction for text. Wordview’s Python API is open-source and available under the MIT license.

Usage

Install the package via pip:

pip install wordview

The sections below give a high-level description of Wordview's features. Each section links to the corresponding documentation page, where you will find details, tutorials and worked examples.

Exploratory Data Analysis (EDA)

Wordview presents many statistics about your data in the form of plots and tables, giving you both a high-level and a detailed overview. For instance: which languages are present in your dataset, how many unique words it contains, what percentage of them are adjectives, nouns or verbs, and what the most common POS tags are. Wordview also provides several statistics for labels in labeled datasets.

Text Analysis

This feature gives you an overview of your text data through various statistics, plots and distributions. See the text analysis documentation pages for usage and examples.
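To make the kind of statistic concrete, here is a minimal sketch in plain Python. It does not use Wordview's API: the function name `corpus_stats` and the crude regex tokenizer are assumptions made for illustration only.

```python
import re
from collections import Counter

def corpus_stats(docs):
    """Return simple token-level statistics for a list of documents."""
    # Lowercase and keep runs of letters/apostrophes; a deliberately crude tokenizer.
    tokens = [tok for doc in docs for tok in re.findall(r"[a-z']+", doc.lower())]
    counts = Counter(tokens)
    return {
        "n_docs": len(docs),
        "n_tokens": len(tokens),
        "n_unique": len(counts),
        "top_words": counts.most_common(3),
    }

docs = ["The pool is open.", "The pool is closed today."]
stats = corpus_stats(docs)
print(stats["n_tokens"], stats["n_unique"])
```

A real analysis would add language detection and POS tagging, which need a proper NLP pipeline rather than a regex.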

Analysis of Labels

Wordview calculates several statistics for labels in labeled datasets, whether the labels are at document or sequence level. See the label analysis documentation pages for usage and examples.
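As an illustration of a document-level label statistic, the sketch below computes the relative frequency of each label. It is independent of Wordview's API, and `label_distribution` is a hypothetical helper name.

```python
from collections import Counter

def label_distribution(labels):
    """Map each label to its share of the dataset (values sum to 1)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

dist = label_distribution(["pos", "neg", "pos", "pos"])
print(dist)
```

For sequence-level labels, the same function could be applied to the flattened list of per-token tags.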

Feature Extraction

Wordview has various functionalities for feature extraction from text, including Multiword Expressions (MWEs), clusters, anomalies and outliers, and more. See the following sections as well as the linked documentation page in each section for details.

Multiword Expressions

Multiword Expressions (MWEs) are phrases that can be treated as a single semantic unit, for example swimming pool and climate change. MWEs have applications in areas including parsing, language models, language generation, terminology extraction, and topic models. Wordview can extract different types of MWEs from text. See the MWEs documentation page for usage and examples.
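One common way to find candidate MWEs is to score adjacent word pairs with an association measure such as pointwise mutual information (PMI). The sketch below is a plain-Python illustration of that idea, not a description of how Wordview extracts MWEs. The name `bigram_pmi` and the `min_count` parameter are assumptions.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

text = "the swimming pool is near the old swimming pool and the new pool"
ranked = bigram_pmi(text.split())
print(ranked[0][0])
```

A positive PMI means the two words co-occur more often than their individual frequencies would predict, which is the property that makes a pair a plausible MWE candidate.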

Anomalies and Outliers

Anomalies and outliers have wide applications in Machine Learning. In some cases you detect and remove them to improve the performance of a downstream ML model. In other cases they are the data points of interest, and finding them sheds light on your data.

Wordview offers several anomaly and outlier detection functions. See the anomalies documentation page for usage and examples.
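As a simple illustration of outlier detection on text, the sketch below flags documents whose word count lies far from the mean, measured as a z-score. This is a generic technique shown with the standard library only. It is not Wordview's API, and `length_outliers` and its `threshold` parameter are hypothetical.

```python
import statistics

def length_outliers(docs, threshold=2.0):
    """Return indices of documents whose word count has |z-score| > threshold."""
    lengths = [len(doc.split()) for doc in docs]
    mean = statistics.mean(lengths)
    stdev = statistics.pstdev(lengths)  # population standard deviation
    if stdev == 0:
        return []  # all documents have the same length
    return [i for i, n in enumerate(lengths) if abs(n - mean) / stdev > threshold]

docs = ["short text"] * 9 + ["word " * 50]
flagged = length_outliers(docs)
print(flagged)
```

Whether a flagged document should be dropped or inspected depends on the task, as described above.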

Clusters

Clustering identifies groups of documents with similar content in an unsupervised fashion. It can provide valuable insights into your data without requiring labels. See Wordview's clustering documentation page for usage and examples.
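To show what unsupervised grouping looks like in its simplest form, here is a sketch of single-pass (leader) clustering over Jaccard similarity of token sets. It is a toy technique chosen because it needs no dependencies. It is not the algorithm Wordview uses, and the names `jaccard`, `leader_cluster` and `threshold` are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def leader_cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose leader is similar enough."""
    leaders = []   # token set of the first document in each cluster
    clusters = []  # lists of document indices, one list per cluster
    for i, doc in enumerate(docs):
        tokens = set(doc.lower().split())
        for c, leader in enumerate(leaders):
            if jaccard(tokens, leader) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No existing cluster matched: this document starts a new one.
            leaders.append(tokens)
            clusters.append([i])
    return clusters

docs = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
groups = leader_cluster(docs)
print(groups)
```

No labels are involved at any point: the groups emerge only from word overlap between documents.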

Utilities

Wordview offers a number of utility functions for common pre- and post-processing tasks in NLP. See the utilities documentation page for usage and examples.

Contributing

Thank you for contributing to Wordview! We and the users of this repo appreciate your efforts. See the contributing page for detailed instructions on how to contribute.

Download files

Source Distribution

wordview-1.1.0.tar.gz (24.9 kB)

Built Distribution

wordview-1.1.0-py3-none-any.whl (30.0 kB, Python 3)
