Wordview is a Python package for text analysis.

Project description

Wordview

Wordview is a Python package for Exploratory Data Analysis (EDA) of text. It reports statistics about your data as plots, tables, and descriptions, giving you both a high-level and a detailed overview. It has functions to analyze explicit text elements such as words, n-grams, POS tags, and multi-word expressions, as well as implicit elements such as clusters, anomalies, and biases. Wordview’s Python API is open-source and available under the MIT license.

Usage

Install the package via pip:

pip install wordview
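
To confirm that the installation succeeded, you can query the installed version from Python. The following is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical and is not part of Wordview.

```python
from importlib import metadata


def installed_version(package="wordview"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```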

To explore various features and functionalities, consult the documentation pages. The following sections present a high-level description of Wordview’s features and functionalities. For details, tutorials and worked examples, corresponding documentation pages are linked in each section.

Text Analysis

This feature gives you an overview of your text data through statistics, plots, and distributions, so you can quickly see the patterns in your dataset and make informed decisions for downstream applications before moving on to more complex tasks. See the text analysis documentation pages for usage and examples.
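
To make the kind of statistics involved concrete, here is a minimal sketch in plain Python. It does not use Wordview's API; the function name corpus_overview and the whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter


def corpus_overview(docs):
    """Return basic corpus statistics for a list of document strings."""
    # Naive tokenization: lowercase and split on whitespace.
    # A real pipeline would normally use a proper tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    counts = Counter(tok for toks in tokenized for tok in toks)
    n_tokens = sum(counts.values())
    return {
        "documents": len(docs),
        "tokens": n_tokens,            # total number of word occurrences
        "types": len(counts),          # number of distinct words
        "type_token_ratio": len(counts) / n_tokens if n_tokens else 0.0,
        "top_words": counts.most_common(3),
    }


docs = ["the cat sat on the mat", "the dog sat"]
stats = corpus_overview(docs)
print(stats)
```

Numbers like these are the raw material for the plots and distributions described above.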

Analysis of Labels

Analyzing the labels in an NLP dataset helps ensure that models are trained on balanced and representative data. Wordview computes statistics for labeled datasets at both the document level and the sequence level. These statistics help researchers and practitioners identify potential biases, discrepancies, or areas of interest in the labels. See the label analysis documentation pages for usage and examples.
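
As an illustration of what a label statistic can look like, the sketch below computes class shares and an imbalance ratio in plain Python. It is not Wordview's API; the function name label_distribution and the example labels are hypothetical.

```python
from collections import Counter


def label_distribution(labels):
    """Return (share per label, ratio of most to least frequent label)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    imbalance = max(counts.values()) / min(counts.values())
    return shares, imbalance


# Document level: one label per document.
doc_shares, doc_imbalance = label_distribution(["pos", "pos", "pos", "neg"])

# Sequence level: one tag per token, flattened across documents.
tag_sequences = [["O", "B-PER"], ["O", "O"]]
flat_tags = [tag for seq in tag_sequences for tag in seq]
tag_shares, tag_imbalance = label_distribution(flat_tags)

print(doc_shares, doc_imbalance)
print(tag_shares, tag_imbalance)
```

An imbalance ratio far above 1 suggests that the dataset may need resampling or class weighting before training.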

Extraction & Analysis of Multiword Expressions

Multiword Expressions (MWEs) are phrases that can be treated as a single semantic unit, e.g. swimming pool and climate change. MWEs have applications in areas including parsing, language models, language generation, terminology extraction, and topic models. Wordview can extract different types of MWEs from text. See the MWEs documentation page for usage and examples.
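
One common way to find MWE candidates is to score adjacent word pairs with an association measure such as pointwise mutual information (PMI). The sketch below shows that idea in plain Python. It is an assumption made for illustration, not a description of how Wordview extracts MWEs, and the function name bigram_pmi and the min_count parameter are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by PMI, highest first."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the
        # probability expected if the two words were independent.
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


text = "the swimming pool is near the other swimming pool and the park"
ranked = bigram_pmi(text.split())
print(ranked)
```

In this toy text only the pair (swimming, pool) occurs at least twice, so it is the only candidate returned.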

Bias Analysis

Downstream NLP models are only as unbiased and fair as the data on which they are trained. Wordview’s Bias Analysis module is designed to assist in ensuring that training datasets are free of explicit negative biases related to categories such as gender, race, and religion. Identifying and rectifying these biases supports more inclusive and fair NLP applications. See the bias analysis documentation page for usage and examples.

Analysis of Anomalies and Outliers

Detecting anomalies and outliers has wide applications in Machine Learning. In some cases, you capture them and remove them from the data to improve the performance of a downstream ML model. In other cases, they are the data points of interest, and finding them sheds light on the data.

Wordview offers several anomaly and outlier detection functions. See the anomalies documentation page for usage and examples.
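
As a simple illustration of outlier detection on text, the sketch below flags documents whose length is unusually far from the mean, using a z-score. It is not one of Wordview's detection functions; the function name length_outliers and the threshold value are assumptions.

```python
import statistics


def length_outliers(docs, threshold=2.0):
    """Return indices of documents whose word count is a z-score outlier."""
    if not docs:
        return []
    lengths = [len(doc.split()) for doc in docs]
    mean = statistics.fmean(lengths)
    spread = statistics.pstdev(lengths)  # population standard deviation
    if spread == 0:
        return []  # all documents have the same length
    return [
        i for i, n in enumerate(lengths)
        if abs(n - mean) / spread > threshold
    ]


# Seven short documents and one very long one (word counts given below).
word_counts = [5, 6, 5, 7, 6, 5, 6, 40]
docs = [" ".join(["w"] * n) for n in word_counts]
outliers = length_outliers(docs)
print(outliers)
```

Depending on the task, the flagged documents can be dropped before training or inspected as the items of interest.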

Cluster Analysis

Clustering can be used to identify groups of documents with similar information in an unsupervised fashion. It can provide valuable insights into your data without requiring labeled data. See Wordview’s clustering documentation page for usage and examples.

Utilities

Wordview offers a number of utility functions for common pre- and post-processing tasks in NLP. See the utilities documentation page for usage and examples.

Contributing

Thank you for contributing to Wordview! We and the users of this repo appreciate your efforts. Visit the contributing page for detailed instructions on how to contribute.

Download files

Source Distribution

wordview-1.1.2.tar.gz (26.3 kB view details)

Uploaded Source

Built Distribution

wordview-1.1.2-py3-none-any.whl (30.7 kB view details)

Uploaded Python 3

File details

Details for the file wordview-1.1.2.tar.gz.

File metadata

  • Download URL: wordview-1.1.2.tar.gz
  • Upload date:
  • Size: 26.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.17 Linux/5.15.0-1042-azure

File hashes

Hashes for wordview-1.1.2.tar.gz
Algorithm Hash digest
SHA256 2c5556c267e871b4e483b0913283dacb794f2d3f4f54ec1ec49f5df5ca36706d
MD5 f539e2296b915688b47190eff6081acf
BLAKE2b-256 1769e2c1506b97d1655ddf98e7961a7940f46c18562d1f56501e6aadfec94948

File details

Details for the file wordview-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: wordview-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 30.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.17 Linux/5.15.0-1042-azure

File hashes

Hashes for wordview-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e02860777d1d8e668a1d96910c016dbf90dc0557985433dd15ccd30875449aa9
MD5 75331bf000feee0b96e0da4bbc168972
BLAKE2b-256 93e30c59adb92c52c0c15e23760fc4778cbc8764242b885663f95172762ce6f1
