Elemeta: Metafeature Extraction for Unstructured Data

Elemeta is an open-source Python library for metafeature extraction. It lets you explore, monitor, and extract features from unstructured data through enriched tabular representations, using a straightforward Python API for unstructured data such as text and images.

Key usage of Elemeta includes:

  • Exploratory Data Analysis (EDA) - extract useful metafeatures from unstructured data to analyze, investigate, and summarize its main characteristics, and to apply data visualization methods.
  • Data and model monitoring - utilize structured ML monitoring techniques in addition to the typical latent embedding visualizations.
  • Feature extraction - engineer alternative features to be utilized in simpler models such as decision trees.
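
The monitoring use case above can be illustrated with a minimal sketch that does not use Elemeta itself. It assumes a single hand-rolled metafeature (word count) and a simple drift score: the shift of the mean in a new batch, measured in standard deviations of a reference batch. The names `word_count` and `drift_score` are hypothetical and are not part of the Elemeta API.

```python
from statistics import mean, stdev


def word_count(text: str) -> int:
    """Hand-rolled metafeature for illustration: number of whitespace-separated tokens."""
    return len(text.split())


def drift_score(reference: list[str], current: list[str]) -> float:
    """Shift of the current mean word count, in reference standard deviations.

    The reference batch needs at least two texts so that stdev is defined.
    """
    ref_counts = [word_count(t) for t in reference]
    cur_counts = [word_count(t) for t in current]
    spread = stdev(ref_counts) or 1.0  # fall back to 1.0 if all reference values are equal
    return abs(mean(cur_counts) - mean(ref_counts)) / spread


reference_batch = ["Hi I just met you", "What does the fox say?", "I love robots"]
current_batch = ["ok", "no", "yes please"]

score = drift_score(reference_batch, current_batch)
print(f"drift score: {score:.2f}")
```

Because the metafeature is an ordinary numeric column, any tabular monitoring technique (thresholds, distribution tests, dashboards) can be applied to it in the same way.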

Getting Started

Get started with Elemeta by installing the Python library via pip:

pip install elemeta

Once installed, there are a few example dataframes that can be used for testing the library. You can find them in elemeta.dataset.dataset:

from elemeta.dataset.dataset import get_imdb_reviews
# Load existing dataframe
reviews = get_imdb_reviews()

After you have a dataset with the text column, you can start using the library with the following Python API:

from elemeta.nlp.runners.metafeature_extractors_runner import MetafeatureExtractorsRunner

metafeature_extractors_runner = MetafeatureExtractorsRunner()
reviews = metafeature_extractors_runner.run_on_dataframe(dataframe=reviews, text_column='review')
# Preview the enriched dataframe (pandas DataFrames have no .show() method)
print(reviews.head())

Pandas DataFrames

Elemeta can enrich standard dataframe objects:

from elemeta.nlp.runners.metafeature_extractors_runner import MetafeatureExtractorsRunner
import pandas as pd

# Build a small dataframe with a single text column
df = pd.DataFrame({"text": ["Hi I just met you, and this is crazy", "What does the fox say?", "I love robots"]})
metafeature_extractors_runner = MetafeatureExtractorsRunner()
# Enrich the dataframe defined above (df, not the earlier reviews dataset)
df_with_metafeatures = metafeature_extractors_runner.run_on_dataframe(dataframe=df, text_column="text")

Strings

Elemeta can enrich specific strings:

from elemeta.nlp.runners.metafeature_extractors_runner import MetafeatureExtractorsRunner

metafeature_extractors_runner = MetafeatureExtractorsRunner()
metafeature_extractors_runner.run("This is a text about how good life is :)")

Documentation

This package aims to help enrich non-tabular data (e.g. text for NLP, pictures for image processing, and so on). Currently, we only support textual data, and we enrich it by extracting metafeatures (such as average word length).
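
To make the idea of a metafeature concrete, here is a minimal sketch that computes a few of them by hand using only the standard library. It is an illustration of the concept, not Elemeta's implementation: the function name `extract_metafeatures` and the feature names in the returned dictionary are assumptions made for this example.

```python
import string


def extract_metafeatures(text: str) -> dict:
    """Compute a few simple text metafeatures (illustrative, not Elemeta's own code)."""
    # Strip surrounding punctuation so that "you," is counted as the word "you"
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]
    n_words = len(words)
    return {
        "char_count": len(text),
        "word_count": n_words,
        # Guard against empty input to avoid dividing by zero
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "punctuation_count": sum(ch in string.punctuation for ch in text),
    }


row = extract_metafeatures("I love robots")
print(row)
```

Each text becomes one row of numeric values, which is the enriched tabular representation that downstream analysis or simpler models can consume.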

Community

Elemeta is brand new, so we don't have a formal process for contributions just yet. If you have feedback or would like to contribute, just go ahead and post a GitHub issue.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

elemeta-1.1.2.tar.gz (30.4 MB)

Uploaded Source

Built Distribution

elemeta-1.1.2-py3-none-any.whl (30.4 MB)

Uploaded Python 3

File details

Details for the file elemeta-1.1.2.tar.gz.

File metadata

  • Download URL: elemeta-1.1.2.tar.gz
  • Size: 30.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.12 Linux/6.2.0-1012-azure

File hashes

Hashes for elemeta-1.1.2.tar.gz
Algorithm Hash digest
SHA256 0469f5c10fd5e7c659fe203c4a2b1c41607ef618a68be73424cb0249ec83e83e
MD5 756de7b5cf3364d1298584ac7a028ada
BLAKE2b-256 d0aab9523d0a3149e1b060c9df6b6f8e30ef02bf41000c8e18121676b0d3df6c
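
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `matches_expected` are hypothetical, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib

# SHA256 digest for elemeta-1.1.2.tar.gz, copied from the listing above
EXPECTED_SHA256 = "0469f5c10fd5e7c659fe203c4a2b1c41607ef618a68be73424cb0249ec83e83e"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected one, ignoring letter case."""
    return sha256_of(path) == expected.strip().lower()
```

Calling `matches_expected` with the path of the downloaded `elemeta-1.1.2.tar.gz` returns `True` only if the file's contents hash to the listed digest.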

See more details on using hashes here.

File details

Details for the file elemeta-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: elemeta-1.1.2-py3-none-any.whl
  • Size: 30.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.12 Linux/6.2.0-1012-azure

File hashes

Hashes for elemeta-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7a3dedfa1d6a9fc055a7a686b06c7690fdf13ba5944243453d6b03747528e644
MD5 0d5cc116ef3f10d9dff46b4aea6073c3
BLAKE2b-256 10597b9450040c496c76d8c0bf052e165c6a7e4975ae0efb5e180d3b95aa2bd1
