Metafeature Extraction for Unstructured Data
Project description
Elemeta: Metafeature Extraction for Unstructured Data
Elemeta is an open-source Python library for metafeature extraction. It lets you explore, monitor, and extract features from unstructured data through enriched tabular representations, and it provides a straightforward Python API for extracting metadata from unstructured data such as text and images.
Key uses of Elemeta include:
- Exploratory Data Analysis (EDA) - extract useful metadata information on unstructured data to analyze, investigate, and summarize the main characteristics and employ data visualization methods.
- Data and model monitoring - utilize structured ML monitoring techniques in addition to the typical latent embedding visualizations.
- Feature extraction - engineer alternative features to be utilized in simpler models such as decision trees.
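To make the monitoring use case concrete, here is a minimal sketch of tracking one metafeature (word count) across two batches of text. It uses only the standard library and does not call Elemeta; the helper names `word_count` and `mean_shift_in_std`, the sample texts, and the threshold value are all assumptions made for illustration.

```python
from statistics import mean, stdev


def word_count(text: str) -> int:
    """Whitespace-based word count, a stand-in for a real metafeature extractor."""
    return len(text.split())


def mean_shift_in_std(reference, current):
    """Distance between the batch means, measured in reference standard deviations."""
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # A constant reference batch: any change in the mean is an unbounded shift.
        return 0.0 if mean(current) == ref_mean else float("inf")
    return abs(mean(current) - ref_mean) / ref_std


# Hypothetical batches: texts seen at training time vs. texts seen in production.
reference_texts = ["I love robots", "What does the fox say", "Hi I just met you", "This is crazy"]
current_texts = ["ok", "yes", "no"]

reference_counts = [word_count(t) for t in reference_texts]
current_counts = [word_count(t) for t in current_texts]

ALERT_THRESHOLD = 2.0  # assumed value; tune it for your own data
shift = mean_shift_in_std(reference_counts, current_counts)
print(f"shift = {shift:.2f} reference std devs, drifted = {shift > ALERT_THRESHOLD}")
```

Because the metafeature is a plain number per row, the same check works with any tabular monitoring tool; the latent embeddings themselves never need to be inspected.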
Getting Started
Get started with Elemeta by installing the Python library via pip:
pip install elemeta
Once installed, you can test the library on a few example dataframes that ship with it.
You can find them in elemeta.dataset.dataset:
from elemeta.dataset.dataset import get_imdb_reviews
# Load existing dataframe
reviews = get_imdb_reviews()
After you have a dataset with the text column, you can start using the library with the following Python API:
from elemeta.nlp.metadata_extractor_runner import MetadataExtractorsRunner
metadata_extractor_runner = MetadataExtractorsRunner()
reviews = metadata_extractor_runner.run_on_dataframe(dataframe=reviews, text_column='review')
# Preview the enriched dataframe (pandas DataFrames have no show() method)
print(reviews.head())
Pandas DataFrames
Elemeta can enrich standard dataframe objects:
from elemeta.nlp.metadata_extractor_runner import MetadataExtractorsRunner
import pandas as pd
df = pd.DataFrame({"text": ["Hi I just met you, and this is crazy", "What does the fox say?", "I love robots"]})
metadata_extractor_runner = MetadataExtractorsRunner()
# Enrich the dataframe built above, reading text from its "text" column
df_with_metadata = metadata_extractor_runner.run_on_dataframe(dataframe=df, text_column="text")
Strings
Elemeta can enrich specific strings:
from elemeta.nlp.metadata_extractor_runner import MetadataExtractorsRunner
metadata_extractor_runner = MetadataExtractorsRunner()
metadata_extractor_runner.run("This is a text about how good life is :)")
Documentation
This package aims to help enrich non-tabular data (e.g. text for NLP, pictures for image processing, and so on). Currently, only textual data is supported; Elemeta enriches it by extracting metafeatures such as average word length.
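As a rough illustration of what a metafeature is, the sketch below computes a few by hand, including average word length, and attaches them to each text as extra columns. This is not Elemeta's implementation: the function `extract_metafeatures`, the feature names, and the tokenization rule are assumptions chosen for the example.

```python
import re


def extract_metafeatures(text: str) -> dict:
    """Hand-rolled metafeatures for one string (illustrative only)."""
    # Treat runs of letters, digits, and apostrophes as words.
    words = re.findall(r"[A-Za-z0-9']+", text)
    word_count = len(words)
    avg_word_length = sum(len(w) for w in words) / word_count if word_count else 0.0
    return {
        "char_count": len(text),
        "word_count": word_count,
        "avg_word_length": avg_word_length,
    }


texts = ["What does the fox say?", "I love robots"]

# Each row keeps the original text and gains one column per metafeature,
# which is the "enriched tabular representation" idea in miniature.
rows = [{"text": t, **extract_metafeatures(t)} for t in texts]
for row in rows:
    print(row)
```

Rows shaped like this can be passed to `pd.DataFrame(rows)` or fed directly to a simple model such as a decision tree.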
Community
Elemeta is brand new, so we don't have a formal process for contributions just yet. If you have feedback or would like to contribute, just go ahead and post a GitHub issue.
File details
Details for the file elemeta-1.0.2.tar.gz.
File metadata
- Download URL: elemeta-1.0.2.tar.gz
- Upload date:
- Size: 30.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.6 Linux/5.15.0-1035-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 840757c09e811e3f18d6078119fda47c6ec2bb7d3ff7765c4299e4c78feb2138
MD5 | bed6ee782783cf5bd03af20f2d4a5688
BLAKE2b-256 | 8dcdff0f7480ad11cc463437814c5926187d120218b546dce12e2fe3ce490140
File details
Details for the file elemeta-1.0.2-py3-none-any.whl.
File metadata
- Download URL: elemeta-1.0.2-py3-none-any.whl
- Upload date:
- Size: 30.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.6 Linux/5.15.0-1035-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 611c689d0bdefbdcc12564e74914224e16c36b997cee6e320d956bd0a69c096e
MD5 | 6ff61bba715b3aa9d4f6f072aa9802e7
BLAKE2b-256 | 6b08ba31731374f7f2985b3fa3c528209ab02adac2dd1772a4013959282e0395