Row2Vec

Row2Vec is a Python library for easily generating low-dimensional vector embeddings from any tabular dataset. It uses deep learning and classical methods to create powerful, dense representations of your data, suitable for visualization, feature engineering, and gaining deeper insights into your data's structure.

Features

🎯 Multiple Embedding Methods

Neural (Autoencoder): Deep learning approach for complex, non-linear patterns
Target-based: Learn embeddings for categorical columns and their relationships
PCA: Fast, linear dimensionality reduction with interpretable components
t-SNE: Excellent for 2D/3D visualization and cluster discovery
UMAP: Balanced preservation of local and global structure
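To make the idea of a linear reduction concrete, here is a minimal sketch of what a PCA-style embedding computes: the direction of greatest variance, found by power iteration, and the projection of each row onto it. This is a standard-library illustration only, not Row2Vec's implementation; the function names `top_principal_component` and `project` are hypothetical.

```python
import math
import random


def top_principal_component(rows, iterations=200, seed=0):
    """Return (column means, unit vector) for the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Map each row to its coordinate along the given direction."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


# Toy data lying close to the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
mean, direction = top_principal_component(rows)
embedding_1d = project(rows, mean, direction)
```

Each two-column row is reduced to a single number that preserves most of the variance. The neural mode replaces this fixed linear projection with a learned, non-linear one.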

🧠 Intelligent Preprocessing

Adaptive Missing Value Imputation: Automatically analyzes patterns and applies optimal strategies
Pattern-Aware Analysis: Detects problematic missing patterns with configurable strategies
Automated Feature Engineering: Handles scaling, encoding, and preprocessing seamlessly
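As a rough picture of what automated scaling and encoding involve, the sketch below standardizes a numeric column and one-hot encodes a categorical column by hand. It is an assumption-laden illustration, not the library's pipeline; the helper names `standardize` and `one_hot` and the toy columns are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale numeric values to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def one_hot(values):
    """Encode categories as 0/1 indicator columns, one per distinct value."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical toy columns.
ages = [20.0, 30.0, 40.0]
countries = ["BR", "SE", "BR"]

scaled_ages = standardize(ages)
categories, encoded = one_hot(countries)
# Each row becomes one purely numeric vector that a model can consume.
features = [[a] + e for a, e in zip(scaled_ages, encoded)]
```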

🚀 Advanced Features

Neural Architecture Search (NAS): Automatically discovers optimal network architectures
Multi-layer Networks: Support for deep architectures with dropout and regularization
Model Serialization: Save and load models with full preprocessing pipelines
Command-Line Interface: Complete CLI for batch processing and automation

🔧 Production Ready

Comprehensive Testing: 163+ test functions across 17 test files
Type Safety: Complete MyPy annotations
Modern Build System: Uses pyproject.toml with hatchling backend
Documentation: Interactive Jupyter Book with executable examples

Installation

pip install row2vec

Quick Start

import pandas as pd
from row2vec import learn_embedding, generate_synthetic_data

# Generate a small synthetic dataset (swap in your own DataFrame here)
df = generate_synthetic_data(num_records=1000)

# Generate neural embeddings for each row
embeddings = learn_embedding(
    df,
    mode="unsupervised",
    embedding_dim=5
)
print(f"Embeddings shape: {embeddings.shape}")
print(embeddings.head())

# Learn categorical embeddings
country_embeddings = learn_embedding(
    df,
    mode="target",
    reference_column="Country",
    embedding_dim=3
)
print(f"Country embeddings: {country_embeddings}")

# Compare with classical methods
pca_embeddings = learn_embedding(df, mode="pca", embedding_dim=5)
tsne_embeddings = learn_embedding(df, mode="tsne", embedding_dim=2)
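Once rows are embedded, a common next step is to look up similar rows. The sketch below ranks rows by cosine similarity using only the standard library. It assumes the embeddings have already been converted to a plain list of vectors (for a pandas DataFrame, something like `embeddings.values.tolist()`); the function names and toy vectors are hypothetical and not part of Row2Vec.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_rows(vectors, query_index, k=2):
    """Return indices of the k rows most similar to the query row."""
    query = vectors[query_index]
    scored = [(cosine_similarity(query, v), i)
              for i, v in enumerate(vectors) if i != query_index]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy stand-in for row embeddings; real values would come from the library.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
neighbors = nearest_rows(vectors, query_index=0, k=2)
```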

Command Line Interface

# Quick embeddings
row2vec annotate --input data.csv --output embeddings.csv --mode unsupervised --dim 5

# Train and save model
row2vec train --input data.csv --output model.py --mode unsupervised --dim 10 --epochs 50

# Use saved model
row2vec predict --input new_data.csv --model model.py --output predictions.csv

# Target-based embeddings
row2vec annotate --input data.csv --output categories.csv --mode target --target-col Category --dim 3
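For batch processing, the `annotate` command above can be generated for many files from a small script. The sketch below only builds the argument lists, using the flags shown in the examples; the output naming scheme and the helper name `annotate_command` are assumptions made for illustration.

```python
from pathlib import Path


def annotate_command(input_csv, output_dir, mode="unsupervised", dim=5):
    """Build the argument list for one `row2vec annotate` run."""
    # Hypothetical naming scheme: <input stem>_embeddings.csv in output_dir.
    output_csv = Path(output_dir) / f"{Path(input_csv).stem}_embeddings.csv"
    return [
        "row2vec", "annotate",
        "--input", str(input_csv),
        "--output", str(output_csv),
        "--mode", mode,
        "--dim", str(dim),
    ]


commands = [annotate_command(p, "out") for p in ["data/a.csv", "data/b.csv"]]
# To execute, pass each list to subprocess.run(cmd, check=True)
# on a machine where the row2vec CLI is installed.
```

Building the commands as lists rather than shell strings avoids quoting problems with unusual file names.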

Advanced Usage

Neural Architecture Search

from row2vec import (
    ArchitectureSearchConfig,
    EmbeddingConfig,
    NeuralConfig,
    learn_embedding,
    search_architecture,
)

# `df` is the DataFrame created in the Quick Start example

# Configure architecture search
config = ArchitectureSearchConfig(
    method='random',
    max_layers=3,
    width_options=[64, 128, 256],
    max_trials=20
)

base_config = EmbeddingConfig(
    mode="unsupervised",
    embedding_dim=8,
    neural=NeuralConfig(max_epochs=50)
)

# Find optimal architecture
best_arch, results = search_architecture(df, base_config, config)
print(f"Best architecture: {best_arch}")

# Train with optimal settings
optimal_embeddings = learn_embedding(
    df,
    mode="unsupervised",
    embedding_dim=8,
    hidden_units=best_arch.get('hidden_units', [128]),
    max_epochs=100
)
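The idea behind a random architecture search can be sketched in a few lines: sample layer stacks within the configured limits, score each one, and keep the best. The code below is a conceptual illustration under stated assumptions, not Row2Vec's search routine. In particular, `toy_score` is a hypothetical stand-in for training a model and measuring its validation loss.

```python
import random


def sample_architecture(rng, max_layers, width_options):
    """Draw a random stack of hidden-layer widths."""
    n_layers = rng.randint(1, max_layers)
    return [rng.choice(width_options) for _ in range(n_layers)]


def random_search(score_fn, max_layers=3, width_options=(64, 128, 256),
                  max_trials=20, seed=0):
    """Return the best-scoring architecture among max_trials random samples."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("inf")
    for _ in range(max_trials):
        arch = sample_architecture(rng, max_layers, list(width_options))
        score = score_fn(arch)  # lower is better, e.g. a validation loss
        if score < best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


def toy_score(arch):
    # Hypothetical scoring rule: prefers a total width close to 256.
    return abs(sum(arch) - 256)


best_arch, best_score = random_search(toy_score)
```

Random search is simple and easy to parallelize, and the trial budget caps its cost.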

Missing Value Imputation

from row2vec import ImputationConfig, AdaptiveImputer, MissingPatternAnalyzer

# Analyze missing patterns
analyzer = MissingPatternAnalyzer(ImputationConfig())
analysis = analyzer.analyze(df)
print(f"Imputation recommendations: {analysis['recommendations']}")

# Apply adaptive imputation
imputer = AdaptiveImputer(ImputationConfig(
    numeric_strategy='knn',
    categorical_strategy='mode',
    knn_neighbors=10
))
df_clean = imputer.fit_transform(df)
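To show what the 'knn' and 'mode' strategies mean in principle, the sketch below fills a missing numeric value with the mean of its nearest neighbours and a missing category with the most frequent one. It is a simplified standard-library illustration, not the library's imputer: the helper names are hypothetical, and the KNN step assumes the other columns are complete.

```python
import math
from collections import Counter


def mode_impute(values):
    """Replace None with the most frequent observed category."""
    most_common = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]


def knn_impute(rows, target, k=2):
    """Fill None in column `target` with the mean of the k nearest rows.

    Distance is Euclidean over the other columns, assumed complete here.
    """
    others = [j for j in range(len(rows[0])) if j != target]
    known = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(list(r))
            continue
        point = [r[j] for j in others]
        nearest = sorted(
            known, key=lambda o: math.dist(point, [o[j] for j in others])
        )[:k]
        value = sum(o[target] for o in nearest) / len(nearest)
        filled.append([value if j == target else r[j] for j in range(len(r))])
    return filled


# Toy data: the third row is missing its second value.
rows = [[1.0, 10.0], [2.0, 20.0], [2.1, None], [9.0, 90.0]]
completed = knn_impute(rows, target=1, k=2)
colors = mode_impute(["red", None, "red", "blue"])
```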

Documentation

Online Documentation

Installation Guide: Detailed setup instructions
Quick Start Tutorial: Get up and running in 5 minutes
API Reference: Complete function documentation
Example Gallery: Real-world use cases and tutorials
Advanced Features: Neural architecture search, imputation strategies

Local Documentation

User Guide: Comprehensive guide with mathematical background, detailed examples, and best practices
LLM Documentation: Practical guide for LLM coding agents integrating Row2Vec
API Reference: Complete function and class reference
Tutorials: Executable Python tutorials (Nhandu format) - run make docs to build HTML

Why Row2Vec?

Alternative Approach	Row2Vec Advantage	Drawback of the Alternative
Manual Neural Networks	Automated preprocessing, simple API	200+ lines of boilerplate
sklearn PCA	Integrated preprocessing, multiple methods	Limited to linear reduction
sklearn t-SNE / umap-learn UMAP	Unified interface, consistent preprocessing	Manual pipeline setup
Custom Embeddings	Production-ready with serialization	Significant development time

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Citation

If you use Row2Vec in your research, please cite:

@software{tresoldi_row2vec,
  author = {Tresoldi, Tiago},
  title = {Row2Vec: Neural and Classical Embeddings for Tabular Data},
  url = {https://github.com/evotext/row2vec},
  version = {1.0.0}
}

Acknowledgments

This library was originally developed as part of the "Cultural Evolution of Texts" project, led by Michael Dunn at the Department of Linguistics and Philology, Uppsala University. The project investigates the application of evolutionary models to textual data and cultural transmission patterns.

Authors

Tiago Tresoldi
Affiliate Researcher, Department of Linguistics and Philology, Uppsala University
GitHub: @tresoldi

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Release history

0.1.0 (this version), released Oct 13, 2025

Download files

Source Distribution

row2vec-0.1.0.tar.gz (106.8 kB)

Uploaded Oct 13, 2025 Source

Built Distribution

row2vec-0.1.0-py3-none-any.whl (79.5 kB)

Uploaded Oct 13, 2025 Python 3

File details

Details for the file row2vec-0.1.0.tar.gz.

File metadata

Download URL: row2vec-0.1.0.tar.gz
Upload date: Oct 13, 2025
Size: 106.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for row2vec-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2f193a742d37742f2a9cd6b626a2d2102b20efea97fc4b7b1c02e5a454f5445f`
MD5	`558316079c676de804bc5989eb60f272`
BLAKE2b-256	`8478562ca6d78a5134b766e3a4ad25f1b2dc5b24a4fba86410b4811ed9c87194`
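A downloaded file can be checked against a published digest with Python's standard `hashlib` module. The sketch below is one way to do it, assuming the file has already been downloaded; the helper names and the local path in the usage note are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("row2vec-0.1.0.tar.gz", "<SHA256 value from the table above>")` should return True for an intact download saved under that name.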

File details

Details for the file row2vec-0.1.0-py3-none-any.whl.

File metadata

Download URL: row2vec-0.1.0-py3-none-any.whl
Upload date: Oct 13, 2025
Size: 79.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for row2vec-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9777d3ca8b3cb07793813c06eee5bbd23aeccfde9da3b1cb7c0eacc4496f1892`
MD5	`32c5f4299a278203b84fe8b92596ea2b`
BLAKE2b-256	`698080947bed71add35a5887a59f66c56cf56ed29f2e828de111ac34e79266a1`
