Paste data as Python DataFrame definitions

datapasta

A Python package inspired by the R datapasta package for pasting tabular data into DataFrame code. datapasta analyzes clipboard content or text input and generates Python code to recreate the data as a pandas or polars DataFrame.

Features

  • Automatic detection of delimiters (comma, tab, pipe, semicolon, etc.)
  • Smart header detection
  • Type inference for columns (int, float, boolean, datetime, string)
  • Generates code for both pandas and polars DataFrames
  • Command-line interface for easy integration with text editors
  • Simple API for programmatic use

Installation

# Install with pip
pip install datapasta

# With Pandas support
pip install datapasta[pandas]

# With Polars support
pip install datapasta[polars]

# For Polars on older CPUs
pip install datapasta[polars-lts-cpu]

The polars dependency is not included in the package by default. It is shipped as an optional extra which can be activated by passing it in square brackets.

GitHub Artifacts example

If you go to the GitHub Actions results summary page, you will see an HTML table. Copy it, and datapasta will generate the DataFrame code for you from the clipboard 🪄

(datapasta) louis 🚶 ~/dev/datapasta $ datapasta --polars
import polars as pl

df = pl.DataFrame({
    'Name': ['wheels-linux-aarch64', 'wheels-linux-armv7', 'wheels-linux-ppc64le',
'wheels-linux-s390x'],
    'Size': ['4.2 MB', '3.78 MB', '4.63 MB', '5.5 MB'],
})
(datapasta) louis 🚶 ~/dev/datapasta $ python -ic "$(datapasta --polars)"
>>> print(df)
shape: (4, 2)
┌──────────────────────┬─────────┐
│ Name                 ┆ Size    │
│ ---                  ┆ ---     │
│ str                  ┆ str     │
╞══════════════════════╪═════════╡
│ wheels-linux-aarch64 ┆ 4.2 MB  │
│ wheels-linux-armv7   ┆ 3.78 MB │
│ wheels-linux-ppc64le ┆ 4.63 MB │
│ wheels-linux-s390x   ┆ 5.5 MB  │
└──────────────────────┴─────────┘

Command Line Usage

# Automatically uses HTML table content if available
datapasta

# Force using legacy clipboard access (no HTML support)
datapasta --legacy

How HTML Detection Works

  1. datapasta checks if the cliptargets package is available
  2. If available, it looks for the text/html target in the clipboard
  3. If HTML content is found, it extracts tables using a lightweight HTML parser
  4. It detects headers based on HTML structure (<thead> or <th> elements)
  5. If no HTML content is found or no tables are present, it falls back to the text-based parsing

This feature is particularly useful when copying tables from web applications, where the HTML structure provides more reliable information about the table's layout and headers than plain text.
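
As a rough illustration of the extraction and header-detection steps listed above, here is a minimal sketch built on the standard library's html.parser. It is not datapasta's actual parser; the TableExtractor class and extract_table function are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect table rows from an HTML fragment (illustrative sketch only)."""

    def __init__(self):
        super().__init__()
        self.rows = []            # finished rows, each a list of cell strings
        self.has_header = False   # set once a <thead> or <th> element is seen
        self._row = None          # row currently being built
        self._cell = None         # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "th"):
            self.has_header = True
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Return (rows, has_header) for the table rows found in html."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.rows, parser.has_header


sample = (
    "<table><thead><tr><th>Name</th><th>Size</th></tr></thead>"
    "<tbody><tr><td>wheels-linux-armv7</td><td>3.78 MB</td></tr></tbody></table>"
)
rows, has_header = extract_table(sample)
print(rows, has_header)
```

When no rows come back, a caller would fall through to text-based parsing, which mirrors the last step in the list.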

Note:

  • datapasta will not use HTML if it can parse the table from text
  • By default, it parses at most 10,000 rows; use the max_rows argument to change the limit

Usage

Command Line

# Read from clipboard, generate pandas code
datapasta > dataframe_code.py

# Read from clipboard, generate polars code
datapasta --polars > dataframe_code.py

# Read from file instead of clipboard
datapasta --file data.csv > dataframe_code.py

# Specify a separator (otherwise auto-detected)
datapasta --sep "," > dataframe_code.py

Python API

import datapasta

# Read from clipboard and get pandas code
pandas_code = datapasta.clipboard_to_pandas()
print(pandas_code)

# Read from clipboard and get polars code
polars_code = datapasta.clipboard_to_polars()
print(polars_code)

# Convert text directly to DataFrame code
csv_text = """name,age,city
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Seattle"""

pandas_code = datapasta.text_to_pandas(csv_text)
print(pandas_code)

Controlling Header Detection

datapasta attempts to automatically detect whether your data has a header row, but you can override this behavior when needed:

Command Line

# Auto-detect headers (default behavior)
datapasta --file data.csv

# Force using the first row as a header
datapasta --file data.csv --header yes

# Force no header (generate column names like col1, col2, etc.)
datapasta --file data.csv --header no

Python API

import datapasta

# Auto-detect headers (default)
code = datapasta.text_to_pandas(text)

# Force using the first row as a header
code = datapasta.text_to_pandas(text, has_header=True)

# Force no header
code = datapasta.text_to_pandas(text, has_header=False)

This is particularly useful when:

  • The auto-detection logic misidentifies numeric headers as data
  • You want to preserve the first row as data but datapasta treats it as a header
  • You need consistent column names (col1, col2, etc.) for multiple similar datasets
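
To make the three header modes concrete, here is a small sketch of how a tri-state has_header value could be resolved. The split_header function and its guessing heuristic are assumptions made for illustration; datapasta's real detection logic may differ.

```python
def _is_number(cell):
    """True if the cell text parses as a float."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def split_header(rows, has_header=None):
    """Return (column_names, data_rows) for a list of string rows.

    has_header=True  -> the first row is the header
    has_header=False -> every row is data; names become col1, col2, ...
    has_header=None  -> guess (illustrative heuristic, not datapasta's)
    """
    if not rows:
        return [], []
    if has_header is None:
        # Guess: a header has no numeric-looking cells,
        # while at least one later row does.
        first_numeric = any(_is_number(c) for c in rows[0])
        rest_numeric = any(_is_number(c) for r in rows[1:] for c in r)
        has_header = (not first_numeric) and rest_numeric
    if has_header:
        return list(rows[0]), rows[1:]
    names = [f"col{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


rows = [["2021", "2022"], ["1", "2"]]
print(split_header(rows))                   # numeric header is guessed to be data
print(split_header(rows, has_header=True))  # the override forces a header
```

The first call shows the failure mode from the first bullet: a numeric header row looks like data to a heuristic of this kind, so the explicit override is needed.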

Enhanced HTML Table Support

datapasta can extract tables directly from HTML content in the clipboard. This feature is experimental and is used as a fallback.

This is especially useful when copying tables from web pages, spreadsheets, or other applications that place HTML content in the clipboard.

import datapasta

# Will automatically use HTML table content if available
code = datapasta.clipboard_with_targets_to_pandas()
print(code)

Examples

From a CSV in the clipboard

name,age,city
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Seattle

datapasta will generate:

import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["New York", "San Francisco", "Seattle"],
})

From a TSV in the clipboard

name	age	city
Alice	25	New York
Bob	30	San Francisco
Charlie	35	Seattle

datapasta will generate similar code, automatically detecting the tab delimiter.
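
One simple way to detect a delimiter, sketched here under the assumption that a good candidate splits every line into the same number of fields. The guess_delimiter function is a hypothetical helper for illustration, not datapasta's own detection code, and it deliberately ignores quoted fields.

```python
def guess_delimiter(text, candidates=(",", "\t", "|", ";")):
    """Pick the candidate that splits every non-empty line into the same
    number of fields (more than one), preferring the one with most fields.

    Naive sketch: delimiters inside quoted fields are not handled.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    best, best_fields = None, 1
    for delim in candidates:
        # Number of fields each line would have with this delimiter.
        counts = {line.count(delim) + 1 for line in lines}
        if len(counts) == 1:
            fields = counts.pop()
            if fields > best_fields:
                best, best_fields = delim, fields
    return best


tsv = "name\tage\tcity\nAlice\t25\tNew York\nBob\t30\tSan Francisco\nCharlie\t35\tSeattle"
print(repr(guess_delimiter(tsv)))
```

For the sample above, only the tab splits all four lines into the same number of fields, so it is the one selected.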

Using in a Jupyter notebook

import datapasta

# Assuming you've copied data to clipboard
code = datapasta.clipboard_to_pandas()
print("Generated code:")
print(code)

# Execute the code to create the DataFrame
exec(code)
# Now 'df' contains your DataFrame
display(df)

How It Works

datapasta works by:

  1. Reading text from the clipboard or a file
  2. Intelligently guessing the delimiter/separator
  3. Detecting if there's a header row
  4. Inferring column types (int, float, boolean, datetime, string)
  5. Generating code to create a pandas or polars DataFrame
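
The last two steps can be sketched roughly as follows. This is a simplified illustration, not datapasta's implementation: infer_column and to_pandas_code are hypothetical names, datetime inference is left out, and the input is assumed to be already split into a header and rows.

```python
def infer_column(values):
    """Convert a column of strings to int, float, or bool when every value fits."""
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            pass  # at least one value does not fit; try the next type
    lowered = [v.strip().lower() for v in values]
    if all(v in ("true", "false") for v in lowered):
        return [v == "true" for v in lowered]
    return list(values)  # fall back to strings


def to_pandas_code(names, rows):
    """Render source code that rebuilds the table as a pandas DataFrame."""
    lines = ["import pandas as pd", "", "df = pd.DataFrame({"]
    for i, name in enumerate(names):
        values = infer_column([row[i] for row in rows])
        lines.append(f"    {name!r}: {values!r},")
    lines.append("})")
    return "\n".join(lines)


text = "name,age,city\nAlice,25,New York\nBob,30,San Francisco\nCharlie,35,Seattle"
rows = [line.split(",") for line in text.splitlines()]
code = to_pandas_code(rows[0], rows[1:])
print(code)
```

Trying the narrowest type first (int before float before string) keeps whole-number columns from being widened to floats unnecessarily.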

Project Structure

  • clipboard.py: Functions for reading from the system clipboard
  • parser.py: Functions for parsing text data, detecting delimiters, and headers
  • type_inference.py: Functions for inferring column data types
  • formatter.py: Functions for generating pandas and polars code
  • main.py: Main entry points and CLI functionality

Contributing

Contributions welcome!

  1. Issues & Discussions: Please open a GitHub issue or discussion for bugs, feature requests, or questions.
  2. Pull Requests: PRs are welcome!
    • Install the dev extra with pip install -e ".[dev]"
    • Run tests with pytest
    • Include updates to docs or examples if relevant

Requirements

  • Python 3.10+
  • pyperclip (for clipboard access)

License

This project is licensed under the MIT License.

Credits

Inspired by the R package datapasta by Miles McBain, which does the same for tibble::tribble and data.frame tables (entirely separate R libraries).
