Paste data as Python DataFrame definitions
datapasta
A Python package inspired by the R datapasta package for pasting tabular data into DataFrame code. datapasta analyzes clipboard content or text input and generates Python code to recreate the data as a pandas or polars DataFrame.
Features
- Automatic detection of delimiters (comma, tab, pipe, semicolon, etc.)
- Smart header detection
- Type inference for columns (int, float, boolean, datetime, string)
- Generates code for both pandas and polars DataFrames
- Command-line interface for easy integration with text editors
- Simple API for programmatic use
Installation
# Install with pip
pip install datapasta
# With pyperclip support (for Windows/macOS, or Linux without the X Window System)
pip install datapasta[pyperclip]
# With Pandas support
pip install datapasta[pandas]
# With Polars support
pip install datapasta[polars]
# For Polars on older CPUs
pip install datapasta[polars-lts-cpu]
The pandas and polars/polars-lts-cpu dependencies are not included in the package by default, because generating the code does not require executing anything in those libraries. The --repr CLI flag does execute the generated code, so the extras are provided for convenience.
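Because the DataFrame libraries are optional, it can help to check which ones are importable before relying on --repr. The following is a minimal sketch using only the standard library; available_backends is a hypothetical helper name and is not part of datapasta.

```python
from importlib.util import find_spec


def available_backends(candidates=("pandas", "polars")):
    """Return the candidate libraries that can be imported in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in candidates if find_spec(name) is not None]


# Prints whichever of the two libraries are installed, possibly an empty list.
print(available_backends())
```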
Command Line Usage
usage: datapasta [-h] [--file FILE] [--sep SEP] [--max-rows MAX_ROWS]
                 [--polars] [--header {auto,yes,no}] [--legacy] [--repr]

Convert clipboard or text to DataFrame code

options:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  Input file (if not using clipboard)
  --sep SEP, -s SEP     Separator (default: auto-detect)
  --max-rows MAX_ROWS, -m MAX_ROWS
                        Max rows to parse
  --polars, -p          Generate polars code (default: pandas)
  --header {auto,yes,no}
                        Header detection: 'auto' to detect automatically,
                        'yes' to force header, 'no' to force no header
  --legacy              Use legacy clipboard access (don't use cliptargets)
  --repr, -r            Execute the code and print the DataFrame repr
GitHub Artifacts example
If you go to the GitHub Actions results summary page, you will see an HTML table of artifacts. Copy it, and datapasta will generate the DataFrame code for you from the clipboard:
(datapasta) louis ~/dev/datapasta $ datapasta --polars
import polars as pl
df = pl.DataFrame({
    'Name': ['wheels-linux-aarch64', 'wheels-linux-armv7', 'wheels-linux-ppc64le',
             'wheels-linux-s390x'],
    'Size': ['4.2 MB', '3.78 MB', '4.63 MB', '5.5 MB'],
})
(datapasta) louis ~/dev/datapasta $ python -ic "$(datapasta --polars)"
>>> print(df)
shape: (4, 2)
┌──────────────────────┬─────────┐
│ Name                 ┆ Size    │
│ ---                  ┆ ---     │
│ str                  ┆ str     │
╞══════════════════════╪═════════╡
│ wheels-linux-aarch64 ┆ 4.2 MB  │
│ wheels-linux-armv7   ┆ 3.78 MB │
│ wheels-linux-ppc64le ┆ 4.63 MB │
│ wheels-linux-s390x   ┆ 5.5 MB  │
└──────────────────────┴─────────┘
If that's all you want, run:
datapasta --polars --repr
This will automatically execute the code and print out the result (you must have the DataFrame library installed!)
shape: (4, 2)
┌──────────────────────┬─────────┐
│ Name                 ┆ Size    │
│ ---                  ┆ ---     │
│ str                  ┆ str     │
╞══════════════════════╪═════════╡
│ wheels-linux-aarch64 ┆ 4.2 MB  │
│ wheels-linux-armv7   ┆ 3.78 MB │
│ wheels-linux-ppc64le ┆ 4.63 MB │
│ wheels-linux-s390x   ┆ 5.5 MB  │
└──────────────────────┴─────────┘
How It Works
- datapasta checks if the cliptargets package is available
- If available, it looks for the text/html target in the clipboard
- If HTML content is found, it extracts tables using a lightweight HTML parser
- It detects headers based on HTML structure (<thead> or <th> elements)
- If no HTML content is found or no tables are present, it falls back to the text-based parsing
This feature is particularly useful when copying tables from web applications, where the HTML structure provides more reliable information about the table's layout and headers than plain text.
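The HTML path described above can be illustrated with the standard library's html.parser. This is only a sketch of the general technique, not datapasta's actual parser; the TableExtractor class and its attributes are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cells of every <tr> and note whether header markup was seen."""

    def __init__(self):
        super().__init__()
        self.rows = []           # completed rows, each a list of cell strings
        self.has_header = False  # set once a <thead> or <th> tag is seen
        self._row = None         # row currently being built, if any
        self._cell = None        # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "th"):
            self.has_header = True
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


html = (
    "<table><thead><tr><th>Name</th><th>Size</th></tr></thead>"
    "<tbody><tr><td>wheels-linux-aarch64</td><td>4.2 MB</td></tr></tbody></table>"
)
parser = TableExtractor()
parser.feed(html)
print(parser.has_header, parser.rows)
```

The header flag comes from the markup itself rather than from guessing at cell contents, which is why the HTML route can be more reliable than plain text for header detection.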
Note:
- Will not use HTML if it can parse the table from text
- Will only parse up to 10,000 rows (see max_rows argument) unless told otherwise
Usage
Command Line
# Read from clipboard, generate pandas code
datapasta > dataframe_code.py
# Read from clipboard, generate polars code
datapasta --polars > dataframe_code.py
# Read from file instead of clipboard
datapasta --file data.csv > dataframe_code.py
# Specify a separator (otherwise auto-detected)
datapasta --sep "," > dataframe_code.py
Python API
import datapasta
# Read from clipboard and get pandas code
pandas_code = datapasta.clipboard_to_pandas()
print(pandas_code)
# Read from clipboard and get polars code
polars_code = datapasta.clipboard_to_polars()
print(polars_code)
# Convert text directly to DataFrame code
csv_text = """name,age,city
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Seattle"""
pandas_code = datapasta.text_to_pandas(csv_text)
print(pandas_code)
Controlling Header Detection
datapasta attempts to automatically detect whether your data has a header row, but you can override this behavior when needed:
Command Line
# Auto-detect headers (default behavior)
datapasta --file data.csv
# Force using the first row as a header
datapasta --file data.csv --header yes
# Force no header (generate column names like col1, col2, etc.)
datapasta --file data.csv --header no
Python API
import datapasta
# Example input; any delimited text works here
text = "name,age\nAlice,25\nBob,30"
# Auto-detect headers (default)
code = datapasta.text_to_pandas(text)
# Force using the first row as a header
code = datapasta.text_to_pandas(text, has_header=True)
# Force no header
code = datapasta.text_to_pandas(text, has_header=False)
This is particularly useful when:
- The auto-detection logic misidentifies numeric headers as data
- You want to preserve the first row as data but datapasta treats it as a header
- You need consistent column names (col1, col2, etc.) for multiple similar datasets
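To see why auto-detection can go wrong, consider one common style of heuristic: treat the first row as a header only if it is all text while some column below it is entirely numeric. The sketch below is an illustrative assumption about how such a heuristic might look, not datapasta's actual logic; looks_numeric and guess_has_header are hypothetical names.

```python
def looks_numeric(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def guess_has_header(rows):
    """Guess a header when row 0 is all text and some column below is all numeric."""
    if len(rows) < 2:
        return False
    first, body = rows[0], rows[1:]
    if any(looks_numeric(cell) for cell in first):
        return False
    for col in range(len(first)):
        column = [row[col] for row in body if col < len(row)]
        if column and all(looks_numeric(v) for v in column):
            return True
    return False


print(guess_has_header([["name", "age"], ["Alice", "25"], ["Bob", "30"]]))
print(guess_has_header([["2020", "2021"], ["1", "2"], ["3", "4"]]))
```

Under this heuristic the second table, whose real headers are the years 2020 and 2021, is judged to have no header. That is the kind of case where forcing --header yes or has_header=True is the right fix.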
Enhanced HTML Table Support
datapasta can extract tables directly from HTML content in the clipboard. This feature is experimental and is used as a fallback.
It is especially useful when copying tables from web pages, spreadsheets, or other applications that place HTML content in the clipboard.
import datapasta
# Will automatically use HTML table content if available
code = datapasta.clipboard_with_targets_to_pandas()
print(code)
Examples
From a CSV in the clipboard
name,age,city
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Seattle
datapasta will generate:
import pandas as pd
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["New York", "San Francisco", "Seattle"],
})
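The shape of the generated code above can be reproduced with a few lines of string formatting. This is a minimal sketch under the assumption that the columns have already been parsed and typed; to_pandas_code is a hypothetical helper, not datapasta's formatter, and it uses Python's repr, so strings come out single-quoted rather than double-quoted.

```python
def to_pandas_code(columns, name="df"):
    """Render a dict of column name -> list of values as pandas source code."""
    lines = ["import pandas as pd", f"{name} = pd.DataFrame({{"]
    for header, values in columns.items():
        # repr() yields valid Python literals for str, int, float and bool values.
        lines.append(f"    {header!r}: {values!r},")
    lines.append("})")
    return "\n".join(lines)


code = to_pandas_code({"name": ["Alice", "Bob"], "age": [25, 30]})
print(code)
```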
From a TSV in the clipboard
name	age	city
Alice	25	New York
Bob	30	San Francisco
Charlie	35	Seattle
datapasta will generate similar code, automatically detecting the tab delimiter.
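One simple way to detect a delimiter is to look for a candidate character that occurs the same non-zero number of times on every line. The sketch below illustrates that idea only; it is not datapasta's detection logic, guess_delimiter is a hypothetical name, and the approach is naive in that it ignores quoted fields.

```python
def guess_delimiter(text, candidates=("\t", ",", "|", ";")):
    """Return the candidate with a consistent, non-zero count per line, else None."""
    lines = [line for line in text.splitlines() if line.strip()]
    best, best_count = None, 0
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1:  # the same count on every line
            count = counts.pop()
            if count > best_count:
                best, best_count = delim, count
    return best


tsv_text = "name\tage\tcity\nAlice\t25\tNew York\nBob\t30\tSan Francisco"
print(repr(guess_delimiter(tsv_text)))
```

Spaces inside values such as "New York" do not confuse this check, because only the listed candidates are counted.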
Using in a Jupyter notebook
import datapasta
# Assuming you've copied data to clipboard
code = datapasta.clipboard_to_pandas()
print("Generated code:")
print(code)
# Execute the code to create the DataFrame
exec(code)
# Now 'df' contains your DataFrame
display(df)
How It Works
datapasta works by:
- Reading text from the clipboard or a file
- Intelligently guessing the delimiter/separator
- Detecting if there's a header row
- Inferring column types (int, float, boolean, datetime, string)
- Generating code to create a pandas or polars DataFrame
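The type inference step can be sketched as trying progressively more permissive converters on a whole column and keeping the first one that succeeds for every value. This is an illustration of the general approach under stated assumptions, not datapasta's type_inference module; parse_bool and infer_column are hypothetical names, and only ISO 8601 datetimes are recognized here.

```python
from datetime import datetime


def parse_bool(value):
    """Convert 'true'/'false' (any case) to bool; raise ValueError otherwise."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(value)


def infer_column(values):
    """Convert a column of strings to int, float, bool or datetime, else keep str."""
    for convert in (int, float, parse_bool, datetime.fromisoformat):
        try:
            # The whole column must convert; one failure moves on to the next type.
            return [convert(v) for v in values]
        except ValueError:
            continue
    return list(values)


print(infer_column(["25", "30", "35"]))
print(infer_column(["New York", "Seattle"]))
```

Trying int before float keeps whole numbers as integers, and falling back to the original strings means a single non-numeric value leaves the column as text.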
Project Structure
- clipboard.py: Functions for reading from the system clipboard
- parser.py: Functions for parsing text data, detecting delimiters, and headers
- type_inference.py: Functions for inferring column data types
- formatter.py: Functions for generating pandas and polars code
- main.py: Main entry points and CLI functionality
Contributing
Contributions welcome!
- Issues & Discussions: Please open a GitHub issue or discussion for bugs, feature requests, or questions.
- Pull Requests: PRs are welcome!
  - Install the dev extra with pip install -e ".[dev]"
  - Run tests with pytest
  - Include updates to docs or examples if relevant
Requirements
- Python 3.10+
- either cliptargets (Linux X11) or pyperclip (Windows, Mac, non-X11 Linux)
License
This project is licensed under the MIT License.
Credits
Inspired by the R package datapasta by Miles McBain,
which does the same for tibble::tribble and data.frame tables (entirely separate R libraries).