Peek at local datafiles fast!
sleepydatapeek
A quick way to peek at local datafiles.
Welcome to sleepydatapeek!
One often needs to spit out a configurable preview of a data file. It would also be nice if said tool could detect and read several formats automatically.
sleepydatapeek has entered the chat!
Quickly summarize data files of type:
- csv
- parquet
- json
- pkl
- xlsx
And glance metadata for files:
- pdf
- png
- jpg | jpeg
ℹ️ Note that this tool presumes format by file extension. If you leave out extensions, or give csv data a .json extension for funsies, then you're being silly.
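To make the extension note concrete, here is a minimal sketch of what "presuming format by extension" could look like. The names DATA_FORMATS, METADATA_FORMATS and presume_format are hypothetical and exist only for this illustration; the tool's actual internals may differ.

```python
from pathlib import Path

# Hypothetical sets for illustration, filled from the format lists in this readme.
DATA_FORMATS = {"csv", "parquet", "json", "pkl", "xlsx"}
METADATA_FORMATS = {"pdf", "png", "jpg", "jpeg"}


def presume_format(filename):
    """Return the lowercased extension if it is a supported format, else None.

    Only the file name is inspected; the file contents are never read.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    return ext if ext in DATA_FORMATS | METADATA_FORMATS else None


print(presume_format("data.csv"))     # csv
print(presume_format("photo.JPEG"))   # jpeg
print(presume_format("mystery"))      # None
```

Since only the name is checked, csv data saved as orders.json would be treated as json in a scheme like this, which is why the extension needs to be honest.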
ℹ️ Due to how metadata formats vary across file types, how metadata is presented varies.
ℹ️ For further configuration options, see the sleepyconfig section below.
Get Started 🚀
pip install sleepydatapeek
pip install --upgrade sleepydatapeek
python -m sleepydatapeek --help
python -m sleepydatapeek data.csv
python -m sleepydatapeek doc.pdf
Usage ⚙
Define an alias in your shell environment so the tool is quick to invoke:
alias datapeek='python -m sleepydatapeek'
Presuming you've named said alias datapeek, preview a data file:
$ datapeek data.xlsx
════════════════════ data.xlsx ════════════════════
Unnamed: 0 CustomerID ProductName Quantity OrderDate Price
-- ------------ ------------ ------------- ---------- ----------- -------
0 0 101 Laptop 2 2023-10-26 1200
1 1 102 Mouse 1 2023-10-26 25
2 2 103 Keyboard 1 2023-10-27 50
3 3 104 Monitor 1 2023-10-27 300
4 4 105 Headphones 3 2023-10-28 80
═══Summary Stats
╭──────────────┬─────────────────╮
│ Index Column │ (no_name):int64 │
├──────────────┼─────────────────┤
│ Row Count │ 30 │
├──────────────┼─────────────────┤
│ Column Count │ 6 │
├──────────────┼─────────────────┤
│ Memory Usage │ < 0.00 bytes │
╰──────────────┴─────────────────╯
═══Schema
╭─────────────┬────────╮
│ Unnamed: 0 │ int64 │
├─────────────┼────────┤
│ CustomerID │ int64 │
├─────────────┼────────┤
│ ProductName │ object │
├─────────────┼────────┤
│ Quantity │ int64 │
├─────────────┼────────┤
│ OrderDate │ object │
├─────────────┼────────┤
│ Price │ int64 │
╰─────────────┴────────╯
═══════════════════════════════════════════════════
Optionally, you can also get group-by counts for distinct values of a given column:
$ datapeek test.xlsx --groupby-count-column=ProductName
# typical output (elided)
═══Groupby Counts
(row counts for distinct values of ProductName)
╭──────────────┬───╮
│ Laptop │ 3 │
├──────────────┼───┤
│ Mouse │ 3 │
├──────────────┼───┤
│ Keyboard │ 3 │
├──────────────┼───┤
│ Monitor │ 3 │
├──────────────┼───┤
│ Headphones │ 3 │
├──────────────┼───┤
│ USB Drive │ 3 │
├──────────────┼───┤
│ Printer │ 3 │
├──────────────┼───┤
│ Webcam │ 3 │
├──────────────┼───┤
│ Speakers │ 3 │
├──────────────┼───┤
│ External HDD │ 3 │
╰──────────────┴───╯
═══════════════════════════════════════════════════
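If you're wondering what the group-by count actually computes, the idea can be sketched with the standard library alone. This is not the tool's implementation: the sample rows are made up and groupby_counts is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only.
SAMPLE_CSV = """CustomerID,ProductName,Quantity
101,Laptop,2
102,Mouse,1
103,Laptop,1
104,Keyboard,1
105,Mouse,3
"""


def groupby_counts(rows, column):
    """Count how many rows carry each distinct value of `column`."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = groupby_counts(rows, "ProductName")

# Print one line per distinct value, most frequent first.
for value, count in counts.most_common():
    print(f"{value}: {count}")
```

The count is the number of rows per distinct value, not a sum of Quantity: Laptop appears in two rows here, so its count is 2.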
You can check metadata for certain file types too:
$ datapeek test.pdf
📄test.pdf
╭──────────────┬─────────────────────────────────╮
│ CreationDate │ D:20250306111007-06'00' │
├──────────────┼─────────────────────────────────┤
│ Creator │ Adobe InDesign 20.1 (Macintosh) │
├──────────────┼─────────────────────────────────┤
│ ModDate │ D:20250306111048-06'00' │
├──────────────┼─────────────────────────────────┤
│ Producer │ Adobe PDF Library 17.0 │
├──────────────┼─────────────────────────────────┤
│ Trapped │ /False │
├──────────────┼─────────────────────────────────┤
│ Length │ 48 pages │
╰──────────────┴─────────────────────────────────╯
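The CreationDate and ModDate values above are in the PDF date string format (D:YYYYMMDDHHmmSS followed by a UTC offset). If you want them as regular datetimes, a small parser along these lines may help. It is a sketch that assumes the full 14-digit form shown above; truncated dates are not handled, and parse_pdf_date is a hypothetical helper, not part of sleepydatapeek.

```python
import re
from datetime import datetime, timedelta, timezone

PDF_DATE = re.compile(
    r"^D:(?P<stamp>\d{14})"                      # YYYYMMDDHHmmSS
    r"(?:(?P<sign>[+\-Z])"                       # offset direction, or Z for UTC
    r"(?:(?P<hh>\d{2})'?(?P<mm>\d{2})?'?)?)?$"   # optional offset hours and minutes
)


def parse_pdf_date(value):
    """Convert a full-length PDF date string into a datetime."""
    match = PDF_DATE.match(value)
    if match is None:
        raise ValueError(f"unrecognized PDF date: {value!r}")
    naive = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S")
    sign = match["sign"]
    if sign is None:
        return naive  # no offset given, so the result stays timezone-naive
    if sign == "Z":
        return naive.replace(tzinfo=timezone.utc)
    offset = timedelta(hours=int(match["hh"] or 0), minutes=int(match["mm"] or 0))
    if sign == "-":
        offset = -offset
    return naive.replace(tzinfo=timezone(offset))


print(parse_pdf_date("D:20250306111007-06'00'").isoformat())
# 2025-03-06T11:10:07-06:00
```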
SleepyConfig
You can personalize a few aspects of datapeek's behavior via a file strictly named ~/.sleepyconfig/params.yml. Paste the following into said file, and tinker to your liking:
datapeek_sample_size: 5
datapeek_table_style: 'rounded_grid'
datapeek_max_terminal_width: 80
All other sleepytools use this file as well. Browse my PyPI if you're interested!
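For a feel of how such a params file could be consumed, here is a dependency-free sketch that overlays the three datapeek keys on top of fallback values. Several things here are assumptions: the tool may not fall back to defaults at all, the sample values are treated as defaults purely for illustration, and the hand-rolled parser only understands flat key: value lines. A real YAML parser such as PyYAML's yaml.safe_load would be the more robust choice.

```python
from pathlib import Path

# Values copied from the sample config above; treating them as defaults is an assumption.
DEFAULTS = {
    "datapeek_sample_size": 5,
    "datapeek_table_style": "rounded_grid",
    "datapeek_max_terminal_width": 80,
}


def parse_flat_params(text):
    """Parse flat `key: value` lines; this is not a general YAML parser."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        key, raw = (part.strip() for part in line.split(":", 1))
        raw = raw.strip("'\"")  # remove optional quotes around the value
        params[key] = int(raw) if raw.isdigit() else raw
    return params


def load_params(path=Path.home() / ".sleepyconfig" / "params.yml"):
    """Return the fallback values overlaid with anything found in the config file."""
    overrides = parse_flat_params(path.read_text()) if path.is_file() else {}
    return {**DEFAULTS, **overrides}


print(load_params())
```

Keys that are missing from the file keep their fallback values, so a params file with a single line is enough to change one setting.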
Contribute 🤝
If you have thoughts on how to make the tool more pragmatic, submit a PR 😊.
To add support for more data/file types:
- append extension name to supported_formats in sleepydatapeek_toolchain.params.py
- add detection logic branch to the main function in sleepydatapeek_toolchain/command_logic.py
- update this readme
License, Stats, Author 📜
See License for the full license text.
File details
Details for the file sleepydatapeek-1.7.3.tar.gz.
File metadata
- Download URL: sleepydatapeek-1.7.3.tar.gz
- Upload date:
- Size: 33.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Darwin/24.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ebeb3591d3610403ab74b313b8b96daa1e49e3a8e3458fae2f19062cf6044ed7 |
| MD5 | a35da681fb8829c7f37acde09dab57da |
| BLAKE2b-256 | af8733bd79df4a3045a49d6c6947886bb3b6f46a342d9304ddd875df3b46148a |
File details
Details for the file sleepydatapeek-1.7.3-py3-none-any.whl.
File metadata
- Download URL: sleepydatapeek-1.7.3-py3-none-any.whl
- Upload date:
- Size: 32.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Darwin/24.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | db5985d60a12cc98e74610a3b8d6055bbaa181302dcb78d5987dd190f305da35 |
| MD5 | 53c865c3aad913ecad67eadf4a4d9dd6 |
| BLAKE2b-256 | 660441f5946d3ca4397ac6c92ff491656cfac2b0c8375c32d9e3839d4dc38e66 |