
Peek at local datafiles fast!


sleepydatapeek

A quick way to peek at local datafiles.


Welcome to sleepydatapeek!

One often needs to spit out a configurable preview of a data file. It would also be nice if said tool could detect and read several formats automatically.
sleepydatapeek has entered the chat!

Quickly summarize data files of these types:

  • csv
  • parquet
  • json
  • pkl
  • xlsx

And glance at metadata for these file types:

  • pdf
  • png
  • jpg|jpeg

ℹ️ Note that this tool infers the format from the file extension. If you leave out extensions, or give csv data a .json extension for funsies, then you're being silly.
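A minimal sketch of extension-based detection. It assumes nothing about sleepydatapeek's internals: the `classify` function and the two sets are hypothetical, built from the lists above, and case-insensitive matching is an assumption. It also shows why a mislabeled file goes wrong, since the contents are never inspected.

```python
from pathlib import Path

# Hypothetical groupings mirroring the lists above; not sleepydatapeek's own tables.
TABULAR = {"csv", "parquet", "json", "pkl", "xlsx"}
METADATA_ONLY = {"pdf", "png", "jpg", "jpeg"}


def classify(path: str) -> str:
    """Classify a file purely by its extension (assumed case-insensitive).

    The file contents are never read, so a CSV renamed to .json is still
    classified as "tabular" and would then be handed to a JSON reader.
    """
    ext = Path(path).suffix.lower().lstrip(".")
    if not ext:
        raise ValueError(f"{path!r} has no extension; cannot infer its format")
    if ext in TABULAR:
        return "tabular"
    if ext in METADATA_ONLY:
        return "metadata"
    raise ValueError(f"unsupported extension: .{ext}")
```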

ℹ️ Because metadata formats differ across file types, the metadata shown also differs by type.
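As one example of how type-specific metadata can be, here is a hypothetical stdlib sketch that reads a PNG's pixel dimensions straight from its header. It is not sleepydatapeek's code, and this README doesn't say which image fields the tool actually reports.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file.

    The PNG format requires IHDR to be the first chunk: after the 8-byte
    signature come a 4-byte chunk length and the 4-byte type b"IHDR",
    then width and height as big-endian unsigned 32-bit integers.
    """
    if len(header) < 24 or header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if header[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found where expected")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def png_dimensions_from_path(path: str) -> tuple[int, int]:
    # Only the first 24 bytes are read, so the image is never fully loaded.
    with open(path, "rb") as fh:
        return png_dimensions(fh.read(24))
```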

ℹ️ For further configuration options, see the SleepyConfig section below.


Get Started 🚀

pip install sleepydatapeek
pip install --upgrade sleepydatapeek

python -m sleepydatapeek --help
python -m sleepydatapeek data.csv
python -m sleepydatapeek doc.pdf

Usage ⚙

Define a shell alias (or function) that runs the module, like:

alias datapeek='python -m sleepydatapeek'

Presuming you've named said alias datapeek, preview a file:

$ datapeek data.xlsx

════════════════════ data.xlsx ════════════════════
      Unnamed: 0    CustomerID  ProductName      Quantity  OrderDate      Price
--  ------------  ------------  -------------  ----------  -----------  -------
 0             0           101  Laptop                  2  2023-10-26      1200
 1             1           102  Mouse                   1  2023-10-26        25
 2             2           103  Keyboard                1  2023-10-27        50
 3             3           104  Monitor                 1  2023-10-27       300
 4             4           105  Headphones              3  2023-10-28        80

═══Summary Stats
╭──────────────┬─────────────────╮
│ Index Column │ (no_name):int64 │
├──────────────┼─────────────────┤
│ Row Count    │ 30              │
├──────────────┼─────────────────┤
│ Column Count │ 6               │
├──────────────┼─────────────────┤
│ Memory Usage │ < 0.00 bytes    │
╰──────────────┴─────────────────╯

═══Schema
╭─────────────┬────────╮
│ Unnamed: 0  │ int64  │
├─────────────┼────────┤
│ CustomerID  │ int64  │
├─────────────┼────────┤
│ ProductName │ object │
├─────────────┼────────┤
│ Quantity    │ int64  │
├─────────────┼────────┤
│ OrderDate   │ object │
├─────────────┼────────┤
│ Price       │ int64  │
╰─────────────┴────────╯
═══════════════════════════════════════════════════
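The summary above can be roughly approximated for a CSV with the standard library alone. This is a sketch, not sleepydatapeek's implementation: the real report shows pandas-style dtypes (int64, object) and memory usage, while the hypothetical `summarize_csv` below only guesses int/float/str. Its `sample_size` argument plays the role of the `datapeek_sample_size` setting from the SleepyConfig section.

```python
import csv
import io


def _infer_type(values: list) -> str:
    """Very rough type guess: int if every value parses as int, then float, else str."""
    if not values:
        return "str"
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
        except (TypeError, ValueError):
            continue
        return name
    return "str"


def summarize_csv(text: str, sample_size: int = 5) -> dict:
    """Return a row sample, row/column counts, and a naive per-column schema."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    return {
        "sample": rows[:sample_size],
        "row_count": len(rows),
        "column_count": len(columns),
        "schema": {col: _infer_type([row[col] for row in rows]) for col in columns},
    }
```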

Optionally, you can also get group-by counts for distinct values of a given column:

$ datapeek data.xlsx --groupby-count-column=ProductName

# typical output (elided)

═══Groupby Counts
  (row counts for distinct values of ProductName)
╭──────────────┬───╮
│ Laptop       │ 3 │
├──────────────┼───┤
│ Mouse        │ 3 │
├──────────────┼───┤
│ Keyboard     │ 3 │
├──────────────┼───┤
│ Monitor      │ 3 │
├──────────────┼───┤
│ Headphones   │ 3 │
├──────────────┼───┤
│ USB Drive    │ 3 │
├──────────────┼───┤
│ Printer      │ 3 │
├──────────────┼───┤
│ Webcam       │ 3 │
├──────────────┼───┤
│ Speakers     │ 3 │
├──────────────┼───┤
│ External HDD │ 3 │
╰──────────────┴───╯
═══════════════════════════════════════════════════
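Group-by counts boil down to counting rows per distinct value of one column. A minimal sketch with collections.Counter, where `groupby_counts` is a hypothetical helper that takes rows as dicts (for instance from csv.DictReader):

```python
from collections import Counter


def groupby_counts(rows: list[dict], column: str) -> list[tuple[str, int]]:
    """Row counts per distinct value of `column`, highest count first.

    Counter.most_common keeps first-seen order among equal counts.
    """
    return Counter(row[column] for row in rows).most_common()
```

With pandas, `df["ProductName"].value_counts()` gives the same counts as a Series; which approach sleepydatapeek uses isn't stated here.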

You can check metadata for certain file types too:

$ datapeek test.pdf

📄test.pdf
╭──────────────┬─────────────────────────────────╮
│ CreationDate │ D:20250306111007-06'00'         │
├──────────────┼─────────────────────────────────┤
│ Creator      │ Adobe InDesign 20.1 (Macintosh) │
├──────────────┼─────────────────────────────────┤
│ ModDate      │ D:20250306111048-06'00'         │
├──────────────┼─────────────────────────────────┤
│ Producer     │ Adobe PDF Library 17.0          │
├──────────────┼─────────────────────────────────┤
│ Trapped      │ /False                          │
├──────────────┼─────────────────────────────────┤
│ Length       │ 48 pages                        │
╰──────────────┴─────────────────────────────────╯
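The CreationDate and ModDate strings above follow the date format from the PDF specification (ISO 32000): "D:", then YYYYMMDDHHmmSS, then a UTC offset written like -06'00'. Below is a hypothetical stdlib helper that turns them into timezone-aware datetimes; the README doesn't say sleepydatapeek parses these values, so treat it as an illustration only.

```python
import re
from datetime import datetime, timedelta, timezone

# Every field after the year is optional in the PDF date format.
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string such as "D:20250306111007-06'00'"."""
    m = _PDF_DATE.match(value)
    if not m:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = m.groupdict()
    tz = None  # no offset given: return a naive datetime
    if g["sign"] in ("+", "-"):
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        tz = timezone(-offset if g["sign"] == "-" else offset)
    elif g["sign"] == "Z":
        tz = timezone.utc
    return datetime(
        int(g["year"]),
        int(g["month"] or 1),
        int(g["day"] or 1),
        int(g["hour"] or 0),
        int(g["minute"] or 0),
        int(g["second"] or 0),
        tzinfo=tz,
    )
```

Applied to the values above, ModDate minus CreationDate comes out to 41 seconds.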

SleepyConfig

You can personalize a few aspects of datapeek's behavior via a file at exactly ~/.sleepyconfig/params.yml. Paste the following into that file and tinker to your liking:

datapeek_sample_size: 5
datapeek_table_style: 'rounded_grid'
datapeek_max_terminal_width: 80
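A sketch of how a tool might read this file and fall back to defaults when it's absent. Everything here is an assumption: the DEFAULTS values are copied from the sample above and may not be sleepydatapeek's real defaults, and the hypothetical `parse_flat_yaml` handles only flat key: value lines, not general YAML.

```python
from pathlib import Path

# Hypothetical fallbacks copied from the sample params.yml above;
# the tool's real defaults may differ.
DEFAULTS = {
    "datapeek_sample_size": 5,
    "datapeek_table_style": "rounded_grid",
    "datapeek_max_terminal_width": 80,
}

CONFIG_PATH = Path.home() / ".sleepyconfig" / "params.yml"


def parse_flat_yaml(text: str) -> dict:
    """Parse only flat `key: value` lines; NOT a general YAML parser.

    Anything after '#' is dropped, so quoted values containing '#' break.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip("'\"")
        values[key.strip()] = int(value) if value.isdigit() else value
    return values


def load_config(path: Path = CONFIG_PATH) -> dict:
    """User values override DEFAULTS; a missing file means pure defaults."""
    merged = dict(DEFAULTS)
    if path.is_file():
        merged.update(parse_flat_yaml(path.read_text()))
    return merged
```

A real implementation would more likely use a YAML library such as PyYAML (yaml.safe_load).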

All other sleepytools use this file as well. Browse my PyPI if you're interested!


Technologies 🧰


Contribute 🤝

If you have thoughts on how to make the tool more pragmatic, submit a PR 😊.

To add support for more data/file types:

  1. append the extension name to supported_formats in sleepydatapeek_toolchain/params.py
  2. add a detection-logic branch to the main function in sleepydatapeek_toolchain/command_logic.py
  3. update this README
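Per the steps above, the extension list and the detection branch live in separate files. An alternative design, offered only as a hypothetical sketch, keeps them together so that registering a handler also registers its extension. HANDLERS, handles, and dispatch are invented names, and tsv is just an example extension this README doesn't claim to support.

```python
from typing import Callable

# Hypothetical registry: extension -> handler that renders a preview string.
HANDLERS: dict[str, Callable[[str], str]] = {}


def handles(*extensions: str):
    """Register the decorated function as the handler for the given extensions."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        for ext in extensions:
            HANDLERS[ext.lower()] = func
        return func
    return register


@handles("tsv")
def peek_tsv(path: str) -> str:
    # Placeholder body: a real handler would read and render the file.
    return f"would preview {path} as tab-separated data"


def dispatch(path: str) -> str:
    """Pick a handler by extension, mirroring the extension-based detection."""
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    try:
        handler = HANDLERS[ext]
    except KeyError:
        raise ValueError(f"unsupported format: {path!r}") from None
    return handler(path)
```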

License, Stats, Author 📜



See License for the full license text.

This package was authored by Isaac Yep.
👉 GitHub
👉 PyPI



Download files

Download the file for your platform.

Source Distribution

sleepydatapeek-1.7.3.tar.gz (33.1 kB)

Uploaded Source

Built Distribution


sleepydatapeek-1.7.3-py3-none-any.whl (32.8 kB)

Uploaded Python 3

File details

Details for the file sleepydatapeek-1.7.3.tar.gz.

File metadata

  • Download URL: sleepydatapeek-1.7.3.tar.gz
  • Upload date:
  • Size: 33.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.13.2 Darwin/24.4.0

File hashes

Hashes for sleepydatapeek-1.7.3.tar.gz
Algorithm Hash digest
SHA256 ebeb3591d3610403ab74b313b8b96daa1e49e3a8e3458fae2f19062cf6044ed7
MD5 a35da681fb8829c7f37acde09dab57da
BLAKE2b-256 af8733bd79df4a3045a49d6c6947886bb3b6f46a342d9304ddd875df3b46148a
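To check a downloaded file against the SHA256 digests listed here, a short sketch using Python's standard hashlib module (the function names are illustrative):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file through SHA-256 so large files are never loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published hex digest, ignoring letter case."""
    return sha256_of(path) == expected_sha256.lower()
```

For the source distribution, call verify(Path("sleepydatapeek-1.7.3.tar.gz"), "ebeb3591d3610403ab74b313b8b96daa1e49e3a8e3458fae2f19062cf6044ed7") and expect True.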


File details

Details for the file sleepydatapeek-1.7.3-py3-none-any.whl.

File metadata

  • Download URL: sleepydatapeek-1.7.3-py3-none-any.whl
  • Upload date:
  • Size: 32.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.13.2 Darwin/24.4.0

File hashes

Hashes for sleepydatapeek-1.7.3-py3-none-any.whl
Algorithm Hash digest
SHA256 db5985d60a12cc98e74610a3b8d6055bbaa181302dcb78d5987dd190f305da35
MD5 53c865c3aad913ecad67eadf4a4d9dd6
BLAKE2b-256 660441f5946d3ca4397ac6c92ff491656cfac2b0c8375c32d9e3839d4dc38e66

