ThaiTruck

Spicy data blending and time-series DataFrame merging — Thai food truck style.

You've got six DataFrames. Three different date column names. Two frequencies.
One deadline.

ThaiTruck.

pip install thaitruck

The Problem

Every data engineer has stared at something like this:

prices_df      # daily, column called "Date"
earnings_df    # quarterly, column called "report_date"
macro_df       # monthly, column called "ts"
sentiment_df   # irregular, index is already a DatetimeIndex

And thought: I just want one DataFrame.

That's what ThaiTruck is for.


The Menu

fried_rice — The Flagship

Merge N DataFrames with mismatched timestamps into one coherent result.

The workhorse. Handles date auto-detection, frequency normalization, forward-filling, and conflict resolution. Accepts as many DataFrames as you can throw at it.

from thaitruck import fried_rice

result = fried_rice(prices_df, earnings_df, macro_df, freq="D")

Parameters:

Parameter      Default    Description
*dfs           -          Two or more DataFrames
freq           "D"        Target frequency ("D", "W", "ME", "QE", …)
heat           3          Conflict resolution; see Heat Guide below
fuzzy_columns  False      Normalize column names before merging
fill_method    "ffill"    "ffill", "bfill", or "interpolate"
date_col       None       Override auto-detection

# Quarterly earnings merged into a daily price series
result = fried_rice(
    prices_df,        # daily, "Date" column
    earnings_df,      # quarterly, "report_date" column
    macro_df,         # monthly, "ts" column
    freq="D",
    heat=3,
    fuzzy_columns=True,
)

ThaiTruck auto-detects columns named date, ts, timestamp, report_date, trade_date, as_of_date, and more. If your column has a truly cursed name, pass date_col="your_cursed_name".
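For intuition, the kind of alignment described above can be approximated in plain pandas. This is a minimal sketch under stated assumptions, not ThaiTruck's implementation: the helper name `align_to_daily` and the sample frames are hypothetical, and it covers only index normalization, upsampling to daily frequency, and forward-filling.

```python
import pandas as pd

def align_to_daily(df: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Hypothetical helper: index by date, upsample to daily, forward-fill."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.set_index(date_col).sort_index()
    # resample("D") creates one row per calendar day; ffill carries the
    # last known value forward into the newly created rows.
    return out.resample("D").ffill()

# Made-up sample data with two different date column names.
prices = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "close": [10.0, 11.0, 12.0, 13.0],
})
earnings = pd.DataFrame({
    "report_date": ["2024-01-01", "2024-01-03"],
    "eps": [1.5, 1.7],
})

# Join on the shared daily index, then fill the tail of the sparser frame.
merged = pd.concat(
    [align_to_daily(prices, "Date"), align_to_daily(earnings, "report_date")],
    axis=1,
).ffill()
```

The sketch ignores column-name conflicts entirely, which is the part the heat parameter is documented to handle.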


orange_chicken — The Glaze

Normalize and transform raw data into clean, uniform output.

Raw data is ugly. orange_chicken fixes that. Column names lowercased, separators unified, numeric strings coerced, boolean strings resolved, sparse columns evicted.

from thaitruck import orange_chicken

clean = orange_chicken(raw_df, heat=3)

What each heat level glazes:

Heat  What gets cleaned
1     Column names only ("Open Price" → "open_price")
2     + strip cell whitespace, drop all-null rows and columns
3     + coerce numeric strings to numbers (default)
4     + coerce boolean strings ("yes"/"true"/"on" → True), drop ≥90% null columns
5     + drop ≥50% null columns (napalm)

import pandas as pd

# Raw CSV fresh off the truck
raw = pd.DataFrame({
    "  Open Price  ": ["1,250.00", "1,300.00"],
    "Active?":        ["yes", "no"],
    "Notes":          [None, None],   # 100% null — getting dropped at heat=2
})

clean = orange_chicken(raw, heat=4)
# columns: open_price (float), active (bool)
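The first and third heat levels can be approximated in plain pandas. This is a rough sketch, not orange_chicken's actual rules: `snake_case` and `coerce_numeric` are hypothetical helpers, and the comma-stripping and "only coerce when every value parses" behaviour are assumptions made for illustration.

```python
import re
import pandas as pd

def snake_case(name: str) -> str:
    """Lowercase, trim, and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def coerce_numeric(col: pd.Series) -> pd.Series:
    """Convert a string column to numbers only if every non-null value parses."""
    if pd.api.types.is_numeric_dtype(col):
        return col
    # Drop thousands separators and stray whitespace before parsing.
    stripped = col.astype(object).str.replace(",", "", regex=False).str.strip()
    parsed = pd.to_numeric(stripped, errors="coerce")
    # Keep the original column if coercion would introduce new nulls.
    return parsed if parsed.notna().sum() == col.notna().sum() else col

raw = pd.DataFrame({
    "  Open Price  ": ["1,250.00", "1,300.00"],
    "Active?": ["yes", "no"],
})
clean = raw.rename(columns=snake_case).apply(coerce_numeric)
```

Here `open_price` becomes numeric while `active` stays as strings, since this sketch stops short of the boolean coercion described for heat 4.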

larb — The Raw Bar

Fast statistical profile of a DataFrame. No cooking required.

larb gives you a one-row-per-column profile covering counts, nulls, descriptive stats, and outlier detection via IQR fences. Heat controls how aggressively it flags outliers.

from thaitruck import larb

profile = larb(df, heat=3)
print(profile)
        dtype    count  null_pct    mean    std    min    p25  median    p75     max  skew  lower_fence  upper_fence  outliers  outlier_pct  ...
price   float64    365       0.0  142.30  38.21  88.00  112.0  140.00  168.0  310.00  0.72         56.0        224.0         3         0.82
volume  int64      365       0.0   1.02M   480K    10K   700K    980K   1.3M    8.5M  2.10        -350K        2.35M         2         0.55

Outlier sensitivity by heat:

Heat  IQR multiplier  What gets flagged
1     × 3.0           Extreme outliers only
2     × 2.5
3     × 2.0           Moderate outliers (default)
4     × 1.5           Standard Tukey fences
5     × 1.0           Very sensitive; expects tightly clustered data

Non-numeric columns get unique, top, and top_freq instead of numeric stats.
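To make the multiplier table concrete, here is a small sketch of IQR fences using the conventional Tukey definition (Q1 minus k times IQR, Q3 plus k times IQR). The function `iqr_fences` and the sample series are hypothetical, and whether larb uses the same quantile interpolation as pandas' default is an assumption.

```python
import pandas as pd

# Heat -> IQR multiplier, copied from the table above.
IQR_MULTIPLIER = {1: 3.0, 2: 2.5, 3: 2.0, 4: 1.5, 5: 1.0}

def iqr_fences(values: pd.Series, heat: int = 3) -> tuple:
    """Return (lower_fence, upper_fence) for the given heat level."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    spread = IQR_MULTIPLIER[heat] * (q3 - q1)
    return q1 - spread, q3 + spread

# Made-up sample: Q1 = 12, Q3 = 16, so IQR = 4.
prices = pd.Series([10, 11, 12, 13, 14, 15, 16, 17, 100])
low, high = iqr_fences(prices, heat=3)          # fences at 4.0 and 24.0
outliers = prices[(prices < low) | (prices > high)]
```

Raising the heat shrinks the multiplier, so the fences move inward and more points fall outside them.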


pad_thai — The Noodles

String padding, alignment, and formatting utilities.

Works on a single string, a list, or a pandas Series. Handles left/right/center alignment and optional truncation with a trailing ellipsis.

from thaitruck import pad_thai

pad_thai("close", 10)                          # "close     "
pad_thai("close", 10, align="right")           # "     close"
pad_thai("close", 10, align="center")          # "  close   "
pad_thai("a very long label", 12, truncate=True)  # "a very long…"

# Works on a Series too
df["ticker"] = pad_thai(df["ticker"], width=6, align="right")
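The single-string behaviour shown above maps closely onto Python's built-in string methods. The following is a sketch only: `pad_text` is a hypothetical stand-in, and reserving exactly one character for the ellipsis is an assumption inferred from the truncation example.

```python
def pad_text(text: str, width: int, align: str = "left", truncate: bool = False) -> str:
    """Hypothetical stand-in for the single-string case of pad_thai."""
    if truncate and len(text) > width:
        # Reserve one character for the trailing ellipsis.
        text = text[: width - 1] + "…"
    if align == "right":
        return text.rjust(width)
    if align == "center":
        return text.center(width)
    return text.ljust(width)

left = pad_text("close", 10)
right = pad_text("close", 10, align="right")
short = pad_text("a very long label", 12, truncate=True)
```

For a pandas Series, the same function could be applied element by element with `Series.map`.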

sticky_rice — The Cache

Persistent disk cache for expensive computations.

Wrap any function. Results are pickled to .thaitruck_cache/ and reused within the TTL. When the cache is warm, the function never runs.

from thaitruck import sticky_rice

@sticky_rice(ttl=3600)
def fetch_and_merge(ticker: str) -> pd.DataFrame:
    # ... expensive API calls, processing, merging ...
    return result

df = fetch_and_merge("NVDA")  # computed and cached
df = fetch_and_merge("NVDA")  # served from disk in milliseconds

Clear the cache manually when you need a fresh run:

fetch_and_merge.clear()

Options:

from pathlib import Path

@sticky_rice(
    ttl=1800,                    # seconds before expiry (0 = never)
    key="my_fixed_key",          # fixed key instead of hash
    cache_dir=Path("/tmp/cache") # custom cache directory
)
def my_fn(): ...
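The general pattern behind a TTL disk cache can be sketched with the standard library alone. This is an illustration under stated assumptions, not sticky_rice's implementation: the decorator name `disk_cache`, the SHA-256 key scheme, and the use of file modification time for expiry are all hypothetical choices.

```python
import functools
import hashlib
import pickle
import tempfile
import time
from pathlib import Path

def disk_cache(ttl: float = 3600, cache_dir: Path = Path(".cache_sketch")):
    """Hypothetical decorator: pickle results to disk and reuse them within ttl seconds."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Key the entry on the function name and its arguments.
            raw_key = pickle.dumps((fn.__qualname__, args, sorted(kwargs.items())))
            path = cache_dir / (hashlib.sha256(raw_key).hexdigest() + ".pkl")
            # ttl == 0 means the entry never expires.
            fresh = path.exists() and (ttl == 0 or time.time() - path.stat().st_mtime < ttl)
            if fresh:
                return pickle.loads(path.read_bytes())
            result = fn(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator

calls = []

@disk_cache(ttl=60, cache_dir=Path(tempfile.mkdtemp()))
def slow_square(x: int) -> int:
    calls.append(x)  # records each real execution
    return x * x

first = slow_square(12)   # runs the function and writes the pickle
second = slow_square(12)  # read back from disk; the function body is skipped
```

Because entries are pickles, a cache directory like this should only ever contain files written by your own code, since unpickling untrusted data can execute arbitrary code.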

satay — The Skewer

Expressive multi-dimensional DataFrame slicing.

Pass any combination of column names, row slices, range filters, equality filters, and callables. Skewers are applied in order — row filters first, column selectors last.

from thaitruck import satay

# Column selection
satay(df, "price")
satay(df, ["price", "volume"])

# Row slice (positional)
satay(df, slice(0, 100))

# Range filter
satay(df, ("price", 10.0, 50.0))

# Equality / isin filter
satay(df, {"sector": "Tech"})
satay(df, {"sector": ["Tech", "Energy"]})

# Lambda
satay(df, lambda d: d["volume"] > 1_000_000)

# Mix and match — filters applied left to right
satay(df, {"sector": "Tech"}, ("price", 10, 200), "price", "volume")
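One way to picture the selector handling is a dispatcher that inspects each argument's type. The sketch below is hypothetical and not satay's implementation: the name `skewer`, the inclusive range bounds, and always returning a DataFrame are assumptions made for illustration.

```python
import pandas as pd

def skewer(df: pd.DataFrame, *selectors) -> pd.DataFrame:
    """Hypothetical dispatcher: row filters run left to right, columns are picked last."""
    out = df
    columns = []
    for sel in selectors:
        if isinstance(sel, slice):            # positional row slice
            out = out.iloc[sel]
        elif isinstance(sel, tuple):          # (column, low, high), assumed inclusive
            col, low, high = sel
            out = out[out[col].between(low, high)]
        elif isinstance(sel, dict):           # equality or membership per column
            for col, wanted in sel.items():
                wanted = wanted if isinstance(wanted, list) else [wanted]
                out = out[out[col].isin(wanted)]
        elif callable(sel):                   # callable returning a boolean mask
            out = out[sel(out)]
        elif isinstance(sel, str):            # single column name
            columns.append(sel)
        else:                                 # list of column names
            columns.extend(sel)
    return out[columns] if columns else out

# Made-up sample data.
df = pd.DataFrame({
    "sector": ["Tech", "Energy", "Tech", "Retail"],
    "price": [150.0, 40.0, 250.0, 20.0],
    "volume": [2_000_000, 500_000, 3_000_000, 100_000],
})
picked = skewer(df, {"sector": "Tech"}, ("price", 10, 200), "price", "volume")
```

Deferring column selection until the end keeps every filter column available, even when it is not part of the final output.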

tom_kha — The Broth

Deep config merging with sensible coconut-milk defaults.

Later dicts win. Nested dicts are merged recursively — not overwritten wholesale. Lists are replaced. Pass defaults= for a base that everything else overrides.

from thaitruck import tom_kha

config = tom_kha(
    base_config,
    env_config,
    cli_overrides,
    defaults={"retries": 3, "timeout": 30, "db": {"port": 5432}},
)

# Nested dicts are merged key by key rather than replaced
tom_kha(
    {"db": {"host": "localhost", "port": 5432}},
    {"db": {"port": 5433}, "debug": True},
)
# → {"db": {"host": "localhost", "port": 5433}, "debug": True}
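The merge rules stated above (later dicts win, nested dicts merge recursively, lists are replaced) can be sketched in a few lines. `deep_merge` is a hypothetical stand-in written for illustration; it is not tom_kha's source, and details such as copying values are assumptions.

```python
from copy import deepcopy
from typing import Optional

def deep_merge(*configs: dict, defaults: Optional[dict] = None) -> dict:
    """Hypothetical stand-in: later dicts win, nested dicts merge, lists are replaced."""
    merged = deepcopy(defaults) if defaults else {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are dicts: merge them instead of overwriting.
                merged[key] = deep_merge(merged[key], value)
            else:
                # Scalars, lists, and type mismatches: the later value wins.
                merged[key] = deepcopy(value)
    return merged

result = deep_merge(
    {"db": {"host": "localhost", "port": 5432}},
    {"db": {"port": 5433}, "debug": True},
    defaults={"retries": 3, "db": {"port": 5432}},
)
```

Copying values on the way in means the returned config can be mutated without touching any of the input dicts.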

The Heat Guide

Several ThaiTruck functions (fried_rice, orange_chicken, and larb above) accept a heat parameter (1–5). The metaphor is consistent: higher heat is more aggressive.

Heat Vibe
1 Mild. Barely noticeable. Tourist-safe.
2 A little warmth.
3 Medium. The default. Regular customer.
4 Getting spicy. Know what you're doing.
5 Napalm. No survivors.

Installation

pip install thaitruck

Requires Python ≥ 3.9 and pandas ≥ 1.5.


The Full Menu

from thaitruck import fried_rice    # time-series DataFrame merger
from thaitruck import orange_chicken # data normalization and cleaning
from thaitruck import larb           # fast statistical profiling
from thaitruck import pad_thai       # string padding and alignment
from thaitruck import sticky_rice    # persistent disk caching
from thaitruck import satay          # expressive DataFrame slicing
from thaitruck import tom_kha        # deep config dict merging

Why the name?

Mrs. Babble Baz looked over at the screen one day and said, "Why do you people make up such ridiculous names for things?"

She had a point. pandas is a ridiculous name for a data library. pickle is a serialization format. fuzzywuzzy is a string matcher. These are load-bearing tools in production systems at serious companies, and they sound like rejected Muppet characters.

So we leaned in. If the name is going to be unhinged, it should at least be sizzling hot.

ThaiTruck is genuinely useful. The food truck is just the vibe — and Mrs. Babble Baz is why it exists.


License

MIT
