ThaiTruck

Spicy data blending and time-series DataFrame merging — Thai food truck style.

You've got six DataFrames. Three different date column names. Two frequencies.
One deadline.

ThaiTruck.

pip install thaitruck

The Problem

Every data engineer has stared at something like this:

prices_df      # daily, column called "Date"
earnings_df    # quarterly, column called "report_date"
macro_df       # monthly, column called "ts"
sentiment_df   # irregular, index is already a DatetimeIndex

And thought: I just want one DataFrame.

That's what ThaiTruck is for.


The Menu

fried_rice — The Flagship

Merge N DataFrames with mismatched timestamps into one coherent result.

The workhorse. Handles date auto-detection, frequency normalization, forward-filling, and conflict resolution. Accepts as many DataFrames as you can throw at it.

from thaitruck import fried_rice

result = fried_rice(prices_df, earnings_df, macro_df, freq="D")

Parameters:

Parameter      Default    Description
*dfs                      Two or more DataFrames
freq           "D"        Target frequency ("D", "W", "ME", "QE", …)
heat           3          Conflict resolution; see The Heat Guide below
fuzzy_columns  False      Normalize column names before merging
fill_method    "ffill"    "ffill", "bfill", or "interpolate"
date_col       None       Override auto-detection

# Quarterly earnings merged into a daily price series
result = fried_rice(
    prices_df,        # daily, "Date" column
    earnings_df,      # quarterly, "report_date" column
    macro_df,         # monthly, "ts" column
    freq="D",
    heat=3,
    fuzzy_columns=True,
)

ThaiTruck auto-detects columns named date, ts, timestamp, report_date, trade_date, as_of_date, and more. If your column has a truly cursed name, pass date_col="your_cursed_name".
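To make the detect, normalize, and forward-fill steps concrete, here is a minimal sketch in plain pandas. It is not ThaiTruck's implementation: the helpers `find_date_col` and `to_daily` and the `DATE_NAMES` set are hypothetical names for illustration, and case-insensitive name matching is an assumption.

```python
import pandas as pd

# Hypothetical candidate names, copied from the list above (lowercased).
DATE_NAMES = {"date", "ts", "timestamp", "report_date", "trade_date", "as_of_date"}


def find_date_col(df):
    """Return the first column whose lowercased name is a known date name, else None."""
    for col in df.columns:
        if str(col).strip().lower() in DATE_NAMES:
            return col
    return None


def to_daily(df):
    """Index the frame by its date column and forward-fill it onto a daily grid."""
    col = find_date_col(df)
    if col is not None:
        df = df.set_index(pd.to_datetime(df[col])).drop(columns=[col])
    df = df.sort_index()
    daily = pd.date_range(df.index.min(), df.index.max(), freq="D")
    return df.reindex(daily, method="ffill")


prices = pd.DataFrame({"Date": ["2024-01-01", "2024-01-02", "2024-01-03"],
                       "close": [10.0, 11.0, 12.0]})
macro = pd.DataFrame({"ts": ["2024-01-01", "2024-01-03"], "rate": [5.0, 5.25]})

# Outer-join on the shared daily index, then carry values forward across gaps.
merged = pd.concat([to_daily(prices), to_daily(macro)], axis=1).sort_index().ffill()
print(merged["rate"].tolist())  # [5.0, 5.0, 5.25]
```

A frame that already has a DatetimeIndex passes through `to_daily` without a column lookup, which mirrors the `sentiment_df` case from The Problem.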


orange_chicken — The Glaze

Normalize and transform raw data into clean, uniform output.

Raw data is ugly. orange_chicken fixes that. Column names lowercased, separators unified, numeric strings coerced, boolean strings resolved, sparse columns evicted.

from thaitruck import orange_chicken

clean = orange_chicken(raw_df, heat=3)

What each heat level glazes:

Heat  What gets cleaned
1     Column names only ("Open Price" → "open_price")
2     + strip cell whitespace, drop all-null rows and columns
3     + coerce numeric strings to numbers (default)
4     + coerce boolean strings ("yes"/"true"/"on" → True), drop ≥90% null columns
5     + drop ≥50% null columns (napalm)

import pandas as pd

# Raw CSV fresh off the truck
raw = pd.DataFrame({
    "  Open Price  ": ["1,250.00", "1,300.00"],
    "Active?":        ["yes", "no"],
    "Notes":          [None, None],   # 100% null — getting dropped at heat=2
})

clean = orange_chicken(raw, heat=4)
# columns: open_price (float), active (bool)
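The two lowest-level steps in that glaze, name normalization and numeric coercion, can be sketched with the standard library alone. This is an illustration under assumptions, not ThaiTruck's code: `normalize_name` and `coerce_number` are hypothetical helpers, and the exact separator rules ThaiTruck applies may differ.

```python
import re


def normalize_name(name):
    """Lowercase, trim, and collapse runs of non-alphanumerics into single underscores."""
    cleaned = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return cleaned.strip("_")


def coerce_number(text):
    """Parse strings like "1,250.00" into floats; return the input unchanged on failure."""
    try:
        return float(text.replace(",", ""))
    except (AttributeError, ValueError):
        return text


print(normalize_name("  Open Price  "))  # open_price
print(normalize_name("Active?"))         # active
print(coerce_number("1,250.00"))         # 1250.0
print(coerce_number("n/a"))              # n/a
```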

larb — The Raw Bar

Fast statistical profile of a DataFrame. No cooking required.

larb gives you a one-row-per-column profile covering counts, nulls, descriptive stats, and outlier detection via IQR fences. Heat controls how aggressively it flags outliers.

from thaitruck import larb

profile = larb(df, heat=3)
print(profile)
          dtype  count  null_pct    mean    std    min    p25  median    p75     max  skew  lower_fence  upper_fence  outliers  outlier_pct  ...
price   float64    365       0.0  142.30  38.21  88.00  112.0  140.00  168.0  310.00  0.72         56.0        224.0         3         0.82
volume    int64    365       0.0   1.02M   480K    10K   700K    980K   1.3M    8.5M  2.10        -350K        2.35M         2         0.55

Outlier sensitivity by heat:

Heat  IQR Multiplier  What gets flagged
1     × 3.0           Extreme outliers only
2     × 2.5
3     × 2.0           Moderate outliers (default)
4     × 1.5           Standard Tukey fences
5     × 1.0           Very sensitive; expects tightly clustered data

Non-numeric columns get unique, top, and top_freq instead of numeric stats.
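For readers who want to see how the multiplier moves the fences, here is a standard-library sketch of IQR fencing. The multipliers come from the table above; the function names and the choice of `statistics.quantiles` with `method="inclusive"` are assumptions for illustration, so the quartiles may not match what larb reports.

```python
import statistics

# Multipliers copied from the table above.
HEAT_TO_MULTIPLIER = {1: 3.0, 2: 2.5, 3: 2.0, 4: 1.5, 5: 1.0}


def iqr_fences(values, heat=3):
    """Return (lower_fence, upper_fence) for the given heat level."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = HEAT_TO_MULTIPLIER[heat] * (q3 - q1)
    return q1 - spread, q3 + spread


def outliers(values, heat=3):
    """List the values that fall outside the fences."""
    lower, upper = iqr_fences(values, heat)
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 13, 14, 15, 16, 18, 25]
print(iqr_fences(data, heat=4))  # (7.125, 22.125)
print(outliers(data, heat=4))    # [25]
print(outliers(data, heat=1))    # []
```

The same value, 25, is flagged at heat 4 but passes at heat 1, because the wider multiplier pushes the upper fence from 22.125 to 27.75.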


pad_thai — The Noodles

String padding, alignment, and formatting utilities.

Works on a single string, a list, or a pandas Series. Handles left/right/center alignment and optional truncation with a trailing ellipsis.

from thaitruck import pad_thai

pad_thai("close", 10)                          # "close     "
pad_thai("close", 10, align="right")           # "     close"
pad_thai("close", 10, align="center")          # "  close   "
pad_thai("a very long label", 12, truncate=True)  # "a very long…"

# Works on a Series too
df["ticker"] = pad_thai(df["ticker"], width=6, align="right")
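The behavior shown above maps closely onto Python's built-in string methods. The sketch below is a rough equivalent, assuming that truncation keeps `width - 1` characters and appends a single "…"; `pad` and `pad_many` are hypothetical names, not part of ThaiTruck.

```python
def pad(text, width, align="left", truncate=False):
    """Pad text to width; optionally shorten long text and end it with an ellipsis."""
    if truncate and len(text) > width:
        # Keep width - 1 characters so the ellipsis fits inside the width.
        return text[: max(width - 1, 0)] + "…"
    if align == "right":
        return text.rjust(width)
    if align == "center":
        return text.center(width)
    return text.ljust(width)


def pad_many(items, width, **options):
    """Apply pad to every string in a list."""
    return [pad(item, width, **options) for item in items]


print(repr(pad("close", 10)))                             # 'close     '
print(repr(pad("close", 10, align="right")))              # '     close'
print(repr(pad("a very long label", 12, truncate=True)))  # 'a very long…'
print(pad_many(["AAPL", "NVDA"], 6, align="right"))       # ['  AAPL', '  NVDA']
```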

sticky_rice — The Cache

Persistent disk cache for expensive computations.

Wrap any function. Results are pickled to .thaitruck_cache/ and reused within the TTL. When the cache is warm, the function never runs.

from thaitruck import sticky_rice

@sticky_rice(ttl=3600)
def fetch_and_merge(ticker: str) -> pd.DataFrame:
    # ... expensive API calls, processing, merging ...
    return result

df = fetch_and_merge("NVDA")  # computed and cached
df = fetch_and_merge("NVDA")  # served from disk in milliseconds

Clear the cache manually when you need a fresh run:

fetch_and_merge.clear()

Options:

from pathlib import Path

@sticky_rice(
    ttl=1800,                      # seconds before expiry (0 = never expires)
    key="my_fixed_key",            # fixed cache key instead of a hash
    cache_dir=Path("/tmp/cache"),  # custom cache directory
)
def my_fn(): ...
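The general pattern behind a pickled, TTL-bound disk cache is short enough to sketch. What follows is a generic illustration, not sticky_rice itself: `disk_cache` is a hypothetical decorator, and both the key scheme (a SHA-256 of the pickled function name and arguments) and the use of file modification time for expiry are assumptions.

```python
import functools
import hashlib
import pickle
import tempfile
import time
from pathlib import Path


def disk_cache(ttl=3600, cache_dir=Path(".cache_sketch")):
    """Cache a function's pickled results on disk for ttl seconds (0 = keep forever)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Key on the function name plus a hash of the pickled arguments.
            raw = pickle.dumps((fn.__qualname__, args, sorted(kwargs.items())))
            path = Path(cache_dir) / (hashlib.sha256(raw).hexdigest() + ".pkl")
            if path.exists():
                age = time.time() - path.stat().st_mtime
                if ttl == 0 or age < ttl:
                    return pickle.loads(path.read_bytes())
            result = fn(*args, **kwargs)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


calls = []


@disk_cache(ttl=60, cache_dir=Path(tempfile.mkdtemp()))
def slow_square(x):
    calls.append(x)  # records each real execution
    return x * x


print(slow_square(4), slow_square(4))  # 16 16
print(len(calls))                      # 1
```

Because results are unpickled on read, a cache directory like this should only ever contain files your own code wrote.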

satay — The Skewer

Expressive multi-dimensional DataFrame slicing.

Pass any combination of column names, row slices, range filters, equality filters, and callables. Row filters are applied first, in the order given (left to right); column selectors are applied last.

from thaitruck import satay

# Column selection
satay(df, "price")
satay(df, ["price", "volume"])

# Row slice (positional)
satay(df, slice(0, 100))

# Range filter
satay(df, ("price", 10.0, 50.0))

# Equality / isin filter
satay(df, {"sector": "Tech"})
satay(df, {"sector": ["Tech", "Energy"]})

# Lambda
satay(df, lambda d: d["volume"] > 1_000_000)

# Mix and match — filters applied left to right
satay(df, {"sector": "Tech"}, ("price", 10, 200), "price", "volume")
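One way to picture the dispatch is a loop that inspects each argument's type. The sketch below is a plain-pandas approximation, not satay's source: `skewer` is a hypothetical name, and treating range filters as inclusive on both ends (via `Series.between`) is an assumption.

```python
import pandas as pd


def skewer(df, *specs):
    """Apply row filters in the order given, then select any requested columns."""
    columns = []
    for spec in specs:
        if isinstance(spec, str):
            columns.append(spec)                  # single column name
        elif isinstance(spec, list):
            columns.extend(spec)                  # list of column names
        elif isinstance(spec, slice):
            df = df.iloc[spec]                    # positional row slice
        elif isinstance(spec, tuple):
            col, low, high = spec
            df = df[df[col].between(low, high)]   # inclusive range filter
        elif isinstance(spec, dict):
            for col, wanted in spec.items():
                wanted = wanted if isinstance(wanted, list) else [wanted]
                df = df[df[col].isin(wanted)]     # equality / isin filter
        elif callable(spec):
            df = df[spec(df)]                     # boolean mask from a callable
    return df[columns] if columns else df


frame = pd.DataFrame({
    "sector": ["Tech", "Energy", "Tech", "Tech"],
    "price": [5.0, 50.0, 120.0, 300.0],
    "volume": [100, 200, 300, 400],
})

picked = skewer(frame, {"sector": "Tech"}, ("price", 10, 200), "price", "volume")
print(picked)
```

Collecting column names during the loop and selecting them at the end is what lets a filter reference a column that is not part of the final output.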

tom_kha — The Broth

Deep config merging with sensible coconut-milk defaults.

Later dicts win. Nested dicts are merged recursively — not overwritten wholesale. Lists are replaced. Pass defaults= for a base that everything else overrides.

from thaitruck import tom_kha

config = tom_kha(
    base_config,
    env_config,
    cli_overrides,
    defaults={"retries": 3, "timeout": 30, "db": {"port": 5432}},
)

tom_kha(
    {"db": {"host": "localhost", "port": 5432}},
    {"db": {"port": 5433}, "debug": True},
)
# → {"db": {"host": "localhost", "port": 5433}, "debug": True}
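The merge rules above (later dicts win, nested dicts recurse, lists are replaced, defaults sit at the bottom) fit in a few lines. This is a sketch of those rules under a hypothetical name, `deep_merge`, and should not be read as tom_kha's actual implementation.

```python
def deep_merge(*configs, defaults=None):
    """Merge dicts left to right; nested dicts recurse, everything else is replaced."""
    merged = {}
    for config in ([defaults] if defaults else []) + list(configs):
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            elif isinstance(value, dict):
                merged[key] = deep_merge(value)  # copy so inputs are never mutated
            else:
                merged[key] = value              # scalars and lists are replaced
    return merged


base = {"db": {"host": "localhost", "port": 5432}, "tags": ["a", "b"]}
override = {"db": {"port": 5433}, "tags": ["c"], "debug": True}

print(deep_merge(base, override))
# {'db': {'host': 'localhost', 'port': 5433}, 'tags': ['c'], 'debug': True}

print(deep_merge({"timeout": 10}, defaults={"retries": 3, "timeout": 30}))
# {'retries': 3, 'timeout': 10}
```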

The Heat Guide

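Several ThaiTruck functions, including fried_rice, orange_chicken, and larb, accept a heat parameter (1–5). The metaphor is consistent: higher heat is more aggressive. In orange_chicken it cleans more; in larb it flags more outliers.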

Heat Vibe
1 Mild. Barely noticeable. Tourist-safe.
2 A little warmth.
3 Medium. The default. Regular customer.
4 Getting spicy. Know what you're doing.
5 Napalm. No survivors.

Installation

pip install thaitruck

Requires Python ≥ 3.9 and pandas ≥ 1.5.


The Full Menu

from thaitruck import fried_rice    # time-series DataFrame merger
from thaitruck import orange_chicken # data normalization and cleaning
from thaitruck import larb           # fast statistical profiling
from thaitruck import pad_thai       # string padding and alignment
from thaitruck import sticky_rice    # persistent disk caching
from thaitruck import satay          # expressive DataFrame slicing
from thaitruck import tom_kha        # deep config dict merging

Why the name?

Because pandas is a ridiculous name for a data library and it's one of the most important tools in data science. pickle is a serialization format. fuzzywuzzy is a string matcher. If those names get to be beloved, so does this one.

ThaiTruck is genuinely useful. The food truck is just the vibe.


License

MIT
