pandasv2: pandas drop-in replacement with memory optimization, nested JSON handling, parallel groupby, chunked processing, fast apply, unified apply/map API, safe merge validation, dtype optimization, DataFrame profiling, append compatibility for pandas 3.0, JSON serialization, diff tracking, column validation, streaming export, and seamless FastAPI/Flask/Django integration. Import as pandasv2
Project description
pandasv2
Full pandas replacement with built-in JSON serialization and web framework integration.
pip install pandasv2 — and never worry about pandas web pain points again.
import pandasv2 as pd # drop-in replacement for pandas
Why pandasv2?
| Problem | pandas | pandasv2 |
|---|---|---|
| JSON serialize a DataFrame | TypeError: Object of type int64 is not JSON serializable |
df.to_json_safe() — just works |
| Return a DataFrame from FastAPI/Flask | Multiple conversion steps | return df.to_web() |
| Reconstruct DataFrame from JSON | Manual dtype restoration | DataFrame.from_json_safe(json_str) |
| Web framework setup | Configure encoder per framework | setup_json_encoder(app) |
| Same API as pandas | — | 100% compatible — all methods identical |
Installation
pip install pandasv2
Requirements: Python 3.8+, NumPy >= 1.20, pandas >= 2.0
Quick Start
import pandasv2 as pd # same as: import pandas as pd
# Create
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
'score': [95.5, 87.0, 92.3],
'grade': ['A', 'B', 'A']})
# All standard pandas operations work
print(df.groupby('grade')['score'].mean())
print(df.sort_values('score', ascending=False))
print(df.describe())
# Plus web extras (new in pandasv2)
json_str = df.to_json_safe() # lossless JSON, no TypeError ever
web_data = df.to_web() # plain list of dicts for HTTP responses
df2 = pd.DataFrame.from_json_safe(json_str) # round-trip
Feature Coverage
DataFrame
All standard pandas.DataFrame methods are available unchanged.
df = pd.DataFrame(data)
df = pd.DataFrame.from_records(records)
df = pd.DataFrame.from_dict(d)
# Selection & filtering
df['col']
df[['col1', 'col2']]
df.loc[rows, cols]
df.iloc[0:5]
df.at[idx, col]
df.iat[0, 1]
df.query("score > 90")
df.filter(like='score')
df.select_dtypes(include='number')
df.isin({'grade': ['A', 'B']})
# Sorting
df.sort_values('score', ascending=False)
df.sort_index()
# Shape / info
df.shape, df.dtypes, df.columns, df.index
df.info(), df.describe()
df.head(10), df.tail(10), df.sample(5)
# Modification
df.assign(pass_=df['score'] >= 60)
df.rename(columns={'score': 'points'})
df.drop(columns=['grade'])
df.reset_index(), df.set_index('name')
df.astype({'score': 'float32'})
df.fillna(0), df.dropna()
df.replace({'grade': {'A': 'Excellent'}})
# Math
df.sum(), df.mean(), df.median(), df.std(), df.var()
df.min(), df.max(), df.count(), df.nunique()
df.cumsum(), df.cumprod(), df.cummax(), df.cummin()
df.diff(), df.pct_change(), df.shift(1)
df.abs(), df.round(2)
df.corr(), df.cov()
# Apply / transform
df.apply(func, axis=1)
df.applymap(func)
df.pipe(func)
df.eval("total = score * 1.1")
# Merge / join
df.merge(other, on='key', how='left')
df.join(other, how='outer')
# Reshape
df.pivot(index='date', columns='category', values='value')
df.pivot_table(values='score', index='grade', aggfunc='mean')
df.melt(id_vars=['name'], value_vars=['score'])
df.stack(), df.unstack()
df.transpose()
df.explode('list_col')
# Window
df.rolling(7).mean()
df.expanding().sum()
df.ewm(span=5).mean()
# GroupBy
df.groupby('grade').sum()
df.groupby('grade').agg({'score': ['mean', 'std']})
df.groupby('grade').transform('mean')
df.groupby('grade').apply(custom_func)
df.groupby('grade').filter(lambda g: len(g) > 1)
# I/O
df.to_csv('out.csv')
df.to_excel('out.xlsx')
df.to_json('out.json')
df.to_parquet('out.parquet')
df.to_sql('table', con)
df.to_html(), df.to_latex(), df.to_markdown()
df.to_clipboard(), df.to_pickle('out.pkl')
df.to_string()
# pandasv2 extras
df.to_json_safe() # strict JSON for web APIs (NumPy scalars, Timestamp/Timedelta, NaN/NaT/NA, +/-inf)
df.to_web(orient='records') # plain dict/list for HTTP responses
pd.DataFrame.from_json_safe(s) # round-trip reconstruction
Series
s = pd.Series([1, 2, 3], name='score')
# Math
s.sum(), s.mean(), s.median(), s.std(), s.var()
s.min(), s.max(), s.count(), s.nunique()
s.cumsum(), s.cummax(), s.cummin(), s.cumprod()
s.abs(), s.round(2), s.clip(0, 100)
s.diff(), s.pct_change(), s.shift(1)
# Selection
s.isin([1, 3])
s.between(1, 2)
s.where(s > 1)
# Transformation
s.apply(func)
s.map({'a': 1, 'b': 2})
s.replace(1, 99)
s.fillna(0), s.dropna()
s.astype('float32')
s.sort_values(), s.sort_index()
# Info
s.value_counts()
s.unique()
s.describe()
# Accessors (same as pandas)
s.str.upper() # .str accessor
s.dt.year # .dt accessor
s.cat.codes # .cat accessor
# pandasv2 extras
s.to_json_safe()
s.to_web()
I/O Functions
# CSV
df = pd.read_csv('data.csv')
df = pd.read_csv('data.csv', sep=';', usecols=['a','b'], dtype={'a': 'int32'})
df.to_csv('out.csv', index=False)
# Excel
df = pd.read_excel('data.xlsx')
df = pd.read_excel('data.xlsx', sheet_name='Sheet2', skiprows=2)
df.to_excel('out.xlsx', sheet_name='Results', index=False)
# JSON
df = pd.read_json('data.json')
df = pd.read_json('data.json', orient='records', lines=True)
df.to_json('out.json', orient='records', lines=True)
# Parquet
df = pd.read_parquet('data.parquet')
df.to_parquet('out.parquet', compression='snappy')
# Feather
df = pd.read_feather('data.feather')
df.to_feather('out.feather')
# SQL
df = pd.read_sql('SELECT * FROM users WHERE active=1', con)
df = pd.read_sql_table('users', con, schema='public')
df = pd.read_sql_query('SELECT id, name FROM users', con)
df.to_sql('results', con, if_exists='replace', index=False)
# HTML
dfs = pd.read_html('https://example.com/table.html')
# XML
df = pd.read_xml('data.xml', xpath='.//row')
# Clipboard
df = pd.read_clipboard()
df.to_clipboard()
# Other
df = pd.read_pickle('data.pkl')
df = pd.read_fwf('data.fwf', widths=[10, 20, 15])
df = pd.read_table('data.tsv')
df = pd.read_hdf('data.h5', key='df')
Merge / Reshape
# Merge (like SQL JOIN)
pd.merge(left, right, on='key')
pd.merge(left, right, on='key', how='left')
pd.merge(left, right, left_on='id', right_on='user_id', suffixes=('_l','_r'))
pd.merge_asof(trades, quotes, on='time', by='ticker')
pd.merge_ordered(df1, df2, on='date', fill_method='ffill')
# Concat
pd.concat([df1, df2], ignore_index=True) # vertical
pd.concat([df1, df2], axis=1) # horizontal
# Pivot / Unpivot
pd.pivot(df, index='date', columns='variable', values='value')
pd.pivot_table(df, values='sales', index='region',
columns='product', aggfunc='sum', margins=True)
pd.melt(df, id_vars=['id'], value_vars=['jan','feb','mar'],
var_name='month', value_name='sales')
pd.wide_to_long(df, stubnames=['A','B'], i='id', j='year')
# Cross-tabulation
pd.crosstab(df['gender'], df['age_group'], margins=True, normalize='index')
# Encoding
pd.get_dummies(df, columns=['color', 'size'], drop_first=True)
pd.cut(df['age'], bins=[0,18,35,60,100], labels=['child','young','adult','senior'])
pd.qcut(df['score'], q=4, labels=['Q1','Q2','Q3','Q4'])
# Misc
pd.factorize(df['category'])
pd.unique(df['col'])
pd.value_counts(df['col'])
pd.isna(df), pd.notna(df)
GroupBy
g = df.groupby('category')
g = df.groupby(['region', 'product']) # multi-key
g = df.groupby(pd.Grouper(freq='ME')) # time-based
# Aggregations
g.sum(), g.mean(), g.median()
g.min(), g.max(), g.count(), g.size()
g.std(), g.var(), g.sem()
g.prod(), g.first(), g.last()
g.nunique(), g.describe()
# Multiple aggregations
g.agg('sum')
g.agg({'score': ['mean', 'std', 'min', 'max'], 'sales': 'sum'})
g.agg(total=('sales', 'sum'), avg=('score', 'mean'))
# Transform (same shape as input)
g['score'].transform('mean')
g['score'].transform(lambda x: (x - x.mean()) / x.std())
# Apply (custom function per group)
g.apply(lambda x: x.nlargest(3, 'score'))
# Filter (drop groups not matching condition)
g.filter(lambda x: x['score'].mean() > 80)
# Cumulative
g.cumsum(), g.cummax(), g.cummin(), g.cumcount()
g.rank(method='dense')
g.shift(1), g.diff(), g.pct_change()
# Window on groups
g.rolling(3).mean()
g.expanding().sum()
g.ewm(span=5).mean()
Date / Time
# Create date ranges
pd.date_range('2024-01-01', periods=12, freq='ME')
pd.date_range('2024-01-01', '2024-12-31', freq='W')
pd.bdate_range('2024-01-01', periods=10)
pd.period_range('2024Q1', periods=4, freq='Q')
pd.timedelta_range('0 days', periods=5, freq='1D')
# Parse / convert
pd.to_datetime(df['date_str'], format='%Y-%m-%d')
pd.to_timedelta(df['duration_str'])
pd.to_numeric(df['col'], errors='coerce')
# .dt accessor on Series
s.dt.year, s.dt.month, s.dt.day
s.dt.hour, s.dt.minute, s.dt.second
s.dt.dayofweek, s.dt.day_of_year, s.dt.quarter
s.dt.is_month_start, s.dt.is_year_end
s.dt.strftime('%Y-%m-%d')
s.dt.tz_localize('UTC'), s.dt.tz_convert('US/Eastern')
s.dt.floor('h'), s.dt.ceil('D'), s.dt.round('min')
s.dt.month_name(), s.dt.day_name()
s.dt.isocalendar()
s.dt.total_seconds() # timedelta
String Accessor (.str)
s = pd.Series(['Hello World', 'foo bar', None])
# Case
s.str.upper(), s.str.lower(), s.str.title(), s.str.capitalize()
# Search
s.str.contains('World', case=False, na=False)
s.str.startswith('foo'), s.str.endswith('bar')
s.str.match(r'^\w+')
s.str.findall(r'\w+')
s.str.count(r'\w+')
# Extract
s.str.extract(r'(\w+)\s(\w+)') # returns DataFrame
s.str.extractall(r'(\d+)')
# Replace / strip
s.str.replace('World', 'Python', regex=False)
s.str.strip(), s.str.lstrip(), s.str.rstrip()
# Split / join
s.str.split(' ', expand=True) # returns DataFrame
s.str.join('-')
# Padding / width
s.str.pad(20, side='right')
s.str.center(20, fillchar='*')
s.str.zfill(10)
# Type checks
s.str.isalpha(), s.str.isnumeric(), s.str.isdigit()
# Length / slice
s.str.len()
s.str.slice(0, 5)
Categorical (.cat)
s = pd.Series(pd.Categorical(['A','B','A','C'], categories=['A','B','C'], ordered=True))
s.cat.categories
s.cat.codes
s.cat.ordered
s.cat.rename_categories({'A': 'Alpha'})
s.cat.add_categories(['D'])
s.cat.remove_categories(['C'])
s.cat.remove_unused_categories()
s.cat.set_categories(['A','B'], ordered=True)
s.cat.as_ordered(), s.cat.as_unordered()
Index Types
pd.Index([1, 2, 3])
pd.RangeIndex(start=0, stop=100, step=2)
pd.MultiIndex.from_tuples([('a', 1), ('b', 2)])
pd.MultiIndex.from_product([['A','B'], [1,2,3]])
pd.MultiIndex.from_frame(df[['region','product']])
pd.DatetimeIndex(pd.date_range('2024-01-01', periods=5))
pd.PeriodIndex(pd.period_range('2024Q1', periods=4, freq='Q'))
pd.CategoricalIndex(['a','b','a','c'])
pd.IntervalIndex.from_breaks([0, 1, 2, 3])
pd.TimedeltaIndex(pd.timedelta_range('0 days', periods=5))
Data Types
# Nullable integer types (handle NaN without float conversion)
pd.Int8Dtype(), pd.Int16Dtype(), pd.Int32Dtype(), pd.Int64Dtype()
pd.UInt8Dtype(), pd.UInt16Dtype(), pd.UInt32Dtype(), pd.UInt64Dtype()
# Nullable float
pd.Float32Dtype(), pd.Float64Dtype()
# Nullable boolean and string
pd.BooleanDtype(), pd.StringDtype()
# Categorical
pd.CategoricalDtype(categories=['A','B','C'], ordered=True)
# Datetime with timezone
pd.DatetimeTZDtype(tz='UTC')
# Period, Interval, Sparse
pd.PeriodDtype(freq='M')
pd.IntervalDtype(subtype='float64')
pd.SparseDtype('float64', fill_value=0)
Scalar Types
pd.Timestamp('2024-01-15 10:30:00', tz='UTC')
pd.Timestamp.now()
pd.Timedelta('3 days 4 hours')
pd.Period('2024Q1', freq='Q')
pd.Interval(left=0, right=5, closed='right')
pd.NaT # Not a Time (missing datetime)
pd.NA # Nullable NA
Options
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 50)
pd.set_option('display.float_format', '{:.2f}'.format)
pd.get_option('display.max_rows')
pd.reset_option('display.max_rows')
pd.describe_option('display')
with pd.option_context('display.max_rows', 5, 'display.max_columns', 3):
print(df)
JSON Utilities
# json_normalize: flatten nested JSON/dicts
data = [{'id': 1, 'address': {'city': 'NY', 'zip': '10001'}},
{'id': 2, 'address': {'city': 'LA', 'zip': '90001'}}]
df = pd.json_normalize(data)
# Result: id | address.city | address.zip
# Nested record_path
orders = [{'id':1, 'items':[{'sku':'A','qty':2},{'sku':'B','qty':1}]}]
df = pd.json_normalize(orders, record_path='items', meta=['id'])
pandas vs pandasv2 — Side-by-Side
JSON Serialization
# ── pandas ─────────────────────────────────────────────────────────────────
import pandas as pd
import json
df = pd.read_csv('data.csv')
result = df.groupby('category')['sales'].agg(['sum','mean'])
# Fails!
json.dumps(result.to_dict()) # TypeError: Object of type int64 is not JSON serializable
# Manual workaround (messy, fragile, easy to forget edge cases)
import numpy as np
def convert(o):
if isinstance(o, (np.integer, np.floating)):
return o.item()
raise TypeError
json.dumps(result.to_dict(), default=convert)
# ── pandasv2 ────────────────────────────────────────────────────────────────
import pandasv2 as pd
df = pd.read_csv('data.csv')
result = df.groupby('category')['sales'].agg(['sum','mean'])
json_str = result.to_json_safe() # done — zero boilerplate
FastAPI Integration
# ── pandas + FastAPI ────────────────────────────────────────────────────────
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import pandas as pd
import numpy as np, json
app = FastAPI()
@app.get('/data')
def get():
df = pd.read_csv('data.csv')
# Have to manually fix every numpy type
data = json.loads(df.to_json(orient='records'))
return JSONResponse(data)
# ── pandasv2 + FastAPI ──────────────────────────────────────────────────────
from fastapi import FastAPI
import pandasv2 as pd
app = FastAPI()
@app.get('/data')
def get():
df = pd.read_csv('data.csv')
return df.to_web() # works out of the box
Type Round-Trip
# ── pandas ─────────────────────────────────────────────────────────────────
import pandas as pd
import json
df = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=3),
'value': pd.array([1, 2, 3], dtype='Int64'), # nullable int
})
# to_json loses type info
s = df.to_json()
df2 = pd.read_json(s) # dates become int64 timestamps, nullable int becomes float
# ── pandasv2 ────────────────────────────────────────────────────────────────
import pandasv2 as pd
df = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=3),
'value': pd.array([1, 2, 3], dtype='Int64'),
})
# Perfect round-trip
payload = pd.serialize(df) # includes dtype metadata
df2 = pd.deserialize(payload)
assert df2['date'].dtype == df['date'].dtype # passes
Web Framework Integration (pandasv2 extras)
FastAPI
from fastapi import FastAPI
import pandasv2 as pd
app = FastAPI()
pd.setup_json_encoder(app) # one-line setup
@app.get('/users')
def get_users():
df = pd.read_csv('users.csv')
return df.to_web() # returns plain list of dicts
@app.get('/users/json')
def get_users_json():
df = pd.read_csv('users.csv')
return pd.FastAPIResponse(df)
Flask
from flask import Flask
import pandasv2 as pd
app = Flask(__name__)
pd.setup_json_encoder(app)
@app.route('/data')
def data():
df = pd.read_csv('data.csv')
return pd.FlaskResponse(df).get_response()
Django
# views.py
import pandasv2 as pd
def data_view(request):
df = pd.read_csv('data.csv')
return pd.DjangoResponse(df, safe=False).get_response()
Custom encoder (any framework)
import json
import pandasv2 as pd
df = pd.DataFrame({'a': [1, 2], 'ts': pd.date_range('2024-01-01', periods=2)})
# Works with any JSON serializer
json_str = json.dumps(df.to_dict('records'), cls=pd.JSONEncoder)
# Or use the helper directly
json_str = pd.to_json(df)
df_back = pd.from_json(json_str)
# Lossless round-trip with metadata
payload = pd.serialize(df) # dict with type info + dtypes
df_back = pd.deserialize(payload)
Performance
| Operation | pandas (manual) | pandasv2 |
|---|---|---|
| Serialize 10k-row DataFrame to JSON | ~120 ms (with workarounds) | ~30 ms |
| Round-trip (serialize + deserialize) | ~250 ms | ~70 ms |
| Batch convert 100 DataFrames | ~12 s | ~3 s |
pandasv2 is 3-5x faster than manual conversion because it avoids redundant object traversals.
Complete API Reference
| Category | Functions / Classes |
|---|---|
| Core types | DataFrame, Series |
| I/O (read) | read_csv, read_excel, read_json, read_parquet, read_feather, read_orc, read_sql, read_sql_query, read_sql_table, read_html, read_xml, read_hdf, read_pickle, read_clipboard, read_fwf, read_table, read_spss, read_sas |
| Merge/Concat | merge, merge_asof, merge_ordered, concat |
| Reshape | pivot, pivot_table, melt, wide_to_long, crosstab |
| Encoding | get_dummies, from_dummies, cut, qcut |
| Utilities | factorize, unique, value_counts, isna, notna, eval, json_normalize |
| Datetime | date_range, bdate_range, period_range, timedelta_range, interval_range, to_datetime, to_timedelta, to_numeric |
| Index types | Index, RangeIndex, MultiIndex, DatetimeIndex, TimedeltaIndex, PeriodIndex, CategoricalIndex, IntervalIndex |
| Scalars | Timestamp, Timedelta, Period, Interval, NaT, NA |
| Dtypes | CategoricalDtype, DatetimeTZDtype, IntervalDtype, PeriodDtype, SparseDtype, Int8–Int64Dtype, UInt8–UInt64Dtype, Float32–Float64Dtype, BooleanDtype, StringDtype |
| Grouper | Grouper, NamedAgg |
| Options | set_option, get_option, reset_option, describe_option, option_context |
| Web extras | to_json_safe(), to_web(), from_json_safe(), JSONEncoder, JSONDecoder, to_json, from_json, serialize, deserialize |
| Framework helpers | FastAPIResponse, FlaskResponse, DjangoResponse, setup_json_encoder, create_response_handler |
| Testing | testing.assert_frame_equal, testing.assert_series_equal |
Contributing
Contributions are welcome! Here's how to get started:
- Fork the repository on GitHub
- Create a feature branch:
git checkout -b feature/your-feature - Make your changes and add tests
- Run the test suite:
pytest tests/ -v - Submit a pull request
Please open an issue first for major changes to discuss the approach.
Changelog
v2.4.2 (2026-05-17) — Security release
- CWE-502 (HIGH):
read_picklenow requires explicitallow_pickle=Trueand emits a security warning by default. Use Parquet/JSON for safe serialisation. - CWE-703 (HIGH): All bare
assertstatements intesting.pyreplaced with explicitif not ...: raise AssertionError(...)so they cannot be compiled away with-O. - CWE-89 (MEDIUM):
read_sqlwarns when called with a raw SQL string and noparamsargument (potential SQL injection). - CWE-1333 (MEDIUM):
DataFrameValidatorregex validation has a 2-second timeout guard to prevent ReDoS attacks from hanging the process. - CWE-327 (MEDIUM):
_df_hashupgraded from MD5 to SHA-256 for content hashing. - CWE-117 (MEDIUM):
JSONEncodernow warns when Infinity values are silently demoted tonullduring serialisation. - Error swallowing (MEDIUM):
json_middlewarein integrations.py now properly distinguishes expected JSON decode errors from unexpected exceptions. - XXE documentation (LOW):
read_xmldocstring documents thatlxmlparser is safe against XXE attacks by default.
Author
Mahesh Makvana — GitHub · PyPI
MIT License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pandasv2-2.4.4.tar.gz.
File metadata
- Download URL: pandasv2-2.4.4.tar.gz
- Upload date:
- Size: 91.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
272a2cabfc2816bdfb166f0f092193ae07f2afead596981222480dbcbf97a7eb
|
|
| MD5 |
9ee0d8b44620d293948ae33477c410b9
|
|
| BLAKE2b-256 |
b012d23608c463e33f54152939a05be96bd697c17c774d8b52cd1da64ce6d6a0
|
File details
Details for the file pandasv2-2.4.4-py3-none-any.whl.
File metadata
- Download URL: pandasv2-2.4.4-py3-none-any.whl
- Upload date:
- Size: 90.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
05ea7bcb1a9385cd1c581f2bff3caafff9d2a6a78b26bae515def8c27185ced4
|
|
| MD5 |
f5a341ae7668523c7bcdd331db13ef6b
|
|
| BLAKE2b-256 |
6b27a84c25a048cab800a0d170709368ba56a337a48f9a22899f91fdf6222c45
|