
A package for handling dataframes with optional backends.


This README was produced by Anthropic's Claude LLM

dataframe_handlers


dataframe_handlers aims to provide an abstract base class and concrete implementations for handling and manipulating dataframes of various types in Python. While the interface may be applicable to many use cases, the immediate goal is to standardize interaction with dataframe libraries and enable interoperability between them in the context of the Solara Python library, which can be used to create reactive web apps using pure Python.

Installation

The package is published on PyPI: https://pypi.org/project/dataframe-handlers

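# Install the core package, optionally with one or more extras.
# Quoting the brackets keeps shells such as zsh from treating them as glob patterns.

pip install dataframe-handlers
pip install "dataframe-handlers[pandas]"
pip install "dataframe-handlers[dask]"
pip install "dataframe-handlers[xarray]"
# pip install "dataframe-handlers[vaex]"  (Vaex support is currently disabled)
pip install "dataframe-handlers[testing]"

# Several extras can be combined in one command:
pip install "dataframe-handlers[pandas,xarray,testing]"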
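Because each backend is an optional extra, code that supports several of them may want to check which ones are importable before using them. The following is a minimal, standard-library-only sketch that is not part of dataframe_handlers; `OPTIONAL_BACKENDS` and `available_backends` are hypothetical names, and the module names are assumed to match the extras listed above.

```python
import importlib.util
from typing import List

# Import names of the optional backends offered as extras (vaex is currently disabled)
OPTIONAL_BACKENDS = ("pandas", "dask", "xarray", "vaex")


def available_backends() -> List[str]:
    """Return the optional backends that can be imported in this environment."""
    # For a top-level name, find_spec returns None when the module is not
    # installed, and it locates the module without importing it
    return [name for name in OPTIONAL_BACKENDS if importlib.util.find_spec(name) is not None]


print(available_backends())
```

The output depends on which extras are installed in the current environment.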

Usage

The base class, BaseDataFrameHandler, defines an abstract interface with the following methods:

  • get_unique(column: str, limit: Optional[int] = None) -> Collection
  • get_value_counts(column: str, limit: Optional[int] = None) -> Mapping[str, int]
  • get_data_range(column: str) -> Sequence
  • get_missing_filter(column: str) -> Sequence[bool]
  • get_value_filter(column: str, values: list, invert: bool = False) -> Sequence[bool]
  • get_columns() -> Collection[str]
  • get_numeric_columns() -> Collection[str]
  • get_column_types(default_str: bool = True) -> Mapping[str, Union[object, type, str]]
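To make the shape of such an interface concrete, here is a minimal, self-contained sketch that declares a subset of these methods with Python's `abc` module and implements them for a toy "dataframe" stored as a dict of column lists. `SketchHandler` and `DictOfListsHandler` are hypothetical names, not the library's `BaseDataFrameHandler`; the meanings given to `limit`, `invert` and "missing" are assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, Collection, Dict, List, Mapping, Optional, Sequence


class SketchHandler(ABC):
    """Hypothetical, trimmed-down stand-in for an abstract handler base class."""

    def __init__(self, df: Any) -> None:
        self.df = df

    @abstractmethod
    def get_columns(self) -> Collection[str]: ...

    @abstractmethod
    def get_unique(self, column: str, limit: Optional[int] = None) -> Collection: ...

    @abstractmethod
    def get_value_counts(self, column: str, limit: Optional[int] = None) -> Mapping[str, int]: ...

    @abstractmethod
    def get_missing_filter(self, column: str) -> Sequence[bool]: ...

    @abstractmethod
    def get_value_filter(self, column: str, values: list, invert: bool = False) -> Sequence[bool]: ...


class DictOfListsHandler(SketchHandler):
    """Toy concrete handler for a 'dataframe' stored as {column: [values, ...]}."""

    def get_columns(self) -> List[str]:
        return list(self.df)

    def get_unique(self, column: str, limit: Optional[int] = None) -> List[Any]:
        # dict.fromkeys drops duplicates while keeping first-seen order
        uniques = list(dict.fromkeys(self.df[column]))
        return uniques if limit is None else uniques[:limit]

    def get_value_counts(self, column: str, limit: Optional[int] = None) -> Dict[str, int]:
        # most_common(None) returns every value; otherwise only the top `limit`
        return {str(k): n for k, n in Counter(self.df[column]).most_common(limit)}

    def get_missing_filter(self, column: str) -> List[bool]:
        # Assumption: None is the only "missing" marker in this toy backend
        return [v is None for v in self.df[column]]

    def get_value_filter(self, column: str, values: list, invert: bool = False) -> List[bool]:
        # True where the value is in `values`; invert=True flips every entry
        return [(v in values) != invert for v in self.df[column]]


handler = DictOfListsHandler({"A": [1, 2, 2, None], "B": ["x", "y", "x", "z"]})
print(handler.get_unique("A"))                            # [1, 2, None]
print(handler.get_value_counts("B", limit=1))             # {'x': 2}
print(handler.get_missing_filter("A"))                    # [False, False, False, True]
print(handler.get_value_filter("B", ["x"], invert=True))  # [False, True, False, True]
```

Marking the methods with `abstractmethod` means Python refuses to instantiate a subclass that leaves any of them unimplemented, which is one way an abstract interface can enforce completeness.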

Concrete implementations of this interface exist for:

  • Pandas (PandasDataFrameHandler)
  • Dask (DaskDataFrameHandler)
  • Xarray (XarrayDataFrameHandler)
  • Vaex (VaexDataFrameHandler, currently disabled)

The easiest way to get a handler is dataframe_handlers.get_handler, which returns a handler of the appropriate type based on the type of dataframe it is given, so you don't need to know in advance which library produced the dataframe.

import pandas as pd
from dataframe_handlers import get_handler

df = pd.DataFrame({'A': [1, 2, 3]})
handler = get_handler(df)

columns = handler.get_columns()
# ['A']
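How get_handler chooses a handler internally is not described here. One common pattern for this kind of type-based lookup is a registry keyed by type; the stdlib-only sketch below uses the built-in `dict` and `list` types as stand-ins for dataframe types. `register`, `get_handler_sketch`, `DictHandler` and `RecordsHandler` are hypothetical names, not the library's API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: dataframe type -> handler class
_REGISTRY: Dict[type, type] = {}


def register(df_type: type) -> Callable[[type], type]:
    """Class decorator that records the decorated class as the handler for df_type."""

    def decorator(handler_cls: type) -> type:
        _REGISTRY[df_type] = handler_cls
        return handler_cls

    return decorator


def get_handler_sketch(df: Any) -> Any:
    """Return a handler instance chosen by the runtime type of df."""
    # Walk the method resolution order so subclasses of a registered type also match
    for cls in type(df).__mro__:
        if cls in _REGISTRY:
            return _REGISTRY[cls](df)
    raise TypeError(f"no handler registered for {type(df).__name__}")


@register(dict)
class DictHandler:
    """Stand-in handler for a dict of column lists, e.g. {'A': [1, 2, 3]}."""

    def __init__(self, df: Dict[str, list]) -> None:
        self.df = df

    def get_columns(self) -> List[str]:
        return list(self.df)


@register(list)
class RecordsHandler:
    """Stand-in handler for a list of row dicts, e.g. [{'A': 1}, {'A': 2}]."""

    def __init__(self, df: List[dict]) -> None:
        self.df = df

    def get_columns(self) -> List[str]:
        # Collect keys across all rows, keeping first-seen order
        return list(dict.fromkeys(key for row in self.df for key in row))


print(get_handler_sketch({"A": [1, 2, 3]}).get_columns())          # ['A']
print(get_handler_sketch([{"A": 1}, {"A": 2, "B": 3}]).get_columns())  # ['A', 'B']
```

Walking the MRO means a subclass such as `collections.OrderedDict` is routed to `DictHandler` without its own registration, and an unsupported type fails loudly with a `TypeError` instead of silently returning nothing.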

Libraries built on the dataframe_handlers interface can then support multiple dataframe types interchangeably.
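As an illustration of that interoperability, the sketch below writes a helper once against two interface methods and runs it on a toy backend. It assumes only that `get_columns` and `get_value_counts` behave as listed in the Usage section; `ValueCountsSource`, `top_values` and `ToyHandler` are hypothetical names. With the real library, you would presumably pass the object returned by `get_handler` instead of `ToyHandler`.

```python
from collections import Counter
from typing import Collection, Dict, Mapping, Optional, Protocol


class ValueCountsSource(Protocol):
    """The two interface methods this helper needs, with the signatures listed above."""

    def get_columns(self) -> Collection[str]: ...

    def get_value_counts(self, column: str, limit: Optional[int] = None) -> Mapping[str, int]: ...


def top_values(handler: ValueCountsSource, limit: int = 3) -> Dict[str, Dict[str, int]]:
    """Most common values per column, computed only through the handler interface."""
    return {col: dict(handler.get_value_counts(col, limit=limit)) for col in handler.get_columns()}


class ToyHandler:
    """Hypothetical backend: any object with these two methods satisfies the Protocol."""

    def __init__(self, data: Dict[str, list]) -> None:
        self.data = data

    def get_columns(self) -> Collection[str]:
        return list(self.data)

    def get_value_counts(self, column: str, limit: Optional[int] = None) -> Mapping[str, int]:
        return {str(k): n for k, n in Counter(self.data[column]).most_common(limit)}


print(top_values(ToyHandler({"fruit": ["apple", "pear", "apple"], "n": [1, 1, 2]}), limit=1))
# {'fruit': {'apple': 2}, 'n': {'1': 2}}
```

A `Protocol` is checked structurally by static type checkers such as mypy, so `ToyHandler` never needs to inherit from anything; runtime `isinstance` checks against it would additionally require `typing.runtime_checkable`.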

Contributing

There are a few ways you can contribute to dataframe_handlers and help guide its future:

  1. Add support for another dataframe library by implementing a new concrete handler class. This helps expand the scope of the project and allows it to support new use cases.

  2. Improve or expand the abstract base interface. As new methods are identified to broadly support dataframe interaction and manipulation, the interface can be expanded. However, we aim to keep the interface as concise as possible to facilitate implementation for many types. User feedback on what methods/functionality would be most useful to support is appreciated!

  3. Improve existing concrete implementations. More comprehensive testing, performance optimizations and support for newer library versions all help improve the quality of the project.

  4. Improve documentation. Additional details on implementing new handlers, more examples, and type hints help make the project more contributor-friendly.

  5. Improve validation. Stricter checks that subclasses implement the required methods, consistent method signatures, and edge case testing all help users build on the interface.
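Item 5 could take several forms. If the base class marks its methods with `abc.abstractmethod`, Python already refuses to instantiate a subclass with a missing method, but it does not compare signatures. One lightweight complement, sketched here under the assumption that parameter names should match the method list in the Usage section, is to compare each method's parameter names with `inspect.signature`. `EXPECTED`, `check_handler` and `Incomplete` are hypothetical names, and this check ignores defaults and annotations.

```python
import inspect
from typing import Dict, List

# Expected parameter names (excluding self), taken from the method list in the Usage section
EXPECTED: Dict[str, List[str]] = {
    "get_unique": ["column", "limit"],
    "get_value_counts": ["column", "limit"],
    "get_data_range": ["column"],
    "get_missing_filter": ["column"],
    "get_value_filter": ["column", "values", "invert"],
    "get_columns": [],
    "get_numeric_columns": [],
    "get_column_types": ["default_str"],
}


def check_handler(cls: type) -> List[str]:
    """Return a list of conformance problems; an empty list means none were found."""
    problems: List[str] = []
    for name, params in EXPECTED.items():
        method = getattr(cls, name, None)
        if not callable(method):
            problems.append(f"missing method: {name}")
            continue
        # Accessed on the class, a regular method is a plain function, so drop `self`
        actual = list(inspect.signature(method).parameters)[1:]
        if actual != params:
            problems.append(f"{name}: expected {params}, got {actual}")
    return problems


class Incomplete:
    """Hypothetical handler with one correct method, one misnamed parameter, and gaps."""

    def get_columns(self) -> List[str]:
        return []

    def get_unique(self, col: str, limit=None) -> list:  # `col` should be `column`
        return []


for problem in check_handler(Incomplete):
    print(problem)
```

A check like this could run in the test suite once per concrete handler class, so signature drift shows up as a failing test rather than as a surprise for downstream users.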

We aim for dataframe_handlers to be a community project guided by user needs and feedback. Please feel free to open issues or start a discussion to propose new ideas, give feedback on the direction of the project or interface design, or submit pull requests with your contributions and improvements!

For specific instructions, please see CONTRIBUTING.md.

License

dataframe_handlers is licensed under the MIT license. See the LICENSE file for details.

Download files


Source Distribution

dataframe_handlers-0.0.5.tar.gz (16.1 kB)

Built Distribution

dataframe_handlers-0.0.5-py3-none-any.whl (11.0 kB, Python 3)

File details

Details for the file dataframe_handlers-0.0.5.tar.gz.

File metadata

  • Size: 16.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for dataframe_handlers-0.0.5.tar.gz
Algorithm     Hash digest
SHA256        dae43c4467e313294c7de86ddc786b1c408512110f1cc6e7bcba8cbaefac6e6c
MD5           e066f138cd060ab4856056ac8692fb25
BLAKE2b-256   711899d74442c25e6e1ceec672ce5925dcfa567bd3a2fc0a3eeb5b3c119db234


File details

Details for the file dataframe_handlers-0.0.5-py3-none-any.whl.

File hashes

Hashes for dataframe_handlers-0.0.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        af5d1d635d063380e89dc3acb0cd21f7acd403f2a746224023db77db8fdcde42
MD5           9b1ab1c249531a0d61938a12e7809562
BLAKE2b-256   7a7b9ed71bd90083115d85b71eeb7d6712a2f60ccf76e45e205cf75375236752

