A pandas DataFrame accessor for accessing Enterprise Data Repository (EDR) tables with Spark.
EDR Accessor - Pandas Extension for Enhanced Data Representation with Spark
The EDR Accessor is a custom pandas DataFrame accessor that simplifies working with Spark: it lets you list databases and tables, import tables, and write to Delta Lake tables.
Features
- List all Spark databases and tables
- Import Spark tables into a pandas DataFrame
- Retrieve table row counts
- Write pandas DataFrames to Delta Lake tables
Installation
To install EDR Accessor, simply use pip:
pip install edr_accessor
Usage
After installation, you can use the extension through the .edr attribute on any pandas DataFrame.
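import pandas as pd
import edr_accessor  # importing the package registers the .edr accessor (assumes this import name)
# Create an empty DataFrame to reach the accessor
df = pd.DataFrame()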
# List all databases
databases = df.edr.list_databases()
# List all tables in a specific database
tables = df.edr.list_tables('my_database')
# Import a table from Spark into a pandas DataFrame
my_table = df.edr.import_table('my_table', database='my_database')
# Get row counts for tables in a database
row_counts = df.edr.table_rowcounts(database='my_database')
# Write DataFrame to a Delta Lake table
df.edr.to_delta_table('my_delta_table', 'my_container', 'my_storage_account')
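The .edr attribute follows pandas' standard extension mechanism: a class decorated with `pd.api.extensions.register_dataframe_accessor` becomes available as an attribute on every DataFrame once the module defining it has been imported. The sketch below illustrates that pattern with a hypothetical `demo_edr` accessor and a hypothetical `shape_summary` method; it is not the package's source code.

```python
import pandas as pd


# Hypothetical accessor name "demo_edr", chosen so it cannot clash with the
# real "edr" accessor that the package registers when it is imported.
@pd.api.extensions.register_dataframe_accessor("demo_edr")
class DemoEDRAccessor:
    """Minimal accessor illustrating the registration pattern (not the package's code)."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes in the DataFrame the accessor was reached through.
        self._obj = pandas_obj

    def shape_summary(self) -> str:
        # Hypothetical method: describe the wrapped DataFrame.
        rows, cols = self._obj.shape
        return f"{rows} rows x {cols} columns"


df = pd.DataFrame({"a": [1, 2, 3]})
print(df.demo_edr.shape_summary())  # 3 rows x 1 columns
```

Because registration happens at import time, the accessor is unavailable until the registering module has been imported, which is why the usage example imports the package alongside pandas.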
Requirements
- pandas
- PySpark
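A pandas DataFrame carries no Spark connection of its own, so the Spark-backed methods presumably rely on an active SparkSession. The sketch below is an assumption about how listing and row counting could be built on PySpark's public catalog API (`spark.catalog.listDatabases()`, `spark.catalog.listTables(dbName)`, `spark.table(name).count()`); it is not the package's implementation. The functions take the session as an explicit `spark` argument, and `FakeCatalog`/`FakeSession` are hypothetical stand-ins so the sketch runs without a Spark cluster.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def list_databases(spark: Any) -> List[str]:
    # spark.catalog.listDatabases() returns records exposing a .name field.
    return [db.name for db in spark.catalog.listDatabases()]


def list_tables(spark: Any, database: str) -> List[str]:
    # spark.catalog.listTables(dbName) returns records exposing a .name field.
    return [t.name for t in spark.catalog.listTables(database)]


def table_rowcounts(spark: Any, database: str) -> Dict[str, int]:
    # Count rows in every table of the database; each count() runs a Spark job.
    return {
        name: spark.table(f"{database}.{name}").count()
        for name in list_tables(spark, database)
    }


class FakeCatalog:
    """Hypothetical stand-in for spark.catalog, returning name-only records."""

    _tables = {"default": [], "my_database": ["orders", "customers"]}

    def listDatabases(self) -> List[SimpleNamespace]:
        return [SimpleNamespace(name=db) for db in self._tables]

    def listTables(self, dbName: str) -> List[SimpleNamespace]:
        return [SimpleNamespace(name=t) for t in self._tables[dbName]]


class FakeSession:
    """Hypothetical stand-in for a SparkSession with fixed row counts."""

    catalog = FakeCatalog()
    _counts = {"my_database.orders": 120, "my_database.customers": 45}

    def table(self, name: str) -> SimpleNamespace:
        return SimpleNamespace(count=lambda: self._counts[name])


spark = FakeSession()  # on a cluster: SparkSession.builder.getOrCreate()
print(list_databases(spark))                  # ['default', 'my_database']
print(table_rowcounts(spark, "my_database"))  # {'orders': 120, 'customers': 45}
```

Counting every table triggers one Spark job per table, so on large databases this kind of row-count helper can be slow.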
Contributing
Contributions are welcome! Feel free to submit a pull request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
Source Distribution
edr-accessor-0.1.0.tar.gz (4.8 kB)
Built Distribution
Hashes for edr_accessor-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cde7ff11583cc92fee3dc42da66e831039ab4dc6674542c3d8b5a0daec737431
MD5 | 368be10dfb261d00f05ef3ed9e9f6cbd
BLAKE2b-256 | e1788bfcd13549d1358b2cfc0e5a64cf66b512e2430300b55c022784d27d3e19