A pandas DataFrame accessor for accessing Enterprise Data Repository (EDR) tables with Spark.
Project description
EDR Accessor - Pandas Extension to access the Enterprise Data Repository (EDR) with Spark
The EDR Accessor is a custom pandas DataFrame accessor that simplifies working with Spark: it lets you list databases and tables, import tables into pandas, retrieve table row counts, and write to Delta Lake tables.
Features
- List all Spark databases and tables
- Import Spark tables into a pandas DataFrame
- Retrieve table row counts
- Write a pandas DataFrame to a Delta Lake table
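As a rough illustration of what the listing, import, and row-count features correspond to in plain PySpark, the sketch below wraps a few standard calls (spark.catalog.listDatabases, spark.catalog.listTables, spark.table, toPandas, count). The helper names are hypothetical, and this is not necessarily how EDR Accessor implements these features; it assumes an existing SparkSession is passed in as spark.

```python
import pandas as pd


def list_database_names(spark):
    """Return the names of all databases visible to the Spark session."""
    return [db.name for db in spark.catalog.listDatabases()]


def list_table_names(spark, database):
    """Return the names of all tables in one database."""
    return [tbl.name for tbl in spark.catalog.listTables(database)]


def table_to_pandas(spark, table, database):
    """Load a Spark table into pandas (collects every row to the driver)."""
    return spark.table(f"{database}.{table}").toPandas()


def count_rows_per_table(spark, database):
    """Return a pandas DataFrame with one row per table and its row count."""
    rows = [
        {"table": name, "rows": spark.table(f"{database}.{name}").count()}
        for name in list_table_names(spark, database)
    ]
    return pd.DataFrame(rows, columns=["table", "rows"])


# Usage, assuming a working Spark installation:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   print(list_database_names(spark))
```

Because toPandas pulls the whole table into driver memory, a sketch like this only suits tables that fit comfortably on one machine.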
Installation
To install EDR Accessor, simply use pip:
pip install edr-accessor
Usage
After installation, import edr_accessor and use the extension through the .edr attribute on a pandas DataFrame.
import pandas as pd
import edr_accessor
# Create an empty DataFrame
df = pd.DataFrame()
# List all databases
databases = df.edr.list_databases()
# List all tables in a specific database
tables = df.edr.list_tables('my_database')
# Import a table from Spark
df.edr.import_table('my_table', database='my_database')
# Get row counts for tables in a database
row_counts = df.edr.table_rowcounts(database='my_database')
# Write DataFrame to a Delta Lake table
df.edr.to_delta_table('my_delta_table', 'my_container', 'my_storage_account')
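For readers unfamiliar with pandas accessors, the following minimal sketch shows the general mechanism that makes an attribute such as .edr appear on a DataFrame once a module is imported. It uses the pandas register_dataframe_accessor decorator with a hypothetical accessor named demo; the class and method here are illustrative only and are not part of EDR Accessor.

```python
import pandas as pd


# "demo" is a hypothetical accessor name chosen for this illustration.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was reached from.
        self._obj = pandas_obj

    def shape_summary(self):
        """Describe the wrapped DataFrame's dimensions as text."""
        rows, cols = self._obj.shape
        return f"{rows} rows x {cols} columns"


df = pd.DataFrame({"a": [1, 2, 3]})
print(df.demo.shape_summary())  # prints "3 rows x 1 columns"
```

Registration happens when the decorated class is defined, which is why importing the package that defines an accessor is enough to make it available.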
Requirements
- pandas
- PySpark
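To confirm that both requirements are importable in the current environment, a small check along these lines may help. The function name is hypothetical; it relies only on the standard library's importlib.util.find_spec and does not import the packages themselves.

```python
import importlib.util


def missing_requirements(modules=("pandas", "pyspark")):
    """Return the module names from `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All requirements are available.")
```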
Contributing
Contributions are welcome! Feel free to submit a pull request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file edr-accessor-0.1.7.tar.gz.
File metadata
- Download URL: edr-accessor-0.1.7.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0265c811ff31fa75b3fe7f14a03f556cb5dd5380c1709111a642ac5460dd24df
MD5 | 7076b1fcc268b78052841ea32daccfe5
BLAKE2b-256 | 5ea3cd92f7173f2268906b61740c4c252fb0872864bb3ee19d7228d91b182b1b
File details
Details for the file edr_accessor-0.1.7-py3-none-any.whl.
File metadata
- Download URL: edr_accessor-0.1.7-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8bc37a91a7ff1cd54c5b82eabab2dc3ff370d753bb6fb981ede0d438fd7b526d
MD5 | 09ccba77e40dff7b7969fc9207b79c46
BLAKE2b-256 | a45e8418e6dc8dca6b8ba84fcffea06026df33b955131d189f9d4bb6b8fd4ca9
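A downloaded file can be checked against the SHA256 digests listed above with the standard library's hashlib. The sketch below is one way to do that; the function names are hypothetical, and the local file path in the usage comment assumes the archive was saved in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution was downloaded locally:
#   expected = "0265c811ff31fa75b3fe7f14a03f556cb5dd5380c1709111a642ac5460dd24df"
#   print(matches_published_hash("edr-accessor-0.1.7.tar.gz", expected))
```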