A pandas DataFrame accessor for working with Enterprise Data Repository (EDR) tables through Spark.

EDR Accessor - Pandas Extension to access the Enterprise Data Repository (EDR) with Spark

The EDR Accessor is a custom pandas DataFrame accessor that simplifies interaction with Spark. It lets you list databases and tables, import tables into pandas, retrieve table row counts, and write to Delta Lake tables.

Features

  • List all Spark databases and tables
  • Import Spark tables into a pandas DataFrame
  • Retrieve table row counts
  • Write a pandas DataFrame to a Delta Lake table
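A pandas accessor such as `.edr` is a class registered with `pandas.api.extensions.register_dataframe_accessor`. The sketch below is a hypothetical illustration of how the listed features could map onto PySpark's catalog API (`spark.catalog.listDatabases()`, `spark.catalog.listTables()`, `spark.table(...)`). It is not the package's actual source. The accessor name `edr_sketch` and the injectable `session` class attribute are assumptions. They let the example run against any object with the same interface.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("edr_sketch")
class EdrSketchAccessor:
    """Hypothetical EDR-style accessor; not the edr-accessor source code."""

    # Assumption: any object exposing the SparkSession methods used below.
    # In a Spark job this would be SparkSession.builder.getOrCreate().
    session = None

    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes in the DataFrame the accessor was accessed on.
        self._obj = pandas_obj

    def list_databases(self) -> list:
        # spark.catalog.listDatabases() returns records with a .name field.
        return [db.name for db in self.session.catalog.listDatabases()]

    def list_tables(self, database: str) -> list:
        # spark.catalog.listTables(db) also returns records with a .name field.
        return [t.name for t in self.session.catalog.listTables(database)]

    def import_table(self, table: str, database: str) -> pd.DataFrame:
        # Read the Spark table and collect every row to the driver as pandas.
        return self.session.table(f"{database}.{table}").toPandas()

    def table_rowcounts(self, database: str) -> pd.DataFrame:
        # Runs one Spark count() per table, so it can be slow on big databases.
        rows = [
            (name, self.session.table(f"{database}.{name}").count())
            for name in self.list_tables(database)
        ]
        return pd.DataFrame(rows, columns=["table", "row_count"])
```

In a notebook with an active Spark session, you would set `EdrSketchAccessor.session = SparkSession.builder.getOrCreate()` once. After that, call `pd.DataFrame().edr_sketch.list_databases()`. `toPandas()` collects the whole table into driver memory, so this approach suits tables that fit on a single machine.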

Installation

Install EDR Accessor from PyPI with pip:

pip install edr-accessor

Usage

After installation, import edr_accessor to register the accessor. You can then call its methods through the .edr attribute of any pandas DataFrame.

import pandas as pd
import edr_accessor

# Any DataFrame, even an empty one, gives access to the .edr accessor
df = pd.DataFrame()

# List all databases
databases = df.edr.list_databases()

# List all tables in a specific database
tables = df.edr.list_tables('my_database')

# Import a table from Spark (assumed to return a pandas DataFrame)
df = df.edr.import_table('my_table', database='my_database')

# Get row counts for tables in a database
row_counts = df.edr.table_rowcounts(database='my_database')

# Write DataFrame to a Delta Lake table
df.edr.to_delta_table('my_delta_table', 'my_container', 'my_storage_account')
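The `to_delta_table` call takes a container and a storage account. That suggests an Azure Data Lake Storage Gen2 target, though this is an assumption. ADLS Gen2 locations are addressed with `abfss://<container>@<account>.dfs.core.windows.net/<path>` URIs. The helper below is a hypothetical sketch of how such a location could be validated and assembled. `delta_table_uri` is not part of edr-accessor, and the package may lay out paths differently.

```python
import re

# Azure naming rules: storage accounts use 3-24 lowercase letters or digits;
# containers use 3-63 lowercase letters, digits, or single hyphens and must
# start and end with a letter or digit.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
_CONTAINER_RE = re.compile(r"(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def delta_table_uri(table: str, container: str, storage_account: str) -> str:
    """Build an ADLS Gen2 (abfss://) location for a Delta table (hypothetical helper)."""
    if not _ACCOUNT_RE.fullmatch(storage_account):
        raise ValueError(f"invalid storage account name: {storage_account!r}")
    if not _CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    # Assumption: the table is stored directly under the container root.
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{table.strip('/')}"
```

For example, `delta_table_uri("my_delta_table", "mycontainer", "mystorageaccount")` yields a URI that standard PySpark calls can write to. One such write is `spark.createDataFrame(pdf).write.format("delta").mode("overwrite").save(uri)`. This requires Delta Lake support on the cluster and credentials for the storage account. Azure rejects underscores in these names, so `my_container` and `my_storage_account` in the example above are placeholders rather than valid values.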

Requirements

  • pandas
  • PySpark
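Both requirements must be importable from the same Python environment, for example the one attached to your Spark cluster or notebook. The following is a hypothetical pre-flight check that uses only the standard library's `importlib.util.find_spec`. `check_requirements` is an illustrative name and not part of edr-accessor.

```python
import importlib.util

# Top-level module names for the two listed requirements.
REQUIRED = ("pandas", "pyspark")


def check_requirements(modules=REQUIRED) -> dict:
    """Return {module: True/False}; top-level modules are located, not imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    missing = [name for name, ok in check_requirements().items() if not ok]
    if missing:
        print("Missing requirements:", ", ".join(missing))
    else:
        print("pandas and PySpark are both importable.")
```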

Contributing

Contributions are welcome! Feel free to submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Download files

Source Distribution

edr-accessor-0.1.7.tar.gz (5.5 kB)

Built Distribution

edr_accessor-0.1.7-py3-none-any.whl (8.2 kB)

File details

Details for the file edr-accessor-0.1.7.tar.gz.

File metadata

  • Download URL: edr-accessor-0.1.7.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for edr-accessor-0.1.7.tar.gz
Algorithm Hash digest
SHA256 0265c811ff31fa75b3fe7f14a03f556cb5dd5380c1709111a642ac5460dd24df
MD5 7076b1fcc268b78052841ea32daccfe5
BLAKE2b-256 5ea3cd92f7173f2268906b61740c4c252fb0872864bb3ee19d7228d91b182b1b

File details

Details for the file edr_accessor-0.1.7-py3-none-any.whl.

File hashes

Hashes for edr_accessor-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 8bc37a91a7ff1cd54c5b82eabab2dc3ff370d753bb6fb981ede0d438fd7b526d
MD5 09ccba77e40dff7b7969fc9207b79c46
BLAKE2b-256 a45e8418e6dc8dca6b8ba84fcffea06026df33b955131d189f9d4bb6b8fd4ca9
