Pyspark Dataframe Analyzer - Smartest DataFrame Analysis
Project description
DFAnalyzer
DFAnalyzer Python is a Python package for data analysis, built on top of the popular DFAnalyzer for Excel. It provides a powerful set of tools for importing, exploring, cleaning, transforming, and visualizing data. It also offers features such as filtering, sorting, grouping, and performing calculations on data. DFAnalyzer Python is designed to enable users to quickly and easily analyze large amounts of data and extract meaningful insights.
- Find details and insights about each column.
- Easy to perform cycles over PySpark.
- Percentage stats for NaN, blank, and null values.
- Describes the data types of a PySpark DataFrame.
- Helps with a proof of concept (POC) on data.
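To make the difference between null, NaN, and blank values concrete, here is a minimal pure-Python sketch of how such per-column percentages could be computed. It is not DFAnalyzer's implementation: `column_stats` is a hypothetical helper, it works on a plain Python list rather than a PySpark DataFrame, and treating whitespace-only strings as blank is an assumption.

```python
import math


def column_stats(values):
    """Return the percentage of null, NaN, and blank entries in one column.

    Hypothetical helper for illustration only; not part of DFAnalyzer.
    """
    total = len(values)
    if total == 0:
        return {"null_pct": 0.0, "nan_pct": 0.0, "blank_pct": 0.0}

    # Null: the value is missing entirely.
    nulls = sum(1 for v in values if v is None)
    # NaN: a float that is "not a number".
    nans = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
    # Blank: an empty or whitespace-only string (assumption).
    blanks = sum(1 for v in values if isinstance(v, str) and v.strip() == "")

    return {
        "null_pct": 100.0 * nulls / total,
        "nan_pct": 100.0 * nans / total,
        "blank_pct": 100.0 * blanks / total,
    }


stats = column_stats([1.0, None, float("nan"), "", "x", None, "  ", 3.5])
print(stats)
```

The three counts are kept separate because a column can be free of nulls and still contain NaN or blank entries.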
Who Should Use DFAnalyzer
- Developers working with big data.
- Developers using PySpark for data exploration.
- Developers who need to do a POC over raw data.
Usage
PySpark
You can install the DFAnalyzer package using the pip command. To install DFAnalyzer, open a terminal window and type: pip install dfanalyzer. Once the installation is complete, you can start using DFAnalyzer with Python.
- Install the package:
pip install dfanalyzer
- Import it:
import DFAnalyzer as dfa
- Use it on an existing PySpark DataFrame:
# Flag order: [isHavingNullData, %NullData, isHavingNanValues, %NanValues, isHavingBlankValues, %BlankValues, DataType]
options = [1, 1, 1, 1, 1, 1, 1]  # flags for the kinds of analysis you need
dfa.analyze(df, options)
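Because the options list is positional, it can be easier to build it from flag names. The sketch below uses the flag order from the usage comment above. `FLAG_ORDER` and `build_options` are hypothetical names, not part of DFAnalyzer, and it assumes that a 0 switches an analysis off, which the usage example (all 1s) does not confirm.

```python
# Flag names in the order given by the usage comment.
FLAG_ORDER = [
    "isHavingNullData",
    "%NullData",
    "isHavingNanValues",
    "%NanValues",
    "isHavingBlankValues",
    "%BlankValues",
    "DataType",
]


def build_options(*enabled):
    """Return a 0/1 list in FLAG_ORDER with 1 for every requested flag.

    Hypothetical helper; assumes 0 disables an analysis.
    """
    unknown = set(enabled) - set(FLAG_ORDER)
    if unknown:
        raise ValueError(f"Unknown flag(s): {sorted(unknown)}")
    return [1 if name in enabled else 0 for name in FLAG_ORDER]


options = build_options("%NullData", "%NanValues", "DataType")
print(options)

# With DFAnalyzer installed and an existing PySpark DataFrame `df`:
# dfa.analyze(df, options)
```

Naming the flags avoids counting list positions by hand, and a misspelled name raises an error instead of being ignored.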
More is about to come. Stay tuned.
Hashes for dfanalyzer-0.0.4-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 0903a2aa1bcb2a010b6c54e9890fff7d5f8973b2bc960b93280b0c391560d316 |
MD5 | 4d840a175a8020be15dd3d961899d0d0 |
BLAKE2b-256 | 7e5438a3ef27336e56c79a610eae092c104af388993e57c867acbcc09bfcd93f |