The first package for Principal Feature Analysis

These details have not been verified by PyPI

Project description

Principal-Feature-Analysis (PFA)

If you use the presented PFA method, or if the provided Python scripts inspired further extensions or variations of this framework, we would be happy if you cite our paper “A principal feature analysis” (https://doi.org/10.1016/j.jocs.2021.101502), in the course of which the Python implementations in this git repository were developed.

https://arxiv.org/abs/2101.12720

A parallelized version of the algorithm can be found here: https://github.com/LauritzR/Parallel-Principal-Feature-Analysis

Installation

pip install principal-feature-analysis

Usage

from principal_feature_analysis import pfa # import the main pfa function

pfa(path, number_output_functions, number_sweeps, cluster_size, alpha, min_n_datapoints_a_bin, shuffle_feature_numbers, frac, calculate_mutual_information, basis_log_mutual_information) # function call; only path is required

Parameters

path (String, required): Path to the input CSV file.
number_output_functions (int, default=1): Number of output features that are to be modeled, i.e. the number of components of the vector-valued output function. The values are stored in the first number_output_functions rows of the CSV file.
number_sweeps (int, default=1): Number of sweeps of the PFA. The result of the last sweep is returned. In addition, the results of all sweeps are intersected and the intersection is returned as well.
cluster_size (int, default=50): Number of nodes of a subgraph in the principal_feature_analysis.
alpha (float, default=0.01): Level of significance.
min_n_datapoints_a_bin (int, default=500): The minimum number of data points for each bin in the chi-square test.
shuffle_feature_numbers (bool, default=False): If True, the numbering of the features is randomly shuffled.
frac (int, default=1): The fraction of the dataset that is used for the analysis. The subset is randomly sampled from the input CSV.
calculate_mutual_information (bool, default=False): If True, the mutual information between the features returned by the PFA and the system state is calculated.
basis_log_mutual_information (int, default=2): Base of the logarithm used in the calculation of the mutual information.
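
The parameters above can be tried out on a small synthetic input. The following is a minimal sketch, not part of the package. It assumes that the CSV has no header and stores one variable per row (the output in row 0, the features below it), as the description of number_output_functions suggests. The helper names write_example_csv and run_pfa, the file name and the chosen values are made up for illustration. The arguments are passed positionally, in the order of the call shown under Usage.

```python
import csv
import random


def write_example_csv(path, n_features=5, n_samples=2000, seed=0):
    """Write a toy dataset with one variable per row and one sample per column.

    Row 0 is the output (system state); rows 1..n_features are the features.
    The header-free, row-wise layout is an assumption based on the README.
    """
    rng = random.Random(seed)
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
                for _ in range(n_features)]
    # The output depends only on the first feature; the others are pure noise.
    output = [1 if x > 0 else 0 for x in features[0]]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(output)
        writer.writerows(features)
    return n_features + 1  # number of rows written


def run_pfa(path):
    """Call pfa on the file; needs principal-feature-analysis to be installed."""
    from principal_feature_analysis import pfa  # imported lazily on purpose

    return pfa(
        path,
        1,     # number_output_functions: only row 0 is an output
        2,     # number_sweeps
        50,    # cluster_size
        0.01,  # alpha
        100,   # minimum number of data points per bin, lowered for the toy data
    )
```

Calling write_example_csv("example.csv") and then run_pfa("example.csv") should produce the files described under Output Files. Which features are reported depends on the random sample and on the chosen bin size.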

Output Files

principal_features_depending_system_state[i].txt: Lists the indices (referring to the rows of the input CSV) of the features on which the system state (row 0) depends, where [i] is replaced by the number of the sweep. Each row of this file is a subgraph that could not be divided further; a * separates the features on which the system state depends (before *) from those on which it does not depend (after *).
principal_features_depending_system_state_intersection.txt: Analogous to principal_features_depending_system_state[i].txt, but for the intersection over all sweeps. Because of the intersection, the subgraph information is lost and there is only one feature per row.
principal_features_global_indices[i].txt: The result of dissecting the graph of all input features in sweep [i], before testing for dependence on the system state. Each row corresponds to a subgraph that could not be dissected further; the numbers refer to the rows of the input CSV in which the features are stored.
global_indices_and_principal_features_state_dependency[i].csv: A CSV file for sweep [i] in which the first column is the feature number (the row of the input CSV file) and the second column is the p-value of the chi-square test of the feature against the system state. A p-value of 1.1 means that at least two bins could not be formed for the feature, because fewer than min_n_datapoints_a_bin data points were left for a second bin. Consequently, the feature is considered constant and thus independent of the system state.
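
The conventions of these files (the * separator and the marker value 1.1) can be handled with a few lines of standard-library Python. This is a sketch under assumptions, not code from the package: it assumes that indices in the text files are separated by commas or whitespace, that the CSV rows have the form "feature,p_value", and that a feature counts as dependent when its p-value is below alpha. All function names are hypothetical.

```python
import csv
import re


def parse_subgraph_line(line):
    """Split one row into (dependent, independent) feature indices.

    Indices before '*' are features the system state depends on; indices
    after it are the remaining features of the same subgraph.
    """
    before, _, after = line.partition("*")

    def to_indices(part):
        tokens = [tok for tok in re.split(r"[,\s]+", part.strip()) if tok]
        return [int(float(tok)) for tok in tokens]

    return to_indices(before), to_indices(after)


def classify_p_value(p_value, alpha=0.01):
    """Interpret one p-value from the state-dependency CSV."""
    if p_value > 1.0:
        # 1.1 is the README's marker for "fewer than two bins could be formed".
        return "constant"
    # Assumption: dependence is declared when the p-value is below alpha.
    return "dependent" if p_value < alpha else "independent"


def read_state_dependency(lines, alpha=0.01):
    """Map feature index -> label for rows of the form 'feature,p_value'."""
    labels = {}
    for row in csv.reader(lines):
        try:
            feature, p_value = int(float(row[0])), float(row[1])
        except (ValueError, IndexError):
            continue  # skip a header or a malformed row
        labels[feature] = classify_p_value(p_value, alpha)
    return labels
```

read_state_dependency accepts any iterable of lines, so an open file handle can be passed directly. If the real files use a different delimiter or carry an extra index column, the parsing has to be adapted.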

Returns

pf_from_intersection (list): A list whose content is analogous to the file principal_features_depending_system_state_intersection.txt.
data_frame_feature_mutual_information (pandas.DataFrame, if calculate_mutual_information=True): A pandas data frame that contains the mutual information of each feature (indexed by its row in the input CSV) with the system state (row 0 in the input CSV).
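
To illustrate what the mutual information and the base of its logarithm mean, here is a small standalone sketch for two discrete variables, using I(X;Y) = sum over (x,y) of p(x,y) * log_b( p(x,y) / (p(x) p(y)) ). It is not the package's implementation, which may bin continuous features first; the function name and its base argument are chosen for illustration only.

```python
import math
from collections import Counter


def mutual_information(xs, ys, base=2):
    """Mutual information of two equally long sequences of discrete values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), n_xy in count_xy.items():
        # p(x,y) / (p(x) * p(y)), written with raw counts
        ratio = (n_xy * n) / (count_x[x] * count_y[y])
        total += (n_xy / n) * math.log(ratio, base)
    return total
```

With base=2 the result is measured in bits: a binary feature that determines a balanced binary state exactly carries one bit, and a feature that is independent of the state carries zero.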

Advanced

The principal_feature_analysis package also grants access to the other functions used by the principal feature analysis algorithm. If you want to use them, you can import them like this:

from principal_feature_analysis import find_relevant_principal_features, get_mutual_information, principal_feature_analysis

Project details

License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history

1.0.9: Apr 27, 2023
1.0.8: Apr 7, 2023
1.0.7: Jan 30, 2023 (this version)
1.0.6: Feb 3, 2022
1.0.5: Feb 3, 2022
1.0.4: Jan 27, 2022
1.0.2: Dec 3, 2021
1.0.1: Nov 20, 2021

Download files

Download the file for your platform.

Source Distribution

principal-feature-analysis-1.0.7.tar.gz (12.0 kB view details)

Uploaded Jan 30, 2023 Source

Built Distribution

principal_feature_analysis-1.0.7-py3-none-any.whl (13.7 kB view details)

Uploaded Jan 30, 2023 Python 3

File details

Details for the file principal-feature-analysis-1.0.7.tar.gz.

File metadata

Download URL: principal-feature-analysis-1.0.7.tar.gz
Upload date: Jan 30, 2023
Size: 12.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for principal-feature-analysis-1.0.7.tar.gz
Algorithm	Hash digest
SHA256	`15d4c62122dadc64343b95e8ebd962a01334ffe0333864f0f978fc453cc17f48`
MD5	`dd96312d58be0359f0968e5e446dcd50`
BLAKE2b-256	`3b563a47afef35cc0efcb5b2a1de0ae744526c1acee5c3d2477e7950b7197cbf`
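
The digests listed above can be checked after a manual download. A minimal sketch using only the standard library; the helper names and the local file path are hypothetical, and the expected value is the SHA256 digest listed on this page for the 1.0.7 source distribution.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 listed on this page for principal-feature-analysis-1.0.7.tar.gz.
EXPECTED_SDIST_SHA256 = "15d4c62122dadc64343b95e8ebd962a01334ffe0333864f0f978fc453cc17f48"


def matches_expected(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file at path has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("principal-feature-analysis-1.0.7.tar.gz") should be True for an unmodified download of the source distribution.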

File details

Details for the file principal_feature_analysis-1.0.7-py3-none-any.whl.

File metadata

Download URL: principal_feature_analysis-1.0.7-py3-none-any.whl
Upload date: Jan 30, 2023
Size: 13.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for principal_feature_analysis-1.0.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`43fe10fadcb59aad24e6a7276dad5ed01c3b31787fa8a741aeed32db4d9eda73`
MD5	`83121b70880fef36ee1a3940421b8dd1`
BLAKE2b-256	`f380f323b6a1efa76c3890792649652b7357c99dbedd571ad214b16b80578b41`
