Skip to main content

Tool for quickly creating a composition-based feature vector.

Project description

CBFV Package

Tool to quickly create a composition-based feature vectors from materials datafiles.

Installation

The source code is currently hosted on GitHub at: https://github.com/kaaiian/CBFV

Binary installers for the latest released version are available at the Python Package Index (PyPI)

# PyPI
pip install CBFV

Making the composition-based feature vector

The CBFV package assumes your data is stored in a pandas dataframe of the following structure:

formula target
Tc1V1 248.539
Cu1Dy1 66.8444
Cd3N2 91.5034

To featurize this data, the generate_features function can be called as follows:

from CBFV import composition
X, y, formulae, skipped = composition.generate_features(df)

Extended Functionality

The featurization scheme can be adjusted by calling the the elem_prop parameter. The following featurization schemes are included within CBFV:

  • jarvis
  • magpie
  • mat2vec
  • oliynyk (default)
  • onehot
  • random_200

Duplicate formula handeling is controlled by the drop_duplicates parameter. It is set to False by default to preserve datapoints containing variation outside of their formula. For example, heat capacity measurements performed for the same material at different temperatures.

The extend_features parameter is used to specify whether columns outside of ['formula', 'target'] should be considered during featurization. It is set to False by default to exclude nuisance information from consideration. Setting extend_features=True would allow additional information (i.e. ['temperature', 'pressure']) to be preserved.

The sum_feat parameter specifies whether to calculate the sum features when generating the CBFVs for the chemical formulae. It is set to False by default.

Calling generate_features with these parameters can be implemented as follows:

formula target temp
Tc1V1 248.539 373
Tc1V1 66.8444 473
Cd3N2 91.5034 273
from CBFV import composition
X, y, formulae, skipped = composition.generate_features(df,
                                                        elem_prop='magpie',
                                                        drop_duplicates=False,
                                                        extend_features=True,
                                                        sum_feat=True)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

composition_based_feature_vector-1.0.6.tar.gz (5.3 MB view hashes)

Uploaded source

Built Distribution

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page