
Tool for quickly creating a composition-based feature vector.


CBFV Package

Tool to quickly create composition-based feature vectors (CBFVs) from materials data files.

Installation

The source code is currently hosted on GitHub at: https://github.com/kaaiian/CBFV

Binary installers for the latest released version are available at the Python Package Index (PyPI):

# PyPI
pip install CBFV

Making the composition-based feature vector

The CBFV package assumes your data is stored in a pandas DataFrame with a formula column and a target column, as in the following example:

formula   target
Tc1V1     248.539
Cu1Dy1    66.8444
Cd3N2     91.5034
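As a sketch, the example table above can be built directly with pandas. The column names formula and target follow the structure shown; the helper check_cbfv_input is hypothetical and only illustrates the expected layout. It is not part of CBFV.

```python
import pandas as pd

# Example data in the layout described above: one row per entry,
# with a 'formula' column and a numeric 'target' column.
df = pd.DataFrame({
    "formula": ["Tc1V1", "Cu1Dy1", "Cd3N2"],
    "target": [248.539, 66.8444, 91.5034],
})


def check_cbfv_input(frame: pd.DataFrame) -> None:
    """Hypothetical helper: raise if the required columns are missing."""
    missing = {"formula", "target"} - set(frame.columns)
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")


check_cbfv_input(df)  # passes silently for the table above
```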

To featurize this data, the generate_features function can be called as follows:

from CBFV import composition
# df is a pandas DataFrame with 'formula' and 'target' columns, as shown above
X, y, formulae, skipped = composition.generate_features(df)
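The formulae in this example spell out every element count (Tc1V1 rather than TcV). A composition-based feature vector starts from each formula's element amounts, so the sketch below shows one way to turn a simple formula into amounts and a fractional composition. It assumes formulae contain no parentheses or hydrate notation and is not CBFV's internal parser; parse_formula and fractions are hypothetical names.

```python
import re

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional, possibly fractional, count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> dict[str, float]:
    """Map element symbol -> amount for simple formulae such as 'Cd3N2'.

    Assumption: no parentheses; a missing count means 1.
    """
    amounts: dict[str, float] = {}
    for symbol, count in _TOKEN.findall(formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


def fractions(amounts: dict[str, float]) -> dict[str, float]:
    """Normalize amounts so they sum to 1 (fractional composition)."""
    total = sum(amounts.values())
    return {el: amt / total for el, amt in amounts.items()}
```

For example, fractions(parse_formula("Cd3N2")) gives {'Cd': 0.6, 'N': 0.4}.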

Extended Functionality

The featurization scheme can be changed by setting the elem_prop parameter. The following featurization schemes are included within CBFV:

  • jarvis
  • magpie
  • mat2vec
  • oliynyk (default)
  • onehot
  • random_200
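Because elem_prop takes one of the scheme names listed above, a misspelled name only surfaces when the call runs. A small guard like the hypothetical one below (not part of CBFV) can check the name first and bundle the keyword arguments described in this section; the scheme names and default mirror the list above, but the helper itself is an assumption.

```python
# Scheme names as listed in this README; 'oliynyk' is the documented default.
SCHEMES = ("jarvis", "magpie", "mat2vec", "oliynyk", "onehot", "random_200")


def cbfv_kwargs(elem_prop: str = "oliynyk", **options) -> dict:
    """Hypothetical helper: validate elem_prop and bundle keyword arguments
    intended for composition.generate_features."""
    if elem_prop not in SCHEMES:
        raise ValueError(f"unknown elem_prop {elem_prop!r}; expected one of {SCHEMES}")
    return {"elem_prop": elem_prop, **options}
```

Usage would look like composition.generate_features(df, **cbfv_kwargs("magpie", sum_feat=True)).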

Duplicate formula handling is controlled by the drop_duplicates parameter. It is set to False by default to preserve data points that share a formula but differ in some other respect, for example heat capacity measurements performed on the same material at different temperatures.
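The effect of removing duplicates can be previewed with pandas before calling generate_features. The sketch below uses DataFrame.drop_duplicates on the formula column as a stand-in; it assumes, without confirmation from this README, that drop_duplicates=True in CBFV behaves similarly and keeps one row per formula.

```python
import pandas as pd

# Two Tc1V1 rows differ only in their temp value.
df = pd.DataFrame({
    "formula": ["Tc1V1", "Tc1V1", "Cd3N2"],
    "target": [248.539, 66.8444, 91.5034],
    "temp": [373, 473, 273],
})

# Keep only the first row for each formula; the temp=473 row is dropped.
deduplicated = df.drop_duplicates(subset="formula")

print(len(df), len(deduplicated))  # 3 2
```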

The extend_features parameter specifies whether columns other than ['formula', 'target'] should be considered during featurization. It is set to False by default to exclude nuisance information. Setting extend_features=True allows additional information (e.g. ['temperature', 'pressure']) to be preserved.
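The columns that extend_features=True would bring in are those outside ['formula', 'target']. The sketch below lists them and appends them to a stand-in feature matrix; X_demo is a made-up placeholder for the X returned by generate_features, and the concatenation only illustrates the idea rather than reproducing CBFV's internal code.

```python
import pandas as pd

df = pd.DataFrame({
    "formula": ["Tc1V1", "Cd3N2"],
    "target": [248.539, 91.5034],
    "temp": [373, 273],
})

# Columns that are neither the formula nor the target.
extra_cols = [c for c in df.columns if c not in ("formula", "target")]

# Placeholder feature matrix with one row per formula (values are made up).
X_demo = pd.DataFrame({"avg_feature": [0.5, 0.7]})

# Attach the extra information alongside the composition-based features;
# both frames share the default 0..n-1 index, so rows line up.
X_extended = pd.concat([X_demo, df[extra_cols]], axis=1)
print(list(X_extended.columns))  # ['avg_feature', 'temp']
```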

The sum_feat parameter specifies whether sum features are also calculated when generating the CBFVs for the chemical formulae. It is set to False by default.
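To see why sum features can carry information that averaged features do not, consider a single elemental property. In the sketch below the property values are made up, and both definitions are assumptions rather than CBFV's confirmed formulas: an average weighted by fractional composition, and a sum weighted by the raw element counts. Under these assumptions, Cd3N2 and Cd6N4 share the same average but have different sums.

```python
import re

# Made-up values of one elemental property, for illustration only.
PROPERTY = {"Cd": 2.0, "N": 5.0}

# Element symbol followed by an optional count (no parentheses supported).
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def amounts(formula: str) -> dict[str, float]:
    """Element -> count for simple formulae without parentheses."""
    out: dict[str, float] = {}
    for el, n in _TOKEN.findall(formula):
        out[el] = out.get(el, 0.0) + (float(n) if n else 1.0)
    return out


def avg_feature(formula: str) -> float:
    """Property averaged with fractional-composition weights."""
    amt = amounts(formula)
    total = sum(amt.values())
    return sum(PROPERTY[el] * n / total for el, n in amt.items())


def sum_feature(formula: str) -> float:
    """Property summed with raw element counts as weights."""
    return sum(PROPERTY[el] * n for el, n in amounts(formula).items())
```

Here avg_feature returns about 3.2 for both Cd3N2 and Cd6N4, while sum_feature returns 16.0 and 32.0, so only the sum distinguishes the two formulae.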

For example, for data that contains an additional temp column, generate_features can be called with the parameters described above as follows:

formula   target    temp
Tc1V1     248.539   373
Tc1V1     66.8444   473
Cd3N2     91.5034   273

from CBFV import composition
# df holds the table above, including the extra temp column
X, y, formulae, skipped = composition.generate_features(df,
                                                        elem_prop='magpie',
                                                        drop_duplicates=False,
                                                        extend_features=True,
                                                        sum_feat=True)



Download files


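Source distribution: composition_based_feature_vector-1.0.6.tar.gz
Built distribution: composition_based_feature_vector-1.0.6-py3-none-any.whl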

File details

Hashes for composition_based_feature_vector-1.0.6.tar.gz:

Algorithm     Hash digest
SHA256        a71e95c91eb83a680ecd9c862410a071c79429afddf853394f4dbf469f359f4d
MD5           2b87466d750479fd03141400c40003fd
BLAKE2b-256   841ff83fbf0c6435f3507bd3919f7cdb336282ed50cc3136ce1e7919a6ad40eb


Hashes for composition_based_feature_vector-1.0.6-py3-none-any.whl:

Algorithm     Hash digest
SHA256        9d89e5fa27af9cbe47a8985dcdca2ef4702ff458c7043a882c7e3ccf8e30da76
MD5           534eb4faca19ad01d78ef9088d8c9286
BLAKE2b-256   928e00fa3b1718d5793dea7d4af38081bba672832476d6ebe008409a5ef578ac
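A downloaded file can be checked against the published SHA256 digests with Python's standard hashlib module. The sketch assumes the wheel sits in the current directory under its published name; sha256_of and verify are illustrative helper names, not part of CBFV or pip.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    wheel = Path("composition_based_feature_vector-1.0.6-py3-none-any.whl")
    expected = "9d89e5fa27af9cbe47a8985dcdca2ef4702ff458c7043a882c7e3ccf8e30da76"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
```

Reading in chunks keeps memory use flat regardless of file size.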

