shapfire.utils

This module contains a collection of general constants and functions that are used by several other functions and classes in the ShapFire library.

shapfire.utils.DEFAULT_RANDOM_SEED : int = 123

The default random seed used across modules.

shapfire.utils.DEFAULT_REPLACE_VALUE : float = 0.0

The default value that NaN or None values are replaced with.

shapfire.utils.DROP : str = 'drop'

The default string value used to indicate that samples associated with a dataset (X) and target variable (y) should be dropped if NaN or None values are contained in a sample.

shapfire.utils.DROP_FEATURES : str = 'drop_features'

The default string value used to indicate that a feature (column) in a dataset (X) should be dropped if it contains NaN or None values.

shapfire.utils.DROP_SAMPLES : str = 'drop_samples'

The default string value used to indicate that a sample (row) in a dataset (X) should be dropped if it contains NaN or None values.

shapfire.utils.REPLACE : str = 'replace'

The default string value used to indicate that NaN or None values should be replaced with another given value.

shapfire.utils.SKIP : str = 'skip'

The default string value used to indicate that a value should be skipped whenever a NaN or None value is encountered.
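To make the strategy constants concrete, the following is a minimal sketch of how three of them could map onto pandas operations. It is an interpretation of the descriptions above, not ShapFire's actual implementation: the helper `handle_nans` is hypothetical, and the constants are redefined locally with the documented values so the snippet runs without ShapFire installed.

```python
import numpy as np
import pandas as pd

# Local stand-ins that mirror the documented constant values.
DROP_SAMPLES = "drop_samples"
DROP_FEATURES = "drop_features"
REPLACE = "replace"
DEFAULT_REPLACE_VALUE = 0.0


def handle_nans(X, nan_strategy=DROP_SAMPLES, nan_replace_value=DEFAULT_REPLACE_VALUE):
    """Hypothetical helper (not part of ShapFire): apply one NaN strategy to X."""
    if nan_strategy == DROP_SAMPLES:
        # Drop every row (sample) that contains a NaN or None value.
        return X.dropna(axis=0)
    if nan_strategy == DROP_FEATURES:
        # Drop every column (feature) that contains a NaN or None value.
        return X.dropna(axis=1)
    if nan_strategy == REPLACE:
        # Replace NaN or None values with the given value.
        return X.fillna(nan_replace_value)
    raise ValueError(f"Unknown nan_strategy: {nan_strategy!r}")


X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

dropped_rows = handle_nans(X, DROP_SAMPLES)   # keeps the two complete rows
dropped_cols = handle_nans(X, DROP_FEATURES)  # keeps only column "b"
replaced = handle_nans(X, REPLACE)            # NaN in column "a" becomes 0.0
```

Dropping samples keeps all features but shrinks the dataset, dropping features does the opposite, and replacing preserves the shape at the cost of introducing an artificial value.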

shapfire.utils.associations(X: DataFrame, nan_strategy: str = DROP_SAMPLES, nan_replace_value: float = DEFAULT_REPLACE_VALUE) -> DataFrame

Calculate pairwise measures of association/correlation between numerical and categorical features in a given dataset. Numerical-numerical association is measured through Spearman's correlation coefficient, numerical-categorical association is measured through the correlation ratio, and categorical-categorical association is measured through Cramér's V.

Parameters:
X: DataFrame

The input dataset that is assumed to contain features (columns) and corresponding observations (rows).

nan_strategy: str = DROP_SAMPLES

The action to take in case the input dataset contains NaN or None values. Defaults to DROP_SAMPLES.

nan_replace_value: float = DEFAULT_REPLACE_VALUE

If nan_strategy is shapfire.utils.REPLACE, this argument determines the value that NaN or None values are replaced with. Defaults to shapfire.utils.DEFAULT_REPLACE_VALUE.

Raises:

ValueError – If the number of category and float features (columns) in the pandas dataframe does not add up to the total number of features (columns) contained in the dataframe.

Returns:

A symmetric pandas dataframe that contains all pairwise feature correlation/association values.
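The three measures named above can be illustrated with a small, self-contained sketch. This is not ShapFire's implementation: the helpers `correlation_ratio` and `cramers_v` are hypothetical names written here for illustration, Cramér's V is computed without bias correction, and Spearman's coefficient is obtained as the Pearson correlation of the ranks. The dtype check at the end mirrors the condition described under Raises, assuming columns are typed as either `category` or float.

```python
import numpy as np
import pandas as pd


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numerical series."""
    overall_mean = values.mean()
    groups = values.groupby(categories, observed=True)
    # Between-group sum of squares, weighted by group size.
    between = (groups.size() * (groups.mean() - overall_mean) ** 2).sum()
    # Total sum of squares around the overall mean.
    total = ((values - overall_mean) ** 2).sum()
    return float(np.sqrt(between / total)) if total > 0 else 0.0


def cramers_v(a, b):
    """Cramér's V (no bias correction) between two categorical series."""
    observed = pd.crosstab(a.astype(object), b.astype(object)).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence of the two variables.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim))) if min_dim > 0 else 0.0


X = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "y": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
        "c": pd.Categorical(["a", "a", "a", "b", "b", "b"]),
        "d": pd.Categorical(["u", "u", "u", "v", "v", "v"]),
    }
)

# Numerical-numerical: Spearman's coefficient as Pearson correlation of ranks.
spearman_xy = X["x"].rank().corr(X["y"].rank())

# Numerical-categorical: correlation ratio.
eta_cx = correlation_ratio(X["c"], X["x"])

# Categorical-categorical: Cramér's V.
v_cd = cramers_v(X["c"], X["d"])

# Precondition sketched from the Raises section: every column is category or float.
n_category = X.select_dtypes(include=["category"]).shape[1]
n_float = X.select_dtypes(include=["float"]).shape[1]
all_columns_typed = (n_category + n_float) == X.shape[1]
```

In this toy dataset `y` is a monotone function of `x` and `d` is fully determined by `c`, so both the Spearman coefficient and Cramér's V reach their maximum of 1, while the correlation ratio between `c` and `x` is high but below 1 because `x` still varies within each category.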


Last update: Jun 12, 2022