`shapfire.utils`

This module contains a collection of general constants and functions that are used by several other functions and classes in the ShapFire library.

shapfire.utils.DEFAULT_RANDOM_SEED : int = 123: The default random seed used across modules.

shapfire.utils.DEFAULT_REPLACE_VALUE : float = 0.0: The default value that NaN or None values are replaced with.

shapfire.utils.DROP : str = 'drop': The default string value used to indicate that a sample should be dropped from both the dataset (X) and the target variable (y) if it contains NaN or None values.
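Unlike DROP_SAMPLES, which acts on the rows of X alone, DROP involves X and y together. The snippet below is a minimal sketch of what dropping samples jointly from X and y amounts to in pandas, assuming X and y share the same index. `drop_nan_samples` is a hypothetical helper for illustration, not part of ShapFire.

```python
import numpy as np
import pandas as pd


def drop_nan_samples(X: pd.DataFrame, y: pd.Series) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper (not ShapFire's API): drop rows with NaN/None in X or y."""
    # A sample is kept only if all of its features AND its target are present.
    keep = X.notna().all(axis=1) & y.notna()
    return X.loc[keep], y.loc[keep]


X = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [0.5, 0.1, None, 0.7]})
y = pd.Series([1, 0, 1, None])
X_clean, y_clean = drop_nan_samples(X, y)
print(X_clean.index.tolist())  # [0]
```

Row 1 is dropped because of X, row 2 because of a None in X, and row 3 because its target is missing.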

shapfire.utils.DROP_FEATURES : str = 'drop_features': The default string value used to indicate that a feature (column) in a dataset (X) should be dropped if it contains NaN or None values.

shapfire.utils.DROP_SAMPLES : str = 'drop_samples': The default string value used to indicate that a sample (row) in a dataset (X) should be dropped if it contains NaN or None values.

shapfire.utils.REPLACE : str = 'replace': The default string value used to indicate that NaN or None values should be replaced with a given value.

shapfire.utils.SKIP : str = 'skip': The default string value used to indicate that NaN or None values should be skipped whenever they are encountered.
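A minimal sketch of how the X-only strategies could map onto pandas operations. `handle_nans` is a hypothetical helper, not ShapFire's API; the constants are redefined locally with the string values documented above so the snippet runs on its own, and SKIP is left out because its effect depends on the calling function.

```python
import numpy as np
import pandas as pd

# Local copies of the string values documented for shapfire.utils.
DROP_FEATURES = "drop_features"
DROP_SAMPLES = "drop_samples"
REPLACE = "replace"
DEFAULT_REPLACE_VALUE = 0.0


def handle_nans(
    X: pd.DataFrame,
    nan_strategy: str = DROP_SAMPLES,
    nan_replace_value: float = DEFAULT_REPLACE_VALUE,
) -> pd.DataFrame:
    """Hypothetical helper (not ShapFire's API): apply one NaN/None strategy to X."""
    if nan_strategy == DROP_SAMPLES:
        # Drop every row that contains at least one NaN/None.
        return X.dropna(axis=0)
    if nan_strategy == DROP_FEATURES:
        # Drop every column that contains at least one NaN/None.
        return X.dropna(axis=1)
    if nan_strategy == REPLACE:
        # Replace every NaN/None with the given value.
        return X.fillna(nan_replace_value)
    # This sketch rejects anything else; ShapFire's own behaviour may differ.
    raise ValueError(f"Unsupported nan_strategy: {nan_strategy!r}")


X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
print(handle_nans(X, DROP_SAMPLES).shape)              # (2, 2)
print(handle_nans(X, DROP_FEATURES).columns.tolist())  # ['b']
print(handle_nans(X, REPLACE)["a"].tolist())           # [1.0, 0.0, 3.0]
```

Redefining the constants is only for self-containment; code that uses ShapFire would import them from `shapfire.utils` instead.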

shapfire.utils.associations(X: DataFrame, nan_strategy: str = DROP_SAMPLES, nan_replace_value: float = DEFAULT_REPLACE_VALUE) → DataFrame

Calculate pairwise measures of association/correlation between the numerical and categorical features in a given dataset. Numerical-numerical association is measured with Spearman’s rank correlation coefficient, numerical-categorical association with the correlation ratio, and categorical-categorical association with Cramér’s V.
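For reference, the standard textbook definitions of the three measures are given below. Whether ShapFire applies any bias correction (for example to Cramér's V) is not stated here, so treat these as the uncorrected forms. $R(\cdot)$ denotes ranks; $n_i$ is the number of observations in category $i$, $y_{ij}$ the $j$-th numerical value in that category, $\bar{y}_i$ its category mean and $\bar{y}$ the overall mean; $\chi^2$ is the chi-squared statistic of the $r \times k$ contingency table of two categorical features with $n$ observations.

```latex
\rho_s = \frac{\operatorname{cov}\big(R(x), R(y)\big)}{\sigma_{R(x)}\,\sigma_{R(y)}}
\qquad
\eta = \sqrt{\frac{\sum_{i} n_i\,(\bar{y}_i - \bar{y})^2}{\sum_{i}\sum_{j} (y_{ij} - \bar{y})^2}}
\qquad
V = \sqrt{\frac{\chi^2 / n}{\min(r - 1,\; k - 1)}}
```

Spearman's $\rho_s$ lies in $[-1, 1]$, while $\eta$ and $V$ lie in $[0, 1]$.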

Parameters:

X: DataFrame: The input dataset, assumed to contain features (columns) and corresponding observations (rows).
nan_strategy: str = DROP_SAMPLES: The action to take if the input dataset contains NaN or None values. Defaults to shapfire.utils.DROP_SAMPLES.
nan_replace_value: float = DEFAULT_REPLACE_VALUE: If nan_strategy is shapfire.utils.REPLACE, the value that NaN or None values are replaced with. Defaults to shapfire.utils.DEFAULT_REPLACE_VALUE.

Raises:

ValueError – If the number of category-dtype and float-dtype features (columns) in the DataFrame does not add up to the total number of features (columns), i.e. if some column is neither categorical nor floating point.

Returns:

A symmetric pandas DataFrame containing the pairwise correlation/association values of all features.
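To make the description concrete, here is a minimal sketch of how such a symmetric matrix could be assembled with pandas and SciPy. It is not ShapFire's implementation: `association_matrix`, `correlation_ratio` and `cramers_v` are hypothetical names, NaN/None values are assumed to have been handled already, "categorical" is taken to mean category dtype and "numerical" float dtype (following the ValueError description), and Cramér's V is computed without bias correction.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Correlation ratio (eta) between a categorical and a numerical feature."""
    overall_mean = values.mean()
    grouped = values.groupby(categories, observed=True)
    # Between-category sum of squares: sum_i n_i * (mean_i - overall_mean)^2.
    between = (grouped.count() * (grouped.mean() - overall_mean) ** 2).sum()
    # Total sum of squares around the overall mean.
    total = ((values - overall_mean) ** 2).sum()
    return float(np.sqrt(between / total)) if total > 0 else 0.0


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V (no bias correction) between two categorical features."""
    # Plain arrays, so unused categories do not create empty rows/columns.
    table = pd.crosstab(np.asarray(x), np.asarray(y))
    r, k = table.shape
    if min(r, k) < 2:
        return 0.0  # convention of this sketch: a constant feature has no association
    chi2 = chi2_contingency(table.to_numpy(), correction=False)[0]
    n = table.to_numpy().sum()
    return float(np.sqrt((chi2 / n) / min(r - 1, k - 1)))


def association_matrix(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of a symmetric pairwise association matrix."""
    categorical = X.select_dtypes(include="category").columns
    numerical = X.select_dtypes(include=[np.floating]).columns
    # Mirrors the documented ValueError: every column must be category or float.
    if len(categorical) + len(numerical) != X.shape[1]:
        raise ValueError("Every column must have a category or float dtype.")
    cols = X.columns
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            a_cat, b_cat = a in categorical, b in categorical
            if not a_cat and not b_cat:
                # numerical-numerical: Spearman's rank correlation
                value = X[a].corr(X[b], method="spearman")
            elif a_cat and b_cat:
                # categorical-categorical: Cramér's V
                value = cramers_v(X[a], X[b])
            else:
                # numerical-categorical: correlation ratio
                cat, num = (a, b) if a_cat else (b, a)
                value = correlation_ratio(X[cat], X[num])
            # Fill both triangles so the result is symmetric.
            out.loc[a, b] = value
            out.loc[b, a] = value
    return out


X = pd.DataFrame({
    "height": [1.5, 1.6, 1.7, 1.8, 1.9, 2.0],
    "weight": [50.0, 55.0, 65.0, 70.0, 80.0, 90.0],
    "group": pd.Categorical(["a", "a", "b", "b", "c", "c"]),
    "flag": pd.Categorical(["x", "x", "y", "y", "y", "y"]),
})
print(association_matrix(X).round(3))
```

Filling only the upper triangle and mirroring it keeps the result symmetric even though the correlation ratio itself is computed in one direction (categorical to numerical).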

Last update: Jun 12, 2022
