shapfire.utils
This module contains a collection of general constants and functions that are used by several other functions and classes in the ShapFire library.
-
shapfire.utils.DEFAULT_REPLACE_VALUE : float = 0.0
The default value that NaN or None values are replaced with.
-
shapfire.utils.DROP : str = 'drop'
The default string value used to indicate that a sample should be dropped from both the dataset (X) and the target variable (y) if the sample contains NaN or None values.
-
shapfire.utils.DROP_FEATURES : str = 'drop_features'
The default string value used to indicate that a feature (column) in a dataset (X) should be dropped if it contains NaN or None values.
-
shapfire.utils.DROP_SAMPLES : str = 'drop_samples'
The default string value used to indicate that a sample (row) in a dataset (X) should be dropped if it contains NaN or None values.
-
shapfire.utils.REPLACE : str = 'replace'
The default string value used to indicate that NaN or None values should be replaced with another given value.
-
shapfire.utils.SKIP : str = 'skip'
The default string value used to indicate that a value should be skipped whenever a NaN or None value is encountered.
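As an illustration of how the 'drop_samples', 'drop_features' and 'replace' strategies could map onto pandas operations, here is a minimal sketch. The helper `handle_nans` is hypothetical and not part of ShapFire; the library's own handling may differ.

```python
import numpy as np
import pandas as pd

# Toy dataset with one missing value in column "b".
X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})


def handle_nans(X, nan_strategy="drop_samples", nan_replace_value=0.0):
    """Hypothetical helper (not ShapFire's API): apply one NaN strategy."""
    if nan_strategy == "drop_samples":
        # Remove every row that contains at least one missing value.
        return X.dropna(axis=0)
    if nan_strategy == "drop_features":
        # Remove every column that contains at least one missing value.
        return X.dropna(axis=1)
    if nan_strategy == "replace":
        # Keep the shape and fill missing values with the given value.
        return X.fillna(nan_replace_value)
    raise ValueError(f"Unsupported nan_strategy: {nan_strategy!r}")


dropped_rows = handle_nans(X, "drop_samples")
dropped_cols = handle_nans(X, "drop_features")
filled = handle_nans(X, "replace", nan_replace_value=0.0)
```

Dropping samples keeps all features but loses rows, dropping features keeps all rows but loses columns, and replacing keeps the full shape at the cost of introducing an artificial value.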
-
shapfire.utils.associations(X: DataFrame, nan_strategy: str = DROP_SAMPLES, nan_replace_value: float = DEFAULT_REPLACE_VALUE) -> DataFrame
Calculate pairwise measures of association/correlation between numerical and categorical features in a given dataset. Numerical-numerical association is measured through Spearman’s correlation coefficient, numerical-categorical association is measured through the correlation ratio, and categorical-categorical association is measured through Cramér’s V.
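The three measures named above can be sketched in plain Python as follows. This is an illustrative sketch using textbook definitions, not ShapFire's implementation: the function names are hypothetical, the correlation ratio is taken as the square root of between-group over total variation, and Cramér's V is computed without bias correction, all of which are assumptions.

```python
import math
from collections import Counter, defaultdict


def _ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Numerical-numerical: Pearson correlation of the ranks."""
    return _pearson(_ranks(x), _ranks(y))


def correlation_ratio(categories, values):
    """Numerical-categorical: sqrt(between-group / total sum of squares)."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    grand_mean = sum(values) / len(values)
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    total = sum((v - grand_mean) ** 2 for v in values)
    return math.sqrt(between / total) if total else 0.0


def cramers_v(a, b):
    """Categorical-categorical: Cramér's V from the chi-squared statistic."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    chi2 = 0.0
    for x in count_a:
        for y in count_b:
            expected = count_a[x] * count_b[y] / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected
    k = min(len(count_a), len(count_b)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
```

Spearman's coefficient ranges from -1 to 1, while the correlation ratio and Cramér's V range from 0 to 1, so only the numerical-numerical entries of the resulting matrix can carry a sign.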
- Parameters:
- X: DataFrame
The input dataset that is assumed to contain features (columns) and corresponding observations (rows).
- nan_strategy: str = DROP_SAMPLES
The action to take in case the input dataset contains NaN or None values. Defaults to DROP_SAMPLES.
- nan_replace_value: float = DEFAULT_REPLACE_VALUE
If nan_strategy is shapfire.utils.REPLACE, this argument determines the value that NaN or None values are replaced with. Defaults to shapfire.utils.DEFAULT_REPLACE_VALUE.
- Raises:
ValueError – If the number of category and float features (columns) in the pandas dataframe does not add up to the total number of features (columns) contained in the dataframe.
- Returns:
A symmetric pandas dataframe that contains all pairwise feature correlation/association values.
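The ValueError condition above suggests that every column is expected to have either a category or a float dtype. A minimal sketch of preparing a dataframe accordingly and checking the counts is shown below; the column names are made up, and the exact dtype test ShapFire performs is an assumption.

```python
import pandas as pd

# Toy dataset with an integer, a float and a string column.
X = pd.DataFrame(
    {
        "age": [23, 35, 41],
        "income": [30.5, 52.0, 48.25],
        "city": ["Oslo", "Bergen", "Oslo"],
    }
)

# Cast the integer column to float and the string column to "category",
# so that every column is either float or category.
X = X.astype({"age": "float64", "city": "category"})

# Count the columns of each supported dtype.
n_category = X.select_dtypes(include=["category"]).shape[1]
n_float = X.select_dtypes(include=["float"]).shape[1]

# True only if no column falls outside the two supported dtypes.
all_supported = (n_category + n_float) == X.shape[1]
```

Running a check like this before calling the function makes it easier to spot a stray integer or object column that would otherwise surface as a ValueError.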