RegressionErrorAnalysisReport
class olliepy.RegressionErrorAnalysisReport.RegressionErrorAnalysisReport(**kwargs)
RegressionErrorAnalysisReport creates a report that analyzes the error in regression problems.
- title: str
the title of the report
- output_directory: str
the directory where the report folder will be created
- train_df: pd.DataFrame
the training pandas DataFrame of the regression problem, which should include the target feature
- test_df: pd.DataFrame
the testing pandas DataFrame of the regression problem, which should include the target feature and the error column; the error column is used to calculate the error class of each observation
- target_feature_name: str
the name of the regression target feature
- error_column_name: str
the name of the error column, calculated as ‘Prediction - Target’ (see the example on GitHub for more information)
- error_classes: Dict[str, Tuple]
a dictionary defining the error classes to create. Each key is an error class name and each value is a (minimum, maximum) tuple, where the minimum is inclusive and the maximum is exclusive; these bounds are used to assign an error class to each test observation.
- For example, given error_classes = {'EXTREME_UNDER_ESTIMATION': (-8.0, -4.0), 'HIGH_UNDER_ESTIMATION': (-4.0, -3.0), 'MEDIUM_UNDER_ESTIMATION': (-3.0, -1.0), 'LOW_UNDER_ESTIMATION': (-1.0, -0.5), 'ACCEPTABLE': (-0.5, 0.5), 'OVER_ESTIMATING': (0.5, 3.0)}, a test observation is assigned:
‘EXTREME_UNDER_ESTIMATION’ if -8.0 <= error < -4.0
‘HIGH_UNDER_ESTIMATION’ if -4.0 <= error < -3.0
‘MEDIUM_UNDER_ESTIMATION’ if -3.0 <= error < -1.0
‘LOW_UNDER_ESTIMATION’ if -1.0 <= error < -0.5
‘ACCEPTABLE’ if -0.5 <= error < 0.5
‘OVER_ESTIMATING’ if 0.5 <= error < 3.0
- acceptable_error_class: str
the name of the acceptable error class that was defined in error_classes
- numerical_features: List[str], default=None
a list of the numerical features to be included in the report
- categorical_features: List[str], default=None
a list of the categorical features to be included in the report
- subtitle: str, default=None
an optional subtitle to describe your report
- report_folder_name: str, default=None
the name of the folder that will contain all the generated report files. If not set, the title of the report will be used.
- encryption_secret: str, default=None
the 16-character secret used to encrypt the generated report data. If it is not set, the generated data won’t be encrypted.
- generate_encryption_secret: bool, default=False
if True, an encryption_secret is generated and its value is returned as output. You can also read the encryption_secret attribute to get the generated secret.
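The error_classes rule can be illustrated in plain Python. This is a minimal sketch, not OlliePy's implementation: it assumes the error is computed as prediction minus target, as error_column_name describes, and `compute_errors` and `assign_error_class` are hypothetical helpers. Returning None for an error that falls outside every interval is also an assumption, since this page does not say how OlliePy handles such errors.

```python
from typing import Dict, List, Optional, Tuple

# The example error_classes definition from the parameter description.
ERROR_CLASSES: Dict[str, Tuple[float, float]] = {
    'EXTREME_UNDER_ESTIMATION': (-8.0, -4.0),
    'HIGH_UNDER_ESTIMATION': (-4.0, -3.0),
    'MEDIUM_UNDER_ESTIMATION': (-3.0, -1.0),
    'LOW_UNDER_ESTIMATION': (-1.0, -0.5),
    'ACCEPTABLE': (-0.5, 0.5),
    'OVER_ESTIMATING': (0.5, 3.0),
}


def compute_errors(predictions: List[float], targets: List[float]) -> List[float]:
    # Error convention from error_column_name: 'Prediction - Target'.
    return [p - t for p, t in zip(predictions, targets)]


def assign_error_class(error: float,
                       error_classes: Dict[str, Tuple[float, float]]) -> Optional[str]:
    # Hypothetical helper: the minimum is inclusive, the maximum is exclusive.
    for name, (minimum, maximum) in error_classes.items():
        if minimum <= error < maximum:
            return name
    # Assumption: an error outside every interval gets no class.
    return None


if __name__ == '__main__':
    errors = compute_errors([10.2, 7.0, 12.9], [10.0, 9.5, 10.0])
    print([assign_error_class(e, ERROR_CLASSES) for e in errors])
    # → ['ACCEPTABLE', 'MEDIUM_UNDER_ESTIMATION', 'OVER_ESTIMATING']
```

With pandas, the error column would typically be built the same way, as the difference between a predictions column and the target column of test_df.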
- create_report()
creates the error analysis report
create_report(enable_patterns_report: bool = True, patterns_report_group_by_categorical_features: Union[str, List[str]] = 'all', patterns_report_group_by_numerical_features: Union[str, List[str]] = 'all', patterns_report_number_of_bins: Union[int, List[int]] = 10, enable_parallel_coordinates_plot: bool = True, cosine_similarity_threshold: float = 0.8, parallel_coordinates_q1_threshold: float = 0.25, parallel_coordinates_q2_threshold: float = 0.75, parallel_coordinates_features: Union[str, List[str]] = 'auto') → None
Creates a report using the user-defined data and the data calculated from the error.
- Parameters
enable_patterns_report – enables the patterns report. default: True
patterns_report_group_by_categorical_features – categorical features to use in the patterns report. default: ‘all’
patterns_report_group_by_numerical_features – numerical features to use in the patterns report. default: ‘all’
patterns_report_number_of_bins – either a list with the number of bins for each provided numerical feature, or a single number of bins used for all provided numerical features. default: 10
enable_parallel_coordinates_plot – enables the parallel coordinates plot. default: True
cosine_similarity_threshold – the cosine similarity threshold used to decide whether the categorical distributions of the primary and secondary datasets are similar. default: 0.8
parallel_coordinates_q1_threshold – the first quantile threshold to be used if parallel_coordinates_features == ‘auto’. default: 0.25
parallel_coordinates_q2_threshold – the second quantile threshold to be used if parallel_coordinates_features == ‘auto’. default: 0.75
parallel_coordinates_features – The list of features to display on the parallel coordinates plot. default: ‘auto’
- If parallel_coordinates_features is set to ‘auto’, OlliePy selects the features that show a distribution shift, using three thresholds:
- cosine_similarity_threshold: a categorical feature is selected if the cosine similarity between its primary and secondary distributions is lower than this threshold.
- parallel_coordinates_q1_threshold and parallel_coordinates_q2_threshold: two quantile levels. A numerical feature is selected and added to the plot if primary_quantile_1 >= secondary_quantile_2 or secondary_quantile_1 >= primary_quantile_2, where quantile_1 and quantile_2 are the feature's values at these two quantile levels in each dataset.
- Returns
None
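The ‘auto’ selection rules can be sketched in pure Python. This is an illustration under stated assumptions, not OlliePy's code: categorical distributions are compared as category-count vectors (cosine similarity ignores vector scale, so counts and proportions give the same result), and quantiles use linear interpolation between closest ranks. OlliePy's exact quantile method is not documented here, and `quantile`, `cosine_similarity`, `select_categorical`, and `select_numerical` are hypothetical names.

```python
import math
from collections import Counter
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    # Linear interpolation between the closest ranks (NumPy's default method).
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def cosine_similarity(primary: Sequence[str], secondary: Sequence[str]) -> float:
    # Count vectors over the union of categories seen in either dataset.
    primary_counts, secondary_counts = Counter(primary), Counter(secondary)
    categories = sorted(set(primary_counts) | set(secondary_counts))
    a = [primary_counts[c] for c in categories]
    b = [secondary_counts[c] for c in categories]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_categorical(primary: Sequence[str], secondary: Sequence[str],
                       cosine_similarity_threshold: float = 0.8) -> bool:
    # Selected when the two category distributions are dissimilar.
    return cosine_similarity(primary, secondary) < cosine_similarity_threshold


def select_numerical(primary: Sequence[float], secondary: Sequence[float],
                     q1_threshold: float = 0.25, q2_threshold: float = 0.75) -> bool:
    # Selected when the [q1, q2] quantile ranges of the two datasets lie apart.
    primary_q1 = quantile(primary, q1_threshold)
    primary_q2 = quantile(primary, q2_threshold)
    secondary_q1 = quantile(secondary, q1_threshold)
    secondary_q2 = quantile(secondary, q2_threshold)
    return primary_q1 >= secondary_q2 or secondary_q1 >= primary_q2
```

With the default thresholds, the numerical rule amounts to checking whether the interquartile ranges of the two datasets fail to overlap.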
save_report(zip_report: bool = False) → None
Creates the report directory, copies the web application based on the template name, and saves the report data.
- Parameters
zip_report – if True, the report directory is zipped so it can be downloaded. default: False
- Returns
None
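If you need to zip an existing report folder yourself, the standard library is enough. This is a minimal sketch that assumes only that the report is a directory on disk; it is not OlliePy's zip_report implementation, and `zip_report_folder` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def zip_report_folder(report_dir: str) -> str:
    # Creates '<report_dir>.zip' next to the folder and returns the archive path.
    folder = Path(report_dir).resolve()
    return shutil.make_archive(str(folder), 'zip', root_dir=str(folder))
```

Writing the archive beside the folder, rather than inside it, keeps the archive from trying to include itself.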
serve_report_from_local_server(mode: str = 'server', port: int = None) → None
Serves the report to the user using a web server. Available modes:
- ‘server’: opens a new tab in the default browser using the webbrowser package
- ‘js’: opens a new tab in the default browser using IPython
- ‘jupyter’: opens the report in a Jupyter notebook
- Parameters
mode – the selected web server mode. default: ‘server’
port – the server port. default: None, in which case a random port between 1024 and 49151 is generated
- Returns
None
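The ‘server’ mode behaviour described above (a random port when port is None, then a browser tab opened with the webbrowser package) can be approximated with the standard library alone. This is a minimal sketch, not OlliePy's implementation; `choose_port` and `serve_folder` are hypothetical helpers, and treating the 1024-49151 range as inclusive is an assumption.

```python
import functools
import random
import webbrowser
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


def choose_port(port: Optional[int] = None) -> int:
    # Keep an explicit port; otherwise pick one at random in 1024-49151.
    return port if port is not None else random.randint(1024, 49151)


def serve_folder(report_dir: str, port: Optional[int] = None) -> None:
    # Serve the report folder over HTTP and open it in the default browser.
    port = choose_port(port)
    handler = functools.partial(SimpleHTTPRequestHandler, directory=report_dir)
    with ThreadingHTTPServer(('127.0.0.1', port), handler) as server:
        webbrowser.open(f'http://127.0.0.1:{port}/')
        server.serve_forever()  # blocks until interrupted (Ctrl+C)
```

The sketch does not retry if the randomly chosen port is already in use; a more robust version would catch the resulting OSError and pick another port.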