Model Evaluation
Manually evaluate candidate ensemble
Normally, during training, validation runs at the end of every epoch:
trainer.py
# Start training
for epoch in range(self.start_epoch, self.args.nb_epoch):
    epoch += 1
    print('Epoch: {} / {}'.format(epoch, self.args.nb_epoch))
    # optimize parameters
    training_metrics = self.optimize_epoch(dl_train, epoch)
    # validate performance
    val_dict, val_metrics = self.validate(dl_val)
    # -------- LOG TRAINING METRICS --------
    ...
    # -------- LOG VALIDATION METRICS --------
    ...
    # Save checkpoint weights
    ...
At the end of training, a test step (on the test dataset) runs automatically and writes the evaluation results to the --out_dir directory, along with a confusion plot named confusion.png. The trainer skips the test step if confusion.png already exists.
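In other words, the test step is guarded by a simple existence check on that file. A minimal sketch of such a guard (the function name and trainer.test call are illustrative, not the actual trainer code):

import os

def maybe_run_test(trainer, dl_test, out_dir):
    # Run the test step only if a previous run has not already written confusion.png
    confusion_path = os.path.join(out_dir, 'confusion.png')
    if not os.path.exists(confusion_path):
        trainer.test(dl_test)  # writes the evaluation results and confusion.png to out_dir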
But if you want to run the test manually, or against a specific checkpoint, follow these steps:
- Create an empty directory, move the checkpoint into it, and rename the checkpoint to best_weights.pt
- Pass the argument --skip_training True and point --out_dir at that directory:
python3 train.py [--train_input_data_dir TRAIN_DIR] [--train_label_data_dir GT_DIR] [--all_checkpoint_dir CKP_DIR] [--skip_training True] [--out_dir MODEL_WEIGHT_PATH]
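For example, if the checkpoint was moved to ./eval_ckpt/best_weights.pt (all paths here are illustrative, and the bracketed data-directory flags may still be required by the script):

python3 train.py --train_input_data_dir ./data/train --train_label_data_dir ./data/labels --all_checkpoint_dir ./checkpoints --skip_training True --out_dir ./eval_ckpt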
Evaluate ensemble model
To evaluate the performance of all candidate ensemble members on the test set, we need to compute the average output of all models along with the total variance, as described in the uncertainty section.
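Concretely, the ensemble output is the mean of the candidates' predictions, and the total variance splits into an aleatoric term (the mean of the predicted variances) and an epistemic term (the variance of the predicted means). A minimal NumPy sketch of that standard deep-ensemble decomposition; the file layout and array shapes are assumptions based on the directory listing below:

import numpy as np

n = 5  # number of ensemble candidates
# Assumed per-candidate test outputs:
#   predictions.npy : per-sample mean prediction of model i
#   variances.npy   : per-sample predicted (aleatoric) variance of model i
preds = np.stack([np.load('ensemble_best_weight/model_{}/predictions.npy'.format(i))
                  for i in range(1, n + 1)])  # shape: (n, num_samples, ...)
varis = np.stack([np.load('ensemble_best_weight/model_{}/variances.npy'.format(i))
                  for i in range(1, n + 1)])  # shape: (n, num_samples, ...)

ensemble_mean = preds.mean(axis=0)      # average output of all models
aleatoric = varis.mean(axis=0)          # mean of the predicted variances
epistemic = preds.var(axis=0)           # spread of the per-model means
total_variance = aleatoric + epistemic  # total predictive variance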
Follow these steps:
1. Produce evaluation results for each candidate ensemble model. Each model must have predictions.npy and variances.npy in order to compute the ensemble output and the total variance.
2. Create an empty directory and place each model's directory (containing the evaluation results from step 1) inside it. Rename each model directory to model_i, where i is the candidate index (1 <= i <= n). A scripted way to do this is sketched after the hierarchy example below.
The directory hierarchy must be as follows. Example: the parent directory is named 'ensemble_best_weight' and the number of candidates is n = 5:
/ensemble_best_weight
/model_1
/log
best_weights.pt
confusion.png
predictions.npy
results.txt
targets.npy
test_results.json
variances.npy
/model_2
...
/model_3
...
/model_4
...
/model_5
...
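Arranging the directories can also be scripted. A small sketch, assuming each candidate's --out_dir was named run_1 through run_5 (these source names are illustrative; dirs_exist_ok requires Python 3.8+):

import shutil
from pathlib import Path

n = 5
parent = Path('ensemble_best_weight')
parent.mkdir(exist_ok=True)
for i in range(1, n + 1):
    # run_i is the --out_dir of the i-th trained candidate (illustrative name)
    shutil.copytree('run_{}'.format(i), parent / 'model_{}'.format(i),
                    dirs_exist_ok=True)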
- Collect the ensemble results with the command:
python3 collect_ensembles.py [WEIGHTS_DIRS] [N]
Specify the directory containing the ensemble candidates as the first argument and the number of candidates as the second argument.
arguments:
WEIGHTS_DIRS  absolute path of the parent directory collecting the candidate models' weights and results
N             total number of ensemble candidates
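For example, with the layout above (the absolute path is illustrative):

python3 collect_ensembles.py /home/user/ensemble_best_weight 5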
The script creates:
- a new subdirectory in the experiment base directory, e.g. "/experiment_dir/ensemble" (reserved for a future implementation of K-fold cross-validation)
- a new subdirectory in the experiment base directory, e.g. "/experiment_dir/ensemble_collected", containing the collected ensemble predictions and results
The ensemble_collected directory contains:
- the aleatoric uncertainty on the test set
- the epistemic uncertainty on the test set
- error_metrics.json, containing the evaluation results
- a calibration plot
- a confusion plot
Figure: Example of a confusion plot
Figure: Example of a calibration plot