1.3. List of published AITs¶
Qunomon provides ready-to-use, copyright-free AITs on GitHub. Users can use these directly in Qunomon, modify the code to develop new AITs, or clone them with git clone to prepare them for registration in Qunomon, as described in the tutorials.
Important
External link to GitHub.
Note
For details on each AIT, see develop/my_ait.ipynb.
1.3.1. Repository¶
Each published AIT exists as a branch of the following repository.
https://github.com/qunomon/Qunomon_AIT_Repository
Switch from the main branch to the branch you want to use.
1.3.2. List of branches¶
alyz_dataset_set_difference_combinatorial_coverage
Measures the Set Difference Combinatorial Coverage (SDCC) between the training data and the test data, based on combinations of the attributes (categories) of objects in the images and the attributes of the image labels (e.g., weather and time of day).
alyz_dataset_surprise_coverage
Evaluates the diversity of the dataset by measuring how well the test data covers inputs that are surprising to the model.
Evaluating diversity from two perspectives, a distribution-based evaluation and a spatial distance-based evaluation, allows for a more robust assessment.
alyz_dataset_table_counts_attr_coverage
Focuses on the record counts for attribute combinations within the data to detect rare cases or imbalanced trends.
alyz_dataset_table_counts_comb_all_attr
Calculates the counts and percentages of attribute-value occurrences for each combination of attributes in the table data.
From this summary information, the distribution of occurrence frequencies in the table data can be inferred.
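As a rough illustration of the kind of counting involved, combination counts and percentages could be computed with the standard library (the attribute names below are invented for illustration; this is not the AIT's actual code):

```python
from collections import Counter

# Hypothetical table data: each row is a dict of attribute values.
rows = [
    {"weather": "sunny", "time": "day"},
    {"weather": "sunny", "time": "night"},
    {"weather": "rainy", "time": "day"},
    {"weather": "sunny", "time": "day"},
]

# Count occurrences of each attribute-value combination.
combo_counts = Counter(tuple(sorted(r.items())) for r in rows)
total = sum(combo_counts.values())

# Percentage of occurrences per combination.
combo_pct = {combo: count / total * 100 for combo, count in combo_counts.items()}
```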
alyz_dataset_table_counts_comb_two_attr
Specifies unnecessary (improbable) attribute-value combinations in table data and calculates how much of this unwanted data is present, along with the counts and proportions of attribute values.
AIT users can use this summary information to understand undesirable trends in attribute values based on their occurrences.
alyz_dataset_table_counts_inde_attr_by_chi2
To assess data validity, calculates the independence of the labels and each attribute in a CSV dataset using the chi-squared statistic.
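The chi-squared independence test referred to here can be sketched with SciPy (the contingency table below is invented for illustration, not taken from the AIT):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = attribute values, columns = labels.
table = np.array([
    [30, 10],   # attribute value A: counts per label
    [20, 40],   # attribute value B: counts per label
])

chi2, p_value, dof, expected = chi2_contingency(table)

# A small p-value suggests the attribute and the label are NOT independent.
dependent = p_value < 0.05
```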
alyz_dataset_topcoverage_auc
Considering the high-density regions of the dataset’s data distribution, let S be the area occupied by the top p% of the density. The uniformity of the data is judged by how S changes as p varies from 0 to 1.
For table datasets, the distribution of the specified column is used, and uniformity is measured by grouping by the specified column.
For image datasets, the distribution of the object’s area ratio, average brightness, and the distance from the origin of the object’s center coordinates is used.
eval_correctness_image_classifier_pytorch
Splits a dataset randomly and calculates the accuracy of the model for each split dataset.
Low variance in accuracy suggests that the model generalizes consistently across the dataset.
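A minimal sketch of this split-and-measure idea, assuming a generic `predict` callable and in-memory `(input, label)` pairs (a hypothetical interface, not the AIT's actual one):

```python
import random
import statistics

def split_accuracies(samples, predict, k=5, seed=0):
    """Randomly partition (input, label) samples into k splits and
    compute the prediction accuracy on each split."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    splits = [shuffled[i::k] for i in range(k)]
    accs = []
    for split in splits:
        correct = sum(1 for x, y in split if predict(x) == y)
        accs.append(correct / len(split))
    return accs

# Toy example: a "model" that labels even numbers as 1.
samples = [(x, int(x % 2 == 0)) for x in range(100)]
accs = split_accuracies(samples, predict=lambda x: int(x % 2 == 0))
variance = statistics.pvariance(accs)  # low variance -> stable performance
```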
eval_dataset_image_3features_kld
Takes two groups of image data as input and calculates the KLD (Kullback–Leibler divergence) between their distributions of brightness, contrast, and exposure.
A KLD close to zero indicates that the two image groups share the same feature distributions.
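For a single feature such as brightness, a histogram-based KL divergence of this kind might look as follows (a sketch with synthetic pixel data, not the AIT's implementation):

```python
import numpy as np

def kld_from_histograms(a, b, bins=32, eps=1e-9):
    """KL divergence D(P||Q) between brightness histograms of two
    image groups, each given as a flat array of pixel values in [0, 255]."""
    p, _ = np.histogram(a, bins=bins, range=(0, 255))
    q, _ = np.histogram(b, bins=bins, range=(0, 255))
    p = p / p.sum() + eps   # normalize; eps avoids log(0)
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
group1 = rng.normal(128, 20, size=10_000).clip(0, 255)
group2 = rng.normal(128, 20, size=10_000).clip(0, 255)
kld = kld_from_histograms(group1, group2)  # near zero: similar distributions

# A group with shifted brightness yields a much larger divergence.
kld_far = kld_from_histograms(group1, rng.normal(60, 20, size=10_000).clip(0, 255))
```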
eval_dataset_image_diversity_vae
Uses a VAE model, trained on features of the training image data, to calculate feature values for the evaluation image data.
A smaller difference in feature values indicates that the evaluation image data comprehensively covers the features of the training data.
eval_llm_bleu_score
Uses MLflow to have the LLM model answer questions in the problem domain and evaluates the quality of the generated text.
Calculates the BLEU score of the answer text with the LLM evaluation metric to quantify text quality.
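For reference, sentence-level BLEU reduces to modified n-gram precisions combined with a brevity penalty; a minimal, unsmoothed sketch in plain Python (real evaluations would use an established library):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Minimal sentence-level BLEU: geometric mean of modified n-gram
    precisions with a brevity penalty (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: only penalize candidates shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return geo_mean * brevity

score = bleu("the cat sat on the mat", "the cat sat on the mat")  # identical -> 1.0
```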
eval_llm_cider_score
Answers questions from the problem domain using the LLM model and evaluates the quality of the generated text.
Uses the LLM evaluation criteria to calculate a CIDEr score for the answer text and quantify text quality.
eval_llm_meteor_score
Runs a translation task using the LLM model and evaluates the quality of the generated translation with a METEOR score.
This score quantifies translation quality and measures model performance.
eval_llm_perplexity_score
Answers questions from the problem domain using the LLM model and evaluates the quality of the generated text.
Uses the LLM evaluation criteria to calculate a perplexity score for the answer text and quantify text quality.
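Perplexity itself is just the exponential of the negative mean token log-probability; a minimal sketch, assuming per-token natural-log probabilities are available from the model:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the negative mean log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that assigns probability 0.25 to every token has perplexity 4.
logps = [math.log(0.25)] * 10
pp = perplexity(logps)
```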
eval_llm_rouge_score
Uses MLflow to generate text summaries with the LLM model and evaluates the quality of the generated text.
Calculates the ROUGE score for the text with the LLM evaluation criteria to quantify text quality.
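As a point of reference, the simplest ROUGE variant, ROUGE-1 recall, counts reference unigrams covered by the candidate summary; a minimal sketch (real evaluations would use an established ROUGE library):

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams covered by the candidate."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Clip each reference unigram count by its count in the candidate.
    overlap = sum(min(c, cand[w]) for w, c in ref.items())
    return overlap / sum(ref.values())

score = rouge1_recall("the cat sat", "the cat sat on the mat")
```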
eval_map_yolo_torch
Calculates the mean average precision (mAP) on the test data from the inference results of the PyTorch object detection model and evaluates its accuracy.
eval_model_adversarial_robustness
For deep learning models, adds perturbations to the input data, generates adversarial data under distance-measure constraints, and evaluates the perturbation strength against the change in predictive performance to measure robustness.
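The perturbation idea can be sketched with an FGSM-style step on a toy logistic model in plain NumPy (an illustration under an L-infinity constraint, not the AIT's implementation for deep networks):

```python
import numpy as np

def fgsm_perturb(x, w, b, y, epsilon):
    """FGSM-style perturbation for a logistic model p = sigmoid(w.x + b):
    step of size epsilon (L-infinity constraint) along the sign of the
    cross-entropy loss gradient with respect to the input."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w          # d(cross-entropy)/dx
    return x + epsilon * np.sign(grad_x)

w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 1.0])
x_adv = fgsm_perturb(x, w, b, y=1, epsilon=0.1)
# The perturbation satisfies the constraint ||x_adv - x||_inf <= epsilon,
# and it lowers the model's score for the true label y = 1.
```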
eval_model_image_classify_acc_adversarial_example
Generates adversarial sample images from the input images and calculates accuracy metrics (accuracy, precision, recall, F-measure, AUC) for the input model (an image classification model trained on the input images).
These metrics allow the accuracy and stability of machine learning models to be evaluated.
eval_model_peformance_pytorch
Given a dataset and a PyTorch classification model, evaluates the inference accuracy of the model from the dataset’s inference results.
Calculates accuracy, AP (average precision), and balanced accuracy for inferences on the dataset, assessing the model’s inference precision.
eval_model_regression_rmse_and_mae
Calculates the RMSE (root mean square error) and MAE (mean absolute error) of a multiple regression model built from multiple explanatory variables and one target variable.
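These two error metrics are straightforward to state; a minimal sketch with invented example values:

```python
import math

def rmse_and_mae(y_true, y_pred):
    """Root mean squared error and mean absolute error for regression outputs."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

# Hypothetical targets and predictions.
rmse, mae = rmse_and_mae([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```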
eval_model_yolo_detect_robustness
Applies adversarial perturbations to the YOLO object detection model under L∞/L2 constraints and evaluates their impact.
For each perturbation magnitude, calculates the decline in model accuracy (mAP) and the increase in the false negative rate (FNR) caused by the adversarial attack, and visualizes the progression to reveal the model’s vulnerability to attacks.
eval_noise_score_aquavs
To evaluate the stability of the model, validates it with labels to which noise has been added.
Using latent representations from the SVAE, measures a “noise score” for each sample in the input dataset to detect anomalies.
eval_processcheck_problem_domain_analysis
A checklist method is used to examine whether the dataset used for the machine learning system satisfies the sufficiency of problem domain analysis.
eval_surprise_adequacy
Calculates the Surprise Adequacy (SA) of the input VAE model.
SA evaluates the activation traces of each neuron for each sample in the input data.