Interpretability Guarantees with Merlin-Arthur Classifiers (Q657)

From MaRDI portal

Language: English
Label: Interpretability Guarantees with Merlin-Arthur Classifiers
Description: Software published at Zenodo repository.
Also known as: (none)

    Statements

    27 February 2024
    MaRDI profile type: MaRDI software profile
    The repository provides the codebase for the Merlin-Arthur Classifiers, a novel multi-agent framework designed to enhance interpretability in machine learning models. Inspired by the Merlin-Arthur protocol from interactive proof systems, this project introduces a method to ensure interpretability guarantees, as detailed in our AISTATS 2024 paper, Interpretability Guarantees with Merlin-Arthur Classifiers. The approach is tested on the MNIST and UCI Census datasets, employing a verifier (Arthur) and two provers (Merlin and Morgana) in a setup that mimics a min-max game to refine classification outcomes. Our objective is to contribute to the development of interpretable AI systems, providing a toolkit for researchers and practitioners to replicate our experiments, engage with our methodology, and extend it to new contexts.

    The repository includes comprehensive guidance on setup, usage, and customization for various datasets and training modes. Getting Started involves cloning the repository, setting up the Conda environment with the necessary dependencies, and initializing wandb for experiment tracking.

    Basic Usage outlines steps for regular and Merlin-Arthur training on supported datasets, with examples for different configurations and advanced features. Regular training examples for MNIST and UCI Census datasets demonstrate how to customize training parameters, while Merlin-Arthur training provides a template for engaging in the strategic min-max game that characterizes our interpretability-enhancing methodology.

    Advanced Features detail customization options for loss functions, optimization techniques, and regularization, enabling researchers to fine-tune the training process according to their specific needs. This repository is intended as a collaborative platform for advancing interpretability in AI, and we welcome contributions, feedback, and partnerships from the broader community.
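
    A minimal sketch of how one training step of the min-max game described above might look in PyTorch, assuming hypothetical verifier and prover modules (arthur, merlin, morgana), masking by element-wise product, and a gamma weight for the adversarial term; these names and the wandb.log call are illustrative assumptions, not the repository's documented interface.

        # Hypothetical sketch of one Merlin-Arthur training step; names are
        # illustrative, not the repository's actual API.
        import torch.nn.functional as F
        import wandb

        def merlin_arthur_step(arthur, merlin, morgana, x, y, opt_arthur, gamma=1.0):
            # Merlin (cooperative prover): a sparse feature mask meant to reveal the true class.
            mask_coop = merlin(x, y).detach()
            # Morgana (adversarial prover): a mask chosen to mislead Arthur; both provers
            # are assumed to be updated in their own, separate optimisation steps.
            mask_adv = morgana(x, y).detach()

            # Arthur (verifier) is trained to classify correctly from Merlin's features
            # (completeness) and to resist Morgana's features (soundness).
            loss = F.cross_entropy(arthur(x * mask_coop), y) \
                 + gamma * F.cross_entropy(arthur(x * mask_adv), y)

            opt_arthur.zero_grad()
            loss.backward()
            opt_arthur.step()

            wandb.log({"train/arthur_loss": loss.item()})  # experiment tracking via wandb
            return loss.item()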
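    The distinction between the two training modes mentioned under Basic Usage can be pictured as a small configuration switch; the dictionaries below are a hypothetical illustration, and the keys and values are not the repository's actual command-line flags.

        # Hypothetical configuration sketch for the two training modes; keys and
        # values are illustrative only.
        regular_config = {
            "dataset": "mnist",          # or "uci_census"
            "mode": "regular",           # plain classifier training
            "epochs": 20,
            "learning_rate": 1e-3,
        }

        merlin_arthur_config = {
            "dataset": "mnist",
            "mode": "merlin_arthur",     # min-max game between Arthur and the provers
            "epochs": 20,
            "learning_rate": 1e-3,
            "mask_size": 64,             # number of features a prover may reveal (assumed knob)
            "gamma": 1.0,                # weight of the adversarial (Morgana) term (assumed knob)
        }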

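    For the Advanced Features, one common way to combine a classification loss with a regularizer on the prover's mask is sketched below; the merlin_objective helper and the l1_weight knob are assumptions for illustration only, not the repository's documented options.

        # Hypothetical sketch of a regularised prover objective of the kind the
        # Advanced Features describe: Merlin should convince Arthur of the true
        # class while keeping its mask (the explanation) small.
        import torch.nn.functional as F

        def merlin_objective(arthur, merlin, x, y, l1_weight=0.01):
            mask = merlin(x, y)                                # soft mask with entries in [0, 1]
            class_loss = F.cross_entropy(arthur(x * mask), y)  # convince the verifier of class y
            sparsity = mask.abs().mean()                       # L1 penalty keeps the certificate small
            return class_loss + l1_weight * sparsity
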
    Identifiers
