In collaboration with @ema-ridopoco.bsky.social Tommaso Carraro @paolomorettin.bsky.social @emilevankrieken.com @nolovedeeplearning.bsky.social @looselycorrect.bsky.social @andreapasserini.bsky.social
10.12.2024 19:10
@samubortolotti.bsky.social
Ph.D. student in Artificial Intelligence at the University of Trento.
Want to know more?
1️⃣ Learn more about RSs, including why they appear, their root causes, and how to mitigate them: arxiv.org/abs/2305.19951
2️⃣ Make NeSy models aware of their shortcuts: arxiv.org/abs/2402.12240
For other details regarding rsbench, datasets, and experiments, check the links below:
Website: unitn-sml.github.io/rsbench/
Paper: openreview.net/forum?id=5Vt...
GitHub: github.com/unitn-sml/rs...
Easy to set up and use!
1️⃣ Configurable: easily configured with YAML/JSON files.
2️⃣ Intuitive: straightforward to use.
8 challenging tasks, all with predefined settings.
3 new benchmarks:
- MNMath for arithmetic reasoning
- MNLogic for SAT-like problems
- SDD-OIA, a synthetic self-driving task!
They can all be made easier or harder with our data generator!
🧪 Test your models!
- Evaluate concepts in both in-distribution and out-of-distribution scenarios.
- Ground-truth concept annotations are available for all tasks.
- Visualize how your models handle different learning & reasoning tasks!
rsbench allows you to:
- Run algorithmic, logical, and high-stakes tasks with known reasoning shortcuts (RSs).
- Evaluate concept quality via F1, accuracy & concept collapse.
- Easily customize the tasks and count RSs a priori using our countrss tool!
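To make the concept-quality metrics above concrete, here is a minimal pure-Python sketch. It is not rsbench's own implementation: the function names are hypothetical, and the collapse score is a simple proxy assumed for illustration (one minus the ratio of distinct predicted concepts to distinct ground-truth concepts), so the definition used in the benchmark may differ.

```python
def accuracy(y_true, y_pred):
    """Fraction of concepts predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-concept F1 over the ground-truth concepts."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def concept_collapse(y_true, y_pred):
    """Assumed proxy: 0 means every concept is used, values near 1 mean
    the model squeezes many ground-truth concepts into a few predictions."""
    return max(0.0, 1 - len(set(y_pred)) / len(set(y_true)))

# Toy data: the model maps concept 2 onto 0 and concept 3 onto 1.
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]

print(accuracy(y_true, y_pred),
      round(macro_f1(y_true, y_pred), 3),
      concept_collapse(y_true, y_pred))  # 0.5 0.333 0.5
```

In this toy run the model uses only two of the four concepts, which shows up as a collapse score of 0.5 alongside the lower F1.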
🤔 What are reasoning shortcuts?
NeSy models might learn wrong concepts but still make perfect predictions!
Example: A self-driving car stops in front of a red traffic light (🚦🔴) or a pedestrian (🚶). Even if it confuses the two, it outputs the right prediction!
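The example can be checked by brute force. The sketch below assumes a toy rule, "stop if there is a red light or a pedestrian", with two binary concepts, and enumerates every deterministic mapping from true concepts to predicted concepts that still gives the correct stop/go label. All names are hypothetical and this is only an illustration of the idea, not the countrss tool.

```python
from itertools import product

def stop(red_light: int, pedestrian: int) -> int:
    """Toy knowledge: stop if there is a red light or a pedestrian."""
    return int(red_light or pedestrian)

# All ground-truth concept assignments (red_light, pedestrian).
WORLDS = list(product([0, 1], repeat=2))

def label_preserving_maps():
    """Enumerate every deterministic map from true to predicted concepts
    that still yields the correct stop/go label on all inputs."""
    maps = []
    for images in product(WORLDS, repeat=len(WORLDS)):
        alpha = dict(zip(WORLDS, images))
        if all(stop(*alpha[w]) == stop(*w) for w in WORLDS):
            maps.append(alpha)
    return maps

maps = label_preserving_maps()
identity = {w: w for w in WORLDS}
# Every label-preserving map other than the identity is a shortcut.
shortcuts = [m for m in maps if m != identity]

# The map that swaps red lights and pedestrians is one of the shortcuts.
swap = {(r, p): (p, r) for r, p in WORLDS}

print(len(maps), len(shortcuts), swap in shortcuts)  # 27 26 True
```

Under these assumptions, 26 of the 27 label-perfect mappings get at least one concept wrong, which is why label accuracy alone cannot reveal them.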
rsbench allows you to evaluate the concepts learned by:
1️⃣ Neuro-Symbolic models (#NeSy)
2️⃣ Concept Bottleneck Models (#CBMs)
3️⃣ Black-box Neural Networks (NNs*)
4️⃣ Vision-Language Models (#VLMs*)
* through post-hoc concept-based explanations (e.g., TCAV)
📣 Does your model learn high-quality #concepts, or does it learn a #shortcut?
Test it with our #NeurIPS2024 Datasets & Benchmarks Track paper!
rsbench: A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts
What's the deal with rsbench? 🧵
by @ema-ridopoco.bsky.social @looselycorrect.bsky.social @andreapasserini.bsky.social @samubortolotti.bsky.social 
proceedings.neurips.cc/paper_files/...
openreview.net/forum?id=pDc...
unitn-sml.github.io/rsbench/