Robustness Gym: Unifying the NLP Evaluation Landscape

Despite impressive performance on standard benchmarks, deep neural networks often fail when deployed in real-world systems because of distribution shifts, training artifacts, and noisy data. To address these vulnerabilities, we introduce Robustness Gym, a simple and extensible toolkit for robustness testing that supports the entire spectrum of evaluation methodologies, from adversarial attacks to rule-based data augmentations.



[Image: Robustness Gym screen capture]