
Evaluating Agent-Based Program Repair Using Modern Language Models

Evaluating Agent-based Program Repair at Google

Agent-based program repair offers to automatically resolve complex bugs end-to-end by combining the planning, tool use, and code generation abilities of modern LLMs. Recent work has explored the use of agent-based repair approaches on the popular open-source SWE-Bench, a collection of bugs from highly rated GitHub Python projects. In addition, various agentic approaches such as SWE-Agent have been proposed to solve bugs in this benchmark. This paper explores the viability of using an agentic approach to address bugs in an enterprise context. To investigate this, we curate an evaluation set of 178 bugs drawn from Google's issue tracking system. This dataset spans both human-reported (78) and machine-reported (100) bugs. To establish a repair performance baseline on this benchmark, we implement Passerine, an agent similar in spirit to SWE-Agent that can work within Google's development environment. We show that with 20 trajectory samples and Gemini 1.5 Pro, Passerine can produce a patch that passes bug tests (i.e., is plausible) for 73% of machine-reported and 25.6% of human-reported bugs in our evaluation set. After manual examination, we found that 43% of machine-reported bugs and 17.9% of human-reported bugs have at least one patch that is semantically equivalent to the ground-truth patch. These results establish a baseline on an industrially relevant benchmark, which, as we show, contains bugs drawn from a different distribution -- in terms of language diversity, size, and spread of changes -- compared to those in the popular SWE-Bench dataset.

This study evaluates how effectively agent-based program repair, powered by modern large language models (LLMs), can automatically fix complex bugs. It uses a dataset of 178 bugs from Google's issue tracking system, covering both human-reported and machine-reported issues. An agent called Passerine, built to work within Google's development environment, produced plausible (test-passing) patches for 73% of machine-reported bugs and 25.6% of human-reported ones. Manual review found that 43% of machine-reported and 17.9% of human-reported bugs had at least one patch semantically equivalent to the ground-truth fix. The work establishes a performance baseline for agent-based repair methods in an industrial context and shows that these bugs differ in character from those in the SWE-Bench dataset.

What is the main focus of the research?

The research focuses on evaluating the effectiveness of agent-based program repair methods in fixing bugs within Google's enterprise environment.

What were the success rates for bug repairs?

With 20 trajectory samples and Gemini 1.5 Pro, Passerine produced a plausible (test-passing) patch for 73% of machine-reported bugs and 25.6% of human-reported bugs; after manual review, 43% and 17.9% of those bugs, respectively, had at least one patch semantically equivalent to the ground-truth fix.
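
As a quick sanity check, these percentages can be converted to approximate absolute counts using the evaluation-set sizes reported in the abstract (100 machine-reported and 78 human-reported bugs); the figures below are a back-of-envelope reading, not counts taken directly from the paper.

```latex
% Approximate counts implied by the reported rates, assuming the
% 100 machine-reported / 78 human-reported split given in the abstract.
\[
0.73 \times 100 = 73 \quad \text{machine-reported bugs with a plausible patch}
\]
\[
0.256 \times 78 \approx 20 \quad \text{human-reported bugs with a plausible patch}
\]
\[
0.43 \times 100 = 43, \qquad 0.179 \times 78 \approx 14 \quad \text{bugs with a semantically equivalent patch}
\]
```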

How does this study compare to the SWE-Bench dataset?

This study highlights that the bugs in the Google dataset are drawn from a different distribution than those in SWE-Bench, differing in language diversity, size, and the spread of changes required for fixes.
