Using Nudges to Accelerate Code Reviews at Scale (Distinguished Paper Award)
We describe a large-scale study to reduce the amount of time code review takes. Each quarter at Meta we survey developers. Combining sentiment data from a developer experience survey with telemetry data from our diff review tool, we address the question, ``When does a diff review feel too slow?'' From the sentiment data alone, we learn that 84.7% of developers are satisfied with the time their diffs spend in review. By enriching the survey results with telemetry for each respondent, we determined that sentiment is closely associated with the 75th percentile time in review for that respondent's diffs, i.e., those that take more than 24 hours.
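The abstract does not spell out the enrichment procedure; the following is a minimal sketch, with made-up data and assumed column names (respondent_id, satisfied, hours_in_review), of joining per-respondent review telemetry onto survey responses and comparing satisfaction on either side of a 24-hour P75 threshold:

```python
import pandas as pd

# Made-up data; these column names are assumptions, not the paper's schema.
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "satisfied": [True, True, False],
})
telemetry = pd.DataFrame({
    "respondent_id": [1, 1, 2, 2, 3, 3],
    "hours_in_review": [2.0, 5.0, 10.0, 20.0, 30.0, 50.0],
})

# Per-respondent 75th percentile (P75) of time in review across their diffs.
p75 = (
    telemetry.groupby("respondent_id")["hours_in_review"]
    .quantile(0.75)
    .rename("p75_hours_in_review")
    .reset_index()
)

# Enrich each survey response with that respondent's P75 and compare
# satisfaction rates on either side of the 24-hour threshold.
enriched = survey.merge(p75, on="respondent_id")
enriched["p75_over_24h"] = enriched["p75_hours_in_review"] > 24
print(enriched.groupby("p75_over_24h")["satisfied"].mean())
```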
To encourage developers to act on stale diffs, i.e., those that have seen no action for 24 or more hours, we designed a NudgeBot to notify, i.e., nudge, reviewers. To determine whom to nudge when a diff is stale, we created a model that ranks reviewers by the probability that they will comment on or otherwise act on a diff. This model outperformed models based only on the files the reviewer had modified in the past. Combining this information with prior author-reviewer relationships, we achieved an MRR and AUC of 0.81 and 0.88, respectively.
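The reviewer-ranking model is not specified beyond its reliance on prior author-reviewer relationships and past file modifications, so the sketch below trains a generic classifier on synthetic features and evaluates it with the same metrics (AUC and MRR); the feature names and model choice are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# One row per (diff, candidate reviewer); y = 1 if that reviewer later acted.
# Features (past interactions, file overlap, recency) are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]
print("AUC:", roc_auc_score(y, scores))

def mean_reciprocal_rank(groups):
    """groups: list of (scores, labels) per diff; rank of the first acting reviewer."""
    rr = []
    for s, lab in groups:
        order = np.argsort(-np.asarray(s))
        ranked_labels = np.asarray(lab)[order]
        hits = np.nonzero(ranked_labels)[0]
        rr.append(1.0 / (hits[0] + 1) if len(hits) else 0.0)
    return float(np.mean(rr))

# Example: two diffs, three candidate reviewers each.
print("MRR:", mean_reciprocal_rank([
    ([0.9, 0.2, 0.1], [1, 0, 0]),   # acting reviewer ranked first  -> RR = 1.0
    ([0.3, 0.8, 0.1], [1, 0, 0]),   # acting reviewer ranked second -> RR = 0.5
]))
```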
To evaluate NudgeBot in production, we conducted an A/B cluster-randomized experiment on over 30k engineers. We observed a substantial, statistically significant decrease in both time in review (-6.8%, p=0.049) and time to first reviewer action (-9.9%, p=0.010). We also used guard metrics to ensure that most reviews were still completed in fewer than 24 hours and that reviewers still spent the same amount of time looking at diffs; we saw {\it no} statistically significant change in these metrics.
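As a hedged illustration of how such a cluster-randomized comparison can be analyzed (the paper's actual statistical procedure is not given in the abstract), the sketch below compares per-cluster means of time in review with a two-sample t-test on simulated data, respecting randomization at the cluster level rather than the diff level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def cluster_means(n_clusters, effect=0.0):
    # Simulated per-cluster mean time in review (hours), with an optional
    # multiplicative treatment effect; the distribution is an assumption.
    return rng.lognormal(mean=np.log(20), sigma=0.3, size=n_clusters) * (1 + effect)

control = cluster_means(150)
treatment = cluster_means(150, effect=-0.068)   # simulate a -6.8% shift

pct_change = (treatment.mean() - control.mean()) / control.mean() * 100
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"time in review change: {pct_change:.1f}%, p={p_value:.3f}")

# A guard metric (e.g., per-cluster share of diffs reviewed within 24 hours)
# would be compared the same way, expecting no statistically significant change.
```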
NudgeBot is now rolled out company-wide and is used daily by thousands of engineers at Meta.