cursor.com
We use a hybrid online-offline eval process to keep our understanding of model quality aligned with what developers actually do.
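One way to picture the alignment check is to ask how often an offline benchmark and an online usage signal rank a pair of models in the same order. The sketch below is illustrative only and is not a description of the actual process: the function name `pairwise_agreement`, the model names, the choice of acceptance rate as the online signal, and all numbers are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(offline, online):
    """Return the fraction of model pairs that both signals order the same way.

    `offline` and `online` map model name -> score (higher is better).
    Pairs tied on either signal are skipped. Returns None if no pair is comparable.
    """
    models = sorted(set(offline) & set(online))
    agree = 0
    total = 0
    for a, b in combinations(models, 2):
        off_diff = offline[a] - offline[b]
        on_diff = online[a] - online[b]
        if off_diff == 0 or on_diff == 0:
            continue  # a tie carries no ordering information
        total += 1
        if (off_diff > 0) == (on_diff > 0):
            agree += 1
    return agree / total if total else None


# Hypothetical data: offline benchmark scores and online acceptance rates.
offline_scores = {"model_a": 0.62, "model_b": 0.71, "model_c": 0.55}
online_accept_rate = {"model_a": 0.31, "model_b": 0.29, "model_c": 0.22}

agreement = pairwise_agreement(offline_scores, online_accept_rate)
print(f"pairwise agreement: {agreement:.2f}")
```

A low agreement value would suggest the offline suite has drifted from what developers actually do and may need new tasks. A high value would support continuing to use offline results for fast iteration.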