Eric, really nice work and very clearly written. And very cost efficient!

A few Qs, if you don’t mind.

  1. Did you test any text-only models (like OSS 20B)? I notice all three models you tested were multi-modal and am wondering how important that is.

  2. In the “Score-weighed program selection” column, does “No” mean you sampled uniformly (at random), while “Yes” means you greedily take the highest train accuracy, using pixel match as a tie breaker?

  3. In the library generation phase, you did one round. Was that also 5 programs per task?

  4. I suppose you have to execute the whole library on every task to get the scores, correct? It should be quick, but does that become a bottleneck?

  5. The score is determined first by train accuracy and then by pixel accuracy. I assume that train accuracy is nearly always zero in the first round on a given task, so pixel accuracy must be doing all of the heavy lifting? (I've sketched how I'm picturing #2 and #5 right after these questions.)
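To spell out how I'm reading #2 and #5, here's a minimal sketch of the scoring and selection as I imagine them. Everything here (the names, the types, the `pixel_accuracy` helper) is my own guess, not taken from your post:

```python
import random
from typing import Callable, List, Tuple

# Hypothetical types, just for the sketch.
Grid = List[List[int]]
Program = Callable[[Grid], Grid]

def pixel_accuracy(predicted: Grid, target: Grid) -> float:
    """Fraction of matching cells; 0.0 if the grid shapes differ."""
    if len(predicted) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(predicted, target)
    ):
        return 0.0
    cells = [(p, t) for p_row, t_row in zip(predicted, target)
             for p, t in zip(p_row, t_row)]
    return sum(p == t for p, t in cells) / len(cells)

def score(program: Program, train_pairs: List[Tuple[Grid, Grid]]) -> Tuple[float, float]:
    """Lexicographic score: exact-match train accuracy first, pixel accuracy as the tie breaker."""
    outputs = [program(inp) for inp, _ in train_pairs]
    train_acc = sum(out == tgt for out, (_, tgt) in zip(outputs, train_pairs)) / len(train_pairs)
    pixel_acc = sum(pixel_accuracy(out, tgt) for out, (_, tgt) in zip(outputs, train_pairs)) / len(train_pairs)
    return (train_acc, pixel_acc)

def select(library: List[Program], train_pairs: List[Tuple[Grid, Grid]],
           score_weighted: bool) -> Program:
    if score_weighted:
        # "Yes": greedily take the best-scoring program; Python compares the tuples
        # lexicographically, so train accuracy wins and pixel accuracy breaks ties.
        return max(library, key=lambda p: score(p, train_pairs))
    # "No": sample uniformly at random from the library.
    return random.choice(library)
```

In other words, I'm picturing the score as a lexicographic (train accuracy, pixel accuracy) pair, with "Yes" meaning a greedy argmax over that pair and "No" meaning a uniform draw from the library. Is that roughly right?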

Sep 18 at 9:07 AM
