Discussion
falcor84: The only thing missing is for the agents to publish and peer-review their research.
AlexCoventry: Wow, Gemini suggested a very similar experiment to me yesterday. Guess I know where it got the idea from, now. :-)
decker_dev: Interesting approach to constrain the search space: single-file modification plus a fixed 5-minute time budget makes experiments directly comparable. That's a smart design choice because the hardest part of automated ML research isn't running experiments, it's making results comparable across different changes.

The program.md as a "skill" file for the agent is essentially prompt engineering for research direction. Would be curious to see how sensitive the results are to how you phrase the research program: whether a vaguely specified program.md leads to more creative discoveries, versus a tightly specified one that converges faster but explores less.
abeppu: But the experiments it did that "improved" validation BPB in the GH screenshot were all basically hyperparameter changes, right? So is this better or worse, either per experiment or per unit time, than hyperparameter tuning techniques that don't involve an LLM? It's not clear from this whether the LLM is more or less making random changes which sometimes work, or whether the LLM's reasoning actually finds "good" changes because of what it has internalized. E.g. how does this compare to a hyperparameter tuning pass with e.g. BayesOpt that runs the same number of 5-min training experiments?
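The baseline abeppu suggests is easy to sketch: give a non-LLM tuner the same budget of fixed-length runs and compare best validation scores. A minimal random-search version (the simplest such baseline; BayesOpt would replace the sampling step with a surrogate model) might look like this. The `run_experiment` objective here is a hypothetical stand-in for one 5-minute training run, and the learning-rate range is an assumption, not anything from the original post:

```python
import random

def run_experiment(lr: float) -> float:
    # Hypothetical stand-in for one 5-minute training run: maps a
    # learning rate to a validation bits-per-byte score (lower is
    # better). A real version would launch a training job instead.
    return (lr - 0.003) ** 2 * 1e5 + 3.2

def random_search(n_trials: int, seed: int = 0) -> float:
    """Same budget of n_trials fixed-length runs, no LLM in the loop."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_trials):
        # Sample the learning rate log-uniformly over [1e-4, 1e-1].
        lr = 10 ** rng.uniform(-4, -1)
        best = min(best, run_experiment(lr))
    return best

if __name__ == "__main__":
    print(f"best val BPB over 20 trials: {random_search(20):.4f}")
```

If an LLM-driven loop can't beat this baseline at an equal number of trials, the "thinking" isn't buying anything over random sampling of the same knobs.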
kubb: He's burning Claude tokens to slightly improve his tiny and not very capable LLM? It's fun, I bet, but wake me up when it leads to a research breakthrough.
ting0: That's a great idea.