Discussion
Research-Driven Agents: What Happens When Your Agent Reads Before It Codes
hungryhobbit: I think anyone who uses Claude knows that it works smarter when you have it make a plan first and ask it to research the existing code as much as possible, so the results in this article don't surprise me at all. However, I'd be curious to hear back from others who have tried adding the shell script (at the end of the article) to their flow: does it (really) improve Claude?
phendrenad2: This is obvious, right? If you want to build a Facebook clone, you wouldn't tell the agent "build Facebook". You would provide it with a description of every page on Facebook, behaviors, interactions, UI, etc.
faeyanpiraat: Have you even read the TL;DR in the linked article??
simlevesque: I've been making skills from arXiv papers for a while. I have one for multi-object tracking, for example. It has a SKILL.md describing all the important papers (over 30) on the subject and a folder with each paper's full content as reStructuredText.

To feed arXiv papers to LLMs, I found that RST gives the best token count/fidelity ratio: Markdown lacks precision, and LaTeX is too verbose. I have a script with the papers' URLs, names, and dates that downloads the LaTeX zips from arXiv, extracts them, transforms them to RST, and adds them to the right folder. Then I ask an LLM to make a summary from the full text, then I give other LLMs the full paper again with the summary and ask them to improve and proofread it. While this goes on I read the papers myself, and at the end I read the summaries; if I approve them, I add them to the skill. I also add, for each paper, info on how well the algorithms described do in common benchmarks.

I highly recommend doing something similar if you're working in a cutting-edge domain. Also, I'd like to know if anyone has recommendations to improve what I do.
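A minimal sketch of the download-and-convert step described above, assuming pandoc is installed and that arXiv serves LaTeX source tarballs at `/e-print/<id>`; the function names, the main-file heuristic, and the folder layout are all illustrative, not the commenter's actual script:

```python
# Hypothetical arXiv -> RST ingestion step. Assumes pandoc is on PATH;
# helper names and the "main .tex" heuristic are illustrative guesses.
import subprocess
import tarfile
import urllib.request
from pathlib import Path

def eprint_url(arxiv_id: str) -> str:
    """arXiv serves the LaTeX source archive at /e-print/<id>."""
    return f"https://arxiv.org/e-print/{arxiv_id}"

def fetch_and_convert(arxiv_id: str, dest_dir: Path) -> Path:
    """Download a paper's LaTeX source, extract it, convert to RST."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    tar_path = dest_dir / f"{arxiv_id}.tar.gz"
    urllib.request.urlretrieve(eprint_url(arxiv_id), tar_path)

    src_dir = dest_dir / arxiv_id
    with tarfile.open(tar_path) as tar:
        tar.extractall(src_dir)

    # Heuristic: the main file is usually the .tex containing \documentclass.
    main_tex = next(p for p in src_dir.glob("*.tex")
                    if "\\documentclass" in p.read_text(errors="ignore"))

    rst_path = dest_dir / f"{arxiv_id}.rst"
    subprocess.run(["pandoc", "-f", "latex", "-t", "rst",
                    str(main_tex), "-o", str(rst_path)], check=True)
    return rst_path
```

Multi-file papers (with `\input`/`\include`) would need the parts concatenated or pandoc pointed at each file, so treat this as a starting point only.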
paulluuk: This sounds like it would work, but honestly, if you've already read all 30 papers fully, what do you still need the LLM to do for you? Just the boilerplate?
MrLeap: What is RST?
outside1234: A research step (gather insights from across the codebase and the internet on how to accomplish the next step), a planning step (how should I sequence the implementation given that research), an implementation step, and a verification step (code review of the implementation) is a super effective workflow for me.
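The four steps above can be sketched as a simple prompt chain, where each step's output feeds the next; the prompts and the `agent` callable (prompt in, text out) are illustrative assumptions, not any particular tool's API:

```python
# Hypothetical sketch of the research -> plan -> implement -> verify loop.
# `agent` is any callable that takes a prompt string and returns text.
def run_workflow(task: str, agent) -> str:
    research = agent(f"Research: find relevant code and docs for: {task}")
    plan = agent(f"Plan: sequence the implementation of '{task}' "
                 f"given this research:\n{research}")
    impl = agent(f"Implement: carry out this plan:\n{plan}")
    review = agent(f"Verify: code-review this implementation:\n{impl}")
    return review
```

In practice each call would go to a fresh agent context so the later steps aren't polluted by the earlier steps' scratch work.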
alex000kim: Yup, as the blog says:

> The full setup works with any project that has a benchmark and test suite.

So having a clear and measurable verification step is key. Meaning you can't simply give an AI agent a vague goal, e.g. "improve the quality of the codebase", because it's too general.
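The "measurable verification step" boils down to a simple acceptance gate; a minimal sketch, assuming you can run the test suite and benchmark and get back a pass flag and a score (names illustrative):

```python
# Hypothetical acceptance gate for an agent's change: keep it only if the
# test suite passes AND the benchmark score did not regress. The inputs
# (tests_passed, new_score, baseline) are assumed to come from your own
# test runner and benchmark harness.
def accept_change(tests_passed: bool, new_score: float,
                  baseline: float) -> bool:
    return tests_passed and new_score >= baseline
```

A vague goal like "improve quality" gives you no `baseline` to compare against, which is exactly why it fails here.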