Discussion
cluckindan: Nice. Now, to vibe myself a locally hosted alternative.
marconardus: It might be worthwhile to include some of an example run in your README. I scrolled through and didn't see enough to justify installing and running the thing.
vidarh: I was about to say they have a self-hosting guide, but I see they use third-party services that seem absolutely pointless for such a tiny dataset. For comparison, I have a project that happily analyzes 150 million tokens worth of Claude session data with some basic caching in plain text files on a $300 mini PC in seconds... If/when I reach billions, I might throw SQLite into the stack. Maybe once I reach tens of billions, something bigger will be worthwhile.
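The plain-text-plus-caching approach vidarh describes could look something like this minimal sketch: scan JSONL session logs once, cache per-file token counts keyed by modification time, and only re-read files that changed. The `tokens` field name and JSONL layout are assumptions for illustration, not the actual Claude session log format.

```python
# Hypothetical sketch: token counting over plain-text session logs
# with a basic mtime-keyed cache, so unchanged files are never re-read.
import json
import os


def cached_token_count(path, cache):
    """Return the token count for one session file, reusing the cache
    when the file has not been modified since it was last counted."""
    mtime = os.path.getmtime(path)
    entry = cache.get(path)
    if entry and entry["mtime"] == mtime:
        return entry["tokens"]  # cache hit: skip re-reading the file
    tokens = 0
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            # "tokens" is an assumed field name; real logs may differ
            tokens += rec.get("tokens", 0)
    cache[path] = {"mtime": mtime, "tokens": tokens}
    return tokens
```

The cache here is just a dict, but persisting it as another plain text file (e.g. JSON on disk) keeps the whole stack dependency-free until the dataset actually outgrows it.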
keks0r: There is also a docker setup in there to run everything locally.
lau_chan: Does it work for Codex?
keks0r: Yes, we added Codex support, but it's not yet extensively tested. Session upload works, but we still have to QA all the analytics extraction.
ekropotin: > That's it. Your Claude Code sessions will now be uploaded automatically.

No, thanks.
keks0r: It will only be enabled for the repo where you called the `enable` command. Or use the CLI `upload` command for specific sessions. Or you can run your own instance, but we'll need to add docs on how to configure the endpoint properly in the CLI.
emehex: Claude Code comes with a built-in /insights command
mrothroc: Nice, I've been working on the same problem from a different direction. Instead of analyzing sessions after the fact, I built a pipeline that structures them: stages (plan, design, code, review, same as you'd have with humans) with gates in between.

The gates categorize issues as auto-fix or human-review. Auto-fix issues get sent back to the coding agent, which re-reviews, and only the hard stuff makes it to me. That structure took me from about 73% first-pass acceptance to over 90%.

What I've been focused on lately is figuring out which gates actually earn their keep and which ones overlap with each other. The session-level analytics you're building would be useful on top of this; I don't have great visibility into token usage or timing per stage right now.

I wrote up the analysis: https://michael.roth.rocks/research/543-hours/

I also open sourced my log analysis tools: https://github.com/mrothroc/claude-code-log-analyzer
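The gate triage mrothroc describes might be sketched like this. The issue kinds and the set of auto-fixable categories are hypothetical placeholders, not taken from the actual pipeline:

```python
# Hypothetical sketch of a review gate that triages findings:
# mechanical issues go back to the coding agent for an auto-fix pass,
# everything else escalates to a human reviewer.

AUTO_FIXABLE = {"lint", "formatting", "typo", "missing_test"}  # assumed categories


def triage(issues):
    """Split gate findings into auto-fix and human-review buckets."""
    auto = [i for i in issues if i["kind"] in AUTO_FIXABLE]
    human = [i for i in issues if i["kind"] not in AUTO_FIXABLE]
    return auto, human


def run_gate(stage, issues):
    """Return what a gate sends back to the agent vs. escalates to a human."""
    auto, human = triage(issues)
    return {"stage": stage, "auto_fix": auto, "human_review": human}


result = run_gate("review", [
    {"kind": "lint", "msg": "unused import"},
    {"kind": "design", "msg": "wrong abstraction for the cache layer"},
])
```

The interesting tuning knob is the `AUTO_FIXABLE` set itself: widening it raises first-pass acceptance but risks the agent "fixing" things that actually needed human judgment.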
keks0r: This is great. How are you identifying these stages in the session? Or is it just different slash commands / skills per stage? If it's something generic enough, maybe we can build the analysis in so it works for your use case. Otherwise feel free to fork the repo and add your additional analysis. Let me know if you need help.
blef: Reminds me of https://www.agentsview.io/.
mihir_kanzariya: The 26% abandonment rate with most dropping in the first 60 seconds is interesting but not surprising. I've noticed the same pattern with my own usage. If the initial prompt doesn't land right or the agent starts going down a wrong path early, it's way faster to just kill it and rephrase than to try correcting mid-session.

The error cascade finding in the first 2 minutes tracks with this too. Once the agent misunderstands the task scope or makes a bad architectural choice early, everything downstream compounds. I've started being much more specific in my initial prompts because of this, almost like writing a mini spec before hitting enter.

Curious if the data shows any correlation between prompt length/specificity and session success rate. That would be really useful for figuring out the sweet spot between "too vague" and "over-constraining the agent".
dmix: I've seen Claude ignore important parts of skills/agent files multiple times. I was running a cleanup SKILL.md on a hundred markdown files, manually in small groups of 5, and about half the time it listened and ran the skill as written. The other half it would spend two minutes trying to understand the codebase, looking for markdown stuff for no good reason, before reverting to what the skill said.

LLMs are far from consistent.
Aurornis: > 26% of sessions are abandoned, most within the first 60 seconds

Starting new sessions frequently and using separate new sessions for small tasks is good practice. Keeping context clean and focused is a highly effective way to keep the agent on task. Having an up-to-date AGENTS.md should let new sessions get into simple tasks quickly, so you can use single-purpose sessions for small tasks without carrying the baggage of a long past context into them.
keks0r: Yes, we had to tune the claude.md and the skill trigger quite a bit to get it working much better. But to be honest, 4.6 also improved it quite a bit. Did you run into your issues under 4.5 or 4.6?
dmix: I was using Sonnet 4.6 since it was a menial task
evrendom: True, the best results come when you use Claude Code and Codex as a tag team.