Discussion
dev_tools_lab: Nice use of native video embedding. How do you handle cases where Gemini's response confidence is low? Do you have a fallback or threshold?
klntsky: Why not skip the text conversion? Is it usable at all?
sohamrj: Gemini Embedding 2 converts video directly to vectors. In this case, dashcam clips don't have audio to transcribe, and even if they did, a transcript would be useless for the search.
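This is a minimal sketch of the search step. It assumes the clips and the text query have already been embedded into the same vector space. The thread doesn't show how the author calls Gemini Embedding 2, so no API call is included. The vectors below are made-up 3-dimensional placeholders; real embeddings have many more dimensions. `search_clips` and its optional `min_score` cutoff are hypothetical names. The cutoff is one possible way to implement a similarity threshold like the one dev_tools_lab asks about. The author does not describe using one.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_clips(
    query_vec: Vector,
    clip_index: Dict[str, Vector],
    top_k: int = 3,
    min_score: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Rank clips by cosine similarity to the query (hypothetical helper).

    If min_score is set, matches scoring below it are dropped, so a weak
    query can return fewer than top_k results (possibly none).
    """
    scored = [(clip_id, cosine_similarity(query_vec, vec)) for clip_id, vec in clip_index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    if min_score is not None:
        scored = [item for item in scored if item[1] >= min_score]
    return scored[:top_k]


# Toy vectors standing in for real video embeddings (hypothetical values).
clip_index = {
    "clip_001.mp4": [0.9, 0.1, 0.0],
    "clip_002.mp4": [0.1, 0.9, 0.1],
    "clip_003.mp4": [0.7, 0.3, 0.1],
}
# Stand-in for the embedding of a text query such as "car cuts in front".
query_vec = [1.0, 0.0, 0.0]

results = search_clips(query_vec, clip_index, top_k=2, min_score=0.5)
print(results)
```

The design works because the query and the clips share one embedding space. No intermediate text is needed, and search reduces to nearest-neighbour ranking. A brute-force scan like this is fine for a small clip library. A larger library would typically use a vector index instead.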
Aeroi: Very cool. Does anybody see obvious use cases for this?
hebelehubele: State surveillance
SpaceManNabs: > No transcription, no frame captioning, no intermediate text.
If there is text on the video (like a caption or whatever), will the embedding capture that? I never thought about this before. If the video has audio, does the embedding capture that too?
sohamrj: Yes to both. The embedding is over raw video frames, so anything visible (text, signs, captions) gets captured in the vector. And Gemini Embedding 2 extracts the audio track and embeds it alongside the visual frames. So a query like 'someone yelling' would theoretically match on audio. My dashcam footage doesn't have audio though, so I haven't tested that side yet.