Discussion
Meryll Dindin
truelson: Just going to say it... no mention of handling the security aspects of this. None. Scary.
mritchie712: > The data infrastructure underneath it took two years. Yep, that's what Definite is for: https://www.definite.app/. All the data infra (datalake + ELT/ETL + dashboards) you need in 5 minutes.
RobRivera: If I order now, do I get a second set for free?
georgeburdell: The article mentions that there’s an identification process and that at least some data has access control. What were you expecting?
Aperocky: Off meta: Are we tired of the "one single product that solves everything" pitch that every single AI product has become? grep didn't try to become awk, and jq and curl did exactly what they needed to do without wanting to become an OS (looking at you, emacs). Can we have that in the AI world? I think we will, in a few years, once this century's iteration of the FSF catches up.
ricktdotorg: is it just me or was the scrollbar purposefully hidden on this site? in chrome on windows, i found it very jarring and user-hostile to NOT know how far along i was in reading the article. i make a judgement call early on: is this worth my time? my whole article calculation algo was thrown off by this. do not like.
iLoveOncall: Developing and serving GenAI models is highly unprofitable, so, no, we're not going to have that in the AI world.Either those model developers & providers package them in as many services as possible so that they can be somewhat profitable, or they die, and we don't have model developers & providers anymore.
pwr1: We tried something similar at a previous company — ended up with 3 different bots all answering slightly differently depending on which doc chunk they hit. The consistency problem is real. Curious how you handle updates. Like if someone edits the source doc, does the bot just start returning different answers, or is there a review step?
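One minimal answer to pwr1's staleness question is to fingerprint each source document and only serve chunks whose fingerprint matches the live version; edited docs get flagged for re-indexing (or human review) instead of silently changing answers. A hedged sketch of that pattern, with hypothetical names (`index`, `needs_reindex`) that are not from the original post:

```python
import hashlib

# index maps doc_id -> content hash of the version the bot's chunks were built from.
index = {}

def needs_reindex(doc_id, current_text):
    """Return True if the source doc changed since it was last chunked.

    Callers would queue changed docs for re-embedding / review rather than
    letting the bot keep answering from stale chunks.
    """
    h = hashlib.sha256(current_text.encode("utf-8")).hexdigest()
    if index.get(doc_id) == h:
        return False  # bot is answering from the current version
    index[doc_id] = h  # record the new version; downstream: re-chunk + review
    return True

print(needs_reindex("onboarding.md", "v1 text"))  # True: first sighting
print(needs_reindex("onboarding.md", "v1 text"))  # False: unchanged
print(needs_reindex("onboarding.md", "v2 text"))  # True: doc was edited
```

Whether the re-index is automatic or gated behind a review step is a policy choice; the hash check just makes the change visible.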
Aperocky: Well, the product here has nothing to do with serving GenAI models. It's now application territory.And I prefer unix philosophy vs. the Copilot product approach.
add-sub-mul-div: I'm tired of endless LLM spam submissions from people who only use their accounts here to advertise and self promote.
Aperocky: LLM submissions are no different from the tech submissions of yesterday. But most people used to build tools that do one thing well instead of whatever the current meta is.
meryll_dindin: We're a 30-person ed-tech company. I built a Slack bot that connects our data warehouse, 250k Google Drive files, support tickets, and codebase so anyone on the team can ask it a question and get a sourced answer back. The bot took two and a half weeks to build; the data infrastructure under it took two years. Wrote up the architecture, where trust breaks down, and what I'd build first if starting over.
mannanj: Hi, thanks for sharing. One thing I'd like to know is how often you validate the answers. If a human gave an answer like the one the AI is giving, you'd probably expect a margin of error of around 1%. For the AI, is it 1% or less, and who's validating it? Are you trusting it more or less than a human?
oliver236: data engineering is all you need. everything else is smoke. all ai applications are smoke and will be obsolete in a year. do not be deceived.
xmprt: > When a question touches restricted data — student PII, sensitive HR information — the agent doesn’t just refuse. It explains what it can’t access and proposes a safe reformulation. "I can’t show individual student names, but here’s the same analysis using anonymized IDs."This part is scary. It implies that if I'm in a department that shouldn't have access to this data, the AI will still run the query for me and then do some post-processing to "anonymize" the data. This isn't how security is supposed to work... did we learn nothing from SQL injection?
thunfischbrot: In the strongest interpretation of that, it would offer only data which the user is allowed to access. Why do you assume that, having implemented a feature to prevent PII from being accessed, they would then turn around and return data the user is not supposed to access?
xmprt: If it's PII data, the best thing for them to do is not even allow the AI to have access to it. They're admitting to that, so I doubt they've gone through the effort of forwarding the user's auth token to the downstream database. And with security it's always best to assume the worst case (unless you're certain that something is safe), because that leads you to add more safeguards rather than fewer.
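The distinction xmprt is drawing can be made concrete: enforce the permission check before the agent ever executes a query, rather than fetching restricted data and anonymizing it after the fact. A minimal sketch of that deny-up-front pattern; the table names, scopes, and `user_scopes` lookup are all hypothetical stand-ins (a real system would check against an IdP or ACL store, ideally using the requesting user's own credentials downstream):

```python
# Hypothetical mapping of sensitive tables to the scope required to read them.
RESTRICTED_TABLES = {"students": "student_pii", "hr_reviews": "hr_sensitive"}

def user_scopes(user_id):
    # Stand-in for a real permission lookup; hardcoded for illustration.
    return {"alice": {"student_pii"}, "bob": set()}.get(user_id, set())

def authorize_query(user_id, tables):
    """Refuse BEFORE execution if any table needs a scope the user lacks.

    The point is that denied queries never touch the data, so there is
    nothing to 'anonymize' in post-processing.
    """
    missing = [t for t in tables
               if t in RESTRICTED_TABLES
               and RESTRICTED_TABLES[t] not in user_scopes(user_id)]
    if missing:
        return {"allowed": False, "reason": f"no access to: {missing}"}
    return {"allowed": True}

print(authorize_query("bob", ["students"]))    # denied before any data is read
print(authorize_query("alice", ["students"]))  # allowed: has student_pii scope
```

Post-hoc anonymization can still be a feature on top of this, but only for users who were already authorized to run the underlying query.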
jedberg: The thing that AI is best at is summarizing vast quantities of information. That means the most natural thing for an AI to do is be "the one tool to rule them all". The more information it has access to, the more useful the answer can be. But that also means that it can answer all the questions.
skeeter2020: >> The thing that AI is best at is summarizing vast quantities of information. By definition, though, a summary is the best at nothing, and the mentality that the best way to rule is from a single summarized interpretation is both flawed and scary. It's not answering all questions; it's attempting to provide a single summation dramatically influenced by training. Go ahead and incorporate this into your balanced and multi-perspective decision-making process, but "one tool to rule them all" is not the same thing and definitely not what we're getting.
saltcured: "If all you have is an LLM, every problem looks like summarizing information."Emphasis on looks like ;-)