Discussion
The Future of Everything is Lies, I Guess
bensyverson: I get the frustration, but it's reductive to just call LLMs "bullshit machines" as if the models are not improving. The current flagship models are not perfect, but if you use GPT-2 for a few minutes, it's incredible how much the industry has progressed in seven years.

It's true that people don't have a good intuitive sense of what the models are good or bad at (see: counting the Rs in "strawberry"), but this is more a human limitation than a fundamental problem with the technology.
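For readers who haven't seen the "strawberry" example: the usual explanation is that models consume tokens, not characters, so the individual letters are never directly visible to them. A minimal sketch using OpenAI's tiktoken package; the exact token split printed is illustrative and varies by tokenizer.

```python
# Tokenize "strawberry" to show what the model actually sees.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("strawberry")
print(ids)                             # a short list of integer token IDs
print([enc.decode([i]) for i in ids])  # e.g. ['str', 'aw', 'berry']
# The three 'r's never appear as separate symbols in the model's input,
# which is one plausible reason letter-counting goes wrong.
```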
zdragnar: That's not why the author calls them bullshit machines.

> One way to understand an LLM is as an improv machine. It takes a stream of tokens, like a conversation, and says "yes, and then…" This yes-and behavior is why some people call LLMs bullshit machines. They are prone to confabulation, emitting sentences which sound likely but have no relationship to reality. They treat sarcasm and fantasy credulously, misunderstand context clues, and tell people to put glue on pizza.

Yes, there have been improvements to them, but none of those improvements mitigate the core flaw of the technology.
Arainach: Whether LLMs can create correct content doesn't matter. We've already seen how they are being used and will be used.

Fake content and lies. To drive outrage. To influence elections. To distract from real crimes. To overload everyone so they're too tired to fight or to understand. To weaken the concept that anything's true so that you can say anything. Because who cares if the world dies as long as you made lots of money on the way.
ajross: > it's reductive to just call LLMs "bullshit machines" as if the models are not improving

This is true, but I prefer to think of it as "it's delusional to pretend as if human beings are not bullshit machines too."

Lies are all we have. Our internal monologue is almost 100% fantasy. Even in serious pursuits, that's how it works. We make shit up and lie to ourselves, and then only later apply our hard-earned[1] skill prompts to figure out whether or not we're right about it.

How many times have the nerds here been thinking through a great new idea for a design and how clever it would be, before stopping to realize "Oh wait, that won't work because of XXX, which I forgot"? That's a hallucination right there!

[1] Decades of education!
iamjackg: The problem, unfortunately, is the scale. It's always scale. Humans make all the kinds of mistakes that we ascribe to LLMs, but LLMs can make them much faster and at much larger scale.

Models have gotten ridiculously better, they really have, but the scale has increased too, and I don't think we're ready to deal with the onslaught.
4ndrewl: It doesn't matter how good the models become. They can only deal in bullshit, in the academic use of the term.
josefritzishere: I appreciate the directness of calling LLMs "bullshit machines." This terminology for LLMs is well established in academic circles and is much easier for laypeople to understand than terms like "non-deterministic." I personally don't like the excessive hype around the capabilities of AI. Setting realistic expectations will drive better product adoption than carpet-bombing users with marketing.
AStrangeMorrow: I still have mixed feelings about LLMs.

If I take the example of code (though this extends to many domains), it can sometimes produce near-perfect architecture and implementation if I give it enough detail about the technical specifics and pitfalls, turning an 8h coding job into 1h of review work.

On the other hand, it can be very wrong while acting certain it is right. Just yesterday Claude tried gaslighting me into accepting that the bug I was seeing came from a piece of code with already strong guardrails, and it was adamant that the part I suspected could in no way cause the issue. Turns out I was right, but I was starting to doubt myself.
dwallin: Some people point at LLMs confabulating, as if this wasn't something humans are already widely known for doing.

I consider it highly plausible that confabulation is inherent to scaling intelligence. In order to run computation on data that is computationally infeasible due to its dimensionality, you will most likely need to create a lower-dimensional representation and do the computation on that. Collapsing the dimensionality is going to be lossy, which means it will have gaps between what it thinks is the reality and what is.
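The lossy-compression point can be made concrete with a toy dimensionality reduction; a numpy sketch with illustrative numbers, not a claim about how any particular LLM is built:

```python
# Compress 100-dimensional data into a 10-dimensional representation
# (top principal components via SVD), reconstruct, and measure the gap.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100))        # 100-dimensional "reality"

X_c = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_c, full_matrices=False)
Z = X_c @ Vt[:10].T                     # 10-dimensional representation
X_hat = Z @ Vt[:10] + X.mean(axis=0)    # reconstruction back in 100 dims

err = np.mean((X - X_hat) ** 2)
print(f"mean squared reconstruction error: {err:.3f}")
# The error is nonzero by construction: the low-dimensional model cannot
# represent everything about the original, which is the "gap" above.
```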
the_snooze: Two things can be true at the same time: the technology has improved, and the technology in its current state still isn't fit for purpose.

I stress test commercially deployed LLMs like Gemini and Claude with trivial tasks: sports trivia, fixing recipes, explaining board game rules, etc. It works well like 95% of the time. That's fine for inconsequential things. But you'd have to be deeply irresponsible to accept that kind of error rate on things that actually matter.

The most intellectually honest way to evaluate these things is how they behave now on real tasks. Not with some unfalsifiable appeal to the future of "oh, they'll fix it."
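Back-of-the-envelope arithmetic on why a 95% hit rate reads so differently at scale, assuming independent errors for simplicity:

```python
# A 5% per-task error rate compounds quickly when tasks chain together,
# and turns into a steady stream of failures at volume.
per_task_success = 0.95

for n in (1, 10, 50):
    print(f"P(all {n} tasks correct) = {per_task_success ** n:.2%}")
# 1 task:  95.00%
# 10 tasks: 59.87%
# 50 tasks:  7.69%

queries_per_day = 1_000_000
errors = queries_per_day * (1 - per_task_success)
print(f"expected errors per day at a million queries: {errors:,.0f}")  # 50,000
```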
bensyverson: > the technology in its current state still isn't fit for purpose.

This is a broad statement that assumes we agree on the purpose. For my purpose, which is software development, the technology has reached a level that is entirely adequate.

Meanwhile, sports trivia represents a stress test of the model's memorized world knowledge. It could work really well if you give the model a tool to look up factual information in a structured database. But this is exactly what I meant above: using the technology in a suboptimal way is a human problem, not a model problem.
p_stuart82: models are improving. the pricing already assumes they're ready for prod. that's where the fires start
Kuyawa: And the past too, if we've been paying attention
floren: Six months bro, we're still so early
perching_aix: This is like all the usual anti-LLM talking points fused together. Doesn't it get boring?

I like using these models a lot more than I like hearing people talk about them, pro or contra. Just slop about slop. And the discussions being artisanal slop really doesn't make them any better.

Every time I hear some variation of "bullshitting machines" or "plagiarizing machines," my eyes roll. Do these people think they're actually onto something? I've been seeing these talking points for literal years.
gdulli: Computer graphics have been improving for decades but the uncanny valley remains undefeated. I don't know why anyone expects a breakthrough in other areas. There's a wall we hit and we don't understand our own consciousness and effectiveness well enough to replicate it.
PaulKeeble: In computer graphics we understand how it works; we just lack the computational power to do it in real time, but with sufficient processing we can produce realistic-looking images with physically accurate lighting. When it comes to cognition, it's a lot of guesswork: we haven't yet mapped out the neuron connections in a brain, and we haven't validated that it works the way popular science writing suggests. We don't understand intelligence, so all we can do is accidentally bumble into it, and that seems unlikely to just happen, especially when it's so hard to compute what we are already doing.
n4r9: The concern for me about LLMs confabulating is not that humans don't do it. It's that the massive scale at which LLMs will inevitably be deployed makes even the smallest confabulation extremely risky.
the_snooze: There's nothing in these models that says their purpose is software development. Their design and affordances scream out "use me for anything." The marketing certainly matches that, so do the UIs, so do the behaviors. So I take them at their word, and I see that failure modes are shockingly common even under regular use. I'm not out to break these things at all. I'm being as charitable and empirical as I can reasonably be.

If the purpose is indeed software development with review, then there's nothing stopping multi-billion-dollar companies from putting friction into these systems to direct users toward where the system is at its strongest.
AIorNot: Yes, see Karl Friston's free energy principle: https://www.nature.com/articles/nrn2787
masfuerte: Why do you insist on reading and commenting on these articles that bore you so much?
simianwords: If it bullshits so much, you wouldn't have a problem giving me an example of it bullshitting on ChatGPT (paid version), would you? Let's take any example of a text prompt fitting a few pages: it may be a question in science or math or any domain. Can you get it to bullshit?
nradov: Which things actually matter? I think we can all agree that an LLM isn't fit for purpose to control a nuclear power plant or fly a commercial airliner. But there's a huge spectrum of things below that. If an LLM trading error causes some hedge fund to fail then so what? It's only money.
_dwt: I have a question for all the "humans make those mistakes too" people in this thread, and elsewhere: have you ever read, or at least skimmed a summary of, "The Origin of Consciousness in the Breakdown of the Bicameral Mind"? Did you say "yeah, that sounds right"? Do you feel that your consciousness is primarily a linguistic phenomenon?

I am not trying to be snarky; I used to think that intelligence was intrinsically tied to, or perhaps identical with, language, and found deep and esoteric meaning in religious texts related to this (i.e. "in the beginning was the Word"; logos as soul as language-virus riding on meat substrate).

The last ~three years of LLM deployment have disabused me of this notion almost entirely, and I don't mean in a "God of the gaps" last-resort sort of way. I mean: I see the output of a purely language-based "intelligence," and while I agree humans can make similar mistakes/confabulations, I overwhelmingly feel that there is no "there" there. Even the dumbest human has a continuity, a theory of the world, an "object permanence"... I'm struggling to find the right description, but I believe there is more to intelligence than language manipulation.

(I know this is tangential to the article, which is excellent, as the author's usually are; I admire his restraint. However, I see exemplars of this take all over the thread, so: why not here?)
simianwords: > I stress test commercially deployed LLMs like Gemini and Claude with trivial tasks: sports trivia, fixing recipes, explaining board game rules, etc. It works well like 95% of the time. That's fine for inconsequential things. But you'd have to be deeply irresponsible to accept that kind of error rate on things that actually matter.

95% is not my experience and frankly dishonest. I have ChatGPT open right now; can you give me examples where it doesn't work but some other source may have gotten it correct? I have tested it against a lot of examples, and it barely gets anything wrong with a text prompt that fits a few pages.

> The most intellectually honest way to evaluate these things is how they behave now on real tasks

A falsifiable way is to see how it is used in real life. There are loads of serious enterprise projects that are mostly done by LLMs. Almost all companies use AI. Either they are irresponsible or you are exaggerating.

Let's be actually intellectually honest here.
xandrius: It feels like you probably went too deep into the LLM bandwagon.

An LLM is a statistical next-token machine trained on all the stuff people wrote/said. It blends texts together in a way that still makes sense (or no sense at all).

Imagine you made a super simple program which would answer yes/no to any question by generating a random number. It would get things right 50% of the time. You can then fine-tune it to say yes more often to certain keywords and no to others. Just with a bunch of hardcoded paths you'd probably fool someone into thinking that this AI has superhuman predictive capabilities.

This is what it feels like is happening. Sure, it's not that simple, but you can code a base GPT in an afternoon.
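The toy yes/no program described above, sketched literally; the keyword lists are made up purely for illustration:

```python
# A literal sketch of the toy "oracle": answer at random, then hard-code
# a bias for certain keywords. The hint sets are hypothetical.
import random

YES_HINTS = {"safe", "healthy", "popular"}    # hypothetical
NO_HINTS = {"dangerous", "illegal", "rare"}   # hypothetical

def toy_oracle(question: str) -> str:
    words = set(question.lower().split())
    if words & YES_HINTS:
        return "yes"
    if words & NO_HINTS:
        return "no"
    return random.choice(["yes", "no"])       # coin flip: right ~50% of the time

print(toy_oracle("Is broccoli healthy?"))     # "yes" via the hardcoded path
print(toy_oracle("Will it rain tomorrow?"))   # coin flip
```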
simianwords: If it were not "just a statistical next token machine," how differently would it behave?

Can you find an example and test it out?
beders: Thank you for putting it so succinctly.

I keep explaining to my peers, friends, and family that what actually happens inside an LLM has nothing to do with consciousness or agency, and that the term AI is just completely overloaded right now.
LogicFailsMe: Old and stupid hot take, IMO. I want back the time I put into perusing this. Even the scale of LLMs is puny next to the scale of lying humans, and the sheer impact one compulsively lying human can have, given that we love to be led by confidently wrong narcissists. I mean, if that isn't obvious by now, I guess it never will be. The Vogon constructor fleet is way overdue in my book.
stavros: I think there are two types of discussions when it comes to LLMs: some people talk about whether LLMs are "human," and some people talk about whether LLMs are "useful" (i.e., they perform specific cognitive tasks at least as well as humans).

Both of those aspects are called "intelligence," and thus these two groups cannot understand each other.
throwaway27448: Humans can be reasoned with, though, and are capable of learning.
danieltanfh95: I think the discussion has to be more nuanced than this. "LLMs still can't do X so it's an idiot" is a bad line of thought. LLMs with harnesses are clearly capable of engaging with logical problems that only need text. LLMs are not there yet with images, but we are improving with UIs and access to tools like Figma. LLMs are clearly unable to propose new, creative solutions to problems they have never seen before.
throwaway27448: > LLMs with harnesses are clearly capable of engaging with logical problems that only need text.

To some extent. It's not clear where specifically the boundaries are, but they seem to fail to approach problems in ways that aren't embedded in the training set. I certainly would not put money on one solving an arbitrary logical problem.
nomdep: "As LLMs etc. are deployed in new situations, and at new scale, there will be all kinds of changes in work, politics, art, sex, communication, and economics."For an article five years in the making, this is what I expected it to be about. Instead, we got a ramble about how imperfect LLMs are right now.
drob518: > It remains unclear whether continuing to throw vast quantities of silicon and ever-bigger corpuses at the current generation of models will lead to human-equivalent capabilities. Massive increases in training costs and parameter count seem to be yielding diminishing returns. Or maybe this effect is illusory. Mysteries!

I'm not even sure whether this is possible. The current corpus used for training includes virtually all known material. If we make it illegal for these companies to use copyrighted content without remuneration, either the task gets very expensive indeed, or the corpus shrinks. We can certainly make the models larger, with more and more parameters, subject only to silicon's ability to give us more transistors for RAM density and GPU parallelism. But it honestly feels like, without another "Attention Is All You Need"-level breakthrough, we're starting to see the end of the runway.
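The diminishing-returns claim is usually framed with Chinchilla-style scaling laws (Hoffmann et al. 2022), where loss falls as a power law in parameters and data, plus an irreducible floor. A sketch; the constants are roughly the published fit but should be treated as illustrative:

```python
# Chinchilla-style scaling law: L(N, D) = E + A/N^a + B/D^b, where N is
# parameter count and D is training tokens. Constants approximate the
# published fit; treat them as illustrative rather than authoritative.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

for N in (1e9, 1e10, 1e11, 1e12):
    print(f"{N:.0e} params: predicted loss = {loss(N, 1e13):.3f}")
# Each 10x in parameters buys a smaller absolute improvement, and the
# irreducible term E means the curve flattens no matter how far you scale.
```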
danny_codes: > Because who cares if the world dies as long as you made lots of money on the way.

Guiding principle of the AI industry.
gdulli: It's really the whole tech industry as it exists right now, and AI is a victim of bad timing. If this AI had been invented 40 years ago there'd have been a lower ceiling on the damage it could do.

Another way of saying that is that capitalism is the real problem, but I was never anti-capitalist in principle, it's just gotten out of hand in the last 5-10 years. (Not that it hadn't been building to that.)
palmotea: > Another way of saying that is that capitalism is the real problem, but I was never anti-capitalist in principle, it's just gotten out of hand in the last 5-10 years. (Not that it hadn't been building to that.)

Capitalism is a tool, and it's fine as a tool, to accomplish certain goals while subordinated to other things. Unfortunately it's turned into an ideology (to the point it's worshiped idolatrously by some), and that's where things went off the rails.
Frieren: > Some people point at LLMs confabulating

No. LLMs do not confabulate, they bullshit. There is a big difference. AIs do not care, cannot care, have no capacity to care about the output. String of tokens in, string of tokens out. Even if they have all the data perfectly recorded, they will still fail to use it for a coherent output.

> Collapsing the dimensionality is going to be lossy, which means it will have gaps between what it thinks is the reality and what is.

Confabulation has to do with degradation of biological processes and information storage. There is no equivalent in an LLM. Once the data is recorded it will be recalled exactly the same, down to the bit. An LLM representation is immutable. You can download a model 1,000 times, run it for 10 years, etc., and the data is the same. The closest you get is if you store the data on a faulty disk, but that is not why LLM output is so awful; that would be a trivial problem to solve with current technology (like having a RAID and a few checksums).
stronglikedan: I don't even think they bullshit, since that requires conscious effort, which they do not and cannot possess. They just interpret things incorrectly sometimes, like any of us meatbags.
thayne: They make incorrect predictions of the text that should respond to a prompt.

The neat thing about LLMs is that they are very general models that can be used for lots of different things. The downside is that they often make incorrect predictions, and what's worse, it isn't even very predictable when they will make them.
bitwize: The fact that these "bullshit machines" have already proven themselves relatively competent at programming, with upcoming frontier models coming close to eliminating it as a human activity, probably says a lot about the actual value and importance of programming in the scheme of things.
slopinthebag: I think it says more about the amount of automation we left on the table in the last few decades. So much of the code LLMs can generate is stuff that we should have completely abstracted away by now.
52-6F-62: > The Vogon constructor fleet is way overdue in my book

Don't you see it? That's exactly what "AI" in this context is. It's the bypass.

Where does it end, eh? Build a quantum "AI" that will end up just needing more data, more input. The end goal must start looking like creating an entirely new universe, a complete clone of everything we have here, so it can run all the necessary computations and we can... ?

(You are what a quantum AI looks like as it bumbles through the infinitude of calculable parameters on its way to the ultimate answer.)
nathell: The post is just a prelude to a 10-part article, most of which is not yet released (but will be shortly). Judging by the table of contents, the things you expected will be elaborated on in subsequent parts.
xandrius: Wait, you're asking me to find and produce an example of a feasible and better alternative to LLMs when they are the current forefront of AI technology?

Anyway, just to play along: if it weren't just a statistical next-token machine, the same question would always have the same answer and not be affected by a "temperature" value.
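For the curious, the temperature knob being referenced is mechanically simple: logits are divided by T before the softmax, so low T approaches deterministic argmax and high T flattens the distribution. A minimal numpy sketch with made-up logits:

```python
# Temperature-scaled sampling over a toy next-token distribution.
import numpy as np

def sample_token(logits, temperature, rng):
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.1]                    # made-up next-token scores
for T in (0.1, 1.0, 2.0):
    draws = [sample_token(logits, T, rng) for _ in range(1000)]
    print(f"T={T}: token 0 chosen {draws.count(0) / 10:.1f}% of the time")
# Low T: nearly always token 0. High T: answers vary from run to run,
# which is exactly the nondeterminism pointed at above.
```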
Scaevolus: They are bullshit machines because they do not have an internal mental model of truth like a human does. The flagship models bullshit less, but their fundamental architectures prevent having truth interfere with output.https://philosophersmag.com/large-language-models-and-the-co...
bensyverson: "Bullshit" is a human concept. LLMs do not work like the human brain, so to call their output "bullshit" is ascribing malice and intent that is simply not there. LLMs do not "think." But that does not mean they're not incredibly powerful and helpful in the right context.
slopinthebag: I sort of agree. In this context "bullshit" means "speech intended to persuade without regard for truth," and while it's true that LLM output is produced without regard for truth, an LLM is not an entity capable of the agency to persuade, although functionally that is what it can appear like.

https://en.wikipedia.org/wiki/On_Bullshit
stickfigure: I think it's too early to declare the Turing test passed. You just need to have a conversation long enough to exhaust the context window. Less than that, actually, since response quality degrades long before you hit hard window limits, even with compaction.

Neuroplasticity is hard to simulate in a few hundred thousand tokens.
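Rough arithmetic on how fast an ongoing conversation would exhaust a context window, under assumed rates (~1.3 tokens per English word, ~150 spoken words per minute, and a 200k-token window as an order of magnitude):

```python
# Back-of-the-envelope: how long until a conversation fills the window?
TOKENS_PER_WORD = 1.3      # rough average for English text
WORDS_PER_MINUTE = 150     # rough conversational speaking rate
CONTEXT_WINDOW = 200_000   # order-of-magnitude flagship-model window

tokens_per_hour = TOKENS_PER_WORD * WORDS_PER_MINUTE * 60
print(f"~{tokens_per_hour:,.0f} tokens per hour of conversation")      # ~11,700
print(f"window exhausted in ~{CONTEXT_WINDOW / tokens_per_hour:.0f} hours")  # ~17
# A couple of workdays of continuous talk, far short of a lifelong
# relationship, and quality typically degrades before the hard limit.
```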
criley2: For as rigorous of a Turing test as you present, I believe many (or even most) humans would also fail it.How many humans seriously have the attention span to have a million "token" conversation with someone else and get every detail perfect without misremembering a single thing?
stickfigure: Response quality degrades long before you hit a million tokens.

But sure, let's say it doesn't. If you interact with someone day after day, you'll eventually hit a million tokens. Add some audio or images and you will exhaust the context much, much faster.

However, I'll grant you that Turing's original imitation game (text only, human typist, five minutes) is probably pretty close, and that's impressive enough to call intelligence (of a sort). Though modern LLMs tend to manifest obvious dead giveaways like "you're absolutely right!"
nisegami: Here's the opening paragraph of chapter 2, with terms referring to AI/models/etc. swapped out for "people":

"People are chaotic, both in isolation and when working with other people or with systems. Their outputs are difficult to predict, and they exhibit surprising sensitivity to initial conditions. This sensitivity makes them vulnerable to covert attacks. Chaos does not mean people are completely unstable; most people behave roughly like anyone else. Since people produce plausible output, errors can be difficult to detect. This suggests that human systems are ill-suited where verification is difficult or correctness is key. Using people to write code (or other outputs) may make systems more complex, fragile, and difficult to evolve."

To me, this modified paragraph reads surprisingly plainly. The wording is off ("using people to write code") and I had to change the part about attractor behavior (although it does still apply, IMO), but overall it doesn't seem like an incoherent paragraph.

This is not meant to dunk on the author, but I think it highlights the author's mindset and the gap between their expectations and reality.
qsera: > 95% is not my experience and frankly dishonest.

Quite frankly, this is exactly like how two people can use the same compression program on two different files and get vastly different compression ratios (because one has a lot of redundancy and the other does not).
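The compression analogy, made runnable with only the standard library: the same program on two inputs of the same size gives very different ratios.

```python
# Same compressor, two inputs, wildly different ratios.
import os
import zlib

redundant = b"the cat sat on the mat. " * 1000   # 24,000 highly redundant bytes
incompressible = os.urandom(24_000)              # 24,000 random bytes

for name, data in (("redundant", redundant), ("random", incompressible)):
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{name}: {len(data)} bytes, compressed ratio {ratio:.3f}")
# The redundant input shrinks to a tiny fraction; the random one barely
# compresses at all. Likewise, two users' prompts can land in very
# different parts of a model's competence.
```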
simianwords: I'm asking for a single example.
qsera: But why do you need an example? Isn't it pretty well understood that LLMs will have trouble responding to stuff that is underrepresented in the training data?

You just won't have any clue what that could be.
simianwords: Fair, so it must be easy to give an example? I have ChatGPT open with 5.4-thinking. I'm honestly curious what you can suggest, since I have not been able to get it to bullshit easily.
embedding-shape: > I’m not even sure whether this is possible.Based on what's happened so far, maybe. At least that's exactly how we got to the current iteration back in 2022/2023, quite literally "lets see what happens when we throw an enormous amount data at them while training" worked out up until one point, then post-training seems to have taken over where labs currently differ.
htrp: We pay people to create more high-quality tokens (Mercor, Turing), which are then fed into data-generating processes (synthetic data) to create even more tokens to train on.
drob518: But does that really help, or do you get distortion? The frequency distribution of human-generated content moves slowly over time as new subjects are discussed. What frequency distribution do those "data generating processes" use? And at root, aren't those "data generating processes" basically just another LLM (i.e., generating tokens according to a probability distribution)? Thus, aren't we just feeding AI slop into the next training run and humoring ourselves by renaming the slop "synthetic data"? Not trying to be argumentative. I'm far from being an AI expert, so maybe I'm missing something. Feel free to explain why I'm wrong.
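This worry has a name in the literature, "model collapse" (Shumailov et al.). A toy numpy simulation of the fit-sample-refit loop, not anyone's actual training pipeline, shows the distribution quietly narrowing across generations:

```python
# Fit a Gaussian to data, sample from the fit, refit to the samples, repeat.
# Averaged over many chains, the spread shrinks generation over generation.
import numpy as np

rng = np.random.default_rng(0)
chains, n = 500, 20
data = rng.normal(0.0, 1.0, size=(chains, n))   # generation 0: "human" data

for gen in range(26):
    if gen % 5 == 0:
        print(f"gen {gen:2d}: mean std across chains = {data.std(axis=1).mean():.3f}")
    mu = data.mean(axis=1, keepdims=True)
    sigma = data.std(axis=1, keepdims=True)
    data = mu + sigma * rng.normal(size=(chains, n))  # next gen trained on output
# Each resampling step loses tail behavior, the statistical analogue of
# training the next run on the previous run's slop.
```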
lamasery: I think this is leaning on the "lies are when you tell falsehoods on purpose; bullshit is when you simply don't care at all whether what you're saying is true" definition of bullshit. Cf. On Bullshit.

So, they can't lie, but they can (and, in fact, exclusively do) bullshit.
NiloCK: I don't understand this. Many small errors distributed across a large deployment sounds a lot like the normal mode of error-prone humans / cogs / whatevers distributed over a wide deployment.
krainboltgreene: Your project vue-skuilder has six GitHub Actions steps devoted to checking the work you do before it's allowed to go out. You do not trust yourself to get things right 100% of the time.

I am watching people trust LLM-based analysis and actions 100% of the time without checking.
rudhdb773b: > what actually happens inside an LLM has nothing to do with consciousness or agency

What makes you think natural brains are doing something so different from LLMs?
qsera: For starters, natural brains have the innate ability to differentiate between things they know and things they have no possibility of knowing...
hedgehog: Structurally a transformer model is so unrelated to the shape of the brain there's no reason to think they'd have many similarities. It's also pretty well established that the brain doesn't do anything resembling wholesale SGD (which, to spell it out, is evidence that it doesn't learn in the same way).
mcpar-land: it's not a bullshit machine because its output is bad, it's a bullshit machine because its output is literally 'bullshit' as in, output that is statistically likely but with no factual or reasoning basis. as the models have improved, their bullshit is more statistically likely to sound coherent (maybe even more likely to be 'accurate'), but no more factual and with no more reasoning.
abraxas: However, when fed source material in the context, they will lie less, right? So at this point is it not just a battle of the nines until it's called "good enough"?

I also wonder: if I leave my secretary with a ream of papers and ask him for a summary, how many will he actually read and understand vs. skim and then bullshit? It seems like the capacity for frailty exists in both "species".
qsera: I am not the OP, and I have only used the ChatGPT free version. The other day I asked it something. It answered. Then I asked it to provide sources. It provided sources, and also changed its original answer. When I checked the new answer it was wrong, and when I checked the sources, they didn't actually contain the information I asked for; it had hallucinated the answers as well as the sources...
simianwords: I trust you. If it's happening so frequently, you should be able to give me a single prompt that gets it to bullshit?
downboots: It was not meant as a pass/fail
knowaveragejoe: > No. LLMs do not confabulate, they bullshit. There is a big difference. AIs do not care, cannot care, have no capacity to care about the output. String of tokens in, string of tokens out. Even if they have all the data perfectly recorded, they will still fail to use it for a coherent output.

Isn't "caring" a necessary prerequisite for bullshitting? One either bullshits because they care, or don't care, about the context.
marssaxman: They're presumably referring to the Harry Frankfurt definition of bullshit: "speech intended to persuade without regard for truth. The liar cares about the truth and attempts to hide it; the bullshitter doesn't care whether what they say is true or false."
dgb23: Thought of the same book when reading the above.
krainboltgreene: Any amount of reading into how we understand brains and LLMs to work.
beders: I think you highlight one of the problems for users of LLMs: you can't tell anymore if it is BS or not.

I caught Claude the other day hallucinating code that was not only wrong, but dangerously wrong, leading to tasks failing and never recovering. But it certainly wasn't obvious.
munificent: There is a whole giant essay I probably need to write at some point, but I can't help but see parallels between today and the Industrial Revolution.

Prior to the Industrial Revolution, the natural world was nearly infinitely abundant. We simply weren't efficient enough to fully exploit it. That meant that it was fine for things like property and the commons to be poorly defined. If all of us can go hunting in the woods and yet there is still game to be found, then there's no compelling reason to define and litigate who "owns" those woods.

But with the help of machines, a small number of people were able to completely deplete parts of the earth. We had to invent giant legal systems in order to determine who has the right to do that and who doesn't.

We are truly in the Information Age now, and I suspect a similar thing will play out for the digital realm. We have copyright and intellectual property law already, of course, but those were designed presuming a human might try to profit from the intellectual labor of others. With AI, we're in the industrial era of the digital world. Now a single corporation can train an AI using someone's copyrighted work and in return profit off the knowledge over and over again at industrial scale.

This completely upends the tenuous balance between creators and consumers. Why would a writer put an article online if ChatGPT will slurp it up and regurgitate it back to users without anyone ever even finding the original article? Who will contribute to the digital commons when rapacious AI companies are constantly harvesting it? Why would anyone plant seeds on someone else's farm?

It really feels like we're in the soot-covered child-coal-miner Dickensian London era of the Information Revolution, and shit is gonna get real rocky before our social and legal institutions catch up.
drob518: A couple of thoughts…

Mostly, AIs don't recite back various works. Yes, there are a couple of high-profile cases where people were able to get an AI to regurgitate pieces of New York Times articles and Harry Potter books, but mostly not. Mostly, it is as if the AI is your friend who read a book and gives you a paraphrase, possibly using a couple of sentences verbatim. In other words, it probably falls under a fair use rule.

Secondly, given the modern world, content that doesn't appear online isn't consumed much, so creators who are doing it for the money will certainly continue putting content online. Much of that content will be generated by AIs, however.
giraffe_lady: "These arguments may be correct but they aren't novel" ??
simianwords: I don't think calling AI a bullshit machine is correct. In spirit.
camgunz: I'm earnestly curious why not.
camgunz: If I have to suffer "look at this busted ass thing I slopped out with AI" a few times a week, you all have to suffer grouchy "AI bad" a few times a week. Fair is fair.
erichocean: AI is exactly the right term: the machines can do "intelligence", and they do so artificially.

Just like we have machines that can do "math", and they do so artificially. Or "logic", and they do so artificially.

I assume we'll drop the "artificial" part in my lifetime, since there's nothing truly artificial about it (just like math and logic), since it's really just mechanical.

No one cares that transistors can do math or logic, and it shouldn't bother people that transistors can predict next tokens either.
mayama: > AI is exactly the right term: the machines can do "intelligence", and they do so artificially.

AI in pop culture doesn't mean that at all. Most people's impression of AI before the LLM craze came from some form of media based on Asimov's laws of robotics. Now that LLMs have taken over the world, people can define AI as anything they want.
nine_k: If you look at different ancient traditions, you will notice how they struggle with the limitations of language, with its inability to represent certain things that are not just crucial for understanding the world, but are also somehow communicable. Buddhists dug into that in a very analytical, articulate way, for instance.

Another perspective: cetaceans are considered to be as conscious as humans, but all attempts to interpret their communication as a language have failed so far. They can be taught simple languages to communicate with humans, as can chimps. But apparently that's not how they process the world internally.
gbgarbeb: You're a little out of date. Cetaceans communicate images to each other in the form of ultrasonic chirps. They chirp, they hear a reflection, and they repeat the reflection.
nine_k: Does this resemble human language, with syntax, the ability to define new notions based on known notions, etc?
SoftTalker: The bullshitter does have an objective in mind however. There is some ultimate purpose to his bullshitting. LLMs don't even have that. They just spew words.
glitchc: > Claude launched into a detailed explanation of the differential equations governing slumping cantilevered beams. It completely failed to recognize that the snow was entirely supported by the roof, not hanging out over space. No physicist would make this mistake, but LLMs do this sort of thing all the time.

You have to meet some physicist friends of mine, then. They are likely to assume that the roof is spherical and frictionless.
drob518: > "LLMs still can't do X so it's an idiot"Let’s be careful. That’s a straw man. I don’t know anyone who says that. Aphyr says in the article that AIs can do things. But they have been marketed as “intelligent,” and I agree with Aphyr that the word is suggesting way more than AIs currently deliver. They do not reason and they do not think and are not truly intelligent. As the article says, they are big wads of linear algebra. Sometimes, that’s useful.
simianwords: Can you find a question that fits in 2-3 pages (text only) and test whether ChatGPT bullshits? I can't do it. It gets pretty much everything right.
drob518: Right, but we played the scaling card, and it worked but is now reaching its limits. What is the next card? You can surely argue that we can find a new one at any time; that's the definition of a breakthrough. I just don't see one at the moment.
embedding-shape: > I just don't see one at the moment.

Did you see the current one before it was even found? Things tend to look easy in hindsight, and borderline impossible looking forward. Otherwise it sounds like you're in the same spot as before :)
dsign: > At the same time, ML models are idiots. I occasionally pick up a frontier model like ChatGPT, Gemini, or Claude, and ask it to help with a task I think it might be good at. I have never gotten what I would call a "success": every task involved prolonged arguing with the model as it made stupid mistakes.

I have a ton of skepticism built in when interacting with LLMs, and very good muscles for rolling my eyes, so I barely notice when I shrug off a bad answer and make a derogatory inner remark about the "idiots." But the truth is that, for such "stochastic parrots," LLMs are incredibly useful. And, when was the last time we stopped perfecting something we thought useful and valuable? When was the last time our attempts were so perfectly futile that we stopped them, invented stories about why it was impossible, and made it a social taboo to be met with derision, scorn and even ostracism? To my knowledge, in all of known human history, we have done that exactly once, and it was millennia ago.
wk_end: > And, when was the last time we stopped perfecting something we thought useful and valuable? When was the last time our attempts were so perfectly futile that we stopped them, invented stories about why it was impossible, and made it a social taboo to be met with derision, scorn and even ostracism? To my knowledge, in all of known human history, we have done that exactly once, and it was millennia ago.

I feel dense here, but I can't figure out what you're referring to. I asked ChatGPT (hah!) and it suggested the Tower of Babel, perpetual motion machines, or alchemy, but none of them really fit the bill.
lamasery: The Tower of Babel seems like an OK fit, but that's rather more poetic than what this seems to be getting at.

"Millennia" is what's really throwing me. We (respectable society, as the post outlines) didn't stop attempting alchemy or perpetual motion machines "millennia" ago, but a few centuries ago at most.

All I can think of is immortality. The very first long recorded tale in human history that I'm aware of is about how it's a futile quest (The Epic of Gilgamesh, IIRC ~5,000ish years old in its earliest extant fragments, a few hundred years newer in reasonably complete form). The trouble with that is, despite wide observation over literally millennia that this has never even come close to working, and repeated supposition and suggestion that it's unwise to attempt, outright impossible, or somehow sacrilegious (the "taboo" thing, as mentioned), I'm not aware of any time in history when rich people haven't been actively trying for it (including today! That's what all the body-freezing business is about; it's modern mummification, and the contracts are the formulaic prayers carved on the tomb walls), and usually they're not exactly "scorned" or "ostracized" for it.
hackinthebochs: > Structurally a transformer model is so unrelated to the shape of the brain there's no reason to think they'd have many similarities.

Substrate dissimilarities will mask computational similarities. Attention surfaces affinities between nearby tokens; dendrites strengthen and weaken connections to surrounding neurons according to correlations in firing rates. Not all that dissimilar.
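For reference, the attention mechanism being invoked is just softmax(QK^T / sqrt(d)) V; a bare numpy sketch with random toy matrices:

```python
# Scaled dot-product attention: pairwise token affinities, softmaxed,
# then used to mix value vectors. Toy sizes and random inputs.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 4, 8                       # 4 toy tokens, 8-dim embeddings
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))

scores = Q @ K.T / np.sqrt(d)           # pairwise token affinities
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V                       # each token: affinity-weighted mix

print(np.round(weights, 2))             # each row sums to 1: the "affinities"
```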
krainboltgreene: > The current corpus used for training includes virtually all known material.

This is just totally incorrect. It's one of those things everyone just assumes, but there's an immense amount of known material that isn't even digitized, much less in the hands of tech companies.
drob518: What large caches of undigitized content exist? Surely not everything has been digitized, but I can't think it's much in percentage terms.
cgh: The Vatican Library contains roughly 1.1 million printed books and around 75,000 codices, only a small percentage of which have been digitised.
drob518: Which is what percent of the world's content? 0.000000001% or something similar. It's nothing in the scheme of things. To put it another way, if we were to digitize that content and train on it, our AIs would not get noticeably better in any way. It doesn't move the needle.