Discussion
AI And The Ship of Theseus
thangalin: Translate an alternative? https://github.com/albfernandez/juniversalchardet
scuff3d: The solution to this whole situation seems pretty simple to me. LLMs were trained on a giant mix of code, and it's impossible to disentangle it, but a not insignificant portion of their capabilities comes from GPL-licenced code. Therefore, any codebase that uses LLM code is now GPL. You have a proprietary product? Not anymore. Not saying there's a legal precedent for that right now, but it's the only thing that makes any sense to me. Either that or retrain the models on only MIT/similarly licenced code or code you have explicit permission to train on.
keithnz: if you train yourself by looking at GPL code then go implement your own things, is that code GPL?
moralestapia: >I personally have a horse in the race here because I too wanted chardet to be under a non-GPL license for many years. Ugh, it's so disgusting to see people who are either malicious or not mentally capable enough to understand what the purpose of software licenses is. "But I wish that car was free", sure pal, but it's not. Are you like, 8 years old? Licenses exist for a reason, which is to enforce them. When the author of a project chooses a specific license s/he is making a deliberate decision. S/he wants these terms to be reigning over his/her work, in perpetuity. People who pretend they didn't see it or play dumb are in for some well-deserved figuring out.
jimmaswell: This entirely misses the point. Re-implementing code based on API surface and compatibility is established fair use if done properly (Compaq v. IBM, Google v. Oracle). There's nothing wrong with doing that if you don't like a license. What's in question is doing this with AI that may or may not have been trained on the source. In the instance in the article where the result is very different, it's probably in the clear regardless. I'm sympathetic to the author as I generally don't like GPL either outside specific cases where it works well like the Linux kernel.
trueismywork: The real test would be to see how much of the generated code is similar to the old code. Because then it is still a copyright violation. Just because you drew Mickey Mouse from memory doesn't absolve you if it looks close enough to the original Mickey Mouse.
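One rough way to run the comparison described above is a line-level similarity score between the old and the new source. The sketch below uses Python's standard `difflib`; the function name `similarity_ratio` and the sample snippets are hypothetical, and a textual ratio is only a crude proxy for resemblance, not a legal test of copying.

```python
import difflib


def similarity_ratio(old_source: str, new_source: str) -> float:
    """Return a score in [0, 1] for how many lines two source texts share.

    Lines are stripped of surrounding whitespace and blank lines are
    dropped, so pure formatting differences do not affect the score.
    """
    old_lines = [ln.strip() for ln in old_source.splitlines() if ln.strip()]
    new_lines = [ln.strip() for ln in new_source.splitlines() if ln.strip()]
    # SequenceMatcher.ratio() is 2 * matches / total number of lines.
    return difflib.SequenceMatcher(None, old_lines, new_lines).ratio()


# Hypothetical snippets standing in for the old and the rewritten code.
old = "def detect(data):\n    return guess(data)\n"
new = "def detect(data):\n    return probe(data)\n"
print(f"{similarity_ratio(old, new):.2f}")  # prints 0.50
```

A low score does not settle anything on its own, since renamed identifiers or reordered functions can hide structural similarity, so a serious comparison would probably also look at tokens or syntax trees.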
AberrantJ: Of course not, because everyone making these arguments wants people to have some magic sauce so they get to ignore all the rules placed on the "artificial" thing.
bakugo: If you genuinely believe that you are not above a literal text completion algorithm and do not deserve any more rights than it, that says more about you than anything else.
nomdep: In this emerging reality, the whole spectrum of open-source licenses effectively collapses toward just two practical choices: release under something permissive like MIT (no real restrictions), or keep your software fully proprietary and closed. These are fascinating, if somewhat scary, times.
measurablefunc: If you listen to the people who believe real AI is right around the corner then any software can be recreated from a detailed enough specification b/c whatever special sauce is hidden in the black box can be inferred from its outward behavior.
coldtea: >Licenses exist for a reason. Yes, and the choice of license for a project is made for a reason that not necessarily everybody agrees with. And the people who don't agree have every right to implement a similar, even file-format or API compatible, project and give it another license. Gnumeric vs Excel, for example. It's so disgusting to see people who are either malicious or not mentally capable enough to understand this.
the_mitsuhiko: > "But I wish that car was free", sure pal, but it's not. Are you like, 8 years old? Just because things are not as one wants does not stop that desire from being there. > When the author of a project chooses a specific license s/he is making a deliberate decision. Potentially, potentially not. I used to release software under GPL and LGPL but changed my mind a few years after that. I did so in part because of conversations I had with others that convinced me that my values are more closely aligned with permissive licenses. So engaging in a friendly discourse with a maintainer to ask them to relicense is a perfectly fine thing to do, and there has been an issue on chardet about the license for many, many years.
estimator7292: If you copy and paste one line from a thousand different GPL projects, is the resulting program GPL? Let's be honest about what's happening here.
erelong: hopefully this continues to show how awkward the idea of "intellectual property" (IP) is until people abandon it. IP sounds good in theory but enables things like "patent trolling" by large corps and creates all kinds of goofy barriers and arbitrary questions, like the one we're asking about whether re-implementations of ideas are "really ours" (maybe they were never anyone's in the first place, outside of legally created mentalities). ideas seem to fundamentally not operate like physical things, so asserting they can be considered "property" opens the door for all kinds of absurdities like those pondered in the OP
HappyPanacea: > b/c whatever special sauce is hidden in the black box can be inferred from its outward behavior. This is not always true, for an extreme example see Indistinguishability obfuscation.
cheesecompiler: After cloning a test suite you're still left with ongoing maintenance and development, maintaining feature parity etc. There's a lot more than passing a test suite. If the rewrite is truly superior it deserves to become the new Ship of Theseus. But e.g. I doubt anyone's AI rewrites of SQLite will ever put a dent in its marketshare.
GaggiX: >Real AI is more brilliant than whatever algorithm you could ever think of. So with "Real AI" you actually mean artificial superintelligence.
measurablefunc: I wrote what I meant & meant what I wrote. You can take up your argument w/ the people who think they're working on AI by adding more data centers & more matrix multiplications to function graphs if you want to argue about marketing terms.
GaggiX: I was just thinking that calling artificial superintelligence "Real AI" was funny.
vintagedave: Or GPL. Which I’m increasingly thinking is the only license. It requires sharing. And if anything can be reimplemented and there’s no value in the source any more, just the spec or tests, there’s no public-interest reason for any restriction other than completely free, in the GPL sense.
Hamuko: >Or GPL. Which I’m increasingly thinking is the only license. It requires sharing. It doesn't if Dan Blanchard spends some tokens on it and then licenses the output as MIT.
jmalicki: Who are you talking about? I can't find reference to this person.
embedding-shape: > or keep your software fully proprietary and closed. I guess it depends on your intention, but eventually I'm not sure it'll even be possible to keep it "fully proprietary and closed" in the hopes of no one being able to replicate it, which seems to be the main motivation for many to go that road. If you're shipping something, making something available, others will be able to use it (duh) and therefore replicate it. The barrier to replicating things like this, either together with LLMs or by letting the LLM straight up do it itself with the right harness, seems to be dropping real quick; there's been a massive difference in just a few years already.
7777777phil: The legal question is a distraction. GPL was always enforced by economics: reimplementation had to cost more than compliance. At $1,100 for 94% API coverage, it doesn't. Copyleft was built for a world where clean-room rewrites were painful but they aren't anymore.
pixl97: Real AI will never be invented, because as AI systems become more capable we'll figure out humans weren't intelligent in the first place, therefore intelligence never existed.
measurablefunc: Don't worry, just 10 more data centers & a few more gigawatts will get you there even if the people building the data centers & powerplants are unintelligent & mindless drones. But in any event, I have no interest in religious arguments & beliefs so your time will be better spent convincing people who are looking for another religion to fill whatever void was left by secular education since such people are much more amenable to religious indoctrination & will very likely find many of your arguments much more persuasive & convincing.
measurablefunc: Corporate marketing is very effective. I don't have as many dollars to spend on convincing people that AI is when they give me as much data as possible & the more data they give me the more "super" it gets.
coldtea: >Something related, but different, happened with chardet. The current maintainer reimplemented it from scratch by only pointing it to the API and the test suite. Only "pointing it". But the LLM, which can recite over 90% of a book in its training set verbatim*, would also have been trained on the original code. Maybe "the slop of Theseus" is a better title. * https://the-decoder.com/researchers-extract-up-to-96-of-harr...
the_mitsuhiko: Maybe, but the LLM did not recite the chardet source code so that argument does not appear to apply here.