Discussion
AI And The Ship of Theseus
thangalin: Translate an alternative? https://github.com/albfernandez/juniversalchardet
scuff3d: The solution to this whole situation seems pretty simple to me. LLMs were trained on a giant mix of code, and it's impossible to disentangle it, but a not-insignificant portion of their capabilities comes from GPL-licensed code. Therefore, any codebase that uses LLM code is now GPL. You have a proprietary product? Not anymore.

Not saying there's legal precedent for that right now, but it's the only thing that makes any sense to me. Either that, or retrain the models on only MIT/similarly licensed code, or code you have explicit permission to train on.
keithnz: If you train yourself by looking at GPL code and then go implement your own things, is that code GPL?
moralestapia: >I personally have a horse in the race here because I too wanted chardet to be under a non-GPL license for many years.

Ugh, it's so disgusting to see people who are either malicious or non mentally capable enough to understand the purpose of software licenses. "But I wish that car was free", sure pal, but it's not. Are you like, 8 years old?

Licenses exist for a reason, which is to be enforced. When the author of a project chooses a specific license, s/he is making a deliberate decision. S/he wants these terms to reign over his/her work, in perpetuity. People who pretend they didn't see it or play dumb are in for some well-deserved figuring out.
jimmaswell: This entirely misses the point. Re-implementing code based on API surface and compatibility is established fair use if done properly (Compaq v. IBM, Google v. Oracle). There's nothing wrong with doing that if you don't like a license. What's in question is doing this with AI that may or may not have been trained on the source. In the instance in the article where the result is very different, it's probably in the clear regardless. I'm sympathetic to the author as I generally don't like GPL either outside specific cases where it works well like the Linux kernel.
trueismywork: The real test would be to see how much of the generated code is similar to the old code, because then it is still a copyright violation. Just because you drew Mickey Mouse from memory doesn't absolve you if it looks close enough to the original Mickey Mouse.
AberrantJ: Of course not, because everyone making these arguments wants people to have some magic sauce so they get to ignore all the rules placed on the "artificial" thing.
bakugo: If you genuinely believe that you are not above a literal text completion algorithm and do not deserve any more rights than it, that says more about you than anything else.
nomdep: In this emerging reality, the whole spectrum of open-source licenses effectively collapses toward just two practical choices: release under something permissive like MIT (no real restrictions), or keep your software fully proprietary and closed.

These are fascinating, if somewhat scary, times.
measurablefunc: If you listen to the people who believe real AI is right around the corner then any software can be recreated from a detailed enough specification b/c whatever special sauce is hidden in the black box can be inferred from its outward behavior.
coldtea: >Licenses exists for a reason

Yes, and the choice of license for a project is made for a reason that not everybody necessarily agrees with.

And the people who don't agree have every right to implement a similar, even file-format- or API-compatible, project and give it another license. Gnumeric vs Excel, for example.

It's so disgusting to see people who are either malicious or non mentally capable enough to understand this.
the_mitsuhiko: > "But I wish that car was free", sure pal, but it's not. Are you like, 8 years old?

Just because things are not as one wants does not stop that desire from being there.

> When the author of a project choose a specific license s/he is making a deliberate decision.

Potentially, potentially not. I used to release software under GPL and LGPL but changed my mind a few years after that. I did so in part because of conversations I had with others that convinced me my values align more closely with permissive licenses.

So engaging in friendly discourse with a maintainer to ask them to relicense is a perfectly fine thing to do, and an issue about the license has been open on chardet for many, many years.
estimator7292: If you copy and paste one line from a thousand different GPL projects, is the resulting program GPL?

Let's be honest about what's happening here.
erelong: hopefully this continues to show how awkward the idea of "intellectual property" (IP) is, until people abandon it

IP sounds good in theory but enables things like "patent trolling" by large corps, creating all kinds of goofy barriers and arbitrary questions, like asking whether re-implementations of ideas are "really ours" (maybe they were never anyone's in the first place, outside of legally created mentalities)

ideas seem to fundamentally not operate like physical things, so asserting they can be considered "property" opens the door for all kinds of absurdities, as pondered in the OP
HappyPanacea: > b/c whatever special sauce is hidden in the black box can be inferred from its outward behavior.This is not always true, for an extreme example see Indistinguishability obfuscation.
cheesecompiler: After cloning a test suite you're still left with ongoing maintenance and development, maintaining feature parity, etc. There's a lot more to it than passing a test suite. If the rewrite is truly superior, it deserves to become the new Ship of Theseus. But I doubt, for example, that anyone's AI rewrites of SQLite will ever put a dent in its market share.
GaggiX: >Real AI is more brilliant than whatever algorithm you could ever think of

So by "Real AI" you actually mean artificial superintelligence.
measurablefunc: I wrote what I meant & meant what I wrote. You can take up your argument w/ the people who think they're working on AI by adding more data centers & more matrix multiplications to function graphs if you want to argue about marketing terms.
GaggiX: I was just thinking that calling artificial superintelligence "Real AI" was funny.
vintagedave: Or GPL. Which I’m increasingly thinking is the only license. It requires sharing.

And if anything can be reimplemented and there’s no value in the source any more, just the spec or tests, there’s no public-interest reason for any restriction other than completely free, in the GPL sense.
Hamuko: >Or GPL. Which I’m increasingly thinking is the only license. It requires sharing.It doesn't if Dan Blanchard spends some tokens on it and then licenses the output as MIT.
jmalicki: Who are you talking about? I can't find reference to this person.
embedding-shape: > or keep your software fully proprietary and closed.

I guess it depends on your intention, but eventually I'm not sure it'll even be possible to keep it "fully proprietary and closed" in the hopes of no one being able to replicate it, which seems to be the main motivation for many who go down that road.

If you're shipping something, making it available, others will be able to use it (duh) and therefore replicate it. The barrier to replicating things like this, either together with LLMs or by letting the LLM straight up do it itself with the right harness, seems to be dropping real quick; there's been a massive difference in just a few years.
7777777phil: The legal question is a distraction. GPL was always enforced by economics: reimplementation had to cost more than compliance. At $1,100 for 94% API coverage, it doesn't. Copyleft was built for a world where clean-room rewrites were painful but they aren't anymore.
pixl97: Real AI will never be invented, because as AI systems become more capable we'll figure out humans weren't intelligent in the first place, therefore intelligence never existed.
measurablefunc: Don't worry, just 10 more data centers & a few more gigawatts will get you there even if the people building the data centers & powerplants are unintelligent & mindless drones. But in any event, I have no interest in religious arguments & beliefs so your time will be better spent convincing people who are looking for another religion to fill whatever void was left by secular education since such people are much more amenable to religious indoctrination & will very likely find many of your arguments much more persuasive & convincing.
measurablefunc: Corporate marketing is very effective. I don't have as many dollars to spend on convincing people that AI is real when they give me as much data as possible & the more data they give me the more "super" it gets.
f33d5173: I don't think it changes much about licensing in particular. People are going on about how, since the AI was trained on this code, that makes it a derivative work. But it must be borne in mind that AI training doesn't usually lead to memorizing the training data, but rather to learning its general patterns. In the case of source code, it learns how to write systems and algorithms in general, not a particular function. If you then describe an interface to it, it is applying general principles to implement that interface. Its ability to succeed in this depends primarily on the complexity of the task. If you give it the interfaces of a closed-source and an open-source project of similar complexity, it will have a relatively equal time implementing them.

Even prior to this, relatively simple projects licensed under share-alike licenses were in danger of being cloned under either proprietary or more permissive licenses. This project in particular was spared, basically because the LGPL is permissive enough that it was always easier to just comply with the license terms. A full-on GPLed project like GCC isn't in danger of being cloned by an AI anytime soon. Never mind that it was already cloned under a more permissive license by human coders.
moralestapia: Hmm ... you don't have to ask for consent. You just slap the license you want to your code and that's it.

It's not some sort of democracy, lol, it's a set of exclusive rights that are created the moment the work being copyrighted is produced.

(For a quick intro I recommend: https://www.youtube.com/watch?v=bxVs7FCgOig)

In the case of the license in question (L/GPL), it's one of the strictest ones out there; it explicitly forbids relicensing code under a different, non-compatible license like MIT. Let me say that again: the L/GPL EXPLICITLY FORBIDS the thing that happened here.

I sympathize with the guy who spent 12 years of his life maintaining the code, thank you for your service or something, but that does not make a difference. The wording of the (L/GPL) license is clear, and the original author and most of the other 50 or so contributors did not approve of this.
coldtea: >Hmm ... you don't have to ask for consent

Nobody said you have to.

>You just slap the license you want to your code and that's it.

Nobody said you can't.

>It's not some sort of democracy, lol

Nobody said it is, lol.

I'm responding to what you actually wrote: that those expressing their dislike of a project having a specific license are "either malicious or non mentally capable enough" to understand what licenses are for.

That's a stupid argument putting other people down with a silly strawman.

One can be perfectly capable of understanding what licenses are for and still think a project made a mistake choosing a specific license, or want it to change to another (and sometimes, as in the examples I gave, the latter works too).
moralestapia: Hey, you can definitely rewrite your argument without resorting to bad language.Take a look at the guidelines that keep this place together: https://news.ycombinator.com/newsguidelines.html
Devasta: This is awful news, but I don't know what can be done. Is it possible to have a new GPL4 that deals with this? I doubt it.
scuff3d: I work with people who literally won't even look at GPL code, because of the risk. So yes, potentially.
moralestapia: 100% agree, if we are fair and honorable.

In practice, well ... you saw what's been going on with the Epstein files, etc. We are far from finding ourselves in a world that's fair and honorable.

(I'm not condoning it; I think it's massively trashy to steal code like this and then pretend you're the good guy because of some super weird mental gymnastics.)
scuff3d: Completely agree. This isn't practical. It's never going to happen, just because of the sheer amount of capital behind LLM companies.

You can do anything rotten, as long as you throw enough money at it.
scuff3d: It could be. The amount of code you copy doesn't matter; it depends on context and whether your work could now be considered derivative.

I said this elsewhere, but I work with people who won't even look at GPL code because of the potential legal entanglements.

Yes, let's. Corporations with billions of dollars behind them wholesale stole copyrighted work and licensed code to train models, and then turned around and sold the result with no attribution or monetary benefit given to the people they stole from. They knew what they were doing and relied on the legal system being slow enough that they could plant a flag in the market before legal challenges killed them.

It's an industry built on theft. By all rights they should have been sued/fined out of existence before it ever got this far. But if you have enough money you can make almost anything legal.
pixl97: I mean, it sounds kinda like you're the one making religious arguments. My response is one mocking how poorly egotistical people deal with the AI effect.

Evolution built man, which has intelligence, from components that do not have intelligence themselves; it is an emergent property of the system. It is therefore scientific to think we could build machines on similar principles that exhibit intelligence as an emergent property of the system. No woo woo needed.
AberrantJ: If you genuinely believe you cannot create something that has just as many rights as you have, then I feel sorry for your children and anything you create.
measurablefunc: Me & a few friends are constructing a long ladder to get to the moon. Our mission is based on sound scientific & engineering principles we have observed on the surface of the planet, which allow regular people to scale heights they could not by jumping or climbing. We only need a few trillion dollars & a sufficiently large wall to support it while we climb up to the moon.

There are lots of other analogies, but the moon ladder is simple enough to be understood even by children when explaining how nothing can emerge from inert building blocks like transistors that is not reducible to their constituent parts.

As I said previously, your time will be much better spent convincing people who are looking for another religion b/c they will be much more susceptible to your beliefs in emergent properties of transistors & data centers of sufficient scale & magnitude.
pixl97: >friends are constructing a long ladder to get to the moon

Congratulations, you're working on a space elevator. A few trillion dollars would certainly get us out of the atmosphere, and the advances in carbon nanotubes and foam metals would rocket us ahead decades in materials science. Couple this with massive banks of capacitors and you could probably generate enough electricity for a country from the charge differential between the top and the bottom.

Oh, I get it, you were trying to be clever by saying something ignorant because it makes you feel special as a human, rather than making realistic statements about the progress currently being made in the sciences.
cheesecompiler: > I personally think all of this is exciting. I’m a strong supporter of putting things in the open with as little license enforcement as possible. I think society is better off when we share, and I consider the GPL to run against that spirit by restricting what can be done with it.

I like sharing too, but could permissive-only licenses not backfire? GPL emerged in an era where proprietary software ruled and companies weren't incentivized to open source. GPL helped ensure software stayed open, which helped it become competitive against the monopoly proprietary giants resting on their laurels. The restriction helped innovation, not the supposedly free market.
jason_oster: You're putting a lot of responsibility on a license that has several permissive contemporaries. The original BSD license "Net/1" and GPL 1.0 were both published in 1989, while the MIT license has its roots set in "probably 1987" [1] with the release of X11.

No doubt, GPL had some influence. But I would hardly single it out as the force that ensured software stayed open. Software stayed open because "information wants to be free" [2], not because some authors wield copyright law like a weapon to be used against corporations.

[1]: https://opensource.com/article/19/4/history-mit-license

[2]: A popular phrase based on a fundamental idea that predates software.
cheesecompiler: I'm not saying it's the only force. But if it wasn't instrumental, what's your take on the cause of proprietary software dominating until relatively recently?
measurablefunc: I don't think you get it but good luck. I've already spent enough time in this thread & further engagement is not going to be productive for anyone involved.
qsera: >It is therefore scientific to think we could build machines on similar principles that exhibit intelligence as an emergent property of the system.

Sure, but this ain't it.

Actually, I think LLMs are a step in the wrong direction if we really want to reach true AI. So they actually delay it, instead of bringing us closer to true AI.

But LLMs are a very good scam that is not entirely snake oil. That is the best kind of scam.
luma: What you describe is essentially what happened, the AI result working from specs and tests was more performant than the original. The real AI you describe just rewrote chardet without looking at the source, only better.
duskdozer: It was instructed to look at the source...
duskdozer: Absolutist permissive licenses are how you get the xkcd jenga tower
JambalayaJimbo: How do you know it didn’t look at the source?
mzi: GPL was a response to Symbolics incorporating public domain code into their software without giving back to the community (and Lisp Machines).
randallsquared: The vast majority of running instances of operating systems are Linux or BSD. I don't think proprietary software has dominated for 15-20 years.The two places it has won out thus far is in retail and SaaS. The environment of 1980 when most important software was locked behind proprietary licenses is quite far behind us.
moregrist: I suspect there’s a middle ground that involves either keeping tests more proprietary or a copyright license that bars using the work for AI reimplementation, or both.I think it’s entirely reasonable to release a test suite under a license that bars using it for AI reimplementation purposes. If someone wants to reimplement your work with a more permissive license, they can certainly do so, but maybe they should put the legwork in to write their own test suite.
Decabytes: The existence of permissive licenses like BSD or MIT does not show that copyleft was unimportant.

Those licenses allowed code to remain open, but they also allowed it to be absorbed into proprietary products.

The GPL's significance was that it changed the default outcome. At a time when software was overwhelmingly proprietary, it created a mechanism that required improvements to remain available to users and developers downstream.

GCC, for example, was a massive deal; it's a big part of the reason compilers are free today.
josephg: I completely agree.

Right now you can point Claude at any program, ask it to analyse it, and have it write an architecture document describing all the functionality. Then clear memory and get it to code against that architecture document.

You can't do that as easily with closed source software. Except, if you can read assembly, every program is open source. I suspect we're not far away from LLMs being able to just disassemble any program and do the same thing.

Is there a driver in Windows that isn't in Linux? No problem. Just ask Claude to reverse engineer it, write out a document describing exactly how the driver issues commands to the device and what constraints and invariants it needs to hold. Then make a Linux driver that works the same way.

Have an old video game you wanna play on your modern computer? No problem. Just get Claude to disassemble the whole thing. Then, function by function, rewrite it in C. Then port that C code to modern APIs.

It'll be chaos. But I'm quite excited about the possibilities.
raincole: I highly recommend reading the post in question first before commenting.
pabs3: The latter will become MIT sooner or later with Ghidra plus LLM-assisted reverse engineering.

https://reorchestrate.com/posts/your-binary-is-no-longer-saf...

Even SaaSS isn't safe from that type of process:

https://news.ycombinator.com/item?id=47259485
visarga: If you have access to a working prototype of a piece of software, you can use it for differential testing. So you get unlimited tests for free.
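A minimal sketch of that idea in Python (all names here are invented for illustration: `reference_impl` stands in for the original library, `candidate_impl` for the rewrite being checked):

```python
import random
import string

def reference_impl(s: str) -> str:
    """Stand-in for the original implementation (the 'working prototype')."""
    return s.strip().lower()

def candidate_impl(s: str) -> str:
    """Stand-in for the rewrite; this one forgot to strip whitespace."""
    return s.lower()

def differential_test(n_cases: int = 1000, seed: int = 42) -> list[str]:
    """Feed identical random inputs to both implementations and collect divergences."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + " \t"
    failures = []
    for _ in range(n_cases):
        s = "".join(rng.choice(alphabet) for _ in range(10))
        if reference_impl(s) != candidate_impl(s):
            failures.append(s)
    return failures

# The buggy candidate diverges whenever an input has leading/trailing whitespace,
# so random inputs surface the discrepancy without any hand-written test cases.
print(len(differential_test()) > 0)
```

The reference implementation acts as the oracle: no understanding of its internals is needed, only the ability to run it.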
simonw: It was instructed NOT to look at the source, with the one exception that it was told to look at this single file full of charset definitions: https://github.com/chardet/chardet/blob/f0676c0d6a4263827924...
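For readers unfamiliar with the problem space: charset detection of the kind chardet does comes down to scoring raw bytes against per-encoding rules. A toy sketch of the idea (nothing like chardet's actual statistical models, which score byte-frequency tables per language/encoding):

```python
def guess_encoding(data: bytes) -> str:
    """Toy charset detector: try decodings in order of structural strictness."""
    # An ASCII-only byte string is valid in nearly every encoding.
    if all(b < 0x80 for b in data):
        return "ascii"
    # UTF-8 has strict structural rules for multi-byte sequences,
    # so a successful decode is strong evidence.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Fall back to a single-byte encoding that accepts any byte sequence.
    return "iso-8859-1"

print(guess_encoding(b"hello"))                        # ascii
print(guess_encoding("h\xe9llo".encode("utf-8")))      # utf-8
print(guess_encoding("h\xe9llo".encode("iso-8859-1"))) # iso-8859-1
```

The real library's value is in the tuned heuristics and language models behind this shape of decision, which is exactly what the charset-definitions file encodes.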
TZubiri: The issue with this Stallmanian view on IP is that IP predates software and solves an actual issue. I don't think Stallman has a real proposal for how innovation can be incentivized and compensated.

Take the example of medical innovations: sure, big pharma is bad, but if they don't get to monetize their inventions, how will R&D get funded?

If you destroy IP and allow everyone to clone whatever, you will get a great result in the short term, and then no one will continue R&D.
catoc: They’re looking for AI that’s so good it’s unreal.
beloch: Perhaps code licensing is going to become more similar to music.

e.g. Somebody wrote a library, and then you had an LLM implement it in a new language.

You didn't come up with the idea for whatever the library does, and you didn't "perform" the new implementation. You're neither writer nor performer, just the person who requested a new performance. You're basically a club owner who hired a band to cover some tunes. There's a lot involved in running a club, just like there's a fair bit involved in operating an LLM, but none of that gives you rights over the "composition". If you want to make money off of that performance, you need to pay the writer and/or satisfy whatever terms and conditions they've made the library available under.

IANAL, so I don't even know what species of worms are inside this can I've opened up. It seems sensible, to me, that running somebody else's work through an LLM shouldn't give you something that you can then claim complete control over.

---------

Edit: For the sake of this argument, let's pretend we're somewhere with sensible music copyright laws, and not the weird piano-roll-derived lunacy that currently exists in the U.S.
nkmnz: What about the code that wasn't even GPL, but "all rights reserved", i.e., without any license? That's even stronger than GPL and based on your reasoning, this would mean that any code created by an LLM is not licensed to be used for anything.
PaulDavisThe1st: Code created by an LLM cannot, in the USA, be copyrighted. No copyright, no license.
PaulDavisThe1st: US courts have already ruled that in the USA, no machine-generated code can be copyrighted. No copyright, no license, of any type.
PaulDavisThe1st: US courts have ruled that machine-generated code cannot be copyrighted. Ergo, it cannot be licensed (under any license; nobody owns the copyright, thus nobody can "license" it to anyone else).

You cannot (*) use LLMs to generate code that you then license, whether that license is GPL, MIT, or some proprietary mumbo-jumbo.

(*) unless you just lie about this part.
__mharrison__: Licensing is done. Reimplementation will be too easy...
nl: This oversimplifies it.

You can't copyright a work that is only generated by a machine: "In February 2022, the Copyright Office’s Review Board issued a final decision affirming the refusal to register a work claimed to be generated with no human involvement."

But human direction of machine processes can be copyrighted: "A year later, the Office issued a registration for a comic book incorporating AI-generated material."

and

"In most cases, however, humans will be involved in the creation process, and the work will be copyrightable to the extent that their contributions qualify as authorship. It is axiomatic that ideas or facts themselves are not protectible by copyright law and the Supreme Court has made clear that originality is required, not just time and effort. In Feist Publications, Inc. v. Rural Telephone Service Co., the Court rejected the theory that “sweat of the brow” alone could be sufficient for copyright protection. “To be sure,” the Court further explained, “the requisite level of creativity is extremely low; even a slight amount will suffice."

See https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
chii: > then no one will continue R&D

I would like to see a system of publicly funded R&D.
the_mitsuhiko: > You can't do that as easily with closed source software. Except, if you can read assembly, every program is open source. I suspect we're not far away from LLMs being able to just disassemble any program and do the same thing.I have successfully created a partial implementation of p4 by pointing it at the captured network stream and some strace output. It's amazing how good those things are.
dec0dedab0de: It can be, depending on whether it is different enough to convince a jury that it is not a copyright violation. See the lawsuits from Marvin Gaye's family to see how unpredictable that can be.
rmast: I would imagine there must also be some aspect of uniqueness for even recognizing where a line of code came from… otherwise almost every Python script might have copied this line from a GPL-licensed program:

`if __name__ == "__main__":`

I have no idea where that line first appeared, so figuring out what license it was originally written under would be difficult to track down, and most software only has license info at the file rather than line level.
galaxyLogic: We will need ... software patents!
galaxyLogic: If a recording is made in a club, doesn't the party doing the recording have the copyright to that (live) recording, or do the performers?
AuthAuth: I have no data to back this up, but patent trolling seems to happen far less than companies that already own significant infra/talent ripping products from smaller companies and outcompeting them with their scale. I'd rather have patent trolling than have Amazon manufacture everything I launch.

The problem with IP laws and the US is that the big companies already do what IP is supposed to protect against, and the US refuses to legislate effectively against them.
galaxyLogic: And the reason for this is that there is no limit as to how much money corporations can pay for the election campaigns of politicians who make the laws. Right?
mellosouls: Note that the Ship of Theseus, while a nice comparison for the title, is not - as the author eventually points out - an appropriate analogy here. Fundamental to the question of whether the identity of the entity persists is the continuity between intermediate states.

In the example given and discussed here over the last couple of days, the process seems more akin to having an AI create a cast of the pre-existing work and fill it for the new one.
jdndbdjsj: Sans contract? Probably like if I take a photo of you holding a copy of a recent book: I own copyright of the photo; the author still has copyright of the book.
benob: It's funny that real value is now in test suites. Or maybe it's always been...
senko: Maybe, just maybe, this whole AI thing could result in us collectively waking up and realizing copyright is entirely unsuitable for software.
fergie: > A court still might rule that all AI-generated code is in the public domain, because there was not enough human input in it. That’s quite possible, though probably not very likely.

It's not only likely, it is in fact the current position, at least in the US.
StephenHerlihyy: At what point does the cost of reimplementation shrink below the benefits of obfuscation? Consider a new CVE in Linux. Well, maybe my Linux is not the same as the public one. Maybe I just set a swarm of AI agents on making me a drop-in replacement that is different but with an identical interface. Same-same but different.

Right now, writing your own OS to replace the entirety of Linux would be costly and error-prone. Foolish. But will it always be? What happens when Claude Code Infinite Opus can one-shot a perfect reimagining in 24 hours? Or 30 minutes? Do all my servers have the same copy, or are they all slightly different implementations of the same thing? I dunno.
cubefox: > Unlike the Ship of Theseus, though, this seems more clear-cut: if you throw away all code and start from scratch, even if the end result behaves the same, it’s a new ship.That's not how copyright works. It doesn't require exact copies. You also can't just rephrase an existing book from scratch when the ideas expressed are essentially the same. Same with music.
marcus_holmes: > For me personally, what is more interesting is that we might not even be able to copyright these creations at all. A court still might rule that all AI-generated code is in the public domain, because there was not enough human input in it. That’s quite possible, though probably not very likely.

As I understand it, the US Supreme Court has just this week ruled exactly this. LLM output cannot be copyrighted, so the only part of any piece of software that can be copyrighted is that part that was created by a human.

If you vibe-code the entire thing, it's not copyrightable. And if it can't be copyrighted, that means it is in the public domain from the instant it was created and can't be licensed.
IsTom: If you take an AI image (which cannot be copyrighted) and adjust it in the photo editing software of your choice, then the changes are potentially copyrightable and the resulting image can be copyrighted (you need to ensure that your changes pass the low bar of creativity).

It's not clear to me how much code you would need to modify by hand to qualify for copyright this way, but it's not an impossible avenue.
galaxyLogic: Would writing a prompt, or a few, for an LLM qualify as "the requisite level of creativity is extremely low; even a slight amount will suffice"?
nl: Read the linked report - it discusses this.

The short answer is that it's possible if the prompt has sufficient control, but only the parts controlled by the human are eligible for copyright.

Using AI doesn't automatically disqualify a work from copyright protection, though.
jneen: I mean, it has to be asked... was the source of chardet not in the training set...?
BerislavLopac: Code is one thing, but what about writing? There is no 100% foolproof way to identify content written by LLMs, and human writing routinely gets incorrectly flagged as such. If I write a book, and a checker says that it's written by LLM, is it automatically in the public domain?
radarsat1: This is interesting because I've been considering a similar project. I maintain a package for a scientific simulation codebase; it's all in Fortran and C++ with too much template code, which takes ages to build, is very error-prone, and is frankly a pain to maintain with its monstrous CMake spaghetti build system. Furthermore, the whole thing would benefit from a rewrite around GPU-based execution, and generally a better separation between the API for specifying the simulation and the execution engine.

So I've been thinking of rewriting it in Jax, and did an initial experiment porting a few of the main classes to Python using Gemini. It did a fairly good job. I want to continue with it, but I'm also a bit hesitant because this is software that the upstream developers have been working on for 20+ years. The idea of just saying to them "hey look, I rewrote this with AI and it's way better now" is not something I would do without giving myself pause for thought. In this case it's not about the license, they already use a permissive one, but the general principle of suggesting a "replacement" for their work. If I was doing it by hand it might be different, I don't know, they might appreciate that more, but I have no interest in spending that much time on it. Probably what I will do is just present the PoC and ask if they think it's worth attempting to auto-convert everything; they might be open to it.

But yeah, the possibility of auto-transpiling huge amounts of software for modernization purposes is a really interesting application of AI; it's amazing to think of all the possibilities. And I'm happy to have read the article, because I certainly didn't think about the copyright implications.
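For what it's worth, the mechanical core of such a port is usually turning stateful simulation objects into pure step functions over plain data, which is the shape Jax wants. A toy illustration in plain Python (a hypothetical damped oscillator standing in for the real physics; in actual Jax the pure function would additionally be jit-compiled and vmapped over many states):

```python
# Legacy-style API: a mutating simulation object, as the C++/Fortran side might expose.
class DampedOscillator:
    def __init__(self, x: float, v: float, k: float = 1.0, c: float = 0.1):
        self.x, self.v, self.k, self.c = x, v, k, c

    def step(self, dt: float) -> None:
        a = -self.k * self.x - self.c * self.v  # spring force plus damping
        self.x += self.v * dt
        self.v += a * dt

# Jax-friendly style: state is plain data, the step is a pure function.
def step(state: tuple[float, float], dt: float, k: float = 1.0, c: float = 0.1):
    x, v = state
    a = -k * x - c * v
    return (x + v * dt, v + a * dt)

# Differential check that the port preserves the legacy behavior exactly:
# both versions perform the same floating-point operations in the same order.
obj = DampedOscillator(1.0, 0.0)
state = (1.0, 0.0)
for _ in range(100):
    obj.step(0.01)
    state = step(state, 0.01)
print((obj.x, obj.v) == state)  # True
```

The same trajectory-comparison trick scales up: run the old and new engines side by side on the same inputs and treat any divergence beyond float tolerance as a porting bug.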
semi-extrinsic: > And if it can't be copyrighted that means it is in the public domain from the instant it was created and can't be licensed.

I don't think this follows? If I vibe code something and never post it anywhere public, I can still license that code to a company and ask them to pay me for using the code?

So as a corollary, the business model of providing software where you can choose either a free (as in beer) but restrictive license (e.g. GPL), or pay money and get a permissive business-compatible license, will cease to exist.

I think that's a shame, actually, because it has been a good way of providing software that does something useful, but where large companies that earn money from its use have to pay the software creator.
vbarrielle: The test suite was also licensed under the LGPL. The reimplementation can be seen as a derivative work of the test suite, and thus should fall under the LGPL. This does not even mention the fact that the coding agent, AND the user steering it, both had ample exposure to chardet's source code, making it hard to argue that the reimplementation is a new ship.
ChrisMarshallNY: > slopforks

Good term.

For myself, I tend to have a similar view as the author (I publish MIT on most of my work), but it’s not really something I’m zealous about.
rmoriz: I know it's a bit off-topic, but https://www.youtube.com/watch?v=DTYnzLbHUHA
graemep: > As I understand it, the US Supreme Court has just this week ruled exactly this. LLM output cannot be copyrighted, so the only part of any piece of software that can be copyrighted is that part that was created by a human.

Your understanding is incorrect. The case was about whether an LLM can be an author, and did not address whether the person using it can be (which will be the case). https://news.ycombinator.com/item?id=47260110
ketzu: > I can still license that code to a company and ask them to pay me for using the code

I believe you can do that with public domain/copyright-free material in general. There is no requirement to tell someone that the material you license to them is also available under a different license, or that your license is not enforceable.
jagged-chisel: This is the correct understanding. Go back to the selfie of the monkey. Is the monkey the creator of the photo? Does he own the copyright? No. The photographer who created the opportunity for the monkey to take the selfie is the holder of the copyright on that image.

Similarly, the operator of the LLM is the holder of the copyright of the LLM’s output.
philipwhiuk: Meanwhile elsewhere: https://www.theguardian.com/technology/2026/mar/06/uk-arts-m...
LucasAegis: AI is merely a sophisticated tool. If your original thoughts achieve a tangible result through this tool, the ownership should reside with the thinker. Reverse-engineering, in this context, shouldn't be seen merely as an infringement on AI-generated code, but as a violation of the human intellect and systemic design that orchestrated that code. We need to move past protecting 'lines of code' and start protecting the 'intent and architecture' behind them.
littlestymaar: Linux won against the multiple proprietary Unixes because it forced corporations to contribute back instead of keeping their secret sauce for themselves.
davidcollantes: > Right now I would argue that unless some evidence of the contrary could be provided, this can be seen as a new implementation from ground up.

Not ship of Theseus, but a "new implementation from ground up."

Evidently, the author prefers MIT (https://github.com/chardet/chardet/issues/327#issuecomment-4...), and seems OK with slop-coding.
magicalist: > This is the correct understanding. Go back to the selfie of the monkey. Is the monkey the creator of the photo? Does he own the copyright? No. The photographer who created the opportunity for the monkey to take the selfie is the holder of the copyright on that image.

This is incorrect. The monkey is unable to hold a copyright on the photograph, but there was no court case suggesting the owner of the camera (Slater) has a copyright on the photo, and the Copyright Office's rules actually say the opposite: that it isn't copyrightable at all (the Wikipedia summary of the situation is good, pointing out they specifically added an example of "a photograph taken by a monkey" to make their point clear).
magicalist: Depending on how you do it and whether they find out, you could certainly be sued for fraud and misrepresentation, though. And if you put a "copyright by me" notice at the top of a public domain work, it's technically a crime under 17 U.S.C. § 506(c) (Fraudulent Copyright Notice): https://www.law.cornell.edu/uscode/text/17/506#c
latexr: > There is an obvious moral question here, but that isn’t necessarily what I’m interested in.

And thus we arrive at the absolute shit state the world is in. We keep putting morality aside for something “more interesting”, then forget to factor it back in when making the final point.

“Have you tried: ‘kill all the poor’?” https://youtube.com/watch?v=s_4J4uor3JE
rkJahsdg: Ronacher has a startup, Earendil, that markets itself as a non-profit, like OpenAI. He appears with Austrian OpenClaw people. He is totally in on AI, and that quote of his is self-serving. Can't we go back to flaming Unicode in Python?
globular-toast: > The motivation: enabling relicensing from LGPL to MIT.

Good heavens, that's incredibly unethical. I suppose I should expect nothing more from a profession that has shied away from ethics essentially since its conception.

> I think society is better off when we share

Me too.

> and I consider the GPL to run against that spirit by restricting what can be done with it.

The GPL explicitly allows anyone to do anything with it, apart from not sharing it. You want me to share with you, but you don't want to share with me.
bayindirh: What if the tool needs an amalgam of everything on the internet to barely function, and some of this everything carries a big red label saying that adding it to the amalgam is forbidden for one reason or another?

Further, what if this tool can reproduce these forbidden things almost or completely verbatim, and the user of the tool has no way to verify it?
LucasAegis: You are focusing on the 'bricks' (the literal lines of code), but your argument overlooks the fundamental reality of Architectural Interdependency. In the era of AI-driven synthesis, we must shift our perspective from linguistic expression to systemic logic. Think of software development as finding a structural path from point A to point D.

1. The Foundational Gateway (A → B): You are correct that AI tools are an amalgam of existing data. This foundational layer (A-B) represents the "Prior Art", or the existing IP that serves as a necessary gateway for any further development. If the path starts here, the rights of the original creators must be respected through the established legal framework of Intellectual Property Offices.

2. The Innovative Branch (F → D): However, if an orchestrator uses a tool to forge a new path via a distinct architecture (F) to reach the destination (D), that specific "delta" is a unique intellectual asset. Even if the tool "borrows" the bricks, the topological map of the new architecture belongs to the thinker who directed it.

3. The Necessity of Cross-Licensing: This is where the true core of IP exists. If the owner of the foundation (A-B) wishes to utilize the superior, optimized results of the new path (ABFD), they must respect the IP of the FD architecture. Conversely, the FD creator must acknowledge the base.

We aren't just talking about 'verbatim reproduction' of code; we are talking about the Systemic Design that justifies the existence of IP offices worldwide. The future isn't about "cleaning" licenses through AI, but about a more sophisticated world of Cross-Licensing where the foundational layer and the innovative layer recognize each other's functional logic.
emporas: Porting code from one programming language to another will be one of the most important tasks of code-gen A.I. Imagine doing the same with vehicle engines: less fuel consumption, less pollution, less weight, and who knows how many more benefits.

Just letting the A.I. do it by itself is sloppy, though. The real benefit is derived only when the resulting port is of equal or better quality than the original. It needs a more systematic approach, with a human in the loop and good tools to index and select data from both codebases, the original and the ported one. The tools are not invented yet, but we will get there.
erelong: I think getting rid of IP shifts economic focus onto tangible physical goods, which you can exclusively own: you can sell the physical medical devices, just not claim a specific design is "yours exclusively".

IP has always had awkward aspects. What if you discover the sole treatment for a disease and can restrict people from making use of it? That is kind of weird, especially when people can "independently" draw the same conclusions, so that they truly obtain an idea that is "their own" but are then legally restricted from making use of it.
Splinelinus: I'm waiting for AGPL to become AIGPL: If you train a model with some or all of the licensed work, you agree that the weights of that model constitute a derivative work, and further for the weights, as well as any inference output produced as a result of those weights to be bound by the terms of the license. If you run a model with the licensed work in part or in full as input, you agree that any output from the model is bound by the terms of the license.
bored9000: Also, towards the bottom of the page: > Content licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
rzmmm: Bingo. I can see this as a possible future, and probably a desirable scenario for anyone with a preference for free software.
Aeolun: I dunno, I’m inclined to think the WTFPL and MIT did more to help open source. And for a while during my youth there was indeed no distinction between publicly accessible code and free and unencumbered code.
andsoitis: > I’m a strong supporter of putting things in the open with as little license enforcement as possible.

> © Copyright 2026 by Armin Ronacher.

Oooohkaaaay?
KaiserPro: > the ownership should reside with the thinker.

Assuming that you are a programmer, when you think back to your contract, you will have noticed something like "The employee agrees that any works created during employment will be solely owned by $company_name".

Copyright _should_ be about allowing workers to make money from the non-physical stuff that they produce. Google spent many, many millions undermining that so they could run YouTube, the news service, and Google Books (amongst other things). Disney bought most of Congress to do the opposite.

At its heart, copyright is a tool that allows you and me to make a living. However, it has evolved into a system that allows large corporations to make and hold monopolies. Now that large corporations can see an opportunity to cut employees out of the system entirely, they are quite happy with AI companies undermining copyright, just so long as they can keep charging for auto-generated content.

TLDR: copyright is automatically assigned to the creator of the specific work, not the thinker. I.e., thinker: "build me a box with two yellow rabbit ears". The text is copyright of the "thinker". Maker: builds a box with yellow rabbit ears. Unless the yellow rabbit ears are a specific and recognisable element of the thinker's work, it's not infringement.
roenxi: I see you submitted that as a link; it deserves a lot more than the current 4 upvotes I see. What a fascinating article. It gives me much hope that dead old games are not in fact dead. If there is still a binary somewhere, and current trends continue, then they can probably be resurrected cheaply and with relatively unskilled people.
sigmar: You can't change the law with a license agreement and redefine what constitutes a derivative work. If that were possible, people could have done it pre-LLMs. Also, how would you prove it was in the training set?

Re: your last sentence, the licensed work wasn't in the input in the chardet example ("no access to the old source tree").
ncruces: Agree. But then, the test suite was the input (chardet). So, is the test suite creative or functional in nature? And does the concept of fair use apply globally?
kccqzy: He is the maintainer of chardet. The main topic of the article is the whole LGPL to MIT rewrite and relicense done by this person. https://github.com/chardet/chardet/releases/tag/7.0.0
Aeolun: I think the “I maintained this thing for 12 years” weighs a lot heavier than the “and then I even went through the trouble of reimplementing it” before changing it to a license that is more open. Seriously…
cheesecompiler: Since Linux is GPL this seems to support my point.
glkindlmann: Sure, a license can't create a new legal understanding of "derived work", but I think the intent of what Splinelinus said still works: a license outlines the terms under which a licensee can use the licensed Work. The license can say "if you train a model on the Work, then here are the terms that apply to the model or to what the model generates". If you accept the license, those terms apply, even if the phrase "derived work" never came up. I hope there are more licenses that include terms explicitly dealing with models trained on the Work.

Also, for comparison, both GPL and LGPL, when applied to software libraries (in the C sense of the word), assert that creating an application by linking with the library creates a derived work (derived from the library), and then they both give the terms that govern that "derived work" (which are reciprocal for GPL but not for LGPL). IANAL, but I believe those terms are enforceable, even if the thing made by linking with the library does not meet a legal threshold for being a derived work.
toyg: You don't even need to go down to assembly; most commercial software is trivial to disassemble by running a few EXEs. In theory this is largely forbidden by licenses, but good luck enforcing them now.
pixl97: > Actually, I think LLMs are a step in the wrong direction if we really want to reach true AI.

Any particular reason, beyond feelings, why this is the case?

We already know expert systems failed us when reaching towards generalized systems. LLMs have allowed us to further explore the AI space and have given us insights into intelligence. Even more so, we've had an explosion in hardware capabilities because of LLMs, which will allow us to test other mechanisms faster than ever before.
PaulDavisThe1st: I have no doubt that I was oversimplifying it. The court case that determines whether code written by an LLM in response to various types of prompts can be copyrighted has not yet been launched (AFAIK; if it has, it has not yet been decided).

But it will be a shitshow either way.
sigmar: Yeah, that's possible, but that seems to me more about contract law and creating an EULA for the code than about copyright-derived enforcement. Maybe 'copyleft' stuff will move in that direction.

It's barely tangential to the topic, but worth pointing out: I don't think there's firm legal consensus on your library point; that is just the position of the FSF. IANAL tho. https://en.wikipedia.org/wiki/GNU_General_Public_License#Lib...
mannanj: I think at the core this is a problem of abuse of the commons, and of parasitic and extractive behavior being tolerated as a norm. How would I defend myself against hostile entities and societal norms that make it OK to steal from me and my effort without compensation? I will close my doors, put up walls, and distrust more often.

That's clearly the trend the world is moving towards, and I don't see that changing until we find some way to make it cheaper to detect deception and parasitic behavior, along with holding said entities accountable. Since our world leaders have a history of unaccountable leadership, and they are the ones who model this behavior, I have difficulty seeing the norms change without drastic worldwide leadership change.
PunchyHamster: And same corporations are now pushing BSD license at every avenue just to avoid having to do that.
jFriedensreich: Non-permissive licenses, open core, and proprietary software will just not survive. There is no reality in which I or anyone in my community would use something like e.g. Raycast or the SaaS email clients that someone locks down and does rent extraction and top-down decisions on. The experience of being able to change anything about the software I use with a prompt, while using it, is impossible to come back from to all the glitches, limitations, and stupidities. We have to come to terms with infinite software.
jason_oster: I did not say it was unimportant. I said it was not the only important factor.
jason_oster: You certainly made the case that the GPL was the only force, or at least ignored the contribution of alternative licenses.I also wouldn't agree that proprietary software is in decline. There are niches where the OS is proprietary, mobile apps and games are almost entirely proprietary (and that is not changing any time soon). But the most damning problem is that all computer hardware now has multiple layers of subsystems with proprietary software components, even if the boot loader and beyond are ostensibly FOSS.My take on the cause of proprietary software is "the bottom line". Companies want to sell products and they believe that it's easier to sell things that are not open source. Meanwhile, there are several counterexamples of commercial products that are also open source (not necessarily copyleft), including computer games. The cause of whatever decline you're seeing in proprietary software dominance is unlikely to be the GPL.
jagged-chisel: I was indeed misremembering part of this.

The professional photographer claimed he engineered the situation that led to the photo and thus owns the copyright on the images. That specific claim appears not to have been addressed by the court nor by the copyright office. Instead, Slater settled by committing to donations from future revenue of the photos.
observationist: If it were a trained monkey, and the photographer held a button in his hand that triggered the photo-taking mechanism, there'd be no question of copyrightability. Similarly, vibe-coding and eliciting output from a software tool, resulting in software or images or text created under the specification, direction, intent, and deliberate action of the user of the tool, is clearly able to be copyrighted. The user is responsible for the output of the software. An image created in Photoshop isn't the IP of Adobe, nor does text in Word somehow belong to Microsoft. The idea that because the software tool is AI its output is magically immune from copyright is silly, and any regulation or legislation or agency that comes to that conclusion is silly and shouldn't be taken seriously.

Until they get over the silliness, just lie. You carefully manually crafted each and every character, each pixel, each raw byte by hand, slaving away with a tiny electrode, flipping each bit in memory, to elicit the result you see. Any resemblance to AI creations is purely coincidental, or deliberate as an ironic statement about current affairs.
greyface-: Copyright is positive law created by humans, not natural law that we happen to recognize. The idea that adopted legislation or established caselaw can be wrong about what copyright fundamentally is makes no sense.
jason_oster: This confuses the economics of open source. It's easier to contribute changes upstream than to maintain a fork. A smart business decision is using permissively licensed software that is maintained by other teams (low maintenance cost) while contributing patches upstream when the need arises (low feature cost).

Bringing a fork in-house and falling behind on maintenance is a very bad idea. The closest I've ever come to that in industry was deploying a patch before the PR was merged.
littlestymaar: Proprietary Unixes were literally that at the scale of an entire OS.
observationist: Not what I'm saying. If you meet the technical, intentional definition of a process, substantiated by precedent, then the law should support any variation of the process that has those same technical features meeting the definition. Using AI as a tool to produce output, no matter how complex the underlying tool, should result in the authorship of the output being assigned to the user of the tool.

If autocorrect in Word doesn't nullify copyright, neither should the use of LLMs; manifesting an idea into code and text and images using prompts might involve little human input, but the input is still there. And if it's a serious project, into which many hours of revision, back and forth, testing, changing, etc. have gone, there should be absolutely no bar to copyright. I can entertain a dismissal based on specific low-effort uses of a tool, something like "generate a 13-chapter novel 240 pages long", seeing what you get, and then attempting to publish the book. But almost anything that involves any additional effort, even specifying the type of novel, or doing multiple drafts, or generating one chapter at a time, would be sufficient human involvement to justify copyright, in my eyes.

There's no good reason to gatekeep copyright like that. It doesn't benefit society or individuals; it can only benefit those with vast IP hoards and giant corporations, and it's probably fair to say we've all had about enough of that.
casey2: Pretty simple: if the model was trained on GPL or any copyleft code, then the output is copyleft (in whole or in part!); you just have a really long preprocessing step before hitting compile.
ndsipa_pomu: No, lawyers will want software patents as that's the only group that would benefit from them, apart from large litigation-happy companies that want to squash any competition.
galaxyLogic: Not sure I can follow your reasoning. Wouldn't the developer of the software who got a patent for an invention embodied in the software she developed benefit as well?
ndsipa_pomu: Not if the developer is employed at the time as contracts will usually mean that the company owns the patents, even if the developer was working on their own time.The bigger issue is patent abuse - file or buy a few poorly specified patents and then use them along with litigation to shut down competitors. This generally leads to bolstering the bigger companies at the expense of smaller companies due to the costs of litigation.
muyuu: i find his arguments on re-licensing blatantly AI-plagiarised libraries down to API compatibility confusing

they are arguments against any licence, not just LGPL. i could literally plagiarise all his work, claim it's mine "clean-room" and not give him as much as a mention, by his own logic

and in his own words, he's "not interested" in the morality of it

odd
just6979: I think the reimplementation in question rubs people the wrong way because of the intentions of parties on both ends, and the ignoring (erasure, from some POV) of one of them by the other. The original author of the code obviously chose the license they did intentionally (copyleft "keep it open" reasons, seemingly). And the rewrite author has their intentions as well (unknown beyond "fewer restrictions on derivatives"). The problem comes when those intentions conflict, and in this case the rewrite author basically just ignored the usual convention for resolving the conflict, which is forking or just starting a new project. Claiming "I've maintained it for a while so I can do whatever I want" is kinda gross because it just completely overrides the original author's intention. They're basically saying "my intentions as maintainer are more important than the creator's", and that doesn't feel even-handed. The "is it a real clean room" question, given prior exposure via LLM training and working on the codebase, is always going to be contentious. But the "should I override or erase someone else's intentions?" question is easy to answer: no. Especially since we have come up with so many ways to make it easy not to (forking is practically free, the abstraction of APIs is powerful, etc).

It also just feels a little nefarious. There isn't much reason to change between the licenses in question beyond allowing the code to be more tightly integrated into something commercial and closed-source. In which case, having an LLM write a compatible rewrite _in a new project_ seems reasonable at the current moment in time. It's this intentional overriding of the original intentions, seemingly _for profit_ as well, that is the grossest part, because the alternatives are just so easy and common.
cheesecompiler: > You certainly made the case that the GPL was the only forceNope.
marcus_holmes: If the code has been entirely the product of an LLM, you don't have copyright, so you can't license it. Copyright is only applicable to human creativity, so you can only copyright the bits of the product that were created by a human. And all licensing derives from copyright.

There might be a path to this business model via trade secrets (you treat your source code as a trade secret, and sell only binaries). And, of course, you can still sell support as the paid-for service, which has worked for a lot of people.
prmph: Technically, how will vibe code be identified? And how does one determine the level of human involvement that would make code copyrightable? What of the prompts? Are those copyrightable? What about the architectural and tactical design of the code, if I do those myself?

I don't vibe code; I am firmly in charge of the architecture and code style of my projects, and I frequently give detailed instructions to the AI tools I use. But, to me, this is leading to a weird place. Why would the result of using a tool to create something new not be copyrightable simply due to the specific tool used?

I think this whole hullabaloo is self-inflicted. Code or any other creative work should stand on its merits. There is no issue with copyright and no issue with the ship of Theseus. The current copyright approach is still applicable: code (or any other creative work) that appears to be lifted verbatim from another work could be a copyright violation. Work that is sufficiently original (irrespective of how it was created) is likely not a copyright violation.
marcus_holmes: It's the courts' opinions that count. And they say that copyright only attaches to human creative work, and that does not include LLM output.I can see there's going to be some huge court fights over this in the next ten years - there's no way some of the big media companies are going to be OK with their content being public domain, and no way are they going to just miss out on being able to produce it so cheaply with an LLM.
marcus_holmes: Cory Doctorow (and almost every other source I've found online commenting on this) disagrees with you. https://pluralistic.net/2026/03/03/its-a-trap-2/

Quoting from that post:

> At the core of the dispute is a bedrock of copyright law: that copyright is for humans, and humans alone. In legal/technical terms, "copyright inheres at the moment of fixation of a work of human creativity."
marcus_holmes: Really good question.

My understanding is that only human creativity can be copyrighted. So if you sketched out the plot and got the LLM to write all the words, then only the plot is copyrightable. So someone else can copy all the words, as long as they don't copy your plot.

However, as you point out, someone has to determine which bits the LLM created and which bits you created. If you wrote the whole book, and a tool incorrectly flags your writing as LLM writing, and then someone copies chunks of your book because they believed the tool and assumed they could (and assuming you filed a DMCA claim and they denied it using the tool's output as proof), then there's going to have to be a court case.

I suspect there are going to be a few court cases about this.
BerislavLopac: > only the plot is copyrightable

But the plot can't be copyrightable, as copyright applies only to a tangible representation of an idea (e.g. written text), and not to the idea itself.
marcus_holmes: Your sketch of the plot is copyrighted, then.

I think there are going to be quite a few court cases thrashing this out to its conclusion.
Something related, but different, happened with chardet. The current maintainer reimplemented it from scratch by only pointing it to the API and the test suite.
coldtea: > Something related, but different, happened with chardet. The current maintainer reimplemented it from scratch by only pointing it to the API and the test suite.

Only "pointing it". But the LLM, which can recite over 90% of a book in its training set verbatim*, would also have been trained on the original code.

Maybe "the Slop of Theseus" is a better title.

* https://the-decoder.com/researchers-extract-up-to-96-of-harr...
the_mitsuhiko: Maybe, but the LLM did not recite the chardet source code so that argument does not appear to apply here.
logicprog: Also from that exact same study (why not cite the actual study? It's quite readable), the LLMs couldn't recite more than a small fraction of many other books, often ones just as well known [0]. In fact, from the bar charts shown in the exact news article you cited, it's pretty clear that Sonnet 3.7 was a massive outlier, and so was Harry Potter and the Sorcerer's Stone, so it really seems to me that that's an extremely unrepresentative example. If all the other LLMs couldn't recite even a small fraction of all the other books except that one outlier pairing, despite the books being widely reproduced classics, why would we expect LLMs to regurgitate regularly, especially for a relatively unknown open source project that probably hasn't been separately reproduced that many times?

Not to mention the fact that, as the other commenters point out, that appears to just... not have happened at all in this case, so it's a moot point.

[0]: https://arxiv.org/pdf/2601.02671