Discussion
mondainx: Get ready for some dope code... ;)
maxloh: Context: https://github.com/orgs/community/discussions/188488
kristianp: What's a good alternative for free private repos?
Imustaskforhelp: I would've recommended codeberg but codeberg isn't the finest to be recommended for free private repos.I definitely feel like more can be done within this space and that there is space for more competitors (even forgejo instances for that matter)
eblume: I've recently started hosting my own forgejo instance. It works so well! Free tailscale for connectivity. I expose mine over fly.io proxy, also free, but not to be done without caution.
kepano: I've been saying this since 2023> If your data is stored in a database that a company can freely read and access (i.e. not end-to-end encrypted), the company will eventually update their ToS so they can use your data for AI training — the incentives are too strong to resisthttps://news.ycombinator.com/item?id=37124188
sethops1: When Louis Rossmann started describing tech leadership as having a "rapist mentality" I brushed him off as being sensationalist. But actions like this make me think more and more he's right. The product managers pushing for changes like this are despicable scum.
starkeeper: So now CoPilot will be EVEN better at writing viruses, worms and malware!
lanxevo3: To be precise: the opt-out is for GitHub Copilot training specifically, which has always required opt-in for public repos under their policy. The change Apr 24 is about private repos being included by default unless you opt out. If you're using Copilot in your private repos, definitely opt out unless you're comfortable with that. The setting is at github.com/settings/copilot — takes 30 seconds.
jokoon: weren't they already using repos for training?
martinwoodward: No we won’t. Details here https://github.blog/news-insights/company-news/updates-to-gi...For users of Free, Pro and Pro+ Copilot, if you don’t opt out then we will start collecting usage data of Copilot for use in model training.If you are a subscriber for Business or Pro we do not train on usage.The blog post covers more details but we do not train on private repo data at rest, just interaction data with Copilot. If you don’t use Copilot this will not affect you. However you can still opt out now if you wish and that preference will be retained if you decide to start using Copilot in the future.Hope that helps.
conductr: Just spitballing, don’t use these tools myself, but isn’t this something that should be encrypted to really prevent them from training? I personally don’t trust anyone with my data when they pivot to building AI products yet claim my data wasn’t a part of that strategy. It’s too easy to hide/lie.
piersj225: I've not tried this, however https://github.com/AGWA/git-cryptApparently someone has developed something similar to this
jollyllama: It's not clear to me what happens to personal repos if you're getting Copilot for work, or where to disable it there.
djsavvy: yeah, how can I view the settings on my own personal account if my employer is managing the copilot settings?
sebastiennight: GitLab would be a good bet here. We started on their free tier and used that for a couple of years, I was very happy with it. Not sure how the tiers might have evolved since.And according to their PM and privacy policy, they're not training their models on your code[0].[0]: https://forum.gitlab.com/t/can-i-opt-out-from-my-code-being-...
13415: It is the feature "Allow GitHub to use my data for AI model training" that needs to be disabled. Right?Or am I missing some trick / dark GUI pattern? Just want to make sure.
shell0x: Shouldn’t this be “Tell HN”?
yonatan8070: How do I opt out of this for my own private repos? I don't see anything related to this as I've got a ton of settings for Copilot itself (I have access to Copilot through my work org)
forthac: I believe it is under:Settings->Copilot->Features->Privacy=>[ Allow GitHub to use my data for AI model trainingAllow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement. ]
Sohcahtoa82: I wonder how effective it would be to sabotage the training by publishing deliberately bad code. A FizzBuzz with O(n^2) complexity. A function named "quicksort" that actually implements bogosort. A "filter_xss" function that's a no-op or just does something else entirely.The possibilities are endless. I thought of this after remembering seeing a post a couple months ago about how it doesn't take a significant amount of bad data to poison an LLM's training.
cj: Pro tip: sign up for the business/enterprise version when reasonable in price.I do this with Google Workspace. You can also do it with GitHub.(Google doesn’t train on Workspace, Github doesn’t train on business customers, etc)
margalabargala: > Google doesn’t train on Workspace, Github doesn’t train on business customers, etc...yet
jamie_ca: https://github.com/settings/copilot/features, it's near the bottom "Allow GitHub to use my data for AI model training"
landl0rd: This headline is false; it will not go take your private repos and dump them into a training dataset. Rather, GitHub will train on your copilot interactions with your private repos. If you do not use copilot, this makes no difference to you, though you should probably still turn it off.
thot_experiment: Probably don't reward extortion with money.
bonestamp2: [delayed]
throwuxiytayq: It's not a pro tip if it only fucks you over slightly later. How's the weather in Stockholm?
doubled112: Even the way modern software phrases questions is rapey.Imagine a man asking a woman “want to have sex? Or maybe later?” out of the blue, then asking her again every 3 days until she says “yes”
chuckadams: Something like "tea and consent": https://www.youtube.com/watch?v=pZwvrxVavnQYeah, it ain't sex, but it does still come down to basic respect.
contingencies: Thank you.
andoando: Thats still pretty bad. Its no longer private if all your code goes through LLM training set and is resurfable to everyone publicly.Why would I ever use copilot on any code Id want to be kept private? Labling it a private repo and having a tiny clause in the TOS saying we can take your code and show it to everybody is just an upright lie
parsimo2010: Jokes on them, my private repos are total dog dookie. If nobody but me can see the code then I don't have to worry about style, structure, comments, or any other best practices.You don't want an LLM trained on my private repos. Trust me.
forinti: Poisoning LLMs is an interesting path of resistance.
grepfru_it: Back in my day someone would post a HN article to the internal slack in order to sway conversation in their favor. Glad to see its still happening! :D
SirensOfTitan: Right, but it shouldn't be opt-out only to begin with. It's a dishonest pattern that relies on people not noticing. Honest use of data is a "Caesar's wife must be above suspicion" moment for me -- if this is how you're acting when engaging with customers explicitly, I don't trust you to resist the temptation to tap into my data privately. AI companies already have trained their models illegally against the intellectual property of all of humanity with little consent along the way.Honestly, if you work at GitHub, maybe you should focus on your uptime -- it's awful.
worble: Pro tip: You could instead spend that money to spin up a forgejo instance for as little as $2 a month https://www.pikapods.com/apps#development (not affiliated, just a happy customer)Please don't reward these companies with money.
rrgok: I'm gonna put a license fee on all my repos. 10% of revenue if my private repos have been used for AI training. 5% on all my other repos.
jffry: It's unnecessarily splitting hairs.> interaction data—specifically inputs, outputs, code snippets, and associated context [...] will be used to train and improve our AI modelsSo using Copilot in a private repo, where lots of that repo will be used as context for Copilot, means GitHub will be using your private repo as training data when they were not before.
munk-a: Probably extremely ineffective, it's an issue of scale and unless you really automate the terrible code generation and somehow manage to make it distinct enough in style that it isn't easy to detect and eliminate wholesale then you just won't have the volume to significantly impact the result set.I'm absolutely sure that there are state actors with gigantic budgets that are putting a lot of effort into similar attacks, though.
aduwah: I will join the club. +1 for ruining M$ AI with my garbage code
mrits: Thanks for confirming you train on our data
mememememememo: Yes I think you are right. Even a super ethical company can be taken over. There may be exceptions but it is more luck. I work for a SP500 that absolutely won't dont this and locks down prod access so a rogue staff can't do it. But if Larry or Zuck or Bezos buys them out, who knows.
miohtama: Microsoft would never do this(-:
endofreach: How did people forget that github was purchased by that one company?
kace91: How's the codeberg experience nowadays? I think it's finally time to switch for me.
moralestapia: Thank you for your service. We really need more "canaries in the mine" giving out early warnings of things that might not be evident on a first glance.Any takes on what 2029 will look like? (related to this topic, ofc)
hirako2000: [delayed]
groby_b: Github's enterprise version "starts at" $21.99/seat, and requires you to "contact sales".And I don't see any mention that that exempts you from being trained on. (Yes, the blog says you're still covered, but at that price I'd like to see a contract saying that)
jpcrs: Good luck to them, my private repos are probably some of the worst code humanity has produced.
rakel_rakel: I'm looking forward to the class action lawsuit, even if only to establish a precedent!I don't have much hope, but I wish that ignoring software licensing and attribution at scale becomes harder than it currently seems.
rrgok: They would've done the math. Even with a class action they will come up positive. It just another bill for them.
tartoran: If you opt out Github will probably still train on your private repo. Just migrate.
moralestapia: Is this the case even if you're a paid customer?If so, this might be illegal.
ChadNauseam: The situation you describe has dynamics that don't apply when your windows laptop is trying to get you to install an update. A woman can't have 100% confidence that saying no won't trigger a man into rage, so just the question being asked at all is already a bit unpleasant. WinRAR trying to get me to buy a license is not as offensive because I know it won't beat me up for saying no.
doubled112: Of course. Claiming this is a 1:1 would be wrong.However, do you think people accept Microsoft backup because they want a backup?Or do you think they click yes because it makes the popup go away for good?Wearing me down until I say yes isn’t the same as just yes.It’s the same dark pattern for the 10-11 upgrade. My father in law managed to upgrade by accident because it kept popping up. He didn’t really make an informed choice for himself. One day he just couldn’t figure out why everything was different.
munk-a: The only setting I'm seeing is on a per-user basis. Does anyone know how to blanket disable training on an organizational basis?Is there any information about how much information from an organization managed repo may be trained on if an individual user has this flag enabled? Will one leaky account cause all of our source code to be considered fair game?
Jabrov: Can't you just make it opt-in?No? Because no one would opt-in, you say?Wow. It's almost like this is a user-hostile feature that breaks the implicit promise behind a "private" repo.
wswope: What a wildly disingenuous take. Speaking earnestly from one human to another: your behavior and work is shameful, and you should feel embarrassed by your actions, Martin.You’re laundering the code of users who don’t opt-in through Copilot users who do, to read in as many LoC as possible. It’s clear as day to everyone not morally bankrupt.
hedayet: To Github's credit, they have been showing a banner consistently. To my discredit - I never bothered to read that banner until I saw this HN headline
jmward01: I've never seen the banner. Where does this show up?
arcanemachiner: It's been on top of the web UI for 2 or 3 days now.You might have closed it...Just go to your account settings and find the opt-out option.
Supermancho: Gitlab?Microsoft services are tech debt. I moved the moment they were acquired and never regretted it.
nottorp: I opened gitlab.com and it starts with"Finally, AI for the entire software lifecycle."Not very trust inspiring, that.Can I even have git hosting without anything else being crammed down my throat, or it's just like Microsoft?
happytoexplain: As others have pointed out, this is somewhat dishonest. Which is depressing, if you represent GitHub.
chistev: I don't like when people make sarcastic remarks and sign off in a way that indicates it was sarcasm. It kills it for me. Lol.Like using that /s or using that smiling emoji sign you used.A good joke would land even if some other people miss it because of the text format."Microsoft would never do this" would have landed for me.
darthoctopus: subtlety is dead on the internet of the lowest common denominator, and that enabled by AI assistance is very low indeed
slowhadoken: I’m still concerned about MS using the code I write on my laptop to train AI. Tinfoil hat wearing Linux users are starting to make a lot of sense to me.
chistev: Now this is sarcasm. Lol
jmward01: They just lost my repos. I can not believe they snuck this in. My level of anger right now is far higher that I ever wanted to feel. I went to API access for anthropic, paying more in the process, to avoid them training on my code. And GH just -adds- this, without telling me? Without a prompt. They are dead to me.
ares623: make sure you opt-out anyway before deleting your account. they'll probably train on some archived version if it sees your profile didn't opt-out at some point.
gverrilla: honest question: is there any realistic mechanism that will make them accountable if let's say they just train on 100% of repos without regards to opt-ins? I operate under the premise these tech companies can do whatever they want and there's very little oversight.
roegerle: right up top. I'm not sure how anyone could miss it.
gortok: This is a distinction without a difference, according to the text of that enable/disable dialog,> Allow GitHub to use my data for AI model training: Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement.“Associated Context” is the repo. If I use copilot, I’m giving it access to my repo.I don’t know in all the ways copilot can be triggered, and I’m not certain that I could stop it from being triggered, given Microsoft’s past behaviors in slapping Copilot on everything that exists.
AndrewKemendo: I started self hosting my own git on a digital ocean droplet with Gitea (1). It’s been unbelievably fantastic and trivially easy to manage experience and I can make them public and invite contrib ans do integrations … I see zero downsidesI see no reason to ever go back to holding my code elsewhere.Don’t forget git is fairly newWhen I first started doing production code it was pre-github so we used some other kind of repo management systemThis is a perfect example of where the they’re starting to cannibalize their base and now we have the ability to get away from them entirely.(1) https://about.gitea.com/
i7l: Thanks for flagging this!
layer8: Note that “flagging” has a specific meaning on HN.
_pdp_: Rather than defending this absurd decision, GitHub could instantly win back trust by admitting they f*** up and reversing it entirely.If they want to incentivise people to contribute their sources and copilot sessions, they could easily make it opt-in on a per-repository basis and provide some incentive, like an increased token quota.This is not hard.
qaadika: > https://github.blog/news-insights/company-news/updates-to-gi...> Should you decide to participate in this program, the interaction data we may collect and leverage includes:> - Outputs accepted or modified by you> - Inputs sent to GitHub Copilot, including code snippets shown to the model> - Code context surrounding your cursor position> - Comments and documentation you write> - File names, repository structure, and navigation patterns> - Interactions with Copilot features (chat, inline suggestions, etc.)> - Your feedback on suggestions (thumbs up/down ratings)"should you decide to participate.."??? You didn't ask if I wanted to participate. You asked if I didn't.I didn't get to decide to participate. I had to decide not to. You made me do work.
sedatk: I have an individual GitHub Copilot Pro subscription and also am a member of an Enterprise account that has one of its GitHub Copilot Business seats assigned to me. The opt-out setting doesn't appear on my individual profile anymore. However, I want to be able to use individual GitHub Copilot subscription for my individual work, and it seems like I can't do it anymore as Enterprise has taken over all my preferences. What a mess.
nottorp: How does that help if you don't go to the github site but just use git from the command line?
lkbm: They also sent an email.
languid-photic: Appreciate the clarification. But, it's still not great.To the PM behind this - developers are sensitive to this kind of thing. Just make it opt-in instead?
dotancohen: > takes 30 seconds. No, it takes an hour of perusing HN every day to stumble upon this. That's 20 hours per month, 240 hours per year, shall I bill it to GitHub or to Microsoft directly?Corrupting Steinmetz' quip to Ford: it's 30 seconds to flip the switch, 240 hours to know that a switch needs to be flipped.
bigstrat2003: I use Fossil for mine. Dead easy to set up, and while the workflow might not be great for public contributions like Github is, that doesn't matter on something where I'm the only user.
mrled: I'm curious about specific consequences of this. I tend to think the importance of code secrecy has always been exaggerated (there are specific exceptions like hedge fund strategies and malware), even more so now in this post-Claude world. Does anyone have specific things they're trying to avoid by opting out of this?
b112: It's not tinfoil, it's aluminum foil. I.. I mean, I heard it's that.
DougN7: I thought that’s more what the CoPilot change is really about - not your repo, but all the code CoPilot read while it is offering helpful completions, etc - so literally the code on your laptop. I cancelled my account.
tptacek: No it isn't. Most people don't use Copilot, so this term change won't effect most people. You can reasonably be unhappy about it anyways (or unreasonably still be using Copilot in 2026), but it's still ultra-useful information for them to add to the discussion.
pistoriusp: I don't use copilot, but somehow was subscribed... I probably clicked something long ago and it just remained active.
irishcoffee: I am aware of CUI data hosted on github by corporate entities. You’re saying you’ll essentially violate the entire point of CUI?That’s fucking terrifying.
ziml77: Thanks for the clarification. The OP here made me think I missed something in both the blog post about the change and in the available settings.
arcanemachiner: Or, they don't train on it, but who's to say they're not harvesting analytics which may or may or not code samples, prompt data, etc. Which are then laundered through some sort of anonymization pipeline, to the point where they can argue that it no longer qualifies as your data, and can be freely trained upon.Conspiratorial thinking? Sure. But if you've been around for a couple decades and seen the games these people play (and you aren't a complete sucker), then you'll at least be aware that there's at least slight possibility that these companies can get things from their customers that they (the customers) did not knowingly agree to.
schubidubiduba: Nothing conspirational about it. Getting data that their users or customers don't actually intend to give is the bread and butter of these companies. And they will do what they can to get it.
bonestamp2: Thanks for the heads up, I assumed they had already done this with my data.
seanw444: Probably did. Now comes the legal ass-covering.
input_sh: They "gift you" a free standard plan if you have above a certain (non-transparent) level of stars, I don't think you can even disable your "subscription" if you get it for free.
_pdp_: Copilot, or "chat with Copilot" is a button that is available on every page right next to the search bar.I don't have to be a Copilot user to click on it.This change is malicious, and it doesn't only affect Copilot users. It affects everyone on the platform!
Lio: An enterprise licence won't save you, Google, Microsoft, et al have happily broken copyright laws for years.If the publishing industry can't win a case AI firms, including Google and Microsoft, then you don't stand a chance when you find out they actually have trainded on your private data the whole time.
gchamonlive: [delayed]
hexage1814: If you opt out... they will also train on your private repos.
martinwoodward: It wasn’t previously opt-in.Previously we didn’t do any training on usage. However as other products have come into the market they do train on usage. We’ve been training on our internal usage for just over a year and have seen some major improvements. For details see of the types of improvements we’ve seen from training on our internal usage check out this article: https://github.blog/news-insights/product-news/copilot-new-e...
homebrewer: You can always ask your parent company to train on their usage. I hear they have incredibly massive codebases: Windows, Office, MSSQL, which stay out of training data for some reason.I thought neural nets never repeat the training data verbatim, and copyright does not pass through them, so what's the problem?
IcyWindows: Who said they don't?
wilsonjholmes: At least they are finally being honest about the direction of the business. I have thought for a long while that they were already doing this and just not telling anyone...
i7l: 10-4.I meant it in the sense of "bringing it to our collective attention."
NewsaHackO: How do you know that isn't already the case?
GMoromisato: I'm sure this is just me, but I don't mind if AI trains on my public or private repos. I suspect my imagination is just not good enough to come up with downsides.So far it's been a benefit because coding agents seems to understand my code and can follow my style.I don't store client data (much less credentials) in my repos (public or private) so I'm not worried about data leaks. And I don't expect any of my clients to decide to replace me and vibe code their way to a solution.I do worry (slightly) about large company competitors using AI to lower their prices and compete with me, but that's going to happen regardless of whether anyone trains on my code. And my own increases in efficiency due to AI have made up for that.
JonChesterfield: Any computer you have ssh access to.
nottorp: Did they? Not to me, and I have a 'review this new sign in' from 4 days ago so them emailing me works.
JonChesterfield: Don't give your code to Microsoft if you don't want them to have your code.This setting will make no difference to whether your code is fed into their training set. "Oops we accidentally ignored the private flag years ago and didn't realise, we are very sorry, we were trying to not do that".
nitrogen99: So? It’s not like some human is spying on you. This is just code. Relax.
uwagar: why all u programmers cant make ur own website and host ur own git severs?
pokot0: while I agree, I understood this is only when you use copilot? if not, their communication is very misleading
layer8: In the EU, opt-out is not a legally valid way to obtain the necessary consent. How do you plan to handle this?
x0x0: For personal data. I don't believe you can reasonably claim code is personal data any more than a hammer is your personal data.
qaadika: It's been interesting the past year or so watching myself turn more and more into one of the tin-foil wearing linux users. I'm not sure how it happened, but self-hosting became more and more alluring and hyperfocusing on taking as much data as I can offline became worth spending entire weekends on.I didn't become paranoid, everybody else didn't!
ClikeX: How do you handle accounts that have copilot managed by an organisation? I've seen several cases where people cannot opt out their account because of the org connection (the option just isn't there in the settings). What happens to their account the moment they leave that org?
booi: probably by paying the fine and doing it anyway
dotancohen: $ git pull $ vim foo.rs $ git commit $ git push That's how.
jmward01: exactly this. I rarely need to go to the site.
pverheggen: Isn't this pretty standard, using your interaction data for training and making it opt-out? Claude Code, Codex, Antigravity etc. all do the same. Private repo doesn't make a difference as they have a local copy to work from.
ekjhgkejhgk: You're right, of course, and I find it frustrating that people are so thick as to not see your claim as obvious.Stallman is always right.
leej111: Based
johndough: Under GDPR, opt-out is not considered informed consent, and repositories can contain personally identifiable information, which fall under GDPR. Do you think differently, or do you think ignoring the law will be worth it?
ekjhgkejhgk: I think this kind of nuance is useless or even harmful. That might be how it is now but they'll change it when you're not looking.You see coders have this reasoning flaw where they go "Oh I've understood the system, now I can work out all the ramifications of my actions", and then they get tricked at every step of the life.
jawilson2: Algorithms and models for a proprietary trading system? My personal notes? The latex text of my phd thesis?I will go screaming and kicking and fighting into this dystopian nightmare post-privacy shithole world that so many people seem fine with. If I have to move off of every service or technology to maintain some semblance of privacy so be it.
mrled: Well, mostly I was thinking about code, and aside from the specific exceptions of trading algorithms (which I was trying to get at when I said hedge fund strategies), and now PhD theses (good point, at least if you're talking pre-publication), I'm still having trouble understanding the threat model even if AI did train on most proprietary, private business code. Can AI training on a CRUD app's code damage a business?And I have the same question about private notes, or even a diary. Can an AI training on a bunch of personal stuff damage the person that wrote it?Do you really keep trading algorithms on github?
encrypted_bird: Or, alternatively, self-host a gitea instance!
kriops: No. Money-grab incoming. Use forgejo.
uberman: If even one person in a repo does not disable this will copilot have full access to the repo? How can I determine if other members of my team have turned this off or not?
hirako2000: The same way you can't determine whether a team member pulling the repo dumped the code into a prompt.It's convenient for MS to make this opt in by default for sure.
elAhmo: It’s not convenient, it is a deliberate decision.
elAhmo: Defaulting to opt-in is a malicious move, no matter how you present things.
_pdp_: So you will train on data collected from free users working on GPL and copyrighted projects?
DougN7: And on users that don’t even use github, other than the required account to use CoPilot in Visual Studio.
_pdp_: Exactly.This affects anyone using VS Code or Copilot with proprietary data, including all the users automating workflows through the Copilot SDK and the like. A perfect storm.Did anyone from GitHub's legal team actually authorise this, or did they use Copilot to sign off on it?
shamelessdev: This is the exact reason I vibe coded “artifact”.Not for commercial success, just wanted a git and github like experience for my new game project.Then I started getting into features specific to game dev like moving away from LFS and properly diffing binaries.paganartifact.com/benny/artifactMirror: GitHub bennyschmidt/artifact
worik: > Stallman is always right.Not really. Almost always right....
johndough: Code often contains personal data. Here are over 400 files on GitHub with email addresses:https://grep.app/search?regexp=true&q=%5Ba-z%5D%7B8%2C%7D%5C...For example, license files often contain names and many package managers require a contact person.
buildbot: >Hope that helpsHonestly, what the fuck? This changes was already pretty bad but this being the apparent corporate response is insane.Done with Github and Microsoft after this. Just disgusting how little you care for users, ethics, or morals.
piekvorst: Personally, I don’t mind. Train however you want.
gafferongames: If you guys didn't already realize that Microsoft was a garbage company in the 90s I really don't know what to say...
Uhhrrr: Put an ORM in your private repo which randomly 1% of the time calls DROP TABLE.
akerl_: Again, this collects usage data. If you click the button by accident and don’t interact, they get no data.
_pdp_: So? This feature is available to everyone and you have zero idea how many people actually use it.If I go to one of your GPL projects and I ask a simple question to find out what this project is about, you will be perfectly "ok" that this interaction (that includes most of the code that is required to answer my dumb the question) will be used for training?This is not ok.
jamiek88: About technology.About communication with other humans he’s pretty much always wrong.Imagine we’d had a better communicator who wasn’t a gross toe nail picking troll fronting free software? It shouldn’t matter. Only the ideas should matter . But the reality is different.
itsdesmond: He argued against EU proposals for ISPs to filter CSAM on the basis of protecting free expression. Not always right about technology, either.
tryauuum: Mass scale internet censorship in Russia also started with the premise of "protecting the children"When you put in law that ISPs should adhere to some government-provided blocklist, this is already a game over. No matter how sane your government is. The government in 10 years might be vastly different, and the ability to control the ISPs is too alluring to not abuse
fph: Can you use git's Copilot from the command line? If you can't, then you have nothing to opt out from.
layer8: Every Git commit is likely to contain personal data, in the form of the author’s name and email address usually present in a commit’s metadata. Furthermore, unless GitHub is prohibiting users from submitting personal data via their ToS (which, given the above, would be impractical), the only thing that matters is whether the data in fact contains personal data or not. GitHub cannot just assume that it doesn’t. And processing that data for new purposes requires user consent.
fph: By that logic, you can't use any user input to train an LLM, because what if they decide to write their own name.
layer8: Indeed, you can’t unless you have appropriate consent.
Esophagus4: There’s a lot of furor in this thread, but people felt the same way when Google Street View came out. Eventually they worked through most of the thorny bits and people use Street View now.I suspect MSFT is in a similar spot. If they don’t train on more data, they’ll be left behind by Anthropic/OAI. If they do, they’ll annoy a few diehards for a while, they’ll work through the kinks, then everyone will get used to it.
computomatic: That comparison doesn’t hold at all. This would be equivalent to Google publishing photos of inside your home.Or, perhaps more directly, training their image-gen models on your private Google Photos.
Esophagus4: Conceptually I think it’s a fine comparison.They’re training (with an opt out) on stuff people feel is an invasion of their privacy to make their service better.
Ancalagon: This is the worst year of enshittification I can recall. Literally everything is going to shit.
bolangi: Hah, github can have my crap code. Anyone trained on it will be in for a world of hurt :-)
Esophagus4: Can’t wait for copilot to start saying stuff like// todo… remove this before it goes to prod lol
Forgeties79: I worry about a post-Gabe valve for this reason.