Discussion
ComputerGuru: Reviews of the tool on Twitter indicate that it completely nerfs the models in the process. It won't refuse, but it generates absolutely stupid responses instead.
littlestymaar: Don't use this two-day-old vibe-coded bullshit, please. p-e-w's Heretic (https://news.ycombinator.com/item?id=45945587) is what you want if you're looking for an automatic de-censoring solution.
kube-system: I guess it's kind of like a lobotomy tool.
Animats: Link? It's interesting that people are writing tools that go inside the weights and do things. We're getting past the black-box era of LLMs. That may or may not be a good thing.
noufalibrahim: I believe this has already been done to several models. Ones I've come across are the JOSIEfied models from Gökdeniz Gülmez. I downloaded one or two and tried them on a local Ollama setup. They do generate potentially dangerous output. Turning on thinking for the Qwen series shows how it arrives at its conclusions, and it's quite disturbing. However, after a few rounds of conversation, it gets into loops and just repeats things over and over again. The main JOSIE models worked the best of all and were still useful even after abliteration.
Alifatisk: This is for local models, right? I can't use it on, say, my glm-5 subscription connected to opencode?
HanClinto: Correct, local models only.
littlestymaar: This is vibe-coded garbage that the “author” probably didn't even test themselves after making it yesterday, so it's not surprising that it's broken.
dinunnob: Hate to have to be the one to stick up for Pliny here, but he's concerned about forcing frontier labs to focus more on model guardrails - he demonstrates results that are crazy all the time: https://x.com/elder_plinius
thegrim33: Whether or not the linked tool uses a good approach, manipulating models this way is already fairly well established; see https://huggingface.co/blog/mlabonne/abliteration.
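Roughly, the idea in that post (sketched here with placeholder names, not the blog's actual code) is to find a "refusal direction" in activation space as a difference of means between harmful and harmless prompt activations, then project it out of the weight matrices that write into the residual stream:

    # Hypothetical sketch of abliteration. harmful_acts / harmless_acts are
    # [n_prompts, d_model] residual-stream activations captured at one
    # layer (e.g. via forward hooks); names here are illustrative.
    import torch

    def refusal_direction(harmful_acts, harmless_acts):
        # Difference of means between the two prompt sets, normalized.
        diff = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
        return diff / diff.norm()

    def ablate(weight, direction):
        # W' = W - d (d^T W): remove the component of each output that
        # points along the refusal direction d, so the layer can no
        # longer write to that direction.
        d = direction.unsqueeze(1)           # [d_model, 1]
        return weight - d @ (d.T @ weight)   # same shape as weight

Since everything the model writes along that direction gets clipped, not just refusals, it's plausible this is part of why abliterated models often get dumber, as others in this thread report.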
IncreasePosts: I didn't use this tool, but I did try out abliterated versions of Gemma, and yes, they lost about 100% of their ability to produce a useful response once I did.
PeterStuer: Already censored for sharing on FB Messenger?
a2128: "You're not just using a tool — you're co-authoring the science." This README is an absolute headache, filled with AI writing, terminology that doesn't exist or is being used improperly, and unsound ideas. For example, it focuses a lot on doing "ablation studies", by which it means removing random layers of an already-trained model to find the source of the refusals(?), which is an absolute fool's errand because such behavior is trained into the model as a whole and would not be found in any particular layer. I can only assume somebody vibe-coded this and spent way too much time being told "You're absolutely right!" while bouncing the worst ideas back.
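For concreteness, "removing a layer" from a trained transformer amounts to something like the sketch below (illustrative Python, not the repo's actual code; `blocks` stands in for the model's list of decoder layers). Because refusal behavior is spread across many layers' weights, no single skip will cleanly remove it.

    # Hypothetical sketch: skip one decoder block and let the residual
    # stream flow through unchanged.
    def forward_skipping_layer(blocks, x, skip_idx):
        for i, block in enumerate(blocks):
            if i != skip_idx:  # drop exactly one block
                x = block(x)
        return x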
dinunnob: Hmm, Pliny is amazing - if you kept up with him on social media you'd maybe like him: https://x.com/elder_plinius
bigyabai: If this qualifies as "amazing" in 2026 then Karpathy and Gerganov must be halfway to godhood by now.
gavinray: The parent comment makes no reference to or comment on the author of the README. It just says "the README sucks." Which, I'm inclined to agree, it does. LLM-generated text has no place in prose -- it saves the author time at a greater cost to the aggregate of its readers.
EGreg: Amazing as in his stuff actually works? I just hear him promoting OBLITERATUS all day long and trying to get models to say naughty things.
dinunnob: Yeah, but I think the philosophy is to show how precarious the guardrails are.
paradox460: It's not just a headache; it's bad.
Retr0id: I don't know if this particular tool/approach is legit, but LLM ablation is definitely a thing: https://arxiv.org/abs/2512.13655
electroglyph: The default Heretic run with only 100 samples isn't very good; you really need your own, larger dataset to do a proper abliteration. The best abliteration roughly matches a very careful decensoring SFT.
dinunnob: I don't think anyone is going to dispute this.
bigyabai: I just don't think many people will be "amazed" by their output, as you claim.
dinunnob: I just said Pliny was amazing, FWIW - I like that he's hacking on these and posts about it. I rushed to defend; I wish more people were taking old-school Anarchist Cookbook approaches to these things.
cess11: Smoke banana peel?
Zetaphor: I had such a godawful headache from that. Also tried the peanut shells, equally awful. I was a dumb teenager.
fragmede: Alternatively, it's intentional. It very effectively filters out people with your mindset. You can decide if that's a good thing or not.
fragmede: Gasoline and styrofoam was fun tho.
eli: Why would a tool that works need to dissuade skeptics from trying it?
butILoveLife: This is my experience with abliterated models. I use Berkley Sterling from 2024 because I can trick it. No abliteration needed.
Aurornis: I don't know. I scrolled through his recent Tweets and he's sharing things like this $900 snake-oil device that "finds nearby microphones" and "sends out AI-generated cancellation signals" to make them unable to record your voice: https://x.com/aidaxbaradari/status/2028864606568067491 Try to think for a moment about how a device would "find nearby microphones", or how it would use an AI-generated signal to cancel out your voice at the microphone. This should be setting off BS alarms for anyone. It seems the edgy Twitter AI poster guy is getting meta-trolled by another company selling fake AI devices.
lazzlazzlazz: Ironic to see this comment when Pliny, the author of this codebase, is one of the most sophisticated LLM jailbreakers/red-teamers today. So presumptuous and arrogant!