Discussion
Etheryte: While I see the point the author is trying to make, I'm not really sure I agree. Most users don't even read error messages, never mind logs. At best, logs are something they need for compliance, for most, the concept doesn't exist at all. I do agree that the logs should help you understand what went wrong and why, but in that regard the principle is the same for both sysadmins and developers and I don't really see the difference?
hk__2: > Most users don't even read error messages, never mind logs.
Yes, see all the questions on StackOverflow with people posting their error message without reading it, like “I got an error that says ‘fail! please install package xyz!’, what should I do?!?”.
dexwiz: That question is more likely how do I install, not what to install.
LgWoodenBadger: All software should provide something meaningful for anybody to diagnose, if they’re inclined to. It’s particularly bad in the (Apple) mobile ecosystem, including AppleTV. I have AdGuard Home but one of my spouse’s streaming services wouldn’t work. “There was a problem.” Gee thanks. Eventually figured out that I had to unblock a few hosts so it would work. Only found which ones by googling and finding some other poor soul who fixed it and documented it.
dylan604: I think that's being very generous. If you've ever been in tech support, you'll be amazed at how often you'll be asked what to do when it tells me to do X. If they don't know how to do X, then they should be able to look up how to do X. If it's something like install 3rd party library, then that's not the first party's responsibility. Especially OSS for different arch/distros. They are all different. Look up the 3rd party's repo and figure it out. But no, it's contact support straight away.
tempest_: Sysadmins need logs that tell them what action they can take to fix it. Developers need logs that tell them what a system is doing. Generally a sysadmin needs to know "is there an action I can do to correct this" whereas a dev has the power to alter source code and thus needs to know what and where the system is doing things.
majkinetor: Every group of people is the target of a specific log level. INFO for random folks, DEBUG for programmers etc.
EvanAnderson: In my sysadmin work I curse every developer who makes me fire up strace, tcpdump, procmon, Wireshark, etc, because they couldn't be bothered to actually say what file couldn't be found, what TCP connection failed to be established, etc.
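A minimal sketch of the point above, assuming a Python program and a hypothetical `load_config` helper and `configloader` logger name: the error line names the path and the OS-level reason, so nobody needs strace to find out which file was missing.

```python
import logging

logger = logging.getLogger("configloader")  # hypothetical logger name


def load_config(path: str) -> str:
    """Read a config file; on failure, log which path failed and why."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        # exc.filename and exc.strerror carry the detail a sysadmin would
        # otherwise have to dig out with strace or procmon.
        logger.error(
            "cannot read config file %r: %s (errno %s)",
            exc.filename, exc.strerror, exc.errno,
        )
        raise
```

The exception is re-raised so the caller still decides what to do; the log line only makes the cause visible.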
lucianbr: I get the impression that often it isn't laziness but the concept that error details leak information to an attacker and are therefore a vulnerability. I disagree with this view, but it definitely exists.
EvanAnderson: In a message returned by a server to a client I suppose it's defensible. For writing to syslog, event log, a log file, etc, it's not.
thangalin: Of possible interest:
* https://dave.autonoma.ca/blog/2022/01/08/logging-code-smell/
* https://dave.autonoma.ca/blog/2026/02/03/lloopy-loops/
Both of these posts discuss using event-based frameworks to eliminate duplicative (cross-cutting) logging statements throughout a code base. My desktop Markdown editor[1], uses this approach to output log messages to a dialog box, a status bar, and standard error, effectively "for free".
[1]: https://repo.autonoma.ca/repo/keenwrite/tree/HEAD/src/main/j...
ragall: > but in that regard the principle is the same for both sysadmins and developers and I don't really see the difference?
No, it's very different: developers generally want to know about things they control, so they want detailed debugging info about a program's internal state that could have caused a malfunction. Sysadmins don't care about that (they're fine with coalescing all the errors that developers care about under a general "internal error"), and they care about what in the program environment could have triggered the bug, so that they may entirely avoid it or at least deploy workarounds to sidestep the bug.
thayne: > It’s particularly bad in the (Apple) mobile ecosystem
It's been years since I've significantly used Apple software, but when I had to use a Mac at work, or helped friends or family troubleshoot some problem on Mac OS, I had a similar experience. When things don't "just work", it was very difficult to figure out why it didn't work.
keithnz: From recent experience, I'm thinking logs need to be written for AI. Over the last few months, I've had a couple of issues where I took a bunch of logs from a bunch of interacting programs, pointed the AI at the logs and the source code and it's been really effective at finding the problems, often seeing patterns that would have been really hard for me to spot in all the noise.
bonoboTP: They don't want tinkering or tinkerers. Apple is all about walled-off, locked-down, black box, just-works (when it does) etc. It's supposed to seem like magic. You're not supposed to tinker with magic, it makes it pedestrian. Apple as a brand is a lifestyle, a feeling. The slick, polished brand. Remember "I'm a Mac, and I'm a PC"? PC is where you tinker, and there are screws and nuts and bolts and jargon and troubleshooting etc. In Apple land, you just take it to a slick genius bar and they do their magic. Or you just buy a new one. As a European I'm always baffled how Apple got so much market share among the actual techies and power users in the US. You do it to yourself by buying this stuff. It's for people who don't want to spend one second thinking about actual technical issues.
yjftsjthsd-h: Okay, but then their stuff needs to be perfect as designed. Because the moment there's a bug, we're back to needing diagnostic tools.
bonoboTP: There is a self-regulating loop that Apple users quickly learn not to "draw outside the lines" and just use the thing as designed and intended by Apple. If you use stuff like AdGuard, custom DNS etc, that's tinkerer tier stuff. A good Apple user either watches the ads or pays not to see them.
yjftsjthsd-h: My point is that even inside the lines there are still bugs.
nubinetwork: I recently went all-in on the systemd ecosystem as much as I could on some recent hardware installs, and my biggest pet peeve is the double timestamps and double logs I find in journalctl... it's like they never intended you to read the logs...
hinkley: For years now I’ve been pushing for moving all non-actionable error messages and all aggregate-actionable error messages into telemetry data instead. Not the least of which because log processing SaaS companies seem to be overcharging for their services even versus hosted Grafana services, and really many of us could do away with the rent seeking entirely. The computational complexity of finding meaning in log files versus telemetry data leans toward this always being the case. It will never change except in brief cases of VC money subsidizing your subscription. If an error shouldn’t trigger operator actions, but 1000 should, that’s a telemetry alert, not a Datadog or Splunk problem.
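A rough sketch of the "one error is noise, 1000 is an alert" idea, assuming a hypothetical in-process `ErrorRateAlert` class. A real setup would export the counter to a metrics system and alert on a rate over a time window, which this deliberately leaves out.

```python
from collections import Counter


class ErrorRateAlert:
    """Count error events by name and report names that reach a threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = Counter()

    def record(self, name: str) -> None:
        # Incrementing a counter replaces writing one log line per occurrence.
        self.counts[name] += 1

    def firing(self) -> list:
        # Only error names at or above the threshold deserve an operator's time.
        return sorted(
            name for name, count in self.counts.items()
            if count >= self.threshold
        )
```

The counter costs a few bytes per distinct error name, regardless of how many times the error occurs.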
alpaca128: There's a difference between Apple's mobile devices which are an actual walled garden, and Mac OS which (begrudgingly) still lets you install and run pretty much anything. It has a nice terminal, no driver issues, and is not nearly as distracting and annoying as modern Windows (still has more than enough bugs and quirks though). And once update support runs out I can install Linux on it. iPads are a completely different world and really feel not just restrictive but the whole ecosystem constantly tries to push you towards subscriptions for everything, including the OS which conveniently offers the only sane backup solution that can cover all apps. It incentivises content consumption and giving up control over one's data. Not my cup of tea.
DharmaPolice: I totally agree but you can attribute a lot of the Apple worship to Microsoft and their OEM partners making PC laptops an often miserable experience.
mjevans: The log needs to document, at least in broad steps and critical details, what the next operation is and what key parameters were provided to it. A human, or an 'agent', can use those to figure out why said next step might have gone wrong.
ziml77: This is the way I like to do it. I know bloating the logs too much can be a problem, but it's even worse if you're lacking information to reconstruct what happened when there ends up being a problem. And only providing that detail when there's an error isn't enough. What if the issue never triggered an error in the application and it was only caught later on either by a person seeing something was off or by an error in a downstream system? Also it's helpful to log before operations rather than after because if a step gets stuck it's possible to know what it's stuck on.
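One way to sketch "log the next operation and its key parameters before running it", assuming a hypothetical `logged_step` context manager and `jobs` logger name. If the body hangs, the last line in the log already says what it is stuck on.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("jobs")  # hypothetical logger name


@contextmanager
def logged_step(name, **params):
    """Log a step and its parameters before it runs, then its outcome."""
    detail = " ".join(f"{key}={value!r}" for key, value in sorted(params.items()))
    # Logged before the work starts, so a stuck step is still identifiable.
    logger.info("starting %s %s", name, detail)
    try:
        yield
    except Exception:
        logger.exception("failed %s %s", name, detail)
        raise
    logger.info("finished %s", name)
```

Usage would look like `with logged_step("copy_file", src=src, dst=dst): shutil.copy(src, dst)`.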
mfuzzey: Depends a lot on the context and type of software. For server side software where there is a sysadmin in charge of keeping it running I generally agree. But for end user software (desktop, mobile, embedded) no one will read the logs and there the logs can, and probably should, be aimed at the developers. Of course you can and should still provide usable and informative end user oriented error messages but they're not the same thing as logs.
RadiozRadioz: A small subset of technical users do read logs. If a desktop app has a problem, I have a fighting chance of fixing it if I have logs. Error messages may not give the full picture; what was the app trying to do before the error occurred? Logs let me debug slowness and crashes.
bsder: > As a European I'm always baffled how Apple got so much market share among the actual techies and power users in the US.
Linux, historically, was terrible and then some; lots of us simply want to get on with life and not dork with the OS every day. If you didn't want to use Windows at your day job, that left OS X. And, for a while, Apple hardware was quite nice. For a remarkably long time, you could get way cheaper high resolution laptop displays than the competition. The trackpads have always been far superior on Apple than Linux. And then the M-series came along and was also quite nice. However, over time Linux has gotten better so it's now functional as a daily driver and reasonably reliable. And macOS has deteriorated until it's now probably below Linux in terms of reliability. So, here we are. macOS and Windows do seem to be losing share to Linux, but only Linux cares. At this point, desktop/laptop revenue is dwarfed by everything else at both Microsoft and Apple.
superjan: Yeah, along those lines we have requirements on never logging PII, and not logging anything that potentially contains PII, such as folder names.
justinclift: Maybe tokenise the PII part of the folder name when outputting it? i.e. `$HOME/.config/foo/stuff.cfg` rather than `/home/joebloggs/.config/foo/stuff.cfg`?
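A small sketch of that tokenising idea, assuming POSIX-style paths and a hypothetical `redact_home` helper. Only a leading home directory is replaced; usernames that appear elsewhere in a path are not handled.

```python
from pathlib import PurePosixPath


def redact_home(path: str, home: str) -> str:
    """Replace a leading home directory with the literal token $HOME."""
    try:
        # relative_to compares whole path components, so /home/joebloggs2
        # is not mistaken for something under /home/joebloggs.
        relative = PurePosixPath(path).relative_to(PurePosixPath(home))
    except ValueError:
        return path  # not under the home directory; leave untouched
    return str(PurePosixPath("$HOME") / relative)
```

In a real program `home` would likely come from `os.path.expanduser("~")` at startup.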
justinclift: One fairly common approach to this for systems is to configure the system to ship the logs to an external collection mechanism (FluentBit, etc) and do so in JSON format.
Terr_: Or have an encrypted data portion, so that the sensitive details can be revealed as-needed, and redaction occurs by rotating a key. Obviously that depends on the messages being infrequent in production logging levels.
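For the JSON part, a minimal sketch using only Python's standard `logging` module. The `JsonFormatter` class and its field names are assumptions for illustration; the shipping side (FluentBit or similar) is configured separately and is not shown.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback inside the JSON value so one event stays one line.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

Attaching it is the usual `handler.setFormatter(JsonFormatter())` on whichever handler writes to the stream or file the collector reads.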
foresto: Rather than indulging the inevitable argument that most users never read log messages, I hope we can remember a more important fact: Some users do read log messages, just as some users file useful bug reports. Even when they are a tiny minority, I find their discoveries valuable. They give me a view into problems that my software faces out there in the wilds of real-world use. My log messages enable those discoveries, and have led to improvements not only in my own code, but also in other people's projects that affect mine. This is part of why I include a logging system and (hopefully) understandable messages in my software. Even standalone tools and GUI applications. (And since I am among the minority who read log messages, I appreciate good ones in software made by other people. Especially when they allow me to solve a problem immediately, on my own, rather than waiting days or weeks to get a developer's attention.)
xahrepap: Honest question, how do you handle high cardinality data points? Reference to where my brain is at: https://www.robustperception.io/cardinality-is-key/ I feel like splunk’s business model favors a healthy system and gives major disadvantages to an unhealthy one. What I mean in an example: when the system is unhealthy, I know it because all my splunk queries get queued up because everyone is slamming it with queries. I hate it. But I’m stuck in knowing how to move some things to Prometheus. Like say we have a CustomerID and we want to track number of times something is done by user. If we have thousands of customers, cardinality breaks that solution. Is there a good solution for this?
xahrepap: Asking this question got me to stop being lazy and actually try to answer my own question. Mimir being one that caught my eye: https://grafana.com/oss/mimir/
john_strinlai: on friday i got 2 calls saying "my phone is no longer showing me my emails, please fix" when the error message they received was roughly "please reenter your password to continue using outlook". on wednesday i got a call saying "the CRM wont let me input this note, please fix" when the error message was "you have included an invalid character, '□' found in description. remove invalid characters and resubmit".
vlovich123: How do you handle the problem that telemetry is generally incapable of capturing temporal context?
pdntspa: I wouldn't confuse Steve Jobs-era Apple with what it is now.
brianjlogan: Not really true for modern cloud architectures. If you have an appropriately tuned Observability stack you're probably pretty familiar with the logs.
not_kurt_godel: You're baffled because you appear to be uninformed and/or willfully ignorant. macOS is Unix-based and 95% functionally equivalent to Linux for software development and tinkering purposes. iOS, while less customizable than Android, is overall very good software for a phone. Apple hardware is superior across the board, especially for durability. Meanwhile, I'm baffled why any techie would voluntarily use an OS that force-enables telemetry and advertising. The fight for privacy and ad-free experiences is hard enough without your OS fundamentally working against you.
closeparen: The freedom of an open ecosystem also accrues to the smarmiest, lowest-rent participants in the value chain. My car's infotainment sucks and I wish I could mod it, but it's plausible we'd all be even worse off if dealerships could.
hinkley: This gets even worse if you have a language with one process per CPU as you can get clobbering other values on the same instance if you don't add fields to uniquely identify them. We got a lot of pushback when migrating our telemetry to AWS after initially being told to just move it when they saw how OTEL amplified data points and cardinality versus our old StatsD data. You probably need less cardinality than you think, and there is a mix of stats that work fine with less frequent polling, while others like heap usage are terrible if you use 20 or 30 second intervals. Our Pareto frontier was to reduce the sampling rate of most stats and push per-process things like heap usage into histograms. An aggregator per box can drop a couple of tags before sending them upstream which can help considerably with the number of unique values. (e.g., instanceID=[0..31] isn't that useful outside of the box)
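The per-box aggregation step could be sketched like this, assuming counter-style samples shaped as `(name, tags, value)` tuples and a hypothetical `drop_tags` function. Summing is only correct for counters; gauges and histograms need their own merge rules.

```python
from collections import defaultdict


def drop_tags(samples, drop):
    """Sum counter samples after removing the tag names listed in `drop`."""
    totals = defaultdict(float)
    for name, tags, value in samples:
        # Series that differ only in a dropped tag collapse into one key.
        kept = tuple(sorted((k, v) for k, v in tags.items() if k not in drop))
        totals[(name, kept)] += value
    return dict(totals)
```

With 32 worker processes per box, dropping `instanceID` here turns 32 upstream series per metric into one.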
hinkley: That is the $65k question and unfortunately I don't have a pat answer for that yet. I probably need to see more types of projects instead of more time on fewer projects which is where I'm at. But I can give you a partial picture. You're going to end up with multiple dashboards with duplicate charts on them because you're showing correlation between two charts via proximity. Especially charts that are in the same column in row n±1 or vice versa. You're trying to show whether correlation is likely to be causation or not. Grafana has a setting that can show the crosshairs on all graphs at the same time, but they need to be in the same viewport for the user to see them. Generally, for instance, error rates and request rates are proportional to each other, unless a spike in error rates is being triggered, for instance, by web crawlers who are now hitting you with 300 req/s each whereas they normally are sending you 50. The difference in the slope of the lines can tell you why an alert fired or that it's about to. So I let previous RCAs inform whether two graphs need to be swapped because we missed a pattern that spanned a viewport. And sometimes after you fix tech debt, the correlation between two charts goes up or way down. So what was best in May may not be best come November. There's a reason my third monitor is in portrait mode, and why that monitor is the first one I see when I come back to my desk after being AFK. I could fit 2 dashboards and group chat all on one monitor. One dashboard showed overall request rate and latency data, the other showed per-node stats like load and memory. That one got a little trickier when we started doing autoscaling. The next most common dashboard which we would check at intervals showed per-service tail latencies versus request rates. You'd check that one every couple of hours, any time there was a weird pattern on the other two, or any time you were fiddling with feature toggles. From there things balkanized a bit. We had a few dashboards that two or three of us liked and the rest avoided.
ting0: He's uninformed? I assume you have a jailbroken Apple iPhone then?
mixmastamyk: Apple sends tens of megabytes of telemetry from first network connection and regularly: https://sneak.berlin/20210202/macos-11.2-network-privacy/ None of this is able to be turned off, the boot volume is read-only. Can only be deactivated by jumping through hoops.
Spooky23: We have a magic button in servicenow that lets the L1 agent kick off a job that pulls telemetry from a user device and does an overall health check of the device. That input identifies the issue like 80% of the time if it’s a device issue. It either gets resolved quicker by the L2 guy or dispatched to the third party hardware fix it guy or sent to some speciality L3 team. Resolution time is down like 60%. My next goal is to assess disk and battery health in laptops and proactively replace if they hit whatever threshold we can push the vendors to accept. That could eliminate something like 30% of device related issues, which has a super high value.
Spooky23: As someone who came up in the Slashdot M$ era, if nothing else the PR and communication style of Satya is a masterclass in delivering a message to the public. The dude presents like a Zen master. The message is baffling and the strategy is nonexistent, but people think there’s a new gentle Microsoft. Somehow angry Europeans (at least in this thread) are running into the embrace of Windows as the defender of the tinkerers. Certainly not on my bingo card.
vlovich123: Yeah, but that still doesn’t let you see “event A happened before event B which led to C”. I’ve had significantly >> 1 bugs where having good logs lets me investigate and resolve the issue so quickly and easily whereas telemetry would have left you searching around forever.
alsetmusic: I haven’t seen a YouTube ad on my machine in years. I download all the videos that I watch and skip through the ads that content creators bake in. I control my dns and network to restrict what can get to my browser and other apps. I have a highly customized Bash environment (I see no reason to switch to zshell when I’ve got Homebrew). But paint the nerds who like MacOS and the wonderful third-party app ecosystem of developers who care about fit and finish as a bunch of mindless rubes if it makes you feel better.
jtbaker: IDK, they were sending around stacks of Mac Studios to tinkerer youtubers messing with EXO clustering like @geerlingguy. https://youtu.be/1iT9JeZYXcI?si=UMR0nfHAYbVq2tF1
jandrewrogers: > As a European I'm always baffled how Apple got so much market share among the actual techies and power users in the US.
I know exactly how this happened, I was there. It filled a gap for a practical desktop UNIX when none existed. In the old days, there were many flavors of proprietary UNIX, like Solaris, IRIX, HPUX, AIX, et al plus a few open source versions like FreeBSD and early Linux. The early Internet was a purely UNIX world (still mostly is) but UNIX was a fragmented market of dozens of marginally interoperable OS. During the dotcom boom, Solaris on Sparc became the gold standard for large servers. These are very expensive machines and not particularly user friendly. If you were a dev in those days, you were either using some type of Sparc workstation or FreeBSD or Linux (which wasn’t very good in those days). You wanted your desktop environment to be UNIX-ish but the good + cheap options were limited. Linux became better on the server and started to displace FreeBSD there but was still very limited as a desktop OS. Linux was much worse than Windows NT on the desktop at the time but Windows NT wasn’t UNIX. MacOS X came along and offered UNIX on the desktop with a far better experience than Linux (or any other UNIX) on the desktop, and much cheaper than a Solaris workstation. It filled a clear gap in the market, and so Silicon Valley moved from a mix of Solaris and Linux desktops for development to MacOS X desktops, which were better in almost every way for the average dev. It was UNIX and it ran normal business applications like Microsoft Office. MacOS X was a weaker UNIX than many of the other UNIX OS but it offered a desktop that didn’t suck and it was cheap. For someone that had been using Linux or Solaris at the time, which many devs were, it was a massive upgrade. MacOS still kind of sucks as a UNIX but that’s okay because we don’t use it as a server. Silicon Valley needed a competent UNIX desktop that didn’t cost a fortune and Apple delivered. Apple is just a remote UNIX system for manipulating the other UNIX systems your code actually runs on.
bbkane: Yup, this is how I do it in https://github.com/bbkane/logos
hinkley: Here’s the thing though. When you’ve got 1000 req/s split across a couple dozen log files all being scanned in parallel there’s really no such thing as tracing a->b->c anyway. It’s the seashore and you’re looking for a specific shell. You’ve got correlation IDs, and if your system isn’t reliably propagating those everywhere you absolutely have to fix that. But you’re going to use those once you already notice an uptick in a weird error you haven’t seen before, and it’s hard to see those when you’re generating 8k log entries per second that are 140-200 characters long and so you’re only seeing twenty of them at a time in Splunk. You have some chatty frontend that’s firing off three requests at the same time and you’re going to struggle period. You’re going to be down to some janky log searches for that and you don’t need to be paying someone $$ every month to still have it rough. We used to have QA people for this.
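For the correlation-ID part, a hedged sketch using the standard `contextvars` and `logging` modules. The names `correlation_id`, `CorrelationFilter`, and `handle_request` are hypothetical; propagating the ID across service boundaries (HTTP headers, queue metadata) is a separate problem not shown here.

```python
import contextvars
import logging

# Holds the ID for whichever request the current task or thread is serving.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Stamp every record with the current correlation ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True  # never drops records, only annotates them


def handle_request(request_id: str, logger: logging.Logger) -> None:
    token = correlation_id.set(request_id)
    try:
        logger.info("request accepted")
    finally:
        # Restore the previous value so IDs do not leak between requests.
        correlation_id.reset(token)
```

With the filter on a handler and `%(correlation_id)s` in its format string, every line logged while handling a request can be pulled out with one search term.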
nottorp: > As a European I'm always baffled how Apple got so much market share among the actual techies and power users in the US. You do it to yourself by buying this stuff. It's for people who don't want to spend one second thinking about actual technical issues.
Why only the US? I'm in Europe and I've switched from Linux to Mac OS as my daily driver when I got tired of waiting for the mythical "linux on the desktop year". Note that a good part of my career involves arm linuxes for industrial applications so I never actually stopped using linux if i was paid for it. Mac OS is indeed becoming more and more annoying, but then so is desktop Ubuntu. And Windows is out of the question. I know firsthand, I have a contract for a windows application right now. If Apple management continues to not take their dried frog pills as prescribed, I will eventually switch back to Linux, but for the desktop I'll probably have to check out some more niche distributions, or at least Debian. And even then I'll probably keep the macbook pro and switch to Linux only on the desktop machines.
bonoboTP: I once tried to put an mp3 on a relative's iPhone. I tried connecting it to our PC, and do it with iTunes, but it turned out I couldn't do it. Or it was some ridiculous contortion performance. I just told my relative that he shouldn't ask me to help with Apple devices. If you want one apple device you have to replace all your infra with apple devices, and learn to live in the walls built by apple and forget about files or any kind of agency independent of your apple overlords.
bonoboTP: Ubuntu may have issues, but at least the logs are there and you have freedom to open up the hood and reconfigure things as you wish. This is effort but we are in a thread discussing the problem of opaque errors and the impossibility of troubleshooting. Yes troubleshooting is tinkering. It is effortful and nerdy and sweaty, not slick and effortless.
bonoboTP: I think that's about the first era of Apple. They faded in the background in the consumer mind from the mid 90s to the mid 00s. It was the iPod/iPhone/iPad trilogy that brought Apple back to the mainstream. In ~2002 for regular people Apple and Mac had a dusty sound, like Commodore.
bonoboTP: If it works, great. My comment was aimed at a person who seemed to want more freedom and troubleshooting possibility.
bonoboTP: Windows is enshittifying too but at least carries some of the pre-walled-gardens mentality of computing, when users expected a bit more agency. I personally use Linux, but I also know it's not practical for average regular people like my family members. I tried. Unfortunately when they run into some problem they demand to get back their windows. It's not like they never have trouble with windows, but they are used to that shape of trouble and don't really see it as unusual or even if they are annoyed by it, they feel like it's just the nature of things like a muddy rainy day every once in a while.
nottorp: > the logs are there and you have freedom to open up the hood and reconfigure things as you wish
I don't know. There's a lot of friction like gnome hiding or removing configuration options, kde becoming a third class citizen, different packaging systems every 2 years... app stores being pushed instead of apt-get install... The command line and server side stuff is fine of course, I wouldn't dream of running anything but linux for that.
bonoboTP: True, snap annoys me as well, but it's possible to switch packages to apt. And it's just going to get easier to customize using AI agents.
nottorp: Tried late last year to set up a new linux server with the help of "AI". Unfortunately it couldn't decide what distribution it's talking about in spite of me specifying it in the prompt. And when it got it right it mixed LTS Ubuntu versions. So... i don't know about "AI". Might have to still write the config files by hand.