Discussion
Are your startup bets actually paying off?
Evgeniuz: There’s a bias, I think. Since the title says the quiz is about how bad I am at estimating, I leaned towards counterintuitive answers. That got me quite a high score. I think the test set should also include intuitive facts (or maybe I was just lucky).
convexly: As counterintuitive as it sounds, that is actually a valid calibration strategy. If you notice the questions lean towards the counterintuitive and adjust for it, that IS better calibration! But you raise a fair point about framing bias from the title.
convexly: Interesting data from the quiz so far: 160+ quiz takers! The average is 0.239 (barely better than a coin flip at 0.25), but almost everyone indicates they are confident in their answers.
convolvatron: I didn't find the questions very representative of estimation. That is, maybe if you happen to know many of the random facts about the world on which they were based, then applying them might be a relevant test of your ability to estimate. I really felt more like I was making uneducated guesses (0.155). I suppose I was expecting more ping pong balls in airplanes.
convexly: The point I was going for was more about how people handle questions they don't know the answer to. Someone who is "well-calibrated" would set things they are uncertain about closer to 50% instead of guessing one way or the other (overconfident). That score is excellent, so it suggests you did exactly that!
zupa-hu: It is very disappointing that you can't see what you got right or wrong without giving out your email. I'm not even sure one would learn anything from the email, or whatever the calibration result is. I'm happy for you if it works, but I sure feel cheated. I hope others also feel it's against the spirit of a Show HN. But maybe it's just me.
convexly: That's a good point, I might have gated it too hard. I'll open up the full results now. Appreciate the feedback.
reltnek: I think this might be conflating confidence with accuracy. I tried leaving the slider in the middle (nominally the least confident position) and it gave a score of 0.25 and diagnosed me as 'overconfident'.
convexly: That is definitely a bug, thank you for pointing that out. Should have been neutral! I'll push a fix for this.
fred_is_fred: Is it down? The start and skip buttons both don't work, and I see this error in my console: Manifest fetch from https://www.convexly.app/manifest.json failed, code 403
convexly: Just checked and everything is up. That might just be a console warning, but shouldn't affect the quiz. Can you try a hard refresh (ctrl+shift+R)? If that still doesn't work, what browser are you on?
fred_is_fred: I tried Chrome and Safari. It's working great on my phone, so probably Zscaler.
lorenzohess: Maybe I don't know enough about "calibration" in a technical sense, but it seems like this quiz can't really distinguish between factual knowledge and calibration skill? Is this type of quiz reproducible for individuals and across various cross-sections of the population? Are there studies on this? Is the quiz based on those studies?
convexly: Update: 400+ quiz takers now... insane. Best Brier score so far is 0.007 (nearly perfect calibration). The worst came in at 0.600. Average is 0.230, still just better than a coin flip. Where did you land?
bovermyer: I hit 0.012. As a test of general knowledge it was interesting. The confidence angle was the most interesting part, though.
convexly: That's the second best score I've seen today out of 700+ quiz takers! Exceptional calibration. The confidence angle is the whole point, people don't know how far off they actually are until they see the hard data!
Hnus: Why is it asking for email?
convexly: I just removed that, full results should be fully visible without email! A hard refresh should show the update.
convexly: Great question. Calibration specifically is about whether your confidence in an answer matches your accuracy, not whether you know the answer. Someone who knows a lot but is always 90% confident would score poorly if they're actually wrong 20% of the time, as an example. In terms of research, Tetlock's Expert Political Judgment and Superforecasting were the foundation. He ran a 20-year study that showed domain experts were barely better than chance at long-range predictions. The Brier score was the standard metric for that research.
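(For reference, a minimal sketch of the Brier scoring the thread keeps quoting; the function name and layout here are my own, not necessarily what the site runs. Each forecast is a stated probability for "true", each outcome is 1 if true, 0 if false.)

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between stated probability and actual outcome.

    0.0 is perfect calibration plus perfect knowledge; answering 50% on
    everything yields exactly 0.25, the 'coin flip' baseline quoted above.
    """
    return sum((p, o) == (p, o) and (p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Always saying 50% scores 0.25 no matter what the answers turn out to be:
print(brier_score([0.5] * 10, [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]))  # 0.25
```

This is why the thread's averages of 0.230–0.239 are described as "barely better than a coin flip": the do-nothing strategy already guarantees 0.25.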
lorenzohess: I see, that makes a lot of sense. Maybe the UI should reflect this? Have one button for True or False or Uncertain, and then the slider for confidence in the answer?
convexly: That's a really good UX idea. I can see how it's not the most intuitive now. Separating the direction from the confidence level would make it much clearer. Adding that to my list.
addisonl: > Question: A fair die rolling a 6 twice in a row is more likely than rolling 1-2-3-4-5-6 in sequence

Two 6s in a row is a 1/36 chance: (1/6)^2. 1-2-3-4-5-6 is a 1/46,656 chance: (1/6)^6. The website is claiming they are the same probability:

> Same probability: 1/46,656 — Both outcomes have exactly the same probability: (1/6)^6 = 1/46,656. This illustrates the representativeness heuristic — random-looking sequences feel more probable than ordered ones.

The website's "answer" is wrong: was the question supposed to be rolling a 6 six times in a row?
cyanydeez: Yeah, most likely it was trying to identify a bias of human perception: that 1-2-3-4-5-6 would feel more probable than six 6s in a row. A better way to illustrate this bias is with coin flips. People will tell you that the odds of 6 heads are rarer than the odds of 3 tails then 3 heads. The difficulty is understanding whether they mean "in order" or "as a group". If it's in order, the odds are the same: every specific ordering of H/T has the same probability, but humans will see "all heads" and think that's rarer. The important bit is whether there's a clear understanding of the ordering.
1qaboutecs: came with the same complaint. the website then had the nerve to tell me i am overconfident.
convexly: Fair point! Bad question on my end. The overconfidence was based on all 10 questions though, not just that one!
gcanyon: Wait, so roughly: is it rewarding being confident when correct, and penalizing being confident when wrong? Meaning the highest score is only achievable if you answer fully confident true or false and get all 10 correct? If so, isn't that conflating knowledge with over/under-confidence?
convexly: Your point on the scoring is correct: if you're 100% confident and right on everything, you would score a perfect 0. The calibration insight is in how you handle the questions where you don't know the answer. Say you're highly knowledgeable and 95% confident on everything, but get 2 wrong; compare your score to someone who was only 70% confident on those same two questions. You would score worse, which indicates you are overconfident compared to them!
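(A back-of-envelope check of that scenario under squared-error Brier scoring, assuming a missed true/false question counts as outcome 0 against the stated confidence; numbers and names here are mine, for illustration:)

```python
def brier(pairs):
    """Mean squared error over (stated_probability, outcome) pairs; lower is better."""
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Forecaster A: 95% confident on all 10 questions, gets 2 wrong.
# Forecaster B: same 8 correct at 95%, but only 70% confident on the 2 misses.
a = [(0.95, 1)] * 8 + [(0.95, 0)] * 2
b = [(0.95, 1)] * 8 + [(0.70, 0)] * 2

print(round(brier(a), 4))  # 0.1825
print(round(brier(b), 4))  # 0.1  -- lower (better): B was less overconfident
```

Same knowledge, same misses; B scores better purely by dialing confidence down where it wasn't warranted, which is the calibration effect described above.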
snarf21: If anyone is interested in why we are bad at estimating, please check out the amazing book Thinking, Fast and Slow by Daniel Kahneman.
convexly: Great recommendation. That was one of the biggest influences for starting to write my decisions down and then building this.
Havoc: I'd consider removing some questions that are bound to be country-specific, e.g. the one about time spent in front of a red light.

> 0.188

Slightly above avg - yay
convexly: That's fair, I'll flag those or maybe even add regional context. Nice score, well above average!