Discussion
NaN is weird.
paulddraper: NaN == NaN is truly a perversion of equality. It makes little sense that 1/0 is SIGFPE, but log(-5) is NaN in C. And the same is true for higher-level languages and their error facilities. What a mess.
Aardwolf: Python is extra annoying though, refusing to support division by zero the way other programming languages with IEEE floats do (i.e. output inf or nan), even though it does have ways to create floats representing inf and nan, and has no problem doing things like float('inf') / float('inf') --> nan. It just specifically does its different-than-everyone-else thing for division by zero. Very inconsistent.
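The contrast described above can be reproduced directly. The following is a minimal sketch, not anything Python provides: `ieee_div` is a hypothetical helper that returns what IEEE 754 division would produce instead of raising.

```python
import math

def ieee_div(a: float, b: float) -> float:
    """Hypothetical helper: divide like IEEE 754 instead of raising."""
    try:
        return a / b
    except ZeroDivisionError:
        # 0/0 and NaN/0 are invalid operations, so the result is NaN.
        if a == 0 or math.isnan(a):
            return math.nan
        # Otherwise the result is an infinity whose sign combines both operands
        # (the divisor may be a signed zero such as -0.0).
        return math.copysign(math.inf, a) * math.copysign(1.0, b)

# Plain Python raises for division by zero...
try:
    1.0 / 0.0
    raised = False
except ZeroDivisionError:
    raised = True

# ...but happily produces NaN from other invalid operations.
inf_over_inf = float("inf") / float("inf")
```

With this helper, `ieee_div(1.0, 0.0)` gives `inf`, `ieee_div(1.0, -0.0)` gives `-inf`, and `ieee_div(0.0, 0.0)` gives `nan`.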
kmeisthax: This is also why Rust has separate PartialEq and Eq traits - the latter is only available for types that don't have weird not-self-equal values like floating point NaNs or SQL NULLs. If you lie to Rust and create a wrapper type over f32 or f64 that has Eq, then you'd get unindexable NaN keys that just sit in your hashmap forever. The real surprise to me is that Python can index NaN keys sometimes, at least by reference to the original NaN. I knew CPython does some Weird Shit with primitive values, so I assume it's because the hashmap is comparing by reference first and then by value.
adampunk: NaN is weird? No, NaN is normal*, NaN PAYLOADS are weird: https://anniecherkaev.com/the-secret-life-of-nan
*This is false, NaN is weird, though maybe it needs to be. It is nowhere written arithmetic on computers must be straightforward.
adamzochowski: Makes perfect sense. NaN is a special type indicating one can't reason about it in the normal way. It is an unknown, or a value that can't be represented. When comparing, think of it like comparing two bags with an unknown number of apples. One bag has a NaN count of apples. The other bag has a NaN count of apples. Do the two bags have an equal number of apples? I wish all languages used nulls the way SQL does.
kubb: It would be more satisfying to learn why hash of nan is not guaranteed to be the same. It feels like a bug.
Jaxan: The hash is the same. But a hash set has to use == in case of equal hashes (to avoid collisions).
kubb: It's not always the same:
  >>> hash(float('nan'))
  271103401
  >>> hash(float('nan'))
  271103657
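One likely explanation for the differing values: since Python 3.10, CPython derives the hash of a NaN from the identity of the float object, so two separately created NaNs usually hash differently while a single object hashes consistently. Earlier versions returned 0 for every NaN. A minimal sketch of what can be relied on:

```python
x = float("nan")
y = float("nan")  # a second, separately created NaN object

# The hash of one and the same object is stable.
stable_for_one_object = hash(x) == hash(x)

# Whether two distinct NaN objects share a hash depends on the Python version,
# so this value is only reported, not relied upon.
differs_between_objects = hash(x) != hash(y)
print(stable_for_one_object, differs_between_objects)
```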
AndriyKunitsyn: NaN that is not equal to itself _even if it's the same variable_ is not a Python oddity, it's an IEEE 754 oddity.
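A short sketch of that behavior in Python, using only the standard library. The variable names are illustrative:

```python
import math

x = float("nan")

equal_to_itself = x == x        # False, even though it is the same variable
not_equal_to_itself = x != x    # True
less_than_itself = x < x        # False
greater_or_equal = x >= x       # False

# Because == cannot be used, NaN has to be detected explicitly.
detected = math.isnan(x)        # True
```

The `x != x` test is the classic NaN check in languages without an `isnan` function.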
cmovq: > we had an unusual discussion about a Python oddity
There are so many discussions about "X language is so weird about how it handles numbers!" and it's just IEEE 754 floats.
swiftcoder: The oddity here is not the float itself, it's that Python provided a default hash implementation for floats
munchler:
  >>> my_dict[nan] = 3
  >>> my_dict[nan]
  3
Wait, how does that work if nan isn't equal to itself?
JuniperMesos: Yeah IEEE 754 floating point numbers should probably not be hashable, and the weird (but standard-defined) behaviour with respect to NaN equality is one good reason for this.
riskassessment: Nor is that inequality an oddity at all. If you were to think NaN should equal NaN, that thought would probably stem from the belief that NaN is a singular entity. NaN rather signifies a specific number that is not representable as a floating point. Two specific numbers that cannot be represented are not necessarily equal because they may have resulted from different calculations.
zahlman: > Last week in the Python Discord we had an unusual discussion about a Python oddity.
Oh, I missed it. But yes, this is more to do with NaN than Python.
> But, of course, you can't actually get to those values by their keys: ... That is, unless you stored the specific instance of nan as a variable:
Worth noting that sets work the same way here, although this was glossed over: you can store multiple NaNs in a set, but not the same NaN instance multiple times. Even though it isn't equal to itself, the insertion process (for both dicts and sets) will consider object identity before object equality:
  >>> x = float('nan')
  >>> {x for _ in range(10)}
  {nan}
And, yes, the same is of course true of `collections.Counter`.
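The set and `Counter` behavior described above can be checked with a small sketch. It assumes CPython, where each `float("nan")` call creates a new object:

```python
from collections import Counter

x = float("nan")

# The same NaN object inserted ten times collapses to one element,
# because identity is checked before equality.
same_instance = {x for _ in range(10)}

# Ten separately created NaN objects are all kept, since none is
# identical or equal to another.
fresh_instances = {float("nan") for _ in range(10)}

# Counter is a dict subclass, so it follows the same rule.
counts = Counter([x, x, float("nan")])

print(len(same_instance), len(fresh_instances), counts[x], len(counts))
```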
gravel7623: I guess because the hash of an instance stays consistent (which is used to retrieve the value from the dict). The `__eq__` method must disregard the hash and return False for all nans.
munchler: But the hash alone shouldn't be enough to match the key. Isn't an equality check also needed to avoid a false positive? That's the idea behind a hash table, as I understand it. (I'm not a Python programmer.)
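An equality check is indeed involved, but CPython's containers try identity first. The sketch below models that per-slot check; `key_matches` is a hypothetical name for illustration, not a real CPython function.

```python
def key_matches(stored, probe) -> bool:
    """Hypothetical model of the check made after hashes match:
    identity first, then ==."""
    return stored is probe or stored == probe

nan = float("nan")
other_nan = float("nan")  # a different NaN object

d = {nan: 3}

found_by_same_object = d.get(nan)         # 3: identity short-circuits the == test
found_by_other_object = d.get(other_nan)  # None: not identical and not equal

# Lists use the same "is or ==" rule for membership tests.
in_list = nan in [nan]                    # True
```

This is why the lookup in the thread succeeds only when the original `nan` object is reused.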
cogman10: If I could change one thing in computing, it'd be how SQL handles NULL. But if I got a second thing, it'd be how IEEE handles NaN. I probably wouldn't even allow NaN as a representation. If some mathematical operation results in what would be NaN, I'd rather force the programming language to throw some sort of interrupt or exception. Much like what happens when you divide an integer by 0. Heck, I'd probably even stop infinity from being represented with floats. If someone did 1/0 or 0/0, I'd interrupt rather than generating an INF or NaN. In my experience, INF and NaN are almost always an indicator of programming error. If someone wants to programmatically represent those concepts, they could do it on top of and to the side of the floating point specification, not inside it.
glkindlmann: I like this justification of NaN != NaN; it emphasizes that NaN has representational intent, more than just some bit pattern. We take for granted that (except for things like x86 extended precision registers) floating point basically works the same everywhere, which was the huge victory of IEEE 754. It is easy to lose sight of that huge win, and to be ungrateful, when one's first introduction to IEEE 754 is details like NaN != NaN.
jeleh: ...reminds me that object + object is NaN:
  > {} + {}
  NaN
see https://www.destroyallsoftware.com/talks/wat
PS: Wait for it ... Watman! =8-)
caditinpiscinam: Respectfully, I disagree. If NaNs were meant to represent unknown quantities, then they would return false for all comparisons. But NaN != NaN is true. Assuming that two unknowns are always different is just as incorrect as assuming that they're always the same. I'd also push back on the idea that this behavior makes sense. In my experience it's a consistent source of confusion for anyone learning to program. It's one of the clearest violations of the principle of least astonishment in programming language design. As others have noted, it makes conscientious languages like Rust do all sorts of gymnastics to accommodate. It's a weird edge case, and imo a design mistake. "Special cases aren't special enough to break the rules." Also, I think high-level languages should avoid exposing programmers to NaN whenever possible. Python gets this right: 0/0 should be an error, not a NaN.
glkindlmann: As a standard for floating point representation and computation, IEEE 754 solved multiple long-standing serious problems better than anything that came before it. I don't think it's sensible to judge it with a PL design lens like "principle of least astonishment"; certainly not as if IEEE 754 is a peer to Rust or Python. Or, you could learn about the surprise, frustration, and expense involved in trying to write portable numeric code prior to IEEE 754: https://people.eecs.berkeley.edu/~wkahan/ieee754status/754st...
nitwit005: There's no non-confusing option for comparisons. You have two invalid values, but they aren't necessarily the same invalid value. There are multiple operations that can produce NaN. It's a sentinel value for an error. Once you have an error, doing math with the error code isn't sensible.
caditinpiscinam: There are no non-confusing options, but some of them are still clearly worse than others. What should sorted([3, nan, 2, 4, 1]) give you in Python?
A) [1, 2, 3, 4, nan] is a good option
B) [nan, 1, 2, 3, 4] is a good option
C) An error is a good option
D) [3, nan, 1, 2, 4] is a silly, bad option. It's definitely not what you want, and it's quiet enough to slip by unnoticed. This is what you get when NaN != NaN.
NaN == NaN is wrong. NaN != NaN is wrong, unintuitive, and breaks the rest of your code. If you want to signal that an operation is invalid, then throw an error. The silently nonsensical semantics of NaN are the worst possible response.
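Options A and C from the list above can be obtained explicitly today. The following is a minimal sketch; `sort_nan_last` and `sort_strict` are hypothetical helper names, and where NaN lands in a plain `sorted()` call depends on the sort implementation, so it is not shown.

```python
import math

def sort_nan_last(values):
    """Option A: order the real numbers and push every NaN to the end."""
    # The key never compares a NaN directly: NaNs are replaced by 0.0
    # and separated from real numbers by the leading boolean.
    return sorted(values, key=lambda v: (math.isnan(v), 0.0 if math.isnan(v) else v))

def sort_strict(values):
    """Option C: refuse to sort data that contains NaN."""
    values = list(values)
    if any(math.isnan(v) for v in values):
        raise ValueError("cannot order a sequence containing NaN")
    return sorted(values)

nan = float("nan")
data = [3, nan, 2, 4, 1]

nan_last = sort_nan_last(data)  # the four numbers in order, then the NaN

try:
    sort_strict(data)
    strict_raised = False
except ValueError:
    strict_raised = True
```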
AndriyKunitsyn: It's the only "primitive type" that does that. If I deserialize data from wire, I'll be very surprised when the same bits deserialize as unequal variables. If it cannot be represented, then throwing makes more sense than trying to represent it.
paulddraper: > Makes perfect sense.
So why is 1/0 an error (SIGFPE), but log(-5) is NaN?
1-more: This got me curious, and yeah it turns out Elm's dictionary implementation uses values, not pointers, when retrieving values.
  elm repl
  ---- Elm 0.19.1 ----------------------------------------------------------------
  Say :help for help and :exit to exit! More at <https://elm-lang.org/0.19.1/repl>
  --------------------------------------------------------------------------------
  > import Dict exposing (Dict)
  > nan = 0/0
  NaN : Float
  > nan
  NaN : Float
  > nan == nan
  False : Bool
  > naan = 0/0
  NaN : Float
  > d = Dict.fromList [(nan, "a"), (naan, "b")]
  Dict.fromList [(NaN,"a"),(NaN,"b")] : Dict Float String
  > Dict.toList d
  [(NaN,"a"),(NaN,"b")] : List ( Float, String )
  > Dict.keys d
  [NaN,NaN] : List Float
  > Dict.get nan d
  Nothing : Maybe String
  > Dict.get naan d
  Nothing : Maybe String
paulddraper: It's an IEEE-754 oddity that Python chose to adopt for its equality. IEEE-754 does remainder(5, 3) = -1, whereas Python does 5 % 3 = 2. There's no reason to expect exact equivalence between operators.
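Both results quoted above can be reproduced in Python itself, since the standard library exposes the IEEE-style operation as `math.remainder` (available from Python 3.7). A small sketch:

```python
import math

# IEEE 754 remainder: x - n*y, where n is the integer nearest to x/y.
# 5/3 is about 1.67, so n = 2 and the result is 5 - 6 = -1.
ieee_style = math.remainder(5, 3)   # -1.0

# Python's % operator uses floored division: 5 - 1*3 = 2.
operator_style = 5 % 3              # 2

print(ieee_style, operator_style)
```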
superjan: IEEE 754 prescribes, for better or worse, that any mathematical comparison operator (==, <, >, ...) involving at least one NaN must always return false, including comparison against itself. This is annoying for something like dictionaries or hashtables. C# has a solution: if you call a.Equals(b) on two floats a and b, it will also return true if both are NaN. I think this is a cool solution: it keeps the meaning of math operators identical to other languages, but you still have sensible behavior for containers. I believe this behavior is copied from Java.
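The same split between operator semantics and container semantics can be sketched in Python. `FloatKey` below is a hypothetical wrapper written for illustration, not a standard library type, and it only approximates the C# and Java behavior described above: `==` on raw floats stays IEEE-conformant, while the wrapper treats all NaNs as one key.

```python
import math

class FloatKey:
    """Hypothetical dict key: every NaN is equal to every other NaN."""

    __slots__ = ("value",)

    def __init__(self, value: float) -> None:
        self.value = float(value)

    def __eq__(self, other):
        if not isinstance(other, FloatKey):
            return NotImplemented
        if math.isnan(self.value) and math.isnan(other.value):
            return True
        return self.value == other.value

    def __hash__(self) -> int:
        # Equal keys must hash equally, so all NaNs share one fixed hash.
        if math.isnan(self.value):
            return 0
        return hash(self.value)

table = {FloatKey(float("nan")): "a"}

# A freshly created NaN finds the entry, unlike with raw float keys.
lookup = table.get(FloatKey(float("nan")))   # "a"

# The raw operator is unchanged.
raw_equal = float("nan") == float("nan")     # False
```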
adrian_b: I consider this as a very bad solution, because it can lead to very subtle bugs. The correct solution for any programming language is to define all the 14 relational operators that are required by any partially-ordered set, instead of defining only the 6 of them that are sufficient for a totally-ordered set. If the programming language fails to define all 14 operators, then you must always test the operands for NaNs, before using any of the 6 ALGOL relational operators. If you consider this tedious, then you must unmask the invalid operation exception and take care to handle this exception. If invalid operations generate exceptions, then the floating-point numbers become a totally-ordered set and NaN cannot exist (if a NaN comes from an external source, it will also generate an exception, while internally no NaN will ever be generated).
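One consequence of having only the six usual operators is that negating a comparison is no longer the same as using its complement once a NaN is involved. A minimal sketch of that gap, with `unordered` as a hypothetical helper standing in for one of the missing predicates:

```python
import math

def unordered(a: float, b: float) -> bool:
    """Hypothetical predicate: true when a and b cannot be ordered."""
    return math.isnan(a) or math.isnan(b)

nan = float("nan")

# For ordinary numbers these two expressions always agree...
agree_for_numbers = (not (2.0 < 1.0)) == (2.0 >= 1.0)   # True

# ...but with NaN, "not less than" and "greater than or equal" diverge.
not_less = not (nan < 1.0)          # True
greater_or_equal = nan >= 1.0       # False

# Testing the operands first, as the comment suggests, makes the case explicit.
needs_special_handling = unordered(nan, 1.0)   # True
```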
paulddraper: It makes no sense. 1/0 is an error (SIGFPE). log(-5) is a value (NaN).
---
I suppose you could have this "no reflexive equality" sentinel, but it is applied so randomly in languages as to eternally violate the principle of least astonishment.