Discussion
5NF and Database Design
DeathArrow: There are use cases where it is better not to normalize the data.
tadfisher: I love reading about the normal forms, because it makes me sound like I know what I'm talking about in the conversation where the backend folks tell me, "if we normalized that data then the database would go down". This is usually followed by arguments over UUID versions for some reason.
necovek: So which normal form do they argue for and against? And what UUID version wins the argument?
Tostino: Not OP, but UUID v7 is what you want for most database workloads (other than something like Spanner)
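The reason v7 tends to win for B-tree-indexed databases is that its most significant bits are a timestamp, so new keys land at the end of the index instead of at random pages. A minimal sketch of the idea (hand-rolled per RFC 9562, since `uuid.uuid7()` only ships in newer Python versions; field layout here is an illustration, not a vetted library):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Minimal UUIDv7 sketch: 48-bit unix-ms timestamp up front,
    then random bits, with version and variant fields set."""
    ms = time.time_ns() // 1_000_000
    raw = bytearray(ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70  # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC 4122/9562 variant bits
    return uuid.UUID(bytes=bytes(raw))

a = uuid7()
time.sleep(0.002)          # cross a millisecond boundary
b = uuid7()
assert a < b               # later IDs sort higher: append-mostly inserts
```

Random v4 UUIDs, by contrast, scatter inserts across the whole index, which is exactly the write-amplification the backend folks complain about.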
estetlinus: The lost art of normalizing databases. ”Why is the ARR so high on client X? Oh, we’re counting it 11 times lol.” I would maybe throw in date as a key too. Bad idea?
andrew_lettuce: Typically it's better to take normalized data and denormalize it for your use case vs. not normalizing in the first place. Really depends on your needs.
tadfisher: Explaining jokes is poor form.
culi: On the internet it is normal.
jerf: In a roundabout way this article captures well why I don't really like thinking in terms of "normal forms", especially as a numbered list like that. The key insights are really 1. avoid redundancy and 2. this may involve synthesizing relationships that don't immediately obviously exist from a human perspective. Both of those can be expanded on at quite some length, but I never found much value in the supposedly-blessed intermediate points represented by the nominally numbered "forms". I don't find them useful either for thinking about the problem or for communicating about it.

Someone, somewhere writing down a list and that list being blessed with the imprimatur of Academic Approval (TM) doesn't mean it is actually useful... sometimes it just means that it made it easy to write multiple-choice test questions. (E.g., "What does Layer 2 of the OSI network model represent? A: ... B: ... C: ... D: ..." to which the most appropriate real-world answer is "Who cares?")
petalmind: > Someone, somewhere writing down a list and that list being blessed with the imprimatur of Academic Approval (TM)

One problem is that normal forms are underspecified even by the academy.

E.g., Millist W. Vincent's "A corrected 5NF definition for relational database design" (1997) (!) shows that the traditional definition of 5NF was deficient. 5NF was introduced in 1979 (I was one year old then).

2NF and 3NF should basically be merged into BCNF, if I understand correctly, and treated as the general case (as per Darwen).

Also, the numeric sequence is not very useful because there are at least four non-numeric forms (https://andreipall.github.io/sql/database-normalization/).

Also, personally I think that 6NF should be foundational, but that's a separate matter.
bob1029: JSON is extremely fast these days. Gzipped JSON perhaps even more so.

I find that JSON blobs up to about 1 megabyte are very reasonable in most scenarios. You are looking at maybe a millisecond of latency overhead in exchange for much denser I/O for complex objects. If the system is very write-intensive, I would cap the blobs around 10-100 kB.
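The density win comes from how repetitive denormalized JSON is: repeated keys and enum-like values compress extremely well. A quick sketch (the 1 MB / 10-100 kB figures above are the commenter's rules of thumb, not measurements from this snippet):

```python
import gzip
import json

# A deliberately redundant blob: repeated keys, a small SKU vocabulary.
doc = {"orders": [
    {"sku": f"SKU-{i % 50}", "qty": 1, "status": "shipped"}
    for i in range(5000)
]}

raw = json.dumps(doc).encode()
packed = gzip.compress(raw)
# Repetitive JSON typically shrinks by an order of magnitude or more.
print(f"{len(raw)} bytes raw -> {len(packed)} bytes gzipped")
```

The trade-off, as the replies note, is that the blob is opaque to the database: no constraints, no partial updates, no normalization.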
tossandthrow: I use the null uuid as primary key - never had any DB scaling issues.
petalmind: Yeah, no NULL is ever equal to any other NULL, so they are basically unique.
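The kernel of the joke is real SQL semantics: NULL compares equal to nothing, including another NULL, so a UNIQUE column happily accepts any number of NULL "keys". A minimal demonstration with SQLite (standard SQL UNIQUE behaves the same way; you would of course never do this for a primary key):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id TEXT UNIQUE, payload TEXT)")
con.execute("INSERT INTO t VALUES (NULL, 'a')")
con.execute("INSERT INTO t VALUES (NULL, 'b')")  # no UNIQUE violation

# Both rows are present; NULL = NULL is not true, so neither "duplicates" the other.
nulls = con.execute(
    "SELECT COUNT(*) FROM t WHERE id IS NULL").fetchone()[0]
print(nulls)  # 2
```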
Quarrelsome: I adore the contiguous reads that ideas like that yield. I'd rather push that out to a read-only endpoint than get sucked into the entropy of treating what is effectively an unschema'd blob as editable data.
RedShift1: Me, still using bigints... which haven't given me any problems. I wouldn't use them for client-generated IDs, but that's not what most applications require anyway.