Discussion
The purpose of Continuous Integration is to fail
jgbuddy: This is of course true as a blanket "gotcha" headline, although I wouldn't call a failed test the CI itself failing. A real failure would be a false positive: a pass where there wasn't coverage, or a failure when there was no breaking change. Covering all of these edge cases can become as tiresome as maintaining the application in the first place (of course this is a generalization).
chriswarbo: I agree. The same can be said for testing too: their main purpose is to find mistakes (with secondary benefits of documenting, etc.). Whenever I see my tests fail, I'm happy that they caught a problem in my understanding (manifested either as a bug in my implementation, or a bug in my test statement).
cogman10: This ultimately is what shapes my view of what a good test is vs a bad test.

An issue I have with a lot of unit tests is that they are too strongly coupled to the implementation. What that means is any change to the implementation ultimately means you have to change tests.

IMO, good tests are relatively immutable. You should be able to have multiple valid implementations. You should add new tests to describe the new functionality of that implementation; however, the old tests should remain relatively untouched. If it turns out that a single change to an implementation requires you to change and update 20 tests, those are bad tests. What I want as a dev is to immediately think "I must have broken something" when a test fails, not "I need to go fix 20 tests".

For example, let's say you have a method which sorts data. A bad test will check "did you call this `swap` function 5 times". A good test will say "I gave the method this unsorted data set; is the data set sorted?". Heck, a good test can even say something like "was this large data set sorted in under x time". That's trickier to do well, but still a better test than "did you call swap the right number of times" or, even worse, "did you invoke this sequence of swap calls".
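The sort example above can be sketched concretely. This is a hypothetical illustration (`sort_in_place` and `swap` are made-up names, and the expected swap count of 2 is specific to this particular selection-sort implementation, which is exactly the point):

```python
from unittest.mock import patch

def swap(data, i, j):
    data[i], data[j] = data[j], data[i]

def sort_in_place(data):
    # Illustrative selection sort that delegates to a swap helper.
    for i in range(len(data)):
        j = min(range(i, len(data)), key=lambda k: data[k])
        if i != j:
            swap(data, i, j)

# Bad: coupled to the implementation. Replacing selection sort with
# data.sort() would break this test even though the behaviour is fine.
def test_sort_bad():
    data = [3, 1, 2]
    with patch(__name__ + ".swap", side_effect=swap) as spy:
        sort_in_place(data)
    assert spy.call_count == 2  # counts swaps: an implementation detail

# Good: only the observable outcome matters, so any correct
# implementation passes.
def test_sort_good():
    data = [3, 1, 2, 2, 0]
    sort_in_place(data)
    assert data == sorted([3, 1, 2, 2, 0])
```

The first test survives only as long as the algorithm does; the second survives any valid rewrite.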
ralferoo: The premise of the article has some weight, but the final conclusion with the suggestion to change the icons seems completely crazy.

Green meaning "to the best of our knowledge, everything is good with the software" is well understood. Using green to mean "we know that this doesn't work at all" is incredibly poor UI (EDITED from "beyond idiotic" due to feedback, my bad).

And whilst flaky tests are the most problematic for a CI system, it's because they often work (and, in my experience, most flaky tests are modelling situations that don't usually happen in production), so they are often potentially viable builds for deployment, with a caveat. If anything, tests that are known to be problematic should be marked orange.
chrisweekly: Good insights, but I'd suggest "beyond idiotic" -> "misleading | poor UX". (I agree it's a terrible choice, but civility matters, and strengthens your case.)
ralferoo: Fair point, updated my wording.
vova_hn2: > IMO, good tests are relatively immutable. You should be able to have multiple valid implementations. You should add new tests to describe the new functionality of that implementation, however, the old tests should remain relatively untouched.

Taken to extreme this would mean getting rid of unit tests altogether in favor of functional and/or end-to-end testing. Which is... a strategy. I don't know if it is a good or bad strategy, but I can see it being viable for some projects.
9rx: > Taken to extreme this would mean getting rid of unit tests all together in favor of functional and/or end-to-end testing.The dirty little secret in CS is that unit, functional, and end-to-end tests are all the exact same thing. Watch next time someone tries to come up with definitions to separate them and you'll soon notice that they didn't actually find a difference or they invent some kind of imagined way of testing that serves no purpose.
cogman10: If you can't tell, I actually think functional tests have a lot more value than most unit tests :)

Kent Dodds agrees with me. [1]

This isn't to say I see no value in unit tests, just that they should tend towards describing the function of the code under test, not the implementation.

[1] https://kentcdodds.com/blog/the-testing-trophy-and-testing-c...
yrjrjjrjjtjjr: The purpose of a car's crumple zone is to crumple.
bluejellybean: Yep, the 'unit' is whatever size one chooses to use. The exact same thing happens when trying to discuss microservices vs monolith. Really it all comes down to agreeing on what terms mean within the context of a conversation. Unit, functional, and end-to-end are all weasel words unless defined concretely, and should raise an eyebrow when someone uses them.
skydhash: It took me a bit of time (and two or three different views) to finally get this. That is mostly why I hardcode my values in the tests. It makes them simpler. If something fails, either the values are wrong or the algorithm of the implementation is wrong.
chriswarbo: Comparing actual outputs against expected ones is the ideal situation, IMHO. My own preference is for property-checking; but hard-coding a few well-chosen values is also fine.

That's made easier when writing (mostly) pure code, since the output is all we have (we're not mutating anything, or triggering other processes, etc. that would need extra checking).

I also think it's important to make sure we're checking the values we actually care about, since those might not be the literal return value of the "function under test". For example, if we're testing that some function correctly populates a table cell, I would avoid comparing the function's result against a hard-coded table, since that's prone to change over time in ways that are irrelevant. Instead, I would compare that cell of the result against a hard-coded value. (Rather than thinking about the individual values, I like to think of such assertions as relating one piece of code to another, e.g. that the "get_total" function is related to the "populate_total" function, in this way...)

The reason I find this important is that breaking a test requires us to figure out what it's actually trying to test, and hence whether it should have broken or not; i.e. is it a useful signal that requires us to change our approach (the table should look like that!), or is it noise that needs its incidental details updated (all those other bits don't matter!)? That can be hard to work out many years after the test was written!
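The "assert only the cell you care about" idea can be sketched like this. The names (`populate_report`, `get_total`) and the report's fields are invented purely for illustration:

```python
# Hypothetical report-building code, made up to illustrate the point.
def get_total(items):
    return sum(price * qty for price, qty in items)

def populate_report(items):
    # The surrounding "table" layout is incidental and likely to change.
    return {
        "title": "Invoice",
        "rows": [{"price": p, "qty": q} for p, q in items],
        "total": get_total(items),
    }

def test_total_cell():
    items = [(10, 2), (5, 3)]
    report = populate_report(items)
    # Fragile: asserting the entire hard-coded table breaks whenever an
    # irrelevant detail (title, row format, extra columns) changes.
    # Robust: pin only the cell under test...
    assert report["total"] == 35
    # ...or relate one piece of code to another, as described above:
    assert report["total"] == get_total(items)
```

If the report gains a footer or renames a column, this test keeps working; it only breaks when the total itself is wrong.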
vova_hn2: > The dirty little secret in CS is that unit, functional, and end-to-end tests are all the exact same thing.

I agree that the boundaries may be blurred in practice, but I still think that there is a distinction.

> visible, public interface

Visible to whom? A class can have public methods available to other classes, a module can have public members available to other modules, a service can have a public API that other services can call over the network, etc.

I think the difference is the level of abstraction we operate on: unit -> functional -> integration -> e2e. Unit is the lowest level of abstraction and e2e is the highest.
9rx: > Visible to whom?

The user. Your tests are your contract with the user. Any time there is a user, you need to establish the contract with the user so that it is clear to all parties what is provided and what will not randomly change in the future. This is what testing is for.

Yes, that does mean any of classes, network services, visual user interfaces, etc. All of those things can have users.

> Unit is the lowest level of abstraction and e2e is the highest.

There is only one 'abstraction' that I can see: feed inputs and evaluate outputs. How does that turn into higher or lower levels?
SAI_Peregrinus: 100% coverage is an EXPTIME problem.
mihir_kanzariya: The biggest problem I've seen with CI isn't the failing part, it's what teams do when it fails. The "just rerun it" culture kills the whole point.

We had a codebase where about 15% of CI runs were flaky. Instead of fixing the root causes (mostly race conditions in tests and one service that would intermittently time out), the team just added auto-retry: three attempts before it actually reported failure. So now a genuinely broken build takes 3x longer to tell you it's broken, and the flaky stuff just gets swept under the rug.

The article's right that failure is the point, but only if someone actually investigates the failure instead of clicking retry.
ant6n: If a build fails 10% of the time, with three retries it only reports a failure in the 10% x 10% x 10% case, so it actually takes 100x longer before a flaky failure surfaces.
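The arithmetic here can be checked directly, assuming each run flakes independently with 10% probability and the retry policy is three attempts:

```python
flake_rate = 0.10   # probability that a single run fails spuriously
attempts = 3        # auto-retry: report failure only if all attempts fail

# A failure is reported only when all three runs fail in a row.
p_reported = flake_rate ** attempts
print(f"reported failure rate: {p_reported:.4f}")              # ~0.0010
print(f"rarer by a factor of:  {flake_rate / p_reported:.0f}") # ~100
```

So the 10% flake rate becomes a 0.1% reported-failure rate: the flakiness surfaces about 100x less often, which is exactly how it gets swept under the rug.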
jquaint: https://github.com/srid/nixci Is this the project, or is this a completely different Nix-based CI/CD tool? I can't find a GitHub link or anything on the website.
9rx: > When it passes, it's just overhead: the same outcome you'd get without CI.

The outcome still isn't the same. Even when everything passes, CI enables other developers to build on top of your partially-built work. This is the real purpose of CI. Test automation is only there to keep things sane.