Discussion
pjmlp: While using C extensions, and yes, Microslop would rather have you using C++. https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an... Even if, in the years since that post, they added support for C11 and C17, minus some stuff like aligned mallocs.
pseudohadamard: Is it actually better/faster though? To see the difference between -O and -O2/3, compile some code for an x64 target on Godbolt and look at the output. -O produces optimised x86 code. -O2/3 produces enormous amounts of incomprehensible SSE/AVX/whatever code for even the simplest stuff, leading to a huge blowout in code size that can potentially interact badly with caching. We had a look at this in embedded, where you don't have infinite memory to play with. At the moment it's OK because there are no advanced instructions available to use, but it'll get ugly in the future when gcc realises it can use new instructions and produces five times the amount of object code for the same source code.
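A loop like the one below is a reasonable thing to paste into Godbolt to try this comparison. It is only a sketch: `sum_bytes` is a made-up name, not something from the article. With GCC targeting x86-64, -O (i.e. -O1) typically keeps it as a short scalar loop, while -O3 turns on the auto-vectoriser and usually emits SSE code plus scalar handling for leftover elements, which is where the extra object code comes from. The exact output depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum every byte in a buffer.
 * Compile once with -O1 and once with -O3 and compare the assembly. */
uint32_t sum_bytes(const uint8_t *buf, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];   /* simple reduction, a common vectorisation target */
    return total;
}
```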
ranger_danger: > Is it actually better/faster though? For their use case, I would say yes. The article is not about general program optimization of the -O2/-O3 kind; it's about selecting different versions of specific functions depending on which CPU the application is running on. For example, if your program is heavy on image/video processing, using functions that iterate over your buffers, you typically want the fastest method available. A function that can only use MMX/SSE instructions instead of, say, AVX2 or AVX-512 is going to be orders of magnitude slower, which translates into significant real-world FPS differences.
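The per-CPU selection described above can be sketched by hand as follows. This is a rough illustration, not the article's code: it assumes GCC or Clang on x86, where the `target("avx2")` function attribute and `__builtin_cpu_supports` are available, and falls back to the baseline version everywhere else. The names `brighten`, `brighten_baseline`, `brighten_avx2` and `select_brighten` are hypothetical. Both variants share the same C loop; the only difference is which instruction set the compiler is allowed to use for each.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang on x86. Elsewhere only the baseline is built. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline: compiled for the default target of the build. */
static void brighten_baseline(uint8_t *px, size_t n, uint8_t delta)
{
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned)px[i] + delta;
        px[i] = v > 255 ? 255 : (uint8_t)v;   /* saturate at 255 */
    }
}

#if HAVE_X86_DISPATCH
/* Same loop, but the compiler may use AVX2 instructions in this function. */
__attribute__((target("avx2")))
static void brighten_avx2(uint8_t *px, size_t n, uint8_t delta)
{
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned)px[i] + delta;
        px[i] = v > 255 ? 255 : (uint8_t)v;
    }
}
#endif

typedef void (*brighten_fn)(uint8_t *, size_t, uint8_t);

/* Pick the best variant the running CPU supports. */
static brighten_fn select_brighten(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return brighten_avx2;
#endif
    return brighten_baseline;
}

/* Public entry point: resolves the variant on first call.
 * Note: this lazy initialisation is not thread-safe as written. */
void brighten(uint8_t *px, size_t n, uint8_t delta)
{
    static brighten_fn impl;
    if (!impl)
        impl = select_brighten();
    impl(px, n, delta);
}
```

GCC can also generate this kind of dispatch automatically through its function multiversioning attributes; the manual version is shown here only because it makes the selection step visible.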