Discussion
gzip decompression in 250 lines of rust
MisterTea: > twenty five thousand lines of pure C not counting CMake files. ... Keep in mind this is also 31 years of cruft and lord knows what.

Plan 9 gzip is 738 lines total:
* gzip.c: 217 lines
* gzip.h: 40 lines
* zip.c: 398 lines
* zip.h: 83 lines

Even the zipfs file server that mounts zip files as file systems is 391 lines.

> ... (and whenever working with C always keep in mind that C stands for CVE).

Sigh.
nayuki: Just like that author, many years ago I went through the process of understanding the DEFLATE compression standard and producing a short, concise decompressor for gzip+DEFLATE. Here are the resources I published:
* https://www.nayuki.io/page/deflate-specification-v1-3-html
* https://www.nayuki.io/page/simple-deflate-decompressor
* https://github.com/nayuki/Simple-DEFLATE-decompressor
tyingq: His implementation also omits the CRC (which is part of the 25k lines), has no --fast/--best/etc., is missing some output formats, and so on. I'm sure the 25k includes a lot of bloat, but the comparison is odd. Comparing to your list would make much more sense.
kibwen: I would expect a CRC to add a negligible number of lines of code. The reason that production-grade decompressors are tens of thousands of LOC is likely attributable to extreme manual optimization. For example, I wouldn't be surprised if a measurable fraction of those lines are actually inline assembly.
tyingq: Yes, there are subdirectories with language bindings for many non-C languages, an examples folder with example C code, Win32-specific C code, test code, etc. More reasons it's an odd comparison.
up2isomorphism: Another dev who doesn't show respect for what has been done and expects a particular language to do wonders for him. Also, I don't see that this is much better in terms of readability.
hybrid_study: He does mention https://github.com/trifectatechfoundation/zlib-rs, not just https://github.com/madler/zlib, but it would be interesting to hear from those developers too.
xxs: CRC32 can be written in a handful of lines of code, although it'd be better to use vector instructions, e.g. AVX, when available.
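For scale, a bitwise CRC-32 over the reflected IEEE polynomial (the one gzip uses) really does fit in about a dozen lines of Rust. This is a sketch of the compact, one-bit-per-iteration form, not the table- or SIMD-driven paths a production library would use:

```rust
// Bitwise CRC-32 (reflected IEEE polynomial 0xEDB88320), as used by gzip.
// Compact but slow: one bit per loop iteration, no lookup table.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all-ones when the low bit is set, else zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn main() {
    // Standard check value: CRC-32 of "123456789" is 0xCBF43926.
    assert_eq!(crc32(b"123456789"), 0xCBF4_3926);
}
```

The table-based variant trades this inner loop for a 256-entry lookup table; the vectorized variants xxs alludes to use carry-less multiply instructions and are where the line count grows.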
ack_complete: Doesn't need to be inline assembly; just pre-encoded lookup tables and intrinsics-based vectorized CRC alone will add quite a lot of code. Most multi-platform CRC implementations tend to have at least a few paths: byte/word/dword at a time, hardware CRC, and hardware GF(2) multiply. It's not really extreme optimization, just better algorithms to match better hardware capabilities.

The Huffman decoding implementation is also bigger in production implementations, for both speed and error checking. The two Huffman trees need to be exactly complete except in the special case of a single code, and in most cases they are flattened to two-level tables for speed (though the latest desktop CPUs have enough L1 cache to use single-level tables).

Finally, the LZ copy typically has special cases added for using wider-than-byte copies for non-overlapping, non-wrapping runs. This is a significant decoding speed optimization.
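The LZ-copy special-casing ack_complete describes can be sketched in a few lines of Rust. When the back-reference doesn't overlap the bytes being produced, a bulk copy is safe; when it does (distance < length), the decoder must copy byte by byte so earlier output feeds later copies. The function name and structure here are illustrative, not taken from any particular decoder:

```rust
// Sketch of a DEFLATE back-reference copy: append `length` bytes that
// start `distance` bytes back from the end of `out`.
// Assumes the caller has validated distance <= out.len() and distance > 0.
fn lz_copy(out: &mut Vec<u8>, distance: usize, length: usize) {
    let start = out.len() - distance;
    if distance >= length {
        // Non-overlapping run: one contiguous copy. This is the case
        // production decoders accelerate with word-wide or SIMD copies.
        out.extend_from_within(start..start + length);
    } else {
        // Overlapping run (e.g. distance 1 repeats the last byte):
        // bytes pushed earlier in this loop are sources for later ones,
        // so we must copy one byte at a time.
        for i in 0..length {
            let b = out[start + i];
            out.push(b);
        }
    }
}

fn main() {
    let mut out = b"ab".to_vec();
    lz_copy(&mut out, 2, 6); // distance 2, length 6: repeats "ab"
    assert_eq!(out, b"abababab");
}
```

The branch is the whole point: a naive decompressor only needs the byte-at-a-time loop, while the fast paths for the non-overlapping case are part of why production decoders are so much larger.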
maverwa: Where do you see the lack of respect? The author wanted to learn how gzip works and chose a language they like to implement it in. As a learning tool, not because the world needs another gzip decompressor.