⭐⭐⭐⭐⭐

An Ode to bzip

Source: purplesyringa.moe | Author: purplesyringa | Date: March 12, 2026
Compression Algorithms | bzip2 | BWT

Summary

A contrarian but well-argued case for why bzip is actually the best compression algorithm for code, despite conventional wisdom favoring zstd, xz, or brotli.

Compression Results (327 KB Lua Source)

Algorithm     Compressed Size
bzip3         61,067 bytes (best!)
bzip2 -9      63,727 bytes
lzip -9       67,651 bytes
brotli -Z     67,859 bytes
xz -9         67,940 bytes
zstd -22      69,018 bytes
zopfli        75,882 bytes
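
A comparison like this can be approximated with Python's standard library. The sketch below is only an illustration under stated assumptions: the original 327 KB Lua file is not available here, so `sample` is a made-up Lua-like input, and only the three codecs shipped with Python (zlib, xz via `lzma`, bzip2 via `bz2`) are measured. bzip3, brotli, lzip, zstd and zopfli would need external tools. Sizes on a toy input will not match the figures above.

```python
import bz2
import lzma
import zlib

# Hypothetical stand-in for real source code: 200 near-identical Lua functions.
sample = "\n".join(
    f"local function f{i}(x)\n  return x + {i}\nend" for i in range(200)
).encode()


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes for each stdlib codec at its top level."""
    return {
        "zlib -9": len(zlib.compress(data, 9)),             # LZ77 + Huffman (gzip family)
        "xz -9": len(lzma.compress(data, preset=9)),        # LZ77 + range coder
        "bzip2 -9": len(bz2.compress(data, compresslevel=9)),  # BWT-based
    }


for name, size in compressed_sizes(sample).items():
    print(f"{name:10} {size:>8,} bytes")
```

To try it on real code, replace `sample` with the bytes of an actual source file, for example `open(path, "rb").read()`.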

Why bzip Wins for Code

BWT vs LZ77

Most compressors use LZ77 (finding repeated strings and referencing earlier occurrences). bzip uses BWT (Burrows-Wheeler Transform):

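  • Reorders the characters of a block so that they are grouped by the context around them
  • For code: characters that appear next to the same recurring text end up adjacent, so repeated patterns turn into runs of similar symbols
  • Simple follow-up stages such as run-length encoding then work well (bzip2 also applies move-to-front and Huffman coding)
  • No need to store backreference offsets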
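
The points above can be made concrete with a naive sketch of the transform. This is a teaching version, not how bzip2 or bzip3 implement it: real compressors use suffix sorting rather than building every rotation, and the function names and the `"\0"` end marker are assumptions made for this example.

```python
from itertools import groupby


def bwt(text: str, end: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if end in text:
        raise ValueError("end marker must not occur in the input")
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, end: str = "\0") -> str:
    """Invert the transform by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The row that ends with the marker is the original text plus the marker.
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


def run_lengths(s: str) -> list[tuple[str, int]]:
    """Collapse consecutive equal characters into (character, count) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


transformed = bwt("banana")
print(repr(transformed))             # 'annb\x00aa'
print(run_lengths(transformed))
print(inverse_bwt(transformed))      # banana
```

Note that the output contains no offsets or lengths pointing back into earlier data. The transform is a reversible permutation, and the only extra information is the position of the end marker.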

Key Insight

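Code has consistent patterns (function keywords, brackets, indentation). LZ77 can only point back at matching strings inside its window of recent history, whereas BWT groups characters by context across the whole block, which exploits these patterns better.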

Decoder Size Comparison

For self-extracting archives (like embedding in Lua):

Algorithm            Decoder Size
Custom bzip-style    1.5 KB (smallest!)
xz / lzip            ~1 KB
gzip                 ~1.5 KB
brotli               ~2.2 KB
zstd                 ~3 KB

Debunking "bzip is slow"

  • Compression: zopfli is slower than bzip and still produces larger output
  • Decompression: bzip is slower than gzip, but comparable to zstd/brotli when the decoder is written in a high-level language
  • For embedded decoders: "slow" is relative, since every operation is slow in Lua anyway

Conclusion

bzip3 is recommended for the best compression, but even bzip2 beat every LZ77-based compressor in this benchmark. The key is BWT's ability to exploit the consistent structure of source code.