Faux-DEFLATE

It's 2020, bandwidth is free
Reading time: about 2 minutes

I was working on a proof-of-concept implementation of a file format that uses DEFLATE compression. Since it was supposed to be a self-contained example, I didn’t want to bring in a fully-featured compressor like zlib. I skimmed the DEFLATE RFC and noticed that it supports raw / uncompressed data blocks. I wrote a simple encoder that stores uncompressed blocks to solve my problem, and wanted to document it on my blog for future reference.

Structure of a DEFLATE stream

A DEFLATE stream is made up of blocks. Each block has a 3-bit header. The first bit signifies the last block of the stream, and the other two make up the block type.

The block types are

  • 00 - no compression
  • 01 - compressed with fixed Huffman codes
  • 10 - compressed with dynamic Huffman codes
  • 11 - reserved (error)

In this post, we’re only interested in the first one (00).

Structure of an uncompressed block

In the uncompressed block, the 3-bit header is contained in a byte. A good property of the uncompressed block type being 00 is the ease of constructing the header.

  • If the block is the final one, the header is 1
  • If the block is not the final one, the header is 0

After the header byte, there is the length and negated length. These are both encoded as little endian uint16_t’s.

  • length is the number of data bytes in the block
  • negated length is one’s complement of length, ~len & 0xFFFF

After the header, length and the negated length, length bytes of data follow. If there are no more blocks after this one, the final bit is set.

Python implementation

Here’s a simple DEFLATE implementation in Python. The output should be a valid DEFLATE stream for real decoders.

def faux_deflate(reader, writer, bufsize=2048):
    while True:
        chunk = reader.read(bufsize)

        header = 0

        if not chunk:
            header = 1

        _len = len(chunk)
        nlen = ~_len & 0xFFFF
        writer.write(struct.pack("<BHH", header, _len, nlen))
        writer.write(chunk)

        if not chunk:
            break

There’s also a decoder that can only decode uncompressed blocks.

def faux_inflate(reader, writer):
    while header := reader.read(5):
        header, _len, nlen = struct.unpack("<BHH", header)

        assert header in [0, 1]
        assert nlen == ~_len & 0xFFFF

        writer.write(reader.read(_len))

Useful resources

The following pages link here

Citation

If you find this work useful, please cite it as:
@article{yaltirakli202006fauxdeflate,
  title   = "Faux-DEFLATE",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2020",
  url     = "https://www.gkbrk.com/2020/06/faux-deflate/"
}
Not using BibTeX? Click here for more citation styles.
IEEE Citation
Gokberk Yaltirakli, "Faux-DEFLATE", June, 2020. [Online]. Available: https://www.gkbrk.com/2020/06/faux-deflate/. [Accessed Dec. 17, 2024].
APA Style
Yaltirakli, G. (2020, June 10). Faux-DEFLATE. https://www.gkbrk.com/2020/06/faux-deflate/
Bluebook Style
Gokberk Yaltirakli, Faux-DEFLATE, GKBRK.COM (Jun. 10, 2020), https://www.gkbrk.com/2020/06/faux-deflate/

Comments

© 2024 Gokberk Yaltirakli