I was working on a proof-of-concept implementation of a file format that uses DEFLATE compression. Since it was supposed to be a self-contained example, I didn’t want to bring in a fully-featured compressor like zlib. I skimmed the DEFLATE RFC and noticed that it supports raw / uncompressed data blocks. I wrote a simple encoder that stores uncompressed blocks to solve my problem, and wanted to document it on my blog for future reference.
Structure of a DEFLATE stream
A DEFLATE stream is made up of blocks. Each block has a 3-bit header. The first bit signifies the last block of the stream, and the other two make up the block type.
The block types are
-
00
- no compression -
01
- compressed with fixed Huffman codes -
10
- compressed with dynamic Huffman codes -
11
- reserved (error)
In this post, we’re only interested in the first one (00).
Structure of an uncompressed block
In the uncompressed block, the 3-bit header is contained in a byte. A good property of the uncompressed block type being 00
is the ease of constructing the header.
- If the block is the final one, the header is
1
- If the block is not the final one, the header is
0
After the header byte, there is the length and negated length. These are both encoded as little endian uint16_t’s.
- length is the number of data bytes in the block
- negated length is one’s complement of length,
~len & 0xFFFF
After the header, length and the negated length, length bytes of data follow. If there are no more blocks after this one, the final bit is set.
Python implementation
Here’s a simple DEFLATE implementation in Python. The output should be a valid DEFLATE stream for real decoders.
def faux_deflate(reader, writer, bufsize=2048):
while True:
chunk = reader.read(bufsize)
header = 0
if not chunk:
header = 1
_len = len(chunk)
nlen = ~_len & 0xFFFF
writer.write(struct.pack("<BHH", header, _len, nlen))
writer.write(chunk)
if not chunk:
break
There’s also a decoder that can only decode uncompressed blocks.
def faux_inflate(reader, writer):
while header := reader.read(5):
header, _len, nlen = struct.unpack("<BHH", header)
assert header in [0, 1]
assert nlen == ~_len & 0xFFFF
writer.write(reader.read(_len))