Content-Defined Chunking (CDC)

One of the interesting use cases of the rolling hash function is that it can create dynamic, content-based chunks of a stream or file. - wikipedia / HN / Borg

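The simplest way to compute the dynamic chunks is to run a rolling hash over the data and declare a chunk boundary whenever the hash matches a pattern, for example when its lower N bits are all zero. With a well-distributed hash this yields chunks of about 2^N bytes on average. Because each boundary depends only on the bytes inside the hash window, a local change in the file affects only the chunk that contains it and possibly the next one; later boundaries stay where they were relative to the content.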
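The idea can be sketched in a few lines of Python. This is only an illustration, not the chunker used by Borg or any other tool: it assumes a simple polynomial (Rabin-Karp style) rolling hash, and the names `chunk`, `WINDOW`, `MASK_BITS`, `BASE`, `MOD` and `TOP`, along with their values, are picked for the example.

```python
from typing import List

WINDOW = 16            # bytes covered by the rolling hash (assumed value)
MASK_BITS = 6          # boundary when the low 6 bits are zero (~64-byte chunks)
BASE = 257             # polynomial base (assumed value)
MOD = (1 << 61) - 1    # large prime modulus (assumed value)
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def chunk(data: bytes) -> List[bytes]:
    """Split data wherever the rolling hash has its low MASK_BITS bits zero."""
    mask = (1 << MASK_BITS) - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Slide the window: remove the contribution of the byte leaving it.
            h = (h - data[i - WINDOW] * TOP) % MOD
        # Shift the window by one position and add the incoming byte.
        h = (h * BASE + byte) % MOD
        # Only test for a boundary once the window is full.
        if i + 1 >= WINDOW and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks
```

The hash is not reset at a boundary, so every cut depends only on the last `WINDOW` bytes. Inserting a byte near the start of a file therefore changes the chunks around the edit, while the chunks further on come out byte-for-byte identical and can be deduplicated. Real chunkers usually also enforce a minimum and maximum chunk size, which this sketch leaves out.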

Written on September 9, 2020, Last update on January 9, 2023
hash