Git vs. Mercurial Repository Format

Some interesting links about how two popular open source DVCSs (Distributed Version Control Systems) actually store the data you put into them.


Git stores every file, directory tree, and commit as a whole object named by the SHA-1 checksum of its contents. Each loose object is compressed on its own with zlib; delta compression is done later, when many objects are stuffed into pack files. This has the added benefit that the packer can optimize over multiple files' worth of data. It also allows new compression strategies to be added as they become available without requiring the whole repository to be unpacked and re-compressed, because an object's name depends only on its uncompressed content and not on how it is stored. In theory, multiple compression algorithms could be used side by side, and they could even be chosen based on the type of data being compressed.
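To make the content-addressing idea concrete, here is a minimal Python sketch of how Git names a file ("blob") object and how a loose object is compressed. The function names git_blob_id and loose_object_bytes are made up for this illustration and are not part of any Git API; the sketch covers blobs only, not trees or commits.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 name Git gives a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """Return the zlib-compressed bytes of a loose blob (illustrative helper)."""
    header = b"blob %d\x00" % len(content)
    return zlib.compress(header + content)


if __name__ == "__main__":
    data = b"hello world\n"
    print(git_blob_id(data))
    print(len(loose_object_bytes(data)), "bytes after zlib compression")
```

The name is computed before any compression happens, which is why the storage layer underneath can change without renaming objects.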


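Mercurial optimizes to reduce disk seeks, which were especially costly before SSDs arrived. Each file's history is stored in an append-only log as a series of whole compressed snapshots at intervals, with binary deltas in between, so any revision can be rebuilt by reading one snapshot and the short chain of deltas that follows it. The deltas are forward deltas against earlier revisions, not the reverse deltas used by older systems such as RCS.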
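The snapshot-plus-deltas layout can be sketched as a toy Python model. Everything here is an assumption made for illustration: the ToyRevlog class, the fixed SNAPSHOT_INTERVAL, the hunk format, and the use of difflib to compute deltas are not Mercurial's actual code or on-disk format, and real Mercurial decides when to write a new snapshot by its own heuristics. The sketch stores each delta against the previous revision and rebuilds by applying deltas forward from the nearest snapshot.

```python
import zlib
from difflib import SequenceMatcher

SNAPSHOT_INTERVAL = 4  # hypothetical value chosen for this sketch


def make_delta(old: bytes, new: bytes):
    """Return (start, end, replacement) hunks that turn old into new."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(old: bytes, delta) -> bytes:
    """Apply hunks in order, copying unchanged ranges from old."""
    out = bytearray()
    pos = 0
    for start, end, replacement in delta:
        out += old[pos:start]
        out += replacement
        pos = end
    out += old[pos:]
    return bytes(out)


class ToyRevlog:
    """Append-only store: a full snapshot every `interval` revisions."""

    def __init__(self, interval: int = SNAPSHOT_INTERVAL):
        self.interval = interval
        self.entries = []  # ("full", compressed bytes) or ("delta", hunks)
        self._tip = b""

    def add(self, text: bytes) -> int:
        rev = len(self.entries)
        if rev % self.interval == 0:
            self.entries.append(("full", zlib.compress(text)))
        else:
            self.entries.append(("delta", make_delta(self._tip, text)))
        self._tip = text
        return rev

    def read(self, rev: int) -> bytes:
        # Start at the nearest snapshot at or before rev, then replay deltas.
        base = rev - rev % self.interval
        text = zlib.decompress(self.entries[base][1])
        for _kind, hunks in self.entries[base + 1:rev + 1]:
            text = apply_delta(text, hunks)
        return text
```

Reading any revision touches one snapshot and at most SNAPSHOT_INTERVAL - 1 deltas stored right after it, which is the property that keeps seeks low when the entries sit next to each other on disk.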

