Git vs. Mercurial Repository Format

Some interesting links about how two popular open source DVCSs (Distributed Version Control Systems) actually store the data you put into them.

Git

Git stores content as whole objects (blobs, trees, commits and tags), each referred to by the SHA-1 checksum of its contents. Every object is zlib-compressed when it is first written, but the bigger savings come later, when many objects are stuffed into pack files and similar objects are stored as deltas against one another. This has the added benefit that compression can optimize over multiple files' worth of data. Because an object's name depends on its content rather than on how it is stored, new compression algorithms could in principle be added as they become available without changing any object's identity. In theory, multiple compression algorithms could even be used side by side and chosen based on the type of data being compressed.
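To make the content addressing concrete, here is a minimal Python sketch of how a loose blob object gets its name and its on-disk bytes. The function names are made up for illustration; the header format (`blob <size>\0`) and the use of zlib are how Git writes loose objects, but this is not Git's own code and it ignores pack files entirely.

```python
import hashlib
import zlib


def git_object_bytes(kind: str, data: bytes) -> bytes:
    """Build the uncompressed object: '<kind> <size>\\0' followed by the content."""
    header = f"{kind} {len(data)}".encode("ascii") + b"\x00"
    return header + data


def git_object_id(kind: str, data: bytes) -> str:
    """The object name is the SHA-1 of header + content, before compression."""
    return hashlib.sha1(git_object_bytes(kind, data)).hexdigest()


def loose_object_payload(kind: str, data: bytes) -> bytes:
    """What would be written under .git/objects/: the zlib-deflated object."""
    return zlib.compress(git_object_bytes(kind, data))


if __name__ == "__main__":
    content = b"hello world\n"
    print(git_object_id("blob", content))
    print(len(loose_object_payload("blob", content)), "bytes on disk")
```

The id is computed before compression, which is why the storage encoding can change without renaming anything. The result for a file should match what `git hash-object <file>` reports.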

Mercurial

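Mercurial optimizes to reduce disk seeks, which were especially costly before SSDs arrived. The history of each file is kept in its own append-only file (a revlog), stored as a series of whole compressed snapshots at intervals with binary deltas in between. The deltas run forward: a revision is rebuilt by reading the nearest earlier snapshot and applying the deltas that follow it, and a fresh snapshot is written once the chain of deltas grows too long, so reading any revision stays a short, mostly sequential read.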
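The snapshot-plus-delta layout can be sketched in a few lines of Python. This is a toy, not Mercurial's revlog format: `ToyRevlog`, `make_delta` and `apply_delta` are hypothetical names, the delta encoding is a simple list of byte-range replacements built with `difflib`, and a snapshot is taken at a fixed interval, whereas Mercurial decides based on how large the delta chain has grown. The sketch applies each delta forward from the nearest earlier snapshot.

```python
import zlib
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Return (start, end, replacement) edits that turn old into new."""
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old: bytes, delta) -> bytes:
    """Apply the edits in order, copying the untouched ranges of old."""
    out, pos = [], 0
    for start, end, replacement in delta:
        out.append(old[pos:start])
        out.append(replacement)
        pos = end
    out.append(old[pos:])
    return b"".join(out)


class ToyRevlog:
    """Append-only history: a full snapshot every N revisions, deltas between."""

    def __init__(self, snapshot_every: int = 4):
        self.snapshot_every = snapshot_every
        self.entries = []  # ("full", compressed bytes) or ("delta", edits)
        self._tip = b""    # text of the most recently added revision

    def add(self, text: bytes) -> int:
        rev = len(self.entries)
        if rev % self.snapshot_every == 0:
            self.entries.append(("full", zlib.compress(text)))
        else:
            self.entries.append(("delta", make_delta(self._tip, text)))
        self._tip = text
        return rev

    def read(self, rev: int) -> bytes:
        # Start at the nearest snapshot at or before rev, then roll forward.
        base = rev - rev % self.snapshot_every
        text = zlib.decompress(self.entries[base][1])
        for r in range(base + 1, rev + 1):
            text = apply_delta(text, self.entries[r][1])
        return text
```

Reading a revision never touches more than `snapshot_every` consecutive entries, which is the point of the design: the cost of a read is bounded and the entries it needs sit next to each other in the file.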
