João Freitas

The following is an explanation of missing features in Git that SQLite needed, and how Fossil, the version control system conceived for SQLite, addresses them.

https://sqlite.org/whynotgit.html


Git does not provide good situational awareness

When I want to see what has been happening on SQLite, I visit the timeline and in a single screen I see a summary of the latest changes, on all branches. In a few clicks, I can drill down to as much detail as I want. I can even do this from a phone.

GitHub and GitLab offer nothing comparable. The closest I have found is the network, which is slow to render (unless it is already cached), does not offer nearly as much detail, and scarcely works at all on mobile. The commits view of GitHub provides more detail, renders quickly, and works on mobile, but only shows a single branch at a time, so I cannot easily know if I’ve seen all of the recent changes. And even if GitHub/GitLab did offer better interfaces, both are third-party services. They are not a core part of Git. Hence, using them introduces yet another dependency into the project.

I am told that Git users commonly install third-party graphical viewers for Git, many of which do a better job of showing recent activity on the project. That is great, but these are still more third-party applications that must be installed and managed separately. Many are platform-specific. (One of the better ones, GitUp, only works on Mac, for example.) All require that you first sync your local repository then bring up their graphical interface on your desktop. And even with all that, I still cannot see what I typically want to see without multiple clicks. Checking on project status from a phone while away from the office is not an option.

Git makes it difficult to find successors (descendants) of a check-in

Git lets you look backwards in time, but not forwards. Given some historical check-in, you can see what came before, but it is challenging to see what came next.

In contrast, Fossil offers helpful displays such as https://sqlite.org/src/timeline?df=major-release to show all check-ins that are derived from the most recent major release.

It is not impossible to find the descendants of a check-in in Git. It is merely difficult. For example, there is a Stack Overflow page showing the command sequence for finding the descendants of a check-in on Unix:

git rev-list --all --parents | grep ".\{40\}.*<PARENT_SHA1>.*" | awk '{print $1}'

But this is not the same thing. The command above gives a list of descendants without showing the branching structure, which is important for understanding what happened. And the command only works if you have a local clone of the repository; finding the descendants of a check-in is not something you can do with web interfaces such as GitHub or GitLab.
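As a rough illustration of what keeping the branching structure involves, here is a minimal Python sketch. It assumes the text printed by `git rev-list --all --parents` (one commit per line, the commit hash first, followed by its parent hashes) and walks forward from a given check-in, returning parent-to-child edges rather than a flat list. The function names and the single-letter sample hashes are hypothetical, and this is not how Fossil computes its displays.

```python
from collections import defaultdict, deque


def build_child_map(rev_list_output):
    """Map each parent hash to the list of its direct children.

    Expects the text printed by `git rev-list --all --parents`:
    one commit per line, the commit hash first, then its parents.
    """
    children = defaultdict(list)
    for line in rev_list_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        commit, parents = fields[0], fields[1:]
        for parent in parents:
            children[parent].append(commit)
    return children


def descendants(children, start):
    """Return the (parent, child) edges reachable forward from `start`."""
    edges = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            edges.append((node, child))  # keep the edge, not just the node
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return edges


# Hypothetical history: "a" forks into "b" and "c", which merge into "d".
SAMPLE = """\
d b c
c a
b a
a
"""

if __name__ == "__main__":
    for parent, child in descendants(build_child_map(SAMPLE), "a"):
        print(parent, "->", child)
```

Because the edges are kept, the fork at "a" and the merge at "d" remain visible, which a flat list of hashes cannot show. Like the shell command, this still requires a local clone to produce the input.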

This is not really about just finding the descendants of a check-in from time to time. The fact that descendants are readily available in Fossil means that the information pervades the web pages provided by Fossil. One example: Every Fossil check-in information page (example) shows a small “Context” graph of the immediate predecessor and successors to that check-in. This helps the user maintain better situational awareness, and it provides useful capabilities, such as the ability to click forward to the next check-in within a sequence. Another example: Fossil easily shows the context around a specific check-in (example) which again helps to promote situational awareness and a deeper understanding of what is happening in the code. There is a whole page of additional examples in the Fossil documentation.

All of the above is theoretically possible with Git, given the right extensions and tools and using the right commands. But it is not easy to do, and so it rarely gets done. Consequently, developers have less awareness of what is happening in the code.

The mental model for Git is needlessly complex

The complexity of Git distracts attention from the software under development. A user of Git needs to keep all of the following in mind:

  1. The working directory
  2. The “index” or staging area
  3. The local head
  4. The local copy of the remote head
  5. The actual remote head

Git has commands (and/or options on commands) for moving and comparing content between all of these locations.

In contrast, Fossil users only need to think about their working directory and the check-in they are working on. That is 60% less distraction. Every developer has a finite number of brain-cycles. Fossil requires fewer brain-cycles to operate, thus freeing up intellectual resources to focus on the software under development.

One user of both Git and Fossil writes on HN:

Fossil gives me peace of mind that I have everything … synced to the server with a single command…. I never get this peace of mind with git.

Git does not track historical branch names

Git keeps the complete DAG of the check-in sequence. But branch tags are local information that is not synced and not retained once a branch closes. This makes review of historical branches tedious.

As an example, suppose a customer asks you: “What ever became of that ‘prefer-coroutine-sort-subquery’ branch from two years ago?” You might try to answer by consulting the history in your version control system, first on GitHub and then in Fossil.

The Fossil view clearly shows that the branch was eventually merged back into trunk. It shows where the branch started, and it shows two occasions where changes on trunk were merged into the branch. GitHub shows none of this. In fact, the GitHub display is mostly useless in trying to figure out what happened.

Many readers have recommended various third-party GUIs for Git that might do a better job of showing historical development activity. Maybe some of them do work better than native Git and/or GitHub, though they will all be hampered by the fact that Git does not preserve historical branch names across syncs. And even if those other tools are better, the fact that it is necessary to go to a third-party tool to get the information desired does not speak well of the core system.

Git requires more administrative support

Git is complex software. One needs an installer of some kind to put Git on a developer workstation, or to upgrade to a newer version of Git. Standing up a Git server is non-trivial, and so most developers use a third-party service such as GitHub or GitLab, and thus introduce additional dependencies.

In contrast, Fossil is a single standalone binary which is installed by putting it on $PATH. That one binary contains all the functionality of core Git and also GitHub and/or GitLab. It manages a community server with wiki, bug tracking, and forums, provides packaged downloads for consumers, login management, and so forth, with no extra software required. Standing up a community server for Fossil takes minutes. And Fossil is efficient. A Fossil server will run fine on a $5/month VPS or a Raspberry Pi, whereas GitLab and similar require beefier hardware.

Less administration means that programmers spend more time working on the software (SQLite in this case) and less time fussing with the version control system.

Git provides a poor user experience

The xkcd cartoon at https://xkcd.com/1597/ is an exaggeration, yet it hits close to home.

Let’s be real. Few people dispute that Git provides a suboptimal user experience. A lot of the underlying implementation shows through into the user interface. The interface is so bad that there is even a parody site that generates fake git man pages.

Designing software is hard. It takes a lot of focus. A good version control system should provide the developer with assistance, not frustration. Git has gotten better in this regard over the past decade, but it still has a long way to go.

A Git-User’s Guide To Accessing SQLite Source Code

If you are a devoted Git user, you can still easily access SQLite. This section gives some hints on how to do so.

The Official GitHub Mirror

As of 2019-03-20, there is now an official Git mirror of the SQLite sources on GitHub.

The mirror is an incremental export of the canonical Fossil repository for SQLite. A cron-job updates the GitHub repository once an hour. This is a one-way, read-only code mirror. No pull requests or changes are accepted via GitHub. The GitHub repository merely copies the content from the Fossil repository. All changes are input via Fossil.

The hashes that identify check-ins and files on the Git mirror are different from the hashes in Fossil. There are many reasons for this, chief among them that Fossil uses a SHA3-256 hash whereas Git uses a SHA1 hash. During export, the original Fossil hash for each check-in is added as a footer to check-in comments. To avoid confusion, always use the original Fossil hash, not the Git hash, when referring to SQLite check-ins.
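When working from the Git mirror, the original Fossil hash can be recovered from the commit message. The sketch below assumes the footer has the form `FossilOrigin-Name: <hash>`; that label is an assumption not stated in the text above, so check an actual commit on the mirror before relying on it. The function name and sample message are hypothetical.

```python
import re

# Assumption: the export footer looks like "FossilOrigin-Name: <hex hash>".
# Fossil hashes are 40 hex digits (SHA1) or 64 hex digits (SHA3-256).
FOOTER = re.compile(r"^FossilOrigin-Name:\s*([0-9a-f]{40,64})\s*$", re.MULTILINE)


def fossil_hash(commit_message):
    """Return the Fossil hash recorded in a Git commit message, or None."""
    matches = FOOTER.findall(commit_message)
    # Use the last match so that text quoted earlier in the comment is ignored.
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Hypothetical commit message with a made-up 64-digit hash.
    message = "Fix a typo in a comment.\n\nFossilOrigin-Name: " + "ab" * 32 + "\n"
    print(fossil_hash(message))
```

The commit message itself could be obtained with a command such as `git log -1 --format=%B`.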

Web Access

The SQLite Fossil Repository contains links for downloading a Tarball, ZIP Archive, or SQLite Archive for any historical version of SQLite. The URLs for these downloads are simple and can be incorporated easily into automated tools. The format is:

https://sqlite.org/src/tarball/VERSION/sqlite.tar.gz

Simply replace VERSION with some description of the version to be downloaded. The VERSION can be a prefix of the cryptographic hash name of a specific check-in, or the name of a branch (in which case the most recent version of the branch is fetched) or a tag for a specific check-in like “version-3.23.1”:

https://sqlite.org/src/tarball/version-3.23.1/sqlite.tar.gz

To get the latest release, use “release” for VERSION, like this:

https://sqlite.org/src/tarball/release/sqlite.tar.gz

To get the latest trunk check-in, use “trunk” for VERSION:

https://sqlite.org/src/tarball/trunk/sqlite.tar.gz

And so forth. For ZIP archives and SQLite Archives, simply change the “/tarball/” element into either “/zip/” or “/sqlar/”, and maybe also change the name of the download file to have a “.zip” or “.sqlar” suffix.
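Since the URL scheme is regular, it is easy to generate from an automated tool. The following Python sketch builds the download URL for any of the three archive kinds described above. The helper name `archive_url` is hypothetical, and it only constructs the URL without downloading anything.

```python
from urllib.parse import quote

BASE = "https://sqlite.org/src"

# Path element -> download file suffix, as described in the text above.
SUFFIXES = {"tarball": ".tar.gz", "zip": ".zip", "sqlar": ".sqlar"}


def archive_url(version, kind="tarball"):
    """Build the download URL for a check-in hash prefix, branch, or tag."""
    if kind not in SUFFIXES:
        raise ValueError(f"unknown archive kind: {kind!r}")
    # Percent-encode the version so unusual names remain a single path element.
    return f"{BASE}/{kind}/{quote(version, safe='')}/sqlite{SUFFIXES[kind]}"


if __name__ == "__main__":
    print(archive_url("version-3.23.1"))
    print(archive_url("release", "zip"))
    print(archive_url("trunk", "sqlar"))
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlretrieve`.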

Fossil Access

Fossil is easy to install and use. Here are the steps for unix. (Windows is similar.)

  1. Download the self-contained Fossil executable from https://fossil-scm.org/fossil/uv/download.html and put the executable somewhere on your $PATH.
  2. mkdir ~/fossils
  3. fossil clone https://sqlite.org/src ~/fossils/sqlite.fossil
  4. mkdir ~/sqlite; cd ~/sqlite
  5. fossil open ~/fossils/sqlite.fossil

At this point you are ready to type “./configure; make” (or on Windows with MSVC, “nmake /f Makefile.msc”).

To change your checkout to a different version of SQLite, use the “update” command:

fossil update VERSION

Use “trunk” for VERSION to get the latest trunk version of SQLite. Or use a prefix of a cryptographic hash name, or the name of some branch or tag. See https://fossil-scm.org/fossil/doc/trunk/www/checkin_names.wiki for more suggestions on what names can be used for VERSION.

Use the “fossil ui” command from within the ~/sqlite checkout to bring up a local copy of the website.

Additional documentation on Fossil can be found at https://fossil-scm.org/fossil/doc/trunk/www/permutedindex.html

Do not be afraid to explore and experiment. Without a log-in you won’t be able to push back any changes you make, so you cannot damage the project.

Verifying Source Code Integrity

If you need to verify that the SQLite source code that you have is authentic and has not been modified in any way (perhaps by an adversary) that can be done using a few simple command-line tools. At the root of the SQLite source tree is a file named “manifest”. The manifest file contains the name of every other file in the source tree together with either a SHA1 or SHA3-256 hash for that file. (SHA1 is used for older files and SHA3-256 for newer files.) You can write a script to extract these hashes and verify them against the source code files. The hash name for the check-in is just the SHA3-256 hash of the “manifest” file itself, possibly with the last line omitted if the last line begins with “# Remove this line…”
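A verification script along these lines might look like the Python sketch below. It rests on several assumptions about the manifest layout that the text above does not spell out: file entries are lines of the form `F <name> <hash> [extras]`, spaces inside file names are written as `\s`, and a 40-digit hash means SHA1 while a 64-digit hash means SHA3-256. The function names are hypothetical. Treat it as a starting point and check it against the Fossil file format documentation.

```python
import hashlib
from pathlib import Path


def file_hash(data, hex_length):
    """Hash `data` with SHA1 (40 hex digits) or SHA3-256 (64 hex digits)."""
    if hex_length == 40:
        return hashlib.sha1(data).hexdigest()
    if hex_length == 64:
        return hashlib.sha3_256(data).hexdigest()
    raise ValueError(f"unexpected hash length: {hex_length}")


def verify_tree(root):
    """Return names of files that are missing or do not match the manifest."""
    root = Path(root)
    bad = []
    for line in (root / "manifest").read_text(encoding="utf-8").splitlines():
        fields = line.split(" ")
        # Assumption: file entries look like "F <name> <hash> [extras...]".
        if len(fields) < 3 or fields[0] != "F":
            continue
        # Assumption: spaces in file names are written as "\s".
        name = fields[1].replace("\\s", " ")
        expected = fields[2]
        path = root / name
        if not path.is_file() or file_hash(path.read_bytes(), len(expected)) != expected:
            bad.append(name)
    return bad


def checkin_hash(root):
    """SHA3-256 of the manifest, without a trailing '# Remove this line' line."""
    lines = (Path(root) / "manifest").read_bytes().splitlines(keepends=True)
    if lines and lines[-1].startswith(b"# Remove this line"):
        lines = lines[:-1]
    return hashlib.sha3_256(b"".join(lines)).hexdigest()


if __name__ == "__main__":
    mismatches = verify_tree(".")
    print("check-in:", checkin_hash("."))
    print("all files match" if not mismatches else f"mismatched: {mismatches}")
```

Run from the root of the source tree, the script prints the computed check-in hash, which can then be compared against the hash published for that check-in.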

#reads #sqlite #fossil #git #comparison