VistaDB and Locking: What’s It All About?
We finally released VistaDB 6.1 after considerable delay. Part of the delay was due to unexpected difficulties with implementing reliable locking on macOS and Linux platforms. That turned out to be trickier than expected, so we thought it would be a good basis for a blog discussing why VistaDB implements locking in the first place.
In this article, we’ll go over the basics of why “locking” is necessary, the tools available in .NET to perform locking, and a little about how VistaDB uses them.
Why Even Use Locking?
You use a database to store your application’s data reliably. But when multiple processes want to access your database at the same time, this can get tricky. If multiple processes can alter a shared database without any regulation, they can create significant problems, including:
- Lost Data: If multiple processes add new rows to the same table simultaneously, one process could overwrite the other’s new row and destroy the data.
- Inconsistent Updates: If two processes write updates to different parts of the data simultaneously—each based on the data that the other was about to update—they could each decide on the change to make without having seen the other’s update. Even if both updates succeed without being overwritten, they could still be the wrong changes based on the other updated data, which they had not seen.
- File Corruption: If significant meta-data changes were overwritten inconsistently, the database’s state could become corrupted. This means more lost data and a database (and application) that doesn’t work.
They need to take turns, which means that the database engine needs to coordinate whose turn it is to access any given portion of the database file.
A centralized database server like MS SQL Server has that coordination performed by the server process. It is the only entity writing directly to the database file(s). It can use locking, row versioning, and other techniques to coordinate between connections—coming from various threads, within various processes—and ensure that the state of the database is always consistent and stable.
However, an embedded database engine like VistaDB doesn’t have a central server to coordinate everything. It runs in the client application—which could be a single process, multiple instances of the same executable, instances of separate executable applications sharing a common database between them, or even processes executing on different computers accessing the database file over a network share. It all must work reliably—and without any centralized authority. Even using “non-locking” techniques like row versioning would require coordination between distributed connections, which basically requires locking.
Coordinating Between Processes
There are several tools in .NET to coordinate between threads (and some between processes), which in theory, we could use for VistaDB. Some are:
-
The C# lock statement: This is a shorthand for using Monitor.Enter(…) and Monitor.Exit(…) around a critical section of code—and those methods can also be called directly for more complex situations. This is helpful for multi-threaded programming within a single process, but it doesn’t help between processes.
-
Semaphore and Mutex: There is the Semaphore class for coordinating the sharing of a limited resource and a more specific Mutex class when you just need to ensure mutual-exclusion (much like the lock statement and the Monitor class but more flexible). These allow the optional use of names to create system-wide Semaphore or Mutex instances that other processes can share. They also have more efficient “Slim” versions, which can only exist within the current process.
-
ReaderWriterLockSlim: This class allows for multiple reading threads—or one writing thread—at any given time on the protected resource. It can be helpful when a resource is generally read more often than it is written (like a database!), but it can’t help between processes because it does not support named instances—even in the non-Slim class.
We actually use some of these internally, but none of these fully fit what we need to coordinate database file access in general. Monitor/lock and ReaderWriterLock can only be used within the same process, but we also need to coordinate between separate processes. We could use named-instances of Mutex between processes, but we don’t know the exact needs of every potential user of VistaDB; they could create a database file anywhere in the file-system, and each one would require its own coordination. It would be difficult to convert the file’s full path name to a legal system-wide Mutex instance name for all legal file path names, so we can’t really use it for this purpose.
These tools also couldn’t cover processes on other computers accessing the database file over a network share. The remote machine would only see the Mutex instance on its system, and there would be no coordination between processes on different computers. To support the range of usage that VistaDB allows, the coordination of file access really has to be done through the file-system itself.
Coordinating Access through the File-System
There are two tools in .NET for controlling and coordinating access through the file-system:
- Access/share control (in the FileStream constructor).
- Region locking (with the FileStream.Lock method).
“Access/share” refers to specifying the FileAccess and FileShare option flags in the FileStream constructor when opening a file. These “flags” enum types define a bit for Read, a bit for Write, and possibly some others. The read-write combinations possible for these two options are:
- FileAccess: Read-only, write-only, read/write
- FileShare: Read-only, write-only, read/write, none
If the access and share flags of a new FileStream are incompatible with any open connection to that same file anywhere else, it will be blocked and throw an exception. That way, each FileStream can control what other FileStream access it is willing to overlap with.
VistaDB uses this type of access control to implement and enforce the various “Open Mode” options from the connection string as follows:
The Connection's Open Mode | FileAccess | FileShare |
---|---|---|
ExclusiveReadWrite | Read, Write | None |
ExclusiveReadOnly | Read | None |
NonexclusiveReadWrite * | Read, Write | Read, Write |
NonexclusiveReadOnly * | Read | Read, Write |
SharedReadOnly | Read | Read |
* As named in past versions of VistaDB. (The modes replacing them as of VistaDB 6 could require more explanation.)
Notice that only the Nonexclusive modes include FileShare.Write. The other modes use FileShare none or FileShare read-only.
The ExclusiveReadWrite or ExclusiveReadOnly modes do not allow any other connection at the same time. As the only connections possible, they have free reign over the database without any need for further locking. On the other hand, SharedReadOnly mode only allows read connections (including itself). By forbidding any writers to have the file open, SharedReadOnly mode allows any number of read-only connections without further locking between them.
We don’t usually mean these modes when we talk about “locking” in VistaDB, but in a sense, they are “locking” the entire file while open.
And the Nonexclusive Modes?
NonexclusiveReadWrite (now called MultiProcessReadWrite in VistaDB 6) allows multiple connections that might want to write and read to be open at the same time. Other than closing and reopening the connection (or the underlying FileStream) every time it needs to write, how would these overlapping connections negotiate access with each other and avoid corruption? We need a secondary level of control.
This is where the second tool provided by .NET comes into play: the FileStream.Lock method. It allows an open FileStream to temporarily reserve exclusive ownership of a specified region of the file. A region of a file reserved by a FileStream cannot be read or written by any other FileStream until the owner releases it with the FileStream.Unlock method (or that FileStream is closed, which will cause any region locks it owns to be cleaned up by the OS). Thus, non-exclusive connections can still control access by using FileStream.Lock and make sure different connections take turns when accessing various parts of a database file.
Is That All?
In this article, we’ve talked about the basic need for locking in an embedded database engine like VistaDB, some basic tools for performing such locking in .NET, and a little bit of how VistaDB can use some of them to coordinate access to a database file.
If you want to learn more about VistaDB, feel free to take a hands-on look at how VistaDB coordinates access to the database file with VistaDB 6.1.