VistaDB 6 employs a new data access strategy for dramatically faster queries
For more information on what’s new in VistaDB 6, see What’s Coming in VistaDB 6.
VistaDB has used a carefully crafted object-oriented approach in its internal implementation of reading and writing data. This provided the expected benefits of OOP - namely minimized implementation risk and simplified extensibility. Unfortunately, it also led to a substantial number of objects being created to read each row, and to data being copied between layers of the code. For VistaDB 6, we’ve rewritten the internal data access code to dramatically reduce object creation and data copies. The performance gains are impressive: a nearly 40% improvement when selecting the entire contents of a modest-sized table, and much better scaling across several concurrent threads with minimal drop in individual query performance.
It’s the hardest, riskiest change we’ve made to VistaDB in five years, but it also yields our biggest gains yet - all without any code changes in your application.
Cautious Code Creates a Performance Problem
While doing detailed profiling of VistaDB 5, we identified that an excessive amount of time was being spent in garbage collection - which was being driven by a large number of short-lived objects created while reading data. For example, take a simple query:
SELECT * FROM dbo.Table_With_25_Columns;
To access a table’s data, VistaDB creates a template row object with a collection of column objects, which neatly describes the schema the table was loaded with. As each row is read, a copy of this template is made, the raw binary data for the row is copied from the data page, and each column value is deserialized by its column definition object. This has excellent isolation characteristics: for example, if you are changing the schema of a table and copying its data, you can trust that each row object can be interpreted because it has its own copy of the schema. The big downside is that even a simple select-all query creates a lot of objects: two per column, plus the row object itself, for every row. For a table with 25 columns, even if you’re only reading one of them, it takes a minimum of 51 objects to deserialize and represent the data as it moves through the engine.
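To make the arithmetic concrete, here’s a minimal sketch of the template-row pattern described above - hypothetical class names, with Java standing in for the .NET implementation - counting the per-row allocations:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not VistaDB's actual classes) of a template-row design:
// each row read copies every column definition and boxes every value, so a
// 25-column table costs at least 51 allocations per row
// (25 column copies + 25 boxed values + 1 row object).
class ColumnDef {
    final String name;
    ColumnDef(String name) { this.name = name; }
    // Stand-in for real deserialization from a data page.
    Object deserialize(byte[] page, int ordinal) { return (int) page[ordinal]; }
}

class Row {
    final List<ColumnDef> columns;
    final List<Object> values;
    Row(List<ColumnDef> columns, List<Object> values) {
        this.columns = columns;
        this.values = values;
    }
}

class Table {
    final List<ColumnDef> template = new ArrayList<>();
    long allocations = 0; // counts only the per-row objects described above

    Table(int columnCount) {
        for (int i = 0; i < columnCount; i++) template.add(new ColumnDef("col" + i));
    }

    Row readRow(byte[] page) {
        List<ColumnDef> columns = new ArrayList<>();
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < template.size(); i++) {
            ColumnDef copy = new ColumnDef(template.get(i).name); // per-row column object
            allocations++;
            values.add(copy.deserialize(page, i));                // boxed value object
            allocations++;
            columns.add(copy);
        }
        allocations++;                                            // the row object itself
        return new Row(columns, values);
    }
}
```

Every one of those 51 objects per row is garbage the moment the next row is read - exactly the short-lived, Gen 0 population described below.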
It’s made worse if the SQL has subqueries (e.g., projections), as each subquery has its own virtual schema - and so its own template row with its own column objects that the data must be copied into as it is computed. That means even a trivial subquery can dramatically increase object creation overhead:
SELECT EX.* FROM (SELECT * FROM dbo.Table_With_25_Columns) EX;
This effectively doubles the number of objects created for each row read. Even with tables with a modest number of columns, this caused so many objects to be created that running two queries at the same time produced, at best, the same total rows per second - effectively halving the performance of each query. It gets worse from there, as time spent in Gen 0 garbage collection quickly ramps up to consume nearly all execution time.
This problem hid for a long time because the time isn’t really spent creating the objects but in garbage collection - something many profilers have a difficult time representing reliably. It took a customer reporting that VistaDB appeared to be single threaded (since running multiple queries at the same time didn’t improve total throughput) for us to break out the Visual Studio Concurrency Visualizer and see what was going on.
What it showed was the garbage collector consistently blocking the query threads from running. The Generation 0 collector blocks all threads - even in the latest versions of .NET - because it’s designed to quickly clean up a small number of very short-lived objects. Unfortunately, VistaDB created an enormous number of them. We really can’t blame the garbage collector - we were doing something far outside the norm for .NET.
Minimizing Object Creation for Top Performance
For VistaDB 6, we’ve gone on a mission to eliminate object allocations, particularly ones repeated for each row. This sounds simple, but it involved a series of complex tradeoffs. The first challenge is that much of the API (including the ADO.NET API, which we can’t change) uses object as the type for column values. For any value-type column (int, boolean, etc.) this requires boxing - creating an object to wrap the value type. In many cases we would immediately cast the value back to its native type, shedding the box - which then had to be collected in Gen 0. We modified our internal APIs to cache boxed values and added alternate type-correct properties that bypass boxing entirely. This reduced object allocations per row by over 60%.
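As a rough illustration of both techniques - a cached box for the object-typed path and a type-correct accessor that never boxes - here’s a hypothetical sketch in Java (standing in for .NET; none of these names are VistaDB’s actual API):

```java
// Sketch of the two techniques from the text:
// (1) cache the boxed Object for a value so repeated reads through an
//     object-typed API don't re-box on every call, and
// (2) expose a type-correct accessor that returns the primitive directly
//     and never boxes at all.
class IntColumnValue {
    private int value;
    private Object boxed;       // lazily created, reused until the value changes

    void set(int v) {
        value = v;
        boxed = null;           // invalidate the cached box on write
    }

    // Object-typed path (required by ADO.NET-style interfaces): box once, reuse.
    Object getObject() {
        if (boxed == null) boxed = value;  // single boxing allocation
        return boxed;
    }

    // Type-correct path: no boxing at all.
    int getInt32() { return value; }
}
```

Callers that know the column type use the typed accessor and allocate nothing; callers stuck behind the object-typed interface at least reuse one box per value instead of one per read.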
To reuse the template for each row (including its column collection) we had to separate where we stored per-row column values once they were deserialized, ideally without creating any additional objects. We experimented with the performance of several different approaches and finally settled on creating an object array for each row’s data. We then reference the value out of this array by ordinal wherever it is needed. To maximize performance we cache other items in the same object array - like modification tracking, string values for comparison, and additional column metadata for extended data types like VarChar(max).
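A minimal sketch of what such a per-row object array might look like - column values referenced by ordinal, with an extra slot reused for bookkeeping such as modification tracking (hypothetical layout, Java standing in for .NET):

```java
// Sketch (hypothetical layout): each row's state lives in a single Object[],
// indexed by column ordinal, with extra slots at the end reused for
// bookkeeping such as a modification flag. The template row and its column
// collection are shared across all rows; only this array is per-row.
class RowBuffer {
    private static final int EXTRA_SLOTS = 1;   // last slot: modification flag
    private final Object[] slots;

    RowBuffer(int columnCount) {
        slots = new Object[columnCount + EXTRA_SLOTS];
        slots[slots.length - 1] = Boolean.FALSE;
    }

    // Low-level code reads values by ordinal, straight out of the array.
    Object getValue(int ordinal) { return slots[ordinal]; }

    void setValue(int ordinal, Object value) {
        slots[ordinal] = value;
        slots[slots.length - 1] = Boolean.TRUE; // mark the row as modified
    }

    boolean isModified() { return (Boolean) slots[slots.length - 1]; }
}
```

One array per row replaces the dozens of per-row objects in the old design, and piggybacking metadata in the same array means the bookkeeping costs no extra allocations either.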
At the lowest level of the system we work directly with these object arrays to maximize throughput, while higher-level code goes through a new API that makes it easy to ensure you’re accessing the right column data, just as before. It’s a different coding pattern, but it offers the same safety guarantees at the higher levels of our engine implementation.
To avoid extra copies as data moves from the lowest level of the engine to the ADO.NET layer, we refactored the entire approach to use a common base interface instead of concrete implementations and added an intelligent reference-counting layer that avoids duplicating data until it is modified.
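Here’s one way such a reference-counted, copy-on-write row buffer could be sketched - hypothetical classes, single-threaded for brevity (a real engine would need thread-safe counting):

```java
// Sketch of reference counting with copy-on-write: layers of the engine share
// one value buffer, and only the first write while shared forces a private
// copy - so handing a row across a layer boundary costs no data copy at all.
class RowData {
    Object[] values;
    int refCount;
    RowData(Object[] values, int refCount) { this.values = values; this.refCount = refCount; }
}

class RowHandle {
    private RowData data;

    RowHandle(Object[] values) { this.data = new RowData(values, 1); }
    private RowHandle(RowData shared) { this.data = shared; }

    // Passing a row to another layer shares the buffer instead of copying it.
    RowHandle share() {
        data.refCount++;
        return new RowHandle(data);
    }

    Object get(int ordinal) { return data.values[ordinal]; }

    // The first write while shared makes a private copy; later writes are in place.
    void set(int ordinal, Object value) {
        if (data.refCount > 1) {
            data.refCount--;                            // detach from the shared buffer
            data = new RowData(data.values.clone(), 1); // copy-on-write
        }
        data.values[ordinal] = value;
    }

    boolean sharesBufferWith(RowHandle other) { return data == other.data; }
}
```

Read-heavy paths - the overwhelming majority in a query engine - never pay for the copy; only a writer does, and only once.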
What’s the Bottom Line?
In total this was the largest change set we’ve ever made to VistaDB - approximately 20,000 lines of source code modified in the core of the engine. But we feel it’s worth it: this change alone nets our largest improvement yet in query performance. Here’s our test query:
SELECT * FROM dbo.Table_With_46_Columns;
The table of real-world data we used had 300,000 rows, for a total of 13,800,000 column values read. Comparing VistaDB 5.8 and 6.0, we get:
| Version | Duration | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|
| VistaDB 5.8 | 6.173 s | 530,000 | 264,000 | 17,000 | 3.11 GB |
| VistaDB 6.0 | 3.782 s | 276,000 | 136,000 | 4,000 | 1.62 GB |
Version 6 is 38.7% faster, but just as importantly it allocated about half the memory and put roughly half the stress on the GC compared to version 5. Better still, the performance improvements scale much better across threads because of the significantly reduced pressure on the garbage collector. We’re going to continue building on this new low-level data access method to speed up more queries and scenarios.