You are here

How We Test NuoSQL

Here in the NuoDB Engineering organization, we run pretty lean.  One of the most important things we do as engineers is ensure that our features and bugfixes are well tested. The NuoSQL team uses a variety of techniques to build a testable and tested product.

Test-Driven Development

Test-driven development has become a buzzword, but it has always been a standard way of doing things in the programming language world.  User stories are the rationale for a new language feature, and use cases are, by and large, immediately useful as test cases.  For example, when we added the ANALYZE TABLE feature we used our testcases to check that the workflow we had specified was useful.

With a language like SQL, a good deal of our development process is also driven by standards -- both the de jure ISO standard and de facto standards in the industry. But again, writing the test is an important way for the engineer who implements the feature to become familiar with how it works. Sometimes we become familiar with how a language element works differently in different implementations. :)

We have a variety of testing frameworks, which typically use a variation on one of the classic automated testing methodologies.

"Known Good" Testing

A "known good" test is the most specific kind of test we write. Depending on the test environment (Java/JDBC, C++/ODBC, or another host language and driver), we may be able to use a test framework such as JUnit, or we may resort to the time-honored diff program to compare our tests' output to a sample of output that has been checked by a human being and designated as the correct output of the test.

A typical "known good" test is directly derived from a use case, and they're our first line of defense against regressions. Our development process specifies that every new feature -- even a minor bugfix -- needs testing. One important part of our code review process is verifying that the testcases are present and correct.

Although "known good" testing is extremely valuable, it does have some drawbacks.

  • Testing with blinders: a developer who's unaware of behavior of his or her feature should exhibit will also not write tests for it. Having a separate engineer writing tests is very helpful, but that contributes to the next problem with "known good" testing.
  • Expensive: "known good" tests are expensive artifacts.  The tests require an initial investment development investment, and they often require maintenance during their lifecycle.
  • Typically not complete: because "known good" tests are usually hand written, they usually only test a subset of the possible combinations of language elements that might interact with a feature.  For example, I recently added tests for the ISO SQL construct <row value predicand> to our 2.0 codebase. Since we only accept a <row value predicand> in limited number of contexts, I tried to be as complete as possible.  The number of tests quickly explodes, though:

(1,1) = (1,1)
(1,2) = (1,1)
(2,1) = (1,1)
(1,1) = (1,2)
(1,1) = (2,1)

and on with all combinations.  And we haven't even started on forms like (1,(1,2)) = (1,(2,1)) yet, much less (1,1) = (f1,f2).  And then there's the <>, >, >=, <, and <= operators to test. It takes a lot of determination and caffeine to even get close to a complete test.

But we are not defenseless against this explosion of testcases; we use the venerable tactic of using the machine to test itself.

Differential Testing

Differential testing uses a separate system as the "known good" element of a test; the differential test generator randomly generates a test case and compares the results it gets from the "good" baseline system to the result of the new system under test.  Results may, again, be checked by a specialized testing framework or by a simple diff of the two outputs.  Because test generation and test verification are automated, the testing is typically more complete than hand-coded testing.

Our primary differential testing tool is a Random Query Generator, a tool that generates random (valid) SQL statements and verifies the results using either a different version of NuoDB or a different database system as an external "oracle."

Differential Testing in Action: Testing our New Parser

When we replaced our hand-coded legacy parser with a YACC-based parser in the NuoDB 2.0 codebase, we needed a differential testing tool that emitted both valid and invalid constructs, and we built test scaffolding into the compiler to run both the old and new parsers and compare the results.  This allowed us to reverse-engineer the legacy parser to a high degree of accuracy.

The workload generator was a recursive-descent generator; we chose this design because the legacy parser was a recursive-descent design and we copied logic from the parser into the generator.  It wasn't a straight copy-paste job, but the similarity of design helped a lot.

The logic in the compiler is very simple:

AST* newResult = newParser->parse();
AST* oldResult = oldParser->parseStatement();
if (!newResult.equals(oldResult)) {
    throw CompilerProblem("Result mismatch:\n%s\n%s" newResult->asTree(), oldResult->asTree());
} else {
    // Ensure diagnostics are equivalent.

Self-Verifying Tests

A self-verifying test is one that can tell you whether it succeeded or failed.  A very simple self-verifying test is:

SELECT 'succeeded' FROM DUAL WHERE 1 + 2 = 3;&lt;br&gt;SELECT 'FAILED' FROM DUAL WHERE 1 + 1 = 3;

This test checks integer addition with small integers. Many of our "known good" tests are written after this fashion.  For example, the ANALYZE TABLE test I referred to above contains:

CREATE TABLE padded_decimal (f1 DECIMAL(5,2));
INSERT INTO padded_decimal VALUES (0.1), (.1), ('0.1'), ('0.1 '), (' 0.1'), ('.1'), (0.10), (.10), ('0.10'), ('.10'), (0.1e0), (1e-1);
INSERT INTO padded_decimal VALUES (0), (+0), (-0), (0.00), (00.0), (.0);
CREATE INDEX padded_decimal_index ON padded_decimal(f1);
INSERT INTO padded_decimal VALUES (0.1), (.1), ('0.1'), ('0.1 '), (' 0.1'), ('.1'), (0.10), (.10), ('0.10'), ('.10'), (0.1e0), (1e-1);
INSERT INTO padded_decimal VALUES (0), (+0), (-0), (0.00), (00.0), (.0);
ANALYZE TABLE padded_decimal;

This test should output a row whose two columns both have the value TRUE.

Writing randomly generated self-verifying tests is a tricky job, as you might imagine. I have written random program self-verifying generators for procedural languages, but as yet I have not written one for a database. The basic technique is not dissimilar to these two examples: one generally works backwards from a desired result by randomly picking language elements that fit the constraints of the problem and compute a reasonable part of a sub-problem.


Add new comment