
Auto-generate unit tests & benchmarks #145

Open
regexident opened this issue Oct 10, 2019 · 0 comments
regexident commented Oct 10, 2019

tl;dr

We've gone way past the point where writing and maintaining highly redundant manual unit tests is any fun. If writing unit tests becomes tedious and a maintenance hell, people start neglecting them. So let's make use of the fact that our APIs (and thus the tests) almost all follow the same pattern, and automatically generate the tests for us, increasing test coverage even further at far less overall cost.

What?

A quick look at the /Tests directory reveals a suite of tests that pretty much all share the same pattern:

Our tests look something like this:

func test_<something>_float() {
    // Define a type-alias for convenience:
    typealias Scalar = Float

    // Create some dummy data:
    let lhs: [Scalar] = .monotonicNormalized()
    let rhs: [Scalar] = .monotonicNormalized()

    // Create a working copy of the dummy data:
    var actual: [Scalar] = lhs
    // Operate on the working copy:
    Surge.eladdInPlace(&actual, rhs)

    // Provide a ground-truth implementation to compare against:
    let expected = zip(lhs, rhs).map { $0 + $1 }

    // Compare the result:
    XCTAssertEqual(actual, expected, accuracy: 1e-8)
}

… differing from one another only in this line:

Surge.eladdInPlace(&actual, rhs)

… and this line:

let expected = zip(lhs, rhs).map { $0 + $1 }
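Seen that way, each test is fully determined by just two things: the operation under test and a ground-truth closure. As a sanity check that the pattern really is that regular, here is a minimal sketch of a generic helper expressing it (the name validateInPlace and its signature are hypothetical, not existing Surge test code):

import XCTest

// Hypothetical sketch, not existing Surge test code: every test above is an
// instance of this single shape, parametrized by `operation` and `expected`.
func validateInPlace<Scalar: FloatingPoint>(
    lhs: [Scalar],
    rhs: [Scalar],
    accuracy: Scalar,
    operation: (inout [Scalar], [Scalar]) -> Void,
    expected: (Scalar, Scalar) -> Scalar,
    file: StaticString = #file,
    line: UInt = #line
) {
    // Create a working copy of the dummy data and operate on it:
    var actual = lhs
    operation(&actual, rhs)

    // Compare element-wise against the ground-truth implementation:
    let expectedValues = zip(lhs, rhs).map { expected($0, $1) }
    for (a, e) in zip(actual, expectedValues) {
        XCTAssertEqual(a, e, accuracy: accuracy, file: file, line: line)
    }
}

Each existing test then differs only in the two arguments passed for operation and expected, which is exactly what makes generating them mechanically so attractive.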

And our benchmarks look something like this:

// benchmarks:
func test_add_in_place_array_array_float() {
    // Call convenience function:
    measure_inout_array_array(of: Float.self) { measure in
        // Call XCTest's measurement method:
        measureMetrics([.wallClockTime], automaticallyStartMeasuring: false) {
            // Perform the actual operations to be measured:
            measure(Surge.eladdInPlace)
        }
    }
}

… which is semantically equivalent to the more verbose:

func test_add_in_place_array_array_float() {
    typealias Scalar = Float

    let lhs: [Scalar] = produceLhs()
    let rhs: [Scalar] = produceRhs()

    // Call XCTest's measurement method:
    measureMetrics([.wallClockTime], automaticallyStartMeasuring: false) {
        var lhs = lhs
        
        startMeasuring()
        let _ = Surge.eladdInPlace(&lhs, rhs)
        stopMeasuring()
    }
}

… again differing from one another only in this line:

let _ = Surge.eladdInPlace(&lhs, rhs)

Why?

With now around 200 tests and over 60 benchmarks, maintaining our test and benchmark suites has become quite a chore. 😣

So this got me thinking: what if, instead of writing and maintaining hundreds of highly redundant test functions (for lack of macros in Swift), we had a way to have the tests and even the benchmarks generated auto-magically for us?

With this we could easily increase test coverage from "just the functions containing non-trivial logic" to "basically every public function, regardless of complexity", allowing us to catch regressions even in the most trivial wrapper functions (currently not covered) at hardly any additional maintenance burden.

How?

The basic idea is to get rid of all the existing unit tests and replace them with mere Sourcery annotations, like this:

// sourcery: test, floatAccuracy = 1e-5, expected = "add(array:array)"
public func add<L, R>(_ lhs: L, _ rhs: R) -> [Float] where L: UnsafeMemoryAccessible, R: UnsafeMemoryAccessible, L.Element == Float, R.Element == Float {
    // …
}

… given a fixture like this:

enum Fixture {
    enum Argument {
        static func `default`<Scalar>() -> Scalar { /* … */ }
        static func `default`<Scalar>() -> [Scalar] { /* … */ }
        static func `default`<Scalar>() -> Vector<Scalar> { /* … */ }
        static func `default`<Scalar>() -> Matrix<Scalar> { /* … */ }
    }
    enum Accuracy {
        static func `default`() -> Float { /* … */ }
        static func `default`() -> Double { /* … */ }
    }
    enum Expected {}
}

extension Fixture.Expected {
    static func add<Scalar: Numeric>(array lhs: [Scalar], array rhs: [Scalar]) -> [Scalar] {
        return zip(lhs, rhs).map { $0 + $1 }
    }
}
Function Annotation              | Description
---------------------------------|------------
test                             | Generate test function (Optional)
bench                            | Generate benchmark function (Optional)
expected = <function name>       | The fixture function to use as ground-truth (Required by test)
accuracy = <float literal>       | A custom testing accuracy (Optional, used by test)
floatAccuracy = <float literal>  | A custom Float-specific testing accuracy (Optional, used by test)
doubleAccuracy = <float literal> | A custom Double-specific testing accuracy (Optional, used by test)
arg<N> = <function name>         | The fixture factory function for the nth argument (Optional, used by test)
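
For illustration, a function that should get both a generated test and a generated benchmark, with a Double-specific accuracy, might be annotated like this (hypothetical example mirroring the add annotation above; the fixture name elmul(array:array) is likewise an assumption):

// sourcery: test, bench
// sourcery: doubleAccuracy = 1e-10, expected = "elmul(array:array)"
public func elmul<L, R>(_ lhs: L, _ rhs: R) -> [Double] where L: UnsafeMemoryAccessible, R: UnsafeMemoryAccessible, L.Element == Double, R.Element == Double {
    // …
}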

One would have Sourcery parse the source code and generate a test suite per source file (or, preferably, per type extension), looking for test and bench annotations.
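
To make that concrete, the output generated for the add annotation above might look roughly like this for the Float case (a sketch only; the suite name ArithmeticTests_Generated and the exact shape of the emitted function are assumptions about what such a template would produce):

// Generated by Sourcery; do not edit. (Sketch of possible output.)
import XCTest
import Surge

class ArithmeticTests_Generated: XCTestCase {
    func test_add_array_array_float() {
        typealias Scalar = Float

        // Dummy arguments provided by the fixture:
        let lhs: [Scalar] = Fixture.Argument.default()
        let rhs: [Scalar] = Fixture.Argument.default()

        // Function under test:
        let actual: [Scalar] = Surge.add(lhs, rhs)

        // Ground truth taken from the annotated fixture function:
        let expected: [Scalar] = Fixture.Expected.add(array: lhs, array: rhs)

        // Compare using the annotated accuracy (floatAccuracy = 1e-5):
        for (a, e) in zip(actual, expected) {
            XCTAssertEqual(a, e, accuracy: 1e-5)
        }
    }
}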

The current unit tests make only minimal use of customized lhs/rhs dummy values, so arg<N> will rarely be needed, but a few tests do require custom data to test against.

Also, given that Surge has a rather restricted set of types that are expected as function arguments (Scalar, Collection where Element == Scalar, Vector<Scalar>, Matrix<Scalar>), we should be able to match against them rather naïvely, allowing us to elide most of the data we would otherwise have to specify explicitly.
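
Concretely, the generated code could lean on plain overload resolution to pick the right fixture for each argument, so the template itself never needs to spell out dummy data (sketch, using the Fixture outlined above):

import Surge

// The expected type alone selects the matching `default()` overload:
let scalar: Float = Fixture.Argument.default()           // Scalar
let array: [Float] = Fixture.Argument.default()          // Collection of Scalar
let vector: Vector<Float> = Fixture.Argument.default()   // Vector<Scalar>
let matrix: Matrix<Float> = Fixture.Argument.default()   // Matrix<Scalar>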

@regexident regexident added this to the 3.0 milestone Oct 10, 2019
@regexident regexident self-assigned this Oct 10, 2019
@regexident regexident mentioned this issue Oct 13, 2019