# Running tests

## Coverage analysis

A coverage analysis report is available online. You can build it yourself:

```console
# nix build .#hydraJobs.coverage
# xdg-open ./result/coverage/index.html
```

Extensive records of build metrics, such as test coverage over time, are also available online.

## Unit-tests

The unit tests are defined using the googletest and rapidcheck frameworks.

### Source and header layout

An example of some files, demonstrating much of what is described below:

```
src
├── libexpr
│   ├── value/context.hh
│   ├── value/context.cc
│   │
│   …
│   └── tests
│       ├── value/context.hh
│       ├── value/context.cc
│       │
│       …
│
├── unit-test-data
│   ├── libstore
│   │   ├── worker-protocol/content-address.bin
│   │   …
│   …
…
```

The unit tests for each Nix library (`libnixexpr`, `libnixstore`, etc.) live inside a directory `src/${library_shortname}/tests` within the directory for the library (`src/${library_shortname}`).

The data is in `unit-test-data`, with one subdirectory per library, named the same as the directory for the library's code. For example, `libnixstore` code is in `src/libstore`, and its test data is in `unit-test-data/libstore`. The path to the `unit-test-data` directory is passed to the unit test executable with the environment variable `_NIX_TEST_UNIT_DATA`.

> **Note**
>
> Due to the way googletest works, downstream unit test executables will actually include and re-run upstream library tests. Therefore it is important that the same value for `_NIX_TEST_UNIT_DATA` be used with the tests for each library. That is why we have the test data nested within a single `unit-test-data` directory.
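
For illustration, a unit test executable can also be invoked by hand with the variable set; the executable path below is hypothetical, since the Makefile normally arranges this for you:

```console
$ _NIX_TEST_UNIT_DATA=unit-test-data ./src/libstore/tests/libnixstore-tests
```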

### Running tests

You can run the whole test suite with `make check`, or the tests for a specific component with `make libfoo-tests_RUN`. Finer-grained filtering is also possible using the `--gtest_filter` command-line option, or the `GTEST_FILTER` environment variable.
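
For example, to run only the `WorkerProtoTest` cases from `libnixstore` (a sketch; substitute whichever test target and filter pattern you need):

```console
$ GTEST_FILTER='WorkerProtoTest.*' make libstore-tests-exe_RUN
```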

### Characterization testing

See below for a broader discussion of characterization testing.

Like with the functional characterization, `_NIX_TEST_ACCEPT=1` is also used. For example:

```console
$ _NIX_TEST_ACCEPT=1 make libstore-tests-exe_RUN
...
[  SKIPPED ] WorkerProtoTest.string_read
[  SKIPPED ] WorkerProtoTest.string_write
[  SKIPPED ] WorkerProtoTest.storePath_read
[  SKIPPED ] WorkerProtoTest.storePath_write
...
```

will regenerate the "golden master" expected result for the `libnixstore` characterization tests. The characterization tests will mark themselves "skipped" since they regenerated the expected result instead of actually testing anything.
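
After accepting the new golden masters, it is worth re-running the same target without the variable to confirm that the tests now pass against the regenerated expected results:

```console
$ make libstore-tests-exe_RUN
```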

## Functional tests

The functional tests reside under the `tests/functional` directory and are listed in `tests/functional/local.mk`. Each test is a bash script.
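
As a rough sketch, a test script sources the shared harness and then exercises Nix commands; the helper and fixture names below are illustrative of the pattern used by existing tests:

```bash
source common.sh

clearStore

# Build a fixture; the test fails if the build does
nix-build dependencies.nix --no-out-link
```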

### Running the whole test suite

The whole test suite can be run with:

```console
$ make install && make installcheck
ran test tests/functional/foo.sh... [PASS]
ran test tests/functional/bar.sh... [PASS]
...
```

### Grouping tests

Sometimes it is useful to group related tests so they can be easily run together without running the entire test suite. Each test group is in a subdirectory of `tests/functional`. For example, `tests/functional/ca/local.mk` defines a `ca` test group for content-addressed derivation outputs.

That test group can be run like this:

```console
$ make ca.test-group -j50
ran test tests/functional/ca/nix-run.sh... [PASS]
ran test tests/functional/ca/import-derivation.sh... [PASS]
...
```

The test group is defined in Make like this:

```make
$(test-group-name)-tests := \
  $(d)/test0.sh \
  $(d)/test1.sh \
  ...

install-tests-groups += $(test-group-name)
```
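
For instance, the `ca` group shown above might be declared along these lines (a sketch; the real `tests/functional/ca/local.mk` lists more scripts than the two from the sample output):

```make
ca-tests := \
  $(d)/nix-run.sh \
  $(d)/import-derivation.sh

install-tests-groups += ca
```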

### Running individual tests

Individual tests can be run with `make`:

```console
$ make tests/functional/${testName}.sh.test
ran test tests/functional/${testName}.sh... [PASS]
```

or without `make`:

```console
$ ./mk/run-test.sh tests/functional/${testName}.sh
ran test tests/functional/${testName}.sh... [PASS]
```

To see the complete output, one can also run:

```console
$ ./mk/debug-test.sh tests/functional/${testName}.sh
+ foo
output from foo
+ bar
output from bar
...
```

The test script will then be traced with `set -x` and the output displayed as it happens, regardless of whether the test succeeds or fails.

### Debugging failing functional tests

When a functional test fails, it usually does so somewhere in the middle of the script.

To figure out what's wrong, it is convenient to run the test normally up to the failing `nix` command, and then run that command under a debugger like GDB.

For example, if the script looks like:

```bash
foo
nix blah blub
bar
```

edit it like so:

```diff
 foo
-nix blah blub
+gdb --args nix blah blub
 bar
```

Then, running the test with `./mk/debug-test.sh` will drop you into GDB once the script reaches that point:

```console
$ ./mk/debug-test.sh tests/functional/${testName}.sh
...
+ gdb --args nix blah blub
GNU gdb (GDB) 12.1
...
(gdb)
```

One can debug the Nix invocation in all the usual ways. For example, enter `run` to start it.
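
For instance, one might set a breakpoint before launching; breaking on `main` works for any binary:

```console
(gdb) break main
(gdb) run
```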

### Troubleshooting

Sometimes running tests in the development shell may leave artefacts in the local repository. To remove any traces of that:

```console
git clean -x --force tests
```
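
If in doubt, do a dry run first (`-n` makes `git clean` only report what it would delete):

```console
git clean -x -n tests
```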

## Characterization testing

Occasionally, Nix utilizes a technique called Characterization Testing as part of the functional tests. This technique is to include the exact output/behavior of a former version of Nix in a test in order to check that Nix continues to produce the same behavior going forward.

For example, this technique is used for the language tests, to check both the printed final value (if evaluation was successful) and any errors and warnings encountered.

It is frequently useful to regenerate the expected output. To do that, rerun the failed test(s) with `_NIX_TEST_ACCEPT=1`. For example:

```console
_NIX_TEST_ACCEPT=1 make tests/functional/lang.sh.test
```

This convention is shared with the characterization unit tests too.

An interesting situation to document is the case when these tests are "overfitted". The language tests are, again, an example of this. The expected successful output of evaluation is supposed to be highly stable; we do not intend to make breaking changes to (the stable parts of) the Nix language. However, the errors and warnings during evaluation (successful or not) are not stable in this way. We are free to change how they are displayed at any time.

It may be surprising that we would test non-normative behavior like diagnostic outputs. Diagnostic outputs are indeed not a stable interface, but they are still important to users. By recording the expected output, the test suite guards against accidental changes and ensures that the result (not just the code that implements it) of the diagnostic code paths is under code review. Regressions are caught, and improvements always show up in code review.

To ensure that characterization testing doesn't make it harder to intentionally change these interfaces, there must always be an easy way to regenerate the expected output, as we do with `_NIX_TEST_ACCEPT=1`.

## Integration tests

The integration tests are defined in the Nix flake under the `hydraJobs.tests` attribute. These tests include everything that needs to interact with external services or run Nix in a non-trivial distributed setup. Because these tests are expensive and require more than what the standard GitHub Actions setup provides, they only run on the `master` branch (on <https://hydra.nixos.org/jobset/nix/master>).

You can run them manually with `nix build .#hydraJobs.tests.{testName}` or `nix-build -A hydraJobs.tests.{testName}`.

## Installer tests

After a one-time setup, the Nix repository's GitHub Actions continuous integration (CI) workflow can test the installer each time you push to a branch.

Creating a Cachix cache for your installer tests and adding its authorization token to GitHub enables two installer-specific jobs in the CI workflow:

- The `installer` job generates installers for the platforms below and uploads them to your Cachix cache:

  - `x86_64-linux`
  - `armv6l-linux`
  - `armv7l-linux`
  - `x86_64-darwin`

- The `installer_test` job (which runs on `ubuntu-latest` and `macos-latest`) will try to install Nix with the cached installer and run a trivial Nix command.

### One-time setup

1. Have a GitHub account with a fork of the Nix repository.
2. At cachix.org:
   - Create or log in to an account.
   - Create a Cachix cache using the format `<github-username>-nix-install-tests`.
   - Navigate to the new cache > Settings > Auth Tokens.
   - Generate a new Cachix auth token and copy the generated value.
3. At github.com:
   - Navigate to your Nix fork > Settings > Secrets > Actions > New repository secret.
   - Name the secret `CACHIX_AUTH_TOKEN`.
   - Paste the copied value of the Cachix cache auth token.


### Using the CI-generated installer for manual testing

After the CI run completes, you can check the output to extract the installer URL:

1. Click into the detailed view of the CI run.
2. Click into any `installer_test` run (the URL you're here to extract will be the same in all of them).
3. Click into the `Run cachix/install-nix-action@v...` step and click the detail triangle next to the first log line (it will also be `Run cachix/install-nix-action@v...`).
4. Copy the value of `install_url`.
5. To generate an install command, plug this `install_url` and your GitHub username into this template:

   ```console
   curl -L <install_url> | sh -s -- --tarball-url-prefix https://<github-username>-nix-install-tests.cachix.org/serve
   ```