Merge pull request #6420 from nix-community/doc-what-is-nix

Document what Nix *is*
2022-08-04 20:49:01 +02:00 · 2022-08-04 20:49:01 +02:00 · 81e101345f
parent 7d1280bbaf 39d32ac4c6
commit 81e101345f
9 changed files with 477 additions and 2 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -22,7 +22,7 @@ perl/Makefile.config
 /doc/manual/src/SUMMARY.md
 /doc/manual/src/command-ref/new-cli
 /doc/manual/src/command-ref/conf-file.md
-/doc/manual/src/expressions/builtins.md
+/doc/manual/src/language/builtins.md

 # /scripts/
 /scripts/nix-profile.sh
--- a/doc/manual/local.mk
+++ b/doc/manual/local.mk
@@ -1,5 +1,9 @@
 ifeq ($(doc_generate),yes)

+MANUAL_SRCS := \
+  $(call rwildcard, $(d)/src, *.md) \
+  $(call rwildcard, $(d)/src, */*.md)
+
 # Generate man pages.
 man-pages := $(foreach n, \
  nix-env.1 nix-build.1 nix-shell.1 nix-store.1 nix-instantiate.1 \
@@ -97,7 +101,7 @@ doc/manual/generated/man1/nix3-manpages: $(d)/src/command-ref/new-cli
 	done
 	@touch $@

-$(docdir)/manual/index.html: $(MANUAL_SRCS) $(d)/book.toml $(d)/anchors.jq $(d)/custom.css $(d)/src/SUMMARY.md $(d)/src/command-ref/new-cli $(d)/src/command-ref/conf-file.md $(d)/src/language/builtins.md $(call rwildcard, $(d)/src, *.md)
+$(docdir)/manual/index.html: $(MANUAL_SRCS) $(d)/book.toml $(d)/anchors.jq $(d)/custom.css $(d)/src/SUMMARY.md $(d)/src/command-ref/new-cli $(d)/src/command-ref/conf-file.md $(d)/src/language/builtins.md
 	$(trace-gen) RUST_LOG=warn mdbook build doc/manual -d $(DESTDIR)$(docdir)/manual

 endif
--- a/doc/manual/src/SUMMARY.md.in
+++ b/doc/manual/src/SUMMARY.md.in
@@ -59,6 +59,12 @@
@manpages@
  - [Files](command-ref/files.md)
    - [nix.conf](command-ref/conf-file.md)
+- [Architecture](architecture/architecture.md)
+  - [Store](architecture/store/store.md)
+    - [Closure](architecture/store/store/closure.md)
+    - [Build system terminology](architecture/store/store/build-system-terminology.md)
+  - [Store Path](architecture/store/path.md)
+  - [File System Object](architecture/store/fso.md)
 - [Glossary](glossary.md)
 - [Contributing](contributing/contributing.md)
  - [Hacking](contributing/hacking.md)
--- a/doc/manual/src/architecture/architecture.md
+++ b/doc/manual/src/architecture/architecture.md
@@ -0,0 +1,79 @@
+# Architecture
+
+*(This chapter is unstable and a work in progress. Incoming links may rot.)*
+
+This chapter describes how Nix works.
+It should help users understand why Nix behaves as it does, and it should help developers understand how to modify Nix and how to write similar tools.
+
+## Overview
+
+Nix consists of [hierarchical layers][layer-architecture].
+
+```
+-----------------------------------------------------------------+
+| Nix                                                             |
+|                  [ command line interface ]-------,             |
+|                               |                   |             |
+|                           evaluates               |             |
+|                               |                manages          |
+|                               V                   |             |
+|                  [ configuration language  ]      |             |
+|                               |                   |             |
+| +-----------------------------|-------------------V-----------+ |
+| | store                  evaluates to                         | |
+| |                             |                               | |
+| |             referenced by   V       builds                  | |
+| |  [ build input ] ---> [ build plan ] ---> [ build result ]  | |
+| |                                                             | |
+| +-------------------------------------------------------------+ |
+-----------------------------------------------------------------+
+```
+
+At the top is the [command line interface](../command-ref/command-ref.md), translating from invocations of Nix executables to interactions with the underlying layers.
+
+Below that is the [Nix expression language](../expressions/expression-language.md), a [purely functional][purely-functional-programming] configuration language.
+It is used to compose expressions which ultimately evaluate to self-contained *build plans*, used to derive *build results* from referenced *build inputs*.
+
+The command line and Nix language are what users interact with most.
+
+> **Note**
+> The Nix language itself does not have a notion of *packages* or *configurations*.
+> As far as we are concerned here, the inputs and results of a build plan are just data.
+
+Underlying these is the [Nix store](./store/store.md), a mechanism to keep track of build plans, data, and references between them.
+It can also execute build plans to produce new data.
+
+A build plan is a series of *build tasks*.
+Each build task has a special build input which is used as *build instructions*.
+The result of a build task can be input to another build task.
+
+```
+-----------------------------------------------------------------------------------------+
+| store                                                                                   |
+|                   .................................................                     |
+|                   :  build plan                                   :                     |
+|                   :                                               :                     |
+|  [ build input ]-----instructions-,                               :                     |
+|                   :               |                               :                     |
+|                   :               v                               :                     |
+|  [ build input ]----------->[ build task ]--instructions-,        :                     |
+|                   :                                      |        :                     |
+|                   :                                      |        :                     |
+|                   :                                      v        :                     |
+|                   :                               [ build task ]----->[ build result ]  |
+|  [ build input ]-----instructions-,                      ^        :                     |
+|                   :               |                      |        :                     |
+|                   :               v                      |        :                     |
+|  [ build input ]----------->[ build task ]---------------'        :                     |
+|                   :               ^                               :                     |
+|                   :               |                               :                     |
+|  [ build input ]------------------'                               :                     |
+|                   :                                               :                     |
+|                   :                                               :                     |
+|                   :...............................................:                     |
+|                                                                                         |
+-----------------------------------------------------------------------------------------+
+```
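
The dependency structure in the diagram above can be sketched in Python. The task names and the `plan` mapping are hypothetical; the sketch only illustrates that a build task can run once the tasks producing its inputs have finished, and makes no claim about how Nix actually schedules builds.

```python
from graphlib import TopologicalSorter

# Hypothetical build plan: each build task maps to the build tasks
# whose results it consumes (plain build inputs are left out).
plan = {
    "task-a": set(),
    "task-b": set(),
    "task-c": {"task-a", "task-b"},  # final task, yields the build result
}


def build_order(plan):
    """Return one valid order in which the build tasks could run."""
    return list(TopologicalSorter(plan).static_order())


order = build_order(plan)
```

Any order returned this way places `task-c` last, because both of its inputs are results of other build tasks.
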
+
+[layer-architecture]: https://en.m.wikipedia.org/wiki/Multitier_architecture#Layers
+[purely-functional-programming]: https://en.m.wikipedia.org/wiki/Purely_functional_programming
--- a/doc/manual/src/architecture/store/fso.md
+++ b/doc/manual/src/architecture/store/fso.md
@@ -0,0 +1,69 @@
+# File System Object
+
+The Nix store uses a simple file system model for the data it holds in [store objects](store.md#store-object).
+
+Every file system object is one of the following:
+
+ - File: an executable flag, and arbitrary data for contents
+ - Directory: mapping of names to child file system objects
+ - [Symbolic link][symlink]: may point anywhere.
+
+We call a store object's outermost file system object the *root*.
+
+    data FileSystemObject
+      = File      { isExecutable :: Bool, contents :: Bytes }
+      | Directory { entries :: Map FileName FileSystemObject }
+      | SymLink   { target :: Path }
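
For readers more at home in Python, the same model might be written with dataclasses as below. This is an illustrative transcription, not code from Nix; the helper `list_paths` and the example `root` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class File:
    is_executable: bool
    contents: bytes


@dataclass
class Directory:
    entries: Dict[str, "FileSystemObject"] = field(default_factory=dict)


@dataclass
class SymLink:
    target: str


FileSystemObject = Union[File, Directory, SymLink]


def list_paths(fso, prefix=""):
    """Return the relative paths of all objects below the root, in name order."""
    paths = []
    if isinstance(fso, Directory):
        for name in sorted(fso.entries):
            child_path = f"{prefix}{name}"
            paths.append(child_path)
            paths.extend(list_paths(fso.entries[name], child_path + "/"))
    return paths


# Hypothetical root: a directory holding one executable file.
root = Directory({"bin": Directory({"hello": File(True, b"")})})
```

Calling `list_paths(root)` walks the tree from the root; a bare file or symlink as root has no children and yields an empty list.
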
+
+Examples:
+
+- a directory with contents
+
+      /nix/store/<hash>-hello-2.10
+      ├── bin
+      │   └── hello
+      └── share
+          ├── info
+          │   └── hello.info
+          └── man
+              └── man1
+                  └── hello.1.gz
+
+- a directory with relative symlink and other contents
+
+      /nix/store/<hash>-go-1.16.9
+      ├── bin -> share/go/bin
+      ├── nix-support/
+      └── share/
+
+- a directory with absolute symlink
+
+      /nix/store/d3k...-nodejs
+      └── nix_node -> /nix/store/f20...-nodejs-10.24.
+
+A bare file or symlink can be a root file system object.
+Examples:
+
+    /nix/store/<hash>-hello-2.10.tar.gz
+
+    /nix/store/4j5...-pkg-config-wrapper-0.29.2-doc -> /nix/store/i99...-pkg-config-0.29.2-doc
+
+Symlinks pointing outside of their own root or to a store object without a matching reference are allowed, but might not function as intended.
+Examples:
+
+- an arbitrarily symlinked file may change or not exist at all
+
+      /nix/store/<hash>-foo
+      └── foo -> /home/foo
+
+- if a symlink to a store path was not automatically created by Nix, it may be invalid or get invalidated when the store object is deleted
+
+      /nix/store/<hash>-bar
+      └── bar -> /nix/store/abc...-foo
+
+Nix file system objects do not support [hard links][hardlink]:
+each file system object which is not the root has exactly one parent and one name.
+However, as store objects are immutable, an underlying file system can use hard links for optimization.
+
+[symlink]: https://en.m.wikipedia.org/wiki/Symbolic_link
+[hardlink]: https://en.m.wikipedia.org/wiki/Hard_link
--- a/doc/manual/src/architecture/store/path.md
+++ b/doc/manual/src/architecture/store/path.md
@@ -0,0 +1,105 @@
+# Store Path
+
+Nix implements [references](store.md#reference) to [store objects](store.md#store-object) as *store paths*.
+
+Store paths are pairs of
+
+- a 20-byte [digest](#digest) for identification
+- a symbolic name for people to read.
+
+Example:
+
+- digest: `b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z`
+- name:   `firefox-33.1`
+
+A store path is rendered to a file system path as the concatenation of
+
+  - [store directory](#store-directory)
+  - path-separator (`/`)
+  - [digest](#digest) rendered in a custom variant of [base-32](https://en.m.wikipedia.org/wiki/Base32) (20 arbitrary bytes become 32 ASCII characters)
+  - hyphen (`-`)
+  - name
+
+Example:
+
+      /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
+      |--------| |------------------------------| |----------|
+    store directory            digest                 name
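
The rendering steps listed above can be sketched as follows. The alphabet and the bit order of the base-32 variant are assumptions not stated in the text, and the function names are chosen for illustration only; treat this as a sketch rather than a reference encoder.

```python
# Assumed alphabet of the base-32 variant (it omits e, o, t and u).
NIX_BASE32_CHARS = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(digest: bytes) -> str:
    """Encode bytes as 5-bit groups.

    The digest is treated as a little-endian number and its 5-bit
    groups are emitted from most significant to least significant.
    """
    length = (len(digest) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)  # byte index and bit offset of this group
        value = digest[i] >> j
        if i + 1 < len(digest):
            value |= digest[i + 1] << (8 - j)
        chars.append(NIX_BASE32_CHARS[value & 0x1F])
    return "".join(chars)


def render_store_path(store_dir: str, digest: bytes, name: str) -> str:
    """Concatenate store directory, '/', encoded digest, '-', and name."""
    return f"{store_dir}/{nix_base32(digest)}-{name}"
```

With these definitions, any 20-byte digest yields exactly 32 characters between the path separator and the hyphen.
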
+
+## Store Directory
+
+Every [store](./store.md) has a store directory.
+
+If the store has a [file system representation](./store.md#files-and-processes), this directory contains the store’s [file system objects](#file-system-object), which can be addressed by [store paths](#store-path).
+
+This means a store path is not just derived from the referenced store object itself, but depends on the store the store object is in.
+
+> **Note**
+> The store directory defaults to `/nix/store`, but is in principle arbitrary.
+
+It is important which store a given store object belongs to:
+Files in the store object can contain store paths, and processes may read these paths.
+Nix can only guarantee [referential integrity](store/closure.md) if store paths do not cross store boundaries.
+
+Therefore one can only copy store objects to a different store if
+
+- the source and target stores' directories match
+
+  or
+
+- the store object in question has no references, that is, contains no store paths.
+
+Otherwise, one cannot copy a store object to a store with a different store directory.
+Instead, it has to be rebuilt, together with all its dependencies.
+It is in general not enough to replace the store directory string in file contents, as this may render executables unusable by invalidating their internal offsets or checksums.
+
+## Digest
+
+In a [store path](#store-path), the [digest][digest] is the output of a [cryptographic hash function][hash] of either all *inputs* involved in building the referenced store object or its actual *contents*.
+
+Store objects are therefore said to be either [input-addressed](#input-addressing) or [content-addressed](#content-addressing).
+
+> **Historical Note**
+> The 20-byte restriction exists because digests were originally [SHA-1][sha-1] hashes.
+> Nix now uses [SHA-256][sha-256], and longer hashes are still reduced to 20 bytes for compatibility.
+
+[digest]: https://en.m.wiktionary.org/wiki/digest#Noun
+[hash]: https://en.m.wikipedia.org/wiki/Cryptographic_hash_function
+[sha-1]: https://en.m.wikipedia.org/wiki/SHA-1
+[sha-256]: https://en.m.wikipedia.org/wiki/SHA-256
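
The historical note says that longer hashes are reduced to 20 bytes, but not how. One way to do such a reduction is XOR-folding, sketched below; that Nix uses exactly this scheme is an assumption here, and `compress_hash` is a name chosen for illustration.

```python
import hashlib


def compress_hash(full: bytes, size: int = 20) -> bytes:
    """Fold a longer hash into `size` bytes by XOR-ing byte i into slot i % size."""
    out = bytearray(size)
    for i, byte in enumerate(full):
        out[i % size] ^= byte
    return bytes(out)


# A 32-byte SHA-256 hash folded down to 20 bytes.
digest = compress_hash(hashlib.sha256(b"example input").digest())
```

Unlike plain truncation, folding lets every byte of the longer hash influence the result.
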
+
+### Reference scanning
+
+When a new store object is built, Nix scans its file contents for store paths to construct its set of references.
+
+The special format of a store path's [digest](#digest) allows reliably detecting it among arbitrary data.
+Nix uses the [closure](store.md#closure) of build inputs to derive the list of allowed store paths, to avoid false positives.
+
+This way, scanning files captures run time dependencies without the user having to declare them explicitly.
+Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
+
+> **Note**
+> In practice, it is sometimes still necessary for users to declare certain dependencies explicitly, if they are to be preserved in the build result's closure.
+> This depends on the specifics of the software to build and run.
+>
+> For example, Java programs are compressed after compilation, which obfuscates any store paths they may refer to and prevents Nix from automatically detecting them.
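
A minimal sketch of the scanning idea, assuming the allowed store paths are already known and that a plain substring search for each digest is sufficient. The function name and the second example path are hypothetical, and the real scanner may work differently.

```python
def scan_references(contents: bytes, candidates: set[str]) -> set[str]:
    """Return the candidate store paths whose digest part occurs in `contents`."""
    found = set()
    for path in candidates:
        # The digest is the part of the base name before the first hyphen.
        digest = path.rsplit("/", 1)[-1].split("-", 1)[0]
        if digest.encode() in contents:
            found.add(path)
    return found


firefox = "/nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1"
hello = "/nix/store/" + "a" * 32 + "-hello-2.10"  # hypothetical digest

script = b"#!/bin/sh\nexec " + firefox.encode() + b"/bin/firefox\n"
references = scan_references(script, {firefox, hello})
```

Only the digest is searched for, so a reference is still detected when the name part of the path is cut off or followed by further path components. Compressed contents, as in the Java example, defeat this kind of search.
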
+
+## Input Addressing
+
+Input addressing means that the digest derives from how the store object was produced, namely its build inputs and build plan.
+
+To compute the hash of a store object one needs a deterministic serialisation, i.e., a binary string representation which only changes if the store object changes.
+
+Nix has a custom serialisation format called Nix Archive (NAR).
+
+Store object references of this sort can *not* be validated from the content of the store object.
+Rather, a cryptographic signature has to be used to indicate that someone is vouching for the store object really being produced from a build plan with that digest.
+
+## Content Addressing
+
+Content addressing means that the digest derives from the store object's contents, namely its file system objects and references.
+If one knows content addressing was used, one can recalculate the reference and thus verify the store object.
+
+Content addressing is currently only used for the special cases of source files and "fixed-output derivations", where the contents of a store object are known in advance.
+Content addressing of build results is still an [experimental feature subject to some restrictions](https://github.com/tweag/rfcs/blob/cas-rfc/rfcs/0062-content-addressed-paths.md).
+
--- a/doc/manual/src/architecture/store/store.md
+++ b/doc/manual/src/architecture/store/store.md
@@ -0,0 +1,151 @@
+# Store
+
+A Nix store is a collection of *store objects* with references between them.
+It supports operations to manipulate that collection.
+
+The following concept map is a graphical outline of this chapter.
+Arrows indicate suggested reading order.
+
+```
+                      ,--------------[ store ]----------------,
+                      |                  |                    |
+                      v                  v                    v
+               [ store object ]     [ closure ]--,      [ operations ]
+                      |               |   |      |        |        |
+                      v               |   |      v        v        |
+           [ files and processes ]    |   | [ garbage collection ] |
+               /          \           |   |                        |
+              v            v          |   v                        v
+[ file system object ] [ store path ] | [ derivation ]--->[ building ]
+                  |        ^      |   |                         |
+                  v        |      v   v                         |
+             [ digest ]----' [ reference scanning ]<------------'
+              /      \
+             v        v
+[ input addressing ] [ content addressing ]
+```
+
+## Store Object
+
+A store object can hold
+
+- arbitrary *data*
+- *references* to other store objects.
+
+Store objects can be build inputs, build results, or build tasks.
+
+Store objects are [immutable][immutable-object]: once created, they do not change until they are deleted.
+
+## Reference
+
+A store object reference is an [opaque][opaque-data-type], [unique identifier][unique-identifier]:
+The only way to obtain references is by adding or building store objects.
+A reference will always point to exactly one store object.
+
+## Operations
+
+A Nix store can *add*, *retrieve*, and *delete* store objects.
+
+                [ data ]
+                    |
+                    V
+    [ store ] ---> add ----> [ store' ]
+                    |
+                    V
+              [ reference ]
+
+<!-- -->
+
+              [ reference ]
+                    |
+                    V
+    [ store ] ---> get
+                    |
+                    V
+             [ store object ]
+
+<!-- -->
+
+              [ reference ]
+                    |
+                    V
+    [ store ] --> delete --> [ store' ]
+
+
+It can *perform builds*, that is, create new store objects by transforming build inputs into build outputs, using instructions from the build tasks.
+
+
+              [ reference ]
+                    |
+                    V
+    [ store ] --> build --(maybe)--> [ store' ]
+                             |
+                             V
+                       [ reference ]
+
+
+As it keeps track of references, it can [garbage-collect][garbage-collection] unused store objects.
+
+
+    [ store ] --> collect garbage --> [ store' ]
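
The add, get, delete and garbage collection diagrams above can be modelled by a toy in-memory store. Everything here is hypothetical: the class name, the use of a SHA-256 hex string as the opaque reference, and passing references to `add` directly as a shortcut for what building would record. Builds are left out.

```python
import hashlib


class ToyStore:
    """In-memory model mapping opaque references to (data, references) pairs."""

    def __init__(self):
        self.objects = {}

    def add(self, data: bytes, references=frozenset()) -> str:
        references = frozenset(references)
        missing = {r for r in references if r not in self.objects}
        if missing:
            raise KeyError(f"dangling references: {sorted(missing)}")
        # Hypothetical reference scheme, used only to get a unique identifier.
        ref = hashlib.sha256(data + "".join(sorted(references)).encode()).hexdigest()
        self.objects[ref] = (data, references)
        return ref

    def get(self, ref: str):
        return self.objects[ref]

    def delete(self, ref: str) -> None:
        if any(ref in refs for _, refs in self.objects.values()):
            raise ValueError("store object is still referenced")
        del self.objects[ref]

    def collect_garbage(self, roots) -> None:
        """Keep only the store objects reachable from `roots`."""
        live, todo = set(), list(roots)
        while todo:
            ref = todo.pop()
            if ref not in live:
                live.add(ref)
                todo.extend(self.objects[ref][1])
        self.objects = {r: o for r, o in self.objects.items() if r in live}


store = ToyStore()
lib = store.add(b"library")
app = store.add(b"application", {lib})
unused = store.add(b"unused")
store.collect_garbage(roots=[app])
```

After the collection only `app` and `lib` remain, since `unused` is not reachable from the chosen root.
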
+
+## Files and Processes
+
+Nix maps between its store model and the [Unix paradigm][unix-paradigm] of [files and processes][file-descriptor], by encoding immutable store objects and opaque identifiers as file system primitives: files and directories, and paths.
+That allows processes to resolve references contained in files and thus access the contents of store objects.
+
+Store objects are therefore implemented as the pair of
+
+  - a [file system object](fso.md) for data
+  - a set of [store paths](path.md) for references.
+
+[unix-paradigm]: https://en.m.wikipedia.org/wiki/Everything_is_a_file
+[file-descriptor]: https://en.m.wikipedia.org/wiki/File_descriptor
+
+The following diagram shows a radical simplification of how Nix interacts with the operating system:
+It uses files as build inputs, and build outputs are files again.
+On the operating system, files can be run as processes, which in turn operate on files.
+A build function also amounts to an operating system process (not depicted).
+
+```
+-----------------------------------------------------------------+
+| Nix                                                             |
+|                  [ command line interface ]-------,             |
+|                               |                   |             |
+|                           evaluates               |             |
+|                               |                manages          |
+|                               V                   |             |
+|                  [ configuration language  ]      |             |
+|                               |                   |             |
+| +-----------------------------|-------------------V-----------+ |
+| | store                  evaluates to                         | |
+| |                             |                               | |
+| |             referenced by   V       builds                  | |
+| |  [ build input ] ---> [ build plan ] ---> [ build result ]  | |
+| |         ^                                        |          | |
+| +---------|----------------------------------------|----------+ |
+-----------|----------------------------------------|------------+
+            |                                        |
+    file system object                          store path
+            |                                        |
+-----------|----------------------------------------|------------+
+| operating system        +------------+             |            |
+|           '------------ |            | <-----------'            |
+|                         |    file    |                          |
+|                     ,-- |            | <-,                      |
+|                     |   +------------+   |                      |
+|          execute as |                    | read, write, execute |
+|                     |   +------------+   |                      |
+|                     '-> |  process   | --'                      |
+|                         +------------+                          |
+-----------------------------------------------------------------+
+```
+
+There exist different types of stores, which all follow this model.
+Examples:
+- store on the local file system
+- remote store accessible via SSH
+- binary cache store accessible via HTTP
+
+To make store objects accessible to processes, stores ultimately have to expose store objects through the file system.
+
--- a/doc/manual/src/architecture/store/store/build-system-terminology.md
+++ b/doc/manual/src/architecture/store/store/build-system-terminology.md
@@ -0,0 +1,32 @@
+# A [Rosetta stone][rosetta-stone] for build system terminology
+
+The Nix store's design is comparable to other build systems.
+Usage of terms is, for historic reasons, not entirely consistent within the Nix ecosystem, and still subject to slow change.
+
+The following translation table points out similarities and equivalent terms, to help clarify their meaning and inform consistent use in the future.
+
+| generic build system             | Nix              | [Bazel][bazel]                                                       | [Build Systems à la Carte][bsalc] | programming language     |
+| -------------------------------- | ---------------- | -------------------------------------------------------------------- | --------------------------------- | ------------------------ |
+| data (build input, build result) | store object     | [artifact][bazel-artifact]                                           | value                             | value                    |
+| build instructions               | builder          | ([depends on action type][bazel-actions])                            | function                          | function                 |
+| build task                       | derivation       | [action][bazel-action]                                               | `Task`                            | [thunk][thunk]           |
+| build plan                       | derivation graph | [action graph][bazel-action-graph], [build graph][bazel-build-graph] | `Tasks`                           | [call graph][call-graph] |
+| build                            | build            | build                                                                | application of `Build`            | evaluation               |
+| persistence layer                | store            | [action cache][bazel-action-cache]                                   | `Store`                           | heap                     |
+
+All of these systems share features of [declarative programming][declarative-programming] languages, a key insight first put forward by Eelco Dolstra et al. in [Imposing a Memory Management Discipline on Software Deployment][immdsd] (2004), elaborated in his PhD thesis [The Purely Functional Software Deployment Model][phd-thesis] (2006), and further refined by Andrey Mokhov et al. in [Build Systems à la Carte][bsalc] (2018).
+
+[rosetta-stone]: https://en.m.wikipedia.org/wiki/Rosetta_Stone
+[bazel]: https://bazel.build/start/bazel-intro
+[bazel-artifact]: https://bazel.build/reference/glossary#artifact
+[bazel-actions]: https://docs.bazel.build/versions/main/skylark/lib/actions.html
+[bazel-action]: https://bazel.build/reference/glossary#action
+[bazel-action-graph]: https://bazel.build/reference/glossary#action-graph
+[bazel-build-graph]: https://bazel.build/reference/glossary#build-graph
+[bazel-action-cache]: https://bazel.build/reference/glossary#action-cache
+[thunk]: https://en.m.wikipedia.org/wiki/Thunk
+[call-graph]: https://en.m.wikipedia.org/wiki/Call_graph
+[declarative-programming]: https://en.m.wikipedia.org/wiki/Declarative_programming
+[immdsd]: https://edolstra.github.io/pubs/immdsd-icse2004-final.pdf
+[phd-thesis]: https://edolstra.github.io/pubs/phd-thesis.pdf
+[bsalc]: https://www.microsoft.com/en-us/research/uploads/prod/2018/03/build-systems.pdf
--- a/doc/manual/src/architecture/store/store/closure.md
+++ b/doc/manual/src/architecture/store/store/closure.md
@@ -0,0 +1,29 @@
+# Closure
+
+Nix stores ensure [referential integrity][referential-integrity]: for each store object in the store, all the store objects it references must also be in the store.
+
+The set of all store objects reachable by following references from a given initial set of store objects is called a *closure*.
+
+Adding, building, copying and deleting store objects must be done in a way that preserves referential integrity:
+
+- A newly added store object cannot have references, unless it is a build task.
+
+- Build results must only refer to store objects in the closure of the build inputs.
+
+  Building a store object will add appropriate references, according to the build task.
+
+- Store objects being copied must refer to objects already in the destination store.
+
+  Recursive copying must either proceed in dependency order or be atomic.
+
+- We can only safely delete store objects which are not reachable from any reference still in use.
+
+  <!-- more details in section on garbage collection, link to it once it exists -->
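
Closure and referential integrity as defined above amount to graph reachability and a membership check. The sketch below assumes the store is given as a plain mapping from each store object to the objects it references; the names in `references` are hypothetical.

```python
def closure(roots, references):
    """Return every store object reachable from `roots` by following references."""
    seen, todo = set(), list(roots)
    while todo:
        obj = todo.pop()
        if obj not in seen:
            seen.add(obj)
            todo.extend(references.get(obj, ()))
    return seen


def has_referential_integrity(references):
    """True if every referenced store object is itself present in the store."""
    return all(ref in references for refs in references.values() for ref in refs)


# Hypothetical store contents: object name -> names it references.
references = {
    "app": {"lib", "data"},
    "lib": {"libc"},
    "data": set(),
    "libc": set(),
    "unrelated": set(),
}
```

Here `closure(["app"], references)` contains `app`, `lib`, `data` and `libc` but not `unrelated`, so under the last rule above `unrelated` would be the only object safe to delete if `app` were the only reference still in use.
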
+
+[referential-integrity]: https://en.m.wikipedia.org/wiki/Referential_integrity
+[garbage-collection]: https://en.m.wikipedia.org/wiki/Garbage_collection_(computer_science)
+[immutable-object]: https://en.m.wikipedia.org/wiki/Immutable_object
+[opaque-data-type]: https://en.m.wikipedia.org/wiki/Opaque_data_type
+[unique-identifier]: https://en.m.wikipedia.org/wiki/Unique_identifier
+
+