explain store directory

Valentin Gagarin 2022-06-09 11:07:50 +02:00
parent f632816cba
commit fa7ad4593d
4 changed files with 94 additions and 110 deletions


@ -17,8 +17,10 @@
- [Upgrading Nix](installation/upgrading.md)
- [Architecture](architecture/architecture.md)
- [Store](architecture/store/store.md)
- [Store Object](architecture/store/objects.md)
- [Store Path](architecture/store/paths.md)
- [Store Path](architecture/store/path.md)
- [Digest](architecture/store/path.md#digest)
- [Input Addressing](architecture/store/path.md#input-addressing)
- [Content Addressing](architecture/store/path.md#content-addressing)
- [Package Management](package-management/package-management.md)
- [Basic Package Management](package-management/basic-package-mgmt.md)
- [Profiles](package-management/profiles.md)


@ -1,48 +0,0 @@
# Store Object
Nix organizes the data it manages into *store objects*.
A store object is the pair of
- a [file system object](#file-system-object)
- a set of [references](#reference) to store objects.
We call a store object's outermost file system object the *root*.
```haskell
data StoreObject = StoreObject {
root :: FileSystemObject
, references :: Set StoreObject
}
```
## File system object {#file-system-object}
The Nix store uses a simple file system model.
Every file system object is one of the following:
- File: an executable flag, and arbitrary data for contents
- Directory: mapping of names to child file system objects
- [Symbolic link](https://en.m.wikipedia.org/wiki/Symbolic_link): may point anywhere.
```haskell
data FileSystemObject
= File { isExecutable :: Bool, contents :: Bytes }
| Directory { entries :: Map FileName FileSystemObject }
| SymLink { target :: Path }
```
A bare file or symlink can be a root file system object.
Symlinks pointing outside of their own root, or to a store object without a matching reference, are allowed, but might not function as intended.
### Reference scanning
While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness.
Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed.
However, having references match store paths in files is not enforced by the data model:
Store objects could have excess or incomplete references with respect to store paths found in their file contents.
Scanning files therefore makes it possible to reliably capture runtime dependencies without declaring them explicitly.
Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
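A minimal sketch of reference scanning, in the same Haskell style as the data types above. It assumes a simplified file system model (file contents as `String`, directory entries as an association list instead of a `Map`) so that it only needs `base`; the names `texts` and `scanReferences` are hypothetical. Given the digests of candidate store paths, for example those in the closure of the build inputs, it reports which of them occur anywhere in the object. As far as we know, Nix's own scanner operates on a serialised form of the output and differs in detail.

```haskell
import Data.List (isInfixOf, nub, sort)

-- Simplified file system model: contents as String, directory entries as an
-- association list (the data model above uses Bytes and a Map).
data FileSystemObject
  = File { isExecutable :: Bool, contents :: String }
  | Directory { entries :: [(String, FileSystemObject)] }
  | SymLink { target :: String }

-- Every piece of text in which a store path could appear:
-- entry names, file contents and symlink targets.
texts :: FileSystemObject -> [String]
texts (File _ c) = [c]
texts (Directory es) = concatMap (\(n, child) -> n : texts child) es
texts (SymLink t) = [t]

-- Return those candidate digests (for example, the digests of all store paths
-- in the closure of the build inputs) that occur somewhere in the object.
scanReferences :: [String] -> FileSystemObject -> [String]
scanReferences candidates fso =
  sort (nub [d | d <- candidates, any (d `isInfixOf`) (texts fso)])
```

Because only candidates are searched for, a store path that appears in the output but was not among the build inputs can never become a reference.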


@ -1,78 +1,103 @@
# Store Path
A store path is a pair of a 20-byte digest and a name.
Nix implements [references](store.md#reference) to [store objects](store.md#store-object) as *store paths*.
## String representation
Store paths are pairs of
A store path is rendered as the concatenation of
- a 20-byte [digest](#digest) for identification
- a symbolic name for people to read.
- a store directory
- a path-separator (`/`)
- the digest rendered as Base-32 (20 arbitrary bytes become 32 ASCII characters)
- a hyphen (`-`)
- the name
Let's take the store path from the very beginning of this manual as an example:
/nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
This parses like so:
    /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
    ^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^
    store dir               digest                  name
We can then discard the store dir to recover the conceptual pair that is a store path:
Example:
{
digest: "b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z",
name: "firefox-33.1",
}
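Parsing can be sketched as follows, assuming the store directory is known in advance and that the digest is always 32 characters of Nix's base-32 alphabet (digits plus lowercase letters except `e`, `o`, `t`, `u`). `parseStorePath` is a hypothetical helper, not part of any Nix API, and it does not validate the characters of the name.

```haskell
import Data.List (stripPrefix)

-- Nix's base-32 alphabet: digits plus lowercase letters except e, o, t, u.
nixBase32Chars :: String
nixBase32Chars = "0123456789abcdfghijklmnpqrsvwxyz"

-- The conceptual store path: a digest and a name, without any store directory.
data StorePath = StorePath { digest :: String, name :: String }
  deriving (Eq, Show)

-- Strip the store directory, then split "<32-char digest>-<name>".
-- Returns Nothing if the prefix, digest length, alphabet or hyphen is wrong.
-- Note: the characters allowed in the name are not checked here.
parseStorePath :: FilePath -> String -> Maybe StorePath
parseStorePath storeDir s = do
  rest <- stripPrefix (storeDir ++ "/") s
  let (d, afterDigest) = splitAt 32 rest
  case afterDigest of
    '-' : n
      | length d == 32, all (`elem` nixBase32Chars) d, not (null n) ->
          Just (StorePath d n)
    _ -> Nothing
```

Applied to the Firefox path above with store directory `/nix/store`, this yields the same digest and name as the example; with any other store directory it yields `Nothing`.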
### Where did the "store directory" come from?
It is rendered to a file system path as the concatenation of
Notice that the above mentions a "store directory", but that is *not* part of the definition of a store path.
We can discard it when parsing, but what about when printing?
We need to get a store directory from *somewhere*.
- [store directory](#store-directory)
- path-separator (`/`)
- [digest](#digest) rendered in [base-32](https://en.m.wikipedia.org/wiki/Base32) (20 arbitrary bytes become 32 ASCII characters)
- hyphen (`-`)
- name
The answer is, the store directory is a property of the store that contains the store path.
The explanation is simple enough: a store is notionally mounted as a directory at some location, and each store object's root file system object is likewise mounted at its store path within that directory.
Example:
This does, however, mean the string representation of a store path is not derived just from the store path itself, but is in fact "context dependent".
      /nix/store/b6gvzjyb2pg0kjfwrjmg1vfhh54ad73z-firefox-33.1
      |--------| |------------------------------| |----------|
    store directory           digest                  name
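Going the other way, here is a sketch of rendering a store path from a raw 20-byte digest. The base-32 step follows our reading of `printHash32` in Nix's C++ sources, which uses its own alphabet and bit order rather than RFC 4648; byte-for-byte compatibility is an assumption. `toNixBase32` and `renderStorePath` are hypothetical names. The store directory is passed in explicitly, because it belongs to the store rather than to the store path.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

nixBase32Chars :: String
nixBase32Chars = "0123456789abcdfghijklmnpqrsvwxyz"

-- Encode bytes in Nix's base-32 variant. Output character n (counted from the
-- end) takes 5 bits starting at bit 5*n, so the last character holds the
-- lowest 5 bits of the first byte. 20 bytes = 160 bits = 32 characters.
toNixBase32 :: [Word8] -> String
toNixBase32 bytes = [charAt n | n <- [len - 1, len - 2 .. 0]]
  where
    size = length bytes
    len = (size * 8 - 1) `div` 5 + 1
    byte i = fromIntegral (bytes !! i) :: Int
    charAt n =
      let b = n * 5
          i = b `div` 8
          j = b `mod` 8
          lo = byte i `shiftR` j
          hi = if i >= size - 1 then 0 else byte (i + 1) `shiftL` (8 - j)
      in nixBase32Chars !! ((lo .|. hi) .&. 0x1f)

-- Render a store path relative to the store directory of the store it lives in.
renderStorePath :: FilePath -> [Word8] -> String -> String
renderStorePath storeDir digestBytes name =
  storeDir ++ "/" ++ toNixBase32 digestBytes ++ "-" ++ name
```

With 20 zero bytes the digest renders as 32 `0` characters. Because the last character carries the low bits of the first byte, the digest `1, 0, ..., 0` renders as 31 `0`s followed by `1`.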
## The digest
## Store Directory {#store-directory}
The calculation of the digest is quite complicated for historical reasons.
The details of the algorithms will be discussed later once more concepts have been introduced.
For now, we just concern ourselves with the *key properties* of those algorithms.
Every [store](./store.md) has a store directory.
If the store has a [file system representation](./store.md#files-and-processes), this directory contains the store's [file system objects](#file-system-object), which can be addressed by [store paths](#store-path).
This means a store path is not just derived from the referenced store object itself, but depends on the store the store object is in.
::: {.note}
**Historical note**: The 20-byte restriction exists because digests were originally SHA-1 hashes.
This is no longer true, but longer hashes and other information are still boiled down to 20 bytes.
The store directory defaults to `/nix/store`, but is in principle arbitrary.
:::
Store paths are either *content-addressed* or *input-addressed*.
It is important which store a given store object belongs to:
Files in the store object can contain store paths, and processes may read these paths.
Nix can only guarantee [referential integrity](store.md#closure) if store paths do not cross store boundaries.
Therefore one can only copy store objects if
- the source and target stores' directories match
or
- the store object in question has no references, that is, contains no store paths.
To move a store object to a store with a different store directory, it has to be rebuilt, together with all its dependencies.
It is in general not enough to replace the store directory string in file contents, as this may break internal offsets or content hashes.
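The copy rule above can be written as a small predicate. This is a sketch under a deliberately tiny, hypothetical model in which a store is identified only by its store directory and a store object only by its name and the store paths it references; `Store`, `StoreObject` and `canCopyAsIs` here are illustrative, not Nix types or functions.

```haskell
-- Hypothetical model: a store is identified here only by its store directory.
newtype Store = Store { storeDir :: FilePath }

data StoreObject = StoreObject
  { objectName :: String
  , references :: [String] -- store paths this object refers to
  }

-- A store object can be copied as-is only if the store directories match,
-- or if it has no references (that is, contains no store paths at all).
canCopyAsIs :: Store -> Store -> StoreObject -> Bool
canCopyAsIs src dst obj =
  storeDir src == storeDir dst || null (references obj)
```

This checks only the store-directory condition; the requirement that all references already exist in the destination store is a separate check.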
# Digest {#digest}
In a [store path](#store-path), the [digest][digest] is the output of a [cryptographic hash function][hash] of either all *inputs* involved in building the referenced store object or its actual *contents*.
Store objects are therefore said to be either [input-addressed](#input-addressing) or [content-addressed](#content-addressing).
::: {.note}
The former is a standard term used elsewhere.
The latter is our own creation to evoke a contrast with content addressing.
**Historical note**: The 20-byte restriction exists because digests were originally [SHA-1][sha-1] hashes.
This is no longer true, but longer hashes and other information are still compressed to 20 bytes for compatibility.
:::
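The note says longer hashes are reduced to 20 bytes. To our understanding of the Nix C++ sources, this is done by XOR-folding (a function named `compressHash`) rather than by dropping bytes: byte `i` of the long hash is XOR-ed into position `i mod 20`. A sketch, with bytes as plain `Word8` lists:

```haskell
import Data.Bits (xor)
import Data.Word (Word8)

-- Reduce a hash to `newSize` bytes by XOR-folding: byte i of the input is
-- XOR-ed into output position (i `mod` newSize). Every input byte influences
-- the result, unlike plain truncation.
compressHash :: Int -> [Word8] -> [Word8]
compressHash newSize hash =
  [ foldr xor 0 [b | (i, b) <- zip [0 ..] hash, i `mod` newSize == k]
  | k <- [0 .. newSize - 1]
  ]
```

Under this assumption, a 32-byte SHA-256 digest would be folded to 20 bytes before being rendered in base-32.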
Content addressing means that the store path digest ultimately derives from the referenced store object's contents, namely its file system objects and references.
There is more than one *method* of content-addressing, however.
Still, if one knows the content-addressing method that was used
(or guesses it; there aren't many yet),
one can recalculate the store path and thus verify the store object.
[digest]: https://en.m.wiktionary.org/wiki/digest#Noun
[hash]: https://en.m.wikipedia.org/wiki/Cryptographic_hash_function
[sha-1]: https://en.m.wikipedia.org/wiki/SHA-1
Input addressing means that the store path digest derives from how the store path was produced, namely the "inputs" and plan that it was built from.
Store paths of this sort can *not* be validated from the content of the store object.
Rather, the store object might come with the store path it expects to be referred to by, and a signature of that path, the contents of the store path, and other metadata.
The signature indicates that someone is vouching for the store object really being the results of a plan with that digest.
Metadata identifying the method is included in the digest calculation, but this only serves to thwart pre-image attacks.
That metadata is scrambled with everything else so that it is difficult to tell how a given store path was produced short of a brute-force search.
In the parlance of referencing schemes, this means that store paths are not "self-describing".
### Reference scanning
While references could be arbitrary paths, Nix requires them to be store paths to ensure correctness.
Anything outside a given store is not under control of Nix, and therefore cannot be guaranteed to be present when needed.
However, having references match store paths in files is not enforced by the data model:
Store objects could have excess or incomplete references with respect to store paths found in their file contents.
Scanning files therefore makes it possible to reliably capture runtime dependencies without declaring them explicitly.
Doing it at build time and persisting references in the store object avoids repeating this time-consuming operation.
## Input Addressing {#input-addressing}
Input addressing means that the digest derives from how the store object was produced, namely its build inputs and build plan.
To compute the hash of a store object one needs a deterministic serialisation, i.e., a binary string representation which only changes if the store object changes.
Nix has a custom serialisation format called Nix Archive (NAR).
Store object references of this sort can *not* be validated from the content of the store object.
Rather, a cryptographic signature has to be used to indicate that someone is vouching for the store object really being produced from a build plan with that digest.
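To illustrate what a deterministic serialisation means, here is a sketch of a NAR-style encoder written from our understanding of the NAR layout: every string is prefixed with its length as a 64-bit little-endian integer and zero-padded to a multiple of 8 bytes, and directory entries are sorted by name. Byte-for-byte compatibility with `nix-store --dump` is assumed, not verified, and names are assumed to be ASCII. The point is that sorting and explicit framing make the output depend only on the file system object.

```haskell
import Data.Bits (shiftR)
import Data.Char (ord)
import Data.List (sortOn)
import Data.Word (Word8)

data FileSystemObject
  = File { isExecutable :: Bool, contents :: [Word8] }
  | Directory { entries :: [(String, FileSystemObject)] }
  | SymLink { target :: String }

-- 64-bit little-endian integer.
int64 :: Int -> [Word8]
int64 n = [fromIntegral (n `shiftR` (8 * k)) | k <- [0 .. 7]]

-- Length-prefixed byte string, zero-padded to a multiple of 8 bytes.
bytes :: [Word8] -> [Word8]
bytes bs = int64 (length bs) ++ bs ++ replicate ((8 - length bs `mod` 8) `mod` 8) 0

-- Strings are assumed to be ASCII here; each Char becomes one byte.
str :: String -> [Word8]
str = bytes . map (fromIntegral . ord)

serialise :: FileSystemObject -> [Word8]
serialise fso = str "nix-archive-1" ++ node fso

node :: FileSystemObject -> [Word8]
node fso = str "(" ++ body fso ++ str ")"
  where
    body (File exec c) =
      str "type" ++ str "regular"
        ++ (if exec then str "executable" ++ str "" else [])
        ++ str "contents" ++ bytes c
    body (SymLink t) = str "type" ++ str "symlink" ++ str "target" ++ str t
    body (Directory es) =
      -- Sorting by name makes the output independent of entry order.
      str "type" ++ str "directory" ++ concatMap entry (sortOn fst es)
    entry (name, child) =
      str "entry" ++ str "(" ++ str "name" ++ str name
        ++ str "node" ++ node child ++ str ")"
```

An empty, non-executable file serialises to 112 bytes, and two directories listing the same entries in different orders produce identical output.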
## Content Addressing {#content-addressing}
Content addressing means that the digest derives from the store object's contents, namely its file system objects and references.
If one knows content addressing was used, one can recalculate the reference and thus verify the store object.
Content addressing is currently only used for the special cases of source files and "fixed-output derivations", where the contents of a store object are known in advance.
Content addressing of build results is still an [experimental feature subject to some restrictions](https://github.com/tweag/rfcs/blob/cas-rfc/rfcs/0062-content-addressed-paths.md).
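Verifying a content-addressed object can be sketched as recomputing the digest from its contents and references and comparing it with the advertised one. To stay dependency-free, this sketch uses FNV-1a as a stand-in hash; it is not cryptographic, and real content addressing relies on a cryptographic hash such as SHA-256. `CAObject`, `contentDigest` and `verify` are hypothetical names, and the input encoding (file system bytes followed by sorted, NUL-terminated references) is an assumption, not Nix's actual scheme.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (sort)
import Data.Word (Word64, Word8)

-- Stand-in hash (64-bit FNV-1a). NOT cryptographically secure.
toyHash :: [Word8] -> Word64
toyHash = foldl step 14695981039346656037
  where
    step h b = (h `xor` fromIntegral b) * 1099511628211

-- Hypothetical content-addressed object: serialised file system objects,
-- references, and the digest it is advertised under.
data CAObject = CAObject
  { fsoBytes :: [Word8]
  , references :: [String]
  , advertisedDigest :: Word64
  }

-- The digest depends only on contents and (sorted) references.
contentDigest :: CAObject -> Word64
contentDigest o = toyHash (fsoBytes o ++ concatMap encodeRef (sort (references o)))
  where
    encodeRef r = map (fromIntegral . ord) r ++ [0]

-- Anyone can check the digest without trusting whoever supplied the object.
verify :: CAObject -> Bool
verify o = contentDigest o == advertisedDigest o
```

Contrast this with input addressing, where no such recomputation from the object itself is possible and a signature has to be trusted instead.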


@ -67,18 +67,19 @@ As it keeps track of references, it can [garbage-collect][garbage-collection] un
[ store ] --> collect garbage --> [ store' ]
## Closure
## Closure {#closure}
Nix stores have the *closure property*: for each store object in the store, all the store objects it references must also be in the store.
Nix stores ensure [referential integrity][referential-integrity]: for each store object in the store, all the store objects it references must also be in the store.
Adding, building, copying and deleting store objects must be done in a way that obeys this property:
The set of all store objects reachable by following references from a given initial set of store objects is called a *closure*.
Adding, building, copying and deleting store objects must be done in a way that preserves referential integrity:
- A newly added store object cannot have references, unless it is a build task.
- Build results must only refer to store objects in the closure of the build inputs.
Building a store object will add appropriate references, according to the build task.
These references can only come from declared build inputs.
- Store objects being copied must refer to objects already in the destination store.
@ -86,16 +87,15 @@ Adding, building, copying and deleting store objects must be done in a way that
- We can only safely delete store objects which are not reachable from any reference still in use.
Garbage collection will delete those store objects that cannot be reached from any reference in use.
<!-- more details in section on garbage collection, link to it once it exists -->
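The closure and garbage-collection rules can be sketched over a hypothetical store model that maps each store path to the store paths it references (an association list, to stay within `base`). `closure` follows references from a set of roots, `hasReferentialIntegrity` checks that every reference resolves inside the store, and `garbage` lists what is unreachable from the roots still in use. This illustrates the definitions only; it is not how Nix's garbage collector is implemented.

```haskell
import Data.List (sort)

-- Hypothetical store model: each store path maps to the store paths it references.
type Store = [(String, [String])]

referencesOf :: Store -> String -> [String]
referencesOf store p = maybe [] id (lookup p store)

-- Closure: all store paths reachable by following references from the roots.
closure :: Store -> [String] -> [String]
closure store roots = sort (go [] roots)
  where
    go seen [] = seen
    go seen (p : todo)
      | p `elem` seen = go seen todo
      | otherwise = go (p : seen) (referencesOf store p ++ todo)

-- Referential integrity: every reference points to an object in the store.
hasReferentialIntegrity :: Store -> Bool
hasReferentialIntegrity store =
  all (`elem` map fst store) (concatMap snd store)

-- Garbage: store objects not in the closure of the roots still in use.
garbage :: Store -> [String] -> [String]
garbage store liveRoots = sort [p | (p, _) <- store, p `notElem` live]
  where
    live = closure store liveRoots
```

Deleting exactly the objects returned by `garbage` keeps referential integrity intact, since nothing reachable from a live root is removed.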
[referential-integrity]: https://en.m.wikipedia.org/wiki/Referential_integrity
[garbage-collection]: https://en.m.wikipedia.org/wiki/Garbage_collection_(computer_science)
[immutable-object]: https://en.m.wikipedia.org/wiki/Immutable_object
[opaque-data-type]: https://en.m.wikipedia.org/wiki/Opaque_data_type
[unique-identifier]: https://en.m.wikipedia.org/wiki/Unique_identifier
## Files and Processes
## Files and Processes {#files-and-processes}
Nix maps between its store model and the [Unix paradigm][unix-paradigm] of [files and processes][file-descriptor], by encoding immutable store objects and opaque identifiers as file system primitives: files and directories, and paths.
That allows processes to resolve references contained in files and thus access the contents of store objects.
@ -103,11 +103,16 @@ That allows processes to resolve references contained in files and thus access t
Store objects are therefore implemented as the pair of
- a [file system object](fso.md) for data
- a set of *store paths* for references.
- a set of [store paths](paths.md) for references.
[unix-paradigm]: https://en.m.wikipedia.org/wiki/Everything_is_a_file
[file-descriptor]: https://en.m.wikipedia.org/wiki/File_descriptor
The following diagram shows a radical simplification of how Nix interacts with the operating system:
It uses files as build inputs, and build outputs are files again.
On the operating system, files are either "dead" data, or "live" as processes, which in turn operate on files, or can bring them to life.
A build function also amounts to an operating system process (not depicted).
```
+-----------------------------------------------------------------+
| Nix |