Commit graph

83 commits

Author SHA1 Message Date
John Ericson e739a5002d Avoid Windows macros in the parser and lexer
`FLOAT`, `INT`, and `IN` are identifers taken by macros.

The name `IN_KW` is chosen to match `OR_KW`, which is presumably named
that way for the same reason of dodging macros.
2024-01-12 19:51:36 -05:00
pennae b78e77b34c use custom location type in the parser
~1% parser speedup from not using TLS indirections, less on system eval.
this could have also gone in flex yyextra data, but that's significantly
slower for some reason (albeit still faster than thread locals).

before:

  Time (mean ± σ):      4.231 s ±  0.004 s    [User: 3.725 s, System: 0.504 s]
  Range (min … max):    4.226 s …  4.240 s    10 runs

after:

  Time (mean ± σ):      4.224 s ±  0.005 s    [User: 3.711 s, System: 0.512 s]
  Range (min … max):    4.218 s …  4.234 s    10 runs
2023-12-19 19:32:16 +01:00
pennae 2e0321912a use aligned flex tables
~2% speedup on parsing without eval, less (but still significant) on
system eval. having flex generate faster parsers leads to very strange
misparses. maybe re2c is worth investigating.

before:

  Time (mean ± σ):      4.260 s ±  0.003 s    [User: 3.754 s, System: 0.505 s]
  Range (min … max):    4.257 s …  4.266 s    10 runs

after:

  Time (mean ± σ):      4.231 s ±  0.004 s    [User: 3.725 s, System: 0.504 s]
  Range (min … max):    4.226 s …  4.240 s    10 runs
2023-12-19 19:32:16 +01:00
Yingchi Long 3c90340fe6 libexpr: use thread_local to make the parser thread-safe
If we call `adjustLoc`, the global variable `prev_yylloc` is shared
between threads and racy.

Currently, nix itself does not concurrently parsing files, but this is
helpful for libexpr users. (The parser is thread-safe except this.)
2023-07-03 16:05:43 +08:00
Eelco Dolstra 27ebb97d0a
Handle EOFs in string literals correctly
We can't return a STR token without setting a valid StringToken,
otherwise the parser will crash.

Fixes #6562.
2022-05-25 17:58:13 +02:00
pennae 6526d1676b replace most Pos objects/ptrs with indexes into a position table
Pos objects are somewhat wasteful as they duplicate the origin file name and
input type for each object. on files that produce more than one Pos when parsed
this a sizeable waste of memory (one pointer per Pos). the same goes for
ptr<Pos> on 64 bit machines: parsing enough source to require 8 bytes to locate
a position would need at least 8GB of input and 64GB of expression memory. it's
not likely that we'll hit that any time soon, so we can use a uint32_t index to
locate positions instead.
2022-04-21 21:46:06 +02:00
Sergei Trofimovich 9174d884d7 lexer: add error location to lexer errors
Before the change lexter errors did not report the location:

    $ nix build -f. mc
    error: path has a trailing slash
    (use '--show-trace' to show detailed location information)

Note that it's not clear what file generates the error.

After the change location is reported:

    $ src/nix/nix --extra-experimental-features nix-command build -f ~/nm mc
    error: path has a trailing slash

           at .../pkgs/development/libraries/glib/default.nix:54:18:

               53|   };
               54|   src = /tmp/foo/;
                 |                  ^
               55|
    (use '--show-trace' to show detailed location information)

Here we see both problematic file and the string itself.
2022-03-24 08:16:14 +00:00
pennae 0a7746603e remove ExprIndStr
it can be replaced with StringToken if we add another bit if information to
StringToken, namely whether this string should take part in indentation scanning
or not. since all escaping terminates indentation scanning we need to set this
bit only for the non-escaped IND_STRING rule.

this improves performance by about 1%.

 before

  nix search --no-eval-cache --offline ../nixpkgs hello
    Time (mean ± σ):      8.880 s ±  0.048 s    [User: 6.809 s, System: 1.643 s]
    Range (min … max):    8.781 s …  8.993 s    20 runs

  nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
    Time (mean ± σ):     375.0 ms ±   2.2 ms    [User: 339.8 ms, System: 35.2 ms]
    Range (min … max):   371.5 ms … 379.3 ms    20 runs

  nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
    Time (mean ± σ):      2.831 s ±  0.040 s    [User: 2.536 s, System: 0.225 s]
    Range (min … max):    2.769 s …  2.912 s    20 runs

 after

  nix search --no-eval-cache --offline ../nixpkgs hello
    Time (mean ± σ):      8.832 s ±  0.048 s    [User: 6.757 s, System: 1.657 s]
    Range (min … max):    8.743 s …  8.921 s    20 runs

  nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
    Time (mean ± σ):     367.4 ms ±   3.2 ms    [User: 332.7 ms, System: 34.7 ms]
    Range (min … max):   364.6 ms … 374.6 ms    20 runs

  nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
    Time (mean ± σ):      2.810 s ±  0.030 s    [User: 2.517 s, System: 0.225 s]
    Range (min … max):    2.742 s …  2.854 s    20 runs
2022-01-19 13:39:42 +01:00
pennae 72f42093e7 optimize unescapeStr
mainly to avoid an allocation and a copy of a string that can be
modified in place (ever since EvalState holds on to the buffer, not the
generated parser itself).

 # before

Benchmark 1: nix search --offline nixpkgs hello
  Time (mean ± σ):     571.7 ms ±   2.4 ms    [User: 563.3 ms, System: 8.0 ms]
  Range (min … max):   566.7 ms … 579.7 ms    50 runs

Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
  Time (mean ± σ):     376.6 ms ±   1.0 ms    [User: 345.8 ms, System: 30.5 ms]
  Range (min … max):   374.5 ms … 379.1 ms    50 runs

Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  Time (mean ± σ):      2.922 s ±  0.006 s    [User: 2.707 s, System: 0.215 s]
  Range (min … max):    2.906 s …  2.934 s    50 runs

 # after

Benchmark 1: nix search --offline nixpkgs hello
  Time (mean ± σ):     570.4 ms ±   2.8 ms    [User: 561.3 ms, System: 8.6 ms]
  Range (min … max):   564.6 ms … 578.1 ms    50 runs

Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
  Time (mean ± σ):     375.4 ms ±   1.3 ms    [User: 343.2 ms, System: 31.7 ms]
  Range (min … max):   373.4 ms … 378.2 ms    50 runs

Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  Time (mean ± σ):      2.925 s ±  0.006 s    [User: 2.704 s, System: 0.219 s]
  Range (min … max):    2.910 s …  2.942 s    50 runs
2022-01-13 18:06:15 +01:00
pennae 61a9d16d5c don't strdup tokens in the lexer
every stringy token the lexer returns is turned into a Symbol and not
used further, so we don't have to strdup. using a string_view is
sufficient, but due to limitations of the current parser we have to use
a POD type that holds the same information.

gives ~2% on system build, 6% on search, 8% on parsing alone

 # before

Benchmark 1: nix search --offline nixpkgs hello
  Time (mean ± σ):     610.6 ms ±   2.4 ms    [User: 602.5 ms, System: 7.8 ms]
  Range (min … max):   606.6 ms … 617.3 ms    50 runs

Benchmark 2: nix eval -f hackage-packages.nix
  Time (mean ± σ):     430.1 ms ±   1.4 ms    [User: 393.1 ms, System: 36.7 ms]
  Range (min … max):   428.2 ms … 434.2 ms    50 runs

Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  Time (mean ± σ):      3.032 s ±  0.005 s    [User: 2.808 s, System: 0.223 s]
  Range (min … max):    3.023 s …  3.041 s    50 runs

 # after

Benchmark 1: nix search --offline nixpkgs hello
  Time (mean ± σ):     574.7 ms ±   2.8 ms    [User: 566.3 ms, System: 8.0 ms]
  Range (min … max):   569.2 ms … 580.7 ms    50 runs

Benchmark 2: nix eval -f hackage-packages.nix
  Time (mean ± σ):     394.4 ms ±   0.8 ms    [User: 361.8 ms, System: 32.3 ms]
  Range (min … max):   392.7 ms … 395.7 ms    50 runs

Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  Time (mean ± σ):      2.976 s ±  0.005 s    [User: 2.757 s, System: 0.218 s]
  Range (min … max):    2.966 s …  2.990 s    50 runs
2022-01-13 18:06:14 +01:00
Eelco Dolstra 81e7c40264 Optimize primop calls
We now parse function applications as a vector of arguments rather
than as a chain of binary applications, e.g. 'substring 1 2 "foo"' is
parsed as

  ExprCall { .fun = <substring>, .args = [ <1>, <2>, <"foo"> ] }

rather than

  ExprApp (ExprApp (ExprApp <substring> <1>) <2>) <"foo">

This allows primops to be called immediately (if enough arguments are
supplied) without having to allocate intermediate tPrimOpApp values.

On

  $ nix-instantiate --dry-run '<nixpkgs/nixos/release-combined.nix>' -A nixos.tests.simple.x86_64-linux

this gives a substantial performance improvement:

  user CPU time:      median =      0.9209  mean =      0.9218  stddev =      0.0073  min =      0.9086  max =      0.9340  [rejected, p=0.00000, Δ=-0.21433±0.00677]
  elapsed time:       median =      1.0585  mean =      1.0584  stddev =      0.0024  min =      1.0523  max =      1.0623  [rejected, p=0.00000, Δ=-0.20594±0.00236]

because it reduces the number of tPrimOpApp allocations from 551990 to
42534 (i.e. only small minority of primop calls are partially
applied) which in turn reduces time spent in the garbage collector.
2021-11-04 15:03:40 +01:00
Taeer Bar-Yam f14660d5e2 reset yylloc when yyless(0) is called 2021-09-29 19:47:01 -04:00
Taeer Bar-Yam 8f9429dcab add antiquotations to paths 2021-08-06 06:46:05 -04:00
Pamplemousse 99f8fc995b libexpr: Fix read out-of-bound on the heap
Signed-off-by: Pamplemousse <xav.maso@gmail.com>
2021-07-14 09:09:42 -07:00
regnat 0d9e1af695 Remove an unknown pragma gcc warning 2020-12-02 14:33:20 +01:00
regnat 438977731c shut up clang warnings
- Fix some class/struct discrepancies
- Explicit the overloading of `run` in the `Cmd*` classes
- Ignore a warning in the generated lexer
2020-12-01 15:04:03 +01:00
Eelco Dolstra e14e62fddd Remove trailing whitespace 2020-06-15 14:12:39 +02:00
Ben Burdette 3bc9155dfc a few more 'format's rremoved 2020-04-22 15:00:11 -06:00
Guillaume Maudoux 6a5bf9b143 simplify handling of extra '}' 2018-10-27 00:14:51 +02:00
aszlig 0ad643ed5c
libexpr: Use int64_t for NixInt
Using a 64bit integer on 32bit systems will come with a bit of a
performance overhead, but given that Nix doesn't use a lot of integers
compared to other types, I think the overhead is negligible also
considering that 32bit systems are in decline.

The biggest advantage however is that when we use a consistent integer
size across all platforms it's less likely that we miss things that we
break due to that. One example would be:

https://github.com/NixOS/nixpkgs/pull/44233

On Hydra it will evaluate, because the evaluator runs on a 64bit
machine, but when evaluating the same on a 32bit machine it will fail,
so using 64bit integers should make that consistent.

While the change of the type in value.hh is rather easy to do, we have a
few more options available for doing the conversion in the lexer:

  * Via an #ifdef on the architecture and using strtol() or strtoll()
    accordingly depending on which architecture we are. For the #ifdef
    we would need another AX_COMPILE_CHECK_SIZEOF in configure.ac.
  * Using istringstream, which would involve copying the value.
  * As we're already using boost, lexical_cast might be a good idea.

Spoiler: I went for the latter, first of all because lexical_cast does
have an overload for const char* and second of all, because it doesn't
involve copying around the input string. Also, because istringstream
seems to come with a bigger overhead than boost::lexical_cast:

https://www.boost.org/doc/libs/release/doc/html/boost_lexical_cast/performance.html

The first method (still using strtol/strtoll) also wasn't something I
pursued further, because it is also locale-aware which I doubt is what
we want, given that the regex for int is [0-9]+.

Signed-off-by: aszlig <aszlig@nix.build>
Fixes: #2339
2018-08-29 01:05:52 +02:00
Eelco Dolstra 1ad19232c4
Don't return negative numbers from the flex tokenizer
Fixes #1374.
Closes #2129.
2018-05-11 12:05:12 +02:00
Eelco Dolstra f3c85f9eb3
Revert "Throw a specific error for incomplete parse errors."
This reverts commit 6498adb002. We don't
actually use IncompleteParseError in 'nix repl'.
2018-05-11 11:40:50 +02:00
Tuomas Tynkkynen a0e38c16bc libexpr: Recognize newline in more places in lexer
Flex's regexes have an annoying feature: the dot matches everything
except a newline. This causes problems for expressions like:

"${0}\
"

where the backslash-newline combination matches this rule instead of the
intended one mentioned in the comment:

    <STRING>\$|\\|\$\\ {
                    /* This can only occur when we reach EOF, otherwise the above
                    (...|\$[^\{\"\\]|\\.|\$\\.)+ would have triggered.
                    This is technically invalid, but we leave the problem to the
                    parser who fails with exact location. */
                    return STR;
                }
However, the parser actually accepts the resulting token sequence
('"' DOLLAR_CURLY 0 '}' STR '"'), which is a problem because the lexer
rule didn't assign anything to yylval. Ultimately this leads to a crash
when dereferencing a NULL pointer in ExprConcatStrings::bindVars().

The fix does change the syntax of the language in some corner cases
but I think it's only turning previously invalid (or crashing) syntax
to valid syntax. E.g.

"a\
b"

and

''a''\
b''

were previously syntax errors but now both result in "a\nb".

Found by afl-fuzz.
2018-03-02 17:30:48 +02:00
Tuomas Tynkkynen f67a7007a2 libexpr: Pre-reserve space in string in unescapeStr()
Avoids some malloc() traffic.
2018-02-16 04:39:43 +02:00
Eelco Dolstra 2c39e4eca0
Revert "Don't parse "x:x" as a URI"
This reverts commit f90f660b24.

This broke Hydra's release.nix, which contained

  preCheck = ''export LOGNAME=${LOGNAME:-foo}'';
2017-11-14 15:10:52 +01:00
Eelco Dolstra f90f660b24
Don't parse "x:x" as a URI
URIs now have to contain "://" or start with "channel:".
2017-10-30 17:58:01 +01:00
Jörg Thalheim 2fd8f8bb99 Replace Unicode quotes in user-facing strings by ASCII
Relevant RFC: NixOS/rfcs#4

$ ag -l | xargs sed -i -e "/\"/s/’/'/g;/\"/s/‘/'/g"
2017-07-30 12:32:45 +01:00
Guillaume Maudoux a143014d73 lexer: remove catch-all rules hiding real errors
With catch-all rules, we hide potential errors.
It turns out that a4744254 made one cath-all useless. Flex detected that
is was impossible to reach.
The other is more subtle, as it can only trigger on unfinished escapes
in unfinished strings, which only occurs at EOF.
2017-05-01 01:18:06 +02:00
Guillaume Maudoux a474425425 Fix lexer to support $' in multiline strings. 2017-05-01 01:15:40 +02:00
Eelco Dolstra 603f08506e
Tweak error message 2016-12-06 17:18:40 +01:00
Guillaume Maudoux e4b82af387 Improve error message on trailing path slashes 2016-11-27 17:48:46 +01:00
Guillaume Maudoux a5e761dddb Fix comments parsing
Fixed the parsing of multiline strings ending with an even number of
stars, like /** this **/.
Added test cases for comments.
2016-11-13 17:20:34 +01:00
Scott Olson 6498adb002 Throw a specific error for incomplete parse errors.
`nix-repl` will use this for deciding whether to keep waiting for input or
error out right away.
2016-02-24 04:32:21 -06:00
Eelco Dolstra b3e8d72770 Merge pull request #762 from ctheune/ctheune-floats
Implement floats
2016-02-12 12:49:59 +01:00
Eelco Dolstra 5d8b7eb3e1 Revert "Revert "next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751"""
This reverts commit b669d3d2e8.
2016-01-20 16:34:42 +01:00
Eelco Dolstra b669d3d2e8 Revert "next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751""
This reverts commit ed23c8568e. Let's
merge this *after* the 1.11.1 release.
2016-01-20 00:05:28 +01:00
Fabian Schmitthenner ed23c8568e next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751"
This reverts commit 8120b6fb8a and fixes the regression introduced in
8d22b26448.
2016-01-19 20:35:35 +00:00
Eelco Dolstra 8120b6fb8a Revert "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751"
This reverts commit 8d22b26448. It
breaks Nixpkgs:

$ nix-env -qa
error: syntax error, unexpected IND_STR, expecting '}', at /home/eelco/Dev/nixpkgs-stable/pkgs/top-level/python-packages.nix:7605:8
2016-01-19 20:33:32 +01:00
Fabian Schmitthenner 8d22b26448 don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751 2016-01-12 20:40:41 +00:00
Christian Theune a12a43046b Edge condition: parser did not pick up floats starting exactly with 0. 2016-01-05 09:54:49 +01:00
Christian Theune f872262e08 Fix up float parsing. 2016-01-05 09:46:37 +01:00
Christian Theune 494fc5acbb Try a simplified version of float lexing that didn't work.
The last one I tried was botchered anyway ...
2016-01-05 00:53:22 +01:00
Christian Theune 14ebde5289 First hit at providing support for floats in the language. 2016-01-05 00:40:40 +01:00
Guillaume Maudoux 467977f203 Fix the parsing of "$"'s in strings. 2015-07-03 14:09:58 +02:00
Guillaume Maudoux 65e4dcd69b Fix the hack that resets the scanner state. 2015-07-03 13:53:36 +02:00
Shea Levy e0953d53de Allow the leading component of a path to be a ~ 2015-02-19 08:05:16 -05:00
Eelco Dolstra 11849a320e Use proper quotes everywhere 2014-08-20 18:03:48 +02:00
Shea Levy f9913f4422 Allow "bare" dynamic attrs
Now, in addition to a."${b}".c, you can write a.${b}.c (applicable
wherever dynamic attributes are valid).

Signed-off-by: Shea Levy <shea@shealevy.com>
2014-01-14 14:00:15 +01:00
Eelco Dolstra 33972629d7 Fix whitespace 2013-09-02 16:29:15 +02:00
Eelco Dolstra d308aeaf53 Store Nix integers as longs
So on 64-bit systems, integers are now 64-bit.

Fixes #158.
2013-08-19 12:35:03 +02:00