JSON strings must be utf-8-clean. We also verify this in json_parse_string()
so we would reject a message with invalid utf-8 anyway.
It would probably be slightly cheaper to detect non-conformaning strings in
serialization, but then we'd have to fail serialization. By doing this early,
we give the caller a chance to handle the error nicely.
The test is adjusted to contain a valid utf-8 string after decoding of the
utf-32 encoding in json ("विवेकख्यातिरविप्लवा हानोपायः।", something about the
cessation of ignorance).
Let's add a concept of normalization: as preparation for signing json
records let's add a mechanism to bring JSON records into a well-defined
order so that we can safely validate JSON records.
This adds two booleans to each JsonVariant object: "sorted" and
"normalized". The latter indicates whether a variant is fully sorted
(i.e. all keys of objects listed in alphabetical order) recursively down
the tree. The former is a weaker property: it only checks whether the
keys of the object itself are sorted. All variants which are
"normalized" are also "sorted", but not vice versa.
The knowledge of the "sorted" property is then used to optimize
searching for keys in the variant by using bisection.
Both properties are determined at the moment the variants are allocated.
Since our objects are immutable this is safe.
This will call json_variant_sensitive() internally while parsing for
each allocated sub-variant. This is better than calling it a posteriori
at the end, because partially parsed variants will always be properly
erased from memory this way.
Should finally fix oss-fuzz-14688.
8688c29b5a wasn't enough.
The buffer retrieved from memstream has the size that the same as the written
data. When we write do write(f, s, strlen(s)), then no terminating NUL is written,
and the buffer is not (necessarilly) a proper C string.
Fixes#11600.
The code was effectively doing:
json_build(..., ({ char **_x = ((char**) ((const char*[]) {"one", "two", "three", "four", NULL })); _x; }));
but there was no guarantee that the storage for the array that _x points to
survives pass the end of the block. Essentially, STRV_MAKE cannot be used
inline inside of a block like this.
Found by inspecting results of running this small program:
int main(int argc, const char **argv) {
for (int i = 1; i < argc; i++) {
FILE *f;
char line[1024], prev[1024], *r;
int lineno;
prev[0] = '\0';
lineno = 1;
f = fopen(argv[i], "r");
if (!f)
exit(1);
do {
r = fgets(line, sizeof(line), f);
if (!r)
break;
if (strcmp(line, prev) == 0)
printf("%s:%d: error: dup %s", argv[i], lineno, line);
lineno++;
strcpy(prev, line);
} while (!feof(f));
fclose(f);
}
}
The test fails under valgrind, so there was an exception for valgrind.
Unfortunately that check only works when valgrind-devel headers are
available during build. But it is possible to have just valgrind installed,
or simply install it after the build, and then "valgrind test-json" would
fail.
It also seems that even without valgrind, this fails on some arm32 CPUs.
Let's do the usual-style test for absolute and relative differences.
Quite often when we generate objects some fields should only be
generated in some conditions. Let's add high-level support for that.
Matching the existing JSON_BUILD_PAIR() this adds
JSON_BUILD_PAIR_CONDITIONAL() which is very similar, but takes an
additional parameter: a boolean condition. If "true" this acts like
JSON_BUILD_PAIR(), but if false then the whole pair is suppressed.
This sounds simply, but requires a tiny bit of complexity: when complex
sub-variants are used in fields, then we also need to suppress them.
This is a nice little optimization when using static const strings: we
can now use them directly as JsonVariant objecs, without any additional
allocation.
Simply as a safety precaution so that json objects we read are not
arbitrary amounts deep, so that code that processes json objects
recursively can't be easily exploited (by hitting stack limits).
Follow-up for oss-fuzz#10908
(Nice is that we can accomodate for this counter without increasing the
size of the JsonVariant object.)
This was used by the dkr logic, which is gone now, hence remove this too.
Should we need it one day again the git history never forgets...
Note that this only covers the JSON parser. The JSON generator used by
"journalctl -o json" remains, as its much much simpler and requires no
infrastructure except printf() and the most basic escaping.
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
We originally only supported escaping ucs2 encoded characters (as \uxxxx). This
only covers the BMP. Support escaping also utf16 surrogate pairs (on the form
\uxxxx\uyyyy) to cover all of unicode.