Our journal code is generally supposed to be written in a fashion that
the underlying file can be deallocated any time, i.e. our mmap of it
suddenly becomes all zeroes. The idea is that we catch that when parsing
everything. For that to work safely we need to make sure that when doing
arithmetics or comparisons on values read from the map we don't run into
TTOCTTOU issues when determining validity. Hence we need to copy out the
values before use and operate on the copies. This requires some special
care since the C compiler could suppress our copies as optimization.
Hence use the new READ_NOW() macro to force a copy by using memcpy(),
and use it whenever we start doing an arithmetic operation on it, or
validity checking of multiple steps.
Fixes: #14943
Mappings canbe replaced by all zeroes under our feet if vacuuming
decides to unallocate some file. Hence let's not check for this kind of
stuff in an assert.
(Typically, we should genreate runtime errors in this case, in
particular EBADMSG, which the callers generally look for. But in this
case this is just an extra precaution check anyway, so let's just remove
it.)
journal_file_fstat() returns an error if we call it on already unlinked
journal file and hence we never reach remove_file_real() which is the
entire point.
I must have made some mistake while testing the fix that got me thinking
the issue is gone while opposite was true.
Fixes#14695
This regression was introduced in #14913.
The current_file variable can be NULL, as, for example, with the
following commands:
* journalctl --list-boots
* journalctl -b -1 --no-pager
Since current_file is only checked for pointer equality with f, removing
the assertion is safe here.
When having a service which intentionally outputs multiple equal lines,
all these messages might be inserted with the same timestamp.
journalctl has a mechanism to avoid duplicate lines, which might be in
different journal files.
This patch allows duplicate lines, if they are from the same file.
Found by inspecting results of running this small program:
int main(int argc, const char **argv) {
for (int i = 1; i < argc; i++) {
FILE *f;
char line[1024], prev[1024], *r;
int lineno;
prev[0] = '\0';
lineno = 1;
f = fopen(argv[i], "r");
if (!f)
exit(1);
do {
r = fgets(line, sizeof(line), f);
if (!r)
break;
if (strcmp(line, prev) == 0)
printf("%s:%d: error: dup %s", argv[i], lineno, line);
lineno++;
strcpy(prev, line);
} while (!feof(f));
fclose(f);
}
}
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.
I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
Now that we don't (mis-)use the env file parser to parse kernel command
lines there's no need anymore to override the used newline character
set. Let's hence drop the argument and just "\n\r" always. This nicely
simplifies our code.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Most our other parsing functions do this, let's do this here too,
internally we accept that anyway. Also, the closely related
load_env_file() and load_env_file_pairs() also do this, so let's be
systematic.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
Previously the compression threshold was hardcoded to 512, which meant that
smaller values wouldn't be compressed. This left some storage savings on the
table, so instead, we make that number tunable.
Let's add a common implementation for regular file checks, that are
careful to return the right error code (EISDIR/EISLNK/EBADFD) when we
are encountering a wrong file node.
Let's make sure we aren't confused if a journal file is replaced by a
different one (for example due to rotation) if we are in a q overflow:
let's compare the inode/device information, and if it changed replace
any open file object as needed.
Fixes: #8198
Let's be more careful with the naming, and indicate that the function
is about *named* journal files, and will validate the name as needed.
(in opposition to add_any_file() which doesn't care about names)
In that case we have no inotify fd yet, and there's nothing to process
hence. Let's make the call a NOP.
(Previously, without this change we'd end up trying to read off inotify
fd -1, which is quite a problem... 😢)
This adds proper handling of IN_Q_OVERFLOW: when the inotify queue runs
over we'll reiterate all directories we are looking at. At the same time
we'll mark all files and directories we encounter that way with a
generation counter we first increased. All files and directories not
marked like this are then unloaded.
With this logic we do the best when the inotify queue overflows: we
synchronize our in-memory state again with what's on disk.
This contains some refactoring of the directory logic, to share more
code between uuid directories and "root" directories and generally make
things a bit more readable by splitting things up into smaller bits.
See: #7998#8032
This changes real_journal_next() to leverage the IteratedCache for
accelerating iteration across the open journal files.
journalctl timing comparisons with 100 journal files of 8MiB size
party to this boot:
Pre (~v235):
# time ./journalctl -b --no-pager > /dev/null
real 0m9.613s
user 0m9.560s
sys 0m0.053s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.548s
user 0m9.525s
sys 0m0.023s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.612s
user 0m9.582s
sys 0m0.030s
Post-IteratedCache:
# time ./journalctl -b --no-pager > /dev/null
real 0m8.449s
user 0m8.425s
sys 0m0.024s
# time ./journalctl -b --no-pager > /dev/null
real 0m8.409s
user 0m8.382s
sys 0m0.027s
# time ./journalctl -b --no-pager > /dev/null
real 0m8.410s
user 0m8.350s
sys 0m0.061s
~12.5% improvement, the benefit increases the more log files there are.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)