Commit Graph

344 Commits

Author SHA1 Message Date
Vito Caputo 258190a0d5 mmap-cache: drop ret_size from mmap_cache_get()
The ret_size result is a bit of an awkward optimization that in a
sense enables bypassing the mmap-cache API, while encouraging
duplication of logic it already implements.

It's only utilized in one place; journal_file_move_to_object(),
apparently to avoid the overhead of remapping the whole object
again once its header, and thus its actual size, is known.

With mmap-cache's context cache, the overhead of simply
re-getting the object with the now known size should already be
negligible.  So it's not clear what benefit this brings, unless
avoiding some function calls that do very little in the hot
context-cache hit case is of such a priority.

There's value in having all object-sized gets pass through
mmap_cache_get(), as it provides a single entrypoint for
instrumentation in profiling/statistics gathering.  When
journal_file_move_to_object() bypasses getting the full object
size, you don't capture the full picture on the mmap-cache side
in terms of object sizes explicitly loaded from a journal file.

I'd like to see additional accounting in mmap_cache_get() in a
future commit, taking advantage of this change.
2020-12-13 11:14:43 +00:00
Vito Caputo 104fc4be11 mmap-cache: bind prot(ection) to MMapFileDescriptor
There are no mmap_cache_get() users that actually deviate prot
from the JournalFile's f->prot.

So there's no point in making this a separate parameter to
mmap_cache_get(), nor is there any need to store it in
JournalFile's f->prot.

Instead just pass it to mmap_cache_add_fd() at MMapFileDescriptor
creation, storing it in there for the mmap() callers, which
already receive MMapFileDescriptor *.

For functions receiving both an MMapFileDescriptor * and prot,
the prot argument has been simply removed and call sites updated.

Formalizing this fd:prot binding at the public API also enables
discarding the prot check in window_matches(), which is a hot
function on long window lists, so a minor CPU efficiency gain
should be had there as seen with the past removal of the fd
check.  Unnoticable for uncached journals, but maybe a little
runtime improvement when cached in specific circumstances.

window_matches_fd() has also been simplified to treat the
MMapFileDescrptor * as equivalent to its fd and prot.
2020-12-10 13:03:31 +01:00
Yu Watanabe db9ecf0501 license: LGPL-2.1+ -> LGPL-2.1-or-later 2020-11-09 13:23:58 +09:00
Lennart Poettering 39cf0351c5 tree-wide: make use of new relative time events in sd-event.h 2020-07-28 11:24:55 +02:00
Zbigniew Jędrzejewski-Szmek e9dd698407 tree-wide: fixes for assorted grammar and spelling issues
Fixes #16363. Also includes some changes where I generalized the pattern.
2020-07-06 11:29:05 +02:00
Lennart Poettering d80b051cea tree-wide: add new HAVE_COMPRESSION compile time flag
let's simplify the checks for ZSTD/LZ4/XZ

As suggested:

https://github.com/systemd/systemd/pull/16096#discussion_r440705585
2020-06-25 15:02:45 +02:00
Lennart Poettering 8653185a9e journal: support zstd compression for large objects in journal files 2020-06-25 15:02:18 +02:00
Lennart Poettering 0dbe57ee86 journal-file: when individual hash chains grow too large, rotate
Even with the new keyed hash table journal feature: if an attacker
manages to get access to the journal file id it could synthesize records
that result in hash collisions. Let's rotate automatically when we
notice that, so that a new journal file ID is generated, our performance
is restored and the attacker has to guess a new file ID before being
able to trigger the issue again.

That said, untrusted peers should never get access to journal files in
the first case...
2020-06-25 15:02:00 +02:00
Lennart Poettering 4ce534f4cd journal: use a different hash function for each journal file
This adds a new (incompatible) feature to journal files: if enabled the
hash function used for the hash tables is no longer jenkins hash with a
zero key, but siphash keyed by the file uuid that is included in the
file header anyway. This should make our hash tables more robust against
collision attacks, as long as the attacker has no read access to the
journal files. We switch from jenkins to siphash simply because it's
more well-known and we standardize for the rest of our codebase onto it.

This is hardening in order to make collision attacks harder for clients
that can forge log messages but have no read access to the logs. It has
no effect on clients that have read access.
2020-06-25 15:01:45 +02:00
Lennart Poettering 20b0acfacd journal: rename hash64() to jenkins_hash64()
Let's prefix this with "jenkins_" since it wraps the jenkins hash. We
want to add support for other hash functions to journald soon, hence
better be clear with what this is. In particular as all other symbols
defined by lookup3.h actually are prefixed "jenkins_".
2020-06-25 15:01:36 +02:00
Lennart Poettering f4474e004d journal-file: rename return parameters to ret_xyz
Let's clean this up a bit, following our usual nomenclature to name
return parameters ret-xyz.

This is mostly a bit of renaming, but there's also some minor other
changes: if we return a pointer to a mmap'ed object plus its offset, in
almost all cases we are happy if either parameter is NULL in case the
caller is not interested in it. Let's fix the remaining case to do this
too, to minimize surprises.
2020-06-25 15:01:22 +02:00
Lennart Poettering 5030c85a3e journal-file: also show field hash table size in debug output 2020-06-25 15:01:17 +02:00
Lennart Poettering e958c05703 journal-file: simplify boot ID acquiring 2020-06-25 15:01:12 +02:00
Lennart Poettering bfbd5be02a journal: no need to check offset twice, journal_file_move_to_object() does it again 2020-04-23 12:13:21 +02:00
Lennart Poettering 893e0f8fb6 journal: make sure to explicitly copy out values of mmap before doing arithmetics on them
Our journal code is generally supposed to be written in a fashion that
the underlying file can be deallocated any time, i.e. our mmap of it
suddenly becomes all zeroes. The idea is that we catch that when parsing
everything. For that to work safely we need to make sure that when doing
arithmetics or comparisons on values read from the map we don't run into
TTOCTTOU issues when determining validity. Hence we need to copy out the
values before use and operate on the copies. This requires some special
care since the C compiler could suppress our copies as optimization.
Hence use the new READ_NOW() macro to force a copy by using memcpy(),
and use it whenever we start doing an arithmetic operation on it, or
validity checking of multiple steps.

Fixes: #14943
2020-04-23 12:13:10 +02:00
Lennart Poettering 711398986e journal: several minor coding style fixes/clean-ups 2020-04-23 12:12:58 +02:00
Lennart Poettering 20ee282bb7 journal-file: avoid risky subtraction when validity checking object
The value might change beneath what we do, and hence let's avoid any
chance of underflow.
2020-04-23 12:12:17 +02:00
Michal Sekletár 28ca867abd sd-journal: close journal files that were deleted by journald before we've setup inotify watch
Fixes #14695
2020-02-05 18:34:52 +01:00
Lennart Poettering 5905d7cf5b tree-wide: use SD_ID128_STRING_MAX where appropriate 2019-12-10 11:56:18 +01:00
Vito Caputo a602d93e44 journal-file: delete some unnecessary braces
Trivial change, just something I noticed skimming the code.
2019-11-10 12:39:44 +01:00
Thibault Nélis 2c54acb112 journal: Consistently capitalize printed header entries
Per comments in https://github.com/systemd/systemd/pull/13808.
2019-10-22 10:32:09 +02:00
Zbigniew Jędrzejewski-Szmek 6aae0b1af4 journald: lower keep_free to 5% and raise min_use to 2%
https://bugzilla.redhat.com/show_bug.cgi?id=1715699

> /dev/mapper/live-rw  6.4G  5.7G  648M  91% /
> systemd-journald[905]: Fixed min_use=1.0M max_use=648.7M max_size=81.0M min_size=512.0K keep_free=973.1M n_max_files=100

When journald is started, we pick keep_free as 15% of the disk size. When the
fs is almost filled, we will only keep one journal file around and rotate very
often (because min_size is very small).

Let's set min use to something reasonable, so that we get more useful logs that
will cover at least the full boot.

Some cases considered in the PR:

> /dev/mapper/live-rw 6.4G 5.7G 648M 91% /

keep_free→MIN(327,100)→100 MB.
min_use→16MB.
effective range: 16 MB – 548 MB

> /dev/mapper/fedora_krowka-root 78G 69G 5.7G 93% /

keep_free → MIN(4GB, 100MB)→100MB
min_use→16MB
effective range: 16 MB – 5.6 GB
(but then there's the max_use limit, which cuts the range down)

> 4TB, 4GB free

keep_free → MIN(209715, 100) → 100 MB
min_use→16MB
effective range: 16 MB – 4.9 GB
(also effectively limited by max_use)

Also replace unneeded width suffixes with spaces, I think this is more
readable, and drop DEFAULT_ prefixes in cases where this setting is
simply a bound, and cannot be overridden by user config, hence is not
a default.
2019-07-26 16:45:49 +02:00
Zbigniew Jędrzejewski-Szmek 170a434c78 journal: emit debug log about settings only once (or when changed)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=902795
https://bugzilla.redhat.com/show_bug.cgi?id=1715699
report "thousands" of those messages. I think this occurs when journald
rotates files very quickly. Nevertheless, logging this over and over is not
useful, let's do it just once.
2019-07-19 18:11:16 +02:00
Yu Watanabe aa89266900 util: introduce format_bytes_full()
And move it into format-util.c.
2019-06-19 23:15:19 +09:00
Yu Watanabe c377a6f3ad journal: do not trigger assertion when journal_file_close() get NULL
We generally expect destructors to not complain if a NULL argument is passed.

Closes #12400.
2019-05-28 18:07:18 +09:00
Lennart Poettering 5087825ea7 journald: output a proper error message when the journal is used on fs that doesn't do mmap() properly
Prompted by:

https://lists.freedesktop.org/archives/systemd-devel/2019-May/042708.html
2019-05-22 18:56:24 +02:00
Zbigniew Jędrzejewski-Szmek 1d3fe304fd Use sd_event_source_disable_unref() 2019-05-10 16:55:37 +02:00
Chris Morin 924426a703 journal-remote: use source's boot-id
systemd-journal-remote always wrote the boot-id of the device it was running on
to the header of its journal files. When the source had a different boot-id
(because it was generated on a different boot, or a different device), the
boot-ids in the file were inconsistent. The _BOOT_ID field was that of the
source, but the journal file header and each entry object header were that of
the device systemd-journal-remote ran on. This breaks journalctl --list-boots
on any of these files.

Set the boot-id in the header to be that of the source. This also fixes the
entry object headers.
2019-04-02 10:32:21 +02:00
Chris Morin 08f9e80b3f journal-file: handle SIGBUS on offlining thread
The thread launched in journal_file_set_offline() accesses a memory
mapped file, so it needs to handle SIGBUS. Leave SIGBUS unblocked on the
offlining thread so that it uses the same handler as the main thread.

The result of triggering SIGBUS in a thread where it's blocked is
undefined in Linux. The tested implementations were observed to cause
the default handler to run, taking down the whole journald process.

We can leave SIGBUS unblocked in multiple threads since it's handler is
thread-safe. If SIGBUS is sent to the journald process asynchronously
(i.e. with kill, sigqueue, or raise), either thread handling it will
result in the same behavior: it will install the default handler and
reraise the signal, killing the process.

Fixes: #12042
2019-03-20 13:02:04 +01:00
Lennart Poettering 760877e90c util: split out sorting related calls to new sort-util.[ch] 2019-03-13 12:16:43 +01:00
Lennart Poettering 0a9707187b util: split out memcmp()/memset() related calls into memory-util.[ch]
Just some source rearranging.
2019-03-13 12:16:43 +01:00
Zbigniew Jędrzejewski-Szmek cd2a429ed7 tree-wide: use assert_se() for signal operations with constants
Continuation of a3ebe5eb620e49f0d24082876cafc7579261e64f:
in other places we sometimes use assert_se(), and sometimes normal error
handling. sigfillset and sigaddset can only fail if mask is NULL (which cannot
happen if we are passing in a reference), or if the signal number is invalid
(which really shouldn't happen when we are using a constant like SIGCHLD. If
SIGCHLD is invalid, we have a bigger problem). So let's simplify things and
always use assert_se() in those cases.

In sigset_add_many() we could conceivably pass an invalid signal, so let's keep
normal error handling here. The caller can do assert_se() around the
sigprocmask_many() call if appropriate.

'>= 0' is used for consistency with the rest of the codebase.
2018-12-21 19:49:28 +01:00
Zbigniew Jędrzejewski-Szmek baaa35ad70 coccinelle: make use of SYNTHETIC_ERRNO
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.

I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
2018-11-22 10:54:38 +01:00
Zbigniew Jędrzejewski-Szmek a03d43593c journal: fix sort order of header includes 2018-11-20 07:27:37 +01:00
Zbigniew Jędrzejewski-Szmek b6cdfbe5c4 journal: simplify use of sd_event_source_get_enabled() 2018-11-16 09:03:41 +01:00
Zbigniew Jędrzejewski-Szmek ca5d90d4d9 journal-file: get rid of a helper variable
It doesn't really save much in code length. Having the event source named
explicitly makes it easier to understand the code at a glance.
2018-11-16 09:03:41 +01:00
Evgeny Vereshchagin e0f768c356 journal: drop an unused variable clang is complaining about
../../src/systemd/src/journal/journal-file.c:3592:30: warning: unused variable 'p' [-Wunused-variable]
        _cleanup_free_ char *p = NULL;
                             ^
1 warning generated.

This is a follow-up to 6812765891.
2018-10-29 15:21:58 +00:00
Lennart Poettering 971b52c485 journal-file: structured initialization is your friend 2018-10-25 21:44:48 +02:00
Lennart Poettering 6812765891 journal-file: refactor journal_file_open_reliably()
Let's split out the part that actually renames the file in case we can't
open it into a new function journal_file_dispose().

This way we can reuse the function in other cases where we want to open
a file but can't.
2018-10-25 21:43:09 +02:00
Lennart Poettering 7a4d21ad20 journal-file: refactor journal_file_rotate()
Let's split the function in three: the part where we archive the old
file into journal_file_archive(), and the part where we initiate the
deferred closing into journal_file_initiate_close().
journal_file_rotate() then simply becomes a wrapper around these two
calls, and the opening of the new journal file.

This useful so that we can archive journal files without having to open
new ones, i.e. to do only the archival part of the rotation, without the
rotation part.
2018-10-25 21:43:09 +02:00
Yu Watanabe 90c88092e6 tree-wide: use CMP() macro where applicable
Follow-up for 6dd91b3682.
2018-10-16 19:55:38 +02:00
Lennart Poettering 6dd91b3682 tree-wide: CMP()ify all the things
Let's employ coccinelle to fix everything up automatically for us.
2018-10-16 17:45:53 +02:00
David Tardon c52368509f journal-file: avoid calling ftruncate with invalid fd
This can happen if journal_file_close is called from the failure
handling code of journal_file_open before f->fd was established.
2018-10-12 14:51:35 +02:00
Lennart Poettering db9a42545a chattr: optionally, return the old flags when updating them 2018-10-08 21:40:44 +02:00
Yu Watanabe 93bab28895 tree-wide: use typesafe_qsort() 2018-09-19 08:02:52 +09:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Lennart Poettering 0be9b12be2
Merge pull request #9147 from keszybz/runtime-enablement
Runtime enablement
2018-06-04 11:58:21 +02:00
Zbigniew Jędrzejewski-Szmek 846e541830 journal: small simplification 2018-05-31 20:42:04 +02:00
Zbigniew Jędrzejewski-Szmek d180c34998 journal: allow boot_id to be passed to journal_append_entry()
In this commit, this is done only in testing code, i.e. there is
no functional change apart from tests.
2018-05-31 14:30:23 +02:00