Commit Graph

104 Commits

Author SHA1 Message Date
Vito Caputo 104fc4be11 mmap-cache: bind prot(ection) to MMapFileDescriptor
There are no mmap_cache_get() users that actually deviate prot
from the JournalFile's f->prot.

So there's no point in making this a separate parameter to
mmap_cache_get(), nor is there any need to store it in
JournalFile's f->prot.

Instead just pass it to mmap_cache_add_fd() at MMapFileDescriptor
creation, storing it in there for the mmap() callers, which
already receive MMapFileDescriptor *.

For functions receiving both an MMapFileDescriptor * and prot,
the prot argument has been simply removed and call sites updated.

Formalizing this fd:prot binding at the public API also enables
discarding the prot check in window_matches(), which is a hot
function on long window lists, so a minor CPU efficiency gain
should be had there as seen with the past removal of the fd
check.  Unnoticable for uncached journals, but maybe a little
runtime improvement when cached in specific circumstances.

window_matches_fd() has also been simplified to treat the
MMapFileDescrptor * as equivalent to its fd and prot.
2020-12-10 13:03:31 +01:00
Yu Watanabe db9ecf0501 license: LGPL-2.1+ -> LGPL-2.1-or-later 2020-11-09 13:23:58 +09:00
Lennart Poettering d80b051cea tree-wide: add new HAVE_COMPRESSION compile time flag
let's simplify the checks for ZSTD/LZ4/XZ

As suggested:

https://github.com/systemd/systemd/pull/16096#discussion_r440705585
2020-06-25 15:02:45 +02:00
Lennart Poettering 8653185a9e journal: support zstd compression for large objects in journal files 2020-06-25 15:02:18 +02:00
Lennart Poettering 4ce534f4cd journal: use a different hash function for each journal file
This adds a new (incompatible) feature to journal files: if enabled the
hash function used for the hash tables is no longer jenkins hash with a
zero key, but siphash keyed by the file uuid that is included in the
file header anyway. This should make our hash tables more robust against
collision attacks, as long as the attacker has no read access to the
journal files. We switch from jenkins to siphash simply because it's
more well-known and we standardize for the rest of our codebase onto it.

This is hardening in order to make collision attacks harder for clients
that can forge log messages but have no read access to the logs. It has
no effect on clients that have read access.
2020-06-25 15:01:45 +02:00
Lennart Poettering a76560915f journal-file: use FLAGS_SET where appropriate 2020-06-25 15:00:44 +02:00
Michal Sekletár 28ca867abd sd-journal: close journal files that were deleted by journald before we've setup inotify watch
Fixes #14695
2020-02-05 18:34:52 +01:00
Yu Watanabe 627df1dc42 journal: use cleanup attribute at one more place 2019-05-28 18:07:18 +09:00
Zbigniew Jędrzejewski-Szmek f2dc22b447 headers: add missing includes
Fixes #12125.
2019-03-28 19:59:56 +01:00
Zbigniew Jędrzejewski-Szmek ca78ad1de9 headers: remove unneeded includes from util.h
This means we need to include many more headers in various files that simply
included util.h before, but it seems cleaner to do it this way.
2019-03-27 11:53:12 +01:00
Zbigniew Jędrzejewski-Szmek a03d43593c journal: fix sort order of header includes 2018-11-20 07:27:37 +01:00
Lennart Poettering 6812765891 journal-file: refactor journal_file_open_reliably()
Let's split out the part that actually renames the file in case we can't
open it into a new function journal_file_dispose().

This way we can reuse the function in other cases where we want to open
a file but can't.
2018-10-25 21:43:09 +02:00
Lennart Poettering 7a4d21ad20 journal-file: refactor journal_file_rotate()
Let's split the function in three: the part where we archive the old
file into journal_file_archive(), and the part where we initiate the
deferred closing into journal_file_initiate_close().
journal_file_rotate() then simply becomes a wrapper around these two
calls, and the opening of the new journal file.

This useful so that we can archive journal files without having to open
new ones, i.e. to do only the archival part of the rotation, without the
rotation part.
2018-10-25 21:43:09 +02:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Zbigniew Jędrzejewski-Szmek d180c34998 journal: allow boot_id to be passed to journal_append_entry()
In this commit, this is done only in testing code, i.e. there is
no functional change apart from tests.
2018-05-31 14:30:23 +02:00
Zbigniew Jędrzejewski-Szmek 5a271b08b3 journal: remove unused args from journal_file_copy_entry() 2018-05-31 14:30:23 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Lennart Poettering d9a43665eb
Merge pull request #8313 from alexgartrell/compression-threshold
Compression threshold
2018-03-21 12:37:54 +01:00
Lennart Poettering ffe535e43e journal-file: drop unused tail_entry_monotonic_valid field.
As pointed out by Matthijs van Duin:

https://lists.freedesktop.org/archives/systemd-devel/2018-March/040499.html
2018-03-20 23:31:11 +01:00
Alex Gartrell 57850536d5 journal: provide compress_threshold_bytes parameter
Previously the compression threshold was hardcoded to 512, which meant that
smaller values wouldn't be compressed. This left some storage savings on the
table, so instead, we make that number tunable.
2018-03-20 11:48:52 -07:00
Lennart Poettering 858749f731 sd-journal: properly handle inotify queue overflow
This adds proper handling of IN_Q_OVERFLOW: when the inotify queue runs
over we'll reiterate all directories we are looking at. At the same time
we'll mark all files and directories we encounter that way with a
generation counter we first increased. All files and directories not
marked like this are then unloaded.

With this logic we do the best when the inotify queue overflows: we
synchronize our in-memory state again with what's on disk.

This contains some refactoring of the directory logic, to share more
code between uuid directories and "root" directories and generally make
things a bit more readable by splitting things up into smaller bits.

See: #7998 #8032
2018-02-12 11:07:55 +01:00
Zbigniew Jędrzejewski-Szmek f916819053 journal: use new helpers with journal_file_close
journal_file_close_set() is not necessary anymore.
2017-11-28 21:34:50 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek 349cc4a507 build-sys: use #if Y instead of #ifdef Y everywhere
The advantage is that is the name is mispellt, cpp will warn us.

$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build

squash! build-sys: use #if Y instead of #ifdef Y everywhere

v2:
- fix incorrect setting of HAVE_LIBIDN2
2017-10-04 12:09:29 +02:00
Vito Caputo be7cdd8ec9 journal: explicitly add fds to mmap-cache (#6307)
This way we have a MMapFileDescriptor reference external to the cache,
and can supply the handle directly to mmap_cache_get(), eliminating
hashmap lookups entirely from the hot path.
2017-07-10 19:24:56 -04:00
Vito Caputo 8eb851711f journal: set STATE_ARCHIVED as part of offlining (#2740)
The only code path which makes a journal durable is via
journal_file_set_offline().

When we perform a rotate the journal's header->state is being set to
STATE_ARCHIVED prior to journal_file_set_offline() being called.

In journal_file_set_offline(), we short-circuit the entire offline when
f->header->state != STATE_ONLINE.

This all results in none of the journal_file_set_offline() fsync() calls
being reached when rotate archives a journal, so archived journals are
never explicitly made durable.

What we do now is instead of setting the f->header->state to
STATE_ARCHIVED directly in journal_file_rotate() prior to
journal_file_close(), we set an archive flag in f->archive for the
journal_file_set_offline() machinery to honor by committing
STATE_ARCHIVED instead of STATE_OFFLINE when set.

Prior to this, rotated journals were never getting fsync() explicitly
performed on them, since journal_file_set_offline() short-circuited.
Obviously this is undesirable, and depends entirely on the underlying
filesystem as to how much durability was achieved when simply closing
the file.

Note that this problem existed prior to the recent asynchronous fsync
changes, but those changes do facilitate our performing this durable
offline on rotate without blocking, regardless of the underlying
filesystem sync-on-close semantics.
2016-04-27 08:29:43 +02:00
Lennart Poettering 5d1ce25728 sd-journal: add API for opening journal files or directories by fd
Also, expose this via the "journalctl --file=-" syntax for STDIN. This feature
remains undocumented though, as it is probably not too useful in real-life as
this still requires fds that support mmaping and seeking, i.e. does not work
for pipes, for which reading from STDIN is most commonly used.
2016-04-25 15:24:46 +02:00
Vito Caputo b58c888f30 journal: defer journal closes on rotate
When we rotate journals, we must set offline and close the current one,
but don't generally need to wait for this to complete.

Instead, we'll initiate an asynchronous offline via
journal_file_set_offline(oldfile, false), and add the file to a
per-server set of deferred closes to be closed later when they
won't block.

There's one complication however; journal_file_open() via
journal_file_verify_header() assumes that any writable journal in the
online state is the product of an unclean shutdown or other form of
corruption.

Thus there's a need for journal_file_open() to be aware of deferred
closes and synchronize with their completion when opening preexisting
journals for writing.  To facilitate this the deferred closes set is
supplied to the journal_file_open() function where the deferred closes
may be closed synchronously before verifying the header in such
circumstances.
2016-02-19 18:50:20 -08:00
Vito Caputo ac2e41f510 journal: asynchronous journal_file_set_offline()
This adds a wait flag to journal_file_set_offline(), when false the offline is
performed asynchronously in a separate thread.

When wait is true, if an asynchronous offline is already in-progress it is
restarted and waited for.  Otherwise the offline is performed synchronously
without the use of a thread.

journal_file_set_online() cancels or waits for the asynchronous offline to
complete if in-flight, depending on where in the offline process the thread
happens to be.  If the thread is in the fsync() phase, it is cancelled and
waiting is unnecessary.  Otherwise, the thread is joined before proceeding.

A new offline_state member is added to JournalFile which is used via
atomic operations for communicating between the offline thread and the
journal_file_set_{offline,online}() functions.
2016-02-19 18:50:20 -08:00
Daniel Mack b26fa1a2fb tree-wide: remove Emacs lines from all files
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
2016-02-10 13:41:57 +01:00
Vito Caputo 7a24f3bf2f journal: coalesce ftruncate()s in 250ms windows
Prior to this change every journal append causes an ftruncate() for the
sake of inotify propagation of the mmap-based writes.

With this change the notification is deferred up to ~250ms, coalescing
any repeated journal writes during the deferred period into a single
ftruncate().  The ftruncate() call isn't free and doing it on every
append adds unnecessary overhead and latency in the journald event loop.

Introduces journal_file_enable_post_change_timer() which manages a
timer on the provided sd-event instance for scheduling coalesced
ftruncates.  The ftruncate() behavior is unchanged unless
journal_file_enable_post_change_timer() is called on the JournalFile.

While not a tremendous improvement, profiling systemd-journald event loop
latencies using instrumentation as introduced by 34b8751 it was observed that
coalescing the ftruncates was low-hanging fruit worth pursuing.

Note orders 12 and 13 shifting left into order 11 and order 6 dipping into
order 5:

Unmodified:
     log2(us)   1 2 3  4 5  6   7   8  9   10 11   12   13 14 15 16 17 18 19
                -----------------------------------------------------------
[10685.414572]  0 0 0  0 38 602 61  2  290 60 1643 2554 13 1  4  1  0  0  1
[10690.415114]  0 0 0  0 0  646 54  7  309 44 2073 2148 17 1  3  0  0  0  1
[10695.415509]  0 0 0  0 1  650 73  3  324 37 2071 2270 9  0  0  1  0  1  0
[10700.416297]  0 0 0  0 0  659 50  4  318 38 2111 2152 6  0  1  0  0  1  1
[10705.417136]  0 0 0  0 2  660 48  4  320 38 2129 2146 12 1  1  0  0  1  1
[10710.489114]  0 0 0  0 0  673 38  3  321 37 1925 2339 7  0  0  0  0  1  1
[10715.489613]  0 0 0  0 3  656 64  8  317 48 2365 2007 7  0  0  0  0  0  1

Coalesced:
     log2(us)   1 2 3  4 5  6   7   8  9   10 11   12   13 14 15 16 17 18 19
                -----------------------------------------------------------
[ 6169.161360]  0 0 0  1 24 786 54  11 389 24 4192 771  6  4  0  0  1  0  1
[ 6174.161705]  0 0 0  1 18 800 35  6  380 27 3977 893  3  1  0  0  1  0  1
[ 6179.162741]  0 0 0  1 28 768 51  4  391 16 3998 831  5  3  0  0  0  0  2
[ 6184.162856]  0 0 0  0 19 770 60  2  376 26 3795 1004 9  5  1  0  1  0  1
[ 6189.163279]  0 0 0  0 28 761 49  7  372 27 3729 1056 3  2  0  0  1  0  1
[ 6194.164255]  0 0 0  0 25 785 49  7  394 19 3996 908  6  3  2  0  0  0  1
[ 6199.164658]  0 0 0  0 29 797 35  5  389 18 3995 898  3  4  1  1  1  0  1

The remaining high-order delays are a result of the synchronous fsyncs in
systemd-journald, beyond the scope of this commit.
2016-01-14 16:36:07 -08:00
Thomas Hindoe Paaboel Andersen 71d35b6b55 tree-wide: sort includes in *.h
This is a continuation of the previous include sort patch, which
only sorted for .c files.
2015-11-18 23:09:02 +01:00
Lennart Poettering d1afbcd221 journal: fix error handling when compressing journal objects
Let's make sure we handle compression errors properly, and don't
misunderstand an error for success.

Also, let's actually compress things if lz4 is enabled.

Fixes #1662.
2015-10-24 13:19:42 +02:00
Lennart Poettering 8580d1f73d journal: rework vacuuming logic
Implement a maximum limit on number of journal files to keep around.
Enforcing a limit is useful on this since our performance when viewing
pays a heavy penalty for each journal file to interleve. This setting is
turned on now by default, and set to 100.

Also, actully implement what 348ced9097
promised: use whatever we find on disk at startup as lower bound on how
much disk space we can use. That commit introduced some provisions to
implement this, but actually never did.

This also adds "journalctl --vacuum-files=" to vacuum files on disk by
their number explicitly.
2015-10-02 23:21:59 +02:00
Lennart Poettering 804ae586d4 journal: make journal_file_close() return NULL
The way it is customary everywhere else in our sources.
2015-10-02 22:36:33 +02:00
Lennart Poettering dade37d403 journal: avoid mapping empty data and field hash tables
When a new journal file is created we write the header first, then sync
and only then create the data and field hash tables in them. That means
to other processes it might appear that the files have a valid header
but not data and field hash tables. Our reader code should be able to
deal with this.

With this change we'll not map the two hash tables right-away after
opening a file for reading anymore (because that will of course fail if
the objects are missing), but delay this until the first time we access
them. On top of that, when we want to look something up in the hash
tables and we notice they aren't initialized yet, we consider them
empty.

This improves handling of some journal files reported in #487.
2015-07-24 01:55:45 +02:00
Michal Schmidt 950c07d421 journal: make skipping of exhausted journal files effective again
Commit 668c965af "journal: skipping of exhausted journal files is bad if
direction changed" fixed a correctness issue, but it also significantly
limited the cases where the optimization that skips exhausted journal
files could apply.
As a result, some journalctl queries are much slower in v219 than in v218.
(e.g. queries where a "--since" cutoff should have quickly eliminated
older journal files from consideration, but didn't.)

If already in the initial iteration find_location_with_matches() finds
no entry, the journal file's location is not updated. This is fine,
except that:
 - We must update at least f->last_direction. The optimization relies on
   it. Let's separate that from journal_file_save_location() and update
   it immediately after the direction checks.
 - The optimization was conditional on "f->current_offset > 0", but it
   would always be 0 in this scenario. This check is unnecessary for the
   optimization.
2015-02-25 17:32:27 +01:00
Zbigniew Jędrzejewski-Szmek a2341f6836 Move DEFINE_TRIVIAL_CLEANUP_FUNC to macro.h
This remove the need for various header files to include the
(relatively heavyweight) util.h.
2015-01-18 19:06:48 -05:00
Lennart Poettering f27a386430 journald: whenever we rotate a file, btrfs defrag it
Our write pattern is quite awful for CoW file systems (btrfs...), as we
keep updating file parts in the beginning of the file. This results in
fragmented journal files. Hence: when rotating files, defragment them,
since at that point we know that no further write accesses will be made.
2015-01-06 20:31:40 +01:00
Lennart Poettering 2678031a17 journald: when we detect the journal file we are about to write to has been deleted, rotate
https://bugzilla.redhat.com/show_bug.cgi?id=1171719
2015-01-05 02:57:36 +01:00
Lennart Poettering fa6ac76083 journald: process SIGBUS for the memory maps we set up
Even though we use fallocate() it appears that file systems like btrfs
will trigger SIGBUS on certain low-disk-space situation. We should
handle that, hence catch the signal, add it to a list of invalidated
pages, and replace the page with an empty memory area. After each write
check if SIGBUS was triggered, and consider the write invalid if it was.

This should make journald a lot more robust with file systems where
fallocate() is not reliable, for example all CoW file systems
(btrfs...), where changing written data can fail with disk full errors.

https://bugzilla.redhat.com/show_bug.cgi?id=1045810
2015-01-05 01:40:51 +01:00
Michal Schmidt f534928ad7 journal: journal_file_next_entry() does not need pointer to current Object
The current offset is sufficient information.
2014-12-18 14:41:22 +01:00
Michal Schmidt 6e693b42dc journal: optimize iteration by skipping exhausted files
If from a previous iteration we know we are at the end of a journal
file, don't bother looking into the file again. This is complicated by
the fact that the EOF does not have to be permanent (think of
"journalctl -f"). So we also check if the number of entries in the
journal file changed.

This optimization has a similar effect as "journal: optimize iteration:
skip whole files behind current location" had.
2014-12-18 14:29:46 +01:00
Michal Schmidt d8ae66d7fa journal: compare candidate entries using JournalFiles' locations
When comparing the locations of candidate entries, we can rely on the
location information stored in struct JournalFile.
2014-12-18 12:26:00 +01:00
Michal Schmidt 6573ef05a3 journal: keep per-JournalFile location info during iteration
In next_beyond_location() when we find a candidate entry in a journal
file, save its location information in struct JournalFile.

The purpose of remembering the locations of candidate entries is to be
able to save work in the next iteration. This patch does only the
remembering part.

LOCATION_SEEK means the location identifies a candidate entry.
When a winner is picked from among candidates, it becomes
LOCATION_DISCRETE.
LOCATION_TAIL here signifies we've iterated the file to the end (or the
beginning in the case of reversed direction).
2014-12-18 12:17:20 +01:00
Michal Schmidt 1fc605b0e1 journal: abstract the resetting of JournalFile's location 2014-12-18 11:56:19 +01:00
Michal Schmidt 99cc7653a8 journal: move definition of LocationType to journal-file.h
In preparation for individual JournalFiles maintaining a location
of their own.
2014-12-18 11:53:39 +01:00
Michal Schmidt 14499361a5 journal: delete unused function journal_file_skip_entry()
Its only caller is a test.
2014-12-18 11:53:08 +01:00
Michal Schmidt ae2adbcd09 journal: delete unused function journal_file_move_to_entry_by_offset() 2014-12-18 11:47:13 +01:00