The ret_size result is a bit of an awkward optimization that in a
sense enables bypassing the mmap-cache API, while encouraging
duplication of logic it already implements.
It's only utilized in one place; journal_file_move_to_object(),
apparently to avoid the overhead of remapping the whole object
again once its header, and thus its actual size, is known.
With mmap-cache's context cache, the overhead of simply
re-getting the object with the now known size should already be
negligible. So it's not clear what benefit this brings, unless
avoiding some function calls that do very little in the hot
context-cache hit case is of such a priority.
There's value in having all object-sized gets pass through
mmap_cache_get(), as it provides a single entrypoint for
instrumentation in profiling/statistics gathering. When
journal_file_move_to_object() bypasses getting the full object
size, you don't capture the full picture on the mmap-cache side
in terms of object sizes explicitly loaded from a journal file.
I'd like to see additional accounting in mmap_cache_get() in a
future commit, taking advantage of this change.
There are no mmap_cache_get() users that actually deviate prot
from the JournalFile's f->prot.
So there's no point in making this a separate parameter to
mmap_cache_get(), nor is there any need to store it in
JournalFile's f->prot.
Instead just pass it to mmap_cache_add_fd() at MMapFileDescriptor
creation, storing it in there for the mmap() callers, which
already receive MMapFileDescriptor *.
For functions receiving both an MMapFileDescriptor * and prot,
the prot argument has been simply removed and call sites updated.
Formalizing this fd:prot binding at the public API also enables
discarding the prot check in window_matches(), which is a hot
function on long window lists, so a minor CPU efficiency gain
should be had there as seen with the past removal of the fd
check. Unnoticable for uncached journals, but maybe a little
runtime improvement when cached in specific circumstances.
window_matches_fd() has also been simplified to treat the
MMapFileDescrptor * as equivalent to its fd and prot.
Account and log these statistics separately since their overheads
are potentially quite different when the window lists are large.
There should probably be a histogram of window list traversal
counts too.
In preparation for logging more mmap-cache statistics get rid of this
piecemeal stats accessor api and just have a debug log output function
for producing the stats.
Updates the one call site using these accessors, moving what that site
did into the new log function. So the output is unchanged for now,
just a trivial refactor.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
Introduces window_matches_fd() for the fd matching case in try_context(),
In find_mmap() we're already walking a list of windows by fd, checking
this is pointless work in a potentially hot loop with many windows.
Instead of context_detach_window() and a manual attach of the new
window, simply call context_attach_window() which performs the
detach first if appropriate.
If requested, return the actual mapping size to the caller in
addition to the address.
journal_file_move_to_object() often performs two successive
mmap_cache_get() calls via journal_file_move_to(); one to get the
object header, then another to get the entire object when it's
larger than the header's size.
If mmap_cache_get() returned the actual mapping's size, it's
probable that the second mmap_cache_get() could be skipped when
the established mapping already encompassed the desired size.
This way we have a MMapFileDescriptor reference external to the cache,
and can supply the handle directly to mmap_cache_get(), eliminating
hashmap lookups entirely from the hot path.
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
Let's use bitfields for our booleans, and don't try to apply binary OR or addition on them, because that's weird and we
should instead use logical OR only.
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
Even though we use fallocate() it appears that file systems like btrfs
will trigger SIGBUS on certain low-disk-space situation. We should
handle that, hence catch the signal, add it to a list of invalidated
pages, and replace the page with an empty memory area. After each write
check if SIGBUS was triggered, and consider the write invalid if it was.
This should make journald a lot more robust with file systems where
fallocate() is not reliable, for example all CoW file systems
(btrfs...), where changing written data can fail with disk full errors.
https://bugzilla.redhat.com/show_bug.cgi?id=1045810
try_context() is such a hot path that the hashmap lookup is expensive.
The number of contexts is small - it is the number of object types.
Using a hashmap is overkill. A plain array will do.
Before:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m9.445s
user 0m9.228s
sys 0m0.213s
After:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m5.438s
user 0m5.266s
sys 0m0.170s
The only user is sd_journal_enumerate_unique() and, as explained in
the previous commit (fed67c38e3 "journal: map objects to context set by
caller, not by actual object type"), the use of them there is now
superfluous. Let's remove them.
This reverts major parts of commits:
ae97089d49 journal: fix access to munmapped memory in
sd_journal_enumerate_unique
06cc69d44c sd-journal: fix sd_journal_enumerate_unique skipping values
Tested with an "--enable-debug" build and "journalctl --list-boots".
It gives the expected number of results. Additionally, if I then revert
the previous commit ("journal: map objects to context set by caller, not
to actual object type"), it crashes with SIGSEGV, as expected.
This is useful for exposing unsafe access to mmapped objects after
the context that they were mapped in was already moved.
For example:
journal_file_move_to_object(f1, OBJECT_DATA, p1, &o1);
journal_file_move_to_object(f2, OBJECT_DATA, p2, &o2);
t = o1->object.type; /* this usually works, but is unsafe */
sd_journal_enumerate_unique will lock its mmap window to prevent it
from being released by calling mmap_cache_get with keep_always=true.
This call may return windows that are wider, but compatible with the
parameters provided to it.
This can result in a mismatch where the window to be released cannot
properly be selected, because we have more than one window matching the
parameters of mmap_cache_release. Therefore, introduce a release_cookie
to be used when releasing the window.
https://bugs.freedesktop.org/show_bug.cgi?id=79380
It is redundant to store 'hash' and 'compare' function pointers in
struct Hashmap separately. The functions always comprise a pair.
Store a single pointer to struct hash_ops instead.
systemd keeps hundreds of hashmaps, so this saves a little bit of
memory.
After a section of memory is succesfully allocated, some of the following
actions can still fail due to lack of memory. In this case -ENOMEM is
returned without actually freeing the already mapped memory.
Found with coverity. Fixes: CID#1237762
sd_j_e_u needs to keep a reference to an object while comparing it
with possibly duplicate objects in other files. Because the size of
mmap cache is limited, with enough files and object to compare to,
at some point the object being compared would be munmapped, resulting
in a segmentation fault.
Fix this issue by turning keep_always into a reference count that can
be increased and decreased. Other callers which set keep_always=true
are unmodified: their references are never released but are ignored
when the whole file is closed, which happens at some point. keep_always
is increased in sd_j_e_u and later on released.
hashmap_free() wasn't being called on m->contexts and m->fds resulting
in a leak.
To reproduce do:
while(1) {
sd_journal_open(&j, SD_JOURNAL_LOCAL_ONLY);
sd_journal_close(j);
}
Memory usage will increase until OOM.
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.