The commit 6f3ac0d517 drops the prefix and
suffix in TAGS= property. But there exists several rules that have like
`TAGS=="*:tag:*"`. So, the property must be always prefixed and suffixed
with ":".
Fixes#17930.
if we allocate a bunch of hash tables all at the same time, with none
earlier than the other, there's a good chance we'll initialize the
shared hash key multiple times, so that some threads will see a
different shared hash key than others.
Let's fix that, and make sure really everyone sees the same hash key.
Fixes: #17007
I think this is nicer in general, and here in particular we have a lot
of code like:
static inline IteratedCache* hashmap_iterated_cache_new(Hashmap *h) {
return (IteratedCache*) _hashmap_iterated_cache_new(HASHMAP_BASE(h));
}
and it's visually appealing to use the same whitespace in the function
signature and the cast in the body of the function.
The compiler would do this to, esp. with LTO, but we can short-circuit the
whole process and make everything a bit simpler by avoiding the separate
definition.
(It would be nice to do the same for _set_new(), _set_ensure_allocated()
and other similar functions which are one-line trivial wrappers too. Unfortunately
that would require enum HashmapType to be made public, which we don't want
to do.)
Also use double space before the tracking args at the end. Without
the comma this looks ugly, but it's a bit better with the double space.
At least it doesn't look like a variable with a type.
This combines set_ensure_allocated() with set_consume(). The cool thing is that
because we know the hash ops, we can correctly free the item if appropriate.
Similarly to set_consume(), the goal is to simplify handling of the case where
the item needs to be freed on error and if already present in the set.
It's such a common operation to allocate the set and put an item in it,
that it deserves a helper. set_ensure_put() has the same return values
as set_put().
Comes with tests!
"internal" is a lot of characters. Let's take a leaf out of the Python's book
and simply use _ to mean private. Much less verbose, but the meaning is just as
clear, or even more.
If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.
hashmap_put_strdup() was already doing something similar.
(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)
No functional change intended.
So far, we'd use hashmap_free_free to free both keys and values along with
the hashmap. I think it's better to make this more encapsulated: in this variant
the way contents are freed can be decided when the hashmap is created, and
users of the hashmap can always use hashmap_free.
Using C11 thread-local storage in destructors causes uninitialized
read. Let's avoid that using a direct comparison instead of using
the cached values. As this code path is taken only when compiled
with -DVALGRIND=1, the performance cost shouldn't matter too much.
Fixes#12814
internal_hashmap_first_key_and_value() returns the first value, or %NULL
if the hashmap is empty.
However, hashmaps may contain %NULL values. That means, a caller getting
%NULL doesn't know whether the hashmap is empty or whether the first
value is %NULL.
For example, a caller may be tempted to do something like:
if ((val = hashmap_steal_first_key_and_value (h, (void **) key))) {
// process first entry.
}
But this is only correct if the caller made sure that the hash is either
not empty or contains no NULL values.
Anyway, since a %NULL return value can signal an empty hash or a %NULL
value, it seems error prone to leave the key output argument
uninitialized in situations that the caller cannot clearly distinguish
(without making additional assumptions).
GCC 8.2 with LTO and -O2 emits a false warning:
src/basic/hashmap.c: In function 'internal_hashmap_free.constprop':
src/basic/hashmap.c:898:33: error: 'k' may be used uninitialized in this function [-Werror=maybe-uninitialized]
free_key(k);
^
Avoid it by initializing the variable.
Let's first remove an item from the hashmap and only then destroy it.
This makes sure that destructor functions can mdoify the hashtables in
their own codee and we won't be confused by that.
When clients don't follow protocol and use the same object from
different threads, then we previously would silently corrupt memory.
With this assert we'll fail with an assert(). This doesn't fix anything
but certainly makes mis-uses easier to detect and debug.
Triggered by https://bugzilla.redhat.com/show_bug.cgi?id=1609349
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Configuration through environment variable is inconvenient with meson, because
they cannot be convieniently changed and/or are not preserved during
reconfiguration (https://github.com/mesonbuild/meson/issues/1503).
This adds -Dvalgrind=true/false, which has the advantage that it can be set
at any time with meson configure -Dvalgrind=... and ninja will rebuild targets
as necessary. Additional minor advantages are better consistency with the
options for hashmap debugging, and typo avoidance with '#if' instead of '#ifdef'.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
gcc says:
[196/1142] Compiling C object 'src/basic/basic@sta/hashmap.c.o'.
../src/basic/hashmap.c: In function ‘cachemem_maintain’:
../src/basic/hashmap.c:1913:17: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
mem->active = r = true;
^~~
which conflates two things: the first is transitive assignent a = b = c = d;
the second is assignment of the value of an expression, which happens to be a
an assignment expression here, and boolean. While the second _should_ be
parenthesized, the first should _not_, and it's more natural to understand
our code as the first, and gcc should treat this as an exception and not emit
the warning. But since it's a while until this will be fixed, let's update
our code too.
Adds the basics of the IteratedCache and constructor support for the
Hashmap and OrderedHashmap types.
iterated_cache_get() is responsible for synchronizing the cache with
the associated Hashmap and making it available to the caller at the
supplied result pointers. Since iterated_cache_get() may need to
allocate memory, it may fail, so callers must check the return value.
On success, pointer arrays containing pointers to the associated
Hashmap's keys and values, in as-iterated order, are returned in
res_keys and res_values, respectively. Either may be supplied as NULL
to inhibit caching of the keys or values, respectively.
Note that if the cached Hashmap hasn't changed since the previous call
to iterated_cache_get(), and it's not a call activating caching of the
values or keys, the cost is effectively zero as the resulting pointers
will simply refer to the previously returned arrays as-is.
A cleanup function has also been added, iterated_cache_free().
This only frees the IteratedCache container and related arrays. The
associated Hashmap, its keys, and values are not affected. Also note
that the associated Hashmap does not automatically free its associated
IteratedCache when freed.
One could, in theory, safely access the arrays returned by a
successful iterated_cache_get() call after its associated Hashmap has
been freed, including the referenced values and keys. Provided the
iterated_cache_get() was performed prior to the hashmap free, and that
the type of hashmap free performed didn't free keys and/or values as
well.
It was dropped in 89439d4fc0. As a result, every
process that uses a hashmap allocates and then leaks the hashmap mempools.
The mempools are only allocated in the main thread, but we don't know where
the memory is used.
So let's check if we are the last thread and free the mempools then. This is
fairly heavy, because /proc/self/status has to be opened and parsed, but we do
it only when compiled for valgrind, i.e. not by default, and compared to running
under valgrind or asan, the extra cost is acceptable. The big advantage is that
we don't have to think or filter out this false positive.
As a micro-opt, cleanup is attempted only in the main thread. We could allow
any thread to check if it is the last one and perform cleanup, but that'd mean
that we'd have to _do_ the check in every thread. We don't use threads like
that, our non-main threads are always short-lived, so let's just accept the
possibility that we'll leak memory if a thread survives. The check is also
non-atomic, but it's called in a destructor of the main thread _and_ we do
cleanup only when there are no other threads, so the risk of some library
suddenly spawning another thread is very low. All in all, this is not perfect,
but should work in 999‰ of cases.
Fixes the following valgrind warning:
==22564== HEAP SUMMARY:
==22564== in use at exit: 8,192 bytes in 2 blocks
==22564== total heap usage: 243 allocs, 241 frees, 151,905 bytes allocated
==22564==
==22564== 4,096 bytes in 1 blocks are still reachable in loss record 1 of 2
==22564== at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
==22564== by 0x4F08A8C: mempool_alloc_tile (mempool.c:62)
==22564== by 0x4F08B16: mempool_alloc0_tile (mempool.c:81)
==22564== by 0x4EF8DE0: hashmap_base_new (hashmap.c:748)
==22564== by 0x4EF8ED9: internal_hashmap_new (hashmap.c:782)
==22564== by 0x11045D: test_hashmap_copy (test-hashmap-plain.c:87)
==22564== by 0x115722: test_hashmap_funcs (test-hashmap-plain.c:914)
==22564== by 0x10FC9D: main (test-hashmap.c:60)
==22564==
==22564== 4,096 bytes in 1 blocks are still reachable in loss record 2 of 2
==22564== at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
==22564== by 0x4F08A8C: mempool_alloc_tile (mempool.c:62)
==22564== by 0x4F08B16: mempool_alloc0_tile (mempool.c:81)
==22564== by 0x4EF8DE0: hashmap_base_new (hashmap.c:748)
==22564== by 0x4EF8EF8: internal_ordered_hashmap_new (hashmap.c:786)
==22564== by 0x10A2A0: test_ordered_hashmap_copy (test-hashmap-ordered.c:89)
==22564== by 0x10F70F: test_ordered_hashmap_funcs (test-hashmap-ordered.c:916)
==22564== by 0x10FCA2: main (test-hashmap.c:61)
==22564==
==22564== LEAK SUMMARY:
==22564== definitely lost: 0 bytes in 0 blocks
==22564== indirectly lost: 0 bytes in 0 blocks
==22564== possibly lost: 0 bytes in 0 blocks
==22564== still reachable: 8,192 bytes in 2 blocks
==22564== suppressed: 0 bytes in 0 blocks
v2:
- check if we are the main thread
v3:
- check if there are no other threads