
posts/5-java-mundane-performance-tricks #30

Open
utterances-bot opened this issue Nov 29, 2021 · 11 comments

Comments

@utterances-bot

5 Mundane Java Performance Tips | Richard Startin’s Blog

Most of the time it isn’t really necessary to optimise software, but this post contains 5 tips to avoid making software written in Java slower for the sake of it.

https://richardstartin.github.io/posts/5-java-mundane-performance-tricks


Very informative


cbaldan commented Nov 29, 2021

Great tips!
Thanks for sharing


If we're going to go to the trouble of making a composite-key "wrapper" object (item 2), we should consider making it mutable and reusing it.

I created my own Pair (running Java 17, so no javafx), made its elements mutable, and added another benchmark:

@Benchmark
public void wrapReuse(Blackhole bh) {
    // one pre-allocated key, reused for every lookup; set(...) returns this
    Pair mapKey = new Pair(null, null);
    for (int i = 0; i < prefixes.length; ++i) {
        bh.consume(pairMap.get(mapKey.set(prefixes[i], suffixes[i])));
    }
}

I see a further 5% performance improvement and (obviously) a complete elimination of allocations:

Benchmark                                                     (size)  Mode  Cnt    Score     Error   Units
CompositeLookup.concatenate                                     1024  avgt    5  126.628 ±   2.713   ns/op
CompositeLookup.concatenate:·gc.alloc.rate.norm                 1024  avgt    5   96.002 ±   0.001    B/op
CompositeLookup.wrap                                            1024  avgt    5   44.994 ±   0.874   ns/op
CompositeLookup.wrap:·gc.alloc.rate.norm                        1024  avgt    5   24.001 ±   0.001    B/op
CompositeLookup.wrapReuse                                       1024  avgt    5   42.706 ±   0.065   ns/op
CompositeLookup.wrapReuse:·gc.alloc.rate.norm                   1024  avgt    5    0.023 ±   0.001    B/op

Of course, one must make sure never to reuse a "wrapper" object that has been put into the map.
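A reusable, mutable Pair along these lines might look like the following (a sketch with assumed names, not the commenter's actual class):

```java
// Sketch of a mutable Pair whose set(...) returns this, so a map
// lookup can be written inline as pairMap.get(mapKey.set(p, s)).
final class Pair {
    private Object first;
    private Object second;

    Pair(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    Pair set(Object first, Object second) {
        this.first = first;
        this.second = second;
        return this; // enables the inline lookup style above
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Pair p
            && java.util.Objects.equals(first, p.first)
            && java.util.Objects.equals(second, p.second);
    }

    @Override
    public int hashCode() {
        // combines the elements' (often cached) hash codes
        return 31 * java.util.Objects.hashCode(first)
             + java.util.Objects.hashCode(second);
    }
}
```

Because equals and hashCode track the current elements, one instance can probe for any key; the danger is only in storing it in the map.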

@richardstartin (Owner)

@danjmarques that saves 2ns and 24 bytes per lookup, but risks a memory leak. I would proceed with caution in that direction, and only if this is your application’s primary function and bottleneck.


@danjmarques @richardstartin The java.util.Map Javadoc warns that mutable keys are dangerous for the same reason. But, to add, the flyweight technique can be helpful, and I have seen it used successfully in other cases (though not with maps) when isolated behind an abstraction, e.g. a custom get method which internally uses a private mutable type with an equals/hashCode compatible with the stored key type.

That said, unless this is a critical bottleneck, why not avoid the risk and the extra work and let the JVM optimize (maybe scalar replacement is even possible in some cases)? When JEP 169 (value types) goes GA, compatible value-type replacements might also supersede this technique for keys composed of primitives.
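The abstraction suggested here could be sketched roughly like this (all names assumed; in OpenJDK's HashMap, get invokes equals on the query key, so a private mutable twin with a compatible hashCode works for lookups):

```java
// Sketch: the mutable probe key never escapes PairLookup, so callers
// cannot store it in the map by accident. Not thread-safe, since one
// probe instance is reused across lookups.
final class PairLookup<V> {
    // immutable key type used for storage
    static final class Key {
        final Object first, second;
        Key(Object first, Object second) {
            this.first = first;
            this.second = second;
        }
        @Override
        public int hashCode() {
            return 31 * java.util.Objects.hashCode(first)
                 + java.util.Objects.hashCode(second);
        }
        @Override
        public boolean equals(Object o) {
            return o instanceof Key k
                && java.util.Objects.equals(first, k.first)
                && java.util.Objects.equals(second, k.second);
        }
    }

    // private mutable twin with the same hash and a cross-type equals;
    // HashMap.get calls probe.equals(storedKey), so this is sufficient
    private static final class Probe {
        Object first, second;
        @Override
        public int hashCode() {
            return 31 * java.util.Objects.hashCode(first)
                 + java.util.Objects.hashCode(second);
        }
        @Override
        public boolean equals(Object o) {
            return o instanceof Key k
                && java.util.Objects.equals(first, k.first)
                && java.util.Objects.equals(second, k.second);
        }
    }

    private final java.util.Map<Key, V> map = new java.util.HashMap<>();
    private final Probe probe = new Probe(); // reused, never stored

    void put(Object first, Object second, V value) {
        map.put(new Key(first, second), value); // always a fresh immutable key
    }

    V get(Object first, Object second) {
        probe.first = first;
        probe.second = second;
        return map.get(probe); // allocation-free lookup
    }
}
```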

@danjmarques

@mikejhill @richardstartin Mutability of keys is dangerous, of course, if the key object is the one stored in the map. For lookups, mutable keys are safe (presuming some other thread isn't changing the key).

It seems that, once it becomes important to make "lookup much faster and reduce allocation rate", you might want to make it even faster and eliminate allocation altogether.

The flyweight-as-map-key is actually a pretty standard trick in low-latency messaging. Say each message you receive has a "topic" and you need to do some lookup based on that topic. If your message is just some bytes received into a buffer, and you know the "topic" sits at some offset N in the buffer with length L, you just wrap the buffer in your pre-allocated flyweight key and do the lookup with that. Of course, you must be careful not to accidentally store that pre-allocated key in the map.
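The buffer-wrapping flyweight described here might be sketched as follows (assumed names, not production code; keys stored in the map must own copies of their bytes):

```java
// Sketch of a flyweight key that views a region of a reused byte
// buffer, so a topic lookup allocates nothing.
final class TopicKey {
    byte[] buffer;
    int offset, length;

    // point the flyweight at topic bytes in a received message buffer
    TopicKey wrap(byte[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
        return this;
    }

    // keys stored in the map copy their bytes, so they stay valid
    // after the message buffer is reused
    static TopicKey copyOf(byte[] buffer, int offset, int length) {
        return new TopicKey().wrap(
            java.util.Arrays.copyOfRange(buffer, offset, offset + length),
            0, length);
    }

    @Override
    public int hashCode() {
        int h = 1;
        for (int i = 0; i < length; i++) {
            h = 31 * h + buffer[offset + i];
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TopicKey k
            && k.length == length
            && java.util.Arrays.equals(buffer, offset, offset + length,
                                       k.buffer, k.offset, k.offset + k.length);
    }
}
```

One pre-allocated TopicKey can then be re-pointed at each incoming message's topic bytes for the lookup.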

@richardstartin (Owner)

@danjmarques there are applications where it makes sense to prevent the GC from running at all, so allocation must be kept to an absolute minimum, but the post isn’t aimed at people working on low-latency systems. It’s more about how someone working on the median Java application can arm themselves with simple defence mechanisms against unnecessary inefficiency. You’re right that you can go further than this post demonstrates, but it wouldn’t be appropriate on some teams, where some developer will end up putting a mutable key in the map and mutating it.

@danjmarques

@richardstartin "...about how someone working on the median Java application ... simple defence mechanisms to avoid unnecessary inefficiency".

Yes, understood, and I think your examples do a great job of that. I was just expanding on one of them: I figured that a developer who saw and understood the ideas you presented for the first time might be interested in pushing them a bit further ("can we eliminate the allocations completely? how much performance do we gain?"). And I think that once someone understands why that flyweight key can't be put into the map, they've shown pretty significant maturity as a developer (I love asking a similar question when I interview).


jmzc commented Dec 9, 2021

I don't understand, in tip number 2, why wrapping is better than String concatenation. You say "String instance caches its hash code, and constructing a new String requires calculation of a new hash code and String’s hash code algorithm also isn’t very efficient."

I guess that it's also required to calculate the Pair hash in

pairMap.get(new Pair(prefixes[i], suffixes[i]))

and I don't think that Pair's hashing algorithm is worse than String's.

Thanks

@richardstartin (Owner)

> I don't understand , on tip number 2, the reason why wrapping is better than String concatenation . You say "String instance caches its hash code, and constructing a new String requires calculation of a new hash code and String’s hash code algorithm also isn’t very efficient."
>
> I guess that it's also required to calculate Pair hash in
> pairMap.get(new Pair(prefixes[i], suffixes[i]))
>
> and I don't think that Pair's hashing algorithm be worse than String's

Let's say you have two Strings with n bytes combined; then you have to hash n bytes. If you have a pair, you just have to combine two precomputed hashes, probably by multiplying one of them by a prime and adding the other. So they have different complexities.
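The constant-time combination described here might look like the following (Pair is an assumed name; String.hashCode itself walks every character but caches the result per instance):

```java
// Illustration of the complexity argument: a concatenated String key
// must hash all n characters of the newly built String, while a pair
// key just combines two hash codes the Strings have usually already
// computed and cached.
record Pair(String prefix, String suffix) {
    @Override
    public int hashCode() {
        // O(1): combine the two (cached) String hashes
        return 31 * prefix.hashCode() + suffix.hashCode();
    }
}
```

The record's generated equals still compares components, so the custom hashCode keeps the equals/hashCode contract.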


s1ck commented Jan 6, 2022

This is very interesting. Thanks for sharing!

I reproduced your benchmarks and realized that, for Use enums instead of constant Strings, the length of the enum value names seems to matter. If I use ten single-letter names, e.g. A to J, the HashMap is actually faster than the enum map. For longer value names (~10 chars), I see the behavior you showed, but not as drastically: more like 60us/op vs 80us/op. If I use your AnEnum I don't see any difference.
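For context, a minimal sketch of the two map shapes being compared (assumed names, not the commenter's actual benchmark code): an EnumMap indexes by enum ordinal, while a HashMap keyed by the names must hash the String on every lookup, which is cheap for one-character names and costlier for longer ones.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// hypothetical enum standing in for the benchmark's constants
enum Level { LOW, MEDIUM, HIGH }

class Maps {
    // ordinal-indexed, no hashing on lookup
    static final Map<Level, Integer> BY_ENUM = new EnumMap<>(Level.class);
    // hashes the String name on every lookup
    static final Map<String, Integer> BY_NAME = new HashMap<>();
    static {
        for (Level l : Level.values()) {
            BY_ENUM.put(l, l.ordinal());
            BY_NAME.put(l.name(), l.ordinal());
        }
    }
}
```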

These are my results:

Benchmark                                          (seed)  (size)  Mode  Cnt   Score    Error   Units
EnumMapBenchmark.longEnumMap                           42   10000  avgt   10  57.703 ±  3.460   us/op
EnumMapBenchmark.longEnumMap:·gc.alloc.rate            42   10000  avgt   10  ≈ 10⁻⁴           MB/sec
EnumMapBenchmark.longEnumMap:·gc.alloc.rate.norm       42   10000  avgt   10   0.024 ±  0.001    B/op
EnumMapBenchmark.longEnumMap:·gc.count                 42   10000  avgt   10     ≈ 0           counts
EnumMapBenchmark.longHashMap                           42   10000  avgt   10  79.336 ±  0.284   us/op
EnumMapBenchmark.longHashMap:·gc.alloc.rate            42   10000  avgt   10  ≈ 10⁻⁴           MB/sec
EnumMapBenchmark.longHashMap:·gc.alloc.rate.norm       42   10000  avgt   10   0.034 ±  0.004    B/op
EnumMapBenchmark.longHashMap:·gc.count                 42   10000  avgt   10     ≈ 0           counts
EnumMapBenchmark.shortEnumMap                          42   10000  avgt   10  57.722 ±  3.789   us/op
EnumMapBenchmark.shortEnumMap:·gc.alloc.rate           42   10000  avgt   10  ≈ 10⁻⁴           MB/sec
EnumMapBenchmark.shortEnumMap:·gc.alloc.rate.norm      42   10000  avgt   10   0.024 ±  0.002    B/op
EnumMapBenchmark.shortEnumMap:·gc.count                42   10000  avgt   10     ≈ 0           counts
EnumMapBenchmark.shortHashMap                          42   10000  avgt   10  57.004 ±  0.565   us/op
EnumMapBenchmark.shortHashMap:·gc.alloc.rate           42   10000  avgt   10  ≈ 10⁻⁴           MB/sec
EnumMapBenchmark.shortHashMap:·gc.alloc.rate.norm      42   10000  avgt   10   0.024 ±  0.005    B/op
EnumMapBenchmark.shortHashMap:·gc.count                42   10000  avgt   10     ≈ 0           counts

My code is here

Again, thanks for sharing!


8 participants