Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement tag-based cache invalidation, and apply to Gedcom Records. #3748

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jon48
Copy link
Contributor

@jon48 jon48 commented Feb 25, 2021

This is a proposal to try and address #3747. A few files changes, but there is only really a couple that contain the significant changes.
The invalidation is done through the addition of tags to the Cache object (a feature supported by the underlying Symfony cache implementation and adapters). This requires a stricter constraint on the Cache signature itself (accepting a Symfony\Contracts\Cache\TagAwareCacheInterface instead of Symfony\Contracts\Cache\CacheInterface adapter).

The object creating cache items just needs to declare a tag, that can be then invoked for invalidation by other objects updating the object. The overall architecture changes are quite limited, and that can be applied to other invalidations than GedcomRecords.

For the GedcomRecords themselves, tags are defined as gedrec-XXX@YY where XXX is the Xref of the object, and YY the tree ID. They are added whenever a cache item related to a GedcomRecord is created (the factories, the canShow method, an improvement on the getFactsWithMedia method of the MediaTabModule), and the methods on GedcomRecord updating it can then invalidate that tag.
There is an opinionated implementation choice, as only the Array adapter is invalidated by GedcomRecord, not the FileSystem one (it can be, but I thought it may not be required - I could not really decide on it).

The pending changes implementation can be controversial as well, as any change on a record invalidate the whole pending changes (as it is all-or-nothing for a tree). Your experience would be of interest on that one.

I do not see many impacts of that implementation on the existing (performances to be assessed more explicitly if you have any tool), but it can protect the system from weird loading sequences, and it provides a mechanism that can be extended to any cache usage (even allowing modules to invalidate predefined tags if necessary).

A new unit test has been added as well for the Cache class, which I could rework is the PR is not accepted.

@codecov
Copy link

codecov bot commented Feb 25, 2021

Codecov Report

Merging #3748 (2fb19a1) into main (9028837) will decrease coverage by 2.45%.
The diff coverage is 26.53%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #3748      +/-   ##
============================================
- Coverage     26.15%   23.70%   -2.46%     
- Complexity    11866    11987     +121     
============================================
  Files          1071     1072       +1     
  Lines         40884    44296    +3412     
============================================
- Hits          10694    10500     -194     
- Misses        30190    33796    +3606     
Impacted Files Coverage Δ Complexity Δ
app/Factories/AbstractGedcomRecordFactory.php 0.00% <0.00%> (ø) 4.00 <0.00> (+1.00)
app/Factories/CacheFactory.php 0.00% <0.00%> (ø) 5.00 <0.00> (ø)
app/Factories/FamilyFactory.php 0.00% <0.00%> (ø) 10.00 <0.00> (+3.00)
app/Factories/GedcomRecordFactory.php 0.00% <0.00%> (ø) 34.00 <0.00> (+3.00)
app/Factories/HeaderFactory.php 0.00% <0.00%> (ø) 10.00 <0.00> (+3.00)
app/Factories/IndividualFactory.php 0.00% <0.00%> (ø) 10.00 <0.00> (+3.00)
app/Factories/LocationFactory.php 0.00% <0.00%> (ø) 10.00 <0.00> (+3.00)
app/Factories/MediaFactory.php 0.00% <0.00%> (ø) 10.00 <0.00> (+3.00)
app/Factories/NoteFactory.php 0.00% <0.00%> (ø) 10.00 <0.00> (+3.00)
app/Factories/RepositoryFactory.php 0.00% <0.00%> (ø) 10.00 <0.00> (+3.00)
... and 549 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9028837...7acddc2. Read the comment docs.

@fisharebest
Copy link
Owner

A few initial thoughts, based on a quick scan...

If we only ever use one tag on an item, then do we need the taggable functionality?
Couldn't we simply invalidate the item using Cache::forget($key)?

Updating the signature to cache::remember() might break existing third-party modules which use the cache with a TTL. It might be safer to add the $tags parameter to the end of the existing parameters.

There can be many thousands of pending changes (e.g. after a search/replace) - which is a lot of data to fetch from the DB. If we invalidate its cache (e.g. during a subsequent search/replace), we could be loading thousands of rows, thousands of times.

I'll look a bit more closely at this over the coming days.

@jon48
Copy link
Contributor Author

jon48 commented Feb 27, 2021

Thanks Greg for this initial feedback.

Couldn't we simply invalidate the item using Cache::forget($key)?

That was going to be my initial approach. The problem is that the keys are unique, quite specific to the cache item creator, and more importantly rather unpredictable for the item invalidator. A few of the keys are based on __CLASS__ which is not the easiest one for the invalidator to figure out. I reckon this is technically possible to achieve, but this creates a rather tight coupling between the item creators and the item invalidators (for instance, the GedcomRecord class would need to know that a specific implementation of FamilyFactoryInterface create an item, and determine the key from there).

What I like with the tags is that this is more a logical topic approach. There is still some kind of coupling, as the creator and the invalidator need to agreed on a tag naming, but that looks a lot less painful (and worst case, you just create a tag for nothing, or invoke a tag invalidation for nothing). Moreover, the tag can be shared, which I am actually using across the different cache item creators (the GedcomRecord factories, the canShow method, the getFactsWithMedia of the MediaTabModule), the invalidator just invoke the agreed tag, and all of the cache items linked to it are invalidated.

To be honest, I have no credit in this approach, this is the one suggested by the Symfony documentation itself for cache invalidation (besides Expiration). To quote it:

The most basic kind of invalidation is direct items deletion. But when the state of a primary resource has spread across several cached items, keeping them in sync can be difficult [...] Tags-based invalidation for managing data dependencies; [...] Using tags invalidation is very useful when tracking cache keys becomes difficult.

Updating the signature to cache::remember() might break existing third-party modules which use the cache with a TTL. It might be safer to add the $tags parameter to the end of the existing parameters.

True, I did not pay much attention to it, and have not been very considerate for custom modules (which is a shame coming from me, with my several custom modules...).
I have pushed an update that restore the remember method parameters order.
I have unsuccessfully tried to restore as well the compatibility for the constructor of the Cache class (CacheInterface instead of the sub-interface TagAwareCacheInterface), but whereas it is very easy to create a TagAwareCacheInterface from any AdapterInterface (which in the Symfony implementation is pretty similar to the CacheInterface), I cannot from the simpler CacheInterface interface, so I cannot do a simple wrapping of the input parameter.

There can be many thousands of pending changes (e.g. after a search/replace) - which is a lot of data to fetch from the DB. If we invalidate its cache (e.g. during a subsequent search/replace), we could be loading thousands of rows, thousands of times.

Again, I understand where you are coming from with the concerns on the pending changes, as the current implementation is indeed quite rigid, and tend to be an all-or-nothing approach. That is why this is using a specific tag pending-t-xx, and if there are concerns, we could just remove that specific tag invalidation, and we are back to the current behavior.

I would however argue a bit with that specific example, as it looks to me more like a flawed design, or rather a prioritisation of one concern (performance) over another one (data integrity): data integrity needs to have cache invalidation to make sure that the object you are manipulating is in line with what the database holds. And on that respect, I think that AbstractGedcomRecordFactory::pendingChanges needs to be respect the data integrity criteria, as this is the entry point to retrieve pending changes. If you are happy to work on items potentially not up-to-date (for performance reasons), then we would rather need an explicit integrity-unsafe method (which cache would never be invalidated), or to take a copy of the result in memory and work from it (basically, if you don't want a method which MAY connect to the DB to connect, don't call it ☺️ ).
That said, I am talking here more theoretically than anything else, and will look a bit at the specific pendingChanges invalidation consequences on the current code base.

Base automatically changed from master to main March 11, 2021 07:40
@jon48 jon48 force-pushed the feature/cache-invalidation branch from 2104542 to 7acddc2 Compare March 13, 2021 14:09
@jon48
Copy link
Contributor Author

jon48 commented Mar 13, 2021

I have updated the PR to rebase (and squash) on top of the latest code.

Regarding the potential of thousands of calls being done in scenarios like data fixes, I have had a look at the code, and have gathered metrics, but they do not fall into the specific case that could indeed trigger it.
They are retrieving a lot of records, and updating them (which invalidate), but at no point, they try to retrieve the same record again. And even if they did at the end - which they don't -, that would only trigger one DB call. The scenario in which you could have more calls would if you were updating a record then recreating it immediately through the factories, before moving to the next one. That would call the pending changes as many times as there are records, but I could not find any implementation of this pattern.

Here are some metrics for some of the data fix, where I added log (arbitrarily category config, as there is no debug one any more) through an adhoc Middleware at different point of the process, and in particular at the very end of the request, to check how many calls to the DB have been done within the pendingChanges methods, out of all the calls made to the method.

Search - Replace

2378	config	AbstractGedcomRecordFactory::pendingChanges : Request #d1661a666ce65fe05e1b8251b616db8e - Query to database # 1
2379	config	Cache::invalidateTags : Request #d1661a666ce65fe05e1b8251b616db8e - InvalidateTags [ gedrec-I1002@2, pending-t-2 ]
2380	edit	Update: INDI I1002
2381	config	Cache::invalidateTags : Request #d1661a666ce65fe05e1b8251b616db8e - InvalidateTags [ gedrec-I1017@2, pending-t-2 ]
2382	edit	Update: INDI I1017
[...]
2495	config	Cache::invalidateTags : Request #d1661a666ce65fe05e1b8251b616db8e - InvalidateTags [ gedrec-I996@2, pending-t-2 ]
2496	edit	Update: INDI I996
2497	config	Fisharebest\Webtrees\Http\RequestHandlers\DataFixUpdateAll : Request #d1661a666ce65fe05e1b8251b616db8e - PendingChanges : 1 DB calls out of 59 method calls

Missing Married Name

2742	config	AbstractGedcomRecordFactory::pendingChanges : Request #51792bf91d20e1f1f064d50f903e170b - Query to database # 1
2743	config	Cache::invalidateTags : Request #51792bf91d20e1f1f064d50f903e170b - InvalidateTags [ gedrec-I426@2, pending-t-2 ]
2744	edit	Update: INDI I426
2745	config	Cache::invalidateTags : Request #51792bf91d20e1f1f064d50f903e170b - InvalidateTags [ gedrec-I43@2, pending-t-2 ]
2746	edit	Update: INDI I43
[...]
2839	config	Cache::invalidateTags : Request #51792bf91d20e1f1f064d50f903e170b - InvalidateTags [ gedrec-I89@2, pending-t-2 ]
2840	edit	Update: INDI I89
2841	config	Fisharebest\Webtrees\Http\RequestHandlers\DataFixUpdateAll : Request #51792bf91d20e1f1f064d50f903e170b - PendingChanges : 1 DB calls out of 907 method calls

Missing Deaths

3002	config	AbstractGedcomRecordFactory::pendingChanges : Request #c2d0600f256f5e3e5fd3dbd2ecbc61da - Query to database # 1
3003	config	Cache::invalidateTags : Request #c2d0600f256f5e3e5fd3dbd2ecbc61da - InvalidateTags [ gedrec-I1009@2, pending-t-2 ]
3004	edit	Update: INDI I1009
3005	config	Cache::invalidateTags : Request #c2d0600f256f5e3e5fd3dbd2ecbc61da - InvalidateTags [ gedrec-I1011@2, pending-t-2 ]
3006	edit	Update: INDI I1011
[...]
3281	config	Cache::invalidateTags : Request #c2d0600f256f5e3e5fd3dbd2ecbc61da - InvalidateTags [ gedrec-I1757@2, pending-t-2 ]
3282	edit	Update: INDI I1757
3283	config	Fisharebest\Webtrees\Http\RequestHandlers\DataFixUpdateAll : Request #c2d0600f256f5e3e5fd3dbd2ecbc61da - PendingChanges : 1 DB calls out of 581 method calls

Names tags

4184	config	AbstractGedcomRecordFactory::pendingChanges : Request #7bf26ad4309907e279826bd22a2e01e5 - Query to database # 1
4185	config	Cache::invalidateTags : Request #7bf26ad4309907e279826bd22a2e01e5 - InvalidateTags [ gedrec-I1000@2, pending-t-2 ]
4186	config	Cache::invalidateTags : Request #7bf26ad4309907e279826bd22a2e01e5 - InvalidateTags [ gedrec-I1004@2, pending-t-2 ]
[...]
4361	config	Cache::invalidateTags : Request #7bf26ad4309907e279826bd22a2e01e5 - InvalidateTags [ gedrec-I983@2, pending-t-2 ]
4362	config	Cache::invalidateTags : Request #7bf26ad4309907e279826bd22a2e01e5 - InvalidateTags [ gedrec-I998@2, pending-t-2 ]
4363	config	Fisharebest\Webtrees\Http\RequestHandlers\DataFixUpdateAll : Request #7bf26ad4309907e279826bd22a2e01e5 - PendingChanges : 1 DB calls out of 178 method calls

Names slashes

4988	config	AbstractGedcomRecordFactory::pendingChanges : Request #62cf7c9506586bca433b7536180ded3c - Query to database # 1
4989	config	Cache::invalidateTags : Request #62cf7c9506586bca433b7536180ded3c - InvalidateTags [ gedrec-I77@2, pending-t-2 ]
4990	edit	Update: INDI I77
4991	config	Cache::invalidateTags : Request #62cf7c9506586bca433b7536180ded3c - InvalidateTags [ gedrec-I792@2, pending-t-2 ]
4992	edit	Update: INDI I792
[...]
5033	config	Cache::invalidateTags : Request #62cf7c9506586bca433b7536180ded3c - InvalidateTags [ gedrec-I971@2, pending-t-2 ]
5034	edit	Update: INDI I971
5035	config	Fisharebest\Webtrees\Http\RequestHandlers\DataFixUpdateAll : Request #62cf7c9506586bca433b7536180ded3c - PendingChanges : 1 DB calls out of 250 method calls

Rename places

5079	config	AbstractGedcomRecordFactory::pendingChanges : Request #d122ae382abb8272759802dc67136990 - Query to database # 1
5080	config	Cache::invalidateTags : Request #d122ae382abb8272759802dc67136990 - InvalidateTags [ gedrec-I12@2, pending-t-2 ]
5081	edit	Update: INDI I12
5082	config	Cache::invalidateTags : Request #d122ae382abb8272759802dc67136990 - InvalidateTags [ gedrec-I127@2, pending-t-2 ]
5083	edit	Update: INDI I127
[...]
5102	config	Cache::invalidateTags : Request #d122ae382abb8272759802dc67136990 - InvalidateTags [ gedrec-I848@2, pending-t-2 ]
5103	edit	Update: INDI I848
5104	config	Fisharebest\Webtrees\Http\RequestHandlers\DataFixUpdateAll : Request #d122ae382abb8272759802dc67136990 - PendingChanges : 1 DB calls out of 12 method calls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants