No existing media link types are supposed to map to more than one media item, but lots do, getting in the way of migrations to better systems and resulting in nondeterministic behaviour.
Some are straight duplicates, where all values of the link point to the same mediaid. These can be removed safely.
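Here is a minimal sketch of removing the straight duplicates. It uses SQLite from Python so it runs standalone. The `linkid` surrogate key is an assumption for illustration, because the issue only names `submitid`, `link_type` and `mediaid`. On PostgreSQL the same `DELETE` should work given any unique row identifier.

```python
import sqlite3

# Hypothetical schema for illustration. The issue names submission_media_links,
# submitid, link_type and mediaid; the linkid surrogate key is an assumption.
SCHEMA = """
CREATE TABLE submission_media_links (
    linkid INTEGER PRIMARY KEY,
    submitid INTEGER NOT NULL,
    link_type TEXT NOT NULL,
    mediaid INTEGER NOT NULL
)
"""

# Keep the lowest linkid in each (submitid, link_type, mediaid) group and delete
# the rest. Keys whose rows point at different mediaids are left untouched:
# those are the non-trivial cases that need a closer look.
DELETE_EXACT_DUPLICATES = """
DELETE FROM submission_media_links
WHERE linkid NOT IN (
    SELECT MIN(linkid)
    FROM submission_media_links
    GROUP BY submitid, link_type, mediaid
)
"""


def delete_exact_duplicates(conn: sqlite3.Connection) -> int:
    """Delete exact duplicate links and return the number of rows removed."""
    cur = conn.execute(DELETE_EXACT_DUPLICATES)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO submission_media_links (submitid, link_type, mediaid) VALUES (?, ?, ?)",
        [
            (1, "cover", 10),
            (1, "cover", 10),             # exact duplicate: deleted
            (2, "thumbnail-custom", 20),
            (2, "thumbnail-custom", 21),  # different mediaid: kept for review
        ],
    )
    print(delete_exact_duplicates(conn))  # 1
```

The sketch keeps the oldest row in each group. Which copy survives does not matter, because all copies point at the same `mediaid`.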
Some values differ only in timestamp metadata, e.g. PNG `tIME` chunks and `date:modify` `tEXt` chunks. These can also be removed safely, and we should be stripping those chunks away from generated content like thumbnails and covers to begin with.
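Here is a sketch of stripping those chunks. It works directly on the PNG chunk layout (4-byte length, 4-byte type, payload, CRC-32) and uses only the standard library. Following the text above, it treats `date:modify` as the only timestamp `tEXt` keyword. The function names are hypothetical. Whole chunks are dropped and every other chunk is copied byte for byte, so no CRCs need recomputing.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# tEXt keywords treated as timestamp metadata. The issue names date:modify;
# adding further keywords here is left as an assumption to check against the data.
TIMESTAMP_TEXT_KEYWORDS = {b"date:modify"}


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build a PNG chunk: length, type, payload, then CRC-32 over type + payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def iter_chunks(data: bytes):
    """Yield (chunk_type, raw_chunk_bytes) for every chunk after the signature."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        if pos + 12 > len(data):
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length  # length field + type + payload + CRC
        if end > len(data):
            raise ValueError("truncated chunk")
        yield chunk_type, data[pos:end]
        pos = end


def is_timestamp_chunk(chunk_type: bytes, raw: bytes) -> bool:
    """True for tIME chunks and for tEXt chunks with a timestamp keyword."""
    if chunk_type == b"tIME":
        return True
    if chunk_type == b"tEXt":
        # A tEXt payload is: keyword, NUL separator, text.
        keyword = raw[8:-4].split(b"\x00", 1)[0]
        return keyword in TIMESTAMP_TEXT_KEYWORDS
    return False


def strip_timestamp_chunks(data: bytes) -> bytes:
    """Return the PNG with timestamp chunks removed and all other chunks unchanged."""
    kept = [raw for chunk_type, raw in iter_chunks(data)
            if not is_timestamp_chunk(chunk_type, raw)]
    return PNG_SIGNATURE + b"".join(kept)


if __name__ == "__main__":
    # Not decodable images, just enough chunk structure to exercise the filter.
    ihdr = make_chunk(b"IHDR", bytes(13))
    iend = make_chunk(b"IEND", b"")
    a = PNG_SIGNATURE + ihdr + make_chunk(b"tIME", b"\x07\xe8\x01\x01\x00\x00\x00") + iend
    b = PNG_SIGNATURE + ihdr + make_chunk(b"tIME", b"\x07\xe8\x06\x0f\x0c\x00\x00") + iend
    print(a == b)  # False
    print(strip_timestamp_chunks(a) == strip_timestamp_chunks(b))  # True
```

The demo shows two files that differ only in `tIME` becoming byte-identical after stripping. Applying the same step to existing media would also turn some of these near-duplicates into exact duplicates.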
Some might differ more significantly. I haven't seen any yet, but there are thousands to look through.
The number of non-trivial bad keys (keys with multiple distinct `mediaid` values) per link type is:
| Link | Keys with distinct values |
| --- | ---: |
| sub submission | 0 |
| sub cover | 0 |
| sub thumbnail-generated-webp | 0 |
| sub thumbnail-legacy | 4 |
| sub thumbnail-source | 4 |
| sub thumbnail-generated | 11 |
| sub thumbnail-custom | 55 |
| media cover | 2114 |
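Counts like these could be produced with a query along the following lines. It is shown against a made-up SQLite sample. Only `submission_media_links` is covered, because the issue doesn't name the key columns of the media-media table, although the same `GROUP BY ... HAVING` shape would apply there. Link types with no bad keys simply don't appear in the result.

```python
import sqlite3

# Keys (submitid, link_type) that map to more than one distinct mediaid,
# counted per link type. Exact duplicates (one distinct mediaid) are not counted.
DISTRIBUTION_QUERY = """
SELECT link_type, COUNT(*) AS keys_with_distinct_values
FROM (
    SELECT submitid, link_type
    FROM submission_media_links
    GROUP BY submitid, link_type
    HAVING COUNT(DISTINCT mediaid) > 1
) AS bad_keys
GROUP BY link_type
ORDER BY keys_with_distinct_values, link_type
"""


def bad_key_distribution(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    return conn.execute(DISTRIBUTION_QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submission_media_links "
        "(submitid INTEGER NOT NULL, link_type TEXT NOT NULL, mediaid INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO submission_media_links VALUES (?, ?, ?)",
        [
            (1, "cover", 10), (1, "cover", 11),  # distinct values: counted
            (2, "cover", 12), (2, "cover", 12),  # exact duplicate: not counted
            (3, "thumbnail-custom", 20), (3, "thumbnail-custom", 21),
            (4, "cover", 30), (4, "cover", 31),
        ],
    )
    print(bad_key_distribution(conn))  # [('thumbnail-custom', 1), ('cover', 2)]
```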
Some possible steps, then:

1. Delete exact duplicate links.
2. Add a unique constraint on `submission_media_links (submitid, link_type) WHERE link_type IN ('submission', 'cover', 'thumbnail-generated-webp')`, the link types that already have no keys with distinct values.
3. Expand the unique constraint to cover all link types and both tables.
4. Switch code from always reading the first element of link lists to not having list-valued links at all.
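Here is a sketch of the unique-constraint step and the single-valued lookup from the last step. It again uses SQLite with a hypothetical schema. The index name and the `get_link`/`AmbiguousLink` names are made up. A partial unique index is used because only some link types are clean so far. PostgreSQL also supports `CREATE UNIQUE INDEX ... WHERE`, though not a partial `UNIQUE` table constraint. Expanding it to all link types later would mean dropping the `WHERE` clause.

```python
import sqlite3

# Hypothetical schema and index name; the key columns and the three link types
# come from the unique-constraint step in the list above.
SCHEMA = """
CREATE TABLE submission_media_links (
    submitid INTEGER NOT NULL,
    link_type TEXT NOT NULL,
    mediaid INTEGER NOT NULL
);

-- One mediaid per key, enforced only for link types with no bad keys left.
CREATE UNIQUE INDEX submission_media_links_single_value
    ON submission_media_links (submitid, link_type)
    WHERE link_type IN ('submission', 'cover', 'thumbnail-generated-webp');
"""


class AmbiguousLink(LookupError):
    """A link that should be single-valued maps to several media items."""


def get_link(conn: sqlite3.Connection, submitid: int, link_type: str):
    """Return the one mediaid for a link, or None if there is no such link.

    Unlike reading element 0 of a list, this refuses to pick an arbitrary
    row when a key still has several."""
    rows = conn.execute(
        "SELECT mediaid FROM submission_media_links"
        " WHERE submitid = ? AND link_type = ?",
        (submitid, link_type),
    ).fetchall()
    if len(rows) > 1:
        raise AmbiguousLink(f"{link_type} link of submission {submitid} has {len(rows)} rows")
    return rows[0][0] if rows else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO submission_media_links VALUES (1, 'cover', 10)")
    try:
        conn.execute("INSERT INTO submission_media_links VALUES (1, 'cover', 11)")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
    print(get_link(conn, 1, "cover"))  # 10
```

Failing loudly on an ambiguous key makes any remaining bad data visible. Silently taking the first row is what causes the nondeterminism described at the top of the issue.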
Then we can work on removing media-media links. That should be possible because the only media-media link type is `cover`, and every submission should have an equivalent `cover` link. That isn't true yet either, but this cleanup will make it possible to tell how much work is needed to fix it. After that, we can move media to a more convenient and efficient place than GlusterFS-via-Nginx.