Clean up multi-valued media links #747

Open
1 of 7 tasks
charmander opened this issue Feb 20, 2020 · 0 comments
charmander commented Feb 20, 2020

No existing media link types are supposed to map to more than one media item, but lots do, getting in the way of migrations to better systems and resulting in nondeterministic behaviour.

  • Some are straight duplicates, where all values of the link point to the same mediaid. These can be removed safely.

  • Some values differ only in timestamp metadata, e.g. PNG tIME and date:modify tEXt chunks. These can also be removed safely, and we should be stripping those chunks away from generated content like thumbnails and covers to begin with.

  • Some might differ more significantly? Haven’t seen any yet, but there are thousands to look through.
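One way to sort the thousands of candidates into "differs only in timestamp metadata" and "differs more significantly" is to compare files after dropping the timestamp chunks. The following is a minimal sketch using only the Python standard library, not the project's actual tooling. The function names are hypothetical, and treating `date:create` as timestamp metadata alongside the `date:modify` key named above is an assumption.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# tEXt keywords treated as timestamp metadata. date:modify is named in the
# issue; date:create is an assumed companion key.
TIMESTAMP_TEXT_KEYS = {b"date:modify", b"date:create"}


def iter_chunks(png: bytes):
    """Yield (chunk_type, chunk_data) for each chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        yield ctype, data


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one chunk, computing its CRC over type and data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def is_timestamp_chunk(ctype: bytes, data: bytes) -> bool:
    if ctype == b"tIME":
        return True
    if ctype == b"tEXt":
        keyword = data.split(b"\x00", 1)[0]
        return keyword in TIMESTAMP_TEXT_KEYS
    return False


def strip_timestamps(png: bytes) -> bytes:
    """Return the PNG with tIME and timestamp tEXt chunks removed."""
    out = [PNG_SIGNATURE]
    for ctype, data in iter_chunks(png):
        if not is_timestamp_chunk(ctype, data):
            out.append(make_chunk(ctype, data))
    return b"".join(out)


def same_ignoring_timestamps(a: bytes, b: bytes) -> bool:
    """True if two PNGs are byte-identical once timestamp chunks are dropped."""
    return strip_timestamps(a) == strip_timestamps(b)
```

Pairs for which `same_ignoring_timestamps` returns `True` would fall into the second category; anything else needs a closer look.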

The distribution of counts of non-trivial bad keys (ones with multiple distinct mediaid values) is:

| link | keys with distinct values |
|---|---|
| sub submission | 0 |
| sub cover | 0 |
| sub thumbnail-generated-webp | 0 |
| sub thumbnail-legacy | 4 |
| sub thumbnail-source | 4 |
| sub thumbnail-generated | 11 |
| sub thumbnail-custom | 55 |
| media cover | 2114 |
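A count like the one above could be produced with a grouped query. The sketch below runs against an in-memory SQLite database so that it is self-contained; the real database and its exact schema are not shown in this issue, so the `linkid` column and the sample rows are assumptions. Only `submitid`, `link_type`, and `mediaid` come from the text.

```python
import sqlite3

# Keys (submitid, link_type) with more than one distinct mediaid, counted
# per link type. Exact duplicates do not count as "non-trivial".
BAD_KEY_COUNTS = """
SELECT link_type, COUNT(*) AS bad_keys
FROM (
    SELECT submitid, link_type
    FROM submission_media_links
    GROUP BY submitid, link_type
    HAVING COUNT(DISTINCT mediaid) > 1
) AS bad
GROUP BY link_type
ORDER BY bad_keys, link_type
"""


def bad_key_counts(conn):
    return conn.execute(BAD_KEY_COUNTS).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed schema, for illustration only.
    conn.execute(
        "CREATE TABLE submission_media_links ("
        " linkid INTEGER PRIMARY KEY,"
        " submitid INTEGER NOT NULL,"
        " link_type TEXT NOT NULL,"
        " mediaid INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO submission_media_links (submitid, link_type, mediaid)"
        " VALUES (?, ?, ?)",
        [
            (1, "thumbnail-custom", 10),
            (1, "thumbnail-custom", 11),  # non-trivial: two distinct mediaid
            (2, "cover", 20),
            (2, "cover", 20),             # exact duplicate: not counted
            (3, "thumbnail-custom", 30),
        ],
    )
    return bad_key_counts(conn)
```

Link types with no bad keys do not appear in this query's output, so the zero rows in the table would have to be filled in separately.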

Some possible steps, then:

  • delete exact duplicate links
  • add a unique constraint on submission_media_links (submitid, link_type) WHERE link_type IN ('submission', 'cover', 'thumbnail-generated-webp')
  • remove thumbnail-legacy (not just the broken ones; this link type is unused); see #763, “Remove thumbnail-legacy links”
  • resolve thumbnail-source, thumbnail-generated, and thumbnail-custom manually
  • regenerate all affected covers with metadata stripping; see #748, “Output from image operations should be independent of current date/time”
  • expand unique constraint to cover all types and both tables
  • switch code from always reading the first element of link lists to not having list-valued links at all
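The first two steps in the list above can be sketched as follows. This is an illustration on SQLite via Python's `sqlite3` module, not the project's migration: the `linkid` surrogate key, the index name, and the sample rows are assumptions. The partial unique index stands in for the "unique constraint ... WHERE" from the list; in PostgreSQL, too, a uniqueness rule with a `WHERE` clause is expressed as a partial unique index rather than a table constraint.

```python
import sqlite3

# Keep the lowest linkid of each (submitid, link_type, mediaid) group and
# delete the rest. Rows that differ in mediaid are left alone.
DELETE_EXACT_DUPLICATES = """
DELETE FROM submission_media_links
WHERE linkid NOT IN (
    SELECT MIN(linkid)
    FROM submission_media_links
    GROUP BY submitid, link_type, mediaid
)
"""

# Enforce single-valued links only for the types that are already clean.
ADD_PARTIAL_UNIQUE_INDEX = """
CREATE UNIQUE INDEX submission_media_links_single_valued
ON submission_media_links (submitid, link_type)
WHERE link_type IN ('submission', 'cover', 'thumbnail-generated-webp')
"""


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    # Assumed schema, for illustration only.
    conn.execute(
        "CREATE TABLE submission_media_links ("
        " linkid INTEGER PRIMARY KEY,"
        " submitid INTEGER NOT NULL,"
        " link_type TEXT NOT NULL,"
        " mediaid INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO submission_media_links (submitid, link_type, mediaid)"
        " VALUES (?, ?, ?)",
        [
            (1, "cover", 20),
            (1, "cover", 20),             # exact duplicate: removed
            (1, "thumbnail-custom", 30),
            (1, "thumbnail-custom", 31),  # distinct values: left for manual review
        ],
    )
    return conn


def clean_up(conn):
    """Delete exact duplicates, add the index, return the number of rows deleted."""
    deleted = conn.execute(DELETE_EXACT_DUPLICATES).rowcount
    conn.execute(ADD_PARTIAL_UNIQUE_INDEX)
    conn.commit()
    return deleted
```

After `clean_up`, inserting a second `cover` link for the same `submitid` raises `sqlite3.IntegrityError`, while `thumbnail-custom` stays unconstrained until it has been resolved manually.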

Then we can work on removing media-media links. The only media-media link type is cover, and every submission should have an equivalent cover link. That’s not true yet either, but this cleanup will make it possible to tell how much work is needed to fix that. We can also work on moving media to a more convenient and efficient place than GlusterFS-via-Nginx.
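Once links are single-valued, the gap between media-media cover links and submission cover links could be measured with a query along these lines. This is only a sketch on SQLite: the `media_media_links` table name and its `describee_id` and `described_with_id` columns are assumptions, as is the reading that a cover attached to a submission's media item should also appear as that submission's own cover link.

```python
import sqlite3

# Submissions whose media item has a media-media cover link but which lack
# the equivalent submission-level cover link.
MISSING_COVER_LINKS = """
SELECT s.submitid, mm.described_with_id AS cover_mediaid
FROM submission_media_links AS s
JOIN media_media_links AS mm
  ON mm.describee_id = s.mediaid AND mm.link_type = 'cover'
WHERE s.link_type = 'submission'
  AND NOT EXISTS (
    SELECT 1
    FROM submission_media_links AS c
    WHERE c.submitid = s.submitid
      AND c.link_type = 'cover'
      AND c.mediaid = mm.described_with_id
  )
ORDER BY s.submitid
"""


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed schemas, for illustration only.
    conn.execute(
        "CREATE TABLE submission_media_links ("
        " submitid INTEGER NOT NULL,"
        " link_type TEXT NOT NULL,"
        " mediaid INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE media_media_links ("
        " describee_id INTEGER NOT NULL,"
        " link_type TEXT NOT NULL,"
        " described_with_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO submission_media_links VALUES (?, ?, ?)",
        [
            (1, "submission", 10),
            (1, "cover", 100),      # submission 1 already has its cover link
            (2, "submission", 20),  # submission 2 has no cover link yet
        ],
    )
    conn.executemany(
        "INSERT INTO media_media_links VALUES (?, ?, ?)",
        [(10, "cover", 100), (20, "cover", 200)],
    )
    return conn.execute(MISSING_COVER_LINKS).fetchall()
```

Each returned row is a submission that would need a cover link added before the media-media links could be dropped.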
