
Race condition causes inconsistencies in chart repository's index.yaml #18

Open
thedoctor opened this issue Dec 14, 2017 · 16 comments · May be fixed by #447
Comments

@thedoctor

helm s3 push CHART REPO has a race condition when multiple charts are pushed around the same time. It occurs in the following situation:

  1. UserA runs: helm s3 push CHART_A s3://REPO
  2. UserB runs: helm s3 push CHART_B s3://REPO
  3. UserA's process fetches s3://REPO/index.yaml
  4. UserB's process fetches s3://REPO/index.yaml
  5. UserA's process updates his fetched index.yaml with CHART_A's new version and replaces the remote s3://REPO/index.yaml with his updated version.
  6. UserB's process updates his fetched (and now out-of-date) index.yaml with CHART_B's new version and replaces the remote s3://REPO/index.yaml with his updated version, which does not contain CHART_A.

At the end of this process, both CHART_A and CHART_B are present in the repository, but CHART_A is missing from the index, so any downstream chart that requires it will fail when running helm dep update CHART_THAT_DEPENDS_ON_CHART_A.

A simple solution would be for the plugin to create a mutex, e.g. an index.yaml.lock object, before fetching index.yaml, and delete it after replacing index.yaml with the updated version.
If the lockfile is already present, the plugin should wait until it has been deleted and a new one can be created before fetching index.yaml and proceeding. In the worst case this could cause cascading delays if many charts are updated frequently, but slow is better than broken.
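
A rough Go sketch of that lockfile idea follows. It is purely illustrative: the lock key and retry interval are made up, and (as discussed further down in this thread) S3 has no atomic create-if-absent operation, so two clients can still both "acquire" the lock in a narrow window.

// Best-effort S3 "mutex" via a lock object. Illustrative only: the
// HeadObject/PutObject pair below is NOT atomic on S3, so this narrows
// the race window rather than eliminating it.
package main

import (
    "context"
    "errors"
    "strings"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/aws/aws-sdk-go-v2/service/s3/types"
)

const lockKey = "index.yaml.lock" // hypothetical lock object name

func acquireLock(ctx context.Context, c *s3.Client, bucket string) error {
    for {
        _, err := c.HeadObject(ctx, &s3.HeadObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(lockKey),
        })
        var notFound *types.NotFound
        if errors.As(err, &notFound) {
            // No lock object: try to create one. Another client may do the
            // same at this exact moment - that is the remaining race.
            _, err = c.PutObject(ctx, &s3.PutObjectInput{
                Bucket: aws.String(bucket),
                Key:    aws.String(lockKey),
                Body:   strings.NewReader("locked"),
            })
            return err
        }
        if err != nil {
            return err
        }
        time.Sleep(2 * time.Second) // lock held by someone else; wait and retry
    }
}

func releaseLock(ctx context.Context, c *s3.Client, bucket string) error {
    _, err := c.DeleteObject(ctx, &s3.DeleteObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(lockKey),
    })
    return err
}

func main() {
    ctx := context.Background()
    cfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        panic(err)
    }
    client := s3.NewFromConfig(cfg)
    if err := acquireLock(ctx, client, "REPO"); err != nil {
        panic(err)
    }
    defer releaseLock(ctx, client, "REPO")
    // ... fetch index.yaml, add the new chart entry, upload index.yaml ...
}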

@markhu

markhu commented Dec 14, 2017

Until a full-on fix arrives, is there a command or script that can regenerate the index.yaml file by listing the files in the repo?

@hypnoglow
Owner

Hi!

@thedoctor Yes, there is an obvious race condition. But how would a lock file help? With AWS S3 it is not possible - you would run into the same issue; imagine both clients creating the lock file at the same time. There is no atomic "create the file if it does not exist" operation. See the consistency model of AWS S3. Index "transactions" can only be implemented with third-party mechanisms, such as a database used for locking. If you see a way to easily implement pluggable third-party consistency providers, please suggest it.

I think the best solution for this plugin at the moment is to implement a "reindex" action (as @markhu suggested) that regenerates the repository index on demand. This will partly solve broken indexes, but of course the race condition will still be there.

So, what do you think? I see many requests for a "reindex" operation from users of this plugin, so I will try to implement it this coming weekend.

@hypnoglow
Owner

Ref: #5

@markhu

markhu commented Dec 14, 2017

Granted, you can't 100% eliminate collisions if, as you say, both clients can create a lock file at the same time. But we can still try to narrow the window of opportunity for race conditions. One improvement would be to delay processing the index.yaml file until after uploading the chart.tgz - here's PR #19 for that.

@thedoctor
Author

thedoctor commented Dec 14, 2017

Ah, I (foolishly) didn't realize S3 bucket contents were eventually consistent.
In that case, the only comprehensive solution would have to involve a service outside of S3 that implements a real mutex, which is probably beyond the scope of this plugin (although a low-overhead approach that I imagine would work for a lot of people would be to use a git remote).

If the goal is just to reduce the rate of occurrence, I'm guessing S3's time-to-consistency is good enough, often enough, that the lockfile approach would have some impact - but I agree that it's not a complete enough solution to be worth doing.

Reindexing is definitely better than nothing, and probably sufficient for almost everyone. 👍

@hypnoglow
Owner

@thedoctor ,

although a low-overhead way to do it that I imagine would work for a lot of people would be with a git remote

I do not understand what you mean - can you explain it, and whether it could be used to provide a consistent lock on the index file?

As for reindexing, yes, I think it will solve almost all cases of a broken index.

@markhu

markhu commented Dec 16, 2017

Maybe he means storing the index.yaml file in git instead of S3.

hypnoglow added a commit that referenced this issue Dec 19, 2017
Narrow down the window of opportunity for race conditions

Reference issue: #18
The idea was presented in this PR: #19
@hypnoglow
Owner

So, I implemented the idea from #19 in #21.

I've also created an issue for reindexing: #22

I think I will keep this issue open for a while; maybe someone will come up with another idea to work around the problem.

@hypnoglow
Owner

Hey folks, I rolled out the reindex command in 0.5.0. Let me know whether or not it works well, and how it suits your workflow - e.g. would you automate it, and how would you automate reindexing (run it X times per day, or trigger it on a condition), or is an occasional manual reindex enough?
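
(For later readers: the reindex action scans the bucket and rebuilds index.yaml from the chart archives it finds. Assuming the repository was added to Helm under the name my-repo, it is invoked as: helm s3 reindex my-repo.)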

@josdotso

Related: https://www.terraform.io/docs/backends/types/s3.html

@hypnoglow
Owner

Hm, that's interesting - maybe we can implement pluggable third-party consistency providers as I suggested above, with DynamoDB being the first. I will try to find some time to investigate this.
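
A sketch of that Terraform-style DynamoDB lock (the table name, key schema, and lock id are hypothetical; it assumes a table whose partition key is the string attribute LockID). The conditional PutItem succeeds for exactly one client, which provides the atomic create-if-absent that S3 lacks:

// Acquire a repository-wide lock via a DynamoDB conditional write,
// similar to Terraform's S3 backend locking. Table name, key schema,
// and lock id are hypothetical.
package main

import (
    "context"
    "errors"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func lockIndex(ctx context.Context, db *dynamodb.Client, table, lockID string) (bool, error) {
    _, err := db.PutItem(ctx, &dynamodb.PutItemInput{
        TableName: aws.String(table),
        Item: map[string]types.AttributeValue{
            "LockID": &types.AttributeValueMemberS{Value: lockID},
        },
        // Fails atomically if another client already holds the lock.
        ConditionExpression: aws.String("attribute_not_exists(LockID)"),
    })
    var ccf *types.ConditionalCheckFailedException
    if errors.As(err, &ccf) {
        return false, nil // lock is held by someone else
    }
    return err == nil, err
}

func unlockIndex(ctx context.Context, db *dynamodb.Client, table, lockID string) error {
    _, err := db.DeleteItem(ctx, &dynamodb.DeleteItemInput{
        TableName: aws.String(table),
        Key: map[string]types.AttributeValue{
            "LockID": &types.AttributeValueMemberS{Value: lockID},
        },
    })
    return err
}

func main() {
    ctx := context.Background()
    cfg, _ := config.LoadDefaultConfig(ctx)
    db := dynamodb.NewFromConfig(cfg)
    if ok, err := lockIndex(ctx, db, "helm-s3-locks", "s3://REPO/index.yaml"); err != nil || !ok {
        return // could not acquire the lock; retry with backoff
    }
    defer unlockIndex(ctx, db, "helm-s3-locks", "s3://REPO/index.yaml")
    // ... fetch, update, and upload index.yaml here ...
}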

@masterada

Another approach could be to make it eventually consistent. When pushing the index.yaml, the helm s3 plugin could also push an index.yaml diff file, like this:

apiVersion: v1
addedEntries:
  alpine:
    - created: 2016-10-06T16:23:20.499814565-06:00
      description: Deploy a basic Alpine Linux pod
      digest: 99c76e403d752c84ead610644d4b1c2f2b453a74b921f422b9dcb8a7c8b559cd
      home: https://k8s.io/helm
      name: alpine
      sources:
      - https://github.com/helm/helm
      urls:
      - https://technosophos.github.io/tscharts/alpine-0.2.0.tgz
      version: 0.2.0

Or for delete:

apiVersion: v1
removedEntries:
  alpine:
    - created: 2016-10-06T16:23:20.499814565-06:00
      description: Deploy a basic Alpine Linux pod
      digest: 99c76e403d752c84ead610644d4b1c2f2b453a74b921f422b9dcb8a7c8b559cd
      home: https://k8s.io/helm
      name: alpine
      sources:
      - https://github.com/helm/helm
      urls:
      - https://technosophos.github.io/tscharts/alpine-0.2.0.tgz
      version: 0.2.0

Whenever a helm s3 index update occurs, it should do the following steps:

  1. Upload the diff file for the current index modification
  2. Fetch the current index.yaml file
  3. Fetch all diff files from s3
  4. Apply all diff files (including the one uploaded in step 1) to index.yaml (use the chart digest for merging, and ignore already-applied changes)
  5. Upload the new index.yaml
  6. Delete old diff files (e.g. older than 5 minutes), because it's very unlikely that there will be any conflicting change older than that

Fetching the index.yaml file with the s3 plugin could repeat steps 2-4 for more consistency. Using the repo without the s3 plugin (as a static site) would still be a valid Helm repo, just with weaker consistency.

There could be a command that repeats steps 2-5, so a cron job could provide real eventual consistency.
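
A self-contained sketch of the merge in step 4 (the types are simplified stand-ins for illustration, not Helm's actual index structures; entries are matched by chart digest so replaying an already-applied diff is a no-op):

package main

import "fmt"

type ChartVersion struct {
    Name    string   `yaml:"name"`
    Version string   `yaml:"version"`
    Digest  string   `yaml:"digest"`
    URLs    []string `yaml:"urls"`
}

type Index struct {
    APIVersion string                    `yaml:"apiVersion"`
    Entries    map[string][]ChartVersion `yaml:"entries"`
}

type IndexDiff struct {
    APIVersion     string                    `yaml:"apiVersion"`
    AddedEntries   map[string][]ChartVersion `yaml:"addedEntries"`
    RemovedEntries map[string][]ChartVersion `yaml:"removedEntries"`
}

// applyDiff merges one diff into the index, using the chart digest to
// skip changes that have already been applied.
func applyDiff(idx *Index, diff IndexDiff) {
    if idx.Entries == nil {
        idx.Entries = map[string][]ChartVersion{}
    }
    for name, added := range diff.AddedEntries {
        for _, cv := range added {
            if !hasDigest(idx.Entries[name], cv.Digest) {
                idx.Entries[name] = append(idx.Entries[name], cv)
            }
        }
    }
    for name, removed := range diff.RemovedEntries {
        var kept []ChartVersion
        for _, cv := range idx.Entries[name] {
            if !hasDigest(removed, cv.Digest) {
                kept = append(kept, cv)
            }
        }
        idx.Entries[name] = kept
    }
}

func hasDigest(versions []ChartVersion, digest string) bool {
    for _, cv := range versions {
        if cv.Digest == digest {
            return true
        }
    }
    return false
}

func main() {
    idx := &Index{APIVersion: "v1"}
    diff := IndexDiff{
        APIVersion: "v1",
        AddedEntries: map[string][]ChartVersion{
            "alpine": {{Name: "alpine", Version: "0.2.0", Digest: "99c76e40"}},
        },
    }
    applyDiff(idx, diff)
    applyDiff(idx, diff) // replaying the same diff changes nothing
    fmt.Printf("alpine versions in index: %d\n", len(idx.Entries["alpine"]))
}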

@pete911

pete911 commented Jul 5, 2019

Hi, S3 objects have ETags; I'm checking whether there's an option to validate them on the S3 side. If there is, the process would look like this:

  • download the index along with its ETag
  • update the index
  • upload the new index with the expected previous ETag
    If the ETag does not match (meaning someone or something changed the index in the meantime), the upload is failed/rejected and the client retries (download, update, upload)
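
A sketch of that compare-and-swap loop in Go. Note the assumptions: it relies on S3 conditional writes (the IfMatch parameter on PutObject), which, to my knowledge, S3 did not offer when this comment was written, and updateIndex stands in for a hypothetical helper that inserts the new chart entry into the raw index bytes.

// Optimistic-concurrency update of index.yaml: re-upload only if the
// object's ETag is unchanged, otherwise retry. Assumes the SDK exposes
// S3's conditional-write IfMatch parameter on PutObject.
package main

import (
    "bytes"
    "context"
    "io"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func pushWithRetry(ctx context.Context, c *s3.Client, bucket string, updateIndex func([]byte) []byte, maxAttempts int) error {
    var lastErr error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        // 1. Download the index together with its ETag.
        get, err := c.GetObject(ctx, &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String("index.yaml"),
        })
        if err != nil {
            return err
        }
        body, err := io.ReadAll(get.Body)
        get.Body.Close()
        if err != nil {
            return err
        }

        // 2. Update the index locally.
        updated := updateIndex(body)

        // 3. Upload, conditional on the ETag we downloaded. If someone
        // changed index.yaml in the meantime, S3 rejects the write and we
        // loop to retry (a robust version would retry only on
        // 412 Precondition Failed).
        _, err = c.PutObject(ctx, &s3.PutObjectInput{
            Bucket:  aws.String(bucket),
            Key:     aws.String("index.yaml"),
            Body:    bytes.NewReader(updated),
            IfMatch: get.ETag,
        })
        if err == nil {
            return nil
        }
        lastErr = err
    }
    return lastErr
}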

@demo-ecout

Just for the record, S3 now supports Object Lock.

@eserden

eserden commented Mar 17, 2021

Any updates?

@balazs-fark-epam

Hi, @hypnoglow

Is this issue still around?

Thanks,

Balazs

chasinglogic pushed a commit to chasinglogic/helm-s3 that referenced this issue Jul 24, 2024
…yaml generation

I have set this up so it is sufficiently abstracted that other lock
backends could be added in the future.

Fixes hypnoglow#18