Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Introduce request retry mechanism to Waved to retry failed requests #2035

Closed
wants to merge 5 commits into from

Conversation

senalw
Copy link

@senalw senalw commented Jun 23, 2023

Issue: Currently Waved disconnects app when a request failure occurs between Waved and app. Therefore, this PR will introduce a request retry mechanism to retry failed requests for a certain amount of time period which can be configured by a user. If this retry count (H2O_WAVE_MAX_REQUEST_RETRY_COUNT) is not configured, it wont enable request retry mechanism.

Testing Conditions,
H2O_WAVE_MAX_REQUEST_RETRY_COUNT=20 and H2O_WAVE_REQUEST_RETRY_INTERVAL=1s

Testing Records,

  1. Simulate read: connection reset by peer disconnection for 2 seconds and reset back to normal with deleting iptables rule after 2 seconds.

sleep 5 && iptables -A INPUT -p tcp --dport 8000 -j REJECT --reject-with tcp-reset && sleep 2 && iptables -D INPUT -p tcp --dport 8000 -j REJECT --reject-with tcp-reset

Result:
App was able to successfully recover after 2 seconds,

2023/06/23 13:59:21 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","route":"/","t":"app"}
2023/06/23 13:59:21 # {"addr":"172.17.0.1:58704","route":"/4f4b7de4-1f62-4326-b37b-855454c4c61a","t":"ui_add"}
2023/06/23 13:59:21 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"1","retry interval":"1s","route":"/","t":"app"}
2023/06/23 13:59:21 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"2","retry interval":"1s","route":"/","t":"app"}
INFO:     127.0.0.1:49624 - "POST / HTTP/1.1" 200 OK
Screen.Recording.2023-06-23.at.19.28.52.mov
  1. Simulate read: connection reset by peer disconnection until requests reach H2O_WAVE_MAX_REQUEST_RETRY_COUNT and drops the app
    sleep 5 && iptables -A INPUT -p tcp --dport 8000 -j REJECT --reject-with tcp-reset

Result:
App drops after requests reach maximum retry count and unable to recover the connection.

2023/06/23 17:55:09 # {"error":"request failed: Post \"http://127.0.0.1:8000\": read tcp 127.0.0.1:52970-\u003e127.0.0.1:8000: read: connection reset by peer","host":"http://127.0.0.1:8000","route":"/","t":"app"}
2023/06/23 17:55:09 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"1","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:09 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"3","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:10 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"2","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:10 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"4","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:11 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"3","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:11 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"5","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:12 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"4","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:12 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"6","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:13 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"5","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:13 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"7","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:14 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"6","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:14 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"8","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:15 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"7","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:15 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"9","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:16 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"8","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:16 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"10","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:17 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"9","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:17 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"11","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:18 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"10","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:18 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"12","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:19 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"11","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:19 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"13","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:20 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"12","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:20 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"14","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:21 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"13","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:21 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"15","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:22 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"14","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:22 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"16","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:23 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"15","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:23 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"17","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:24 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"16","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:24 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"18","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:25 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"17","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:25 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"19","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:26 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"18","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:26 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"20","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:27 # {"error":"request failed: Post \"http://127.0.0.1:8000\": dial tcp 127.0.0.1:8000: connect: connection refused","host":"http://127.0.0.1:8000","max retry count":"20","retry count":"19","retry interval":"1s","route":"/","t":"app"}
2023/06/23 17:55:27 App wasn't be able to recover with 20 attempts. [Retry duration: 20s]
2023/06/23 17:55:27 # {"route":"/","t":"app_drop"}
2023/06/23 17:55:27 # {"addr":"172.17.0.1:62086","t":"ui_drop"}
2023/06/23 17:55:28 # {"addr":"172.17.0.1:62182","route":"/","t":"ui_add"}
2023/06/23 17:55:28 # {"addr":"172.17.0.1:62102","t":"ui_drop"}
Screen.Recording.2023-06-23.at.23.24.58.mov

@senalw senalw added the good first issue Contributions welcome! label Jun 23, 2023
@senalw senalw self-assigned this Jun 23, 2023
@senalw senalw requested review from lo5 and mturoci as code owners June 23, 2023 06:35
@senalw senalw marked this pull request as draft June 23, 2023 06:35
@senalw senalw requested a review from dulajra June 24, 2023 09:47
@senalw senalw changed the title fix: introduce request retry mechanism to Waved to retry failed requests fix: Introduce request retry mechanism to Waved to retry failed requests Jun 25, 2023
@senalw senalw removed the good first issue Contributions welcome! label Jun 25, 2023
@senalw senalw marked this pull request as ready for review June 26, 2023 08:21
@senalw senalw requested a review from mwysokin June 26, 2023 08:22
Copy link

@mwysokin mwysokin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @senalw for working on this 🙏

I'm neither a go expert nor an expert with the wave codebase so I'm not the right person to decide whether this PR should go in or not, just a couple of comments from my side:

  • This continues to be the only release blocker for MLOps 0.62.0. We won't be able to release our software because of it and we're already late by a couple of weeks. So any help from the wave team @mturoci @lo5 will be extremely appreciated 🙏
  • @senalw I don't see any linked issue in the PR. Did we not create an issue? I know there have been slack conversations but the issues are how the work is being tracked so please create one if it's not there already.
  • @senalw Since there's a go package for everything nowadays why try to implement this yourself if you can use an already existing and tested and much more configurable package for it?

@senalw
Copy link
Author

senalw commented Jun 26, 2023

Thank you @senalw for working on this 🙏

I'm neither a go expert nor an expert with the wave codebase so I'm not the right person to decide whether this PR should go in or not, just a couple of comments from my side:

  • This continues to be the only release blocker for MLOps 0.62.0. We won't be able to release our software because of it and we're already late by a couple of weeks. So any help from the wave team @mturoci @lo5 will be extremely appreciated 🙏

  • @senalw I don't see any linked issue in the PR. Did we not create an issue? I know there have been slack conversations but the issues are how the work is being tracked so please create one if it's not there already.

  • @senalw Since there's a go package for everything nowadays why try to implement this yourself if you can use an already existing and tested and much more configurable package for it?

Thanks for the suggestions. I linked the issue for this PR.

@senalw senalw requested a review from aravind-vis June 27, 2023 11:56
@lo5
Copy link
Member

lo5 commented Jun 28, 2023

This PR is a needlessly complicated solution to what's obviously an availability / resource-scheduling problem on your side.

If you really want the wave server to not drop the app after a TCP reset, the simple fix is to introduce a flag that will do exactly that.

e.g.

func (app *App) forward(clientID string, session *Session, data []byte) {
	if err := app.send(clientID, session, data); err != nil {
		echo(Log{"t": "app", "route": app.route, "host": app.addr, "error": err.Error()})
                if (dropIfUnresponsive) { // get global setting from env var or arg, default true to preserve current behavior.
		        app.broker.dropApp(app.route)
                }
	}
}

This way, the underlying http.Client will automatically re-establish the connection if possible.

@mturoci
Copy link
Collaborator

mturoci commented Jun 29, 2023

@lo5 comment implementation - #2050

@senalw
Copy link
Author

senalw commented Jun 29, 2023

Closing this PR as fix is provided with this PR.

@senalw senalw closed this Jun 29, 2023
@mturoci mturoci deleted the fix/introduce-request-retry-mechanism branch July 3, 2023 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wave app being dropped when there's a request failure between Waved and the app
4 participants