Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate performance issues of Web UI #23

Open
2 tasks
UsualSpec opened this issue Sep 19, 2024 · 12 comments
Open
2 tasks

Investigate performance issues of Web UI #23

UsualSpec opened this issue Sep 19, 2024 · 12 comments

Comments

@UsualSpec
Copy link
Collaborator

UsualSpec commented Sep 19, 2024

The WebUI probably needs some overhaul and investigation of performance bottlenecks.
I strongly assume that the issue is not hardware.
TODO:

  • Find out where potential bottlenecks occur (probably wrong timeouts): Web UI, Server or Plotter?
  • Is there any issue related to slow DB queries?
@UsualSpec
Copy link
Collaborator Author

@romain-jacob could you please tell me where exactly the issues occur? I do see that the status messages come in too slowly. Potentially this needs some re-engineering of the protocol - especially concerning timeouts.

@romain-jacob
Copy link
Contributor

romain-jacob commented Sep 19, 2024

Just everything is slow. The plotter (so, Dash) is fine, but the rest of the GUI lags a lot, and I don't manage to get the IPs of the clients. It looks like things just time out. Data still comes in fine and I can see the clients registered in the database, so that's alright.

I quickly checked the VM resources, and neither CPU nor memory seems to run out. So that seems to be an issue in the protocol running behind the web front-end

@UsualSpec
Copy link
Collaborator Author

Ok. I assume that the issue is the synchronous waiting on the server side. Potentially:

autopower/server/server.py

Lines 113 to 124 in 1911af7

def waitForResponseTo(self, requestNo):
self.responsedictlock.acquire()
while not self.responseArrived(requestNo):
# wait_for is new in version 3.2 but docs do not specify if wait_for also returns a boolean on timeout (https://docs.python.org/3/library/threading.html#threading.Condition.wait_for)
noTimeout = self.responsedictcv.wait(timeout=30) # times out after 30 seconds
if not noTimeout:
self.responsedictlock.release()
return False # on timeout give back issue
# block until requestNo has arrived
self.responsedictlock.release()
return True

Probably the correct way would be not to wait on the server, but rather forward requests via SSE (Server side events) to the browser (feels slightly better, but not yet great since it doesn't encapsulate functionality correctly). I think we'd need to come up with another design here.

@romain-jacob
Copy link
Contributor

Yes, that's most likely the culprit. Still though, I'm a bit surprised that things timed out already. There are so many clients... no sure why it takes so long to get a reply from the client.

@UsualSpec
Copy link
Collaborator Author

I've just restarted mmserver on the VM and see that autopower 16 always re registers. There could be an issue either with this device or the server.

@UsualSpec
Copy link
Collaborator Author

UsualSpec commented Sep 19, 2024

Another thing could be that the clients are hanging somewhere in the management thread.

This method:

void AutopowerClient::manageMsmt() {

is blocking somewhere - potentially. The Web UI only shows that one device is registered while this is clearly not expected (the devices do upload data).

Or it's (thread) contention on the server.

@UsualSpec
Copy link
Collaborator Author

I have the feeling that this needs some testing with an actual setup.

@UsualSpec
Copy link
Collaborator Author

UsualSpec commented Sep 19, 2024

I think, I pinned the issue down: 10 workers for gRPC aren't enough for that many clients:

grpcServer = grpc.server(futures.ThreadPoolExecutor(max_workers=10))

I've set the number higher on the server and now it should work. Not the best solution, but it should work for now.
autopower16 still seems to be misbehaving - I suspect a client issue there.

@romain-jacob
Copy link
Contributor

I don't get what the problem you mention about autopower16. Looks fine to me.

On the other hand, I observed twice now some clients that used to be registered and that lost their connection to the server (autopower9 for a long time, and I just lost autopower8 last night). I'm not sure what happens there (and I can't SSH into those since they are behind a NAT). That should be looked at more closely eventually, but that's the wrong issue for this...

@romain-jacob
Copy link
Contributor

Oh, I see what you are saying with 16 now. Looks like it keeps registering new measurements... strange

@UsualSpec
Copy link
Collaborator Author

autopower16 also constantly registers with the server. If you run cli.py on the server, you'll see what I mean. You'll get messages saying that the client re registered.

@romain-jacob
Copy link
Contributor

I'll try rebooting 16 and see if that fixes it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants