Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

N+1 problem in \OCA\Polls\Service\PollService::list #3362

Open
3 of 12 tasks
ChristophWurst opened this issue Mar 14, 2024 · 21 comments
Open
3 of 12 tasks

N+1 problem in \OCA\Polls\Service\PollService::list #3362

ChristophWurst opened this issue Mar 14, 2024 · 21 comments
Assignees
Labels

Comments

@ChristophWurst
Copy link
Member

⚠️ This issue respects the following points: ⚠️

  • This is a bug, not a question or a configuration/webserver/proxy issue.
  • This issue is not already reported on Github (I've searched it).
  • I agree to follow Nextcloud's Code of Conduct.

What went wrong, what did you observe?

I'm analyzing a production installation of this app and noticed that the app loads data in a loop. The more polls you have, the more database queries run. This leads to a scaling issue on large installations.

From the query log:

SELECT * FROM `oc_polls_polls` WHERE (`deleted` = :dcValue1) OR (`owner` = :dcValue2)
# Poll 1
SELECT * FROM `oc_polls_share` WHERE (`poll_id` = :dcValue1) AND (`user_id` = :dcValue2)
SELECT `timestamp` FROM `oc_polls_options` WHERE `poll_id` = :dcValue1
SELECT * FROM `oc_polls_votes` WHERE (`poll_id` = :dcValue1) AND (`user_id` = :dcValue2)
# Poll 2
SELECT * FROM `oc_polls_share` WHERE (`poll_id` = :dcValue1) AND (`user_id` = :dcValue2)
SELECT `timestamp` FROM `oc_polls_options` WHERE `poll_id` = :dcValue1
SELECT * FROM `oc_polls_votes` WHERE (`poll_id` = :dcValue1) AND (`user_id` = :dcValue2)

^ the first query has to happen. The other blocks are repeated for every row returned in the first query.

What did you expect, how polls should behave instead?

Either join all tables and load data in one query, or at least combine the queries for child data with a WHERE IN for all loaded polls.

What steps does it need to replay this bug?

  1. Install the app
  2. Add two polls
  3. Enable query logging
  4. Load the polls start page

Installation method

Installed/updated from the appstore (Apps section of your site)

Installation type

First time installation

Affected polls version

5.4.3

Which browser did you use, when experiencing the bug?

  • Firefox
  • Chrome
  • Chromium/Chromium based (i.e. Edge)
  • Safari
  • Other/Don't know

Other browser

No response

Add your browser log here

No response

Additional client environment information

No response

NC version

Nextcloud 26

Other Nextcloud version

No response

PHP engine version

PHP 8.0

Other PHP version

No response

Database engine

MySQL

Database Engine version or other Database

No response

Which user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other/Don't know

Add your nextcloud server log here

No response

Additional environment informations

No response

Configuration report

No response

List of activated Apps

No response

Nextcloud Signing status

No response

Additional Information

No response

@ChristophWurst
Copy link
Member Author

https://use-the-index-luke.com/sql/join/nested-loops-join-n1-problem for more info about the problem and solutions.

@dartcafe
Copy link
Collaborator

dartcafe commented Mar 16, 2024

@ChristophWurst Since we started to move some other iterations to joins, I decided to go that approach too in the 6.x and master branch.
For the timestamps see #3371 and #3372

For the 5.x branch I need some free time to install a v27 first to test your changes.

@ChristophWurst
Copy link
Member Author

image

xdebug profile from an instance with many polls, (group) shares and votes. The "hottest" operation in PollService::list is Acl::setPoll, where view access is checked for every poll.

@dartcafe
Copy link
Collaborator

@ChristophWurst
General question. Is it essential to get these performance issues done for the 5.x branch? Just asking, since I have limited time to maintain defferent branches atm.

@ChristophWurst
Copy link
Member Author

For us at Nextcloud GmbH it's currently a priority but I fully understand that we can't expect this to be done by you. @come-nc will have a look and try to get some of the improvements to stable-5.

Thanks 🙏

@dartcafe
Copy link
Collaborator

OK. Cool. So less pressure for me. 😉

@come-nc
Copy link
Contributor

come-nc commented Mar 25, 2024

@ChristophWurst General question. Is it essential to get these performance issues done for the 5.x branch? Just asking, since I have limited time to maintain defferent branches atm.

@dartcafe Do you think it would be possible to support 27 on version 6.x? I can see stable-6 uses attributes which were introduced in 27, but is it using anything introduced since 28?

@dartcafe
Copy link
Collaborator

@come-nc Something was the reason, I have to check.

@come-nc
Copy link
Contributor

come-nc commented Mar 25, 2024

I get this 😕

Error: Call to undefined method Doctrine\DBAL\Types\Type::lookupName() in /var/www/html/apps-extra/polls/lib/Migration/Version060100Date20240209073304.php:101

@dartcafe
Copy link
Collaborator

I get this 😕

Error: Call to undefined method Doctrine\DBAL\Types\Type::lookupName() in /var/www/html/apps-extra/polls/lib/Migration/Version060100Date20240209073304.php:101

Polls@v6 with NC 28? In NC 28 the DBAL version was raised, maybe this is the reason for this error.

But I think the main reason for the hard version cut was the switch to nextcloud-vue@8.

@come-nc
Copy link
Contributor

come-nc commented Mar 28, 2024

Polls@v6 with NC 27. I wanted to check if it would be easy to support 27 on stable6 but installation does not work.

@ChristophWurst
Copy link
Member Author

But I think the main reason for the hard version cut was the switch to nextcloud-vue@8.

We use ncvue8 in Calendar 4.7 and we make that available with Nextcloud 26-29. I think the hard cut might not be necessary just for the frontend.

@come-nc
Copy link
Contributor

come-nc commented May 13, 2024

There are still at least 2 requests that runs for each poll when fetching the list (tested on branch backport/3477/stable-6):

  1. SELECT * FROM oc_polls_share WHERE (poll_id = :dcValue1) AND (user_id = :dcValue2)
  2. SELECT polls_share.*, COUNT(votes.id) AS voted FROM oc_polls_share polls_share LEFT JOIN oc_polls_votes votes ON (polls_share.poll_id = votes.poll_id) AND (polls_share.user_id = votes.user_id) WHERE (polls_share.poll_id = :dcValue1) AND (polls_share.user_id = :dcValue2) AND (polls_share.id IS NOT NULL) AND (polls_share.deleted = :dcValue3) GROUP BY polls_share.id

The first one traces back to the getUser call in Poll::jsonSerialize. Maybe it could be replaced by simply userid + display name of the owner? The userid of the owner we already have, the displayName could be joined I suppose.
The second one traces back to the Model\Acl class trying to list the permission on the poll, and searching for a share for this. It looks like in master this one is avoided thanks to #3498 , no?

@dartcafe
Copy link
Collaborator

The question is, how valuable any effort in squeezing milliseconds out of this is, knowing that this branch is a dead end.

I think the actions up to now have a huge impact, at last as the whole list is not rendered inside the poll list and the navigation anymore. This was the main reason for the delay, not the database. I am able to fetch about 1000 polls in less than 2 seconds, just measuring from the network request at the client side against a host from a shared hoster with about 50 clients on it.

The main branch got a huge change in the backend and is still not finished and the db accesses will get reduced more and more, step by step.

In the end it is up to you to optimize more for the stable-6.

The userid of the owner we already have, the displayName could be joined I suppose.

I am not sure, if it is a good idea to join tables outside the app's namespace. I think a better approach would be caching the user's displaynames, when preparing the polls list, since you can predict, that the number of poll creators is limited. But I still doupt, that this will be a high value compared to the added complexity.

And in the future, owners do not have to be site users, but can be sharees.

But this will all happen inside the master branch.

The second one traces back to the Model\Acl class trying to list the permission on the poll, and searching for a share for this. It looks like in master this one is avoided thanks to #3498 , no?

Yes, this is all part of the ongoing improvement. The acl will die and resolved in another way, which is more bound to the poll, rather than living on it's own.

@come-nc
Copy link
Contributor

come-nc commented May 14, 2024

We seem to have very different results, I still see 40 seconds for listing 2000 polls, because of the 4000 queries.
My tests are with the profiler enabled, so that slows things down a bit, but still our objective is to get rid of the N+1 problem by not having any query run in the loop for each poll.

Here is what I have with 2000 empty polls and profiler enabled, on Nextcloud 27, for the backend request:

Branch Time DB requests Memory peak
stable-6 76095.5ms 8025 124 MiB
backport/3477/stable-6 39893.2ms 4017 80.94 MiB
backport/3498/stable-6 20292.5ms 2013 56.94 MiB
Call to getUser removed 12112.4ms 9 30.01 MiB

The last line is with this patch applied:

diff --git a/lib/Db/Poll.php b/lib/Db/Poll.php
index 073f4dd4..1b205094 100644
--- a/lib/Db/Poll.php
+++ b/lib/Db/Poll.php
@@ -190,7 +190,8 @@ class Poll extends EntityWithUser implements JsonSerializable {
                        'title' => $this->getTitle(),
                        'description' => $this->getDescription(),
                        'descriptionSafe' => $this->getDescriptionSafe(),
-                       'owner' => $this->getUser(),
+                       // 'owner' => $this->getUser(),
+                       'owner' => new \OCA\Polls\Model\User\Ghost('fake'),
                        'access' => $this->getAccess(),
                        'allowComment' => $this->getAllowComment(),
                        'allowMaybe' => $this->getAllowMaybe(),

I just tried disabling profiler on branch stable-6 I still get 70s response time for the backend request.

@dartcafe
Copy link
Collaborator

@come-nc Comming back to this
First of all, please check with the current release (v7.1.1) or the master branch, since it is now compatible to NC27 (at least from my and @ArtificialOwl's tests).

How does the db caching work? I would have the expectation, that the db engine is able to cache the requests efficiently.

But I am really confused by the observered runtimes. I am testing against a system, which is located with a share hoster, with up to 50 customers per server. I get 1000 polls within 2 seconds. OK I just have a handful of users, but this should not be the problem. Is there any LDAP in play, which could be a reason for the latency?

grafik

Unfortunately I can't use a profiler (or I need a hint, how I can achieve that).

@come-nc
Copy link
Contributor

come-nc commented Jun 11, 2024

How does the db caching work? I would have the expectation, that the db engine is able to cache the requests efficiently.

No, there is no DB caching on Nextcloud side, if you fire a request with the query builder the query goes to the database server. If you want to cache something you need to use one of the cache API from OCP and cache it.

Regarding the profiler, see https://github.com/nextcloud/profiler?tab=readme-ov-file#how-to-use-short-introduction
Then you can open a polls app page and look at the XHR requests in the profiler, especially how many queries are fired.

@dartcafe
Copy link
Collaborator

No, there is no DB caching on Nextcloud side, if you fire a request with the query builder the query goes to the database server. If you want to cache something you need to use one of the cache API from OCP and cache it.

My question targeted the chaching inside the DB engine.

@dartcafe
Copy link
Collaborator

Regarding the profiler, see https://github.com/nextcloud/profiler?tab=readme-ov-file#how-to-use-short-introduction
Then you can open a polls app page and look at the XHR requests in the profiler, especially how many queries are fired.

Not able to build this. I get JS errors from webpack

@come-nc
Copy link
Contributor

come-nc commented Jun 11, 2024

Not able to build this. I get JS errors from webpack

Do you want help to debug that? Which errors do you get?

@dartcafe
Copy link
Collaborator

grafik

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants