Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve time based search queries #44

Open
Dravere opened this issue Jun 22, 2018 · 4 comments
Open

Improve time based search queries #44

Dravere opened this issue Jun 22, 2018 · 4 comments
Milestone

Comments

@Dravere
Copy link
Collaborator

Dravere commented Jun 22, 2018

As mentioned in the pull-request #39 I would like to provide further changes to this plugin. One of those is an improvement in regards to time based search queries. Currently if someone is searching for a term X and wants them sorted by the date, the sorting by the date is done in NodeBB core while this plugin provides them in relevance order. Since we limit the result query by 500 this can have unexpected results.

Lets say someone is searching for a common term "book" and is sorting by post time in descending order. There is absolutely no guarantee that he will receive the most recent results that he would expect. Because if there are more than 500 results for that search term in Solr, they won't be sorted by this timestamp. Solr will provide a "random" set (e.g. not all, only 500) in regards to the timestamp. The sorting is done in NodeBB core.

That is why I requested additional data about the search query in the given hook: NodeBB/NodeBB#6552

A lot of the tasks that are currently done in NodeBB could be directly done via Solr and thus probably better should be moved to the plugin side. And for a starter I at least would like to support better time based search queries. But to be able to do this improvements we need the timestamp in the Solr database which currently is not the case. Thus a re-index would be necessary.

@julianlam
Copy link
Owner

julianlam commented Jun 22, 2018

👍 I have no problems with this approach (although I'd probably add in some code to detect the old format and do a re-index in the background, maybe... or pop up an ACP alert to the admin to re-index).

nbbpm.compatibility needs updating to 1.10.0 but otherwise great 👍

@Dravere
Copy link
Collaborator Author

Dravere commented Jun 22, 2018

I'll gladly have a look at backwards compatibility.

If we do a background re-index it might need to be slowed down. Not that we re-index while the forum is experiencing high traffic. Might be even an idea to use the toobusy-js module from NodeBB to scale the speed of the re-index.

I'd rather go with the ACP message. So the administrators can choose when the re-index should actually be done. It could be a message at the Solr plugin ACP page but also with a notice on the ACP dashboard via the filter:admin.notices hook that is already used in this plugin for some information.

@julianlam
Copy link
Owner

Makes sense. I imagine you'd be forced to pause indexing in any case, so you have my go-ahead to implement 👍

@Dravere Dravere added this to the 6.0.0 milestone Jun 28, 2018
@Dravere
Copy link
Collaborator Author

Dravere commented Jul 4, 2018

So we have a new property in the data provided named searchData. It should contain all important data related to the search:

  • query: What the user typed in
  • searchIn: 'titleposts', 'titles', 'posts', 'users', or 'tags' (keep in mind the filter is only called for 'titleposts', 'titles', or 'posts'
  • matchWords: 'all' or 'any'
  • postedBy: Username
  • categories: Array of category ids to search in
  • searchChildren: True or false value if it should search in child categories
  • hasTags: Array of tags that should be matched,
  • replies: A number of replies related to repliesFilter
  • repliesFilter: Either 'atmost' or 'atleast'
  • timeRange: Empty if any time is good, otherwise it is the time in seconds
  • timeFilter: Either 'newer' or 'older' (both are inclusive in the default search)
  • sortBy: Either empty or any of the following values: 'relevance', 'timestamp', 'teaser.timestamp', 'topic.title', 'topic.postcount', 'topic.viewcount', 'topic.timestamp', 'user.username', 'category.name'
  • sortDirection: 'desc' or probably 'asc'
  • page: The page starting at 1
  • uid: The id of the user that requested the search
  • qs: The query string (probably not important)

I feel like there is additional data that should be indexed next to the timestamp. I will have a look into what exactly should be indexed so that further changes for search improvements don't need an additional re-indexing.

Also I'll look into what we will need to look for in this data to improve time based searches.

While looking at the data I also saw that perhaps we might want to improve the caching. Currently caching is only done for the search term ignoring all those other settings. This can lead to questionable results or even empty results. For example if someone searches for something and has it set to newer then changes it to older the results will be empty. Will create another issue for this later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants