Modify toQueryString to prevent SQLite expression tree from exceeding depth of 1000 #2565

LZRS · 2024-06-08T23:46:01Z

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Description
Recursively bifurcates the list of condition params so that the generated SQLite expression tree does not exceed the maximum depth of 1000, as suggested in this comment

Alternative(s) considered
Chunking a large expression list into parenthesised groups of at most 50 to avoid crashing with "Expression tree is too large (maximum depth 1000)", as described here
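For comparison, the chunking alternative could look roughly like the sketch below. It works on plain condition strings rather than the engine's `ConditionParam`, and `chunkedOr` and its `chunkSize` parameter are illustrative names, not code from this PR.

```kotlin
// Sketch of the chunking alternative: group conditions into parenthesised
// chunks so that no single run of OR operators grows too long.
// `chunkedOr` is a hypothetical helper operating on plain strings.
fun chunkedOr(conditions: List<String>, chunkSize: Int = 50): String =
  conditions
    .chunked(chunkSize) { chunk ->
      val joined = chunk.joinToString(" OR ")
      // Only wrap in parentheses when the chunk holds more than one condition.
      if (chunk.size > 1) "($joined)" else joined
    }
    .joinToString(" OR ")
```

The outer join still chains one OR per chunk, so the result depends on picking a suitable `chunkSize`.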

Type
Enhancement Feature

Screenshots (if applicable)

Checklist

I have read and acknowledged the Code of conduct.
I have read the Contributing page.
I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

https://stackoverflow.com/a/17032196

pld · 2024-06-14T13:46:54Z

engine/src/main/java/com/google/android/fhir/search/filter/FilterCriterion.kt

-        "(${it.condition})"
-      } else {
-        it.condition
+    this.chunked(50) { conditionParams ->


@MJ1998 do you have thoughts on what the chunk size should be?

pld · 2024-06-21T18:18:21Z

engine/src/main/java/com/google/android/fhir/search/filter/FilterCriterion.kt

+     * This is to prevent SQLite expression tree exceeding max depth of 1000 See
+     * https://www.sqlite.org/limits.html for Maximum Depth Of An Expression Tree
+     */
+    const val CONDITION_PARAMS_CHUNK_SIZE = 50


another question on this value, don't we want this to be as high as possible? I'd think fewer chunks are easier to process, and fewer checks definitely means fewer iterations. Why not set this to the max of 1000 as a default?

Yes, you're right, the chunk size could be higher. Most of the filters for FhirEngine#search are implemented as subqueries, and it seems subqueries add to the depth of the generated expression tree. Will try to figure out a higher number that could be suitable...

It might be hard to find a number that suits most applications, because the joins and subqueries depend on the search. I'm thinking of setting the size to a lower number and maybe allowing it to be configurable

jingtang10 · 2024-07-22T18:45:45Z

judging purely by the error message, my hypothesis is that the expression tree depth is O(n) for nested OR operators because the expression tree is constructed naively by parsing the OR operators sequentially. For example, for this expression

a OR b OR c OR d OR e OR f OR g OR h

if the expression tree is constructed naively you'd get:

  o
 / \
a   o
   / \
  b   o
     / \
    c   o
       / \
      d   o
         / \
        e   o
           / \
          f   o
             / \
            g   h

where each o stands for an OR operator. This has depth 8.

But what you really want is actually this:

        o
      /   \
    o       o
   / \     / \
  o   o   o   o
 / \ / \ / \ / \
 a b c d e f g h

where the tree is more "balanced" and this has depth 4. In other words, this is O(log(n)).

If my hypothesis above about what causes the problem is correct, instead of trying to break the OR statements into chunks (and having to come up with a value), all you actually have to do is keep the tree balanced by splitting the top level OR statement at the middle of the list of params.

Does this make sense?
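The difference between the two shapes can be sketched numerically. The helpers below are illustrative only: `chainDepth` and `balancedDepth` are hypothetical names, they count the leaf level as one level (the same convention as the diagrams), and they model the hypothesis rather than SQLite's actual parser.

```kotlin
// Depth of a one-sided chain of ORs over n operands, counting the leaf level.
// This follows the convention used in the diagrams: 8 operands give depth 8.
fun chainDepth(n: Int): Int = n

// Depth of a tree built by repeatedly splitting the operand list in half.
fun balancedDepth(n: Int): Int =
  if (n <= 1) {
    n // 0 for an empty list, 1 for a single leaf
  } else {
    val mid = n / 2
    // One OR node on top of the deeper of the two halves.
    1 + maxOf(balancedDepth(mid), balancedDepth(n - mid))
  }
```

Under this model, 1000 operands give `chainDepth(1000) == 1000` while `balancedDepth(1000) == 11`.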

LZRS · 2024-07-23T07:59:00Z

> judging purely by the error message, my hypothesis is that the expression tree depth is O(n) for nested OR operators because the expression tree is constructed naively by parsing the OR operators sequentially. For example, for this expression
>
> a OR b OR c OR d OR e OR f OR g OR h
>
> if the expression tree is constructed naively you'd get:
>
>   o
>  / \
> a   o
>    / \
>   b   o
>      / \
>     c   o
>        / \
>       d   o
>          / \
>         e   o
>            / \
>           f   o
>              / \
>             g   h
>
> where each o stands for an OR operator. This has depth 8.
>
> But what you really want is actually this:
>
>         o
>       /   \
>     o       o
>    / \     / \
>   o   o   o   o
>  / \ / \ / \ / \
>  a b c d e f g h
>
> where the tree is more "balanced" and this has depth 4. In other words, this is O(log(n)).
>
> If my hypothesis of what causes the problem is correct above, instead of trying to break the OR statements into chunks (and having to come up with a value), all you actually have to do is keep the tree balanced by splitting the top level OR statement at the middle of the list of params.
>
> Does this make sense?

Yeah, it makes sense.

jingtang10 · 2024-07-23T09:56:16Z

there's no guarantee what i said is true @LZRS - i've done no testing or verification and depending on sqlite's implementation what i said could be complete garbage... so pls test and see if this is true :) (for example you could try to see if you'll hit the depth limit a bit later to bifurcate the tree instead of chunking the parameters)

LZRS · 2024-07-23T11:13:58Z

Alright, no problem. I'll test it out and get back

LZRS · 2024-08-22T01:02:14Z

> there's no guarantee what i said is true @LZRS - i've done no testing or verification and depending on sqlite's implementation what i said could be complete garbage... so pls test and see if this is true :) (for example you could try to see if you'll hit the depth limit a bit later to bifurcate the tree instead of chunking the parameters)

@jingtang10 I tested this out for 1000, 2000 and up to 5000 parameters, and it worked perfectly. I also went ahead and drafted an implementation in the PR

jingtang10 · 2024-09-10T09:41:35Z

engine/src/main/java/com/google/android/fhir/search/filter/FilterCriterion.kt

-      }
+  private fun List<ConditionParam<*>>.toQueryString(operation: Operation): String {
+    if (this.size <= 1) {
+      return map {


why use map if the size is <= 1 anyway? i think you can just do the lambda inside the map function directly since you know there's only one (or zero) item.

jingtang10 · 2024-09-10T09:46:07Z

engine/src/main/java/com/google/android/fhir/search/filter/FilterCriterion.kt

+    val left = this.subList(0, mid).toQueryString(operation)
+    val right = this.subList(mid, this.size).toQueryString(operation)
+
+    return listOf(left, right)


can you just join left and right with the separator without going through all this? in line 89 you have a guarantee that the list has 2 or more items. that means neither left nor right can have size 0.
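A minimal sketch of that simplification, assuming conditions are plain strings rather than `ConditionParam` objects. `balancedJoin` and its `separator` parameter are hypothetical names, and the parenthesisation shown here is one possible choice; the PR's actual `toQueryString` may differ.

```kotlin
// Recursively split the list at its midpoint and join the two halves directly.
// The recursive branch only runs for size >= 2, so both halves are non-empty.
fun balancedJoin(conditions: List<String>, separator: String = "OR"): String {
  // Base cases: an empty list yields "", a single condition is returned as-is.
  if (conditions.size <= 1) return conditions.firstOrNull().orEmpty()
  val mid = conditions.size / 2
  val left = balancedJoin(conditions.subList(0, mid), separator)
  val right = balancedJoin(conditions.subList(mid, conditions.size), separator)
  // Parenthesise each compound expression so the grouping is explicit in the SQL text.
  return "($left $separator $right)"
}
```

For example, `balancedJoin(listOf("a", "b", "c"))` returns `(a OR (b OR c))`.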

jingtang10 · 2024-09-10T09:53:12Z

engine/src/test/java/com/google/android/fhir/search/SearchTest.kt

@@ -2752,6 +2754,43 @@ class SearchTest {
      .inOrder()
  }

+  @Test
+  fun `search CarePlan filter with large list of patient reference`() {


i think you do want to also test base cases... for example an empty list, and a list with 1, 2, (maybe add a 3 just to check when the list is not an even number and power of 2), 4 and 8 items.

by that point we can be confident the algorithm works as intended.

we can have a huge one like this - but it's actually not as useful in my view.

The base case for 1 is mostly covered by other test cases in the class. I guess I could add tests for the other cases

Well - I know that the base case for 1 is covered via other test cases. BUT it's much better to write unit tests for base cases than complex cases.

I would say that test cases for 0, 1, 2, 3, 4 together are much better than a single test case for 10.

This is because you really want to isolate any cases and make sure tests are easy to debug, easy to fix. If the test case for 10 starts to fail, you'll spend a long time trying to debug. But if the test case fails for either of 1, 2, 3, or 4, you can almost immediately pin down the cause for failure.
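One way to sketch the small-case checks is a helper that reports exactly which list sizes fail. `failingSizes` is a hypothetical name, the joining function is passed in as a parameter, and operands are assumed to be plain strings joined with " OR ". In the repository these would more likely be individual JUnit tests.

```kotlin
// Returns the list sizes for which `join` loses, duplicates or reorders operands.
// Parentheses are stripped first, so the check does not depend on how groups are nested.
fun failingSizes(sizes: List<Int>, join: (List<String>) -> String): List<Int> =
  sizes.filter { n ->
    val operands = List(n) { "c$it" }
    val flattened = join(operands).replace("(", "").replace(")", "")
    val recovered = if (flattened.isEmpty()) emptyList<String>() else flattened.split(" OR ")
    recovered != operands
  }
```

Calling it with sizes such as `listOf(0, 1, 2, 3, 4, 8)` should return an empty list for a correct joiner, and a failure names the smallest broken size directly.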

aditya-07

Can we add a DatabaseImplTest with

aditya-07 · 2024-09-10T09:55:20Z

engine/src/test/java/com/google/android/fhir/search/SearchTest.kt

@@ -2752,6 +2754,43 @@ class SearchTest {
      .inOrder()
  }

+  @Test
+  fun `search CarePlan filter with large list of patient reference`() {


We should add this test in DatabaseImplTest to make sure that it always compiles and runs on the actual database.

FikriMilano · 2024-09-10T10:07:42Z

engine/src/test/java/com/google/android/fhir/search/SearchTest.kt

@@ -2752,6 +2754,43 @@ class SearchTest {
      .inOrder()
  }

+  @Test


would it be possible to test the depth after the tree balancing? for example, before the change, the depth was 8. but now, after the change, the depth is 4.

Not really sure if it's easily possible, but checking it out...
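As a rough proxy, a test could measure the deepest parenthesis nesting in the generated query string. This is only a sketch: `maxParenDepth` is a hypothetical helper, it does not reproduce SQLite's own depth accounting, and it would miscount parentheses that appear inside string literals.

```kotlin
// Deepest level of parenthesis nesting found in the given SQL text.
fun maxParenDepth(sql: String): Int {
  var depth = 0
  var deepest = 0
  for (ch in sql) {
    when (ch) {
      '(' -> {
        depth++
        if (depth > deepest) deepest = depth
      }
      ')' -> depth--
    }
  }
  return deepest
}
```

For example, `maxParenDepth("((a OR b) OR (c OR d))")` returns 2, while a flat `a OR b OR c OR d` returns 0.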

jingtang10 · 2024-09-11T13:53:03Z

engine/src/main/java/com/google/android/fhir/search/filter/FilterCriterion.kt

+          if (it.params.size > 1) {
+            "(${it.condition})"
+          } else {
+            it.condition
+          }


can you make this a function in the class ConditionParam actually? that will make this code look better.
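A sketch of what that could look like, using a simplified stand-in class. `SimpleConditionParam` and `queryString` are assumed names for illustration; the engine's real `ConditionParam` is generic and may have other members.

```kotlin
// Simplified stand-in for the engine's ConditionParam.
data class SimpleConditionParam(val condition: String, val params: List<Any>) {
  // Wrap the condition in parentheses only when it binds more than one parameter,
  // mirroring the if/else in the diff above.
  val queryString: String
    get() = if (params.size > 1) "($condition)" else condition
}
```

Callers can then use `it.queryString` instead of repeating the if/else at each call site.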

jingtang10 · 2024-09-11T13:54:06Z

engine/src/main/java/com/google/android/fhir/search/filter/FilterCriterion.kt

-        it.condition
-      }
+  private fun List<ConditionParam<*>>.toQueryString(operation: Operation): String {
+    if (this.size <= 1) {


i wonder if you can make this a bit more explicit - i think we should never have the 0 case? and for the 1 case you can make the code look tidier.

jingtang10 · 2024-09-11T13:58:24Z

engine/src/test/java/com/google/android/fhir/search/SearchTest.kt

@@ -2752,6 +2754,43 @@ class SearchTest {
      .inOrder()
  }

+  @Test
+  fun `search CarePlan filter with large list of patient reference`() {


Well - I know that the base case for 1 is covered via other test cases. BUT it's much better to write unit tests for base cases than complex cases.

I would say that test cases for 0, 1, 2, 3, 4 together are much better than a single test case for 10.

This is because you really want to isolate any cases and make sure tests are easy to debug, easy to fix. If the test case for 10 starts to fail, you'll spend a long time trying to debug. But if the test case fails for either of 1, 2, 3, or 4, you can almost immediately pin down the cause for failure.

Modify toQueryString to chunk large list of ConditionParam

6b3d4c4

https://stackoverflow.com/a/17032196

LZRS force-pushed the 2561-fix-sqlite-crash branch from ae8187f to 6b3d4c4 Compare June 12, 2024 15:11

pld reviewed Jun 14, 2024

Merge remote-tracking branch 'upstream/master' into 2561-fix-sqlite-crash

8c720e2

LZRS marked this pull request as ready for review June 21, 2024 17:52

LZRS requested review from aditya-07, jingtang10 and MJ1998 as code owners June 21, 2024 17:52

pld reviewed Jun 21, 2024

LZRS added 2 commits June 24, 2024 14:15

Add test for condition params chunking and wrapping in brackets

c73037d

Merge remote-tracking branch 'upstream/master' into 2561-fix-sqlite-crash

0264816

LZRS force-pushed the 2561-fix-sqlite-crash branch from 80c50de to 0264816 Compare June 24, 2024 11:17

Merge remote-tracking branch 'upstream/master' into 2561-fix-sqlite-crash

6a45013

LZRS requested a review from santosh-pingle as a code owner June 26, 2024 14:16

Add support for chunkSize param in SearchDsl filters

d764115

LZRS force-pushed the 2561-fix-sqlite-crash branch from d9e6d03 to d764115 Compare June 26, 2024 22:21

LZRS added 2 commits June 27, 2024 11:57

Update workflow engine dependency to use latest

d509f53

Merge branch 'master' into 2561-fix-sqlite-crash

ab9a4a6

LZRS added 3 commits August 22, 2024 03:12

Merge remote-tracking branch 'upstream/master' into 2561-fix-sqlite-crash

432f7af

Refactor remove chunkSize parameter

b5e67df

Recursively bifurcate expression tree to reduce depth

b9708a6

LZRS requested a review from a team as a code owner August 22, 2024 00:57

Revert touched files not relevant for the PR

821feab

LZRS force-pushed the 2561-fix-sqlite-crash branch from af4d912 to 821feab Compare August 22, 2024 13:03

LZRS changed the title ~~Modify toQueryString to chunk large list of ConditionParam~~ Modify toQueryString to prevent SQLite expression tree from exceeding depth of 1000 Aug 22, 2024

jingtang10 requested changes Sep 10, 2024

aditya-07 requested changes Sep 10, 2024

FikriMilano suggested changes Sep 10, 2024

LZRS added 2 commits September 11, 2024 13:33

Merge remote-tracking branch 'upstream/master' into 2561-fix-sqlite-crash

a304c19

Refactor toQueryString

112344f

jingtang10 reviewed Sep 11, 2024
