Enhance detection of buffer overflows by introducing variable for length of array/blob #1330

FungOliver · 2024-01-16T02:28:54Z

This draft PR enhances the detection of buffer overflows by introducing a variable for the length of array/blob as suggested by @michael-schwarz in #620.

Arrays:
Upon creation of a VLA vdecl creates a new ghost variable in the domain, which tracks the symbolic relationships between program variables and the length of the array. Using the query system, the arraydomain can check for out-of-bounds accesses.

Malloc:
This works similarly to the VLA case. When upon calling malloc/alloc/calloc the special transfer function creates a new ghost variable, which tracks the length of blob size. MemOutOfBounds can then use the query system to ask the relational domain if an OOB access occurred. In addition, I implemented pointer tracking as we would not be able to know the symbolic relationship of the pointer offset otherwise. (see the following example).

   int main() {
     int  EXP_OF_ARRAY_LENGTH = rand();
     EXP_OF_ARRAY_LENGTH %= 4;
     int * arr;
     arr = malloc( sizeof(int)*   EXP_OF_ARRAY_LENGTH );
     for (int  i = 0 ; i < EXP_OF_ARRAY_LENGTH  ; i++) {
       char * t = arr + i ;
       *t = 2;
     }   
   }

CHANGES/BUG fixes:
- [x] The relational domain wasn't able to parse sizeOf expression. This is now fixed. One such case would be the new test case 36 100. 
- [x] Rewrite the logic of memOutOfBounds. Previously, the analysis gave no warnings for negative indices and was therefore unsound. One such example is in the new test case 74 33
 In addition to that, due to the fact that the struct Offset and pointer Offset are analyzed separately, the analysis was unsound in certain cases. (see the newly added test 74 32 and 74 34)
 Finally, the memOutOfBounds analysis should now be more precise and give less false warnings for structs see 74 13
- [x]  Fix computation of the no_overflow flag. Previously, the no_overflow flag was computed using a comparison of top. This was incorrect in the case of comparisons. When evaluating comparisons, we get the interval [0,1], if we have an overflow. However, this does not correspond to Top. Additionally, this flag was not used in further computations. This leads to the RD being unable to compare unsigned and signed integers, which is essential for this analysis. To fix this, I added an overflow flag in the intdomain to determine if an overflow occurred in an expression. This is a nonfunctional solution; however, after initially trying to rewrite the logic to incorporate the overflow information, I concluded that this would lead to too much refactoring.
```C
int main() {
    unsigned  int EXP_OF_ARRAY_LENGTH =   (rand()%32) +2;
    for (  int i = 0 ;i < EXP_OF_ARRAY_LENGTH -33; i++) 
        assert( i< EXP_OF_ARRAY_LENGTH); // UNKNOWN
    return 0;

}

TODO:

Option for toggling pointer tracking
Rewrite memOutOfBounds query logic
Fix removal of ghost variables after function return
Clean-up

jerhard

Thank you for your pull request! Supporting relational information about array lengths and allocated blocks will be very neat!

I left some first round of remarks that you may have a look at.

src/cdomains/arrayDomain.ml

src/domains/valueDomainQueries.ml

tests/regression/74-invalid_deref/30-mem-oob-struct.c

tests/regression/97-relational-malloc/02-simple-if-else.c

tests/regression/97-relational-malloc/03-global.c

src/analyses/apron/relationAnalysis.apron.ml

jerhard · 2024-01-25T10:18:26Z

src/analyses/base.ml

+    | Q.MayOverflow e ->  
+      IntDomain.local_no_overflow := true;
+      if M.tracing then M.trace "apron" "exp %a\n" d_exp e;
+      ignore(query_evalint (Analyses.ask_of_ctx ctx) ctx.global ctx.local e);
+      !IntDomain.local_no_overflow ;


Did you find an example where the old no overflow handling in the relational domain caused an unsoundness in the analysis results?

The old no overflow handling didn't cause any unsoundness in the analysis results because the flag was ignored (see comment). However this overflow handling is not sufficient for the analysis, due to apron giving up when there is a cast from signed integer to unsigned integer and vice versa even when enabling --set sem.int.signed_overflow assume_none. This may work for relational vla length analysis if there are just integers, but for malloc cil cast the expression to unsigned long.

If we decide to not ignore the old overflow handling, we do get unsoundness in the analysis as one can see in 6b797ad.

// PARAM: --enable ana.int.interval --set ana.activated[+] apron #include <stdlib.h> #include <goblint.h> int main() { unsigned int len = rand(); len = len % 10; for (int i = 0; i -1 < len ; i++) { int tmp = i; } }

This results in

[Error][Analyzer][Unsound] both branches over condition '(unsigned int )(i - 1) < len' are dead (tests/regression/100-limitation/01-crash.c:9:9-9:37) [Error][Analyzer][Unsound] Both branches dead

I, unfortunately, found out that the current overflow handling in this PR also suffers from this problem. The queries for EvalInt are cached, which results in the local_no_overflow not being set correctly.

One idea to solve this problem, which don't involve rewriting the whole evalint function would be to disable the cache lookup with another global flag.

I, unfortunately, found out that the current overflow handling in this PR also suffers from this problem. The queries for EvalInt are cached, which results in the local_no_overflow not being set correctly.

One idea to solve this problem, which don't involve rewriting the whole evalint function would be to disable the cache lookup with another global flag.

Disabling the cache with a global flag seems to work. However, I agree that this solution may not be the prettiest. After having a look at the overflow handling in Linear two var equality #1297 and testing a bit, it seems like their overflow handling may also work with my analysis.

src/analyses/memOutOfBounds.ml

src/analyses/wrapperFunctionAnalysis.ml

michael-schwarz

Great to see there's progress here! I left some first comments, but I have not had the time to go into details yet (especially the changes inside the relational domain and the oob analysis).

Here's two questions I already have though:

One's conceptual: Have you considered what happens for multi-threaded programs? Initially we thought about having an analysis such as this to demonstrate the usefulness of our analyses of global variables. Do you think this it within reach?
The other one is perhaps more down to earth: What do you do when you encounter a malloc only in one branch? Will you not have issues with the apron environments not matching up after the join?

tests/regression/36-apron/100-sizeOf.c

tests/regression/06-symbeq/53-var_eq_multithread.c

src/domains/queries.ml

michael-schwarz · 2024-01-30T19:14:36Z

src/domains/queries.ml

@@ -195,6 +199,9 @@ struct
    | MustTermAllLoops -> (module MustBool)
    | IsEverMultiThreaded -> (module MayBool)
    | TmpSpecial _ -> (module ML)
+    | MayBeOutOfBounds _ -> (module ID)
+    | MayOverflow _ -> (module MayBool)
+    | AllocMayBeOutOfBounds _ -> (module VDQ.ProdID)


Also why ID above and VDQ.ProdID here? The meaning of the tuple should ideally be obvious from the name.

For the relational VLA oob check, I don't check if the index can be negative because if the relational domain knows that the index is not negative, the bounds for the array index would be improved anyway in the Valuedomain.

However, this does not work for pointer arithmetic. The relational domain doesn't track the relationship between pointers (yet). Therefore the analysis can't give boundaries for the memory access. In the (newly) added test case 85 14 memOutOfBounds would raise a warning that we may access the array before the allocated memory. With the new pointer_tracking enabled, which tracks relational relationships between pointers, we know that this cannot be the case. We, therefore, return a tuple of ~~ID~~ FlatBool to know if we might access before or after the allocated memory.

src/analyses/apron/relationAnalysis.apron.ml

michael-schwarz · 2024-01-30T19:24:40Z

src/analyses/mCP.ml

@@ -249,10 +249,10 @@ struct
    let anyq = Queries.Any q in
    if M.tracing then M.traceli "query" "query %a\n" Queries.Any.pretty anyq;
    let r = match Queries.Hashtbl.find_option querycache anyq with
-      | Some r ->
+      | Some r when !lookUpCache ->


Why does this cache need to be disabled?

This is for the new no overflow computation in the relational domain, as the old one was unsound and unused. I added a local_no_overflow flag in the intdomain and checked this flag after an EvalInt query. This is a kind of dirty hack, as I first tried to rewrite the Evalint to include the overflow information, but after adapting the intdomain and trying to adopt the base.ml to include the overflow information. I noticed that this would involve nearly a few thousand lines of code and deemed this not reasonable for this PR.

This works in most cases except for loops, where it seems like the EvalInt query is cached, which results in the EvalInt query not being newly evaluated and the flag not being set correctly. See also: #1330 (comment)

This is very suspicious. We assume that queries are pure, so caching or not caching cannot make an observable difference. If it can, then something is wrong.

We don't recompute until query results stabilize, so in the presence of such impure queries we have no soundness guarantees anymore.

This is, because for the no_overflow query uses side effects to evaluate if we have an overflow. It queries the EvalInt query and checks the side effect for an overflow. However, this does not work if the query is cached.

FungOliver · 2024-01-31T14:35:42Z

Greta to see there's progress here! I left some first comments, but I have not had the time to go into details yet (especially the changes inside the relational domain and the oob analysis).

Here's two questions I already have though:
* One's conceptual: Have you considered what happens for multi-threaded programs? Initially we thought about having an analysis such as this to demonstrate the usefulness of our analyses of global variables. Do you think this it within reach?

* The other one is perhaps more down to earth: What do you do when you encounter a `malloc` only in one branch? Will you not have issues with the apron environments not matching up after the join?

The problem with global variables in Multithreading right now is that we need the relational information of the ghost variable with the initialization expression together in one domain. However, in the case of multithreading, the global variables get removed and are tracked separately. Therefore, we lose the relational relationship between the ghost variable and the initialization expression. Additionally, in the case of escaped variables, the relationship remains in the domain. Nonetheless, this is still useless due to the fact that new variables are introduced for the escaped variables. So, we still know nothing in this case. I got this to work with global invariants after setting the ghost variable to Global type.
Yes, this is indeed a problem. However, I am also not quite sure how to solve this problem...

Unfortunately, I don't think solving later issues is within reach. Investigating and fixing bugs in the relational domain and unsoundness issues in memOutOfBounds took quite some time.

src/analyses/apron/relationAnalysis.apron.ml

… of the current evaluation

jerhard · 2024-03-14T15:35:46Z

Before merging this, one should verify whether the handling of non-unique mallocs is sound here, e.g. for a program segment:

int n = 1 + (rand() % 5);
for(int i = 0; i < n; i++){
   int* a = malloc(i * sizeof(int));
   if(i == 0){ b = a;}
}
*b = 4; // Should warn, as size of block pointed to by b is zero.

jerhard · 2024-07-30T11:22:33Z

How does this ensure that the relational analysis never leaks the introduced dummy variables to other analysis, e.g. by making queries? Note that the relational information for other variables then may also be expressed via the dummy variables.

FungOliver · 2024-07-30T12:13:04Z

The ghost variables are added into the environment with RD.add_vars like in body and in return

analyzer/src/analyses/apron/relationAnalysis.apron.ml

Line 361 in d86fe48

let rel' = RD.add_vars st.rel [RV.return] in

The ghost variables are needed to evaluate any relational relationship between for example arrays and should be part of the relational domain. I am not quite sure what you mean by leak here. However, I think that the ghost variables should behave similarly to the other variables in the relational domain?

jerhard self-requested a review January 16, 2024 09:37

michael-schwarz added feature precision student-job labels Jan 18, 2024

jerhard reviewed Jan 19, 2024

View reviewed changes

FungOliver force-pushed the relationalArray branch 2 times, most recently from a9c9eb7 to f504431 Compare January 24, 2024 02:09

github-advanced-security bot found potential problems Jan 24, 2024

View reviewed changes

src/analyses/apron/relationAnalysis.apron.ml Fixed Show fixed Hide fixed

jerhard reviewed Jan 25, 2024

View reviewed changes

FungOliver force-pushed the relationalArray branch from e4de37e to b516fd3 Compare January 26, 2024 03:34

michael-schwarz added the relational Relational analyses (Apron, affeq, lin2var) label Jan 26, 2024

jerhard requested review from sim642 and michael-schwarz January 29, 2024 09:26

michael-schwarz reviewed Jan 30, 2024

View reviewed changes

FungOliver force-pushed the relationalArray branch from 84f806e to a59b918 Compare January 31, 2024 18:25

sim642 mentioned this pull request Feb 1, 2024

Remove intermediate module definition in AffineEqualityAnalysis #1349

Merged

sim642 reviewed Feb 3, 2024

View reviewed changes

src/analyses/apron/relationAnalysis.apron.ml Outdated Show resolved Hide resolved

FungOliver added 12 commits February 9, 2024 05:00

Fix Todos

61f9b47

add simple test case

0a12f47

enable ana.int.interval when ana.arrayoob is enabled

d4fbdbd

add new Query MayBeOutOfBounds

3660b10

add new examples for relational oob VLA check

137c00a

update method signature to include index expression for VLA and depth…

bbcad29

… of the current evaluation

update richvarinfo to allow global Varinfos and different types

045ee7f

implement relational VLA OOB check

650055f

fix error in array_oob_check

ceb804b

fix flag in wrapperFunctionAnalysis

ca227ab

add new Queries MayOverflow and AllocMayBeOutOfBounds

0c7aef3

add overflow flag to intdomain

3ce652c

FungOliver added 25 commits February 10, 2024 04:10

add new test and clean up previous tests

9f13e4a

undo comments in test

d29014e

add new test case

d3d56d1

skip test case due to timeout

5fe82ce

format test case 86 06

27a61fc

fix bug in pointer not correctly invalidated

6aa4e15

fix bug in addressOfPointer

eea98e3

expand test case 85 18 85 01

4166740

remove polyhedar domain

705ade6

add a new test case and fix existing ones

942b311

add new test case 85 19

0b100bb

clean up code

c1ee198

clean up code

333e7cb

update json conf files

a02e7f6

fix conf

bca0f0f

add useAfterFree analysis to conf files

49fbccd

fix conf files

00642c5

fix exception in sizeOfTyp

07e51af

fix mistake in conf file

87d0568

fix TypeOfError in AllocMayBeOutOfBounds

b919e9c

update conf files and fix mistake with addressOfPointer

9657391

catch a few exceptions in relationalAnalaysis

ba89107

fix Fixpoint not reached exception in Pointer Tracking

e0f4607

call assignLval instead of assign

bb9e223

fix fixpoint errors in conf files

1e59d65

jerhard self-assigned this Jul 30, 2024

jerhard added this to the SV-COMP 2025 milestone Jul 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhance detection of buffer overflows by introducing variable for length of array/blob #1330

Enhance detection of buffer overflows by introducing variable for length of array/blob #1330

FungOliver commented Jan 16, 2024 •

edited

Loading

jerhard left a comment

jerhard Jan 25, 2024

FungOliver Jan 27, 2024 •

edited

Loading

FungOliver Jan 27, 2024

FungOliver Jan 27, 2024 •

edited

Loading

FungOliver Feb 2, 2024 •

edited

Loading

michael-schwarz left a comment •

edited

Loading

michael-schwarz Jan 30, 2024

FungOliver Feb 1, 2024

michael-schwarz Jan 30, 2024

FungOliver Jan 30, 2024

sim642 Jan 31, 2024

FungOliver Jan 31, 2024

FungOliver commented Jan 31, 2024 •

edited

Loading

jerhard commented Mar 14, 2024 •

edited

Loading

jerhard commented Jul 30, 2024

FungOliver commented Jul 30, 2024

Enhance detection of buffer overflows by introducing variable for length of array/blob #1330

Are you sure you want to change the base?

Enhance detection of buffer overflows by introducing variable for length of array/blob #1330

Conversation

FungOliver commented Jan 16, 2024 • edited Loading

jerhard left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FungOliver Jan 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FungOliver Jan 27, 2024 • edited Loading

Choose a reason for hiding this comment

FungOliver Feb 2, 2024 • edited Loading

Choose a reason for hiding this comment

michael-schwarz left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FungOliver commented Jan 31, 2024 • edited Loading

jerhard commented Mar 14, 2024 • edited Loading

jerhard commented Jul 30, 2024

FungOliver commented Jul 30, 2024

FungOliver commented Jan 16, 2024 •

edited

Loading

FungOliver Jan 27, 2024 •

edited

Loading

FungOliver Jan 27, 2024 •

edited

Loading

FungOliver Feb 2, 2024 •

edited

Loading

michael-schwarz left a comment •

edited

Loading

FungOliver commented Jan 31, 2024 •

edited

Loading

jerhard commented Mar 14, 2024 •

edited

Loading