C-2PO: Thesis About a Weakly-Relational Pointer Analysis #1485

reb-ddm · 2024-05-24T10:02:10Z

This PR introduces a weakly-relational analysis between pointers (ref: 2-pointer Logic by Seidl et al.).
The analysis can infer must-equalities and must-disequalities between terms which are built from pointer variables, with the addition of constants and dereferencing.
Moreover, it computes pairs of variables that are in distinct memory blocks, for example if they were initialized with two different calls to malloc.
An example of a property that the analysis can infer is *(x + 1) = 3 + y and a != 5 + *(y + 1).

Abstract states are represented by a congruence closure of terms over the uninterpreted function *.
Additionally, a list of disequalities and a list of block disequalities between terms is stored.

The disequalities are important during the assignment operation, because when we assign a value to an address on the heap, we need to forget the information we had about all terms thay may alias with this address.
Additionally to the (block) disequalities inferred by this analysis, we exploit the May-Point-To analysis in order to derive additional disequalities between terms.
Using the May-Point-To analysis can be disabled with the option ana.c2po.askbase.

There are two version of the join and two versions of the equal function that are implemented.
They can be chosen via the options ana.c2po.precise_join and ana.c2po.normal_form, respectively.
They are disabled by default.
If precise_join is enabled, the join is calculated using the quantitative finite automaton. Otherwise, only the partition is considered for the join, which is a bit less precise.
If normal_form is enabled, the equality of two congruence closures is decided by calculating a normal form, instead of
comparing the equivalence classes. The normal form is computed lazily and doesn't need to be recomputed when we call equal on the same domain element.

The implementation of enter duplicates the parameter variables of each function in order to infer in combine which terms are related to the initial parameters.
In order to use the May-Point-To analysis also on these duplicated variables, an additional analysis startState is added.
It remembers the May-Point-To information of the duplicated variables at the beginning of the function.

Here is a more detailed description of the files that are added:

The file unionFind.ml contains the code for a quantitative union find and the quantitative finite automata.
They will be necessary in order to construct the congruence closure of terms.
The contained modules are:

T: here, the terms (e.g., *(x + 64)) and propositions (e.g., *(x + 64) = 192 + y) are defined. There are also function to convert a CIL expression to a term and a term to a CIL expression.
The offsets in the terms are expressed in bits, therefore they are equal to the offset of the CIL expression multiplied with the type of the element that the variable points to.
Each term stores the information about the CIL expression that was used to create the term.
This way, it is easier to reconvert a term to a CIL expression and to get information about the type of a term.
Only the terms that are pointers or arrays or structs or 64-bit integers are considered by the analysis.
Arrays and structs are interpreted as pointers, e.g. a[3] is interpreted as the term *(a + 3).
UnionFind: the union-find data structure is defined with terms as elements.
LookupMap: this map represents the transitions of the quantitative finite automata.
Each term t is mapped to the terms *(z + t) or equal terms.

The file congruenceClosure.ml contains the data structures for the C-2PO Domain, i.e., the congruence closure, the disequalities and the block disequalities:

BlDis represents the block disequalities as a map that maps each term to a set of terms that are definitely in a different block.
Disequalities represents the disequalities as a map from a term to a map from an integer to another term.
The module contains functions to compute the closure of the disequalities that are implied by equalities, disequalities or block disequalities.
SSet: a set of terms that are currently considered by the analysis.
MRMap: maps each equivalence class to a minimal representative. This is necessary for computing the normal form.
There are functions to calculate the closures of the proposition, to insert terms, add propositions, remove terms, two methods to compute join and two methods to compute equal.
MayBeEqual: contains code to check if two terms may point to the same address or if they may overlap. It uses information from the disequalities and from MayPointTo for this purpose.

The file c2poDomain.ml defines the domain operations.

c2poAnalysis.ml contains the transfer functions for the analysis.

duplicateVars.ml contains functions for duplicating variables, which is used in enter in the C-2PO analysis and in the StartState analysis.

startStateAnalysis.ml remembers the value of each parameter at the beginning of a function. It answers the query May-Point-To for the duplicated variables and returns the initial value of the original variable.

singleThreadedLifter.ml transforms any analysis into a single threaded analysis by returning top any time the code might be multi-threaded. This can be reused by other analyses in the future.

…arinfo

added distinct dummy and return variables; changed typ to exp as additional information for the terms, therefore simplified the function for finding the cil term starting; fixed some bugs;

… return of startState

… possible to dereference it further.

sim642

I haven't read the paper, so I cannot comment on the approach itself, but I looked over some of the code.

conf/svcomp-c2po-no-maypointto.json

src/util/library/libraryDsl.ml

src/cdomains/unionFind.ml

src/config/options.schema.json

src/goblint_lib.ml

tests/regression/83-c2po/01-simple.c

This reverts commit 8173b99.

src/cdomains/unionFind.ml

… of the resulting varinfo

michael-schwarz · 2024-09-22T14:05:16Z

Any objections to merging @jerhard @sim642?

sim642 · 2024-09-23T08:06:31Z

src/cdomains/unionFind.ml

+type prop = Equal of ((term * term * Z.t) [@equal tuple3_equal][@compare tuple3_cmp][@hash tuple3_hash])
+          | Nequal of ((term * term * Z.t) [@equal tuple3_equal][@compare tuple3_cmp][@hash tuple3_hash])
+          | BlNequal of ((term * term) [@equal tuple2_equal][@compare tuple2_cmp][@hash tuple2_hash])


It might be worth giving (semantic) names to the types term * term * Z.t and term * term above and use them here. That would avoid all the extra attributes here.

sim642 · 2024-09-23T08:11:40Z

src/cdomains/duplicateVars.ml

  (* the hash/compare values should not depend on the type.
     But they have to be defined even though they are not used, for some reason.*)
-  let equal_typ _ _ = true
-  let hash_typ _ = 0
-  let compare_typ _ _ = 0
+  let equal_typ a b = Stdlib.compare a b = 0
+  let hash_typ x = Hashtbl.hash x
+  let compare_typ a b = Stdlib.compare a b

-  type t = AssignAux of (typ[@compare.ignore][@eq.ignore][@hash.ignore])
-         | ReturnAux of (typ[@compare.ignore][@eq.ignore][@hash.ignore])
+  type t = AssignAux of typ
+         | ReturnAux of typ


Is this comment outdated? Because they are used now.

And as such, CilType.Typ.t should be used instead such that correct implementations are used. CIL types are cyclic, so Stdlib.compare may not terminate.

sim642 · 2024-09-23T08:40:55Z

src/cdomains/congruenceClosure.ml

+          set: SSet.t;
+          map: LMap.t;
+          normal_form: T.v_prop list Lazy.t[@compare.ignore][@eq.ignore][@hash.ignore];
+          diseq: Disequalities.t;


ppx_deriving does not have any [@compare.ignore] or [@eq.ignore] attributes (ppx_compare does, but we don't use that).
So this uses the built-in behavior: always forces the lazy evaluation and compares that. This might've had an unintended and unexpected performance impact.

I only noticed this now due to the weird Lazy.hash definition above: ppx_deriving_hash doesn't have [@hash.ignore] but it also doesn't have built-in Lazy.t support (in the released version 0.1.2 at least), so that's why the compiler complains.

I think explicitly overriding all three for the entire type (in parenthesis!) should work:

Suggested change

diseq: Disequalities.t;

normal_form: (T.v_prop list Lazy.t[@compare fun _ _ -> 0][@eq fun _ _ -> true][@hash fun _ -> 0]);

The Lazy module extension above shouldn't be necessary then either.

sim642 · 2024-09-23T08:48:07Z

src/common/util/richVarinfo.ml

+module VarinfoDescription = struct
+  (**This type is equal to `varinfo`, but the fields are not mutable and they are optional.
+     Only the name is not optional. *)
+  type t = {
+    vname_: string;
+    vtype_: typ option;
+    vattr_: attributes option;
+    vstorage_: storage option;
+    vglob_: bool option;
+    vinline_: bool option;
+    vdecl_: location option;
+    vinit_: initinfo option;
+    vaddrof_: bool option;
+    vreferenced_: bool option;
+  }
+
+  let from_varinfo (v: varinfo) =
+    {vname_=v.vname;
+     vtype_=Some v.vtype;
+     vattr_=Some v.vattr;
+     vstorage_=Some v.vstorage;
+     vglob_=Some v.vglob;
+     vinline_=Some v.vinline;
+     vdecl_=Some v.vdecl;
+     vinit_=Some v.vinit;
+     vaddrof_=Some v.vaddrof;
+     vreferenced_=Some v.vreferenced}
+
+  let empty name =
+    {vname_=name;
+     vtype_=None;
+     vattr_=None;
+     vstorage_=None;
+     vglob_=None;
+     vinline_=None;
+     vdecl_=None;
+     vinit_=None;
+     vaddrof_=None;
+     vreferenced_=None}
+
+  let update_varinfo (v: varinfo) (update: t) =
+    let open Batteries in
+    {vname=update.vname_;
+     vtype=Option.default v.vtype update.vtype_;
+     vattr=Option.default v.vattr update.vattr_;
+     vstorage=Option.default v.vstorage update.vstorage_;
+     vglob=Option.default v.vglob update.vglob_;
+     vinline=Option.default v.vinline update.vinline_;
+     vdecl=Option.default v.vdecl update.vdecl_;
+     vinit=Option.default v.vinit update.vinit_;
+     vid=v.vid;
+     vaddrof=Option.default v.vaddrof update.vaddrof_;
+     vreferenced=Option.default v.vreferenced update.vreferenced_;
+     vdescr=v.vdescr;
+     vdescrpure=v.vdescrpure;
+     vhasdeclinstruction=v.vhasdeclinstruction}
+end


This seems like a massive amount of boilerplate for what exactly? The fields of varinfo are mutable, so the type could just be changed in-place with v.vtype <- t without introducing copies of the varinfo with the same vid but different data.

When varinfos are correctly compared with CilType.Varinfo, then that only depends on vid anyway, so such mutations shouldn't break any data structures they are in either.

reb-ddm added 30 commits April 29, 2024 14:29

modified a comment

f22f4f3

minor style choices

69b0031

simplify cil conversion by using to_index

69dc1ff

fix small thing

72d689e

removed code for invertible assignments, as it is not necessary

70c273f

add size parameter to may_be_equal for the ize of the lvalue

31ae30d

changed Congruence closure to not be generic, but only work for for v…

8b427c4

…arinfo

added type information to terms

addf77b

added size of variables to the data structure;

d8c484a

added distinct dummy and return variables; changed typ to exp as additional information for the terms, therefore simplified the function for finding the cil term starting; fixed some bugs;

modified some tests

5889a94

first draft of join, haven't tested it yet

6b2cacd

finishe implementing join, calculate offsets correctly in join

615453c

fix bug in equal

5a6d177

better dereferencing

4e27d5f

added tests for join

40d0038

implemented disequalities

4216b06

implemented remove with disequalities

d34c252

only dereference disequalities if they have the same size

b5e94a9

add test for disequalities

1fa4e66

implemented join for disequalities

4da20a6

fixed bugs in startState analysis and remembered tainted variables in…

ad80a19

… return of startState

remove tainted variables in wrpointer analysis in combine_env

0882a0a

add function to remove tainted variables to the wrpointer domain

4d5a966

add taintPartialContext analysis to tests

4532e5b

add test for tainted variables n a function

c167abe

adapted the comments

9bd512d

fixed bug in join and added corresponding test

2807b01

made join more precise and made widen = top

23213b3

handle unknown left hand sides of assignments

61058ea

better calculation of dereferenced expression. Give up when it is not…

4a253db

… possible to dereference it further.

sim642 reviewed Aug 9, 2024

View reviewed changes

michael-schwarz requested review from michael-schwarz and jerhard August 9, 2024 09:21

reb-ddm added 12 commits August 13, 2024 10:50

remove unused polymorphism

30949fd

remove redundant compare

b3ea845

use Cilfacade.isFloatType snf Cil.unrollType

da75386

change error to TypoOfError

448768e

explain better what singleThreadedLifter is for

eabeec9

do not recompute compare

1e5e645

fix invariant generation bug

f6133c9

remove confs

599dfd4

Revert "Horrible, horrible fix. May the gods forgive us!"

0d1470a

This reverts commit 8173b99.

derive equality of propositions

85c08ec

use Lattice.LiftBot

6b269fd

catch exception

03fca72

github-advanced-security bot found potential problems Sep 10, 2024

View reviewed changes

src/cdomains/unionFind.ml Fixed Show fixed Hide fixed

reb-ddm added 7 commits September 10, 2024 11:37

implemented pretty_diff

07ab99a

remove debug print

e22a989

make precise_join an enum

addbbd4

make normal_form an enum

80f68fc

fix bug

39b2261

change the location od c2poAnalysis in Goblint_lib

c9a7e04

modify richvarinfo to make it possible to change all the other fields…

cb9dc2f

… of the resulting varinfo

michael-schwarz mentioned this pull request Sep 22, 2024

Consider submitting C2PO tests to SV-COMP #1571

Open

michael-schwarz approved these changes Sep 22, 2024

View reviewed changes

Merge branch 'master' into thesis-weakly-relational-pointer

f787e67

sim642 self-requested a review September 22, 2024 14:12

sim642 mentioned this pull request Sep 23, 2024

Handle Lattice.BotValue in Lattice.LiftBot and similar #1572

Open

sim642 reviewed Sep 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C-2PO: Thesis About a Weakly-Relational Pointer Analysis #1485

C-2PO: Thesis About a Weakly-Relational Pointer Analysis #1485

reb-ddm commented May 24, 2024 •

edited

Loading

sim642 left a comment

michael-schwarz commented Sep 22, 2024

sim642 Sep 23, 2024

sim642 Sep 23, 2024

sim642 Sep 23, 2024

sim642 Sep 23, 2024

	diseq: Disequalities.t;
	normal_form: (T.v_prop list Lazy.t[@compare fun _ _ -> 0][@eq fun _ _ -> true][@hash fun _ -> 0]);

C-2PO: Thesis About a Weakly-Relational Pointer Analysis #1485

Are you sure you want to change the base?

C-2PO: Thesis About a Weakly-Relational Pointer Analysis #1485

Conversation

reb-ddm commented May 24, 2024 • edited Loading

sim642 left a comment

Choose a reason for hiding this comment

michael-schwarz commented Sep 22, 2024

sim642 Sep 23, 2024

Choose a reason for hiding this comment

sim642 Sep 23, 2024

Choose a reason for hiding this comment

sim642 Sep 23, 2024

Choose a reason for hiding this comment

sim642 Sep 23, 2024

Choose a reason for hiding this comment

reb-ddm commented May 24, 2024 •

edited

Loading