
Available Project Topics


This is not a list of elaborated project/thesis proposals. Rather, it is a list of possible topics, together with ideas on how to realize them.

For students of the University of Freiburg: Most of these topics can also result in a Bachelor Project, a Bachelor Thesis, a Master Laboratory, a Master Project, or a Master Thesis. If you are interested, please contact us and we will discuss together how to adapt the topic accordingly.

Computing Small Proofs for Concurrent Programs

Contact: Dominik

When we have verified a program, we would like to produce some kind of proof artifact that shows that the program is indeed correct. Such a proof should be easily checkable. This can be helpful within our tool (it can help us to find bugs in our verifier), and we can also output such proofs in a machine-readable format understood by other tools.

For concurrent programs, a useful proof format is given by so-called Owicki-Gries proofs (named after Susan Owicki and David Gries). However, a naive translation from our own data structures to Owicki-Gries proofs produces exponentially large proof artifacts. This defeats the purpose, as checking such large proofs is very expensive.
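
To get an intuition for the proof format, consider the following toy example (hypothetical, not taken from our benchmarks). In an Owicki-Gries proof, every program location of every thread is annotated with an assertion; one has to show that each thread's annotations are correct in isolation, and that they are stable under the statements of all other threads (interference freedom).

// Shared variables: int x = 0, y = 0;
//
// Thread 1:        Thread 2:
// {x == 0}         {y == 0}
// x = 1;           y = 1;
// {x == 1}         {y == 1}

Here, interference freedom is immediate: the assertions of thread 1 only mention x, which thread 2 never writes, and vice versa. A naive translation, by contrast, conjoins information about all threads into every assertion, which is one source of the exponential blow-up mentioned above.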

In this project, we will work with algorithms that aim to produce smaller proofs. This may take several forms:

  • In previous projects and theses, we have developed a first algorithm. Yet, the generated proofs often contain unnecessary conjuncts. We will modify the construction such that the formulas in the proof of a thread do not include conjuncts that are only relevant for other threads.
  • We will define and implement simplification rules that, given an Owicki-Gries proof, detect redundant parts of the proof and remove them to produce a smaller but still correct proof.
  • In a common use case, we do not need a full proof but we want to output proof hints, which another tool can use to confirm the program correctness. This sometimes requires special "simplifications" which preserve the correctness (and usefulness) of proof hints, though the result may no longer be a full proof.

Formal Correctness Certificates from Automated Verification

Contact: Dominik

Ultimate implements different algorithms for automated program verification. These algorithms are proven to be theoretically sound, i.e., if they declare that a program is correct, then the program is truly correct. However, like any large software project, the implementation of these algorithms in Ultimate sometimes contains bugs. As a result, Ultimate might declare that a program is correct when it is actually incorrect. In order for us to truly trust a verification result, Ultimate should be able to output a detailed, formal proof of correctness (or certificate) that can be automatically checked by an independent tool.

In this project, we will take some first steps towards this goal. We will take Ultimate's internal notion of a proof (called a Floyd/Hoare annotation) and use it to produce proof scripts for the interactive proof assistant Coq. With these scripts, Coq should be able to prove that a Floyd/Hoare annotation produced by Ultimate actually shows that the given program is correct. To control the complexity, we will for now focus on a class of simple programs (e.g., no concurrency, no recursion, no pointers).

Note: Some previous knowledge of Coq or a related system (Isabelle, Agda, Lean) will be helpful for this project. If you have not seen such tools before, you should be willing to invest some time to acquaint yourself with interactive proof assistants.

Implement more abstract domains in SIFA

Contact: Frank

Abstract interpretation is a well-known technique for abstracting the behavior of a program. Different abstract domains specify how this abstraction is performed. Abstract interpretation is already implemented in ULTIMATE with various domains.

However, ULTIMATE also implements a similar approach called SIFA (Symbolic Interpretation with Fluid Abstractions). SIFA is based on the ideas of abstract interpretation (i.e., it abstracts a program's behavior), but it works slightly differently and has proven to be more modular and precise. Therefore, the functionality of the abstract domains differs slightly between SIFA and classic abstract interpretation.

The goal of this project is to implement new abstract domains in the SIFA framework of ULTIMATE.
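
As a minimal illustration of what an abstract domain provides, consider the classic interval domain, sketched below in C (the names are illustrative; Ultimate's actual SIFA interfaces are written in Java and are richer than this):

// An interval [lo, hi] abstracts the set of all integers between lo and hi.
typedef struct { long lo, hi; } Interval;

// join: the least interval containing both arguments (used where control flow merges)
Interval join(Interval a, Interval b) {
    Interval r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

// Abstract transformer for the statement x := x + c (overflow ignored for brevity).
Interval post_add_const(Interval x, long c) {
    Interval r = { x.lo + c, x.hi + c };
    return r;
}

A new SIFA domain would implement the analogous operations (abstraction of values, join, abstract transformers, and possibly widening) against SIFA's interfaces.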

Commutativity for Program Verification (or: Partial Order Reduction)

Contact: Dominik

In a concurrent program, different possible schedulings of threads lead to a large number of different executions (so-called interleavings). To verify a program, we have to show that it behaves correctly for all interleavings. This can make automated verification very expensive or even impossible.

The key observation behind commutativity in this context is that for some statements, the order in which they are executed does not matter (e.g., for x:=x+1 and y:=5, the order does not matter, but for x:=x+1 and x:=0 it may). If two interleavings differ only in the ordering of such "commuting" statements, they are equivalent, and it suffices to prove correctness for one of these interleavings. We investigate how this observation can be exploited by so-called partial order reduction (POR) algorithms, in order to verify programs faster, and even to verify programs that could not be verified (at all) without commutativity.
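
The following runnable (and deliberately trivial) C snippet illustrates the two examples above by executing both orders and comparing the resulting states:

#include <assert.h>

int main(void) {
    // x := x+1 and y := 5 commute: both orders reach the same final state.
    int x = 0, y = 0;
    x = x + 1; y = 5;        // order 1: x == 1, y == 5
    int x2 = 0, y2 = 0;
    y2 = 5; x2 = x2 + 1;     // order 2: x2 == 1, y2 == 5
    assert(x == x2 && y == y2);

    // x := x+1 and x := 0 do not commute: the two orders disagree.
    int a = 0;
    a = a + 1; a = 0;        // order 1: increment, then reset: a == 0
    int b = 0;
    b = 0; b = b + 1;        // order 2: reset, then increment: b == 1
    assert(a != b);
    return 0;
}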

Projects in this area can either be more heavy on first-order logic and program semantics, or focus more on automata theory, or they can combine both aspects. An incomplete list of possible topic areas is given below.

  • What does "commutativity" mean? There are many possible definitions for this, which lead to different verification algorithms.
    • Conditional commutativity uses information about the program derived from a separate analysis, or from the ongoing verification itself.
    • Abstract commutativity focuses on a particular correctness proof of the program, and declares that statements commute if the order of their execution does not matter for this proof.
    • Stratified commutativity combines multiple definitions of commutativity in a single analysis.
  • How do I choose? I.e., if two interleavings are equivalent, which one should I verify? The answer has a significant influence on how complex a correctness proof needs to be, and how efficiently our automata algorithms behave.
  • How do I reduce? Programs typically have infinitely many interleavings, which we represent with finite automata. In order to remove redundant interleavings, there exist many POR algorithms, with different strengths and limitations. Implementing more of these algorithms (and adapting them to our automata-based setting) can speed up the verification.

Literature: The following papers describe our research in this area. Feel free to take a look :)

Petri nets as language acceptors

Contact: Matthias / Frank

In Ultimate we use bounded Petri nets as acceptors of regular languages. For an introduction, you can have a look at the Petri net seminar slides of Nico Bühler from the 2015/16 Automata theory seminar. Petri nets are very efficient for languages that occur while verifying concurrent programs. There are several topics available. The following list could (in principle) be extended by every Petri net operation that is not yet supported.

  • Petri net minimization
  • Fold the finite prefix of a branching process back to a Petri net.
  • Optimizations of various operations (emptiness, difference, ...)

Automata Based Regression Verification

Contact: Matthias

Write interpolant automata to file. Use these interpolant automata for the verification of similar programs (e.g., the same program to which a patch was applied).

CDT interface for Ultimate

Contact: Daniel and Matthias

We already have a preliminary version of an Eclipse CDT interface that works with old versions of Eclipse. Adapt our interface to recent versions of Eclipse and improve it.

Use Modulo Arithmetic to Represent Machine Integer Operations

Contact: Matthias

We have powerful tools for logical reasoning over mathematical integers, and hence we represent integer data types in computer programs by mathematical integers in our analyses. However, using mathematical integers is unsound in general, since primitive data types usually have a finite number of values. E.g., an unsigned int in C can take only values from 0 to 4294967295. Our analysis can be made sound by using one of the following transformations.

  1. The Falke Transformation: We transform a C program over primitive data types into a program over mathematical integers. After each arithmetic operation we apply a modulo operation. E.g., if x is an unsigned int, the expression x + 1 is translated to (x + 1) % 4294967296.
  2. The Nutz Transformation: We transform a C program over primitive data types into a program over mathematical integers. Before each comparison operation we apply a modulo operation. E.g., if x and y are unsigned ints, the expression x == y is translated to (x % 4294967296) == (y % 4294967296). Check the following examples for more details: NutzTransformation01.c NutzTransformation02.c NutzTransformation03.c NutzTransformation04.c NutzTransformation05.c

The latter transformation is already implemented. Task: implement the first transformation and evaluate both.
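
A hypothetical before/after sketch, written as C-style pseudocode over mathematical integers (with 4294967296 = 2^32 for unsigned int), illustrates the difference between the two transformations:

// Original C (x, y are unsigned ints):
//     y = x + 1;
//     if (x == y) { ... }

// Falke transformation (reduce after every arithmetic operation):
//     y = (x + 1) % 4294967296;
//     if (x == y) { ... }              // operands are already reduced

// Nutz transformation (reduce before every comparison):
//     y = x + 1;                       // value may exceed 4294967295 in the model
//     if (x % 4294967296 == y % 4294967296) { ... }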

A Sound but Efficient Extension of the Hoenicke-Lindenmann Memory Model

Contact: Matthias

Motivation

Our x86 computers store data bytewise. E.g., if a program writes the unsigned int 259 at the address ADDR, the computer will store only the least significant eight bits (here, 3 in decimal representation) at ADDR, and the next eight bits (here, 1 in decimal representation) at ADDR+1.

In a naive model of the computer's memory, we read and store data bytewise. This is very costly, especially if we use integers to represent data. If the program writes the value of an unsigned int variable x at address ADDR, we have to use modulo and division operations to obtain the value of each byte: we store x%256 at ADDR, (x/256)%256 at ADDR+1, and so on.

The Hoenicke-Lindenmann memory model circumvents this problem. Here, the memory is modeled as an array whose indices are addresses and whose values are integers. If the program writes the value of x to ADDR, we just store its value at this address in our array. This is unsound, e.g., in the case where we read data from ADDR+1 (because we stored nothing there). See HoenickeLindenmannConcessions01.c for an example.

Basic Idea

Do not store only the integer values in a data array. Additionally, use an OrigAddress array and a TypeSize array. The indices of both arrays are addresses. The values of OrigAddress are non-positive integers, the values of TypeSize are positive integers. In the OrigAddress array, we store how far the real data is displaced. E.g., if we store an int (byte size four) at ADDR, we also store

OrigAddress[ADDR]:=0
OrigAddress[ADDR+1]:=-1
OrigAddress[ADDR+2]:=-2
OrigAddress[ADDR+3]:=-3

In the TypeSize array we store how many memory cells would be occupied by the naive memory model. E.g., if we store an int (bytesize four) at ADDR, we also store

TypeSize[ADDR]:=4

These three arrays will allow us to compute the exact data for each address in a way that does not require composition/decomposition into bytes in most cases.
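
The following C sketch shows how a read could use the three arrays (the array names follow the text above; everything else, including the simplifications, is illustrative):

#define MEM_SIZE 1024

long data[MEM_SIZE];        // integer value stored at the address of the original write
long OrigAddress[MEM_SIZE]; // displacement (<= 0) to the enclosing write
long TypeSize[MEM_SIZE];    // byte size of the value written at a base address

// Read `size` bytes at `addr`; reads spanning several original writes are ignored here.
long read_value(long addr, long size) {
    if (OrigAddress[addr] == 0 && TypeSize[addr] == size) {
        return data[addr];                 // simple case: the read matches the write
    }
    long base = addr + OrigAddress[addr];  // address of the enclosing write
    long value = data[base];
    for (long i = 0; i < -OrigAddress[addr]; i++)
        value /= 256;                      // discard the bytes below addr
    long mask = 1;
    for (long i = 0; i < size; i++)
        mask *= 256;                       // mask = 256^size
    return value % mask;                   // keep the requested number of bytes
}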

Optimization

Take care that the control flow graph has two parallel edges for reading data: one for the simple case (OrigAddress[ADDR]==0) and one for the case where we have to compose bytes.

Take care that the control flow graph has two parallel edges for storing data: one for the simple case (OrigAddress[ADDR]==0 || size >= TypeSize[ADDR]) and one for the case where we have to decompose into bytes.

A nice feature of traces that contain at least one non-simple-case edge: they will be infeasible in most cases.

A nice feature of traces that contain only simple-case edges: the additional arrays will be irrelevant in most cases and hence will not occur in proofs.

In case the decomposition of integers into bytes would be too costly, we could overapproximate the data operations in the non-simple-case edges. We would lose precision, but we would still be sound.

Obstacles

We have to initialize the OrigAddress array with zeros. We can do this using the following assume statement at the beginning of the program.

forall i. OrigAddress[i] == 0

However, quantifiers are very expensive!

We have many more array operations than before.

Verification of Portable C Code

Contact: Matthias

Once, we had a C2Boogie translation that was independent of type sizes (e.g., sizeof(long) was not fixed to 4 bytes). We discontinued support for this setting for good reasons:

  1. Unsound in the SV-COMP
  2. Introduces nonlinear arithmetic
  3. ... ?

Once, our plan was to use exactly the semantics of the C standard, and not the semantics of common compilers/architectures (e.g., if some operation is implementation-defined, we should just havoc the resulting variable). We discontinued support for this for a good reason: a lot of nondeterminism is expensive.

It seems that other tools try to become more implementation-specific.

Let us not play (only) where the elephants dance.

Furthermore, other tools might not check portable code because they have problems that we don't have:

  • We can deal with nonlinear arithmetic.
  • Our C2Boogie translation is quite flexible.
  • Our model checker can deal with a lot of nondeterminism.

I think we can write a nice paper in which we explain how Ultimate finds bugs in code that should strictly follow the C standard and that was meant to be portable.
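
As a hypothetical example of the kind of bug such an analysis could find: the C standard only guarantees sizeof(long) >= 4 bytes, so the following multiplication overflows (undefined behavior) on platforms where long has 32 bits, but is harmless where it has 64 bits.

// 100 * 365 * 24 * 60 * 60 = 3153600000 > 2^31 - 1:
// signed overflow (undefined behavior) if long is 32 bits wide.
long seconds_per_century = 100L * 365 * 24 * 60 * 60;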

Migrate to Eclipse e4

Contact: Daniel

Ultimate is still based on the old Eclipse 3.x API. We want to migrate completely to the Eclipse e4 API. This requires a comprehensive analysis of the current Eclipse API usage in Ultimate, the identification of necessary changes, and the implementation of those changes.

For an overview of the topic, see the description of RCP and a small migration tutorial.

Native access to SMT solvers using JavaSMT

Contact: Daniel

We are currently using IPC to communicate with various external SMT solvers. We would like to explore whether using the Java bindings of various solvers would provide us with performance improvements.

To this end, we want to use JavaSMT, an SMTLib abstraction layer.

The project would involve integrating JavaSMT into Ultimate, implementing a wrapper from our own data structures to JavaSMT and back, and comparing the performance with our IPC approach.
