
GSoC 2023 Project Ideas

Oliver Beckstein edited this page Feb 1, 2023 · 35 revisions
Google Summer of Code 2023

Please see our Google Summer of Code wiki page for general information, including advice on application writing, and our GSoC FAQ for commonly asked questions.

To prospective applicants: if you are interested in taking part, please get in touch on the developer list. Given this year's changes to the GSoC program structure (medium and long projects), it will be very helpful if you let us know early that you intend to apply and get acquainted with the project.

To prospective mentors: MDAnalysis welcomes new mentors. Please get in touch on the developer list if you are interested in taking part. We typically expect mentors to be familiar with our development process, as evidenced by contributions to the code base and interactions on the developer mailing list.

Overview

A list of project ideas for Google Summer of Code 2023.

The currently proposed projects are:

  1. Generalise groups
  2. Extend MDAnalysis interoperability
  3. Molecular volume and surface analysis
  4. Benchmarking and performance optimization
  5. Transport property calculations

Or work on your own idea! Get in contact with us to propose an idea and we will work with you to flesh it out into a full project. Raise an issue in the Issue Tracker or contact us via the developer Google group.

Look at the list of all available mentors for MDAnalysis for potential mentors for your project. Please send all communications to the mailing list (and don't contact mentors privately). You can certainly ask for the opinion of a specific mentor if you know that their expertise is particularly suitable for your project.


Project summary

The table summarizes the project ideas; long descriptions come after the table (or click on the links under each project name). The difficulty is a somewhat subjective ranking, where easy means that we know pretty much what needs to be done, medium requires some additional research into best solutions as part of the project, and hard is high risk/high reward where we think a solution exists but we will have to work with the student to find it and implement it. Each project is sized at either 175 hours (medium) or 350 hours (long).

# | project name | difficulty | project size | description | skills | mentors
1 | Generalise Groups | hard | 350 hours | Generalise concept of groups | Python, NetworkX, Molecular modelling | @lilyminium, @fiona-naughton, @richardjgowers, @IAlibay, @micaela-matta, @ojeda-e
2 | Extend MDAnalysis Interoperability | medium | 350 hours | Extend converters module to other relevant packages | Python, Molecular modelling | @lilyminium, @IAlibay, @fiona-naughton, @hmacdope
3 | Molecular volume and surface analysis | medium | 175 hours | Use an existing package for molecular surface area calculations to build a new analysis module | Python, MDAnalysis.analysis | @orbeckst, @IAlibay, @hmacdope
4 | Benchmarking and performance optimization | medium/hard | 175 hours | Write benchmarks for automated performance analysis and address performance bottlenecks | Python | @hmacdope, @orbeckst, @jbarnoud, @ojeda-e
5 | Transport property calculations | medium | 350 hours | Write analysis code to calculate physical transport properties | Python, Physics/Mathematics | @orionarcher, @hmacdope


Project 1: Bead and Ring Groups

It is common to want to consider a group of atoms as a single site/particle, for example defining the position of a water molecule (or a larger solvent) as its center of mass. It then follows that it is useful to consider many such groupings as an array of quasi-particles, leading to something like an AtomGroup-Group, e.g. a Group representing a solvent where each item in the Group is a single molecule. The goal of this project is to make two such groupings, BeadGroup and RingGroup:

  • BeadGroup: groups of atoms that can be represented as a single site/particle. This could be used for analysis purposes, as well as to define coarse-grained beads.
  • RingGroup: aromatic rings (e.g. benzene, nucleobases) can be defined by their position (the geometric center of the ring) and their normal vector (the direction they are facing). This class would be implemented as a special case of BeadGroup that also defines a directionality.
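
As a rough illustration of the bead idea, and not a proposed design, the sketch below collapses each group of atoms into a single site at its center of mass. The `Bead` class, its fields, and `bead_positions` are hypothetical names made up for this example, and plain Python lists stand in for MDAnalysis objects.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


@dataclass
class Bead:
    """Hypothetical container: one group of atoms treated as a single site."""

    positions: List[Vector]  # Cartesian coordinates of the member atoms
    masses: List[float]      # one mass per atom, same order as positions

    def center_of_mass(self) -> Vector:
        total = sum(self.masses)
        return tuple(
            sum(m * p[axis] for m, p in zip(self.masses, self.positions)) / total
            for axis in range(3)
        )


def bead_positions(beads: Sequence[Bead]) -> List[Vector]:
    """Return one coordinate per bead, i.e. the quasi-particle positions."""
    return [bead.center_of_mass() for bead in beads]


# Two water-like molecules (O, H, H) with approximate masses in u.
# The second molecule is the first one shifted by 3 along x.
water_masses = [15.999, 1.008, 1.008]
water_1 = Bead([(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)], water_masses)
water_2 = Bead([(3.0, 0.0, 0.0), (3.96, 0.0, 0.0), (2.76, 0.93, 0.0)], water_masses)

sites = bead_positions([water_1, water_2])
```

A real BeadGroup would presumably hold references to atoms in a Universe and update with the trajectory; working out that design is part of the project.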

Objectives

  1. Design and implement a BeadGroup class to represent a container of many groupings of atoms
  2. Generalise existing methods (e.g. center_of_mass) to BeadGroup
  3. Implement RingGroup, as a special case of BeadGroup
  4. Implement ring finding functions to quickly define these groups
  5. Implement basic RingGroup analysis functions, e.g. the angle between rings or π-stacking identification.
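
To make the ring-related objectives more concrete, here is a minimal sketch of a ring normal and the angle between two ring planes. All function names are hypothetical. It assumes the atom coordinates are listed in order around the ring, and it folds the angle into the range 0 to 90 degrees because the sign of a ring normal is arbitrary.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def geometric_center(points: Sequence[Vector]) -> Vector:
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def cross(a: Vector, b: Vector) -> Vector:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ring_normal(points: Sequence[Vector]) -> Vector:
    """Unit normal of a ring whose atoms are ordered around the ring.

    Summing the cross products of consecutive center-to-atom vectors
    averages over the ring, so slightly puckered rings still work.
    """
    center = geometric_center(points)
    rel = [tuple(p[k] - center[k] for k in range(3)) for p in points]
    total = [0.0, 0.0, 0.0]
    for a, b in zip(rel, rel[1:] + rel[:1]):
        c = cross(a, b)
        for k in range(3):
            total[k] += c[k]
    length = math.sqrt(sum(x * x for x in total))
    return tuple(x / length for x in total)


def angle_between_rings(ring_a: Sequence[Vector], ring_b: Sequence[Vector]) -> float:
    """Angle between the two ring planes in degrees, in the range [0, 90]."""
    normal_a, normal_b = ring_normal(ring_a), ring_normal(ring_b)
    cos_angle = abs(sum(x * y for x, y in zip(normal_a, normal_b)))
    return math.degrees(math.acos(min(1.0, cos_angle)))


# Idealised benzene-like hexagons: one in the xy plane, one stacked
# 3.5 above it, and one lying in the xz plane.
angles = [k * math.pi / 3 for k in range(6)]
ring_xy = [(1.39 * math.cos(t), 1.39 * math.sin(t), 0.0) for t in angles]
ring_stacked = [(x, y, z + 3.5) for x, y, z in ring_xy]
ring_xz = [(x, 0.0, y) for x, y, _ in ring_xy]

parallel = angle_between_rings(ring_xy, ring_stacked)
perpendicular = angle_between_rings(ring_xy, ring_xz)
```

A π-stacking criterion would likely combine such an angle with the distance between ring centers; which cutoffs to use is left open here.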

Relevant skills

  • Python
  • Graph theory (e.g. the NetworkX package)
  • Chemistry
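
For a flavour of the graph-theory side, the sketch below finds rings in a bond graph with a breadth-first search: for each bond it looks for the shortest alternative path between the two bonded atoms, and that path plus the bond closes a ring. This is a simplified illustration with hypothetical function names. It is not guaranteed to return a minimal ring set for complicated fused or bridged systems, and in the project itself a library such as NetworkX would likely do the heavy lifting.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Bond = Tuple[int, int]


def build_adjacency(bonds: Iterable[Bond]) -> Dict[int, Set[int]]:
    adjacency: Dict[int, Set[int]] = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_without_bond(
    adjacency: Dict[int, Set[int]], start: int, goal: int
) -> Optional[List[int]]:
    """Shortest path from start to goal that avoids the direct start-goal bond."""
    previous: Dict[int, Optional[int]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency[node]:
            if node == start and neighbour == goal:
                continue  # this is the bond being tested, so skip it
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def find_rings(bonds: Iterable[Bond], max_size: int = 8) -> List[List[int]]:
    """Return rings as lists of atom indices ordered around the ring."""
    bonds = list(bonds)
    adjacency = build_adjacency(bonds)
    seen: Set[FrozenSet[int]] = set()
    rings: List[List[int]] = []
    for a, b in bonds:
        path = shortest_path_without_bond(adjacency, a, b)
        if path is None or len(path) > max_size:
            continue
        key = frozenset(path)
        if key not in seen:
            seen.add(key)
            rings.append(path)
    return rings


# Toluene-like heavy-atom skeleton: a six-membered ring (0-5) plus one substituent (6).
toluene_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
toluene_rings = find_rings(toluene_bonds)
```

Because each ring comes back in path order, the indices can be used directly wherever an ordered walk around the ring is needed.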

Related issues:

Mentors

  • @richardjgowers
  • @lilyminium
  • @fiona-naughton
  • @IAlibay
  • @micaela-matta
  • @ojeda-e

Project 2: Extend interoperability

MDAnalysis has been working towards better interoperability with other packages. To this end, we have already added converters for the ParmEd and RDKit libraries. We aim to continue in this direction by focusing on other relevant packages such as MDTraj, pytraj, OpenBabel, and Psi4.

Objectives

  • Create converter classes to and from MDAnalysis to your chosen package(s)
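
To show the general shape of a two-way converter, here is a toy sketch that maps between a list of atom records and a column-oriented dictionary and checks that a round trip loses nothing. `ToyAtom`, `ToyConverter`, and the dictionary layout are all invented for this example; they do not correspond to the MDAnalysis converter API or to the data model of any of the packages named above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical per-atom record standing in for the 'internal' representation."""

    name: str
    element: str
    position: Tuple[float, float, float]


class ToyConverter:
    """Hypothetical converter to and from a column-oriented 'external' format."""

    @staticmethod
    def to_external(atoms: List[ToyAtom]) -> Dict[str, list]:
        return {
            "names": [atom.name for atom in atoms],
            "elements": [atom.element for atom in atoms],
            "coordinates": [list(atom.position) for atom in atoms],
        }

    @staticmethod
    def from_external(data: Dict[str, list]) -> List[ToyAtom]:
        return [
            ToyAtom(name, element, tuple(xyz))
            for name, element, xyz in zip(
                data["names"], data["elements"], data["coordinates"]
            )
        ]


atoms = [
    ToyAtom("OW", "O", (0.0, 0.0, 0.0)),
    ToyAtom("HW1", "H", (0.96, 0.0, 0.0)),
    ToyAtom("HW2", "H", (-0.24, 0.93, 0.0)),
]
external = ToyConverter.to_external(atoms)
round_trip = ToyConverter.from_external(external)
```

Round-trip checks of this kind are a natural starting point for converter tests, since they show immediately which attributes a conversion drops.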

Relevant skills

  • Python
  • Any other language relevant to your chosen package (likely C++)

Mentors

  • @IAlibay
  • @lilyminium
  • @fiona-naughton
  • @hmacdope

Project 3: Molecular volume and surface analysis

It is often necessary to measure the volume and surface area of a biomolecule, or parts of it, over an MD trajectory. MDAnalysis is currently lacking this important functionality. In this project you will implement an analysis class that calculates the molecular volume and surface area for an AtomGroup as a function of time. See issue #2439.

The FreeSASA library appears to be a suitable tool to integrate into MDAnalysis. It is released under the MIT license and has a C core and Python bindings:

By default Lee & Richards' algorithm is used, but Shrake & Rupley's is also available.

Simon Mitternacht (2016) FreeSASA: An open source C library for solvent accessible surface area calculation. F1000Research 5:189 (doi: 10.12688/f1000research.7931.1)
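
For orientation, the following is a deliberately naive sketch of the Shrake & Rupley idea: place test points on each atom's probe-expanded sphere and count the fraction that is not buried inside any neighbouring expanded sphere. It is not how FreeSASA implements the algorithm. The function names, the default probe radius of 1.4, and the number of test points are assumptions made for this example, and there is no neighbour search, so the cost grows quadratically with the number of atoms.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def sphere_points(n: int) -> List[Vector]:
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        radius = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return points


def shrake_rupley(
    coords: Sequence[Vector],
    radii: Sequence[float],
    probe: float = 1.4,
    n_points: int = 200,
) -> List[float]:
    """Per-atom solvent accessible surface area, in squared length units."""
    unit = sphere_points(n_points)
    expanded = [r + probe for r in radii]
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, expanded)):
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = ci[0] + ri * ux, ci[1] + ri * uy, ci[2] + ri * uz
            # A test point is buried if it lies inside any other expanded sphere.
            buried = any(
                (px - cj[0]) ** 2 + (py - cj[1]) ** 2 + (pz - cj[2]) ** 2 < rj * rj
                for j, (cj, rj) in enumerate(zip(coords, expanded))
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * ri * ri * exposed / n_points)
    return areas


# One isolated atom, then two overlapping atoms 2.0 apart.
isolated = shrake_rupley([(0.0, 0.0, 0.0)], [1.7])
pair = shrake_rupley([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.7, 1.7])
```

A reference implementation like this, slow as it is, can still be handy for generating expected values on small test systems.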

Objectives

For this project you would

  1. figure out if freesasa and freesasa-python can be installed as pip and conda packages; if necessary create the conda packages (on conda-forge)
  2. create test cases (use existing files in MDA and run external implementation for reference)
  3. create an analysis module MDAnalysis.analysis.sasa using the MDAnalysis.analysis.base.AnalysisBase framework.
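
The analysis framework splits work into a preparation step, a per-frame step, and a concluding step. The sketch below mimics that split with a small stand-in class so it runs without MDAnalysis installed. `MiniAnalysis` and `BoxVolume` are written for this example only and are not the real AnalysisBase, and the bounding-box volume is just a placeholder for where a surface-area call would go.

```python
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, float, float]
Frame = Sequence[Vector]


class MiniAnalysis:
    """Stand-in for the prepare / per-frame / conclude pattern (not the MDAnalysis class)."""

    def __init__(self, trajectory: Sequence[Frame]):
        self._trajectory = trajectory
        self.results: Dict[str, object] = {}

    def run(self, start=0, stop=None, step=1):
        frames = self._trajectory[start:stop:step]
        self._prepare(len(frames))
        for index, frame in enumerate(frames):
            self._single_frame(index, frame)
        self._conclude()
        return self

    def _prepare(self, n_frames: int) -> None:
        pass

    def _single_frame(self, index: int, frame: Frame) -> None:
        raise NotImplementedError

    def _conclude(self) -> None:
        pass


class BoxVolume(MiniAnalysis):
    """Placeholder analysis: bounding-box volume of the coordinates per frame."""

    def _prepare(self, n_frames):
        # Allocate storage once, before iterating over frames.
        self.results["volume"] = [0.0] * n_frames

    def _single_frame(self, index, frame):
        extents = [
            max(p[k] for p in frame) - min(p[k] for p in frame) for k in range(3)
        ]
        self.results["volume"][index] = extents[0] * extents[1] * extents[2]

    def _conclude(self):
        # Post-process the per-frame data into a summary value.
        volumes = self.results["volume"]
        self.results["mean_volume"] = sum(volumes) / len(volumes) if volumes else 0.0


trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)],
    [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)],
]
analysis = BoxVolume(trajectory).run()
```

Keeping the per-frame step free of any cross-frame state is also what makes a later parallel version straightforward.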

Stretch goals

  1. benchmark performance
  2. depending on the performance we might also want to implement a parallel version of the analysis class in PMDA, which is easy once we have a standard MDAnalysis analysis class.

Mentors

  • @richardjgowers
  • @IAlibay
  • @orbeckst
  • @hmacdope

Project 4: Benchmarking and performance optimization

The performance of the MDAnalysis library is assessed by automated benchmarks with ASV. The benchmarks are publicly available and are updated every night.

The goal of this project is to increase the performance assessment coverage and identify code that should be improved.

Objectives

  1. Write benchmark cases.
  2. Analyze the performance history to identify code that needs to be improved.
  3. Optimize the code for at least one of the discovered performance bottlenecks.
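
A benchmark case for ASV is typically a class with a `setup` method and one or more methods whose names start with `time_`; the `params` and `param_names` attributes make ASV run each case once per parameter value. The sketch below follows that convention with a toy workload. The workload function and the class name are made up for this example; real benchmarks would import and time MDAnalysis code instead.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def pairwise_distances_sq(coords: Sequence[Vector]) -> List[float]:
    """Toy workload standing in for the library code under test."""
    result = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            result.append(sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])))
    return result


class TimePairwiseDistances:
    """Benchmark case: one timing per value of ``n_atoms``."""

    params = [10, 100, 500]
    param_names = ["n_atoms"]

    def setup(self, n_atoms):
        # Setup runs before timing and is excluded from the measurement.
        rng = random.Random(0)  # fixed seed keeps runs comparable
        self.coords = [
            (rng.random(), rng.random(), rng.random()) for _ in range(n_atoms)
        ]

    def time_pairwise_distances(self, n_atoms):
        pairwise_distances_sq(self.coords)
```

Parameterizing over system size shows how the cost scales, which is often more informative for spotting bottlenecks than a single timing.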

Relevant skills

  • Python

Mentors

  • @orbeckst
  • @hmacdope
  • @jbarnoud
  • @ojeda-e


Project 5: Transport property calculations

Diffusivity and conductivity are key properties of many molecular systems. MDAnalysis currently lacks a way to easily calculate these properties.

Objectives

  1. Implement self-diffusion coefficient calculations
  2. Implement conductivity calculations with Nernst-Einstein and Green-Kubo methods
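
As a starting point for discussion, the sketch below estimates a self-diffusion coefficient from the Einstein relation, MSD(t) ≈ 2·d·D·t in d dimensions, and a Nernst-Einstein conductivity, σ = e²/(V·k_B·T) · Σ N_i·z_i²·D_i. The function names and argument layout are assumptions for this example. The MSD uses only the first frame as time origin, assumes unwrapped coordinates, and fits the whole curve without selecting a linear regime, all of which a real implementation would need to handle. The Green-Kubo route is not shown.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]

ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K


def mean_squared_displacement(trajectory: Sequence[Sequence[Vector]]) -> List[float]:
    """MSD relative to the first frame, averaged over particles.

    ``trajectory[frame][particle]`` is an (x, y, z) tuple of unwrapped coordinates.
    """
    reference = trajectory[0]
    msd = []
    for frame in trajectory:
        total = sum(
            sum((p[k] - r[k]) ** 2 for k in range(3))
            for p, r in zip(frame, reference)
        )
        msd.append(total / len(reference))
    return msd


def linear_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = sum((xi - mean_x) ** 2 for xi in x)
    return numerator / denominator


def self_diffusion_coefficient(
    times: Sequence[float], msd: Sequence[float], dim: int = 3
) -> float:
    """Einstein relation: D is the MSD slope divided by 2 * dim."""
    return linear_slope(times, msd) / (2 * dim)


def nernst_einstein_conductivity(
    counts: Sequence[int],
    charges: Sequence[int],
    diffusivities: Sequence[float],
    volume: float,
    temperature: float,
) -> float:
    """Conductivity in S/m from per-species ion counts, valences, and D in m^2/s.

    ``volume`` is in m^3 and ``temperature`` in K.
    """
    prefactor = ELEMENTARY_CHARGE ** 2 / (volume * BOLTZMANN * temperature)
    return prefactor * sum(
        n * z * z * d for n, z, d in zip(counts, charges, diffusivities)
    )
```

Note that the Nernst-Einstein expression treats ions as uncorrelated, which is one reason the objectives also list the Green-Kubo method.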

Relevant skills

  • Python
  • Mathematics/Physics

Mentors

  • @orionarcher
  • @hmacdope