story_backlog

Rod Docking edited this page Oct 20, 2017 · 1 revision

Story Backlog

This document lists 'user stories' that we can discuss as the project progresses. The intent is to work through and refine this list, then convert the stories into GitHub issues that team members can take on.

Data Generation and Import

  • As a user, I want to convert output from all available tools to bedpe format, to facilitate simpler comparisons between tools
    • Each tool's converter can be handled individually; they don't all need to be implemented the same way
    • Extra data fields from specific tools should be maintained, per the bedpe spec
    • As of 2017-10-20T11:02, issues have been created for the first 9 tools
    • Our goal is to be strict about the standard columns (1-10) but flexible about the rest: everything else in the original input is appended as trailing columns
  • As a user, I want to generate data from fusion caller tools for the small example data sets, so that I can compare more tools
    • Dataset 1, Dataset 2: Use the existing fastq/BAM files to generate results for additional fusion callers of interest
    • This will require installing some of the tools, which may be finicky
    • As of 2017-10-20T11:02, there are no issues for this - we may revisit this in the future
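
To make the bedpe conversion story above more concrete, here is a minimal Python sketch of the "strict first ten columns, flexible trailing columns" idea. It is not an agreed design: the function name `to_bedpe_row`, the `EXAMPLE_COLUMN_MAP` mapping, the input field names, and the example values are all hypothetical, and the `"."` placeholder for missing standard columns is an assumption that should be checked against the bedpe spec. It only renames and reorders fields; converting coordinate conventions is left to each tool's converter.

```python
# Standard BEDPE columns 1-10, in order.
BEDPE_COLUMNS = [
    "chrom1", "start1", "end1",
    "chrom2", "start2", "end2",
    "name", "score", "strand1", "strand2",
]

# Hypothetical mapping for an imaginary tool; each real tool needs its own.
EXAMPLE_COLUMN_MAP = {
    "chrom1": "left_chr", "start1": "left_start", "end1": "left_end",
    "chrom2": "right_chr", "start2": "right_start", "end2": "right_end",
    "name": "fusion",
}


def to_bedpe_row(record, column_map, missing="."):
    """Convert one tool-specific record (a dict) into a list of BEDPE fields."""
    row = []
    used = set()
    for column in BEDPE_COLUMNS:
        source = column_map.get(column)
        if source in record:
            row.append(str(record[source]))
            used.add(source)
        else:
            # Placeholder choice is an assumption, not taken from the spec.
            row.append(missing)
    # Keep every remaining input field, in input order, as a trailing column.
    row.extend(str(value) for key, value in record.items() if key not in used)
    return row


# Invented example record, for illustration only.
example_record = {
    "left_chr": "chr1", "left_start": 100, "left_end": 101,
    "right_chr": "chr2", "right_start": 500, "right_end": 501,
    "fusion": "GENEA--GENEB", "junction_reads": 7,
}
example_row = to_bedpe_row(example_record, EXAMPLE_COLUMN_MAP)
print("\t".join(example_row))
```

Driving each converter from a per-tool mapping would let the tools be handled individually while sharing the code that enforces the column order.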

NOTES:

  • These issues might be good for remote or solo contributors to take on
  • Issues here can be scaled up to include more tools depending on progress / the number of people involved in the project.

Annotate

  • As a user, I want to annotate imported fusions against available curation data sources
    • As a first step, CIViC has an API that can be used to retrieve annotations for variants.
    • Beyond this, other potential data sources are described in Components and Similar Projects
    • As of 2017-10-20T11:03, we have stubbed out issues for the first three potential data sources
  • As a user, I want to annotate imported fusions with other internal data sources
    • By this, I mean information that can be derived from existing BAM/fastq/expression data but isn't already part of the fusion record
    • As of 2017-10-20T11:03, we haven't submitted any issues for this
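
As a rough illustration of the first annotation story, the sketch below attaches curation entries to fusion records by gene pair. It deliberately does not call the CIViC API: the `curated` dictionary is a hypothetical in-memory stand-in for whatever a real data source would return, and the record field names (`gene1`, `gene2`), the function name, and the example entries are assumptions made for illustration.

```python
def annotate_fusions(fusions, curated):
    """Return copies of the fusion records with an added 'annotations' list.

    `curated` maps (gene1, gene2) tuples to lists of annotation entries.
    Whether gene order should matter for the lookup is an open question;
    this sketch treats the pair as ordered.
    """
    annotated = []
    for fusion in fusions:
        key = (fusion["gene1"], fusion["gene2"])
        # Copy the record so the imported data is left untouched.
        annotated.append({**fusion, "annotations": list(curated.get(key, []))})
    return annotated


# Invented stand-in data, not real curation content.
example_curated = {("GENEA", "GENEB"): ["example curation entry"]}
example_fusions = [
    {"gene1": "GENEA", "gene2": "GENEB"},
    {"gene1": "GENEC", "gene2": "GENED"},
]
example_annotated = annotate_fusions(example_fusions, example_curated)
```

Keeping the lookup separate from the annotation step would let different data sources be plugged in behind the same interface.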

Aggregate

  • As a user, I would like to view reports on the range and types of events reported for a given sample/tool combination
    • This would include: number of fusions, deletions, insertions (as appropriate for a given tool)
    • Plots showing the distribution of evidence (as defined by particular tools)
    • Plots showing the distribution of inter- and intra-chromosomal events
    • Additional plots depending on exploratory analysis
    • As of 2017-10-20T11:04, we've decided to keep this part of the project, but haven't submitted any issues
  • As a user, I want to aggregate and compare fusion calls from different tools for the same sample
    • The main challenges here: tools often output several similar calls, coordinates may not match exactly, and gene names and transcripts may be reported differently between tools.
    • We may need to implement some rough heuristics for determining whether two calls 'match' or not.
    • As of 2017-10-20T11:05, we've discussed this but haven't submitted any issues
  • As a user, I want to aggregate and compare fusion calls from different replicates of the same sample
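
One possible shape for the rough matching heuristic mentioned above is sketched below. It treats two calls as a match when both breakends fall on the same chromosomes within a fixed window, in either order. The field names follow the bedpe columns, but the function names and the default window of 10 bases are arbitrary assumptions; gene-name and transcript differences are not handled.

```python
def breakends_close(chrom_a, pos_a, chrom_b, pos_b, window):
    """True if two breakends are on the same chromosome within `window` bases."""
    return chrom_a == chrom_b and abs(pos_a - pos_b) <= window


def calls_match(a, b, window=10):
    """Heuristic: do two bedpe-style calls describe the same event?

    The window size is a placeholder, not a tuned value. Calls are also
    compared with their two ends swapped, since tools may report the
    partners in different orders.
    """
    same_order = (
        breakends_close(a["chrom1"], a["start1"], b["chrom1"], b["start1"], window)
        and breakends_close(a["chrom2"], a["start2"], b["chrom2"], b["start2"], window)
    )
    swapped = (
        breakends_close(a["chrom1"], a["start1"], b["chrom2"], b["start2"], window)
        and breakends_close(a["chrom2"], a["start2"], b["chrom1"], b["start1"], window)
    )
    return same_order or swapped


# Invented example calls, for illustration only.
call_x = {"chrom1": "chr1", "start1": 100, "chrom2": "chr2", "start2": 500}
call_y = {"chrom1": "chr2", "start1": 503, "chrom2": "chr1", "start2": 98}
call_z = {"chrom1": "chr1", "start1": 100, "chrom2": "chr3", "start2": 500}
```

Here `calls_match(call_x, call_y)` is true because the ends agree once swapped, while `calls_match(call_x, call_z)` is false because the second chromosome differs.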

NOTES:

  • This kind of analysis could be implemented either as Python scripts, or within R (or some mix of the two).
  • These issues are not necessarily blocked by the 'Data Generation and Import' issues - some of the test data files are already suitable for downstream analysis.
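
As a starting point for the Python option, the per-sample/tool summary of inter- and intra-chromosomal events could look something like the sketch below. It assumes calls are already available as dictionaries with the bedpe `chrom1` and `chrom2` fields; the function name and the keys of the returned dictionary are hypothetical.

```python
def summarize_events(calls):
    """Count intra- and inter-chromosomal events in a list of bedpe-style calls."""
    summary = {"total": 0, "intrachromosomal": 0, "interchromosomal": 0}
    for call in calls:
        summary["total"] += 1
        if call["chrom1"] == call["chrom2"]:
            summary["intrachromosomal"] += 1
        else:
            summary["interchromosomal"] += 1
    return summary


# Invented example calls, for illustration only.
example_calls = [
    {"chrom1": "chr1", "chrom2": "chr1"},
    {"chrom1": "chr1", "chrom2": "chr2"},
    {"chrom1": "chr9", "chrom2": "chr22"},
]
example_summary = summarize_events(example_calls)
print(example_summary)
```

Counts in this form could feed the distribution plots described in the stories above.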

Filter and Review

  • As a user, I want to be able to filter and review fusion evidence from a single tool, based on the annotation information added above
  • As a user, I want to be able to filter and review fusion evidence from combined sets of fusion calls from different callers
  • As of 2017-10-20T11:06, it's still too early to decide how this will be implemented

Visualize

  • As a user, I want to be able to visualize the structure and evidence supporting particular gene fusions
    • Note that this seems to be the part of the process that is best handled by other tools.
  • As of 2017-10-20T11:06, it's still too early to decide how this will be implemented