Skip to content

Latest commit

 

History

History
502 lines (343 loc) · 21.1 KB

tutorial.md

File metadata and controls

502 lines (343 loc) · 21.1 KB

RiboDraw Tutorial

Layout of the P4-P6 tertiary structure, showing noncanonical pairs
(C) Rhiju Das, Stanford University, 2018-2019

Please e-mail questions to [email protected]

Goal

This tutorial aims to show you how to use the RiboDraw code to make a nice layout of the P4-P6 domain of the Tetrahymena group I self-splicing intron, based on the classic crystal structure by Cate et al., 1GID. By the end of this tutorial, you'll know how to make this picture:

1gidA RiboDraw drawing

And along with it you'll have a Pymol Session with the 3D structure colored in an analogous way:

1gidA Pymol

If you just want to skip ahead and check out an example of the final result, there are example output files in example_output/ you can load in the drawing in MATLAB with:

load_drawing('example_output/1gidA_drawing.mat');

You can also try to follow the commands in P4P6_drawing_commands.m, but they are better explained below. If you get stuck at particular steps, you can also load in intermediate drawings with the load_drawing command and the drawing workspaces saved in example_output/workspaces/

Before you start the tutorial

First, check out the getting started page for RiboDraw & Rosetta installation instructions.

You might want to sketch out a drawing of your RNA 3D structure. For this tutorial, we're choosing to make the P4-P6 RNA, for which numerous representations are already floating around the literature (or passed around labs), though most leave out noncanonical pairs. Here's a picture of one 'legacy' diagram that was passed to me by Rick Russell, with some handwritten annotations: P4P6 legacy image

Note that the P5abc section (left-hand side of this legacy ) is flipped horizontally compared to the final drawing at the top of this document -- one of the nice things about RiboDraw is that such flipping is pretty straightforward to do at any stage, and this tutorial will illustrate that action.

Let's do this

Step 0. Initialize the drawing

You'll first need to download the 3D structure from the PDB, cut out the bits you want to look at, and run the rna_motif Rosetta application. See README.md for Rosetta instructions. [If you don't have a 3D structure and don't care about noncanonical pairs, etc., you can get away with just having you sequence in a .fasta file and your stems defined in a stems.txt file; for examples of such a 'minimal' starting point, see drawings/ribozymes/tC19Z/. If you only have sequence and secondary structure, check out initialize_files(sequence, secstruct, resnum, tag)]

I've already done the Rosetta rna_motif stuff in 1gid_RNAA.pdb. This is chain A from the PDB entry 1GID. I think I used make_rna_rosetta_ready.py and extract_chain.py in Rosetta's rna_tools, but you can also just prepare a similar file by hand.

Put this file into a new directory called rna_motif, and go into that directory:

mkdir rna_motif
cp 1gid_RNAA.pdb rna_motif
cd rna_motif

Now run on the UNIX command line:

rna_motif -s 1gid_RNAA.pdb -ignore_zero_occupancy false

You may need to use the full executable name (something like rna_motif.macosclangrelease) depending on how you've compiled Rosetta. The -ignore_zero_occupancy false flag tells Rosetta to go ahead and include a couple uridines that were uncertain in the experimental map and flagged as such by having their occupancies set to 0 in the PDB coordinate file.

The output should be a bunch of files:

1gid_RNAA.pdb.base_pairs.txt
1gid_RNAA.pdb.fasta
1gid_RNAA.pdb.ligands.txt
1gid_RNAA.pdb.motifs.txt
1gid_RNAA.pdb.other_contacts.txt
1gid_RNAA.pdb.stacks.txt
1gid_RNAA.pdb.stems.txt

These have the relevant information for RiboDraw. (Examples of these files are inside this tutorial directory example_output/rna_motif/[example_output/rna_motif/])

Last, you should probably define names of your helices. Open up 1gid_RNAA.pdb.stems.txt in your favorite text editor. It should look like this:

A:107-110 A:211-214
A:111-112 A:208-209
A:116-121 A:200-205
A:127-129 A:193-195
A:131-134 A:189-192
A:136-138 A:180-182
A:141-149 A:154-162
A:165-166 A:174-175
A:220-223 A:250-253
A:227-235 A:239-247

You'll want to then add in labels for the helix/stem names that are conventional for researchers studying the molecule. In the case of P4-P6, I used:

A:107-110 A:211-214 P4
A:111-112 A:208-209 P4*
A:116-121 A:200-205 P5
A:127-129 A:193-195 P5a
A:131-134 A:189-192 P5a*
A:136-138 A:180-182 P5a**
A:141-149 A:154-162 P5b
A:165-166 A:174-175 P5c
A:220-223 A:250-253 P6a
A:227-235 A:239-247 P6b

A note here: By convention the entire stretch of 107-112 and 208-214, which is interrupted by a bulge at A210 is usually called P4. Rosetta and RiboDraw go ahead and split these into two helices. We'll call the second helical segment P4* for now, and hide this label later. Ditto for P5a.

(If you forgot to do this step and/or need to change the names later, edit the file as above, and run update_helix_names('1gid_RNAA.pdb.stems.txt').)

Now, go into MATLAB and get into this tutorial directory. Type:

initialize_drawing( 'rna_motif/1gid_RNAA.pdb' );

Your MATLAB figure window should look like this:

Step 0 initialize_drawing

Its a little crazy but all the information needed for the drawing is in there -- now we just need to lay it out.

Step 1. Roughly lay out the helix stems

First step in layout: let's move/rotate the helix stems roughly where you want them.

Helices are the fundamental objects in RiboDraw. The helces must be aligned either vertically or horizontally – a convention in the literature for depicting tertiary-folded RNAs in 2D that also nicely matches the fact that most helices in such an RNA really hve parallel or perpendicular axes. Let's hide the noncanonical pairs for now, which are distractions:

hide_noncanonical_pairs

Then make sure to show the control features that will let us manipulate helices (and any non-helical residues that are assoiated with them):

show_helix_controls

Note that these control features will not get output in the final image. (And you can hide them at any time with hide_helix_controls)

You'll see light blue rectangle outlines around each helix. You can drag the rectangle outlines the boxes to move the helices. Try it! You'll see that non-helical residues are associated with nearby helices and move with them; you can drag them separately.

To rotate a helix, you can click on the little rounded square in the middle of each rectangle. (One such square control is marked with a red arrow labeled "rotate" on the image below.) Try it!

Moving and zooming around in MATLAB can be a pain; the magnifying glass and hand tools in the menubar are available, but sometimes you'll get stuck. If you ever want to return to a view that encapsulates the whole drawing, type this:

set_artboards

Try to make your MATLAB figure window look like this:

Step 1 move stems

Step 2. Flipping helices

You can also now get the helices flipped so that there are fewer crossing between the arrow lines that connect residues.

To flip a helix horizontally or vertically, there are four tiny dashes cutting through the outer edges of the rectangles (see also the red arrow marked "flip" in the image above).

Try to make the helices look like this Step 2 some flips

Now might also be a good time to save our work. All the information in the drawing can be output with

save_drawing( 'drawing.mat' );

And can be read back in later with

load_drawing( 'drawing.mat' );

Step 3. Refine arrow linkers and noncanonical pairs

Now let's refine things a bit.

We'll move some of the arrows so that instead of shooting off at weird angles they have joints that primarily go horizontal and vertical.

Let's turn on control features that will let us adjust linkers:

show_linker_controls

Little circles show up at the ends of each arrow. We can add joints to linkers by pulling on little circles at residues. We can also remove unwanted joints by moving little circles back on to residues. Try it out.

This is also a good time to display noncanonical pairs, which also involve lines that we can add joints to and refine:

show_noncanonical_pairs

As we do this refinement of the 2D drawing, we'd like to preserve some elements of the 3D arrangement of the nucleotides. How to do this is subjective, but it helps to have a reference 3D structure open in Pymol side-by-side. Let's also color nucleotides in 'rainbow' coloring in both Pymol and RiboDraw (MATLAB) to make it easy to see correspondences.

Start up Pymol, load in the secondary structure, and setup up 5'-to-3' rainbow coloring:

load 1gid_RNAA.pdb
show cartoon
rr()   # this is if you have RiboVis loaded. Otherwise color->spectrum->rainbow

Try to zoom in on the P6a-P6b region (where the red and blue strands intertwine). Should look like this:

step03 pymol

Back in RiboDraw (MATLAB), you can get matching rainbow 5'-to-3' coloring with the command

color_drawing rainbow

Note that you'll be able to restore coloring to black -- or any other coloring scheme -- later on.

Let's zoom into the P6a-P6b region (use the magnifying glass tool in the menubar of the Matlab figure window).

Now try to move the nucleotides, linkers, and helices around P6a and P6b (which bracket the so-called tetraloop receptor) to look like this:

step03 linker refinement

Tip: As we move helices, associated residues move with them. If you don't like how a non-helical residue is assigned to a "parent" helix, you can actually drag that residue into another helix, and that will change its assignment.

Note: Noncanonical pairs for last two residues (U259, C260) look misassigned by rna_motif because of the heuristic that Rosetta uses to classify pairs; could be fixed in the future, but generally these issues only occur in parts of the structure with poorly resolved pairs. **

Step 4. Slicing out sections to focus.

OK, in principle, now you're good to go -- just keep refining. But there are some useful extra tricks worth knowing.

For RNAs that contain subdomains, it can often be easier to refine the RiboDraw layouts if we just focus on those subdomains. Its also often nice to have those 'subdrawings' on hand. Last, for large RNAs (like the ribosome), RiboDraw can get slow due to MATLAB's poor handling of numerous graphical elements. Then it actually becomes critical to slice out subdomains to maintain real-time interactivity as we lay out the drawing.

In this step, you'll learn how to slice and merge a subdomain of P4-P6 called P5abc.

Step 4A. Slice out P5abc

Let's zoom back out (double click with magnifying glass icon, or type set_artboards).

Then, let's slice out the P5abc subdomain

 slice_drawing( 'A:127-195' )

It will show up in a new Figure 2 window that we can edit, just like the original drawing (which stays in the original Figure 1 window).

You might get a warning like: Warning! slice_res only catches part of helix Helix_A127... but that's just to tell you that a few stray residues are also going to be transferred to this sliced drawing (which will simplify merging later on).

Since this subdrawing all looks kind of the same color, recolor by rainbow so that the residues have really contrasting colors, and zoom in:

color_drawing rainbow
set_artboards

You can also get rid of those distracting extra helix labels:

hide_helix_label( 'P5a*' );
hide_helix_label( 'P5a**' );

Your subdrawing RiboDraw window should look like this:

Ribodraw P5abc sliceout

Over in PyMol, you can also recolor the molecule to allow side-by-side comparison of P5abc in 3D with your 2D drawing in RiboDraw. In PyMol:

color white
rrs( 'chain A and resi 127-195')
hide ev, not (chain A and resi 127-195)

Pymol P5abc

Step 4B. Refine P5abc layout

Now again edit the linkers, helices, and nucleotides until they look good to you – mostly horizontal and vertical lines, and with nucleotides similar to their arrangement in 3D:

Pymol P5abc refine

If you want you can save this subdrawing:

save_drawing( 'subdrawing_P5abc.mat' );

Step 4C. Re-install the subdomain into the global drawing

You can now move your nice P5abc drawing back into the full drawing of the P4-P6 RNA:

merge_drawing();
set_artboards

The information on what the parent drawing for the subdrawing was (Figure 1) was stored when it was sliced out. But if you want to be careful you can also specify which parent drawing to merge into, by specifying its figure number (1, in this case): merge_drawing(1);

Pymol P5abc refine

Step 5. Flipping a multi-helix domain

In old drawings, P5abc is often drawn so that P5c juts out to the left; but given the known 3D structure of P4-P6, it probably should be drawn pointing to the right, towards the other helices in the molecules rather than away.

RiboDraw provides a nice way to select domains and flip them. Here's how to do it with P5abc. First, let's select P5abc as a domain object

setup_domain(  'A:127-195','P5abc' )
list_domains
hide_domain_label( 'P5abc' )

Then, we can get rotation/dragging/flipping controls for this domain -- totally analogous to the controls for helices -- with:

show_domain_controls

before P5abc flip

Now click on the flip (the thin red dashed line going through the red rectangle):

after P5abc flip

Step 6. Refine!

Let's refine the rest of the drawing. Restore rainbow coloring in RiboDraw:

color_drawing rainbow

and in Pymol:

rrs()

Remove some extraneous labels and controls:

hide_helix_label( 'P4*')
hide_domain_controls

And strive for a layout that looks like this:

Step 7. Color as you like it

Typically for these 2D layouts it's best to not have rainbow coloring in the final image – too many colors. It's more conventional to have the structures colored by helices or stacks of helices. Here's how you can do that.

Get rid of some of the control features so that it's easier to see the final result.

hide_linker_controls
hide_helix_controls

Then it's a matter of setting up domains, and choosing colors. Conveniently, all the colors available in Pymol ("salmon", "teal", "marine", etc.) can be used in RiboDraw, as well as RGB values, as described in the color_drawing documentation.

Start by making the whole thing gray:

color_drawing gray

Expand the definition of P5abc to include all the nucleotides that are on the left hand side of the drawing, and let's color them marine:

setup_domain(  'A:122-198','P5abc' ); color_drawing marine P5abc 

Similar for the other domains – let's color them purple and forest:

setup_domain(  'A:107-121 A:199-214','P4P5' );
color_drawing purple P4P5 
setup_domain(  'A:220-253','P6' ); 
color_drawing forest P6 

You probably don't need the domain labels:

hide_domain_label( 'P5abc' );
hide_domain_label( 'P4P5' );
hide_domain_label( 'P6' );

Step 8. Final touches

As a matter of due diligence, e.g., if this will go into a paper, its helpful now to go through and inspect the base pairs. As it turns out different base pair annotation packages make different choices about what to count as a base pair.

Its helpful to therefore get a cross check for the base pairs that come from Rosetta/rna_motif and base pairs assigned by a totally independent package, here DSSR.

Put your PDB into Web-DSSR. Download the .csv file from the first block of output (should be called tablePairs.csv).

Note: You can also run DSSR locallyf from command-line and that can be important if you need segid information -- see docs for check_drawing_against_dssr.

Now run:

check_drawing_against_dssr( 'tablePairs.csv' );

Output should look like:

...
A:A225 A:A226 cSH     Linker_A225_A226_noncanonical_pair
A:A226 A:U249 cWW     Linker_A226_A249_noncanonical_pair


Following drawing pairs are found by DSSR but with different Leontis-Westhof edge1/edge2/orientation
A:A153 A:G250 tWS     Linker_A153_A250_noncanonical_pair
A:U167 A:A173 tWW     Linker_A167_A173_noncanonical_pair
A:A184 A:G212 tWS     Linker_A184_A212_noncanonical_pair


Following drawing pairs are *not* found by DSSR
A:U107 A:C260 cHW     Linker_A107_A260_noncanonical_pair
A:C124 A:C197 tHW     Linker_A124_A197_noncanonical_pair
A:A152 A:U224 cWS     Linker_A152_A224_noncanonical_pair
A:A153 A:G251 tSS     Linker_A153_A251_noncanonical_pair
A:G215 A:U259 cWW     Linker_A215_A259_noncanonical_pair
A:A235 A:U236 tSH     Linker_A235_A236_noncanonical_pair

...

Number of matched pairs: 68
Number of weak matched pairs: 3
Number of drawing pairs that do not match DSSR: 6
Number of DSSR pairs that do not match drawing: 10

Most of the base pairs (68 out of 77) match between Rosetta/Ribodraw and DSSR. An additional 3 Rosetta pairs are weak matches to DSSR -- the same residues are involved but the packages make a different choice of Watson-Crick edge (this issue arises often with pairs with single bifurcated hydrogen bonds where its not obvious which 'edge' is involved in the pair). There are 6 that are in Rosetta/Ribodraw, but not in DSSR.

Look at them one by one (coloring them black in pymol) with commands like

color black, resi 124+197
zoom resi 124+197

checkDSSR_124_197 This pair A:C124 A:C197 tHW looks like potentially a legit C-C pair, and DSSR misses it due to a different choice in how far apart the bases. But now look at A:G215 A:U259 cWW:

color black, resi 215+259
zoom resi 215+259

checkDSSR_215_259 This looks like an artifact, so we should delete it.

After inspection here I decided to remove three of the pairs, which I carried out through:

delete_linker( 'Linker_A235_A236_noncanonical_pair' )
delete_linker( 'Linker_A107_A260_noncanonical_pair' )
delete_linker( 'Linker_A215_A259_noncanonical_pair' )

Now let's save the drawing:

save_drawing( '1gidA_drawing.mat' );
export_drawing( '1gidA_drawing.png' );

Tip: You can also export to PDF, PS and (maybe) SVG by changing the extension. Check out documentation for export_drawing.

Here's my final .png result, after a few more tweaks:

1gidA RiboDraw drawing

Let's also prepare our 3D view of the molecule with the same coloring. In Pymol:

color gray
select P5abc, resi 122-198
select P4P5, resi 107-121 or resi 199-214
select P6, resi 220-253
color marine, P5abc
color purple, P4P5
color forest, P6
hide spheres

Save the Pymol session for future use, and also make an image in 3D

save 1gid_RNAA.pse
ray 600,800
save 1gidA_pymol.png

1gidA Pymol

Step 9. Other color schemes

What if you have experimental data you want to visualize on your secondary structure?

For example, papers that discover new conserved ncRNA elements follow a certain visual language to convey sequence conservation: nucleotides may be red or gray to indicate sequence frequency, or replaced wholesale by colored circles if deletions are possible.

Structural studies that use chemical reagents to support or invalidate a hypothesized secondary structure may wish to visualize relative reactivity to chemical reagents. RiboDraw makes both easy.

Initialize a drawing from sequence and secondary structure, such as the COVID19 frameshift element. Arrange the helices as you prefer. You can bold all 100% conserved residues with:

set_boldface(1, get_res_tags( 'A:13459-13487 A:13489-13509 A:13511-13516 A:13518-13520 A:13522-13532 A:13534-13535 A:13537-13547' ) )

or set their font color via set_fontcolor, which is an alternative to color_drawing that takes a set of residue tags directly, rather than a predefined domain.

FSE

Coloring by reactivity is straightforward as well, but you use a command like:

color_rings_by_float_data('A:13459-13547', SHAPE_normalized);

where SHAPE_normalized is a 1D array of normalized reactivity data. The result will be colored from white to yellow to red.

stem5