Notes about data visualization options

There are a number of situations where visualization can play a role in the Archivers 2.0 effort.

One pressing need is conveying how much of a given web resource has been archived. The Archivers 2.0 Alpha system currently uses a dynamic hierarchical graph, a sample screenshot of which is shown here:

This conveys exact information about the URLs that have been archived, but it is suboptimal in both the information it conveys and its usability. For example, a common question is “how much of ABC has been archived?” but the current graph visualization makes this difficult to answer. The ideas below are options that have been discussed for a new visualization and data exploration interface to replace the one shown above. Preliminary research was done by @mhucka on behalf of Qri.io, and subsequent discussions on the Archivers' Slack led to additional ideas.

Another visualization need is to show relationships between entities of various kinds (e.g., URLs, documents, and so on). This raises somewhat different requirements. Some ideas for this need are also discussed further below.

“Sunburst” chart

Also known as Sunburst diagrams, these are a common type of visualization that uses concentric rings to show hierarchy, and radial segments to show groups or categories. Each ring corresponds to a level in the hierarchy, and the central circle represents the root. A natural mapping for the Archivers' coverage map would be to make a single sunburst diagram correspond to a single domain (e.g., epa.gov), use concentric rings for URL hierarchies, and use segments for subdomains within the domain. (E.g., if the root is epa.gov, a subdomain might be water.epa.gov.)
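To make the mapping above concrete, here is a minimal Python sketch of how a flat list of URLs might be nested into a hierarchy for one sunburst diagram. The function names (`build_hierarchy`, `to_nested_lists`) and the `{"name": ..., "children": [...]}` shape are assumptions chosen for illustration, not part of the Archivers code. The first level under the root holds hosts (subdomains), and deeper levels hold URL path segments.

```python
from urllib.parse import urlsplit


def build_hierarchy(root_domain, urls):
    """Nest URLs under root_domain: host first, then one level per path segment.

    URLs must include a scheme (e.g. https://) so the host can be parsed.
    """
    root = {"name": root_domain, "children": {}}
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        # Skip anything that is not the root domain or one of its subdomains.
        if host != root_domain and not host.endswith("." + root_domain):
            continue
        segments = [host] + [s for s in parts.path.split("/") if s]
        node = root
        for seg in segments:
            # Reuse the child if it exists, otherwise create it.
            node = node["children"].setdefault(seg, {"name": seg, "children": {}})
    return root


def to_nested_lists(node):
    """Convert the child dictionaries into lists, a shape many chart tools accept."""
    return {
        "name": node["name"],
        "children": [to_nested_lists(c) for c in node["children"].values()],
    }


urls = [
    "https://water.epa.gov/drink/info",
    "https://water.epa.gov/drink",
    "https://www.epa.gov/air",
]
tree = to_nested_lists(build_hierarchy("epa.gov", urls))
```

Each nesting level in `tree` would correspond to one ring of the diagram, with `epa.gov` at the center.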

Some variations on the sunburst idea exist. Vesper provides a kind of dynamic sunburst diagram for drilling down into data. Clicking on a piece of the diagram reveals another diagram, with intuitive animations to show the transition and filling in of new circles. Clicking on the middle node goes back up a level to the previous sunburst diagram. Online demos exist, and the code is open-source in GitHub:

Sunburst diagrams are also known as bilevel partitions and radial treemaps. An implementation of almost exactly what Vesper does is available for D3.js:

“Sequence sunburst” diagrams

Though hierarchy is captured in a sunburst diagram by the concentric circles, path relationships can be difficult to grasp. @blackglade discovered an interesting variation by Kerry Rodden: the sequence sunburst diagram. This produces a dynamic path list in the upper left corner of the screen when the user mouses over areas of the sunburst diagram, thus making the hierarchical path explicit.
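The path list amounts to the chain of names from the root to the node under the cursor. The Python sketch below illustrates that idea only; it is not Kerry Rodden's implementation, and the `path_to` helper and the sample tree are hypothetical.

```python
def path_to(node, target, trail=()):
    """Return the names from the root down to the first node named `target`.

    Returns None when no node with that name exists in the tree.
    """
    trail = trail + (node["name"],)
    if node["name"] == target:
        return list(trail)
    for child in node.get("children", []):
        found = path_to(child, target, trail)
        if found:
            return found
    return None


# Hypothetical sample tree for illustration.
sample = {
    "name": "epa.gov",
    "children": [
        {
            "name": "water.epa.gov",
            "children": [
                {"name": "drink", "children": [{"name": "info", "children": []}]}
            ],
        },
        {"name": "www.epa.gov", "children": []},
    ],
}

breadcrumb = " > ".join(path_to(sample, "info"))
```

An interface would recompute `breadcrumb` on each mouse-over and display it next to the diagram.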

Online demos exist, and the code is open-source on GitHub.

“Icicle trees”

Icicle trees, also known as icicle charts and partition layouts, are very similar to a rectangular version of sunburst diagrams. Instead of arranging parts radially, icicle trees arrange rectangular elements linearly. In the classical icicle tree, the orientation is vertical: the highest element is the root, and each successive layer downward represents another level in a hierarchy. However, they can be oriented horizontally too. An important feature of icicle trees is that the area of each rectangle in the diagram can be made proportional to some variable.

Icicle trees have a long history. A 1981 paper by Kleiner and Hartigan described a "trees and castles" type of diagram that may have inspired icicle trees. An early implementation was provided more than a decade ago in a software package called InfoVis, which is implemented in Java.

Although icicle trees don't appear to be as popular as sunburst diagrams, there now exist a number of implementations of this type of visualization. There is an implementation of a zoomable icicle diagram in D3.js by Mike Bostock:

Another idea is to orient icicle trees horizontally instead of vertically. Not only does this make it easier and more natural to add text to the diagram; it should also make implementations simpler. (If the widths of the rectangles are always the same, then the problem of scaling the size of different blocks reduces to the problem of changing only the heights of the blocks, because the width is constant for all blocks at each level of the hierarchy, i.e., along the horizontal axis.)
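The constant-width argument can be sketched in a few lines of Python. This is an illustrative layout routine under stated assumptions, not the D3 algorithm: the names `node_value` and `icicle_layout` are hypothetical, each leaf carries a `value` (defaulting to 1), and every hierarchy level becomes one column of fixed width, so only the vertical extent of each block has to be computed.

```python
def node_value(node):
    """A leaf's own value, or the sum of the values of its children."""
    children = node.get("children", [])
    if not children:
        return node.get("value", 1)
    return sum(node_value(c) for c in children)


def icicle_layout(node, depth=0, y0=0.0, y1=1.0, col_width=1.0, out=None):
    """Return a flat list of rectangles for a horizontally oriented icicle tree.

    x depends only on depth (constant column width); the vertical span of a
    parent is divided among its children in proportion to their values.
    """
    if out is None:
        out = []
    out.append({
        "name": node["name"],
        "x0": depth * col_width,
        "x1": (depth + 1) * col_width,
        "y0": y0,
        "y1": y1,
    })
    children = node.get("children", [])
    total = sum(node_value(c) for c in children)
    y = y0
    for child in children:
        share = node_value(child) / total if total else 0.0
        height = (y1 - y0) * share
        icicle_layout(child, depth + 1, y, y + height, col_width, out)
        y += height
    return out


# Hypothetical sample data for illustration.
sample = {
    "name": "root",
    "children": [
        {"name": "a", "value": 2},
        {"name": "b", "children": [{"name": "b1", "value": 1},
                                   {"name": "b2", "value": 1}]},
    ],
}
rects = {r["name"]: r for r in icicle_layout(sample)}
```

Because the width is the same for every block, the area of each rectangle is proportional to its height, and therefore to its value.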

It turns out that horizontally-oriented icicle trees are exactly what D3 calls a partition layout. The examples given in the D3 API reference do not use color very much, but presumably one could enhance them with a dash of color.

For Archivers 2.0, it remains to decide what information to map to the different aspects of the diagram. One possibility is the following:

  1. put domains/subdomains on the horizontal axis (e.g., epa.gov would be the left-most column, then subdomains would be the second column, and so on moving to the right)
  2. map the area of each block to the amount of coverage
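What "amount of coverage" means is still open. As one possible reading, the Python sketch below rolls archived and total URL counts up the hierarchy so that every block has a coverage fraction. The field names (`archived`, `total`, `coverage`) and the sample counts are assumptions made up for illustration. A diagram could then size blocks by the archived count, or size them by the total and use the fraction for color.

```python
def rollup(node):
    """Sum archived/total URL counts up the tree and annotate each node.

    Adds `archived_sum`, `total_sum` and `coverage` (a fraction between 0 and 1)
    to every node, and returns the (archived, total) pair for the subtree.
    """
    archived = node.get("archived", 0)
    total = node.get("total", 0)
    for child in node.get("children", []):
        child_archived, child_total = rollup(child)
        archived += child_archived
        total += child_total
    node["archived_sum"] = archived
    node["total_sum"] = total
    node["coverage"] = archived / total if total else 0.0
    return archived, total


# Hypothetical counts for illustration only.
domain = {
    "name": "epa.gov",
    "children": [
        {"name": "water.epa.gov", "archived": 30, "total": 40},
        {"name": "air.epa.gov", "archived": 10, "total": 60},
    ],
}
rollup(domain)
```

After the call, `domain["coverage"]` gives a direct answer to "how much of epa.gov has been archived?" for this sample data.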

BioFabric

BioFabric is a variation on connected node diagrams. It avoids the hairball problem by making nodes into horizontal lines instead of points.

BioFabric is implemented in Java. The theory behind the visualization approach was published in a 2012 paper.
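The nodes-as-lines idea can be illustrated with a small Python sketch. This is not BioFabric's code. It assumes that each node gets its own row, that each edge is drawn as a vertical segment in its own column (which is how the technique is usually described), and that nodes are ordered by descending degree, a simple ordering chosen here for illustration. The name `biofabric_layout` is hypothetical.

```python
def biofabric_layout(edges):
    """Compute a BioFabric-style layout for a list of (u, v) edges.

    Returns (rows, edge_cols, spans):
      rows      - node -> row index of its horizontal line
      edge_cols - one dict per edge: its column and the two rows it connects
      spans     - node -> (first_col, last_col) extent of its horizontal line
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Assumed ordering: highest degree first, ties broken by name.
    ordered = sorted(degree, key=lambda n: (-degree[n], str(n)))
    rows = {n: i for i, n in enumerate(ordered)}

    def edge_key(edge):
        a, b = rows[edge[0]], rows[edge[1]]
        return (min(a, b), max(a, b))

    # One column per edge, so vertical segments never overlap.
    edge_cols = []
    for col, (u, v) in enumerate(sorted(edges, key=edge_key)):
        top, bottom = sorted((rows[u], rows[v]))
        edge_cols.append({"edge": (u, v), "col": col,
                          "row_top": top, "row_bottom": bottom})

    # A node's line runs from its first incident edge column to its last.
    spans = {}
    for e in edge_cols:
        for n in e["edge"]:
            lo, hi = spans.get(n, (e["col"], e["col"]))
            spans[n] = (min(lo, e["col"]), max(hi, e["col"]))
    return rows, edge_cols, spans


rows, edge_cols, spans = biofabric_layout(
    [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
)
```

Since every edge occupies its own column and every node its own row, no two drawn elements cross ambiguously, which is what sidesteps the hairball effect.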

Miscellaneous resources

Random note that might be interesting to people: here are 3 catalogs of data visualization types – not software per se, but rather, visualization approaches & display types:

Finally, it's worth mentioning that a 2001 survey paper found that the related visualization approach of tree maps was the worst among the tree-like visualizations, and that icicle trees were as good as or better than any other. This lends some support to the idea that icicle trees are not a bad type of visualization.

Barlow, T., & Neville, P. (2001). A comparison of 2-D visualizations of hierarchies. In IEEE symposium on information visualization (INFOVIS 2001) (pp. 131-138).