Sunday, January 25, 2009

Munzner's TreeJuxtaposer

I completed reading the SIGGRAPH paper by Tamara Munzner about visualization and structural comparison of trees. I'm focused solely on the visualization aspect as applied to the Tree of Life dataset; comparison of trees or subtrees is not within the intended functionality of my program. The paper was dense and would certainly require re-reading if I were to implement any of her algorithms, but a first pass gave me a sense of the topics she addressed and the difficulties she encountered.

Topics relevant to my project: 
- Scalability in Tree and Display Size
- Guaranteed Visibility of landmark nodes, regardless of the user's navigation. While her goal here is to support comparison, guaranteed visibility would also be valuable in my program, where students may want to focus on the relationships between specific organisms.
- Occlusion of other nodes due to labels

Topics I don't need to bother with:
- Automatic Identification of Structural Differences between input trees
- Differences Characterization - exactly how two trees are structurally different

Topics I could read more about:
- Herman et al.'s 2000 survey of tree visualization research
- Quadtrees (her data structure of choice)
- H. Hauser, "Generalizing focus+context visualization," ch. in Scientific Visualization: The Visual Extraction of Knowledge from Data, Springer, pp. 305–327, 2005
- Other tree visualization options that are different from the TreeJuxtaposer (referenced in Munzner's paper)
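Since quadtrees came up as her data structure of choice, here's a minimal sketch of one for my own reference — this is a generic point quadtree I wrote as an illustration, not Munzner's actual implementation (her variant and its interface are assumptions on my part). The idea is that each node covers a square region and splits into four sub-quadrants when it fills up, so a spatial range query can skip entire subtrees that don't overlap the query region:

```python
# Minimal point quadtree sketch (illustrative only, not Munzner's code).
# Each node covers the square [x, x+size) x [y, y+size) and splits into
# four quadrants once it holds more than `capacity` points.
class QuadTree:
    def __init__(self, x, y, size, capacity=4):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.points = []
        self.children = None  # becomes a list of four QuadTrees after a split

    def contains(self, px, py):
        return (self.x <= px < self.x + self.size and
                self.y <= py < self.y + self.size)

    def insert(self, px, py):
        if not self.contains(px, py):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((px, py))
                return True
            self._split()
        return any(c.insert(px, py) for c in self.children)

    def _split(self):
        h = self.size / 2
        self.children = [QuadTree(self.x,     self.y,     h, self.capacity),
                         QuadTree(self.x + h, self.y,     h, self.capacity),
                         QuadTree(self.x,     self.y + h, h, self.capacity),
                         QuadTree(self.x + h, self.y + h, h, self.capacity)]
        for p in self.points:  # push existing points down into the quadrants
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx, qy, qsize):
        """Return all points in the query square [qx, qx+qsize) x [qy, qy+qsize)."""
        # Prune: skip this node entirely if its region misses the query square.
        if (qx >= self.x + self.size or qx + qsize <= self.x or
                qy >= self.y + self.size or qy + qsize <= self.y):
            return []
        found = [(px, py) for (px, py) in self.points
                 if qx <= px < qx + qsize and qy <= py < qy + qsize]
        if self.children:
            for c in self.children:
                found.extend(c.query(qx, qy, qsize))
        return found
```

The pruning step in `query` is the whole point: a query over a small screen region only ever visits the handful of quadrants it overlaps, which is what makes the structure attractive for navigating a huge tree layout.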

2 comments:

_ said...

Regarding whether to extend existing code or reimplement, one thing to keep in mind (that I've learned only through painful experience) is that existing codebases often contain many hidden pitfalls. These make it very difficult to estimate how long the project will actually take, and they can sharply limit the time actually saved by using that codebase. It might say as much about me as about the code I was working on, but I've often chosen to reimplement something from the ground up rather than use an existing codebase; in at least some cases, I'm fairly certain that was the right decision.

I think there are two particular things you can look at when trying to make this decision:

1) Does the code, in a modular way (i.e., with you only using a well-defined interface, as opposed to cutting various snippets out), do precisely what you need it to do? For example, if you're implementing "grep" and you have a library that can match strings against regular expressions, that's useful code. If, on the other hand, the library supports a different regular-expression syntax than the one you intend to implement, it may not be very useful; it fails the modular/preciseness test. Any time you have to dive into the internals of an implementation and make changes, you're putting much more at risk.
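To make the "grep" example concrete: Python's standard re module is the kind of well-defined interface I mean — a toy grep can be built on top of it without ever touching the library's internals. (This is my own illustrative sketch, not code from any project mentioned here.)

```python
import re

def grep(pattern, lines):
    """Toy grep: return the lines that match the given regular expression.

    All the regex machinery lives behind re's public interface (compile,
    search); we never reach into its internals, which is what makes this
    kind of reuse safe.
    """
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]
```

The moment you needed a regex syntax that re doesn't support, you'd be patching the library's internals instead of calling its interface — and that's exactly where the hidden-pitfall risk comes in.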

2) Take a quick sample of the code and try to get a feel for how well you can understand it. If it's readable and well-laid out, point #1 is less of a concern. Many times, however, you'll be disappointed.

(Of course, I work at Microsoft, where "Not Invented Here" means "not in this hallway," so you can take the above with a grain of salt.)

One other classic book on data visualization you may want to look at is Tufte's The Visual Display of Quantitative Information. It's about representing data on paper more than on the screen, but it still may apply. And for inspiration, the blog flowingdata.com is a fun place.

(Hi! I'm one of the alums Amy asked to check out the senior project blogs. Nice to meet you!)

Maddy said...

Hi Dan - thanks for the advice about code-reuse. I'll definitely check out those visualization sources as well.
~Maddy