Tuesday, February 24, 2009

Learning to SVN

Greg, author of PhyloWidget, is currently updating some of the old code. I wanted to find a way to work on the newest version without having to manually update each file myself based on his changes. The code is posted on GoogleCode, and in order to update my own local version Greg said I needed to use an SVN (Subversion) client. I had no idea what that meant, but I found out that there's an open-source plug-in for Eclipse called Subclipse that does exactly what I need, all within Eclipse. It took more tries than it should have to get it installed and working correctly, but I'm finally able to download the latest code from Google and run the application. It's a relatively small task, yes, but it took a lot of investigating and troubleshooting, and it's at least a significant accomplishment that I figured most of it out on my own.

For my own reference: 
is the official Subclipse site. 

is the site that was most helpful in describing how Subversion and Subclipse work.

is the GoogleCode website where the most recent code is available.

Monday, February 23, 2009

URL Interactions

Today's task was to create a program where I could test connecting to and interacting with a website. The site I am using is called Morphbank, which is a searchable database containing images of organisms for use by scientists.  

To get myself up to speed about Java and interacting with the internet, I read the networking chapter of "Core Java 2, Volume 2." Using their examples, I was able to create a program to connect with the Morphbank website and read in the HTML of the webpage I wanted. Next, since I'll actually want to input search data, I read through the HTML to find the necessary parameters and wrote a test function to post search data into the online form and return the result page. I filtered through the result HTML to find the lines containing image references and printed those to the screen. The next step will be harvesting those images and including them within the PhyloWidget program. I'll have to learn a little more before I understand exactly how to do that, but coming from absolutely no web-interaction experience and very little knowledge of HTML, I'm pretty proud of what I got done today.
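For reference, here is a minimal sketch of the kind of test program described above, using only the standard java.net classes. It is not the actual test code: the class name FormPostSketch, the form field name "keywords", and the sample HTML are all made up for illustration, and the real Morphbank form parameters would have to be read out of the page's HTML. The image filter simply keeps any line containing an <img tag, which is an assumption about how image references appear in the result page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FormPostSketch {

    // Encode form fields as application/x-www-form-urlencoded text.
    static String encodeForm(Map<String, String> fields) {
        try {
            StringBuilder body = new StringBuilder();
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
            return body.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Post the fields to the given address and return the response page as text.
    static String post(String address, Map<String, String> fields) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setDoOutput(true); // for HTTP URLs this turns the request into a POST
        try (OutputStream out = connection.getOutputStream()) {
            out.write(encodeForm(fields).getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    // Keep only the lines of the page that contain an image tag.
    static List<String> imageLines(String html) {
        List<String> hits = new ArrayList<>();
        for (String line : html.split("\n")) {
            if (line.toLowerCase().contains("<img")) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical field name; no network access is needed for this demo.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("keywords", "Quercus alba");
        System.out.println(encodeForm(fields));

        String sample = "<html>\n<p>Results</p>\n  <img src=\"thumb1.jpg\">\n</html>";
        System.out.println(imageLines(sample));
    }
}
```

The main method only exercises the encoding and filtering steps on made-up data; calling post() with a real search URL is the part that actually talks to the website.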

In other news, I'm in the middle of trying to figure out how to download the latest PhyloWidget code using an SVN client, but so far it hasn't worked. Even when it does, the code is read-only in that format, which won't be so useful to me since I'm hoping to modify it. I need to figure out the best way to update my code and then go ahead and do that.

In general, I'm glad to say that I'm pretty well on target for next week's Alpha Review. I've made the decision to build on the existing PhyloWidget code in Java using the Processing UI libraries. I've made a sizable dent in understanding how all the code fits together and where I will be interacting with it most, like where to interrupt to cause label name changes, for example. I've refreshed my memory about Java enough to start writing successful programs, and I've learned how to use Eclipse. I've got some details to work out about my ultimate design goals and algorithms before submitting the revised design document, but I think I'm reasonably well on my way to figuring those things out, too.

Thursday, February 19, 2009

Label Name Interception

This week I've gotten my hands dirty sifting through lots of code to find both the parser and renderer portions of PhyloWidget. 

The parser reads in the Newick strings, and it is found in org.phylowidget.tree.TreeIO.java:


public static RootedTree parseNewickString(RootedTree tree, String s)


parses through all the notation and determines the different levels of the tree and the parent-child relationships.

Once it has built up a string of text that it considers a label, it calls:


PhyloNode curNode = newNode(tree, curLabel, nhx, poorMans);



static PhyloNode newNode(RootedTree t, String s, boolean useNhx, boolean poorMan)


...deals with NHX annotation stuff, then...


s = parseNexusLabel(s);


(removes single quotes and replaces underscores with spaces)


t.addVertex(v);

t.setLabel(v, s);


(self explanatory)


So that was pretty exciting, because it allowed me to understand how labels are created and stored in tree nodes, which will be useful since I plan to be able to intercept those labels before render time and change them. In practice, the idea is to display common names in place of scientific names for the organisms in the tree. 
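To make the parsing steps above concrete, here is a minimal sketch of a recursive Newick reader. It is not PhyloWidget's parser: SketchNode, NewickSketch, and cleanLabel are hypothetical names invented for this example, cleanLabel only mimics the described behavior of parseNexusLabel (drop single quotes, turn underscores into spaces), and NHX annotations and branch lengths are not handled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; not PhyloWidget's PhyloNode.
class SketchNode {
    String label = "";
    final List<SketchNode> children = new ArrayList<>();
}

class NewickSketch {
    private final String text;
    private int pos = 0;

    private NewickSketch(String text) {
        this.text = text;
    }

    static SketchNode parse(String newick) {
        return new NewickSketch(newick.trim()).parseNode();
    }

    // node := [ '(' node (',' node)* ')' ] label
    private SketchNode parseNode() {
        SketchNode node = new SketchNode();
        if (pos < text.length() && text.charAt(pos) == '(') {
            pos++; // consume '('
            node.children.add(parseNode());
            while (pos < text.length() && text.charAt(pos) == ',') {
                pos++; // consume ','
                node.children.add(parseNode());
            }
            if (pos >= text.length() || text.charAt(pos) != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++; // consume ')'
        }
        node.label = cleanLabel(readRawLabel());
        return node;
    }

    // Read characters up to the next structural character outside of quotes.
    private String readRawLabel() {
        StringBuilder raw = new StringBuilder();
        boolean quoted = false;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == '\'') {
                quoted = !quoted;
            } else if (!quoted && (c == ',' || c == ')' || c == '(' || c == ';')) {
                break;
            }
            raw.append(c);
            pos++;
        }
        return raw.toString();
    }

    // Remove single quotes and replace underscores with spaces.
    static String cleanLabel(String raw) {
        return raw.replace("'", "").replace('_', ' ').trim();
    }

    // Collect the labels of all leaves, left to right.
    static void leafLabels(SketchNode node, List<String> out) {
        if (node.children.isEmpty()) {
            out.add(node.label);
        }
        for (SketchNode child : node.children) {
            leafLabels(child, out);
        }
    }

    public static void main(String[] args) {
        SketchNode root = parse("((A,B),(C,D),(E,F,G,H,'*'),I,J);");
        List<String> leaves = new ArrayList<>();
        leafLabels(root, leaves);
        System.out.println(leaves); // prints [A, B, C, D, E, F, G, H, *, I, J]
    }
}
```

Each opening parenthesis starts a new level of the tree, and every label is cleaned at the moment its node is created, which is the point where a replacement name could also be attached.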


Then I went on to find the renderer. The class LabelRender is found inside NodeRenderer.java. In the method render(), we have


canvas.text(tree.getLabel(n), offX - curTextSize / 3 - s, offY - s - curTextSize / 3);



n is a PhyloNode and tree is a RootedTree, so getLabel(vertex) is in RootedTree.java. That calls vertex.getLabel(), which is in PhyloNode.java (PhyloNode extends CachedVertex, which extends DefaultVertex).


It is at this point that I intercept the label and change the return value so that the rendered label name is different than the stored one. I created a wrapper class for HashMap called NameLookup, which for testing purposes just stores mappings from each capital letter of the alphabet to the corresponding number, from 1 to 26. 


I created a NameLookup object called map in PhyloNode.java and updated the getLabel() method to look up the value in the map and return the resulting string (in this case a number). Any label that is mapped to null in the NameLookup map is changed to the string "Hi Val". The replacement label string gets passed along all the way back to the renderer.
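A sketch of what this test setup could look like is below. The names NameLookup, map, and getLabel() come from the description above, but the bodies are a reconstruction rather than the actual code, the lookup() method name is an assumption, and LabeledNodeSketch is a made-up stand-in for PhyloNode so that the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Thin wrapper around a HashMap, filled with test data: "A" -> "1" ... "Z" -> "26".
class NameLookup {
    static final String FALLBACK = "Hi Val";
    private final Map<String, String> names = new HashMap<>();

    NameLookup() {
        for (char c = 'A'; c <= 'Z'; c++) {
            names.put(String.valueOf(c), String.valueOf(c - 'A' + 1));
        }
    }

    // Return the replacement for a stored label, or the fallback if none is mapped.
    String lookup(String storedLabel) {
        String replacement = names.get(storedLabel);
        return replacement != null ? replacement : FALLBACK;
    }
}

// Hypothetical stand-in for PhyloNode, showing only the label interception.
class LabeledNodeSketch {
    private static final NameLookup map = new NameLookup();
    private final String label; // the label as stored by the parser

    LabeledNodeSketch(String label) {
        this.label = label;
    }

    // The stored label stays untouched; only the returned value changes.
    String getLabel() {
        return map.lookup(label);
    }

    public static void main(String[] args) {
        System.out.println(new LabeledNodeSketch("A").getLabel()); // prints 1
        System.out.println(new LabeledNodeSketch("J").getLabel()); // prints 10
        System.out.println(new LabeledNodeSketch("*").getLabel()); // prints Hi Val
    }
}
```

Because the renderer only ever asks the node for its label, nothing in the rendering code has to change for the replacement to show up on screen.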


For the tree ((A,B),(C,D),(E,F,G,H,'*'),I,J); we end up with the following image:

Another way I can do this is store the common names as NHX annotations on the tree, so that I don't have to constantly keep looking up the same names for each update render. I don't know much about NHX, but I'll figure it out if we decide that's the best way. Alternatively, I can just make a "common label" variable in PhyloNode objects and only do the lookup once, when the node is constructed. 
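The "common label" variable idea could look roughly like the sketch below. CachedLabelNode and getCommonLabel() are hypothetical names, not part of PhyloWidget, and the lookup is passed in as a plain function so that the example is self-contained. The counter is only there to show that the lookup runs once per node, no matter how many times the label is rendered.

```java
import java.util.function.Function;

// Hypothetical node that resolves its common name once, at construction time.
class CachedLabelNode {
    private final String label;       // scientific name, as parsed
    private final String commonLabel; // resolved once and then reused

    CachedLabelNode(String label, Function<String, String> lookup) {
        this.label = label;
        String found = lookup.apply(label);
        // Fall back to the stored label when no common name is known.
        this.commonLabel = (found != null) ? found : label;
    }

    String getLabel() {
        return label;
    }

    String getCommonLabel() {
        return commonLabel;
    }

    public static void main(String[] args) {
        int[] calls = {0}; // counts how often the lookup function is invoked
        Function<String, String> lookup = name -> {
            calls[0]++;
            return "Quercus alba".equals(name) ? "white oak" : null;
        };

        CachedLabelNode node = new CachedLabelNode("Quercus alba", lookup);
        for (int i = 0; i < 3; i++) {
            System.out.println(node.getCommonLabel()); // simulates three render passes
        }
        System.out.println("lookups performed: " + calls[0]); // prints lookups performed: 1
    }
}
```

The trade-off is that a node's common name would not pick up later changes to the lookup table unless the node is rebuilt.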

In other news, Val is working on getting me access to image websites so that I can integrate images of the organisms into the display options. Gotta learn how to interact with web-based applications from within Java, but I took a look at some of the URL and HTML files in the API and it doesn't look too painful.

Thursday, February 12, 2009

Font Change!

One of this week's goals has been to successfully change some visual attribute of the PhyloWidget application. The purpose of this was to demonstrate whether the code was navigable enough to facilitate making changes, starting small of course. I found the file called FontLoader and was able to change the font from Vera (right) to Georgia (left) and Arial Black (bottom)!

Sadly, changing to Wingdings or to a Hebrew font didn't work, but I think that has more to do with the fonts themselves than anything in the program. It seems to only work with English fonts, which is fine. I was just experimenting.


Wednesday, February 11, 2009

New York Times

My friend forwarded me a link to the New York Times article discussing exactly what I'm working on - Tree of Life visualization (and the inherent difficulties involved). Check it out! 

Monday, February 9, 2009

Eclipse/Standalone Update

Source code is up and running through Eclipse. Yay! Many thanks to Greg Jordan for his help. 

As for the standalone version, I'm still not sure why Terminal says it "cannot execute binary file," especially since I set the file permissions to 755 so that anyone can execute the .sh and .bat files. Looking into that.

Eclipse/Standalone: One Step Closer

Standalone version: 
I changed the permissions on the .jar files in the lib directory, and the result is a new set of errors: 

lib/freeloader.jar: line 1: PK: command not found
lib/freeloader.jar: line 2: ?h79: command not found
lib/freeloader.jar: line 3: syntax error near unexpected token `)'
K-*??ϳR0?3??r?Cq,HL?HU?%?A??E??%?)?N? ??z?ƺI?'MANIFEST.MF?M??LK-.?
./Phylowidget Full.sh: line 1: lib/itext.jar: cannot execute binary file
./Phylowidget Full.sh: line 1: lib/pdf.jar: cannot execute binary file
lib/phylowidget.jar: line 1: PK: command not found
lib/phylowidget.jar: line 2: ??h9: command not found
lib/phylowidget.jar: line 3: syntax error near unexpected token `)'
?I??'ϳR0?3??r?Cq,HL?HU?%?A??E??%?)?N?@?@TA-INF/MANIFEST.MF?M??LK-.?

Can we call that progress? Maybe. Anyone know what those might mean? 


Source Code: 
I successfully got Eclipse to recognize the source code in the folder I downloaded, and after downloading Processing, importing it as a library, and adding it to the build path, I now only have 2 errors. They have to do with the following import statement:

import com.lowagie.text.Document;

Anyone know what that might be? There's no "com" directory in my source code folder, so I'm not sure where I'm supposed to find that library. I e-mailed Greg Jordan, the author of the code, to see if he could point me in the right direction.

Closer, for sure, but not there yet. Once I get it running, my task is to make at least one visible change to the code, to prove that it's easy enough to work with. Val suggested I try to change the font, size, or color of the leaf node labels on the tree. If I can figure out how to do that with relatively little frustration, then the code is easy enough to navigate and worth building upon. We shall see about all that once I get the thing running.

Thursday, February 5, 2009

Fun and Useful Resources

Val helped me get in touch with Greg Jordan, author of the original PhyloWidget program. He recommended the following two sites for both information and inspiration:

Rebecca Shapley's work on "Teaching with a Visual Tree of Life"
http://groups.ischool.berkeley.edu/TOL/

and the UK Wellcome Trust's "Wellcome Tree of Life"
http://www.wellcometreeoflife.org/interactive/


The second link looked really neat at first, and the graphics in the video are great, but it was tough to interact with and was pretty shallow in terms of the amount of information that was available. I liked the idea of color-coding the images based on whether outside data was available, though. 

The study done at Berkeley may prove incredibly useful. The final report from their study was ~112 pages long, but I spent a while going through the PowerPoint (with notes) from their presentation as well as reading briefly through the different sections of the paper to see what would come in handy later on. They have a whole section of recommendations based on interviews with teachers about what functionality they would like in a Tree of Life program. Just for a taste of some of their results, here are features that over 80% of respondents considered important or very important:

  • Zooming in to any part of the tree 
  • Seeing areas of controversy 
  • Viewing the relationship of divergence events to geological time 
  • Seeing the distribution of important character states 
  • Bookmarking particular branches on the tree 
  • Accessing geographic distributions of groups of organisms 
  • Viewing the distribution of biological patterns across the tree  

(Green and Shapley, p. 47)

So as you can see it's going to be a very useful document. I looked for Rebecca Shapley's contact information so that I might be able to get in touch with her directly (she now works for Google), and although her e-mail address wasn't displayed on her website, I actually found her on Facebook and sent her a message! It'd be quite nice to have her direct input, since she conducted this massive amount of research that would be incredibly useful to making my project successful. 

Wednesday, February 4, 2009

Eclipse

I've spent a lot of time going through the Eclipse Java development "Basic Tutorial" - using their test code and following the tutorial instructions to see all of the things I can do with it. It seems like a great development environment - it'll just take a little while for me to get used to all of the menus and icons so that I can find what I need. There are so many great features to take advantage of while I'm coding that I hope I can remember at least half of them when they would come in handy.

With all of my new knowledge, I now need to figure out how to create a project to house all of the PhyloWidget source code, and then run it and see what happens. Turns out that the "Project Configuration Tutorial" is next in the help docs I've been reading, so that should be a good start. 

I still can't figure out how to run the standalone version, since I don't know what to do with the .sh or .bat files to make them, well, do something. 

I've got a lot of reading to do about info visualization, too. Seems like people have been recommending good resources, so thanks for that. I just need to sit down and read them now. 

Sunday, February 1, 2009

I feel like an idiot of a computer science major...

I'm looking at the Eclipse website to figure out what to download, since I'll need to use Eclipse to write the code for this project. The options include: 

Eclipse Classic - The classic Eclipse download: the Eclipse Platform, Java Development Tools, and Plug-in Development Environment, including source and both user and programmer documentation.

Or

Eclipse IDE for Java Developers - The essential tools for any Java developer, including a Java IDE, a CVS client, XML Editor and Mylyn.

Or both? I think it's the first one, but who knows?

Also, I downloaded the standalone version of PhyloWidget, so that I could play with it and get used to the functionality of the program. Right now I feel like a bit of an idiot because I don't know what to do with it, like, how to just run the program. I couldn't see any instructions on the site. All I have is a folder of .jar files and two other files, one of which doesn't open, and the other looks like it has a command-line command, which I couldn't get to run because it said "Permission denied." 

Not such a good start eh?