Sunday, March 22, 2009

"Technical Difficulties"

My computer has died... well, it's not quite pronounced dead, but it's in critical condition and out of warranty. I'm in the process of rescuing my files off the hard drive, and tomorrow I'll make some calls and see if someone can take a look at it.
In the meantime, if it appears that I am not making due progress, this is why. I hope to have this resolved shortly, but since I have no idea what the problem is, all I can do is wait and see.

Friday, March 20, 2009

Common Name Search Begins

I am working on a portion of code which will take the scientific names of each organism in a given tree and search for the corresponding common name of each one. I am doing the search using the database at ubio.org. I can currently post a search term to the form and retrieve the resulting HTML page. I can parse that page to determine whether it is the page for a specific organism or whether the program has found multiple possible matches and is asking for more information. 

The current issue is that the HTML I get back from the initial search is not the same as the HTML I get when I do the search in a browser and look at the resulting source code. That's definitely a sizeable issue. The URL is the same, but one's looking for more information and the other is an "Advanced Search" page that I can't even navigate to if I try in my browser. 
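For reference, here is a minimal Python sketch of posting a search term to a form and reading back the HTML. The form URL and field name used with it are placeholders, since the real ubio.org form details are not given in this post. A script can receive different HTML than a browser when the request headers or cookies differ, so the sketch sets a User-Agent explicitly; whether that is the cause of the mismatch described above is unconfirmed.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_request(form_url, field_name, search_term,
                         user_agent="Mozilla/5.0"):
    """Build a POST request for a search form.

    form_url and field_name are hypothetical; substitute the values
    used by the real search form.
    """
    data = urlencode({field_name: search_term}).encode("ascii")
    headers = {
        "User-Agent": user_agent,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Supplying data makes urllib send this as a POST instead of a GET.
    return Request(form_url, data=data, headers=headers)


def fetch_html(request, timeout=10.0):
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Printing the request's method, headers, and body and comparing them with what the browser sends is one way to narrow down where the two requests differ.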

I at least have my algorithm written out for how I intend to parse the HTML in order to find the common name of the organism (once I figure out how to get to the correct HTML).

Enter search term into ubio.org. Get resulting page.

On the results page:
   After the text string "Scientific Match", search for the "a href ='" string
   Save the next text up to "'" as a link string
   Save a substring of the text after the next > and before <
   Compare that to the search term
   If it's a match
      Follow the saved link
      On the resulting page:
         Find the text "name = 'Common name'"
         After that point, find "namebankID"
         Save a substring of the text after the next > and before <
         Return the substring as the common name
   Else
      Repeat by searching for the next "a href = '" string
      Stop when you've reached ". 
      Return either the search term or a null string as the common name.
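The outline above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the marker strings are passed as parameters because the exact spacing and quoting in the real pages is not confirmed, the `fetch` argument is a hypothetical function that returns the HTML for a link, and since the stop marker is not specified above, the scan simply ends at the end of the page.

```python
from typing import Callable, Optional, Tuple


def text_between_tags(html: str, start: int) -> Tuple[str, int]:
    """Return the text after the next '>' and before the following '<',
    plus the index of that '<' (or ("", -1) if not found)."""
    gt = html.find(">", start)
    if gt == -1:
        return "", -1
    lt = html.find("<", gt + 1)
    if lt == -1:
        return "", -1
    return html[gt + 1:lt].strip(), lt


def find_match_link(results_html: str, search_term: str,
                    href_marker: str = "a href='") -> Optional[str]:
    """Scan links after "Scientific Match" for one labeled with the term."""
    pos = results_html.find("Scientific Match")
    while pos != -1:
        pos = results_html.find(href_marker, pos)
        if pos == -1:
            break
        link_start = pos + len(href_marker)
        link_end = results_html.find("'", link_start)
        if link_end == -1:
            break
        link = results_html[link_start:link_end]
        label, pos = text_between_tags(results_html, link_end)
        if pos != -1 and label.lower() == search_term.lower():
            return link
    return None  # no matching link before the end of the page


def extract_common_name(organism_html: str,
                        name_marker: str = "name='Common name'") -> Optional[str]:
    """Read the text following "namebankID" after the common-name marker."""
    pos = organism_html.find(name_marker)
    if pos == -1:
        return None
    pos = organism_html.find("namebankID", pos)
    if pos == -1:
        return None
    name, _ = text_between_tags(organism_html, pos)
    return name or None


def common_name(search_term: str, results_html: str,
                fetch: Callable[[str], str]) -> str:
    """Return the common name, falling back to the search term."""
    link = find_match_link(results_html, search_term)
    if link is None:
        return search_term
    return extract_common_name(fetch(link)) or search_term
```

Falling back to the search term rather than a null string means a leaf label is never empty, but either choice fits the outline.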

  


Friday, March 13, 2009

Rough UI Diagram


In the formatting panel there will also be Hide/Show images buttons.

Wednesday, March 4, 2009

Algorithm Details

Below is the section on Algorithm Details from my newly updated Design Document. Once the entire document is posted on the DMD Senior Projects website, I will provide a link to it. For now, this is the part I've been working the hardest on lately, and it will outline my plan for the rest of the project, so I think it's worth posting about.




I have chosen to build on the existing PhyloWidget program. This will allow me to focus on combining images with trees and incorporating common names of organisms, without having to recreate a significant amount of code. I will take advantage of PhyloWidget's existing interface, tree rendering process, Newick parser, and many other features. Modifications and additions will serve the purpose of adding content fit for students and adapting the interface to be better suited to their needs.

 

The program begins by asking the user to input either a Newick-format tree, a taxonomic name, or a common name of an organism (or multiple organisms). The program's overall algorithm is as follows:

 

1) Search www.ubio.org for the taxonomic name or common name, whichever was not given
2) Search TreeBase by taxonomic name for relevant trees, prune them, and import them into PhyloWidget
3) Search online image databases such as www.morphbank.com for images of the organisms
4) Display those images alongside the leaf labels containing both the common and taxonomic names

 

Inputs that do not yield a tree will bring up a dialog box asking the user for more or different information. Organisms whose image search yields no images will either be shown with a standard placeholder image or be drawn without an image.

 

3.1.1 Scientific and Common Name Search 

I will interact with the website www.ubio.org, which searches based on keywords and can return the common or taxonomic name of an organism along with other related information. If the user inputs a common name, that name will become the search term, and the desired result will be the taxonomic name. The reverse is true if the user inputs a taxonomic name or imports a tree which contains taxonomic names as the node labels.

I hope to extend the functionality of this component to allow users to search for multiple organisms at once.  


3.1.2 TreeBase Search/Prune 

Using the taxonomic name from the uBio search, I will search the TreeBase database for phylogenetic trees containing the desired organism(s). I will determine which of the resulting trees is the best one to display and load that into PhyloWidget. Often, the tree will have more nodes than are suitable to display in an educational program. To avoid overwhelming the user, I will prune the tree before displaying it and only search for images and common names for organisms contained in the pruned tree.
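The design document does not say how the pruning will be done. One possible approach, sketched here in Python purely as an illustration, is to keep a chosen set of leaves and collapse any internal node left with a single child. The `Node` class and the `prune` and `to_newick` functions are hypothetical helpers, not part of PhyloWidget or TreeBase, and labels are assumed to be already safe to write in Newick format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A hypothetical minimal tree node: leaves have no children."""
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Return a copy of the tree containing only leaves named in keep."""
    if not node.children:
        return node if node.label in keep else None
    kept = [c for c in (prune(child, keep) for child in node.children)
            if c is not None]
    if not kept:
        return None          # nothing under this node survives
    if len(kept) == 1:
        return kept[0]       # collapse a node left with one child
    return Node(node.label, kept)


def to_newick(node: Node) -> str:
    """Serialize a tree to Newick text without the trailing semicolon."""
    if not node.children:
        return node.label
    inner = ",".join(to_newick(child) for child in node.children)
    return "(" + inner + ")" + node.label
```

For example, pruning the tree ((A,B),(C,D)) down to the leaves A, C, and D gives (A,(C,D)).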


3.1.3 Image Search 

Using the taxonomic name, I will search a series of image databases including Morphbank to find images of each organism in the tree. In the case that many images are found, I will display the first and allow the user to view the additional images and change which one is displayed on the tree. Also at this time the common name search is repeated, using uBio to find the common names for all of the organisms in the downloaded tree.


3.1.4 Image Integration 

The image and common name will be displayed on the tree leaf nodes alongside the taxonomic name from TreeBase. The user will have the full range of control options for manipulating the tree's display parameters that are already part of PhyloWidget. In addition, the user will be able to choose the displayed image for each node, if more than one image is available. Finally, the user will specify whether the taxonomic name or the common name is displayed more prominently.
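As a rough illustration of the per-leaf state this implies, the Python sketch below keeps both names, the list of found images, the user's selected image, and which name comes first. The `LeafDisplay` class, its field names, and the placeholder filename are all hypothetical and are not taken from PhyloWidget.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for the "standard replacement image".
PLACEHOLDER_IMAGE = "placeholder.png"


@dataclass
class LeafDisplay:
    taxonomic_name: str
    common_name: str = ""
    image_urls: List[str] = field(default_factory=list)
    selected: int = 0                 # index of the image shown on the tree
    common_name_first: bool = True    # which name is shown more prominently

    def current_image(self) -> str:
        """Return the selected image, or the placeholder if none were found."""
        if not self.image_urls:
            return PLACEHOLDER_IMAGE
        return self.image_urls[self.selected]

    def select_image(self, index: int) -> None:
        """Let the user switch to another of the found images."""
        if not 0 <= index < len(self.image_urls):
            raise IndexError("no image at index %d" % index)
        self.selected = index

    def label(self) -> str:
        """Build the leaf label, with the prominent name first."""
        if not self.common_name:
            return self.taxonomic_name
        if self.common_name_first:
            return f"{self.common_name} ({self.taxonomic_name})"
        return f"{self.taxonomic_name} ({self.common_name})"
```

Showing the first image by default matches the plan in 3.1.3, and the placeholder covers organisms for which no image was found.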

Tuesday, March 3, 2009

Alpha Preparation

I spoke with Val to outline the four specific tasks I’ll need to complete in order to allow the user to input an organism’s name and end up with a tree complete with images and both the common and scientific names for each organism. I divided up the major coding tasks, estimated how long each would take me and which order would make the most sense, and used that information to revise my Gantt chart.

I’ve spent the rest of this week revising and updating my design document to reflect what I’ve accomplished and my goals for the rest of the semester.

At the recommendation of Greg Riccardi from Morphbank, I also contacted Anne Olsen at the National Biological Information Infrastructure (NBII) to ask how to search the NBII image database.

I'll be meeting with Joe on Thursday morning and Val on Thursday afternoon to review my progress and updated design document. 

Monday, March 2, 2009

Revised Gantt Chart

I am in the process of revising my design document to reflect what I've done up to this point and detail my goals for the rest of the semester. I've already done so for my Gantt chart, so here's a taste of what's going to be in the design doc. 


I wish there were a way to make the image bigger, since I realize it's hard to read the text. Sorry about that. If Joe could add a link to each of our Gantt charts on the Final Projects page for our class website, I could link you to that. What do you think, Joe? 

For now, let it suffice to say that I divided up the major coding tasks and estimated how long each would take me and which order would make the most sense. Those are the tasks represented in blue. Also, the random break in the middle of April is Passover.