Adventures of Maddy: Common Name Search Begins

I am working on a piece of code that takes the scientific name of each organism in a given tree and searches for its common name. I am doing the search using the database at ubio.org. I can currently post a search term to the form and retrieve the resulting HTML page. I can also parse that page to determine whether it is the page for a specific organism or whether the program has found multiple possible matches and is asking for more information.
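Posting a search term and sorting the reply into one of the two page types can be sketched with Python's standard library. This is a minimal sketch under stated assumptions. The form's action URL and input field name are hypothetical parameters, because the real values would have to be read from the ubio.org search page. The page-type test reuses marker strings from the parsing steps later in this post, assuming that a list of candidates contains "Scientific Match" and that a single-organism page contains the common-name field.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical marker strings, borrowed from the parsing steps in this post:
# a list of candidate matches is assumed to contain "Scientific Match",
# and a single-organism page is assumed to contain the common-name field.
MATCH_LIST_MARKER = "Scientific Match"
ORGANISM_MARKER = "name = 'Common name'"


def build_search_request(form_url: str, field_name: str, term: str) -> Request:
    """Build (but do not send) a POST request submitting `term` in `field_name`.

    Both `form_url` and `field_name` are placeholders: the real ubio.org form
    action and input name would have to be read from the search page's HTML.
    """
    body = urlencode({field_name: term}).encode("ascii")
    return Request(form_url, data=body, method="POST")


def classify_page(html: str) -> str:
    """Guess which kind of result page came back from the search."""
    if MATCH_LIST_MARKER in html:
        return "matches"   # several candidates; one must be picked
    if ORGANISM_MARKER in html:
        return "organism"  # page for a single organism
    return "unknown"       # e.g. an unexpected "Advanced Search" page
```

Sending the request is then a matter of `urllib.request.urlopen(req).read()`, decoding the bytes, and passing the text to `classify_page`.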

The current issue is that the HTML I get back from the initial search is not the same as the HTML I get when I run the same search in a browser and view the page source. That's a sizeable problem. The URL is the same, but one page asks for more information, while the other is an "Advanced Search" page that I can't even reach when I try in my browser.
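The post does not say why the two pages differ. One common reason, offered here only as a guess, is that a server varies its response based on request headers such as User-Agent, or on cookies set during an earlier visit. The following minimal sketch tests that guess with the standard library. It builds an opener that sends a browser-style User-Agent and keeps cookies between requests; the User-Agent string is an illustrative example. It also adds a hypothetical helper, `page_diff`, that diffs the script's HTML against the source saved from the browser.

```python
import difflib
from http.cookiejar import CookieJar
from typing import List, Tuple
from urllib.request import HTTPCookieProcessor, OpenerDirector, build_opener

# Illustrative browser-style User-Agent; copy the real one from your browser if needed.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def make_browser_like_opener() -> Tuple[OpenerDirector, CookieJar]:
    """Return an opener that keeps cookies between requests and identifies
    itself with a browser-style User-Agent instead of Python-urllib."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", BROWSER_UA)]  # replaces the default header
    return opener, jar


def page_diff(script_html: str, browser_html: str) -> List[str]:
    """Unified diff from the browser's page source to the script's HTML."""
    return list(difflib.unified_diff(
        browser_html.splitlines(), script_html.splitlines(),
        fromfile="browser", tofile="script", lineterm=""))
```

Fetching the same URL with `opener.open(...)` and passing the decoded body, together with the saved browser source, to `page_diff` shows where the two pages diverge. If they still differ, the header and cookie guess is probably wrong.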

At least I have written out the algorithm I intend to use to parse the HTML and find the organism's common name (once I figure out how to get the correct HTML):

Enter the search term into ubio.org and get the resulting page.

After the text string "Scientific Match", search for the string "a href ='".
    Save the text up to the next "'" as a link string.
    Save the substring of the text after the next > and before the following <.
    Compare that substring to the search term.
    If it's a match:
        Follow the saved link.
        On the resulting page:
            Find the text "name = 'Common name'".
            After that point, find "namebankID".
            Save the substring of the text after the next > and before the following <.
            Return that substring as the common name.
    Else:
        Repeat by searching for the next "a href ='" string.

Stop when you've reached ".

Return either the search term or a null string as the common name.
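The steps above can be sketched in Python using plain string searches. This mirrors their find-and-substring approach instead of using a full HTML parser. It is an illustration, not the code the post describes. The marker constants are copied from the steps, but their exact spacing in ubio.org's HTML is assumed. The helper names are hypothetical. The page fetcher is passed in as a function so the sketch runs without a network connection. This version stops when no further link is found.

```python
from typing import Callable, Optional, Tuple

# Marker strings taken from the steps above; the exact spacing and quoting
# used in ubio.org's HTML is an assumption.
RESULTS_MARKER = "Scientific Match"
LINK_MARKER = "a href ='"
COMMON_NAME_MARKER = "name = 'Common name'"
NAMEBANK_MARKER = "namebankID"


def text_after(html: str, start: int) -> Tuple[str, int]:
    """Return the text after the next '>' at or past `start` and before the
    following '<', plus the index of that '<' (or ("", -1) if either is missing)."""
    left = html.find(">", start)
    if left == -1:
        return "", -1
    right = html.find("<", left + 1)
    if right == -1:
        return "", -1
    return html[left + 1:right], right


def extract_common_name(detail_html: str) -> Optional[str]:
    """On an organism page: find the common-name field, then the next
    namebankID, and return the text of the element that follows it."""
    pos = detail_html.find(COMMON_NAME_MARKER)
    if pos == -1:
        return None
    pos = detail_html.find(NAMEBANK_MARKER, pos)
    if pos == -1:
        return None
    name, end = text_after(detail_html, pos)
    return name.strip() if end != -1 else None


def find_common_name(term: str, results_html: str,
                     fetch: Callable[[str], str], keep_term: bool = True) -> str:
    """Scan the links after "Scientific Match" for one whose text equals
    `term`, follow it with `fetch`, and return the common name found there.
    Falls back to `term` (or "" when keep_term is False)."""
    fallback = term if keep_term else ""
    pos = results_html.find(RESULTS_MARKER)
    if pos == -1:
        return fallback
    while True:
        pos = results_html.find(LINK_MARKER, pos)
        if pos == -1:
            return fallback  # no more candidate links
        link_start = pos + len(LINK_MARKER)
        link_end = results_html.find("'", link_start)
        if link_end == -1:
            return fallback
        link = results_html[link_start:link_end]
        label, pos = text_after(results_html, link_end)
        if pos == -1:
            return fallback
        if label.strip() == term:
            name = extract_common_name(fetch(link))
            return name or fallback
        # Not a match: the next search starts after this link's text.
```

For a live lookup, `fetch` could wrap `urllib.request.urlopen`, using `urllib.parse.urljoin` to resolve relative links against the results page URL. A string scan like this breaks as soon as the markup changes spacing or quoting, which is one reason an HTML parser such as Python's `html.parser` is often preferred.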

Friday, March 20, 2009