The current issue is that the HTML I get back from the initial search request is not the same as the HTML I see when I run the same search in a browser and view the page source. That's a sizeable problem. The URL is the same, but one page is looking for more information and the other is an "Advanced Search" page that I can't even navigate to in my browser.
I at least have my algorithm written out for how I intend to parse the HTML in order to find the common name of the organism (once I figure out how to get to the correct HTML).
Enter the search term into ubio.org and get the resulting page.
After the text string "Scientific Match", search for the "a href ='" string.
    Save the next text up to "'" as a link string.
    Save a substring of the text after the next > and before <.
    Compare that to the search term.
    If it's a match:
        Follow the saved link.
        On the resulting page:
            Find the text "name = 'Common name'".
            After that point, find "namebankID".
            Save a substring of the text after the next > and before <.
            Return the substring as the common name.
    Else:
        Repeat by searching for the next "a href = '" string.
    Stop when you've reached ".
Return either the search term or a null string as the common name.
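Here is a rough Python sketch of the steps above, working on HTML that has already been fetched. It is only a sketch under several assumptions: the marker strings ("Scientific Match", "name = 'Common name'", "namebankID") are taken from my notes and may not match the real markup exactly, the comparison ignores case, the function names and the `fetch` callable are made up for illustration, and since the stop marker isn't pinned down yet, the loop simply scans to the end of the page.

```python
import re
from typing import Callable, Optional

# Tolerates optional spaces around '=' since the exact spacing is uncertain.
LINK_RE = re.compile(r"<a\s+href\s*=\s*'([^']*)'[^>]*>([^<]*)<", re.IGNORECASE)
COMMON_NAME_RE = re.compile(r"name\s*=\s*'Common name'")


def text_after_next_tag(html: str, start: int) -> Optional[str]:
    """Return the text between the next '>' and the following '<'."""
    gt = html.find(">", start)
    if gt == -1:
        return None
    lt = html.find("<", gt + 1)
    if lt == -1:
        return None
    return html[gt + 1:lt].strip()


def find_matching_link(search_html: str, term: str) -> Optional[str]:
    """Return the first link after "Scientific Match" whose text equals term."""
    start = search_html.find("Scientific Match")
    if start == -1:
        return None
    for match in LINK_RE.finditer(search_html, start):
        link, label = match.group(1), match.group(2)
        # Assumption: case-insensitive comparison is good enough.
        if label.strip().lower() == term.strip().lower():
            return link
    return None


def common_name_from_detail(detail_html: str) -> Optional[str]:
    """Pull the common name out of the page the saved link leads to."""
    marker = COMMON_NAME_RE.search(detail_html)
    if marker is None:
        return None
    pos = detail_html.find("namebankID", marker.end())
    if pos == -1:
        return None
    return text_after_next_tag(detail_html, pos)


def lookup_common_name(term: str, search_html: str,
                       fetch: Callable[[str], str], default: str = "") -> str:
    """fetch is a hypothetical callable that returns the HTML for a link."""
    link = find_matching_link(search_html, term)
    if link is None:
        return default
    return common_name_from_detail(fetch(link)) or default
```

Passing `fetch` in as a parameter keeps the parsing separate from the download step, so the parsing can be tried on saved copies of the pages while the problem of getting the right HTML is still open. The `default` argument covers the choice between returning a null string and returning the search term, for example `lookup_common_name(term, html, fetch, default=term)`.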