Bioinformatics Search Requirements
This page contains notes that attempt to distil specific requirements for search over bioinformatics resources arising from life science research. It is hoped that this distillation, derived in part from working closely with the researchers concerned over a period of a year or so, will provide a view of portal search requirements complementary to that obtained from relatively short interviews with researchers. It also augments some other work undertaken by the Sakai VRE demonstrator project.
The initial version of this note is based on an internal document produced by Dr David Shotton of the Zoology department at University of Oxford.
1. FlyData project background
The FlyData project aims to improve understanding of genetically regulated biological processes by studying gene expression in Drosophila (fruit fly) testes. There are a number of practical reasons for studying this seemingly obscure piece of anatomy, mainly to do with visibility of the results of genetic processes. Data from this project will be published in the Drosophila Testis Gene Expression Database (FlyTED, formerly known as DTGED). A presentation about this project is available, which summarizes the various data flows involved (see slides 16-18 of that presentation).
Of particular note is that this project involves cross-referencing local experimental data and researcher annotations with a range of proprietary and public bioinformatics resources. Various kinds of search and lookup are pervasive throughout the experimental procedure, and this distillation attempts to highlight them.
2. A search walk-through
The process described below took about an hour to reproduce. The search would be hard to automate from the outset, but it seems likely that, given a suitable record, a repeat against newer versions of the databases might be conducted largely automatically.
The starting point is an article cited in PubMed (http://www.ncbi.nlm.nih.gov/). On the occasion of the walk-through, the actual article of initial interest was located using the PI's memory of the author's name. The article reported that a homologue of a gene whose mutations cause inherited kidney disease in humans was required for male fertility in Drosophila, and that a mutation of this gene caused the male fruit flies to be sterile. (Homologous genes from different species share substantial common nucleotide subsequences, from which it might be inferred that they have some common evolutionary ancestor.)
Search FlyBase (http://www.flybase.org/) for the name (amo) of the gene known to cause sterility in fruit flies. This yields a page entitled "Synopsis of Gene Pkd2", with information about this gene, including a "closest relatives" link. An attempt to follow this link results in a blank page (so in this case that feature is not much help).
Find translation (protein sequence) of the amo gene in FASTA format. Open a new window and go to "BLAST: protein query vs translated database" http://www.ncbi.nlm.nih.gov/BLAST/ and enter the protein sequence translation. This query may take a while, so we leave this to crunch.
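If the exact query submitted at this step is to be recorded for a later repeat, the FASTA text needs to be split into its header line and sequence. The following is a minimal sketch of one way to do that in Python; the function name parse_fasta and the example record are illustrative assumptions, and the sequence shown is a placeholder, not the real amo/Pkd2 translation.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs.

    A header line starts with '>'; the sequence lines that follow it
    are joined into a single string until the next header.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder input for illustration only (NOT the real amo translation).
example = ">example_protein placeholder record\nMKTAYIAK\nQRQISFVK\n"
records = parse_fasta(example)
```

Keeping the header and sequence as separate fields would make it easier to resubmit the same sequence to a newer version of a database.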
Go back to FlyBase and click forward to "Full report". Researcher is curious about "larval feeding behaviour" and "smooth muscle contraction". However, the evidence for these annotations was not properly displayed - a known fault in FlyBase at the time this exercise was conducted.
Returning to the BLAST report: High homologies are shown with many other species in the right-hand region of the polycystic kidney disease gene sequence, including Macaca testis-specific expression of "pck disease-like 2" protein, alias "polycystic kidney disease 2-like 2 (PKD2L2)". This and other information on other species leads to the conclusion that there is a definite relationship between this gene and the disease-regulating gene mentioned originally.
In "FlyBase Full Report: Literature", look at abstract titles and availability of full text of meeting reports. No further information gleaned here.
It is known that, in humans, this gene PKD2 affects calcium import into flagella, and PKD2 is associated with PKD1. Does Drosophila have a polycystic disease 1 homolog, PKD1?
- FlyBase search for PKD1 and then for PKD* - nothing interesting except update on PKD2.
- Try searching PubMed for information about PKD1 in flies - abandoned.
- Search protein databases for polycystin (name used in literature).
- Cut and paste the sequence of one of the human gene homologues found into BLAST and conduct a focused search against fly databases. No significant hits. Conclude PKD1 has no homologues in Drosophila.
- Recording start points of searches may be useful, to avoid dependence on the memory of the original investigator.
- Negative results seem to figure highly. Recording searches with no significant results might be valuable. Is it plausible to capture reason for no interest in results? (Tough!)
- The FlyBase fault would be a good reason to repeat the search at a later time, when the fault has (presumably) been corrected. Maybe, more generally, a flag to "try this again later" would be a useful datum to capture here?
- To what extent could such a search be templated? Would it be faster even if it still required interactivity to complete the search?
- How to reduce sensitivity to layout of search results presentation?
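To make the observations above a little more concrete, here is a minimal sketch in Python of what a recorded search might capture: the start point of each lookup, whether the result was significant, a free-text reason, and a "try this again later" flag. All class and field names are hypothetical, chosen only for illustration; this is not an existing FlyData or portal schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchStep:
    """One lookup in a recorded search (field names are illustrative)."""
    resource: str              # e.g. "FlyBase", "PubMed", "BLAST"
    query: str                 # the term or sequence label submitted
    started_from: str          # how the researcher arrived at this query
    significant: bool          # False records a negative result
    note: str = ""             # free-text reason, e.g. why results were of no interest
    retry_later: bool = False  # the "try this again later" flag


@dataclass
class SearchRecord:
    """An ordered list of steps, so that a search can be reviewed or repeated."""
    steps: List[SearchStep] = field(default_factory=list)

    def negative_results(self) -> List[SearchStep]:
        return [s for s in self.steps if not s.significant]

    def to_retry(self) -> List[SearchStep]:
        return [s for s in self.steps if s.retry_later]


# Example entries paraphrased from the walk-through above.
record = SearchRecord(steps=[
    SearchStep("FlyBase", "amo", "gene name from the PubMed article", True),
    SearchStep("FlyBase", "Full report annotations", "gene synopsis page", False,
               note="evidence for annotations not displayed", retry_later=True),
    SearchStep("FlyBase", "PKD1", "known association of PKD2 with PKD1", False,
               note="nothing interesting except update on PKD2"),
])
```

With a record of this kind, record.negative_results() lists the searches that produced nothing of interest, and record.to_retry() lists those worth repeating once a known fault has been fixed.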
-- GrahamKlyne 2006-03-13 17:48:19