Data modelling for search in the VRE
In the requirements documents (listed in SakaiVre/UserRequirements) two things seem clear: that users want higher-level search (coordination and location, rather than document finding), and that many of the activities impact on the privacy of others. We have been modelling some of the relevant aspects of the data accessible to the VLE with this in mind.
Diagram 1 shows a number of silos or data sources (across) and a number of levels of abstraction (up). Red indicates material with privacy / confidentiality issues; green indicates material which is only likely to work if public. The data range from traditional sources (the Internet, physical books and electronic documents) to project data and documents, and personal and social information. Each of these information sources has traditional metadata about it (file size, author, licence, etc.) and also personal preference metadata about it (personal bookmarks, annotations, etc.).
Bridging all the sources is linking metadata. This is typically third-party metadata which tags, clusters, links, groups, reviews or annotates data and/or documents. This is the kind of metadata that people really need to be able to query. High-quality linking metadata is relatively expensive to create and maintain, so it is generally pooled.
Concrete example: in http://www.flickr.com/, the photos are the data. The EXIF information (which is automatically recorded by the camera when the image is taken) is metadata. The member accounts are social data. The members' favourites are personal preference metadata. The groups, clusters and "interestingness" are the linking metadata. This linking metadata is what draws people to flickr, as opposed to the hundreds of other photo hosting galleries and websites.
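The layers in the flickr example can be sketched as a small data model. This is only an illustrative sketch, not part of the VRE design: the class names (Item, Preference, Link), the Visibility enum and the search_by_link function are all hypothetical, and the red / green colouring from Diagram 1 is assumed to reduce to a simple two-valued flag.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    """Hypothetical flag mirroring the red / green colouring of Diagram 1."""
    RESTRICTED = "red"   # privacy / confidentiality issues
    OPEN = "green"       # only likely to work if public


@dataclass
class Item:
    """A piece of data in one silo, e.g. a photo or a project document."""
    item_id: str
    silo: str
    visibility: Visibility
    # Traditional metadata, e.g. EXIF fields, file size, author, licence.
    metadata: dict = field(default_factory=dict)


@dataclass
class Preference:
    """Personal preference metadata: one member's favourite or bookmark."""
    member: str
    item_id: str


@dataclass
class Link:
    """Linking metadata: a third-party tag or group that may span silos."""
    label: str
    item_ids: set


def search_by_link(label, links, items, include_restricted=False):
    """Return the sorted ids of items reachable through links with this label.

    Restricted (red) items are left out unless explicitly requested.
    """
    wanted = set()
    for link in links:
        if link.label == label:
            wanted |= link.item_ids
    return sorted(
        item.item_id
        for item in items
        if item.item_id in wanted
        and (include_restricted or item.visibility is Visibility.OPEN)
    )
```

In this sketch the query runs over the linking metadata rather than over the items themselves, and the visibility flag decides whether a linked item may be returned at all.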
The value of linking metadata increases non-linearly, by the network effect. This militates against the development of walled gardens and private Internets, and pushes towards interoperable systems.
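One way to make "non-linearly" concrete is to count the possible pairwise links between pooled items. That pairwise links are the right measure of value is an assumption made here for illustration (it is the usual simplification behind the network effect), and the function name potential_links is hypothetical.

```python
def potential_links(n: int) -> int:
    """Number of distinct pairs that can be linked among n items."""
    return n * (n - 1) // 2


def pooled_vs_walled(sizes):
    """Compare possible links when collections are pooled versus kept apart.

    `sizes` lists the number of items in each separate collection.
    Returns (links_if_pooled, links_if_walled).
    """
    pooled = potential_links(sum(sizes))
    walled = sum(potential_links(size) for size in sizes)
    return pooled, walled
```

Under this assumption, two separate collections of 10 items each allow 45 + 45 = 90 possible links, while one pooled collection of 20 items allows 190.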
Diagram 2 is a process diagram for the research cycle as a whole and event organisation. Again red indicates privacy / confidentiality issues and green indicates things that are open (or, in the case of peer review, where the anonymous / open dynamics are well understood). Searching over the green aspects presents only technical barriers; searching over the red aspects is fraught with social and organisational issues. Information from the red areas is also problematic stylistically: because it has not been prepared for public view it is likely to be very idiosyncratic, reducing the opportunities and capabilities for searching over it.
Other diagrams produced in this work can be found at http://www.flickr.com/photos/stuartyeates/tags/vre/