Mapping LDAP to FOAF

LDAP is a directory protocol. Here at Oxford we have an LDAP server that contains information on every user, every unit and every division (for examples of the information held, see below). FOAF is an XML/RDF format for specifying information about people and the groups they are in.

A number of the VRE user requirements for searching involve searching for people (or involve people as search terms), but the current tools for searching for people are severely limited. One possible solution is to bootstrap the ability to search within a community by enabling people to publish their own metadata in a systematic way.

1. Basic Plan

There are three parts to the plan. Part A deals with group data, Part B deals with personal data and Part C deals with aggregation of other data published within and about the institution.

1.1. Part A

Publish the following details for each group: name, URL, postal address, telephone number, fax number and super-group. DO NOT expose any email addresses in any form. All of this information is available from LDAP and can be refreshed on a regular schedule to keep it current.

This allows individuals to associate themselves with their unit in FOAF, without having to replicate all the details. By encoding the URL as a homepage and not using foaf:authority we can be clear that we are not claiming to be or to represent the group, merely containing data about them and pointing to them.
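As a rough illustration of Part A, the sketch below renders a unit's public attributes as a foaf:Group fragment. It is a minimal sketch, not the attached scripts: the function name unit_to_foaf_group is hypothetical, the attribute names (cn, oucsUnitURI, telephoneNumber) are taken from the example LDAP data in section 4, and the choice of foaf:phone with a tel: URI is an assumption. Postal address, fax number and super-group are left out because the page does not say which vocabulary they should use.

```python
from xml.sax.saxutils import escape, quoteattr


def unit_to_foaf_group(unit):
    """Render a unit's public LDAP attributes as a foaf:Group fragment.

    `unit` maps LDAP attribute names to lists of values. Any `mail`
    values are never read, so no email address is exposed.
    """
    lines = ["<foaf:Group>"]
    lines.append(f"  <foaf:name>{escape(unit['cn'][0])}</foaf:name>")
    # The unit URL is published as a homepage: data about the group,
    # not a claim to be the group.
    for url in unit.get("oucsUnitURI", []):
        lines.append(f"  <foaf:homepage rdf:resource={quoteattr(url)}/>")
    # Assumption: phone numbers are published as tel: URIs via foaf:phone.
    for number in unit.get("telephoneNumber", []):
        tel = "tel:" + number.replace(" ", "")
        lines.append(f"  <foaf:phone rdf:resource={quoteattr(tel)}/>")
    lines.append("</foaf:Group>")
    return "\n".join(lines)
```

The fragment would still need to be wrapped in an rdf:RDF element that declares the rdf and foaf namespaces before it is valid RDF/XML.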

1.2. Part B

Expose an interface for individuals to publish their own details in FOAF. The interface would look up basic information about the individual in LDAP (title, several forms of name, email and email SHA1) and link to their organisation (if Part A above is not done, the organisation's details have to be included en masse here). The interface would allow users to enter other details about themselves and their research groups, as is standard in FOAF creation tools (see [1] and [2]).

FOAF contains a specific mechanism for hiding email addresses to avoid the problem of attracting SPAM [4]. Rather than distributing email addresses, SHA1 hashes of email addresses are used, so that anyone who already has an email address can easily check whether it matches a published hash, but there is no practical way of recovering the email address from the hash alone.
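The hashing step can be sketched as follows. In FOAF the foaf:mbox_sha1sum value is the SHA1 of the full mailto: URI rather than of the bare address; the function names here are hypothetical, and the address in the usage note is a placeholder.

```python
import hashlib


def mbox_sha1sum(email):
    """Return the hex SHA1 of the mailto: URI for `email`."""
    uri = "mailto:" + email.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def matches(email, published_hash):
    """Check whether a known address corresponds to a published hash."""
    return mbox_sha1sum(email) == published_hash.strip().lower()
```

Someone who already knows an address such as someone@example.org can call matches() against a published hash to confirm the match, while the hash on its own does not reveal the address. The address is hashed exactly as given, so the same spelling and case must be used when checking.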

Users would be shown the resulting FOAF, and the implications of publication would be explained, before they were asked for confirmation to publish it. They would have the option of disabling some or all sections of the FOAF in an easy-to-use manner. In particular they would be given the option of publishing their email address, the hash of their email address, or neither. They would also have the option of removing previously published FOAF, although third parties may already have downloaded it. Users would also have the option of hosting the FOAF on their own sites (or indeed, publishing it as part of their blog or similar service) by cutting and pasting.
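One way the publication choices could be modelled is sketched below. This is only an illustration under stated assumptions: the function names, the section names and the three choice strings ("address", "hash", "neither") are hypothetical and do not describe an existing interface.

```python
import hashlib
from xml.sax.saxutils import quoteattr


def email_properties(email, choice):
    """Return the FOAF email lines for 'address', 'hash' or 'neither'."""
    mailto = "mailto:" + email
    if choice == "address":
        return [f"<foaf:mbox rdf:resource={quoteattr(mailto)}/>"]
    if choice == "hash":
        digest = hashlib.sha1(mailto.encode("utf-8")).hexdigest()
        return [f"<foaf:mbox_sha1sum>{digest}</foaf:mbox_sha1sum>"]
    if choice == "neither":
        return []
    raise ValueError(f"unknown email choice: {choice!r}")


def build_person(sections, disabled, email, email_choice):
    """Assemble a foaf:Person, skipping any section the user disabled.

    `sections` maps a section name to a list of ready-made property
    lines; `disabled` is the set of section names to leave out.
    """
    lines = ["<foaf:Person>"]
    for name, properties in sections.items():
        if name not in disabled:
            lines.extend("  " + line for line in properties)
    lines.extend("  " + line for line in email_properties(email, email_choice))
    lines.append("</foaf:Person>")
    return "\n".join(lines)
```

The same function can produce the preview shown to the user and the final published text, so that what is confirmed is exactly what is published.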

1.3. Part C

Publish a list of 3rd party FOAF and FOAF-related RDF URLs related to Oxford communities and people. The list would be populated initially with existing sites found using publicly accessible searches.

By having a central point from which Oxford and Oxford-related communities can have their FOAF files linked, we encourage them to create such files and give the FOAF creation tool in Part B a richer, wider variety of FOAF to draw on.
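A published list of this kind is commonly expressed in FOAF with rdfs:seeAlso links. The sketch below assumes that approach, which the plan itself does not specify; the function name and the URLs used to try it out are placeholders.

```python
from xml.sax.saxutils import quoteattr


def see_also_links(urls):
    """Render a de-duplicated, sorted list of rdfs:seeAlso links."""
    unique = sorted(set(urls))
    return "\n".join(
        f"<rdfs:seeAlso rdf:resource={quoteattr(url)}/>" for url in unique
    )
```

Sorting and de-duplicating keeps the generated list stable between runs, which makes changes easy to spot when the list is regenerated.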

2. Issues

2.1. Confidentiality

We need to be very careful that personal information is not published without the informed consent of the individuals involved. On the user data-entry and confirmation pages we need explicit information about this.

2.2. Email addresses

Many people are hesitant to publish email addresses for fear that they will attract unwanted commercial emails, SPAM and viruses. SHA1 hashes should probably be used everywhere by default.

2.3. Data Policy

There is undoubtedly a university data policy that says what we can do with various parts of this data. We need to check we're on the right side of it.

2.4. Experimental nature of the LDAP service

LDAP is not a core service at Oxford, so normal operation of the portal must not depend on the ability to connect to LDAP. This means that the FOAF needs to be generated in advance and stored, then served from the stored copy as requested. This also prevents FOAF crawlers that accidentally hammer us from badly affecting the LDAP servers, which would make us very unpopular.
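The generate-and-store approach could look roughly like this. It is a minimal sketch assuming the FOAF is kept as files on disk; the function names are hypothetical, and `generate` stands for whatever callable queries LDAP and returns the FOAF text.

```python
import os
import tempfile


def refresh_cached_foaf(path, generate):
    """Regenerate the stored FOAF at `path`; keep the old copy on failure.

    `generate` is a hypothetical callable that queries LDAP and returns
    the FOAF document as text. It is only called from the scheduled
    refresh, never while serving a request.
    """
    try:
        content = generate()
    except Exception:
        return False  # LDAP unavailable: keep serving the stored copy
    directory = os.path.dirname(os.path.abspath(path))
    fd, temporary = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    os.replace(temporary, path)  # swap in place so readers never see a partial file
    return True


def serve_foaf(path):
    """Serve the stored copy; this never touches LDAP."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Because requests only ever read the stored file, a crawler fetching the FOAF repeatedly places no load on the LDAP servers.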

2.5. Expiry times

FOAF and RDF expect to be able to use the HTTP Expires: and Last-Modified: headers to indicate when data was generated and when it should be considered stale. It's not clear whether we have access to this within SAKAI, or whether we need to publish the FOAF and RDF outside the portal.
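If the FOAF ends up being served from somewhere that lets us set response headers, the two headers could be derived from the time the file was generated, as in this sketch. The function name and the idea of a fixed lifetime in seconds are assumptions for illustration.

```python
from email.utils import formatdate


def cache_headers(generated_at, max_age_seconds):
    """Build Last-Modified and Expires headers.

    `generated_at` is the generation time in seconds since the epoch;
    `max_age_seconds` is how long the data should be considered fresh.
    """
    return {
        "Last-Modified": formatdate(generated_at, usegmt=True),
        "Expires": formatdate(generated_at + max_age_seconds, usegmt=True),
    }
```

For example, a lifetime matching the LDAP refresh schedule would tell well-behaved crawlers not to re-fetch the file before the next regeneration.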

2.6. Is RDF/FOAF the right format?

There are a number of other options for representing the data, including eRDF [5] and TEI [6]. The former has the advantage that it can be happily embedded in normal HTML, while the latter is already widely used within the institution and can readily be used to represent former relationships as well as current ones. Generating data in these formats is not necessarily mutually exclusive.

3. RDF and FOAF namespaces

||prefix||url||docs||purpose||
||rdf||http://www.w3.org/1999/02/22-rdf-syntax-ns||http://www.w3.org/RDF/||RDF base spec||
||rdfs||http://www.w3.org/2000/01/rdf-schema||http://www.w3.org/2000/01/rdf-schema||RDF schemas||
||foaf||http://xmlns.com/foaf/0.1/||http://xmlns.com/foaf/0.1/||Friend Of A Friend||
||wot||http://xmlns.com/wot/0.1/||http://xmlns.com/wot/0.1/||Links to the GPG/PGP Web Of Trust||
||dc||http://purl.org/dc/elements/1.1/||http://dublincore.org/||Dublin Core||
||vCard||http://www.w3.org/2001/vcard-rdf/3.0||http://www.w3.org/TR/vcard-rdf||vCard spec||
||bio||http://vocab.org/bio/0.1/||http://vocab.org/bio/0.1/||bio (birth, death, marriage)||
||lang||http://f14web.com.ar/inkel/rdf/schemas/lang/1.1||http://f14web.com.ar/inkel/rdf/schemas/lang/||(natural) languages spoken||
||geo||http://www.w3.org/2003/01/geo/wgs84_pos||http://www.w3.org/2003/01/geo/||geo location||
||ns0||http://www.w3.org/2003/12/exif/ns||http://www.w3.org/2003/12/exif/||EXIF data (camera technical details)||
||ical||http://www.w3.org/2002/12/cal/ical||http://www.w3.org/2002/12/cal/ical||iCal calendaring standard||

There is a comprehensive list at http://www.schemaweb.info/schema/BrowseSchema.aspx
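To show how these prefixes end up in a document, the sketch below builds a skeleton RDF/XML file with a single foaf:Person. Note that when used as XML namespaces the rdf and rdfs URIs are normally written with a trailing "#", which is how they appear here. The function name and the placeholder person are hypothetical.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Ask ElementTree to use the conventional prefixes instead of ns0, ns1, ...
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def foaf_skeleton(person_name):
    """Return a minimal RDF/XML document containing one foaf:Person."""
    root = ET.Element(f"{{{NAMESPACES['rdf']}}}RDF")
    person = ET.SubElement(root, f"{{{NAMESPACES['foaf']}}}Person")
    name = ET.SubElement(person, f"{{{NAMESPACES['foaf']}}}name")
    name.text = person_name
    return ET.tostring(root, encoding="unicode")
```

ElementTree only declares the namespaces that are actually used in the tree, so the unused rdfs and dc prefixes do not appear in this particular output.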

4. Command lines / example data

If you have OpenLDAP installed and are within the Oxford firewall, you can see my LDAP entry with:

syeates@oucs-yeates:~/downloads/openldap$ ldapsearch -P 2 -x -h ldap.ox.ac.uk -b "ou=people,dc=ox,dc=ac,dc=uk" '(&(objectClass=*)(sn='Yeates')(givenname='stuart'))'
# extended LDIF
#
# LDAPv2
# base <ou=people,dc=ox,dc=ac,dc=uk> with scope sub
# filter: (&(objectClass=*)(sn=Yeates)(givenname=stuart))
# requesting: ALL
#

# yeatessa00, people, ox.ac.uk
dn: uniqueIdentifier=yeatessa00,ou=people,dc=ox,dc=ac,dc=uk
objectClass: oucsOrganizationalPerson
uniqueIdentifier: yeatessa00
cn: Stuart Andrew Yeates
cn: Stuart A Yeates
cn: Stuart Yeates
sn: Yeates
givenName: Stuart
displayName: Stuart Yeates
initials: SA
oucsCompletionDate: 20071126000000Z
universityBarcode: <REMOVED>
universityBarcodeCheckDigit: I
oucsPrimaryAffiliation: oucs
oucsAffiliation: oucs
oucsDepartment: oucs
oucsCollege: none
oucsStatus: staff
preferredMail: stuart.yeates@oucs.ox.ac.uk
mail: stuart.yeates@oucs.ox.ac.uk
mail: stuart.yeates@computing-services.oxford.ac.uk
mail: syeates@herald.ox.ac.uk
oucsDivision: acserv
oucsUsername: syeates
oucsPrimaryUsername: syeates

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
syeates@oucs-yeates:~/downloads/openldap$

The command to see the information for my unit is similar. I have removed the email addresses from the output, for the reasons discussed above. A significant number of units within Oxford have many (dozens at least) email addresses associated with them.

syeates@oucs-yeates:~/downloads/openldap$ ldapsearch    -P 2    -x    -h ldap.ox.ac.uk    -b "ou=units,dc=ox,dc=ac,dc=uk"    '(&(objectClass=*)(uniqueidentifier='oucs'))'
# extended LDIF
#
# LDAPv2
# base <ou=units,dc=ox,dc=ac,dc=uk> with scope sub
# filter: (&(objectClass=*)(uniqueidentifier=oucs))
# requesting: ALL
#

# oucs, units, ox.ac.uk
dn: uniqueIdentifier=oucs,ou=units,dc=ox,dc=ac,dc=uk
objectClass: oucsOrganizationalUnit
uniqueIdentifier: oucs
ou: Computing Services
cn: Computing Services
postalAddress: 13 Banbury Road, Oxford, OX2 6NN
telephoneNumber: +44 1865 273200
facsimileTelephoneNumber: +44 1865 273275
oucsSuperUnit: it
oucsDivision: acserv
oucsPreferredMailDomain: computing-services.oxford.ac.uk
oucsMailDomain: computing-services.oxford.ac.uk
oucsMailDomain: oucs.ox.ac.uk
mail: <REMOVED>@computing-services.oxford.ac.uk
mail: <REMOVED>@computing-services.oxford.ac.uk
mail: <REMOVED>@computing-services.oxford.ac.uk
....
mail: <REMOVED>@computing-services.oxford.ac.uk
mail: <REMOVED>@computing-services.oxford.ac.uk
oucsUnitURI: http://www.oucs.ox.ac.uk/

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
syeates@oucs-yeates:~/downloads/openldap$ 
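Output like the above can be turned into a simple attribute dictionary before it is mapped to FOAF. The following is a minimal sketch that handles only the plain "name: value" lines shown here; it does not handle LDIF line continuations or base64 ("::") values, and the function name and the default list of dropped attributes are assumptions.

```python
def parse_ldif_entry(text, drop=("mail", "preferredMail")):
    """Parse the first entry in ldapsearch output into {attribute: [values]}.

    Attributes named in `drop` are discarded, so email addresses never
    reach the FOAF generation step.
    """
    entry = {}
    in_entry = False
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("dn: "):
            in_entry = True
        if not in_entry:
            continue  # still in the header comments
        if not line:
            break  # a blank line ends the entry
        if line.startswith("#") or ": " not in line:
            continue  # skip comments and elided lines
        key, value = line.split(": ", 1)
        if key not in drop:
            entry.setdefault(key, []).append(value)
    return entry
```

Multi-valued attributes such as cn and oucsMailDomain are kept as lists in the order LDAP returned them.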

5. Code

I've attached a pair of scripts to this wiki page. The first generates a foaf:Group for a unit from LDAP. The second gets a list of all the units from LDAP, filters out the security-conscious ones (biomed and expsych) and sandwiches the resulting FOAF between pre-determined FOAF for the university as a whole.
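The filter-and-sandwich step of the second script can be illustrated as follows. This is not the attached script, only a sketch of the idea: the function name, the dictionary input and the header and footer arguments are hypothetical.

```python
EXCLUDED_UNITS = frozenset({"biomed", "expsych"})


def assemble_university_foaf(unit_fragments, header, footer,
                             excluded=EXCLUDED_UNITS):
    """Join per-unit FOAF fragments between a fixed header and footer.

    `unit_fragments` maps a unit identifier to its foaf:Group fragment.
    Units listed in `excluded` are left out entirely.
    """
    kept = [
        fragment
        for unit_id, fragment in sorted(unit_fragments.items())
        if unit_id not in excluded
    ]
    return "\n".join([header, *kept, footer])
```

Keeping the exclusion list as data rather than burying it in the logic makes it easy to add further units that ask not to be published.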

The SWED [7] project has an open source servlet for generating semantic web content.

The Chimera [8] project aims "to use the friend-of-a-friend vocabulary (FOAF), together with existing user vocabularies, to create a toolkit for the discovery and formation of new research communities."

7. References


The content of this wiki is licensed under the Creative Commons Attribution-ShareAlike 2.0 England & Wales Licence.

OSS Watch is funded by the Joint Information Systems Committee (JISC) and is situated within the Research Technologies Service (RTS) of the University of Oxford.