A brief overview of how the CKAN service could be used for cataloging library datasets.

William Waites
11th November 2010

About CKAN

CKAN is a dataset registry.

It is a free service that is built using Free Software written by the Open Knowledge Foundation.

It is being used by governments and community organisations to document and curate datasets.

It is also being used by the semantic web community to document the datasets that appear in the LOD (Linked Open Data) cloud diagram.

About CKAN (cont'd)

The CKAN service allows everyone to add and edit content.

Whilst datasets are publicly editable, they can belong to groups, and groups are curated.

Core Concepts: Packages

A package is synonymous with a dataset.

There are generic metadata fields, e.g. title, creator, maintainer.

There are flexible metadata fields for other attributes.

There are special typed link metadata fields for describing network-accessible resources.

Core Concepts: Package Metadata

Apart from the basic package metadata, there is a concept called "extras": free-form key-value pairs attached to a package.

The keys and values are not enforced in any way.

Best practice is to decide on conventions for keys as a community (as the LOD cloud group have done for their metadata).

The same applies to "resources", which describe links to external resources.
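To make the shape of a package concrete, here is a minimal sketch in Python. It is not CKAN's actual schema: the field names follow the wording of these slides, and every identifier, value, URL and helper function below is invented for illustration.

```python
# Hypothetical package record. All names, values and URLs are invented
# for illustration and are not taken from a real CKAN entry.
package = {
    "name": "example-library-catalogue",
    "title": "Example Library Catalogue",
    "creator": "Example Library",
    "maintainer": "Example Library Data Team",
    # "extras": free-form key-value pairs, nothing is enforced
    "extras": {
        "triples": "1500000",
        "links:dbpedia": "2300",
    },
    # "resources": typed links to network-accessible resources
    "resources": [
        {"url": "http://example.org/catalogue.rdf", "format": "application/rdf+xml"},
        {"url": "http://example.org/sparql", "format": "api/sparql"},
    ],
}


def get_extra(pkg, key, default=None):
    """Return the value of an extras key, or default if it is absent."""
    return pkg.get("extras", {}).get(key, default)


def resource_urls(pkg, fmt=None):
    """List resource URLs, optionally keeping only those with a given format."""
    return [
        r["url"]
        for r in pkg.get("resources", [])
        if fmt is None or r.get("format") == fmt
    ]
```

Because nothing is enforced, a consumer only finds the SPARQL endpoint via resource_urls(package, "api/sparql") if the community has agreed on that format string. This is why shared conventions matter.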

Core Concepts: Groups

Unlike packages, groups are curated.

Anyone can create a group and set whatever criteria they like for packages to be included.

The group curators determine what it means to be so included.

Core Concepts: Group Example

The LOD cloud group has a set of criteria (size, number of links to other datasets, etc.).

Membership in the group means inclusion in the cloud diagram.

The inclusion criteria are documented with package metadata fields.
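As a rough sketch of how a curator might check such criteria against package metadata, the Python below reads two kinds of extras. The key names ("triples", "links:...") and the thresholds are assumptions made up for this example. They are not the LOD cloud group's actual rules.

```python
def parse_int(value, default=0):
    """Extras values are unvalidated strings, so parse defensively."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def meets_criteria(extras, min_triples=1000, min_links=50):
    """Check a package's extras against hypothetical size and link thresholds.

    Assumed conventions (illustrative only):
      "triples"        -> dataset size
      "links:<target>" -> number of links to another dataset
    """
    triples = parse_int(extras.get("triples"))
    links = sum(
        parse_int(value)
        for key, value in extras.items()
        if key.startswith("links:")
    )
    return triples >= min_triples and links >= min_links
```

A package with extras {"triples": "5000", "links:dbpedia": "120"} passes these made-up thresholds. One with a missing or malformed "triples" value does not.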

Linked Data

Question: this all sounds very much like it might be better modelled with RDF, right?

Yes, but:

  • When CKAN was started, the available triplestores weren't sufficiently stable or scalable
  • Vocabularies for describing datasets (DCAT, voiD) are quite recent

Package (soon group) metadata is available as RDF at

This data is linked from the main CKAN package pages.

Caveat: "extras" fields need to be mapped to RDF predicates on a case-by-case basis.
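The case-by-case mapping could look something like the Python sketch below. It is an illustration under stated assumptions, not how CKAN generates its RDF: the extras keys and the subject URI are hypothetical, and only the two voiD predicates (void:triples, void:sparqlEndpoint) come from a real vocabulary.

```python
VOID = "http://rdfs.org/ns/void#"

# Hand-maintained table: one entry per extras key the community has agreed on.
# The keys on the left are hypothetical conventions.
EXTRAS_TO_PREDICATE = {
    "triples": VOID + "triples",
    "sparql_endpoint": VOID + "sparqlEndpoint",
}


def extras_to_triples(subject, extras):
    """Turn extras into (subject, predicate, object) tuples.

    Keys with no agreed mapping are returned separately so that they
    can be reviewed rather than silently dropped.
    """
    triples, unmapped = [], []
    for key in sorted(extras):
        predicate = EXTRAS_TO_PREDICATE.get(key)
        if predicate is None:
            unmapped.append(key)
        else:
            triples.append((subject, predicate, extras[key]))
    return triples, unmapped
```

Calling extras_to_triples("http://example.org/package/example", {"triples": "1500000", "shelfmark": "X1"}) yields one triple and reports "shelfmark" as unmapped, which is the point where a community decision on a predicate is needed.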

Concrete Suggestions

  • create an lld group on CKAN
  • create a wiki page for guidance on extras conventions (perhaps starting with the LOD guidance)
  • start adding datasets!