COMP60411: Modelling Data on the Web

This is the web page where you will find news and information about COMP60411 for 2019/20. See also the course unit's page in the syllabus.


The course is taught by Tim Morris and Uli Sattler.

If you have any questions that might be of interest to others, please feel free to post them on the attendant Blackboard site.

Coursework and Timing:

The course starts on Monday, September 23rd, 2019, at 9:00 in room 2.15 with lectures and labs. We plan to finish by 4:30 pm.

The deadline for handing in the coursework for the

Coursework, announcements, feedback, and discussions will be handled via Blackboard.

For most coursework, it can be helpful to use the <oXygen/> XML editor. Its makers have given us a free group licence, which is available in Blackboard: check the Week 2 Forum.

Late coursework:

If you have mitigating circumstances (either for lateness or for any other issue), you should fill in the mitigating circumstances form and hand it in to the student support office. The instructors and teaching assistants do not grant extensions or resits for coursework directly: you need to go through the mitigating circumstances committee. (Feel free to come and talk to us about problems you are having as early as possible. We will help you navigate the system, but we will adhere to it.) If you do not have mitigating circumstances, you will receive 0 marks for work handed in late, regardless of the reason (because we want to discuss the coursework after the deadline).

Concerning Literature:

Please note that all books used for this course are available in the resource center or online (see below in the table for the readings of each week). If you prefer to buy them, please note that

Schedule and materials:

Please note that this course unit is a severely modified version of previous editions: we have dropped a substantial number of topics, and added other topics, so please do not rely too heavily on exams, tales, or additional material from previous years!

Teaching assistants (TAs) will be in the MSc Lab between 14:00 and 15:00 Tuesdays to Fridays to help with coursework and other confusions. Tim and Uli also tend to drop in frequently. Also, always subscribe to and keep an eye on the Blackboard forums: your question is probably already discussed there!

Week Date Topic(s) Resources/Reading Slides
1 Sept. 23, 2019 Course organisation
Tables and Relations:
  1. Data Structure: tables
  2. Schema Language: CSVW & SQL
  3. Data Manipulation: Python & SQL
Blackboard area
Learning SQL
Week 1 and as a pdf document
2 Sept. 30, 2019 Tree data models:
  1. Data Structure formalisms: JSON
  2. Schema Language: JSON Schema
  3. Data Manipulation: Python
Self-Describing, Variability, Nesting
Blackboard area

JSON, XML, and Schemas
The Essence of XML

JSON Schema
A JSON Schema validator

Early Clark on XML Namespaces
Later Clark on XML Namespaces
Namespace Myths
Namespaces FAQ (Very extensive!)
Week 2
3 Oct. 7, 2019 Tree data models:
  1. Data Structure formalisms: XML
  2. Schema Language: RelaxNG
  3. Data Manipulation: DOM, XPath
Variability, Nesting, Self-Describing
Blackboard area

JSON, XML, and Schemas
Learning XML
XML in a Nutshell
XML Specification
XML Namespaces 1.0 Recommendation
Designing Extensible, Versionable XML Formats

Taxonomy of XML schema languages using formal language theory
XPath Rec
XPath Functions

Week 3
4 Oct. 14, 2019 Tree data models, XML continued:
  1. Data Structure formalisms: XML
  2. Schema Language: Schematron
  3. Data Manipulation: SAX
Tree data models, JSON:
  1. Data Structure formalisms: JSON
  2. Schema Language: JSON Schema
  3. Data Manipulation: Python
Robustness and Error Handling
Blackboard area

XQuery formal semantics (heavy going)
The Essence of XML
Influence on the Design of XQuery

Comparing XML Schema Languages
Refining the Taxonomy of XML Schema Languages. A new Approach for Categorizing XML Schema Languages in Terms of Processing Complexity.
Week 4 Slides
5 Oct. 21, 2019 Graph data models:
  1. Data Structure formalisms: RDF
  2. Schema Language: RDFS (and Schematron for XML)
  3. Data Manipulation: SPARQL
Robustness, Retrospective
Blackboard area

RDF - Resource Description Framework
RDFS - RDF Schema 1.1 Specification
SPARQL Property Path Expressions

SPARQL endpoint for DBpedia
SPARQL endpoint for Wikidata
Learning SPARQL

Schematron: validating XML using XSLT
Error handling and Web language design

Week 5 Slides
Oct. 28-Nov. 1: Reading Week.