Best practices for Linked Data - W3C

Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Avda. Montepríncipe s/n, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
Phone: 34.91.3367417, Fax: 34.91.3524819

Acknowledgements: M. Poveda, V. Rodríguez-Doncel, D. Vila
BabeLData: TIN2010-17550

Linked Data: why is it important?
•  Facilitate data integration
   §  From heterogeneous sources
   §  In different formats
   §  At different granularity
   §  In different languages
   §  From different countries

© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig

Asunción Gómez-Pérez

W3C @ Spain – 2013 Madrid, 18th December

Data Integration

[Figure: Separate databases (AEMET, VIAF, BNE, IGN, Prisa, DBpedia) describe the same entities with different labels and in different languages. Nodes: M. Cervantes, El Quijote / Don Quixote, Alcalá de Henares, BNE, VIAF, 1605, 1960, Hebrew, guide, Tapas, Siglo de Oro, temperature 20º. Relations: author / creator, birthPlace, year of publication, translated into, located in. "Same as" links connect the descriptions across sources.]

Foundations
•  Unique identifiers: URIs identify or name a resource
•  RDF(S) models
•  Equivalence links to other datasets (Same As)
•  Data navigation

[Figure: Cervantes (http://datos.bne.es/resource/XX1718747) is a Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005) and is creator of El Quijote (http://datos.bne.es/resource/XX3383563), which is a Work (http://iflastandards.info/ns/fr/frbr/frbrer/C1001). Same As links connect Cervantes to http://viaf.org/viaf/17220427 and http://dbpedia.org/resource/Miguel_de_Cervantes.]

http://www.w3.org/DesignIssues/LinkedData.html
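The foundations above (URIs as names, statements as triples, Same As links for navigation between datasets) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: triples are modelled as tuples, the `IS_CREATOR_OF` property URI is hypothetical, and the endpoints of the two Same As links are my reading of the figure. The resource and class URIs are the ones shown on the slide.

```python
# Triples are (subject, predicate, object) tuples of URI strings.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FRBR_WORK = "http://iflastandards.info/ns/fr/frbr/frbrer/C1001"
FRBR_PERSON = "http://iflastandards.info/ns/fr/frbr/frbrer/C1005"
IS_CREATOR_OF = "http://example.org/isCreatorOf"  # hypothetical property URI

BNE_CERVANTES = "http://datos.bne.es/resource/XX1718747"
BNE_QUIJOTE = "http://datos.bne.es/resource/XX3383563"
VIAF_CERVANTES = "http://viaf.org/viaf/17220427"
DBPEDIA_CERVANTES = "http://dbpedia.org/resource/Miguel_de_Cervantes"

TRIPLES = [
    (BNE_CERVANTES, RDF_TYPE, FRBR_PERSON),
    (BNE_QUIJOTE, RDF_TYPE, FRBR_WORK),
    (BNE_CERVANTES, IS_CREATOR_OF, BNE_QUIJOTE),
    # Assumed endpoints of the two Same As arrows in the figure.
    (BNE_CERVANTES, OWL_SAME_AS, VIAF_CERVANTES),
    (BNE_CERVANTES, OWL_SAME_AS, DBPEDIA_CERVANTES),
]


def equivalents(uri, triples):
    """Return every URI reachable from `uri` via owl:sameAs, in either direction."""
    seen = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if predicate != OWL_SAME_AS:
                continue
            # sameAs is symmetric, so follow the link both ways.
            for source, target in ((subject, obj), (obj, subject)):
                if source == current and target not in seen:
                    seen.add(target)
                    frontier.append(target)
    return seen


if __name__ == "__main__":
    for uri in sorted(equivalents(VIAF_CERVANTES, TRIPLES)):
        print(uri)
```

Starting from the VIAF identifier, the traversal reaches the BNE and DBpedia identifiers as well, which is the kind of data navigation the slide refers to.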


The model (Ontology) and the data for humans

[Figure: Two levels. Ontology: the classes Work, Person, Place, Library, Year and Language, related by "is creator of", "birthPlace", "publication date", "translation", "located at" and "has subject". Data: Cervantes is creator of El Quijote and has birthPlace Alcalá de Henares; further nodes are 1960, Catalán (translation), BNE and Vida de Cervantes, connected by "publication date", "has subject" and "located in".]

The model and the data for Machines

[Figure: The same two levels, with the nodes identified by URIs.
Ontology: http://iflastandards.info/ns/fr/frbr/frbrer/C1001, http://iflastandards.info/ns/fr/frbr/frbrer/C1002, http://iflastandards.info/ns/fr/frbr/frbrer/C1005 (Person), http://geo.linkeddata.es/ontology/Municipio, http://xmlns.com/foaf/0.1/Organization (Library), with the properties translation, is creator of, birthPlace, publication date, has subject and located in.
Data: http://datos.bne.es/resource/XX3383563 (Don Quijote de la Mancha), http://datos.bne.es/resource/XX1718747 (Cervantes Saavedra, Miguel de), http://datos.bne.es/resource/XX1924295 (Catalán), http://geo.linkeddata.es/resource/Alcalá de Henares, http://datos.bne.es/resource/bimo0002045496 (Vida de Miguel de Cervantes Saavedra), http://datos.bne.es/# (BNE) and the literal 1960.]

Linked Data is to be processed by machines


The generation process

[Figure: many providers, domains, sources and languages]

The Linked Data Generation Process

[Figure: process cycle with the activities Specification, Modelling, Generation, Linking, Publication and Exploitation, plus Data Curation]

There is no One-Size-Fits-All Formula

Lots of data in many domains …
•  Music
•  On-line activities
•  Publications
•  E-Gov
•  Cross-domains
•  Geographic
•  Life Sciences

I want to use Linked Open Data

§  Who generated the LD dataset?
§  When was the LD dataset created?
§  How was the LD dataset created?
§  Is this the latest version of the LD dataset?
§  Is the license information clearly stated in the LD dataset?
§  How are the LD licenses offered?
§  Is the LD dataset monolingual or multilingual?
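A consumer could turn these questions into an automated check on a dataset description. The sketch below is a minimal illustration, not a method from the talk: the description is assumed to be a plain dictionary keyed by Dublin Core Terms property names, the mapping from each question to a property is my own choice, and the example values are hypothetical.

```python
# Assumed mapping from each consumer question to a Dublin Core Terms property.
QUESTIONS = {
    "dcterms:creator": "Who generated the LD dataset?",
    "dcterms:created": "When was the LD dataset created?",
    "dcterms:provenance": "How was the LD dataset created?",
    "dcterms:modified": "Is this the latest version of the LD dataset?",
    "dcterms:license": "Is the license information clearly stated?",
    "dcterms:language": "Is the LD dataset monolingual or multilingual?",
}


def unanswered(description):
    """Return the questions whose property is missing or empty in `description`."""
    return [
        question
        for prop, question in QUESTIONS.items()
        if not description.get(prop)
    ]


# Hypothetical dataset description with only two of the six properties filled in.
EXAMPLE = {
    "dcterms:creator": "http://example.org/org/some-library",
    "dcterms:license": "http://example.org/licenses/open",
}

if __name__ == "__main__":
    for question in unanswered(EXAMPLE):
        print("Missing metadata:", question)
```

For the example description, four of the six questions remain open, which is the situation the following slides report for many published datasets.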

LOD observations

•  How does the LD generation process influence the use of the data by third parties?
•  Vocabularies
•  Licenses
•  Language
•  Provenance

How to prevent GIGO (garbage in, garbage out)


Vocabularies


Cervantes at the data level

[Figure: Five URIs share the name "Cervantes":
http://www.server1.org/resource/Cervantes
http://www.server2.es/resource/Cervantes
http://d-nb.info/gnd/11851993X
http://datos.bne.es/resource/XX1718747
http://geo.linkeddata.es/page/resource/Municipio/Cervantes
Same as links connect some of them. The attached data differ in kind: Author (D. Quijote), Date of Birth, Phone (914 296 093), #People and Size (276,4 km²); the value 1547 appears twice.]

Cervantes and a bit of semantics

[Figure: rdf:type statements tell the five "Cervantes" URIs apart. Types shown: Restaurant, Person, Street, Municipality.
http://www.server1.org/resource/Cervantes
http://www.server2.es/resource/Cervantes
http://d-nb.info/gnd/11851993X (Same as http://datos.bne.es/resource/XX1718747; Author of D. Quijote; Date of Birth 1547)
http://geo.linkeddata.es/page/resource/Municipio/Cervantes]
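The effect of adding rdf:type can be shown with a small Python sketch. It assumes a plain dictionary from URI to type label. The Person and Municipality assignments follow the slide; which of the two server URIs is the restaurant and which is the street is not recoverable from the figure, so that pairing is an assumption made only for illustration.

```python
# URI -> type label. All five resources carry the same name, "Cervantes".
TYPES = {
    "http://d-nb.info/gnd/11851993X": "Person",
    "http://datos.bne.es/resource/XX1718747": "Person",
    "http://geo.linkeddata.es/page/resource/Municipio/Cervantes": "Municipality",
    "http://www.server1.org/resource/Cervantes": "Restaurant",  # assumed pairing
    "http://www.server2.es/resource/Cervantes": "Street",       # assumed pairing
}


def resources_of_type(wanted, types):
    """Return the URIs typed as `wanted`, sorted for stable output."""
    return sorted(uri for uri, label in types.items() if label == wanted)


if __name__ == "__main__":
    # A search for the name alone matches all five URIs;
    # filtering on the type keeps only the writer.
    for uri in resources_of_type("Person", TYPES):
        print(uri)
```

Without the type information, a query for "Cervantes" cannot distinguish the author from the municipality; with it, the two Person URIs are the only matches.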

Cervantes (Person)

[Figure: the FOAF vocabulary and an instance of it.
Classes: owl:Thing, foaf:Agent, foaf:Group, foaf:Organization, foaf:Person, foaf:Document, foaf:Image.
Properties: foaf:mbox, foaf:publications, foaf:firstName, foaf:surname, foaf:img, foaf:birthday, foaf:knows, foaf:depiction, foaf:homepage.
Instance: bibliothek:Cervantes, with foaf:firstName "Miguel", foaf:surname "de Cervantes Saavedra" and foaf:birthday "29-09". Through foaf:img, foaf:publications and foaf:depiction it also points to http://.../authors/cervantes.png, http://www.BibliothekBerlin.com/.../3-538-06892-5 and http://www.BibliothekBerlin/…/images/Quixote.tif. instanceOf arrows link the instances to the classes.]

License Information


LOD observations: Licenses

How Open is the Open Linked Data Cloud?

An example: the British National Bibliography

License Information is not up to date

Metadata information without license information

License information provided as XML

Linked Data Rights pattern

http://oeg-dev.dia.fi.upm.es/licensius/static/ldr/

Language

Rationale: LOD is dominated by the English language
§  2007
§  2009
§  2013

Questions:
1.  Searching resources in a particular language
2.  Distribution of natural languages across RDF datasets?
3.  Usage of language tags to indicate the natural language of RDF literals?
    1.  Distribution of usage of language tags
    2.  Distribution of literals tagged as English vs other languages
    3.  Distribution of literals tagged in languages other than English

Example of multilingual library resource
The dataset publisher does not tag the language of the content of different fields.

[Figure: MARC 21 records for "Ernest Hemingway" and "El viejo y el mar"]


Multilingualism and the Linked Data Process

How to represent language information for datasets?

    # VoiD description
    :bne a void:Dataset;
        dcterms:language .

    # DCAT description
    :bne a dcat:Dataset;
        dcterms:language

How to represent language information in Linked Data?

§  Traditional annotation properties for most cases

    dbpedia:Miguel_de_Cervantes rdfs:label "Miguel de Cervantes"@es ,
        "ミゲル・デ・セルバンテス"@ja ,
        "미겔 데 세르반테스"@ko .

§  Richer models for more demanding applications

    # LEMON
    isbd:T1001 lemon:isReferenceOf [lemon:isSenseOf :cartographic].
    :cartographic a lemon:LexicalEntry;
        lemon:form [lemon:writtenRep "cartográfico"@es;
                    isocat:grammaticalGender isocat:masculine];
        lemon:form [lemon:writtenRep "cartográfica"@es;
                    isocat:grammaticalGender isocat:feminine].
    isocat:grammaticalGender rdfs:subPropertyOf lemon:property.
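Language tags on literals are what make both label selection and the language statistics in this talk possible. The following Python sketch illustrates that under simple assumptions: each literal is modelled as a (text, language tag) pair, with `None` standing for a literal published without a tag. The three tagged labels are the ones from the rdfs:label example; the untagged title mirrors the MARC 21 example, and treating it as part of the same list is only for illustration.

```python
from collections import Counter

# (text, language tag) pairs; None marks a literal published without a tag.
LITERALS = [
    ("Miguel de Cervantes", "es"),
    ("ミゲル・デ・セルバンテス", "ja"),
    ("미겔 데 세르반테스", "ko"),
    ("El viejo y el mar", None),
]


def tag_distribution(literals):
    """Count literals per language tag, grouping missing tags as 'untagged'."""
    return Counter(tag if tag is not None else "untagged" for _, tag in literals)


def label_for(literals, language):
    """Return the first literal tagged with `language`, or None if there is none."""
    for text, tag in literals:
        if tag == language:
            return text
    return None


if __name__ == "__main__":
    print(dict(tag_distribution(LITERALS)))
    print(label_for(LITERALS, "ja"))
```

An application asking for a Japanese label can be served directly, while the untagged title can only be matched by guessing its language, which is the problem the MARC 21 example points at.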


Implementation of the recording of data and metadata provenance
•  Resource provenance: DC
•  Generation process: PROV-O @W3C

[Figure: File.txt carries the metadata creator (John), creationDate (12-2-1900) and rights (GPL). Further labels: used, Revision Process, generatedBy, Filev1.txt, PROVENANCE Model (RDF(S)), RDF Store.]

Conclusions

The use of
§  Curated data
§  Widely known vocabularies
§  License metadata in RDF
§  Language metadata in RDF
§  Provenance metadata in RDF
will influence the use of the linked data by third parties.


Thanks for your attention!


Guidelines for Multilingual Linked Data. WIMS – 2013 Madrid, 12-14 June


There is no One-Size-Fits-All Formula

[Table: resources used in each phase for the datasets BNE, PRISA, INE, AEMET and IGN.
Modeling: DC, WGS84, time, Scovo, SSN ontology, SIOC, Data Cube, hydrontology
RDF generation: MARiMbA, NOR2O, geometry2rdf, CSV parser
Links generation: Silk, linking to DNB, VIAF, LIBRIS, DBpedia, Geonames and Geolinkeddata.es
Publication: Pubby, sitemap4rdf
Exploitation: SPARQL, map4rdf]

http://oa.upm.es/14465/1/2.formulaLD.pdf

The multilingual Web of Data: Current state

1. Number of monolingual and multilingual datasets
[Chart: monolingual vs multilingual datasets in January 2012, June 2012 and December 2012. Values shown: 349, 635, 676; 1,906, 2,201, 1,984.]

2. Current usage of language tagging capabilities in RDF

                  RDF literals with language tag   RDF literals without language tag
January 2012      2,567,324                        10,250,936
June 2012         3,154,779                        10,594,338
December 2012     3,365,930                        12,272,806

3. English tags versus other languages' tags
[Chart: RDF literals with English tag vs RDF literals with other language tag. Values shown: 431,660 and 2,135,664 (January 2012); 403,714 and 2,751,065 (June 2012); 557,785 and 2,808,145 (December 2012).]

4. Evolution of top-10 languages
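As a worked check on chart 2, the share of literals that carry a language tag can be computed from the counts on the slide. The pairing of counts with dates and series below is an assumption recovered from the chart values: for each date, the tagged total equals the sum of the two values shown in chart 3 (for example 431,660 + 2,135,664 = 2,567,324), and the remaining three counts are taken as the untagged series in chronological order.

```python
# date -> (literals with a language tag, literals without one); pairing assumed.
COUNTS = {
    "January 2012": (2_567_324, 10_250_936),
    "June 2012": (3_154_779, 10_594_338),
    "December 2012": (3_365_930, 12_272_806),
}


def tagged_share(tagged, untagged):
    """Fraction of all literals that carry a language tag."""
    return tagged / (tagged + untagged)


SHARES = {date: tagged_share(*pair) for date, pair in COUNTS.items()}

if __name__ == "__main__":
    for date, share in SHARES.items():
        print(f"{date}: {share:.1%} of literals carry a language tag")
```

Under this reading, between 20% and 23% of literals were language-tagged at each of the three dates, so most literals gave consumers no explicit indication of their language.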