Best practices for Linked Data
Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Avda. Montepríncipe s/n, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
[email protected] Phone: 34.91.3367417, Fax: 34.91.3524819
Acknowledgements: M. Poveda, V. Rodríguez-Doncel, D. Vila
BabeLData: TIN2010-17550
Linked Data: why is it important?
• Facilitates data integration
§ From heterogeneous sources
§ In different formats
§ At different granularities
§ In different languages
§ From different countries
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Asunción Gómez-Pérez
W3C @ Spain – 2013 Madrid, 18th December
[Figure: Data integration. Databases (BD) from AEMET, VIAF, BNE, IGN, Prisa and DBpedia describe the same entities (El Quijote / Don Quixote, M. Cervantes, Alcalá de Henares) with properties in different languages, such as autor / creator, año de publicación / year of publication, birthPlace, ubicado en / located, translated into and temperatura (20º), and are connected through "Same as" links. Other labels in the diagram: guía, Hebrew, Tapas, Siglo de Oro, 1605, 1960.]
Foundations
• Unique identifiers (URIs) identify or name a resource
• RDF(S) models
• Equivalence links to other datasets (Same As)
• Data navigation

[Figure: Cervantes (http://datos.bne.es/resource/XX1718747) is a Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005) and is creator of El Quijote (http://datos.bne.es/resource/XX3383563), which is a Work (http://iflastandards.info/ns/fr/frbr/frbrer/C1001). Same As links connect the BNE resource for Cervantes to http://viaf.org/viaf/17220427 and http://dbpedia.org/resource/Miguel_de_Cervantes.]

http://www.w3.org/DesignIssues/LinkedData.html
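As an illustration of how Same As links support data navigation, the sketch below collects every URI that is equivalent to a starting URI. The three URIs come from the slide; representing "Same As" with owl:sameAs and treating the links as symmetric are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed property for "Same As"

# Equivalence links shown on the slide for Cervantes.
TRIPLES = [
    ("http://datos.bne.es/resource/XX1718747", SAME_AS,
     "http://viaf.org/viaf/17220427"),
    ("http://datos.bne.es/resource/XX1718747", SAME_AS,
     "http://dbpedia.org/resource/Miguel_de_Cervantes"),
]


def equivalents(uri, triples):
    """Return every URI reachable from `uri` through sameAs links, in either direction."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            graph[s].add(o)
            graph[o].add(s)
    seen, stack = {uri}, [uri]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


for uri in sorted(equivalents("http://viaf.org/viaf/17220427", TRIPLES)):
    print(uri)
```

Starting from the VIAF URI, the traversal reaches the BNE and DBpedia descriptions of the same person, which is what lets a client merge data from the three datasets.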
The model (Ontology) and the data for humans

[Figure: Ontology level: Person, Work, Place, Library, Year and Idiom, related by Is creator of, birthPlace, translation, Publication date, Has subject and Located at. Data level: instances include Cervantes, El Quijote, Alcalá de Henares, Vida de Cervantes, BNE and a Catalán translation (1960), related by the same properties; for example, Cervantes is creator of El Quijote.]
The model and the data for Machines

[Figure: the same model and data, with resources named by URIs. Ontology level: work (http://iflastandards.info/ns/fr/frbr/frbrer/C1001), http://iflastandards.info/ns/fr/frbr/frbrer/C1002, Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005), http://geo.linkeddata.es/ontology/Municipio and Biblioteca (http://xmlns.com/foaf/0.1/Organization), related by Is creator of, birthPlace, translation, Publication date, Has subject and Located in. Data level: Don Quijote de la Mancha (http://datos.bne.es/resource/XX3383563), Cervantes Saavedra, Miguel de (http://datos.bne.es/resource/XX1718747), Vida de Miguel de Cervantes Saavedra (http://datos.bne.es/resource/bimo0002045496), http://datos.bne.es/resource/XX1924295, http://geo.linkeddata.es/resource/Alcalá de Henares, Catalán, 1960 and BNE.]

http://datos.bne.es/#
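To make the machine-readable view concrete, the following sketch serialises a few of the statements above as N-Triples lines. The resource and class URIs are taken from the slide; the property URI for "Is creator of" is hypothetical (the slide shows only its label), and using rdfs:label for the names is an assumption.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # assumed for the names
FRBRER = "http://iflastandards.info/ns/fr/frbr/frbrer/"
BNE = "http://datos.bne.es/resource/"

# Hypothetical property URI: the slide only shows the label "Is creator of".
IS_CREATOR_OF = "http://example.org/vocab/isCreatorOf"


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) terms as N-Triples lines."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


cervantes = iri(BNE + "XX1718747")
quijote = iri(BNE + "XX3383563")

triples = [
    (cervantes, iri(RDF_TYPE), iri(FRBRER + "C1005")),  # Person
    (quijote, iri(RDF_TYPE), iri(FRBRER + "C1001")),    # Work
    (cervantes, iri(IS_CREATOR_OF), quijote),
    (cervantes, iri(RDFS_LABEL), literal("Cervantes Saavedra, Miguel de")),
    (quijote, iri(RDFS_LABEL), literal("Don Quijote de la Mancha")),
]

for line in to_ntriples(triples):
    print(line)
```

Every term except the two labels is a URI, so a program can follow it to obtain more data without interpreting any natural-language text.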
Linked Data is to be processed by machines
The generation process
• Providers
• Domains
• Sources
• Languages
The Linked Data Generation Process
• Specification
• Modelling
• Generation
• Linking
• Publication
• Exploitation
• Data Curation
There is no One-Size-Fits-All Formula
Lots of data in many domains …
• Music
• On-line activities
• Publications
• E-Gov
• Cross-domains
• Geographic
• Life Sciences
I want to use Linked Open Data
§ Who generated the LD dataset?
§ When was the LD dataset created?
§ How was the LD dataset created?
§ Is this the latest version of the LD dataset?
§ Is the license information clearly stated in the LD dataset?
§ How is the LD license offered?
§ Is the LD dataset monolingual or multilingual?
LOD observations
• How does the LD generation process influence the use of the data by third parties?
  • Vocabularies
  • Licenses
  • Language
  • Provenance
How to prevent GIGO (garbage in, garbage out)
[Figure labels: GARBAGE, PROCESS]
Vocabularies
Cervantes at the data level
[Figure: five URIs, all labelled "Cervantes": http://www.server1.org/resource/Cervantes, http://d-nb.info/gnd/11851993X, http://datos.bne.es/resource/XX1718747, http://www.server2.es/resource/Cervantes and http://geo.linkeddata.es/page/resource/Municipio/Cervantes. They carry different properties (Author: D. Quijote; Date of Birth: 1547; Phone: 914 296 093; #People; Size: 276,4 km²) and are connected by Same as links.]
Cervantes and a bit of semantics
[Figure: the same five URIs (http://www.server1.org/resource/Cervantes, http://d-nb.info/gnd/11851993X, http://datos.bne.es/resource/XX1718747, http://www.server2.es/resource/Cervantes and http://geo.linkeddata.es/page/resource/Municipio/Cervantes), each now given an rdf:type: Restaurant, Person, Street or Municipality. The labels Same as, Author (D. Quijote) and Date of Birth (1547) remain in the diagram.]
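A minimal sketch of how rdf:type lets a program tell apart resources that share the label "Cervantes". The class URIs for Person and Municipio appear elsewhere in these slides; the Restaurant class URI and the pairing of each URI with its type are assumptions made for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

PERSON = "http://iflastandards.info/ns/fr/frbr/frbrer/C1005"
MUNICIPALITY = "http://geo.linkeddata.es/ontology/Municipio"
RESTAURANT = "http://example.org/vocab/Restaurant"  # hypothetical class URI

# Assumed pairing of URIs and types; the slide does not state it explicitly.
TRIPLES = [
    ("http://datos.bne.es/resource/XX1718747", RDF_TYPE, PERSON),
    ("http://geo.linkeddata.es/page/resource/Municipio/Cervantes", RDF_TYPE, MUNICIPALITY),
    ("http://www.server1.org/resource/Cervantes", RDF_TYPE, RESTAURANT),
]


def resources_of_type(triples, class_uri):
    """Return the subjects declared as instances of `class_uri`, sorted."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == class_uri)


print(resources_of_type(TRIPLES, PERSON))
```

A search for the writer can then keep only the resources typed as Person and ignore the restaurant and the municipality, even though all of them are called "Cervantes".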
Cervantes (Person)
[Figure: the foaf vocabulary and an instance. Classes: owl:Thing, foaf:Agent, foaf:Group, foaf:Organization, foaf:Person, foaf:Document and foaf:Image. Properties: foaf:mbox, foaf:publications, foaf:firstName, foaf:surname, foaf:birthday, foaf:img, foaf:knows, foaf:depiction and foaf:homepage. Instance: bibliothek:Cervantes, with foaf:firstName "Miguel", foaf:surname "de Cervantes Saavedra" and foaf:birthday "29-09", linked through foaf:img, foaf:publications and foaf:depiction to http://.../authors/cervantes.png, http://www.BibliothekBerlin.com/.../3-538-06892-5 and http://www.BibliothekBerlin/…/images/Quixote.tif.]
License Information
LOD observations: Licenses
How Open is the Open Linked Data Cloud?
An example: the British National Bibliography
License Information is not up to date
Metadata information without license information
License information provided as XML
Linked Data Rights pattern
http://oeg-dev.dia.fi.upm.es/licensius/static/ldr/
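The observations above can be turned into a simple check: does the metadata of a dataset state its license as a URI that software can compare, rather than as free text? The sketch below is one possible check, not the Linked Data Rights pattern itself. The dataset URIs and sample metadata are hypothetical; the three property URIs are the Dublin Core and Creative Commons terms commonly used for this purpose.

```python
LICENSE_PROPERTIES = {
    "http://purl.org/dc/terms/license",
    "http://purl.org/dc/terms/rights",
    "http://creativecommons.org/ns#license",
}

# Hypothetical dataset descriptions used only for illustration.
METADATA = [
    ("http://example.org/dataset/a", "http://purl.org/dc/terms/license",
     "http://creativecommons.org/publicdomain/zero/1.0/"),
    ("http://example.org/dataset/b", "http://purl.org/dc/terms/rights",
     "See the website for terms of use"),
]


def license_statements(dataset_uri, triples):
    """Return the (property, value) pairs that describe the license of a dataset."""
    return [(p, o) for s, p, o in triples
            if s == dataset_uri and p in LICENSE_PROPERTIES]


def has_machine_readable_license(dataset_uri, triples):
    """True when at least one license statement points to a URI, not free text."""
    return any(value.startswith(("http://", "https://"))
               for _, value in license_statements(dataset_uri, triples))


for dataset in ("http://example.org/dataset/a", "http://example.org/dataset/b"):
    print(dataset, has_machine_readable_license(dataset, METADATA))
```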
Language
Rationale: LOD is dominated by the English language
§ 2007
§ 2009
§ 2013
Questions:
1. Searching resources in a particular language
2. Distribution of natural languages across RDF datasets?
3. Usage of language tags to indicate the natural language of RDF tags?
   1. Distribution of usage of language tags
   2. Distribution of literals tagged as English vs other languages
   3. Distribution of literals tagged in languages other than English
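One way to approach the third question is to count language tags over the literals of a dataset. The sketch below does this for N-Triples lines with a regular expression. It is a simplification, not the method used in the study: typed literals and URI objects are skipped, and the sample lines (in particular the example.org subject) are assumptions.

```python
import re
from collections import Counter

# A literal with a language tag at the end of an N-Triples statement.
LANG_TAGGED = re.compile(r'"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$')
# A plain literal with neither language tag nor datatype.
PLAIN_LITERAL = re.compile(r'"\s*\.\s*$')

LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
CERVANTES = "<http://dbpedia.org/resource/Miguel_de_Cervantes>"

SAMPLE = [
    f'{CERVANTES} {LABEL} "Miguel de Cervantes"@es .',
    f'{CERVANTES} {LABEL} "ミゲル・デ・セルバンテス"@ja .',
    f'{CERVANTES} {LABEL} "미겔 데 세르반테스"@ko .',
    # Hypothetical untagged literal, as in the library record example.
    f'<http://example.org/book/1> {LABEL} "El viejo y el mar" .',
]


def language_distribution(ntriples_lines):
    """Count literals per language tag; untagged plain literals get their own bucket."""
    counts = Counter()
    for line in ntriples_lines:
        line = line.strip()
        tagged = LANG_TAGGED.search(line)
        if tagged:
            counts[tagged.group(1).lower()] += 1
        elif PLAIN_LITERAL.search(line):
            counts["(untagged)"] += 1
    return counts


print(language_distribution(SAMPLE))
```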
Example of multilingual library resource
The dataset publisher does not tag the language of the content of the different fields
“Ernest Hemingway” and “El viejo y el mar” MARC 21 records
Multilingualism and the Linked Data Process
How to represent language information for datasets?

# VoID description
:bne a void:Dataset;
    dcterms:language .
# DCAT description
:bne a dcat:Dataset;
    dcterms:language
How to represent language information in Linked Data?

§ Traditional annotation properties for most cases

dbpedia:Miguel_de_Cervantes rdfs:label "Miguel de Cervantes"@es ,
    "ミゲル・デ・セルバンテス"@ja ,
    "미겔 데 세르반테스"@ko .

§ Richer models for more demanding applications

# LEMON
isbd:T1001 lemon:isReferenceOf [lemon:isSenseOf :cartographic].
:cartographic a lemon:LexicalEntry;
    lemon:form [lemon:writtenRep "cartográfico"@es;
                isocat:grammaticalGender isocat:masculine];
    lemon:form [lemon:writtenRep "cartográfica"@es;
                isocat:grammaticalGender isocat:feminine].
isocat:grammaticalGender rdfs:subPropertyOf lemon:property.
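Language-tagged labels such as the rdfs:label values above let an application choose the text to display according to the user's language. A minimal sketch follows; the function name, the (text, language) pair representation and the fallback behaviour are assumptions for illustration.

```python
# Labels of dbpedia:Miguel_de_Cervantes as (text, language tag) pairs.
LABELS = [
    ("Miguel de Cervantes", "es"),
    ("ミゲル・デ・セルバンテス", "ja"),
    ("미겔 데 세르반테스", "ko"),
]


def pick_label(labels, preferred_languages, fallback=None):
    """Return the label for the first preferred language that is available."""
    by_language = {lang.lower(): text for text, lang in labels}
    for lang in preferred_languages:
        if lang.lower() in by_language:
            return by_language[lang.lower()]
    return fallback


# A user who prefers French, then Japanese, gets the Japanese label.
print(pick_label(LABELS, ["fr", "ja"]))
```

When the labels carry no language tag, this kind of selection is not possible, which is why untagged literals limit multilingual use of a dataset.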
Implementation of the recording of data and metadata provenance
• Generation process: PROV-O @W3C
• Resource provenance: DC

[Figure: File.txt is described with the properties creator (John), creationDate (12-2-1900) and rights (GPL). A Revision Process is connected to File.txt and Filev1.txt through the relations used and generatedBy. The provenance model (RDF(S)) is kept in an RDF store.]
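The provenance diagram can be sketched as triples that combine Dublin Core terms for the resource and PROV-O terms for the generation process. The example.org URIs are hypothetical, and so is the reading that the revision process used File.txt to generate Filev1.txt. Mapping creationDate to dcterms:created and generatedBy to prov:wasGeneratedBy is likewise an assumption.

```python
DCTERMS = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical namespace for the files and the process

ORIGINAL = EX + "File.txt"
REVISED = EX + "Filev1.txt"
REVISION = EX + "RevisionProcess"

TRIPLES = [
    # Resource provenance (DC).
    (ORIGINAL, DCTERMS + "creator", "John"),
    (ORIGINAL, DCTERMS + "created", "12-2-1900"),
    (ORIGINAL, DCTERMS + "rights", "GPL"),
    # Generation process (PROV-O); the direction is an assumed reading of the slide.
    (REVISION, PROV + "used", ORIGINAL),
    (REVISED, PROV + "wasGeneratedBy", REVISION),
]


def sources_of(resource, triples):
    """Return the resources used by the activities that generated `resource`."""
    activities = {o for s, p, o in triples
                  if s == resource and p == PROV + "wasGeneratedBy"}
    return sorted(o for s, p, o in triples
                  if s in activities and p == PROV + "used")


print(sources_of(REVISED, TRIPLES))
```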
Conclusions
The use of
§ Curated data
§ Widely known vocabularies
§ License metadata in RDF
§ Language metadata in RDF
§ Provenance metadata in RDF
will influence the use of the linked data by third parties
Thanks for your attention!
Asuncion Gomez-Perez
Guidelines for Multilingual Linked Data. WIMS – 2013 Madrid, 12-14 June
There is no One-Size-Fits-All Formula
[Table: tools and vocabularies used in each phase (Modeling, RDF generation, Links generation, Publication, Exploitation) for the BNE, PRISA, INE, AEMET and IGN datasets. Vocabularies: DC, Wgs84, time, Scovo, SSN ontology, SIOC, Data cube, hydrontology. RDF generation: MARiMbA, geometry2rdf, NOR2O, CSV parser. Links generation: Silk, with links to DNB, VIAF, LIBRIS, DBPEDIA, Geonames and Geolinkeddata.es. Publication and exploitation: Pubby, sitemap4rdf, SPARQL, map4rdf.]
http://oa.upm.es/14465/1/2.formulaLD.pdf
The multilingual Web of Data: Current state
[Figure 1. Number of monolingual and multilingual datasets in January 2012, June 2012 and December 2012. Values shown: 349; 635; 676; 1,906; 1,984; 2,201.]
[Figure 2. Current usage of language tagging capabilities in RDF: RDF literals with language tag and without language tag in January 2012, June 2012 and December 2012. Values shown: 2,567,324; 3,154,779; 3,365,930; 10,250,936; 10,594,338; 12,272,806.]
[Figure 3. English tags versus other languages' tags: RDF literals with English tag and with other language tag in January 2012, June 2012 and December 2012. Values shown: 403,714; 431,660; 557,785; 2,135,664; 2,751,065; 2,808,145.]
[Figure 4. Evolution of top-10 languages.]