@inesc-id.pt
INESC-ID
Henk Alkemade, Steven Claeyssens, Giovanni Colavizza, Nuno Freire, Jörg Lehmann, Clemens Neudecker, Giulia Osti, and Daniel van Strien
Ubiquity Press, Ltd.
Nuno Freire, Hugo Manguinhas, Antoine Isaac, and Valentine Charles
Springer Nature Switzerland
Gonçalo Melo da Silva, Ana Celeste Glória, Ângela Sofia Salgueiro, Bruno Almeida, Daniel Monteiro, Marco Roque de Freitas, and Nuno Freire
MDPI AG
The ROSSIO Infrastructure is developing a free and open-access platform for aggregating, organising, and connecting the digital resources in the Social Sciences, Arts and Humanities provided by Portuguese higher education and cultural institutions. This paper presents an overview of the ROSSIO Infrastructure, its main objectives, the institutions involved, and the services it offers through its platform, namely a discovery portal, digital exhibitions, collections, and a virtual research environment. These services rely on a metadata-aggregation solution for bringing the digital objects’ metadata from the providing institutions into ROSSIO. The aggregated datasets are converted into linked data and undergo an enrichment process based on controlled vocabularies, which are developed and published by ROSSIO. The paper describes this process, the applications involved, and how they interoperate. We further reflect on how these services may enhance the dissemination of science, considering the FAIR principles.
Mónica Marrero, Antoine Isaac, and Nuno Freire
Springer International Publishing
Nuno Freire, Enno Meijers, Sjors de Valk, Julien A. Raemy, and Antoine Isaac
Springer International Publishing
Digital cultural heritage resources are widely available on the web through the digital libraries of heritage institutions. To address the difficulties of discoverability in cultural heritage, the common practice is metadata aggregation, where centralized efforts like Europeana facilitate discoverability by collecting the resources’ metadata. We present the results of the linked data aggregation task conducted within the Europeana Common Culture project, which attempted an innovative approach to aggregation based on linked data made available by cultural heritage institutions. This task ran for one year with participation of twelve organizations, involving the three member roles of the Europeana network: data providers, intermediary aggregators, and the central aggregation hub, Europeana. We report on the challenges that were faced by data providers, the standards and specifications applied, and the resulting aggregated metadata.
Nuno Freire, Glen Robson, John B. Howard, Hugo Manguinhas, and Antoine Isaac
Springer Science and Business Media LLC
On the World Wide Web, a very large number of resources are made available through digital libraries. We (Europeana and data providers) report on case studies that tested the application of some of the most promising Web technologies, exploring several solutions based on the International Image Interoperability Framework (IIIF) and Sitemaps. We also describe an analysis of the Schema.org vocabulary for application in the context of cultural heritage and metadata aggregation. The solutions were tested successfully and built on existing technology and knowledge in cultural heritage, with low implementation barriers. The future challenges lie in choosing among the several possibilities and standardizing the chosen solution(s). Europeana will proceed with recommendations for its network and is actively working within the IIIF community to achieve this goal.
Luciana Candida Silva, José Eduardo Santarem Segundo, and Nuno Freire
University Library System, University of Pittsburgh
Objective. The Semantic Web and Linked Data emphasize the reuse and linking of richly described resources on the Web. These principles match the purpose of the Europeana Data Model (EDM), which is to use information from existing resources and support its enrichment by establishing new relationships among them. The objective of this study is therefore to describe the semantic relationships embedded in the elements of EDM, highlighting the advantages of using this model for information retrieval on the Web, and thereby to encourage the adoption of semantic methodologies in Brazilian projects. Method. This is a qualitative, descriptive-documentary study based on the EDM family of documents. The study first identified the concepts and technologies of the Semantic Web and Linked Data, and then analysed the descriptive specification of the Europeana Data Model. The principles and development of EDM were detailed, with emphasis on the semantic elements that model and support Europeana's functionality. Results. The study identified possibilities for connecting data from different institutions in order to enrich the information in the records of a given cultural heritage object. Conclusions. Through the semantic relationships, this study showed that the semantic structure of EDM is a reference to follow when publishing data from national projects as Linked Open Data, in order to ensure the growing interconnection of data, speed up the circulation of information among stakeholders, and accelerate new discoveries.
Nuno Freire and Mário J. Silva
Springer International Publishing
Nuno Freire, Hugo Manguinhas, and Antoine Isaac
Springer International Publishing
This article presents an observational study of the virtual graph formed by equivalence links between agent entities across 8 knowledge bases. To evaluate the potential of this linked data graph, we measured the equivalences that it could provide for a real dataset. We crawled the virtual graph by starting from references to agents we found in descriptions of objects collected from data of cultural heritage institutions in Europeana. Our study characterizes the current virtual equivalence graph, presenting statistics about the links, their type and origin. Crawling the equivalences for agent URIs required several crawling iterations on the virtual equivalence graph. The amount of gathered equivalences grows steeply in the first 3 crawling iterations and stabilizes on the 4th iteration. VIAF was the KB with the highest number of equivalences, reaching 60.7%, and it was followed by Wikidata with 34.5%.
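The iterative crawl described in this abstract can be pictured as a breadth-first traversal of equivalence links. The following is a minimal sketch under stated assumptions, not the study's implementation: the function name `crawl_equivalences`, the in-memory list of link pairs, and the symmetric treatment of links are all hypothetical choices made for illustration. A real crawler would discover links by dereferencing each URI.

```python
from collections import defaultdict


def crawl_equivalences(seed_uris, links, max_iterations=10):
    """Follow equivalence links breadth-first, one crawling iteration per level.

    `links` is an iterable of (uri_a, uri_b) pairs (an assumption: the real
    links would come from dereferencing URIs in each knowledge base).
    Returns (all URIs reached, number of new URIs found in each iteration).
    """
    # Assumption: equivalence is treated as symmetric, so index both directions.
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set(seed_uris)
    frontier = set(seed_uris)
    new_per_iteration = []
    for _ in range(max_iterations):
        found = set()
        for uri in frontier:
            found |= neighbours[uri] - seen
        if not found:
            break  # the set of equivalences has stabilized
        new_per_iteration.append(len(found))
        seen |= found
        frontier = found
    return seen, new_per_iteration


# Made-up URIs, for illustration only.
example_links = [("ex:agent", "viaf:1"), ("viaf:1", "wd:Q1"), ("wd:Q1", "dbp:A")]
reached, growth = crawl_equivalences(["ex:agent"], example_links)
print(sorted(reached), growth)
```

Recording the number of new URIs per iteration is what lets one observe the kind of growth-then-stabilization pattern the abstract reports.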
Nuno Freire and Diogo Proença
Springer International Publishing
Large ontologies are available as linked data, and they are used across many domains, but to process them considerable resources are required. RDF provides automation possibilities for semantic interpretation, which can lower the effort. We address the usage of RDF reasoning in large ontologies, and we test approaches for solving reasoning problems, having in mind use cases of low availability of computational resources. In our experiment, we designed and evaluated a method based on a reasoning problem of inferring Schema.org statements from cultural objects described in Wikidata. The method defines two intermediate tasks that reduce the volume of data used during the execution of the RDF reasoner, resulting in an efficient execution taking on average 10.3 ± 7.6 ms per RDF resource. The inferences obtained in the Wikidata test were analysed and found to be correct, and the computational resource requirements for reasoning were significantly reduced. Schema.org inference resulted in at least one rdf:type statement for each cultural resource, but the inference of Schema.org predicates was below expectations. Our experiment on cultural data has shown that Wikidata contains alignment statements to other ontologies used in the cultural domain, which with the application of RDF and OWL reasoning can be used to infer views of Wikidata expressed in cultural domain’s data models.
Nuno Freire and Sjors de Valk
IEEE
Publication and usage of linked data has been highly pursued by cultural heritage institutions and service providers in this domain. Much research and cooperation are taking place in adapting and improving cultural heritage data models for linked data and in defining ontologies and vocabularies, as well as in setting up services based on linked data. This article presents an evaluation of ontologies and vocabularies published as linked data, which originate from the cultural heritage domain, or are frequently used and linked to in this domain. Our study aims to evaluate their usability by crawlers operating on the web of data, according to specifications and practices of linked data, the Semantic Web and ontology reasoning. Our evaluation targets the use case of general data consumption applications based on RDF, RDF Schema, OWL, SKOS and linked data’s guidelines. We have evaluated twelve ontologies and vocabularies and identified that four were not fully compliant, and that alignments between ontologies are not included in the definitions of the ontologies. This study contributes to the research of novel services consuming linked data. It also makes it possible to better assess the automation that can be achieved to handle the variety and large volume of linked data, when assessing the viability of new services based on linked data in cultural heritage.
Nuno Freire, René Voorburg, Roland Cornelissen, Sjors de Valk, Enno Meijers, and Antoine Isaac
MDPI AG
Online cultural heritage resources are widely available through digital libraries maintained by numerous organizations. In order to improve discoverability in cultural heritage, the typical approach is metadata aggregation, a method where centralized efforts such as Europeana improve discoverability by collecting resource metadata. The redefinition of the traditional data models for cultural heritage resources into data models based on semantic technology has been a major activity of the cultural heritage community. Yet, linked data may bring new innovation opportunities for cultural heritage metadata aggregation. We present the outcomes of a case study that we conducted within the Europeana cultural heritage network. In this study, the National Library of The Netherlands acted as data provider, while the Dutch Digital Heritage Network acted as an intermediary aggregator that aggregates datasets and provides them to Europeana, the central aggregator. We identified and analyzed the requirements for an aggregation solution for the linked data, guided by current aggregation practices of the Europeana network. These requirements guided the definition of a workflow that fulfils the same functional requirements as the existing one. The workflow was put into practice within this study and has led to the development of software applications for administrating datasets, crawling the web of data, harvesting linked data, data analysis and data integration. We present our analysis of the study outcomes and analyze the effort necessary, in terms of technology adoption, to establish a linked data approach, from the point of view of both data providers and aggregators. We also present the expertise requirements we identified for cultural heritage data analysts, and the supporting tools that had to be designed specifically for semantic data.
Nuno Freire, A. Isaac, Twan Goosen, D. Broeder, Hugo Manguinhas, and V. Charles
Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities scientific domains. Effective retrieval of newspaper content based on metadata alone is nearly impossible, which makes retrieval based on (digitized) full-text particularly relevant. Europeana, Europe’s Digital Library, is in a position to provide access to large newspaper collections with full-text resources. Full-text corpora are also relevant for Europeana’s objective of promoting the use of cultural heritage resources within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspapers full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.
Nuno Freire and Antoine Isaac
Springer International Publishing
Wikidata is an outstanding data source with potential application in many scenarios. Wikidata provides its data openly in RDF. Our study aims to evaluate the usability of Wikidata as a data source for robots operating on the web of data, according to specifications and practices of linked data, the Semantic Web and ontology reasoning. We evaluated from the perspective of two use cases of data crawling robots, which are guided by our general motivation to acquire richer data for Europeana, a data aggregator from the Cultural Heritage domain. The first use case regards general data consumption applications based on RDF, RDF-Schema, OWL, SKOS and linked data. The second case regards applications that explore semantics relying on Schema.org and SKOS. We conclude that a human operator must assist linked data applications to interpret Wikidata’s RDF because of the choices that were taken at Wikidata in the definition of its expression in RDF. The semantics of the RDF output from Wikidata is “locked-in” by the usage of Wikidata’s own ontology, resulting in the need for human intervention. Wikidata is only a few steps away from high quality machine interpretation, however. It contains extensive alignment data to RDF, RDFS, OWL, SKOS and Schema.org, but a machine interpretation of those alignments can only be done if some essential Wikidata alignment properties are known.
Nuno Freire
Springer International Publishing
This paper describes the Data Aggregation Lab software tool, which implements the metadata aggregation workflow of Cultural Heritage, based on semantic technologies. It aims to provide a framework to support several aspects of our research, such as conducting case studies, providing reference implementations, and supporting technology adoption. Currently, it provides working implementations of data aggregation methods with which Europeana research has obtained positive results. These methods explore technologies such as linked data, Schema.org, IIIF, Sitemaps and RDF.
Péter Király, Juliane Stiller, Valentine Charles, Werner Bailer, and Nuno Freire
Springer International Publishing
Europeana.eu aggregates metadata describing more than 50 million cultural heritage objects from libraries, museums, archives and audiovisual archives across Europe. The need for quality of metadata is particularly motivated by its impact on user experience, information retrieval and data re-use in other contexts. One of the key goals of Europeana is to enable users to retrieve cultural heritage resources irrespective of their origin and the material’s metadata language. The presence of multilingual metadata descriptions is therefore essential for successful cross-language retrieval. Quantitatively determining Europeana’s cross-lingual reach is a prerequisite for enhancing the quality of metadata in various languages. Capturing multilingual aspects of the data requires us to take into account the full lifecycle of data aggregation including data enhancement processes such as automatic data enrichment. The paper presents an approach for assessing multilinguality as part of data quality dimensions, namely completeness, consistency, conformity and accessibility. We describe the measures defined and implemented, and provide initial results and recommendations.
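To make the idea of measuring multilinguality concrete, the sketch below counts language tags per metadata field. It is only an illustration under stated assumptions: the abstract does not specify the paper's measures, so the function name `multilinguality_summary`, the record layout, and the two indicators (`distinct_languages`, `tagged_share`) are hypothetical.

```python
def multilinguality_summary(record):
    """Summarize language tagging for each field of one metadata record.

    `record` maps a field name to a list of (value, language_tag) pairs,
    where language_tag is None for untagged literals (hypothetical layout).
    """
    summary = {}
    for field, literals in record.items():
        # Distinct language tags present on this field's literals.
        languages = {lang for _, lang in literals if lang}
        # Number of literals that carry any language tag at all.
        tagged = sum(1 for _, lang in literals if lang)
        summary[field] = {
            "distinct_languages": len(languages),
            "tagged_share": tagged / len(literals) if literals else 0.0,
        }
    return summary


# Made-up record, for illustration only.
example_record = {
    "dc:title": [("De Nachtwacht", "nl"), ("The Night Watch", "en")],
    "dc:subject": [("painting", None), ("schilderij", "nl")],
}
print(multilinguality_summary(example_record))
```

Per-field indicators of this kind could be aggregated across records, before and after enrichment, to compare how much multilingual coverage each stage of the aggregation lifecycle contributes.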
Nuno Freire, Enno Meijers, Sjors de Valk, Rene Voorburg, Antoine Isaac, and Roland Cornelissen
IEEE
A very large number of online cultural heritage (CH) resources are made available through numerous digital libraries. To address the difficulties of discoverability in CH, the common practice is metadata aggregation, where centralized efforts like Europeana facilitate discoverability by collecting the resources’ metadata. In recent years, the CH domain has invested in data models for Linked Data (LD) representation of CH metadata. LD, however, also has potential for innovating metadata aggregation. We present the results of a pilot case study within the Europeana Network. In this pilot, the National Library of The Netherlands plays the role of initial data provider, and the Dutch Digital Heritage Network that of intermediary service providing datasets to Europeana. We analysed the requirements for an LD aggregation solution and defined a workflow that fulfils the same functional requirements as Europeana’s current solution. The workflow was put into practice within the pilot and led to the development of several software components for managing datasets, harvesting LD, data analysis and integration. Our analysis of the experience discusses the effort of adopting such an LD approach for data providers and aggregators, the expertise required by CH data analysts, and the supporting tools required for semantic data.
Nuno Freire, Enno Meijers, René Voorburg, and Antoine Isaac
Elsevier BV
The existence of many digital libraries, maintained by different organizations, brings challenges to the discoverability of cultural heritage (CH) resources. Metadata aggregation is an approach where centralized efforts like Europeana facilitate their discoverability by collecting the resources’ metadata. Nowadays, CH institutions are increasingly applying technologies designed for the wider interoperability on the Web. In this context, we have identified the Schema.org vocabulary and linked data (LD) as potential technologies for innovating CH metadata aggregation. We present the results of an analysis using the case of the Europeana network of aggregators and data providers as a basis. We have conducted a survey of the available linked data technology, and we defined a solution, which we have put into practice in a pilot implementation within the Europeana network. In this pilot, the National Library of The Netherlands fulfils the role of data provider, with the Dutch Digital Heritage Network, as national aggregator, supporting the provision of several datasets from the national library to Europeana. The metadata is published using LD practices, having Schema.org as the main vocabulary. The national library also implements all the necessary semantic web mechanisms, defined in our solution, for making the datasets discoverable and harvestable by Europeana. Our proposal involves the use of vocabularies for description of datasets, and their distributions, namely DCAT, VoID and Schema.org. Europeana implements the LD harvester side of the solution and applies it to harvest the Schema.org data from the national library.
Nuno Freire, Valentine Charles, and Antoine Isaac
Springer International Publishing
In the World Wide Web, a very large number of resources is made available through digital libraries. The existence of many individual digital libraries, maintained by different organizations, brings challenges to the discoverability, sharing and reuse of the resources. A widely-used approach is metadata aggregation, where centralized efforts like Europeana facilitate the discoverability and use of the resources by collecting their associated metadata. The cultural heritage domain embraced the aggregation approach while, at the same time, the technological landscape kept evolving. Nowadays, cultural heritage institutions are increasingly applying technologies designed for the wider interoperability on the Web. In this context, we have identified the Schema.org vocabulary as a potential technology for innovating metadata aggregation. We conducted two case studies that analysed Schema.org metadata from collections from cultural heritage institutions. We used the requirements of the Europeana Network as evaluation criteria. These include the recommendations of the Europeana Data Model, which is a collaborative effort from all the domains represented in Europeana: libraries, museums, archives, and galleries. We concluded that Schema.org poses no obstacle that cannot be overcome to allow data providers to deliver metadata in full compliance with Europeana requirements and with the desired semantic quality. However, Schema.org’s cross-domain applicability raises the need for accompanying its adoption by recommendations and/or specifications regarding how data providers should create their Schema.org metadata, so that they can meet the specific requirements of Europeana or other cultural aggregation networks.
Nuno Freire, Antoine Isaac, Glen Robson, John Brooks Howard, and Hugo Manguinhas
IOS Press
On the World Wide Web, a very large number of resources are made available through digital libraries. The existence of many individual digital libraries, maintained by different organizations, brings challenges to the discoverability and usage of these resources by potential users. A widely-used approach is metadata aggregation, where a central organization takes the role of facilitating the discoverability and use of the resources, by collecting their associated metadata. The central organization has the possibility to further promote the usage of the resources by means that cannot be efficiently undertaken by each digital library in isolation. This paper focuses on the domain of cultural heritage, where OAI-PMH has been the adopted solution, since discovery of resources was only feasible if based on metadata instead of full-text. However, the technological landscape has changed. Nowadays, with the technological improvements accomplished by network communications, computational capacity, and Internet search engines, the motivation for adopting OAI-PMH is not as clear as it used to be. In this paper, we present the results of our analysis of available potential technologies, using as application context the Europeana Network and its requirements for metadata aggregation. We cover the following technologies: IIIF (International Image Interoperability Framework); Webmention; Linked Data Notifications; WebSub; Sitemaps; ResourceSync; Open Publication Distribution System (OPDS); Linked Data Platform; and Schema.org.