Online event
9 Sep 2020, 15-18 CET

The 2nd Workshop on Open Citations and Open Scholarly Metadata

A 3-hour online workshop, with invited speakers, for researchers, scholarly publishers, funders, policy makers, and open citations advocates interested in the creation, reuse, and improvement of open citation data and open scholarly metadata.

4 Sessions


Open Citations and Open Scholarly Metadata to support Digital Cultural Heritage

Improving Research Assessment using Open Citations and Open Scholarly Metadata

Open Scholarly Metadata

Sustainability of Open Scholarly Metadata Infrastructures

Authors are invited to submit a short abstract (500 words max.) that fits one of the four session topics. Authors of selected contributions will be asked to record a 5-minute video presentation, publish it on a platform of their choice (e.g. YouTube), and forward the URL to the organisers. The organisers will collect the video links and make them available both on the workshop website and in a dedicated online space for engaging with interested parties and encouraging a broader discussion.

Submission deadline: 3 August 2020 23:59 CET

Notification deadline: 10 August 2020 23:59 CET

Video preparation deadline: 31 August 2020 23:59 CET

Vanessa Proudman

SPARC Europe, Director

Cameron Neylon

Centre for Culture and Technology, Curtin University

Stefano Zacchiroli

Software Heritage, CTO

Daniel Mietchen

University of Virginia

David Shotton

University of Oxford


Silvio Peroni

University of Bologna


Marilena Daquino

University of Bologna


Ludo Waltman

Centre for Science and Technology Studies (CWTS), Leiden University


Philipp Mayr

GESIS – Leibniz Institute for the Social Sciences


Giovanni Colavizza

University of Amsterdam


Matteo Romanello

Silvio Peroni

Workshop Introduction

Marilena Daquino

Authors’ contributions acknowledgment:

  • Serhii Nazarovets (State Scientific and Technical Library of Ukraine). Using Crossref Citation Data for Measuring Impact in the Humanities [video]
  • Roderic Page (University of Glasgow). Wikidata and the "bibliography of life" [video]
  • Olesya Mryglod (Institute for Condensed Matter Physics of the National Academy of Sciences of Ukraine) and Serhii Nazarovets (State Scientific and Technical Library of Ukraine). Open Scholarly Metadata: formalisation is required [video]
  • Konstantinos Stathoulopoulos (Mozilla). Orion: A research measurement and knowledge discovery tool for scholarly publications [video]
  • Paloma Marín-Arraiza and Gabriela Mejias (ORCID). Building Open Scholarly Metadata Services with ORCID [video]
  • Emma Ganley (protocols.io). Open Scholarly Metadata is needed for all research outputs, including methods [video]
  • Wiesława Duży and Tomasz Panecki (Institute of History, Polish Academy of Sciences). Historical spatial data repository at the Institute of History, Polish Academy of Sciences [video]
  • Kevin Boyack (SciTech Strategies, Inc.). A detailed open access model of the PubMed literature [video]
  • Martin Czygan (Internet Archive). Fatcat [video]

Session I: Open Citations and Open Scholarly Metadata to support Digital Cultural Heritage

Stefano Zacchiroli

Referencing (all) publicly available software source code [SLIDES HERE]

Software exists in different forms, most notably executable and source code forms, which are useful for different reasons and in different contexts. When writing about software in scholarly publications, it is often useful to reference software in source code form (an activity related to, but different from, citing it). In particular, it is necessary to do so with a high degree of precision to enable the scientific reproducibility of experiments that depend on software. Software Heritage, the largest public archive of software source code, maintains a standard scheme of intrinsic identifiers (known as SWHIDs) to reference source code artifacts such as files, directories, commits, etc. In this talk we will present SWHIDs, discuss how they are useful in the context of scholarly publications, and briefly review other Software Heritage activities in the field.

Matteo Romanello

Towards a Humanities Citation Index (HuCI) [SLIDES HERE]

Citation indexes are by now part of the research infrastructure used by most scientists: a necessary tool for coping with the constantly increasing amount of scientific literature being published. Commercial citation indexes are designed for the sciences and have uneven coverage and unsatisfactory characteristics for humanities scholars, while no comprehensive citation index is published by a public organization. Starting from this need for a Humanities Citation Index (HuCI), this talk is composed of two parts. First, we briefly discuss a set of requirements that are relevant to humanities scholars and that need to inform the design of HuCI, namely 1) comprehensive source coverage, 2) chronological depth, 3) being collection-driven and 4) being rich in context. Second, we present the ScholarIndex, the application layer underpinning the creation of HuCI, which allows the creation and curation of citation data to be distributed (via a digital library application embedding the necessary machine learning components) and the resulting data to be exposed centrally via a citation index.

Session II: Improving Research Assessment using Open Citations and Open Scholarly Metadata

Ludo Waltman

Responsible research assessment requires open scholarly metadata [SLIDES HERE]

Open citations and open scholarly metadata are important for a variety of reasons. I will discuss an important but not so often used argument for openness of scholarly metadata: Openness is essential to enable responsible research assessment. There is a growing consensus among experts on preconditions for responsible research assessment, but some of these preconditions cannot be fulfilled without having open scholarly metadata. This means that openness of scholarly metadata is a necessary step toward responsible research assessment.

Alessia Bardi

The OpenAIRE Research Graph [SLIDES HERE]

The purpose of the European OpenAIRE infrastructure is to facilitate, foster, support, and monitor Open Science scholarly communication in Europe. The infrastructure has been operational for almost a decade and has been successful in linking people, ideas and resources in support of the free flow, access, sharing, and re-use of research outcomes. Thanks to synergies and collaborations with a plethora of scholarly communication stakeholders, including institutional repositories, funders, research communities and publishers, OpenAIRE delivers the OpenAIRE Research Graph, an open metadata research graph of interlinked scientific products, with access rights information and links to funding and research communities. It has been conceived as a trusted, open, public good where research is contextualised and traversable, facilitating the discovery of research outcomes and the monitoring of Open Science uptake, trends, and research impact.

5 min break

Session III: Open Scholarly Metadata

Silvio Peroni

There and back again: past, present, and future of OpenCitations [SLIDES HERE]

OpenCitations is an independent infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data by the use of Semantic Web (Linked Data) technologies. In my talk, I will briefly introduce the evolution of OpenCitations over the past few years, starting from the release of the new instance of the OpenCitations Corpus in 2016 to the creation of the OpenCitations Indexes and related tools, services, and software. As a consequence of being selected by the Global Sustainability Coalition for Open Science Services (SCOSS) as an open infrastructure deserving of crowdfunding support from the scholarly community, we have defined a three-year plan involving a new governance structure and future services, which I will then outline.

Philipp Mayr

Non-source items are a serious problem everywhere [SLIDES HERE]

Making bibliographic data available for researchers, scholars and others is important in all disciplines to ensure easy and fast access to the literature and other scientific resources such as research datasets. Our previous project EXCITE has addressed this problem and narrowed the gap in the availability of citation data in the social sciences. EXCITE has researched, developed, and deployed powerful tools (https://github.com/exciteproject/) that localize, extract and segment reference strings in PDF documents and then match them against bibliographic databases. One of the main conclusions derived from EXCITE is that the metadata of approx. 60% of the cited papers and other scientific resources are missing from available bibliographic databases. The extracted reference strings (items) that could not be matched are called “non-source items” (NSI). NSI include incomplete or erroneous references as well as references that indeed do not exist in the available bibliographic databases, especially references to datasets, websites and other material. This talk will highlight the significance of NSI for citation matching and suggest possible algorithms to reduce the number of NSI in digital libraries.

Daniel Mietchen

State of WikiCite in 2020 [SLIDES HERE]

WikiCite is an initiative to collect bibliographic and citation information, particularly of references cited from Wikimedia projects like Wikipedia, Wikisource or Wikidata. It provides an umbrella for a broad range of activities at the intersection between Wikimedia, libraries and other organizations engaged in scholarly communication or cultural heritage. Over the past few years, many of these activities have involved in-person events - including participation in the Workshop on Open Citations in 2018 - but the ongoing COVID-19 pandemic has changed that, and the WikiCite community is adapting. In this talk, I will provide an overview of WikiCite activities during the last 12-18 months as well as of what is ongoing and planned.

Session IV: Sustainability of Open Scholarly Metadata Infrastructures

Cameron Neylon

Turning the tables with sustainable infrastructure [SLIDES HERE]

What if the academy were actually in control of metadata? When we think about the key places of evidence in scholarly communications we often turn to bibliographic databases provided by third party media organisations. Increasingly these organisations are not only commercial parties outside of the control of academic institutions but are organisations that are entirely unrelated to scholarship. There has been much talk, not least by myself, of ‘community ownership’ and ‘open infrastructures’, but what does it take to make that a reality in practice?

Patricia Feeney

Get your house in order: creating an open interconnected future

Crossref metadata is openly available and actively used, but there is always more we can do. We want our metadata infrastructure to be dependable, accessible, engaging, open, and, increasingly, interconnected. Metadata can unite researchers, publishers, funders, and libraries in accelerating research. In this talk, I’ll discuss our vision for the future and how we hope to get there.

Vanessa Proudman

Community funding Open Infrastructure, with an update from SCOSS

Open scholarly infrastructure provides the foundation for keeping costs down and quality high, and above all ensures community-driven development that remains close to academic values. While information and its supporting infrastructure may want to be free and stay free and open, it costs money to create, curate, develop, and sustain. This session will examine why community funding is essential and what some of the prerequisites are for funding open infrastructure effectively. It will also update you on SCOSS, the Global Sustainability Coalition for Open Science Services. The user community and its institutions have a critical role to play in sustaining open scholarly infrastructure.


David Shotton

Roundup discussion and conclusion


9 September 2020, 15:00 - 18:00 CET.

Due to the COVID-19 international situation, WOOC 2020 will be an online event. We are working on a 3-hour Webinar where our invited speakers will present their work, and participants can engage in Q&A. Moreover, participants are warmly invited to submit their own contributions!

Covid-19 Outbreak

Considering the current COVID-19 situation, the organising committee thinks it is wise not to hold the Workshop on Open Citations and Open Scholarly Metadata as planned. While we have decided to postpone the Workshop to next year (2021), we are currently working on a one-day remote event.