September 3-5, 2018 | Bologna, Italy

Workshop on Open Citations

the workshop


About 500 million open bibliographic citations are available on the web. We invite researchers, scholarly publishers, funders, policy makers, and advocates of open citations who are interested in the widespread adoption of practices for the creation, reuse and improvement of open citation data to join this workshop.


call for participation


Applications are now closed! The program for platform presentations is complete, and all places for the Hack Day have been assigned. Should you want to attend the workshop, please contact the Workshop committee, and you will be added to the waiting list.

Application deadline: May 20, 2018
Notification of acceptance: June 1, 2018; July 9, 2018 (for late applicants)
Registration deadline: June 30, 2018; July 13, 2018 (for late applicants)

Proposals should address one of the following topics:

Opening up citations

Initiatives, collaborations, methods and approaches for the creation of open access to bibliographic citations.

Policies and funding

Strategies, policies and mandates for promoting open access to citations, and transparency and reproducibility of research and research evaluation.

Publishers and learned societies

Approaches to, benefits of, and issues surrounding the deposit, distribution, and services for open bibliographic metadata and citations.


Metrics, visualizations and other projects

The uses and applications of open citations, and bibliometric analyses and metrics based upon them.

Invited speakers

Since 2015, Ginny has been developing a new team at Crossref encompassing outreach and education, user experience and support, and metadata strategy.

Ginny Hendricks Crossref

Dario is a social computing researcher and open knowledge advocate. He is the Director, Head of Research at the Wikimedia Foundation.

Dario Taraborelli Wikimedia Foundation / I4OC

Stephen is a professor of structural biology at Imperial College London. He is also chair of the DORA steering group.

Stephen Curry Imperial College London | DORA

Catriona has more than 19 years of experience in scholarly publishing and 14 years in Open Access publishing. She is Director of Open Science at Hindawi.

Catriona MacCallum Hindawi

Stephanie spent over 10 years in the academic publishing industry in the fields of biology and chemistry. She is CEO of ScienceOpen.

Stephanie Dawson ScienceOpen

Diego works at the European Research Council and is responsible for the data infrastructure, the information flow architecture and the policy analysis.

Diego Valerio Chialva European Research Council

Philipp is a WP leader in the EXCITE project and organizer of the workshop series on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries.

Philipp Mayr EXCITE | GESIS - Leibniz Institute for the Social Sciences

Jo McEntyre is Team Leader for Literature Services at the European Bioinformatics Institute (EMBL-EBI), where she is responsible for developing Europe PMC, the European database for full-text life science research articles.

Jo McEntyre Europe PMC | EMBL-EBI

Zeyd is a postdoctoral researcher and a team member of the project EXCITE. Currently, he is involved in the extraction and segmentation of reference strings from Social Science publications.

Zeyd Boukhers EXCITE | University of Koblenz-Landau

David is Co-Director of the OpenCitations Project, and a founding member of the Initiative for Open Citations (I4OC) and of Force11. For the last decade he has pioneered the field of Semantic Publishing, employing semantic web technologies.

David Michael Shotton OpenCitations | University of Oxford

Silvio Peroni is an Assistant Professor at the University of Bologna. He is one of the main developers of the SPAR Ontologies, Director of OpenCitations, and a founding member of the Initiative for Open Citations (I4OC).

Silvio Peroni OpenCitations | University of Bologna


Day 1 | Monday, September 3

9:00-9:30 Registration and breakfast
9:30-9:40 Greetings by FICLIT representative (Francesca Tomasi, DHDK Director)
9:40-9:50 Workshop introduction by David Shotton
9:50-10:20 [Invited talk] Dario Taraborelli (Wikimedia Foundation / I4OC) Remixing the graph
Over the past years, several datasets representing free-to-read (or partially reusable) citation data have been made available to researchers and bibliometricians. However, what one can build on top of an open citation graph vastly exceeds the value of free-to-read citation data. In this talk, I introduce the Initiative for Open Citations (I4OC) and present a key example of reuse and remix of open citation data in Wikidata — the free knowledge base that anyone can edit. With its open human and algorithmic curation model, Wikidata allows embedding bibliographic and citation datasets in a much richer set of relations compared to most citation indexes. It supports novel use cases that go beyond bibliometric analysis and thanks to its open nature it makes gaps and quality issues in the underlying citation data easily auditable.
10:20-10:50 [Invited talk from the organising committee] Johanna McEntyre (Europe PMC) Open Citations and Europe PMC
Europe PMC is a database of the life sciences research literature that contains over 30M abstracts, including those from PubMed, and 5M full-text articles. The mission of Europe PMC is to make the content as widely available as possible, which we do via the website, APIs and bulk downloads. Built in partnership with PMC USA, with whom the full-text articles are shared, Europe PMC adds value to the core content in a number of ways, one of which is to generate article-level citation counts. These are generated via the reference lists from the full-text articles in Europe PMC, supplemented with reference lists from Crossref metadata services. With the formation of I4OC, we have seen a dramatic increase in the number of citations we have been able to add to the Europe PMC network, data that we redistribute via the website and APIs. In this talk I will describe how Europe PMC manages, distributes and reuses open citations.
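As a rough illustration of the article-level citation counts described in this abstract, the sketch below merges reference lists from two sources and counts incoming citations per article. The article identifiers, the sample data and the function name `citation_counts` are hypothetical; this is not Europe PMC's actual pipeline.

```python
from collections import Counter


def citation_counts(*sources):
    """Count incoming citations per article across several reference-list sources.

    Each source maps a citing article ID to the list of article IDs it references.
    A citing/cited pair that appears in more than one source is counted once.
    """
    links = set()
    for source in sources:
        for citing, references in source.items():
            for cited in references:
                links.add((citing, cited))
    return Counter(cited for _, cited in links)


# Hypothetical sample data: reference lists taken from full-text articles...
full_text_refs = {"A": ["B", "C"], "B": ["C"]}
# ...supplemented with reference lists from a metadata service.
metadata_refs = {"A": ["C"], "D": ["C", "B"]}

counts = citation_counts(full_text_refs, metadata_refs)
print(counts.most_common())
```

Storing citing/cited pairs in a set before counting keeps a citation reported by both sources from being counted twice.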
10:50-11:20 Coffee break
11:20-11:50 [Invited talk from the organising committee] David Shotton (OpenCitations) Why and how should we share citations openly
In this presentation, I will outline the importance of citations in the world of scholarship, and how early on academics missed the boat for open citation information, leading to the growth of commercial citation indexes. After discussing problems with commercial citation indexes – cost, data availability, visualization tools, and variation in citation statistics – I will mention recent initiatives to reclaim citations for the world of open scholarship: the Initiative for Open Citations, Crossref as a reference repository, WikiCite, OpenCitations as a scholarly infrastructure organization, and the OpenCitations Corpus. I will introduce a formal definition of an open citation, and will outline the SPAR (Semantic Publishing and Referencing) ontologies and the OpenCitations Data Model that we at OpenCitations have developed to facilitate the creation of machine-readable open citation data. I will then describe the requirements for citations to be treated as first-class data entities, and the new global persistent identifier for citations that we have developed, the Open Citation Identifier (OCI), and will conclude by showing how OCIs can be used to create simple citation indexes of open citation sources such as the Crossref Open Citations Index.
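To make the idea of a citation index keyed on OCIs concrete, here is a minimal sketch. It assumes an OCI has the shape "oci:" followed by two numerals separated by a dash (citing entity first, cited entity second); other details of the official definition, such as supplier prefixes, are not modelled. The function names and the sample identifiers are hypothetical.

```python
import re

# Assumed shape: "oci:" followed by two numerals separated by a dash.
OCI_PATTERN = re.compile(r"oci:([0-9]+)-([0-9]+)")


def parse_oci(oci):
    """Split an OCI into (citing, cited) identifier strings, or raise ValueError."""
    match = OCI_PATTERN.fullmatch(oci)
    if match is None:
        raise ValueError(f"not a well-formed OCI: {oci!r}")
    return match.group(1), match.group(2)


def build_index(ocis):
    """Map each cited identifier to the sorted list of identifiers citing it."""
    index = {}
    for oci in ocis:
        citing, cited = parse_oci(oci)
        index.setdefault(cited, []).append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}


# Hypothetical OCIs, for illustration only.
index = build_index(["oci:1-18", "oci:2-18", "oci:1-7"])
print(index)
```

Because each OCI names both ends of a citation, a simple index of this kind can be built from the identifiers alone, without any other metadata.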
11:50-12:20 [Invited talk from the organising committee] Philipp Mayr (EXCITE) Recent advances in the project EXCITE - Extraction of Citations from PDF Documents
The "Extraction of Citations from PDF Documents" - EXCITE project at GESIS - Leibniz Institute for the Social Sciences and University of Koblenz-Landau is in line with the Open Citations Initiative and aims to make more citation data available to researchers with a particular focus on the social sciences. The shortage of citation data for the international and German social sciences is well known to researchers in the bibliometrics field and has itself often been subject to academic studies. In order to open up citation data in the social sciences, the EXCITE project develops a set of algorithms for the extraction of citation and reference information from PDF documents and the matching of reference strings against bibliographic databases. The presentation focuses on the overall reference extraction, matching and publication workflows and tool chain. A demo of the web demonstrator completes the talk.
12:20-12:50 [Invited talk] Ginny Hendricks (Crossref) Crossref: underpinning citation opportunities
12:50-14:00 Lunch
14:00-14:10 Announcements
14:10-14:25 [Selected talk] Martin Fenner (DataCite) Open Citations and Data Citations
14:25-14:40 [Selected talk] Luc Boruta (Thunken) Cobaltmetrics: preventing citation decay and obfuscation
14:40-14:55 [Selected talk] Ludo Waltman (Leiden University) Comparing bibliographic data sources
14:55-15:10 [Selected talk] Jodi Schneider (University of Illinois at Urbana-Champaign) Detecting problematic citation patterns with the Open Citations Corpus
15:10-15:25 [Selected talk] Anne Lauscher (Universität Mannheim), Kai Eckert (Hdm Stuttgart), Lukas Galke (ZBW Kiel) Libraries as Curators of Open Citations: perspectives of the project LOC-DB in Germany
15:25-15:40 [Selected talk] Matteo Romanello (École polytechnique fédérale de Lausanne), Giovanni Colavizza (The Alan Turing Institute) The Scholar Index: a Distributed, Collaborative Citation Index for the Arts and Humanities
15:40-16:10 [Invited talk] Stephanie Dawson (ScienceOpen) Open Citations in Action: Case Study ScienceOpen
Viewed collectively, an article’s references contain a wealth of information. Networks of citations can trace the genealogy of ideas, and through reference lists one can see methods, theories, and ideologies being introduced and dying away. Until recently, the references of an article were under strict copyright control and tracking citations was left up to big business, but the open science movement has started to change that. New initiatives such as the Initiative for Open Citations (I4OC) are spearheading a growing consensus that citations are metadata and therefore should be freely accessible via Crossref, regardless of article license type. The discovery platform ScienceOpen is a case study in the kind of search environment that can be built on top of an open citations commons. ScienceOpen recently added over 100 million citation connections between articles, and uses its powerful search engine to expose this knowledge to researchers for a richer discovery experience. This enables us to provide users with tools to track their own article citations through time, trace citation networks, discover similar articles, discover highly-cited articles in a research field, sort searches by citations, and ultimately create a vast, contextual citation-based network for an intelligent search and discovery experience. ScienceOpen provides a good example of your open citations – and open science – in action.
16:10-18:00 Coffee break and Poster session
19:30-22:00 Social dinner

Day 2 | Tuesday, September 4

9:30-10:00 [Invited talk] Catriona MacCallum (Hindawi) Open Citations as Academic and Cultural Capital
10:00-10:30 [Invited talk from the organising committee] Zeyd Boukhers (EXCITE) A Generic Approach for Reference Extraction from PDF Documents
Extracting and parsing cited references from publications in PDF format is important to ensure the acknowledgement of the sources of information. However, the mention of these sources differs from one community to another and from one publication to another. This citation diversity lies mainly in the indexation style (e.g., one or several reference sections), the existence of components (e.g., editor, source, URL, etc.) and the type of references (e.g., grey literature, academic literature, etc.). In order to automatically and accurately extract and parse different kinds of references, EXCITE proposes a generic approach that combines Random Forest and Conditional Random Fields (CRF) in a coherent mechanism. Random Forest is employed for the initial classification of each line in the document, whereas CRF parses the potential reference lines into essential components (e.g., author, title, etc.). Here, different line combinations are iteratively assessed in order to obtain the proper combination with the help of a probabilistic approach.
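The two-stage shape of this approach (first classify each line, then parse only the candidate reference lines into components) can be sketched as follows. This is not the EXCITE implementation: a hand-written rule stands in for the trained Random Forest, a single regular expression stands in for the CRF, and the iterative assessment of line combinations is left out. All function names and the sample lines are hypothetical.

```python
import re

YEAR = re.compile(r"\((\d{4})\)")


def looks_like_reference(line):
    """Stage 1 stand-in: a hand-written rule instead of a trained Random Forest."""
    return bool(YEAR.search(line)) and "." in line


# Stage 2 stand-in: one regular expression instead of a trained CRF.
REFERENCE = re.compile(
    r"(?P<author>.+?)\s*\((?P<year>\d{4})\)\.\s*(?P<title>.+?)\.?$"
)


def parse_reference(line):
    """Split a reference line into author, year and title, or return None."""
    match = REFERENCE.match(line.strip())
    return match.groupdict() if match else None


def extract_references(lines):
    """Classify every line first, then parse only the lines kept by stage 1."""
    candidates = [line for line in lines if looks_like_reference(line)]
    parsed = (parse_reference(line) for line in candidates)
    return [ref for ref in parsed if ref is not None]


# Hypothetical document lines, for illustration only.
document = [
    "1 Introduction",
    "Citations matter for scholarship.",
    "Doe, J. (2017). A study of open citations.",
    "Roe, R. and Poe, P. (2015). Parsing reference strings",
]
refs = extract_references(document)
for ref in refs:
    print(ref)
```

Separating the stages means the parser only ever sees lines that the classifier has already accepted, which is the division of labour the abstract describes between the two models.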
10:30-11:00 [Invited talk] Stephen Curry (DORA) The Declaration on Research Assessment (DORA): Opening up the measures of success
In the 21st Century do we still know what it means to be a successful researcher? And where to find the border between academic freedom of inquiry and responsibility to funders – often ultimately taxpayers – who look to researchers to help tackle major societal challenges? Though many are attracted to research by the thrill of the intellectual challenge and the chance to make the world a better place, the increasing performance management of scientific and scholarly enquiry is straining the health of the system. Few would deny our responsibility not only to deliver discoveries, innovations and insights that are exciting and relevant, but also to be mindful of the need to do so reliably, and to disseminate our findings (and our methods, data, and reagents) as rapidly and as widely as current technology permits. For now we are struggling to meet these demands, in part because the misuse of metrics such as impact factors, h-indices, and university rank in research assessment reinforces definitions of value that are at the same time too vague and too narrow. Initiatives such as DORA, especially when allied to the developing goals of open science, can help us restore the well-being of research assessment, of research – and of researchers.
11:00-11:30 Coffee break
11:30-12:00 [Invited talk] Diego Valerio Chialva (European Research Council) From Open Citation Data to Linked Open Data: a prototype at the ERC
After briefly examining the advantages of open data, and in particular open citation data, for monitoring and evaluation of research and research funding, in this talk I discuss how the new relevant issue within an open data framework is linking data, in particular linking open citations to other open data. I will then present the work that Mr Alexis-Michel Mugabushaka and I have been doing in Unit A1 at ERCEA, in collaboration and open discussion with other interested external research groups, in prototyping an open research graph, modelling specific relevant new data ontologies, constructing the graph itself, and using it for analysis.
12:00-12:15 [Selected talk] Sergey Parinov (RANEPA) Open citation content data
12:15-12:30 [Selected talk] Daniel Ecer (eLife Sciences) Citation Sentiment
12:30-12:45 [Selected talk] Colin Batchelor (Royal Society of Chemistry) The Cambridge Metrics Group and article-level-metrics
14:00-14:15 [Selected talk] Nees Jan van Eck (Centre for Science and Technology Studies, Leiden University) Visualizing science based on open data sources
14:15-14:30 [Selected talk] Nataliia Kaliuzhna (Independent researcher) ScientoMiner ICR - the Gephi plugin for importing scholarly citations data from Crossref services
14:30-14:45 [Selected talk] Finn Årup Nielsen (Technical University of Denmark) Scholia as of September 2018
14:45-15:00 [Selected talk] Daniel Mietchen (Data Science Institute, University of Virginia) A guided tour through citation networks around public health emergencies
15:00-15:30 [Invited talk from the organising committee] Silvio Peroni (OpenCitations) The OpenCitations Corpus, its data and its interfaces: present status and future plans
15:00-16:00 Coffee break
16:00-16:50 Round table
16:50-17:05 Workshop closing by David Shotton
17:10-19:10 Sightseeing tour

Day 3 (Hack day) | Wednesday, September 5

9:30-9:45 Welcome and introductions
9:45-10:30 Presenting API and data
10:30-11:00 Pitching projects and ideas
11:00-11:20 Coffee break
11:20-11:30 Formation of working groups
11:30-13:00 Hack session 1
14:00-16:00 Hack session 2
16:00-16:20 Coffee break
16:20-17:20 Show and Tell
17:20-17:30 Closing 3rd day

our organisers

OpenCitations runs the OpenCitations Corpus (OCC), an RDF repository of open scholarly citation data harvested from the scholarly literature.

OpenCitations Service Organisation

The EXCITE Project is developing a tool chain of software components for reference extraction from PDF documents, to be applied to existing scientific bibliographic databases.

EXCITE Project

Europe PMC is a database for life-science literature and a platform for text-based innovation.

Europe PMC Infrastructure



Alessandra Auddino / Colin Batchelor / Luc Boruta / Zeyd Boukhers / Bilal Hayat Butt / Diego Chialva / Giovanni Colavizza / Stephen Curry / Marilena Daquino / Stephanie Dawson / Angelo Di Iorio


Daniel Ecer / Nees Jan van Eck / Martin Fenner / Francesca Giovannetti / Piero Grandesso / Deborah Grbac / Vittorio Grieco / Bianca Gualandi / Ivan Heibi / Ginny Hendricks / Zuang Huang / Nataliia Kaliuzhna / Rachael Kotarski / Bianca Kramer


Anne Lauscher / Steffen Lemke / Mairelys Lemus-Rojas / Maria Levchenko / Freddy Linpens / Catriona MacCallum / Andrea Mannocci / Philipp Mayr / Jo McEntyre / Daniel Mietchen / Ross Mounce / Finn Årup Nielsen / Astrid Orth


Sergey Parinov / Silvio Peroni / Sara Ricetto / Matteo Romanello / Agata Rotondi / John Samuel / Jodi Schneider / Gautam Kishore Shahi / David Shotton / Gianmarco Spinaci / Steffen Staab / Chiara Storti


Dario Taraborelli / Francesca Tomasi / Angelika Tsivinskaya / Sahar Vahdati / Barney Walker / Ludo Waltman / Laurel Zuckerman

travel information

Welcome to Bologna!

The University of Bologna is the oldest university in the western world, and one of the largest universities in Italy (with about 90,000 enrolled students).

Get to Bologna

The airport “Guglielmo Marconi” is 15-20 minutes by car from the city centre. A direct bus, Airport Bus BLQ, leaves regularly from the airport for Bologna Central Rail Station. The trip costs 6€. A taxi costs about 15-20€ (call a taxi at 0039 051 372727).

The venue

The workshop will take place at the University of Bologna in the heart of the city, at the School of Arts, Humanities and Cultural Heritage (Italian: Scuola di Lettere e Beni Culturali), via Zamboni 34, Room "Aula Affreschi". From the train station you can either walk to FICLIT (20 minutes) or take bus C (direction "Cestello", see the time table, page 2).


The organizers are negotiating with local hotels for rooms to be reserved and made available at special rates for participants. More information available here.

About the location

Bologna is home to numerous prestigious cultural, economic and political institutions as well as one of the most impressive trade fair districts in Europe. In 2000 it was declared European capital of culture, and in 2006, a UNESCO "city of music".

For any enquiry

contact us

Contact Info

Where to Find Us

via Zamboni, 32 (first floor) | Bologna, BO | 40126 Italy

Any Doubt?

open an issue on GitHub!

Email Us