September 3-5, 2018 | Bologna, Italy

Workshop on Open Citations

the workshop


About 500 million open bibliographic citations are available on the web. We invite to this workshop researchers, scholarly publishers, funders, policy makers, and open citation advocates interested in the widespread adoption of practices for the creation, reuse and improvement of open citation data.


call for participation


Applications are now closed! The program for platform presentations is complete, and all places for the Hack Day have been assigned. Should you wish to attend the workshop, please contact the Workshop committee, and you will be added to the waiting list.

Application deadline: May 20, 2018
Notification of acceptance: June 1, 2018 (July 9, 2018 for late applicants)
Registration deadline: June 30, 2018 (July 13, 2018 for late applicants)

Proposals should address one of the following topics:

Opening up citations

Initiatives, collaborations, methods and approaches for the creation of open access to bibliographic citations.

Policies and funding

Strategies, policies and mandates for promoting open access to citations, and transparency and reproducibility of research and research evaluation.

Publishers and learned societies

Approaches to, benefits of, and issues surrounding the deposit, distribution, and services for open bibliographic metadata and citations.


Metrics, visualizations and other projects

The uses and applications of open citations, and bibliometric analyses and metrics based upon them.


Day 1 | Monday, September 3

9:00-9:30 Registration and breakfast
9:30-9:40 Greetings by FICLIT representative (Francesca Tomasi, DHDK Director) [video]
9:40-9:50 Workshop introduction by David Shotton [slides] [video]
9:50-10:20 [Invited talk] Dario Taraborelli (Wikimedia Foundation / I4OC) Remixing the graph [slides] [video]
Over the past few years, several datasets representing free-to-read (or partially reusable) citation data have been made available to researchers and bibliometricians. However, what one can build on top of an open citation graph vastly exceeds the value of free-to-read citation data. In this talk, I introduce the Initiative for Open Citations (I4OC) and present a key example of reuse and remix of open citation data in Wikidata — the free knowledge base that anyone can edit. With its open human and algorithmic curation model, Wikidata allows embedding bibliographic and citation datasets in a much richer set of relations than most citation indexes. It supports novel use cases that go beyond bibliometric analysis and, thanks to its open nature, makes gaps and quality issues in the underlying citation data easily auditable.
10:20-10:50 [Invited talk from the organising committee] Johanna McEntyre (Europe PMC) Open Citations and Europe PMC [slides] [video]
Europe PMC is a database of the life sciences research literature that contains over 30M abstracts, including PubMed, and 5M full text articles. The mission of Europe PMC is to make this content as widely available as possible, which we do via the website, APIs and bulk downloads. Built in partnership with PMC USA, with whom the full text articles are shared, Europe PMC adds value to the core content in a number of ways, one of which is to generate article-level citation counts. These are generated from the reference lists of the full text articles in Europe PMC, supplemented with reference lists from Crossref metadata services. With the formation of I4OC, we have seen a dramatic increase in the number of citations we have been able to add to the Europe PMC network, data that we redistribute via the website and APIs. In this talk I will describe how Europe PMC manages, distributes and reuses open citations.
10:50-11:20 Coffee break
11:20-11:50 [Invited talk from the organising committee] David Shotton (OpenCitations) Why and how should we share citations openly [slides] [video]
In this presentation, I will outline the importance of citations in the world of scholarship, and how early on academics missed the boat for open citation information, leading to the growth of commercial citation indexes. After discussing problems with commercial citation indexes – cost, data availability, visualization tools, and variation in citation statistics – I will mention recent initiatives to reclaim citations for the world of open scholarship: the Initiative for Open Citations, Crossref as a reference repository, WikiCite, OpenCitations as a scholarly infrastructure organization, and the OpenCitations Corpus. I will introduce a formal definition of an open citation, and will outline the SPAR (Semantic Publishing and Referencing) ontologies and the OpenCitations Data Model that we at OpenCitations have developed to facilitate the creation of machine-readable open citation data. I will then describe the requirements for citations to be treated as first-class data entities, and the new global persistent identifier for citations that we have developed, the Open Citation Identifier (OCI), and will conclude by showing how OCIs can be used to create simple citation indexes of open citation sources such as the Crossref Open Citations Index.
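As a small illustration of how OCIs are structured, the sketch below splits an identifier into its citing and cited halves. An OCI takes the general shape "oci:<citing-id>-<cited-id>", where each half is a numeral identifying one bibliographic entity; this minimal parser is our own illustration, not OpenCitations code.

```python
def parse_oci(oci: str):
    """Split an Open Citation Identifier into its citing and cited parts.

    An OCI has the general shape "oci:<citing-id>-<cited-id>"; each
    part is a numeral identifying one bibliographic entity, so a single
    dash separates the two halves.
    """
    if oci.startswith("oci:"):
        oci = oci[len("oci:"):]
    citing, cited = oci.split("-", 1)
    return citing, cited
```

For example, parse_oci("oci:1-18") yields ("1", "18"), the two entity numbers that together name the citation itself.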
11:50-12:20 [Invited talk from the organising committee] Philipp Mayr (EXCITE) Recent advances in the project EXCITE - Extraction of Citations from PDF Documents [slides] [video]
The "Extraction of Citations from PDF Documents" (EXCITE) project at GESIS – Leibniz Institute for the Social Sciences and the University of Koblenz-Landau is in line with the Initiative for Open Citations and aims to make more citation data available to researchers, with a particular focus on the social sciences. The shortage of citation data for the international and German social sciences is well known to researchers in the bibliometrics field and has itself often been the subject of academic studies. In order to open up citation data in the social sciences, the EXCITE project is developing a set of algorithms for the extraction of citation and reference information from PDF documents and the matching of reference strings against bibliographic databases. The presentation focuses on the overall reference extraction, matching and publication workflows and tool chain. A demo of the web demonstrator completes the talk.
12:20-12:50 [Invited talk] Ginny Hendricks (Crossref) Crossref: underpinning citation opportunities [slides] [video]
no spoilers
12:50-14:00 Lunch
14:00-14:10 Announcements
14:10-14:25 [Selected talk] Martin Fenner (DataCite) Open Citations and Data Citations [slides] [video]
While the focus of open citations has so far been on opening up the reference lists of publishers working with Crossref, open citations are also highly relevant for citations of scholarly content that uses DataCite DOIs. The number of data citations in the metadata that publishers send to Crossref is very small. This presentation will discuss the current issues and how this situation can be improved going forward.
14:25-14:40 [Selected talk] Luc Boruta (Thunken) Cobaltmetrics: preventing citation decay and obfuscation [slides] [video]
Scholarly communication is becoming more and more electronic, and scholarly literature is becoming more than just papers. In this context, hyperlinks are the preferred way to reference published or unpublished sources, including datasets and software. While hyperlinks are more powerful than text-based citations in terms of user experience, they are prone to link rot. Shortened URLs add an extra level of indirection that increases the risk of breaking the permanent record of science. Cobaltmetrics monitors trusted sources to index citations, identifiers, and hyperlinks. We go deeper than other altmetrics providers: we mine data in 180+ languages, we unroll shortened URLs from 175+ shorteners, we crack open URLs to extract persistent identifiers, and we convert between 50+ types of identifiers. We will provide empirical data from an analysis of 73.5 million documents, 52.5 million citations and backlinks, and a database of 6.5 billion shortened URLs, focusing on the effects of reference rot on altmetrics indicators. We will then present good practices that all stakeholders should adopt and, most importantly, technical solutions that altmetrics providers should implement to compensate for the effects of link rot and URL variability.
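One step described above, "cracking open" a URL to extract a persistent identifier, can be sketched roughly as follows; the regular expression is a common, conservative DOI pattern chosen for illustration, not Thunken's actual implementation.

```python
import re

# A common, conservative pattern for modern DOIs (illustrative only,
# not Cobaltmetrics' production rules).
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def doi_from_url(url: str):
    """Return the first DOI-like substring found in a URL, or None."""
    match = DOI_RE.search(url)
    return match.group(0) if match else None
```

In practice such extraction must also cope with URL-encoded characters, trailing punctuation, and publisher-specific URL schemes, which is one reason unrolling shortened URLs first matters.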
14:40-14:55 [Selected talk] Ludo Waltman (Leiden University) Comparing bibliographic data sources [slides] [video]
In this talk, I will present a comparative perspective on a number of major bibliographic data sources, both ‘open’ and ‘closed’ sources. The comparison includes Web of Science, Scopus, Dimensions, Crossref, and OpenCitations Corpus. I will put special emphasis on the accuracy of the data provided by the different sources. Performing large-scale comparisons of bibliographic data sources is challenging, and therefore I will also discuss the main gaps that we have in our understanding of the strengths and weaknesses of different data sources and ways in which these gaps can be filled.
14:55-15:10 [Selected talk] Jodi Schneider (University of Illinois at Urbana-Champaign) Detecting problematic citation patterns with the Open Citations Corpus [slides] [video]
The structure of citation networks provides evidence about how scientific information is diffused. Problematic citation patterns include the selective citation of positive findings, citation bias, and the continued citation of retracted literature (i.e. literature formally withdrawn due to error, fraud, or ethical problems). For instance, there is some evidence that positive results tend to receive more citations. The public domain licensing of the Open Citations Corpus makes it possible, in principle, to estimate the likelihood that any network of research papers suffers from problematic citation. To date, problematic citation has been documented ad hoc in several striking studies. In Alzheimer's disease research, biased citation, ignoring critical findings, was used to support successful U.S. NIH grant proposals (Greenberg 2009). Mistranslation of obesity research has been used to justify exertion game research (Marshall & Linehan 2017). Citation of fraudulent research about Chronic Obstructive Pulmonary Disease continued after its retraction (Fulton et al. 2015). The data resulting from such studies is of great use to my lab in replicating and determining how to generalize the detection of problematic citation patterns. Previously, the detection of problematic citation patterns has been a side effect of astute researchers noticing suspicious findings while conducting systematic literature reviews. This talk will describe work in progress in my lab on detecting problematic citation patterns using natural language processing, combined with network analysis on the Open Citations Corpus.
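One of the checks described above, finding continued citation of retracted literature, can be sketched as a simple filter over a citation network. The data layout here is invented for illustration; a real analysis would draw its edges from the Open Citations Corpus and its retraction data from a source such as retraction notices.

```python
def post_retraction_citations(edges, retracted):
    """Flag citations of retracted papers made after retraction.

    edges: iterable of (citing_id, cited_id, citing_year) triples.
    retracted: dict mapping a paper id to its retraction year.
    Returns the edges that cite a paper after its retraction year.
    """
    return [
        (citing, cited, year)
        for citing, cited, year in edges
        if cited in retracted and year > retracted[cited]
    ]
```

Given edges [("A", "R", 2016), ("B", "R", 2014)] and retracted {"R": 2015}, only A's 2016 citation of R is flagged, since B cited R before the retraction.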
15:10-15:25 [Selected talk] Anne Lauscher (Universität Mannheim), Kai Eckert (Hdm Stuttgart), Lukas Galke (ZBW Kiel) Libraries as Curators of Open Citations: perspectives of the project LOC-DB in Germany [slides] [video]
Creation of citation data is not part of the cataloging workflow in libraries nowadays – but traditional tasks are changing, and libraries have a decided interest in the availability of open citation data that can be used for retrieval and bibliometrics. We have started the Linked Open Citation Database (LOC-DB) project to enable libraries to contribute to the creation and curation of open citations. We provide an interface for librarians to incrementally catalog citation metadata with low effort. The librarians are supported by a wide range of helpful tools, such as a reference extractor for print as well as digital documents, along with a suggestion engine that harvests external data sources for citation candidates. The database itself is prepared for distributed deployment, such that multiple libraries can collaborate. At any time, the data can be supplied as linked data in the OpenCitations data format. The goal is to show that efficiently cataloging citations of print and electronic resources in libraries is possible. In the presentation, we'll describe the current state of the workflow and its implementation. We show that automatic reference extraction from scanned print resources could be significantly improved with the implementation of an automatic reference extraction component based on a deep learning approach. We further give insights into the curation and linking process and provide first evaluation results.
15:25-15:40 [Selected talk] Matteo Romanello (École polytechnique fédérale de Lausanne), Giovanni Colavizza (The Alan Turing Institute) The Scholar Index: a Distributed, Collaborative Citation Index for the Arts and Humanities [slides] [video]
Freeing citation data is an issue of essential importance and urgency for scholarly communication, which has recently been getting the attention it deserves thanks to the Initiative for Open Citations. Simultaneously, we should also be concerned with a complementary issue: the necessity to "recover" citation data from digitized publications. This applies especially to the Humanities, whose fields often have century-long traditions. Otherwise, we risk creating citation indexes that take into account only recent publications, mostly in English, while a gap ensues for citations buried in older publications. The creation of a comprehensive citation index for the Arts and Humanities, however, is a titanic endeavour which can only be accomplished with a collaborative, distributed approach, where cultural heritage institutions (e.g. libraries, archives, etc.) play a key role. In this talk we present the Venice Scholar, a citation index of literature on the history of Venice, indexing nearly 3000 volumes of scholarship from the mid 19th century to 2013, from which some 4 million bibliographic references have been extracted. The Venice Scholar, to be publicly launched in September 2018, is the first running instance of the Scholar Index, a platform aimed at creating a comprehensive citation index for the Arts and Humanities. This platform consists of two applications, the Scholar Library (SL) and the Scholar Index (SI), both to be released soon under an open source license. The SL is a digital library system where partner institutions can load their digitized scholarly literature. The system embeds the necessary machine learning components to recognize the text from an image (OCR), extract references and link them to unique identifiers pointing to external resources (e.g. library catalogues).
Each partner institution keeps an instance of the digital library system and its own collection. The SI is the global citation index, which federates all citations extracted by the different institutions into a unique index, and provides a rich search interface to navigate through the resulting network of citations, with the final aim of interlinking digital archives and digital libraries. In fact, the SI is currently being extended, thanks to a Europeana Research Grant, to provide contextual recommendations of related digital objects from Europeana to its users. The citation data underlying the Venice Scholar are modelled using the OpenCitations Data Model, and will use the OpenCitations Corpus as their publication platform, thus enriching this corpus with some 4 million references "recovered" from historical and current publications about the history of Venice. To conclude, we believe that the creation of a citation index for the Arts and Humanities can only be accomplished through a collaborative and federated approach, and by leveraging infrastructure synergies, such as the one with the OpenCitations Corpus. In this process, libraries and other institutions should take responsibility for specific areas of knowledge (e.g. a journal, a publisher, or a topic) and, at the same time, be facilitated (e.g. through software) in the task of enriching their digitized collections with citation data.
15:40-16:10 [Invited talk] Stephanie Dawson (ScienceOpen) Open Citations in Action: Case Study ScienceOpen [slides] [video]
Viewed collectively, an article’s references contain a wealth of information. Networks of citations can trace the genealogy of ideas, and through reference lists one can see methods, theories, and ideologies introduced and die away. Until recently, the references of an article were under strict copyright control and tracking citations was left up to big business, but the open science movement has started to change that. New initiatives such as I4OC, the Initiative for Open Citations, are spearheading a growing consensus that citations are metadata and therefore should be freely accessible via Crossref, regardless of article license type. The discovery platform ScienceOpen is a case study in the kind of search environment that can be built on top of an open citations commons. ScienceOpen recently added over 100 million citation connections between articles, and uses its powerful search engine to expose this knowledge to researchers for a richer discovery experience. This enables us to provide users with tools to track their own article citations through time, trace citation networks, discover similar articles, discover highly-cited articles in a research field, sort searches by citations, and ultimately create a vast, contextual citation-based network for an intelligent search and discovery experience. ScienceOpen provides a good example of open citations – and open science – in action.
16:10-18:00 Coffee break and Poster session
[Poster] Angelo Di Iorio (University of Bologna) Semantic Coloring of Academic References [pdf]
The talk introduces the SCAR project, a collaboration between the University of Bologna and Elsevier, whose goal is to enrich citations with explicit information about their role, features and impact. The name of the project - Semantic Coloring of Academic References - reflects the fact that bibliographies should not be considered as plain lists of entries, but as lists of qualified references that need to be identified and shown appropriately, e.g. by means of different colors. Multiple coloring schemes can be applied to the same bibliography according to different criteria, such as publication year, authorship, citation contexts, justification for citing and so on. The goal of SCAR is to build a prototype that extracts such information from the full text of articles and enriches bibliographies with such metadata. The project was not originally linked to OpenCitations, but its results and issues could be relevant for OC, just as OC tools and data could be beneficial for SCAR.
[Poster] Bianca Kramer (Utrecht University Library) DOI wanted - community involvement in open citations [pdf]
Open citations are an important cogwheel in the engine of open scholarly infrastructure. At the same time, their development shows both the power and limitations of established parties, restricting their potential use for the wider scholarly community. For example, the quality of open citations as metadata provided by publishers is often still suboptimal. It can be argued that stricter requirements as to the format and quality of such metadata would discourage publishers from supplying them in the first place (thus slowing the growth of the number of open citations made available). In combination with limiting the supply of open citation metadata to publishers, though, a catch-22 situation is created that limits the quality of open citations, and thus, their usability. Simultaneously, commercial parties have ingested these incomplete metadata and improved on them, only to subsequently monetize their use in applications without contributing back to the underlying corpus of open data. While there is rightly no limitation on the use and reuse of open citations, I would like to explore models that would better allow the scholarly community to contribute to the quality and value of open citations metadata on the one hand, and encourage their sustainable use on the other hand. The former could be envisioned through forms of crowdsourced improvement of open citations metadata (in which Wikidata could play an important role). An example of the latter would be the monetization of services built on enriched open citation data, without enclosing the data itself. Both models would enable the scholarly community as a whole to not only make optimal use of open citations, but also contribute to their value. By making the wheels turn smoother, we'll collectively get further!
[Poster] Astrid Orth (SUB Göttingen) How do citation-based and alternative metrics benefit each other? [pdf]
I would like to present the approach of the *metrics project to measure the reliability and perception of indicators for interactions with scientific products: 1. analysis of how researchers (focussing on the social sciences and economics) interact on social media platforms, and how their motivations form patterns that need to be taken into account when measuring output on social media channels; 2. provision of a social media registry describing the functions and accessibility of social media platforms relevant for scholarly communication; and 3. a prototype of a crawler that gives insights into the reliability of current social media metrics. The combination of these three elements allows a larger degree of transparency when studying scholarly communication. Some shortfalls of alternative metrics are becoming obvious, but comparable ones can be seen in citation metrics. It should thus be most interesting to discuss during this workshop how open citation-based and alternative metrics can benefit each other in providing a complete and true picture of scholarly communication.
[Poster] John Samuel (CPE Lyon) WikiProvenance: Are there enough references to every (known) fact on Wikidata? [pdf]
In a collaborative website like Wikidata, contributors add statements about an item such as a person, a scientific subject, or a historical place. Each of these statements needs to be backed by a reference, yet a large number of statements are added without references. Even well-described items (i.e., items with many statements) like Douglas Adams or Albert Einstein do not have references for all of their facts. During the past few years, there has been a lot of focus on increasing the number of statements of items, usually referred to as item completion efforts. Yet the references behind statements cannot be neglected. It is equally important to understand the use of external identifiers and links to existing multilingual Wikimedia projects for these items, since many of these sources also contain a large number of references. Both contributors and users need an overview of the reference statistics. With WikiProvenance, the goal is to understand and delve into the details of the usage of references and the links to external sources in an open and transparent manner.
[Poster] Gautam Kishore Shahi (University of Trento) Semantics Aware Policy Making for Open Citations
Policy-making is one of the significant fields of research in which an organization tries to improve itself or its existing systems through reform. Citations play a vital role in research, and it has been observed that, due to a lack of proper analysis of available resources, the dispensation of funds is uneven. In this paper, a semantic web-based system is proposed that focuses on extracting the right knowledge about citations. This information can then be used for policy-making by the respective organization. An ontology model has been proposed for this task. The central focus has been to extract only the information that is useful for citation policy-making. Therefore, knowledge of ontologies and citations has been combined to produce a semantic web-based citation policy-making system.
[Poster] Angelika Tsivinskaya (Center for Institutional Analysis of Science & Education, European University at Saint Petersburg) Bibliographic metrics as performance evaluation measure for Higher Education Institutions in Russia [pdf]

Since 2006, the Russian government has made energetic attempts to boost the academic performance of the country’s universities, to shut down underperforming institutions and to turn the survivors into world-class schools. Since 2012, Russian policies in the higher education sphere have largely been directed by the results of the Survey of Efficiency of Higher Education Organizations (Monitoring of Efficiency of Educational Organizations) – a set of quantitative performance indicators used to sort efficient from non-efficient organizations. There are also several governmental programs for funding the best universities, one of them being “5–top 100”, started in 2013. Most of those initiatives include assessment of publication activity. Research has shown that these funding programs had positive effects on publication activity (Turko, 2016), and there are also studies of collaborations for the most cited papers (Pislyakov, 2014). Those studies mostly focused on successful universities and their specific characteristics. Many universities are closely connected to a small number of fields, and currently used bibliographic metrics do not consider this difference in publication dynamics across fields (Piro, 2014). The main purpose of our study is to show how the representation of universities in different citation systems depends on such factors as: public or private status; location in a bigger city or a wealthy region; nominal profile (based partly on ties to specific fields); age; and the ecological situation in the local market for higher education. For the purposes of our study we used data collected from the Monitoring of Efficiency 2013-2017, which includes (per 100 academic employees): number of citations in Web of Science, number of citations in Scopus, number of citations in the Russian Scientific Citation Index, number of publications in Web of Science, number of publications in Scopus, and number of publications in the Russian Scientific Citation Index.
Our study shows that, facing publication pressures, universities have a stable positive trend in the number of publications, but our data show that extreme variability in bibliographic metrics exists between different university families. Thus, so-called “classical”, polytechnic and medical universities have a higher median number of publications in Web of Science than others, while universities majoring in the social and economic sciences, especially ones attached to various ministries, have the highest median number of publications in the Russian Scientific Citation Index. The results show that the ascriptive variables account for a large share of variance, with families being particularly important.

References:
Piro F. N., Aksnes D. W., Rørstad K. (2013) A Macro Analysis of Productivity Differences across Fields: Challenges in the Measurement of Scientific Publishing. Journal of the American Society for Information Science and Technology, vol. 64, no. 2, pp. 307–320.
Pislyakov V., Shukshina E. (2014) Measuring Excellence in Russia: Highly Cited Papers, Leading Institutions, Patterns of National and International Collaboration. Journal of the Association for Information Science and Technology, vol. 65, no. 11, pp. 2321–2330.
Turko T., Bakhturin G., Bagan V., Poloskov S., Gudym D. (2016) Influence of the Program «5–top 100» on the Publication Activity of Russian Universities. Scientometrics, vol. 109, no. 2, pp. 769–782.

[Poster] Barney Walker (Imperial College London) Citation Gecko: A Tool for Literature Discovery and Exploration using Localised Citation Networks [pdf]
Citation Gecko is a new, open-source web app that gives researchers a bird’s-eye view of the relevant literature. Using openly available citation data, it constructs and visualises the local citation network in the researcher’s area, helping them discover literature they may have missed and make sense of how papers are connected. Why is this so useful? Traditionally, the literature review process has involved iteratively searching keyword combinations and manually following references from one paper at a time. With this method it’s easy for important papers to slip through the net and difficult to prioritise what to read first. How does it work? Gecko circumvents the need for researchers to define their area of interest in keywords by simply starting with a set of ‘seed papers’ which are representative of their area of interest. Gecko then finds all the papers that a) cite, b) are cited by, or c) are co-cited with the seed papers in order to build a ‘localised’ citation network. Visualising this network gives researchers an overview of how the different papers fit together. Local, rather than global, metrics can then be defined on the network, providing suggestions that are more likely to be relevant to the researcher. New seed papers can then be added on the go from among the results, expanding the network and building up a more complete map of the literature. Citation Gecko is designed to fit with a researcher’s workflow, integrating directly with reference managers such as Zotero and allowing for upload of BibTeX files, as well as providing native search functions. By demonstrating how open citations can lead to greater discoverability of research articles, we hope to incentivise more publishers to open up their citation data.
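The seed-based expansion described above can be sketched in a few lines. This toy function is our own illustration rather than Gecko's code: given an in-memory citation lookup, it collects the papers that cite, are cited by, or are co-cited with a set of seeds.

```python
def local_network(seeds, cites):
    """Collect candidate papers connected to a set of seed papers.

    seeds: set of seed paper ids.
    cites: dict mapping a paper id to the set of ids it cites.
    Returns papers that cite, are cited by, or are co-cited with the
    seeds (the seeds themselves are excluded).
    """
    seeds = set(seeds)
    # (b) papers the seeds cite
    cited_by_seeds = set().union(*(cites.get(s, set()) for s in seeds))
    # (a) papers whose reference lists include a seed
    citing_seeds = {p for p, refs in cites.items() if refs & seeds}
    # (c) papers appearing alongside a seed in some reference list
    co_cited = set().union(*(refs for refs in cites.values() if refs & seeds))
    return (cited_by_seeds | citing_seeds | co_cited) - seeds
```

In the real app the citation lookup is of course not in memory but fetched from open citation data sources, and the candidate set is ranked with local network metrics before being shown to the user.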
19:30-22:00 Social dinner

Day 2 | Tuesday, September 4

9:30-10:00 [Invited talk] Catriona MacCallum (Hindawi) Open Citations as Academic and Cultural Capital [slides] [video]
10:00-10:30 [Invited talk from the organising committee] Zeyd Boukhers (EXCITE) A Generic Approach for Reference Extraction from PDF Documents [slides] [video]
Extracting and parsing cited references from publications in PDF format is important to ensure the acknowledgement of the sources of information. However, the mention of these sources differs from one community to another and from one publication to another. This citation diversity lies mainly in the indexation style (e.g., one or several reference sections), the presence of components (e.g. editor, source, URL, etc.) and the type of references (e.g. grey literature, academic literature, etc.). In order to automatically and accurately extract and parse different kinds of references, EXCITE proposes a generic approach that combines Random Forest and Conditional Random Fields (CRF) in a coherent mechanism. Random Forest is employed for the initial classification of each line in the document, whereas CRF parses the potential reference lines into essential components (e.g., author, title, etc.). Here, different line combinations are iteratively assessed in order to obtain the proper combination with the help of a probabilistic approach.
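The two-stage division of labour described above can be caricatured as follows; in this sketch, hand-written rules stand in for the Random Forest line classifier and the CRF field parser, purely to make the pipeline shape concrete (the patterns and layout are our own illustration, not EXCITE's models).

```python
import re

def looks_like_reference(line: str) -> bool:
    # Stage 1 stand-in (Random Forest in EXCITE): decide whether a
    # line belongs to a reference section. Here we simply require a
    # leading "[n]" marker and a four-digit year.
    return bool(re.match(r"\[\d+\]", line)) and bool(re.search(r"\b(19|20)\d{2}\b", line))

def parse_reference(line: str) -> dict:
    # Stage 2 stand-in (CRF in EXCITE): label the components of a
    # reference line, assuming an "[n] Author (year) Title" layout.
    m = re.match(r"\[\d+\]\s*(?P<author>[^()]+?)\s*\((?P<year>\d{4})\)\s*(?P<title>.+)", line)
    return m.groupdict() if m else {}

def extract_references(lines):
    # Run the two stages in sequence over a document's lines.
    return [parse_reference(l) for l in lines if looks_like_reference(l)]
```

The real systems replace both stages with trained models precisely because reference layouts vary so much across communities and publications, which is the diversity problem the talk addresses.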
10:30-11:00 [Invited talk] Stephen Curry (DORA) The Declaration on Research Assessment (DORA): Opening up the measures of success [slides] [video]
In the 21st century, do we still know what it means to be a successful researcher? And where do we find the border between academic freedom of inquiry and responsibility to funders – often, ultimately, taxpayers – who look to researchers to help tackle major societal challenges? Though many are attracted to research by the thrill of the intellectual challenge and the chance to make the world a better place, the increasing performance management of scientific and scholarly enquiry is straining the health of the system. Few would deny our responsibility not only to deliver discoveries, innovations and insights that are exciting and relevant, but also to be mindful of the need to do so reliably, and to disseminate our findings (and our methods, data, and reagents) as rapidly and as widely as current technology permits. For now we are struggling to meet these demands, in part because the misuse of metrics such as impact factors, h-indices, and university rankings in research assessment reinforces definitions of value that are at the same time too vague and too narrow. Initiatives such as DORA, especially when allied to the developing goals of open science, can help us restore the well-being of research assessment, of research – and of researchers.
11:00-11:30 Coffee break
11:30-12:00 [Invited talk] Diego Valerio Chialva (European Research Council) From Open Citation Data to Linked Open Data: a prototype at the ERC [slides] [video]
After briefly examining the advantages of open data, and in particular of open citation data, for the monitoring and evaluation of research and research funding, in this talk I discuss how the key emerging issue within an open data framework is linking data – in particular, linking open citations to other open data. I will then present the work that Mr Alexis-Michel Mugabushaka and I have been doing in Unit A1 at ERCEA, in collaboration and open discussion with other interested external parties and research groups, in prototyping an open research graph: modelling relevant new data ontologies, constructing the graph itself, and using it for analysis.
12:00-12:15 [Selected talk] Sergey Parinov (RANEPA) Open citation content data [slides] [video]
The CyrCitEc project creates a source of open citation content data. It is funded by the Russian Presidential Academy of National Economy and Public Administration (RANEPA). The project has two main aims: 1) to create a public service for processing the full text of available research papers (particularly in PDF, with a main focus on the Social Sciences), in order to build and regularly update an open dataset of citation relationships and citation content; 2) to use the citation content data to develop methods of qualitative citation analysis, which can be used to improve current practice in research performance assessment. The project aims to provide a pilot version of an open scholarly infrastructure based on the following pillars:
1. An open distributed architecture: a concept, open source software and an initial core infrastructure for interoperable systems that process citation relationships and their content from the full text of research papers.
2. Two initial nodes of this core infrastructure, the interacting CitEc and CyrCitEc systems. These nodes currently exchange citation data, and each specialises in processing papers in specific languages: Romano-Germanic languages for CitEc and Russian for CyrCitEc. Other nodes, e.g. specialised in processing citation data in languages such as Chinese, Japanese or Arabic, can be added in the same way. There is also an intention to integrate the reference data into the OpenCitations Corpus.
3. Transparency: publishers, authors and readers can see, for each paper, how its citation data are extracted by the system, and can trace why some papers' references or in-text citations are not processed or not counted.
4. Better representation and usability of citation data through deeper integration with digital library tools and services.
5. Enrichment facilities: the system provides tools for authors to enter additional data to correct citation-processing errors and to enrich their citation relationships, e.g. with qualitative characteristics of their motivation for citing other authors' papers.
6. Public control: readers can see how authors have used the enrichment facilities to increase their citation counts, and the public will be able to react to improper author behaviour.
CyrCitEc takes paper metadata from the Socionet digital library, which also includes a full set of metadata from RePEc.
12:15-12:30 [Selected talk] Daniel Ecer (eLife Sciences) Citation Sentiment [slides] [video]
Not all citations are equal. Some citations are positive, while others may actually criticise the referenced work. In this talk we present the results of a project to analyse the sentiment of citations, the challenges in obtaining training data, and why out-of-the-box models trained on Twitter may not perform as well on the subtle language used in scientific manuscripts.
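The classification task can be illustrated with a toy sketch. A hand-made keyword lexicon stands in here for a trained sentiment model; it of course misses exactly the hedged scholarly language the talk discusses, which is the point of training on in-domain data:

```python
# Toy citation-sentiment classifier. The lexicons are illustrative, not
# taken from the project described in the talk.
POSITIVE = {"seminal", "elegant", "confirms", "consistent", "robust"}
NEGATIVE = {"fails", "contradicts", "overestimates", "flawed", "unable"}

def citation_sentiment(sentence):
    """Label a citing sentence by counting lexicon hits."""
    words = {w.strip(".,;[]").lower() for w in sentence.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(citation_sentiment("This elegant study confirms earlier findings [3]."))
```

A sentence such as "these results are broadly in line with [3], although..." would defeat any such lexicon, which is why curated training data from scientific text matters.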
12:30-12:45 [Selected talk] Colin Batchelor (Royal Society of Chemistry) The Cambridge Metrics Group and article-level-metrics [slides] [video]
In this talk we present a consortium-led case study on article-level metrics, with a focus on their scientific usefulness and statistical validity. The consortium consists of a number of non-profit organisations involved in academic publishing, originally drawn from the Cambridge (UK) area. The consortium meets regularly to discuss data science issues associated with publishing, including topics as diverse as identifying usage patterns, gender bias, user experience, data repositories, and article-level metrics. A subset of the members (Cambridge University Press, the Royal Society of Chemistry, PLoS, eLife and The Company of Biologists) decided to share data so that an analysis of scientific validity could be undertaken. To attempt to ascertain scientific validity, we looked at a multitude of commonly used measures and applied a variety of statistical methods to group them into sets of distinct factors. The results of the factor analysis were sets of metrics that measure different aspects of a paper's impact. We found that while these patterns of impact differ across publishers, there are clear commonalities in which metrics fall together. To identify the usefulness of these different sets of metrics, we then aligned them with the measures identified through the Snowball Metrics initiative. In this way, we have started to identify high-level metrics that are statistically distinct, in that they measure different phenomena, and useful, in that they are similar to measures independently proposed by a consortium of universities. We present the latest results from this ongoing project.
14:00-14:15 [Selected talk] Nees Jan van Eck (Centre for Science and Technology Studies, Leiden University) Visualizing science based on open data sources [slides] [video]
I will demonstrate the use of the VOSviewer software, of which I am one of the developers, for creating bibliometric visualizations of science based on openly available bibliographic data sources. Both the use of Crossref data and the use of data from the OpenCitations Corpus will be demonstrated. In addition, I will show how data from Dimensions can be used. The possibilities and limitations of the currently available open data sources will be discussed, also in comparison with more established data sources such as Web of Science and Scopus. Finally, I will provide my perspective on future developments, focusing especially on the integration of open data sources and visual analysis tools.
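For readers who want to experiment with the same kind of open data, reference lists can be retrieved from the Crossref REST API (GET https://api.crossref.org/works/&lt;doi&gt;) and turned into citation edges of the sort visualization tools consume. The sketch below parses a hand-made sample in that response shape rather than a live response; the DOIs are placeholders:

```python
import json

# Hand-made sample shaped like a Crossref /works/<doi> response
# (placeholder DOIs, not real records).
sample = json.loads("""
{"message": {"DOI": "10.1000/example",
             "reference": [{"key": "ref1", "DOI": "10.1000/cited-a"},
                           {"key": "ref2", "unstructured": "Doe J., 2001."}]}}
""")

def cited_dois(work):
    """Return the DOIs of cited works that Crossref could match;
    unmatched references only carry an 'unstructured' string."""
    return [r["DOI"] for r in work["message"].get("reference", []) if "DOI" in r]

# A citation edge list: (citing DOI, cited DOI) pairs.
edges = [(sample["message"]["DOI"], d) for d in cited_dois(sample)]
```

Note that, as discussed in the talk, coverage is a real limitation: many references in open sources lack matched DOIs, so edge lists built this way are incomplete.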
14:15-14:30 [Selected talk] Nataliia Kaliuzhna (Independent researcher) ScientoMiner ICR - the Gephi plugin for importing scholarly citations data from Crossref services [slides] [video]
The author presents the sources of problems in conducting bibliometric research based on citation data analysis, and points out that the increasingly widespread use of the DOI identification system, together with the growing number of article references being published openly by individual publishers, creates new opportunities for such research. Particularly noteworthy here are the Crossref and OpenCitations services, which share structured bibliographic information (including citations) with all interested researchers. The author demonstrates the functionality of a newly developed plugin (ScientoMiner ICR) for the Gephi analytical platform that imports citation data from Crossref services, enabling citation analysis for anyone interested in this source of research data. The capabilities of this module will be presented through an analysis of citations from selected Ukrainian journals. (Anna Kaminska, Serhii Nazarovets, Nataliia Kaliuzhna)
14:30-14:45 [Selected talk] Finn Årup Nielsen (Technical University of Denmark) Scholia as of September 2018 [slides] [video]
Scholia is a website that visualizes scientific information from Wikidata using the SPARQL-based Wikidata Query Service. Scholia shows profiles, e.g., for researchers, organizations, countries, publishers, events, awards and topics (including chemicals), with tables and visualizations. Scholia can be used for researcher profiling, research analytics, bibliographic reference management with LaTeX, and the discovery of new research. We continuously expand the functionality of Scholia. I will give an update on its current state.
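As an illustration of how such profiles are driven, the sketch below builds a SPARQL query of the kind sent to the Wikidata Query Service: works authored by a given person. P50 is Wikidata's "author" property; the item ID used here is a placeholder, and the query is only constructed, not executed against the live service:

```python
def works_by_author_query(author_qid):
    """Build a SPARQL query for works whose author (wdt:P50) is the
    given Wikidata item. Illustrative of Scholia-style queries."""
    return f"""
SELECT ?work ?workLabel WHERE {{
  ?work wdt:P50 wd:{author_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 100
""".strip()

# Placeholder item ID; a real call would use the QID of an actual person.
query = works_by_author_query("Q463303")
```

In practice the query string would be sent to the Wikidata Query Service SPARQL endpoint and the results rendered as the tables and plots shown on a Scholia profile page.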
14:45-15:00 [Selected talk] Daniel Mietchen (Data Science Institute, University of Virginia) A guided tour through citation networks around public health emergencies [slides] [video]
Citation networks provide a way to explore how knowledge spreads. In the context of public health emergencies, the timeliness of this spreading is of special concern. In this talk, I will explore citation networks around public health emergencies and highlight how they change on the time scale of specific emergencies like the Ebola or Zika virus outbreaks.
15:00-15:30 [Invited talk from the organising committee] Silvio Peroni (OpenCitations) The OpenCitations Corpus, its data and its interfaces: present status and future plans [slides] [video]
no spoilers
15:30-16:00 Coffee break
16:00-16:50 Round table
16:50-17:05 Workshop closing by David Shotton [video]
17:10-19:10 Sightseeing tour

Day 3 (Hack day) | Wednesday, September 5

9:30-9:45 Welcome and introductions
9:45-10:30 Presenting API and data
10:30-11:00 Pitching projects and ideas
11:00-11:20 Coffee break
11:20-11:30 Formation of working groups
11:30-13:00 Hack session 1
14:00-16:00 Hack session 2
16:00-16:20 Coffee break
16:20-17:20 Show and Tell
17:20-17:30 Closing of day 3

Our organisers

David Shotton
OpenCitations | University of Oxford
Johanna McEntyre
Maria Levchenko
Marilena Daquino
University of Bologna
Philipp Mayr
EXCITE | GESIS - Leibniz Institute for the Social Sciences
Silvio Peroni
OpenCitations | University of Bologna
Steffen Staab
University of Koblenz-Landau

OpenCitations runs the OpenCitations Corpus (OCC), an RDF repository of open scholarly citation data harvested from the scholarly literature.

OpenCitations Service Organisation

The EXCITE Project is developing a tool chain of software components for reference extraction from PDF documents, to be applied to existing scientific bibliographic databases.

EXCITE Project

Europe PMC is a database for life-science literature and a platform for text-based innovation.

Europe PMC Infrastructure


Invited speakers

Since 2015, Ginny has been developing a new team at Crossref encompassing outreach and education, user experience and support, and metadata strategy.

Ginny Hendricks Crossref

Dario is a social computing researcher and open knowledge advocate. He is the Director, Head of Research at the Wikimedia Foundation.

Dario Taraborelli Wikimedia Foundation / I4OC

Stephen is a professor of structural biology at Imperial College London. He is also chair of the DORA steering group.

Stephen Curry Imperial College London | DORA

Catriona has more than 19 years' experience in scholarly publishing and 14 years in Open Access publishing. She is Director of Open Science at Hindawi.

Catriona MacCallum Hindawi

Stephanie spent over 10 years in the academic publishing industry in the fields of biology and chemistry. She is CEO of ScienceOpen.

Stephanie Dawson ScienceOpen

Diego works at the European Research Council and is responsible for the data infrastructure, the information flow architecture and the policy analysis.

Diego Valerio Chialva European Research Council

Philipp is a WP leader in the EXCITE project and organizer of the workshop series on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries.

Philipp Mayr EXCITE | GESIS - Leibniz Institute for the Social Sciences

Jo McEntyre is Team Leader for Literature Services at the European Bioinformatics Institute (EMBL-EBI), where she is responsible for developing Europe PMC, the European database for full-text life science research articles.

Jo McEntyre Europe PMC | EMBL-EBI

Zeyd is a postdoctoral researcher and a team member of the project EXCITE. Currently, he is involved in the extraction and segmentation of reference strings from Social Science publications.

Zeyd Boukhers EXCITE | University of Koblenz-Landau

David is Co-Director of the OpenCitations Project, and a founding member of the Initiative for Open Citations (I4OC) and of Force11. For the last decade he has pioneered the field of Semantic Publishing, employing semantic web technologies.

David Michael Shotton OpenCitations | University of Oxford

Silvio Peroni is an Assistant Professor at the University of Bologna. He is one of the main developers of the SPAR Ontologies, Director of OpenCitations, and a founding member of the Initiative for Open Citations (I4OC).

Silvio Peroni OpenCitations | University of Bologna


Agata Rotondi
Research Associate, Department of Computer Science and Engineering, University of Bologna, Italy
Alessandra Auddino
Master student, DHDK, University of Bologna, Italy
Andrea Mannocci
Research Associate, Knowledge Media Institute - Open University, Milton Keynes, UK
Angelika Tsivinskaya
Researcher, Center for Institutional Analysis of Science & Education, European University at Saint Petersburg, Russia
Angelo Di Iorio
Senior Assistant Professor, Department of Computer Science and Engineering, University of Bologna, Italy
Anne Lauscher
PhD candidate, Universität Mannheim, Germany
Astrid Orth
Project Manager, Niedersächsische Staats- und Universitätsbibliothek Göttingen (SUB), Germany
Bianca Gualandi
Digital Content Manager, Open Book Publishers, Cambridge, UK
Bianca Kramer
Librarian, Utrecht University Library
Bilal Hayat Butt
Assistant Professor, D.H.A. Suffa University (DSU), Karachi, Pakistan
Barney Walker
PhD candidate, Centre for Synthetic Biology, Imperial College London, UK
Catriona MacCallum
Director of Open Science, Hindawi
Chiara Storti
Librarian, National Central Library of Florence - ITC Department, Italy
Colin Batchelor
Data scientist, Royal Society of Chemistry, UK
Daniel Ecer
Data scientist, eLife Sciences
Daniel Mietchen
Data Science Institute, University of Virginia, USA
Dario Taraborelli
Director, Head of Research, Wikimedia Foundation, San Francisco, USA
@ReaderMeter | website
David Shotton
Co-director, OpenCitations | Senior Researcher, University of Oxford, UK
@dshotton | website
Deborah Grbac
Librarian, Università Cattolica del Sacro Cuore, Milano, Italy
Diego Valerio Chialva
European Research Council
Dominika Tkaczyk
R&D Developer, Crossref
Finn Årup Nielsen
Associate Professor, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Denmark
@fnielsen | website
Francesca Giovannetti
Master student, DHDK, University of Bologna, Italy
Francesca Tomasi
Assistant Professor, Department of Classical Philology and Italian Studies; Coordinator of DHDK, University of Bologna, Italy
Francesco Citti
Professor, Head of the Department of Classical Philology and Italian Studies, University of Bologna, Italy
Freddy Limpens
Research Fellow, Department of Computer Science and Engineering, University of Bologna, Italy
Gautam K. Shahi
Master student, DISI, University of Trento, Italy
Gianmarco Spinaci
Master student, DHDK, University of Bologna, Italy
Giovanni Colavizza
Senior research data scientist, The Alan Turing Institute, London, UK
Ginny Hendricks
Director of Member & Community Outreach, Crossref
Ivan Heibi
Research Assistant, Department of Computer Science and Engineering, University of Bologna, Italy
Johanna McEntyre
Team Leader for Literature Services, European Bioinformatics Institute (EMBL-EBI), UK | Europe PMC
Jodi Schneider
Assistant Professor, School of Information Sciences, University of Illinois at Urbana-Champaign, USA
John Samuel
Assistant Professor, CPE Lyon, France
Laurel Zuckerman
Independent researcher and writer
Luc Boruta
CEO, Thunken
Ludo Waltman
Senior Researcher, CWTS, Leiden University, NL
Mairelys Lemus-Rojas
Digital Initiatives Metadata Librarian, IUPUI University Library, Indiana University - Purdue University Indianapolis, USA
Maria Levchenko
Community Manager, European Bioinformatics Institute (EMBL-EBI), UK | Europe PMC
Marilena Daquino
Research Assistant and PhD candidate, Department of Classical Philology and Italian Studies, University of Bologna, Italy
Martin Fenner
Technical Director, DataCite
Matteo Romanello
Digital Humanities Specialist, École polytechnique fédérale de Lausanne, CH
Nataliia Kaliuzhna
Librarian, Independent Researcher
Nees Jan van Eck
Senior Researcher, CWTS, Leiden University, NL
Philipp Mayr
Department Head and team leader, GESIS - Leibniz Institute for the Social Sciences, Germany
Piero Grandesso
Responsible for Open Access Journal services (AlmaDL Journals), ABIS, University of Bologna, Italy
Rachel Kotarski
Head of Research Infrastructure Services, British Library
Ross Mounce
Director of Open Access Programmes, Arcadia Fund
Sahar Vahdati
PhD candidate, Smart Data Analytics (SDA), University of Bonn, Germany
Sara Ricetto
Open Access repository management Librarian, Università Cattolica del Sacro Cuore, Milano, Italy
Sergey Parinov
Leader of Development Team of CyrCitEc project, RANEPA | Russian Academy of Sciences, Moscow, Russia
Silvio Peroni
Senior Assistant Professor, Department of Classical Philology and Italian Studies, University of Bologna, Italy
Steffen Lemke
Research Assistant and PhD candidate, ZBW Leibniz Information Centre for Economics, Germany
Steffen Staab
Professor, University of Koblenz-Landau, Germany
@ststaab | website
Stephanie Dawson
CEO, ScienceOpen
Stephen Curry
Professor, Imperial College London, UK | Chair at DORA
Vittorio Grieco
Student of Medicine and Surgery, University of Bologna, Italy
Zuang Huang
European Bioinformatics Institute (EMBL-EBI), UK
Zeyd Boukhers
Postdoctoral Researcher, University of Koblenz-Landau, Germany

travel information

Welcome to Bologna!

The University of Bologna is the oldest university in the western world, and one of the largest universities in Italy (with about 90,000 enrolled students).

Get to Bologna

The airport “Guglielmo Marconi” is a 15-20 minute drive from the city centre. A direct bus, the Airport Bus BLQ, leaves regularly from the airport for Bologna Central Rail Station. The trip costs 6€. A taxi costs about 15-20€ (call a taxi on 0039 051 372727).

The venue

The workshop will take place at the University of Bologna, in the heart of the city, at the School of Arts, Humanities and Cultural Heritage (in Italian: Scuola di Lettere e Beni Culturali), via Zamboni 34, Room "Aula Affreschi". From the train station you can either walk to FICLIT (20 minutes) or take bus C (direction "Cestello"; see the timetable, page 2).


The organizers are negotiating with local hotels for rooms to be reserved and made available at special rates for participants. More information available here.

About the location

Bologna is home to numerous prestigious cultural, economic and political institutions, as well as one of the most impressive trade fair districts in Europe. In 2000 it was declared European Capital of Culture and, in 2006, a UNESCO "City of Music".

For any enquiry

contact us

Contact Info

Where to Find Us

via Zamboni, 32 (first floor) | Bologna, BO | 40126 Italy

Any Doubt?

open an issue on GitHub!

Email Us