September 3-5, 2018 | Bologna, Italy

Workshop on Open Citations

the workshop


About 500 million open bibliographic citations are available on the web. We invite to this workshop researchers, scholarly publishers, funders, policy makers, and open citation advocates interested in the widespread adoption of practices for the creation, reuse and improvement of open citation data.


call for participation


Applications are now closed! The program for platform presentations is complete, and all places for the Hack Day have been assigned. Should you wish to attend the workshop, please contact the Workshop committee, and you will be added to the waiting list.

Application deadline: May 20, 2018
Notification of acceptance: June 1, 2018 (July 9, 2018 for late applicants)
Registration deadline: June 30, 2018 (July 13, 2018 for late applicants)

Proposals should address one of the following topics:

Opening up citations

Initiatives, collaborations, methods and approaches for the creation of open access to bibliographic citations.

Policies and funding

Strategies, policies and mandates for promoting open access to citations, and transparency and reproducibility of research and research evaluation.

Publishers and learned societies

Approaches to, benefits of, and issues surrounding the deposit, distribution, and services for open bibliographic metadata and citations.


Metrics, visualizations and other projects

The uses and applications of open citations, and bibliometric analyses and metrics based upon them.


Day 1 | Monday, September 3

9:00-9:30 Registration and breakfast
9:30-9:40 Greetings by FICLIT representative (Francesca Tomasi, DHDK Director) [video]
9:40-9:50 Workshop introduction by David Shotton [slides] [video]
9:50-10:20 [Invited talk] Dario Taraborelli (Wikimedia Foundation / I4OC) Remixing the graph [slides] [video]
Over the past few years, several datasets representing free-to-read (or partially reusable) citation data have been made available to researchers and bibliometricians. However, what one can build on top of an open citation graph vastly exceeds the value of free-to-read citation data. In this talk, I introduce the Initiative for Open Citations (I4OC) and present a key example of reuse and remix of open citation data in Wikidata — the free knowledge base that anyone can edit. With its open human and algorithmic curation model, Wikidata allows embedding bibliographic and citation datasets in a much richer set of relations than most citation indexes. It supports novel use cases that go beyond bibliometric analysis and, thanks to its open nature, makes gaps and quality issues in the underlying citation data easily auditable.
10:20-10:50 [Invited talk from the organising committee] Johanna McEntyre (Europe PMC) Open Citations and Europe PMC [slides] [video]
Europe PMC is a database of the life sciences research literature that contains over 30M abstracts, including PubMed, and 5M full text articles. The mission of Europe PMC is to make this content as widely available as possible, which we do via the website, APIs and bulk downloads. Built in partnership with PMC USA, with whom the full text articles are shared, Europe PMC adds value to the core content in a number of ways, one of which is to generate article-level citation counts. These are generated from the reference lists of the full text articles in Europe PMC, supplemented with reference lists from Crossref metadata services. With the formation of I4OC, we have seen a dramatic increase in the number of citations we have been able to add to the Europe PMC network, data that we redistribute via the website and APIs. In this talk I will describe how Europe PMC manages, distributes and reuses open citations.
10:50-11:20 Coffee break
11:20-11:50 [Invited talk from the organising committee] David Shotton (OpenCitations) Why and how should we share citations openly [slides] [video]
In this presentation, I will outline the importance of citations in the world of scholarship, and how early on academics missed the boat for open citation information, leading to the growth of commercial citation indexes. After discussing problems with commercial citation indexes – cost, data availability, visualization tools, and variation in citation statistics – I will mention recent initiatives to reclaim citations for the world of open scholarship: the Initiative for Open Citations, Crossref as a reference repository, WikiCite, OpenCitations as a scholarly infrastructure organization, and the OpenCitations Corpus. I will introduce a formal definition of an open citation, and will outline the SPAR (Semantic Publishing and Referencing) ontologies and the OpenCitations Data Model that we at OpenCitations have developed to facilitate the creation of machine-readable open citation data. I will then describe the requirements for citations to be treated as first-class data entities, and the new global persistent identifier for citations that we have developed, the Open Citation Identifier (OCI), and will conclude by showing how OCIs can be used to create simple citation indexes of open citation sources such as the Crossref Open Citations Index.
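As a small illustration of how OCIs are structured, the sketch below splits an identifier into its citing and cited halves. An OCI takes the general shape "oci:<citing-id>-<cited-id>", where each half is a numeral identifying one bibliographic entity; this minimal parser is our own illustration, not OpenCitations code.

```python
def parse_oci(oci: str):
    """Split an Open Citation Identifier into its citing and cited parts.

    An OCI has the general shape "oci:<citing-id>-<cited-id>"; each
    part is a numeral identifying one bibliographic entity, so a single
    dash separates the two halves.
    """
    if oci.startswith("oci:"):
        oci = oci[len("oci:"):]
    citing, cited = oci.split("-", 1)
    return citing, cited
```

For example, parse_oci("oci:1-18") yields ("1", "18"), the two entity numbers that together name the citation itself.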
11:50-12:20 [Invited talk from the organising committee] Philipp Mayr (EXCITE) Recent advances in the project EXCITE - Extraction of Citations from PDF Documents [slides] [video]
The "Extraction of Citations from PDF Documents" (EXCITE) project at GESIS – Leibniz Institute for the Social Sciences and the University of Koblenz-Landau is in line with the Initiative for Open Citations and aims to make more citation data available to researchers, with a particular focus on the social sciences. The shortage of citation data for the international and German social sciences is well known to researchers in the bibliometrics field and has itself often been the subject of academic studies. In order to open up citation data in the social sciences, the EXCITE project is developing a set of algorithms for the extraction of citation and reference information from PDF documents and the matching of reference strings against bibliographic databases. The presentation focuses on the overall reference extraction, matching and publication workflows and tool chain. A demo of the web demonstrator completes the talk.
12:20-12:50 [Invited talk] Ginny Hendricks (Crossref) Crossref: underpinning citation opportunities [slides] [video]
no spoilers
12:50-14:00 Lunch
14:00-14:10 Announcements
14:10-14:25 [Selected talk] Martin Fenner (DataCite) Open Citations and Data Citations [slides] [video]
While the focus of open citations has so far been on opening up the reference lists of publishers working with Crossref, open citations are also highly relevant for citations of scholarly content that uses DataCite DOIs. The number of data citations in the metadata that publishers send to Crossref is very small. This presentation will discuss the current issues and how this situation can be improved going forward.
14:25-14:40 [Selected talk] Luc Boruta (Thunken) Cobaltmetrics: preventing citation decay and obfuscation [slides] [video]
Scholarly communication is becoming more and more electronic, and scholarly literature is becoming more than just papers. In this context, hyperlinks are the preferred way to reference published or unpublished sources, including datasets and software. While hyperlinks are more powerful than text-based citations in terms of user experience, they are prone to link rot. Shortened URLs add an extra level of indirection that increases the risk of breaking the permanent record of science. Cobaltmetrics monitors trusted sources to index citations, identifiers, and hyperlinks. We go deeper than other altmetrics providers: we mine data in 180+ languages, we unroll shortened URLs from 175+ shorteners, we crack open URLs to extract persistent identifiers, and we convert between 50+ types of identifiers. We will provide empirical data from an analysis of 73.5 million documents, 52.5 million citations and backlinks, and a database of 6.5 billion shortened URLs, focusing on the effects of reference rot on altmetrics indicators. We will then present good practices that all stakeholders should adopt and, most importantly, technical solutions that altmetrics providers should implement to compensate for the effects of link rot and URL variability.
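One step described above, "cracking open" a URL to extract a persistent identifier, can be sketched roughly as follows; the regular expression is a common, conservative DOI pattern chosen for illustration, not Thunken's actual implementation.

```python
import re

# A common, conservative pattern for modern DOIs (illustrative only,
# not Cobaltmetrics' production rules).
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def doi_from_url(url: str):
    """Return the first DOI-like substring found in a URL, or None."""
    match = DOI_RE.search(url)
    return match.group(0) if match else None
```

In practice such extraction must also cope with URL-encoded characters, trailing punctuation, and publisher-specific URL schemes, which is one reason unrolling shortened URLs first matters.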
14:40-14:55 [Selected talk] Ludo Waltman (Leiden University) Comparing bibliographic data sources [slides] [video]
In this talk, I will present a comparative perspective on a number of major bibliographic data sources, both ‘open’ and ‘closed’ sources. The comparison includes Web of Science, Scopus, Dimensions, Crossref, and OpenCitations Corpus. I will put special emphasis on the accuracy of the data provided by the different sources. Performing large-scale comparisons of bibliographic data sources is challenging, and therefore I will also discuss the main gaps that we have in our understanding of the strengths and weaknesses of different data sources and ways in which these gaps can be filled.
14:55-15:10 [Selected talk] Jodi Schneider (University of Illinois at Urbana-Champaign) Detecting problematic citation patterns with the Open Citations Corpus [slides] [video]
The structure of citation networks provides evidence about how scientific information is diffused. Problematic citation patterns include the selective citation of positive findings, citation bias, and the continued citation of retracted literature (i.e. literature formally withdrawn due to error, fraud, or ethical problems). For instance, there is some evidence that positive results tend to receive more citations. The public domain licensing of the Open Citations Corpus makes it possible, in principle, to estimate the likelihood that any network of research papers suffers from problematic citation. To date, problematic citation has been documented ad hoc in several striking studies. In Alzheimer's disease research, biased citation, ignoring critical findings, was used to support successful U.S. NIH grant proposals (Greenberg 2009). Mistranslation of obesity research has been used to justify exertion game research (Marshall & Linehan 2017). Citation of fraudulent research about Chronic Obstructive Pulmonary Disease continued after its retraction (Fulton et al. 2015). The data resulting from such studies is of great use to my lab in replicating and determining how to generalize the detection of problematic citation patterns. Previously, the detection of problematic citation patterns has been a side effect of astute researchers noticing suspicious findings while conducting systematic literature reviews. This talk will describe work in progress in my lab on detecting problematic citation patterns using natural language processing, combined with network analysis on the Open Citations Corpus.
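One of the checks described above, finding continued citation of retracted literature, can be sketched as a simple filter over a citation network. The data layout here is invented for illustration; a real analysis would draw its edges from the Open Citations Corpus and its retraction data from a source such as retraction notices.

```python
def post_retraction_citations(edges, retracted):
    """Flag citations of retracted papers made after retraction.

    edges: iterable of (citing_id, cited_id, citing_year) triples.
    retracted: dict mapping a paper id to its retraction year.
    Returns the edges that cite a paper after its retraction year.
    """
    return [
        (citing, cited, year)
        for citing, cited, year in edges
        if cited in retracted and year > retracted[cited]
    ]
```

Given edges [("A", "R", 2016), ("B", "R", 2014)] and retracted {"R": 2015}, only A's 2016 citation of R is flagged, since B cited R before the retraction.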
15:10-15:25 [Selected talk] Anne Lauscher (Universität Mannheim), Kai Eckert (Hdm Stuttgart), Lukas Galke (ZBW Kiel) Libraries as Curators of Open Citations: perspectives of the project LOC-DB in Germany [slides] [video]
Creation of citation data is not part of the cataloging workflow in libraries nowadays – but traditional tasks are changing, and libraries have a decided interest in the availability of open citation data that can be used for retrieval and bibliometrics. We have started the Linked Open Citation Database (LOC-DB) project to enable libraries to contribute to the creation and curation of open citations. We provide an interface for librarians to incrementally catalog citation metadata with low effort. The librarians are supported by a wide range of helpful tools, such as a reference extractor for print as well as digital documents, along with a suggestion engine that harvests external data sources for citation candidates. The database itself is prepared for distributed deployment, such that multiple libraries can collaborate. At any time, the data can be supplied as linked data in the OpenCitations data format. The goal is to show that efficiently cataloging citations of print and electronic resources in libraries is possible. In the presentation, we'll describe the current state of the workflow and its implementation. We show that automatic reference extraction from scanned print resources could be significantly improved with the implementation of an automatic reference extraction component based on a deep learning approach. We further give insights into the curation and linking process and provide first evaluation results.
15:25-15:40 [Selected talk] Matteo Romanello (École polytechnique fédérale de Lausanne), Giovanni Colavizza (The Alan Turing Institute) The Scholar Index: a Distributed, Collaborative Citation Index for the Arts and Humanities [slides] [video]
Freeing citation data is an issue of essential importance and urgency for scholarly communication, which has recently been getting the attention it deserves thanks to the Initiative for Open Citations. Simultaneously, we should also be concerned with a complementary issue: the necessity to "recover" citation data from digitized publications. This applies especially to the Humanities, whose fields often have century-long traditions. Otherwise, we risk creating citation indexes that take into account only recent publications, mostly in English, while a gap ensues for citations buried in older publications. The creation of a comprehensive citation index for the Arts and Humanities, however, is a titanic endeavour which can only be accomplished with a collaborative, distributed approach, where cultural heritage institutions (e.g. libraries, archives, etc.) play a key role. In this talk we present the Venice Scholar, a citation index of literature on the history of Venice, indexing nearly 3000 volumes of scholarship from the mid 19th century to 2013, from which some 4 million bibliographic references have been extracted. The Venice Scholar, to be publicly launched in September 2018, is the first running instance of the Scholar Index, a platform aimed at creating a comprehensive citation index for the Arts and Humanities. This platform consists of two applications, the Scholar Library (SL) and the Scholar Index (SI), both to be released soon under an open source license. The SL is a digital library system where partner institutions can load their digitized scholarly literature. The system embeds the necessary machine learning components to recognize the text from an image (OCR), extract references and link them to unique identifiers pointing to external resources (e.g. library catalogues).
Each partner institution keeps an instance of the digital library system and its own collection. The SI is the global citation index, which federates all citations extracted by the different institutions into a unique index, and provides a rich search interface to navigate through the resulting network of citations, with the final aim of interlinking digital archives and digital libraries. In fact, the SI is currently being extended, thanks to a Europeana Research Grant, to provide contextual recommendations of related digital objects from Europeana to its users. The citation data underlying the Venice Scholar are modelled using the OpenCitations Data Model, and will use the OpenCitations Corpus as their publication platform, thus enriching this corpus with some 4 million references "recovered" from historical and current publications about the history of Venice. To conclude, we believe that the creation of a citation index for the Arts and Humanities can only be accomplished through a collaborative and federated approach, and by leveraging infrastructure synergies, such as the one with the OpenCitations Corpus. In this process, libraries and other institutions should take responsibility for specific areas of knowledge (e.g. a journal, a publisher, or a topic) and, at the same time, be facilitated (e.g. through software) in the task of enriching their digitized collections with citation data.
15:40-16:10 [Invited talk] Stephanie Dawson (ScienceOpen) Open Citations in Action: Case Study ScienceOpen [slides] [video]
Viewed collectively, an article’s references contain a wealth of information. Networks of citations can trace the genealogy of ideas, and through reference lists one can see methods, theories, and ideologies introduced and die away. Until recently, the references of an article were under strict copyright control and tracking citations was left up to big business, but the open science movement has started to change that. New initiatives such as I4OC, the Initiative for Open Citations, are spearheading a growing consensus that citations are metadata and therefore should be freely accessible via Crossref, regardless of article license type. The discovery platform ScienceOpen is a case study in the kind of search environment that can be built on top of an open citations commons. ScienceOpen recently added over 100 million citation connections between articles, and uses its powerful search engine to expose this knowledge to researchers for a richer discovery experience. This enables us to provide users with tools to track their own article citations through time, trace citation networks, discover similar articles, discover highly-cited articles in a research field, sort searches by citations, and ultimately create a vast, contextual citation-based network for an intelligent search and discovery experience. ScienceOpen provides a good example of open citations – and open science – in action.
16:10-18:00 Coffee break and Poster session
[Poster] Angelo Di Iorio (University of Bologna) Semantic Coloring of Academic References [pdf]
The talk introduces the SCAR project, a collaboration between the University of Bologna and Elsevier, whose goal is to enrich citations with explicit information about their role, features and impact. The name of the project - Semantic Coloring of Academic References - reflects the fact that bibliographies should not be considered as plain lists of entries, but as lists of qualified references that need to be identified and shown appropriately, e.g. by means of different colors. Multiple coloring schemes can be applied to the same bibliography according to different criteria, such as publication year, authorship, citation contexts, justification for citing and so on. The goal of SCAR is to build a prototype that extracts such information from the full text of articles and enriches bibliographies with such metadata. The project was not originally linked to OpenCitations, but its results and issues could be relevant for OC, just as OC tools and data could be beneficial for SCAR.
[Poster] Bianca Kramer (Utrecht University Library) DOI wanted - community involvement in open citations [pdf]
Open citations are an important cogwheel in the engine of open scholarly infrastructure. At the same time, their development shows both the power and limitations of established parties, restricting their potential use for the wider scholarly community. For example, the quality of open citations as metadata provided by publishers is often still suboptimal. It can be argued that stricter requirements as to the format and quality of such metadata would discourage publishers from supplying them in the first place (thus slowing the growth of the number of open citations made available). In combination with limiting the supply of open citation metadata to publishers, though, a catch-22 situation is created that limits the quality of open citations, and thus, their usability. Simultaneously, commercial parties have ingested these incomplete metadata and improved on them, only to subsequently monetize their use in applications without contributing back to the underlying corpus of open data. While there is rightly no limitation on the use and reuse of open citations, I would like to explore models that would better allow the scholarly community to contribute to the quality and value of open citations metadata on the one hand, and encourage their sustainable use on the other hand. The former could be envisioned through forms of crowdsourced improvement of open citations metadata (in which Wikidata could play an important role). An example of the latter would be the monetization of services built on enriched open citation data, without enclosing the data itself. Both models would enable the scholarly community as a whole to not only make optimal use of open citations, but also contribute to their value. By making the wheels turn smoother, we'll collectively get further!
[Poster] Astrid Orth (SUB Göttingen) How do citation-based and alternative metrics benefit each other? [pdf]
I would like to present the approach of the *metrics project to measure the reliability and perception of indicators for interactions with scientific products: 1. analysis of how researchers (focussing on the social sciences and economics) interact on social media platforms, and how their motivations form patterns that need to be taken into account when measuring output on social media channels; 2. provision of a social media registry describing the functions and accessibility of social media platforms relevant for scholarly communication; and 3. a prototype of a crawler that gives insights into the reliability of current social media metrics. The combination of these three elements allows a larger degree of transparency when studying scholarly communication. Some shortfalls of alternative metrics are becoming obvious, but comparable ones can be seen in citation metrics. It should thus be most interesting to discuss during this workshop how open citation-based and alternative metrics can benefit each other in providing a complete and true picture of scholarly communication.
[Poster] John Samuel (CPE Lyon) WikiProvenance: Are there enough references to every (known) fact on Wikidata? [pdf]
In a collaborative website like Wikidata, contributors add statements about an item such as a person, a scientific subject, or a historical place. Each of these statements needs to be backed by a reference, yet a large number of statements are added without references. Even well-described items (i.e., items with many statements) like Douglas Adams or Albert Einstein do not have references for all of their facts. During the past few years, there has been a lot of focus on increasing the number of statements of items, usually referred to as item completion efforts. Yet the references behind statements cannot be neglected. It is equally important to understand the use of external identifiers and links to existing multilingual Wikimedia projects for these items, since many of these sources also contain a large number of references. Both contributors and users need an overview of the reference statistics. With WikiProvenance, the goal is to understand and delve into the details of the usage of references and the links to external sources in an open and transparent manner.
[Poster] Gautam Kishore Shahi (University of Trento) Semantics Aware Policy Making for Open Citations
Policy-making is one of the significant fields of research in which an organization tries to improve itself or its existing systems through reform. Citations play a vital role in research, and it has been observed that, due to a lack of proper analysis of available resources, the dispensation of funds is uneven. In this paper, a semantic web-based system is proposed that focuses on extracting the right knowledge about citations. This information can then be used for policy-making by the respective organization. An ontology model has been proposed for this task. The central focus has been to extract only the information that is useful for citation policy-making. Therefore, knowledge of ontologies and citations has been combined to produce a semantic web-based citation policy-making system.
[Poster] Angelika Tsivinskaya (Center for Institutional Analysis of Science & Education, European University at Saint Petersburg) Bibliographic metrics as performance evaluation measure for Higher Education Institutions in Russia [pdf]

Since 2006, the Russian government has made energetic attempts to boost the academic performance of the country’s universities, to shut down underperforming institutions and to turn the survivors into world-class schools. Since 2012, Russian policies in the higher education sphere have largely been directed by the results of the Survey of Efficiency of Higher Education Organizations (Monitoring of Efficiency of Educational Organizations) – a set of quantitative performance indicators used to sort efficient from non-efficient organizations. There are also several governmental programs for funding the best universities, one of them being “5–top 100”, started in 2013. Most of those initiatives include assessment of publication activity. Research has shown that these funding programs had positive effects on publication activity (Turko, 2016), and there are also studies of collaborations for the most cited papers (Pislyakov, 2014). Those studies mostly focused on successful universities and their specific characteristics. Many universities are closely connected to a small number of fields, and currently used bibliographic metrics do not consider this difference in publication dynamics across fields (Piro, 2014). The main purpose of our study is to show how the representation of universities in different citation systems depends on such factors as: public or private status; location in a bigger city or a wealthy region; nominal profile (based partly on ties to specific fields); age; and the ecological situation in the local market for higher education. For the purposes of our study we used data collected from the Monitoring of Efficiency 2013-2017, which includes (per 100 academic employees): number of citations in Web of Science, number of citations in Scopus, number of citations in the Russian Scientific Citation Index, number of publications in Web of Science, number of publications in Scopus, and number of publications in the Russian Scientific Citation Index.
Our study shows that, facing publication pressures, universities have a stable positive trend in the number of publications, but our data show that extreme variability in bibliographic metrics exists between different university families. Thus, so-called “classical”, polytechnic and medical universities have a higher median number of publications in Web of Science than others, while universities majoring in the social and economic sciences, especially ones attached to various ministries, have the highest median number of publications in the Russian Scientific Citation Index. The results show that the ascriptive variables account for a large share of variance, with families being particularly important.

References:
Piro F. N., Aksnes D. W., Rørstad K. (2013) A Macro Analysis of Productivity Differences across Fields: Challenges in the Measurement of Scientific Publishing. Journal of the American Society for Information Science and Technology, vol. 64, no. 2, pp. 307–320.
Pislyakov V., Shukshina E. (2014) Measuring Excellence in Russia: Highly Cited Papers, Leading Institutions, Patterns of National and International Collaboration. Journal of the Association for Information Science and Technology, vol. 65, no. 11, pp. 2321–2330.
Turko T., Bakhturin G., Bagan V., Poloskov S., Gudym D. (2016) Influence of the Program «5–top 100» on the Publication Activity of Russian Universities. Scientometrics, vol. 109, no. 2, pp. 769–782.

[Poster] Barney Walker (Imperial College London) Citation Gecko: A Tool for Literature Discovery and Exploration using Localised Citation Networks [pdf]
Citation Gecko is a new, open-source web app that gives researchers a bird’s-eye view of the relevant literature. Using openly available citation data, it constructs and visualises the local citation network in the researcher’s area, helping them discover literature they may have missed and make sense of how papers are connected. Why is this so useful? Traditionally, the literature review process has involved iteratively searching keyword combinations and manually following references from one paper at a time. With this method it’s easy for important papers to slip through the net and difficult to prioritise what to read first. How does it work? Gecko circumvents the need for researchers to define their area of interest in keywords by simply starting with a set of ‘seed papers’ which are representative of their area of interest. Gecko then finds all the papers that a) cite, b) are cited by, or c) are co-cited with the seed papers in order to build a ‘localised’ citation network. Visualising this network gives researchers an overview of how the different papers fit together. Local, rather than global, metrics can then be defined on the network, providing suggestions that are more likely to be relevant to the researcher. New seed papers can then be added on the go from among the results, expanding the network and building up a more complete map of the literature. Citation Gecko is designed to fit with a researcher’s workflow, integrating directly with reference managers such as Zotero and allowing for upload of BibTeX files, as well as providing native search functions. By demonstrating how open citations can lead to greater discoverability of research articles, we hope to incentivise more publishers to open up their citation data.
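The seed-based expansion described above can be sketched in a few lines. This toy function is our own illustration rather than Gecko's code: given an in-memory citation lookup, it collects the papers that cite, are cited by, or are co-cited with a set of seeds.

```python
def local_network(seeds, cites):
    """Collect candidate papers connected to a set of seed papers.

    seeds: set of seed paper ids.
    cites: dict mapping a paper id to the set of ids it cites.
    Returns papers that cite, are cited by, or are co-cited with the
    seeds (the seeds themselves are excluded).
    """
    seeds = set(seeds)
    # (b) papers the seeds cite
    cited_by_seeds = set().union(*(cites.get(s, set()) for s in seeds))
    # (a) papers whose reference lists include a seed
    citing_seeds = {p for p, refs in cites.items() if refs & seeds}
    # (c) papers appearing alongside a seed in some reference list
    co_cited = set().union(*(refs for refs in cites.values() if refs & seeds))
    return (cited_by_seeds | citing_seeds | co_cited) - seeds
```

In the real app the citation lookup is of course not in memory but fetched from open citation data sources, and the candidate set is ranked with local network metrics before being shown to the user.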
19:30-22:00 Social dinner

Day 2 | Tuesday, September 4

9:30-10:00 [Invited talk] Catriona MacCallum (Hindawi) Open Citations as Academic and Cultural Capital [slides] [video]
10:00-10:30 [Invited talk from the organising committee] Zeyd Boukhers (EXCITE) A Generic Approach for Reference Extraction from PDF Documents [slides] [video]
Extracting and parsing cited references from publications in PDF format is important to ensure the acknowledgement of the sources of information. However, the mention of these sources differs from one community to another and from one publication to another. This citation diversity lies mainly in the indexation style (e.g., one or several reference sections), the presence of components (e.g. editor, source, URL, etc.) and the type of references (e.g. grey literature, academic literature, etc.). In order to automatically and accurately extract and parse different kinds of references, EXCITE proposes a generic approach that combines Random Forest and Conditional Random Fields (CRF) in a coherent mechanism. Random Forest is employed for the initial classification of each line in the document, whereas CRF parses the potential reference lines into essential components (e.g., author, title, etc.). Here, different line combinations are iteratively assessed in order to obtain the proper combination with the help of a probabilistic approach.
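The two-stage division of labour described above can be caricatured as follows; in this sketch, hand-written rules stand in for the Random Forest line classifier and the CRF field parser, purely to make the pipeline shape concrete (the patterns and layout are our own illustration, not EXCITE's models).

```python
import re

def looks_like_reference(line: str) -> bool:
    # Stage 1 stand-in (Random Forest in EXCITE): decide whether a
    # line belongs to a reference section. Here we simply require a
    # leading "[n]" marker and a four-digit year.
    return bool(re.match(r"\[\d+\]", line)) and bool(re.search(r"\b(19|20)\d{2}\b", line))

def parse_reference(line: str) -> dict:
    # Stage 2 stand-in (CRF in EXCITE): label the components of a
    # reference line, assuming an "[n] Author (year) Title" layout.
    m = re.match(r"\[\d+\]\s*(?P<author>[^()]+?)\s*\((?P<year>\d{4})\)\s*(?P<title>.+)", line)
    return m.groupdict() if m else {}

def extract_references(lines):
    # Run the two stages in sequence over a document's lines.
    return [parse_reference(l) for l in lines if looks_like_reference(l)]
```

The real systems replace both stages with trained models precisely because reference layouts vary so much across communities and publications, which is the diversity problem the talk addresses.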
10:30-11:00 [Invited talk] Stephen Curry (DORA) The Declaration on Research Assessment (DORA): Opening up the measures of success [slides] [video]
In the 21st century, do we still know what it means to be a successful researcher? And where do we find the border between academic freedom of inquiry and responsibility to funders – often, ultimately, taxpayers – who look to researchers to help tackle major societal challenges? Though many are attracted to research by the thrill of the intellectual challenge and the chance to make the world a better place, the increasing performance management of scientific and scholarly enquiry is straining the health of the system. Few would deny our responsibility not only to deliver discoveries, innovations and insights that are exciting and relevant, but also to be mindful of the need to do so reliably, and to disseminate our findings (and our methods, data, and reagents) as rapidly and as widely as current technology permits. For now we are struggling to meet these demands, in part because the misuse of metrics such as impact factors, h-indices, and university rankings in research assessment reinforces definitions of value that are at the same time too vague and too narrow. Initiatives such as DORA, especially when allied to the developing goals of open science, can help us restore the well-being of research assessment, of research – and of researchers.
11:00-11:30 Coffee break
11:30-12:00 [Invited talk] Diego Valerio Chialva (European Research Council) From Open Citation Data to Linked Open Data: a prototype at the ERC [slides] [video]
After briefly examining the advantages of open data, and in particular of open citation data, for the monitoring and evaluation of research and research funding, in this talk I discuss how the key emerging issue within an open data framework is linking data – in particular, linking open citations to other open data. I will then present the work that Mr Alexis-Michel Mugabushaka and I have been doing in Unit A1 at ERCEA, in collaboration and open discussion with other interested external parties and research groups, in prototyping an open research graph: modelling relevant new data ontologies, constructing the graph itself, and using it for analysis.
12:00-12:15 [Selected talk] Sergey Parinov (RANEPA) Open citation content data [slides] [video]
The CyrCitEc project creates a source of open citation content data. It is funded by the Russian Presidential Academy of National Economy and Public Administration (RANEPA). The project has two main aims: 1) to create a public service for processing the full text of available research papers (particularly in PDF, with a main focus on the Social Sciences), in order to build and regularly update an open dataset of citation relationships and citation content; 2) to use the citation content data to develop methods of qualitative citation analysis, which can be used to improve current practice in research performance assessment. The project aims to provide a pilot version of an open scholarly infrastructure based on the following pillars:
1. An open distributed architecture: a concept, open source software and an initial core infrastructure for interoperable systems that process citation relationships and their content from the full text of research papers.
2. Two initial nodes of this core infrastructure, the interacting CitEc and CyrCitEc systems. These nodes currently exchange citation data, and each specialises in processing papers in specific languages: Romano-Germanic languages for CitEc and Russian for CyrCitEc. Other nodes, e.g. specialised in processing citation data in languages such as Chinese, Japanese or Arabic, can be added in the same way. There is also an intention to integrate the reference data into the OpenCitations Corpus.
3. Transparency: publishers, authors and readers can see, for each paper, how its citation data are extracted by the system, and can trace why some papers' references or in-text citations are not processed or not counted.
4. Better representation and usability of citation data through deeper integration with digital library tools and services.
5. Enrichment facilities: the system provides tools for authors to enter additional data to correct citation-processing errors and to enrich their citation relationships, e.g. with qualitative characteristics of their motivation for citing other authors' papers.
6. Public control: readers can see how authors have used the enrichment facilities to increase their citation counts, and the public will be able to react to improper author behaviour.
CyrCitEc takes paper metadata from the Socionet digital library, which also includes a full set of metadata from RePEc.
12:15-12:30 [Selected talk] Daniel Ecer (eLife Sciences) Citation Sentiment [slides] [video]
Not all citations are equal. Some citations are positive, while others may actually criticise the referenced work. In this talk we present the results of a project to analyse the sentiment of citations, the challenges in obtaining training data, and why out-of-the-box models trained on Twitter may not perform as well on the subtle language used in scientific manuscripts.
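The classification task can be illustrated with a toy sketch. A hand-made keyword lexicon stands in here for a trained sentiment model; it of course misses exactly the hedged scholarly language the talk discusses, which is the point of training on in-domain data:

```python
# Toy citation-sentiment classifier. The lexicons are illustrative, not
# taken from the project described in the talk.
POSITIVE = {"seminal", "elegant", "confirms", "consistent", "robust"}
NEGATIVE = {"fails", "contradicts", "overestimates", "flawed", "unable"}

def citation_sentiment(sentence):
    """Label a citing sentence by counting lexicon hits."""
    words = {w.strip(".,;[]").lower() for w in sentence.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(citation_sentiment("This elegant study confirms earlier findings [3]."))
```

A sentence such as "these results are broadly in line with [3], although..." would defeat any such lexicon, which is why curated training data from scientific text matters.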
12:30-12:45 [Selected talk] Colin Batchelor (Royal Society of Chemistry) The Cambridge Metrics Group and article-level-metrics [slides] [video]
In this talk we present a consortium-led case study on article-level metrics, with a focus on their scientific usefulness and statistical validity. The consortium consists of a number of non-profit organisations involved in academic publishing, originally drawn from the Cambridge (UK) area. The consortium meets regularly to discuss data science issues associated with publishing, including topics as diverse as identifying usage patterns, gender bias, user experience, data repositories, and article-level metrics. A subset of the members (Cambridge University Press, the Royal Society of Chemistry, PLoS, eLife and The Company of Biologists) decided to share data so that an analysis of scientific validity could be undertaken. To attempt to ascertain scientific validity, we looked at a multitude of commonly used measures and applied a variety of statistical methods to group them into sets of distinct factors. The results of the factor analysis were sets of metrics that measure different aspects of a paper's impact. We found that while these patterns of impact differ across publishers, there are clear commonalities in which metrics fall together. To identify the usefulness of these different sets of metrics, we then aligned them with the measures identified through the Snowball Metrics initiative. In this way, we have started to identify high-level metrics that are statistically distinct, in that they measure different phenomena, and useful, in that they are similar to measures independently proposed by a consortium of universities. We present the latest results from this ongoing project.
14:00-14:15 [Selected talk] Nees Jan van Eck (Centre for Science and Technology Studies, Leiden University) Visualizing science based on open data sources [slides] [video]
I will demonstrate the use of the VOSviewer software, of which I am one of the developers, for creating bibliometric visualizations of science based on openly available bibliographic data sources. Both the use of Crossref data and the use of data from the OpenCitations Corpus will be demonstrated. In addition, I will show how data from Dimensions can be used. The possibilities and limitations of the currently available open data sources will be discussed, also in comparison with more established data sources such as Web of Science and Scopus. Finally, I will provide my perspective on future developments, focusing especially on the integration of open data sources and visual analysis tools.
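For readers who want to experiment with the same kind of open data, reference lists can be retrieved from the Crossref REST API (GET https://api.crossref.org/works/&lt;doi&gt;) and turned into citation edges of the sort visualization tools consume. The sketch below parses a hand-made sample in that response shape rather than a live response; the DOIs are placeholders:

```python
import json

# Hand-made sample shaped like a Crossref /works/<doi> response
# (placeholder DOIs, not real records).
sample = json.loads("""
{"message": {"DOI": "10.1000/example",
             "reference": [{"key": "ref1", "DOI": "10.1000/cited-a"},
                           {"key": "ref2", "unstructured": "Doe J., 2001."}]}}
""")

def cited_dois(work):
    """Return the DOIs of cited works that Crossref could match;
    unmatched references only carry an 'unstructured' string."""
    return [r["DOI"] for r in work["message"].get("reference", []) if "DOI" in r]

# A citation edge list: (citing DOI, cited DOI) pairs.
edges = [(sample["message"]["DOI"], d) for d in cited_dois(sample)]
```

Note that, as discussed in the talk, coverage is a real limitation: many references in open sources lack matched DOIs, so edge lists built this way are incomplete.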
14:15-14:30 [Selected talk] Nataliia Kaliuzhna (Independent researcher) ScientoMiner ICR - the Gephi plugin for importing scholarly citations data from Crossref services [slides] [video]
The author presents the sources of problems in conducting bibliometric research based on citation data analysis, and points out that the increasingly widespread use of the DOI identification system, together with the growing number of article references being published openly by individual publishers, creates new opportunities for such research. Particularly noteworthy here are the Crossref and OpenCitations services, which share structured bibliographic information (including citations) with all interested researchers. The author demonstrates the functionality of a newly developed plugin (ScientoMiner ICR) for the Gephi analytical platform that imports citation data from Crossref services, enabling citation analysis for anyone interested in this source of research data. The capabilities of this module will be presented through an analysis of citations from selected Ukrainian journals. (Anna Kaminska, Serhii Nazarovets, Nataliia Kaliuzhna)
14:30-14:45 [Selected talk] Finn Årup Nielsen (Technical University of Denmark) Scholia as of September 2018 [slides] [video]
Scholia is a website that visualizes scientific information from Wikidata using the SPARQL-based Wikidata Query Service. Scholia shows profiles, e.g., for researchers, organizations, countries, publishers, events, awards and topics (including chemicals), with tables and visualizations. Scholia can be used for researcher profiling, research analytics, bibliographic reference management with LaTeX, and the discovery of new research. We continuously expand the functionality of Scholia. I will give an update on its current state.
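As an illustration of how such profiles are driven, the sketch below builds a SPARQL query of the kind sent to the Wikidata Query Service: works authored by a given person. P50 is Wikidata's "author" property; the item ID used here is a placeholder, and the query is only constructed, not executed against the live service:

```python
def works_by_author_query(author_qid):
    """Build a SPARQL query for works whose author (wdt:P50) is the
    given Wikidata item. Illustrative of Scholia-style queries."""
    return f"""
SELECT ?work ?workLabel WHERE {{
  ?work wdt:P50 wd:{author_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 100
""".strip()

# Placeholder item ID; a real call would use the QID of an actual person.
query = works_by_author_query("Q463303")
```

In practice the query string would be sent to the Wikidata Query Service SPARQL endpoint and the results rendered as the tables and plots shown on a Scholia profile page.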
14:45-15:00 [Selected talk] Daniel Mietchen (Data Science Institute, University of Virginia) A guided tour through citation networks around public health emergencies [slides] [video]
Citation networks provide a way to explore how knowledge spreads. In the context of public health emergencies, the timeliness of this spreading is of special concern. In this talk, I will explore citation networks around public health emergencies and highlight how they change on the time scale of specific emergencies like the Ebola or Zika virus outbreaks.
15:00-15:30 [Invited talk from the organising committee] Silvio Peroni (OpenCitations) The OpenCitations Corpus, its data and its interfaces: present status and future plans [slides] [video]
no spoilers
15:30-16:00 Coffee break
16:00-16:50 Round table
16:50-17:05 Workshop closing by David Shotton [video]
17:10-19:10 Sightseeing tour

Day 3 (Hack day) | Wednesday, September 5

9:30-9:45 Welcome and introductions
9:45-10:30 Presenting API and data
10:30-11:00 Pitching projects and ideas
11:00-11:20 Coffee break
11:20-11:30 Formation of working groups
11:30-13:00 Hack session 1
14:00-16:00 Hack session 2
16:00-16:20 Coffee break
16:20-17:20 Show and Tell
17:20-17:30 Closing of day 3

Our organisers

David Shotton
OpenCitations | University of Oxford
Johanna McEntyre
Maria Levchenko
Marilena Daquino
University of Bologna
Philipp Mayr
EXCITE | GESIS - Leibniz Institute for the Social Sciences
Silvio Peroni
OpenCitations | University of Bologna
Steffen Staab
University of Koblenz-Landau

OpenCitations runs the OpenCitations Corpus (OCC), an RDF repository of open scholarly citation data harvested from the scholarly literature.

OpenCitations Service Organisation

The EXCITE Project is developing a tool chain of software components for reference extraction from PDF documents, to be applied to existing scientific bibliographic databases.

EXCITE Project

Europe PMC is a database for life-science literature and a platform for text-based innovation.

Europe PMC Infrastructure


Invited speakers

Since 2015, Ginny has been developing a new team at Crossref encompassing outreach and education, user experience and support, and metadata strategy.

Ginny Hendricks Crossref

Dario is a social computing researcher and open knowledge advocate. He is the Director, Head of Research at the Wikimedia Foundation.

Dario Taraborelli Wikimedia Foundation / I4OC

Stephen is a professor of structural biology at Imperial College London. He is also chair of the DORA steering group.

Stephen Curry Imperial College London | DORA

Catriona has more than 19 years' experience in scholarly publishing and 14 years in Open Access publishing. She is Director of Open Science at Hindawi.

Catriona MacCallum Hindawi

Stephanie spent over 10 years in the academic publishing industry in the fields of biology and chemistry. She is CEO of ScienceOpen.

Stephanie Dawson ScienceOpen

Diego works at the European Research Council and is responsible for the data infrastructure, the information flow architecture and the policy analysis.

Diego Valerio Chialva European Research Council

Philipp is a WP leader in the EXCITE project and organizer of the workshop series on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries.

Philipp Mayr EXCITE | GESIS - Leibniz Institute for the Social Sciences

Jo McEntyre is Team Leader for Literature Services at the European Bioinformatics Institute (EMBL-EBI), where she is responsible for developing Europe PMC, the European database for full-text life science research articles.

Jo McEntyre Europe PMC | EMBL-EBI

Zeyd is a postdoctoral researcher and a team member of the project EXCITE. Currently, he is involved in the extraction and segmentation of reference strings from Social Science publications.

Zeyd Boukhers EXCITE | University of Koblenz-Landau

David is Co-Director of the OpenCitations Project, and a founding member of the Initiative for Open Citations (I4OC) and of Force11. For the last decade he has pioneered the field of Semantic Publishing, employing semantic web technologies.

David Michael Shotton OpenCitations | University of Oxford

Silvio Peroni is an Assistant Professor at the University of Bologna. He is one of the main developers of the SPAR Ontologies, Director of OpenCitations, and a founding member of the Initiative for Open Citations (I4OC).

Silvio Peroni OpenCitations | University of Bologna


Agata Rotondi
Research Associate, Department of Computer Science and Engineering, University of Bologna, Italy
Alessandra Auddino
Master student, DHDK, University of Bologna, Italy
Andrea Mannocci
Research Associate, Knowledge Media Institute - Open University, Milton Keynes, UK
Angelika Tsivinskaya
Researcher, Center for Institutional Analysis of Science & Education, European University at Saint Petersburg, Russia
Angelo Di Iorio
Senior Assistant Professor, Department of Computer Science and Engineering, University of Bologna, Italy
Anne Lauscher
PhD candidate, Universität Mannheim, Germany
Astrid Orth
Project Manager, Niedersächsische Staats- und Universitätsbibliothek Göttingen (SUB), Germany
Bianca Gualandi
Digital Content Manager, Open Book Publishers, Cambridge, UK
Bianca Kramer
Librarian, Utrecht University Library
Bilal Hayat Butt
Assistant Professor, D.H.A. Suffa University (DSU), Karachi, Pakistan
Barney Walker
PhD candidate, Centre for Synthetic Biology, Imperial College London, UK
Catriona MacCallum
Director of Open Science, Hindawi
Chiara Storti
Librarian, National Central Library of Florence - ITC Department, Italy
Colin Batchelor
Data scientist, Royal Society of Chemistry, UK
Daniel Ecer
Data scientist, eLife Sciences
Daniel Mietchen
Data Science Institute, University of Virginia, USA
Dario Taraborelli
Director, Head of Research, Wikimedia Foundation, San Francisco, USA
@ReaderMeter | website
David Shotton
Co-director, OpenCitations | Senior Researcher, University of Oxford, UK
@dshotton | website
Deborah Grbac
Librarian, Università Cattolica del Sacro Cuore, Milano, Italy
Diego Valerio Chialva
European Research Council
Dominika Tkaczyk
R&D Developer, Crossref
Finn Årup Nielsen
Associate Professor, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Denmark
@fnielsen | website
Francesca Giovannetti
Master student, DHDK, University of Bologna, Italy
Francesca Tomasi
Assistant Professor, Department of Classical Philology and Italian Studies; Coordinator of DHDK, University of Bologna, Italy
Francesco Citti
Professor, Head of the Department of Classical Philology and Italian Studies, University of Bologna, Italy
Freddy Limpens
Research Fellow, Department of Computer Science and Engineering, University of Bologna, Italy
Gautam K. Shahi
Master student, DISI, University of Trento, Italy
Gianmarco Spinaci
Master student, DHDK, University of Bologna, Italy
Giovanni Colavizza
Senior research data scientist, The Alan Turing Institute, London, UK
Ginny Hendricks
Director of Member & Community Outreach, Crossref
Ivan Heibi
Research Assistant, Department of Computer Science and Engineering, University of Bologna, Italy
Johanna McEntyre
Team Leader for Literature Services, European Bioinformatics Institute (EMBL-EBI), UK | Europe PMC
Jodi Schneider
Assistant Professor, School of Information Sciences, University of Illinois at Urbana-Champaign, USA
John Samuel
Assistant Professor, CPE Lyon, France
Laurel Zuckerman
Independent researcher and writer
Luc Boruta
CEO, Thunken
Ludo Waltman
Senior Researcher, CWTS, Leiden University, NL
Mairelys Lemus-Rojas
Digital Initiatives Metadata Librarian, IUPUI University Library, Indiana University - Purdue University Indianapolis, USA
Maria Levchenko
Community Manager, European Bioinformatics Institute (EMBL-EBI), UK | Europe PMC
Marilena Daquino
Research Assistant and PhD candidate, Department of Classical Philology and Italian Studies, University of Bologna, Italy
Martin Fenner
Technical Director, DataCite
Matteo Romanello
Digital Humanities Specialist, École polytechnique fédérale de Lausanne, CH
Nataliia Kaliuzhna
Librarian, Independent Researcher
Nees Jan van Eck
Senior Researcher, CWTS, Leiden University, NL
Philipp Mayr
Department Head and team leader, GESIS - Leibniz Institute for the Social Sciences, Germany
Piero Grandesso
Responsible for Open Access Journal services (AlmaDL Journals), ABIS, University of Bologna, Italy
Rachel Kotarski
Head of Research Infrastructure Services, British Library
Ross Mounce
Director of Open Access Programmes, Arcadia Fund
Sahar Vahdati
PhD candidate, Smart Data Analytics (SDA), University of Bonn, Germany
Sara Ricetto
Open Access repository management Librarian, Università Cattolica del Sacro Cuore, Milano, Italy
Sergey Parinov
Leader of Development Team of CyrCitEc project, RANEPA | Russian Academy of Sciences, Moscow, Russia
Silvio Peroni
Senior Assistant Professor, Department of Classical Philology and Italian Studies, University of Bologna, Italy
Steffen Lemke
Research Assistant and PhD candidate, ZBW Leibniz Information Centre for Economics, Germany
Steffen Staab
Professor, University of Koblenz-Landau, Germany
@ststaab | website
Stephanie Dawson
CEO, ScienceOpen
Stephen Curry
Professor, Imperial College London, UK | Chair at DORA
Vittorio Grieco
Student of Medicine and Surgery, University of Bologna, Italy
Zuang Huang
European Bioinformatics Institute (EMBL-EBI), UK
Zeyd Boukhers
Postdoctoral Researcher, University of Koblenz-Landau, Germany

travel information

Welcome to Bologna!

The University of Bologna is the oldest university in the western world, and one of the largest universities in Italy (with about 90,000 enrolled students).

Get to Bologna

The airport “Guglielmo Marconi” is a 15-20 minute drive from the city centre. A direct bus, the Airport Bus BLQ, leaves regularly from the airport for Bologna Central Rail Station. The trip costs 6€. A taxi costs about 15-20€ (call a taxi on 0039 051 372727).

The venue

The workshop will take place at the University of Bologna, in the heart of the city, at the School of Arts, Humanities and Cultural Heritage (in Italian: Scuola di Lettere e Beni Culturali), via Zamboni 34, Room "Aula Affreschi". From the train station you can either walk to FICLIT (20 minutes) or take bus C (direction "Cestello"; see the timetable, page 2).


The organizers are negotiating with local hotels for rooms to be reserved and made available at special rates for participants. More information available here.

About the location

Bologna is home to numerous prestigious cultural, economic and political institutions, as well as one of the most impressive trade fair districts in Europe. In 2000 it was declared European Capital of Culture and, in 2006, a UNESCO "City of Music".

For any enquiry

contact us

Contact Info

Where to Find Us

via Zamboni, 32 (first floor) | Bologna, BO | 40126 Italy

Any Doubt?

open an issue on GitHub!

Email Us