About 2 billion open bibliographic citations are available on the web. We invite researchers, scholarly publishers, funders, policy makers, and open citations advocates interested in the widespread adoption of practices for the creation, reuse and improvement of open citation data to this workshop.
This year's theme is "The value of open scholarly metadata for research assessment purposes".
The workshop materials are available in the WOOC2023 community on Zenodo
WOOC materials
If you wish to participate in the Workshop on Open Citations and Open Scholarly Metadata 2023 as an attendee, please apply by filling in the form with the requested information and a short bio. If the event is oversubscribed, the organisers will select attendees from among those who have applied. The call for participation is open.
For the accepted attendees, we request a small registration fee (VAT included) of €100 to cover expenses (€25 for students). Details of how to pay will be sent with the notification of acceptance together with information to finalize the registration.
Apply for WOOC-2023
Call for participation: closed
Register here to join the online streaming of the Workshop on Open Citations and Open Scholarly Metadata 2023. The online participation is free. Fill in the form with your contact details and a short bio, and we will contact you to send the link to the virtual room a few days before the workshop.
Register to join online streaming
For organizational reasons, online participants will not be able to interact with the in-person speakers or take part in the Q&A. For any questions or clarifications, please contact comms@opencitations.net.
10:30 | Breakfast and reception |
11:15 | Welcome and Workshop introduction by Silvio Peroni |
11:30 | [Invited talk] Bianca Kramer (Sesame Open Science | SCOSS) Openness as a social norm This presentation will discuss multiple aspects of openness in research assessment, as well as developments that either facilitate or limit that openness. This ranges from the freedom researchers, institutions and funders have to decide what is important in assessment, to the availability of data, infrastructure and services to make such assessments possible. What’s the balance between the need for relevant (context-dependent) and comparable information? Do we need common indicator frameworks? Is there a role for community curation to enhance and correct open metadata? And how do we ensure the sustainability of open metadata providers? Seeing openness as a social norm means trying to find answers to these questions as a community - and let these answers guide our choices in how we approach research assessment. |
12:00 | [Invited talk] Iratxe Puebla (DataCite) The Global Data Citation Corpus: An international collaboration to enable broader discovery and assessment of data reuse and impact
While data sharing has increased over time and citations to those datasets are increasingly created, the disparity of locations where they are stored limits our understanding of data reuse and our ability to assess the impact of open data. To address this challenge, DataCite is developing a Global Data Citation Corpus, a vast, open-access collection of data citations from a variety of sources and disciplines, which will enable the global community to discover and access datasets more easily than ever before. We will outline the technical aspects behind the corpus, our progress to date, and its potential applications in research and beyond. By unlocking the power of data citation, we can promote more open, collaborative and impactful research practices, and facilitate the integration of data into the broader scholarly conversation. |
12:30 | [Invited talk] Nina Tscheke (ScienceOpen) Open Metadata for Books: Next-Generation Digital Discovery A digital disconnect between book metadata created for libraries and sales and the metadata required for effective tracking of online usage and citations prevents books from playing the role in research assessment that they could and should. At the core of the issue is a relatively low uptake of DOIs as persistent identifiers for books and correspondingly a lower number of researchers citing books using a DOI. This talk will focus on ScienceOpen’s support for open metadata for books, particularly a pilot project to create a catalogue for the Association of European University Presses (AEUP) based on DOIs. Going beyond the DOI we will provide background on other persistent identifiers and support for best practice indexing and discovery online. Citations are central to most research assessment strategies – but these focus almost entirely on journal articles. Including books would be particularly important for the humanities and social sciences and must be a long-term goal for universities and funders. Since the launch of ScienceOpen ten years ago, discoverability has always been a driver for our developments. Aggregation of content required focus on persistent identifiers across the platform from early on; publications, citations, or author outputs can be easily tracked over time via DOIs and ORCIDs. Our infrastructure is therefore not only built around scholarly metadata that is open and freely accessible to a global research community but provides the insights for publisher assessment around citation-tracking and alternative metrics, and the means for seamless sharing of open citations data. You are still missing DOIs for your books? Use the BookMetaHub to quickly enrich your books with PIDs and export to Crossref—you can also add chapter-level data to make your book even richer! 
The implementation of both the full Crossref Funder Registry and Research Organization Registry (ROR) within all metadata and publishing workflows allows publishers, editors, and their submitting authors to readily identify institutions, authors, or publications with the help of an intuitive look-up & match for ROR and FundRef IDs. And while outputting machine-readable reference lists remains a problematic topic, our DOI-based solution will ensure a simple handling and sharing even for non-full text XML content. The fact that open metadata increases visibility is no longer news: Each additional element—such as affiliations or biographical data, PIDs, translated titles, abstracts, or even full text translations—adds important context data to any publication, functions as a waypoint for further investigations within a certain research field, and helps to open up each publication to the reception of new audiences. And within the dynamic search interface of the ScienceOpen platform those data points manifest themselves as a nexus always linking back to your own Version of Record to increase discoverability and impact within the thicket of the academic research landscape. |
13:00 | Lunch |
14:30 | Posters and Booths 2-minute pitch |
14:50 | [Speakers from committee] Silvio Peroni (University of Bologna; OpenCitations) and Chiara Di Giambattista (University of Bologna; OpenCitations) OpenCitations: an independent, community-led, and not-for-profit Open Science infrastructure organisation to support reproducible and transparent research assessment Citations are a conceptual directional link from citing entities to cited entities, with the purpose of acknowledging other works. Such citations are a core part of scholarly communication, which involves the flow of information and ideas through the citation network, for the benefit of the scholarly endeavor as a whole. Often, citations have also been used as one of the proxies to measure the impact of research outputs, people and institutions, for instance in research assessment exercises. The open availability of citation data is a crucial requirement to enable reproducibility in bibliometrics and scientometrics research. In addition, it is a crucial requirement for guaranteeing transparency for research assessment purposes, since it allows even those evaluated to verify their assessments against the source data. However, research assessment is currently affected by an overall lack of transparency in the methods in use, and the scholarly community has to strive to put into the commons citation and bibliographic data that cannot be copyrighted. Since its foundation in 2010, OpenCitations has been established as an independent, community-led, and not-for-profit Open Science infrastructure with the mission to harvest and openly publish accurate and comprehensive metadata describing the world’s academic publications and the scholarly citations that link them, and to preserve ongoing access to this information by secure archiving. OpenCitations enables the creation of reproducible metrics for research assessment exercises by providing open citation metadata, and develops and shares open-source software and open services.
OpenCitations’ commitment to Open Science has also been fostered through numerous collaborations with other Open Science actors, including OpenAIRE and the Global Sustainability Coalition for Open Science Services (SCOSS), which in 2019 selected OpenCitations as an innovative service with the potential to support change in research assessment. Currently, the OpenCitations collections contain more than 2.6 billion indexed records, and they keep growing, with OpenCitations remaining loyal to its principles and values and giving priority to keeping its services, software and data free of charge under open licenses to foster maximum reuse. |
15:20 | [Selected talk] Teresa Kubacka (ETH Library) and Simon Willemin (ETH Library) Evaluating the quality and reliability of open bibliometric data for country-wide research assessment purposes in Switzerland - insights from project TOBI Despite the strong international relevance of research done at Swiss Higher Educational Institutions (HEIs), scientometric services in Switzerland remain relatively underused in the areas of research assessment and research management. This is striking especially in comparison with countries like Germany or the Netherlands. Each of those countries hosts a competence center for bibliometrics with country-wide relevance and has some system in place to track the national scholarly metadata in a standardized way. There is no such centralized body in Switzerland and the issue is further complicated by the distributed nature of governance in the country. Consequently, the institutions need to decide by themselves how and how much they want to use bibliometrics, which often turns out to be too big an investment and leads to a missed opportunity. However, the current lack of existing country-wide structures also presents a chance to design and introduce new solutions for quantitative research assessment in a way that would reflect the values put forward by DORA and CoARA and would be adapted to the local requirements. The ETH Library of ETH Zurich has been active in the domain of scientometrics since 2020 and has been promoting the responsible use of scientometrics in research assessment. It is currently leading two projects on this topic, co-funded by “Swissuniversities”, a federally funded association of Swiss HEIs. In this presentation, we would like to share insights from one of them: “Towards Open Bibliometric Indicators” (TOBI), which started in March 2023.
The anchor of the TOBI project is the set of DORA Recommendations #11-#14, which call for transparency and openness in data sources and methods used by organizations that supply metrics. We strongly believe that this means basing research assessment on open data, as only open data allows for full reproducibility of calculations and curation. However, as long as the quality of the data remains unclear, biases may uncontrollably distort the calculated indicators and, in the worst case, render the results unreliable. If such biases are unknown to the analyst or decision-maker, distorted metric values may be misinterpreted and could lead to wrong decisions. Commercial, closed datasets, while suffering from problems such as black-box curation, prohibitive licenses or being owned by big publishers, have the benefit that they have been widely used and scrutinized, so that the limitations of standard indicators are known to the scientometric community. TOBI aims to undertake a detailed evaluation of the quality of the scholarly metadata contained in various open bibliometric data sources, in a subset related to the research conducted at Swiss HEIs. The respective subsets are being analysed in terms of available metadata, data completeness, disambiguation quality, coherence, biases and identification of potential gaps, considering different facets (publication types, research fields, etc.) whenever possible. The outcomes of project TOBI will be shared as reports, open source code and datasets, empowering the Swiss HEIs to become more active in the domain. In the presentation we will share the interim insights from the project and discuss challenges we have encountered. |
15:40 | [Selected talk] Maria Levchenko (EMBL-EBI) Europe PMC supports evaluation of preprints by linking available peer reviews in one resource with open APIs Preprinting is a method to share research results quickly and openly with a wide audience, and the COVID-19 pandemic was a pivotal moment for the advancement of preprinting in the life sciences. The Declaration on Research Assessment (DORA) highlights that “the recognition of preprints offers value to early career researchers, whose career progression and ability to obtain funding may be negatively impacted by the extended review time and multiple rejections typical of journal publications”, and many institutions and funders now accept preprints on grant funding applications in recognition of a preprint being a complete and public draft of a scientific document that will allow reviewers to assess a more up-to-date picture of researchers' work. However, despite changing policies of institutions and funders to encourage or require them, lack of trust in preprints as non-peer reviewed outputs remains. Peer review is also occurring on preprints, but peer reviews are often hosted on alternative platforms to the preprints themselves and the association between them requires wider scale integration to improve discoverability and thus use for research assessment purposes. Europe PMC is a life sciences resource containing 42.2 million journal article abstracts and 8.8 million full text articles. Since 2018 preprints have been indexed in Europe PMC and now over half a million preprint abstracts and over 47 thousand full text preprints are indexed in Europe PMC, from over 30 different preprint servers. These preprints are linked to accompanying research articles when they are published, and an article status monitor tool has been developed to allow users to check if preprints have a more recent or published version, or have been withdrawn or removed.
Europe PMC is now developing open source software to improve the linking of peer reviews on third-party platforms with preprints indexed in Europe PMC, to build on the existing aggregation of preprints into one indexing service and further assist in discoverability of associated peer review of this content. Collaborating with Sciety and Review Commons, Europe PMC is utilising the DocMaps infrastructure to index links to preprint peer reviews and associated metadata (such as review platform, review title, date, and reviewer name, where provided), including available persistent identifiers, such as ORCID, DOI or ROR ID. All information will be available in Europe PMC’s open APIs for meta-analysis and reuse, and our approach to designing the display is user-centred and informed by user research. This work is extensible to Crossref and DataCite peer review registered DOIs, which will create a rich ecosystem of research literature and peer review materials, including timelines and actors, available in one search engine and via the Europe PMC open APIs. Europe PMC is engaged with the community to evaluate machine readable metadata requirements for peer review material, having been instrumental in the JATS4R recommendation on Peer Review Materials. From this basis Europe PMC will be considering the unique needs and capabilities of the small but growing preprint peer review ecosystem to build standards for preprint peer review metadata, which will assist in changing research assessment criteria. |
16:00 | [Selected talk] Jessica Lam (Institute of Neuroinformatics, University of Zurich and ETH Zurich) Semantic Metadata Matching for Better Bibliography Linking Traditional approaches to quantitative research assessment rely heavily on bibliometrics, which include citation count, h-index, and impact factor. However, the accuracy of bibliometrics is tied to the quality of bibliography linking (i.e. the identification of the work being referenced in each bibliography entry). A recent study [1] showed that only two out of five freely available bibliographic databases had full bibliography linking accuracy for at least 30% of their PubMed papers. Given the substantial influence that bibliometrics can have on career prospects and perception of scholarly contribution, we believe there is a pressing need for better bibliography linking methods. The primary form of bibliography linking has been through Digital Object Identifiers (DOIs). Many publishers such as Elsevier and Springer strongly encourage authors to include DOIs in their bibliographies and make these DOIs openly available via CrossRef. Databases derived from CrossRef such as MAG, COCI, and Dimensions can therefore perform bibliography linking by DOI. The reliability of this method is supported by the aforementioned study [1], which showed that the CrossRef-derived databases had better bibliography linking than S2ORC, a database that does not utilise CrossRef. However, this method is likely to perform poorly on works from certain domains (e.g. machine learning) where specifying DOIs is not the norm, and on works from preprint repositories (e.g. arXiv) that do not have DOIs. Another approach to bibliography linking is lexical metadata matching, such as the trigram-based matching of titles used in the construction of S2ORC.
The primary advantage of this method is its independence from author-provided DOIs and thus its broader applicability; however, S2ORC was shown to have the lowest bibliography linking quality out of the five databases evaluated by the aforementioned study [1]. We propose to explore semantic metadata matching for bibliography linking. This involves using Natural Language Processing (NLP) to compute a vector representation for each bibliography entry and for each paper based on their metadata, then performing linking via a nearest neighbour search. We will benchmark the performance of multiple NLP models, ranging from general representation models (e.g. Sent2vec [2], SBERT [3]) to models designed specifically for scientific text (e.g. HAtten [4], SciNCL [5]). We will focus on citations between PubMed Central Open Access (PMCOA) papers because those ground-truth bibliography links are freely and openly available. Finally, we will also experiment with various linking distance thresholds to assess the trade-off between linking precision and recall.
References
[1] Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources (Liang et al., 2021). |
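The linking step outlined in the abstract above (compute a vector for each bibliography entry and each paper, take the nearest neighbour, and accept the link only below a distance threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `embed` is a stand-in that uses character-trigram counts instead of any of the NLP models named in the abstract, and the function names, the sample records in `PAPERS` and the threshold values are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Optional

# Hypothetical paper metadata (identifier -> title), for illustration only.
PAPERS: Dict[str, str] = {
    "p1": "Finding citations for PubMed: a large-scale comparison between "
          "five freely available bibliographic data sources",
    "p2": "Semantic metadata matching for better bibliography linking",
}


def embed(text: str) -> Counter:
    """Stand-in embedding: character-trigram counts of the normalised text.

    A real system would call a sentence-embedding model here instead.
    """
    normalised = " ".join(text.lower().split())
    return Counter(normalised[i:i + 3] for i in range(len(normalised) - 2))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def link_entry(entry: str, papers: Dict[str, str],
               max_distance: float) -> Optional[str]:
    """Link a bibliography entry to its nearest paper, or to nothing.

    The nearest neighbour is found by brute force; the link is rejected
    (None) when even the nearest paper is further than max_distance.
    """
    if not papers:
        return None
    entry_vec = embed(entry)
    best_id, best_dist = min(
        ((pid, cosine_distance(entry_vec, embed(meta)))
         for pid, meta in papers.items()),
        key=lambda pair: pair[1],
    )
    return best_id if best_dist <= max_distance else None
```

Lowering `max_distance` rejects more uncertain links, which tends to raise precision at the cost of recall; raising it does the opposite. This is the trade-off the abstract proposes to measure.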
16:20 | Poster exhibition session with coffee break Due to the exponential increase in the number of publications, researchers and journals, many bibliometric tools have been developed to evaluate research performance, such as VOSviewer, Publish or Perish, etc. Each has its own purpose, and none of them has a sophisticated module to evaluate and compare scientists. In this paper, we propose a new bibliometric tool called REPE (REsearch Performance Evaluator) that uses different bibliometric and scientometric indicators to evaluate and compare scientists. Our tool helps research institutes in hiring and promoting the most suitable scientists for their research projects. Moreover, it can help scientists in finding other researchers to share research ideas and establish collaborations. [Poster] Yannis Manolopoulos (Open University of Cyprus) Do students surpass their mentors? The case of nurturers has been studied in the literature of Scientometrics. It is common for some doctoral graduates to surpass their mentors in terms of research impact, as measured using bibliographic data (e.g. citations, h-index). Also, not rarely, the citation curve of the mentor is a simple replica of that of the doctoral graduate. Based on a dataset consisting of Professors in Greek Departments of Computer Science & Engineering, along with their doctoral students for a period of ~20 years, we will devise new metrics to quantify the above phenomenon and extract such special outlier cases. [Poster] Olga Pagnotta (University of Bologna) Investigating the performance of GROBID and OUTCITE In a prior study (Cioffi & Peroni, 2022), we analysed the available reference extraction tools to understand their performance off-the-shelf, i.e. by using them as they have been configured, without prior training. We evaluated them against a corpus of 56 PDF articles (our gold standard) published in 27 subject areas (Computer Science, Arts and Humanities, Mathematics, etc.).
From that analysis, we identified the two most promising tools for bibliographic reference extraction and parsing, i.e. Anystyle and Grobid, which are CRF-based. We have extended that study by training Grobid against an extended gold standard with various training configurations to understand how much the performance improves. As a result, we have also revised the code used for testing and comparing the reference extraction software, to make it available for others to reuse in similar analyses. Other tests have been performed on OUTCITE, and new conversion and evaluation software has been created for the purpose. The final aim of this work would be to develop a reference extraction service which enables a user to provide a PDF of a scholarly article as input and to have, in return, citation data and bibliographic metadata from all the references that are cited by the given article in a format that enables their ingestion in OpenCitations (Peroni & Shotton, 2020). As a first step, a series of tests was performed to check whether the two tools had been recently updated. During the testing phase, some differences emerged when comparing the results obtained in the current study to those of (Cioffi & Peroni, 2022). With those differences in mind, we modified the evaluation code (Cioffi, 2022) used in the prior study to adapt it to the current version of the tools. Afterwards, we proceeded with the training phase. As far as GROBID is concerned, a series of instructions is available for training the tool with specific data. The work on OUTCITE was done with the development team in Cologne, on site, during a three-month stay.
[Poster] Alessandro Bertozzi (NET7) Serica: enhancing open scholarly research through collaborative metadata management and assessment Serica is a multilingual collaborative digital environment for scholars of texts, images and musical documents from the 2nd century BC to the 19th and 20th centuries AD, concerning the Central Asian routes between China and Europe. The project was designed and developed in collaboration between the company Net7 and the Universities of Pisa and Turin. It was created to enable a community of researchers to take advantage of a shared platform to enter, manage, review, enrich and share scholarly metadata in an open format. The article delves into the methodologies employed in the collection, organization, and enrichment of these metadata within the Serica platform. It provides a detailed examination of the tools employed for ensuring accurate representation, discoverability, and interoperability of research materials in an open format. Furthermore, the article highlights how Serica's shared platform encourages scholars from different disciplines to contribute their expertise and insights, facilitating interdisciplinary dialogue and knowledge exchange. Finally, we will focus on some possible applications of scholarly metadata reuse for research evaluation purposes. |
20:00 | Social Dinner "Signorvino" restaurant, Piazza Maggiore, 1/C, 40124 Bologna BO |
09:00 | [Invited talk] Suzanne Dumouchel (EOSC Association | OPERAS AISBL | CNRS) Open research assessment: switching from quantitative to qualitative practices The need to work on the quality of open scientific metadata is no longer up for discussion. Everyone is aware of this, both for data sharing and as part of the Open Science policy. It is also true in terms of the evolution of research evaluation practices. We need to promote new ways of doing research, by proposing alternative models to impact factors and the quantitative approach. Open scientific metadata plays a decisive role here. But here again, we could be stuck with the quantitative approach. We need to go further and work on the link between metadata, in particular around knowledge graphs. Creating links means giving meaning, and in so doing, we are moving towards a qualitative approach to research evaluation practices. My presentation aims to explore this aspect by also considering the role of vocabularies and ontologies used in metadata schemas and in establishing the link between data and metadata. |
09:30 | [Invited talk] Stephen Curry (Imperial College London | DORA) Opening up space for research assessment reform Despite more than a decade of campaign efforts, research assessment remains closely tied to publication metrics that are often based on citations and aggregated into forms, such as the journal impact factor and H-index, that obscure the details of the underlying research quality. Open practices offer a way out, not just through increased transparency, which permits finer discrimination of citation performance, but through broader shifts towards open scholarship that are necessary to maintain healthy relations between the academy and the societies they purport to serve. In my talk I will explore how infusions of openness can help the academic community to think more deeply and more robustly about how to recognise and reward the higher qualities of research. |
10:00 | [Invited talk] Toma Susi (CoARA | University of Vienna) How can open citations and metadata help us in reforming research assessment? The current forms of research assessment have become the defining problem for many stakeholders in academia, with a whole host of detrimental effects. For the research system as a whole, this fuels the crises of reproducibility, hinders the much-needed transition to open science, and upholds a costly and outdated publishing system. For researchers, the pressure to conform to one-sided and metricized forms of assessment contributes to stress, burnout and, most tragically, to the devaluing of the diverse talents and contributions that they could bring to research. Although these problems have long been recognized, the systemic nature of the dilemma has stymied reform – until the Coalition for Advancing Research Assessment (CoARA). Now that the reform of research assessment is finally moving ahead, what role can open citations and metadata play? While CoARA's focus is on qualitative assessment, the responsible use of indicators will also play a role, especially in certain kinds of evaluation settings. It is important to note that an overarching principle for these reforms is the independence and transparency of the data, infrastructure and criteria necessary for research assessment, as well as clear and transparent data collection, algorithms and indicators, under the control and ownership of the research community. The opportunity to make an important contribution to a better system of research evaluation is therefore clearly at hand. |
10:30 | Coffee break |
11:00 | [Invited talk] Paolo Manghi (OpenAIRE) Challenges in Constructing Scientific Knowledge Graphs for Research Assessment in Open Science
The advent of Open Science has propelled the scholarly record to a new level, expanding the portrayal beyond the traditional publication of research products to include data and software. The integration of metadata records on all research products and their associated interrelationships, achieved via Scientific Knowledge Graphs (SKGs), serves as a tool for generating indicators for research assessment purposes. However, the urgency and momentum of Open Science have led to publication workflows that possibly disregard their traditional “gatekeeping” role, resulting in a scholarly communication record where not all products are subject to peer review and proper metadata curation, depending on the given context's maturity. Furthermore, the variety of research communities and interpretations of research data and software, and the multitude of data sources and their multi-purpose usages, pose a challenge in constructing SKGs that include only scientifically relevant products and adequately classify them. Consequently, research assessment in Open Science remains an area of ongoing exploration. This presentation highlights today’s challenges in constructing SKGs to facilitate an open and transparent research assessment infrastructure, and reflects on possible short-term and long-term solutions to create a uniform Open Science scholarly record that supports it. |
11:30 | [Speaker from committee] Giovanni Colavizza (Haute Ecole d’Ingénierie et de Gestion du Canton de Vaud | University of Bologna) and Puyu Yang (University of Amsterdam) Wikipedia and open access Wikipedia is a well-known platform for disseminating knowledge, and scientific sources, such as journal articles, play a critical role in supporting its mission. The open access movement aims to make scientific knowledge openly available, and we might intuitively expect open access to help further Wikipedia's mission. However, the extent of this relationship remains largely unknown. To fill this gap, we analyzed a large dataset of citations from Wikipedia and modelled the role of open access in Wikipedia's citation patterns. We find that open-access articles are extensively and increasingly cited in Wikipedia. What is more, they show a 15% higher likelihood of being cited in Wikipedia when compared to closed-access articles, after controlling for confounding factors. We name this the "open-access citation effect", which is particularly strong for articles with low citation counts, including recently published ones. Our results confirm that open access plays a key role in the dissemination of scientific knowledge, including by providing Wikipedia editors with timely access to novel results. |
12:00 | [Speaker from committee] Philipp Mayr (EXCITE | GESIS - Leibniz Institute for the Social Sciences) Outcomes of the OUTCITE Project on Reference Extraction & Linking in the Social Sciences We present the main outcomes of the DFG-funded project OUTCITE [1], including, but not limited to:
- Comparison of different extraction and parsing pipelines on different gold datasets
We will try to balance introductory aspects for new participants and novel aspects for attendees of previous reference extraction workshops.
[1] https://gepris.dfg.de/gepris/projekt/293069437 |
12:30 | Lunch |
14:00 | [Selected talk] Thanasis Vergoulis (ATHENA RC) BIP! Services: Demonstrating the Potential of Open Scholarly Data in Research Assessment Research assessment is central in a wide variety of applications in the fast-paced scholarly research landscape, ranging from assisting knowledge discovery and strategic planning of research funding and performing organisations to informing researcher hiring and promotion processes. In the past, research assessment processes relied heavily on non-transparent indicators calculated from data contained in restricted sources (e.g., the data silos of scientific publishers), hindering transparency and impeding the independent verification of the findings by third parties. However, the rising popularity of Open Science in recent years has started to change the situation by making a large variety of scholarly data openly available. BIP! Services (https://bip.imsi.athenarc.gr/site/home) is a comprehensive suite of tools that leverages open scholarly metadata to demonstrate their potential in assisting research assessment processes by calculating transparent, citation-based indicators for research products and related entities (e.g., researchers). The workshop presentation will introduce the audience to the key tools comprising BIP! Services, namely BIP! Finder and BIP! Scholar, as well as their underlying data, which are also available as an open dataset called BIP! DB. The presentation will also explain the benefits introduced by these services and will elaborate on the technical details behind them. To begin with, BIP! Finder is a service that facilitates scientific knowledge discovery by offering publication ranking functionalities according to a variety of (citation-based) impact indicators transparently calculated on open scholarly metadata (including citations from OpenCitations).
Each indicator is designed to capture scientific impact from a different perspective, offering useful insights for diverse use cases. Researchers can utilise BIP! Finder to prioritise their reading, uncover hidden gems, identify emerging trends, and make informed decisions regarding their publication strategies. BIP! Scholar, the second service to be presented, can help researchers to create research profiles that summarise their research careers. These profiles contain a variety of indicators attempting to capture insights on different aspects related to researchers’ activities (e.g., their productivity, overall impact, dedication to Open Science practices, etc). The researchers can also determine their role in the production of each research product and they can add narratives to better explain the respective lines of work and elaborate on their motivation and importance. These profiles can remain private to serve as a self-monitoring tool for researchers, but they can also be shared with third parties (e.g., hiring committees) to provide useful insights on researchers’ achievements in different topics or under the perspective of different contribution roles, assisting the respective research assessment processes. Finally, the underlying data for both aforementioned services are publicly available through BIP! DB, an open dataset, which is frequently updated, built upon scholarly metadata from several well-known resources including the OpenAIRE Graph, OpenAlex, and OpenCitations. By making these impact indicators openly available, BIP! DB promotes transparency and facilitates the creation of value-added services from the broader scholarly community. |
14:20 | [Selected talk] Peter Aspeslagh (ECOOM-University of Antwerp) Retrieving and registering author affiliation data for Flemish non-Web of Science SSH publications: pragmatic strategies to integrate scholarly metadata from a multitude of sources ECOOM-University of Antwerp, the Antwerp branch of the Flemish R&D Monitoring Centre, manages the Flemish Academic Bibliography for the Social Sciences and Humanities (VABB-SHW), compiling all SSH publications from the five Flemish universities. VABB-SHW serves as a major data source for the Flemish performance-based research funding system. In 2019, an internationalization parameter was added: publications with at least one co-author affiliated with a non-Belgian institution receive an increased weight in funding distribution. However, VABB-SHW does not include author affiliation data. Still, almost half of those publications are included in the Web of Science, allowing a quick download of author affiliation data. The other half, approved by an authoritative panel (GP-publications), are not. In order to implement the internationalization parameter, author affiliation data for a subset of more than 23,000 publications for a 10-year time window (2011-2020) had to be retrieved. Therefore, ECOOM-Antwerp started an extensive data collection and registration operation. An online application was established, allowing several steps to maximize the efforts of the operation. Apart from the registration of organization identifiers of the affiliations (first GRID, later ROR), available full texts were stored, current metadata was checked and missing data was added, focussing on DOIs and abstracts. A multitude of data sources and strategies had to be consulted and applied in order to collect the necessary data. First of all, non-Web of Science citation databases, like Crossref or Scopus, were checked via title matching or DOI (if available). This delivered author affiliation data for about 15% of the dataset. 
Next, remaining publications with a DOI in VABB-SHW were visited and registered manually. Subsequent steps comprised the retrieval of full texts via Google Scholar or the websites of journals and publishers, finishing with the consultation of the hard copy in a university library if no digital version was available. This means that manual intervention was needed for 85% of the publications, although the number included in non-Web of Science citation databases has recently been increasing. The data collection project highlighted a number of gaps in the open availability of scholarly metadata. First, a lot of (mainly domestic) publications do not contain DOIs or do not have their metadata transferred to international databases. Second, the level of availability of author affiliation data on the websites of publishers or in the PDF files of the full texts differs hugely. Furthermore, international organization databases, like ROR, do not cover all organizations mentioned in the affiliation data. About one third of affiliated organizations (more than 2000) had to be added to the database. This delivered a ‘ROR+’ organization database that will be developed for use in multiple contexts. It is our aim to present this data collection process and the lessons learned during the workshop, discuss our experiences and sketch ideas for further valorization of open scholarly metadata, in particular author affiliation data. |
14:40 | [Selected talk] Martin Czygan (Internet Archive) Refcat Citation Graph Updates As part of its scholarly data efforts, the Internet Archive released a first version of a citation graph dataset, named refcat, derived from scholarly publications and additional data sources, in October 2021 [1]. Since then, we have been updating the raw data input as well as refining the derivation process. In this talk we want to give an update on the project and the dataset. |
15:00 | Coffee Break |
15:20 | [Selected talk] Patryk Hubar (The Institute of Literary Research of the Polish Academy of Sciences / University of Warsaw) Machine-learning based solutions for retrospective conversion of printed bibliographies. Unleashing humanities metadata for literary scholars In humanities research, it is relatively common to reference older publications. Accessing metadata for these resources can present a challenge, as the bibliographic information is often contained within printed bibliographies, rather than in easily accessible online databases. Enhancing access to such information has the potential to significantly benefit research in the arts and humanities field. This presentation aims to discuss automated strategies employed in the retrospective conversion process of these physical bibliographies, taking the Polish Literary Bibliography (PLB) as a case in point, using spaCy, a natural language processing tool developed by Explosion AI. The task of processing bibliographic entries from domain-specific bibliographies presents difficulties from multiple perspectives. Typically, such bibliographies do not adhere to any particular standard for publishing information and instead rely on their own solutions. Parsing bibliographic records and extracting their individual elements, such as the author, title, and physical description, poses a significant challenge. A further complication arises from the quality of Optical Character Recognition (OCR), the suboptimal performance of which often undermines the efficiency of automatic processing. Despite these challenges, a model capable of effectively addressing these tasks has been successfully trained through the application of machine learning-based mechanisms. This enhancement in the quantity of bibliographic information about literary works is particularly vital in the context of identifying sources present in bibliographic references in academic publications. 
In the era of open science, achieving the most comprehensive representation of literary output is of paramount importance. The retrospective conversion of the Polish Literary Bibliography has not only significantly increased the number of publicly available citations; it has also increased the degree of interlinking between documents by capturing a broad variety of text types, from reviews and polemics to other forms of literary interaction. This comprehensive approach enriches the academic landscape, providing a more interconnected view of literary production, and fostering a deeper understanding of the relationships and dialogues inherent in academic and literary work. |
15:40 | [Selected talk] José Luis Ortega (Institute for Advanced Social Sciences - Spanish National Research Council) SILICE: A Spanish proposal for a local scholarly information system based only on open data The aim of this communication is to present a new prototype of a national scholarly information system based on open sources. The reason for this proposal is to explore the possibilities and limitations of bibliographic databases populated with open data, and their contribution to research evaluation processes based on Open Science criteria. Specifically, it examines how it is possible to create ad hoc information systems apart from commercial platforms, at a much lower cost and adaptable to each particular environment. SILICE (Sistema de Información para la LIteratura Científica Española) is a beta web application (web site) that attempts to gather the scholarly production of both Spanish researchers and organizations, including citations and altmetric mentions to these publications. The system is structured in several relational modules that illustrate the outputs and associated metrics of each entity. These entities are: publications, authors, organizations and disciplines. The main originality of this product is that it is entirely based on open sources. Two strategies were used to obtain the most reliable collection of the Spanish research publications: a set from authors and a set from organizations. This double strategy is due to the double need to gather the total production of Spanish authors, wherever it is produced; and the total production of Spanish organizations, independent of the origin of their researchers. For authors, we have obtained 110 thousand author profiles registered in ORCID, selecting researchers working in Spanish research organizations. From this list of authors, we have extracted the list of identifiers (mainly DOIs) of their publications. 
For organizations, we have selected Spanish research organizations from RoR (Research Organization Registry) and have searched for their publications in different open databases: Crossref and OpenAlex. Both sets of publications, from the authors and from the organizations, were stored in a relational database. This method allows each entity to be defined according to a specific open source: authors (ORCID), organizations (RoR), publications (Crossref, OpenAlex) and disciplines (Crossref). Finally, this database is enriched with citations and mentions to these publications. Unlike commercial databases, citations are not computed among the stored references of the indexed publications. Instead, we aim to create a graph database (Neo4j) of citations where citation relationships are deposited, regardless of whether the citing documents are indexed in the database. In this way, we can show the global impact of the publications, not limited to the Spanish environment. To build this citation database we attempt to use COCI and OpenAlex as main citation sources. Crossref Event Data and Altmetric would be used to add altmetric mentions. |
16:10 | [Selected talk] Vincent W.J. van Gerven Oei (Thoth) Managing and Disseminating Open Metadata of Scholarly Monographs Within the scholarly publishing landscape, monographs are often considered secondary to journals. Nevertheless, especially within the Humanities and Social Sciences, monographs remain the primary mode of scholarly communication and a core part of a scholar's research assessment basis. Yet, the monograph publishing ecosystem has been slow to adopt and integrate open standards, including with regard to metadata, making monographs less accessible to libraries, research aggregators, and academic institutions. The open metadata management and dissemination platform Thoth addresses these issues, giving publishers an open source tool to manage and disseminate CC0-licensed metadata to a variety of repositories, and giving libraries and other third parties a way to ingest and access those data through an open API. This presentation provides an overview of the design architecture and user interface of Thoth, its various outputs and APIs, and the ways in which Thoth is helping authors and open access publishers to improve their works' visibility in the scholarly publishing landscape. |
16:30 | Closing |
17:00-19:00 | Informal tour of Bologna |
The University of Bologna is the oldest university in the western world, and one of the largest universities in Italy (with about 90,000 enrolled students).
The workshop will take place at the University of Bologna in the heart of the city - the Department of History, Cultures, and Civilization (Italian: Dipartimento di Storia, Culture e Civiltà), Piazza S. Giovanni in Monte, 2, Room "Aula Prodi". From the train station you can walk to DISCI (30') or take either bus 25 (direction “Deposito Due Madonne”, details at https://www.tper.it/bo-25) or bus C (direction “Cestello”, details at https://www.tper.it/bo-c).
The event's sessions will also be streamed online.
Further information about the virtual and free attendance will follow soon.
The airport “Guglielmo Marconi” is located 15-20 minutes by car from the city centre. A direct train, Marconi Express, leaves regularly from the airport for Bologna Central Rail Station. The trip costs 11€. A taxi costs about 20€ (call a taxi at 0039 051 372727).
The organizers are negotiating with local hotels for rooms to be reserved and made available at special rates for participants. Check the Hotels in Bologna.
Bologna is home to numerous prestigious cultural, economic and political institutions as well as one of the most impressive trade fair districts in Europe. In 2000 it was declared European capital of culture, and in 2006, a UNESCO "city of music". Bologna’s porticoes were added to the UNESCO World Heritage List in 2021.
via Zamboni, 32 (first floor) | Bologna, BO | 40126 Italy