The Project and the Observatory

ToscanaOpenResearch (TOR) is the official portal of the Tuscany Region, designed to promote and showcase the regional research, innovation, and higher education ecosystem.

The project originated within the framework of the Regional Conference for Research and Innovation (established by Regional Law 20/2009). Specifically, it serves as the operational tool of the Conference's Regional Observatory, which aims to foster transparent and inclusive governance of the ecosystem and to support the policies of the Regional Government.

ToscanaOpenResearch is an advanced policy intelligence tool. It provides a qualified knowledge base on the research, innovation, and higher education system, allows information scenarios to be shared with stakeholders, and supports the design of evidence-based policies.

From a technical standpoint, the portal integrates various information sources, primarily Linked Open Data, made interoperable through a specific ontology and dedicated controlled vocabularies. The data, organized into dynamic information dashboards, is fully accessible and downloadable so that economic operators, researchers, and citizens can explore and reuse it.

In line with these objectives, in 2024 the Tuscany Region signed the "Barcelona Declaration on Open Research Information". This underscores its commitment to promoting policy practices founded on open data, with the ultimate goal of fostering decision-making processes based on transparent and inclusive information.

Methodology

ToscanaOpenResearch is built on a relational-database platform that integrates, and provides access to, heterogeneous data on the research and innovation ecosystem. To promote data use and interoperability, and to make it easier to extract and analyze information across different classification systems, ToscanaOpenResearch combines:

  • Crosswalks between the sources' native classifications: mappings that harmonize heterogeneous classification systems, both national (such as "Scientific-Disciplinary Sectors" [SSD], "CUN Areas", and "ATECO" economic activities) and international (such as the "International Patent Classification"), so that data can be cross-queried regardless of the original taxonomy.
  • External classification systems: categories assigned by applying automated Natural Language Processing (NLP) techniques to the abstracts of projects and publications.
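
As a rough illustration of the crosswalk idea, the sketch below maps records tagged with different native classifications onto one shared category so that they can be queried together. The codes, the shared category names, and the `CROSSWALK` table are hypothetical placeholders, not TOR's actual mapping tables.

```python
from collections import defaultdict

# Hypothetical crosswalk: (native system, native code) -> shared category.
# These entries are illustrative placeholders, not TOR's real mappings.
CROSSWALK = {
    ("SSD", "SSD-A"): "Computer science",
    ("IPC", "IPC-X"): "Computer science",
    ("ATECO", "ATECO-1"): "Computer science",
    ("SSD", "SSD-B"): "Chemistry",
}


def harmonize(records):
    """Group record ids by shared category; unmapped codes go under None."""
    grouped = defaultdict(list)
    for rec in records:
        key = (rec["system"], rec["code"])
        grouped[CROSSWALK.get(key)].append(rec["id"])
    return dict(grouped)


records = [
    {"id": "pub-1", "system": "SSD", "code": "SSD-A"},
    {"id": "pat-1", "system": "IPC", "code": "IPC-X"},
    {"id": "pub-2", "system": "SSD", "code": "SSD-B"},
    {"id": "org-1", "system": "ATECO", "code": "ATECO-9"},  # no mapping yet
]

result = harmonize(records)
print(result)
```

Keeping unmapped codes under `None` rather than dropping them makes gaps in the crosswalk visible, which is useful when new classification versions appear.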

The development of the new release of TOR is the result of a process led by the Tuscany Region, supported by its technical partner SIRIS Academic.

Data

To date, TOR mainly integrates data from open national, European, and global databases. The main sources are:

  • CORDIS: the European Union's platform for research projects funded under European framework programs (Horizon 2020, Horizon Europe, etc.);
  • OpenCoesione: the Italian portal for projects funded by cohesion policies, used to monitor the expenditure of European structural funds across the national territory;
  • OpenAlex: an open bibliometric database covering scientific publications, authors, institutions, and their affiliations;
  • PATSTAT: the European Patent Office's database containing worldwide patent information, including thematic classifications, inventors, and applicants;
  • USTAT (MUR Statistics Office): statistical data on the Italian university system, including students, graduates, teaching staff, PhD programs, and disciplinary classifications;
  • Cerca Università: an information source regarding Italian universities and their educational offerings;
  • Excelsior (Unioncamere): an information system focusing on the occupational and training needs of Italian companies, providing data on requested professions, economic sectors, and skills;
  • Registro Imprese (Business Register): the official source for Italian companies, featuring information on their sector of economic activity, production value, and registered office.

How Heterogeneous Data Communicate with Each Other

To allow for cross-cutting analyses among heterogeneous sources, crosswalks between classification systems have been established alongside data enrichment operations. In particular:

  • SSD-GSD Correspondence: a mapping between Scientific-Disciplinary Sectors (SSD 2015 and 2024) and the new Scientific-Disciplinary Groups (GSD), organized according to the 14 disciplinary areas of the National University Council (CUN). This correspondence is defined by the Ministry of University and Research (D.M. 639/2025).
  • Disambiguation of Organizations: a harmonization table was created to link the different spelling variants of Italian university names (found in the USTAT and Cerca Università sources) to unique identifiers, through automated normalization followed by manual validation.
  • Geolocation of Patent Holders: a methodology was developed and implemented to add the geographic information of patent holders (at the provincial and regional levels), which is largely incomplete in the original PATSTAT source.
  • Text Classifiers: text classifiers were developed and implemented to label research documents automatically through content analysis.
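
The disambiguation step can be pictured with the minimal sketch below, which normalizes spelling variants before looking them up in a harmonization table. The normalization rules, the `HARMONIZATION` entries, and the `ORG-…` identifiers are assumptions made for illustration; the source does not describe the actual rules or identifier scheme.

```python
import re
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


# Hypothetical harmonization table: normalized spelling -> unique identifier.
# In a real workflow each entry would be manually validated.
HARMONIZATION = {
    "universita degli studi di firenze": "ORG-001",
    "universita di firenze": "ORG-001",
    "universita di pisa": "ORG-002",
}


def resolve(name):
    """Return the identifier for a spelling variant, or None for manual review."""
    return HARMONIZATION.get(normalize_name(name))


print(resolve("Università degli Studi di Firenze"))
print(resolve("UNIVERSITA' DI FIRENZE"))
print(resolve("Politecnico di Milano"))
```

Names that resolve to `None` are the ones that would be passed on to manual validation.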

Focus 1 – Semantic Analysis

The abstracts of publications, patents, and R&I projects contain a wealth of textual information describing in detail the current challenges, the progress proposed or demonstrated, and the expected impact of the innovation process. To unlock this semantic richness, Natural Language Processing (NLP) and Deep Learning techniques were used to analyze how research outputs align with specific taxonomies identified as relevant to the regional context: the ERC classification (whose automated classifier is an output of the TOR project) and the taxonomy of Tuscany's Smart Specialization Strategy (S3), the latter currently in the implementation phase. The approach relies on an automated classifier that analyzes each document individually and assigns taxonomic categories based on the actual text rather than on declared metadata or manual tagging. This ensures greater consistency and comparability across heterogeneous sources.

Focus 2 – ERC Classification: Development and First Application

As part of the TOR project, an experimental framework for automated text classification was designed and developed to assign research documents to European Research Council (ERC 2024) panels.
Adopting the ERC classification addresses the need for a shared European disciplinary framework, which makes it possible to:

  • compare the research activities of Tuscan actors with those of other regions and countries on a consistent basis, overcoming the limitations of purely national classifications;
  • adopt the classification most widely used in European research policy, rather than the sources' native classifications, which are often more technical in nature (such as the "topics" adopted by OpenAlex);
  • categorize documents from heterogeneous sources under a single, "cross-cutting" system.

This activity is experimental in nature: the automated classifier is not just a tool for generating visualizations but a methodological output of the project in its own right, developed jointly by SIRIS Academic and the Sectors of the Tuscany Region involved in the project.

The results presented in the visualizations derive from a first application of the classifier. Its performance is already reliable enough for exploratory analysis and general comparisons, and it will be refined in subsequent iterations of the model. The classifier has been made publicly available on the Hugging Face platform. It is trained for multi-label classification: given the title and abstract of a scientific publication, it can assign several ERC categories at once, reflecting the often interdisciplinary nature of research.
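
To make the multi-label idea concrete, here is a minimal sketch of one common way such a classifier turns per-label scores into a set of categories: each label is scored independently with a sigmoid and kept if it reaches a threshold. This is an assumption about how the decision step could work, not a description of the published model; the scores are made up, the 0.5 threshold is arbitrary, and the ERC panel codes are used only as example labels.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def assign_labels(logits, threshold=0.5):
    """Keep every label whose independent probability reaches the threshold."""
    probs = {label: sigmoid(score) for label, score in logits.items()}
    return sorted(label for label, p in probs.items() if p >= threshold)


# Made-up raw scores for one document; a real model would compute them
# from the title and abstract.
example_logits = {"PE6": 2.1, "LS2": 0.4, "SH1": -3.0}

labels = assign_labels(example_logits)
print(labels)
```

Because each label is decided independently, a document can receive zero, one, or several categories, unlike single-label classification, where exactly one category is always chosen.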

Partners

ToscanaOpenResearch is the result of collaboration among the Regional Government departments and directorates that support research and innovation, the Directorate for Information Systems, Technological Infrastructures and Innovation, IRPET (Regional Institute for the Economic Programming of Tuscany), FST (Fondazione Sistema Toscana), and the URTT (Regional Office for Technology Transfer).

Scientific Collaborations and Co-design

The methodological framework of TOR stems from a long process of participatory co-design, which has actively involved:

  • Universities and Research Centers based in Tuscany.
  • Representatives of local businesses, to align the tool with the needs of the regional production system.
  • Leading national and international agencies, including AlmaLaurea, OpenAIRE, UIBM, and Cineca.

This methodology was officially recognized as a national best practice within the PON Governance initiative (Open Community PA 2020).

Technical Development and Communication

The new version of TOR, which aims to overcome some of the limitations of the previous release, is being developed through a collaboration between the Tuscany Region (Directorate for Education, Training, Research and Employment; Directorate for Information Systems, Technological Infrastructures and Innovation), SIRIS Academic (domain expert), and TAI and Netseven (responsible for building the information infrastructure and the website). Communication activities and the dedicated YouTube channel are managed by Fondazione Sistema Toscana (FST).

Useful links

This section brings together a curated selection of links to introduce the various actors involved in Higher Education, Research, and Innovation within the Tuscany Region:

  • Giovanisì: the Tuscany Region's project for youth autonomy. It is a system of opportunities aimed at young people up to 40 years of age, funded through regional, national, and European resources (from the ESF+, ERDF, and EAFRD programs). The project is divided into 5 areas: I study and train, Work, I start a business, I participate, and Culture.
  • DSU Toscana: the Tuscany Region's Agency for the Right to University Education. It provides services and support measures for students enrolled in bachelor's, master's, and PhD programs and in specialization schools across the regional territory.
  • URTT (Regional Office for Technology Transfer): active since 2020, its goal is to create inter-university coordination to support the research structures and Technology Transfer Offices of universities and partner institutions, fostering a shared vision and an integrated approach to adding value to research.
  • TOUR4EU: an association under Belgian law that brings together the Tuscany Region and the seven universities in the territory. Its objective is to promote the internationalization of university research and the participation of Tuscan research groups in Horizon calls and other EU programs.
  • The "University and Research" page on the Tuscany Region website gathers all information regarding projects that implement the Regional Government's policies on the right to university education, higher education, and the support and promotion of research. These are managed by the Department for the Right to University Education and Support for Research.
  • Sviluppo Toscana S.p.A.: an in-house company of the Tuscany Region. It operates primarily in support of the Region and its dependent entities within the framework of regional planning policies. As a fully controlled operational arm of the Tuscany Region, it manages projects funded by European Structural and Investment Funds (ESIF), promotes technology transfer and the enhancement of research, manages business incubators and production infrastructures, and helps businesses access calls for proposals and funding opportunities.
  • IRPET (Regional Institute for the Economic Programming of Tuscany): a public institution that conducts research in the economic, social, and territorial fields, aimed at the planning, analysis, and evaluation of public policies. Its activities support the Regional Government and Council, but more broadly, IRPET conducts research in collaboration with and for public and private operators, institutions, other research institutes, and universities.
  • Invest in Tuscany: a liaison office between the Tuscan territory and the international economic and financial community. It serves as a point of reference to support potential investors in building and developing their business in Tuscany.
  • Sportello imprese Unlock: a project by the Tuscany Region in collaboration with non-agricultural trade associations. It offers micro, small, and medium-sized enterprises a permanent support desk capable of providing information on public funding and directing needs and projects toward practical solutions.