Winter School 2022

The second winter school of the KnowGraphs project takes place from January 31st to February 2nd, 2022. The goals of this school are:

  1. Disciplinary Training
    • Topic: Scientific foundations
    • Approach: Tutorials on Techniques related to KGs
  2. Complementary Training
    • Topic: Research methodology
    • Approach: Data challenge
  3. Professional Development
    • Topic: Possible professional paths in Science and Industry, IPR
    • Approach: Industry talks and Talks on Data and Law

Programme

Serving and Querying Open Knowledge Graphs on the Web

Presenter: Prof. Dr. Axel Polleres

Abstract: This talk will be divided into two parts… Motivated by querying Wikidata as a rich source knowledge graph for all kinds of practical projects and applications, we will cover SPARQL as a query language for KGs in a tutorial-style manner. We will also discuss the limits of RDF and SPARQL and the related challenges: in terms of expressivity (for instance, regarding querying contextualized graphs, such as property graphs), but also in terms of the scalability bottleneck of current SPARQL endpoints when serving complex queries to many users/clients concurrently.
In the second half of the talk, we will discuss a possible solution to this bottleneck from our own research: we will introduce, step by step, the WISE-KG [1,2] project, a framework for serving and querying SPARQL under high concurrency to many users, built as an extension of the Linked Data Fragments Framework [3] by Verborgh et al. Time permitting, we will also explain the underlying HDT [4] queryable graph compression format and give a further outlook on how polyglot persistence and replication models could make SPARQL querying of Distributed Knowledge Graphs more scalable.
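
As a rough illustration of the kind of Wikidata query the first part refers to, the sketch below builds a request URL for the public Wikidata SPARQL endpoint. The query itself (five items that are instances of "house cat") and the helper name `build_request_url` are illustrative assumptions and are not taken from the talk; the code only constructs the URL and does not send any request.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query (an assumption, not from the talk):
# five items that are an instance of (P31) house cat (Q146), with English labels.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_request_url(query: str, endpoint: str = WIKIDATA_ENDPOINT) -> str:
    """Return a GET URL that asks the endpoint for JSON-formatted results."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


if __name__ == "__main__":
    # Printing only; fetching the URL is left to the reader's HTTP client of choice.
    print(build_request_url(QUERY))
```

Every client that sends such a query makes the endpoint do all of the evaluation work, which is the concurrency bottleneck the abstract describes.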

References:

  [1] Amr Azzam, Javier D. Fernández, Maribel Acosta, Martin Beno, and Axel Polleres. SMART-KG: Hybrid shipping for SPARQL querying on the web. In The Web Conference 2020, Taipei, Taiwan, 2020.
  [2] Amr Azzam, Christian Aebeloe, Gabriela Montoya, Ilkcan Keles, Axel Polleres, and Katja Hose. WiseKG: Balanced Access to Web Knowledge Graphs. In Proceedings of the Web Conference 2021, pages 1422–1434, Ljubljana, Slovenia, 2021. ACM / IW3C2.
  [3] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, and Pieter Colpaert. Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Semant. 37-38: 184-206 (2016)
  [4] Javier D. Fernández, Miguel A. Martinez-Prieto, Claudio Gutiérrez, Axel Polleres, and Mario Arias. Binary RDF Representation for Publication and Exchange (HDT). Journal of Web Semantics (JWS), 19(2), 2013.

Slides: Part 1, Part 2


Bio: Axel Polleres heads the Institute for Data, Process and Knowledge Management of Vienna University of Economics and Business (WU Wien), which he joined in Sept 2013 as a full professor in the area of "Data and Knowledge Engineering". Since January 2017 he has also been a member of the Complexity Science Hub Vienna Faculty. Between January and June 2018, he was a visiting professor at Stanford University. He obtained his Ph.D. and habilitation from Vienna University of Technology and worked at University of Innsbruck, Austria, Universidad Rey Juan Carlos, Madrid, Spain, the Digital Enterprise Research Institute (DERI) at the National University of Ireland, Galway, and for Siemens AG's Corporate Technology Research division before joining WU Wien. His research focuses on Intelligent Data Management, Data Science, Artificial Intelligence, Semantic Web and Knowledge Graphs, particularly on query languages, reasoning about ontologies, rules, logic programming, Web services, knowledge management, Linked Open Data, configuration technologies and their applications. Moreover, he has actively contributed to international standardisation efforts within the World Wide Web Consortium (W3C), where he co-chaired the W3C SPARQL working group.

Application talk 1: Use of knowledge graphs in the health domain. Lessons learned

Presenter: Dr. Irini Fundulaki

Abstract: In this talk we are going to present the ICT Ecosystem we designed and developed for the National Network For Genetic Cardiovascular Diseases Study And Prevention Of Sudden Death In The Young On The Basis Of Precision Medicine.

Slides: Link

Bio: Dr. Irini Fundulaki is a Principal Researcher (Grade B) of the Information Systems Laboratory at the Institute of Computer Science (ICS), FORTH. She is interested in database management and information systems for Web Data, and more specifically in provenance models for RDF data, storage and indexing schemes for RDF data provenance, and access control for RDF and XML data. She is also interested in benchmarking for Linked Data: as the person scientifically responsible for the Linked Data Benchmark Council (LDBC) and Holistic Benchmarking for Big Linked Data (HOBBIT) FP7 and H2020 European projects, respectively, she designed and developed benchmarks for instance matching, linking and versioning. Since 2018 she has been the scientific supervisor on behalf of the Institute of Computer Science – FORTH in the National Flagship Actions on Precision Medicine: (a) The Hellenic Precision Medicine Network on Cancer, (b) The National Network For Genetic Cardiovascular Diseases Study And Prevention Of Sudden Death In The Young On The Basis Of Precision Medicine, and (c) The National Research Network for Neurodegenerative Diseases on the basis of Precision Medicine. She has more than 80 publications, with 2278 citations and an h-index of 28.

Embedding Knowledge Graphs and Graph Queries

Presenter: Dr. Michael Cochez

Abstract: Machine learning approaches have been applied successfully in many fields, including knowledge representation. For example, graph embedding techniques have been taken up by the community as a tool to solve various tasks. Other models that connect knowledge representation (where information is typically represented in the form of a graph) with the machine learning world also exist. In this talk, I will first give a general overview of the topics in the field of representation learning for knowledge graphs, present several of the algorithms used to obtain embeddings, and then present recent work on inductive representation learning and approximate graph query answering.
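
To make the idea of a knowledge graph embedding concrete, the following minimal sketch scores triples in the style of TransE, one well-known embedding model. Whether the talk covers TransE specifically is an assumption; the entity names, the relation, and the two-dimensional vectors are hand-picked for illustration and are not learned from data.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail (higher = more plausible)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-dimensional embeddings, chosen by hand for illustration.
entities = {
    "Vienna": [1.0, 0.0],
    "Austria": [1.0, 1.0],
    "Finland": [3.0, 1.0],
}
relations = {"capitalOf": [0.0, 1.0]}


def rank_tails(head, relation):
    """Rank every entity as a candidate tail for the incomplete triple (head, relation, ?)."""
    h, r = entities[head], relations[relation]
    return sorted(entities, key=lambda e: transe_score(h, r, entities[e]), reverse=True)


if __name__ == "__main__":
    print(rank_tails("Vienna", "capitalOf"))
```

Ranking candidate tails in this way is the basic step behind link prediction, one of the tasks for dealing with missing information in graphs.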

Slides: Link

Bio: Dr. Michael Cochez is an assistant professor in the Knowledge Representation and Reasoning group at the Vrije Universiteit Amsterdam and manager of the Discovery Lab (an ICAI lab in collaboration with Elsevier and the University of Amsterdam). He works on bridging the gap between machine learning and knowledge graphs. His research interests include embedding of knowledge graphs for downstream machine learning tasks, dealing with missing information in graphs (link prediction, approximate graph query answering) and applications such as question answering and recommendations. Dr. Cochez obtained his BSc degree from the University of Antwerp, Belgium, his MSc and PhD from the University of Jyväskylä, Finland, and was a postdoc at Fraunhofer FIT, Germany. More info: https://www.cochez.nl

Challenges for Efficiently Creating and Maintaining Knowledge Graphs

Presenter: Prof. Dr. Maria-Esther Vidal

Abstract: Knowledge graphs (KGs) have gained momentum as expressive data structures to represent the convergence of data and knowledge spread across various data sources. Although the term was coined by the research community several decades ago, KGs play an increasingly relevant role in scientific and industrial areas. In particular, the rich amount of data existing in encyclopedic KGs like DBpedia and Wikidata, or in domain-specific KGs (e.g., Bio2RDF), demonstrates the feasibility of integrating factual domain-specific knowledge following the Linked Data principles.
Data integrated into existing KGs are collected in heterogeneous formats or physically distributed over multiple data sources. The declarative definition of KGs using W3C recommended languages like R2RML and RML has gained momentum as a way to provide transparent, maintainable, and traceable processes of KG creation. Nevertheless, KG creation is laborious, and various parameters, like the number and type of mapping assertions, and data source complexities, like large volume, variety, and high duplicate rates, may considerably affect the performance of KG creation. Thus, the process of KG creation demands specialized data management techniques to scale up.
In this tutorial, we present the challenges faced at data integration and knowledge engineering levels to empower the pipelines of KG creation. We will explain the role of knowledge extraction, mapping languages, integrity constraints, and provenance. We will further discuss techniques for planning and transforming heterogeneous data into RDF KGs following declarative mapping assertions specified in R2RML and RML. Given a set of mapping assertions, the planner provides an optimized execution plan by partitioning and scheduling the execution of the mapping assertions. Moreover, at the execution level, we will discuss the role of physical operators in efficiently performing the optimized execution plans. The impact of these techniques will be addressed on state-of-the-art engines for RDF KG creation and existing benchmarks. Lastly, we will present strategies for efficiently validating integrity constraints over KGs specified using the W3C recommended language SHACL.
As a result, at the end of the tutorial, the attendees will put in perspective the benefits of planning the execution of pipelines for KG creation and validation, and the role of data management for efficiently constructing and assessing the quality of KGs.
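
The role of a declarative mapping assertion can be sketched as follows. The dictionary-based mapping format, the `http://example.org/` IRIs, and the sample CSV data are all hypothetical simplifications and do not follow R2RML or RML syntax; the sketch only shows one assertion being applied to every source row, with a set removing the triples produced by duplicate rows.

```python
import csv
import io

# Hypothetical source data; note the duplicated first data row.
CSV_DATA = """id,name,city
1,Alice,Hannover
1,Alice,Hannover
2,Bob,Vienna
"""

# Hypothetical mapping assertion: a subject template plus predicate -> column pairs.
MAPPING = {
    "subject": "http://example.org/person/{id}",
    "predicate_columns": {
        "http://example.org/name": "name",
        "http://example.org/city": "city",
    },
}


def create_triples(rows, mapping):
    """Apply one mapping assertion to every row; the set drops duplicate triples."""
    triples = set()
    for row in rows:
        subject = mapping["subject"].format(**row)
        for predicate, column in mapping["predicate_columns"].items():
            triples.add((subject, predicate, row[column]))
    return triples


rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
triples = create_triples(rows, MAPPING)

if __name__ == "__main__":
    for triple in sorted(triples):
        print(triple)
```

Here three source rows yield only four distinct triples, which hints at why high duplicate rates in the sources matter for the cost of KG creation.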

Slides: Link

Bio: Prof. Dr. Maria-Esther Vidal is a full professor at the Leibniz University of Hannover and leads the Scientific Data Management (SDM) group at TIB-Leibniz Information Centre for Science and Technology. She is also a member of the L3S Research Centre, and a full professor (retired) at Universidad Simón Bolívar (USB), Venezuela. She has made significant contributions to data management, semantic data integration, and machine learning over knowledge graphs. Maria-Esther is a co-author of more than 220 peer-reviewed articles in Semantic Web, Databases, Bioinformatics, and Artificial Intelligence. She has been awarded the Science Award on Responsible Research by Stifterverband with the recommendation of the Leibniz Association. Maria-Esther is also actively shaping her research communities. She has been an editorial board member of renowned journals (e.g., JWS, JDIQ) and general chair, co-chair, senior reviewer of major scientific events (e.g., WWW, ISWC, and AAAI). Under her direction, her team has developed technologies of predominant relevance in the whole process of knowledge graph creation from heterogeneous data and query processing. She serves as an expert in several advisory boards, summer schools, and doctoral consortiums. She has advised more than 25 doctoral students, and more than 120 Master and bachelor students in Computer Science. She has been a doctoral and habilitation committee member in France, Italy, the Netherlands, Germany, Ireland, Argentina, Uruguay, and Venezuela.

Application talk 2: Addressing Gartner's challenges for Knowledge Graph implementation

Presenter: Silver Oliver

Abstract: Gartner's Hype Cycle for Artificial Intelligence paper identifies four obstacles for knowledge graph implementation. In this talk we will take a practical look at addressing these obstacles, drawing upon real-world case studies.

Slides: Link

Bio: Silver Oliver is an Information Architect who has been working on knowledge graph implementations for the last 15 years. He is a trained Information Professional whose focus is on ontology development and implementation. Previously head of Information Architecture at the BBC, Silver has worked for Data Language for the last 10 years, covering a breadth of domains.

Querying Federations of Knowledge Graphs

Presenter: Maribel Acosta

Abstract: Federations of Knowledge Graphs (KGs) are composed of multiple, decentralized, autonomous sources that expose KGs on the web. To query such KGs, Federated Query Engines (FQEs) implement query processing techniques for reducing the execution time of queries while maximizing the answer completeness.
The first part of this talk will introduce the general architecture of FQEs with an overview of (i) source selection and query decomposition, (ii) query planning, and (iii) query execution techniques. We will learn that due to the decentralized and autonomous aspects of federated KGs, query planning and execution techniques can fail if the runtime conditions are not taken into account. To address this, we will present Adaptive Query Processing (AQP) tailored to federated KGs. We will focus on two types of adaptivity that produce results incrementally and address query performance issues due to network delays or suboptimal plans. In addition, this talk will present a metric for benchmarking querying approaches that produce answers incrementally.
In the second part of this talk, we will concentrate on heterogeneous KG federations. Heterogeneity in query processing can come in different forms, e.g., data models, languages, hardware, interfaces, etc. In this talk, we will focus on heterogeneous federations composed of interfaces with different expressivity. The Linked Data Fragments (LDFs) Framework defines the expressivity of the interfaces in terms of the class of SPARQL expressions they can evaluate and the metadata they provide. This talk will then present the challenges that FQEs face when querying KG federations with heterogeneous LDF interfaces. To address these challenges, in our most recent work, we propose an interface-aware framework that exploits the capabilities of the members of the federation to speed up query execution. This talk will conclude with an outlook on other research problems in the context of federations of KGs.
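
The source selection step mentioned in the first part can be illustrated with a simple predicate-based sketch. This is a common simplification and not necessarily the technique presented in the talk; the endpoint URLs, the predicate lists, and the function name `select_sources` are hypothetical.

```python
# Hypothetical federation: each source advertises the predicates it can answer.
SOURCES = {
    "http://example.org/sparql/people": {"foaf:name", "foaf:knows"},
    "http://example.org/sparql/places": {"geo:lat", "geo:long", "foaf:name"},
}


def select_sources(triple_patterns, sources):
    """Map each triple pattern to the sources whose predicate list covers it."""
    selection = {}
    for pattern in triple_patterns:
        _, predicate, _ = pattern
        if predicate.startswith("?"):
            # Variable predicate: every source is potentially relevant.
            relevant = sorted(sources)
        else:
            relevant = sorted(s for s, preds in sources.items() if predicate in preds)
        selection[pattern] = relevant
    return selection


# Two triple patterns of a hypothetical query.
query = [("?p", "foaf:name", "?n"), ("?p", "geo:lat", "?lat")]
plan = select_sources(query, SOURCES)

if __name__ == "__main__":
    for pattern, endpoints in plan.items():
        print(pattern, "->", endpoints)
```

A pattern that matches several sources must be sent to all of them, so a more precise selection directly reduces the number of requests issued during query execution.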

Slides: Link

Bio: Maribel Acosta is an Assistant Professor at the Ruhr-University Bochum, Germany, where she is the Head of the Database and Information Systems Group and a member of the Institute for Neural Computation (INI). Her research interests include query processing over decentralized knowledge graphs and knowledge graph quality with a special focus on completeness. More recently, she has applied Machine Learning approaches to these research topics. Maribel conducted her bachelor's and master's studies in Computer Science at the Universidad Simon Bolivar, Venezuela. In 2017, she finished her Ph.D. at the Karlsruhe Institute of Technology, Germany, where she was also a Postdoc and Lecturer until 2020. She is an active member of the (Semantic) Web and AI communities, and has acted as Research Track Co-chair (ESWC, SEMANTiCS) and reviewer for top conferences and journals (WWW, AAAI, ICML, NeurIPS, ISWC, ESWC, SWJ, JWS). More info can be found here: https://www.ini.rub.de/the_institute/people/maribel-acosta

Application Talk 3: Managing and Analyzing Legal Knowledge Graphs

Presenters: María Navas-Loro and Erwin Filtz

Abstract: There is a strong demand for access to legal information. For instance, the European Union portal EUR-Lex, which publishes legislation, case law, and other documents related to EU law, serves more than two million documents monthly. Nevertheless, the information required by different stakeholders (such as organizations, companies or citizens) is usually distributed across various sources and presented in different formats and languages, turning legal information retrieval into a time-consuming and tedious process. Legal Knowledge Graphs offer a solution to this situation. In this session, we will present several use cases of legal Knowledge Graphs, highlight the main challenges found, and analyse the main gaps in the domain and the lessons learnt. We will go through the different phases of the construction of these Knowledge Graphs, from the search for sources of documents and their main impediments, to the extraction of information from them, to their subsequent transformation into graph elements.

Slides: Link

Bio: María Navas-Loro is a postdoctoral researcher in the Ontology Engineering Group (Universidad Politécnica de Madrid, Spain). She is specialised in temporal information processing in the legal domain and has collaborated in national and European projects, such as LPSBigger, Lynx or the current NextProcurement. She is currently working on improving the extraction of events from texts in different languages and domains, and her interest covers natural language processing techniques and knowledge representation.

Erwin Filtz has a background in law and information systems and is working as a research scientist for Siemens. His interests are focused on Knowledge Graphs in the legal domain with the goal to increase accessibility to legal information by applying various tools and techniques from various domains.

Knowledge Graphs and Deep Learning for Analyzing Legal Documents

Presenter: Prof. Manolis Koubarakis

Abstract: In this talk I will present the platform Nomothesia (legislation.di.uoa.gr/) and its knowledge graph which encodes Greek legislation as linked data. I will also discuss deep learning techniques for named entity recognition and topic classification for Greek legal documents. Finally, I will present open problems in this exciting research area.

Slides: Link

Bio: Manolis Koubarakis is a Professor and Director of Graduate Studies in the Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens. He leads the Artificial Intelligence team (http://ai.di.uoa.gr). He holds a Ph.D. in Computer Science, from the National Technical University of Athens, an M.Sc. in Computer Science, from the University of Toronto, and a diploma (B.Sc.) in Mathematics, from the University of Crete. He is a Fellow of EurAI (European Association for Artificial Intelligence) since 2015 and President of the Hellenic Association for Artificial Intelligence. He is a member of the Advisory Board that implements the Hellenic National Strategy for Artificial Intelligence. He has published more than 200 papers that have been widely cited (7099 citations and h-index 44 in Google Scholar) in the areas of Artificial Intelligence (especially Knowledge Representation), Databases, Semantic Web and Linked Geospatial Data (especially Earth observation data). His research has been financially supported with a total amount exceeding 8 million Euros by the European Commission, the Hellenic Foundation for Research and Innovation, the Greek General Secretariat for Research and Technology, the European Space Agency and industry.

Knowledge Graphs: Trust, Privacy, and Transparency from a Legal Governance Approach

Presenter: Prof. Dr. Daniel Schwabe

Abstract: In this talk I will examine the issues and requirements for Knowledge Graphs to support Trust, Privacy and Transparency when consuming online information. I will present a framework that enables integrating normative and legal aspects, illustrated with a concrete use case.

Slides: Link

Bio: Daniel Schwabe was a Professor at the Department of Informatics, Catholic University in Rio de Janeiro (PUC-Rio) from 1981 to 2020, where he led the TecWeb Lab. He received his BSc degree in Mathematics from PUC-Rio in 1975, his MSc degree in Computer Science also from PUC-Rio in 1976, and his PhD degree in Computer Science from UCLA in 1981. He is currently a visiting researcher at the Jozef Stefan Institute in Slovenia, and at the Information Sciences Institute (ISI)/USC in the USA. He has over 250 publications, an h-index of 41, and 8913 citations (Google Scholar). In the 1990s and early 2000s he was one of the pioneers in the Web Engineering field, and later in Semantic Web technologies and applications. More recently he has focused on designing, building and implementing Knowledge Graphs, with particular emphasis on providing support for Trust, Privacy and Transparency aspects when consuming online information.

Practical Information

The winter school will take place online using the BigBlueButton platform. Registered participants will receive a link to the room via mail before the winter school starts.

People external to the KnowGraphs project can participate in the winter school. Please contact Dr. Irini Fundulaki before January 25th via mail for registration. Please use the following e-mail subject: “Registration ITN Knowgraphs 2022 Winter School”.

Schedule

All times are in CET.

Day 1

9:00 – 9:30
Introduction
9:30 – 11:30
Serving and Querying Open Knowledge Graphs on the Web
Prof. Dr. Axel Polleres
11:30 – 11:45
Coffee break ☕
11:45 – 12:30
Application talk 1: Use of knowledge graphs in the health domain. Lessons learned
Dr. Irini Fundulaki
12:30 – 13:30
Lunch 🍔
13:30 – 15:30
Embedding Knowledge Graphs and Graph Queries
Assist. Prof. Michael Cochez
15:30 – 15:45
Coffee break ☕
15:45 – 18:00
Data Challenge - Hackathon: Benchmarking with GERBIL 💻
  • Group formation (3 people per group)
  • Setup for benchmarking
  • Questions to helpers

Day 2

9:00 – 9:30
Introduction
9:30 – 11:30
Challenges for Efficiently Creating and Maintaining Knowledge Graphs
Prof. Dr. Maria-Esther Vidal
11:30 – 11:45
Coffee break ☕
11:45 – 12:30
Application talk 2: Addressing Gartner's challenges for Knowledge Graph implementation
Silver Oliver
12:30 – 13:30
Lunch 🍝
13:30 – 15:30
Querying Federations of Knowledge Graphs
Assist. Prof. Maribel Acosta
15:30 – 15:45
Coffee break ☕
15:45 – 18:00
Data Challenge - Hackathon: Benchmarking with GERBIL 💻

Day 3

9:00 – 9:30
Introduction
9:30 – 11:30
Application talk 3: Managing and Analyzing Legal Knowledge Graphs
Dr. Erwin Filtz and Dr. María Navas-Loro
11:30 – 11:45
Coffee break ☕
11:45 – 12:30
Knowledge Graphs and Deep Learning for Analyzing Legal Documents
Prof. Dr. Manolis Koubarakis
12:30 – 13:15
Lunch 🍜
13:15 – 15:00
Knowledge Graphs: Trust, Privacy, and Transparency from a Legal Governance Approach
Prof. Dr. Daniel Schwabe
15:30 – 16:00
Coffee break ☕
16:00 – 17:30
Data Challenge Presentations
17:30 – 17:59
Wrap up