Mining Data Semantics

Presenters: Jie Tang, Tsinghua University, China
Ying Ding, Indiana University, US

This tutorial discusses key issues and practices in mining semantics from heterogeneous information networks. Social, information, and biological systems usually consist of a large number of interacting, multi-typed components connected via various types of links, which makes heterogeneous networks ubiquitous. Mining semantics from these networks can address several important questions, including (1) semantics of social ties: how are users connected via different types of social relationships? (2) semantics of user behavior: how can users’ personalized behaviors be discovered from heterogeneous networks? (3) semantics of network formation: what are the fundamental structures underlying the networks, and how are networks formed by different communities? At the same time, more and more semantic data are becoming available thanks to the Linked Open Data (LOD) initiatives and robust techniques for semantic annotation of Web, social, sensor, mobile, and biological data. Because semantic data are organized as a new form of heterogeneous network, semantic graph mining faces a new class of challenges compared with traditional graph mining. Mining and analyzing data semantics by exploiting the power of links, and designing novel algorithms that take the semantic features of the data into account, will foster a cross-disciplinary forum that strengthens existing bonds and creates new connections among these communities. The tutorial will provide hands-on experience in applying data integration and data discovery to large patent datasets, scholarly publication datasets, and open biomedical datasets.