Uncovering, Understanding, and Predicting Links
Speaker: Jonathan Chang
Series: Final Public Orals
Location: Engineering Quadrangle B327
Date/Time: Tuesday, September 6, 2011, 11:00 a.m. - 1:00 p.m.
Network data, such as citation networks of documents, hyperlinked networks of web pages, and social networks of friends, are pervasive in applied statistics and machine learning. The statistical analysis of network data can provide both useful predictive models and descriptive statistics. Predictive models can point social network members towards new friends, scientific papers towards relevant citations, and web pages towards other related pages. Descriptive statistics can uncover the hidden community structure underlying a network data set.
In this work we develop new models of network data that account for both links and attributes. We also develop the inferential and predictive tools around these models to make them widely applicable to large, real-world data sets. One such model, the Relational Topic Model, can predict links using only a new node’s attributes. Thus, we can suggest citations of newly written papers, predict the likely hyperlinks of a web page in development, or suggest friendships in a social network based only on a new user’s profile of interests. Moreover, given a new node and its links, the model provides a predictive distribution of node attributes. This mechanism can be used to predict keywords from citations or a user’s interests from his or her social connections.
While explicit network data (network data in which the connections between people, places, genes, corporations, etc. are explicitly encoded) are already ubiquitous, most such data sets annotate connections only in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large, real-world corpora. To resolve this, we present a probabilistic topic model that analyzes text corpora and infers descriptions of their entities and of the relationships between those entities. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions.