Relational Knowledge Discovery: Applications to Text

Tuesday, April 18, 2000 - 10:45am - 11:10am
Keller 3-180
David Jensen (University of Massachusetts)
Individual documents are situated within a complex web of relations. Documents are related to other documents through citations, hyperlinks, and other connections. Documents are also related to non-document objects such as authors, publishers, and archives. Finally, the content of documents can often establish relationships between documents and the people, places, things, and other topics they discuss. These relations are among the most accessible and most useful sources of information about a given document.

Unfortunately, nearly all current techniques in knowledge discovery and data mining use extremely limited data representations and are unable to express or analyze rich relational structures. New techniques are needed to address the growing interest in mining relational information in text, databases, XML, and other structured and semi-structured formats.

In this talk, I will discuss the challenges of representing and analyzing relational data, with special attention to data derived from text. I will describe several systems, focusing on Proximity, a system under development in my research group at the University of Massachusetts.