Neural net word embeddings: What was old is new again

Friday, October 6, 2017 - 1:25pm - 2:25pm
Lind 305
Jeremy Bellay (Battelle)
A first step in many machine learning approaches is embedding the input data in a vector representation. In the case of linguistic data, the word2vec algorithm (specifically skip-gram with negative sampling) is popular and often considered state of the art. We will discuss how word2vec and related algorithms work, along with some interesting properties and applications of these algorithms in their own right. We will also show that these algorithms can be viewed as a type of exponential family principal component analysis.
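As a rough illustration of the objective the talk discusses, here is a minimal NumPy sketch of the skip-gram negative-sampling (SGNS) loss for a single (center, context) pair. The vector dimension, number of negative samples, and the toy random vectors are illustrative choices, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgns_loss(w, c, negatives):
    """SGNS loss for one (center, context) pair:
    -log sigma(w.c) - sum_i log sigma(-w.n_i),
    where negatives holds k sampled "noise" context vectors."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = -np.log(sigmoid(w @ c))                  # pull true pair together
    neg = -np.sum(np.log(sigmoid(-(negatives @ w))))  # push noise pairs apart
    return pos + neg

# Toy example: 8-dimensional vectors, 5 negative samples (arbitrary sizes).
d, k = 8, 5
w = rng.normal(size=d)
c = rng.normal(size=d)
negs = rng.normal(size=(k, d))
print(sgns_loss(w, c, negs))
```

Training adjusts the word and context vectors by gradient descent so that observed pairs score high and sampled noise pairs score low; the factorization view mentioned in the abstract arises because the optimum implicitly factors a (shifted) pointwise mutual information matrix.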

Jeremy received his PhD from the University of Minnesota under Prof. Arnd Scheel in 2009. Since then he has worked in numerous fields including bioinformatics, healthcare research, and cybersecurity, often with a focus on network structure and semantic technologies. He is currently a principal research scientist in the Cyber Innovations group at Battelle Memorial Institute, a non-profit research organization devoted to scientific innovation for the benefit of humanity.