
Research Review Notes

Summaries of academic research papers

A Neural Probabilistic Language Model


Idea

This is the seminal paper on neural language modeling and the first to propose learning distributed representations of words. The motivation is the curse of dimensionality: an n-gram model treats words as discrete atoms, so word sequences never seen together in training receive little or no probability mass. The proposed solution is to associate each word with a real-valued feature vector and to learn these vectors jointly with the parameters of the probability function over word sequences in the corpus, so that probability mass generalizes to unseen sequences made of similar words.

Method
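
The model maps each of the previous n-1 words to a learned feature vector through a shared lookup table C, concatenates those vectors, feeds them through a tanh hidden layer (with optional direct input-to-output connections), and applies a softmax over the vocabulary to obtain P(w_t | w_{t-n+1}, ..., w_{t-1}); the feature vectors and the network weights are trained jointly by maximizing the corpus log-likelihood. Below is a minimal NumPy sketch of the forward pass; the sizes and the example context are illustrative, not the paper's settings.

```python
import numpy as np

# Forward pass of the neural probabilistic language model:
# y = b + W x + U tanh(d + H x), followed by a softmax over the vocabulary.
V, context, m, h = 10000, 3, 60, 50   # vocab size, context words (n-1), embedding dim, hidden units
rng = np.random.default_rng(0)

C = rng.normal(scale=0.01, size=(V, m))            # word feature vectors, learned jointly
H = rng.normal(scale=0.01, size=(h, context * m))  # input-to-hidden weights
d = np.zeros(h)                                    # hidden bias
U = rng.normal(scale=0.01, size=(V, h))            # hidden-to-output weights
W = np.zeros((V, context * m))                     # optional direct input-to-output connections
b = np.zeros(V)                                    # output bias

def next_word_probs(context_ids):
    """P(w_t | previous context words) for one array of (n-1) word ids."""
    x = C[context_ids].reshape(-1)                 # look up and concatenate context vectors
    y = b + W @ x + U @ np.tanh(d + H @ x)         # unnormalized score for every vocabulary word
    e = np.exp(y - y.max())                        # numerically stable softmax
    return e / e.sum()

probs = next_word_probs(np.array([12, 7, 345]))    # hypothetical word ids
print(probs.shape)                                 # (10000,)
```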

Observations

Compared to the state-of-the-art smoothed trigram baselines, the model reduced test perplexity by about 24% on the Brown corpus and about 8% on AP News.
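
Perplexity here is the exponential of the average negative log-likelihood per word, so lower is better. A tiny illustration of the computation, with placeholder per-word probabilities:

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-likelihood per word."""
    return float(np.exp(-np.mean(np.log(token_probs))))

# Placeholder model probabilities for four held-out words.
print(perplexity([0.1, 0.05, 0.2, 0.01]))   # ~17.8
```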