CBOW learns to predict a word from its context window (the ±n words around it) by choosing the word with the maximum probability for that context, which biases it toward the words it sees most frequently. As a result, CBOW does not work well for infrequent words. Skip-Gram is the inverse of CBOW: it learns to predict the surrounding context words from the center word, which makes it better suited to rare words. A minimal usage sketch follows below.
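As a rough illustration (not from the original post), the sketch below uses gensim's Word2Vec, where the sg flag switches between CBOW (sg=0) and Skip-Gram (sg=1); the toy corpus and parameter values are placeholder assumptions, and the vector_size parameter name assumes gensim 4.x.

```python
# Minimal sketch: CBOW vs. Skip-Gram with gensim (assumes gensim >= 4.x).
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens (placeholder data).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["cats", "and", "dogs", "are", "animals"],
]

# CBOW: predict the center word from the surrounding +/- `window` words.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Skip-Gram: predict the surrounding context words from the center word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Both models expose the learned word vectors the same way.
print(cbow.wv["cat"][:5])
print(skipgram.wv.most_similar("cat", topn=3))
```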

A prior probability distribution of an uncertain quantity is the probability distribution that would express one’s beliefs about this quantity before some evidence is taken into account. A prior can be determined from past information, such as previous experiments.
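As an illustrative sketch (not from the original post), a Beta prior over a coin's heads probability can encode beliefs drawn from earlier experiments before new flips are observed; the pseudo-counts and data below are placeholder assumptions.

```python
# Sketch: a Beta prior encoding beliefs about a coin's heads probability
# before new evidence is taken into account (pseudo-counts are assumptions).
from scipy import stats

# Suppose previous experiments suggested roughly 30 heads in 50 flips.
prior = stats.beta(a=30, b=20)           # Beta(30, 20) prior over p

print(prior.mean())                       # prior belief about p (about 0.6)
print(prior.interval(0.95))               # 95% credible interval under the prior

# When new evidence arrives (say, 7 heads in 10 new flips), the Beta prior is
# conjugate to the binomial likelihood, so updating just adds the new counts.
posterior = stats.beta(a=30 + 7, b=20 + 3)
print(posterior.mean())
```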

The likelihood function expresses how likely particular parameter values are for a given set of observations. It is the joint probability distribution of the random sample, indexed by those parameters and evaluated at the given observations.
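As a rough sketch (not from the original post), the likelihood of a fixed coin-flip sample can be evaluated across a grid of candidate parameter values; the data and grid below are placeholder assumptions.

```python
# Sketch: Bernoulli likelihood of observed coin flips as a function of p
# (data and parameter grid are illustrative assumptions).
import numpy as np
from scipy import stats

flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # observed sample (1 = heads)
p_grid = np.linspace(0.01, 0.99, 99)                # candidate parameter values

# Likelihood: joint probability of the fixed observations, evaluated at each p.
likelihood = np.array([
    stats.bernoulli.pmf(flips, p).prod() for p in p_grid
])

# The maximizing value agrees with the sample mean, the MLE for Bernoulli data.
p_hat = p_grid[np.argmax(likelihood)]
print(p_hat, flips.mean())
```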