CBOW learns to predict a word from the context window (the ±n surrounding words) by choosing the word with the highest probability given that context; in practice this favors the most frequent word that fits. As a result, infrequent words do not work well with this technique. Skip-Gram is the inverse of CBOW: it learns to predict the context words from the center word, which tends to handle rare words better.
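As a minimal sketch of the difference, here is how both architectures can be trained with the gensim library (an assumption on my part; the original does not name a library). The `sg` flag switches between CBOW (`sg=0`) and Skip-Gram (`sg=1`); the toy corpus is purely illustrative.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "dog"],
]

# CBOW (sg=0): predict the center word from the +/- `window` context words.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Skip-Gram (sg=1): predict the context words from the center word;
# this tends to represent infrequent words better than CBOW.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Compare nearest neighbors under each model.
print(cbow.wv.most_similar("cat", topn=2))
print(skipgram.wv.most_similar("cat", topn=2))
```

On a corpus this small the embeddings are essentially noise; the point is only the single-flag switch between the two training objectives.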

Hidden states are the unknowns we try to detect or predict. The hidden states have a relationship amongst themselves, described by the transition probabilities. Observations are the evidence variables that we have a priori. Observations and states are related through the emission probabilities.
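To make these pieces concrete, here is a small NumPy sketch of a hidden Markov model with hypothetical weather states and activity observations (the names and numbers are invented for illustration). The transition matrix relates hidden states to each other, the emission matrix relates states to observations, and the forward algorithm combines them to score an observation sequence.

```python
import numpy as np

states = ["Rainy", "Sunny"]               # hypothetical hidden states
observations = ["walk", "shop", "clean"]  # hypothetical evidence variables

pi = np.array([0.6, 0.4])  # initial state distribution

# Transition probabilities: P(next hidden state | current hidden state).
A = np.array([[0.7, 0.3],   # Rainy -> Rainy, Sunny
              [0.4, 0.6]])  # Sunny -> Rainy, Sunny

# Emission probabilities: P(observation | hidden state).
B = np.array([[0.1, 0.4, 0.5],   # Rainy -> walk, shop, clean
              [0.6, 0.3, 0.1]])  # Sunny -> walk, shop, clean

def forward(obs_seq):
    """Forward algorithm: likelihood of an observation sequence."""
    alpha = pi * B[:, obs_seq[0]]          # initialize with first emission
    for o in obs_seq[1:]:
        alpha = (alpha @ A) * B[:, o]      # propagate, then emit
    return alpha.sum()

# P(walk, shop, clean) under this model.
seq = [observations.index(o) for o in ["walk", "shop", "clean"]]
print(forward(seq))
```

The forward recursion marginalizes over all hidden-state paths, so the returned value is the total probability of the evidence under the assumed transition and emission parameters.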