CBOW (Continuous Bag of Words) learns to predict a target word from the context window (±n words) around it, picking the word with the highest probability given that context. Because frequent words dominate this objective, CBOW tends to represent infrequent words poorly.
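As a rough sketch of how you might train a CBOW model in practice, gensim's Word2Vec selects this objective with sg=0 (assuming gensim 4.x; the toy corpus and hyperparameters here are illustrative, not from the original text):

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "mouse"],
]

# sg=0 selects CBOW: predict the center word from the +/- `window` words around it.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

# The learned embedding vector for a word:
print(cbow.wv["cat"])
```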
Skip-Gram is the inverse of CBOW: it learns to predict the context from the target word. As a result, words that appear in similar contexts (similar surrounding words) end up clustered together, with similar word embedding vectors (projections).
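Switching the same gensim call to sg=1 gives the Skip-Gram objective; a minimal sketch (again with an assumed toy corpus) showing that words sharing contexts end up with nearby vectors:

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 selects Skip-Gram: predict the surrounding context words from the center word.
sg_model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

# "cat" and "dog" occur in near-identical contexts, so their
# embeddings should end up relatively close (high cosine similarity).
print(sg_model.wv.similarity("cat", "dog"))
```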