Turkish Word Embeddings


While watching the NLP community dive deep into deep learning, I took one baby step and created Turkish word embeddings with the famous word2vec code. I crawled, cleaned, and combined various Turkish resources myself; the resulting corpus has about 2 billion tokens. I still have more resources waiting to be trained. Until then, you can use word vectors trained with the following settings: CBOW (continuous bag of words), a context window size of 8, and a dimension of 200.
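If you want a quick way to query the vectors, a minimal sketch using gensim is shown below. The file name and the assumption that the vectors are in the standard word2vec binary format are mine; adjust them to match the actual download.

```python
# Minimal sketch: loading word2vec-format vectors with gensim.
# "turkish_word_vectors.bin" is a placeholder name, not the actual file in this repo.
from gensim.models import KeyedVectors

# Load vectors assumed to be in the original word2vec binary format.
wv = KeyedVectors.load_word2vec_format("turkish_word_vectors.bin", binary=True)

# Nearest-neighbour and similarity queries work as usual.
print(wv.most_similar("ankara", topn=5))
print(wv.similarity("kedi", "köpek"))
```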

If you use these vectors in your work, please let me know.


License
Turkish Word Embeddings is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.