FastText has recently gained popularity among developers and researchers as a word embedding of choice, alongside GloVe, word2vec, StarSpace, RAND-WALK, and others. By default, spaCy expects you to provide word2vec vectors. However, you can use your fastText vectors if you want to!
Train the Dragon
The first step is, obviously, to train your fastText vectors. After the training process, you should be left with a .vec and a .bin file. We will need the .vec file for this exercise.
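Before going further, it can help to sanity-check the .vec file. It is plain text: a header line with the vocabulary size and the vector dimension, then one line per word followed by its values. Here is a minimal sketch that peeks at the first few entries. The function name `peek_vec` and its `limit` parameter are made up for this post and are not part of fastText or spaCy.

```python
def peek_vec(path, limit=3):
    """Return (n_words, dim, first_rows) from a fastText .vec file.

    Hypothetical helper for illustration; it only reads the header
    and the first `limit` entries, not the whole file.
    """
    with open(path, encoding="utf-8") as handle:
        # Header line: "<number of words> <vector dimension>"
        n_words, dim = (int(x) for x in handle.readline().split())
        rows = []
        for line in handle:
            if len(rows) >= limit:
                break
            # Split from the right so exactly `dim` values are peeled off
            pieces = line.rstrip().rsplit(" ", dim)
            rows.append((pieces[0], [float(v) for v in pieces[1:]]))
    return n_words, dim, rows
```

If the dimension in the header does not match what you trained with, or the first words look garbled, it is worth fixing that before loading anything into spaCy.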
We actually have a script in the spaCy source code examples for using fastText vectors.
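For reference, the core of loading a .vec file into spaCy can be sketched roughly like this. This is my own simplified take, not the spaCy example script itself: `load_vec_into_vocab` is a hypothetical name, and it assumes the `nlp` object exposes spaCy's `vocab.reset_vectors` and `vocab.set_vector` methods.

```python
import numpy


def load_vec_into_vocab(nlp, vec_path):
    """Copy every vector from a fastText .vec file into nlp.vocab.

    Hypothetical helper; returns the number of words loaded.
    """
    with open(vec_path, "rb") as handle:
        # Header line: "<number of words> <vector dimension>"
        n_rows, n_dim = (int(x) for x in handle.readline().split())
        # Start from an empty vector table of the right width
        nlp.vocab.reset_vectors(width=n_dim)
        count = 0
        for raw in handle:
            line = raw.rstrip().decode("utf8")
            if not line:
                continue  # skip blank lines
            pieces = line.rsplit(" ", n_dim)
            vector = numpy.asarray([float(v) for v in pieces[1:]], dtype="f")
            nlp.vocab.set_vector(pieces[0], vector)
            count += 1
    return count


# Example usage (requires spaCy; "bn" and "model.vec" are placeholders):
#   import spacy
#   nlp = spacy.blank("bn")
#   load_vec_into_vocab(nlp, "model.vec")
```

Reading in binary mode and decoding each line keeps the parsing independent of the system's default encoding, which matters for non-Latin scripts like Bengali.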
Saving for the future
Now, modify the script a bit. In main, just save your nlp object to disk.
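The saving step can be as small as one call to `nlp.to_disk`. A minimal sketch, assuming `nlp` is the spaCy pipeline built by the script; the wrapper name `save_pipeline` and the directory handling around it are my own additions.

```python
from pathlib import Path


def save_pipeline(nlp, output_dir):
    """Write the pipeline, including its vocab and vectors, to output_dir.

    Hypothetical wrapper around spaCy's nlp.to_disk().
    """
    output_dir = Path(output_dir)
    # Create the target directory (and any parents) if it is missing
    output_dir.mkdir(parents=True, exist_ok=True)
    nlp.to_disk(output_dir)
    return output_dir
```

Calling `save_pipeline(nlp, 'dir_name')` at the end of main leaves you with a directory that `spacy.load` can read back later.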
And voila. You're done. Now you can use your model and test out incredible stuff.
Now, just to test that our model is working (of course, please choose words that appear in your own vectors; I am choosing mine from the Bengali language):
    import spacy

    # 'dir_name' is the directory the nlp object was saved to
    nlp = spacy.load('dir_name')
    doc = nlp('বাংলা বাংলাদেশ')
    print(doc.similarity(doc))
You should get a non-zero number if everything worked out fine.
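Why is a non-zero number a useful signal? spaCy's default similarity is a cosine similarity over word vectors, and a document's vector is the average of its token vectors. If no vectors were loaded, those vectors have zero length and the score cannot be a meaningful non-zero value. The plain-Python sketch below illustrates the idea. It is not spaCy's implementation, and returning 0.0 for a zero-length vector is a convention chosen for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Illustrative helper; returns 0.0 when either vector has zero length
    (an assumption of this sketch, so that "no vectors" scores as 0).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For a more telling check than comparing a document with itself, you can compare two different words you expect to be related and see whether the score looks sensible.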