Gensim列车不更新权重

Gensim train not updating weights(Gensim列车不更新权重)
本文介绍了Gensim列车不更新权重的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个特定于领域的语料库,我正在尝试为其训练嵌入。因为我想全面掌握词汇,所以我添加了glove.6B.50d.txt中的单词向量。从这里添加向量后,我正在使用我拥有的语料库训练模型。

我正在尝试here中的解决方案,但单词嵌入似乎没有更新。

这是我到目前为止拥有的解决方案。

#read glove embeddings
glove_wv = KeyedVectors.load_word2vec_format(GLOVE_PATH, binary=False)

#initialize w2v model
model =  Word2Vec(vector_size=50, min_count=0, window=20, epochs=10, sg=1, workers=10, 
                      hs=1, ns_exponent=0.5, seed=42, sample=10**-2, shrink_windows=True)
model.build_vocab(sentences_tokenized)
training_examples_count = model.corpus_count

# add vocab from glove
model.build_vocab([list(glove_wv.key_to_index.keys())], update=True)
model.wv.vectors_lockf = np.zeros(len(model.wv)) # ALLOW UPDATE OF WEIGHTS FROM BACK PROP; 0 WILL SUPPRESS

# add glove embeddings
model.wv.intersect_word2vec_format(GLOVE_PATH,binary=False, lockf=1.0)

下面我正在训练模型并检查训练中明确出现的特定单词的单词嵌入

# train model
model.train(sentences_tokenized,total_examples=training_examples_count, epochs=model.epochs)

#CHECK IF EMBEDDING CHANGES FOR 'oyo'
print(model.wv.get_vector('oyo'))
print(glove_wv.get_vector('oyo'))

单词oyo的单词嵌入在训练前后是相同的。我哪里错了?

输入语料库sentences_tokenized包含几个包含单词oyo的句子。其中一句话--

'oyo global platform empowers entrepreneur small business hotel home providing full stack technology increase earnings eas operation bringing affordable trusted accommodation guest book instantly india largest budget hotel chain oyo room one preferred hotel booking destination vast majority student country hotel chain offer many benefit include early check in couple room id card flexibility oyo basically network budget hotel completely different famous hotel aggregator like goibibo yatra makemytrip partner zero two star hotel give makeover room bring customer hotel website mobile app'

推荐答案

您在这里即兴创作了很多潜在的错误或次优化。请特别注意:

这篇关于Gensim列车不更新权重的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Leetcode 234: Palindrome LinkedList(Leetcode 234:回文链接列表)
How do I read an Excel file directly from Dropbox#39;s API using pandas.read_excel()?(如何使用PANDAS.READ_EXCEL()直接从Dropbox的API读取Excel文件?)
subprocess.Popen tries to write to nonexistent pipe(子进程。打开尝试写入不存在的管道)
I want to realize Popen-code from Windows to Linux:(我想实现从Windows到Linux的POpen-code:)
Reading stdout from a subprocess in real time(实时读取子进程中的标准输出)
How to call type safely on a random file in Python?(如何在Python中安全地调用随机文件上的类型?)