NLPL word embeddings repository

Brought to you by the Language Technology Group at the University of Oslo

We feature models trained with clearly stated hyperparameters on clearly described, linguistically pre-processed corpora.
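One of the stated hyperparameters, the Window column in the table below, is the size of the context window. The following is a minimal Python sketch of what that number controls. It assumes a plain symmetric window over an already tokenized sentence and lists the (centre, context) pairs a skipgram-style model would train on. The function name `context_pairs` is hypothetical. Real word2vec and Gensim training also subsample frequent words and, by default, randomly shrink the window at each position, so treat the window as an upper bound rather than an exact count.

```python
def context_pairs(tokens, window):
    """Return (centre, context) pairs, taking up to `window` words on each side of every position."""
    pairs = []
    for i, centre in enumerate(tokens):
        # Clip the window at the sentence boundaries.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs


sentence = ["the", "cat", "sat", "on", "the", "mat"]
pairs = context_pairs(sentence, 2)
print(len(pairs))  # 18
print(pairs[:3])   # [('the', 'cat'), ('the', 'sat'), ('cat', 'the')]
```

Each position away from the sentence edges contributes up to 2 × window pairs, so with window 5 an interior word yields up to ten training pairs.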

More information and hints are available on the NLPL wiki page. You can also download a JSON file containing metadata for all models in the repository.
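To narrow the list down by algorithm or lemmatization offline, the JSON metadata can be filtered in a few lines. The sketch below assumes a hypothetical schema: a top-level `models` list whose entries carry `id`, `algorithm` and `lemmatized` fields. The real file's keys may differ, so inspect it before adapting the code. The three sample entries reuse values from the table below.

```python
import json

# Hypothetical excerpt: the field names are assumptions, the values come from the table.
SAMPLE_METADATA = """
{"models": [
  {"id": 0, "algorithm": "Gensim Continuous Skipgram", "lemmatized": true,
   "dimensions": 300, "window": 10, "corpus": ["British National Corpus"]},
  {"id": 1, "algorithm": "Gensim Continuous Skipgram", "lemmatized": false,
   "dimensions": 300, "window": null, "corpus": ["Google News 2013"]},
  {"id": 7, "algorithm": "Global Vectors", "lemmatized": true,
   "dimensions": 300, "window": 5, "corpus": ["English Wikipedia Dump of February 2017"]}
]}
"""


def filter_models(models, algorithm=None, lemmatized=None):
    """Keep models matching every filter that is not None."""
    selected = []
    for model in models:
        if algorithm is not None and model["algorithm"] != algorithm:
            continue
        if lemmatized is not None and model["lemmatized"] != lemmatized:
            continue
        selected.append(model)
    return selected


models = json.loads(SAMPLE_METADATA)["models"]
hits = filter_models(models, algorithm="Gensim Continuous Skipgram", lemmatized=True)
print([m["id"] for m in hits])  # [0]
```

Once the real schema has been checked, the sample string can be replaced by reading the downloaded file with `json.load` on an open UTF-8 file handle.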


All models

ID | Download link | Vector size | Window | Corpus | Vocabulary size | Algorithm | Lemmatization
0 | Download | 300 | 10 | British National Corpus | 163473 | Gensim Continuous Skipgram | True
1 | Download | 300 | None | Google News 2013 | 2883863 | Gensim Continuous Skipgram | False
2 | Download | 300 | 5 | Norsk Aviskorpus/NoWaC | 306943 | Gensim Continuous Skipgram | True
3 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 296630 | Gensim Continuous Skipgram | True
4 | Download | 300 | 2 | Gigaword 5th Edition | 314815 | Gensim Continuous Skipgram | True
5 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 273992 | Gensim Continuous Skipgram | True
6 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 302866 | Gensim Continuous Skipgram | False
7 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 273930 | Global Vectors | True
8 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 302815 | Global Vectors | False
9 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 273930 | fastText Skipgram | True
10 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 302815 | fastText Skipgram | False
11 | Download | 300 | 5 | Gigaword 5th Edition | 261794 | Gensim Continuous Skipgram | True
12 | Download | 300 | 5 | Gigaword 5th Edition | 292479 | Gensim Continuous Skipgram | False
13 | Download | 300 | 5 | Gigaword 5th Edition | 262269 | Global Vectors | True
14 | Download | 300 | 5 | Gigaword 5th Edition | 292967 | Global Vectors | False
15 | Download | 300 | 5 | Gigaword 5th Edition | 262269 | fastText Skipgram | True
16 | Download | 300 | 5 | Gigaword 5th Edition | 292967 | fastText Skipgram | False
17 | Download | 300 | 5 | English Wikipedia Dump of February 2017 + Gigaword 5th Edition | 259882 | Gensim Continuous Skipgram | True
18 | Download | 300 | 5 | English Wikipedia Dump of February 2017 + Gigaword 5th Edition | 291186 | Gensim Continuous Skipgram | False
19 | Download | 300 | 5 | English Wikipedia Dump of February 2017 + Gigaword 5th Edition | 260073 | Global Vectors | True
20 | Download | 300 | 5 | English Wikipedia Dump of February 2017 + Gigaword 5th Edition | 291392 | Global Vectors | False
21 | Download | 300 | 5 | English Wikipedia Dump of February 2017 + Gigaword 5th Edition | 260073 | fastText Skipgram | True
22 | Download | 300 | 5 | English Wikipedia Dump of February 2017 + Gigaword 5th Edition | 291392 | fastText Skipgram | False
23 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 228670 | Gensim Continuous Skipgram | True
24 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 228671 | fastText Skipgram | True
25 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 228671 | Global Vectors | True
26 | Download | 300 | 5 | Gigaword 5th Edition | 209512 | Gensim Continuous Skipgram | True
27 | Download | 300 | 5 | Gigaword 5th Edition | 209865 | Global Vectors | True
28 | Download | 300 | 5 | Gigaword 5th Edition | 209865 | fastText Skipgram | True
29 | Download | 300 | 2 | Gigaword 5th Edition | 297790 | Gensim Continuous Skipgram | True
30 | Download | 100 | 10 | Ancient Greek CoNLL17 corpus | 45742 | Word2Vec Continuous Skipgram | False
31 | Download | 100 | 10 | Arabic CoNLL17 corpus | 1071056 | Word2Vec Continuous Skipgram | False
32 | Download | 100 | 10 | Basque CoNLL17 corpus | 426736 | Word2Vec Continuous Skipgram | False
33 | Download | 100 | 10 | Bulgarian CoNLL17 corpus | 628026 | Word2Vec Continuous Skipgram | False
34 | Download | 100 | 10 | Catalan CoNLL17 corpus | 799020 | Word2Vec Continuous Skipgram | False
35 | Download | 100 | 10 | ChineseT CoNLL17 corpus | 1935503 | Word2Vec Continuous Skipgram | False
36 | Download | 100 | 10 | Croatian CoNLL17 corpus | 928316 | Word2Vec Continuous Skipgram | False
37 | Download | 100 | 10 | Czech CoNLL17 corpus | 1767815 | Word2Vec Continuous Skipgram | False
38 | Download | 100 | 10 | Danish CoNLL17 corpus | 1655886 | Word2Vec Continuous Skipgram | False
39 | Download | 100 | 10 | Dutch CoNLL17 corpus | 2610658 | Word2Vec Continuous Skipgram | False
40 | Download | 100 | 10 | English CoNLL17 corpus | 4027169 | Word2Vec Continuous Skipgram | False
41 | Download | 100 | 10 | Estonian CoNLL17 corpus | 926795 | Word2Vec Continuous Skipgram | False
42 | Download | 100 | 10 | Finnish CoNLL17 corpus | 2433286 | Word2Vec Continuous Skipgram | False
43 | Download | 100 | 10 | French CoNLL17 corpus | 2567698 | Word2Vec Continuous Skipgram | False
44 | Download | 100 | 10 | Galician CoNLL17 corpus | 363106 | Word2Vec Continuous Skipgram | False
45 | Download | 100 | 10 | German CoNLL17 corpus | 4946997 | Word2Vec Continuous Skipgram | False
46 | Download | 100 | 10 | Greek CoNLL17 corpus | 1183194 | Word2Vec Continuous Skipgram | False
47 | Download | 100 | 10 | Hebrew CoNLL17 corpus | 672384 | Word2Vec Continuous Skipgram | False
48 | Download | 100 | 10 | Hindi CoNLL17 corpus | 219285 | Word2Vec Continuous Skipgram | False
49 | Download | 100 | 10 | Hungarian CoNLL17 corpus | 2702663 | Word2Vec Continuous Skipgram | False
50 | Download | 100 | 10 | Indonesian CoNLL17 corpus | 2899107 | Word2Vec Continuous Skipgram | False
51 | Download | 100 | 10 | Irish CoNLL17 corpus | 87115 | Word2Vec Continuous Skipgram | False
52 | Download | 100 | 10 | Italian CoNLL17 corpus | 2469122 | Word2Vec Continuous Skipgram | False
53 | Download | 100 | 10 | Japanese CoNLL17 corpus | 3989605 | Word2Vec Continuous Skipgram | False
54 | Download | 100 | 10 | Kazakh CoNLL17 corpus | 176643 | Word2Vec Continuous Skipgram | False
55 | Download | 100 | 10 | Korean CoNLL17 corpus | 1780757 | Word2Vec Continuous Skipgram | False
56 | Download | 100 | 10 | Latin CoNLL17 corpus | 555381 | Word2Vec Continuous Skipgram | False
57 | Download | 100 | 10 | Latvian CoNLL17 corpus | 560445 | Word2Vec Continuous Skipgram | False
58 | Download | 100 | 10 | Norwegian-Bokmaal CoNLL17 corpus | 1182371 | Word2Vec Continuous Skipgram | False
59 | Download | 100 | 10 | Norwegian-Nynorsk CoNLL17 corpus | 223763 | Word2Vec Continuous Skipgram | False
60 | Download | 100 | 10 | Old Church Slavonic CoNLL17 corpus | 357 | Word2Vec Continuous Skipgram | False
61 | Download | 100 | 10 | Persian CoNLL17 corpus | 966446 | Word2Vec Continuous Skipgram | False
62 | Download | 100 | 10 | Polish CoNLL17 corpus | 4420598 | Word2Vec Continuous Skipgram | False
63 | Download | 100 | 10 | Portuguese CoNLL17 corpus | 2536452 | Word2Vec Continuous Skipgram | False
64 | Download | 100 | 10 | Romanian CoNLL17 corpus | 2153518 | Word2Vec Continuous Skipgram | False
65 | Download | 100 | 10 | Russian CoNLL17 corpus | 3338424 | Word2Vec Continuous Skipgram | False
66 | Download | 100 | 10 | Slovak CoNLL17 corpus | 1188804 | Word2Vec Continuous Skipgram | False
67 | Download | 100 | 10 | Slovenian CoNLL17 corpus | 706835 | Word2Vec Continuous Skipgram | False
68 | Download | 100 | 10 | Spanish CoNLL17 corpus | 2656057 | Word2Vec Continuous Skipgram | False
69 | Download | 100 | 10 | Swedish CoNLL17 corpus | 3010472 | Word2Vec Continuous Skipgram | False
70 | Download | 100 | 10 | Turkish CoNLL17 corpus | 3633786 | Word2Vec Continuous Skipgram | False
71 | Download | 100 | 10 | Ukrainian CoNLL17 corpus | 942071 | Word2Vec Continuous Skipgram | False
72 | Download | 100 | 10 | Urdu CoNLL17 corpus | 108310 | Word2Vec Continuous Skipgram | False
73 | Download | 100 | 10 | Uyghur CoNLL17 corpus | 27757 | Word2Vec Continuous Skipgram | False
74 | Download | 100 | 10 | Vietnamese CoNLL17 corpus | 3847942 | Word2Vec Continuous Skipgram | False
75 | Download | 400 | 5 | Oil and Gas corpus | 285055 | Gensim Continuous Bag-of-Words | True
76 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4031460 | Gensim Continuous Bag-of-Words | True
77 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4480046 | Gensim Continuous Bag-of-Words | False
78 | Download | 100 | 15 | Norsk Aviskorpus + NoWaC + NBDigital | 4031461 | Global Vectors | True
79 | Download | 100 | 15 | Norsk Aviskorpus + NoWaC + NBDigital | 4480047 | Global Vectors | False
80 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 3998140 | fastText Skipgram | True
81 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4428648 | fastText Skipgram | False
82 | Download | 300 | 10 | ENC3: English Common Crawl Corpus | 2000000 | Global Vectors | False
83 | Download | 100 | 15 | Norsk Aviskorpus + NoWaC | 2239665 | Global Vectors | True
84 | Download | 100 | 15 | Norsk Aviskorpus + NoWaC | 2551820 | Global Vectors | False
85 | Download | 100 | 15 | Norsk Aviskorpus | 1487995 | Global Vectors | True
86 | Download | 100 | 15 | Norsk Aviskorpus | 1728101 | Global Vectors | False
87 | Download | 100 | 15 | NoWaC | 1199275 | Global Vectors | True
88 | Download | 100 | 15 | NoWaC | 1356633 | Global Vectors | False
89 | Download | 100 | 15 | NBDigital | 2187703 | Global Vectors | True
90 | Download | 100 | 15 | NBDigital | 2390584 | Global Vectors | False
91 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC | 2239664 | Gensim Continuous Bag-of-Words | True
92 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC | 2551819 | Gensim Continuous Bag-of-Words | False
93 | Download | 100 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Bag-of-Words | True
94 | Download | 100 | 5 | Norsk Aviskorpus | 1728100 | Gensim Continuous Bag-of-Words | False
95 | Download | 100 | 5 | NoWaC | 1199274 | Gensim Continuous Bag-of-Words | True
96 | Download | 100 | 5 | NoWaC | 1356632 | Gensim Continuous Bag-of-Words | False
97 | Download | 100 | 5 | NBDigital | 2187702 | Gensim Continuous Bag-of-Words | True
98 | Download | 100 | 5 | NBDigital | 2390583 | Gensim Continuous Bag-of-Words | False
99 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4031460 | Gensim Continuous Skipgram | True
100 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4480046 | Gensim Continuous Skipgram | False
101 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC | 2239664 | Gensim Continuous Skipgram | True
102 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC | 2551819 | Gensim Continuous Skipgram | False
103 | Download | 100 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Skipgram | True
104 | Download | 100 | 5 | Norsk Aviskorpus | 1728100 | Gensim Continuous Skipgram | False
105 | Download | 100 | 5 | NoWaC | 1199274 | Gensim Continuous Skipgram | True
106 | Download | 100 | 5 | NoWaC | 1356632 | Gensim Continuous Skipgram | False
107 | Download | 100 | 5 | NBDigital | 2187702 | Gensim Continuous Skipgram | True
108 | Download | 100 | 5 | NBDigital | 2390583 | Gensim Continuous Skipgram | False
109 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 3998140 | fastText Continuous Bag-of-Words | True
110 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4428648 | fastText Continuous Bag-of-Words | False
111 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC | 2239665 | fastText Continuous Bag-of-Words | True
112 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC | 2551820 | fastText Continuous Bag-of-Words | False
113 | Download | 100 | 5 | Norsk Aviskorpus | 1487995 | fastText Continuous Bag-of-Words | True
114 | Download | 100 | 5 | Norsk Aviskorpus | 1728101 | fastText Continuous Bag-of-Words | False
115 | Download | 100 | 5 | NoWaC | 1199275 | fastText Continuous Bag-of-Words | True
116 | Download | 100 | 5 | NoWaC | 1356633 | fastText Continuous Bag-of-Words | False
117 | Download | 100 | 5 | NBDigital | 2187703 | fastText Continuous Bag-of-Words | True
118 | Download | 100 | 5 | NBDigital | 2390584 | fastText Continuous Bag-of-Words | False
119 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC | 2239665 | fastText Skipgram | True
120 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC | 2551820 | fastText Skipgram | False
121 | Download | 100 | 5 | Norsk Aviskorpus | 1487995 | fastText Skipgram | True
122 | Download | 100 | 5 | Norsk Aviskorpus | 1728101 | fastText Skipgram | False
123 | Download | 100 | 5 | NoWaC | 1199275 | fastText Skipgram | True
124 | Download | 100 | 5 | NoWaC | 1356633 | fastText Skipgram | False
125 | Download | 100 | 5 | NBDigital | 2187703 | fastText Skipgram | True
126 | Download | 100 | 5 | NBDigital | 2390584 | fastText Skipgram | False
127 | Download | 50 | 5 | Norsk Aviskorpus + NoWaC | 2551820 | fastText Skipgram | False
128 | Download | 300 | 5 | Norsk Aviskorpus + NoWaC | 2551820 | fastText Skipgram | False
129 | Download | 600 | 5 | Norsk Aviskorpus + NoWaC | 2551820 | fastText Skipgram | False
130 | Download | 50 | 5 | Norsk Aviskorpus | 1487995 | fastText Skipgram | True
131 | Download | 300 | 5 | Norsk Aviskorpus | 1487995 | fastText Skipgram | True
132 | Download | 600 | 5 | Norsk Aviskorpus | 1487995 | fastText Skipgram | True
133 | Download | 50 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Bag-of-Words | True
134 | Download | 300 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Bag-of-Words | True
135 | Download | 600 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Bag-of-Words | True
136 | Download | | | Arabic CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
137 | Download | | | Bulgarian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
138 | Download | | | Catalan CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
139 | Download | | | Czech CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
140 | Download | | | Old Church Slavonic CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
141 | Download | | | Danish CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
142 | Download | | | German CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
143 | Download | | | Greek CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
144 | Download | | | English CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
145 | Download | | | Spanish CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
146 | Download | | | Estonian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
147 | Download | | | Basque CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
148 | Download | | | Persian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
149 | Download | | | Finnish CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
150 | Download | | | French CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
151 | Download | | | Irish CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
152 | Download | | | Galician CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
153 | Download | | | Ancient Greek CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
154 | Download | | | Hebrew CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
155 | Download | | | Hindi CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
156 | Download | | | Croatian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
157 | Download | | | Hungarian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
158 | Download | | | Indonesian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
159 | Download | | | Italian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
160 | Download | | | Japanese CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
161 | Download | | | Korean CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
162 | Download | | | Latin CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
163 | Download | | | Latvian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
164 | Download | | | Dutch CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
165 | Download | | | Norwegian-Bokmaal CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
166 | Download | | | Norwegian-Nynorsk CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
167 | Download | | | Polish CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
168 | Download | | | Portuguese CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
169 | Download | | | Romanian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
170 | Download | | | Russian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
171 | Download | | | Slovak CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
172 | Download | | | Slovenian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
173 | Download | | | Swedish CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
174 | Download | | | Turkish CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
175 | Download | | | Uyghur CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
176 | Download | | | Ukrainian CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
177 | Download | | | Urdu CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
178 | Download | | | Vietnamese CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
179 | Download | | | ChineseT CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
180 | Download | 300 | 20 | Russian National Corpus | 189193 | Gensim Continuous Bag-of-Words | True
181 | Download | 300 | 2 | Russian National Corpus | 164996 | fastText Skipgram | True
182 | Download | 300 | 2 | Russian National Corpus + Russian Wikipedia dump of December 2018 | 248978 | Gensim Continuous Skipgram | True
183 | Download | 300 | 5 | Russian National Corpus + Russian Wikipedia dump of December 2018 | 248118 | Gensim Continuous Skipgram | True
184 | Download | 300 | 5 | Russian News | 249318 | Gensim Continuous Skipgram | True
185 | Download | 300 | 2 | Taiga corpus | 249565 | Gensim Continuous Skipgram | True
186 | Download | 300 | 5 | Taiga corpus | 249946 | Gensim Continuous Skipgram | True
187 | Download | 300 | 10 | Taiga corpus | 192415 | fastText Continuous Bag-of-Words | True
188 | Download | 300 | 3 | Corpus of Historical American English (diachronic) | 100000 | Gensim Continuous Bag-of-Words | True
189 | Download | 300 | 3 | NBdigital corpus (diachronic) | 100000 | Gensim Continuous Bag-of-Words | True
190 | Download | 300 | 3 | Russian National Corpus (diachronic) | 100000 | Gensim Continuous Bag-of-Words | True
191 | Download | 300 | 5 | Gigaword 5th Edition (diachronic) | | Gensim Continuous Bag-of-Words | True
192 | Download | 300 | 5 | News on the Web (diachronic) | | Gensim Continuous Bag-of-Words | True
193 | Download | 1024 | | English Wikipedia Dump of February 2017 | | Embeddings from Language Models (ELMo) | False
194 | Download | 1024 | | News on the Web | | Embeddings from Language Models (ELMo) | False
195 | Download | 1024 | | Russian Wikipedia dump of December 2018 + Russian National Corpus | | Embeddings from Language Models (ELMo) | False
196 | Download | 1024 | | Russian Wikipedia dump of December 2018 + Russian National Corpus | | Embeddings from Language Models (ELMo) | True
197 | Download | 768 | | Finnish web corpus | | BERT | False
198 | Download | 768 | | Finnish web corpus | | BERT | False
199 | Download | 2048 | | Taiga corpus | | Embeddings from Language Models (ELMo) | True
200 | Download | 300 | 3 | English Wikipedia Dump of October 2019 | 249212 | Gensim Continuous Skipgram | True
201 | Download | 1024 | | German Wikipedia Dump of March 2020 | | Embeddings from Language Models (ELMo) | True
202 | Download | 1024 | | Swedish Wikipedia Dump of March 2020 | | Embeddings from Language Models (ELMo) | True
203 | Download | 1024 | | Latin Wikipedia Dump of March 2020 | | Embeddings from Language Models (ELMo) | True
204 | Download | 300 | 2 | Russian National Corpus + Russian Wikipedia dump of December 2018 + Russian News from Dialogue Evaluation 2020 + Araneum Russicum Maximum | 998459 | Gensim Continuous Bag-of-Words | True
205 | Download | 100 | 5 | Polish CommonCrawl Dump of December 2019 | 4885806 | fastText Continuous Bag-of-Words | False
206 | Download | 100 | 5 | Polish CommonCrawl Dump of December 2019 | 4885806 | fastText Skipgram | False
207 | Download | 100 | 5 | Polish CommonCrawl Dump of December 2019 | 35193029 | Gensim Continuous Bag-of-Words | False
208 | Download | 100 | 5 | Polish CommonCrawl Dump of December 2019 | 35193029 | Gensim Continuous Skipgram | False
209 | Download | 1024 | | English Wikipedia Dump of October 2019 | | Embeddings from Language Models (ELMo) | True
210 | Download | 1024 | | Norwegian Wikipedia Dump of September 2020 | | Embeddings from Language Models (ELMo) | True
211 | Download | 1024 | | Norwegian Wikipedia Dump of September 2020 | | Embeddings from Language Models (ELMo) | False
212 | Download | 2048 | | Araneum Russicum Maximum | | Embeddings from Language Models (ELMo) | True
213 | Download | 300 | 5 | GeoWAC: Population-balanced Russian Gigaword Corpus | 154923 | fastText Skipgram | True
214 | Download | 300 | 5 | GeoWAC: Population-balanced Russian Gigaword Corpus | 347295 | fastText Skipgram | False
216 | Download | 768 | | Norsk Aviskorpus + Norwegian Bokmål Wikipedia Dump of September 2020 + Norwegian Nynorsk Wikipedia Dump of September 2020 | | BERT | False
217 | Download | 2048 | | Norsk Aviskorpus + Norwegian Bokmål Wikipedia Dump of September 2020 + Norwegian Nynorsk Wikipedia Dump of September 2020 | | Embeddings from Language Models (ELMo) | False
218 | Download | 2048 | | Norsk Aviskorpus + Norwegian Bokmål Wikipedia Dump of September 2020 + Norwegian Nynorsk Wikipedia Dump of September 2020 | | Embeddings from Language Models (ELMo) | False
219 | Download | 2048 | | Arabic CoNLL17 corpus + Basque CoNLL17 corpus + ChineseT CoNLL17 corpus + English CoNLL17 corpus + Finnish CoNLL17 corpus + Hebrew CoNLL17 corpus + Hindi CoNLL17 corpus + Italian CoNLL17 corpus + Japanese CoNLL17 corpus + Korean CoNLL17 corpus + Russian CoNLL17 corpus + Swedish CoNLL17 corpus + Turkish CoNLL17 corpus | | Embeddings from Language Models (ELMo) | False
220 | Download | 300 | 10 | Russian National Corpus + Russian Wikipedia Dump of November 2021 | 249333 | Gensim Continuous Bag-of-Words | True
221 | Download | 768 | | Norwegian Colossal Corpus (NCC) + C4 Web Corpus | | BERT | False
222 | Download | 300 | 5 | English Wikipedia Dump of November 2021 | 199807 | Gensim Continuous Skipgram | False
223 | Download | 300 | 2 | English Wikipedia Dump of November 2021 | 199430 | Gensim Continuous Skipgram | True
224 | Download | 200 | 10 | Ukrainian CoNLL17 corpus | 99884 | Gensim Continuous Bag-of-Words | True
225 | Download | 2048 | | Corpus of Historical American English | | Embeddings from Language Models (ELMo) | True
226 | Download | 2048 | | NBdigital corpus (diachronic) + Norsk Aviskorpus (2012-2019) | | Embeddings from Language Models (ELMo) | True
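The static models in the table (Gensim, fastText, Global Vectors) map each vocabulary entry to a fixed vector. Files in word2vec format can be opened with standard tools such as Gensim's `KeyedVectors.load_word2vec_format`. To show what such a file contains without any third-party dependency, the sketch below assumes a model saved in the plain word2vec text format, which has a header line with vocabulary size and dimensionality followed by one word and its values per line. It then ranks neighbours by cosine similarity. The toy vectors are invented. The `lemma_POS` keys (e.g. `house_NOUN`) only illustrate a convention that some lemmatized models are assumed to use, so check each model's own documentation for its actual token format.

```python
import io
import math


def load_word2vec_text(stream):
    """Parse word2vec text format: '<vocab_size> <dim>' header, then '<word> <v1> ... <v_dim>' lines."""
    vocab_size, dim = map(int, stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        word, *values = stream.readline().split()
        if len(values) != dim:
            raise ValueError(f"{word!r}: expected {dim} values, got {len(values)}")
        vectors[word] = [float(x) for x in values]
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, query, topn=1):
    """Rank all other words by cosine similarity to `query`."""
    target = vectors[query]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 3-dimensional vectors, invented for illustration only.
TOY_MODEL = """3 3
house_NOUN 1.0 0.0 0.0
home_NOUN 0.9 0.1 0.0
run_VERB 0.0 0.0 1.0
"""

vectors = load_word2vec_text(io.StringIO(TOY_MODEL))
print(most_similar(vectors, "house_NOUN")[0][0])  # home_NOUN
```

With real models holding millions of entries, a pure-Python scan like this is slow. Libraries such as Gensim store the vectors as a matrix and compute similarities with vectorized operations instead.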

Version 2.0

This page accompanies the following paper:

Fares, Murhaf; Kutuzov, Andrei; Oepen, Stephan & Velldal, Erik (2017). Word vectors, reuse, and replicability: Towards a community repository of large-text resources. In Jörg Tiedemann (ed.), Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa), 22-24 May 2017. Linköping University Electronic Press. ISBN 978-91-7685-601-7