NLPL word embeddings repository

brought to you by LTG Oslo (work in progress)

We feature models trained with clearly stated hyperparameters, on clearly described and linguistically pre-processed corpora.

More information and hints are available on the NLPL wiki page. You can also download the JSON file containing metadata for all the models in the repository.
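
Once the metadata file is downloaded, models can be selected offline by the same fields the listing shows (algorithm, lemmatization, vector size). The sketch below is only an illustration: the key names `id`, `algorithm`, `lemmatized` and `dimensions`, the flat list layout, and the helper `filter_models` are all assumptions, and the real JSON file may be organised differently. The three sample entries mirror models 7, 8 and 9 from the listing.

```python
import json

# Hypothetical excerpt: the real metadata file may use other keys and nesting.
SAMPLE_METADATA = """
[
  {"id": 7, "algorithm": "Global Vectors", "lemmatized": true, "dimensions": 300},
  {"id": 8, "algorithm": "Global Vectors", "lemmatized": false, "dimensions": 300},
  {"id": 9, "algorithm": "fastText Skipgram", "lemmatized": true, "dimensions": 300}
]
"""


def filter_models(models, **criteria):
    """Return the entries whose fields equal every given criterion."""
    return [
        model
        for model in models
        if all(model.get(key) == value for key, value in criteria.items())
    ]


models = json.loads(SAMPLE_METADATA)
selected = filter_models(models, algorithm="Global Vectors", lemmatized=True)
print([model["id"] for model in selected])
```

To use it with the actual file, replace `SAMPLE_METADATA` with the file contents and adjust the key names to whatever the file really uses.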

All models

ID | Download link | Vector size | Window | Corpus | Vocabulary size | Algorithm | Lemmatization
0 | Download | 300 | 10 | British National Corpus | 163473 | Gensim Continuous Skipgram | True
1 | Download | 300 | None | Google News 2013 | 2883863 | Gensim Continuous Skipgram | True
2 | Download | 300 | 5 | Norsk Aviskorpus/NoWaC | 306943 | Gensim Continuous Skipgram | True
3 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 296630 | Gensim Continuous Skipgram | True
4 | Download | 300 | 2 | Gigaword 5th Edition | 314815 | Gensim Continuous Skipgram | True
5 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 273992 | Gensim Continuous Skipgram | True
6 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 302866 | Gensim Continuous Skipgram | False
7 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 273930 | Global Vectors | True
8 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 302815 | Global Vectors | False
9 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 273930 | fastText Skipgram | True
10 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 302815 | fastText Skipgram | False
11 | Download | 300 | 5 | Gigaword 5th Edition | 261794 | Gensim Continuous Skipgram | True
12 | Download | 300 | 5 | Gigaword 5th Edition | 292479 | Gensim Continuous Skipgram | False
13 | Download | 300 | 5 | Gigaword 5th Edition | 262269 | Global Vectors | True
14 | Download | 300 | 5 | Gigaword 5th Edition | 292967 | Global Vectors | False
15 | Download | 300 | 5 | Gigaword 5th Edition | 262269 | fastText Skipgram | True
16 | Download | 300 | 5 | Gigaword 5th Edition | 292967 | fastText Skipgram | False
17 | Download | 300 | 5 | English Wikipedia Dump of February 2017; Gigaword 5th Edition | 259882 | Gensim Continuous Skipgram | True; True
18 | Download | 300 | 5 | English Wikipedia Dump of February 2017; Gigaword 5th Edition | 291186 | Gensim Continuous Skipgram | False; False
19 | Download | 300 | 5 | English Wikipedia Dump of February 2017; Gigaword 5th Edition | 260073 | Global Vectors | True; True
20 | Download | 300 | 5 | English Wikipedia Dump of February 2017; Gigaword 5th Edition | 291392 | Global Vectors | False; False
21 | Download | 300 | 5 | English Wikipedia Dump of February 2017; Gigaword 5th Edition | 260073 | fastText Skipgram | True; True
22 | Download | 300 | 5 | English Wikipedia Dump of February 2017; Gigaword 5th Edition | 291392 | fastText Skipgram | False; False
23 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 228670 | Gensim Continuous Skipgram | True
24 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 228671 | fastText Skipgram | True
25 | Download | 300 | 5 | English Wikipedia Dump of February 2017 | 228671 | Global Vectors | True
26 | Download | 300 | 5 | Gigaword 5th Edition | 209512 | Gensim Continuous Skipgram | True
27 | Download | 300 | 5 | Gigaword 5th Edition | 209865 | Global Vectors | True
28 | Download | 300 | 5 | Gigaword 5th Edition | 209865 | fastText Skipgram | True
29 | Download | 300 | 2 | Gigaword 5th Edition | 297790 | Gensim Continuous Skipgram | True
30 | Download | 100 | 10 | Ancient Greek CoNLL17 corpus | 45742 | Word2Vec Continuous Skipgram | False
31 | Download | 100 | 10 | Arabic CoNLL17 corpus | 1071056 | Word2Vec Continuous Skipgram | False
32 | Download | 100 | 10 | Basque CoNLL17 corpus | 426736 | Word2Vec Continuous Skipgram | False
33 | Download | 100 | 10 | Bulgarian CoNLL17 corpus | 628026 | Word2Vec Continuous Skipgram | False
34 | Download | 100 | 10 | Catalan CoNLL17 corpus | 799020 | Word2Vec Continuous Skipgram | False
35 | Download | 100 | 10 | ChineseT CoNLL17 corpus | 1935503 | Word2Vec Continuous Skipgram | False
36 | Download | 100 | 10 | Croatian CoNLL17 corpus | 928316 | Word2Vec Continuous Skipgram | False
37 | Download | 100 | 10 | Czech CoNLL17 corpus | 1767815 | Word2Vec Continuous Skipgram | False
38 | Download | 100 | 10 | Danish CoNLL17 corpus | 1655886 | Word2Vec Continuous Skipgram | False
39 | Download | 100 | 10 | Dutch CoNLL17 corpus | 2610658 | Word2Vec Continuous Skipgram | False
40 | Download | 100 | 10 | English CoNLL17 corpus | 4027169 | Word2Vec Continuous Skipgram | False
41 | Download | 100 | 10 | Estonian CoNLL17 corpus | 926795 | Word2Vec Continuous Skipgram | False
42 | Download | 100 | 10 | Finnish CoNLL17 corpus | 2433286 | Word2Vec Continuous Skipgram | False
43 | Download | 100 | 10 | French CoNLL17 corpus | 2567698 | Word2Vec Continuous Skipgram | False
44 | Download | 100 | 10 | Galician CoNLL17 corpus | 363106 | Word2Vec Continuous Skipgram | False
45 | Download | 100 | 10 | German CoNLL17 corpus | 4946997 | Word2Vec Continuous Skipgram | False
46 | Download | 100 | 10 | Greek CoNLL17 corpus | 1183194 | Word2Vec Continuous Skipgram | False
47 | Download | 100 | 10 | Hebrew CoNLL17 corpus | 672384 | Word2Vec Continuous Skipgram | False
48 | Download | 100 | 10 | Hindi CoNLL17 corpus | 219285 | Word2Vec Continuous Skipgram | False
49 | Download | 100 | 10 | Hungarian CoNLL17 corpus | 2702663 | Word2Vec Continuous Skipgram | False
50 | Download | 100 | 10 | Indonesian CoNLL17 corpus | 2899107 | Word2Vec Continuous Skipgram | False
51 | Download | 100 | 10 | Irish CoNLL17 corpus | 87115 | Word2Vec Continuous Skipgram | False
52 | Download | 100 | 10 | Italian CoNLL17 corpus | 2469122 | Word2Vec Continuous Skipgram | False
53 | Download | 100 | 10 | Japanese CoNLL17 corpus | 3989605 | Word2Vec Continuous Skipgram | False
54 | Download | 100 | 10 | Kazakh CoNLL17 corpus | 176643 | Word2Vec Continuous Skipgram | False
55 | Download | 100 | 10 | Korean CoNLL17 corpus | 1780757 | Word2Vec Continuous Skipgram | False
56 | Download | 100 | 10 | Latin CoNLL17 corpus | 555381 | Word2Vec Continuous Skipgram | False
57 | Download | 100 | 10 | Latvian CoNLL17 corpus | 560445 | Word2Vec Continuous Skipgram | False
58 | Download | 100 | 10 | Norwegian-Bokmaal CoNLL17 corpus | 1182371 | Word2Vec Continuous Skipgram | False
59 | Download | 100 | 10 | Norwegian-Nynorsk CoNLL17 corpus | 223763 | Word2Vec Continuous Skipgram | False
60 | Download | 100 | 10 | Old Church Slavonic CoNLL17 corpus | 357 | Word2Vec Continuous Skipgram | False
61 | Download | 100 | 10 | Persian CoNLL17 corpus | 966446 | Word2Vec Continuous Skipgram | False
62 | Download | 100 | 10 | Polish CoNLL17 corpus | 4420598 | Word2Vec Continuous Skipgram | False
63 | Download | 100 | 10 | Portuguese CoNLL17 corpus | 2536452 | Word2Vec Continuous Skipgram | False
64 | Download | 100 | 10 | Romanian CoNLL17 corpus | 2153518 | Word2Vec Continuous Skipgram | False
65 | Download | 100 | 10 | Russian CoNLL17 corpus | 3338424 | Word2Vec Continuous Skipgram | False
66 | Download | 100 | 10 | Slovak CoNLL17 corpus | 1188804 | Word2Vec Continuous Skipgram | False
67 | Download | 100 | 10 | Slovenian CoNLL17 corpus | 706835 | Word2Vec Continuous Skipgram | False
68 | Download | 100 | 10 | Spanish CoNLL17 corpus | 2656057 | Word2Vec Continuous Skipgram | False
69 | Download | 100 | 10 | Swedish CoNLL17 corpus | 3010472 | Word2Vec Continuous Skipgram | False
70 | Download | 100 | 10 | Turkish CoNLL17 corpus | 3633786 | Word2Vec Continuous Skipgram | False
71 | Download | 100 | 10 | Ukrainian CoNLL17 corpus | 942071 | Word2Vec Continuous Skipgram | False
72 | Download | 100 | 10 | Urdu CoNLL17 corpus | 108310 | Word2Vec Continuous Skipgram | False
73 | Download | 100 | 10 | Uyghur CoNLL17 corpus | 27757 | Word2Vec Continuous Skipgram | False
74 | Download | 100 | 10 | Vietnamese CoNLL17 corpus | 3847942 | Word2Vec Continuous Skipgram | False
75 | Download | 400 | 5 | Oil and Gas corpus | 285055 | Gensim Continuous Bag-of-Words | True
76 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4031460 | Gensim Continuous Bag-of-Words | True
77 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4480046 | Gensim Continuous Bag-of-Words | False
78 | Download | 100 | 15 | Norsk Aviskorpus + NoWaC + NBDigital | 4031461 | Global Vectors | True
79 | Download | 100 | 15 | Norsk Aviskorpus + NoWaC + NBDigital | 4480047 | Global Vectors | False
80 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 3998140 | fastText Skipgram | True
81 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4428648 | fastText Skipgram | False
82 | Download | 300 | 10 | ENC3: English Common Crawl Corpus | 2000000 | Global Vectors | False
83 | Download | 100 | 15 | Norsk Aviskorpus; NoWaC | 2239665 | Global Vectors | True; True
84 | Download | 100 | 15 | Norsk Aviskorpus; NoWaC | 2551820 | Global Vectors | False; False
85 | Download | 100 | 15 | Norsk Aviskorpus | 1487995 | Global Vectors | True
86 | Download | 100 | 15 | Norsk Aviskorpus | 1728101 | Global Vectors | False
87 | Download | 100 | 15 | NoWaC | 1199275 | Global Vectors | True
88 | Download | 100 | 15 | NoWaC | 1356633 | Global Vectors | False
89 | Download | 100 | 15 | NBDigital | 2187703 | Global Vectors | True
90 | Download | 100 | 15 | NBDigital | 2390584 | Global Vectors | False
91 | Download | 100 | 5 | Norsk Aviskorpus; NoWaC | 2239664 | Gensim Continuous Bag-of-Words | True; True
92 | Download | 100 | 5 | Norsk Aviskorpus; NoWaC | 2551819 | Gensim Continuous Bag-of-Words | False; False
93 | Download | 100 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Bag-of-Words | True
94 | Download | 100 | 5 | Norsk Aviskorpus | 1728100 | Gensim Continuous Bag-of-Words | False
95 | Download | 100 | 5 | NoWaC | 1199274 | Gensim Continuous Bag-of-Words | True
96 | Download | 100 | 5 | NoWaC | 1356632 | Gensim Continuous Bag-of-Words | False
97 | Download | 100 | 5 | NBDigital | 2187702 | Gensim Continuous Bag-of-Words | True
98 | Download | 100 | 5 | NBDigital | 2390583 | Gensim Continuous Bag-of-Words | False
99 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4031460 | Gensim Continuous Skipgram | True
100 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4480046 | Gensim Continuous Skipgram | False
101 | Download | 100 | 5 | Norsk Aviskorpus; NoWaC | 2239664 | Gensim Continuous Skipgram | True; True
102 | Download | 100 | 5 | Norsk Aviskorpus; NoWaC | 2551819 | Gensim Continuous Skipgram | False; False
103 | Download | 100 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Skipgram | True
104 | Download | 100 | 5 | Norsk Aviskorpus | 1728100 | Gensim Continuous Skipgram | False
105 | Download | 100 | 5 | NoWaC | 1199274 | Gensim Continuous Skipgram | True
106 | Download | 100 | 5 | NoWaC | 1356632 | Gensim Continuous Skipgram | False
107 | Download | 100 | 5 | NBDigital | 2187702 | Gensim Continuous Skipgram | True
108 | Download | 100 | 5 | NBDigital | 2390583 | Gensim Continuous Skipgram | False
109 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 3998140 | fastText Continuous Bag-of-Words | True
110 | Download | 100 | 5 | Norsk Aviskorpus + NoWaC + NBDigital | 4428648 | fastText Continuous Bag-of-Words | False
111 | Download | 100 | 5 | Norsk Aviskorpus; NoWaC | 2239665 | fastText Continuous Bag-of-Words | True; True
112 | Download | 100 | 5 | Norsk Aviskorpus; NoWaC | 2551820 | fastText Continuous Bag-of-Words | False; False
113 | Download | 100 | 5 | Norsk Aviskorpus | 1487995 | fastText Continuous Bag-of-Words | True
114 | Download | 100 | 5 | Norsk Aviskorpus | 1728101 | fastText Continuous Bag-of-Words | False
115 | Download | 100 | 5 | NoWaC | 1199275 | fastText Continuous Bag-of-Words | True
116 | Download | 100 | 5 | NoWaC | 1356633 | fastText Continuous Bag-of-Words | False
117 | Download | 100 | 5 | NBDigital | 2187703 | fastText Continuous Bag-of-Words | True
118 | Download | 100 | 5 | NBDigital | 2390584 | fastText Continuous Bag-of-Words | False
119 | Download | 100 | 5 | Norsk Aviskorpus; NoWaC | 2239665 | fastText Skipgram | True; True
120 | Download | 100 | 5 | Norsk Aviskorpus; NoWaC | 2551820 | fastText Skipgram | False; False
121 | Download | 100 | 5 | Norsk Aviskorpus | 1487995 | fastText Skipgram | True
122 | Download | 100 | 5 | Norsk Aviskorpus | 1728101 | fastText Skipgram | False
123 | Download | 100 | 5 | NoWaC | 1199275 | fastText Skipgram | True
124 | Download | 100 | 5 | NoWaC | 1356633 | fastText Skipgram | False
125 | Download | 100 | 5 | NBDigital | 2187703 | fastText Skipgram | True
126 | Download | 100 | 5 | NBDigital | 2390584 | fastText Skipgram | False
127 | Download | 50 | 5 | Norsk Aviskorpus; NoWaC | 2551820 | fastText Skipgram | False; False
128 | Download | 300 | 5 | Norsk Aviskorpus; NoWaC | 2551820 | fastText Skipgram | False; False
129 | Download | 600 | 5 | Norsk Aviskorpus; NoWaC | 2551820 | fastText Skipgram | False; False
130 | Download | 50 | 5 | Norsk Aviskorpus | 1487995 | fastText Skipgram | True
131 | Download | 300 | 5 | Norsk Aviskorpus | 1487995 | fastText Skipgram | True
132 | Download | 600 | 5 | Norsk Aviskorpus | 1487995 | fastText Skipgram | True
133 | Download | 50 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Bag-of-Words | True
134 | Download | 300 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Bag-of-Words | True
135 | Download | 600 | 5 | Norsk Aviskorpus | 1487994 | Gensim Continuous Bag-of-Words | True
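
The vocabulary size and vector size columns give a rough idea of how much memory a model needs once loaded. As a back-of-the-envelope sketch, the dense embedding matrix takes about vocabulary size × vector size × bytes per value. The 4-byte (32-bit float) storage is an assumption, not something this page states, and the helper name `embedding_matrix_bytes` is made up for illustration. The estimate is a lower bound: it ignores the vocabulary strings and any extra tables a model carries, such as subword vectors in fastText models.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are held in memory as 32-bit floats


def embedding_matrix_bytes(vocabulary_size, vector_size,
                           bytes_per_value=BYTES_PER_FLOAT32):
    """Approximate size in bytes of a dense vocabulary x dimensions matrix."""
    return vocabulary_size * vector_size * bytes_per_value


# (model ID, vocabulary size, vector size), copied from the table above.
examples = [
    (0, 163473, 300),
    (45, 4946997, 100),
    (129, 2551820, 600),
]

for model_id, vocabulary_size, vector_size in examples:
    size = embedding_matrix_bytes(vocabulary_size, vector_size)
    print(f"model {model_id}: about {size / 2**30:.2f} GiB")
```

Comparing models 127 to 129 this way shows that, with the vocabulary fixed, the estimate grows linearly with the vector size.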

Corpora

  1. English Wikipedia dump of February 2017, lemmatized and PoS-tagged. More info can be found in the NLPL Repository metadata file (corpus id 2).


Version 1.1

This page accompanies the following paper:

Fares, Murhaf; Kutuzov, Andrei; Oepen, Stephan & Velldal, Erik (2017). Word vectors, reuse, and replicability: Towards a community repository of large-text resources, In Jörg Tiedemann (ed.), Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017. Linköping University Electronic Press. ISBN 978-91-7685-601-7