User:AKA MBG/English Simple Wikipedia 20080214 freq wordlist

From Wiktionary

About

The 1000 most frequent words in the Simple English Wikipedia (as of 14 Feb 2008) are listed with their frequencies.

The table contains columns:

  • lemma
  • doc_freq - number of articles in Simple English Wikipedia which contain this lemma;
  • corpus_freq - number of occurrences of the lemma in Simple English Wikipedia.

Remark: tokens that appear only because of imperfections in the current version of the wiki-to-text parser (e.g. "tr", "jpg", "ref") are not linked.

Lemmas are sorted by the "corpus_freq" value in descending order.
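The two counts differ only in what they count: corpus_freq adds one for every occurrence of a lemma, while doc_freq adds one per article that contains it. A minimal Python sketch of that bookkeeping follows, assuming the articles have already been lemmatized into lists of strings. The function name `count_frequencies`, the toy input, and the alphabetical tie-break are illustrative assumptions, not part of the original toolchain.

```python
from collections import Counter


def count_frequencies(articles):
    """Return rows (lemma, doc_freq, corpus_freq), sorted by corpus_freq descending.

    `articles` is an iterable of lemma lists, one list per article.
    """
    doc_freq = Counter()
    corpus_freq = Counter()
    for lemmas in articles:
        corpus_freq.update(lemmas)    # every occurrence counts
        doc_freq.update(set(lemmas))  # each article counts at most once per lemma
    rows = [(lemma, doc_freq[lemma], corpus_freq[lemma]) for lemma in corpus_freq]
    # Assumption: ties on corpus_freq are broken alphabetically for stable output.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


# Toy example with three tiny "articles" (hypothetical data)
articles = [
    ["the", "city", "be", "the", "capital"],
    ["the", "river", "be", "long"],
    ["city", "of", "the", "king"],
]
for lemma, df, cf in count_frequencies(articles)[:3]:
    print(lemma, df, cf)
```

On the toy input, "the" comes first with doc_freq 3 and corpus_freq 4, because it occurs twice in the first article but that article is counted only once for doc_freq.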

The following tools are used:

  1. Synarcher + RuPOSTagger (to be released)
  2. OpenOffice Calc formula to lowercase all words recognized by the Lemmatizer dictionary, since those words are converted to upper case by default:
    =IF(AND(LEN(B7)>1;EXACT(MID(B7;2;1);LOWER(MID(B7;2;1))));B7;LOWER(B7))
  3. OpenOffice Calc "Save As" Text CSV (.csv)
  4. Vim, to create internal links ("the",23199,299315 -> "[[the]]",23199,299315) with the substitution: %s/^"\([^,]*\)",/"[[\1]]",/
  5. CSV Converter from OpenOffice Calc to wiki table
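For readers without OpenOffice or Vim at hand, steps 2 and 4 can be sketched in Python. This is only an approximation of the steps as described above. The names `normalize_case`, `add_wiki_link`, and `LINK_PATTERN` are hypothetical, and the sketch assumes each CSV line starts with the quoted lemma, as in the example in step 4.

```python
import re


def normalize_case(word):
    """Mirror the Calc formula from step 2.

    Keep the word unchanged if it has more than one character and its
    second character is already lowercase; otherwise lowercase it.
    """
    if len(word) > 1 and word[1] == word[1].lower():
        return word
    return word.lower()


# Python counterpart of the Vim pattern ^"\([^,]*\)",
LINK_PATTERN = re.compile(r'^"([^,]*)",')


def add_wiki_link(csv_line):
    """Wrap the first quoted field of a CSV line in [[...]], as in step 4."""
    return LINK_PATTERN.sub(r'"[[\1]]",', csv_line, count=1)


print(normalize_case("THE"))                # the
print(normalize_case("American"))           # American
print(add_wiki_link('"the",23199,299315'))  # "[[the]]",23199,299315
```

The second-character test is what lets capitalized proper nouns such as "American" pass through untouched while fully upper-case words are lowered.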

Oddities and facts found in the list

  1. The letter "s" appears twice: in the 22nd and 49th rows (counting the header row).
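Repeated entries like this one can be found mechanically. The sketch below is one possible check, not something the original toolchain is known to include. It assumes the table has been loaded as (lemma, doc_freq, corpus_freq) tuples, and the helper name `find_repeated_lemmas` is hypothetical. Row numbers here are 1-based and do not count the header row.

```python
from collections import defaultdict


def find_repeated_lemmas(rows):
    """Map each lemma that occurs more than once to its 1-based row numbers."""
    positions = defaultdict(list)
    for row_number, (lemma, _doc_freq, _corpus_freq) in enumerate(rows, start=1):
        positions[lemma].append(row_number)
    return {lemma: where for lemma, where in positions.items() if len(where) > 1}


# A four-row excerpt using values from the wordlist
rows = [
    ("the", 23199, 299315),
    ("s", 7343, 19099),
    ("an", 9603, 18274),
    ("s", 3794, 8859),
]
print(find_repeated_lemmas(rows))  # {'s': [2, 4]}
```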

Wordlist

lemma doc_freq corpus_freq
the 23199 299315
be 23953 171503
of 21734 165170
in 20954 113856
and 19032 112472
a 21709 112185
to 15412 78406
it 13584 41219
for 11432 32047
that 10205 31088
by 11087 29043
he 5032 28203
as 9992 27010
on 10338 26661
have 9019 24640
they 7136 23322
with 9097 22951
or 8747 22907
from 9740 22747
this 7945 21214
s 7343 19099
an 9603 18274
other 10413 17800
also 8971 16371
at 7032 15757
are 6973 15538
can 6167 13924
not 5884 13859
many 6534 13554
one 6591 12106
which 6196 11925
people 5204 11727
name 4786 11594
call 5766 11271
use 5279 11113
but 5469 10506
there 5309 10178
first 4999 9808
make 5064 9675
state 4206 9599
who 4756 9321
some 4754 9279
do 4331 9121
about 5045 9085
when 4817 8904
their 4230 8889
city 3665 8875
s 3794 8859
much 5039 8696
person 4398 8692
time 4370 8503
she 1868 8225
website 6187 8201
new 3876 8195
hi 3150 8006
ha 4896 7924
his 3110 7610
American 3121 7556
become 3883 7550
after 4457 7539
year 3915 7516
very 3782 7066
all 4037 6961
part 3907 6787
world 3735 6709
see 4483 6587
music 1751 6552
two 3898 6521
may 3719 6504
like 3608 6371
because 3695 6200
ref 733 6143
war 2195 6133
country 2712 6031
I 2487 5684
into 3278 5612
were 2889 5532
d 1654 5218
only 3305 5119
b 1324 5101
English 2495 4763
play 1975 4750
th 2493 4749
small 2680 4635
live 2835 4616
take 2873 4596
will 2567 4577
die 2128 4571
than 2972 4570
different 2789 4532
large 2815 4502
bgcolor 19 4495
used 2998 4481
up 2910 4460
write 1888 4451
work 2355 4423
so 2708 4402
often 2767 4401
such 2697 4317
show 2386 4238
king 1921 4219
if 2147 4214
get 2401 4179
where 2820 4175
go 2320 4102
right 1550 4059
word 2143 4046
mean 2762 4033
between 2806 4028
language 1491 3990
know 2838 3966
usually 2621 3875
reference 2809 3860
over 2569 3853
place 2474 3822
area 2152 3780
out 2391 3768
book 1631 3751
group 2261 3745
during 2468 3735
German 1792 3710
thing 2213 3710
game 1178 3689
its 2635 3641
made 2575 3633
start 2393 3626
history 2440 3611
well 2561 3587
no 2209 3584
record 1129 3573
m 1096 3570
de 1598 3555
include 2571 3550
John 1665 3521
century 1818 3514
found 2661 3511
her 1240 3507
come 2417 3507
unite 2204 3487
island 1167 3469
north 2060 3464
example 2242 3433
then 2225 3431
td 36 3425
south 1909 3424
United 2115 3379
long 2322 3364
old 2201 3350
say 1852 3348
system 1560 3331
famous 2265 3325
center 1104 3310
day 1823 3293
life 1971 3288
way 2094 3242
church 935 3241
c 1486 3238
late 2209 3218
death 2344 3210
big 2238 3202
change 1916 3193
align 68 3190
form 2115 3169
great 1903 3153
man 1708 3134
Germany 1531 3127
three 2155 3108
government 1481 3099
England 1699 3092
French 1574 3079
town 1608 3073
official 2296 3066
what 1940 3060
u 1596 3052
president 1164 3032
ii 1684 3026
same 2120 3017
film 1223 3011
good 1967 3007
song 1162 2989
bear 1985 2977
family 1661 2961
number 1488 2959
water 1356 2949
type 1984 2941
sometimes 2092 2934
important 1958 2864
now 2025 2844
France 1461 2833
before 2056 2832
second 1905 2813
give 1896 2812
university 1439 2803
British 1451 2780
high 1767 2739
river 1288 2737
we 1295 2733
national 1522 2733
each 1750 2728
end 1854 2721
band 1097 2719
more 2042 2718
movie 1175 2716
around 1962 2698
early 1840 2695
main 2086 2691
march 1450 2690
member 1590 2662
player 930 2644
km 1067 2605
January 1388 2590
another 1912 2587
image 901 2559
how 1600 2537
begin 1639 2511
known 2086 2502
child 1410 2501
event 2117 2483
school 1203 2471
born 1742 2466
since 1750 2457
god 850 2451
district 1412 2450
house 1194 2441
power 1188 2435
lot 1523 2430
help 1654 2421
capital 1644 2411
create 1737 2409
story 1124 2393
album 820 2392
kill 1221 2375
through 1677 2364
roman 1221 2362
September 1351 2360
east 1371 2358
list 1443 2339
west 1340 2324
actor 992 2302
move 1387 2293
any 1685 2288
popular 1592 2285
law 1057 2275
build 1312 2255
until 1682 2255
most 1825 2242
both 1684 2235
even 1589 2218
August 1346 2212
several 1692 2204
own 1546 2174
you 1162 2173
body 1192 2170
composer 817 2162
while 1532 2160
county 759 2136
base 1488 2134
party 689 2133
human 1202 2132
still 1611 2125
woman 1061 2124
math 228 2115
under 1487 2107
April 1291 2087
force 1074 2085
think 1345 2073
series 1139 2070
October 1267 2065
e 1310 2060
release 1104 2054
kingdom 1296 2049
being 1568 2040
line 912 2025
air 870 2022
July 1247 2021
team 805 2014
London 1030 2009
singer 1083 2008
America 1261 1990
against 1291 1987
however 1429 1985
December 1244 1976
land 1172 1973
company 1118 1972
rock 1049 1942
cause 1171 1936
style 915 1936
common 1452 1935
June 1243 1927
four 1384 1923
animal 1027 1920
region 1093 1916
republic 940 1911
rule 1093 1909
computer 911 1904
back 1284 1896
near 1414 1893
William 1107 1891
today 1420 1882
international 1055 1881
February 1134 1879
last 1402 1878
order 1131 1862
November 1238 1861
sea 1021 1857
York 1038 1856
home 1256 1854
lead 1339 1852
television 1189 1851
birth 1638 1848
character 923 1843
Europe 1160 1841
just 1378 1835
general 1176 1832
tell 1043 1826
little 1369 1823
population 1247 1822
find 1350 1820
kind 1228 1813
study 1080 1812
empire 907 1792
writer 989 1789
together 1301 1780
term 1237 1775
love 824 1771
battle 877 1764
site 1262 1760
Greek 1078 1756
star 944 1749
Japan 916 1748
want 1097 1747
million 1067 1706
look 1166 1695
emperor 802 1690
Japanese 852 1674
hold 1190 1672
plant 738 1652
f 689 1649
age 1034 1638
award 643 1632
son 1028 1629
union 832 1628
something 1126 1624
BC 468 1624
white 958 1617
open 1019 1616
down 1159 1616
money 879 1609
light 834 1608
refer 1289 1594
efcfff 9 1590
j 831 1588
point 983 1587
red 917 1579
believe 918 1576
note 806 1575
father 976 1575
china 777 1565
sound 795 1561
black 922 1557
leader 964 1554
art 862 1552
province 705 1550
food 853 1547
div 598 1542
ret 11 1539
storm 301 1538
tropical 290 1531
army 812 1522
musician 801 1517
set 1027 1516
would 1047 1516
young 1049 1514
asteroid 679 1513
George 894 1511
picture 831 1504
St 746 1502
period 851 1498
fight 883 1498
European 888 1495
minister 748 1493
side 900 1487
left 1022 1481
p 761 1480
India 669 1477
idea 883 1474
special 1044 1474
western 1008 1473
win 905 1472
act 955 1472
Christian 678 1470
modern 1042 1466
science 866 1457
James 841 1454
major 1032 1452
put 1052 1443
earth 812 1436
shall 898 1429
short 1093 1428
n't 858 1427
municipality 1115 1424
energy 449 1421
piece 785 1421
need 1021 1419
service 743 1419
Switzerland 1158 1419
color 659 1419
actress 676 1414
every 1070 1412
hurricane 229 1411
again 1007 1406
lake 531 1403
Henry 745 1402
version 770 1386
allow 997 1383
grow 941 1380
information 979 1375
ISBN 470 1375
few 1102 1373
canton 949 1372
central 913 1371
season 612 1369
jpg 457 1369
Australia 808 1362
cell 347 1356
northern 910 1353
g 902 1351
park 548 1347
ft 124 1345
control 886 1344
head 898 1344
Canada 784 1333
mountain 675 1328
without 1060 1328
Italian 820 1327
named 1118 1316
happen 939 1314
free 880 1309
queen 749 1307
locate 1066 1300
Russian 667 1298
run 906 1295
station 503 1294
football 648 1294
prime 649 1293
pope 509 1292
x 574 1290
h 772 1287
ancient 813 1286
Spanish 719 1278
article 869 1278
video 783 1274
follow 999 1274
problem 776 1266
attack 713 1265
middle 871 1255
leave 926 1251
try 914 1250
Charles 781 1246
third 935 1242
single 762 1238
metal 595 1228
instrument 468 1227
must 765 1220
moon 481 1217
given 998 1214
eat 631 1213
Robert 751 1212
although 973 1212
similar 1004 1204
court 591 1203
council 485 1197
someone 811 1189
join 847 1188
blue 630 1188
strong 794 1187
planet 575 1183
bridge 356 1181
Paul 726 1177
letter 564 1176
close 915 1176
off 861 1175
page 819 1174
hard 869 1174
space 640 1174
Spain 691 1172
la 612 1172
nation 710 1172
stop 797 1171
saint 585 1169
artist 734 1163
top 788 1161
v 677 1160
religion 568 1158
too 900 1156
tree 560 1155
Italy 726 1153
next 862 1151
fire 654 1150
soviet 463 1150
class 696 1148
political 754 1147
means 972 1146
car 581 1144
mother 765 1143
keep 860 1142
r 680 1140
appear 830 1139
military 716 1139
away 888 1132
brother 745 1131
club 547 1130
iii 733 1129
sign 775 1128
public 762 1128
publish 761 1125
gallery 525 1123
California 685 1119
numb 765 1119
eastern 728 1119
friend 721 1114
five 872 1104
program 608 1104
object 605 1102
USA 531 1100
author 639 1100
describe 845 1093
field 636 1091
speak 719 1090
later 902 1086
southern 770 1085
low 672 1085
sell 724 1084
Chinese 531 1080
culture 707 1080
Africa 665 1075
wind 377 1075
building 685 1068
said 731 1067
Russia 603 1067
o 628 1064
society 627 1063
royal 622 1062
league 505 1062
though 861 1059
always 819 1057
movement 580 1052
job 619 1048
send 705 1046
t 625 1044
wife 746 1042
cfcfff 9 1040
result 759 1038
lord 496 1037
catholic 513 1033
source 662 1033
n 516 1028
dynasty 387 1026
sup 230 1025
chemical 576 1023
blood 394 1022
hit 635 1020
never 780 1019
current 728 1018
press 537 1016
local 656 1014
theory 461 1011
case 702 1011
Britain 583 1007
doe 824 1005
hand 625 1005
along 821 1004
ship 506 1001
sub 146 998
ocean 623 994
title 659 992
material 649 992
including 832 988
Ireland 485 987
poet 579 986
map 682 986
date 634 984
sun 628 982
Richard 645 977
produce 695 974
prize 390 970
perform 588 970
David 664 969
sing 475 967
written 723 963
contain 766 958
far 782 957
student 501 957
boy 520 954
almost 831 954
office 627 953
support 689 953
opera 338 953
bring 752 952
element 518 952
meaning 806 951
best 742 950
real 666 948
disease 395 947
my 499 943
night 643 942
green 594 941
radio 506 940
territory 467 939
instead 786 939
L 601 936
Latin 647 934
w 546 932
peter 623 931
Paris 556 931
coast 601 930
bird 370 929
fall 634 926
inside 679 925
Armenian 274 918
once 774 917
symbol 531 917
tr 31 917
former 724 909
san 448 908
girl 513 904
unit 416 904
level 549 903
female 548 901
civil 559 901
especially 782 899
mostly 766 898
feature 631 897
heart 470 896
learn 580 895
role 614 893
add 616 891
original 701 887
talk 677 887
Indian 513 885
view 634 884
holy 470 883
race 406 883
square 456 882
Berlin 318 882
ring 281 882
musical 557 878
prince 545 877
Collège 537 876
certain 699 875
present 653 875
director 568 874
border 585 874
village 553 874
social 517 874
arm 521 873
size 592 872
cannot 617 868
less 702 866
model 499 864
Thomas 591 860
train 450 859
museum 504 858
turn 653 857
bad 603 854
Poland 446 854
daughter 639 851
uk 546 850
community 545 850
duke 456 845
either 751 843
Asia 581 842
male 440 840
politician 522 839
text 432 835
fast 599 835
federal 337 835
reach 611 834
foot 498 834
reason 665 833
return 601 832
lose 638 831
within 661 831
travel 594 830
wave 312 827
meet 630 825
ask 566 822
Michael 551 821
serve 536 819
Scotland 464 817
amount 630 816
simple 617 815
Atlantic 385 814
centre 498 810
natural 568 808
orchestra 265 804
piano 273 803
guitar 323 801
string 215 801
drug 318 800
hall 439 796
living 653 796
process 529 795
parliament 386 793
destroy 590 793
exist 643 791
network 453 788
road 475 787
scientist 529 787
tour 323 786
street 472 786
marry 598 784
Louis 500 781
TV 496 781
soldier 486 780
true 508 780
surface 492 778
Mary 421 778
stone 463 777
project 502 775
belt 665 773
month 596 773
read 528 773
election 358 772
dead 481 770
our 420 769
outside 615 768
among 624 767
remain 618 765
report 442 763
break 626 763
thought 609 763
wall 393 763
cover 615 761
religious 530 761
here 592 761
organization 521 760
himself 555 759
Edward 496 759
Rome 385 758
machine 423 757
full 599 755
six 608 753
above 577 753
influence 548 753
cold 495 751
trade 447 751
mark 531 749
Australian 444 749
following 663 746
issue 438 744
shape 533 743
won 516 742
able 607 741
discover 573 737
k 449 737
association 448 735
mythology 519 735
cross 444 734
championship 302 732
dog 300 732
independent 536 731
minor 501 731
temperature 376 729
gas 335 728
doctor 380 726
Canadian 476 725
seven 564 725
dark 531 725
stay 569 724
Mexico 421 720
summer 509 718
mass 388 718
fish 366 718
online 577 717
ball 267 717
final 467 717
separate 559 716
complete 579 715
possible 593 710
soon 568 708
design 450 707
hot 446 706
division 356 704
damage 399 704
eye 447 704
your 466 703
due 566 702
Washington 438 701
across 541 700
hear 463 700
grand 415 699
defeat 489 696
winter 425 694
elect 433 692
making 611 691
wear 384 691
stage 388 690
continue 543 689
development 446 689
sport 477 688
web 461 686
baby 392 684
scale 302 684
value 417 684
easy 555 683
al 374 681
orbit 367 681
treaty 378 681
lady 375 681
episode 282 679
originally 595 678
Irish 347 677
probably 564 677
peace 462 676
ice 368 676
discovery 471 675
business 448 675
enough 542 674
independence 418 674
plan 443 673
themselves 549 673
oil 329 673
hill 475 672
develop 541 671
dance 358 671
let 511 669
carry 528 669
code 275 668
represent 491 668
pass 510 668
Netherlands 413 667
front 465 667
effect 444 666
Internet 474 665
total 517 665
measure 442 661
Muslim 271 661
product 414 661
wanted 485 658
store 416 658
industry 488 656
replace 555 656
heat 288 655
structure 459 654
dutch 392 654
gold 388 653
research 425 653
concert 323 652
comes 592 652
police 322 650
distance 424 650
feel 433 650
stand 502 650
castle 323 649
paper 393 648
position 449 648
martin 423 648
half 508 647
Jupiter 190 646
von 374 645
production 411 645
standard 387 644
speed 355 644
career 466 642
per 371 639
department 363 639
fruit 335 639
drink 331 637
writing 451 637
bank 426 637
fact 511 637
sister 449 636
solar 215 636
novel 320 634
vote 360 634
method 395 633
opus 248 632
consider 540 631
organ 246 631
days 510 629
parent 452 629
wood 394 628
various 536 628
visit 452 625
weapon 325 625
charge 374 625
Korea 272 624
ground 463 623
airline 106 622
range 472 622
test 319 622
ten 473 621
Egypt 352 618
pop 408 617
bass 305 617
painter 363 617
decide 485 617
electric 310 616
pressure 313 615
cathedral 281 614
polish 339 614
flow 353 614
magazine 393 613
itself 521 613
figure 416 613
founded 495 612
flag 256 612
mainly 544 607
ever 533 606
Adam 285 605
bay 317 605
claim 413 603
harry 215 602
colour 299 601
Joseph 412 601
Sweden 405 599
iv 419 599
voice 342 599
valley 335 598
port 363 597
pacific 326 596
guide 440 592
branch 306 591
tower 288 590
thousand 480 590
fly 361 587
medicine 298 586
belong 510 585
cup 275 585
interest 463 585
past 465 582
orthodox 163 581
literature 383 581
function 355 580
mount 278 580
species 344 579
revolution 382 578
action 426 576
traditional 433 576
Austria 360 576
bill 346 576
genus 258 576
sir 382 574
ad 269 573
golden 345 573
whole 487 573
enter 465 571
sex 279 571
pay 398 570
degree 347 570
working 465 570
heavy 409 570
baseball 291 569
increase 380 568
currently 501 568
week 423 567
length 372 566
van 332 565
African 364 564
Alexander 361 563
market 326 562
satellite 289 559
table 377 559
Nazi 292 558
commonly 484 558
difference 419 557
studio 317 555
why 454 555
philosopher 365 555
specie 321 555
generally 471 552
Jewish 292 551
physical 377 551
Dr 310 549
champion 326 548
airport 264 546
technology 341 546
rest 459 544
nature 405 544
native 410 544
troop 330 543
dfffdf 9 542
understand 403 542
classical 329 542
communist 231 542
hour 364 541
forest 279 541
below 447 540
brown 374 540
activity 399 539
face 400 539
cyclone 135 539
data 305 538
tradition 345 538
democratic 298 537
Florida 294 536
future 412 535
px 276 535
economy 335 534