Wiktionary:Criteria for inclusion

This page is a hard rule on the Simple English Wiktionary. Many people agree with it. They see it as a standard that all users should follow. When changing the page, please check that other people agree with your changes. Use the talk page when you are not sure or when you want to propose a change.
Shortcut:
WT:CFI
This article's English may not be simple
The English used in this project page may not be easy for everybody to understand.

You can help by making this page simpler.

As a simple English dictionary, Simple English Wiktionary will have all words in the English language, and its definitions will be in simple English.

General rule

A term should be here if it is likely that someone will come across it and want to know what it means. More formally, terms that are attested and idiomatic should be here.

"Terms" is a broad word

A term does not need to be a single word. We can also have:

Attestation

"Attested" means confirmed through one of the following:

  1. Clearly widespread (common) use,
  2. Being in a well-known work such as a book,
  3. Being in a peer-reviewed academic journal, or
  4. Being in permanently recorded media such as books or movies, giving meaning, in at least three separate places over at least a year.

Where possible, it is better to cite sources that will probably stay accessible for a long time, so that someone reading Wiktionary later will probably be able to find the original source. Since Wiktionary is an online dictionary, this means that media such as blogs and Usenet groups, which are durably archived by Google, are better. Print media such as books and magazines can also be used, especially if their contents are indexed online. Other recorded media such as audio and video can also be used if their origin can be confirmed and they are durably archived. When quoting a book, give the ISBN.

Giving meaning

This means that some appearances of a term do not count: being in a word list, having its form talked about (such as "The word 'foo' has three letters"), definitions by themselves, and made-up examples of how a word might be used. For example, if a term is in an online dictionary, that is suggestive (helps us think it might be a good term), but it does not show the word used to give meaning. But a sentence like "They raised the jib (a small sail forward of the mainsail) to get the most out of the light wind," in an account of a sailboat race, would be fine. It contains a definition, but the word is also used for its meaning.

Independence

This means that sources that copy each other do not count as separate sources. If Wikipedia has an article on a subject, and that article is mirrored by another site, the uses of a word on the mirror site are not independent of the uses on Wikipedia. It is common to find that material on one site comes from another. The same quote is often exactly the same in different sources. Even if the sources are independent of each other, their uses of the quoted term are not.

If a term is only used by a small group, they do not need to look here to find its meaning.

Over at least a year

This means we should not have words that are used for a little while and then disappear. There is no reason why it should be one year, but it seems to work.
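
As a rough illustration only, the fourth attestation test (uses that give meaning, from at least three independent places, over at least a year) could be checked with a Python sketch like the one below. The `Citation` class, its field names, the `origin` label used to group mirrored or copied text, and the 365-day cutoff are all assumptions made for this example. They are not part of Wiktionary or its software, and real decisions are made by editors, not by code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    source: str          # where the quotation was found (hypothetical label)
    origin: str          # the original text it comes from (hypothetical label)
    when: date           # date of the use
    gives_meaning: bool  # True if the term is used, not just listed or mentioned


def passes_three_uses_test(citations, min_uses=3, min_days=365):
    """Return True if the citations meet the sketch of the fourth test."""
    # Ignore word lists, mentions, bare definitions, and made-up examples.
    usable = [c for c in citations if c.gives_meaning]

    # Mirrors and repeated quotes share an origin, so count each origin once.
    dates_by_origin = {}
    for c in usable:
        dates_by_origin.setdefault(c.origin, []).append(c.when)
    if len(dates_by_origin) < min_uses:
        return False

    # Assumption: use the earliest date for each origin, then measure the
    # time between the first and the last independent use.
    first_dates = [min(dates) for dates in dates_by_origin.values()]
    return (max(first_dates) - min(first_dates)).days >= min_days
```

Grouping by `origin` is one simple way to model independence: a mirror of a Wikipedia article and the article itself would carry the same `origin` and so count as one use.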

Idiomatic

A term is "idiomatic" if its full meaning is not obvious from the meaning of its parts.

For example, this is a door is not idiomatic, but shut up and red herring are.

Compounds are generally idiomatic, even when the meaning can be worked out from the parts. The reason is that the parts often have several possible senses, but the compound often only means some of them.

For example, mega- can mean either a million (or 2^20) of something or simply a very large or important example of something. Similarly star might mean something bright in the sky or a celebrity. But megastar means "a very well-known celebrity", not "a million celebrities" or "a million celestial objects", and only sometimes means "a very large celestial object."

This rule must be used carefully and is subjective to a degree. For example, bank has several senses and parking lot has an idiomatic sense of "large traffic jam". But bank parking lot can't mean "to do financial transactions large traffic jam." With clearly wrong meanings like that removed, the choices are "place to park cars for a kind of business" or "place to park cars by, for, or on a river bank (not, for example, the hill parking lot)." The whole phrase could mean either, depending on context (the first is probably far more common), and so the phrase is not idiomatic.

This rule is sometimes called the fried egg test, as a fried egg generally means an egg (and generally a chicken egg) fried in a particular way. It generally doesn't mean a scrambled egg, even though a scrambled egg is cooked by frying.

Many idioms are clearly idiomatic, for example red herring. We use these tests only in unclear cases.

Misspellings, common misspellings and variant spellings

There is no simple rule, especially in English, for deciding whether a spelling is "correct." A person who says a spelling is right should have sources ready. Published grammars and style guides can be useful there, as can statistics about how common forms are.

Most simple typos are much less common than the most frequent spellings. But some words are often misspelled. For example, occurred is often spelled with only one c or only one r, but only occurred is considered correct. The misspellings may deserve separate entries.

English does not have an academy that decides what spellings are right, and that is why it may have uncertain spellings.

Regional or historical variations are not misspellings. For example, there are well-known differences between British and American spelling. A spelling considered incorrect in one place may not be found at all in another, and may even be accepted in yet another.

Formatting

Once editors decide that a misspelling is important enough to deserve its own page, the formatting should be easy. The usual part of speech headings can be used, followed by this simple entry:

# {{misspelling of|[[...]]}}

Another section that tells why the term is a misspelling is optional.
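
For illustration, the definition line above could be produced by a small Python helper. The function name `misspelling_line` is made up for this example, and occurred is used only as a sample target. The template call itself is the one shown above.

```python
def misspelling_line(correct_spelling: str) -> str:
    """Build the definition line for a misspelling entry.

    The line links to the entry for the correct spelling by using the
    "misspelling of" template.
    """
    return "# {{misspelling of|[[" + correct_spelling + "]]}}"


# Sample use: the line that an entry for a misspelling of "occurred" would hold.
print(misspelling_line("occurred"))
```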

Word forms (Inflections)

If a word is regular, other forms do not need to be here but can be. If they are here, they should not be redirects but should show the form and link to the main form.

Irregular forms such as geese and were should have their own entries, because people who do not know about the irregularity will look for them under the other form. All forms with idiomatic meanings, such as blues or smitten, should have their own entries, with the regular meanings separate from the idiomatic ones.

Idiomatic phrases

Many phrases have several forms. It is not necessary to include all forms. When present, minor forms should redirect to the main entry. For the main entry, use the most generic form, based on the following:

Pronouns

Use the generic personal pronoun, one or one's. This means that feel one's oats is better than feel his oats. Other personal pronouns, especially in the singular, should not be used except where they are necessary to the meaning.

Articles

Do not use an article at the beginning unless it makes a difference in the meaning. For example, use cat's pajamas instead of the cat's pajamas.
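
The pronoun and article rules can be sketched in Python as below. This is only a rough example: the function name `main_entry_form` and the two word sets are assumptions made for illustration. The sketch cannot tell when a pronoun or article is needed for the meaning, so an editor must still check each result.

```python
# Possessive words to replace with the generic "one's".
# "her" is left out on purpose, because it can also be an object pronoun
# (as in "give her the cold shoulder"). "his" is usually possessive in
# phrases, but that is not checked here.
POSSESSIVES = {"my", "your", "his", "our", "their"}
ARTICLES = {"a", "an", "the"}


def main_entry_form(phrase: str) -> str:
    """Suggest the main entry form for an idiomatic phrase."""
    words = phrase.split()
    # Drop an article at the beginning of the phrase.
    if len(words) > 1 and words[0].lower() in ARTICLES:
        words = words[1:]
    # Use the generic personal pronoun in place of a specific one.
    words = ["one's" if w.lower() in POSSESSIVES else w for w in words]
    return " ".join(words)


print(main_entry_form("the cat's pajamas"))
print(main_entry_form("feel his oats"))
```

With the two sample phrases, the sketch gives cat's pajamas and feel one's oats, which match the examples in the rules above.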

Verbs

Use the infinitive form of the verb (but without "to") for the principal verb of a verbal phrase. This means that for It's raining cats and dogs, or It was raining cats and dogs, or I think it's going to rain cats and dogs any minute now, or It's rained cats and dogs for the last week solid the entry should be under rain cats and dogs. The other forms are found by the usual rules of grammar (including the use of it with weather terms and other impersonal verbs).

Proverbs

Proverbs that are whole sentences should begin with a capital letter. For example: You can't judge a book by its cover.

Languages

Only words used in English should be here. This can mean terms from other languages that are commonly used in English. An entry on the name of a language may be here.

Things that should not be here

Vandalism

Sometimes people will add terms or definitions to Simple English Wiktionary which do not follow Wiktionary's purpose or practices. This is vandalism and will be removed when seen. If the vandalism is a change to a page, that change will be undone. If the vandalism is a new entry, that entry will be removed. This is done when administrators decide to and does not need discussion, even if the vandalism is a new entry for a term which would normally be here but is not here yet.

New words

Some terms are added because people hope that they will be used later, but they are not used now. They can be put on the English Wiktionary's list of protologisms, and should not have a separate entry here.

Wikisaurus

If we decide to make a Simple English Wikisaurus like the English Wikisaurus on the English Wiktionary, its criteria for inclusion will be here.

Wiktionary is not an encyclopedia

See also Wiktionary is not an encyclopedia.

Editors should try to make sure that entries do not become like encyclopedia articles; if this happens, that information should be moved to Wikipedia, but the dictionary entry should stay.

Wiktionary entries are about words, not about people or places. Many places, and some people, are known by single word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words. Articles about the places and people belong in Wikipedia. For example: Wiktionary will give the pronunciations, alternative spellings, and eponymous meanings, of the names Darlington, Hastings, David, Houdini, and Britney. But articles on the specific towns (Darlington, Hastings), statue (David), escapologist (Houdini), and pop singer (Britney) are Wikipedia's job.

Etymologies

The editors here have decided that we will not include etymologies of words for now. This may change.

Translations

The editors here have decided that we will not include translations of words into other languages. We are trying to define English words, and other languages only make things more complicated.


Issues to consider

Attested vs. the slippery slope

Sometimes people worry that adding an entry will lead to entries for many similar terms. This is not a problem, because each term is considered on its own, based on how it is used and not on how other terms of similar form are used. Some examples:

  • Any word in any language might be borrowed into English, but only a few are. Having spaghetti does not mean that ricordati is next (we should not have ricordati, because it is not a word used in English).
  • Any word may be put into Pig Latin, but only a few (such as amscray) are in common use.
  • Any word may be put into leet style, but only a few (e.g., pr0n) are in general use.
  • Grammatical affixes like meta- and -ance can be added in many more cases than they are. (Some basic suffixes like plural -s and past tense -ed are used almost anywhere.)
  • It may seem that internet prefixes like e- and i- are used everywhere, but they aren't. If I decide to talk about e-thumb-twiddling but no one else does, then we do not need an entry.

Typographic variants

People do not agree on whether terms that have unusual characters or are unusual in form, such as G-d, pr0n, i18n or veg*n, should be here. Some people think that these terms are not good ones, even though they are used in many places.

Names

There are two kinds of names: individual given names and family names, which are single words, and the names of people, places, and things. Simple English Wiktionary says that they are both proper nouns, but each kind is treated differently.

Given names and family names

Given names (such as David, Roger, and Peter) and family names (such as Baker, Bush, Rice, Smith, and Jones) are words, and the same criteria apply to them as to any other words. Simple English Wiktionary has main articles giving meanings and translations for given names and family names. Wiktionary may eventually give alternate spellings and etymologies, but does not give those now.

For most given names and family names, it is easy to show that the word fits the criteria, because most given names and family names are in common use. But being a name does not mean a word should be here. A new name that is not commonly used is still a new word and should not be here. A name that is only used by a few people in one place should not be here.

Nicknames, diminutives, names from one's father's first name (patronymics) and abbreviations of names (such as Jock, Misha, Kenny, Ken, and Rog) are treated the same way as other names.

Names of actual people, places, and things

A name should be here if it is used attributively (like an adjective), with a commonly understood meaning. For example: New York is included because "New York" is used attributively in phrases like "New York delicatessen", to describe a kind of delicatessen. A person or place name that is not used attributively (and that is not a word that should be here for other reasons) should not be here. Lower Hampton, Empire State Building, and George Walker Bush should not be here. In the same way, even though Jefferson (an attested family name word that Wiktionary can discuss) and Jeffersonian (an adjective) should be here, Thomas Jefferson (which isn't used attributively) should not.

A name should be here if it has become a general term. For example: Remington is used as a synonym for any rifle, and Hoover as a synonym for any vacuum cleaner. (Both are also attested family name words, and should be here because of that also.) Hamburger is used as a generic term for a type of sandwich. One good rule of thumb on whether a name has become a generic word is whether the word can be used without capitalization (as "sandwich" was in the previous sentence).

A trademark or a company name should not always be here. (Some company names are derived from family names, and are included for that reason.) Some words are trademarks and company names, but not all trademarks and company names are words. (Trademark holders often do not want their trademarks to become words. Adobe Systems says there is no word Photoshopped, since Photoshop® is a trademark and not a common verb that can have a past participle; Xerox says there is no word xerox, since Xerox® is a trademark and not a common verb; Sony says there is no word Playstationize, since there is no word Playstation and PlayStation® is a trademark and not a common verb.) Many trademarks and company names are made to be new words (protologisms). To be here, the trademark or company name must be used as a common word, not only as a trademark.

What Wiktionary is not with respect to names

Wiktionary is not a genealogy database. Simple English Wiktionary articles on family names, for example, should not be about the people who share the family name. They are about the name as a word. For example: the entry "Yoder" may tell what the word means, but it should not tell about the ancestries of people who have the family name Yoder.