Language Status

The title for a language page is an anglicized form of the name used to refer to that language in that country. In most cases the name corresponds to the ISO 639-3 reference name associated with the ISO 639-3 code. Where the users of the language have expressed a preference for a different name, Ethnologue generally follows that preference. In other cases, the primary name may be the most well-known English (or anglicized) name associated with the language. Names are generally recorded using English spellings, though diacritical marks may be included. For some language names in southern Africa special symbols are used to represent the click sounds produced with ingressive mouth air.

The subtitle names the primary country for the language. When a language is spoken in more than one country, Ethnologue designates one of the countries as primary, usually the country of origin. In cases where the language is indigenous in multiple countries, the country having the most users is designated as primary.

A complete language description contains the following elements. Follow the link for a full description of the element. Each language description includes only the elements for which information is known.

Language identification gives the code assigned to the language by the ISO 639-3 standard, plus a list of alternate or other names that have been used to refer to the language.
User population gives the number of people in the country who use this language, plus the total number of users worldwide if it is used in multiple countries. These user populations are broken down into first and second language users where the data are available. Also included may be monolingual population, ethnic population, and other comments about population.
Location describes where the language users are located within the country.
Language status gives the EGIDS level for the language in the country and describes the level of official recognition, if any. If the language is associated with an officially recognized nationality or ethnic group, that association is reported here.
Classification provides the language classification.
Dialects lists the names that have been used to refer to varieties of the language, as well as giving information about dialect relations in terms of intelligibility and lexical similarity with other varieties if available. Includes macrolanguage membership if applicable.
Typology provides typological information, including brief descriptions of basic word order, significant phonological, morphological, and syntactic features, and other matters of interest to linguists.
Language use gives information about domains of use, age of speakers, other comments on the viability of the language and patterns of use, the use of other languages by this language community, and the use by others of this language as a second language.
Language development gives information about literacy rates, use in education, language documentation and development products, revitalization efforts, and language development agencies.
Language resources gives a link to the page from the Open Language Archives Community (OLAC) catalog that lists resources in and about the language.
Writing gives information about writing systems and scripts used for the language.
Other comments gives information identifying non-indigenous languages and all additional information about the language or ethnic group, including primary religious affiliations.

If the language has significant use in other countries, subentries for these countries are listed at the bottom of the page. Information like classification and typology which is the same in every country is not repeated in these subentries.

Language identification

The entry begins with the international three-letter ISO 639-3 code that is used to identify the language uniquely, plus a list of other names that have been used to identify it.

ISO 639-3 code. The code assigned to the language by the ISO 639-3 standard (ISO 2007) is given in lower-case letters within square brackets. When a given language is spoken in multiple countries, all of the entries for that language use the same three-letter code. The code distinguishes the language from other languages with the same or similar names and identifies those cases in which the name differs across country borders. These codes ensure that each language is counted only once in world or area statistics.
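As a minimal sketch of how the bracketed code supports counting each language only once, the following Python example extracts codes from entry headings and counts distinct languages. The heading strings and the function names are hypothetical illustrations, and the pattern assumes the code always appears as exactly three lower-case ASCII letters inside square brackets.

```python
import re

# Assumption: a code is exactly three lower-case ASCII letters in square
# brackets, e.g. "[spa]". The heading layout used below is hypothetical.
ISO_639_3_PATTERN = re.compile(r"\[([a-z]{3})\]")


def extract_iso_codes(text):
    """Return every bracketed three-letter code found in the text, in order."""
    return ISO_639_3_PATTERN.findall(text)


def count_distinct_languages(entries):
    """Count each language once, even if it has entries in several countries."""
    codes = set()
    for entry in entries:
        codes.update(extract_iso_codes(entry))
    return len(codes)


# Hypothetical entry headings: the same language appears under two countries.
entries = [
    "Spanish [spa], Spain",
    "Spanish [spa], Mexico",
    "English [eng], United Kingdom",
]
distinct = count_distinct_languages(entries)
```

Because the two Spanish entries share one code, they contribute a single language to the count.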

Alternate names. Many languages are known by or have been referenced in the literature by more than one name. Alternate names come from many diverse sources: speakers may have more than one name for their language, or neighboring groups may use different names. Other names may have been assigned by outsiders and used in ethnographic or linguistic publications before the name used by the speakers themselves was known. Another source of alternate names is variant spellings of what is essentially the same name. In many cases, spellings used in languages of wider communication or in regional languages are also included in the list. Some names may identify the ethnic group or place names that have been used in the literature as names for the language.

Some names, used in the past or in use by others, are pejorative and offensive to the speakers of the language. Those are identified, wherever they are listed, by enclosing the name in double quotation marks and appending the label pej. (pejorative) to the name. We include these names as a means of helping users find languages they may have only heard of or seen referred to by such names. By so doing, Ethnologue in no way implies any endorsement of the pejorative names.

Autonym. This is the “self name”, that is, the name of the language in the language itself. Furthermore, the form given is a standard spelling within the writing system of the language, which means that this field is never reported for an unwritten language. When the script is non-Roman or contains unusual characters, a romanization of the name is given in parentheses.

User population

Population data have been provided from many different sources over a number of years. This diversity among sources and dates frequently causes the totals of the populations for all of the languages in any given country to differ markedly from the total current census population of the country.

We do not extrapolate population estimates to bring them up-to-date, since populations of language communities do not necessarily increase or decrease at the same rate within a country and since some initial estimates themselves turn out to have been incorrect to start with. However, some population data submitted to the Ethnologue may be the result of extrapolation.

It is often difficult to get an accurate estimate of the number of speakers of a language. All figures are only estimates; this is true even for census figures. Some sources do not include all dialects in their figures or may count as a single language two languages identified separately in the ISO 639-3 inventory. Some sources count members of ethnic groups, who, in some cases, may not be speakers of the language. Some sources do not make clear whether they refer to the total number of speakers in all countries, or only to those in one of the countries. We attempt to distinguish first-language (L1) users from second-language (L2) users. In a case where the source combines these into a single number, we identify the population as “all users”, rather than “L1 users” or “L2 users”.

Country user population. This field begins with a number; it is the number of all known users in the country. It is suffixed with “all users” if it is known to combine L1 and L2 users and further information about the breakdown follows if it is known. If the initial number is suffixed with “L2 users”, then the only known user population is for L2 users. If the initial number has neither of these phrases suffixed, then it is an estimate of L1 users and there are no known L2 users.
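The suffix rules above can be sketched as a small Python function. This is an illustration only: the exact field layout (a number with optional thousands separators, followed directly by the suffix) and the wording of the returned descriptions are assumptions, not the format of the Ethnologue database.

```python
import re

# Assumption: the field starts with a number (commas allowed as thousands
# separators), optionally followed by "all users" or "L2 users".
POPULATION_PATTERN = re.compile(r"\s*(\d[\d,]*)\s*(all users|L2 users)?")


def interpret_population(field):
    """Interpret the leading number of a country user population field."""
    match = POPULATION_PATTERN.match(field)
    if match is None:
        raise ValueError("field does not begin with a number: %r" % field)
    count = int(match.group(1).replace(",", ""))
    suffix = match.group(2)
    if suffix == "all users":
        kind = "L1 and L2 users combined"
    elif suffix == "L2 users":
        kind = "L2 users only"
    else:
        # No suffix: an estimate of L1 users, with no known L2 users.
        kind = "L1 users, no known L2 users"
    return {"count": count, "kind": kind}
```

A field with no leading number, such as “No known speakers”, is deliberately rejected here rather than guessed at.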

Languages that are no longer in use, but still have ethnic group members who identify with the language, are listed as having “No known speakers” in place of a population figure. Languages that have neither societal use nor remaining ethnic group members are described as “Extinct”. Languages which have no L1 speakers but which are used for specific purposes by a community are identified as “Second Language Only”.

Dates and sources for population data are given where available using the conventions described in How references are cited. Where the word “census” appears as the source, it is generally the national census of the country and is not included in the list of references cited. In some cases the source is a government agency (but not the official census) or another organization. Only when the citation has the form “Author Year” will the source appear in the list of references cited; see How references are cited.

Population stability comment. For some languages, we are able to indicate whether the L1 speaker population is increasing or decreasing. This information also contributes to an overall evaluation of ethnolinguistic vitality. There may be a few cases where the actual speaker population count is not known or is unreported, but the stability and general trend of the population is evident and has been commented on.

Population remarks. Additional information concerning populations may include population breakdowns (by dialect, gender, ethnic groups, or specific villages or communities), the population of the deaf community (in the case of a sign language entry), or other comments on demographics. In the case of an extinct or dormant language, an estimate of when the last speakers died is given when available.

Monolingual population. Where the data are available, the number of those who are monolingual is reported. In some cases it is reported as a percentage of the L1 speaker population. Where it is known that there are no monolingual users of the language, that fact is reported. This information, along with the total speaker population, is an indicator of the vitality of the language.

Ethnic population. Where it is known, the population of those who identify themselves as part of the ethnic group, whether or not they speak the language, is given. A language with no first-language speakers will be reported as extinct when the ethnic population figure is zero, absent, or unknown. When the reported L1 speaker population is zero but there is an ethnic population figure, the language will be reported as having “No known speakers”.
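The rules for languages without first-language speakers can be summarized in a short Python sketch. The function name and parameters are hypothetical, and the order in which the checks are applied (the “Second Language Only” case is tested first) is an assumption made for illustration; the text does not state a precedence.

```python
def population_label(l1_speakers, ethnic_population=None, used_as_l2_by_community=False):
    """Return the label shown in place of a population figure, or None.

    ethnic_population is None when the figure is absent or unknown.
    """
    if l1_speakers > 0:
        # An ordinary population figure is reported instead of a label.
        return None
    if used_as_l2_by_community:
        # No L1 speakers, but a community uses the language for specific purposes.
        return "Second Language Only"
    if ethnic_population:
        # A known, non-zero ethnic population still identifies with the language.
        return "No known speakers"
    # Ethnic population is zero, absent, or unknown.
    return "Extinct"
```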

Location

A description of the locations where the language is spoken is included in each entry where a specific area can be defined. Those languages that are used everywhere in a country or specified region are reported as “Widespread”. These languages may not appear on the country maps. Languages that are widely dispersed in specific locations, or which are used by nomadic groups, are identified as “Scattered”. Generally, where locations are known, they are listed in descending order from the largest geopolitical subdivision to the smallest. Each major administrative subdivision is followed by a colon and then a comma-separated list of subordinate locations. The list of locations may not be exhaustive, and locations other than the first-order subdivisions may not be ranked accurately in the list.
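As an illustration of the colon and comma convention, the following Python sketch splits one location segment into its major subdivision and subordinate locations. The place names are invented, and the sketch handles a single subdivision only, since the text does not say how several subdivisions are separated from one another.

```python
def parse_location_segment(segment):
    """Split 'Subdivision: place, place' into (subdivision, [places])."""
    subdivision, _, rest = segment.partition(":")
    places = [place.strip() for place in rest.split(",") if place.strip()]
    return subdivision.strip(), places


# Hypothetical example values, not taken from any real entry.
subdivision, places = parse_location_segment(
    "Northern Province: River district, Hill district"
)
```

A value with no colon, such as “Widespread”, comes back as the subdivision with an empty list of places.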

Language status

This part of the entry reports on the vitality status of the language in the country, describes its official function in the country, and supplies additional background information for a language of wider communication (LWC).

EGIDS estimate. The vitality status of the language in the country is summarized by estimating its level on the Expanded Graded Intergenerational Disruption Scale (EGIDS); see the complete section on Language Status for a listing of the levels. In cases where the rest of the language entry is sparse in terms of reporting facts about the situation of the language, this estimate can be taken to be the best guess of contributors familiar with the region.

An EGIDS estimate is provided only for languages that are judged to be “established” within the country. This includes all languages that are indigenous to the country, plus any languages originating from elsewhere that have become rooted in that country. We judge a non-indigenous language to have become established in a country when it meets the following two characteristics. First, it is being acquired by the next generation. This can take place by various means—in the home, through mandatory schooling, or in the work place. Second, its use is a norm (whether as L1 or L2) within a language community or a community of practice. The community of practice may include students who learn it as an L2 in a widespread, mandatory educational system.

From the point of view of sustaining language use, the single most significant break in the EGIDS scale is the divide between 6a and 6b. For languages that are 6a and higher, it is the norm that the language is being learned by all the children within its user community. But at level 6b and below, this is no longer the norm and intergenerational transmission is being disrupted. The determination about that is based on information that has been reported about whether or not all children use the language. For cases in which no such information has been reported, we use an asterisk as a modifier on the EGIDS estimate to indicate that it represents our editorial best guess. Thus 5* or 6a* indicates a language that we think is most likely to be in vigorous use by all, while 6b* indicates a language that we believe is most likely to be losing speakers. These judgments have been made by comparing the population of the language in question to the populations of all the other languages in the same country or region for which there is explicit data about whether the language is vigorous or is beginning to shift.
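The asterisk convention and the 6a/6b divide can be sketched in Python as follows. The ordered list of level labels comes from the EGIDS scale itself rather than from this section, and reading “6a and higher” as levels 0 through 6a is an interpretation; the function names are hypothetical.

```python
# EGIDS level labels, ordered from strongest (0) to weakest (10).
EGIDS_LEVELS = ["0", "1", "2", "3", "4", "5", "6a", "6b", "7", "8a", "8b", "9", "10"]


def parse_egids(estimate):
    """Split an estimate such as '6b*' into (level, is_editorial_guess)."""
    text = estimate.strip()
    is_editorial_guess = text.endswith("*")
    level = text.rstrip("*")
    if level not in EGIDS_LEVELS:
        raise ValueError("unknown EGIDS level: %r" % estimate)
    return level, is_editorial_guess


def transmission_is_norm(level):
    """True for levels 0 through 6a, where all children learn the language."""
    return EGIDS_LEVELS.index(level) <= EGIDS_LEVELS.index("6a")
```

With these definitions, an estimate of 6a* parses to level 6a flagged as an editorial best guess, and it falls on the side of the divide where transmission to children is the norm.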

Special cases. There are two cases in which the status field is reported differently as either unestablished or unattested. Due to the increasing influence of human migration on the world language situation, Ethnologue now includes full entries for immigrant languages that have a significant presence in a host country. Languages identified as Unestablished are those that have not yet become rooted in the host country and thus do not share the characteristics described above of being transmitted to the next generation within the country as a norm for a language community or a community of practice. These include the first languages of refugees, newly arrived immigrants, temporary foreign workers, or immigrants who are so scattered as not to form significant speech communities within the host country. These may also include languages learned as an L2 by a significant number of people in the country through elective classes in education.

In a few cases, there is real doubt as to whether the language actually exists. Although an ISO 639-3 code has been assigned, data on the existence of the language are not convincing. In such cases, we do not report an EGIDS level but identify the language status as Unattested. A full entry is published in order to document what the ISO 639-3 code is meant to signify, but the language is not counted in the statistics as a living language. Languages identified as unattested are submitted to the ISO 639-3 review process and removed from future editions if they become deprecated by the ISO 639-3 standard.

Function in country. If the language has been officially recognized in legislation or serves official functions at the national or provincial levels, there is an additional note naming the nature of the recognition and function. If the recognition is statutory, the statute is identified. If the recognition is regional, the region where the status is assigned is identified. The categories for recognition and function are described in the section on Official Recognition.

LWC information. If the reported language status is EGIDS 3 (Wider Communication) and the data are available, further information about the history or the nature of the use of this language as an LWC by L1 speakers of other languages is described.

Classification

This part of the entry names the linguistic affiliation of the language.

All languages are slowly changing, and linguistically related varieties may be diverging or merging. Most languages are related to other languages—to some more closely and to others more distantly. Linguists have used terms such as phylum, stock, family, branch, group, language, and dialect to refer to these relationships in increasing order of linguistic similarity much like a family tree.

Linguistic classification. The classification information for each language follows the general order from largest grouping to smallest. More inclusive group names are given first, followed by the names for less inclusive subgroups, separated by commas.

Language classification information comes from a variety of sources. The Ethnologue attempts to report the generally accepted consensus of scholars working in the language family based on published works and scholarly review. The sources on which the classifications are based are not overtly cited in the language entry but may be included in the list of general references listed at the country level.

A listing of the highest-level language families (including the number of languages, average populations, and countries where spoken) is given in the Statistical Summaries. The family trees may be browsed by going to Browse by Language Family.

Dialects

This part of the entry gives information about the names of dialects of the language. It may also describe the relationships among dialects or to other languages in terms of dialect intelligibility and lexical similarity. It also includes macrolanguage membership if applicable.

Dialect names. Speech varieties which are functionally intelligible to each other’s speakers because of linguistic similarity are generally considered dialects of the same language and the names of all such dialects are listed under that language. In addition, alternate names for individual dialects are listed in parentheses following the primary name for the dialect. When one of these names is known to be offensive to its speakers, it is placed in double quotes (and tagged as pejorative with the abbreviation “pej.” as is also done for alternate language names).

The listing of dialect names is not the result of rigorous dialectological investigations. As with the alternate names, the list of dialect names includes all names reported to us which may, at one time or another, have been used in reference to some variety of a language. Some of these names are village or regional names and may not actually represent significant linguistic variants. In a few cases, the ISO 639-3 standard has assigned individual language identification codes to varieties which we, on the advice of our contributors and consultants, have included in our list of dialects. In those very few cases, we depart from the ISO 639-3 standard and do not list these varieties separately as individual languages.

Intelligibility and dialect relations. A measure of inherent intelligibility with other varieties is given by percent. Values of less than 85% are likely to signal difficulty in comprehension of the indicated language.

The ability of the users of one variety to understand another variety, based only on the similarity of those two varieties, is called inherent intelligibility. Intelligibility may not be reciprocal or mutual; the wording of the intelligibility description may therefore indicate the direction of the intelligibility (e.g., 85% intelligibility of another variety, or 85% intelligibility by speakers of another variety). If the direction of intelligibility is not indicated (e.g., 85% intelligibility with another variety) or is identified as being mutual, it should be understood as being reciprocal, with speakers of each of the varieties mentioned understanding each other equally well.

The ability of speakers to understand another variety because of previous exposure to it or learning is called acquired intelligibility and may be commented on in some language entries.
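A minimal Python sketch of how inherent intelligibility figures might be recorded is given below. It shows the 85% difficulty threshold and the difference between directed and mutual figures. The data layout (a mapping keyed by listener and variety heard) and the variety names "A" and "B" are assumptions for illustration.

```python
# Threshold from the text: values below 85% are likely to signal
# difficulty in comprehension.
DIFFICULTY_THRESHOLD = 85.0


def likely_difficult(percent):
    """True when an inherent intelligibility figure is below the threshold."""
    return percent < DIFFICULTY_THRESHOLD


def expand_mutual(variety_a, variety_b, percent):
    """Record an undirected ("with") or mutual figure in both directions.

    Keys are (listener, variety heard); this layout is an assumption.
    """
    return {
        (variety_a, variety_b): percent,
        (variety_b, variety_a): percent,
    }


# Directed figures may differ: here speakers of hypothetical variety "A"
# understand "B" better than speakers of "B" understand "A".
directed = {("A", "B"): 90.0, ("B", "A"): 70.0}
```

Only the second directed figure falls below the threshold, so comprehension difficulty would be expected in one direction but not the other.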

Lexical similarity. The percentage of lexical similarity between two linguistic varieties is determined by comparing a set of standardized wordlists and counting those forms that show similarity in both form and meaning. Percentages higher than 85% usually indicate a speech variant that is likely a dialect of the language with which it is being compared. Unlike intelligibility, lexical similarity is bidirectional or reciprocal.
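The percentage calculation can be sketched in Python. Deciding whether two forms are similar in both form and meaning is a linguistic judgment that this sketch does not attempt. It assumes each compared wordlist item has already been judged and is passed in as a boolean; the function names are hypothetical.

```python
def lexical_similarity(judgments):
    """Percentage of compared wordlist items judged similar in form and meaning."""
    if not judgments:
        raise ValueError("no wordlist items were compared")
    similar = sum(1 for judged_similar in judgments if judged_similar)
    return 100.0 * similar / len(judgments)


def likely_dialect(percent):
    """True when the percentage is higher than 85, the guideline in the text."""
    return percent > 85.0


# Hypothetical comparison: 9 of 10 items were judged similar.
percent = lexical_similarity([True] * 9 + [False])
```

Because the calculation counts matched pairs, swapping the two wordlists gives the same result, which is why the measure is reciprocal.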

Macrolanguage membership. If an individual language is a member of a macrolanguage (see Macrolanguages in “The problem of language identification”), that fact is reported here. The listing gives the name of the macrolanguage of which the individual language is a member, the name of the primary country under which its entry is found (if different from the current country), and the ISO code for the macrolanguage. By looking up that entry, it is possible to find a list of all the members of the macrolanguage.

Typology

A list of linguistic features of the language is given. Constituent order is the most commonly reported feature. Other basic characteristics that are of particular interest to linguists are also reported when the data are available. In a growing number of cases these listings are more extensive and cover a range of linguistic features, including information about the existence of prepositions versus postpositions, constituent order in noun phrases, gender, case, transitivity and ergativity, canonical syllable patterns, the number of consonants and vowels, the existence of tone, and in some cases whether users of the language also use whistle speech. These descriptions are no more than brief mentions, however, and do not constitute adequate descriptions of the language.

Language use

This part of the entry gives information about the use and viability of the language, as well as the use of other languages by members of the community. These data, for the most part, provide supporting evidence for the assignment of the EGIDS status (see the Language status section above).

Vitality Remarks. As a general summary, where the language is being passed on to children as their first language, or where it is used frequently and widely within the community, the term “Vigorous” is most often used. Other factors related to language vitality that may be reported are descriptions of languages that are used, use of this language by others, and the degree and nature of language shift that may be taking place.

Domains of use. When more than one language is used in a community, speakers often establish patterns of language use for specific configurations of speakers, topics, and locations. These domains of language use can be described by answering the well-known question, “Who speaks which language to whom, about what, and where?” In some language entries, we are able to specify a set of identified domains of use and we may also report whether the domain is associated exclusively with the language or is one where mixed language use is prevalent.

The Ethnologue does not have sufficient data about each language to permit a full description of the domains of use in this technical sense. Instead, it uses the term most often to refer to a general set of categories that name the context in which communication takes place (e.g., home, community, work, education, and religion), which are thus only indirectly related to the topics and speakers most generally associated with those settings.

User age range. As language use shifts from a traditional language to one of wider communication, differences in use appear between age groups. As language change takes place, older adults tend to be the final speakers of the traditional language. This field describes the age range of those who use the language as an L1. When possible the value is chosen from the following picklist:

Used by all — The language is used by virtually everyone in every age group.
Some young people, all adults — All adults still use the language, but among children and youth, some use it and some do not.
Some of all ages — Language shift has been in progress for multiple generations; as a result, in each generation there are some who use the language and some who do not.
Adults only — No children or youth use the language; all remaining users are in the child-bearing generation and older.
Older adults only — The only remaining speakers are middle-aged and older (e.g., 45 and above).
Elderly only — The only remaining speakers are of the great-grandparent generation (e.g., 70 and above).

Language attitudes. This field describes the general attitudes of the language community itself towards the use of its own language. We report only summary attitude evaluations as positive attitudes, neutral attitudes, or negative attitudes. Where attitudes towards use of the language are not the same throughout the community, we may report “mixed attitudes”.

Bilingualism remarks. Descriptions of the use of second languages by this language community are included here. Generally the remark consists of the phrase “Also use” followed by the name(s) of the additional languages. If use of a particular L2 is restricted to a particular domain or region or population segment, a comment to that effect is added.

These statements may be modified by a term estimating the extent of the second-language usage. The terms correspond to fairly broad percentage ranges as follows:

All — At least 95% of the ethnic population use the reported language as L2.
Most — At least 65% but less than 95% of the ethnic population use the reported language as L2.
Many — At least 35% but less than 65% of the ethnic population use the reported language as L2.
Some — At least 5% but less than 35% of the ethnic population use the reported language as L2.
Few — Less than 5% of the ethnic population use the reported language as L2.

These quantifiers are frequently based on the best estimates reported to us, though in some cases they represent calculated conversions of reported percentages over a wide time period. The bilingualism remarks are constructed automatically from the Ethnologue database with the result that they are sometimes repetitive or redundant.
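The percentage ranges listed above for the quantifier terms can be expressed as a simple lookup. This Python sketch assumes the input is the percentage of the ethnic population that uses the reported language as an L2; the function name is hypothetical.

```python
def l2_quantifier(percent):
    """Map a percentage of the ethnic population to its quantifier term."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 95:
        return "All"
    if percent >= 65:
        return "Most"
    if percent >= 35:
        return "Many"
    if percent >= 5:
        return "Some"
    return "Few"
```

Each lower bound is inclusive and each upper bound exclusive, matching the “at least ... but less than ...” wording of the ranges.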

When significant language shift has taken place, the “Also use” wording is changed to “Shifting to” (in the case of EGIDS 7) or “Shifted to” (in the case of EGIDS 8a, 8b, and 9) to indicate that the named language is the one that has been adopted in the home domain as the new L1 among children. If the entry lists additional languages with the “Also use” designation, these indicate languages that are an L2 for both the L1 user community and for those who have shifted to a different L1.
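The choice of introductory wording can be sketched as a function of the EGIDS level. This Python example is an assumption-laden illustration: it takes the level as a string such as "7" or "8a", ignores a trailing asterisk, and returns the phrase that would introduce the language adopted as the new L1. The function name is hypothetical.

```python
def bilingualism_phrase(egids_level):
    """Return the phrase introducing the main additional language."""
    level = egids_level.strip().rstrip("*")
    if level == "7":
        return "Shifting to"
    if level in ("8a", "8b", "9"):
        return "Shifted to"
    # No significant shift reported: the ordinary second-language wording.
    return "Also use"
```

Any further languages in the same entry would still be introduced with “Also use”, as described above.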

Use as second language. When the language in focus is used by others as a second language (as reported in the bilingualism remarks in other language entries), this is indicated with the phrase “Used as L2 by ...”. Following this introductory phrase is a list of the other languages that are reported to use this one as a second language. As with L2 use, this report of usage does not imply any specific level of proficiency.

Language development

This part of the entry gives information about literacy rates, use in education, publications and use in media, revitalization efforts, and language development agencies.

Literacy rates. Where available, percentages of the speaker population who are literate are given for L1 and L2 languages. Where the L2 is not specifically identified, it is assumed to be the dominant language of the country in focus or another major language in the vicinity.

Literacy remarks. Information concerning motivation for literacy and existence of government (and other) literacy programs are given where available. Additional information concerning literacy that does not appear in related categories may also be reported here.

Use in elementary or secondary schools. The language may be used either as a language of instruction or taught as a subject within one or more schools in the language area. Generally, we only include a statement in this category if the language is used in the schools. Occasionally some additional information about the nature of that use is also available and is reported.

Publications and use in media. The existence of materials that have been produced in the language such as language documentation (dictionaries, grammars, texts), printed literature, and broadcast media is indicated when known. We report the existence of such materials but do not list titles individually. Where extensive literature and media exist, we identify the language as “Fully developed”.

The most widely published book in the world is the Bible, with at least portions having been translated and published in 3,116 or 43% of the living languages listed in the Ethnologue. Our information on the existence of the biblical text comes from a variety of sources. The information about Bible publication for each language is given with the dates of both the earliest and the most recent published Bible, New Testament (NT), Old Testament (OT), or complete books (portions) of the Bible.

Revitalization efforts. When formalized efforts to revitalize an endangered language have been reported, a cursory description of those efforts is given.

Language development agencies. Agencies that focus on the revitalization, maintenance, or development of the language are listed. These may be national or provincial official or semi-official entities or they may include formally constituted local organizations. In general, international development organizations are not included here. Additions to the existing information are welcomed.

Language resources

This part of the entry provides a link to the catalog page for this language compiled by the Open Language Archives Community (OLAC). It lists all of the resources held by participating archives that are known to be in or about the language. OLAC compiles an aggregated catalog of approximately 400,000 items held by more than 60 archives.

Writing

For each language, the script used for written materials is given if known. Where multiple scripts are associated with the language, they are reported in alphabetical order. Where possible we also report any specific style of a script that is used, the years when a script began to be used or ceased to be used, and other comments regarding writing and orthography. In general, where no script is identified, it can be assumed that there is no widely accepted and used writing system. Scripts other than transcription systems also exist for some sign languages but are not in wide use and so are not currently reported.

Other comments

This part of the entry gives additional information that does not fit under the above categories.

Non-indigenous. A language that did not originate in the country, but which is now established there either as a result of its longstanding presence or because of institutionally supported use and recognition is identified here with the label “Non-indigenous”. In general, these non-indigenous languages represent two different situations: Some are heritage languages associated with a long-established community which originated elsewhere. In many, but not all, of these cases the language is losing speakers as its users shift to a more dominant language. Others are major languages that are being transmitted to large numbers of people as a second language through formal educational institutions resulting in widespread second-language acquisition and growing use.

General remarks. These are general statements about the language or its context that do not fall into other specific categories. Alternate identifications of the language community or ethnic group may be identified or explained here. These may include government recognized or official nationalities, ethnic names, or the meanings or derivations of certain names. Other historical and ethnographic information may be included here as well.

Religion. The religious affiliations of the speakers of the language are given where known. These are generally listed in descending order of number of adherents.

Macrolanguage member languages. If the entry is describing a macrolanguage (see Macrolanguages in “The problem of language identification”), then a complete list is given of the individual languages that fall within the scope of the macrolanguage.

Second Language Only status. While there are many languages that are used as second languages by large populations of speakers, the phrase “Second language only” is used to identify a specific category made up of those languages which are used as second languages but have no L1 speakers and generally weak or secondary ethnic or identity associations. These may include languages of special use, such as languages of initiation, languages of interethnic communication, liturgical languages, as well as cants and jargons. Most often these languages are given a status of EGIDS 3 (Wider Communication) but are identified in this way as well because of the absence of L1 speakers.

Use in other countries. If the language is present in more than one country, the entries for the language in those countries are listed at the bottom of the page.