The merger of ḍād and ẓā’ in Arabic

As noted in a previous post, Arabic, being a Semitic language, has a series of “emphatic” consonants, which are realized with a simultaneous secondary constriction in the back of the mouth or in the throat. The Arabic pronunciation of the emphatics contrasts with their pronunciation in its hypothesized ancestor Proto-Semitic (PS), where they were likely ejective consonants.


Diagram of the places of articulation of the Arabic consonants. Notice that ظ is pronounced at the tip of the tongue, while ض is pronounced at the sides of the mouth.

Modern Standard Arabic (MSA) has two voiced emphatic consonants, ض/ḍād and ظ/ẓā’. ḍād is pronounced similarly to the d in “dog”; its typical IPA notation is /dˤ/. ẓā’ is pronounced somewhat like the th in “this” and is most often transcribed in IPA as /ðˤ/. The UCLA Phonetics Lab has recordings of Arabic that illustrate the difference between the plain and emphatic consonants.

ḍād and ẓā’ are derived from two PS consonants that are transcribed as *ṣ́ and *ṯ̣, respectively. Because PS is a hypothetical language, we can only speculate as to its actual features (if it was indeed ever a unified language). But based on a variety of information from Semitic languages, both dead and living, scholars have postulated that *ṣ́ was a lateral fricative consonant, something like /ɬ’/ or /tɬ’/. Only the Modern South Arabian languages, spoken primarily in Yemen and Oman, preserve this pronunciation in the present day.

There is a great deal of evidence that Arabic, too, at one point preserved a lateral pronunciation of *ṣ́, or more specifically of its Arabic descendant ḍād. For instance, Sībawayh, a Persian scholar who lived in the 8th century CE and who was the first person to write down a grammar of Arabic, Al-Kitāb, says the sound was pronounced min bayni ’awwali ḥāffati l-lisāni wa-mā yalīhi mina l-’aḍrās “between the beginning of the tongue’s edge and the adjacent molars.” The pronunciation was probably /ɮˤ/. This description is very different from Sībawayh’s characterization of ṭā’, the voiceless counterpart of modern ḍād: mimmā bayni ṭarafi l-lisāni wa-’uṣūli ṯ-ṯanāyā “between the tip of the tongue and the base of the incisors.”

Beyond these descriptions, there is evidence from Arabic dialects and from languages that have borrowed from Arabic that ḍād had a lateral pronunciation. For instance, in the Arabic of Dathina, spoken in southern Yemen, ḍād is reportedly pronounced as ḷ (not unlike the l in “ball”): e.g. ’abyaḷ for ’abyaḍ “white.” In Malay and Indonesian borrowings from Arabic, ḍād sometimes corresponds to l or dl: laif or dlaif < ḍa‘īf “weak.” Similarly, in Spanish borrowings from Arabic, ḍād usually has a lateral element: alcalde “mayor” < l-qāḍī “judge.” Eventually, in a process that will be explained below, ḍād lost its lateral quality and is now pronounced as /dˤ/ in Modern Standard Arabic and many colloquial varieties.

*ṯ̣, the ancestor of ẓā’, was likely an ejective dental fricative or affricate, /θ’/ or /tθ’/. By the time of Sībawayh, it had probably developed into its modern pronunciation of /ðˤ/, as he describes its place of articulation as mimmā bayni ṭarafi l-lisāni wa-’aṭrāfi ṯ-ṯanāyā “between the tip of the tongue and the tip of the incisors.”

While Modern Standard Arabic maintains a regular distinction between ḍād and ẓā’, practically every modern colloquial Arabic variety has merged the two, either into ḍād or into ẓā’. The general rule is that if a particular dialect maintained the two interdental fricatives ṯā’/ث and ḏāl/ذ (pronounced like the th in “thin” and “this,” respectively), then the form that survived is ẓā’. If the interdental fricatives became tā’/ت and dāl/د, then ẓā’ merged with ḍād. In other words, it appears that these sound changes took place as a result of analogical interpretation by Arabic speakers. But there is a problem with this theory. If the loss of ḍād is connected to the preservation of the plain interdental fricatives, how can we explain the continued existence of the plain stops tā’ and dāl? If the process were indeed analogical, then they, too, should have merged with their fricative counterparts and become ṯā’/ث and ḏāl/ذ. But that does not happen in any dialect of Arabic.

In order to explain the modern pronunciation of ḍād, we must refer back to its original lateral form. Like all of the emphatics, *ṣ́ had a plain counterpart *ś in PS, probably pronounced as /ɬ/ or /tɬ/. This sound continued to be distinguished in Arabic, but its pronunciation had changed by the time of Sībawayh, who described it as being pronounced min wasaṭi l-lisāni baynihi wa-bayna wasaṭi l-ḥanaki l-’a‘lā “from the middle of the tongue, between it and the middle of the upper palate.” Some scholars have suggested that this is a description of /ç/, which is different from the modern pronunciation of /ʃ/. In any case, it is fairly clear that the lateral quality had been lost by the 8th century, which left ḍād fairly isolated in the system as a lateral fricative. Over time, ḍād became more and more like ẓā’, eventually merging with it completely. This means that every dialect of Arabic that survives to this day at one point had only the consonant /ðˤ/ to represent both ḍād and ẓā’. Thereafter, those dialects that merged their interdental fricatives ṯā’ and ḏāl with tā’ and dāl also changed the emphatic interdental fricative /ðˤ/ into a new sound /dˤ/, which then became the modern pronunciation of ḍād.
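The two-stage account can be made concrete with a small toy model. This is only a sketch of the ordering argument, not a claim about how any real dialect is best described: the function name, the dictionary keys, and the single yes/no dialect parameter are all my own simplifications.

```python
# Toy model of the two-stage account above. Values are IPA strings.
# The keys "DAD" and "ZA" stand for the letters ḍād and ẓā’ (hypothetical labels).

def modern_reflexes(keeps_interdentals: bool) -> dict:
    """Return the modern reflexes of ḍād and ẓā’ for one dialect type."""
    # Stage 0: the system described by Sībawayh (lateral ḍād, interdental ẓā’).
    system = {"DAD": "ɮˤ", "ZA": "ðˤ"}

    # Stage 1: ḍād loses its lateral quality and merges with ẓā’ everywhere.
    system = {letter: "ðˤ" for letter in system}

    # Stage 2: only dialects that turn their interdental fricatives into stops
    # also turn the merged emphatic fricative into a stop.
    if not keeps_interdentals:
        system = {letter: "dˤ" for letter in system}
    return system


if __name__ == "__main__":
    print("keeps interdentals:", modern_reflexes(keeps_interdentals=True))
    print("loses interdentals:", modern_reflexes(keeps_interdentals=False))
```

Because stage 1 applies to every dialect before stage 2, the model never needs the plain stops tā’ and dāl to turn into fricatives, which was the weak point of the purely analogical explanation.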

The preceding is a fairly straightforward explanation, but it is also highly theoretical. In real life, language is very rarely this precise: there are always complications and exceptions. Arabic, after all, is a living language, spoken by hundreds of millions of people in a variety of social contexts. Therefore, it isn’t enough to just present the history of sound changes as if that’s the whole story. We must also look at how people use the language, because that absolutely affects the trajectory of the language as a whole.

For example, in Jordan, there are two types of dialects: Palestinian-origin and native Jordanian-origin. The dialect of the capital Amman is strongly influenced by Palestinian urban dialects, which were brought over by Palestinian refugees after 1948. Like other urban dialects within the Levant, the historic interdental fricatives have been lost for many speakers of Ammani Arabic. The native Jordanian dialects, on the other hand, are mostly rural and have maintained the fricatives. So where someone from Amman might say tnēn [tne:n] “two” or ’axad [ʔaxad] “take,” a person from Karak would say ṯnēn [θne:n] or ’axaḏ [ʔaxað]. Native Jordanian dialects have also historically had an interdental pronunciation for both ḍād and ẓā’: ḍarab “hit” and ẓaher “back” are both pronounced with ẓā’. Yet, as Al-Wer (2004) reports, the ẓā’ pronunciation has developed a stigma in Jordanian society as a mark of uneducated or uncultured speech, which has resulted in Jordanian Arabic speakers using it less and less. It is now a highly marked form that is restricted to a minority of speakers in the country. Yet the plain interdentals ṯā’ and ḏāl are not similarly stigmatized and continue to be pronounced unchanged as fricatives. Thus in Jordan, there are now speakers who preserve the plain interdentals but not their emphatic counterpart, which contradicts the expected outcome of the above historical account.

This situation underscores the importance of social factors as an element in linguistic analysis. In Jordan and in the Levant more generally, urban dialects are considered more prestigious, and there has been a trend in the region as a whole towards the dominance of urban dialects over rural dialects (of course, there are complicating factors, but urban prestige is certainly one of the important factors). If scholars were writing about these sounds 100 years in the future with no information about the social context of Jordan, they would likely have a hard time explaining why Jordanian dialects maintained the plain interdental fricatives but not the emphatic one. At the same time, if scholars did not have any information about the history of these sounds, they would also not be able to accurately explain why the emphatic ḍād and ẓā’ are treated differently from their plain counterparts ṯā’ and ḏāl in modern Arabic dialects. Al-Wer’s study of ḍād and ẓā’ is a perfect example of how synthesizing a field as theoretical as historical phonology together with a field as practical as sociolinguistics can provide a more complete understanding of the features of a given language.


Al-Wer, E. (2004). Variability reproduced: A variationist view of the [ḏ̣]/[ḍ] opposition in modern Arabic dialects. In M. Haak, R. de Jong, & K. Versteegh (eds.), Approaches to Arabic dialects: A collection of articles presented to Manfred Woidich on the occasion of his sixtieth birthday: 21-32. Leiden: Brill.

Carter, M. G. (2004). Sībawayhi. Oxford: Oxford University Press.

Kogan, L. (2011). Proto-Semitic Phonetics and Phonology. In S. Weninger, G. Khan, M. P. Streck, & J. C. Watson (Eds.), The Semitic Languages: An International Handbook: 54-151. Berlin: De Gruyter Mouton.

Lipiński, E. (1997). Semitic Languages: Outline of a Comparative Grammar. Leuven: Peeters.

Versteegh, K. (1997). The Arabic Language. New York: Columbia University Press.

Watson, J. C. E. (2002). The Phonology and Morphology of Arabic. Oxford: Oxford University Press.


“The Garšūnī language”


My involvement in cataloging Syriac and Arabic manuscripts over the last few years has impressed upon me how often and actively Syriac Orthodox and Chaldean scribes (and presumably, readers) used Garšūnī: it is anything but an isolated occurrence in these collections. This brings to the fore questions of how these scribes and readers thought about Garšūnī. Did they consider it simply a writing system, a certain kind of Arabic, or something else? At least a few specific references to “Garšūnī” in colophons may help us answer them. Scribes sometimes make reference to their transcriptions from Arabic script into Syriac script, and elsewhere a scribe mentions translation “from Garšūnī into Syriac” (CFMM 256, p. 344; after another text in the same manuscript, p. 349, we have in Arabic script “…who transcribed and copied [naqala wa-kataba] from Arabic into Garšūnī”). Such statements show that scribes certainly considered Arabic and Garšūnī distinctly.


Persian loanwords in Arabic


Pahlavi inscription from the mid-Sassanian period. The Pahlavi script was based on Aramaic and is a testament to the cultural exchange between Iranian and Semitic languages.

Much has been said about the influence of Arabic, as the language through which Islam was spread, on the speech of majority Muslim peoples. However, not nearly as much attention has been paid to how other languages have shaped and changed Arabic. Persian, mostly through word borrowings, has been one of the most important — if not the most important — of these languages, and what follows is a brief description of the nature of Persian’s lexical influence and how Arabic adapted its sound and word structure.


The first thing to note is that not all Persian words entered Arabic directly. Many of them were borrowed through Aramaic, which was the major lingua franca and trade language of the region (as well as serving as the official language of the Achaemenid dynasty of ancient Persia). As such, many of the Arabic words for spices, plants, precious stones, and other common goods are Persian in origin. Examples include the following:

  • Arab. ‹ballūr› ‘crystal’ < Pers. ‹belūr›
  • ‹fayrūz› ‘turquoise’ < ‹pīrūze›
  • ‹hāl›/‹hayl› ‘cardamom’ < ‹hel›
  • ‹ˀibrīq› ‘water jug’ < ultimately from ‹āb› ‘water’ + ‹rīxtan› ‘to pour’
  • ‹kanz› ‘treasure’ < Mid. Pers. ‹ganj› via Aram.
  • ‹lāzaward› ‘lapis lazuli’ < ‹lāj(a)vard›
  • ‹līmūn› ‘lemon’ < ‹līmū›
  • ‹marjān› ‘coral’ < Mid. Pers. ‹murvārīt› ‘pearl’ via Aram. ‹margānītā›
  • ‹mawz› ‘banana’ < Mid. Pers. ‹mōz›
  • ‹misk› ‘musk’ < Mid. Pers. ‹mušk›
  • ‹nisrīn› ‘dog rose’ < ‹nasrīn›
  • ‹sabānix› ‘spinach’ < ‹aspanāx›
  • ‹sunbul› ‘hyacinth’ < ‹sonbol›
  • ‹šabat›/‹šibitt› ‘dill’ < ‹ševīd›
  • ‹xiyār› ‘cucumber’ < ‹xiyār›
  • ‹yāqūt› ‘ruby’ < ‹yāqūt›
  • ‹yašb› ‘jasper’ < ‹yašp›
  • ‹yasmīn› ‘jasmine’ < ‹yāsamīn›
  • ‹zanjabīl› ‘ginger’ < Mid. Pers. ‹singavēr› via Aram.
  • ‹zumurrud› ‘emerald’ < ‹zomorrod›

The defining characteristic of the majority of these loanwords is that they diverge from the typical native Arabic word structure, which is usually built on only three root consonants that are modified according to specific, productive patterns. Words like ‹lāzaward› and ‹zanjabīl› immediately stand out in this regard.

Other loanwords look like native Arabic words but conflict in meaning with an existing root. For example, the word ‹misk› seems to have the root m-s-k, but the native root m-s-k means ‘touch, grasp.’ Similarly, ‹xiyār› ‘cucumber’ conflicts with the root x-y-r meaning ‘choice.’


Persian and Arabic, although having a long history of interaction and mutual influence, are not related languages. The former is a member of the Indo-European language family and is genetically related to Greek, Latin, Armenian, Sanskrit, English, etc. Arabic, on the other hand, is a member of the Afro-Asiatic language family, like Aramaic, Ge‘ez, Tamazight, Hausa, Somali, etc. There are significant differences between the two, including in their respective sound systems, which means that when words are borrowed, they must undergo a process of (sometimes drastic) adaptation to the sound, syllable, and word structure of the borrowing language.

This happens in Arabic borrowings into Persian; for example, Arabic ‹ádab› ‘discipline, politeness’ (stress on the first syllable in accordance with Arabic stress rules) becomes ‹adáb› in Persian (stress on the second syllable in accordance with Persian stress rules). And certainly the reverse is true in that Arabic also adapted the pronunciation and structure of Persian words into a format compatible with its grammar. The following three sections highlight some of these changes.


Many words in Middle Persian ended in ‹-g›; for example, the name of the language was ‹pārsīg›, ‘plan’ was ‹barnāmag›, and ‘pistachio’ was ‹pistag›. Middle Persian was spoken up until the 9th century, meaning that Arabic borrowed many words from this stage of Persian, including ones ending in ‹-g›. Standard Arabic (whether modern or classical) lacks a ‹g›, so the reflex of the Persian sound is usually ‹j› or ‹q›. Thus Middle Persian ‹barnāmag› became Arabic ‹barnāmij› and ‹pistag› became ‹fustuq›. It should be noted that ‹j› developed from an original ‹g› and that ‹q› is pronounced as ‹g› in many varieties of Arabic, including ancient dialects from the pre-Islamic and early Islamic eras. So the alternative forms that we see in Arabic are not random substitutions.

As with all languages, Middle Persian experienced changes and eventually developed into New (i.e. modern) Persian. Among the changes was that the final ‹-g› was dropped. In modern Persian, the name of the language is ‹pārsī› or ‹fārsī›, ‘plan’ is ‹barnāme›, and ‘pistachio’ is ‹peste›. But this sound change only occurred in Persian, meaning it had no effect on any Persian loanwords in Arabic. The result is that Arabic contains “fossilized” forms of many Persian words. Other examples include the following:

  • Arab. ‹banafsaj› ‘violet’ versus Pers. ‹banafše›
  • ‹baydaq› ‘pawn (chess)’ versus ‹piyāde›
  • ‹dībāj› ‘silk brocade’ versus ‹dībā›
  • ‹namūḏaj› ‘example’ versus ‹namūne›
  • ‹ṭāzaj› ‘fresh’ versus ‹tāze›

Interestingly, in some spoken Arabic varieties, the word for ‘fresh’ is ‹ṭāza›, lacking any evidence of the Middle Persian final ‹-g›. On the surface, this would suggest that Persian loans in Arabic were indeed affected by the sound change. However, this is actually an instance in which Arabic re-borrowed the word, this second time via Turkish, which had adopted it from Persian after the final ‹-g› had disappeared.
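The “fossilization” pattern is easy to check against the word pairs quoted in this section. The following is a minimal sketch that only inspects the final letter of each transliterated form; the helper name and the decision to compare romanized spellings (rather than Arabic or Persian script) are my own assumptions.

```python
import unicodedata

# (Arabic form, modern Persian form), all taken from the examples in this section.
PAIRS = [
    ("barnāmij", "barnāme"),
    ("fustuq", "peste"),
    ("banafsaj", "banafše"),
    ("baydaq", "piyāde"),
    ("dībāj", "dībā"),
    ("namūḏaj", "namūne"),
    ("ṭāzaj", "tāze"),
]


def final_letter(word: str) -> str:
    """Return the last base letter of a word, ignoring diacritics."""
    decomposed = unicodedata.normalize("NFD", word)
    letters = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return letters[-1]


if __name__ == "__main__":
    for arabic, persian in PAIRS:
        # Arabic keeps a reflex of the old final -g (as j or q);
        # modern Persian ends in a vowel.
        print(f"{arabic:10} ends in {final_letter(arabic)} | "
              f"{persian:10} ends in {final_letter(persian)}")
    # The Turkish-mediated re-borrowing shows no trace of the old -g.
    print("ṭāza ends in", final_letter("ṭāza"))
```

Every Arabic form in the list ends in ‹j› or ‹q›, every modern Persian form ends in a vowel, and the re-borrowed ‹ṭāza› patterns with modern Persian rather than with the older loans.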

CHANGE OF ‹č› TO ‹ṣ› OR ‹s›

Persian has a ‹č› sound (as in ‘chair’) which standard Arabic and many colloquial varieties lack. In borrowings from other languages, this sound is usually changed to ‹š› (as in ‘share’), such as Arabic ‹šekk› from English ‘check.’ However, in words of Persian origin, ‹č› tends to correspond to Arabic ‹ṣ› or occasionally ‹s›.

At first glance, this is strange because ‹ṣ› is an emphatic sound, meaning it is pronounced with a constriction in the back of the mouth (anywhere from the velum to the pharynx) made at the same time as the primary articulation. Compare the plain ‹s› of ‹sūs› (‘licorice’) and its pharyngealized counterpart ‹ṣ› in ‹ṣūṣ› (‘chick’). The latter sounds very much like a deep ‹s› but nothing at all like ‹č›. The correspondence is likely due to the tradition in Iranian languages, such as Sogdian and Pahlavi, of using the Aramaic letter representing ‹ṣ› for ‹č› (in turn due to the fact that ‹ṣ› itself was originally an affricate). Aramaic speakers would likely have read the letter as ‹ṣ›, a reading that was then passed on to Arabic speakers. There are a number of loanwords that exhibit this change, including the following:

  • Arab. ‹jaṣṣ› ‘plaster, gypsum’ < Pers. ‹gač›
  • ‹raṣāṣ› ‘lead’ < Mid. Pers. ‹arčīč›
  • ‹ṣandal› ‘sandal, sandalwood’ < ‹čandal›
  • ‹ṣārūj› ‘mortar’ < Mid. Pers. ‹čārūg›
  • ‹ṣihrīj› ‘cistern’ < Mid. Pers. ‹čahrēg›
  • ‹ṣīn› ‘China’ < ‹čīn›
  • ‹sirāj› ‘lamp’ < ‹čerāġ›


Arabic, being a Semitic language, has a root-and-pattern system of morphology, meaning that roots composed of consonants are applied to pre-existing patterns to form words. An example of this is ‹kitāb› ‘book’ and ‹maktab› ‘desk,’ both from the root k-t-b ‘write.’ The plurals of many words are also determined by pre-existing patterns, so that ‘books’ is ‹kutub› and ‘desks’ is ‹makātib›. Notice that the consonants stay the same, but the vowels are altered to indicate pluralization.
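For readers who think in code, root-and-pattern morphology can be sketched as simple template filling. This is a deliberately naive illustration: the digit notation for pattern slots and the function name are my own conventions, and real Arabic morphology involves many adjustments that this ignores.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill the slots 1, 2, 3 in `pattern` with the consonants of `root`.

    `root` is written with hyphens (e.g. "k-t-b"); every digit in `pattern`
    is replaced by the root consonant with that position, and all other
    characters (the vowels and any affixes) are copied through unchanged.
    """
    consonants = root.split("-")
    return "".join(
        consonants[int(ch) - 1] if ch.isdigit() else ch for ch in pattern
    )


if __name__ == "__main__":
    root = "k-t-b"  # 'write'
    for gloss, pattern in [
        ("book", "1i2ā3"),
        ("books", "1u2u3"),
        ("desk", "ma12a3"),
        ("desks", "ma1ā2i3"),
    ]:
        print(f"{gloss:6} {pattern:8} -> {apply_pattern(root, pattern)}")
```

The same four patterns applied to a different three-consonant root would produce forms with the same vowels and a different consonant skeleton, which is exactly the property that the back-formations discussed next depend on.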

Some Persian words that were borrowed into Arabic very strongly resembled certain plural patterns, and indeed Arabic speakers interpreted these words as plurals. But by reinterpreting originally singular words as plurals, speakers created a lexical ‘gap’ where a singular ought to have been. The solution for Arabic speakers was to back-form new singular forms from plurals based on the forms that exist for native Arabic words.

To illustrate, the Middle Persian word for ‘pawn’ (as in the chess piece) was ‹payādag›, which was borrowed as Arabic ‹bayādiq›. Comparing ‹bayādiq› to ‹makātib› ‘desks,’ one can immediately recognize that while the consonants differ, the vowels are identical. Thus Arabic speakers interpreted ‹bayādiq› not as ‘pawn’ but as ‘pawns.’ Based on this interpretation, if ‹bayādiq› is a plural whose form matches ‹makātib›, then it would follow that the singular form of the former would match that of the latter. Thus, ‹baydaq› ‘pawn’ — having the same vowels and structure as ‹maktab› ‘desk’ — was back-formed as the new singular form.
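The reasoning in this paragraph amounts to a small analogy procedure: compare vowel templates, and if the borrowed word matches a known plural, pour its consonants into the matching singular. Here is a rough sketch under that assumption; the function names are hypothetical, and treating ‹y› and ‹w› as consonants is a simplification.

```python
import unicodedata


def is_vowel(ch: str) -> bool:
    """True for a, i, u and their long counterparts (ā, ī, ū)."""
    base = unicodedata.normalize("NFD", ch)[0]
    return base in "aiu"


def template(word: str) -> str:
    """Replace every consonant with 'C', keeping the vowels in place."""
    word = unicodedata.normalize("NFC", word)
    return "".join(ch if is_vowel(ch) else "C" for ch in word)


def back_form(borrowed: str, model_plural: str, model_singular: str):
    """Back-form a singular for `borrowed` by analogy with a native pair.

    Returns None when the borrowed word does not share the plural's template.
    """
    if template(borrowed) != template(model_plural):
        return None
    consonants = iter(
        ch for ch in unicodedata.normalize("NFC", borrowed) if not is_vowel(ch)
    )
    return "".join(
        next(consonants) if slot == "C" else slot
        for slot in template(model_singular)
    )


if __name__ == "__main__":
    print(template("makātib"), template("bayādiq"))
    print(back_form("bayādiq", "makātib", "maktab"))
```

The procedure only succeeds when the templates match exactly, which mirrors the point that only loanwords strongly resembling a plural pattern were reinterpreted this way.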

Some other words that exhibit this back-formation include:

  • ‹firdaws› ‘paradise’ from Old Iranian *‹paridaiza›, which was borrowed as ‹farādīs› ‘paradises.’
  • ‹jāmūs› ‘water buffalo’ from Middle Persian ‹gāwmeš›, which was borrowed as ‹jawāmīs› ‘water buffalos.’
  • ‹nibr› ‘warehouse’ from Middle Persian ‹anbār›, which was borrowed as ‹ˀanbār› ‘warehouses.’

Persian nasta‘līq calligraphy by Maqsud ibn Mahmud, 1708. Nasta‘līq is a calligraphic style invented by Persians for their adapted Arabic script in the 14th-15th century.


Many if not most of the loanwords discussed above were clearly borrowed in pre-Islamic or early Islamic times. This makes sense, given that Arabic was not an established language of prestige in that era, unlike Persian, which was the language of one of the two most powerful empires in the region at the time. As Arabic ascended in influence through association with Islam, Persian borrowings into Arabic decreased.

However, they did not stop, and many spoken Arabic varieties continued to borrow Persian words. Obviously, those varieties spoken near or in Iran, such as Iraqi Arabic, contain more Persian loanwords than those that are not. Two examples from my own dialect (urban Palestinian) are ‹bābūj› ‘slipper’ from ‹pāpūč› and ‹šākūš› ‘hammer’ from ‹čakoš›. Neither of these words exists in standard Arabic, but they are both widely used in many spoken varieties of Arabic. The existence of two words for ‘fresh’ borrowed from two different eras of Persian discussed above also demonstrates the continued influence of the language on Arabic.



The “South Semitic” sprachbund

In linguistics, a sprachbund (German for ‘language league’) is any group of languages which are not necessarily closely related but which nevertheless have many similar features as a result of proximity and language contact. A very well-known example is the Balkan sprachbund, which includes such languages as Albanian, Greek, Romanian, and Bulgarian. Although these languages come from separate branches of the Indo-European language family, the long history of contact between speakers within a relatively small geographic region has resulted in a convergence on a number of features such as the formation of the future tense, the loss of the infinitive verb, and common vocabulary.

Some scholars have argued that within the Semitic language family, there existed a South Semitic sprachbund, which consisted of Arabic, Ṣayhadic (Sabaean, Qatabānian, etc.), the Modern South Arabian languages (Soqotri, Mehri, etc.), and the Ethiopic languages (Gǝ‘ǝz, Amharic, Gurage, etc.). Throughout most of the 20th century, it was assumed that the unique features that these languages shared were inherited from a recent common ancestor. However, the currently prevailing assumption is that they are a result of extensive and prolonged interaction between their respective speakers.

Usually, three features are highlighted: 1) the universal change of Proto-Semitic *p > f; 2) broken/internal plurals; and 3) the L-stem form of the verb. I will discuss these below.


The Proto-Semitic language is assumed to have had a *p consonant; however, within the South Semitic sprachbund, this consonant changed, or spirantized, to ‹f›. Thus, Hebrew ‹pēḥām› ‘coal’ corresponds to Arabic ‹faḥm›, Tigrinya ‹fäḥam›, Amharic ‹fǝm›, and Soqotri ‹fḥam›.

The *p > f shift does occur in other Semitic languages as well. In Hebrew and Aramaic, when ‹p› follows a vowel, it becomes ‹f›. For example, contrast Hebrew ‹pārastî› ‘I spread’ with ‹efrôs› ‘I will spread.’ In the South Semitic sprachbund, by contrast, this change was unconditional; that is, it affected every occurrence of Proto-Semitic *p regardless of its position within the word.
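The difference between a conditioned and an unconditional sound change can be shown with two small rewrite rules. This is a sketch only: the function names are mine, the input ‹eprôs› is a hypothetical pre-spirantization spelling of ‹efrôs› used purely for illustration, and ‹paḥm› is an assumed pre-shift shape of the word for ‘coal.’

```python
import unicodedata


def is_vowel(ch: str) -> bool:
    """True for plain or diacritic-bearing a, e, i, o, u and for ǝ."""
    base = unicodedata.normalize("NFD", ch)[0]
    return base in "aeiouǝ"


def spirantize_after_vowel(word: str) -> str:
    """Conditioned change (Hebrew/Aramaic type): p > f only after a vowel."""
    word = unicodedata.normalize("NFC", word)
    out = []
    for i, ch in enumerate(word):
        if ch == "p" and i > 0 and is_vowel(word[i - 1]):
            out.append("f")
        else:
            out.append(ch)
    return "".join(out)


def spirantize_everywhere(word: str) -> str:
    """Unconditional change (South Semitic type): every p > f."""
    return word.replace("p", "f")


if __name__ == "__main__":
    # Hypothetical inputs, chosen to mirror the examples in the text.
    for form in ["pārastî", "eprôs", "paḥm"]:
        print(f"{form:8} conditioned: {spirantize_after_vowel(form):8} "
              f"unconditional: {spirantize_everywhere(form)}")
```

Under the conditioned rule a word-initial ‹p› survives, so a language of the Hebrew type keeps both sounds; under the unconditional rule no ‹p› is left anywhere, which is why ‹p› in Ethiopic languages is mostly found in later borrowings.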

Incidentally, many Ethiopic languages regained the ‹p› and additionally acquired an emphatic (i.e. ejective) form of the consonant ‹ṗ›, but the words in which these two sounds occur are mostly borrowings from other languages. The letters in the Gǝ‘ǝz script used to represent ‹p› and ‹ṗ› are modified from the ‹t› and the ‹ṣ› letters, respectively. For example, ተ ‹tä› versus ፐ ‹pä› and ጸ ‹ṣä› versus ጰ ‹ṗä›.


Across the Semitic languages, the most common method of pluralization is suffixation. For example, in Akkadian ‹šarr-um› ‘king’ is pluralized by dropping the singular ‹-um› and adding the plural ‹-ū› for ‹šarr-ū› ‘kings.’ The members of the South Semitic sprachbund also exhibit this form of pluralization. For example, some Tigrinya nouns add the plural suffix ‹-at›; thus ‹säb› ‘person’ > ‹säb-at› ‘persons.’ This is known as the external plural.

Within the sprachbund, there is a second type of pluralization called broken or internal plurals. Recall from my previous posts that Semitic morphology, which is the formation of words and other grammatical units, is based on a root-pattern system. For example, the Gǝ‘ǝz root ‹ṣḥf› means ‘write.’ Inserting this root into the pattern |maC₁C₂aC₃| resulted in ‹maṣḥaf›, which means ‘book.’ However, this word was not pluralized by the addition of a suffix. Instead an entirely different pattern was applied to the root of the noun itself, specifically |maC₁āC₂ǝC₃t|, which resulted in ‹maṣāḥǝft› ‘books.’ The name for this type of plural comes from the fact that the singular form of the noun is “broken” apart by the insertion of additional vowels and consonants.

Although there is residual evidence of the broken plural in other Semitic languages, only the South Semitic sprachbund exhibits its extensive use. Additionally, not only do the sprachbund members utilize the broken plural, they also share a number of identical pluralization patterns. Perhaps the most common such pattern is |’VCCV̄C| (where V stands for a short vowel and V̄ for a long vowel). Examples include the following:

  • Arabic ‹wazn› ‘weight’ > ‹’awzān›
  • Gǝ‘ǝz ‹faras› ‘horse’ > ‹’afrās›
  • Tigre ‹mǝdǝr› ‘land’ > ‹’amdār›
  • Ṣayhadic * ‹hgr› ‘town’ > ‹’hgr›
  • Ḥarsusi ‹ḥamθ› ‘lower belly’ > ‹eḥmōθ›
  • Shehri ‹ḥarf› ‘gold coin’ > ‹ɔḥrɔf›

Despite the preceding examples, the situation in the Ethiopic languages has changed over the centuries. This language group is divided into two subgroups, North (Gǝ‘ǝz, Tigrinya, and Tigre) and South (Amharic, Gurage, Argobba, and so on). Only the North group maintains extensive use of the broken plural. The Gurage languages have almost entirely lost it, and in Amharic what broken plurals exist have been directly borrowed from Gǝ‘ǝz. The loss of this feature is most likely a result of a separate sprachbund with the surrounding Cushitic and Omotic languages, which generally have external plurals.


Like the nouns above, verbs are also formed through the root-pattern system. The Semitic languages all have different templates that provide nuanced changes to the meaning of the verb root. For example, in Ḥarsusi and Mehri, two Modern South Arabian languages, inserting the root ḳ-f-d into the basic verb pattern |C₁ǝC₂ōC₃| results in ‹ḳǝfōd› ‘to descend,’ while the causative pattern |aC₁C₂ōC₃| results in ‹aḳfōd› ‘to put down (i.e. to cause to descend).’ The latter is known as the C-stem (“C” for causative), and variations of it are exhibited throughout all the Semitic languages.

In the South Semitic sprachbund, there is a unique pattern known as the L-stem, which features the lengthening of the first vowel from the basic verb form: |C₁aC₂aC₃| > |C₁āC₂aC₃|. In Arabic, this pattern typically denotes the involvement of another person in some sort of reciprocal fashion: ‹qatala› ‘he killed’ vs ‹qātala› ‘he fought (i.e. he killed another)’; ‹kataba› ‘he wrote’ vs ‹kātaba› ‘he corresponded (i.e. he wrote to another).’ There is also a variant of this pattern |taC₁āC₂aC₃|, which denotes a reflexive or reciprocal meaning: ‹taqātal-ū› ‘they fought each other.’
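As a rough illustration, the L-stem can be modeled as a single operation on the basic verb: lengthen the first vowel, and optionally add the ‹ta-› prefix. The function names below are my own, and the sketch works on third-person masculine singular citation forms only.

```python
# Short vowels and their long counterparts in the transliteration used here.
LONG = {"a": "ā", "i": "ī", "u": "ū"}


def l_stem(basic: str) -> str:
    """Lengthen the first vowel of the basic form: C1aC2aC3 > C1āC2aC3."""
    for i, ch in enumerate(basic):
        if ch in LONG:
            return basic[:i] + LONG[ch] + basic[i + 1:]
    return basic  # no short vowel found: leave the form unchanged


def t_l_stem(basic: str) -> str:
    """The variant pattern taC1āC2aC3: the L-stem with a ta- prefix."""
    return "ta" + l_stem(basic)


if __name__ == "__main__":
    for verb in ["qatala", "kataba"]:
        print(f"{verb} > {l_stem(verb)} > {t_l_stem(verb)}")
```

The printed forms are singular; the plural ‹taqātal-ū› cited above would additionally replace the final vowel with the plural ending.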

In Gǝ‘ǝz, there is no special meaning to this pattern: ‹bāraka› ‘he blessed’ and ‹māsana› ‘he perished.’ Furthermore, the |taC₁āC₂aC₃| pattern is simply the passive form: ‹tabāraka› ‘he was blessed.’ However, some basic verb forms take on this L-stem passive to create a reciprocal: ‹ḳatala› ‘he killed’ > ‹taḳātal-u› ‘they fought/killed each other,’ identical in meaning and form to the Arabic تقاتلوا ‹taqātal-ū›.

Based on currently available information, the L-stem can only be attributed to Arabic and Ethiopic. The Ṣayhadic scripts did not mark vowels, so it is unclear from the surviving texts if they had an L-stem, while the Modern South Arabian languages, if they ever had the L-stem at all, have merged it with another verb pattern. Nevertheless, the existence of the L-stem in both Arabic and Ethiopic and its use in similar ways points to the effects of the South Semitic sprachbund.


Originally the above features were so convincing to scholars that they placed the members of this sprachbund into one subfamily within Semitic and called it the South Semitic branch. This was the case throughout most of the 20th century, when the classification looked something like this:

However, starting in the 1970s, scholars began to reanalyze their assumptions. Some argued that despite the similarities, there were still important differences within this “South Semitic branch,” including the following:

  1. A conjugation pattern of the imperfective tense (representing both present and future) in Arabic and Ṣayhadic closely resembling the Northwest Semitic forms, which differ from all other varieties of Semitic languages;
  2. Grammatical rules governing the definite article “the” that were identical in Arabic and Northwest Semitic languages; and
  3. The formation of the tens (i.e. twenty, thirty, forty, etc.) based on a noun plural suffix ‹-îm/-īn/-ūn› as opposed to the general Semitic ‹-ā› found in Ethiopic and Akkadian.

These and several other important features together suggested a common Central Semitic subfamily. Since then, there have been many revisions of the traditional classification. The following is one current example taken from Huehnergard and Rubin (2011):

In this tree, Arabic and Ṣayhadic are moved from South Semitic into a “Central Semitic” branch along with Aramaic and Canaanite, while Modern South Arabian and Ethiopic/Ethiopian each get their own separate branches. Currently, the majority of scholars subscribe to some variation of this tree.

In any case, whether a revised classification or the traditional one is correct, it is undeniable that the Semitic languages of Arabia and East Africa interacted with and influenced each other over a significant period of time. What is most remarkable is that the evidence of those interactions can still be observed all of these centuries later.

* The Ṣayhadic script did not mark vowels, so the existence of this particular pluralization pattern, in which the second vowel is long, is based on conjecture.


Huehnergard, J. (2005). “Features of Central Semitic.” In A. Gianto (ed.), Biblical and Oriental Essays in Memory of William L. Moran.

Huehnergard, J. and Rubin, A. (2011). “Phyla and Waves: Models of Classification of the Semitic Languages.” In S. Weninger (ed.), The Semitic Languages: An International Handbook.

Lipiński, E. (1997). Semitic Languages: Outline of a Comparative Grammar.

Ratcliffe, R. R. (1998). “Defining Morphological Isoglosses: The ‘Broken’ Plural and Semitic Subclassification.” Journal of Near Eastern Studies.

Simeone-Senelle, M-C. (1997). “The Modern South Arabian Languages.” In R. Hetzron (ed.), The Semitic Languages.

The Semitic languages

The Semitic languages are part of a language family called Afro-Asiatic, which among others includes the Berber/Tamazight, Cushitic, and (ancient) Egyptian languages. The Semitic languages are assumed to have descended from a single source, which is called Proto-Semitic. There is no record of this language, but scholars have been able to piece together a lot of information about it from evidence in the daughter languages.


Proto-Semitic and its close relative Proto-Berber were most likely spoken somewhere in Northeastern Africa. Around 3500 BCE, driven by the desertification that would create the Sahara Desert, the speakers of Proto-Semitic migrated east into the Levant, where their presence led to the collapse of the indigenous cultures that existed there. It seems that the Semites did not migrate all at once but rather in waves. Some of them ended up in northern Syria, some in Iraq, others in the Levant and the northern Arabian Peninsula, and still others in the southern Arabian Peninsula and across the Red Sea into East Africa.


Bronze head of an Akkadian ruler, probably Sargon the Great, c. 23rd – 22nd century BCE.

These migration patterns led to the divisions within Semitic. There are many competing theories regarding the classification of these divisions. The most common divides Semitic into East and West groups (Huehnergard and Rubin: 2011). The East group, composed of Eblaite, Akkadian, and Babylonian, died out in the 8th century BCE. The West group is divided into three subgroups. The first is Central Semitic, which is further divided into Northwest Semitic — composed of Aramaic, Ugaritic, and Canaanite, Arabic, and Ṣayhadic. The second is Ethiopic, which is composed of the Semitic languages spoken in eastern Africa. The last subgrouping is the Modern South Arabian languages, which are spoken in the southern Arabian Peninsula.
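The East/West classification just described is a tree, and it can help to see it written out as one. The nested structure below encodes only the groupings named in this paragraph; the variable and function names are my own, and the branch labels are shortened for convenience.

```python
# The East/West classification as described in the text.
# Dictionaries are branches; lists hold the languages named for that branch
# (an empty list means the text names no further subdivisions).
TREE = {
    "East": ["Eblaite", "Akkadian", "Babylonian"],
    "West": {
        "Central": {
            "Northwest": ["Aramaic", "Ugaritic", "Canaanite"],
            "Arabic": [],
            "Ṣayhadic": [],
        },
        "Ethiopic": [],
        "Modern South Arabian": [],
    },
}


def path_to(name, node=TREE, trail=()):
    """Return the list of branches leading to `name`, or None if absent."""
    if isinstance(node, dict):
        for key, child in node.items():
            if key == name:
                return list(trail) + [key]
            found = path_to(name, child, trail + (key,))
            if found:
                return found
    elif name in node:  # a list of languages at the end of a branch
        return list(trail) + [name]
    return None


if __name__ == "__main__":
    for language in ["Akkadian", "Aramaic", "Arabic", "Ethiopic"]:
        print(language, "<-", " > ".join(path_to(language)))
```

Written this way, it is easy to see that Arabic sits closer to Aramaic than to Ethiopic in this model, which is the point on which competing classifications differ.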

Other scholars propose theories that significantly deviate from this model. Lipiński (1997), for example, argues that there are four, not two, macro-divisions. According to him, the Semitic that was spoken in northern Syria developed into the North Semitic branch (composed of Ugaritic and Amorite), in Iraq into the East Semitic branch (Akkadian and Babylonian), in the Levant and northern Arabia into the West Semitic branch (e.g. Arabic, Aramaic, and Canaanite), and finally in southern Arabia and East Africa into the South Semitic branch (Ṣayhadic, Ethiopic, and Modern South Arabian). It should be noted that this is a highly idiosyncratic view that is not widely accepted.

Whichever is the correct division, the largest number of living Semitic languages can be found in East Africa, including Amharic, Gurage, Tigre, and Tigrinya. Outside of that region, the most widely spoken Semitic language is Arabic, with its highly diverse spoken dialects. Additionally, there are Modern Hebrew; the Neo-Aramaic languages, like Assyro-Chaldean, Turoyo, and Neo-Mandaic; and the Modern South Arabian languages, like Soqotri, Mehri, and Shehri.


In order for a group of languages to constitute a “family,” they must share a large number of unique linguistic features that cannot be attributed to mere borrowings or simultaneous development through contact between speakers. The following is a sampling of the unique features that define Semitic languages.



Maimonides’ autograph draft of his legal code, Mishneh Torah (from the Cairo Genizah), in cursive Sephardic script (Egypt, c. 1180). Source.

All Semitic languages have or had a series of “emphatic” consonants. In Proto-Semitic there were at least five: ‹ṭ, ḳ, ṱ, ṣ, ṣ́›. Only (standard) Arabic has maintained this series. The Canaanite languages, like Phoenician and Hebrew, had only three, having merged ‹ṱ› and ‹ṣ́› with ‹ṣ›. Ethiopic languages also merged these consonants, but many of them also developed new emphatics, such as ‹ṗ› and ‹č̣›.

The term “emphatic” is necessarily imprecise because these consonants are realized differently in the daughter languages. Originally, they were most likely ejective consonants. Only the Ethiopic and Modern South Arabian languages preserve this pronunciation today. In Arabic and most Neo-Aramaic languages, they are pharyngealized (click here to listen to the difference between plain and emphatic consonants in Arabic). In Maltese and Modern Hebrew, the emphatic consonants have been lost under the influence of European languages.


Every Semitic language has two genders, masculine and feminine. The masculine is usually the base form, while the feminine is indicated with a suffix.



Gospel of Luke in Ge‘ez, from the Church of Gännätä Maryam, c. 1500. Source.

The feminine is marked by the suffix ‹-t›. Examples include Akkadian ‹šarr-at-› “queen,” Arabic ‹bint› “daughter,” Gǝ‘ǝz ‹barakat› “blessing,” and Hebrew ‹rē’šīṯ› “beginning.” Within the Afro-Asiatic family, this is not unique to Semitic languages. The Berber languages, for example, also mark the feminine with ‹t›, but there it is a circumfix (appearing at both the beginning and the end of the word). Thus, ‹amaziɣ› ‘Amazigh man’ is masculine, and ‹tamaziɣt› ‘Amazigh woman’ is feminine.

In a number of Central Semitic languages, like Arabic and Hebrew, this suffix was deleted in isolated words but reappears when the word is part of a phrase. For example, in Arabic the feminine noun ‘writing’ is ‹kitāba›; however, ‘a boy’s writing’ is ‹kitābat walad›. Similarly, in Modern Hebrew these are ‹ktiva› and ‹ktivat yéled›.


Semitic languages characteristically divide the second person pronoun into masculine and feminine forms. Examples of the singular forms of “you,” respectively, include Akkadian ‹atta, atti›, Arabic ‹’anta, ’anti›, Geʻez ‹’ānta, ’ānti›, and Hebrew ‹’attā, ’at›. Separate forms also exist in the plural pronouns.

However, in some modern languages and dialects this distinction has been lost or reduced. In many Arabic dialects and in other languages like Harari, spoken in Ethiopia, the second person plural no longer distinguishes between genders. Others, such as Maltese and Tunisian Arabic, have lost the distinction in the singular as well.


The vast majority of Semitic lexicons are composed of abstract roots of three, or sometimes four, consonants. Words are formed by applying these roots to different patterns of vowels and consonants.


Folio from the “Blue Qur’ān.” Second half 9th–mid-10th century CE. Source.

For example, in Arabic the root k-t-b denotes ‘write.’ By itself, it cannot be used in a sentence. However, applying it to the pattern C₁āC₂iC₃, which means ‘doer of [root],’ results in ‹kātib› ‘writer.’ Applying it to the pattern maC₁C₂aC₃, ‘place of [root],’ results in ‹maktab› ‘desk, office’ (literally, a place where one writes). Other words formed from this root include ‹maktūb› ‘letter’; ‹kitāba› ‘writing’; ‹kātaba› ‘he corresponded (with)’; and ‹istiktāb› ‘dictation.’

This system is very flexible, and it is possible to create new roots from existing words and even from foreign languages. For example, the root ’-m-r-k originates from “America” and means “Americanize.” Thus, applying it to an existing verb pattern for four-consonant roots, ‹taC₁aC₂C₃aC₄a›, results in ‹ta’amraka› “he became American.”
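The root-and-pattern mechanism described above can be sketched in a few lines of Python. This is only an illustration, not a model of Arabic morphology: the function name apply_pattern is hypothetical, roots are assumed to be written with hyphens between consonants, and every consonant slot in a pattern is assumed to be numbered explicitly (C₁, C₂, and so on). Real word formation also involves sound changes that simple slot-filling ignores.

```python
import re

# Map subscript digits to plain digits so slots can be matched with a simple regex.
SUBSCRIPTS = str.maketrans("₁₂₃₄", "1234")


def apply_pattern(root: str, pattern: str) -> str:
    """Fill the numbered slots C₁, C₂, ... of a pattern with the root's consonants.

    `root` is a hyphen-separated string such as "k-t-b" (an assumed notation).
    `pattern` is a template such as "C₁āC₂iC₃".
    """
    consonants = root.split("-")
    plain = pattern.translate(SUBSCRIPTS)

    def fill(match: re.Match) -> str:
        # Slot numbers are 1-based; list indices are 0-based.
        return consonants[int(match.group(1)) - 1]

    return re.sub(r"C(\d)", fill, plain)


if __name__ == "__main__":
    print(apply_pattern("k-t-b", "C₁āC₂iC₃"))          # kātib
    print(apply_pattern("k-t-b", "maC₁C₂aC₃"))         # maktab
    print(apply_pattern("’-m-r-k", "taC₁aC₂C₃aC₄a"))   # ta’amraka
```

Because the root and the pattern are kept separate, a newly coined root such as ’-m-r-k can be run through any existing four-consonant pattern without changing the function.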

These are only a few of the features that distinguish Semitic languages. There are many others, such as a verb conjugation system originally centered around aspect rather than tense; object and possessive pronouns as suffixes; and the dual number in verbs, nouns, and adjectives. These subjects are for another day perhaps.

I leave you with a side-by-side comparison of hypothesized Proto-Semitic words and their attested forms in four daughter languages (Akkadian, Arabic, Hebrew, and Geʻez):

Gloss     Proto-Semitic  Akkadian   Arabic   Hebrew       Geʻez
FATHER    *’ab-          ab-        ’ab-     ’āḇ          ’ab
MOTHER    *’imm          umm-       ’umm-    ’imma        ǝmm
GOD       *’il(-āh-)-    il-        ’ilāh-   ’ēl(ōh)
HOUSE     *bayt          bīt-       bayt-    bayiṯ, bēṯ   bet
ROPE      *ḥabl-         ebl-       ḥabl-    ḥeḇel        ḥabl
PEACE     *s₁alām-       šalām-     salām-   šālōm        salām
WATER     *māy-          mū-        mā’-     mayim        māy
BLOOD     *dam-          dam-       dam-     dām          dam
EAR       *’uḏn-         azn-/uzn-  ’uḏn-    ’ōzen        ’ǝzn
EYE       *‘ayn-         īn-        ‘ayn-    ‘ayin, ‘ēn   ‘ayn
HAND      *yad-          id-        yad-     yāḏ          ’ǝd
TONGUE    *lis₁ān-       lišān-     lisān-   lāšōn        lǝssān
TOOTH     *s₁inn-        šinn-      sinn-    šēn          sǝnn
BULL/OX   *ṯawr-         šūr-       ṯawr-    šōr          sor
HORN      *ḳarn-         qarn-      qarn-    qeren        ḳarn


Black, J., George, A., and Postgate, N. (2000). A Concise Dictionary of Akkadian.

Huehnergard, J. and Rubin, A. (2011). “Phyla and Waves: Models of Classification of the Semitic Languages.” In S. Weninger (ed.), The Semitic Languages: An International Handbook.

Lipiński, E. (1997). Semitic Languages: Outline of a Comparative Grammar.

Leslau, W. (1989). Concise Dictionary of Geʻez.

On the etymology of ‘orange’

The word for ‘orange’ in most European languages ultimately comes from the Sanskrit नारङ्ग ‹nāraṅga›. This was borrowed into Persian as نارنگ ‹nârang›, which in turn entered Arabic as نارنج ‹nāranj›. Spanish and Portuguese borrowed this word from Arabic with minimal modification as ‹naranja› and ‹laranja›, respectively. The English form “orange” comes from Old Provençal ‹auranja› via Old French ‹orenge›. The dropping of the initial n in these latter examples is most likely a result of rebracketing, in which a phrase like ‹une norenge› was interpreted by speakers as ‹une orenge›.

Despite such historical influence, Arabic ‹nāranj› has become obsolete in the modern language, and the current word is برتقال ‹burtuqāl›, which is a medieval transliteration of “Portugal.” A number of other languages also derive their word for ‘orange’ from the name Portugal. Examples include Albanian ‹portokall›, Amharic ብርቱካን ‹bərtukan›, Georgian ფორთოხალი ‹p’ort’oxali›, Greek πορτοκάλι ‹portokáli›, Kurdish پرته‌قاڵ ‹pirteqal›, Neapolitan ‹purtuallo›, Persian پرتقال ‹porteqāl›, and Turkish ‹portakal›.

The reason for this intriguing quirk is that Portugal was once the largest exporter of oranges in the Mediterranean. The fruit became so associated with the country that it took on its name in virtually all the languages spoken in the wider region.

In a way, the spread of this name is a medieval-era meme. It demonstrates how ideas (for example, the association of oranges with Portugal) migrate through language itself.