Deciphering German Compounds pt. 2: Phonology

For a quick recap, one thing that has irked me since I began my foray into learning German is that there doesn’t appear to be much rhyme or reason to the way that compound words employ (or don’t employ) linking elements between the constituents. Sometimes, as in a word like Abendessen (“evening” + “meal” AKA “dinner”), you simply squish two nouns together and there you have it: a compound. In other cases, like Geburtstag (“birth” + s + “day”), you have to add one of several linking elements, which are essentially meaningless phonemes or strings of phonemes that – somehow – make the compound well-formed. There are a number of linking elements in modern German, some more productive than others, that occur in different morphological and phonological contexts, none of which are particularly easy to deduce as a non-native speaker.

In my last post, I looked at these linking elements from a morphological perspective, citing authors like Nübling and Szczepaniak whose analyses of German linking elements look at the way they have evolved alongside the rest of the language. Such analyses demonstrate how rules and formulations that once “made sense” to the average speaker have been skewed by the natural forces of language change, so that now it is much less obvious when certain elements should be used. Their conclusion is that the most productive linking element in contemporary German is the –s– morpheme, which serves a few different functional purposes in the language as it is currently spoken. Other elements such as -(e)n-, -e-, -(e)ns- and –es- are far more limited in their applicability, and for the most part cannot be applied to newly-created compound words.

This, for me, cleared a lot of things up. However, historical morphology only tells half the story when it comes to these linking elements. The other half can be accounted for by phonological considerations – that is, the way that the words actually sound. In their 2008 paper, Nübling and Szczepaniak argue that phonology, particularly prosody, also significantly influences the distribution and use of the linking element –s– in modern German.

The requisite background knowledge for their explanation is as follows: in any language, certain syllables receive more stress than others. In the word* “English,” for instance, the first syllable (Eng-) is louder than the second syllable (-lish). In phonology, syllables are grouped into rhythmic units known as feet, each built around one stressed syllable, and languages tend to show preferences for where in the foot that stress falls. A trochee is a foot in which the stressed syllable comes first, followed by an unstressed one – so in the previous example, the word “English” is a trochee. German, in its modern form, is a language that strongly prefers trochees, i.e. it prefers words that consist of a stressed syllable followed by an unstressed one.

In their phonological analysis, Nübling and Szczepaniak assert that the linking elements -es-, -er-, -e-, and –(e)ns– pair only with monosyllabic words – like Hund (“dog”) – to create trochees and therefore improve the phonological status of the word when it appears in a compound. Basically, adding one of these elements makes the word more phonologically “normal.” However, the remaining linking element, –s-, cannot serve this purpose (i.e. create an unstressed syllable): syllables must contain a vowel, so adding –s– alone to a word doesn’t change its number of syllables. What, then, is –s– doing, and why is it so heavily distributed in modern German compound words?

In the course of its evolution, German has taken a number of steps to optimize the structures of its words, both in their size (by reducing them, ideally to one foot per word) and in their form (by making the boundaries between words clearer). One of the changes brought about by this optimization has been to increase the number and quality of consonants at the beginning and end (the “edges”) of a word, while deleting or reducing them in the middle. This allows users to more easily discern where the edges of a word most likely are, even if they haven’t actually heard it before, and creates more noticeable contrasts between individual words in a phrase or sentence. However, not all pre-existing words can be optimized to the same extent, which causes certain structures to defy the phonological “ideal” for German words.

This brings us to linking –s-, and its distribution in contemporary compounds. Nübling and Szczepaniak point out that you don’t see this element added to words like Auto, which end in a vowel. Rather, it seems to apply only to words that already end in a consonant, making them more complex from a phonetic perspective. The authors thus propose that linking -s- is more likely to occur in words that stray from the “ideal” structure of a word in German, i.e. those that are not trochees or those that contain multiple feet. So, words like Beruf (with stress on the second syllable) will take an -s- when they appear in a compound (e.g. Berufsfahrer = Beruf [“job”] + s + Fahrer [“driver”]), but Anruf (with stress on the first syllable) will remain bare. The addition of the –s– at the edge of the first constituent in a compound serves to further emphasize the right edge of the word, a task usually accomplished by consistent stress patterns.

This analysis is supported by a corpus search, revealing that 85% of words containing an unstressed prefix (such as be-, ver-, ent-, and ge-) are followed by a linking –s– when used in a compound, whereas this is the case for only 36% of words containing a stressed prefix (e.g. an-, um-, and über-). In other words, if the first constituent is a trochee, it is far less likely to need a linking element. So, while it is not a hard and fast rule, the generalization that –s– indicates phonologically ill-formed words does appear to be a pretty solid way to predict whether a new compound will contain a linking -s– or not.
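For fellow learners who like to see a rule of thumb written down, here is a rough sketch of the generalization above in Python. It is only my own illustration, not anything from Nübling and Szczepaniak: the function name, the labels it returns, and the trick of checking spelling instead of actual stress are all assumptions. It only knows the prefixes listed above, and it will misjudge any word that merely happens to start with those letters.

```python
# A rough rule of thumb, NOT a real model of German stress.
# Prefix lists come from the corpus result described above; everything else
# (names, labels, the spelling-based check) is a hypothetical illustration.
UNSTRESSED_PREFIXES = ("be", "ver", "ent", "ge")   # followed by -s- in 85% of cases
STRESSED_PREFIXES = ("an", "um", "über")           # followed by -s- in only 36% of cases
VOWELS = set("aeiouäöüy")


def has_prefix(word, prefix):
    """Treat `prefix` as a real prefix only if a vowel is left over after it.

    This keeps a word like "Berg" from being read as be- + "rg".
    """
    rest = word[len(prefix):]
    return word.startswith(prefix) and any(ch in VOWELS for ch in rest)


def guess_linking_s(first_constituent):
    """Guess whether the first part of a compound tends to take a linking -s-."""
    word = first_constituent.lower()
    if not word:
        return "unknown"
    if word[-1] in VOWELS:
        return "unlikely"        # vowel-final words like Auto don't take -s-
    if any(has_prefix(word, p) for p in UNSTRESSED_PREFIXES):
        return "likely"          # e.g. Beruf -> Berufsfahrer
    if any(has_prefix(word, p) for p in STRESSED_PREFIXES):
        return "less likely"     # e.g. Anruf stays bare
    return "unknown"             # the heuristic has nothing to say


for noun in ("Beruf", "Anruf", "Auto", "Geburt"):
    print(noun, "->", guess_linking_s(noun))
```

Since these are tendencies and not rules, the labels are deliberately vague: "likely" means "bet on –s–", not "–s– is guaranteed".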

So, I suppose what I’ve learned from this exercise is that there is, in fact, some degree of consistency in the way linking elements are used in German. I reckon I’ll be considering the stress patterns a lot more carefully now when I try to guess how a particular compound is formed, and maybe I’ll be right more often than I previously have been. Maybe this will help other German learners, or maybe it won’t, but even if it doesn’t I hope that you at least found it interesting.

*For the sake of full transparency, the type of “word” I am actually referring to in the second half of this post is a phonological word, which is not always the same as a morphological word. If you’re a linguist/phonetician, then you probably already know what the difference is, and if you’re not but you want to know anyway, check out this article for a general overview.

Sources

Ewen, Colin J., and Harry Van Der Hulst. The Phonological Structure of Words: An Introduction. Cambridge: Cambridge UP, 2000. Print. Cambridge Textbooks in Linguistics.

Nübling, Damaris, and Renata Szczepaniak. On the way from morphology to phonology: German linking elements and the role of the phonological word. Morphology 18, 1–25 (2008). https://doi.org/10.1007/s11525-008-9120-7

Deciphering German Compounds pt. 1: Morphology

English speakers love to expound on German’s seemingly infinite potential to create extremely specific, often lengthy, compound words by combining two or more simple words into a single term. It’s not like German is the only language to do this; in English we also use compound words with considerable regularity (e.g. doghouse, artwork), and something like 80% of contemporary Mandarin vocabulary is made up of multiple independent words/syllables (e.g. 女人 nǚ rén, literally “female” + “person” AKA “woman”). In fact, German is not even the most extreme case: agglutinative languages such as Turkish and Hungarian can express in a single word what English speakers need a whole sentence to convey. Nevertheless, German compound words continue to be (in)famous in their own right, and admittedly make up a more robust portion of day-to-day vocabulary than English compounds.

Unlike Chinese and English, whose compound words are formulated in a pretty straightforward manner, some German compounds can be a bit quirky. The long and short of it is that some compound words contain one of a handful of added units whose linguistic status is, well, questionable to say the least. Let’s start with a straightforward example that doesn’t contain any of these mystery elements: the term for “linguistics:” Sprachwissenschaft. This word is made up of three morphemes (= meaningful units of language), two of which also happen to be complete words: Sprach(e) (“language/speech”) + wissen (“to know”) + schaft (a nominalizer somewhat comparable to English -ness or -ship). So, Wissenschaft means “science” (literally “knowing-ness”) and Sprachwissenschaft means “language/speech science,” AKA linguistics. Easy peasy.

However, there are other compound words where one of the aforementioned mystery elements intrudes between two or more of the morphemes. Unlike all of the morphemes in the Sprachwissenschaft example, each of which slightly changes the meaning of the word, these intruders don’t necessarily carry any meaning on their own; they are just additional sounds that happen to appear in some compounds but not others.

An example of one such problematic compound in German is the word “birthday:” Geburtstag. Again, three units appear in this word: Geburt (“birth”) + s (???) + Tag (“day”). The -s- seems to just appear out of nowhere, without any clear motivation for its inclusion. Yet, if you were to say Geburttag, you would immediately be pegged as a non-native speaker. Unlike –schaft in the example above, the –s– in Geburtstag doesn’t do anything to alter the meaning; it just has to be there.

Altogether, there are six different linking elements, whose frequency and productivity vary considerably: –s- (as in Geburtstag), -(e)n- (as in Blumenstängel, “flower” + “stem”), -es- (as in Bundesliga, “federation” + “league”), -e- (as in Hundefutter, “dog” + “food”), -er- (as in Kinderarbeit, “child” + “work/labor”), and -(e)ns- (as in Namensschild, “name” + “sign”).
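If it helps to see the six elements "in action," here is a small Python sketch that pulls the linking element out of a compound when you already know its two halves. This is just my own toy illustration (the function name and the set of spellings are assumptions), and it only works when the first noun appears in the compound unchanged, so it can't handle cases like Sprach(e) losing its final -e.

```python
# Toy illustration only: spellings of the six linking elements listed above,
# with the optional (e) written out both ways. Names are hypothetical.
LINKERS = {"s", "n", "en", "es", "e", "er", "ns", "ens"}


def linking_element(first, second, compound):
    """Return the linking element between `first` and `second` in `compound`.

    Returns "" if the nouns are simply squished together, and None if the
    leftover middle is not one of the known linking elements.
    """
    f, s, c = first.lower(), second.lower(), compound.lower()
    # The compound has to start with the first noun and end with the second,
    # without the two overlapping.
    if not (c.startswith(f) and c.endswith(s)) or len(f) + len(s) > len(c):
        return None
    middle = c[len(f):len(c) - len(s)]
    if middle == "":
        return ""
    return middle if middle in LINKERS else None


examples = [
    ("Abend", "Essen", "Abendessen"),
    ("Geburt", "Tag", "Geburtstag"),
    ("Blume", "Stängel", "Blumenstängel"),
    ("Bund", "Liga", "Bundesliga"),
    ("Hund", "Futter", "Hundefutter"),
    ("Kind", "Arbeit", "Kinderarbeit"),
    ("Name", "Schild", "Namensschild"),
]
for first, second, compound in examples:
    print(compound, "->", repr(linking_element(first, second, compound)))
```

Of course, this only describes compounds you already know; it says nothing about which element a new compound should take, which is exactly the hard part.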

Lately I’ve been tearing my hair out trying to find any sort of pattern in the way that these additives appear in German compounds. At first, I thought it was to prevent repetition of the same sound across two adjacent morphemes. So, the problem with Geburt + Tag would be that it situates two /t/ sounds right next to each other, so you insert another phoneme to split them up. Then I came across Schritttempo, combining Schritt (“step”) with Tempo (“speed/pace”) without a problem, so that hypothesis was out. Next, I thought maybe it had to do with the plural form of the first noun, but then I looked up the plural form of Geburt (pl. Geburten) and was disproven once again – besides, “birthsday” doesn’t make much sense anyway, unless you’re talking about twins, for instance. A quick Google search yielded nothing satisfactory either: all the articles written on linking elements in compound words for German learners essentially boiled down to “it’s random, sorry,” and that wasn’t good enough for me.

So, I decided to delve into what linguists have to say about the role of these so-called “linking elements” in compound formation. It turns out this is a hot topic that lots of German speakers – even as far back as one of the Grimm brothers in 1877 – have taken a shot at explaining. While I didn’t find any sort of quick and easy way for foreign language learners to decode or formulate their own compound nouns in German, I thought I would share some of the analyses that I came across because, well, they’re interesting, and maybe other people struggling with this particular aspect of the German language would at least appreciate some more insight than just “it’s random.”

I’m currently between institutions so I don’t have as much access to academic journals as I would like, but nonetheless I managed to download two papers by Nübling and Szczepaniak on the role of linking elements in German compound words, which provide a pretty decent overview of recent work on this topic. I discovered that there actually are a number of both morphological and phonological explanations for different types of linking elements in German compound words; not only that, but the adequacy of these proposals is something that continues to be debated in Germanic linguistics to this day.

I was originally going to write one summary post, but it ended up being wicked long and I figured it was worth splitting into two (still lengthy) summaries, one of the morphological analyses and one of the phonological analyses. We’ll start with morphology because M comes before P.

The Morphological Analysis

First of all, it is worth noting that the morphological status of linking elements remains somewhat debatable, for the simple reason that these units don’t seem to carry any obvious meaning on their own. You may recall that the technical definition of a morpheme is “the smallest meaningful piece of a language,” which distinguishes it from a phoneme, which is just a single sound (often, but not always, associated with one letter) that carries no meaning on its own. Thus, cat is a morpheme that contains three phonemes: the individual sounds /k/ /æ/ /t/ don’t mean anything on their own, but together they represent a specific concept. Some phonemes can also be morphemes; this is the case with English –s, because it changes the meaning from singular to plural. However, the key characteristic in that scenario is that the addition of the –s influences the way you interpret the word it connects to. Adding –s– to the first word of a compound in German, like in Geburtstag, doesn’t do anything to change the meaning. This raises the first of several questions linguists have about the special additions that appear in certain German compound words: should they really be considered morphemes, and if so, what is their morphological role?

One argument for the morphological status of linking elements – and one of the most common explanations I’ve seen in non-academic articles for language learners – is that they are descended from old case endings. While not entirely explanatory, this is often true. Some compounds, especially those that transcend time and technological evolution such as cock’s comb, were formed long ago based on case rules that have now changed or become obsolete. Compounds that utilize certain plural forms or the genitive (= possessive) case may now seem completely random, but actually made perfect sense according to the grammar at the time. So, even though these compounds cannot be deconstructed according to rules of the contemporary language, at one point in time they could; and because they were used often, the old form stuck around.

Demske argues that many of the linking elements actually come from the now-obsolete possessive form of the first noun in a given compound. Many possessive forms of nouns are created by adding -es to the end of the word, e.g. das Kind (“the child”) becomes des Kindes (“of the child”/”the child’s”). By this reasoning, a compound word like Kindesalter could be analyzed as “age of the child.” However, that isn’t what Kindesalter means: it is actually “childhood,” not “the child’s age.” So while for some words, this analysis holds water (e.g. Brückenzoll “bridge’s toll”), for many more contemporary compounds it leads us down the wrong path.

This may be the case with many of the less common linking elements that appear in certain compounds such as –(e)n, -es, and -e-, but still doesn’t predict when or where a specific linking element will appear. Furthermore, the same noun can have different linking elements in different compounds, not all of which can be attributed to an outdated case ending. Kinderschuh, for example, is made up of Kind (“child”) + er (???) + Schuh (“shoe”), and includes the linking element -er-; Kindesalter (Kind [“child”] + –es- [???] + Alter [“age”]), on the other hand, uses a different linking element, and Kindheit uses none! Cases like these demonstrate that the motivation behind different linking elements cannot be entirely explained by case alone.

Worth addressing at this point is that sometimes adding a linking element to a stem noun, such as Kind + -er, creates what looks identical to the plural form of the noun. In some cases, the meaning is not altered by a plural vs. singular reading, such as Kinderschuh, which could read as “a child’s shoe” or “a shoe for children” interchangeably. Other instances, though, highlight the fact that these elements – while sometimes overlapping with the same suffixes used to create a plural noun – play a different role in the language. For example, Kindermund (“child’s mouth”) does not refer to one mouth shared by many children (a terrifying image), but rather one mouth belonging to one child. The extent to which the pluralizing versus linking version of a morpheme, such as –er, should remain distinguished is a point of contention, but is worth addressing for the sake of avoiding confusion: if you come across a compound that looks like it contains a plural, consider that it could also just be a linking element playing tricks on you.

Nübling and Szczepaniak point out that not all linking elements are equally productive, either. They note that only -s- can be combined with a large number of nouns, whereas others such as –n- or –es- are limited to a handful of specific words. In other words, if you were to guess blindly at which linking element should go between two nouns you want to smush together, you’d be more likely to be right if you chose –s- compared to the others. That is not to say that –s- is the default linking element in German compounds, just that it is more widely distributed in the contemporary language.

The final point on the morphology of –s– that I thought worth sharing is that it appears to serve another functional purpose in certain contexts, namely the formation of complex compounds (i.e. those containing three or more nouns); the consolidation of a phrase into a single compound; and the reopening of a “closed” suffix e.g. –schaft. For each of these functions, Nübling and Szczepaniak offer the following illustrative examples:

  1. Complex compounds: Hof (“courtyard”) + Mauer (“wall”) = Hofmauer (“courtyard wall”), vs. Friede (“peace”) + Hof (“courtyard”) + Mauer (“wall”) = Friedhofsmauer (“graveyard wall”)
  2. Phrase -> compound: Richtung (“direction/trend”) + weisen (“to point”) = richtungsweisend (“trendsetting”)
  3. Reopening a closed suffix: Freund (“friend”) + schaft (“ship,” a closed suffix) = Freundschaft (“friendship”), vs. Freund (“friend”) + schaft (“ship”) + Preis (“price”) = Freundschaftspreis (“special price”)

Other linking elements are more limited in scope and applicability to different kinds of nouns. In fact, Nübling and Szczepaniak argue in their more recent paper that, aside from –s-, German linking elements are still very much influenced by their historical function(s) as inflectional markers, and that this in turn impacts how they are understood and used by native speakers.

The –(e)n- linking element, for instance, is derived from the genitive form of weak nouns (i.e. those whose possessive form requires the addition of –n, rather than the standard –(e)s) and, when applied to weak feminine forms, often allows a plural interpretation. Similarly, –er- is also often associated with a plural reading, and is dependent on other morphological features of the noun it attaches to, such as gender and declension.

Sometimes the same element can be traced back to two distinct grammatical origins, and these origins impact the rules by which it combines with other morphemes nowadays. For example, –e- is historically a possessive marker for plurals, but it also emerged as a linking element for short words like Tag (“day”). In the latter case, there are a set number of cases where –e– occurs, and it cannot be paired with new words. In the former case, we see it pop up often with compounds involving animals, e.g. Hund (“dog”) + e + Futter (“food”) = Hundefutter (“dog food”), where a plural interpretation of the noun it attaches to is often preferred.

Finally, there are –(e)ns- and –es-, both of which can only be used with a set of specific nouns in defined contexts. Compounds involving these elements are the ones that are probably easiest to just memorize.

Although the purely morphological explanations are perhaps more palatable to a general audience, the fact is that different aspects of language – like morphology and phonology – don’t exist on entirely different planes. They have no choice but to interact, and typically not in very clear ways. Thus it is not surprising that many scholars have begun to argue that neither morphology nor phonology on its own accounts entirely for the quirks of German compound formation. Additionally, Nübling and Szczepaniak assert that grammatical rules that apply at the phonological level are more formalized and, thus, easier for speakers to deduce compared to those that apply at abstract levels like inflectional class. In other words, rules that apply across-the-board to certain phonological contexts, such as after specific vowels or when two specific consonants appear next to each other, are easier to figure out than those that target more abstract notions like grammatical gender.

So next time on Things Nobody Asked Me to Write, I’ll summarize what I found out about the phonological analyses of German compound linking elements, and what that means for me as a second language learner.

Sources

Aronoff, M., Fuhrhop, N. Restricting Suffix Combinations In German And English: Closing Suffixes And The Monosuffix Constraint. Natural Language & Linguistic Theory 20, 451–490 (2002). https://doi.org/10.1023/A:1015858920912

Nübling, D., Szczepaniak, R. On the way from morphology to phonology: German linking elements and the role of the phonological word. Morphology 18, 1–25 (2008). https://doi.org/10.1007/s11525-008-9120-7

Nübling, D., Szczepaniak, R. Linking elements in German: Origin, Change, Functionalization. Morphology 23, 67–89 (2013). https://doi.org/10.1007/s11525-013-9213-9

Linguistics for Language Learners: Null Subjects

At its core, foreign language learning is a two-step process involving a) uncovering rules that don’t exist in your native language, and b) deciding where and how to apply those rules in your target language. This is the case at both a grammatical and a phonological level – just as you must develop a sense of what sounds and sound sequences are permissible in your target language, you must also learn what words “belong” in a particular order. English speakers know, for example, that mleg isn’t a possible word in their language, while glat is. They also know that Green van the is is a terrible sentence, while The van is green works just fine. This sort of instinctual knowledge doesn’t transfer effortlessly into your second/third/fourth language; rather, it has to be built up with time and exposure to a variety of speakers and contexts. This can be particularly tricky when the grammar of your target language requires you to blatantly disregard the rules of your native language.

Although some generalizations about grammar, such as every sentence should have a predicate, can be easily taught and explained, others are rife with exceptions that can only be fully grasped with time and exposure. Today I’m going to introduce a typological aspect of human language that linguists have spent years attempting to quantify, and beginning language learners tend to struggle with: the concept of Null Subjects. Although it may not seem obvious, many grammatical characteristics of language interact with one another, so that mastering one feature oftentimes cascades into mastering others in your target language. Over the past seventy years or so, a number of trends have been identified in the distribution of null subject languages. Although this isn’t typically taught in many language courses, I think that considering these correlations may be beneficial for students who, like me, want to understand the logic behind different grammatical constructions. It is also helpful to see how seemingly unrelated phenomena in language can interrelate and create the surface features that we become familiar with as we learn a new language. I’m hoping that by introducing some of these less accessible, theoretical concepts from contemporary linguistics, I can help some fellow language learners gain a deeper understanding of their target language and maybe even start to look for other syntactic patterns.

| Feature | Consistent Null Subject Languages (Es, It, EPt…) | Partial Null Subject Languages (Fi, BPt…) | Radical Null Subject Languages (Jp, Ko, Th…) | Non-NSL (En, Fr, De…) |
|---|---|---|---|---|
| Person restrictions | No | Yes – 3rd person (he/she/it/they) only | No | N/A |
| Verb inflections | Many | Some | None | Some |
| Include subject for emphasis | Yes | No | No | N/A |
| “Dummy” pronouns (e.g. it or there) | No | No | No | Yes |
| Arbitrary pronouns (e.g. one or you) | Yes (e.g. se or si) | No | No | Yes |
| Null objects | No | No | Yes | No |
A tabular summary of everything you’re about to learn – isn’t it fabulous?!

What are null subjects?

First, we need to establish some definitions. If you were raised in an Anglophone country, you might recall being taught things in grammar school like every sentence must have a subject. The subject of a sentence is the person/thing/place/idea that performs the action, e.g. Jane threw the ball to Rahim. Jane is the one throwing, so she is the subject. On the other hand, the ball is the object of the sentence, because it is being directly affected by Jane’s action. If we were to omit the subject of the sentence and just say Threw the ball, your listener would immediately ask “who?”, and your English teacher would die a little inside. Similarly, you also have to specify the object, if there is one: Jane threw to Rahim would prompt your listener to ask “threw what?” because as language users, we expect others’ contributions to contain all the information needed to describe an event. It simply isn’t good conversational practice to exclude key bits of information like who or what was involved.

Fast forward to Spanish class, though, and suddenly that rule is turned on its head: in Spanish, it is perfectly fine to just say Threw the ball! What gives? How do you know who is doing the throwing? Miraculously, if you were to say Tiró la pelota to another Spanish speaker, they wouldn’t have to ask “who” or wonder if they perhaps misheard you, because Spanish is a Null Subject Language. This means that native speakers of Spanish don’t have a rule that requires them to include a subject in every sentence; nobody would think it sounds strange to say Threw the ball, or Is empty, or Am tired. A subject that doesn’t get pronounced in a sentence is called a null subject, and the languages that allow you to use null subjects are known as Null Subject Languages, or NSLs. English is a non-NSL, while Spanish is known as a consistent NSL (more on that momentarily).

If you aren’t yet proficient in a null subject language, this might be a difficult concept to wrap your head around. How can speakers of a language be okay with omitting such a crucial piece of information? Well, the thing is, it’s not omitted – it’s just represented differently. The exact theoretical explanation isn’t important, but to illustrate what I mean, let’s again consider the English rule I mentioned above: every sentence must have a subject. You may recall learning shortly thereafter that certain types of sentences – like commands – do, in fact, allow you to drop the subject without sacrificing meaning. For instance, if someone were to look at you and say “Go,” you would know it was a command and that the implied subject of the sentence is you. They wouldn’t need to specify it because other, non-linguistic aspects of the dialogue (i.e. eye contact, tone of voice, facial expression) did that for them. I was taught that this is called a “‘you’ understood,” because although it isn’t explicit, both speaker and hearer understand that you is the subject of a command.

This isn’t the only situation, though, in which English speakers would accept a null subject: think about how often you omit I in text messages (“Just got home!”), or when it would otherwise sound redundant (“Where’s Jonah?” “Sick.”). In all likelihood, you’ve omitted hundreds of subjects in your lifetime even if you are a monolingual English speaker, and English is technically not an NSL. When you imagine these scenarios, it becomes easier to see how other languages might permit you to drop the subject on the regular, not just in some very specific contexts: often, the subject of a single sentence might also be the subject of the whole conversation, so it’s unnecessary to state it over and over again. In these situations, context simply fills in the blanks.

Rich Inflection = Null Subjects?

If you ever learned Spanish, you may have heard that subjects can be dropped because the information about who was involved is conveyed elsewhere in the sentence. For a long time, linguists believed that there was a direct relationship between whether a language allowed null subjects and whether its verbs were “richly inflected.” This means that you change part of the verb depending on who actually did the action: I threw (tiré in Spanish), you threw (tiraste), he threw (tiró), we threw (tiramos), etc. It doesn’t matter whether the subject is expressly stated in the sentence, because that same information is contained elsewhere, namely on the main verb. On the other hand, non-NSL such as English and French have what are called “impoverished” inflectional systems – we use the same form of the verb no matter who is performing the action, so that information can only be gleaned by looking at the subject (notice how all the forms of threw in English are the same in the above example). Because many of the European null subject languages also have rich inflection, this generalization, known as the Taraldsen Generalization, prevailed for years in the linguistics community, and remains a popular explanation of NSL tendencies in foreign language classrooms. The Taraldsen Generalization succinctly explains our observations about one of the most prevalent syntactic differences between the Indo-European languages, and scientists love simplicity.
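To make the "information lives on the verb" idea concrete, here is a tiny Python sketch using the same four forms as above. The dictionaries and function names are my own hypothetical illustration, not a linguistic tool: the point is simply that a Spanish verb form maps back to exactly one subject, while the English form maps back to all of them.

```python
# Past-tense forms taken from the paragraph above; the dictionary layout and
# function names are illustrative assumptions.
SPANISH_TIRAR = {"I": "tiré", "you": "tiraste", "he/she": "tiró", "we": "tiramos"}
ENGLISH_THROW = {"I": "threw", "you": "threw", "he/she": "threw", "we": "threw"}


def possible_subjects(paradigm, verb_form):
    """List every subject that is compatible with the given verb form."""
    return sorted(person for person, form in paradigm.items() if form == verb_form)


def subject_is_recoverable(paradigm, verb_form):
    """The subject can be left unsaid if the verb form picks out exactly one person."""
    return len(possible_subjects(paradigm, verb_form)) == 1


print(possible_subjects(SPANISH_TIRAR, "tiró"))    # one candidate
print(possible_subjects(ENGLISH_THROW, "threw"))   # every person is a candidate
```

This is the Taraldsen-style intuition in miniature: the richer the endings, the less work the subject has to do.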

Unfortunately, it’s not quite as simple as rich inflection = null subject typology. When you expand your sample size beyond Indo-European languages, the correlation begins to break down. Thus more recently, comparative syntacticians Ian Roberts and Anders Holmberg have proposed a more specific characterization of the world’s languages based on several other trends they and other authors have observed across more robust samples. They noticed that certain other syntactic features and rules correlated with the availability of null subjects in different languages. Based on these findings, they separated NSL into three subcategories: consistent null subject languages, partial null subject languages, and radical null subject languages. Now we’re going to take a closer look at what each of these subtypes looks like grammatically.

Consistent Null Subject Languages (Spanish, Italian, European Portuguese, etc.)

Many of the Romance languages fall into this category, with French being a notable exception. These are the languages that drove many analyses of null subjects before linguists were confronted with evidence from non-European languages. These languages consistently omit the subject from most types of sentences, including statements, questions, and commands. Because they do not require an overt (= non-null) subject, consistent NSL don’t have “placeholder” pronouns like English it (e.g. It is raining); since subjects aren’t required in the first place, there is no point in including a “dummy” subject that doesn’t actually refer to anything. Another side effect is that, because null subjects in these languages are so common, overt subjects slightly change the meaning of a sentence. This means that you should only include the subject explicitly to emphasize or disambiguate the people involved. To clarify, let’s look at some examples from Spanish below:

Ya estamos llegando. ‘[We] are already arriving.’ Notice that there is no overt subject in this sentence, yet thanks to the conjugation of estar, it is perfectly clear who is involved in the action. This is the typical structure for many sentence types in Spanish (and other CNSLs).

Va a llover mañana. ‘[It] is going to rain tomorrow.’ In the English version of this sentence, it doesn’t actually refer to a person or thing – but because English requires every sentence to have a subject, we have to use a sort of “placeholder” in instances like this, when the sentence describes a state of affairs with no discernible cause (who/what is going to rain tomorrow? The sky? God? The clouds? There is no clear answer based on the sentence alone). These are sometimes called expletive pronouns, and their only job is to fill in the subject space when nothing else can. Other examples of this type of pronoun include It is Saturday, There is no more rice, or It is cold. Unlike English and other non-NSL, CNSL do not require any sort of dummy pronoun in such contexts.

Tienen que despedirse, pero ella se puede quedar. ‘[They] have to say goodbye, but she can stay.’ In this sentence, the subject of the first clause (‘they’) is null, while the subject of the second clause (‘she’) is made explicit. Here, you are emphasizing that she is able to stay as opposed to the rest of her group. Depending on the wider context, you may also be introducing ‘her’ for the first time as a topic of conversation – i.e. you are shifting the focus to her as an individual, whereas previously the focus of the discussion was the group as a whole. In cases like this, the subject is made explicit to highlight a change of topic and keep everyone on the same page.

No se puede fumar aquí. ‘[You] cannot smoke here.’ The se in this sentence is an instance of what is known as an arbitrary pronoun – it is the rough equivalent of English ‘one’ (formal) or ‘you’ (informal), as in One mustn’t go to bed without supper. CNSL have a unique pronoun used for these situations, and you must use this pronoun if you are trying to express a generalization or rule: if you were to simply say No puede fumar aquí, that can only mean ‘[He/she] cannot smoke here’ – it is limited to an individual, not society in general. In instances where you might use one in English, in CNSL you would use the special arbitrary pronoun (se in Spanish and Portuguese, si in Italian).

If you’re learning a consistent null subject language, it may be enough to simply be aware of the different distribution of overt subjects between this and your native language. It is fairly safe to assume that you don’t need to include a subject in most sentences, especially if you are speaking extensively about the same person/thing. When you immerse yourself in the media of your target language, try to pay attention to when native speakers do make the subject explicit in a sentence, and consider what about the context required them to do so. As a rule of thumb, at least for simple and compound sentences, you can leave the subject out unless that would create excessive confusion (like if you’re trying to talk about multiple people at the same time).

Partial Null Subject Languages (Finnish, Brazilian Portuguese, etc.)

These languages are considered “partial” because null subjects are not quite as common as in consistent null subject languages, and are limited as far as what contexts they can appear in. Generally, partial null subject languages require an overt subject when speaking in first or second person (‘I’ and ‘you’), but you can drop the subject in many third-person (‘he/she/it’) constructions. You might notice that Brazilian Portuguese falls into this category, while European Portuguese remains a consistent NSL. This is due to natural processes of language change: although the language of the European colonizers was indeed a CNSL, over time and likely as a result of extended contact with other languages in Brazil, the local variety of Portuguese began to require the subject more often than not. A similar thing may have happened in the history of French: although nowadays, it is a non-NSL like English and German, the French of yesteryear may have been a CNSL just like Italian and Spanish. Sometimes languages just do that.

Anyway, partial null subject languages differ from CNSL in several key respects. First, whether or not you include a subject does not affect the interpretation of the sentence – in other words, including the subject doesn’t emphasize it in any way. These languages typically have less rich agreement systems than consistent NSL, which makes sense according to the Taraldsen Generalization. In Brazilian Portuguese, for example, if you were to omit você ‘you’ from a sentence like Gosta de estudar português? ‘Do [you/he/she] like to study Portuguese?’, it isn’t immediately clear who you are asking about – the other person or some third party. Partial NSL do allow you to use a null subject in sentences expressing generalizations; there is no special word for it like se in Spanish. This means that Não pode fumar aqui could be interpreted as either ‘[He/she] cannot smoke here’ or ‘[You/one] cannot smoke here’, and you would need to specify ele/ela if you are referring to a particular individual.

Like CNSL, these languages don’t have a “dummy” pronoun like English – to express ‘it’s raining’ in Brazilian Portuguese, you can simply say Está chovendo ‘[It] is raining’, omitting the subject just like you would in Spanish. However, if you’re talking to someone else or about yourself, it is more common to include a subject e.g. Eu estudo português ‘I study Portuguese’ (whereas in European Portuguese you could drop the eu ‘I’).

In terms of language learning for PNSLs, for beginners the best generalization is to include subjects for all of your sentences until you develop more of an instinct for which ones can be omitted in which contexts. Nobody will misunderstand you if you always include a subject in your sentences, whereas they might be confused if you omit the subject where it is not warranted.

Radical Null Subject Languages (Japanese, Korean, Thai, Mandarin?, etc)

As the name may suggest, radical null subject languages are, in a way, more extreme than the others I’ve discussed so far. Why do we consider them extreme? Well, unlike CNSL and PNSL, these languages lack any sort of subject agreement on their verbs, meaning that information about the subject of a sentence isn’t expressed elsewhere – it’s just inferred from context. Not only that, but these languages allow another type of information to be omitted from a sentence as well: the direct object. This is why, in Mandarin, it is perfectly understandable when you say 看完了 kàn wán le ‘[I] finished reading [it]’, even though neither the subject nor the object of kàn is directly expressed and both can change depending on the context in which the sentence is spoken. RNSL are the languages that threw a wrench in the old-school null subject typology: if the ability to omit subjects relates directly to whether a language has rich agreement or not, then how can languages like Japanese and Korean be even more liberal than CNSL?

Radical null subject languages share some structural features with both CNSL and PNSL. Like CNSL, languages such as Japanese do not limit the contexts in which a null subject can appear, but, unlike in CNSL, explicitly including the subject of a sentence doesn’t imply emphasis or any sort of difference in meaning. Also like CNSL, these languages do not have “placeholder” pronouns – to express that it is raining, you can simply say 下雨 xià yǔ ‘[It] is raining’, no subject needed. On the other hand, like PNSL, radical null subject languages do allow you to express generalizations like You can’t smoke here using a null subject: 这里不可以抽烟 zhè li bù kě yǐ chōu yān literally ‘Here not allowed smoke’. They also do not allow indefinite interpretations of an omitted subject: if you are speaking about a boy, you must make that explicit, whereas the boy can be dropped.
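Since the three subtypes overlap in different ways, it can help to lay the properties out side by side. The sketch below is only a summary of what this post says about each type; the feature names are invented for illustration and are not standard terminology.

```python
# Properties of each null subject type, as summarized in this post.
# Feature names are made up for this sketch.
NSL_TYPES = {
    "consistent": {
        "verb_agreement": "rich",
        "restricted_contexts": False,
        "overt_subject_adds_emphasis": True,
        "dummy_pronoun": False,
        "null_generic_subject": False,  # needs se/si instead
    },
    "partial": {
        "verb_agreement": "reduced",
        "restricted_contexts": True,
        "overt_subject_adds_emphasis": False,
        "dummy_pronoun": False,
        "null_generic_subject": True,
    },
    "radical": {
        "verb_agreement": "none",
        "restricted_contexts": False,
        "overt_subject_adds_emphasis": False,
        "dummy_pronoun": False,
        "null_generic_subject": True,
    },
}


def shared_features(type_a, type_b):
    """List the features on which two NSL types have the same value."""
    a, b = NSL_TYPES[type_a], NSL_TYPES[type_b]
    return sorted(name for name in a if a[name] == b[name])


print(shared_features("radical", "consistent"))
# ['dummy_pronoun', 'restricted_contexts']
print(shared_features("radical", "partial"))
# ['dummy_pronoun', 'null_generic_subject', 'overt_subject_adds_emphasis']
```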

Learning a RNSL as a native English speaker may be challenging at first, because many sentences require you to infer much more from context than you would in English. Context plays a huge role in discerning what is meant by an utterance lacking both a subject and an object, and this is something that requires practice to become automatic. It might be helpful to remember that, if new information or topics are being introduced, they will be explicitly stated – as I said in the introduction, it’s simply bad etiquette to not include pertinent information in a conversation. In other words, if items are missing from a sentence, you can assume that the information was made available in a previous sentence or by the context in which the interaction takes place.

Theories of language must somehow account for the massive variation in rules and structures in a relatively straightforward manner – otherwise children would never be able to learn the grammar of their language. Those who subscribe to the theory of generative grammar believe that sentences are organized hierarchically, with different segments dominating one another. This is what allows us to understand that Stacey’s mom’s boyfriend’s cat is a cat owned by one person – not shared by Stacey, her mom, and her mom’s boyfriend.

A popular proposal was the abstract idea of pro (read as “little pro”), which is basically a silent pronoun. The idea was that pro could be inserted into the subject position of a sentence instead of other alternatives such as she, he, I etc. NSL speakers have pro as an additional word in their mental repertoire, while non-NSL speakers do not. However, recently the idea of pro has become increasingly abstract for a variety of theoretical reasons, one of which is that it seems to function quite differently in RNSL than it does in CNSL and PNSL. The availability of null objects in particular has caused considerable debate as to whether pro can adequately account for all the variations we see cross-linguistically: in Mandarin, for instance, the rules governing null objects are almost the complete opposite of those governing null subjects. However, if the two are both attributable to pro, we would expect them to behave similarly.

There are a number of other theories regarding null subject typology, but for the sake of brevity I’ll end my spiel there. The takeaway is that plenty of linguists are sitting in their offices tearing their hair out over this issue likely as I write, and if/when they converge on an explanation, I’ll let you know. Thanks for tuning in.

Further reading:

Barbosa, P., Duarte, M. E. L., & Kato, M. A. (2005). Null Subjects in European and Brazilian Portuguese. Journal of Portuguese Linguistics, 4(2), 11–52. DOI: http://doi.org/10.5334/jpl.158

Barbosa, P. (2011). Pro‐drop and Theories of pro in the Minimalist Program Part 1: Consistent Null Subject Languages and the Pronominal‐Agr Hypothesis. Language and Linguistics Compass. Blackwell Publishing Ltd. https://doi.org/10.1111/j.1749-818X.2011.00293.x

Camacho, J. A. (2013). Null/overt subject contrasts. In Null Subjects. Cambridge University Press. https://doi.org/10.1017/CBO9781139524407

Camacho, J. A. (2013). The nature of the Extended Projection Principle and the Null Subject Parameter. In Null Subjects. Cambridge University Press. https://doi.org/10.1017/CBO9781139524407

Frascarelli, M., & Casentini, M. (2019). The Interpretation of Null Subjects in a Radical Pro-drop Language: Topic Chains and Discourse-semantic Requirements in Chinese. Studies in Chinese Linguistics, 40(1), 1–45. https://doi.org/10.2478/scl-2019-0001

Roberts, I. (2019). Parameter Hierarchies and Universal Grammar, chapter 3. Oxford University Press.

Sato, Y. (2019). Comparative syntax of argument ellipsis in languages without agreement: A case study with Mandarin Chinese. Journal of Linguistics, 55(3), 643–669. https://doi.org/10.1017/S0022226718000403

Intro to German Verbs

This post links to external sites which may offer further explanation or clarification of the topic at hand. These are links that I personally found helpful at the time of writing, and are certainly not exhaustive of the linguistics/language-learning resources available online. Unless otherwise noted, I am not affiliated with any of the linked authors and am not responsible for the content of their blogs/websites.

I’d like to start off by saying that I am by no means a fluent or even close to fluent German speaker, so please, if you are and you notice any glaring errors, report them to me so I can fix them and avoid disseminating blatantly false information. Vielen Dank 🙂

Now that we’ve got that out of the way, it’s time to talk about everyone’s favorite part of learning a new language: grammar! Being a syntax student, one of my favorite parts of starting a language is the first time that my new skills become useful in reading syntax papers. It may not shock you to learn that linguists love to cite examples from a diverse array of the world’s grammars when arguing new ideas because, well, the whole point is to come up with theories and hypotheses that are universally applicable. German is one of those languages that — for various reasons — seems to come up a lot in papers, so when I first encountered a German-language example and didn’t have to skip straight to the English gloss, suffice it to say I was tickled pink.

The other day I was talking with my friend who is also learning German about their version of a golden rule: German has a funky syntactic feature that linguists have not-so-creatively dubbed V2, which stands for “verb 2nd” and means that the finite verb ALWAYS has to occur, well, in the second position of a main clause. This is perhaps the most important rule for beginners to learn (…according to me, a beginner) because other than that, compared to English, German has fairly free word order (meaning that the meaning of a sentence does not depend heavily on the order of its words). As far as syntax goes, the position of the finite verb is V (get it?) important.

“Er, that’s great, but what’s a finite verb?” you ask. In high school foreign language class terms, the finite verb is the opposite of the infinitive – it has already been conjugated with respect to the subject of the sentence. English infinitives are denoted by to (e.g. to cry), Spanish by -ar/er/ir (fumar, cometer, escribir), and German by -en (sagen, mögen). The infinitive is the form of the verb that you’ll see, for example, in a dictionary or vocabulary list. When you want to take an infinitive verb out of your vocabulary list and stick it in a clause as the main (finite) verb, it undergoes some changes: you have to conjugate it so that it agrees with the person/number/gender of the subject (although in some languages and for some subjects, the finite form of the verb looks the same as the infinitive). Now, you have a finite verb.

Now that we’ve got finite/infinitive verbs straightened out, let’s look at a translation of the sentence “I am doing that now.” Once you’ve figured out where the verb goes (hint: what does V2 stand for?), the rest of the words can be flung all around and, thanks to other cool features of German like case-marking, the meaning doesn’t change. Let’s see what I mean:

Ich mache das jetzt
Das mache ich jetzt
Jetzt mache ich das
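If it helps to see the rule as a procedure, here is a small Python sketch that pins the finite verb in second position and shuffles everything else around it. The function name is made up, and the sketch only enforces V2: it happily produces orders (like jetzt mache das ich) that are much more marked than the three listed above, so don’t read every output as equally natural.

```python
from itertools import permutations


def v2_orders(finite_verb, phrases):
    """Yield every ordering of `phrases` with the finite verb placed second."""
    for order in permutations(phrases):
        first, *rest = order
        yield " ".join([first, finite_verb, *rest])


for sentence in v2_orders("mache", ["ich", "das", "jetzt"]):
    print(sentence.capitalize())
```

Whatever ends up in front, mache never moves from slot two.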

Maybe one day I’ll write a follow-up post touting the benefits of overt case in a free word order language, but for now let’s be content with the knowledge that a) all of the above sentences are perfectly grammatical, and b) all of the above sentences convey basically the same thing. Now, it’s time to continue our journey through the Vonderful Vorld of Werbs.

Another way of looking at it is that the finite verb is the one that would typically come first in a string such as “I have been eating.” Have, been, and eating are all verbs – but only have is conjugated to ‘match’ features of the (pro)noun performing the action. If you changed the subject from I to he, for example, you would have to change have to has but been eating is unaffected.

So far, we’ve gotten a pretty solid grasp on basic, single-verb sentences. The process is just as easy if the subject of the sentence is performing multiple actions at once. If you’re listing a series of actions, such as “I wash, brush, and kiss the dog,” (ich wasche, bürste und küsse den Hund) all these verbs are finite, and part of the same “phrase” occupying the second position. We know this because you can remove any one of them or change the order in which they’re listed, and still be left with a perfectly grammatical sentence. You could view it as a more succinct way of saying, “I wash the dog. I brush the dog. I kiss the dog.” – you’ve basically just eliminated unnecessary redundancy, but the verbs themselves are all conjugated with respect to the subject, and thus all belong in the second position of the sentence.

This is a good opportunity to discuss what exactly I mean by “position” because, as you may have noticed, three different words can’t all be the second word at the same time. When I say “position,” I actually mean this should be the second phrase in the sentence. A phrase is just a string of words that all relate to one main item and cannot (normally) be separated; for example, the man in the yellow hat is a phrase in which man is the main item, and all the other words modify that item. Linguists use something called constituency tests to determine whether a string of words is its own phrase or not: the easiest one is probably the Question test, where a group of words counts as a phrase if it can form the answer to a question, e.g. “Which man?” The man in the yellow hat. “What did the man in the yellow hat do?” Buy bananas.

Why am I boring you with constituency tests? Because this can help you decide whether or not you have already filled the first position in a German sentence with a (non-verb) phrase, and whether you’re encroaching on the finite verb’s rightful place in second position. A noun phrase like Das Kind, for example, can fill the first position, despite it being two words, because das is an article modifying Kind, and it would be simply unacceptable to split them up by jamming a verb in the middle. Das schöne Kind can also fill that position, because again, schöne is modifying Kind and cannot be separated. Prepositional phrases can (and often do) also fill the first position, but this has no bearing on where the finite verb ends up.
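Here is a tiny Python sketch of the difference between counting words and counting phrases. The example sentence (Das schöne Kind isst einen Apfel, ‘The beautiful child eats an apple’) and the function names are my own, and the split into phrases is done by hand rather than by any kind of parser.

```python
def phrase_position(phrases, target):
    """1-based position of `target` among the phrases of a sentence."""
    return phrases.index(target) + 1


def word_position(phrases, target):
    """1-based position of `target` when the sentence is split into words."""
    words = " ".join(phrases).split()
    return words.index(target) + 1


# Sentence pre-split into phrases by hand (an assumption of this sketch).
sentence = ["Das schöne Kind", "isst", "einen Apfel"]

print(word_position(sentence, "isst"))    # 4
print(phrase_position(sentence, "isst"))  # 2
```

The verb is the fourth word but the second phrase, and it is the phrase count that V2 cares about.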

V2 is all well and good until you want to go beyond simple clauses, or even just use a modal auxiliary like can, want, or should. When a modal modifies a verb in a German sentence, it becomes the finite verb and takes the second position. What about the infinitive it modifies? As I mentioned before, only the finite verb(s) are subject to the V2 rule, so when you use more than one verb in a sentence like “I want to go home,” all the infinitives pile up at the end: Ich will nach Hause gehen (literally “I want to home to go”). This is where things start to get confusing for English speakers, as we’re used to hearing all of our verbs clumped together; in German, you often have to wait until the end of the whole sentence to find out what action is actually being performed. If you have several dependent clauses, that could mean two or even three infinitives all smooshed together at the end of the sentence, completely isolated from the modal (or any other type of verb) that modifies them.
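As a rough template, a main clause with a modal can be assembled slot by slot: first phrase, finite verb, everything else, then the infinitive(s). The helper below is a hypothetical sketch of that template only; it does not model the order among several stacked infinitives or anything about dependent clauses.

```python
def build_clause(first_phrase, finite_verb, middle_phrases, infinitives):
    """Assemble a main clause: finite verb second, infinitives at the end."""
    parts = [first_phrase, finite_verb, *middle_phrases, *infinitives]
    return " ".join(parts) + "."


print(build_clause("Ich", "will", ["nach Hause"], ["gehen"]))
# Ich will nach Hause gehen.
print(build_clause("Ich", "will", ["dich", "nicht"], ["küssen"]))
# Ich will dich nicht küssen.
```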

How do you know when you’re dealing with a list of verbs versus a verb plus a modal? Well, first, you could simply memorize a list of modals – there aren’t that many. However, this doesn’t necessarily help you to see the underlying structure that motivates the different constructions, and will only get you so far. The difference between “[Modal] + [Verb]” and “[Verb 1], [Verb 2] and [Verb 3],” once again, comes down to phrase structure. In a sentence like “I will wash the dog,” the verbs will and wash are part of the same phrase, and thus cannot be independently manipulated in the same way as a list of verbs: “I will the dog” doesn’t make any sense, and “I wash the dog” is grammatically correct but changes the nature of the action. So, a simple way that you can decide whether or not a verb is a modal is to ask yourself whether it can stand on its own in a frame like “[ ] the dog.” A modal doesn’t make sense here: “I can the dog,” “I will the dog,” “I should the dog,” “I shall the dog”… all of these are absolutely terrible sentences in English. German is a little more forgiving on this point, since some modals can take an object on their own (Ich will den Hund is a fine way to say ‘I want the dog’), so the test works best when you run it on the English translation.

I’ll end with a bunch of German sentences, their English glosses, and translations:

Ich will dich nicht küssen.
I want you-ACC not to kiss.
‘I do not want to kiss you.’

Er muss mir etwas Kaffee kaufen.
He must me-DAT some coffee-ACC to buy.
‘He must buy me some coffee.’

Wir backen und essen einen Kuchen.
We bake and eat a cake.
‘We bake and eat a cake.’

Meine Schwester lässt ihren Mann niemals seinen Hund waschen.
My sister lets her husband-ACC never his dog-ACC to wash.
‘My sister never lets her husband wash his dog.’

Das Kind schreit und tritt.
The child screams and kicks.
‘The child screams and kicks/is kicking and screaming.’

For a general overview of V2 across languages, check out the Wikipedia article.