A Phonology of Simlish

This may in fact be the most pointless research I’ve ever conducted, but sometimes you just have to carpe diem and do pointless things instead of working on your real research.

If you have never played and will never play The Sims (or you have no interest whatsoever in phonology), then this is not the post for you. That said, almost everybody I know has at least dabbled in the game once in their lives, and has at least commented in passing on the features of the Sims’ language. A very brief overview, just in case you forgot: in The Sims, you are essentially God (or your omnipotent deity of choice), creating and controlling the lives of little virtual people who live in an idealized version of suburban America. Some people play exclusively to build houses, others to live out their wildest dreams, and still others to release their inner psychopath. Regardless of the reason you play, it is more than likely that in your time as a Simmer, you’ve encountered the fictional language used in the game, known as Simlish.

Although it doesn’t actually have a complete syntax or lexicon, Simlish is still an impressive feat and a testament to its creator’s attention to detail. According to a recently published article on the evolution of Simlish, it was originally devised by the voice actors hired to provide the male and female vocalizations for the original edition of The Sims, released in 2000. The creator, Will Wright, had originally envisioned a ‘language’ that was a hodgepodge of features from languages such as Navajo, Ukrainian, and Latin; eventually, this idea was scrapped in favor of the actors’ ad-libbing using gibberish syllables that (mostly) conformed to English phonetics and phonotactics.

Simlish is not intended to be a ‘learnable’ language; its purpose is to make the game accessible for players around the world, regardless of their mother tongue. It allows players to superimpose their own details onto their Sims’ conversations and outbursts, resulting in a much more customizable and imaginative storyline than if the characters spoke a real language. However, Simlish does appear to follow a number of rules as far as its pronunciation is concerned, and that is what I’ve spent the better part of my evening looking at in detail.

The data for this post is taken from the following sources:

Phonemic Inventories

              Bilabial   Labiodental   Alveolar   Palatal       Velar   Glottal
Plosive       p b                      t d                      k g
Nasal         m                        n                        ŋ
Fricative                f v           s z        ʃ                     h
Lateral                                l
Approximant                            r
Glides        w                                   j
Affricate                                         t͡ɕ, tʃ dʒ
The consonant inventory of Simlish. In keeping with IPA tradition, voiceless phonemes are on the left and voiced on the right. All nasals are voiced.
            Front   Central   Back
Close       i, ɪ              u
Close-Mid   e       ə         o
Open-Mid    æ       ʌ         ɔ
Open                          ɑ
The vowel inventory of Simlish. Vowels to the left are unrounded.

If you’re at all familiar with the phonemes of English, you’ll probably notice several similarities. Unsurprisingly, considering both the original voice actors were American, the phonology of Simlish bears a striking resemblance to Mainstream American English. It is worth noting, however, that the Simlish in the original Sims game also sounds significantly different from later iterations – for example, in the first minute of this video, you can hear a flap/trill, a /ʒ/, a /ɣ/, and a /t/ that more closely resembles the Spanish /t/ than its English counterpart. Sometimes I suspect a bit of retroflexion on some of the plosives, but it’s hard to say for sure. Regardless, this is an artificial example of the influence that language contact can have on sound shifts over time: in the past 20 years, Simlish phonology has taken on more English-like features as a result of extended contact with the English-speaking world, especially since most of its ‘users’ (i.e. actors and writers) are native English speakers.

Next, we’ll break down some of the phonotactics (= rules governing how individual phonemes are combined) of Simlish.

Phonotactics

  • The most complex syllable is (C)(C)V(C)(C)
  • Permissible simple onsets: /p/, /b/, /t/, /d/, /k/, /g/, /m/, /n/, /f/, /v/, /j/, /w/, /r/, /l/, /s/, /z/, /ʃ/, /t͡ɕ/, /dʒ/, /tʃ/
  • Permissible simple codas: /l/, /m/, /r/, /g/, /f/, /b/, /t/, /ʃ/, /ŋ/, /k/, /s/
  • Permissible complex onsets: /bl/, /pl/, /sp/, /kl/, /gl/, /mj/, /sk/, /fl/, /fw/, /tw/, /bw/, /mw/
  • Permissible complex codas: /mg/, /nd/, /ps/, /bz/, /kt/, /lt/
  • Allows syllabic /r/ e.g. [grb]
  • Primary stress tends to fall on the last syllable
  • Some of the canonical vowels are diphthongized, as in MAE: /e/ becomes [eɪ] and /o/ becomes [oʊ] in an open syllable
  • Other diphthongs include /aɪ/ (as in hi) and /aʊ/ (how)
  • /v/ sometimes seems to be in free variation with /b/, e.g. /ˈbʌdiʃ/ thank you can also be pronounced /ˈvʌdiʃ/
  • Word-initial voiceless plosives (/p/, /t/, and /k/) are aspirated
  • Word-final nasal + consonant clusters may delete the second consonant
  • Light and dark /l/ are in complementary distribution with one another, like in MAE
  • Intervocalic /t/ and /d/ may become flaps, like in MAE
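
The onset and coda lists above amount to a small lookup problem, so they can be sketched as a syllable checker. This is only a rough illustration: the inventories are copied from the bullets above, while the function name `is_licit_syllable`, the pre-segmented (onset, nucleus, coda) input, and the treatment of syllabic /r/ as a possible nucleus are my own assumptions, not part of any official description of Simlish.

```python
# Inventories copied from the phonotactics list above.
SIMPLE_ONSETS = {"p", "b", "t", "d", "k", "g", "m", "n", "f", "v", "j", "w",
                 "r", "l", "s", "z", "ʃ", "t͡ɕ", "dʒ", "tʃ"}
SIMPLE_CODAS = {"l", "m", "r", "g", "f", "b", "t", "ʃ", "ŋ", "k", "s"}
COMPLEX_ONSETS = {("b", "l"), ("p", "l"), ("s", "p"), ("k", "l"), ("g", "l"),
                  ("m", "j"), ("s", "k"), ("f", "l"), ("f", "w"), ("t", "w"),
                  ("b", "w"), ("m", "w")}
COMPLEX_CODAS = {("m", "g"), ("n", "d"), ("p", "s"), ("b", "z"), ("k", "t"),
                 ("l", "t")}

# Assumption: nuclei are the listed vowels, the listed diphthongs,
# and syllabic /r/ (as in [grb]).
NUCLEI = {"i", "ɪ", "u", "e", "ə", "o", "æ", "ʌ", "ɔ", "ɑ",
          "eɪ", "oʊ", "aɪ", "aʊ", "r"}


def _margin_ok(segments, simple, complex_):
    """Check an onset or coda: empty, one attested consonant, or an attested cluster."""
    if len(segments) == 0:
        return True
    if len(segments) == 1:
        return segments[0] in simple
    return segments in complex_


def is_licit_syllable(onset, nucleus, coda):
    """Return True if the syllable fits (C)(C)V(C)(C) with attested margins."""
    onset, coda = tuple(onset), tuple(coda)
    return (nucleus in NUCLEI
            and _margin_ok(onset, SIMPLE_ONSETS, COMPLEX_ONSETS)
            and _margin_ok(coda, SIMPLE_CODAS, COMPLEX_CODAS))


print(is_licit_syllable(["b", "w"], "u", []))          # True: /bwu/
print(is_licit_syllable(["g"], "r", ["b"]))            # True: syllabic /r/ in [grb]
print(is_licit_syllable(["s", "t", "r"], "ɔ", ["ŋ"]))  # False: three-consonant onset
```

Because the checker only knows what is attested in my data, a False result means "not observed", not "impossible in Simlish".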

Comparison to English

Although the influence of American English on Simlish phonology is extremely evident, there are some minor differences between the two. First, Simlish lacks dental phonemes, particularly /θ/ and /ð/ (sounds at the beginning of thin and the, respectively) which are prevalent in many dialects of English. Simlish is also more conservative when it comes to syllable structure: English permits up to three sequential consonants in the onset of a word (e.g. strong) and four or five in the coda (e.g. sixths /sɪksθs/ or angsts /æŋksts/, depending on dialect); on the other hand, Simlish allows a maximum of two consonants in each position. There are some other, more minor differences as well regarding the distribution of phonemes in words/syllables. Simlish allows many, but not all, of the same phonemes in a simple coda as English: the /d/, /l/, and /z/ phonemes are unattested in the data I looked at, although /l/ and /t/ appear together in a complex coda.

Many English speakers have compared Simlish to ‘baby talk’ and, after examining the phonology of the language, the reason becomes clear: Simlish allows plenty of consonant + /w/ clusters that are typically associated with children’s early attempts to produce words containing a consonant + /l/ or /r/ (e.g. /bwu/ for blue) and that, /tw/ aside, are largely absent from adult English. It also seems to contain a greater number of words that begin with glides (don’t quote me on that – I haven’t actually run a formal analysis), which may also influence English-speaking listeners to perceive it as infantile.

Conclusion

There is an obvious influence of English phonology and syllable structure on the phonology of Simlish. The Simlish phonemic inventory consists almost exclusively of English vowels and consonants, and many of its rules governing stress assignment and allophonic variation are borrowed directly from English. However, there are some non-trivial differences between the languages: Simlish syllables are maximally CCVCC, whereas English allows up to three consonants in the onset and four/five consonants in the coda of a single syllable. The types of consonant clusters also differ, with Simlish allowing several more C + glide combinations as well as the /mg/ cluster in final position. These rather salient features of Simlish phonology may be to blame for English speakers’ assessment of the language as ‘baby talk.’

Of course, the above assessment has been made on the basis of limited acoustic data and without the input of native Simlish speakers (mostly because they don’t exist). And again, seeing as the language lacks a concrete syntax or morphology, it is rather difficult to postulate underlying forms for the surface forms of words and phrases presented in The Sims – rather, I have operated on the assumption that Simlish is a fully faithful language when it comes to mapping underlying forms onto surface forms. Future researchers of the language would do well to analyze sample utterances using speech analysis software such as Praat, or otherwise contact potential informants for deeper insights as to its structure.

And with that, I’m off to finish some reading for my dissertation. Dag dag!

Linguistics Outside of Academia

Sometimes, we spend so much of our time with like-minded people that it can be difficult to take a step back and view our academic specialty through a broader lens. During school terms, I’m surrounded every day by others who think linguistics and language are just as fascinating as I do, so it’s rare that I find myself needing to explicitly state the worthwhileness of our cause. Particularly when it comes to language, the vast majority of people don’t often take a step back and marvel at just how crazy this uniquely human trait actually is, because it comes to us so naturally. When I’m home between terms or interacting with other scientists outside of linguistics, on the other hand, I often find myself worrying about how my passion is perceived by my audience, and whether the widespread applications of a background in linguistics have gone unnoticed. Recently, I’ve also started hearing classmates remarking on the hopelessness of their situation if they decide to leave academia – and I staunchly believe that this is simply not true! My response, as per usual, was to sit down and write about all these misconceptions, and this is the result: a broad overview for the general public – as well as those considering a degree in linguistics – of the field and its applications.

In a nutshell, linguists dissect and classify (human) language scientifically in order to better understand its function and limitations in the human mind. We also seek to answer more existential questions as to how language evolved, how it is represented in the brain, and whether it is an innate and/or unique characteristic of human beings, as well as hundreds of other questions that fall under one of these umbrellas. In order to do this, we use a combination of theoretical and empirical methods, just like the more well-known sciences, and we conduct analyses according to the scientific method: we generate hypotheses, look at what past research has to say about our hypotheses, conduct experiments, and based on the results, we decide whether to accept or reject our hypotheses. Rinse and repeat.

Aside from the fact that language is inherently fascinating (I’ve been told I may be biased), findings from linguistics can inform a vast range of disciplines, including but not limited to second language pedagogy, diagnosis and treatment of language impairments, translation/interpretation, programming, and artificial intelligence. I’ve heard of actors who studied phonetics in order to learn accents (see: My Fair Lady), professors who create languages such as Parseltongue and Dothraki, and therapists who work with aphasiacs to improve their speech fluency after a TBI. Because language is so central to the human experience, linguistics and language science are relevant to a number of other disciplines and industries. It’s an extremely versatile field, when you move beyond academia.

Off the top of my head: with a degree in linguistics, you can work in…

  1. Curriculum design for schools
  2. Overseeing language education programs (e.g. Rosetta Stone, Duolingo)
  3. Software development and programming, especially natural language processing
  4. Data analysis
  5. Government positions
  6. Language documentation and preservation work
  7. PR and marketing
  8. Lexicography
  9. Speech synthesis technology
  10. Speech-language pathology (with supplementary clinical training)
  11. Audiology (again, with supplementary clinical training)
  12. Developing diagnostic tools for language impairments
  13. Editing/publishing
  14. Technical writing
  15. Translation and interpreting
  16. Journalism
  17. Fictional language construction for media
  18. Accent coaching
  19. Profiling and forensics
  20. Any job that requires foreign language skills, assuming you’re a linguist who actually speaks a foreign language
  21. Traditional research, lab management, etc.

As with any degree, the key is knowing how to market yourself and the skills you’ve developed throughout your training as an undergrad/grad student. A linguistics degree is far from useless, but some of its potential applications may require thinking outside the box and developing skills outside of those required by traditional coursework. What’s most important, in my humble opinion, is that you are passionate about what you study and the ways you can use this knowledge afterwards.

Phonemes R Hard

This post links to external sites which may offer further explanation or clarification of the topic at hand. These are links that I personally found helpful at the time of writing, and are certainly not exhaustive of the linguistics/language-learning resources available online. Unless otherwise noted, I am not affiliated with any of the linked authors and am not responsible for the content of their blogs/websites.

A little more than a year ago, I began my official endeavor into learning German. By the time I first set foot into the classroom, I was what most people would deem a seasoned veteran of foreign language learning: in high school, I studied abroad and opted to take IB Spanish in place of art classes; in undergrad, I racked up credits in Portuguese and American Sign Language; in my spare time, I watched Chinese dating shows and did my best to maintain some semblance of a streak on Duolingo. I was That Person in foreign language classrooms who offered up translations when the instructor was at a loss, who volunteered for every read-aloud exercise, who constantly compared the target language to one of the gajillion others they already spoke.

Still, every time I start a new language, there is the nagging question at the back of my mind of whether this will be the point at which I bottom out. How many grammars will be “too much” for me to handle? How long until I stop being sensitive to phonetic variations like vowel length, prosody, and allophony? The first week of German 110, I could not get a handle on how our instructor pronounced her Rs, and subsequently became quite concerned with whether this was the end of my polyglot journey. Suddenly I was the one left behind in the acquisition of this new phonemic inventory – in my desperation, I sounded like I was alternating between bastardizations of the Spanish, English, and Chinese sound systems every time I tried to imitate the German accent. I went into full-on panic mode and began scouring the literature on German phonology and, almost instantly, my confusion dissipated – crazy how the Internet can swoop in and save the day. A year later, although I would still by no means be mistaken for a native speaker, I can at least pronounce the German /r/ (and, in fact, all its other phonemes!) properly in the majority of contexts. My admittedly brief struggle with the acquisition of a new sound did, however, get me thinking about exactly what features of the letter/sound /r/ make it so difficult to acquire cross-linguistically. Here’s where that led me:

As English speakers, we are quite often misled by the alphabet in the sense that English /r/, which is a comparatively rare sound in the world’s languages, is represented by the exact same symbol across other Indo-European languages despite the fact that it often indicates a completely different phoneme (= a single unit of sound in a given language). So, first and foremost, it is important to distinguish between the orthographic form R and the actual sound – while one is consistent across many Indo-European languages, the other is not. This, I suspect, is the source of a lot of language learners’ confusion.

When I first begin learning the sounds of a language, like most people (and, actually, some speech recognition software), I typically try to draw from my knowledge of other languages, depending on which one sounds superficially most similar. When I started Portuguese, I used a default Spanish-like pronunciation for all its sounds until I became more aware of the differences. Orthographic similarities make it quite easy to map sounds from one language onto another with which it shares an alphabet: there is minimal guesswork required to translate between Spanish /p/ and Portuguese /p/, as the same symbol represents the same sound in both languages. However, this apparent ease of translation can be detrimental when the same letter represents two different sounds in two languages; this is often where I find myself and my classmates getting tripped up, allowing the same letter to trigger an English sound rather than a Spanish sound. The letter R is one in particular that differs quite drastically across Indo-European languages – in fact, from a strictly phonetic standpoint, there is nothing at all that links its variants to one another, except for the fact that they all happen to be represented by the same letter.

Let’s start with English /r/, as this is the one with which I, as a native speaker, am most familiar. I’m not going to get into dialectal variations because, as anyone who’s watched even five minutes of Harry Potter will know, not every native English speaker pronounces their Rs in the same way (and yes, I’m looking at you, Cho Chang). In my dialect — that is, the way we speak in Western New England — /r/ is formed by placing the blade (as opposed to the tip) of the tongue against the alveolar ridge such that the flow of air is restricted but not completely obstructed by the tongue. What is strange about this sound is that it requires just the right amount of air flow between the tongue blade and the roof of the mouth. Too much, and your English /r/ turns into a vowel; too little and it turns into a Portuguese /r/.

Spanish /r/ (the sound in perro), famously, is what is known as an alveolar trill, produced by basically “vibrating” your tongue against the roof of your mouth just behind the teeth — the same place that sounds like /t/ and /d/ are produced in English. Nothing about this place or manner of articulation mimics the English /r/, which is known as a postalveolar approximant and is produced in a completely different way at a different position in the mouth. The fact that these two sounds happen to share the same written form can be detrimental to language learners.

As any phonetician will tell you, of course, phonemic contrasts don’t exist on a binary, but on a rather extensive scale measuring all sorts of complicated acoustic features that don’t really matter much to the layperson, so I won’t get into them now. What is most important (and, indeed, most salient to your average polyglot) is that certain sounds may be more similar to one another by nature of their place/manner of articulation (where/how they are formed in the mouth), or whether or not they require vibration of the vocal folds (= voicing). With this in mind, then, it is easy to see that English and Spanish /r/ are actually quite similar in the sense that a) Spanish /r/ is produced just slightly further forward in the mouth than English /r/ and b) they are both voiced, but different in the sense that Spanish /r/ requires contact between the tongue and the roof of the mouth, which English /r/ does not. This actually makes them more similar than Spanish and Portuguese /r/, which are pronounced at nearly opposite ends of the mouth (Spanish closer to the front, and Portuguese at the back) and share neither manner of articulation nor voicing. Brazilian Portuguese /r/ is known as a voiceless uvular fricative, meaning it is made by almost – but not completely – cutting off the flow of air from the lungs, and it does so at the uvula (AKA “the dangly thing that makes you gag when you accidentally poke it with your toothbrush”) without vibrating the vocal folds.

It turns out that my prior mastery of the Portuguese /r/ would be my saving grace in German, because they also share 2/3 features! German /r/ is just a voiced version of the uvular fricative in Portuguese. I was incredulous when I found this factoid on Wikipedia, but sure enough, the next time I went into German class and listened for my teacher’s production of that elusive phoneme, suddenly there it was, clear as day: the voiced uvular fricative, just a vibration away from my Portuguese teacher’s /r/. Of course, it took longer than that for me to actually achieve a degree of accuracy, especially producing it word-initially (as in rund), but the awareness was there… and as any acquisitionist will tell you, that’s half the battle.
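
For the curious, the “2/3 features” tally can be sketched in a few lines of code. The feature values below are simply the descriptions given in this post (place, manner, voicing); the names `Rhotic`, `RHOTICS`, and `shared_features` are made up for illustration, and strict equality is a deliberate simplification, since it treats alveolar and postalveolar as plain mismatches even though they are neighbors in the mouth.

```python
from typing import NamedTuple


class Rhotic(NamedTuple):
    place: str
    manner: str
    voiced: bool


# Feature values as described in this post, not an exhaustive phonetic analysis.
RHOTICS = {
    "English": Rhotic("postalveolar", "approximant", True),
    "Spanish": Rhotic("alveolar", "trill", True),
    "Portuguese": Rhotic("uvular", "fricative", False),
    "German": Rhotic("uvular", "fricative", True),
}


def shared_features(lang_a: str, lang_b: str) -> int:
    """Count how many of place, manner, and voicing two /r/ sounds have in common."""
    a, b = RHOTICS[lang_a], RHOTICS[lang_b]
    return sum(x == y for x, y in zip(a, b))


print(shared_features("German", "Portuguese"))   # 2 (place and manner)
print(shared_features("English", "Spanish"))     # 1 (voicing only)
print(shared_features("Spanish", "Portuguese"))  # 0
```

A more realistic comparison would score nearby places of articulation as partially similar, which is closer to how a learner actually hears them.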

Fun bonus: Phonemic inventories of the world’s languages, complete with detailed phonetic descriptions and peer-reviewed citations, are available here.