Some months ago in the Lounge (last August, to be exact) we talked about the results of a project on collocations that turned up some interesting features of English. Collocations are a hotter research topic than ever, for English and other languages alike; they're one of the most promising avenues for producing data that can be used in a number of natural language processing (NLP) applications, and reams of data about them are being generated in the hope that computers might someday read, comprehend, summarize, and even translate language as efficiently and accurately as the good old human brain does.

At the moment we're pursuing a fairly extensive survey of English collocations, and we've been struck by an odd phenomenon: quite a large number of English collocations would not be statistically significant were it not for their appearance in fiction. In other words, certain fairly natural word combinations do not appear often enough in print (or in pixels) to register on the collocational radar once their appearances in fiction are discounted.

Here's why this strikes us as odd: isn't a great deal of fiction supposed to reflect what really goes down? Does art imitate life, or doesn't it? Of course, it is not surprising that particular collocations should appear mainly in genre fiction: warp speed and cryogenically frozen, for example, immediately conjure science fiction, and collocations like lace bodice and fiery passion more or less define the romance genre. The startling finding is that a number of ordinary collocations seem to find a home far more often in fiction generally than in any other kind of writing. We'll illustrate with a few examples from the letter B, the spot in the alphabet where we were recently parked for a couple of weeks.

The verb brush, as its wordmap illustrates, is an extremely busy one in English. It's probably safe to say that all of its many verbal uses are rooted in the original noun sense, that is, an implement with bristles set in a handle. As a transitive verb, brush can take a wide range of objects. The top five in terms of salience (roughly speaking, statistical significance) are teeth, hair, strand, lock, and lip. Notice that three of these collocates are about hair. Two of them (teeth and hair) exemplify the sense of brush that means "apply a brush to"; three (hair again, as well as strand and lock) exemplify the sense that means "remove by sweeping with the hand," as in "He brushed a lock of his brown hair aside." The last collocate, lip, carries the sense "touch gently": "Armand lightly brushed her lips with his."
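For the curious, here is roughly what a salience score can look like in practice. The exact measure behind our rankings isn't spelled out here, so this minimal Python sketch assumes logDice (Rychlý 2008), one widely used association score that compares how often two words co-occur with how often each occurs overall. The function names and all the counts are hypothetical, invented purely for illustration.

```python
import math


def log_dice(pair_freq: int, node_freq: int, collocate_freq: int) -> float:
    """logDice association score: 14 + log2(2 * f(x,y) / (f(x) + f(y))).

    The maximum value is 14, reached only when the two words always occur
    together; each halving of the underlying Dice coefficient lowers the
    score by 1.
    """
    if pair_freq <= 0:
        return float("-inf")  # never seen together: no association at all
    return 14 + math.log2(2 * pair_freq / (node_freq + collocate_freq))


def rank_collocates(node_freq: int, pair_freqs: dict[str, int],
                    collocate_freqs: dict[str, int]) -> list[tuple[str, float]]:
    """Sort candidate collocates of one node word by logDice, highest first."""
    scores = {
        word: log_dice(pair_freqs[word], node_freq, collocate_freqs[word])
        for word in pair_freqs
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy counts (NOT real corpus figures) for the verb "brush".
brush_freq = 1_000                                   # occurrences of "brush"
pair_freqs = {"teeth": 120, "hair": 90, "lip": 15}   # "brush" + collocate
collocate_freqs = {"teeth": 2_000, "hair": 6_000, "lip": 3_000}

for word, score in rank_collocates(brush_freq, pair_freqs, collocate_freqs):
    print(f"{word:6} {score:.2f}")
```

Note that "hair" co-occurs with brush less often than "teeth" does and is also a more common word on its own, so it scores lower; a salience ranking rewards exclusivity, not raw frequency alone.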

Now, here's the curious thing: once their appearances in fiction are set aside, the only one of these collocates of brush that still hits the billboards for English generally is teeth. Doctors, journalists, bloggers, scientists, and others (as well as fiction writers) all write about brushing teeth with some regularity. But the other main collocates of brush (hair, strand, lock, and lip) appear overwhelmingly in fiction, up to 150 times more frequently than in any other genre. These collocations are so frequent in fiction that they completely skew the statistics for brush generally. Why is this?
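Both the "150 times" comparison and the skewing effect can be made concrete with a little arithmetic. Raw counts can't be compared across genre subcorpora of different sizes, so this sketch assumes the usual move of normalizing to occurrences per million words first. The function names, genre labels, corpus sizes, and counts are all hypothetical, chosen only to illustrate the calculation, not figures from our survey.

```python
def per_million(count: int, corpus_words: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_words


def genre_ratio(counts: dict[str, int], sizes: dict[str, int],
                genre: str = "fiction") -> float:
    """How many times more frequent a collocation is in `genre` than in the
    next most frequent genre, after per-million normalization."""
    target = per_million(counts[genre], sizes[genre])
    others = [per_million(counts[g], sizes[g]) for g in counts if g != genre]
    best_other = max(others)
    return target / best_other if best_other > 0 else float("inf")


def overall_per_million(counts: dict[str, int], sizes: dict[str, int],
                        exclude: frozenset[str] = frozenset()) -> float:
    """Pooled frequency per million words, optionally leaving genres out."""
    kept = [g for g in counts if g not in exclude]
    return per_million(sum(counts[g] for g in kept), sum(sizes[g] for g in kept))


# Hypothetical counts of "brush ... hair" in three equal-sized subcorpora.
counts = {"fiction": 300, "news": 2, "academic": 1}
sizes = {"fiction": 1_000_000, "news": 1_000_000, "academic": 1_000_000}

print(genre_ratio(counts, sizes))                                   # → 150.0
print(overall_per_million(counts, sizes))                           # → 101.0
print(overall_per_million(counts, sizes, frozenset({"fiction"})))   # → 1.5
```

In this toy setup the collocation looks reasonably common when all genres are pooled, but with fiction set aside it nearly vanishes: the kind of drop that could plausibly push a word combination below whatever significance threshold a study uses.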

Lip, the only one of these collocates with nothing to do with hair, is probably the easiest to decipher: the frequency of "brush lips" in fiction may simply reflect two things:

  1. sex sells, and a modern novel without someone's lips brushing up against someone else's (or up against someone else's cheek, ear, shoulder, etc.) is a dry novel indeed.
  2. while brushing lips may be quite a frequent human behavior, it is mainly associated with a sort of intimacy that is not widely written about in factual genres.

For the hair-related collocations, the evidence points to a different, irrefutable conclusion: fictional characters cannot stop playing with their hair. Fictional heroines constantly brush theirs, often as a backdrop to pondering weighty questions or holding tête-à-têtes with their confidantes. Fictional personages of both sexes seem obliged to spend a great deal of time getting hair out of the way, whether it's their own or others': they brush it back, they brush auburn locks off their foreheads, they brush blond-streaked strands out of their eyes.

The other conclusions that might be drawn from this depend on your view of fiction. It's possible that people are, in fact, always playing with their hair, and that it only ever gets written about in fiction. This conclusion is consistent with the "art imitates life" thesis, and is certainly supported by the fact that grooming of self and others is a basic mammalian behavior: have hair, will groom. On the other hand, you may conclude that hair manipulation is merely a device, a convention that fiction writers use to represent, emblematically, any number of motives on the part of their characters. Finally, and less charitably, you might conclude that all this stuff about hair is mere fictional cliché: something fiction writers throw in because Fiction 101 says the characters must be doing something, and much of the time, when they're nattering on or laying the internal-dialog groundwork for some startling insight, there really isn't anything else for them to do.

It's rather early days in our study of collocations (the rest of the alphabet still lies waiting), and we expect that by the end of it we'll have a more definite view of the rather skewed patterns of English that are represented in fiction. For now, we offer for your perusal some other collocations of words beginning with B that we found appear mainly, sometimes overwhelmingly, in fiction. We have taken the liberty of arranging them in an impromptu narrative. The collocations that are frequent in fiction are in italics:

A powerfully built man knocked and entered the basement. It was Jayde's new bodyguard! I sat bolt upright. Hadn't I bolted the door?
"Just checking up on you," he said, blowing a cloud of smoke in my direction. I noticed a bruise on his bronzed skin.
"What's the fruit basket for?" I blared loudly. I blamed myself. I bit my lip. I drew a breath. Would he think that I was growing bold?
"Just thought we might grab a bite," he blurted out.

This book chapter gives a good background in collocations:
http://tangra.si.umich.edu/~radev/papers/handbook00.pdf

There's a nice PowerPoint presentation here, summarizing approaches to the study of collocations and why they're important:
www.cs.tau.ac.il/~nachumd/NLP/Collocations.ppt