One of the ways in which massive corpora (databases of natural language examples) have revolutionized lexicography is by providing access to a level of statistical analysis of language that was never before possible. The data in a corpus can tell us, with the effort of a few keystrokes—and backed by the effort of hundreds of person-hours of software development—all we need to know about the most frequent uses and collocations of words.

One of the ways in which massive corpora have not yet revolutionized lexicography is in helping us to locate statistically meaningful patterns of language that fly below the radar of the average language reference book, but are still interesting in their own right—and probably worthy of documentation. I explored an aspect of this phenomenon many years ago in the Lounge when I was surveying a number of compounds that don't appear in dictionaries because they are taken to be transparent—that is, easily understandable on the basis of their parts.

One quite frequent compound I talked about, heart-shaped, is only transparent if you know that the heart part of it means "♥", and not "the hollow muscular organ located behind the sternum and between the lungs". Should dictionaries define heart-shaped? I think they should. There are innumerable other such compounds whose frequency and lack of compositionality make them merit fuller dictionary treatment than they get, and corpora can help us to find them.

I searched a massive corpus (nearly 20 billion words) for a number of word patterns to get an idea of the frequency of words made from some combining forms that we often take to be self-explanatory: compound adjectives formed by appending a sense-related participle (-smelling, -looking, -sounding) onto an adjective, as in good-looking, high-sounding, and foul-smelling.
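For readers who want to try this kind of search themselves, here is a minimal sketch of the idea in Python. It is not the corpus tool I used: the sample text, the function names, and the regular expression are all assumptions made for illustration, and the pattern accepts any word before the hyphen rather than checking that it is an adjective, which would need part-of-speech tagging.

```python
import re
from collections import Counter

# Hypothetical stand-in for a corpus; a real one would be read from disk.
SAMPLE = (
    "A good-looking actor gave a high-sounding speech. "
    "The foul-smelling room had natural-looking plants. "
    "Another good-looking, Good-Looking pair arrived."
)

# A first element, a hyphen, then one of the three sense-related participles.
PATTERN = re.compile(r"\b([a-z]+)-(sounding|looking|smelling)\b", re.IGNORECASE)


def count_compounds(text):
    """Return {participle: Counter of first elements}, ignoring case."""
    counts = {}
    for match in PATTERN.finditer(text):
        first = match.group(1).lower()
        participle = match.group(2).lower()
        counts.setdefault(participle, Counter())[first] += 1
    return counts


def league_table(text, participle, n=10):
    """List the n most frequent compounds ending in the given participle."""
    ranked = count_compounds(text).get(participle, Counter()).most_common(n)
    return [(f"{first}-{participle}", freq) for first, freq in ranked]


if __name__ == "__main__":
    for participle in ("sounding", "looking", "smelling"):
        print(participle, league_table(SAMPLE, participle))
```

Run over a real corpus instead of the sample string, the same counting would produce the kind of frequency ranking discussed in the paragraphs that follow.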

The appearance of high-sounding in many dictionaries is evidence of lexicographers doing their job. It's the most frequent *-sounding compound by far, and high is extremely polysemous, making it unlikely that naïve readers will be able to construe that high-sounding means "pretentious, pompous, imposing". The next most frequent *-sounding compounds are: great-sounding, similar-sounding, best-sounding, natural-sounding, nice-sounding, good-sounding, and sweet-sounding.

All transparent? Well, pretty much, but I have a little problem with natural-sounding, a compound that is nowhere defined. What does it mean: having a sound that is natural? Having a sound found in nature? Natural-sounding modifies nouns that have almost entirely to do with speech and language: natural-sounding speech/language/voices/dialogue/English. So it isn't so much the sound we are talking about here, as the sense of what is said that we characterize as natural. This probably bears defining, especially for English learners.

Good-looking, though not particularly challenging to construe, appears widely in dictionaries because it is very frequent and because it means more than "that looks good": it means "attractive, having a pleasing appearance," and it is applied almost exclusively to people. Are there other *-looking compounds that require a closer look? Some that appear in dictionaries include evil-looking, forward-looking, ill-looking, and solid-looking. Perhaps surprisingly, of these four only forward-looking figures in the *-looking frequency league tables of contemporary English.

Other frequent compounds on the same model include various synonyms of good-looking (nice-/great-/best-/better-looking) and natural-looking, professional-looking, and odd-looking. In the case of *-looking compounds I think we have a bit of overzealousness on the part of lexicographers (evil-looking, ill-looking, solid-looking), combined with a spot of neglect, and here again I would single out the natural- compound. What does natural-looking mean? One of the most frequent phrases that instantiates this compound is natural-looking results. What are these? Here are a few cites that suggest the meaning.

Clearly, natural-looking results is a code for something like "no one will know you paid hundreds or thousands of dollars for this" and it is probably a term that could be better treated by obfuscation master Mark Peters. It turns out that most other things that natural-looking frequently modifies are also fakes of various kinds: natural-looking hairline/tan/porcelain crown/breasts/lashes. Lexicographers, take note: there is a defining opportunity here!

Compounds that characterize smells are not problematic when they can be construed thus: x-smelling means "that smells x". So it is pretty much with the two compounds that account for about 50 percent of *-smelling usage and are about equally frequent, though rare in dictionaries: foul-smelling and sweet-smelling. Native-speaker intuition will tell most readers that sweet-smelling does not mean "that smells like sugar" but rather "that smells pleasant," though English learners might benefit from knowing this. On the other hand, it may be that even English learners will be able to correctly infer the meaning of sweet-smelling in context, judging by the nouns that it typically modifies: sweet-smelling flowers/herbs/incense/smoke/fragrance/breath/perfume.
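Finding the nouns that a compound typically modifies can be approximated very crudely by counting the word that comes right after it. The sketch below assumes exactly that simplification; the function name and sample sentences are invented for illustration, and a real corpus query would rely on part-of-speech tags rather than raw adjacency.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for corpus citations.
SAMPLE = (
    "She picked sweet-smelling flowers and burned sweet-smelling incense. "
    "The sweet-smelling flowers wilted; sweet-smelling smoke lingered."
)


def following_words(text, compound, n=5):
    """Count the word immediately after each occurrence of `compound`."""
    pattern = re.compile(
        r"\b" + re.escape(compound) + r"\s+([a-z]+)", re.IGNORECASE
    )
    hits = Counter(m.group(1).lower() for m in pattern.finditer(text))
    return hits.most_common(n)


if __name__ == "__main__":
    print(following_words(SAMPLE, "sweet-smelling"))
```

The same function could be pointed at natural-looking or natural-sounding to see what those compounds keep company with.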

With this brief survey I have barely scratched the surface of compound words that are ready to join the lexicographer's queue. The search for underserved compounds in massive corpora is limited only by the imagination of the searcher, and I think many more riches await. Poking around for some patterns that occurred to me, I found several gaps in the defined lexicon for these word classes:

a) compound adjectives based on parts of the body where the initial adjective characterizes some quality (usually figurative) of the body part. Dictionaries do pretty well on several compounds of -headed, -footed, -armed, -handed, and -legged, but some intriguing possibilities for better treatment include multi-handed, full-fingered (both of these are surprisingly polysemous) and nappy-headed (though I should note that the crowd-definers at Urban Dictionary have dealt with that one adequately).

b) spatial orientation adjectives ending in -sided, -topped, and -bottomed: This class has many figurative uses as well, of which bell-bottomed, carrot-topped, and slab-sided are treated in dictionaries, but mop-topped, before-sided, and false-bottomed are not.

c) the productive combining form well- (usually followed by a past participle, as in well-known, well-rounded, and well-versed) is sometimes taken by dictionaries to be so obvious as to require no definition. Compounds often appear in list form without definition as well, except for those not entirely obvious, such as well-fixed or well-nigh. But others, including well-priced, well-told and well-used show evidence of needing a bit more unpacking by professionals.