Fraud and Deception Detection: Five Language Fingerprints

Last month, I described how computer-aided text-based analysis can help uncover fraud and deception in company communications. But what other insights can we glean from this research into scandal companies?

We used Deception And Truth Analysis (D.A.T.A.) to examine 10 of the largest corporate scandals in recent history and found that the average lead time between our textual identification of deception and the public recognition of possible scandal was more than six years.

Corporate Scandals: Time between Textual Evidence and Public Recognition

Ticker  Company             Size (US$ millions)  Scandal Year  Average Alert Score in Lead-Up  Average Alert Score Pre-Scandal  Years of Warning
ACC     Adelphia            $2,300               2002          -46%                            -44.8%                           2
AIG     AIG                 $3,900               2005          -30.6%                          -52.4%                           12
CUC     Cendant             $640                 1998          -37.9%                          -48.8%                           3
ENRN    Enron               $74,000              2001          -87.4%                          -76.3%                           8
HLS     HealthSouth         $1,400               2003          -42.2%                          -27.1%                           9
LEH     Lehman Bros.        $50,000              2008          -37.2%                          -3.8%                            13
SAY     Satyam              $1,400               2009          -28.9%                          -38.4%                           6
TYC     Tyco International  $600                 2002          -77.1%                          -81.7%                           7
WCOM    WorldCom            $3,800               2001          -33.9%                          -47.9%                           4
WM      Waste Management    $6,000               1997          -39.4%                          -41.1%                           2

Total size: $144,290. Averages: alert score -40.3%, years of warning 6.6.

The obvious question is why. Why does it take regulators and markets so long to recognize these scandals? And a follow-up question: What insights from text-based analysis can we use to better identify these scandals earlier? Let’s take these in turn.

Theory: It’s the Behavior

Why does D.A.T.A. detect deception faster than acutely interested investors and regulators? After thinking about this for a while, we developed a theory, and it boils down to 86.5%. That is the percentage of financial information that is expressed in text, not in numbers, in annual reports. Text communications reveal the behavior of corporate management teams, and that behavior leads to the outcome that is expressed in numerical performance.

So that 6.6 years between the initial indication of deception and when the scandal breaks is the average length of time that a poorly behaving firm can fake it, until they just can’t massage the numbers any longer.

What is interesting is that the two scandals that took over a decade to recognize both involved financial companies: AIG and Lehman Brothers. Their annual reports ran in the hundreds of pages, and the velocity of money cycling through their balance sheets and income and cash flow statements was very, very high. Thus, it took considerable time for their poor behaviors and choices — the inputs — to eventually show up in the numbers, or the outputs.
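The 6.6-year figure can be checked directly from the "years of warning" column in the table above. The following is a minimal sketch of that arithmetic only; the variable names are ours for illustration and are not part of D.A.T.A.

```python
from statistics import mean

# Years of warning per company, copied from the table above.
years_of_warning = {
    "Adelphia": 2,
    "AIG": 12,
    "Cendant": 3,
    "Enron": 8,
    "HealthSouth": 9,
    "Lehman Bros.": 13,
    "Satyam": 6,
    "Tyco International": 7,
    "WorldCom": 4,
    "Waste Management": 2,
}

# Simple (unweighted) average across the 10 scandals.
average_lead_time = mean(years_of_warning.values())

# Companies whose warning period exceeded a decade.
over_a_decade = sorted(
    name for name, years in years_of_warning.items() if years > 10
)

print(f"Average lead time: {average_lead_time:.1f} years")
print("More than 10 years of warning:", over_a_decade)
```

The average is unweighted, so a $600 million scandal counts as much as a $74 billion one. Weighting by size would be a different calculation and is not attempted here.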

If this theory is a valid explanation for that lead time, then scandal ought to have language fingerprints that investors can dust for as either an early warning system or as a second opinion on the normal fundamental work that investment research teams conduct.

Language that Reveals Possible Scandal

After examining the 10 scandals above as well as Wirecard and other more recent controversies, we identified five textual fingerprints that differ from those of more truthful companies by more than 50%.

Scandal Words and Company Communications

Language Fingerprint                Incidence Relative to the Mean
Words Indicating Friendship         +56.1%
Words Indicating Risk               +55.9%
Impersonal Pronouns                 +54.1%
Words That Indicate Differences     -53.6%
Words That Negate a Statement       +50.4%
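One plausible reading of "incidence relative to the mean" is the percentage gap between how often a company uses a word category and how often the average company does. The sketch below assumes that reading. The function name and the sample rates are hypothetical and are not taken from our data.

```python
def relative_incidence(document_rate: float, mean_rate: float) -> float:
    """Return how far a usage rate sits above or below the mean, as a fraction."""
    if mean_rate <= 0:
        raise ValueError("mean_rate must be positive")
    return (document_rate - mean_rate) / mean_rate


# Hypothetical rates, in occurrences per 1,000 words (illustrative only).
baseline_rate = 2.0
report_rate = 3.0

print(f"{relative_incidence(report_rate, baseline_rate):+.1%}")
```

A positive result means the category is overused relative to the mean, and a negative result, as with words that indicate differences, means it is underused.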

In addition to text-based analysis, we also conducted one-on-one conversations to better discern between deception and truth and to identify some of the more pan-cultural deceptive behaviors people engage in. Our findings aligned with what previous lie detection researchers had uncovered: Each of the five potential deception indicators that surface in text-based analysis also occurs in person-to-person interviews.

So let’s drill a bit deeper into each of them.

1. Words Indicating Friendship

Lie detection researchers have shown that deceivers often employ obfuscation to create confusion. One way they do this is by using words that imply friendship more often than the norm in business communications. Deceptive companies employ such terms 56.1% more than the average, according to our analysis. So if an annual report includes a number of ingratiating terms, it may be evidence of obfuscation and deception.

But a distinction is crucial here: Words that indicate friendship — “friend,” “pal,” “neighbor,” and “gang,” for example — are different from friendly words.

2. Risky Words

Scandal firms use words that indicate risk 55.9% more often than the average company. These include such terms as “averse,” “avoid,” “concern,” “difficulty,” “prevent,” “stopped,” and so on. These types of words already tend to raise securities researchers’ hackles, and as we pointed out in the last piece, firms are proactively excising these kinds of “red flag” words from their annual reports.

3. Impersonal Pronouns

“Another,” “everybody,” “someone,” and “whichever” are the sort of impersonal pronouns that dishonest firms employ to a much greater extent — 54.1% more often — than their truthful peers. Why do they prefer to be impersonal in their communications? Researchers theorize that they are trying to create emotional space between themselves and those they wish to mislead.

4. Words That Indicate Difference

Lying is cognitively demanding. One manifestation of this is that during the act of deception, the liar is often unable to make distinctions among competing points of view and so is less likely to draw comparisons. So the use of words that suggest difference is actually an indication of truthfulness, which is why this is the only fingerprint with a negative sign: Scandal companies use such words 53.6% less often than the mean. Constructions that present contrasting viewpoints, such as “as compared with other years . . . ,” are examples of this.

Deceivers also have an agenda: to convince their target to believe their preferred narrative. They are unlikely to draw distinctions between that narrative and competing ones and will tend to focus on their own.

5. Words That Negate a Statement

Research also indicates that liars often employ more negative terms than truth tellers. This is why we drew the distinction between words indicating friendship and words that are friendly.

But researchers do not always find that the deceivers are more negative than the truthful. Our analysis of dishonest firm communications suggests, however, that they tend to use such words as “not,” “never,” “should not,” “does not,” and “must not” at a 50.4% greater proportion than the average.
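To make the idea of counting category words concrete, here is a minimal sketch that measures how often each category appears per 1,000 words of text. It is not the D.A.T.A. implementation. The word lists contain only the examples quoted in this article, the function name and sample sentence are hypothetical, and matching is on exact whole words, so "friends" would not count as "friend."

```python
import re

# Example terms quoted in this article; real category dictionaries
# would be far larger. "should not," "does not," and "must not" all
# contain "not," so counting "not" already covers them.
CATEGORY_TERMS = {
    "friendship": ["friend", "pal", "neighbor", "gang"],
    "risk": ["averse", "avoid", "concern", "difficulty", "prevent", "stopped"],
    "impersonal_pronouns": ["another", "everybody", "someone", "whichever"],
    "negation": ["not", "never"],
}


def rate_per_thousand(text: str, terms: list[str]) -> float:
    """Return occurrences of the given terms per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    wanted = set(terms)
    hits = sum(1 for word in words if word in wanted)
    return 1000 * hits / len(words)


# Hypothetical sentence, used only to exercise the function.
sample = "We do not see another concern."

for category, terms in CATEGORY_TERMS.items():
    print(f"{category}: {rate_per_thousand(sample, terms):.1f} per 1,000 words")
```

Rates like these only become meaningful when compared against a baseline built from many companies' reports. A single sentence, as here, says nothing about deception.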

So what is by far the strongest indicator of deception? The number of swear words in an annual report. Though they are rarities, swear words occur in scandal company annual reports a whopping 277.1% more frequently than the mean.

