The 4 Fundamentals of AI-Based Email Security

Jun 27, 2019

Editor’s Note: This blog post originally appeared on the Agari Email Security Blog.

By Siobhan McNamara

Predictive, AI-based email security is proving to be remarkably effective at protecting against today’s most advanced business email compromise (BEC) scams, phishing attacks, and other rapidly evolving email threats. But only when it’s done right.

According to the FBI, targeted email attacks doubled last year, leading to $1.2 billion in business losses. Nearly 30% of all email attacks are launched from a hijacked email account belonging to a trusted employee or an outside partner or supplier, including 1.5 million attacks sent from compromised Office 365 accounts each month. More than 90% of all businesses have been hit in just the last 12 months.

In the face of this unrelenting assault, signature-based email security, whitelisting, and even modern, cloud-native security controls are hopelessly outgunned on their own. From never-before-seen zero-day events to dynamically generated malware variants that evade detection, attacks grow ever more inventive and dangerous by the day.

Even so, today’s most devastating attacks employ simple, plain-text email messages that rely on sophisticated social engineering tactics designed to push emotional buttons and manipulate recipients into divulging login credentials or making wire transfers. The average price tag of a successful attack now tops $2 million. When an attack results in a data breach, the average cost rises to $7.9 million.

The revolution in artificial intelligence (AI), and more specifically, machine learning (ML), can change all this. But to understand how, it’s important to get past the hype.

Not All ML is Created Equal

Put simply, ML is a branch of AI focused on recognizing patterns and learning from sets of labeled data in order to make predictions that drive business decisions. In the email security space, ML is helping to shift the prevailing defense paradigm from a reactive posture, chasing down known threats in an endless game of whack-a-mole, to a proactive approach that recognizes and even anticipates novel attack modalities as quickly as they emerge. Unfortunately, excessive vendor hype and sensational press coverage are undercutting progress by creating unrealistic expectations and obscuring what is actually required to harness ML’s full potential.

In parts one and two of this series, we looked at how Agari Secure Email Cloud leverages machine learning to prevent even the most ingenious zero-day attacks using real-time intelligence from around the globe. In actual deployments, Agari Secure Email Cloud functions with 99.9% efficacy against all BEC and phishing attacks, including those launched from hijacked email accounts. Over time, we’ve learned that there are four fundamental requirements for achieving this level of performance.

#1: A Focus on the ‘Good’ to Expose the ‘Bad’

Instead of training ML solely to recognize malicious email in hopes of ferreting out each new attack modality, a far more powerful approach is to model legitimate, “good” email traffic. After all, the behavior of legitimate users can be quite predictable; it tends to deviate from normal patterns only when an account is hijacked or impersonated.

The premise behind our approach is simple. We train our models to recognize the good, normal characteristics of email by defining what normal looks like for both behavioral and infrastructural components. When a message deviates from this norm, we treat it as malicious, which lets us focus only on the very small set of data that is considered bad. Instead of looking for a needle in a haystack, the Agari Secure Email Cloud removes the hay to reveal the needle.
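To make the “model the good” idea concrete, here is a minimal sketch, assuming just two hypothetical features: which sending infrastructure (IP) each sender normally uses, and which sender-recipient pairs normally correspond. The class name `NormalBaseline`, the `min_observations` threshold, and the features themselves are illustrative assumptions, not Agari’s actual models. The baseline is fitted only on known-good mail, and anything it has rarely or never seen is reported as a deviation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    sender: str       # sender address
    sending_ip: str   # infrastructure the message arrived from
    recipient: str


class NormalBaseline:
    """Hypothetical baseline learned from known-good mail only."""

    def __init__(self, min_observations: int = 3):
        # A pattern must be seen at least this often to count as "normal".
        self.min_observations = min_observations
        self.infra_by_sender = defaultdict(Counter)  # sender -> {ip: count}
        self.pairs = Counter()                       # (sender, recipient) -> count

    def fit(self, good_emails):
        for e in good_emails:
            self.infra_by_sender[e.sender][e.sending_ip] += 1
            self.pairs[(e.sender, e.recipient)] += 1
        return self

    def deviations(self, e: Email) -> list[str]:
        """Return the reasons this message departs from the learned norm."""
        reasons = []
        # .get() avoids inserting unseen senders into the defaultdict while scoring.
        seen_infra = self.infra_by_sender.get(e.sender, Counter())
        if seen_infra[e.sending_ip] < self.min_observations:
            reasons.append("unfamiliar infrastructure for sender")
        if self.pairs[(e.sender, e.recipient)] < self.min_observations:
            reasons.append("unusual sender-recipient relationship")
        return reasons


history = [Email("ceo@example.com", "203.0.113.10", "cfo@example.com")] * 5
baseline = NormalBaseline().fit(history)
print(baseline.deviations(Email("ceo@example.com", "203.0.113.10", "cfo@example.com")))  # []
print(baseline.deviations(Email("ceo@example.com", "198.51.100.7", "cfo@example.com")))
# ['unfamiliar infrastructure for sender']
```

Because only the small set of messages with non-empty deviations needs further scrutiny, this mirrors the “remove the hay” framing: the model never needs an example of the specific attack to notice that something is off.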

#2: Globally Scaled, Dynamic Datasets

This kind of approach requires a massive dataset, and the bigger, the better. The Agari Secure Email Cloud analyzes trillions of emails annually to graph relationships and behavioral patterns among individuals, organizations, domains, infrastructures, and locations, spanning hundreds of raw feature values, to define good, trusted email communications at a global scale. It then dynamically scores each new email message against a diverse set of behavioral models, enforcing policies according to each business’s specific needs. But as important as it is, this kind of scale is not enough on its own.
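The step of scoring a message against several behavioral models and then enforcing a business-specific policy can be sketched as follows. Everything here is an assumption for illustration: the three toy models, the feature names, combining model outputs with a weighted average, and the `Policy` thresholds. The post does not say how Agari combines its models.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Features = Mapping[str, float]
# A "behavioral model" here is any function from message features to a risk score in [0, 1].
Model = Callable[[Features], float]


def display_name_model(f: Features) -> float:
    # Hypothetical: 1.0 if the display name impersonates a known contact.
    return f.get("display_name_mismatch", 0.0)


def new_domain_model(f: Features) -> float:
    # Hypothetical: very young sending domains are treated as risky.
    return 1.0 if f.get("domain_age_days", 10_000) < 30 else 0.0


def relationship_model(f: Features) -> float:
    # Hypothetical: no prior history is risky; 10+ prior messages means fully trusted.
    return 1.0 - min(f.get("prior_messages", 0) / 10, 1.0)


MODELS: Mapping[str, Model] = {
    "display_name": display_name_model,
    "new_domain": new_domain_model,
    "relationship": relationship_model,
}


@dataclass(frozen=True)
class Policy:
    """Per-organization thresholds, so each business can tune enforcement."""
    warn_at: float = 0.5
    quarantine_at: float = 0.8


def score(features: Features, models=MODELS, weights=None) -> float:
    """Weighted average of all model scores (equal weights by default)."""
    weights = weights or {name: 1.0 for name in models}
    total = sum(weights[name] for name in models)
    return sum(weights[name] * model(features) for name, model in models.items()) / total


def enforce(risk: float, policy: Policy) -> str:
    if risk >= policy.quarantine_at:
        return "quarantine"
    if risk >= policy.warn_at:
        return "warn"
    return "deliver"


suspicious = {"display_name_mismatch": 1.0, "domain_age_days": 5, "prior_messages": 0}
borderline = {"domain_age_days": 5, "prior_messages": 10}
print(enforce(score(suspicious), Policy()))             # quarantine
print(enforce(score(borderline), Policy()))             # deliver (score is 1/3)
print(enforce(score(borderline), Policy(warn_at=0.3)))  # warn: a stricter business policy
```

Keeping the policy separate from the scores is what lets the same global models serve organizations with different risk tolerances.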

To continuously refine the solution’s capabilities, more than 300 features are updated each day, enabling our machine learning models to learn continuously. Through real-time data streaming, intelligence that calls for model changes is applied not in daily or even hourly batch updates, but within microseconds of detection. Each new customer adds deeper, more relevant insights to this dynamic, global dataset, creating a network multiplier effect that makes Agari products smarter and more effective with each new email.
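One common way to keep features current without batch jobs is to update them per event as the stream arrives. The sketch below uses exponentially decayed counters, a technique chosen here as an assumption since the post does not describe Agari’s feature store; the class names, the `half_life_s` parameter, and the `"user_reports"` feature are hypothetical. Each observation is applied immediately, so the very next score already reflects it.

```python
import math
from collections import defaultdict


class DecayedCounter:
    """Event count that fades with a configurable half-life."""

    def __init__(self, half_life_s: float):
        self.rate = math.log(2) / half_life_s
        self.value = 0.0
        self.last_t = None

    def add(self, t: float, amount: float = 1.0) -> None:
        # Decay the stored value up to time t, then add the new event.
        if self.last_t is not None:
            self.value *= math.exp(-self.rate * (t - self.last_t))
        self.last_t = t
        self.value += amount

    def read(self, t: float) -> float:
        # Decayed value at time t, without mutating state.
        if self.last_t is None:
            return 0.0
        return self.value * math.exp(-self.rate * (t - self.last_t))


class StreamingFeatureStore:
    """Hypothetical store keyed by (feature name, entity), updated per event."""

    def __init__(self, half_life_s: float = 86_400.0):
        self._counters = defaultdict(lambda: DecayedCounter(half_life_s))

    def observe(self, feature: str, entity: str, t: float) -> None:
        self._counters[(feature, entity)].add(t)

    def get(self, feature: str, entity: str, t: float) -> float:
        counter = self._counters.get((feature, entity))
        return counter.read(t) if counter else 0.0


store = StreamingFeatureStore(half_life_s=3600.0)
store.observe("user_reports", "sender@example.test", t=0.0)
store.observe("user_reports", "sender@example.test", t=60.0)
# Slightly under 2.0: the first report has decayed a little over 60 seconds.
print(store.get("user_reports", "sender@example.test", t=60.0))
```

The half-life controls how quickly old evidence stops influencing a feature; it is a tuning choice, not a value taken from the post.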

#3: Expertise from the Masters of Their Domains

Beyond the size and quality of its dataset, the efficacy of any AI-based approach is predicated on the expertise of the scientists who train it. One of Agari’s greatest strengths has always been that our domain experts rank among the world’s leading authorities on phishing, BEC, and account takeover-based email attacks. In instances where there may not yet be enough labeled data to combat a new attack modality, our human experts can identify the underlying mechanisms behind the scam.

We can then add heuristics, or rules, to kickstart defenses with this baseline knowledge. As those rules begin generating labeled data from the field, we can train machine learning algorithms to generalize beyond our expert-derived classifications. In time, the ML algorithms begin to recognize patterns that even the most eagle-eyed domain expert may not perceive, including new permutations of the original attack, and then derive rules that can defeat them.
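The rules-to-ML handoff can be illustrated with a small sketch: a hypothetical expert rule labels field messages, and a classifier trained on those labels then catches a reworded variant that the rule itself misses. The rule, the messages, and the choice of a from-scratch multinomial naive Bayes classifier are all assumptions made for brevity; the post does not name the algorithms Agari uses, and real training sets would be far larger and reviewed.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def expert_rule(text: str) -> bool:
    # Hypothetical expert heuristic for a new BEC pattern: urgent wire-transfer requests.
    t = text.lower()
    return "wire transfer" in t and "urgent" in t


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing; needs both classes in training data."""

    def fit(self, docs, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokens(doc))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def log_prob(self, text: str, label: bool) -> float:
        counts = self.word_counts[label]
        total = sum(counts.values())
        lp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in tokens(text):
            # +1 smoothing; the extra +1 in the denominator reserves mass for unseen words.
            lp += math.log((counts[w] + 1) / (total + len(self.vocab) + 1))
        return lp

    def predict(self, text: str) -> bool:
        return self.log_prob(text, True) > self.log_prob(text, False)


field_mail = [
    "URGENT: please process this wire transfer to the new vendor account today",
    "Urgent wire transfer needed, keep this confidential",
    "Quarterly report attached for review",
    "Lunch meeting moved to Thursday",
    "Team offsite agenda and travel details",
]
rule_labels = [expert_rule(m) for m in field_mail]  # [True, True, False, False, False]
model = NaiveBayes().fit(field_mail, rule_labels)

variant = "Confidential: process payment to new vendor account today"
print(expert_rule(variant))    # False: the rule misses the reworded attack
print(model.predict(variant))  # True: the learned model generalizes from rule-labeled data
```

The classifier picks up words the rule never checked ("vendor", "account", "process"), which is the sense in which ML generalizes beyond the expert-derived labels.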

#4: Seamless Integration with Cloud Platforms

As a growing number of organizations move to Microsoft Office 365, G Suite, and other popular cloud platforms, they face increasing risks from advanced email threats. Office 365 alone now accounts for 36% of all phishing attacks, up 250% in the last year. According to Forbes, 29% of organizations reported that their O365 email accounts were compromised in March of this year alone, amplifying their vulnerability to phishing and BEC attacks launched from trusted, internal accounts.

To counter these trends, the Agari Secure Email Cloud integrates seamlessly with O365 to block malicious emails that get past platform-native security controls. With full visibility into employee webmail, Agari uses continuous detection and response technologies to automatically detect and remove any attacks that do reach employee inboxes. In the next part of this series, we’ll take a closer look at these capabilities and how they can help reduce the time it takes to discover and remediate breaches from an average of three months to just minutes.
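The “detect and remove after delivery” idea can be sketched as a retroactive sweep: when intelligence changes, already-delivered mail is re-evaluated and anything now known to be bad is pulled from inboxes. The `Mailbox` class below is a hypothetical in-memory stand-in, not a real Office 365 or Microsoft Graph client, and the sender-blocklist check is an assumed placeholder for real detection logic.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str
    sender: str
    subject: str


@dataclass
class Mailbox:
    """Hypothetical stand-in for a cloud mailbox API (not a real O365 client)."""
    owner: str
    inbox: list = field(default_factory=list)
    quarantine: list = field(default_factory=list)

    def move_to_quarantine(self, msg_id: str) -> bool:
        for i, m in enumerate(self.inbox):
            if m.msg_id == msg_id:
                self.quarantine.append(self.inbox.pop(i))
                return True
        return False


def retro_sweep(mailboxes, is_malicious):
    """Re-check delivered mail against current intelligence; return (owner, msg_id) removed."""
    removed = []
    for mb in mailboxes:
        # Iterate over a copy because move_to_quarantine mutates the inbox.
        for m in list(mb.inbox):
            if is_malicious(m) and mb.move_to_quarantine(m.msg_id):
                removed.append((mb.owner, m.msg_id))
    return removed


# New intelligence arrives after delivery: this sender is now known to be malicious.
flagged_senders = {"billing@examp1e-payments.test"}
mailboxes = [
    Mailbox("alice", [Message("m1", "billing@examp1e-payments.test", "Invoice overdue"),
                      Message("m2", "bob@example.com", "Meeting notes")]),
    Mailbox("carol", [Message("m3", "billing@examp1e-payments.test", "Invoice overdue")]),
]
print(retro_sweep(mailboxes, lambda m: m.sender in flagged_senders))
# [('alice', 'm1'), ('carol', 'm3')]
```

Running such a sweep whenever detections update, rather than on a fixed schedule, is what shrinks the window between a message landing and its removal.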

Prediction: Success

Make no mistake. Despite widespread hoopla surrounding this subject, predictive, AI-based email security is playing a very real, and a very important role in helping thousands of industry-leading companies detect, protect against, and respond to BEC scams, phishing attacks, and other metastasizing email threats. For organizations looking to do the same, the four fundamentals described in this post represent key considerations that can help them do it right.
