Combining ML Models to Detect Email Attacks

November 18, 2020
This article is a follow-up to one I wrote a year ago—Lessons from Building AI to Stop Cyberattacks—in which I discussed the overall problem of detecting social engineering attacks using ML techniques and our general solution at Abnormal. This post aims to walk through the process we use to model various aspects of a given email, and then to ultimately detect and block attacks.

As discussed in the previous post, sophisticated social engineering email attacks are on the rise and getting more advanced every day. They prey on the trust we put in our business tools and social networks, especially when a message appears to come from someone on our contact list or, even more insidiously, when the attack actually comes from a contact whose account has been compromised. The FBI estimates that, over the past few years, more than 75% of cyberattacks have started with social engineering, usually through email.

Why is Email Attack Detection a Hard ML Problem?

There are a multitude of reasons why detecting these attacks is extremely hard. Here are a few of them.

Detecting Attacks is Like Finding a Needle in a Haystack: The first challenge is that the base rate is very low. Advanced attacks are rare in comparison to the overall volume of legitimate email. For example:

  • Only 1 in 100,000 emails is an advanced spear-phishing attack.
  • Less than 1 in 10,000,000 emails is an advanced business email compromise attack, such as invoice fraud or lateral spear phishing, where a compromised account phishes another employee.
  • Compare that to spam, which accounts for roughly 65 in every 100 emails. We have an extremely imbalanced classification problem, which raises all sorts of difficulties.
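
The arithmetic behind these numbers is worth making explicit. A quick Bayes-style sketch, using the illustrative rates above, shows why an otherwise impressive false-positive rate is useless at this base rate:

```python
# Back-of-the-envelope calculation: precision of a detector at a
# given base rate, recall, and false-positive rate.

def precision(base_rate: float, recall: float, fp_rate: float) -> float:
    """P(attack | flagged) for a classifier with the given rates."""
    true_pos = base_rate * recall
    false_pos = (1.0 - base_rate) * fp_rate
    return true_pos / (true_pos + false_pos)

# 1 in 100,000 emails is a spear-phishing attack.
base = 1e-5

# A seemingly strong classifier: 99% recall, 1% false-positive rate.
naive = precision(base, recall=0.99, fp_rate=0.01)

# The same recall at a one-in-a-million false-positive rate.
strict = precision(base, recall=0.99, fp_rate=1e-6)

print(f"naive:  {naive:.4f}")   # ~0.001: almost every flag is legitimate mail
print(f"strict: {strict:.4f}")  # ~0.9: most flags are real attacks
```

At a 1% false-positive rate, roughly 999 out of every 1,000 flagged emails would be legitimate business mail, which is why the one-in-a-million requirement is not hyperbole.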

There's an Enormous Amount of Data: At the same time, the data we have is large (many terabytes), messy, multi-modal, and difficult to collect and serve at low latency for a real-time system. For example, features that an ML system would want to evaluate include:

  • Text of the email
  • Metadata and headers
  • History of communication for parties involved, including geo locations, IPs, etc.
  • Account sign-ins, mail filters, and browsers used
  • Content of all attachments
  • Content of all links and the landing pages those links lead to
  • …and so much more.

Turning all this data into useful features for a detection system is a huge challenge from a data engineering, as well as machine learning, point of view.
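
As a toy illustration of that first step, here is how a few of the features listed above might be pulled out of a raw message with Python's stdlib email parser. The sender address and message are invented for illustration; a production pipeline would extract far richer features.

```python
# A minimal sketch (not Abnormal's pipeline) of turning a raw email
# into a handful of features, using only the standard library.
from email import message_from_string
from email.utils import parseaddr

RAW = """\
From: Microsoft Support <support@example-lookalike.com>
To: employee@company.com
Subject: Reset your password
Content-Type: text/plain

Please click here to reset the password to your account.
"""

def extract_features(raw: str) -> dict:
    msg = message_from_string(raw)
    display_name, address = parseaddr(msg.get("From", ""))
    return {
        "display_name": display_name,
        "sender_domain": address.rsplit("@", 1)[-1],
        "subject": msg.get("Subject", ""),
        "body": msg.get_payload(),
        "has_attachments": msg.is_multipart(),
    }

features = extract_features(RAW)
print(features["display_name"])   # Microsoft Support
print(features["sender_domain"])  # example-lookalike.com
```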

Attackers are Adversarial: To make matters worse, attackers actively manipulate the data to make it hard on ML models, constantly improving their techniques and developing entirely new strategies.

The Precision Must Be Very High: To build a product that prevents email attacks, we must avoid false positives and the disruption of legitimate business communications while, at the same time, catching every single attack. The false-positive rate needs to be as low as one in a million!

To effectively solve this problem, we must be diligent and extremely thoughtful about how we break down the overall detection problem into components that are solved carefully.

Example of the ML Modeling Problem

Let’s start with this hypothetical email attack and imagine how we could model various dimensions and how those models come together.

Subject: Reset your password
From: Microsoft Support <>
Content: “Please click _here_ to reset the password to your account.”

This is a simple and prototypical phishing attack.

As with any well-crafted social engineering attack, it appears nearly identical to a legitimate message. In this case, it looks to be a legitimate password reset message from Microsoft. Because of this, modeling any single dimension of this message will be fruitless for classification purposes. Instead, we need to break up the problem into component sub-problems.

Thinking Like the Attacker: Our first step is always to put ourselves in the mind of the attacker. To do so, we break an attack down into what we call “attack facets.”

Attack Facets:

  1. Attack Goal: What is the attacker trying to accomplish? Steal money? Steal credentials? Something else?
  2. Impersonation Strategy: How is the attacker building credibility with the recipient? Are they impersonating someone? Are they sending from a compromised account?
  3. Impersonated Party: Who is being impersonated? A trusted brand? A known vendor? The CEO of a company?
  4. Payload Vector: How is the actual attack delivered? A link? An Attachment?

If we break down the Microsoft password reset example, we have:

  1. Attack Goal: Steal a user's credentials.
  2. Impersonation Strategy: Impersonate a brand through a lookalike display name.
  3. Impersonated Party: The official Microsoft brand.
  4. Payload Vector: A link to a fake login page.
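
One way to make this breakdown concrete is a small record type. The facet names come from the list above; the string values are illustrative labels, not a real taxonomy.

```python
# A simple record capturing the four attack facets for one email.
from dataclasses import dataclass

@dataclass(frozen=True)
class AttackFacets:
    attack_goal: str             # e.g. "steal_credentials", "steal_money"
    impersonation_strategy: str  # e.g. "lookalike_display_name", "compromised_account"
    impersonated_party: str      # e.g. "brand:microsoft", "vendor", "ceo"
    payload_vector: str          # e.g. "link", "attachment"

# The Microsoft password reset example from above.
password_reset_phish = AttackFacets(
    attack_goal="steal_credentials",
    impersonation_strategy="lookalike_display_name",
    impersonated_party="brand:microsoft",
    payload_vector="link",
)
print(password_reset_phish)
```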

Modeling the Problem: Building ML models to solve a problem with such a low base rate and precision requirements forces a high degree of diligence when modeling sub-problems and feature engineering. We cannot rely just on the magic of machine learning.

In the last section, we described a way to break an attack into components. We can use that same breakdown to help inspire the type of information we would like to model about an email in order to determine if it is an attack.

All these models rely on similar underlying techniques, specifically:

  • Behavior Modeling: Identifying abnormal behavior by modeling normal communication patterns and finding outliers from that
  • Content Modeling: Understanding the content of an email
  • Identity Resolution: Matching the identity of individuals and organizations referenced in an email (perhaps in an obfuscated way) to a database of these entities
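
To give a flavor of the behavior-modeling idea, here is a toy rarity score over historical (sender, recipient) communication counts. All names and counts are invented, and real systems model far richer behavior than raw frequency.

```python
# Score how unusual a (sender, recipient) pair is given historical
# communication counts: a never-seen pair scores high.
import math
from collections import Counter

history = Counter({
    ("alice@vendor.com", "bob@company.com"): 120,
    ("ceo@company.com", "bob@company.com"): 45,
    ("newsletter@saas.com", "bob@company.com"): 300,
})
total = sum(history.values())

def rarity(sender: str, recipient: str) -> float:
    """Negative log probability of this pair under the history (add-one smoothed)."""
    count = history[(sender, recipient)]
    return -math.log((count + 1) / (total + 1))

# A frequent pair scores low; a never-seen sender scores high.
print(rarity("alice@vendor.com", "bob@company.com"))
print(rarity("support@example-lookalike.com", "bob@company.com"))
```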

Understanding the Attack

Attack Goal and Payload: Identifying an attack goal requires modeling the content of a message. We must understand what is being said. Is the email asking the recipient to do anything? Does it have an urgent tone? Are there other aspects we need to consider? This model may identify malicious content as well as safe content in order to differentiate the two.
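
As a toy stand-in for such a content model, the cues mentioned above (a request, an urgent tone) could be surfaced like this; production systems would use learned NLP models rather than keyword lists.

```python
# Surface two of the content signals discussed above with simple
# pattern matching (illustrative only; not a real content model).
import re

REQUEST_CUES = re.compile(r"\b(click|verify|reset|confirm|update|send)\b", re.I)
URGENCY_CUES = re.compile(r"\b(urgent|immediately|asap|now)\b", re.I)

def content_features(body: str) -> dict:
    return {
        "has_request": bool(REQUEST_CUES.search(body)),
        "has_urgency": bool(URGENCY_CUES.search(body)),
    }

print(content_features("Please click here to reset the password to your account."))
# {'has_request': True, 'has_urgency': False}
```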

Impersonated Party: What does an impersonation look like? First of all, the email must appear to the recipient to come from someone they trust. We build identity models to match various parts of an email against known entities inside and outside an organization. For example, we may identify an employee impersonation by matching against the Active Directory. We may identify a brand impersonation by matching against the known patterns of brand-originating emails. We might identify a vendor impersonation by matching against our vendor database.
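
A sketch of what identity resolution might look like, with difflib standing in for whatever matcher a real system would use, and an invented entity list:

```python
# Match a display name (possibly obfuscated) against a directory of
# known entities. Illustrative only: real resolution uses richer
# matching than a similarity ratio.
from difflib import SequenceMatcher

KNOWN_ENTITIES = ["Microsoft", "Google", "Acme Corp Accounts Payable"]

def resolve_identity(display_name: str, threshold: float = 0.55):
    """Return (best_entity, score) above threshold, else (None, score)."""
    best, score = None, 0.0
    for entity in KNOWN_ENTITIES:
        s = SequenceMatcher(None, display_name.lower(), entity.lower()).ratio()
        if s > score:
            best, score = entity, s
    return (best, score) if score >= threshold else (None, score)

print(resolve_identity("Micros0ft Support"))  # matches "Microsoft"
print(resolve_identity("Weekly Digest"))      # no match
```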

Impersonation Strategy: An impersonation happens when an email is not from the entity it claims to be from. To detect one, we model normal behavior patterns and look for deviations from them. These may include abnormal behavior between the recipient and the sender, or unusual sending patterns from the sender. In the simplest case, like the example above, we can simply note that Microsoft never sends from “”. In more difficult cases, like account takeover and vendor email compromise, we must look at more subtle clues, such as an unusual geolocation or IP address for the sender, or failing authentication checks.

Attack Payload: For the payload, we must understand the content of attachments and links. Modeling these requires a combination of NLP models, computer vision models to identify logos, URL models to identify suspicious links, and so forth. Modeling each of these dimensions gives our system an understanding of emails, particularly along dimensions that might be used by attackers to conduct social engineering attacks. The next step is to actually detect these attacks.
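
For instance, here are a few hand-engineered URL features of the sort a payload model might consume (an illustrative, not exhaustive, list):

```python
# Extract simple features from a URL that a link-payload model could
# use as inputs. The example URL is invented for illustration.
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "uses_https": parsed.scheme == "https",
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_ip_host": host.replace(".", "").isdigit(),
        "suspicious_keywords": any(k in url.lower() for k in ("login", "verify", "reset")),
    }

print(url_features("http://microsoft.account-reset.example.com/login"))
```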

Combining Models to Detect Attacks

Ultimately, we need to combine these sub-models to produce a classification result (for example, P(attack)). As in any ML problem, the features given to a classifier are crucial for good performance. The careful modeling described above gives us very high-bandwidth features. We can combine these models in a few possible ways.

(1) One humongous classification model: Train a single classifier with all the inputs available to each sub-model. All the input features could be chosen based on the features that worked well within each sub-problem, but this final model combines everything and learns unique combinations and relationships.

(2) Extract features from sub-models and combine to predict target: There are three ways we can go about this.

(2.a) Ensemble of Models-as-Features: Each sub-model is a feature. Its output depends on the type of model. For example, a content model might predict a vector of binary topic features.

(2.b) Ensemble of Classifiers: Build sub-classifiers that each predict some target and combine them using some kind of ensemble model or set of rules. For example, a content classifier would predict the probability of attack given the content alone.

(2.c) Embeddings: Each sub-model is trained to predict P(attack) like above or some other supervised or unsupervised target, but rather than combining their predictions, we extract embeddings, for example, by taking the penultimate layer of a neural net.
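
Approach (2.b), for example, can be sketched in a few lines: each sub-classifier emits a probability, and a small logistic combiner turns them into P(attack). The weights below are made up for illustration; in practice, the combiner would itself be trained.

```python
# Combine sub-classifier probabilities into a single P(attack) with a
# hand-weighted logistic function (illustrative weights, not trained).
import math

def combine(sub_scores: dict) -> float:
    """Logistic combination of sub-classifier probabilities."""
    weights = {"content": 2.0, "behavior": 3.0, "url": 2.5}
    bias = -5.0  # strong prior toward "safe", reflecting the low base rate
    z = bias + sum(weights[name] * p for name, p in sub_scores.items())
    return 1.0 / (1.0 + math.exp(-z))

benign = combine({"content": 0.1, "behavior": 0.05, "url": 0.0})
attack = combine({"content": 0.9, "behavior": 0.95, "url": 0.8})
print(f"{benign:.3f} vs {attack:.3f}")
```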


Each of the above approaches has advantages and disadvantages. Training one humongous model has the advantage of learning all complex cross dependencies, but it is harder to understand and harder to debug, and more prone to overfitting. It also requires all the data available in one shot, unlike building sub-models that could potentially operate on disparate datasets.

The various methods of extracting features from sub-models also have tradeoffs. Training sub-classifiers is useful because they are very interpretable (for example, we could have a signal that represents the suspiciousness of the text content alone), but in some cases it is difficult to predict the attack target directly from a sub-domain of the data. For example, a rare communication pattern alone is not sufficient to slice the space meaningfully and predict an attack. Similarly, as discussed above, a pure content model cannot predict an attack without context about the communication pattern. The embeddings approach is powerful but finicky: it is important to vet your embeddings rather than simply trusting that they will work. It is also more prone to overfitting and accidental label leakage.

Most importantly, with all these approaches, it is crucial to think deeply about all the data going into models and also the actual distribution of outputs. Blindly trusting in the black box of ML is rarely a good idea. Careful modeling and feature engineering are necessary, especially when it comes to the inputs to each of the sub-models.

Our Solution at Abnormal

As a fast-growing startup, we originally had a very small ML team, which has been growing quickly over the past year. With the growth of the team, we also have adapted our approach to modeling, feature engineering, and training our classifiers. At first, it was easiest to just focus on one large model that combined features carefully engineered to solve subproblems. However, as we’ve added more team members, it has become important to split the problem into various components that can be developed simultaneously.

Our current solution is a combination of all the above approaches, depending on the particular sub-model. We still use a large monolithic model as one signal, but our best models use a combination of inputs, including embeddings representing an aspect of an email and prediction values from sub-classifiers, for example, a suspicious URL score. Combining models and managing feature dependencies and versioning is also difficult.

Takeaways for Solving Other ML Problems

Here's my best advice for how the lessons we've learned at Abnormal can help with other ML problems.

  1. Deeply understand your domain.
  2. Carefully engineer features and sub-models. Don’t trust black box ML.
  3. Solving many sub-problems and combining them for a classifier works well, but don’t be dogmatic. Sure, embeddings may be the purest solution, but if it’s simpler to just create a sub-classifier or good set of features, start with that.
  4. Breaking up a problem also allows scaling a team. If multiple ML engineers are working on a single problem, they must necessarily focus on separate components.
  5. Modeling a problem as a combination of subproblems also helps with explainability. It’s easier to debug a text model than a giant multi-modal neural net.

There's still more for us to do. We need to figure out a more general pattern for developing good embeddings, better ways of modeling sub-parts of the problem, better data platforms and feature engineering tools, and so much more. Attacks are constantly evolving and our client base is ever-growing, leading to tons of new challenges every day. If these problems interest you, we’re hiring!
