
Model Understanding with Feature Importance

March 16, 2022

Here at Abnormal, our machine learning models help us spot trends and abnormalities in customer data in order to catch and prevent cyberattacks. Maintaining and improving these models requires a deep understanding of how they operate and make decisions. One way to build this understanding is to analyze how each model uses the features we feed it.

An Overview of Feature Permutation

A simple algorithm we can use to accomplish this is called feature permutation. The algorithm proceeds as follows:

  • Given a model, a dataset, and a feature F, compute the baseline model performance over the dataset.

  • For each sample in the dataset, randomly choose another sample from the dataset and swap the values of F between these samples.

  • The importance of feature F is the drop in model performance: the baseline performance minus the performance over the permuted dataset.
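The steps above can be sketched in Python. This is a minimal illustration rather than production code: the model, features, and accuracy metric are hypothetical stand-ins, and a single full shuffle of the feature's column stands in for the pairwise swaps described above.

```python
import random

def permutation_importance(predict, metric, X, y, feature, seed=0):
    """Importance of `feature`: drop in performance after permuting it."""
    rng = random.Random(seed)
    baseline = metric([predict(row) for row in X], y)
    # Shuffle the feature's column across samples; this is the practical
    # equivalent of repeatedly swapping values between random pairs.
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_permuted = [{**row, feature: value} for row, value in zip(X, column)]
    permuted = metric([predict(row) for row in X_permuted], y)
    return baseline - permuted

# Hypothetical toy model that only looks at feature "a".
def predict(row):
    return 1 if row["a"] > 0.5 else 0

def accuracy(preds, labels):
    return sum(p == label for p, label in zip(preds, labels)) / len(labels)

data_rng = random.Random(42)
X = [{"a": data_rng.random(), "b": data_rng.random()} for _ in range(200)]
y = [1 if row["a"] > 0.5 else 0 for row in X]

print(permutation_importance(predict, accuracy, X, y, "a"))  # large drop
print(permutation_importance(predict, accuracy, X, y, "b"))  # 0.0: ignored
```

Because the toy model ignores feature "b" entirely, permuting it leaves every prediction unchanged and its importance comes out as exactly zero, while permuting "a" destroys the model's accuracy.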

Intuitively, the feature permutation algorithm allows us to understand the degree to which a particular feature's value contributes to the model's final decision. Let’s show an example to illustrate this.

Feature Permutation Case Study: Ensemble Models

Our email attack detection systems at Abnormal Security have multiple layers. Upstream machine learning models generate predictions based on a variety of message attributes, and an ensemble model makes the final decision. This ensemble is trained on the predictions of the individual upstream models, and we retrain it whenever we add or change one of these models. These changes can dramatically change the strategy that the ensemble uses to make decisions.

For example, suppose we have an ensemble that is trained on three upstream models A, B, and C, where models B and C are substantially more important than model A.

[Figure: ensemble with three upstream models where B and C are most important]

Now suppose that we add new features to model A and retrain the ensemble. There are multiple potential impacts of this change. One possibility is that all three models become equally important:

[Figure: ensemble with three models of equal importance]

Another possibility is that model A increases in importance at the expense of model B:

[Figure: ensemble with three upstream models where A and C are most important]

This is a likely outcome if the features we added to model A were also used in model B.

An Overview of Feature Group Permutation

One pitfall of feature permutation is that it doesn't play nicely with correlated features. If we have N features derived from different versions of the same model, or several counts of very similar quantities, the sum of the importances of the individual features can underestimate the importance of the group as a whole: when one feature is permuted, the model can fall back on its correlated neighbors. Luckily, we can easily get around this by using a similar algorithm, called feature group permutation, which computes the importance of a group of features rather than a single feature. It proceeds as follows:

  • Given a model, a dataset, and a group of features G, compute the baseline model performance over the dataset.

  • For each sample in the dataset, randomly choose another sample from the dataset and swap the values of each feature F in G between these samples.

  • The importance of group G is the drop in model performance: the baseline performance minus the performance over the permuted dataset.
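The group variant can be sketched the same way. The example below is a deliberately extreme illustration of the correlated-feature pitfall, with hypothetical names: three perfectly correlated copies of one signal and a majority-vote model, so permuting any single copy changes nothing, while permuting the group together reveals the joint importance.

```python
import random

def group_permutation_importance(predict, metric, X, y, group, seed=0):
    """Importance of a feature group: permute all of its features together,
    taking each sample's replacement values from the same donor sample."""
    rng = random.Random(seed)
    baseline = metric([predict(row) for row in X], y)
    donors = list(range(len(X)))
    rng.shuffle(donors)  # one permutation shared by every feature in the group
    X_permuted = [{**row, **{f: X[j][f] for f in group}}
                  for row, j in zip(X, donors)]
    permuted = metric([predict(row) for row in X_permuted], y)
    return baseline - permuted

# Hypothetical model: majority vote over three copies of the same signal.
def predict(row):
    votes = sum(row[f] > 0.5 for f in ("a1", "a2", "a3"))
    return 1 if votes >= 2 else 0

def accuracy(preds, labels):
    return sum(p == label for p, label in zip(preds, labels)) / len(labels)

data_rng = random.Random(42)
values = [data_rng.random() for _ in range(200)]
X = [{"a1": v, "a2": v, "a3": v} for v in values]
y = [1 if v > 0.5 else 0 for v in values]

# Each copy alone looks worthless: the other two copies outvote it.
print(group_permutation_importance(predict, accuracy, X, y, ["a1"]))
# Permuted together, the group's true importance appears.
print(group_permutation_importance(predict, accuracy, X, y, ["a1", "a2", "a3"]))
```

Here the individual importance of each copy is exactly zero, so summing individual importances would suggest the whole group is useless, even though the group carries all of the model's signal.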

By permuting features as a group, we reduce the risk that correlated signals mask each other's importance.

Feature Group Permutation Case Study: Suspicious Attachments

One of the models we use at Abnormal Security is our Attack Multi Model, which consumes a wide range of feature types and predicts the likelihood that a particular email is an attack. One question we had recently was how this model determines that a particular message contains a malicious attachment. The model has access to many different sources of information, so here are a few hypotheses about how it could make this decision:

  • Primarily focus on the sender and recipient information.

  • Primarily focus on the subject and header text in the email.

  • Primarily focus on the body text in the message.

  • Primarily focus on the signals in the attachment itself.

We can use feature group permutation to test these hypotheses. First, we group our features based on the type of data each is constructed from. Then, we pull all messages with attachments and compute the feature group importances over this dataset.

[Figure: Attack Multi Model understands suspicious attachments: attachment features are least important]

We see that the attachment features are only somewhat important, since the model is able to catch attacks from the other signals on the message.

However, this is not the complete story. There are certain types of messages for which the attachment is more central to the message due to the language used. On these messages, we would expect that the model would need to pay closer attention to the attachment in order to detect attacks. If we limit our dataset to the subset of messages that contain these attachment types and recompute the feature group importance, we see the importance of the attachment features increase:

[Figure: Attack Multi Model understands suspicious attachments: attachment features are most important]

Using Feature Group Permutation Effectively

Despite its simplicity, feature group permutation is an extremely powerful tool. Here are a few tips for using it effectively:

Choosing Feature Groups: If the information represented by the features in some group G is also represented by features outside of G, then the feature group permutation algorithm may underestimate the importance of G. For this reason, it is usually helpful to start with very large groups and then break down only the groups with high importance.

Choosing the Dataset: Not all features are useful on all samples. Certain signals may be extremely predictive but only rarely present. For this reason, it is usually helpful to choose the dataset based on the questions we have about the model's behavior.

Choosing the Metric: The feature group permutation algorithm computes the importance of a group in terms of some metric of model performance, and different metrics capture different kinds of importance. For example, suppose we are studying a binary classification model. A feature that determines the overall calibration of the model will be attributed high importance by the cross-entropy loss metric but low importance by the ROC-AUC metric.
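As a sketch of this effect with made-up scores: halving every predicted probability is a pure calibration change, so it hurts cross-entropy loss but leaves ROC-AUC untouched, because AUC depends only on how the scores rank the samples.

```python
import math
import random

def cross_entropy(probs, labels):
    eps = 1e-12  # guard against log(0)
    return -sum(l * math.log(p + eps) + (1 - l) * math.log(1 - p + eps)
                for p, l in zip(probs, labels)) / len(labels)

def roc_auc(scores, labels):
    # Probability that a random positive sample outranks a random negative one.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(500)]
# Hypothetical well-calibrated scores: positives near 1, negatives near 0.
probs = [0.7 * l + 0.3 * rng.random() for l in labels]
# A pure calibration shift: every probability halved, ranking unchanged.
shifted = [p / 2 for p in probs]

print(cross_entropy(probs, labels), cross_entropy(shifted, labels))  # worse
print(roc_auc(probs, labels), roc_auc(shifted, labels))  # identical
```

In practice this means a feature that mainly adjusts the model's confidence level can look critical under cross-entropy loss and invisible under ROC-AUC, so it is worth computing importances under the metric that matches the question being asked.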

Feature Importances at Abnormal

At Abnormal Security we use feature importance analysis to understand our detection models. This helps us validate that new signals are useful or anticipate changes in model behavior. Understanding the feature importance distribution also enables us to anticipate which kinds of attacks might slip through our models so that we can prioritize feature development to improve our system.

Want to join our team to work on these problems? Abnormal is hiring! Check out our open roles to learn more.

