Corner purple 2 FINAL

Model Understanding with Feature Importance

March 16, 2022

Here at Abnormal, our machine learning models help us spot trends and abnormalities in customer data in order to catch and prevent cyberattacks. Maintaining and improving these models requires a deep understanding of how they operate and make decisions. One way to build this understanding is to analyze how each model uses the features we feed it.

An Overview of Feature Permutation

A simple algorithm we can use to accomplish this is called feature permutation. The algorithm proceeds as follows:

  • Given a model, a dataset, and a feature F, compute the baseline model performance over the dataset.

  • For each sample in the dataset, randomly choose another sample from the dataset and swap the values of F between these samples.

  • The importance of feature F is the difference between the model performance over the permuted dataset and the baseline model performance.

Intuitively, the feature permutation algorithm allows us to understand the degree to which a particular feature's value contributes to the model's final decision. Let’s show an example to illustrate this.

Feature Permutation Case Study: Ensemble Models

Our email attack detection systems at Abnormal Security have multiple layers. Upstream machine learning models generate predictions based on a variety of message attributes, and an ensemble model makes the final decision. This ensemble is trained on the predictions of the individual upstream models, and we retrain it whenever we add or change one of these models. These changes can dramatically change the strategy that the ensemble uses to make decisions.

For example, suppose we have an ensemble that is trained on three upstream models A,B,C, where models B and C are substantially more important.

Graph of ensemble with three upstream models where B and C are most important

Now suppose that we add new features to model A and retrain the ensemble. There are multiple potential impacts of this change. One possibility is that all three models become equally important:

Graph of ensemble with three models of equal importance

Another possibility is that model A increases in importance at the expense of model B:

Graph of ensemble with three upstream models where A and C are most important

This is a likely outcome if the features we added to model A were also used in model B.

An Overview of Feature Group Permutation

One pitfall of feature permutation is that it doesn't play nicely with correlated features. If we have N features from different versions of the same model or M counts of very similar quantities, the sum of the importances of the individual counts can be an underestimate of the importance of the group of features. Luckily, we can easily get around this by using a similar algorithm, called feature group permutation. This algorithm computes the importance of a group of features rather than a single feature and proceeds as follows:

  • Given a model, a dataset, and a group of features G, compute the baseline model performance over the dataset.

  • For each sample in the dataset, randomly choose another sample from the dataset and swap the values of each feature F in G between these samples.

  • The importance of group G is the difference between the model performance over the permuted dataset and the baseline model performance.

By permuting features as a group, we can reduce the risk of correlated signals.

Feature Group Permutation Case Study: Suspicious Attachments

One of the models we use at Abnormal Security is our attack multi model, which consumes a wide range of feature types and predicts the likelihood that a particular email is an attack. One question we had recently about this model was how it determined that a particular message contained a malicious attachment. The model has access to a ton of different sources of information, so here are a few hypotheses about how it could make this decision:

  • Primarily focus on the sender and recipient information.

  • Primarily focus on the subject and header text in the email.

  • Primarily focus on the body text in the message.

  • Primarily focus on the signals in the attachment itself.

We can use feature group permutation to solve this. First, we arrange our features based on the type of data it is constructed from. Then, we pull all messages with attachments and compute the feature group importance over this dataset.

Attack Multi Model understands suspicious attachments: attachment features are least important

We see that the attachment features are only somewhat important, since the model is able to catch attacks from the other signals on the message.

However, this is not the complete story. There are certain types of messages for which the attachment is more central to the message due to the language used. On these messages, we would expect that the model would need to pay closer attention to the attachment in order to detect attacks. If we limit our dataset to the subset of messages that contain these attachment types and recompute the feature group importance, we see the importance of the attachment features increase:

Attack Multi Model understands suspicious attachments: attachment features are most important

Using Feature Group Permutation Effectively

Despite its simplicity, feature group permutation is an extremely powerful tool. Here are a few tips for using it effectively:

Choosing Feature Groups: If the information represented by the features in some group G is also represented by features outside of G, then it is possible that the feature group permutation algorithm will underestimate the importance of G. For this reason it is usually helpful to start with very large groups and then only break down groups with high importance.

Choosing the Dataset: Not all features are useful on all samples. Certain signals may be extremely predictive, but only present very rarely. For this reason it is usually helpful to choose the dataset based on the questions we have about the model behavior.

Choosing the Metric: The feature group permutation algorithm computes the importance of a group in terms of some metric of model performance. Certain metrics will capture different kinds of importance. For example, suppose we are studying a binary classification model. A feature that determines the overall calibration of the model will be attributed high importance by the cross-entropy loss metric but low importance by the ROC-AUC metric.

Feature Importances at Abnormal

At Abnormal Security we use feature importance analysis to understand our detection models. This helps us validate that new signals are useful or anticipate changes in model behavior. Understanding the feature importance distribution also enables us to anticipate which kinds of attacks might slip through our models so that we can prioritize feature development to improve our system.

Want to join our team to work on these problems? Abnormal is hiring! Check out our open roles to learn more.

Image

Prevent the Attacks That Matter Most

Get the Latest Email Security Insights

Subscribe to our newsletter to receive updates on the latest attacks and new trends in the email threat landscape.

0
Demo 2x 1

See the Abnormal Solution to the Email Security Problem

Protect your organization from the attacks that matter most with Abnormal Integrated Cloud Email Security.

Related Posts

B 10 3 22 Cobalt Terrapin Blog
Threat group Cobalt Terrapin uses sophisticated impersonation techniques with multiple steps to commit invoice fraud.
Read More
B 09 29 22 CISO Cybersecurity Awareness Month
October is here, which means Cybersecurity Awareness Month is officially in full swing! These five tips can help security leaders take full advantage of the month.
Read More
B Email Security Challenges Blog 09 26 22
Understanding common email security challenges caused by your legacy technology will help you determine the best solution to improve your security posture.
Read More
B 5 Crucial Tips
Retailers are a popular target for threat actors due to their wealth of customer data and availability of funds. Here are 5 cybersecurity tips to help retailers reduce their risk of attack.
Read More
B 3 Essential Elements
Legacy approaches to managing unwanted mail are neither practical nor scalable. Learn the 3 essential elements of modern, effective graymail management.
Read More
B Back to School
Discover how threat group Chiffon Herring leverages impersonation and spoofed email addresses to divert paychecks to mule accounts.
Read More
B 09 06 22 Rearchitecting a System Blog
We recently shared a look at how the Abnormal engineering team overhauled our Unwanted Mail service architecture to accommodate our rapid growth. Today, we’re diving into how the team migrated traffic to the new architecture—with zero downtime.
Read More
B Industry Leading CIS Os
Stay up to date on the latest cybersecurity trends, industry news, and best practices by following these 12 innovative and influential thought leaders on social media.
Read More
B Podcast Engineering 11 08 24 22
In episode 11 of Abnormal Engineering Stories, David Hagar, Director of Engineering and Abnormal Head of UK Engineering, continues his conversation with Zehan Wang, co-founder of Magic Pony.
Read More