Model Understanding with Feature Importance

Dr. Dan Shiebler

March 16, 2022

Here at Abnormal, our machine learning models help us spot trends and abnormalities in customer data in order to catch and prevent cyberattacks. Maintaining and improving these models requires a deep understanding of how they operate and make decisions. One way to build this understanding is to analyze how each model uses the features we feed it.

An Overview of Feature Permutation

A simple algorithm we can use to accomplish this is called feature permutation. The algorithm proceeds as follows:

1. Given a model, a dataset, and a feature F, compute the baseline model performance over the dataset.
2. For each sample in the dataset, randomly choose another sample from the dataset and swap the values of F between these samples.
3. The importance of feature F is the difference between the model performance over the permuted dataset and the baseline model performance.
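
To make these steps concrete, here is a minimal Python sketch, not our production implementation. It makes a few assumptions that are not part of the description above: a model is any callable that maps a feature dictionary to a label, performance is measured as accuracy, the per-sample swap is replaced by a full shuffle of the feature's values, and importance is reported as baseline minus permuted performance so that larger values mean more important features. The names permutation_importance, toy_model, signal, and noise are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature,
                           metric=accuracy, n_repeats=10, seed=0):
    """Average drop in `metric` when `feature` is shuffled across rows.

    Assumes higher metric values are better, so a positive result means
    the model got worse once the feature was scrambled.
    """
    rng = random.Random(seed)
    baseline = metric(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        # Shuffle only this feature's values; every other feature stays put.
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        permuted = [{**r, feature: v} for r, v in zip(rows, values)]
        drops.append(baseline - metric(model, permuted, labels))
    return sum(drops) / len(drops)


# Toy data: the label depends on "signal" only; "noise" is irrelevant.
data_rng = random.Random(1)
rows = [{"signal": data_rng.random(), "noise": data_rng.random()}
        for _ in range(500)]
labels = [int(r["signal"] > 0.5) for r in rows]


def toy_model(row):
    """Stand-in for a trained model: thresholds the signal feature."""
    return int(row["signal"] > 0.5)


signal_importance = permutation_importance(toy_model, rows, labels, "signal")
noise_importance = permutation_importance(toy_model, rows, labels, "noise")
print(signal_importance, noise_importance)
```

In this toy setup the model ignores noise entirely, so its importance is exactly zero, while shuffling signal costs roughly half of the accuracy.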

Intuitively, the feature permutation algorithm allows us to understand the degree to which a particular feature's value contributes to the model's final decision. Let’s show an example to illustrate this.

Feature Permutation Case Study: Ensemble Models

Our email attack detection systems at Abnormal Security have multiple layers. Upstream machine learning models generate predictions based on a variety of message attributes, and an ensemble model makes the final decision. This ensemble is trained on the predictions of the individual upstream models, and we retrain it whenever we add or change one of these models. Such updates can dramatically alter the strategy that the ensemble uses to make decisions.

For example, suppose we have an ensemble that is trained on three upstream models, A, B, and C, where models B and C are substantially more important than model A.

Figure: Graph of ensemble with three upstream models where B and C are most important

Now suppose that we add new features to model A and retrain the ensemble. There are multiple potential impacts of this change. One possibility is that all three models become equally important:

Figure: Graph of ensemble with three models of equal importance

Another possibility is that model A increases in importance at the expense of model B:

Figure: Graph of ensemble with three upstream models where A and C are most important

This is a likely outcome if the features we added to model A were also used in model B.

An Overview of Feature Group Permutation

One pitfall of feature permutation is that it doesn't play nicely with correlated features. If we have N features from different versions of the same model or M counts of very similar quantities, permuting any one of them leaves the model free to fall back on the others, so the sum of the individual feature importances can underestimate the importance of the group as a whole. Luckily, we can easily get around this by using a similar algorithm, called feature group permutation. This algorithm computes the importance of a group of features rather than a single feature and proceeds as follows:

1. Given a model, a dataset, and a group of features G, compute the baseline model performance over the dataset.
2. For each sample in the dataset, randomly choose another sample from the dataset and swap the values of each feature F in G between these samples.
3. The importance of group G is the difference between the model performance over the permuted dataset and the baseline model performance.

By permuting correlated features together as a group, we reduce the risk that they mask each other's importance.
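
The sketch below illustrates the group version of the algorithm and the pitfall it addresses. It is a toy under stated assumptions, not our production code: the model is a hypothetical majority vote over three perfectly correlated features (v1, v2, v3, standing in for three versions of the same upstream score), performance is accuracy, and importance is reported as baseline minus permuted performance. Each row receives all of the group's values from a single randomly chosen donor row, so the features in the group stay consistent with each other.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def group_permutation_importance(model, rows, labels, group,
                                 metric=accuracy, n_repeats=10, seed=0):
    """Average drop in `metric` when all features in `group` move together."""
    rng = random.Random(seed)
    baseline = metric(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        # One shuffled donor index per row, shared by every feature in the group.
        donors = list(range(len(rows)))
        rng.shuffle(donors)
        permuted = [
            {**row, **{f: rows[d][f] for f in group}}
            for row, d in zip(rows, donors)
        ]
        drops.append(baseline - metric(model, permuted, labels))
    return sum(drops) / len(drops)


# Toy data: three identical copies of one underlying score.
data_rng = random.Random(1)
scores = [data_rng.random() for _ in range(500)]
rows = [{"v1": s, "v2": s, "v3": s} for s in scores]
labels = [int(s > 0.5) for s in scores]


def majority_vote(row):
    """Hypothetical model: predicts 1 if at least two versions exceed 0.5."""
    votes = sum(row[f] > 0.5 for f in ("v1", "v2", "v3"))
    return int(votes >= 2)


single = {
    f: group_permutation_importance(majority_vote, rows, labels, [f])
    for f in ("v1", "v2", "v3")
}
grouped = group_permutation_importance(
    majority_vote, rows, labels, ["v1", "v2", "v3"]
)
print(single, grouped)
```

Shuffling any single version leaves the majority vote unchanged, so each individual importance is zero, yet shuffling the three together removes the signal entirely and costs roughly half of the accuracy.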

Feature Group Permutation Case Study: Suspicious Attachments

One of the models we use at Abnormal Security is our attack multi model, which consumes a wide range of feature types and predicts the likelihood that a particular email is an attack. One question we had recently about this model was how it determined that a particular message contained a malicious attachment. The model has access to a ton of different sources of information, so here are a few hypotheses about how it could make this decision:

Primarily focus on the sender and recipient information.
Primarily focus on the subject and header text in the email.
Primarily focus on the body text in the message.
Primarily focus on the signals in the attachment itself.

We can use feature group permutation to answer this question. First, we group our features by the type of data they are constructed from. Then, we pull all messages with attachments and compute the feature group importance over this dataset.

Figure: Attack Multi Model understands suspicious attachments: attachment features are least important

We see that the attachment features are only somewhat important, since the model is able to catch attacks from the other signals on the message.

However, this is not the complete story. There are certain types of messages for which the attachment is more central to the message due to the language used. On these messages, we would expect that the model would need to pay closer attention to the attachment in order to detect attacks. If we limit our dataset to the subset of messages that contain these attachment types and recompute the feature group importance, we see the importance of the attachment features increase:

Figure: Attack Multi Model understands suspicious attachments: attachment features are most important

Using Feature Group Permutation Effectively

Despite its simplicity, feature group permutation is an extremely powerful tool. Here are a few tips for using it effectively:

Choosing Feature Groups: If the information represented by the features in some group G is also represented by features outside of G, then it is possible that the feature group permutation algorithm will underestimate the importance of G. For this reason it is usually helpful to start with very large groups and then only break down groups with high importance.
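
One way to apply this tip is a top-down drill-down over a hierarchy of groups. The sketch below is only an illustration: the function drill_down, the nested group layout, the feature names, and the stand-in scoring function fake_importance are all hypothetical, and in practice importance_fn would run feature group permutation on the features it is given.

```python
def drill_down(groups, importance_fn, threshold):
    """Score large groups first and expand only those above `threshold`.

    `groups` is a nested dict: a leaf maps a group name to a list of
    feature names, an inner node maps a group name to a dict of subgroups.
    Returns {group_path: importance} for every group that was scored.
    """
    def features_of(node):
        if isinstance(node, list):
            return list(node)
        return [f for child in node.values() for f in features_of(child)]

    report = {}

    def visit(path, node):
        score = importance_fn(features_of(node))
        report[path] = score
        # Only break a group apart if it matters and has subgroups.
        if isinstance(node, dict) and score > threshold:
            for child_name, child in node.items():
                visit(f"{path}/{child_name}", child)

    for name, node in groups.items():
        visit(name, node)
    return report


# Hypothetical hierarchy and a fake scorer that favors attachment features.
groups = {
    "attachment": {
        "file_type": ["att_ext", "att_mime"],
        "content": ["att_macro"],
    },
    "sender": {
        "domain": ["snd_domain_age"],
        "history": ["snd_prior_msgs"],
    },
}


def fake_importance(features):
    """Stand-in for a real group permutation run."""
    return sum(0.2 for f in features if f.startswith("att_"))


report = drill_down(groups, fake_importance, threshold=0.1)
print(sorted(report))
```

Here the sender group scores below the threshold and is never split, so no permutation runs are spent on its subgroups.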

Choosing the Dataset: Not all features are useful on all samples. Certain signals may be extremely predictive, but only present very rarely. For this reason it is usually helpful to choose the dataset based on the questions we have about the model behavior.

Choosing the Metric: The feature group permutation algorithm computes the importance of a group in terms of some metric of model performance. Different metrics capture different kinds of importance. For example, suppose we are studying a binary classification model. A feature that determines the overall calibration of the model will be attributed high importance by the cross-entropy loss metric but low importance by the ROC-AUC metric, because ROC-AUC depends only on how the model ranks samples and not on the predicted probabilities themselves.
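
The following sketch shows why the two metrics can disagree. It does not run a permutation; it applies an order-preserving distortion to a handful of made-up scores, which is the kind of change a calibration-only feature would cause. The labels, the scores, and the distortion (tripling each score's log-odds) are assumptions chosen for illustration.

```python
import math


def log_loss(labels, probs):
    """Mean binary cross-entropy; lower is better."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def roc_auc(labels, probs):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores.
labels = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

# Triple each score's log-odds: the ranking is unchanged, but every
# prediction becomes more extreme (overconfident).
distorted = [p**3 / (p**3 + (1 - p) ** 3) for p in probs]

auc_before, auc_after = roc_auc(labels, probs), roc_auc(labels, distorted)
loss_before, loss_after = log_loss(labels, probs), log_loss(labels, distorted)
print(auc_before, auc_after)
print(loss_before, loss_after)
```

The ROC-AUC is identical before and after the distortion, while the cross-entropy loss gets worse, so only the cross-entropy metric would register a change of this kind as important.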

Feature Importances at Abnormal

At Abnormal Security, we use feature importance analysis to understand our detection models. This helps us validate that new signals are useful and anticipate changes in model behavior. Understanding the feature importance distribution also enables us to anticipate which kinds of attacks might slip through our models so that we can prioritize feature development to improve our system.

Want to join our team to work on these problems? Abnormal is hiring! Check out our open roles to learn more.
