Model Understanding with Feature Importance

March 16, 2022

Here at Abnormal, our machine learning models help us spot trends and abnormalities in customer data in order to catch and prevent cyberattacks. Maintaining and improving these models requires a deep understanding of how they operate and make decisions. One way to build this understanding is to analyze how each model uses the features we feed it.

An Overview of Feature Permutation

A simple algorithm we can use to accomplish this is called feature permutation. The algorithm proceeds as follows:

  • Given a model, a dataset, and a feature F, compute the baseline model performance over the dataset.

  • For each sample in the dataset, randomly choose another sample from the dataset and swap the values of F between these samples.

  • The importance of feature F is the difference between the model performance over the permuted dataset and the baseline model performance.
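
As a minimal sketch, this might be implemented as follows, assuming a scikit-learn-style binary classifier with a predict_proba method, a pandas DataFrame of features, and ROC-AUC as the performance metric (all names here are illustrative, not Abnormal's actual code):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def feature_importance(model, X: pd.DataFrame, y: np.ndarray, feature: str) -> float:
    """Importance of `feature`: the drop in ROC-AUC after permuting its values."""
    baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])

    # Swapping the feature's value between random pairs of samples
    # amounts to randomly permuting that column.
    X_permuted = X.copy()
    X_permuted[feature] = np.random.permutation(X_permuted[feature].values)

    permuted = roc_auc_score(y, model.predict_proba(X_permuted)[:, 1])

    # For a higher-is-better metric like ROC-AUC, an important feature
    # produces a large drop, so we report baseline minus permuted.
    return baseline - permuted
```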

Intuitively, the feature permutation algorithm tells us the degree to which a particular feature's value contributes to the model's final decision. Let's walk through an example to illustrate this.

Feature Permutation Case Study: Ensemble Models

Our email attack detection systems at Abnormal Security have multiple layers. Upstream machine learning models generate predictions based on a variety of message attributes, and an ensemble model makes the final decision. This ensemble is trained on the predictions of the individual upstream models, and we retrain it whenever we add or change one of these models. These changes can dramatically alter the strategy that the ensemble uses to make decisions.

For example, suppose we have an ensemble that is trained on three upstream models A, B, and C, where models B and C are substantially more important.

[Figure: Ensemble with three upstream models, where B and C are most important]

Now suppose that we add new features to model A and retrain the ensemble. There are multiple potential impacts of this change. One possibility is that all three models become equally important:

[Figure: Ensemble with three upstream models of equal importance]

Another possibility is that model A increases in importance at the expense of model B:

[Figure: Ensemble with three upstream models, where A and C are most important]

This is a likely outcome if the features we added to model A were also used in model B.
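
To make this concrete, a hypothetical measurement of upstream model importance could reuse the feature_importance sketch above, treating each upstream model's prediction as a feature of the ensemble. This assumes a trained ensemble and a held-out evaluation set (ensemble, X_eval, y_eval); the column names are invented:

```python
# Hypothetical: the ensemble's input features are the upstream models' scores.
upstream_scores = ["model_a_score", "model_b_score", "model_c_score"]

for feature in upstream_scores:
    importance = feature_importance(ensemble, X_eval, y_eval, feature)
    print(f"{feature}: {importance:.4f}")
```

Comparing these importances before and after retraining makes shifts like the ones above easy to spot.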

An Overview of Feature Group Permutation

One pitfall of feature permutation is that it doesn't play nicely with correlated features. If we have N features from different versions of the same model, or several counts of very similar quantities, the sum of the individual feature importances can underestimate the importance of the group as a whole: when one feature is permuted, the model can recover much of the same information from its correlated neighbors. Luckily, we can easily get around this by using a similar algorithm, called feature group permutation. This algorithm computes the importance of a group of features rather than a single feature and proceeds as follows:

  • Given a model, a dataset, and a group of features G, compute the baseline model performance over the dataset.

  • For each sample in the dataset, randomly choose another sample from the dataset and swap the values of each feature F in G between these samples.

  • The importance of group G is the difference between the model performance over the permuted dataset and the baseline model performance.

By permuting the features as a group, we reduce the risk that correlated signals mask each other's importance.
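
Extending the earlier sketch, the group version only changes the permutation step: one shared row shuffle is applied to every feature in the group, so the group's values travel together (again assuming a scikit-learn-style model and ROC-AUC; names are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def group_importance(model, X: pd.DataFrame, y: np.ndarray, group: list) -> float:
    """Importance of a feature group: the drop in ROC-AUC after permuting it jointly."""
    baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])

    # A single shared permutation of the rows moves all features in the
    # group together, preserving correlations within the group while
    # breaking their relationship to the rest of each sample.
    X_permuted = X.copy()
    indices = np.random.permutation(len(X_permuted))
    X_permuted[group] = X_permuted[group].values[indices]

    permuted = roc_auc_score(y, model.predict_proba(X_permuted)[:, 1])
    return baseline - permuted
```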

Feature Group Permutation Case Study: Suspicious Attachments

One of the models we use at Abnormal Security is our Attack Multi Model, which consumes a wide range of feature types and predicts the likelihood that a particular email is an attack. One question we had recently about this model was how it determined that a particular message contained a malicious attachment. The model has access to a ton of different sources of information, so here are a few hypotheses about how it could make this decision:

  • Primarily focus on the sender and recipient information.

  • Primarily focus on the subject and header text in the email.

  • Primarily focus on the body text in the message.

  • Primarily focus on the signals in the attachment itself.

We can use feature group permutation to answer this question. First, we arrange our features into groups based on the type of data they are constructed from. Then, we pull all messages with attachments and compute the feature group importance over this dataset.
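
A hypothetical sketch of this analysis, reusing the group_importance function above and assuming a trained model and evaluation data (model, X_eval, y_eval); the group and column names are invented, since Abnormal's actual features are not public:

```python
# Hypothetical feature groups, keyed by the data each is derived from.
feature_groups = {
    "sender_recipient": ["sender_domain_age_days", "recipient_count"],
    "subject_header": ["subject_urgency_score", "header_anomaly_score"],
    "body_text": ["body_phishing_score", "body_link_count"],
    "attachment": ["attachment_type_risk", "attachment_content_score"],
}

# Restrict the dataset to messages with attachments before scoring.
has_attachment = X_eval["attachment_count"] > 0  # invented column name
X_subset, y_subset = X_eval[has_attachment], y_eval[has_attachment]

for name, group in feature_groups.items():
    print(f"{name}: {group_importance(model, X_subset, y_subset, group):.4f}")
```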

[Figure: Feature group importances for the Attack Multi Model over all messages with attachments; attachment features are least important]

We see that the attachment features are only somewhat important, since the model is able to catch attacks using the other signals on the message.

However, this is not the complete story. There are certain types of messages for which the attachment is more central to the message due to the language used. On these messages, we would expect that the model would need to pay closer attention to the attachment in order to detect attacks. If we limit our dataset to the subset of messages that contain these attachment types and recompute the feature group importance, we see the importance of the attachment features increase:

[Figure: Feature group importances over the restricted dataset; attachment features are most important]

Using Feature Group Permutation Effectively

Despite its simplicity, feature group permutation is an extremely powerful tool. Here are a few tips for using it effectively:

Choosing Feature Groups: If the information represented by the features in some group G is also represented by features outside of G, then it is possible that the feature group permutation algorithm will underestimate the importance of G. For this reason it is usually helpful to start with very large groups and then only break down groups with high importance.

Choosing the Dataset: Not all features are useful on all samples. Certain signals may be extremely predictive, but only present very rarely. For this reason it is usually helpful to choose the dataset based on the questions we have about the model behavior.

Choosing the Metric: The feature group permutation algorithm computes the importance of a group in terms of some metric of model performance. Different metrics capture different kinds of importance. For example, suppose we are studying a binary classification model. A feature that determines the overall calibration of the model will be attributed high importance by the cross-entropy loss metric but low importance by the ROC-AUC metric, since ROC-AUC depends only on the ranking of the model's scores, not their absolute values.
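
A toy illustration of this last point (the labels and scores below are made up): applying a monotone transform to a model's scores damages its calibration, which cross-entropy penalizes, but leaves the ranking, and therefore ROC-AUC, unchanged:

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.10, 0.30, 0.80, 0.70, 0.90, 0.20])

# Squaring is monotone on (0, 1): the ranking of samples is preserved,
# but every probability shrinks, so the model becomes miscalibrated.
miscalibrated = scores ** 2

print(log_loss(y_true, scores), log_loss(y_true, miscalibrated))            # differs
print(roc_auc_score(y_true, scores), roc_auc_score(y_true, miscalibrated))  # identical
```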

Feature Importances at Abnormal

At Abnormal Security, we use feature importance analysis to understand our detection models. This helps us validate that new signals are useful and anticipate changes in model behavior. Understanding the feature importance distribution also lets us identify which kinds of attacks might slip through our models, so that we can prioritize feature development to improve our system.

Want to join our team to work on these problems? Abnormal is hiring! Check out our open roles to learn more.
