Product FAQ: AI / ML Data Set - Abnormal Security

Abnormal’s AI/ML Data Training Models

At Abnormal, we stay ahead of attackers by updating our AI/ML models with the most up-to-date information possible. Both automated systems and security researchers track the latest attacks, and the data they gather is consumed by a rapidly retraining NLP pipeline.

Our data set is large (many terabytes) and multimodal. The data we evaluate includes:

  • Text of the email
  • Metadata and headers
  • History of communication for the parties involved, geolocations, IP addresses, etc.
  • Account sign-ins, mail filters, browsers used
  • Content of all attachments
  • Content of all links and the landing pages those links lead to
  • …and more

We turn all this data into useful features for a detection system and break down attacks into what we call “attack facets”.
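As a rough illustration of what "turning data into features" can mean, here is a minimal sketch that flattens one email plus a communication history into numeric features. It is not Abnormal's pipeline: the `Email` record, its field names, the `history` dictionary of (sender, recipient) message counts, and every feature name are hypothetical and chosen only to mirror the data categories listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    # Hypothetical, heavily simplified email record; field names are illustrative.
    sender_address: str
    display_name: str
    recipient: str
    body: str
    links: list = field(default_factory=list)
    attachments: list = field(default_factory=list)


def alnum(text: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def extract_features(email: Email, history: dict) -> dict:
    """Flatten one email plus (sender, recipient) message counts into numeric features."""
    sender_domain = email.sender_address.rsplit("@", 1)[-1]
    prior = history.get((email.sender_address.lower(), email.recipient.lower()), 0)
    name = alnum(email.display_name)
    return {
        # Content-derived signals.
        "num_links": float(len(email.links)),
        "num_attachments": float(len(email.attachments)),
        "mentions_password": float("password" in email.body.lower()),
        # Behavioral signal: has this sender ever written to this recipient?
        "first_contact": float(prior == 0),
        # Header signal: the display name does not appear in the sending domain.
        "display_name_not_in_domain": float(bool(name) and name not in alnum(sender_domain)),
    }


suspicious = Email(
    sender_address="alerts@m1crosoft-support.com",
    display_name="Microsoft",
    recipient="alice@example.com",
    body="Your password expires today. Reset it now.",
    links=["https://m1crosoft-support.com/login"],
)
print(extract_features(suspicious, history={}))
```

A real system would use far richer signals (attachment contents, landing pages, sign-in data), but the shape is the same: heterogeneous inputs in, a fixed set of numeric features out.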

Attack Facets:

  • Attack Goal – What is the attacker trying to accomplish? Steal money? Steal credentials? Etc.
  • Impersonation Strategy – How is the attacker building credibility with the recipient? Are they impersonating someone? Are they sending from a compromised account?
  • Impersonated Party – Who is being impersonated? A trusted brand? A known vendor? The CEO of a company?
  • Payload Vector – How is the attack actually delivered? A link? An attachment?

For example, breaking down a phishing email that imitates a Microsoft password reset gives:

  • Attack goal: Steal a user’s credentials
  • Impersonation strategy: Impersonate a brand through a lookalike display name (Microsoft)
  • Impersonated party: The official Microsoft brand
  • Payload vector: A link to a fake login page.
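One way to picture the facet breakdown is as a small structured record attached to each analyzed attack. The sketch below is an assumption about how such a record could look, not Abnormal's schema: the class and enum names are hypothetical, and the enum members cover only the examples this FAQ mentions.

```python
from dataclasses import dataclass
from enum import Enum


class AttackGoal(Enum):
    STEAL_MONEY = "steal money"
    STEAL_CREDENTIALS = "steal credentials"


class ImpersonationStrategy(Enum):
    LOOKALIKE_DISPLAY_NAME = "lookalike display name"
    COMPROMISED_ACCOUNT = "compromised account"


class PayloadVector(Enum):
    LINK = "link"
    ATTACHMENT = "attachment"


@dataclass(frozen=True)
class AttackFacets:
    goal: AttackGoal
    impersonation_strategy: ImpersonationStrategy
    impersonated_party: str  # free text: a brand, a known vendor, an executive, ...
    payload_vector: PayloadVector


# The Microsoft password reset example, expressed as facets.
microsoft_reset = AttackFacets(
    goal=AttackGoal.STEAL_CREDENTIALS,
    impersonation_strategy=ImpersonationStrategy.LOOKALIKE_DISPLAY_NAME,
    impersonated_party="Microsoft",
    payload_vector=PayloadVector.LINK,
)
print(microsoft_reset.goal.value)  # steal credentials
```

Keeping each facet as a separate field lets each one be modeled, labeled, and evaluated as its own sub-problem.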

Building ML models for a problem with such a low base rate (1 in 10,000,000,000 emails constitutes an advanced email threat) and such strict precision requirements forces a high degree of diligence when modeling sub-problems and engineering features.
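Bayes' rule makes the difficulty concrete. The sketch below plugs the base rate quoted above into the standard precision formula; the recall of 1.0 and the false positive rates are illustrative assumptions, not measured Abnormal figures.

```python
def precision(base_rate: float, recall: float, false_positive_rate: float) -> float:
    """Share of flagged emails that are real attacks (Bayes' rule)."""
    true_pos = recall * base_rate
    false_pos = false_positive_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


def max_fpr_for_precision(base_rate: float, recall: float, target_precision: float) -> float:
    """Largest false positive rate that still reaches target_precision."""
    return recall * base_rate * (1.0 - target_precision) / (target_precision * (1.0 - base_rate))


BASE_RATE = 1e-10  # 1 in 10,000,000,000, the figure quoted above

# Even a one-in-a-million false positive rate leaves roughly 1 real attack per 10,000 alerts.
print(f"{precision(BASE_RATE, recall=1.0, false_positive_rate=1e-6):.2e}")  # 1.00e-04

# For half of all alerts to be real attacks, the FPR must be about as small as the base rate.
print(f"{max_fpr_for_precision(BASE_RATE, recall=1.0, target_precision=0.5):.2e}")  # 1.00e-10
```

In other words, at this base rate the false positive rate, not recall, dominates whether alerts are useful.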

Just as we break an attack into components, we can use a similar breakdown to decide what information to model about an email in order to determine whether it is an attack.

  • Behavior modeling: identifying abnormal behavior by modeling normal communication patterns and flagging outliers from them
  • Content modeling: understanding the content of an email
  • Identity resolution: matching individuals and organizations referenced in an email (possibly in an obfuscated way) to a database of known entities
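Two of these sub-problems can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not Abnormal's implementation: the communication history is a plain `Counter` of (sender, recipient) message counts, `HOMOGLYPHS` and the 0.85 threshold are hypothetical, and `difflib.SequenceMatcher` stands in for whatever matching a production system actually uses. Content modeling is not shown.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical table undoing a few common character substitutions (0->o, 1->i, ...).
HOMOGLYPHS = str.maketrans({"0": "o", "1": "i", "3": "e", "5": "s", "@": "a", "$": "s"})


def sender_share(history: Counter, sender: str, recipient: str) -> float:
    """Behavior modeling: share of the recipient's past inbound mail sent by this sender.

    0.0 means the pair has never been seen, an outlier relative to normal traffic.
    """
    total = sum(n for (_, r), n in history.items() if r == recipient)
    return history[(sender, recipient)] / total if total else 0.0


def normalize(name: str) -> str:
    """Lowercase, undo simple substitutions, and keep letters only."""
    return "".join(ch for ch in name.lower().translate(HOMOGLYPHS) if ch.isalpha())


def resolve_identity(display_name: str, known_entities: list, threshold: float = 0.85):
    """Identity resolution: best fuzzy match among known entities, or None."""
    target = normalize(display_name)
    best, best_score = None, 0.0
    for entity in known_entities:
        score = SequenceMatcher(None, target, normalize(entity)).ratio()
        if score > best_score:
            best, best_score = entity, score
    return best if best_score >= threshold else None


history = Counter({("ceo@corp.com", "alice@corp.com"): 40, ("hr@corp.com", "alice@corp.com"): 10})
print(sender_share(history, "ceo@corp.com", "alice@corp.com"))  # 0.8
print(sender_share(history, "ceo@c0rp-mail.com", "alice@corp.com"))  # 0.0
print(resolve_identity("Micr0s0ft", ["Microsoft", "PayPal", "DocuSign"]))  # Microsoft
```

A first-contact sender whose obfuscated display name resolves to a known brand is exactly the combination the Microsoft example above exhibits.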

To continue learning about Abnormal’s AI/ML Data Training capabilities, read our Engineering Blog.

Want to learn more?

Schedule a personalized product demo to see:

  • Threat analytics, insights and reporting
  • Automated Triage, Investigation and response tools
  • Platform integrations into SIEM, SOAR
  • …and more