Machine Learning at Abnormal
We are building the foundation of the next generation of email security, leveraging AI and cloud infrastructure to stop the most advanced socially-engineered attacks.
The Abnormal Engineering Blog
Using Machine Learning for Precise Email Security
Unique Adversarial Learning
Attackers are constantly changing, inventing new tactics to outsmart security technology and better trick victims. Unlike most machine learning problems, this problem is adversarial. It is a cat-and-mouse game between our ML models and the attackers.
We cannot just train a model on a fixed dataset and expect it to keep catching every attack. Instead, we must build a platform that is constantly learning and adapting. It must learn from the data, but also be flexible so our team of engineers and security analysts can add new features, new models, and new approaches to stay ahead of ever-changing threats.
Extremely High Precision and Recall
We must maintain an extremely low false positive rate, because every false positive means deleting a legitimate email. At the same time, we must have extremely high recall, because every miss lets an attack through.
We don’t care about the average case. We must care about the decision on every single email. Out of 100 million emails, maybe 10,000 are phishing attacks, and fewer than 10 are advanced invoice fraud attacks. This low base rate poses big challenges for designing classifiers.
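To make the base-rate problem concrete, here is a small worked sketch. The email counts come from the paragraph above; the recall and false positive rate are hypothetical numbers chosen only for illustration, not figures from our production systems.

```python
# Counts taken from the text; the error rates passed in below are hypothetical.
TOTAL_EMAILS = 100_000_000
PHISHING = 10_000
INVOICE_FRAUD = 10  # the text says "fewer than 10"; 10 is used as an upper bound


def base_rate(positives: int, total: int) -> float:
    """Fraction of all emails that belong to the attack class."""
    return positives / total


def expected_precision(positives: int, total: int, recall: float, fpr: float) -> float:
    """Expected precision of a classifier with the given recall and false positive rate."""
    true_pos = positives * recall
    false_pos = (total - positives) * fpr
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    for name, count in [("phishing", PHISHING), ("invoice fraud", INVOICE_FRAUD)]:
        rate = base_rate(count, TOTAL_EMAILS)
        # Assumed operating point: 99% recall, 0.01% false positive rate.
        prec = expected_precision(count, TOTAL_EMAILS, recall=0.99, fpr=1e-4)
        print(f"{name}: base rate {rate:.5%}, expected precision {prec:.2%}")
```

Under these assumed rates, a classifier that is wrong on only one legitimate email in 10,000 would still be right about only half of its phishing flags, and about one in a thousand of its invoice fraud flags. That is why the false positive rate has to be pushed orders of magnitude lower than in most classification problems.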
Scaled Infrastructure and Machine Learning
Processing hundreds of millions of emails per day requires sophisticated use of distributed data processing and scalable fleets of microservices. We have to build the best technology and serve it at scale and at low latency, all while maintaining the flexibility to keep up with attackers.
Our products evaluate messages and sign-ins in real time, operating at high throughput (more than 1M queries per second) and low latency (under 0.1 second). Our cloud infrastructure must reliably support our Fortune 500 customers, even as we are rapidly scaling.
Beyond Email Attack Detection
Our core business is focused on detecting attacks delivered via email, but we are actually protecting organizations in multiple ways. We develop products to:
- Detect advanced email attacks like phishing and business email compromise
- Detect compromised accounts by finding anomalies in sign-in behavior
- Detect sophisticated invoice fraud schemes by understanding invoice and vendor relationships
- Detect account takeovers in multiple cloud products beyond email accounts
And more! Join us to find out what we’re doing next.
Are You Ready to Become an Abnormal Engineer?
If you’d like to solve some of the hardest problems in email security, Abnormal is the place.
We encourage you to think through these exercises to understand the type of work we do.
- How do you tune a classifier to detect events at a base rate of 0.01% to 0.00001%? What happens when the distribution of your evaluation set does not match the online distribution because the attack landscape is changing? How would you maintain classification performance on new types of attacks?
- What effect does adversarial text obfuscation have on modern NLP techniques? For example, how are embeddings affected by purposeful misspelling or hidden spacing characters inserted into words? What effect would this have on transformer-style models? What about simpler text understanding like bag-of-words or phrase matching?
- We use unusual communication patterns as features in our models, for example, how often this sender has communicated with this recipient. What happens when attackers purposefully game this feature by sending legitimate messages over the course of weeks or months?
- An email has many modes of data in it, including text content, headers, links, images, attachments, landing page content, and more. How do you incorporate all these signals simultaneously into a model?
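As a starting point for the text obfuscation exercise, here is a minimal sketch of how a hidden spacing character defeats naive bag-of-words matching. The function names and the sample sentence are hypothetical, and the normalization step is one possible mitigation among many, not a description of our production pipeline.

```python
import unicodedata


def bag_of_words(text: str) -> set[str]:
    """Naive bag-of-words: lowercase, then split on whitespace."""
    return set(text.lower().split())


def strip_invisible(text: str) -> str:
    """Remove Unicode format characters (general category Cf).

    This category covers zero-width spaces, zero-width joiners, and soft hyphens.
    It is a blunt filter: it also removes legitimate uses such as the joiners
    inside some emoji sequences and bidirectional text marks.
    """
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    # A zero-width space (U+200B) is hidden inside the word "invoice".
    obfuscated = "Please pay the in\u200bvoice today"

    print("invoice" in bag_of_words(obfuscated))                   # keyword is missed
    print("invoice" in bag_of_words(strip_invisible(obfuscated)))  # keyword is found
```

The hidden character is not whitespace, so the tokenizer keeps it inside the token and the keyword no longer matches. Stripping format characters recovers this case, but it does nothing for purposeful misspellings or look-alike characters, which is where the rest of the exercise begins.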