Machine Learning at Abnormal

We are building the foundation of the next generation of email security, leveraging AI and cloud infrastructure to stop the most advanced socially-engineered attacks.

The Stakes Are High in Email Security

Over 90% of cyber attacks are initiated through email. Attackers are usually financially motivated but may also be involved in nation-state level espionage and sabotage operations.

Think about the attacks in the news recently. An oil pipeline was shut down and forced to pay a $5M ransom, affecting the entire economy of the eastern United States. A hospital system was shut down by ransomware initiated via email. A government agency was impersonated in an attack that sent malware to thousands. And for each of these attacks in the news, there are hundreds of other successful attacks against organizations, costing them billions of dollars per year.

The Abnormal Engineering Blog

Building this whole machine to work at high scale is an incredibly challenging software engineering and machine learning problem. Read about some of these projects and challenges on our blog.
Sophisticated social engineering email attacks are on the rise and getting more advanced every day. They prey on the trust we put in our business tools and social networks, especially when a message appears to be from someone on our contact list, or even...
Developing a machine learning product for cybersecurity comes with unique challenges. For a bit of background, Abnormal Security’s products prevent email attacks—think credential phishing, business email compromise, and malware—and also...
On October 21st, 2020, just two weeks before the US general election, many voters in Florida received threatening emails purportedly from the “Proud Boys.” These attacks often included some personal information like an address or phone number, threatened violence...
Jesh Bratman, a founding member at Abnormal Security and Head of Machine Learning, was just featured on The Tech Trek’s podcast. Jesh deep-dives into his past, building ML systems to detect abusive behavior at Twitter, and how he used this background to transition...

Using Machine Learning for Precise Email Security

Unique Adversarial Learning

Attackers are constantly changing, inventing new tactics to outsmart security technology and better trick victims. Unlike most machine learning problems, this problem is adversarial. It is a cat-and-mouse game between our ML models and the attackers.

We cannot just train a model on a dataset and expect it to keep catching all attacks. Instead, we must build a platform that is constantly learning and adapting. It must learn from the data, but also be flexible so our team of engineers and security analysts can add new features, new models, and new approaches to stay ahead of ever-changing threats.

Extremely High Precision and Recall

We must maintain an extremely low false positive rate, because every false positive means deleting a legitimate email. At the same time, we must have extremely high recall, because we don’t want to let attacks through.

We don’t care only about the average case. We must care about the decision on every single email. Out of 100 million emails, maybe 10,000 are phishing attacks, and fewer than 10 are advanced invoice fraud attacks. This low base rate poses big challenges for designing classifiers.
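
To make the base-rate problem concrete, here is a minimal sketch that computes the precision implied by a given recall and false positive rate. The email counts come from the paragraph above; the 99% recall and 0.01% false positive rate are illustrative assumptions, not figures from Abnormal, and `expected_precision` is a hypothetical helper name.

```python
def expected_precision(n_total, n_attacks, recall, false_positive_rate):
    """Precision implied by a recall and false positive rate at a given base rate."""
    n_legit = n_total - n_attacks
    true_positives = recall * n_attacks
    false_positives = false_positive_rate * n_legit
    return true_positives / (true_positives + false_positives)


# Counts from the text: 100 million emails, 10,000 phishing, about 10 invoice fraud.
# Recall and false positive rate are assumed values chosen only for illustration.
phishing = expected_precision(100_000_000, 10_000, recall=0.99, false_positive_rate=1e-4)
invoice_fraud = expected_precision(100_000_000, 10, recall=0.99, false_positive_rate=1e-4)

print(f"phishing precision: {phishing:.3f}")
print(f"invoice fraud precision: {invoice_fraud:.5f}")
```

Under these assumed rates, roughly half of the messages flagged as phishing would be legitimate, and for invoice fraud only about one flag in a thousand would be a real attack. A false positive rate that sounds tiny is still far too high at these base rates.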

Scaled Infrastructure and Machine Learning

Processing hundreds of millions of emails per day requires sophisticated use of distributed data processing and scalable fleets of microservices. We have to build the best technology, and we need to serve it at scale at low latency, all while maintaining the flexibility to keep up with attackers.

Our products evaluate messages and sign-ins in real time, operating at high throughput (more than 1M queries per second) and low latency (under 0.1 seconds). Our cloud infrastructure must reliably support our Fortune 500 customers, even as we are rapidly scaling.

Beyond Email Attack Detection

Our core business is focused on detecting attacks delivered via email, but we are actually protecting organizations in multiple ways. We develop products to:

  • Detect advanced email attacks like phishing and business email compromise
  • Detect compromised accounts by finding anomalies in sign-in behavior
  • Detect sophisticated invoice fraud schemes by understanding invoice and vendor relationships
  • Detect account takeovers in multiple cloud products beyond email accounts

And more! Join us to find out what we’re doing next.

Listen to Abnormal Engineering Stories

Curious to hear directly from our team? Check out our latest podcast for more details on how we think about engineering and machine learning at Abnormal.

Are You Ready to Become an Abnormal Engineer?

If you’d like to solve some of the hardest problems in email security, Abnormal is the place.

We encourage you to think through these exercises to understand the type of work we do.

  1. How do you tune a classifier to detect events at a base rate of 0.01% to 0.00001%? What happens when the distribution of your evaluation set does not match that of the online distribution due to the changing attack landscape? How would you keep up classification performance on new types of attacks?
  2. What effect does adversarial text obfuscation have on modern NLP techniques? For example, how are embeddings affected by purposeful misspelling or hidden spacing characters inserted into words? What effect would this have on transformer-style models? What about simpler text understanding like bag-of-words or phrase matching?
  3. We try to identify unusual communication patterns and use them as model features. For example, how often has this sender communicated with this recipient? What happens when attackers purposefully hack this feature by sending legitimate messages over the course of weeks or months?
  4. An email has many modes of data in it, including text content, headers, links, images, attachments, landing page content, and more. How do you incorporate all these signals simultaneously into a model?
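
As a starting point for the second exercise, the sketch below shows how a hidden spacing character defeats naive phrase matching, and how a normalization pass can restore the match. This is an illustration only, not a description of Abnormal's pipeline: the `INVISIBLE` set is a small, incomplete sample, and `normalize` and `contains_phrase` are hypothetical helper names.

```python
import unicodedata

# A small, incomplete sample of invisible characters used to split words.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff", "\u00ad"}


def normalize(text):
    """Drop invisible characters, fold compatibility forms with NFKC, lowercase."""
    visible = "".join(ch for ch in text if ch not in INVISIBLE)
    return unicodedata.normalize("NFKC", visible).lower()


def contains_phrase(text, phrase):
    """Naive phrase matching after collapsing runs of whitespace."""
    return phrase in " ".join(text.split())


# A zero-width space is hidden inside "password" and "invoice".
obfuscated = "Please update your pass\u200bword and send the in\u200bvoice"

print(contains_phrase(obfuscated.lower(), "password"))     # raw text
print(contains_phrase(normalize(obfuscated), "password"))  # normalized text
```

The raw text fails the match while the normalized text passes. Normalization of this kind does nothing for purposeful misspellings such as character substitutions, which is part of what makes the exercise hard.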

Join the Team

We have a team of experts working on hard problems. When you join Abnormal, you join a team of over 50 engineers and data scientists who are developing groundbreaking technology to keep our customers safe from the most dangerous cyber threats.