Stopping New Email Attacks with Data Augmentation and Rapidly-Training Models

November 20, 2020

On October 21st, 2020, just two weeks before the US general election, many voters in Florida received threatening emails purportedly from the “Proud Boys.” These emails often included some personal information such as an address or phone number, threatened violence if the recipient did not vote for Donald Trump, and implied that the sender had access to voting infrastructure that would reveal how an individual voted. This claim is exceedingly unlikely to be true, but as with other exploitation attacks, once an attacker includes some small amount of personal information, the victim is more likely to believe the more outlandish claims.

Email security systems did not stop this attack because the pattern had not been seen before. However, Abnormal was able to incorporate the attack into our NLP models within hours, enabling detection of similar attacks in the future.

This story exemplifies one of the hardest challenges of building an effective ML system to prevent email attacks: the rapidly changing and adversarial nature of the problem. Attackers are constantly innovating, not only launching new attack campaigns, but also tweaking the language and social engineering strategies they employ to convince people to give up their login credentials, install malware, or send money to a fraudulent bank account, among other activities.

Retraining the NLP Pipeline

At Abnormal, we tackle this problem by providing our ML models with the most up-to-date information possible. We have both automated systems and security researchers keeping up with the latest attacks. The data gathered is then consumed by a rapidly retraining NLP pipeline.

As a thought exercise, let’s imagine we missed an attack with the following text content:

Subject: Account payment overdue

We haven’t received the invoice payment for invoice #12335. We’ve been trying to contact your accounting department for a month and if we don’t hear back your service will be terminated immediately

Regards,

Oleg

Perhaps our existing text representation did not identify the particular threat of terminating a service, which is a constant challenge as attackers adapt. We would like to immediately re-train our text models to learn this pattern.

[Figure: General retraining pipeline]

However, putting just one sample into the system is unlikely to improve the model enough to catch anything beyond that exact message, even if we use weighting schemes. We would like to learn to detect similar attacks because attackers will be unlikely to use the exact text template in the future. Our solution is to use text augmentation.

For each of these missed attacks, we generate many training samples using text augmentation. We use a few open-source text augmentation libraries for this, with some of our own changes.

[Figure: Text augmentation for a missed attack]
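
As a rough illustration of the idea, the sketch below generates several variants of one missed message using three simple operations: synonym replacement, random word deletion, and adjacent word swaps. It is not the pipeline described in this post. The synonym table, function names, and probabilities are all assumptions made for the example, and a production system would more likely rely on an augmentation library with a much larger vocabulary.

```python
import random

# Hypothetical, hand-written synonym table used only for this example.
SYNONYMS = {
    "payment": ["remittance", "settlement"],
    "overdue": ["late", "outstanding"],
    "terminated": ["cancelled", "suspended"],
    "immediately": ["promptly", "now"],
    "contact": ["reach", "call"],
}


def replace_synonyms(words, rng, prob=0.5):
    """Swap known words for a random synonym with probability `prob`."""
    out = []
    for word in words:
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return out


def drop_words(words, rng, prob=0.1):
    """Delete each word with probability `prob`, never returning an empty list."""
    kept = [word for word in words if rng.random() >= prob]
    return kept or list(words)


def swap_adjacent(words, rng):
    """Swap one randomly chosen pair of neighbouring words."""
    out = list(words)
    if len(out) >= 2:
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def augment(text, n_samples=5, seed=0):
    """Return up to `n_samples` distinct variants of `text`, sorted for stability."""
    rng = random.Random(seed)
    words = text.split()
    variants = set()
    attempts = 0
    while len(variants) < n_samples and attempts < n_samples * 20:
        attempts += 1
        candidate = replace_synonyms(words, rng)
        candidate = drop_words(candidate, rng)
        candidate = swap_adjacent(candidate, rng)
        variant = " ".join(candidate)
        if variant != text:
            variants.add(variant)
    return sorted(variants)


if __name__ == "__main__":
    seed_text = "your service will be terminated immediately if payment is overdue"
    for variant in augment(seed_text, n_samples=5, seed=1):
        print(variant)
```

Each variant would carry the same label as the original missed attack. Fixing the seed keeps a retraining run reproducible, which makes it easier to compare models before and after a new attack is added.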

Once we have generated the augmented samples, we retrain our model as before. Currently, we use a combination of word-level and character-level CNNs (convolutional neural networks) to predict various labels, such as attack, spam, and graymail. The model's output is then used directly as an input to our detection stack and as a feature in other models.
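
To make the character-level part of that idea concrete, here is a toy forward pass written in plain Python: one-hot character encoding, a 1D convolution, a ReLU, and max-over-time pooling. It is a sketch under heavy assumptions, not the model described above. The function names are invented for this example, and the single filter is built by hand to respond to the trigram "pay", whereas a trained CNN would learn many filters of several widths from data.

```python
import string

# Assumed character vocabulary for the example; unknown characters map to zeros.
ALPHABET = string.ascii_lowercase + string.digits + " "


def one_hot_chars(text, alphabet=ALPHABET):
    """Encode text as a list of one-hot rows, one row per character."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text.lower():
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


def make_ngram_filter(ngram, alphabet=ALPHABET):
    """Build a hand-crafted filter whose activation counts matching characters."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    weights = []
    for ch in ngram:
        row = [0.0] * len(alphabet)
        row[index[ch]] = 1.0
        weights.append(row)
    return weights


def conv1d_max_pool(inputs, filters, biases):
    """Slide each filter over the sequence, apply ReLU, keep the maximum activation."""
    pooled = []
    for weights, bias in zip(filters, biases):
        width = len(weights)
        best = 0.0  # ReLU floor; also the result when the text is shorter than the filter
        for start in range(len(inputs) - width + 1):
            activation = bias
            for offset in range(width):
                window_row = inputs[start + offset]
                activation += sum(x * w for x, w in zip(window_row, weights[offset]))
            best = max(best, activation)
        pooled.append(best)
    return pooled


if __name__ == "__main__":
    filters = [make_ngram_filter("pay")]
    biases = [0.0]
    print(conv1d_max_pool(one_hot_chars("Payment overdue"), filters, biases))
    print(conv1d_max_pool(one_hot_chars("invoice"), filters, biases))
```

Because the filter slides over characters rather than whole words, it still fires on "payment", "repay", or "pay-ment". That tolerance to small spelling changes is one reason character-level features can complement word-level ones. In a full model, the pooled values from many such filters would be concatenated with word-level features and passed to a classifier layer.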

Some of the challenges in building this system include:

  1. For new attacks, we often have a very limited set of examples, which model training can easily ignore. It is hard to ensure that augmentation provides the right level of signal for the new attack without overwhelming the other training samples.
  2. It’s hard to maintain model precision because, in the vast volume of legitimate emails, there are often many edge cases that appear similar to attacks.
  3. We must set up a robust data pipeline so that the latest data is always available for training.
  4. We must find a model structure that is quick to retrain, converges reliably, and is expressive enough to learn new attack patterns.
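
One possible way to approach the first challenge is to give each missed attack a fixed weight budget that is shared across all of its augmented variants, so that generating more variants does not by itself increase that attack's influence on training. The sketch below illustrates this. The `(text, label, source_id)` tuple layout, the function name, and the `group_budget` value are assumptions for the example rather than details of our system.

```python
from collections import Counter


def assign_sample_weights(samples, group_budget=5.0):
    """Return one training weight per sample.

    `samples` is a list of (text, label, source_id) tuples. `source_id` is None
    for ordinary samples and identifies the missed attack for augmented ones.
    Ordinary samples get weight 1.0. The variants of each missed attack share
    `group_budget` equally, so their total weight does not grow with their count.
    """
    group_sizes = Counter(source for _, _, source in samples if source is not None)
    weights = []
    for _, _, source in samples:
        if source is None:
            weights.append(1.0)
        else:
            weights.append(group_budget / group_sizes[source])
    return weights


if __name__ == "__main__":
    samples = [
        ("quarterly report attached", "safe", None),
        ("lunch on friday?", "safe", None),
    ]
    # Ten hypothetical augmented variants derived from one missed attack.
    samples += [(f"variant {i}", "attack", "missed-001") for i in range(10)]
    weights = assign_sample_weights(samples)
    print(weights[:3])
    print(sum(weights[2:]))
```

Many training APIs accept per-sample weights of this kind. The budget then becomes a single knob to tune against precision on legitimate mail, which relates to the second challenge above.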

Catching Election Interference Emails with This System

Using this system, we fed in an example of the emails noted in the Washington Post article. After running this message through the trained model, we can verify it is caught with a very high score.

[Figure: Prediction on original attack (generated using eli5)]

But the open question is whether this model will indeed catch similar, but differently worded, messages. To test this, we can construct a new message and run it through the model.

[Figure: Prediction on new attack (generated using eli5)]

The model also scores this message highly (99.6 in this case), whereas our previous model did not give it a high score at all.

After incorporating this attack into our detection model, Abnormal was able to detect and stop a significant number of other related attacks and spam that used election-related terminology.

If developing machine learning models and software systems to stop cybercrime interests you, we’re hiring! Check out our Careers page to learn more and apply.
