How You Should Design ML Engineering Projects

Machine learning engineering is hard, especially when developing products at high velocity, as is the case for us at Abnormal Security. Typical software engineering lifecycles often fail when developing ML systems.
April 7, 2021

This article originally appeared in Towards Data Science. You can read the full article below.

Have you, or someone on your team, fallen into the endless ML experimentation twiddling paralysis? Found ML projects taking two or three times as long as expected? Pivoted from an elegant ML solution to something simple and limited to ship on time? If you answered yes to any of these questions, this article may be right for you.

The purpose of this article is to:

  1. Analyze why software engineering lifecycles fail for ML projects.
  2. Propose a solution with an accompanying design document template to help you and your team more effectively run ML projects.

Software and Data Science Project Lifecycles

Typical software engineering projects are about developing code and systems. A typical project goes something like this:

  1. Identify a product or infrastructure problem.
  2. Discuss and design the software system to solve the problem (maybe with a crawl/walk/run plan).
  3. Break the problem into pieces and implement them over days, weeks, or months, often using agile development processes.
  4. Push to production and monitor.
  5. Return to the beginning of the cycle to improve the system as necessary.

This lifecycle is clearly not what happens for ML engineering projects. What about data experimentation? What about model training and evaluation?

Maybe we should look toward data science research projects and see if their lifecycle is more suited.

A data science research project might go like this:

  1. Identify a question that can be answered with data.
  2. Design experiments.
  3. Wrangle data.
  4. Evaluate hypotheses with data analysis or modeling.
  5. Publish results or trained models.

This lifecycle doesn’t seem right either. Pure data science research projects are about answering questions and not about building systems. What’s the middle ground?

Understanding Machine Learning Engineering

Machine learning engineering is at a unique crossroads between data science and software engineering. ML engineers will have trouble operating in a software engineering organization if you try to force everyone to operate in the typical software development lifecycle. On the other hand, operating a machine learning team like a pure data science or research team will result in nothing getting shipped to production.

ML engineers can get frustrated when they commit to a project that requires experimentation. When they inevitably hit false starts, because the data does not support their initial hypothesis or because wrangling the data is much harder than anticipated, they start falling behind committed timelines. As a result, a crucial part of their job, experimentation, can feel like constant failure compared to the work of colleagues on software engineering tasks.

A typical ML engineering lifecycle goes as follows:

  1. Identify a problem.
  2. Design software and experiments. These are interconnected: the models you plan to implement depend on which experiments work out, but you may need to design feature and model code to run your experiments in the first place.
  3. Implement code and wrangle data. These may also be interconnected: you may need to implement software to get the data you need, and you may need the data to write and test the feature extraction or model training code.
  4. Analyze data, train models, and evaluate results.
  5. Publish results.
  6. Test, deploy, and monitor code and models.

The better the software design and experimental design, the less revisiting is required, because a good design will anticipate the branches that may need to be taken.

This ML engineering lifecycle is often invented on the job and not taught. It is possible to do very well by carefully laying out software and experimental design. Still, it is also easy to do poorly, leading to many false starts and winding paths toward a solution that may never be reached.

Junior ML Engineers vs Senior ML Engineers

In a fantastic article, Julie Zhuo visually compares the processes of junior and senior designers, and her illustrations apply equally well to junior and senior ML engineers.

Process for Junior ML Engineer. Junior ML engineers often meander through the space of implementations, experiments, and data without a clear method. This wastes time and can be frustrating. (Image by Julie Zhuo, used with permission.)
Process for Senior ML Engineer. Senior ML engineers carefully lay out experimental paths, know when to cut them short and proceed in more fruitful directions, and recognize which new directions a result points to. (Image by Julie Zhuo, used with permission.)

Methodical thinking and discipline are a must when iterating on experiments. Can we help ML engineers plan out work to follow this paradigm?

Design Documents to Aid ML Engineering Lifecycles

How can we encourage better ML engineering design?

A process we’ve implemented at Abnormal is to require all ML Engineering projects to go through a formal design review process using a design document template that helps the engineer do good software and experiment design simultaneously.

What should be encouraged when designing ML Engineering projects?

  1. Put the work into explicit, forward-thinking experiment design before rushing into implementation. This heads off the endless and fruitless cycles of ML and data twiddling we all find ourselves in from time to time.
  2. Call out the *work* of experimentation as useful, whether or not the experiment validates the hypothesis. There is value to disproving a hypothesis even if it does not lead to ML product improvements.
  3. Design software with experiments in mind and design experiments with software in mind, that is, with an eye to what can actually ship to production. Wrangle your data in light of the systems you will be building and how that data will be available in production.

With these in mind, we created the template below to fill out at the beginning of any ML engineering project. An engineer should copy this template, fill in the details for their project, then present the software and experimental design to the team for feedback and iteration. This process has greatly improved the success and velocity of projects, and we highly encourage adopting this design template (or something similar) for your ML Engineering team.

Abnormal Security’s ML Design Document Template

We use this template at the start of every project. Feel free to use it directly, modify it, and share it!

Problem Statement

What are we specifically trying to solve, and why are we solving it now? A strong justification will tie this back to a product or customer problem.

Goals

Software Goals

Describe the software system we wish to build and its capabilities.

Metric Goals

What is the desired metric improvement, how are we going to measure the impact of this work, and why do we want to improve the system in this way?

  • Bad Example: Improve model’s performance
  • OK Example: Improve AUC by X% for the model
  • Good Example: Improve recall by X% for the class of false negatives without decreasing recall for any other classes by more than Y%.

Include expected metric tradeoffs, if any. For example: increase recall without decreasing precision by more than 5%.
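To make the "Good Example" above concrete, here is a minimal Python sketch of how such a goal could be checked automatically. It is only an illustration under stated assumptions: the function names, the class labels, and the toy predictions are hypothetical, and X and Y are treated as absolute changes in recall (for example, 0.2 means 20 percentage points) rather than relative ones.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Recall per class: correct predictions for a class divided by its true count."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def meets_metric_goal(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    target_class: str,
    min_gain: float,
    max_drop: float,
) -> bool:
    """True if target-class recall rises by at least min_gain (X) and
    no other class loses more than max_drop (Y) of recall."""
    if candidate[target_class] - baseline[target_class] < min_gain:
        return False
    return all(
        baseline[c] - candidate[c] <= max_drop
        for c in baseline
        if c != target_class
    )


# Hypothetical labels and predictions, for illustration only.
y_true = ["attack"] * 4 + ["safe"] * 4 + ["spam"] * 2
baseline_pred = ["attack", "attack", "safe", "safe"] + ["safe"] * 4 + ["spam"] * 2
candidate_pred = (
    ["attack", "attack", "attack", "safe"]
    + ["safe", "safe", "safe", "attack"]
    + ["spam"] * 2
)

baseline_recall = per_class_recall(y_true, baseline_pred)    # attack 0.5, safe 1.0, spam 1.0
candidate_recall = per_class_recall(y_true, candidate_pred)  # attack 0.75, safe 0.75, spam 1.0

# Goal: +0.2 recall on "attack" while no other class drops by more than 0.3.
goal_met = meets_metric_goal(baseline_recall, candidate_recall, "attack", 0.2, 0.3)
```

Writing the goal as a function with explicit X and Y forces the tradeoff to be stated up front; tightening `max_drop` to 0.1 in this toy example makes the same candidate fail.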

Experiment Design

Unlike pure software projects, data science / ML projects often require data exploration, experimentation, failure, and changing design along the way when data has been collected. To help make a project successful, it is helpful to lay out your potential branching points and how you will make decisions along the way. Additionally, all experiments should be evaluated against a baseline, which is either a simple solution to the problem (simple algorithm, simple heuristic) or the current production solution if one exists.

Data Motivation

Describe the problem that should be solved and use data to validate that this is indeed a worthy problem to solve. Is this actually going to have a real impact?

Hypotheses

Hypothesis 1: Method A will improve metric B by X% over baseline.

  • Method: Describe the approach you are taking. For example, this might be a model architecture we are testing, a new feature we are adding, etc.
  • Metric: Describe the metric or metrics we will use to evaluate the method.
  • Success Criteria: The measured metric results that will indicate success for this hypothesis. Ideally, these should be measured against a baseline.
  • Timebox: X days, then check in with the team to decide the next steps.
  • Failure Next Steps: For example, go on to try Hypothesis 2.
  • Success Next Steps: For example, push this model to production.

Hypothesis 2: …
The same set of questions for each hypothesis

...
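One way to see why the hypothesis fields above are useful is to write them down as a small data structure with an explicit decision rule. The sketch below is hypothetical and not part of the template itself: the `Hypothesis` class, its field names, and the example values are assumptions, and the success criterion is interpreted as a relative gain over the baseline metric.

```python
from dataclasses import dataclass

# Returned while the hypothesis is neither confirmed nor out of time.
CONTINUE = "continue within the timebox"


@dataclass
class Hypothesis:
    name: str
    method: str
    metric: str
    min_relative_gain: float  # success criterion, e.g. 0.05 = 5% over baseline
    timebox_days: int
    failure_next_step: str
    success_next_step: str

    def next_step(self, baseline_value: float, measured_value: float, days_spent: int) -> str:
        """Pick the pre-agreed branch based on the measured result and time spent."""
        if baseline_value <= 0:
            raise ValueError("baseline_value must be positive to compute a relative gain")
        gain = (measured_value - baseline_value) / baseline_value
        if gain >= self.min_relative_gain:
            return self.success_next_step
        if days_spent >= self.timebox_days:
            # Timebox is up without success: check in with the team and branch.
            return self.failure_next_step
        return CONTINUE


# Hypothetical example values, for illustration only.
h1 = Hypothesis(
    name="Hypothesis 1",
    method="Add feature A to the existing model",
    metric="recall on class B",
    min_relative_gain=0.05,
    timebox_days=5,
    failure_next_step="check in with the team, then try Hypothesis 2",
    success_next_step="push this model to production",
)
```

With a baseline of 0.80, a measured 0.88 returns the success branch, a measured 0.81 after 5 days returns the failure branch, and a measured 0.81 after 2 days returns `CONTINUE`. The point is that the branches are decided before the experiment runs, not after.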

Software Design

Describe the software systems and data pipelines needed to execute this project. What software needs to be built? What services and databases? What data will need to be available in production to run your model? Feel free to use normal software design documentation principles here.

Execution

What will be delivered, and when? A strong plan will provide incremental value and will allow us to get to the crawl state quickly.

  • Crawl: Minimum design to prove the efficacy of change before we invest too much time in software development.
  • Walk: More thorough design aimed to be a relatively complete component.
  • Run: Long-term design. How would we make this a truly first-class system or model?

Considerations

  • Success criteria to launch? Describe metrics evaluated to advocate launching this model or change into production.
  • What could go wrong? Describe all possibilities that might go wrong when we launch this.
    • Which product surfaces could be affected?
    • How will this impact customers?
    • How will we monitor?
    • What will we do to roll back?
  • What are the security and privacy considerations? Include everything that must be taken into consideration.
    • What impact on security could this change have?
    • What impact on privacy could this change have?

Appendix: Experiment Log

Keep track of the results of each hypothesis tested and the decisions made along the way: branching points, learnings, revised hypotheses, and so on. This log is a useful reminder for yourself later and a way to share how you approached this type of problem with others on the team.
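The log can live in the design document itself, but if you prefer something machine-readable, a minimal sketch might look like the following. Everything here is an assumption for illustration: the `LogEntry` fields, the one-JSON-object-per-line file format, and the example entry are hypothetical and not something the template prescribes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LogEntry:
    day: str          # ISO date string, e.g. "2021-04-07"
    hypothesis: str   # which hypothesis was tested
    result: str       # measured outcome versus the baseline
    decision: str     # branch taken as a result
    learnings: str    # anything worth remembering or sharing


def to_json_line(entry: LogEntry) -> str:
    """Serialize one entry as a single line of JSON."""
    return json.dumps(asdict(entry), sort_keys=True)


def append_entry(log_path: str, entry: LogEntry) -> None:
    """Append one entry to the log file, one JSON object per line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(to_json_line(entry) + "\n")


# Hypothetical entry, for illustration only.
example_entry = LogEntry(
    day="2021-04-07",
    hypothesis="Hypothesis 1",
    result="recall 0.81 vs. baseline 0.80 after the full timebox",
    decision="success criteria not met; moving on to Hypothesis 2",
    learnings="the new feature is sparse for most examples",
)
```

Calling `append_entry("experiment_log.jsonl", example_entry)` (the file name is arbitrary) adds one line per decision, which keeps failed hypotheses visible alongside successful ones.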

And that's it! We hope this helps you design better machine learning projects for your tough problems.

Interested in learning more about how we work on Machine Learning at Abnormal? Check it out here, or join us!
