Designing Between SOPs and Software

The most interesting skill I look for in designers right now is the ability to design between SOPs and software.

I joined Apian three years ago as employee ten. Our earliest deployments had no software stack, just a cross-referenced horror of Google Sheets, and it was our Standard Operating Procedures (SOPs) that held everything together. Laid out in unambiguous lists and flowcharts, SOPs are a universal language, understood across healthcare, aviation and engineering, that keep our operations safe and efficient.

Since then we’ve built out our software platform to integrate and automate everything we can, both in NHS clinical systems (EHRs, LIMS) and autonomous robotics (Wing, Matternet, Zipline and our own ground robots). Our SOPs have evolved in tandem, defining the workflows for every human still in the loop.

Connecting these two sides is what makes our operations successful. It also gives us two parallel levers for change. And so when things inevitably break - in the midst of live debugging on ops - I always ask the same question: are we going to fix this with SOPs or software?

The design superpower

We run two continuous on-call rotas - one operational, one engineering. I cover occasional operational shifts, but monitoring live ops day-to-day isn't part of my role.

And yet I still do, because when things break, the perspective design provides can resolve issues faster than almost anyone else's. That superpower comes from combining deep knowledge of two adjacent areas: operational context and platform architecture. Engineers often lack the former; our health-ops teams don't have the latter. You need both to truly understand how the SOPs and software work in tandem.

Deep operational context is the outcome of a service design mindset. My team has spent weeks embedded in hospital pathology labs, getting to know the staff and their workflows. The research and flows we map are the framework upon which the SOPs are written. We know which side of the shelf they prefer to pick from. Which barcode scanner always gets stuck. Why they skip the label printer and annotate by hand.

The second part, fluency in the architecture of our platform, comes from our early decision to build a combined product+design function at Apian. Our design team author the PRDs, push pixels, build prototypes, and see features through to live. We may not write the code, but we understand how it all hangs together and which "simple" changes quietly touch everything.

Is anything really a bug?

“I’m not sure if this is a bug but…” begins another message to our #support-internal Slack. Prune out the few genuine software bugs and everything left is triaged across three points:

  1. What needs to be done immediately to resolve this?

  2. What needs changing in our SOPs or software so this is handled in future?

  3. Should we?

Here are three examples of issues we encountered during live ops, and how we chose to solve them:

  1. Validating Special Haematology blood tests

    We clinically validate every class of blood test before we fly real patient samples. Samples are split at the sending site, sent by both ground and air to the receiving site, and the results compared to confirm no difference exists.

    One new test (Special Haematology) required a novel approach. The tests can only be performed by the labs themselves, and just for the validation the same lab wanted to both send out the samples and perform the analysis. The validation blood needed to go on a round-trip flight: “I want to send something to myself” was a use case we’d never had before. 

    It felt like something our platform should handle, but experience from 471 previous validation flights showed it was rare. What’s more we knew the lab team well, how engaged they were, and how few flights would be needed to validate. Rather than change our software, we handled this one with a bespoke SOP and some no-code glue. The lab staff were trained to book flights to the roof of another hospital (rather than direct to the lab), and a Zapier alert pinged our ops on-call who booked an immediate return flight to the sender. 

  2. Overcoming an Epic interface failure

    Epic (the electronic health record provider) has a module called Grand Central, used by many of our partner hospitals to manage thousands of portering tasks per day.

    Our team continuously monitors portering tasks that connect to our flights using Epic. Grand Central has a helpful feature bug whereby tasks disappear from view as soon as the job is completed. The anxiety this was causing was wild - completed jobs and accidentally cancelled jobs all looked the same.

    A quick patch SOP fix was to check the screen multiple times per minute (!), but clearly this wasn’t sustainable. The problem was ripe for automation, except this Epic module has no public API. Thankfully we’re used to this in healthcare, so as usual we just shipped around it. Our team extended our RPA product to continuously read portering status changes and pipe them into our platform, deleting a ton of SOP steps in the process.

  3. Making our map tracking more annoying

    Staff in NHS pathology labs track every drone delivery from our network on a giant display. Our platform gives an alert when the drone is inbound so they can prepare to collect it.

    In one of London’s busiest hospital labs, one of our forward deployed medics reported how staff would miss the alert and fail to collect in time. Sometimes the alert was too subtle, but other times the Windows PC running the display would just go to sleep, thanks to an over-zealous corporate IT policy.

    Solving this needed a mixed approach. For the former we deployed a well known Austrian DJ (who also happens to be one of our engineers) to mix some pleasantly awakening sounds for the lab. The display timeout problem was more complex and quite specific to this site - not worth the engineering investment. So we didn't. Instead we amended the SOP to check every morning that a strategic paperclip was clipped on, holding down a harmless keyboard key. Timeout solved.

SOPs as human-readable software

We are organisationally biased to solve problems in code. It’s a corporate culture that stems from having been co-founded by an engineer, coupled with a deployment mantra of “don’t add workload, don’t change workflows”. Add in the promise of AI, able to push automated fixes to customer issues overnight, and there is a strong argument we can ship our way out of every problem.

But many operational bugs will not generalise well, especially when deploying within the fragmented ecosystem of NHS infrastructure. Scoping every detail to an individual site or customer results in feature flag hell.

Instead, I’ve learned to think of SOPs as human-readable software - just another language to deploy changes in. Our SOPs are coauthored by NHS staff, giving them buy-in and ownership that sticking a custom logo inside white-labelled software never could.

SOPs are also helpfully opinionated in a way software can't be. A platform's intent is opaque, no matter how good the UX, but an SOP states its reasoning out loud, in language the people following it actually wrote. The real question, I’ve learned, was never SOPs or software. It's knowing which language to write the fix in.

Written by George Cave

Next
Next

The Physical AI deployment paradox