Buy to Build, the Third Way

Categories: business, data science, digital infrastructure
Author: António Domingues
Published: January 5, 2026

Tired of the buy vs build discussion? So am I. But recently I had an epiphany and realized the framing around this topic was wrong: Why not both? Let me propose a third strategy, the Buy to Build.

Why not both?

The title is indeed a nod to politics in the late 90s and early 00s. There were many issues with the third way political concept, but I just want to show off my memory.

Ask any one of your favorite LLMs and they will probably tell you that the decision on whether to buy or build any piece of software or data solution should be based on a number of circumstantial factors: complexity of the software, strategic importance, urgency and cost, among others.

Whilst these are important, if the decision is to acquire a tool we should also consider how extensible the software is. How well will it tie into existing tools? How future-proof is it should the company's direction change?

I have seen a number of very expensive tools which aren’t even integrated with other products from the same software vendor.

The thoughts that follow are valid for any software-like tool. I will be focusing on Biotech (drug development) use cases, but the concept is broadly applicable to other domains.

Buying solves one problem, creates another

Buying software to solve an existing or upcoming problem has its advantages: an off-the-shelf resolution to an immediate problem, trading immediate costs for time savings. This is especially tempting when a startup is young and meeting deadlines (a funding round, delivery of results to a partner) becomes more important than the money in the bank or strategic long-term considerations.

So the team will go out and procure a piece of software that solves the issue - for example a LIMS, an ELN, or a compound management service. Quite often they will land on something that a manager used when they were more hands-on (10+ years ago), or whatever is the "Industry Standard" at a larger company.

The funny thing is that the really big companies have numerous departments and teams, and I bet each will use different software to solve the same problem.

Whilst familiarity is not a bad criterion, it shouldn't be the only one (or the main one). In my experience, tools that fall into this category of "it's what I know" tend to have some critical shortcomings:

  • They were developed many years ago, when industry standards and practices were different
  • Poor documentation (from a developer's perspective)
  • Very prescriptive about how users should use them and how the data should be structured
  • Often implement algorithms which might not be the latest or are very use-case specific
    • Tied with the poor documentation, the algorithms can be fairly opaque and undocumented (vendor's IP)

In other words, familiarity (and competitive dominance) breeds complacency.

This means that if a Biotech is doing innovative biology or cutting-edge experimental work, existing solutions will feel limited. They were designed with certain assays or well-defined processes in mind, which is not the case at companies innovating at a fast pace. Soon enough we will want to build extensions to supplement the commercial software - be it a new dashboard or integrating the results of a multi-modal ML model - and hit the limitations of that commercial software.

Note: Buying doesn't solve everything

One example is a very popular (and expensive) HTS imaging system and its bundled analysis software. Great if you only need it for standard image analysis and one-parameter hit-calling (and only for the vendor's imaging system). However, it becomes an issue as soon as one needs to (i) use a different multi-parametric algorithm for hit selection, or (ii) ingest the data (images and masks) into an internal DB for ML work.

It is a solvable problem, but it requires some serious reverse engineering, causing delays to projects and, of course, consuming precious resources. If the software had a clean and documented way for third parties to extend it, or at least to interact with its database, this work would have been much more efficient.

Code
---
config:
  look: handDrawn
  theme: forest
---
flowchart TB
 subgraph Imaging["Imaging"]
        Mic["Microscope"]
  end
 subgraph subGraph3["Undocumented"]
        Im(["Images"])
  end
 subgraph subGraph1["Proprietary Black Hole"]
        Soft["Image Analysis
    Software"]
        Mas["Masks"]
        Num["Numerical Features"]
        db["DB"]
  end
 subgraph subGraph2["Innovation"]
        ML["Apply ML model"]
        Inte["Dashboard"]
  end
    ML --> Inte
    db -.-x Inte
    Num --> db
    db --> Soft
    Im -.-x ML
    Mic --> Im
    Im --> db
    Soft --> Mas & Num
    Mas --> db

    Mic@{ shape: rounded}
    db@{ shape: cyl}
     Im:::Sky
     Mas:::Ash
     Num:::Ash
     db:::Ash
     ML:::Ash
     Inte:::Ash
    classDef Ash stroke-width:1px, stroke-dasharray:none, stroke:#999999, fill:#EEEEEE, color:#000000
    classDef Sky stroke-width:1px, stroke-dasharray:none, stroke:#374D7C, fill:#E2EBFF, color:#374D7C
    style subGraph1 fill:#FFD600,stroke:#FF6D00,color:#D50000
    style subGraph2 color:#2962FF
    style Imaging color:#2962FF
    style subGraph3 stroke:#BBDEFB,fill:#BBDEFB,color:#000000


Diagram of how the imaging data flows into the vendor's database. The dashed arrows mark the connections we need for in-house work (images into the ML model, database contents into a dashboard) but which are undocumented or blocked.
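
Once the images, masks, and feature tables are out of that black hole, the in-house side is comparatively simple. Below is a minimal ingestion sketch, assuming the exports end up as image/mask files plus a per-well features CSV; the directory layout, file names, and column names are hypothetical placeholders for whatever the vendor export actually produces.

# Minimal ingestion sketch: load exported features plus image/mask paths into an
# internal SQLite table for downstream ML work. All paths and column names are
# hypothetical placeholders.
import csv
import json
import sqlite3
from pathlib import Path

EXPORT_DIR = Path("hts_export")            # hypothetical export location
DB_PATH = "internal_imaging.sqlite"

con = sqlite3.connect(DB_PATH)
con.execute(
    """CREATE TABLE IF NOT EXISTS wells (
           plate TEXT, well TEXT,
           image_path TEXT, mask_path TEXT,
           features_json TEXT
       )"""
)

with open(EXPORT_DIR / "features.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        plate, well = row["plate"], row["well"]
        # keep the remaining feature columns as a JSON blob for the ML model
        features = {k: v for k, v in row.items() if k not in ("plate", "well")}
        con.execute(
            "INSERT INTO wells VALUES (?, ?, ?, ?, ?)",
            (
                plate,
                well,
                str(EXPORT_DIR / "images" / f"{plate}_{well}.tiff"),
                str(EXPORT_DIR / "masks" / f"{plate}_{well}.png"),
                json.dumps(features),
            ),
        )
con.commit()
con.close()

Nothing sophisticated, which is the point: the expensive part was getting the data out, not using it afterwards.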

So we end up in a situation where the company buys a solution thinking "problem solved", and suddenly realizes that it still needs to build in order to fulfil its vision.

This is not to say commercial software isn't very good at what it does - it will be good at the one thing (e.g. image analysis) but it won't do everything the company needs (e.g. ML-generated embeddings from an in-house proprietary model).

And in those cases where it tries to do ALL OF THE THINGS it will do a poor job at almost everything.

In Biotech, the decision to buy (or recommend) a tool is generally in the hands of the lab folks - a biologist, imaging expert, or chemist - who are the ones who judge if the functionality is fit for purpose. But whilst they will be focused on a set of core functions, having a "Data" or software person in the room to help with the decision will have long-term benefits. We can spot whether the tool will be extendable, maintainable and interoperable (plays nicely with other software).

Cost is of course a consideration, and ultimately a manager will have to make a decision that affects budgets. But a pure numbers evaluation might miss how useful a particular piece of software actually is, or how long it will take to deploy and maintain. I have seen market-leading products being phased out on account of cost, to be replaced by something which was slower, lacked features, was buggy, and generally got in the way of getting things done. Were the money savings really worth it, and at what cost?

Why not build from scratch?

Now that I've explained why I think buying solves some issues but might create new ones, let's address the DIY part of the equation.

First of all, it's not a given that a software development team exists in-house, or that it is large enough to develop a given piece of software from scratch. And no, external contractors are not a magical solution. Contractors will need on-boarding before they understand the company's internal systems, and managing them and the project doesn't come for free - someone internal has to do it, and in a small start-up every person is already doing two jobs.

But even if the resources are available, how long does it take to design and implement a piece of software? Talking to all the stakeholders, gathering use cases and feedback from users, architecting and prototyping are things that take time. Loads of time. And I am not even factoring in feature creep - every user will (very reasonably) want a couple of special features, and these add up. When a company is under time pressure, waiting six months for a fully functional piece of software is a luxury it can't afford. And importantly, does anyone really want to develop yet another LIMS or ELN?

The answer is yes. Too many people.

And this is the final argument against an in-house build: is the software core to the company's mission and business? Every Biotech company will have a concept or IP that is core to its existence. For some it will be RNA as the main treatment modality, for others it will be that the target is a pathway, or some specific technology for personalized medicine. Whatever this is, the IP and all related activities will be kept in-house as much as possible. This is what the company exists for. Unless it's a BioTech where Tech is key, very rarely will the core be software, and therefore building software or data products will not be mission critical. In this situation, outsourcing this activity is a sensible choice.

This is not to say that, when done properly, building in-house doesn't have advantages. It does! The fine-grained control over what the tool does, and what for, can be very important. Naming entities (ontologies), data structures, and tailor-made data flows for assays which might be unique to the company are much more efficient than the headache of retrofitting data to roughly match what an existing piece of software expects.

How to buy to build

If we consider the limitations of commercial software - one size fits all - versus the needs of a modern Biotech - custom, flexible solutions - and throw in the need for speed of implementation with limited resources, we can start picturing what a perfect commercial software solution looks like:

  • First class at its core functionality (obviously). Once bought, it needs to be useful for at least the one thing.
  • It's user friendly (visual interface).
  • Flexible enough to accommodate in-house assays or naming conventions.
  • Uses open formats or has an API for programmatic access (a way of interacting programmatically with the software). In other words, developer friendly.
    • and the API is written in a well-known language (typically Python)
  • Easy to migrate to (deploy), and, critically, out of. Business critical in my opinion, because we want to avoid silos.

Good examples

Some companies already do this very well. Benchling is one of them, with its open and well-documented API:

The Benchling Developer Platform allows you to programmatically access and edit data in Benchling. The platform is most commonly used for:

  • Keeping Benchling in sync with other systems
  • Bulk Loading/Exporting data
  • Creating dashboards and charts of Benchling data
  • Integrating with Instruments in the lab

What this means is that a small dev team will be able to write code to ingest data from other tools into Benchling, or query entries in its database to use in other tools. As an example, we could build a small app to generate experimental sample sheets (or plate maps) using only reagents already in the inventory. This is useful for compliance and traceability, by linking reagents to experiments and results - if a reagent is not in the database, an experiment can't be created. By using a controlled vocabulary in Benchling we can also reduce issues with data inconsistencies - one gene / protein / cell line showing up with different names in different experiments. This is quite critical, because computers don't like inconsistencies like that.
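
A rough sketch of that guardrail follows, using the plain REST API via requests. Treat the tenant URL, endpoint and response fields as assumptions to be checked against Benchling's developer docs (their official Python SDK is another option), and build_plate_map as a hypothetical helper.

# Sketch of a plate-map generator that only accepts reagents already registered in
# Benchling. Endpoint and response fields are assumptions - verify against the docs.
import csv
import requests

BENCHLING_URL = "https://yourtenant.benchling.com/api/v2"   # hypothetical tenant
API_KEY = "..."                                             # kept out of code in practice

def reagent_registered(name: str) -> bool:
    """Return True if an entity with this exact name exists in Benchling."""
    resp = requests.get(
        f"{BENCHLING_URL}/custom-entities",
        params={"name": name},
        auth=(API_KEY, ""),          # API key as basic-auth username, blank password
        timeout=30,
    )
    resp.raise_for_status()
    return len(resp.json().get("customEntities", [])) > 0

def build_plate_map(samples: list[dict], out_csv: str) -> None:
    """Write a plate map, refusing if any reagent is not in the inventory."""
    missing = {s["reagent"] for s in samples if not reagent_registered(s["reagent"])}
    if missing:
        raise ValueError(f"Not registered in Benchling: {sorted(missing)}")
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["well", "cell_line", "reagent"])
        writer.writeheader()
        writer.writerows(samples)

The specific calls matter less than the behaviour: the sample sheet can only be generated from names that already exist in the registry.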

Hela, HELA, Hela S3, HeLa may look the same, but computers are stupid: they need these to be the exact same name or they won't match and will error out. I have seen all those variations, sometimes by the same user, and a couple of them in a single experiment!

We could use an LLM for pattern matching and sanitizing those names, but (i) it's a very crude and resource-intensive hammer for a very human problem, and (ii) I wouldn't trust it to distinguish between very subtle differences in gene names or cell lines.

Also, I worked as a cell culture technician for a while and HeLa and HeLa S3 are not the same cell line.
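
To make this concrete, here is a tiny, self-contained illustration, assuming a controlled vocabulary exported from the LIMS: casing and stray whitespace can be normalised away, but genuinely different lines like HeLa and HeLa S3 must never be silently merged.

# Hypothetical controlled vocabulary (e.g. cell lines registered in the LIMS).
CONTROLLED_VOCAB = {"HeLa", "HeLa S3", "HEK293T"}

def match_cell_line(raw: str) -> str:
    """Map a user-entered name to the controlled vocabulary, or fail loudly."""
    cleaned = " ".join(raw.split())                   # collapse stray whitespace
    by_casefold = {v.casefold(): v for v in CONTROLLED_VOCAB}
    if cleaned.casefold() in by_casefold:
        return by_casefold[cleaned.casefold()]
    raise ValueError(f"Unknown cell line: {raw!r} - register it or fix the name")

print(match_cell_line("HELA"))     # -> 'HeLa'
print(match_cell_line("Hela S3"))  # -> 'HeLa S3', kept distinct from plain HeLa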

Another example of a tool that does all the right things is the Seqera Platform (formerly Nextflow Tower). This is a web platform to deploy and manage bioinformatics pipelines, and it provides a large number of features for data reproducibility. Their documentation is also excellent. One such feature is Datasets:

Datasets in Seqera are CSV (comma-separated values) and TSV (tab-separated values) files stored in a workspace. They are used as inputs to pipelines to simplify data management, minimize user data-input errors, and facilitate reproducible workflows.

Basically, sample metadata for a given experiment, describing how each sample was generated:

sample_id,cell_line,drug_name,drug_concentration_um,incubation_time_h
s1,HeLa,DMSO,0,24
s2,HeLa,DMSO,0,24
s3,HeLa,DMSO,0,24
s4,HeLa,tamoxifen,10,24
s5,HeLa,tamoxifen,10,24
s6,HeLa,tamoxifen,10,24

Since we can programmatically generate these Datasets, we could build a tool where the user inputs sample metadata using the entries in Benchling (fetched with Benchling's API), attaches the location of the sequencing files, and saves the resulting dataset in the Seqera Platform, which then triggers an RNAseq pipeline for that dataset. A bioinformatician might get involved for QC - and should have been involved earlier, for the experimental design - but the lab user is in full control and guardrails are in place to prevent data issues. There are also fewer manual steps, and information flows seamlessly between tools.

This is an example of a system which could be quickly developed with the right tools:

Code
---
config:
  look: handDrawn
  theme: forest
---
flowchart TB
 subgraph Experiment["RNAseq Experiment"]
        Exp["Experiment entry"]
  end
 subgraph LIMS["LIMS/ELN (Benchling)"]
        Experiment
        cells["Cell lines"]
        ab["Antibodies"]
        vec["Gene constructs"]
  end
 subgraph Store["Compound Management"]
        cpd["Compound"]
  end
 subgraph App["Experimental Data App"]
        meta["Sample Metadata"]
  end
 subgraph Seqera["Seqera Platform"]
        Data["Datasets"]
        Pipe["Pipeline"]
        Res["Gene expression"]
  end
    cells --> meta
    ab --> meta
    vec --> meta
    cpd --> meta
    meta --> Exp & Data
    Data --> Pipe
    Pipe --> Res
    Res --> Exp

     meta:::Sky
    classDef Sky stroke-width:1px, stroke-dasharray:none, stroke:#374D7C, fill:#E2EBFF, color:#374D7C
    style App stroke:#2962FF,fill:#BBDEFB


Diagram of how we could link an experiment to data generation with little human intervention. Users would mostly interact with the Experimental Data App.
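
The glue code for such a system could look roughly like the sketch below. The Seqera Platform does expose a REST API for Datasets and pipeline launches, but treat the endpoint paths, payload fields and auth details here as assumptions to verify against their API documentation (or swap in the tw CLI); a real launch also needs a compute environment and related settings, which are omitted.

# Glue-code sketch: push a samplesheet built from Benchling metadata to Seqera as a
# Dataset, then launch an RNAseq pipeline on it. Endpoints and payloads are
# assumptions - check the Seqera Platform API docs before relying on this.
import requests

SEQERA_API = "https://api.cloud.seqera.io"   # Seqera Cloud; self-hosted URLs differ
TOKEN = "..."                                # personal access token
WORKSPACE_ID = 12345                         # hypothetical workspace id
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def upload_dataset(name: str, csv_text: str) -> int:
    """Create a Dataset in the workspace and upload the samplesheet CSV to it."""
    created = requests.post(
        f"{SEQERA_API}/workspaces/{WORKSPACE_ID}/datasets",
        json={"name": name, "description": "Generated from Benchling metadata"},
        headers=HEADERS, timeout=30,
    )
    created.raise_for_status()
    dataset_id = created.json()["dataset"]["id"]
    upload = requests.post(
        f"{SEQERA_API}/workspaces/{WORKSPACE_ID}/datasets/{dataset_id}/upload",
        files={"file": (f"{name}.csv", csv_text.encode(), "text/csv")},
        headers=HEADERS, timeout=30,
    )
    upload.raise_for_status()
    return dataset_id

def launch_rnaseq(samplesheet_url: str) -> None:
    """Trigger an nf-core/rnaseq run with the dataset as its --input samplesheet."""
    resp = requests.post(
        f"{SEQERA_API}/workflow/launch",
        params={"workspaceId": WORKSPACE_ID},
        json={"launch": {
            "pipeline": "https://github.com/nf-core/rnaseq",
            "paramsText": f"input: {samplesheet_url}\n",
            # a real launch also needs a compute environment id, work dir, etc.
        }},
        headers=HEADERS, timeout=30,
    )
    resp.raise_for_status()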

This "dream" scenario is only possible because these tools, Benchling and Seqera, embrace interoperability and are designed to be both user and computer friendly. As a bonus, should an internal tool be developed - for example a multimodal ML model to predict the properties of a compound using both imaging and transcriptomics - we could integrate it into this workflow. This is what I consider a fairly future-proof system.

The hidden cost of not buying the right tools to build

Finally, let's talk business: how much does it cost to change systems? For example, to replace the LIMS? Not just the new contract, but also the time it takes for the data team to transfer data from one system to another, for users to get used to the new tools, and the predictable slowing down of operations (fewer experiments -> slower progress, and the clock is ticking on the funding).

And let's not kid ourselves: any company, and especially a start-up, will change tools for the same functionality a few times in its early stages. Shouldn't we buy tools that make the transition as smooth as possible?

We should also consider the walled-garden situation. Do we want parts of the business operation to be reliant on a third-party tool? Or to have the data inside some obscure database which makes it virtually impossible to obtain without paying a ransom to the DB developer? What if they raise their prices to such an extent that it becomes a toxic expense for a key component of the company? Personally, I am quite suspicious of software which is not up-front about how the data is stored and what options it offers to retrieve that data should we stop using it.

Final thought and suggestion

Whilst this text might give the impression that I am not a big fan of building tools in-house, it's quite the opposite! I think companies should have the budget to build key functionality in-house.

But key functionality is the crucial qualifier. We should also be wary of re-inventing the wheel: if there is already a solution that does most of what we need, we should save ourselves the trouble and buy it! Well, as long as that solution also allows us to build the key business functionality on top of it.

So let’s buy to build.

Note: Not re-inventing the wheel

Like most bioinformaticians, I have built a number of pipelines. These days, before I write any code, I head over to nf-core and check if there is a pipeline that does what I need. For example, the RNA-seq pipeline does a very good job of producing the gene expression tables and QC, and I then build internal reporting tools on top of it. Win-win.
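
That "reporting on top" step can start very small. A minimal sketch, assuming the pipeline's default STAR+Salmon output layout (the merged gene counts table path is an assumption - adjust it to your run):

# Minimal reporting sketch on top of nf-core/rnaseq output.
import pandas as pd

counts = pd.read_csv(
    "results/star_salmon/salmon.merged.gene_counts.tsv",  # assumed default location
    sep="\t",
    index_col="gene_id",
)
samples = counts.drop(columns=["gene_name"], errors="ignore")  # keep sample columns only

# one row per sample: library size and number of genes with any counts
report = pd.DataFrame({
    "total_counts": samples.sum(),
    "genes_detected": (samples > 0).sum(),
})
report.to_csv("rnaseq_run_summary.csv")
print(report)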

Attribution

This post’s cover image was composed of:


Citation

BibTeX citation:
@online{domingues2026,
  author = {Domingues, António},
  title = {Buy to {Build,} the {Third} {Way}},
  date = {2026-01-05},
  url = {https://amjdomingues.com/posts/2025-10-26-buy-or-build/},
  langid = {en}
}
For attribution, please cite this work as:
Domingues, António. 2026. “Buy to Build, the Third Way.” January 5, 2026. https://amjdomingues.com/posts/2025-10-26-buy-or-build/.