Have you ever found yourself facing prompts like these?

Let’s build this in-house; we’ve got the hardware
We could host this internally
It’s simple; it’s just some hardware, an OS, and a few packages
We could do this better and cheaper in-house

These come in many shapes and forms, but in the end they all boil down to insourcing infrastructure and platforms. If you support the idea, I have a single piece of advice:

“Do not!”

The rest of the post explains why. If you are against the idea, you’ll find some arguments to back you up. And, of course, there are exceptions; I’ll talk about those towards the end.

Let’s see why you shouldn’t build a platform.

First, you probably have better things to do: build your product instead. Second, the cost-benefit is an illusion; it is much more expensive than you think. Engineers tend to overestimate the business benefits and underestimate the cost because they lack the relevant experience. They are great with numbers (requests/sec, hardware/cloud infrastructure costs, etc.) but often lack real-world experience running an organization and managing its costs. I have been there and made that mistake.

There are companies whose product is a platform. They can do it better and cheaper, trust me. Strive for added value, not a more affordable supply chain (no, I am not saying you shouldn’t bargain for better prices from cloud providers). Once you have maximized your added value, it might make sense to cut costs by capturing a larger part of the value chain. Be careful even then, or you will end up chasing a mirage.

Let’s play devil’s advocate and see why you would want to build a platform in the first place.

The benefits of building a platform

There are two main arguments in favor of building a platform: control and cost. Before we explore them, let’s recap the typical cloud computing models: IaaS, PaaS, and SaaS. When you start insourcing, you buy less-managed services. For instance, imagine a company building a SaaS product. They purchase PaaS from a cloud service provider (CSP). When they decide to insource and build the platform themselves, they ditch the PaaS and rent virtual hardware, purchasing IaaS instead. If they go further, they might buy the hardware and create a private cloud. The higher up the value chain you operate, the more managed services you purchase. If you expand downward, you typically insource some of the things you bought before. Keep this in mind as you read further.
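To make the value-chain idea concrete, here is a small illustrative model of which layers you operate yourself under each cloud model. The five layer names and where each boundary falls are simplifying assumptions for this sketch, not any provider’s official responsibility matrix.

```python
# Simplified, illustrative layer stack (an assumption for this sketch;
# real providers draw the boundaries slightly differently).
LAYERS = [
    "hardware",
    "virtualization",
    "operating system",
    "runtime",
    "application",
]

# Index of the first layer *you* manage under each model; everything
# below that index is bought from the provider.
FIRST_SELF_MANAGED = {
    "on-premises": 0,       # you run everything
    "IaaS": 2,              # provider runs hardware and virtualization
    "PaaS": 4,              # provider also runs the OS and runtime
    "SaaS": len(LAYERS),    # you run none of the layers
}


def you_manage(model: str) -> list[str]:
    """Layers you are responsible for under the given model."""
    return LAYERS[FIRST_SELF_MANAGED[model]:]


def provider_manages(model: str) -> list[str]:
    """Layers the provider operates for you."""
    return LAYERS[:FIRST_SELF_MANAGED[model]]


if __name__ == "__main__":
    # Moving down the value chain insources more and more layers.
    for model in ["SaaS", "PaaS", "IaaS", "on-premises"]:
        print(f"{model:12} you manage: {you_manage(model)}")
```

Each step from SaaS toward on-premises adds layers to your own list; that growing list is what insourcing means in practice.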

And now, onto the benefits.

More control

“We can’t do PaaS because we must have mTLS; no, it’s non-negotiable.”

Platforms often fully control the HTTP stack, load balancing, and routing, and don’t allow you much customization, although there are exceptions. For instance, AWS lets you deploy to a managed Elastic Beanstalk environment and gives you control over load balancing and the HTTPS stack (mostly). While you can do mTLS, you can’t tweak the TLS protocol itself.
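For a sense of what owning the TLS stack involves, here is a minimal sketch of a server-side mTLS context using Python’s standard `ssl` module. The certificate and CA file paths are hypothetical parameters. A real deployment would also need certificate issuance, rotation, and revocation, which is exactly the kind of work a managed platform takes off your hands.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires client certificates (mTLS).

    The file paths are hypothetical placeholders; pass your own PEM files.
    """
    # Constructed directly (rather than via ssl.create_default_context) so the
    # system CA bundle is not implicitly trusted for client certificates.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # One of the knobs you now own: refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: the handshake fails unless the client presents
    # a certificate that verifies against the trusted client CA(s).
    ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile is not None:
        # The server's own certificate chain and private key.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if client_ca_file is not None:
        # Only client certificates chaining up to this CA will be accepted.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

You would then wrap a listening socket with `ctx.wrap_socket(sock, server_side=True)`. Every line above is something you now test, monitor, and keep up to date yourself.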

And that’s okay, because most of the time you don’t need it. Engineers, myself included, tend to get lost in the pursuit of control. We feel like we could do it better. It’s not that much extra effort, right? All it takes is an additional package, a custom build, and some scripts. It works; it’s excellent!

Of course, as you go down the managed axis, you get more control. Nothing beats custom hardware with Linux (or, better yet, your own OS) on it. But with control comes a lot of responsibility. If it breaks, it’s your job to fix it. You must be the expert, and that will take up your time. And while you are managing it, you are not doing something else.

Please be very critical about how much control you truly need. In my experience, it’s less than you first think.

While the following quote comes from a military context, it applies nicely here too.

“Be in command, and out of control”
Paul Van Riper

More cost-effective

Okay, this one is more complex. Typically, cost-effectiveness requires more control. Let’s consider two angles here: utilization and a comparison of cloud models. Better utilization means scheduling your workloads more efficiently and getting more performance for your buck. Comparing cloud models means looking at what exactly you are paying for and whether you could invest the difference better.

Utilization

IaaS and PaaS are very similar in pricing construction. You are usually billed per pre-selected compute unit. For example, take an AWS EC2 t3a.medium instance with 2 vCPUs and 4 GiB of RAM. Usage is prorated to the second; however, there is a catch. The resource is allocated to you whether you use it or not. Even if CPU utilization stays below 5% for 99.9% of the time, you are still billed for the whole instance. Similarly, Heroku’s dynos come with CPU share and memory quotas, and again, you are billed for them whether or not you utilize them.

Costs can add up quickly if you have multiple small apps and many microservices. You’ll find yourself paying for 10x-20x the resources you actually use. At this point, it is very tempting to build a platform or to adopt hosted or self-hosted Kubernetes (don’t do that!) for the sake of cost-effectiveness. The reasoning is straightforward: you only need to pay for a tenth of the resources and can schedule and spread your workloads better. From a pure $ / vCPU / GiB perspective, this is true, but I invite you to consider another angle.
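The 10x-20x figure is simple arithmetic: total allocated compute divided by the compute actually used. A minimal sketch with a hypothetical fleet of ten microservices, each allocated 2 vCPUs but averaging 0.1 vCPU (both numbers are made up for illustration):

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    allocated_millicores: int  # what you are billed for (1000 = 1 vCPU)
    avg_used_millicores: int   # what the service actually consumes on average


def overprovisioning_ratio(services: list[Service]) -> float:
    """How many times more compute you pay for than you actually use."""
    allocated = sum(s.allocated_millicores for s in services)
    used = sum(s.avg_used_millicores for s in services)
    return allocated / used


# Hypothetical fleet: 10 services, each allocated 2 vCPUs, each averaging 0.1 vCPU.
fleet = [
    Service(f"svc-{i}", allocated_millicores=2000, avg_used_millicores=100)
    for i in range(10)
]

print(overprovisioning_ratio(fleet))  # 20.0
```

Millicores are used as integers so the ratio comes out exact rather than carrying floating-point noise.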

If your apps use so few resources, it makes sense to move even higher up the cloud value chain. The answer is serverless: you only pay for what you use and are billed for CPU and memory seconds. And yes, this is more expensive if you use it a lot. But for your money, you also get many other things: massive scaling, less maintenance, etc.
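The trade-off can be sketched as a break-even calculation: an always-on instance costs the same every month, while a serverless bill grows with usage. This sketch assumes a common function-as-a-service pricing shape (memory in GB times execution seconds, plus a per-request fee); some offerings bill vCPU-seconds separately. All prices are placeholder assumptions, not any provider’s current rates, and the $14 / month instance figure is borrowed from the comparison later in this post.

```python
import math

# Placeholder prices: assumptions for illustration only.
PRICE_PER_GB_SECOND = 0.00002       # USD per GB of memory per second of execution
PRICE_PER_MILLION_REQUESTS = 0.20   # USD
INSTANCE_PRICE_PER_MONTH = 14.0     # USD, billed whether used or not


def serverless_monthly_cost(invocations: int, avg_duration_s: float, memory_gb: float) -> float:
    """Pay-per-use: billed only for the memory-seconds and requests you consume."""
    gb_seconds = invocations * avg_duration_s * memory_gb
    request_fees = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return gb_seconds * PRICE_PER_GB_SECOND + request_fees


def break_even_invocations(avg_duration_s: float, memory_gb: float) -> int:
    """Smallest monthly invocation count at which serverless costs at least as much as the instance."""
    cost_per_invocation = (
        avg_duration_s * memory_gb * PRICE_PER_GB_SECOND
        + PRICE_PER_MILLION_REQUESTS / 1_000_000
    )
    return math.ceil(INSTANCE_PRICE_PER_MONTH / cost_per_invocation)


# A small service: 200 ms per request, 0.5 GB of memory.
print(f"1M requests/month:  ${serverless_monthly_cost(1_000_000, 0.2, 0.5):.2f}")
print(f"50M requests/month: ${serverless_monthly_cost(50_000_000, 0.2, 0.5):.2f}")
print(f"break-even: ~{break_even_invocations(0.2, 0.5):,} requests/month")
```

With these placeholder prices, the quiet service costs about $2.20 a month and the busy one about $110, so serverless wins clearly at low utilization and loses under heavy, sustained load.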

Comparison of cloud models

A PaaS is usually more expensive than an IaaS offering. Here is a simple comparison.

Heroku’s Standard 1X dyno costs $25 / month for a “1x” CPU share with 1x-4x Compute and 512 MB of RAM.

The roughly equivalent IaaS offering from AWS is the t3a.nano, with 2 vCPUs and 512 MB of RAM, for under $5 / month.

The comparison isn’t entirely fair, as Heroku grants these resources to your application alone, while on AWS the OS has to fit into them as well.

Upgrading two tiers to the t3a.small, we get 2 GiB of RAM for about $14 / month. That is still a lot more performance for our money.
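Normalizing the figures above to price per GiB of RAM per month makes the gap easier to see. This sketch uses only the numbers quoted in this comparison, treats 512 MB as 0.5 GiB and “<$5” as $5, and ignores CPU differences and OS overhead; the offer labels are just names for this example.

```python
# (monthly price in USD, RAM in GiB) as quoted above; "<$5" is taken as $5.
OFFERS = {
    "Heroku Standard 1X": (25.0, 0.5),
    "AWS t3a.nano": (5.0, 0.5),
    "AWS 2 GiB tier": (14.0, 2.0),
}


def price_per_gib(offer: str) -> float:
    """Monthly cost of one GiB of RAM for the given offer."""
    price, ram_gib = OFFERS[offer]
    return price / ram_gib


for name in OFFERS:
    print(f"{name:20} ${price_per_gib(name):.2f} per GiB / month")
```

That works out to $50, $10, and $7 per GiB per month respectively, so on raw resources the PaaS costs roughly five to seven times more.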

So what’s the difference? And why not go all the way to purchasing our own hardware?

It’s a different offering: a PaaS includes patching, deployment support, backups, snapshots, etc., all baked into the price. With IaaS, you are left to manage these yourself, which takes experts who are hard to come by and command a nice salary. You get what you pay for.

The price difference has two major components: the cost of added value (extra systems, complexity, experts, etc.) and the cloud provider’s profit. The cloud provider can deliver this much more cheaply thanks to economies of scale. The question then becomes whether you can do a decent enough job for the money you save (if any). It is more challenging than you think.
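One way to frame whether you can do a decent job for the money you save is to ask how many instances you would have to move before the savings pay for the people running them. A minimal sketch: the $25 and $14 prices come from the comparison above, assuming each PaaS unit is replaced by one $14 instance, while the $10,000 / month fully loaded engineer cost is a hypothetical placeholder.

```python
import math


def instances_to_fund_one_engineer(
    paas_price: float, iaas_price: float, engineer_monthly_cost: float
) -> int:
    """How many instances must be migrated before the monthly savings
    cover one engineer operating the platform."""
    savings_per_instance = paas_price - iaas_price
    if savings_per_instance <= 0:
        raise ValueError("IaaS is not cheaper; there is nothing to save")
    # Round up: a partial instance's savings don't pay the salary.
    return math.ceil(engineer_monthly_cost / savings_per_instance)


# Prices from the comparison above; the engineer cost is hypothetical.
print(instances_to_fund_one_engineer(25.0, 14.0, 10_000.0))  # 910
```

With these inputs, you need 910 instances before the savings cover a single engineer, and one engineer cannot staff round-the-clock on-call alone.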

And this brings us to the next section.

The actual cost of your platform

The cost of your platform-building endeavor is better measured in time than in money. Time is more valuable than money, and most people greatly underestimate the time needed to build a platform. The actual cost is made up of the following parts:

Initial investment
Ongoing operations
Opportunity cost

Let’s examine each one.

Initial investment

This is the upfront investment needed to build the initial MVP and migrate to it. Its financial cost can be spread over a period using the concept of amortization.

“Amortisation, in business accounting, is spreading the cost of an expensive and long-lived item over many periods.”
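In its simplest form (straight-line amortization, assumed here; real accounting rules vary), the upfront cost is divided evenly across the periods. The $120,000 figure below is hypothetical.

```python
def amortized_monthly_cost(initial_investment: float, months: int) -> float:
    """Straight-line amortization: spread an upfront cost evenly over `months`."""
    if months <= 0:
        raise ValueError("months must be positive")
    return initial_investment / months


# Hypothetical: $120,000 spent building the MVP and migrating to it.
for months in (12, 36, 60):
    print(f"over {months} months: ${amortized_monthly_cost(120_000, months):,.2f} / month")
```

The monthly figure shrinks as the horizon grows, but the calendar time the team spent building is not spread out at all.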

In the long run, this doesn’t count much financially. Time-wise, however, this is critical. It means you are losing out on creating value for your customers. In business, time is money. It’s actually more than money. A project pushed to the future has more significant implications than delayed profit. More on that in another post.

Ongoing operations

This is the main financial cost of the platform investment. If you insource it, running and maintaining the platform is your job. You have to patch, update, back up, restore, add features, handle incidents, etc. This is an enormous effort, both time-wise and resource-wise. If you buy the platform, someone else takes care of all that, and you can focus on mitigating any potential business impact. You may think you can do a better job and achieve superior SLAs. Think again; don’t even consider it if you have never run a platform. The incidents you see with Heroku and AWS are real, which is understandable given the complexity of those platforms, but they are mitigated relatively fast. The engineers working there run drills, build redundancy, test backups, have their own tooling, etc. The platform is their product.
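To put superior SLAs in perspective, an availability target translates directly into a downtime budget. A minimal sketch, assuming a 30-day month (providers define their measurement windows in their own SLA terms):

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Maximum downtime per period that still meets the availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.95, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min per 30 days")
```

At 99.9%, you get about 43 minutes a month; at 99.99%, about 4 minutes, which is barely enough time to get paged and open a laptop.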

Companies whose core business is providing a platform can do a better job, period.

Opportunity cost

While you are spending time building the platform, you are not doing other, potentially more valuable things. This is called opportunity cost.

“Opportunity cost refers to the value of the best alternative forgone when a decision is made to pursue a particular course of action. In other words, it represents the benefits or utility that could have been gained if resources were allocated differently.”
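The definition reduces to a one-liner: the opportunity cost of a choice is the value of the best option you gave up. The options and their values below are hypothetical estimates for one quarter of a team’s time, in arbitrary units.

```python
def opportunity_cost(options: dict[str, float], chosen: str) -> float:
    """Value of the best alternative forgone by picking `chosen`."""
    return max((value for name, value in options.items() if name != chosen), default=0.0)


# Hypothetical value estimates for how a team could spend the next quarter.
options = {
    "build_platform": 50.0,
    "ship_feature_a": 120.0,
    "ship_feature_b": 80.0,
}

choice = "build_platform"
forgone = opportunity_cost(options, choice)
print(f"{choice}: value {options[choice]}, opportunity cost {forgone}")
# A negative result means the choice is worth less than what was given up.
print(f"net: {options[choice] - forgone}")
```

Note that the platform still has positive value on its own; it only looks like a loss once you account for what the team could have shipped instead.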

Ask yourself: is insourcing the best way to win in your current market? I don’t think so. Also, consider that this is an ongoing cost. Every month, you will spend considerable effort maintaining, developing, and operating the platform, time better spent on your core business. This brings us to the next argument.

Comparative advantage

“Comparative advantage is an economic concept that refers to the ability of an individual, company, or country to produce a particular good or service at a lower opportunity cost compared to other entities. This concept is a cornerstone of international trade theory. It suggests that all parties can benefit from specialization and trade, even if one has an absolute advantage in producing all goods.”

In other words, even if you could build a better platform more cheaply than you can buy one, it still makes sense to buy it and focus on your core offering.
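A two-party example in the spirit of the classic trade-theory illustration makes this concrete. The hours are made up: you are faster than the provider at both building the platform and building product, yet the provider still has the comparative advantage in platforms because its opportunity cost is lower.

```python
# Hypothetical hours each party needs to produce one unit of each good.
HOURS = {
    "you": {"platform": 10, "product": 5},
    "provider": {"platform": 12, "product": 20},
}


def opportunity_cost_of(party: str, good: str, other: str) -> float:
    """Units of `other` forgone to produce one unit of `good`."""
    return HOURS[party][good] / HOURS[party][other]


def comparative_advantage(good: str, other: str) -> str:
    """The party with the lower opportunity cost for `good`."""
    return min(HOURS, key=lambda party: opportunity_cost_of(party, good, other))


for party in HOURS:
    print(f"{party}: one platform costs {opportunity_cost_of(party, 'platform', 'product')} products")
print("platform ->", comparative_advantage("platform", "product"))
print("product  ->", comparative_advantage("product", "platform"))
```

You give up 2 units of product per platform, while the provider gives up only 0.6, so both sides are better off if you build product and buy the platform.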

This fits very nicely with the idea of outsourcing non-core competencies. It’s an easy decision for facility maintenance, bookkeeping, catering, etc. I challenge you to think about the platform you build your services on in the same way. It may feel close to your core competency, but it isn’t. You don’t run your own mail server, do you?

Just because you can, it doesn’t mean you should.

If you are a software engineer, you have probably had friends, relatives, colleagues, or others ask you to install printer drivers, update Windows, etc. Yes, you can do it, but it isn’t your core competency; it’s a favor. I would gladly pay someone else or delegate it, because we (whoever does it and I) are both better off. Most software engineers’ time is better spent developing software than debugging general IT problems with email clients and printers.

Build a platform if

There are valid use cases for building a platform. They are easily recognized.

The platform is your product; it’s your core business

Are you building something like AWS, GCP, Heroku, DigitalOcean, etc.? If so, that’s your offering; do it.

It is infeasible to buy/rent it

Of course, there are times when buying the platform is infeasible politically, economically, or just strategically. Then, it’s okay to build it. Think twice when you arrive at this conclusion, and make sure it is truly justified. Here are some examples to help you.

Your competitor has the platform, but you aren’t willing to purchase it from them
It would cost too much to operate a component in the cloud (IaaS) because of high traffic costs
Your origin country doesn’t allow deals with the country of the platform seller

However, in the days of AWS, GCP, and Azure, these rarely hold true.

To capture a more significant part of the value chain

This is an exciting place to be. You have maximized the value of your core business, and the next best move is to capture a larger part of the value chain to increase your profit. If you are here, you already have a specific expectation of how much your profit will increase.

Again, most companies and services are nowhere near this stage. They are a lot better off focusing on their product.

Build your product instead

Avoid the trap of seemingly free or low-maintenance self-hosted platforms like Kubernetes. It works well most of the time. When it doesn’t, it’s on you: configuration, debugging, monitoring, etc. All your job, your time, and your cost. As a rule, go as high up the value chain as possible.

In conclusion, insourcing infrastructure and platform building can be tempting, but it’s often not the best choice for most. Instead, focusing on your core product and relying on specialized platform providers leads to better outcomes. Remember that the actual costs of building and maintaining a platform are usually underestimated, and the opportunity costs can be substantial. Comparative advantage suggests that you specialize in your core competencies and leave platform building to those who are experts in it. There are valid use cases for building a platform, but for most companies, it’s wiser to prioritize delivering value to customers through their products and services. So, think critically about your company’s needs and the potential drawbacks of insourcing before making a decision.

Comments

One response to “Do Not Build a Platform”

jordana

May 5, 2023

Not exactly Kubernetes, but check out this story about how moving from serverless to ECS saved a ton of money.
https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90
