We know platforms. We have had plenty of them in our lives as IT professionals; it’s just that everyone seems to imagine a different thing for a different purpose: running workloads, developing software, enabling social interaction, whatever. Is there a common denominator?
A platform is meant to be stable ground, just like in real life. Whatever I do on a platform – building something, communicating etc. – should be easy, without hassle. All the tools I need should be there and should work well together. If I run into problems, I expect the platform to provide ready-made solutions. It’s the opposite of doing something “in the wild”, where I need to clear my own path to wherever I’m going. Working in the wild like a pioneer has its appeal too, without question. But we are not always in adventurer mode. Most of the time there is just a job that needs to be done. I, for one, therefore consider myself lucky when I can simply focus on the task at hand most of the time, without worrying too much about the details.
Modern software development
“Easy” is not a word that comes to mind when thinking about software development environments these days. Don’t get me wrong: every new invention here was necessary to speed development up while still ensuring the best possible quality. Nevertheless, all these systems that are used today when software gets developed and updated continuously – CI job schedulers, artifact and image repositories, static code analysis servers, end-to-end environments, UI and HTTP test drivers, GitOps deployment control, A/B deployment frameworks, message bus systems, databases and, last but not least, container orchestrators, which you might already call platforms … see, I already forgot how this sentence started. No, it’s not easy. It is a complicated ecosystem of its own. Just have a look at what a typical CI/CD pipeline at Consol looks like:
(If you would like a version of this picture with explanations of the individual building blocks, just head to our CI/CD topic homepage.)
What do we do when things become too complicated to handle? We automate them! And as this at first seems like additional effort (mostly it is not), we make sure that the automation can be scaled, meaning that multiple consumers can profit from the automated “it”. This becomes extremely easy on cloud platforms, where every possible resource is virtual and automatable. I’m not only talking about public cloud providers and their services but also – actually primarily – about container orchestrators. Cloud at its core means decoupling IT services from dedicated hardware, which is exactly what happens when Kubernetes runs containers “somewhere” in its cluster. Kubernetes actually goes one step further than cloud provider VMs, as it also jettisons traditional server concepts, but that would be the topic of another article.
Kubernetes = Platform?
Earlier I said that some people might regard container orchestrators like Kubernetes as platforms already, and in some ways they are right. They are platforms for clustered workloads. Here they really give you everything you need, including replicas, autoscaling, scheduling, resource control etc. But are they also platforms for software development? Surely not. Of course you can build everything you need on Kubernetes, but there are numerous options, a lot of things to configure, and plenty of opportunities to run in the wrong direction. A dev team starting on a vanilla Kubernetes infrastructure is in for an adventurous journey.
Did someone say “Governance”? Right, important topic. Once you allow your dev teams to allocate resources in an automated way, you will need to keep what happens under control, ideally without nagging anyone too much. Cloud platforms, as they decouple hardware from services, tend to accept any workload, no matter whether the available hardware resources – yes, they still exist! – are already overscheduled with other tasks. So you need to control resource availability and manage access.
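To make the resource side of governance a bit more tangible, here is a minimal sketch in Python of the kind of check a platform could run before granting a team more resources. Everything in it is an assumption for illustration: the `Resources` type, the `can_grant` function and the cluster numbers are hypothetical and not taken from any real platform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """CPU in millicores and memory in MiB, kept as integers to avoid rounding."""
    cpu_millicores: int
    memory_mib: int


def can_grant(capacity: Resources, granted: list[Resources], request: Resources) -> bool:
    """Return True only if all existing grants plus the new request fit the capacity."""
    total_cpu = sum(g.cpu_millicores for g in granted) + request.cpu_millicores
    total_mem = sum(g.memory_mib for g in granted) + request.memory_mib
    return total_cpu <= capacity.cpu_millicores and total_mem <= capacity.memory_mib


# Hypothetical cluster: 64 CPUs and 256 GiB of memory.
cluster = Resources(cpu_millicores=64_000, memory_mib=262_144)
# Hypothetical grants already handed out to two teams.
existing = [Resources(24_000, 98_304), Resources(32_000, 131_072)]

print(can_grant(cluster, existing, Resources(8_000, 32_768)))   # fits exactly
print(can_grant(cluster, existing, Resources(16_000, 32_768)))  # would overcommit CPU
```

A real platform would likely leave some headroom instead of filling the cluster to the last core, but the principle stays the same: the sum of what has been promised must not exceed what physically exists.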
And that is just the start! Think of all the security policies that software running on this platform might need to comply with under your business’s rules, regarding data storage, network connections, network encryption etc. The good news is that on container platforms it is quite possible to enforce most of these automatically without getting in the way too much. But this certainly is something that needs to be built.
So that’s already a lot of things to do for leveraging cloud platforms as development platforms. A title drop is imminent.
So what is Platform Engineering then?
In short: All of this! But in a planned, coherent way. It means building a solid platform for your development teams’ projects on top of cloud platforms, so they can work in their most productive way while your IT stays secure and manageable. Why is this suddenly a thing? Because there are now concepts available that take Platform Engineering to places that were not imaginable earlier. Let me sketch an ideal picture:
When a development team starts a new project, they generally need to involve absolutely no one at Ops. Via a self-service website they can create their own project scope in the infrastructure. Maybe their department needs to purchase a “resource quota”, i.e. the right to use a certain number of CPUs and amount of memory, so that they share in paying for the platform and the resources they use, but that should be it. In this project scope they receive turnkey solutions for things like CI/CD, staging environments, artifact repositories, deployment strategies and the like.
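What could such a self-service backend actually produce? On Kubernetes, one plausible reading of “project scope” is a namespace with a `ResourceQuota` attached. The following Python sketch builds both manifests as plain dictionaries. The function name `project_scope`, the `team` label and the example values are my own assumptions for illustration; only the manifest fields themselves follow the Kubernetes API.

```python
import json


def project_scope(project: str, team: str, cpus: int, memory_gib: int) -> list[dict]:
    """Build the manifests for one project: a namespace and its resource quota."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": project, "labels": {"team": team}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{project}-quota", "namespace": project},
        "spec": {
            "hard": {
                # The purchased quota caps both requests and limits in this sketch.
                "requests.cpu": str(cpus),
                "requests.memory": f"{memory_gib}Gi",
                "limits.cpu": str(cpus),
                "limits.memory": f"{memory_gib}Gi",
            }
        },
    }
    return [namespace, quota]


# Hypothetical project "webshop" owned by "team-a" with 8 CPUs and 32 GiB.
manifests = project_scope("webshop", team="team-a", cpus=8, memory_gib=32)
print(json.dumps(manifests, indent=2))
```

A real platform would add much more to this scope (access rules, pipelines, repositories), but the idea is the same: the team fills in a handful of values and the platform derives everything else by convention.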
Using these, they are enabled not only to work in a full-fledged development process but also to do their own Ops for their own service. The platform provides all the functionality necessary for that in the form of custom-tailored automation resources. Who tailored those resources? That would be classical Ops, who also operate the platform and ensure the availability of all base services. So the roles of Ops and DevOps have shifted here. Ops has become a pure service provider for the platform, and DevOps does what Ops did before. Have a look at the new responsibility model:
This of course is a 100-mile-high bird’s-eye view of things. Nevertheless, we see IT Ops providing the basic building blocks of a platform – networking, storage, computing etc. – while DevOps “sits” on top of that and instantiates these building blocks with the specific configuration they need, e.g. network routing rules, deployments etc.
When done right, policies and compliance are automatically enforced by the platform as DevOps uses it. For example, network transport security is achieved by providing a service mesh whose specific rules are defined by DevOps. The DevOps team needs to explicitly define the allowed network connections via network policies (or nothing will connect), and all communication is implicitly encrypted on the mesh. The same goes for the workload resources needed. DevOps will need to explicitly assign CPU and memory resources to their deployments, or they will not work.
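Both rules can be sketched in a few lines of Python. The first function builds a default-deny `NetworkPolicy` as the platform might place it into every project, so that nothing connects until DevOps adds explicit allow rules. The second lists the CPU and memory settings a deployment is still missing. How a given platform actually enforces this is not something I can generalize; the function names, the example deployment and the idea of running this as a check before deployment are assumptions for illustration. Mesh encryption is left out of the sketch.

```python
def default_deny_policy(namespace: str) -> dict:
    """A NetworkPolicy that selects every pod and allows no traffic by itself."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        # An empty podSelector matches all pods in the namespace; listing both
        # policy types without any rules denies ingress and egress alike.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def missing_resources(deployment: dict) -> list[str]:
    """List every container setting that lacks an explicit CPU or memory value."""
    problems = []
    containers = deployment["spec"]["template"]["spec"]["containers"]
    for container in containers:
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for key in ("cpu", "memory"):
                if key not in resources.get(section, {}):
                    problems.append(f"{container['name']}: resources.{section}.{key}")
    return problems


# Hypothetical deployment that forgot to set a CPU limit.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "api"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "api", "image": "example/api:1.0",
         "resources": {"requests": {"cpu": "250m", "memory": "256Mi"},
                       "limits": {"memory": "512Mi"}}},
    ]}}},
}

print(missing_resources(deployment))  # ['api: resources.limits.cpu']
```

The design choice worth noting is the direction of the default: the platform starts out closed and strict, and the team opens up exactly what it needs.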
Not to forget: Documentation! The process to set up a project, the necessary configurations and the available services are documented in detail and kept up to date.
The platform here is all about the engineers’ experience. Everything should be prepared, easy to use and work well together, so the team can focus on what it actually wants to accomplish.
Making things easy is hard work!
Platform Engineering combines tasks that individually may fall under the responsibility of DevOps Engineers, Site Reliability Engineers, Cloud Architects and Automation Architects. The difference is that we now name the engineers after what they are building rather than after the skills they use to do it. A platform engineer should have good knowledge of and experience in all of these disciplines, as only a combination of them will enable them to provide a consistent platform.
Also, there are quite a number of topics that a platform may address. Have a look at the following overview graphic:
When building a platform you will most likely not be able to address every potential feature right at the beginning, so this graphic tries to order the features by priority. Some things need to be thought of right from the beginning, others can be introduced later. Of course, your mileage may vary, e.g. you might have a greater need for a standardized AI/ML platform because of what you do for a living.
Beyond that, there are quite a few things to learn from implementing platforms and a few paths to avoid. IMHO it mostly boils down to:
- Know your use cases
- Enforce what is necessary but not more
- Document and train wherever the opportunity arises
- Embrace true DevOps
- Convention over configuration
- Additional turnkey services greatly affect acceptance
And maybe the most important one: Platforms are never finished! Keeping them working, up to date, well documented, efficient and effective is a continuous task that needs continuous attention. So you need a stable team of platform engineers to accomplish this. At first this might seem like additional effort. But if you run the numbers and compare that to the cost of non-agile projects, be it the increased effort per project or just the speed penalty on their time-to-market, I am sure you will see that it is more than worth it.