DevOps Decrypted: Ep. 16 - A stitch in time
Join Laura, Jobin, Jon and Rasmus, along with our special guest Erko Knoll from Stitch It, as we discuss the architecture agenda, and how some groups are using Amazon’s decision to push their own narratives.
Summary
Welcome once again to DevOps Decrypted – where we're unpacking Amazon's seemingly shocking decision to ditch serverless for their Prime Video product. Does this move mean serverless is dead? Join Laura, Jobin, Jon and Rasmus, along with our special guest Erko Knoll from Stitch It, as we discuss the architecture agenda and how some groups use Amazon's decision to push their own narratives.
We talk about the benefits and limitations of a serverless architecture and why it probably won't be going anywhere soon. We also get to hear all about Stitch It – a code-first cross-platform integration tool, and how it can be used to navigate and mitigate the limitations of serverless environments.
Laura Larramore:
Alright – well, today is Friday, 19 May, and we will be talking a little bit about serverless building and Stitch It on the DevOps Decrypted podcast! So, welcome – I'm your host, Laura, and we have with us Jobin and Jon, Erko from Stitch It, and Rasmus (the famous Rasmus). So, welcome!
Jobin Kuruvilla:
Hello! Hello!
So… Serverless. That's an interesting take. What exactly are we going to talk about with serverless? I just read an article the other day about Amazon ditching serverless for their Prime Video offering...
Rasmus Praestholm:
Serverless must be dead!
Jobin Kuruvilla:
Serverless must be dead, right!? Yeah… Well, I was actually posting about it in our internal Slack channel, and our CTO Jon Mort, who is on the call with us, was passionately against that “serverless is dead” argument. So, Jon, what do you really think happened?
Jon Mort:
Yeah, I think there have been a lot of people furthering their own agendas off the back of this post, and what most of them are missing is the fact that Prime Video is a really successful product that was built initially, really quickly, on serverless. And it scaled to the point at which the architecture didn't support the things they were trying to do with the product anymore.
So they chose a different solution for this particular service, and I very much think it's a pragmatic decision to say: this architecture wasn't working for us in this situation, so we switched to something else that made sense for that product, at that scale, for all of the other reasons.
And I think that there's, yeah, there's a lot of FUD, you know? People are using it to further agendas of, yeah, “we want everything back in our own data centres”, and then on the serverless side as well, a lot of people going, no, they made the wrong decision. They should have built it this way…
I don't believe that the team wouldn't have thought about how they could have rebuilt it using serverless, with the experience that they had. I don't think they're a stupid team. And there's this whole range of reactions, from staying serverless all the way to "bring everything back in-house, and we'll run all our own [inaudible] and manage everything".
And I think that a lot of it is missing the point that this is a pragmatic decision, which is the right thing for the product they’re building and the right thing for the team.
Jobin Kuruvilla:
So if I break it down, Jon, what you're saying is: one, serverless is still a viable technology for a lot of people who want to create solutions and do it really fast; and two, you don't have to use serverless for everything – only when your architecture supports it, right? Those are the two different aspects you talked about.
Jon Mort:
Yeah, yeah, absolutely. And that architecture question is, I think, the key one: making sure you design your architecture for the problem that you have. And I think that ultimately, the serverless, service-ful way of building things is a really great place to start, and it's a really great place for many, many systems – they don't need to migrate or go beyond that. I think it's a hugely powerful systems-building paradigm.
And it gives you the advantage of focusing on your problem rather than on running all of the underlying systems you might need to do that. So if you have a problem which fits serverless well, then you should embrace it. I think a serverless-first mindset makes a ton of sense. But equally, you shouldn't try to force every single problem to be a serverless problem. That just doesn't make sense either. So…
Rasmus Praestholm:
I sort of get the impression that this might be one of those more polarising topics, maybe because it can feel alien. Like, a lot of us are just used to the old way. I mean, we used to write things right on computers. And then there were these VMs that popped up.
Now it's like containers which is like a gradual thing.
But then serverless seems like it just came flying in from the side, like this is way different.
Like, wait, what? There are probably a lot of us, myself included, quite honestly, who still feel stuck in the old way of doing things. And this is actually reminding me of the book you recommended, Jon – The Value Flywheel Effect? I got a similar kind of impression from the author: he was really big on serverless – everybody should be serverless first, that was the key point – and he was also expressing confusion that it hasn't taken off more, which I'd tie back to it just feeling kind of alien.
Jobin Kuruvilla:
I mean, I still remember the days when I started looking at Lambda – the biggest problem I had was, you know, I didn't have a server to work with. It did make my life a bit difficult initially, but that doesn't mean Lambda is a bad choice.
Speaking of it – I know that we internally are using Lambda as a serverless architecture for a lot of the products that we have. But before I go into that, one quick question, Jon, just to confirm: this move that Amazon made for the Prime Video offering – it's nothing to do with scaling, is it?
Jon Mort:
Well, it is kind of… it was, and it wasn't, in this particular example. They were struggling to scale the serverless system, because the architecture didn't meet the needs of the system at that point. They had two services which were very, very chatty together, and that architecture was causing them problems because there was lots of communication between the two things. I think it would have been possible to put those into one serverless component, or to do something similar with the architecture. Still, they chose – I think there were some cost reasons as well – to go and use ECS and take a container-based approach, because they were at a scale where the cost advantage was there.
So yeah, you can scale serverless – I mean, it clearly had scaled to that point, and it's difficult to say whether it could have scaled even further. They've made the decision that it wasn't going to, in that architectural form, so…
Jobin Kuruvilla:
Got it. Okay. So if I have a microservice that I'm developing, I can still do it in a serverless architecture – maybe using Lambda, maybe something else – and I can still scale it to a point where it's being used by, you know, millions of users.
In this particular case, it was chatty architecture between the different microservices that was causing the issue.
Jon Mort:
Yeah, very much so, that’s the way I read that article and see the problem that they have.
Jobin Kuruvilla:
So serverless is not dead, after all. Phew!
All right. But that said, as I was saying earlier – internally, we use serverless architecture for a lot of our products, and on that account, we have Erko from Stitch It here.
And so Erko – welcome!
Erko Knoll:
Yeah, nice to be here! And yeah, we at Stitch It also use serverless – or, to be more precise, AWS Lambdas – for most of our application stack. I'd say about 90% of the application is just a bunch of Lambdas talking to each other.
The 10% that isn't serverless is just one EC2 instance that's performing regular calculations – just keeping track of how many tokens each customer has left…
Jobin Kuruvilla:
Erko, before you go into the details, let me ask you this: were you freaked out seeing this news? I mean, is serverless dead?
Erko Knoll:
No, not at all. I mean, people have opinions, and in my opinion, there's a tool for every job. So if the job calls for running something in a server-ful fashion, then that's probably the right tool.
But in our case, being a young product still in beta, getting to leverage serverless has been a really, really good experience. I'd say serverless has served us quite well so far. That's not to say we'll be able to scale to infinity with serverless, but right now it's doing its job well.
Jobin Kuruvilla:
So what is this product about? I mean, what exactly is Stitch It? Maybe you can tell us what Stitch It is, and then we will go into the architecture.
Erko Knoll:
Yeah. So it's basically a tool for building integrations in a code-first manner. You can think of things like Zapier, which are low-code or no-code-based integration tools.
Or you can think of the older enterprise world, where enterprises had ESB tools – enterprise service buses, basically – which usually had both coding capabilities and some low-code interface.
So Stitch It is one of those tools that are primarily code-first – it's JavaScript or TypeScript that you use – and it takes care of all the hard parts of building integrations, so you don't have to worry about where the code is being executed. We take care of that for you, running it on top of AWS Lambdas under the hood.
And we also take care of things like how to manage connections, how to manage authentication, and how to store those authentication credentials securely. It also tries to make your developer experience pretty good by providing an editor which is based on VS Code. So your developer experience can be quite familiar if you're coming from a background where VS Code is being used.
Jobin Kuruvilla:
You did mention, you know, using an enterprise service bus for something similar, and that reminded me of when I was working on a similar integration product years back – we actually went with Mule ESB.
Now the question for you: what was your consideration for going with AWS Lambda for something like this, and for serverless architecture in general? Why didn't you use an ESB?
Erko Knoll:
Why didn't I use an existing ESB? So, our first consideration was that we wanted to be a SaaS platform, which means that you don't have to host anything on your own. You don't have to worry about the server it's running on. That's the first consideration.
We wanted to just offer users a web-based interface where they can log in, set up the connectors they need, connect to the tools they need to integrate with, and then simply jump directly into writing the business logic in code.
It's the same value proposition as ScriptRunner – instead of building a massive Java-based plugin to enhance the Atlassian applications, you can install ScriptRunner, do a little bit of configuration around where you want the script to be run, and just write the script.
Jobin Kuruvilla:
Very interesting.
Rasmus Praestholm:
So, early on, I was also thinking a little bit about whether this whole Amazon thing is really more of a cost-optimisation thing… They started out. They moved fast. They broke things. They got cool things working.
And then, later on, they realised, hey, these 2 services are kind of chatty. Maybe we should turn them into something else.
Have you encountered anything with Stitch It where you can sort of guess at – hey, in the future, you know, maybe there are some bits we won't be doing serverless?
Erko Knoll:
Oh, yeah, definitely – if there are cost considerations, if there are performance issues that arise from it, if there are scaling issues. Because serverless can scale, but you still have to be aware of the limits – in the case of AWS Lambdas, there are certain limits that you cannot go over.
And if, for some reason, that doesn't work for you – or doesn't work for us – there's no point. We won't stick to serverless if there's an alternative route. We'll host it ourselves, and probably be more efficient in the process.
Jobin Kuruvilla:
We spoke about cost in general – with Amazon doing this, and even you talking about cost considerations – and for some of the listeners, it might actually pose this question: are we really saying that Lambda, or serverless architecture in general, will cost us more than using containers?
Erko Knoll:
I'd say the contrary. It tends to cost less because – again, I'm speaking about AWS Lambda – you pay for the execution, meaning that if there is no traffic, you don't pay anything. You only pay for the stuff that you're actually doing. When you take a container or an EC2 instance, you're paying on an hourly basis for however long that instance is running. So in a lot of cases, you can be far more flexible in your cost optimisations with serverless.
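To put rough numbers on that, here's a back-of-the-envelope sketch in TypeScript. The prices are approximate us-east-1 list prices at the time of writing (they change, so check current AWS pricing), and the workload figures are invented purely to illustrate the two billing models:

```typescript
// Hypothetical workload: 100k invocations/month, 200 ms each, 512 MB memory.
const invocationsPerMonth = 100_000;
const avgDurationSec = 0.2;
const memoryGb = 0.5;

// Lambda bills per request plus per GB-second of execution time.
const lambdaCost =
  (invocationsPerMonth / 1_000_000) * 0.2 + // ~$0.20 per 1M requests
  invocationsPerMonth * avgDurationSec * memoryGb * 0.0000166667; // per GB-second

// An EC2 instance bills for every hour it runs, traffic or not.
const ec2Cost = 0.0208 * 730; // t3.small on-demand, ~730 hours in a month

console.log(`Lambda: ~$${lambdaCost.toFixed(2)}/month`); // ~ $0.19
console.log(`EC2:    ~$${ec2Cost.toFixed(2)}/month`);    // ~ $15.18
```

Of course, at sustained high traffic the comparison can flip – which is exactly the territory the Prime Video team found themselves in.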
Jobin Kuruvilla:
That is, that's exactly the point I wanted to clarify, because in many cases, the number of executions might be quite low, and by running an instance, you'd be paying for a service that you're not actually using.
Jon Mort:
Yeah, I think that one of the really important things, particularly for business applications, is the scale-to-zero aspect of serverless.
If it's not being used, it costs you nothing. So it works really well for infrequent things or bursty tools, or services that you might have.
I think that kind of thing is one of its key advantages over a container-based system.
Jobin Kuruvilla:
I have actually seen a lot of cases where there are issues with startup times – for example, if you're running Java runtimes, it takes a bit of time to start up. Is that going to be an issue? I mean, have you seen any issues like that, Erko, with Stitch It?
Erko Knoll:
Cold starts, generally, I would say, are the Achilles heel of serverless.
Cold starts, regardless of the runtime you use, are something you're going to have to factor into your architecture in serverless. Java, yes, is heavier than, let's say, Node.js, which we are using.
But, for example, AWS has done some pretty interesting optimisations around that… I believe they have a feature called SnapStart, which is supposed to create a snapshot of the initialised JVM so you can skip some of the regular startup work on execution. But, generally speaking, half a second is something you have to expect on some of the requests that start from a cold start.
And usually, when your application gets hit, you have a bunch of Lambdas that talk to each other.
Worst-case scenario, all of them are cold, so you multiply by half a second. But the more traffic you have, mathematically speaking, the fewer chances there are for a single user to hit all of the Lambdas in a cold start, because chances are you'll be hitting a Lambda that's already spooled up.
And even then, AWS gives you a bunch of tools to mitigate this issue – one of them is provisioned concurrency, which is pretty much telling AWS in advance to keep those Lambdas warm for you. Every request that can fit into that pool of warm Lambdas will be served from that pool – and if it doesn't fit into the pool, then you have to suffer the cold start.
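For anyone curious what that looks like in practice, here's a minimal sketch of provisioned concurrency using AWS CDK in TypeScript. The stack and function names are illustrative, not anything from Stitch It's actual setup:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export class WarmLambdaStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const fn = new lambda.Function(this, 'Handler', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('dist'), // assumes your bundled code lives in dist/
    });

    // Keep five execution environments initialised at all times; requests
    // that overflow this warm pool fall back to a normal cold start.
    new lambda.Alias(this, 'LiveAlias', {
      aliasName: 'live',
      version: fn.currentVersion,
      provisionedConcurrentExecutions: 5,
    });
  }
}
```

Worth noting: provisioned concurrency is billed for as long as it's configured, so it trades away some of the scale-to-zero advantage discussed above.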
Rasmus Praestholm:
It seems like it's more about fitting your business – like, what do you do? What is your pace? How much do you scale? Do you have quiet periods?
Erko Knoll:
If you want to do some black-box trading where latency is really, really important – algorithmic trading – then you probably don't want to run that in your Lambdas. A half-second delay in buying the stock at the price you want to pay is probably not going to be good enough for you.
But for a lot of business applications, you know, half a second occasionally won't move the needle too much.
Rasmus Praestholm:
I ran into a similar kind of comparison myself not that long ago, because I'm a big fan of Google Cloud and so on. I was building a new Kubernetes cluster, and I realised – oh, wait! GKE Autopilot is a thing now, where Google will keep the nodes, you know, warm for you.
But you don't maintain a cluster. You can scale to zero, even though it's Kubernetes.
But… while that sounds like, ooh, I like that, because then I don't need to pay for the resources I'm not using – then there's the little gotcha that, oh, wait, no! When you actually run a thing, you get charged for the entire resource request you put on your stuff.
You don't get to take advantage of the "well, I've got ten things running, and they're all running at 10%, so I only need one unit of something" effect... So there's always some way they're going to get you.
Erko Knoll:
Those details are usually in fine print!
Rasmus Praestholm:
Yep… So the one other thing I was curious about cost-wise – something that's kept me away from using it in the past – is Za-pier, or Zapier, or whatever it's called.
Because I was looking at, you know, open-source-y hobby things, like – hey, this thing looks kind of useful, but I only get 100 Zaps a month before it starts costing money, and so on. Which may not be a lot, but when you're trying to do something for free, you get scared of these kinds of limits.
So I never got into using Zapier. But now it feels like, with Stitch It, we have sort of like our own thing. And now I'm thinking about all these cool, neat ways that, hey, now that I have an in, I want to use this stuff…
Erko Knoll:
Yeah. Usually, these tools have a free tier, and these free tiers are quite limited.
We also have a free tier – and it's quite generous. We give you 10,000 function invocations per month. Just to clarify what a function invocation is: basically, any function that we need to run in our runtime, or any external event that you need to process as part of your integration logic, is one function invocation.
So yeah, you should be able to do quite a lot within those limits.
Rasmus Praestholm:
That sounds more generous than the Zapier thing. Can you perhaps contrast the two a bit more – what would I use Zapier for, and what would I use Stitch It for?
Erko Knoll:
So Stitch It, because it's code-based, is aimed more at complex integration problems. Zapier, on the other hand, is primarily no-code first – you start out with the UI, and you set up a bunch of actions in the Zap – but I haven't really used it, so please forgive me if I get the terminology wrong… And you're supposed to be able to do simple things really quickly with Zapier.
But the thing with most of the low-code integration or automation tools is that you will hit the ceiling pretty quickly in terms of the complexity you can handle.
Some of them get around that by giving you an escape hatch into code within the tool. But our unique perspective on solving integrations is that we give you code from the start.
So you do your simple stuff in code, and your most complex stuff in code, also. You have this one single, unified and optimised interface around writing code, and around all the things that have to happen for that code to run integrations properly – connectors and so on.
Rasmus Praestholm:
Yep, that sounds quite reasonable. I have hit the wall with Zapier, thinking, I want to do a thing in Trello – but the right action isn't there, so I can't do it. But I guess I can do that with Stitch It, then!
Erko Knoll:
There are only two requirements for connecting Stitch It to something you want to work with. The first is whether we can connect to it at all – Stitch It is a SaaS product hosted in the cloud, on the internet, so if you have something behind a firewall, we can't hit it.
But what you can do is – we offer a static IP address, and you can add a firewall bypass rule to your firewall to let Stitch It connections pass through, or you can set up a reverse proxy in your network if you want to do it in a slightly different manner.
So you may have to do some work on your network security to let Stitch It connect to your service.
The other requirement is that the thing you're talking to has to have some sort of HTTP-based API available. Usually that's a REST API or GraphQL – whatever works over HTTP, you're good to go.
Jobin Kuruvilla:
So I will add a few use cases to it, right? I mean, obviously, talking about integration – I have worked on products which integrated Jira with Quality Center, because at some point, a lot of folks were using Quality Center, HP ALM, for test case management.
User stories were written in Jira, and they wanted to keep the two in sync, so you'd know what was happening on the test case management side from inside Jira itself – and we had to run integration platforms to connect Jira to Quality Center.
I could potentially say the same thing about Jira and GitLab these days, right? In Jira, you do user story management; in GitLab, you're hosting your code, doing CI/CD, a lot of these things – but GitLab also has issue tracking inside, so I could keep the issues in sync between Jira and GitLab.
Just to understand this… you're saying Stitch It will work with any of these tools if I want to integrate them. Did I get that right?
Erko Knoll:
Exactly. And, in regards to Jira and GitLab, those are two of the tools for which we officially have connectors available.
The difference between having a connector and not having one is that if you have a connector, it's really simple: there's a UI, you go through a few steps to set up the connection, and then you basically start using the APIs.
If you don't have a connector, then we offer something called a generic connector, which means that you have to put your API keys directly in there, as one of the headers that get passed along.
It's a little bit less user-friendly if you don't know exactly what you're doing, but you can still use it as an escape hatch into the tools you need to work with.
But in your case, GitLab and GitLab issues, or tickets, or whatever it's called, syncing it with Jira on-prem or cloud, should be actually quite straightforward with Stitch It.
You still have to write the business logic on your own, you know, when something happens in Jira, then what in GitLab, and vice versa, if it has to be bi-directional – but connecting these two tools should be quite straightforward.
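As a rough illustration of what a generic-connector-style call boils down to – the endpoint, header and environment variable below are hypothetical, not Stitch It's actual API – you're essentially making authenticated HTTP requests yourself:

```typescript
// Hypothetical example: calling a tool that has no first-class connector.
// You supply the API key yourself, typically via a header.
const response = await fetch('https://example-tool.example/api/v2/items', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.EXAMPLE_TOOL_API_KEY}`, // your own key
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ title: 'Created from an integration script' }),
});

if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}
const item = await response.json();
```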
Rasmus Praestholm:
So connectors are really kind of like the existing integrations in something like Zapier?
Erko Knoll:
Not existing integrations, but gateways or interfaces for talking to certain tools. The integration – well, it depends on what you consider an integration. I tend to view the integration as the business logic that moves data between those things.
But obviously, part of the business logic is connectivity – the tools have to be connected.
And the great thing about Stitch It, compared to some other tools, is that it's not designed to be one-to-one.
Facilitating one-to-one integration means: if something happens in system A, then do something in system B.
You can do many-to-many – you can be listening to events from three different Jira instances and pumping them to different GitLab instances, based on your business logic.
That's the flexibility you get with Stitch It.
Jobin Kuruvilla:
I have seen the demo for Stitch It, which gave me more insight into how these connectors work. So having a connector obviously helps you get up and running really fast. But don't you also have something called managed APIs, which help you code the business logic faster?
Erko Knoll:
Yeah, we have the concept of a managed API, which is basically a wrapper over the underlying API. Currently it's only REST APIs – we are looking at how to abstract away the complexities of GraphQL as well, and give you a slightly simpler interface. But the idea is that instead of you having to know how to call that API – which usually means: what's the URL? What's the HTTP method? What's the payload of the request? What's the shape of the response you're getting back? – the managed API tells you exactly what you have to do. Instead of having to know the URL, you just call a function, as in a regular programming language.
And under the hood, it's basically calling that API endpoint for you. It will tell you what parameters you can pass along, what the shape of the request body is, and also what the shape of the response is – so you don't necessarily have to look that information up in the external documentation.
We try to keep it up to date for you.
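As a hypothetical before-and-after – the raw call below uses Jira Cloud's real REST endpoint, but the `jira.createIssue` helper at the end is an invented stand-in for a managed API function, not Stitch It's actual surface:

```typescript
// Without a managed API: you need the URL, HTTP method, payload shape and
// response shape from the external documentation.
const raw = await fetch('https://your-site.atlassian.net/rest/api/3/issue', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // ...plus whatever authentication the API requires
  },
  body: JSON.stringify({ fields: { summary: 'New issue' /* ... */ } }),
});
const created = await raw.json();

// With a managed API (hypothetical): one typed function call, with the
// parameters and response shape autocompleted in the editor, and the HTTP
// details handled under the hood.
// const issue = await jira.createIssue({ summary: 'New issue' });
```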
Jobin Kuruvilla:
So it really brings down probably the code from 100 lines of code to maybe less than 10?
Erko Knoll:
Yeah, it definitely reduces the number of lines of code, and the cognitive overhead of having to know all of these details, because when you use it in our editor, you get code suggestions and autocompletion when you're working with these APIs.
It doesn't remove the need to know how these tools work – you know, in Jira you have issues, in GitLab something else – you have to know what these things are called. So the business logic is still for you to figure out, but it takes care of working with the APIs. It just makes them slightly more manageable.
Rasmus Praestholm:
So I'm kind of curious, if we're past this topic – you're working with a serverless-heavy application, and one thing that's always made me wonder, part of what makes it feel alien, is: how do you have a healthy dev and test cycle with serverless?
How do you have the developer experience of – well, I want to add a new connector for, like, Discord? Do I go into a branch? Do I go to a different serverless URL? What do you change around so that it doesn't impact your live service?
Erko Knoll:
I guess it depends on how each team has figured it out. The way we've figured it out is actually pretty simple: each of the developers has their own AWS account.
CDK manages all the infrastructure. So basically, each developer deploys their own stack and develops against that stack. And when things are ready to go, they go through Bitbucket Pipelines and get deployed directly into the staging and production AWS accounts.
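A minimal sketch of that per-developer stack pattern with CDK v2 – the naming convention and environment variable here are our own illustration, not Stitch It's actual setup:

```typescript
import { App, Stack } from 'aws-cdk-lib';

const app = new App();

// Each engineer deploys an isolated copy of the whole platform into their
// own AWS account, e.g.: DEVELOPER=erko cdk deploy --profile erko-dev
const developer = process.env.DEVELOPER ?? 'staging';

// All the Lambdas, queues, storage, etc. would be defined inside this stack,
// so every copy is structurally identical to staging and production.
new Stack(app, `integration-platform-${developer}`);

app.synth();
```

Because Lambda bills per execution, a dozen of these mostly idle copies cost close to nothing.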
Jobin Kuruvilla:
That's very interesting, because earlier we talked about cost – and you don't have to worry about each person running their own AWS account, because you're basically only paying for the executions. You're not paying for EC2 instances running forever for each developer.
Erko Knoll:
Generally speaking, yes – except for the one EC2 instance that we have, which we're paying for full-time. But generally speaking, that's true: if the account isn't getting any load, then the cost is quite low.
That said, serverless is just one piece of the puzzle when it comes to developing a product. You also need other AWS services, and some get charged differently. For example, you may want to host some SSL certificates somewhere, and that gets charged differently. We also have a VPC, primarily to put that EC2 instance in a secure network, but the VPC interfaces themselves get charged hourly. So there's definitely a cost reduction, but it's not zero – it's just less.
Rasmus Praestholm:
So you mentioned a staging environment – that one has a permanent URL, you can have all the things tied into it, and so on?
Erko Knoll:
Yeah, it's a carbon copy of the production. Basically.
Rasmus Praestholm:
And then, essentially, if you're trying to do a new connector, you probably just do everything over in staging? Or would you ever have a point where you're doing one thing in staging but pointing at something in production?
Erko Knoll:
No, staging is primarily just there to… currently, it doesn't really serve much of a purpose – everything goes directly through the branch as well. But ideally, there should be a bunch of end-to-end tests that run on there to verify that you didn't hit any regressions as part of the deployment.
Staging is really there to verify whether it's safe to deploy to production.
When it comes to, for example, developing a new connector – let's say a developer needs to do it. They have their own AWS account where their local stack is set up; they just go ahead and create a new service that deals with that particular connector, deploy their local stack, and test that it works properly. As part of the PR, they can also expose their own URL for that particular development, and the rest of us can just log into that instance and test the connector. If it looks good, we give a thumbs up; if not, they go back and fix something.
Rasmus Praestholm:
Okay. So that brings me to one more question: can you do automated testing with all this stuff? Can you have a test suite for the different connectors and how they might work with each other?
Erko Knoll:
Yeah, we actually do – from the front-end perspective, from the user's perspective of setting up a connector, we have a bunch of end-to-end tests with Playwright that just, you know, press the buttons that the user would otherwise press.
Rasmus Praestholm:
Cool.
Jobin Kuruvilla:
So it doesn't look like serverless is dead, based on this conversation! I have no plans to kill it, either. Still, I have to ask about one more limitation, because I think it was for one of our customers considering Stitch It for bulk migrations – one of the limitations we came across was that Lambda itself has a limit on how long a single execution can run. I think it was 5 minutes or something.
Erko Knoll:
15.
Jobin Kuruvilla:
Okay, 15 – all right, I was wrong, it's 15. I'll go with the expert here. But that limitation did actually pose a problem for us, because some of these migrations can run much, much longer than that.
So how do you work around limitations like that?
For integrations, I think it's probably not a problem, because you probably wouldn't see a single thread running for more than 15 minutes…
Erko Knoll:
I mean, that's a very good point. Lambdas have limitations, and as long as we have those limitations, our end users have them too – at least as long as these user scripts are running in a Lambda.
If we wanted to get rid of a Lambda limit, we could start hosting our own containers where those scripts run – so one way to get rid of that limit is simply not to use a Lambda. But migration is a very interesting topic, because we are actually kind of solving this problem with Stitch It.
And yes, there is this 15 min limitation. You just have to work around it, and Stitch It actually gives you tools to work around it. And coincidentally, I’d argue that forcing you to work around this limitation makes your migration script far more reliable.
So let me explain the way we work around it. But before I explain it, I'll tell you about a migration we built in Stitch It. One of the tools we support is Tempo.
It's a tool in the Atlassian ecosystem for logging work – how many hours somebody worked on a particular ticket.
And as of today, Tempo doesn't have a bespoke solution to migrate work logs from one cloud instance into another. So they came to us and asked, you know, can we solve this problem for them?
And we did, with Stitch It.
So we built a bunch of templates. A template in Stitch It is basically a blueprint – a complete solution that you can create your own workspace from, where the integration is set up, all the code is there, all the connections – everything is in the workspace.
So we have a bunch of migration templates: one for Tempo accounts, one for Tempo work logs – and the most important one is work logs. That's where people may have millions of work logs in a single instance that need to be migrated.
And the workaround is that when the script starts running, it starts tracking: where is the progress? What have I done previously?
So it continues where it left off. It starts doing its thing, and it monitors whether it's about to run out of time – 14 minutes is the limit we've set, leaving one minute just for, you know, safety.
And when the minutes are about to run out, it restarts itself. It programmatically says, okay, I want to start a new script execution – so the script basically runs in a loop. Every time it starts up, it checks how many work logs it managed to migrate previously, and then continues with the work logs it hasn't migrated so far.
But the consequence is that you can pause this migration at any time, because the state is stored in Stitch It using record storage, which is a simple key-value store. And you can resume the migration at any given time, because the previous state is there – it knows how much it managed to migrate, and it just continues where it left off.
So that's an example of finding a workaround for a hard limitation that you know AWS currently forces us to comply with.
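Here's a minimal TypeScript sketch of that checkpoint-and-restart pattern. The `records` store, `invokeSelf` and the paging helpers are declared as assumptions standing in for Stitch It's record storage, programmatic re-invocation and the source/target APIs – not its real interface:

```typescript
// Hypothetical stand-ins for the platform's record storage, the
// source/target APIs, and programmatic re-invocation of the script.
declare const records: {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
};
declare function fetchWorklogPage(
  cursor?: string
): Promise<{ items: unknown[]; nextCursor?: string }>;
declare function writeWorklogs(items: unknown[]): Promise<void>;
declare function invokeSelf(): Promise<void>;

// Stop ~1 minute before Lambda's 15-minute ceiling, as described above.
const SAFETY_BUDGET_MS = 14 * 60 * 1000;

export async function migrateWorklogs(): Promise<void> {
  const startedAt = Date.now();
  // Resume from wherever the previous execution left off.
  let cursor = await records.get('migration-cursor');

  while (true) {
    const page = await fetchWorklogPage(cursor); // read from the source
    await writeWorklogs(page.items);             // write to the target
    cursor = page.nextCursor;

    if (!cursor) return; // done: everything has been migrated

    await records.set('migration-cursor', cursor); // checkpoint progress

    if (Date.now() - startedAt > SAFETY_BUDGET_MS) {
      await invokeSelf(); // out of time: hand off to a fresh execution
      return;
    }
  }
}
```

Pausing is then just a matter of not re-invoking: the cursor sits in the key-value store until the next run picks it up.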
Jon Mort:
This is a really interesting example of where architectural constraints and limitations push you towards a more robust, better design. And I think there are plenty of examples throughout serverless – API Gateway, Lambda… those kinds of tools push you towards more resilient, more scalable architectures because of those constraints.
So yeah, I don't know if you've got any thoughts along those lines, because it's something AWS pushes as a good thing about serverless – that it squeezes you towards good-quality, well-worn architecture.
Erko Knoll:
Yeah, there are limitations that we'd ideally want to get around, but that we have to, you know… deal with.
One particular limitation is that a response has to be sent back through API Gateway within 29 seconds.
Weird number, but it's 29 seconds, and there are plenty of implications... The main one is that you should make sure that whatever process you're running behind an API Gateway responds in a timely fashion, because if you go above 29 seconds, you basically get an error back.
So yeah, I agree that those limitations actually make you write more robust solutions.
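One common workaround for that 29-second ceiling – sketched here with the AWS SDK for JavaScript v3, with an illustrative worker function name – is to acknowledge the request immediately and hand the slow work to a Lambda invoked asynchronously:

```typescript
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const lambdaClient = new LambdaClient({});

// API Gateway handler: returns well inside the 29-second window.
export const handler = async (event: { body?: string }) => {
  // The 'Event' invocation type is fire-and-forget: the call returns as
  // soon as the payload is queued, while 'slow-worker' (an illustrative
  // name) can then run for up to the full 15-minute Lambda limit.
  await lambdaClient.send(
    new InvokeCommand({
      FunctionName: 'slow-worker',
      InvocationType: 'Event',
      Payload: Buffer.from(event.body ?? '{}'),
    })
  );

  return { statusCode: 202, body: JSON.stringify({ accepted: true }) };
};
```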
Jobin Kuruvilla:
I understand the point. I mean, limitations are there for a reason, and, as Jon said, maybe they're in our best interest, pushing us to make our architecture a bit more robust.
But I can't help but ask a controversial question here…
Is this one of those cases where serverless is not the right fit for our architecture? Should we be looking at something else, so we don't have to work around the limitations? Maybe there's a better solution out there that we can use.
Erko Knoll:
I'd say it's all about cost – how much it costs to re-architect around the limitations, versus saying, okay, that limitation is just getting in my way, I'll do it with some other solution – and what's the cost of that one? If one cost is clearly lower than the other, then it's a no-brainer.
Jobin Kuruvilla:
And the cost of running it for the next 5 or 10 years.
Erko Knoll:
Yeah, the cost isn't just in setting up the solution and, let's say, the AWS bill of running an EC2 instance – there's the maintenance. You know, when you go away from a Lambda to an EC2, you own a server; you have to patch that server, and that's time.
There's a people cost, you know? People have to know how those EC2s work.
Developers have to maintain it – or, if you're not doing DevOps, some operations people do. So the cost isn't just what you see in your AWS bill at the end of the month.
Jobin Kuruvilla:
So to summarise, serverless is not dead; you just use it when needed.
Erko Knoll:
Exactly.
Laura Larramore:
It reminds me of that analogy where you can hammer in a nail with a screwdriver. But is that the best tool to use? Probably not.
Erko Knoll:
You can build a house with a sledgehammer, too – not very efficient!
Laura Larramore:
All right. Well, thank you, guys. Thank you, Erko, for coming on and talking to us about Stitch It and the serverless build behind it. It was really fascinating. I think we need to do it again!
And that's it for the DevOps Decrypted podcast. You can connect with us on social media at Adaptavist and let us know what you think of the show. For myself and Jon, Jobin, Erko and Rasmus – thank you for joining us today, and I hope you have a wonderful day!
Why not leave us a review on your podcast platform of choice? Let us know how we're doing or highlight topics you would like us to discuss in our upcoming episodes.
We truly love to hear your feedback, and as a thank you for your ongoing support, we'll be giving out some free Adaptavist swag bags!