Hi. I decided to become a popular tech blogger.
But first, let’s set the atmosphere. Writing about computers is exciting, but the point of all writing is the human source (yes, there’s evidence of me being human) and a human recipient (let’s skip for now the hypothesis that the modern internet’s content is created and consumed by algorithms).
Here’s the best band ever (of this week), and proceed
Mexican Krautrock. Looks like humanity is still in constant need of Joy Division, and that’s encouraging - it’s a world I have a place in, it’s a world with hope.
But I digress.
Hello. I’m Sascha, Sasha or Саша.
It’s not my name, but it’s what I ask people to call me.
I’m a cloud architect by day but, of course, these days, who isn’t? We’ve all been MCs and DJs, bloggers, photographers, influencers and, most recently, epidemiologists. At least I’ve been involved with software for almost three decades now. I’ve seen some shit, but nothing dramatic. Except that one project we once inherited. If I ever decide tech blogging is too small, and I want to be a tech booker instead, that project will be absolutely perfect for ten thousand examples of how not to write software.
What am I by night? Anything I want to be, but mostly tired.
It’s funny how the realisation that different people perceive us as different personalities or roles means that our true essence is indeed just like the Buddhists say - we are something that can take different forms, but to be able to do that it cannot have a permanent form of its own. We are the Play-Doh of the universe.
Oh, yes, it’s a tech blog and I lured you with the hybrid cloud. Let’s get to it.
Hybrid cloud or multi-cloud is a myth.
There, I said it.
They might say that you can have your Kubernetes cluster that has some pods in AWS, some in Azure and some on the Raspberry Pi in your parents’ basement. They will not be wrong. You can.
They might say Pulumi has powerful abstractions, and you can decide later which cloud exactly your serverless function will be deployed to. It actually might. Let me know - I’m curious if it does.
Some would go further and tell you it’s a good idea to host your remote desktops with Azure, your Oracle with, well, Oracle and perform your computations with AWS. Because of licensing costs. Because when Azure runs out of capacity (ehm, again), your computations will still be running. Eggs and baskets, remember?
The problem is that they will all be technically not wrong.
Have you heard this one?
Theoretically, there’s no difference between theory and practice. Practically, there is.
They are all not wrong, if we’re talking from the marketing perspective. Have you noticed that Cosmos DB is described as “an anything you want it to be database” in marketing explanations, and only in the technical ones do you learn that it can be, but you have to choose which one exactly before creating an instance and cannot change it later? It’s just an umbrella service for different kinds of databases, just like RDS is.
They’re also not wrong if we’re implementing hello world.
They’re wrong when things get real. There are two problems with this approach. Bring your calculator with you, I’d like you to check my math. Also keep in mind, I’m rounding everything to the nearest 5. I’m that lazy.
Problem One. Latency.
As St. Dijkstra (may the gods of the Information Superhighway bless his soul) already said in 1969:
Apparently we are too much trained to disregard differences in scale, to treat them as "gradual differences that are not essential". We tell ourselves that what we can do once, we can also do twice and by induction we fool ourselves into believing that we can do it as many times as needed, but this is just not true! A factor of a thousand is already far beyond our powers of imagination!
Let’s imagine you set up a fancy 10 Gbps direct connect link between your AWS and Oracle data centres. That will give you some crazy ~100 MB/s (on paper it’s closer to 1.25 GB/s, but let’s stay pessimistic). Today, Wikipedia is about 15 GB, so if you were processing a dataset of that size, you’d be able to copy it in about 150 seconds. Two and a half minutes. I might’ve dropped a zero somewhere, go figure.
Anyway. No biggie.
The problem is that you’ll be lucky to have one tenth of that speed. And equally unlucky to have datasets in terabytes.
Say 15 TB at ~10 MB/s: something like 25000 minutes. 17 days to move your dataset from storage to where it’s processed. Remember “this will take up to 24 hours to process”? Pepperidge Farm remembers.
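If you left your calculator at home, here is a minimal Python sketch of the same back-of-the-envelope math. It assumes decimal units (1 GB = 1000 MB) and simply takes the round numbers from above; the function name `transfer_seconds` is made up for this illustration.

```python
# Back-of-the-envelope transfer times, using the post's round numbers.
# Assumption: decimal units throughout (1 GB = 1000 MB, 1 TB = 1000 GB).

GB = 1000        # megabytes in a gigabyte
TB = 1000 * GB   # megabytes in a terabyte


def transfer_seconds(size_mb: float, throughput_mb_per_s: float) -> float:
    """Seconds needed to push size_mb megabytes through a link at the given MB/s."""
    return size_mb / throughput_mb_per_s


# The optimistic case: a Wikipedia-sized dataset at ~100 MB/s.
small = transfer_seconds(15 * GB, 100)

# The realistic case: one tenth of the speed, a thousand times the data.
big = transfer_seconds(15 * TB, 10)

print(f"15 GB at 100 MB/s: {small:.0f} s ({small / 60:.1f} min)")
print(f"15 TB at  10 MB/s: {big / 60:.0f} min ({big / 86400:.1f} days)")
```

Swap in your own dataset size and the throughput you actually measure; the point is only that the same division stops being funny once the units change.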
Back to Dijkstra:
To all this I can see only one answer, viz. to treat problems of size as explicitly as possible.
In other words, don’t make a PoC about whether you can connect 2 clouds. There’s nothing to P in that C. Make it about the scale you’re expecting. Multiply by 10.
Problem Two. Cost.
Every cloud provider charges nothing for incoming traffic and heartily for egress. They obviously want you to use their and only their cloud. Who wouldn’t?
According to the decaying AWS Simple Monthly Calculator, 10 GB of egress per month will cost you ~$25. Thus, moving 15 GB is ~$35. We can do that in 2 minutes. That’s ~$15/minute on that super high speed imaginary connection. Make that processing your daily business, say 4 hours a day, 5 days a week: that’s 20 hours or 1200 minutes a week, 4800 minutes of traffic per month. $72000 in monthly egress costs. Just so that you could move your data from one cloud to the other.
Of course, if you’re into Oracle or SAP, then it might actually be cheaper than running it yourself, but. Oh, well.
Oh, and we haven’t even brought up the direct connect costs. Have a look.
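For those checking my math on the egress bill, a small Python sketch. It takes the figures quoted above at face value (I have not checked them against any current price list), assumes a four-week month, and skips the rounding to the nearest 5. All the constant names are made up for this example.

```python
# Monthly egress bill for shuttling data between two clouds.
# Assumption: every figure below is the post's own round number,
# not a price taken from a current price list.

PRICE_PER_GB = 25 / 10       # ~$25 for 10 GB of egress
DATASET_GB = 15              # a Wikipedia-sized dataset
THROUGHPUT_GB_PER_MIN = 6    # ~100 MB/s is 6 GB per minute

cost_per_dataset = DATASET_GB * PRICE_PER_GB
minutes_per_dataset = DATASET_GB / THROUGHPUT_GB_PER_MIN
cost_per_minute = cost_per_dataset / minutes_per_dataset

# 4 hours a day, 5 days a week, assuming 4 weeks in a month.
minutes_per_month = 4 * 60 * 5 * 4
monthly_egress = cost_per_minute * minutes_per_month

print(f"one dataset: ${cost_per_dataset:.2f} in {minutes_per_dataset:.1f} min")
print(f"per minute:  ${cost_per_minute:.2f}")
print(f"per month:   ${monthly_egress:.0f} over {minutes_per_month} minutes")
```

Change `PRICE_PER_GB` to whatever your provider quotes today and the conclusion scales linearly with it.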
In other words, keep your data where your compute is. Afraid of vendor lock-in? Some other day, let’s calculate how much it costs to run your own data centre.
Hybrid cloud is not an escape from vendor lock-in. It’s a multivendor lock-in.
Hybrid cloud is a marketing myth ∎
Stay tuned and take care! See you soon.
Tools used in this post.
bc - the terminal calculator.
pdfimages and tesseract-ocr because, of course, it’s easier to convert those Dijkstra PDFs than to type that many characters.
AWS Simple Monthly Calculator. Interestingly, I couldn’t find the egress costs in its replacement.
Of course, a lot of Wikipedia. How else would we have learned that Play-Doh has retracted a product because customers complained one part of it looked like a penis?
https://unicode-table.com - you just don’t know how much you need it.
I don't understand half of the techie talk, but shall stay tuned (you mentioned 'cost') :D
Fun! I might quibble that problem 1: latency is really about bandwidth. Good point all the same.