<p><em>BigBitBus Inc. | Take Charge of your Cloud Journey. | Feed generated 2021-06-10</em></p>
<h1>Google Cloud’s E2 VMs compared to the N2 VMs</h1>
<p><em>Published 2021-06-10</em></p>
<p>In this post we compare Google Cloud’s E2 class of virtual machines with the N2 class of virtual machines. We compare their performance and cost so you can make data-driven decisions when choosing between the two classes.</p>
<h2 id="e2-and-n2-vm-products">E2 and N2 VM Products</h2>
<p>Here is how the <a href="https://cloud.google.com/compute/docs/machine-types">Google cloud VM types</a> page describes the E2 and N2 products:</p>
<p>“E2 machine types are cost-optimized VMs that offer up to 32 vCPUs with up to 128 GB of memory with a maximum of 8 GB per vCPU. N2 machine types are the second generation general-purpose machine types that offer flexible sizing between 2 to 80 vCPUs and 0.5 to 8 GB of memory per vCPU.”</p>
<p>The description suggests that the E2 vCPU is cost-effective compared to the N2 but offers lower performance. We wanted to understand the relative performance characteristics of the E2 and N2 vCPUs, so we ran the <a href="https://wiki.ubuntu.com/Kernel/Reference/stress-ng">stress-ng tool</a> to compare them. We spun up an E2-medium (2 vCPU, 4 GB RAM) and an N2-standard-2 (2 vCPU, 8 GB RAM) virtual machine and ran the tool on each. The tool cycles through hundreds of different CPU-consuming tasks (BOGO operations on the y-axis below), adjusting their frequency until the desired utilization (stress percent on the x-axis below) is achieved. Here is what we found:</p>
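For readers who want to reproduce this kind of sweep, here is a minimal sketch of how the load levels could be stepped through with stress-ng. The exact invocation we used is not shown in the post, so the flags below (stress-ng's documented <code>--cpu</code>, <code>--cpu-load</code>, <code>--timeout</code> and <code>--metrics-brief</code> options) are an assumption; the commands are built but not executed, so the sweep is easy to review first.

```python
# Sketch of a stress-ng load sweep (assumes stress-ng is installed).
# --cpu spawns N CPU workers, --cpu-load caps each worker's utilization,
# --metrics-brief prints the bogo-ops summary at the end of each run.
def stress_commands(vcpus=2, loads=range(10, 101, 10), secs=60):
    """Build one stress-ng command per target utilization level."""
    return [
        f"stress-ng --cpu {vcpus} --cpu-load {p} "
        f"--timeout {secs}s --metrics-brief"
        for p in loads
    ]

for cmd in stress_commands():
    print(cmd)
```

Running each command and recording the reported bogo-ops per load level yields the kind of curve shown in the chart below.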
<p align="center">
<img src="https://docs.google.com/spreadsheets/d/e/2PACX-1vS0riPn6tgRNsBP1ONVr-ALqMRgMudq4eN61hUIqBGbO92DdcPmEH5QKlm-4JDks5Ly6tvqM0j0WD5C/pubchart?oid=1451668472&format=image" />
</p>
<p>The E2 is less performant than the N2. The remarkable insight from the graph is that the performance gap depends on the load applied to the VM: the E2 closely tracks the N2 until about 25% utilization (stress percent), but beyond that its performance falls off relative to the N2.</p>
<p>Since it is unlikely that the silicon was designed with this characteristic, we delved into the E2 documentation and came across <a href="https://cloud.google.com/blog/products/compute/understanding-dynamic-resource-management-in-e2-vms">Google’s blog</a> on how E2 VMs work. These VMs use “performance-driven dynamic resource management”: E2 VMs are packed more tightly onto physical cores, and the hypervisor continuously monitors each VM’s usage and “live migrates” it to another host as needed. The blog post notes that users should consider N2 or C2 VMs for higher-CPU loads. We therefore believe the non-linear E2 behavior we saw in our experiments was a result of triggering these dynamic resource management algorithms.</p>
<h2 id="pricing">Pricing</h2>
<p>You can use our free <a href="https://b3console.bigbitbus.com/login">B3Console</a> tool to compare VMs across different cloud providers. Using it, we collected the on-demand pricing per month (720 hours) for E2 and N2 VMs in Google’s us-central1 region (as of June 10th, 2021) in this table:</p>
<table>
<thead>
<tr>
<th>Product</th>
<th>1 vCPU</th>
<th>1 GB RAM</th>
</tr>
</thead>
<tbody>
<tr>
<td>E2</td>
<td>$15.70</td>
<td>$2.10</td>
</tr>
<tr>
<td>N2</td>
<td>$22.76</td>
<td>$3.05</td>
</tr>
</tbody>
</table>
<p>N2 carries an approximately 45% premium over E2 on both the per-vCPU and per-GB rates.</p>
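The per-unit rates in the table above make the premium easy to check with back-of-envelope arithmetic. The sketch below prices a like-for-like 2 vCPU / 8 GB shape on each family; the additive vCPU-plus-RAM cost model is a simplification of how Google actually bills, but it matches the table's units.

```python
# Per-unit monthly on-demand rates from the table above (us-central1, June 2021).
E2 = {"vcpu": 15.70, "gb_ram": 2.10}
N2 = {"vcpu": 22.76, "gb_ram": 3.05}

def monthly(rates, vcpus, gb):
    """Approximate the monthly price as vCPU cost plus RAM cost."""
    return rates["vcpu"] * vcpus + rates["gb_ram"] * gb

# A like-for-like 2 vCPU / 8 GB shape on each family:
e2, n2 = monthly(E2, 2, 8), monthly(N2, 2, 8)
print(f"E2 ${e2:.2f}/mo, N2 ${n2:.2f}/mo, premium {n2 / e2 - 1:.0%}")
```

For this shape the N2 works out to roughly 45% more per month than the E2.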
<h2 id="outlook">Outlook</h2>
<p>So, which of the product families should you choose? For low-utilization workloads (your workload seldom spikes above 25% CPU utilization) the E2 is a great option; this includes many dev/test environments, for example.</p>
<p>But for workloads that demand consistent, predictable performance, go with the N2. That’s because it is almost certainly more expensive to scale compute horizontally than vertically: a 100-VM N2 cluster running a distributed application will probably outperform a 150-VM E2 cluster, simply because coordinating nodes in distributed systems is slow and expensive given the relatively slow networks that connect VMs.</p>
<p>I would also be wary of using E2 VMs in production workloads where usage patterns can be very non-linear and unpredictable. You don’t want to be scratching your head about why your application that supported 10K users yesterday suddenly struggles with 7K users today, just because the E2 VMs hosting your application do not guarantee consistent performance!</p>
<p><em>Sachin Agarwal is a computer systems researcher and the founder of BigBitBus.</em></p>
<p><em>BigBitBus is on a mission to bring greater transparency in public cloud and managed big data and analytics services.</em></p>
<h1>Comparing Pricing of Managed Relational Databases in AWS, Azure and GCP</h1>
<p><em>Published 2021-03-26</em></p>
<p>In this post we will go over the pricing of relational database services across the three major providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). We will show how using a different provider can help you save money, as well as how costs scale with database size. The <a href="https://b3console.bigbitbus.com/login">B3Console</a> can be your unbiased data-driven decision helper in this space!</p>
<h2 id="relational-database-services-in-the-b3console">Relational Database Services in the B3Console</h2>
<p>Relational database services are an integral part of applications. Providing databases as platform-as-a-service (PaaS) is a huge market opportunity, as many organizations are looking to outsource database server management. Major cloud service providers like AWS, Azure and Google provide off-the-shelf solutions to run a database in the cloud with ease. These services, though easy to use, often come with a hefty price tag. For example, even a modest 4-core instance running the open-source MySQL database engine can cost upwards of $6,000 a year, and costs only increase as the application scales. If a user has a database-heavy IT footprint, they can benefit from choosing a cost-effective database PaaS provider.</p>
<p>In this article we will go over the pricing of relational database services across the three major providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). We will show how using a different provider can help you save money, as well as how costs scale as the database scales.</p>
<h2 id="a-birds-eye-view">A Bird’s Eye View</h2>
<p>This section presents an overview of relational database service pricing from the three major providers. The following charts look at the hourly on-demand price of relational database services across vCPU core counts. All machines run the MySQL database engine, so the only major difference is the provider. For this study we looked at the average price of instances across regions, grouped by vCPU core count.</p>
<p>We find that the cheapest provider across all vCPU sizes is Microsoft Azure, whereas Google and AWS have similar pricing up until 32 vCPU cores. Beyond this point AWS becomes more expensive.</p>
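The averaging described above is a simple group-by-and-mean over the price catalog. The sketch below shows the method on a handful of made-up rows; the provider names are real but the prices and regions are purely illustrative, not figures from our dataset.

```python
from collections import defaultdict

# Hypothetical per-region hourly prices: (provider, vcpus, region, $/hr).
# The real study averages thousands of SKUs in exactly the same way.
rows = [
    ("azure", 4, "eastus",      0.25), ("azure", 4, "westeu",      0.27),
    ("gcp",   4, "us-central1", 0.30), ("aws",   4, "us-east-1",   0.31),
    ("aws",  64, "us-east-1",   5.10), ("gcp",  64, "us-central1", 4.40),
]

# Group by (provider, vCPU count), then average across regions.
buckets = defaultdict(list)
for provider, vcpus, _region, price in rows:
    buckets[(provider, vcpus)].append(price)

averages = {k: sum(v) / len(v) for k, v in buckets.items()}
print(averages[("azure", 4)])
```

Each `(provider, vCPU)` bucket then becomes one point on the charts below.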
<p align="center">
<img src="/assets/post15/rates.png" />
</p>
<p>Pricing also stratifies across machine types, and some machine types are significantly more expensive than others. For GCP and Azure we see that memory-optimized machines are more expensive than generic machines. AWS, on the other hand, offers many machine types, several of which are priced well above comparable offerings from the other providers.</p>
<p align="center">
<img src="/assets/post15/machine_types.png" />
</p>
<p>There are also differences in prices across regions, and these too can be significant. The chart below shows the price of a small single-core machine (db.m1.small) on AWS running PostgreSQL. As can be seen, the rate can more than double depending on the region where the service is deployed.</p>
<p align="center">
<img src="/assets/post15/regions.png" />
</p>
<h2 id="comparing-providers">Comparing Providers</h2>
<p>There are tens of thousands of managed-database machine-type SKUs across providers. BigBitBus has collected detailed pricing and sizing data for all these options so you can make data-driven decisions about how to choose your cloud provider and right-size your database workloads.</p>
<p align="center">
<img src="/assets/post15/screenshot_aze.png" />
</p>
<p>The B3Console allows you to gain insights into your IT infrastructure costs using our comparison tool. It takes a relational database service on one provider and translates it to the corresponding services on another. For example, we translate the m4 machine type from AWS to GCP and Azure using the B3Console’s matching feature. The m4 machine types are Amazon’s general-purpose machine types that can be used for most workloads. In particular, we match the db.m4.2xlarge machine with 8 vCPU cores and 32 GB of memory running the MySQL database engine. For GCP, the B3Console returns the db-n1-standard-8 service with 8 vCPU cores and 30 GB of memory. Apart from the small difference in memory the machines are the same, but the difference in price is significant: the AWS service costs $1,022 a month compared to $789 for the GCP service. This is approximately 23% in savings, or about $2,800 a year. The results from the GCP translation can be seen in the figure below:</p>
<p align="center">
<img src="/assets/post15/screenshot.png" />
</p>
<p>The results are even more drastic for Azure. According to the <a href="https://b3console.bigbitbus.com/login">B3Console</a>, the service corresponding to AWS’ db.m4.2xlarge is Azure’s general-purpose Gen 5 machine (encoded as db-general-g5-8-mysql) with 8 cores and 40 GB of memory. This machine costs about $520 a month, almost half the price of the AWS machine, which amounts to savings of about $6,000 a year. So not only does the Azure service have 8 GB more memory, it also costs about half as much.</p>
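The savings quoted in the last two paragraphs follow directly from the monthly prices. A quick sketch, using only the figures stated above:

```python
# Monthly prices for the matched 8-vCPU services discussed above.
monthly_price = {
    "aws db.m4.2xlarge (8 vCPU, 32 GB)": 1022,
    "gcp db-n1-standard-8 (8 vCPU, 30 GB)": 789,
    "azure db-general-g5-8-mysql (8 vCPU, 40 GB)": 520,
}

baseline = monthly_price["aws db.m4.2xlarge (8 vCPU, 32 GB)"]
for name, price in monthly_price.items():
    saving = baseline - price  # monthly saving versus the AWS service
    print(f"{name}: ${price}/mo, saves ${saving * 12:,}/yr "
          f"({saving / baseline:.0%})")
```

This reproduces the roughly $2,800/year (23%) saving for GCP and roughly $6,000/year (about half price) for Azure.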
<h2 id="the-bottom-line">The Bottom Line</h2>
<p>Having seen the comparison between various providers, we get an idea of the diversity among the services offered under the same class. This reinforces the point that when it comes to making decisions about your cloud infrastructure, technical requirements cannot be the sole criterion. There needs to be a more rigorous cost-benefit analysis; this is the gap our product aims to fill. You can access the B3Console for free at: <a href="https://b3console.bigbitbus.com/login">https://b3console.bigbitbus.com/login</a></p>
<h1>Matching Algorithm</h1>
<p><em>Published 2021-02-08</em></p>
<p>In this article we explain some of the internals of our ML-based cloud provider comparison solution implemented in our <a href="https://b3console.bigbitbus.com">B3Console</a> product.</p>
<p>The global public cloud service market is a multi-billion dollar industry that continues to grow at a rapid pace as businesses invest in modern cloud IT. As public cloud services become the primary drivers of an organization’s IT infrastructure, sound data-driven decisions to choose a cloud service provider or migrate from one provider to another become vital. This is the space we aim to fill with our B3Console product. We enable users to make unbiased data-driven comparisons between cloud provider services.</p>
<p>We provide a way for users to replicate their infrastructure footprint metadata in our application in the form of application stacks, which are composed of individual cloud services. Once users have defined a stack, they can run many “what-if” analyses about migrating the stack to another provider. For example, consider a simple e-commerce application consisting of three tiers: an application tier, a database tier, and an NGINX tier. The infrastructure runs on Microsoft’s Azure platform and costs about $33,000 a year. The details of this application stack can be seen in the figure below:</p>
<p align="center">
<img src="/assets/post13/Stack.png" />
</p>
<p>The stack feature allows our users to directly translate their infrastructure to an alternate provider using AI-based algorithms. This translation provides an estimate of infrastructure costs on the alternate provider. It also provides multiple options on the alternate provider for a single service, which enables users to clearly understand the tradeoffs between providers. Continuing with our e-commerce application, we use the BigBitBus console to translate the application stack to a different provider, say Amazon Web Services (AWS). According to our AI matching algorithm, the same e-commerce application can be run on AWS for approximately $14,000, which is less than half the price of Azure with the same virtual machines and data centre location.</p>
<p align="center">
<img src="/assets/post13/Cost.png" />
</p>
<h2 id="data">Data</h2>
<p>In the backend, our dataset is made up of various cloud infrastructure services from major providers such as AWS, Google, Azure, and Alibaba. The data contains information about the pricing of these services and certain defining attributes of each service type. For example, for a compute service one might compare virtual machine sizes in terms of the number of CPU cores and the memory. The dataset also includes region/location data for each service, which adds an additional dimension, as prices and availability vary across regions.</p>
<h3 id="data-preparation">Data Preparation</h3>
<p>The data is stored in a Postgres database divided into various tables. This structure, while useful for storing and retrieving data efficiently, does not lend itself to direct use in a machine-learning algorithm. So the first step was to transform the data into a clean dataset suitable for the machine learning algorithm to produce meaningful results.</p>
<p>To create this dataset we use automated scripts to retrieve the data from our database and put it into custom data structures containing all services across all providers and regions. The volume of the data makes this a time-consuming and I/O-intensive task, so we introduced several I/O optimizations to reduce the time spent on this step: the initial version of our script required 12-15 seconds to go over our database, but with the reduced-I/O approach the time dropped to 1-2 seconds. In fact, the optimization enabled us to run the method “on the fly”, which allowed us to create valuable features like private/on-site provider matching for our users on a per-user basis.</p>
<p>The raw data is enhanced with new features and adjusted to ensure that training results are not biased by the data. These measures include scaling the features, so that large features like memory do not dominate, and encoding categorical variables as numbers. The entire data pipeline is illustrated in the figure below.</p>
<p align="center">
<img src="/assets/post13/Data Pipeline.jpg" />
</p>
<p>To further enhance the data we run a clustering algorithm to group similar services. This is helpful because within a service type there are machines optimized for certain uses; for example, CPU-optimized machines have a higher core count relative to memory. Ultimately it improves the quality of the matches returned by the algorithm.</p>
<h2 id="the-machine-learning-matching-algorithm">The Machine Learning Matching Algorithm</h2>
<p>Our earlier algorithm found matches with a rules-based system that scanned the entire database for matching services, computed a score, and ranked the matches by that score. This was a slow process and could run in excess of 4-5 seconds for some services. Such long waits are detrimental to the user experience, and the rules did not always return the most relevant matches.</p>
<p>With the widespread accessibility of machine learning algorithms through popular libraries like Python’s <a href="https://scikit-learn.org/">scikit-learn</a>, using an ML approach was a natural choice. The matching algorithm is a simple <a href="https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm">k-Nearest Neighbours (k-NN)</a> search that finds closely related cloud services on a different provider using the attributes relevant to the kind of service being matched. This is an unconventional use of k-NN, which is usually applied to regression and classification problems.</p>
<p>The idea is straightforward: compute <a href="https://en.wikipedia.org/wiki/Euclidean_distance">Euclidean distances</a> between services and pick the closest ones. Though this ML engine is at the centre of our matching algorithm, the process does not end there: the k-NN results are augmented with filtering and scoring to zero in on the best possible matches in the least amount of time.</p>
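The nearest-neighbour idea can be sketched in a few lines of plain Python. This is an illustrative toy, not our production code: the candidate names and sizes below are made up, and the min-max scaling stands in for the feature scaling discussed in the data-preparation section.

```python
import math

# Toy candidate catalog on the target provider: (name, vcpus, ram_gb).
# Names are illustrative, not real SKUs.
candidates = [
    ("target-small",  2,  8),
    ("target-medium", 8, 30),
    ("target-large", 16, 64),
]

def scale(v, lo, hi):
    """Min-max scale so large features like RAM do not dominate the distance."""
    return (v - lo) / (hi - lo)

def nearest(query, catalog, k=2):
    """Return the k catalog entries closest to query=(vcpus, ram_gb)."""
    vc = [c[1] for c in catalog] + [query[0]]
    rm = [c[2] for c in catalog] + [query[1]]
    q = (scale(query[0], min(vc), max(vc)), scale(query[1], min(rm), max(rm)))
    def dist(c):
        p = (scale(c[1], min(vc), max(vc)), scale(c[2], min(rm), max(rm)))
        return math.dist(q, p)  # Euclidean distance in scaled feature space
    return sorted(catalog, key=dist)[:k]

# Match an 8 vCPU / 32 GB source service:
print(nearest((8, 32), candidates))
```

For an 8 vCPU / 32 GB query, the 8 vCPU / 30 GB candidate comes back as the closest match, exactly as in the db.m4.2xlarge example earlier.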
<h2 id="the-scoring-method">The Scoring Method</h2>
<p>As stated above, the matching algorithm does not naively return the closest service by k-NN distance. It uses a simple scoring system that balances the quality (closeness) of the match against its cost. The score uses an input parameter, alpha, that ranges from 0 to 1 and can be adjusted to favour either closeness or cost. The way the scoring formula rebalances matches is shown in the figure below, where the matched services are plotted by k-NN distance and cost, and the blue ellipses indicate the returned matches.</p>
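One way such an alpha-weighted score could look is sketched below. The exact scoring formula is not given in the post, so this normalized convex combination, along with the match names and numbers, is an assumption for illustration only.

```python
# Rebalance k-NN matches between closeness and cost with one knob, alpha.
# matches: (name, knn_distance, monthly_cost) -- illustrative values only.
matches = [
    ("close-but-pricey", 0.05, 900),
    ("cheap-but-far",    0.60, 400),
    ("balanced",         0.20, 550),
]

def best_match(matches, alpha):
    """alpha near 1 favours closeness; alpha near 0 favours low cost."""
    d_max = max(m[1] for m in matches)
    c_max = max(m[2] for m in matches)
    def score(m):
        # Normalize both terms so neither scale dominates, then blend.
        return alpha * m[1] / d_max + (1 - alpha) * m[2] / c_max
    return min(matches, key=score)

print(best_match(matches, alpha=0.9)[0])
print(best_match(matches, alpha=0.1)[0])
```

With alpha near 1 the tight-but-expensive match wins; with alpha near 0 the cheap-but-loose match wins, which is the rebalancing the figure below illustrates.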
<p align="center">
<img src="/assets/post13/Plot.jpg" />
</p>
<h2 id="future-development">Future Development</h2>
<p>There are many possible extensions of the algorithm. The first natural step is to generalize it to accommodate more service types like networking, storage, CDNs, and databases. This has already been implemented to a large extent as we introduce newer services into our product.</p>
<p>As the product scales it will become necessary to adapt the data pipeline for machine learning at larger volumes; popular big data tools, designed to handle much more data, may be used here. Another small matching feature is giving users the ability to choose between the quality and the cost of matches themselves, by changing the alpha parameter and supplying it as an input to the API request.</p>
<p>In fact, the usual approach to a matching problem is a recommendation engine, but a lack of user feedback makes such a solution difficult to implement. The idea is that once a critical mass of users is on our platform, we can ask them for feedback on the most popular services and use it to build matches that work better in the real world.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This simple matching algorithm creates insight into an area that is not very transparent. But this is not the only service we offer: our product also allows users to compare individual services against each other across various attributes, regions and pricing regimes. These are among the clearest comparisons of cloud services available to technical decision makers. We invite you to use our free tool, the <a href="https://b3console.bigbitbus.com/">B3Console</a>, to gain insights about your cloud infrastructure.</p>
<h2 id="references">References</h2>
<p>James, G., Witten, D., Hastie, T., &amp; Tibshirani, R. (2017). An introduction to statistical learning: With applications in R. New York: Springer.</p>
<p>Scikit-learn documentation. (n.d.). Retrieved February 08, 2021, from https://scikit-learn.org/stable/index.html</p>
<h1>Cloud Cost Optimization - Where to Invest Your Efforts</h1>
<p><em>Published 2020-09-13</em></p>
<p><em>Cloud users want to optimize their public cloud bills. A multi-pronged approach will yield the maximum benefits, but some methods give you more bang for your buck than others. We list the methods in decreasing order of impact so you can focus on the most impactful ones.</em></p>
<p>We have spent thousands of hours looking at different application stacks across small and large organizations and learnt where to look for inefficiencies in their cloud spend. Here is what we learnt, presented in decreasing order of impact, along with a percentage of relative importance in our opinion.</p>
<p><em>Code Optimization - 30%</em></p>
<p>If you have the luxury of actually owning the code that runs on your cloud infrastructure, or if you run open-source software, then you should first ask your engineers to comb through the code looking for efficiencies. As horizontally scalable architectures (running multiple copies of the same software for scalability and high availability) become the norm, even a small change can yield huge gains.</p>
<p>Here are some examples to inspire you to go down this path.</p>
<ul>
<li>A developer removed unnecessary logging lines from the codebase and reduced the managed logging bill by several thousand dollars per month.</li>
<li>The QA engineer tested and upgraded to the latest version of an open-source database connector library which reduced the number of connections the application was making to the cloud database, enabling the organization to reduce the database size and save a 5-digit amount.</li>
<li>A developer reduced the polling interval on a datasource from 5 seconds to 20 seconds, reducing bandwidth costs on the data source server 4-fold.</li>
</ul>
<p><em>Embracing New architectures - 30%</em></p>
<p>If your team has the ability to evolve the architecture of your applications, then it is well worth your while to have engineers periodically scan the open-source and managed cloud services landscape to modernize your applications. Re-plumbing applications is expensive, but the return on investment can be spectacular. There is incredible, rapid innovation happening out there, often driven by Internet-scale use-cases that relentlessly optimize costs, and it is available to anyone who invests in upgrading their application stacks.</p>
<p>Here are some examples we have come across recently:</p>
<ul>
<li>A mid-size e-commerce website moved from using virtual machines to Kubernetes. Afterwards, they were able to leverage open-source monitoring (Grafana+Prometheus) instead of a vendor monitoring solution, as well as use pre-built database infrastructure-as-code (Helm charts) to power their dev and QA environments instead of using managed cloud databases. Moving to Kubernetes also improved developer productivity due to better devops practices, faster deployments, and easy roll-backs for operations. We estimated they will save at least 60% over a 4-year period.</li>
<li>A fintech company was struggling with sharing environments across its distributed developer team, and spinning up a new environment per team was costing a lot of money. Logically partitioning the development environment using namespaces, with a service mesh for traffic isolation, helped them cut their cloud bill by a huge amount.</li>
</ul>
<p><em>Rightsizing Workloads - 20%</em></p>
<p>Let’s face it: many capacity plans are not accurate, and applications often land on over-provisioned infrastructure. This is particularly the case if the application is bursty, with weekend or seasonal lulls for example. Rightsizing workloads means reducing VM and database sizes to fit requirements. Of course, bursty application requirements can change every minute, so auto-scaling infrastructure to match demand is an important aspect of rightsizing solutions.</p>
<p>Here are some examples of auto-scaling done right:</p>
<ul>
<li>An education technology company introduced autoscaling of their application servers to track weekend and school holiday periods; it directly saved them thousands of dollars per quarter.</li>
<li>A software development company used to spin up development environments with a “small” sized managed relational database. They didn’t know that a smaller “nano” size database was recently released by their cloud provider. Switching to the “nano” size for development helped them reduce their cloud bill.</li>
</ul>
<p>The above three approaches - Code optimization (30%), new architectures (30%) and rightsizing workloads (20%) together form the big 80% of the “where” to look for cloud cost control. There are some other approaches (the last 20%) where your-mileage-will-vary and some come with their own undesirable side-effects:</p>
<p><em>Negotiating vendor discounts (5%)</em>:</p>
<p>Let’s face it: big cloud providers have thousands of enterprise users, so no matter how big you are, there is only so much weight you carry when you ask your provider for a big discount. They may be amenable to giving you a time-limited pot of “cloud credits” to entice you to switch providers or to lock you into their product, but in the long term you won’t come out on top.</p>
<p><em>Process, controls, locking-down(5%)</em>:</p>
<p>Locking down cloud access across the developers of your organization in the hope that this will control costs is a fallacy. Sure, you can set an upper bound on what each developer can consume in order to prevent fat-finger cloud errors, but locking down your cloud stifles innovation, promotes shadow IT (for example, developers using personal cloud accounts for development), and erodes trust. Remember, developers are probably more expensive than your cloud bill!</p>
<p>All the above only sum up to 90%; tell us if something else worked for you and we’ll add it to the list!</p>
<p><em>BigBitBus is on a mission to bring greater transparency in public cloud and managed big data and analytics services. Talk to us about how you can architect your cloud IT assets to maximize your returns on investment.</em></p>
<h1>Ten Things You Can Do With the B3Console Today</h1>
<p><em>Published 2020-09-11</em></p>
<p>Here are 10 things you can do with the <a href="https://b3console.bigbitbus.com">B3Console</a> today, and some reasons why you may want to try it out.</p>
<ol>
<li>
<p><em>Compare VMs.</em>
You can compare the specifications and prices of virtual machines across the 5 cloud providers we support: GCP, AWS, Azure, Alibaba and Linode.</p>
</li>
<li>
<p><em>Find the latest service types and pricing offered by cloud providers.</em>
Our database stores on-demand and reserved pricing for over 10,000 VM types across five public cloud providers in over 100 data-centers world-wide. You will have access to this information, updated daily.</p>
</li>
<li>
<p><em>Model your applications from a pricing standpoint.</em>
Create one or more stacks of the different VMs which comprise your application(s). Unlike a technical architecture document, this view helps you track per-application costs without a deep dive into the application architecture. You can come back to your stacks at a later date and re-assess your cloud footprint without starting from scratch.</p>
</li>
<li>
<p><em>Perform a “what-if” cloud migration cost analysis.</em>
If you are looking at other cloud providers for technical, cost, or security reasons the B3Console can “translate” your stack into the target provider. That way you know what corresponding VMs are available on the target provider and what they cost, all before you actually PoC anything.</p>
</li>
<li>
<p><em>Plan reservations to avoid higher on-demand prices.</em>
Our tool lets you calculate the total cost of your stack or service based on on-demand and reserved 1- or 3-year pricing. You can now make a data-driven choice about when it’s time to buy reservations and reduce your cloud bill.</p>
</li>
<li>
<p><em>Download data into a spreadsheet.</em>
If you use spreadsheets to track your costs, you are in luck: B3Console allows you to download data into a spreadsheet for further analysis.</p>
</li>
<li>
<p><em>Share your findings.</em>
You can easily share your analysis with colleagues, or even on social media (LinkedIn, Twitter), so others can benefit from your insights.</p>
</li>
<li>
<p><em>Optimize your VM sizes.</em>
B3Console can help you right-size your VMs if you are able to provide us with monitoring data. Note: This feature is only available for a subset of cloud providers and VMs at present.</p>
</li>
<li>
<p><em>Zero Security Risk.</em>
The tool never asks for your cloud provider credentials to do any of the above; B3Console is a light-weight, zero-ops, accessible tool to plan and stay on top of your cloud computing needs.</p>
</li>
<li>
<p><em>Always free.</em>
The tool is free to use today! All you need is a Google/Gsuite/Github login handle. You won’t have to convince your boss or your team about buying yet another subscription or software to install on your laptop. As easy as logging in!</p>
</li>
</ol>
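The reservation planning in point 5 above boils down to a break-even calculation: a reservation pays off once you would otherwise run enough on-demand hours to exceed its up-front price. The prices below are hypothetical round numbers, not quotes from any provider.

```python
# Break-even sketch: when does a 1-year reservation beat on-demand?
on_demand_hr = 0.10    # hypothetical pay-as-you-go rate, $/hr
reserved_yr = 560.0    # hypothetical 1-year reservation price, $

hours_per_year = 8760
break_even_hours = reserved_yr / on_demand_hr
utilization = break_even_hours / hours_per_year
print(f"Reservation wins above {utilization:.0%} utilization "
      f"({break_even_hours:.0f} hours/year)")
```

With these numbers the reservation wins whenever the VM runs more than about 64% of the year, which is the kind of threshold the B3Console's stack totals let you check against your actual usage.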
<p>Happy Transparent Clouding!</p>
<p>Go to <a href="https://b3console.bigbitbus.com">B3Console</a></p>
<h1>The BigBitBus B3 Console</h1>
<p><em>Published 2020-07-17</em></p>
<p>We have been busy over the past several months creating the B3 Console. The console provides a frontend to our API and unlocks the data we have been collecting and curating about several cloud providers: AWS, Azure, Alibaba Cloud, GCP and Linode.</p>
<p>Now anyone with a Google or GitHub account can use our social auth sign-up to log into the frontend and visually access all this data. Use it to compare how much you pay for your applications in your chosen cloud provider, and what it may cost on another cloud provider. Instantly generate alternatives for your application stacks in other cloud provider(s). Does it make sense to consider a migration? Or to build your next application in another cloud provider? Supercharge your multi-cloud decision making process with unbiased cloud pricing and performance data. With our hard data to back you up, you will be able to negotiate better discounts when you renew your cloud subscriptions. Share your findings and insights about various clouds and their performance and price differences with colleagues or even on LinkedIn or Twitter so others can benefit from your experience.</p>
<p>We have highlighted some of the functionality of the B3Console in our quickstart, <a href="https://www.bigbitbus.com/frontend-documentation/">available here</a>. Or you can jump right into the action by visiting our <a href="https://b3console.bigbitbus.com/login">B3Console URL</a>.</p>
<p>The clouds just got a whole lot more transparent!</p>
<p align="center">
<img src="/assets/post12/servicetypes.png" />
<b>All the cloud data you need, so you can make better decisions </b>
</p>
<h1 id="data-driven-decisions-behind-aws-reserved-instance-commitments">Data Driven Decisions behind AWS Reserved Instance Commitments</h1>
<p><em>Published 2020-02-17 - <a href="https://bigbitbus.com/2020/02/17/Data-Driven-Decisions-behind-AWS-Reserved-Instance-Commitments">Permalink</a></em></p>
<p>Amazon AWS gives users the option to purchase virtual machines (VMs) on a pay-as-you-go on-demand basis or to commit to 1- or 3-year “reservations” and get a discounted price. The longer the commitment, the higher the discount. We wanted to answer some questions about this pricing mechanism:</p>
<ol>
<li>
<p>Do 1- and 3-year reservations offer users uniform discounts over on-demand prices - irrespective of the type of virtual machine?
<em>Answer: No, the discounts vary significantly across different VM classes. We will highlight some virtual machine classes that offer deeper commitment discounts than others.</em></p>
</li>
<li>
<p>Are 1- and 3-year (relative) reservation price differences equal, or are some VMs more deeply discounted than others - is AWS incentivizing users into buying longer reservations more aggressively for some VM service classes as compared to others?
<em>Answer: Amazon discounts 3-year reservations more on some classes of VMs. We will hypothesize what may be the reasons AWS wants to discount some VM types more than others.</em></p>
</li>
<li>
<p>What strategies can you adopt while choosing the type of reservations on AWS?
<em>Answer: It depends on your situation, but we offer some guidance toward the end of this article.</em></p>
</li>
</ol>
<p>There are hundreds of AWS VM types across dozens of data centers; for this analysis we focused on pricing data from the us-east-1 AWS location. We based our analysis on AWS EC2 pricing information available online. VM service offerings and prices may change over time; all data presented here was collected in January 2020.</p>
<p>Figs. 1 and 2 show columns of different VM classes offered by AWS in us-east-1 (for example, t3, c5n, p3) for 1-year and 3-year reservations respectively. Each VM class comes in several sizes; for example, the t3 class offers t3.nano, t3.small and t3.xlarge VMs with varying numbers of CPU cores and memory sizes. The per-hour on-demand cost of the highest-priced VM of each class is listed at the top of each column (highlighted in blue). There are three bars for each VM type: the dark-green and blue bars show the standard (non-convertible) and convertible reservation discount percentages over the on-demand price, respectively, while the red bar shows the premium AWS charges for the convertible option (the difference between the green and blue bars equals the red bar).</p>
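<p>The quantities plotted in Figs. 1 and 2 are straightforward to compute from hourly prices. A minimal sketch (the hourly rates below are hypothetical, chosen only to illustrate the arithmetic, not actual AWS prices):</p>

```python
# Sketch of how the bar heights in Figs. 1 and 2 are derived.
# All hourly prices below are hypothetical, not actual AWS rates.

def discount_pct(on_demand: float, reserved: float) -> float:
    """Percentage discount of a reserved hourly rate over on-demand."""
    return (1 - reserved / on_demand) * 100

on_demand = 1.00        # on-demand $/hour
standard_3yr = 0.55     # standard (non-convertible) 3-year reservation
convertible_3yr = 0.70  # convertible 3-year reservation

green_bar = discount_pct(on_demand, standard_3yr)    # standard discount
blue_bar = discount_pct(on_demand, convertible_3yr)  # convertible discount
red_bar = green_bar - blue_bar                       # convertible premium

print(f"standard {green_bar:.0f}%, convertible {blue_bar:.0f}%, premium {red_bar:.0f}%")
```

<p>Running this with real price sheets for every VM class in a region reproduces the kind of comparison shown in the figures.</p>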
<p align="center">
<img src="/assets/post11/1yrPrem.png" />
<b>Fig.1: One year reservation discount percentages over on-demand prices </b>
</p>
<p>Comparing VM classes between Fig.1 (1-year commitment) and Fig.2 (3-year commitment) makes it clear that longer commitments net bigger discounts. No surprise here: AWS gets to plan capacity over a longer window, the same datacenter hardware can potentially be used over longer periods, and, most importantly, the user is “locked in” to AWS for a longer commitment. AWS definitely wants users to commit to longer leases.</p>
<p align="center">
<img src="/assets/post11/3yrPrem.png" />
<b>Fig.2: Three year reservation discount percentages over on-demand prices </b>
</p>
<p>The more interesting aspect is the comparison between different VM classes. The premium for convertibility is significantly higher for the d2, p3, and p3dn VM classes. Why? To understand this, we first need to understand what the AWS convertible option means:</p>
<p>“You can exchange one or more Convertible Reserved Instances for another Convertible Reserved Instance with a different configuration, including instance family, operating system, and tenancy. There are no limits to how many times you perform an exchange, as long as the target Convertible Reserved Instance is of an equal or higher value than the Convertible Reserved Instances that you are exchanging.” - from AWS Documentation</p>
<p>Convertible VMs still give AWS “user lock-in” - a user will not migrate off AWS unless they agree to forfeit the sunk cost they incurred when they bought the reservation; but AWS loses accuracy in its multi-year capacity planning when users convert their reservations. For example, if a user converted their p3 instances (featuring NVIDIA Tesla GPUs) into general-purpose VMs, AWS is potentially left holding NVIDIA GPUs that may sit unused.</p>
<p>Interestingly, the convertible premium spread for 1-year reservations in Fig.1 is much smaller (~10% or lower across all VM classes) than for the 3-year reservations of Fig.2, where the spread can reach up to ~20%.</p>
<p>From Fig. 2 (3-year reserved instances), we see some of the highest convertible premiums for these VM classes:</p>
<table>
<thead>
<tr>
<th>VM instance class</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>D2 (17.53%)</td>
<td>D2 instances are designed for workloads that require high sequential read and write access to very large data sets, such as Hadoop distributed computing</td>
</tr>
<tr>
<td>P3 (20.90%)</td>
<td>P3 instances deliver high performance compute in the cloud with up to 8 NVIDIA® V100 Tensor Core GPUs and up to 100 Gbps of networking throughput for machine learning and HPC applications.</td>
</tr>
<tr>
<td>P3dn (18.65%)</td>
<td>P3dn instances are specialized compute units designed to accelerate machine learning training and inferencing for large, deep neural networks.</td>
</tr>
</tbody>
</table>
<p>And some of the lowest convertible premiums as well:</p>
<table>
<thead>
<tr>
<th>VM instance class</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>M5n (5.89%)</td>
<td>These are a great fit for databases, High Performance Computing, analytics, and caching fleets that can take advantage of improved network throughput and packet rate performance</td>
</tr>
<tr>
<td>C5n (5.67%)</td>
<td>C5n instances can utilize up to 100 Gbps of network bandwidth and offer significantly higher network performance across all instance sizes.</td>
</tr>
<tr>
<td>X1, x1e (4.95%)</td>
<td>Part of the Amazon EC2 Memory Optimized instance family, designed for running high-performance databases, in-memory workloads such as SAP HANA, and other memory intensive enterprise applications in the AWS Cloud.</td>
</tr>
</tbody>
</table>
<p>Special purpose VMs - for example, the P3 VMs that contain Nvidia machine learning hardware have high convertible premiums. On the other hand, more generic hardware like the C5n or M5n seems to have a lower premium. AWS probably finds it much harder to accurately track demand for specialized hardware and wants users to sign up for longer leases when possible on these classes of VMs.</p>
<p>There is approximately a 20% difference between the 1- and 3-year discounts. Should you bite the bullet and go for the higher 3-year discounts? In our opinion, it depends on two factors:</p>
<ol>
<li>
<p><strong>Your Demand Forecast Accuracy:</strong> Get your best architect or engineering team on the job of predicting future usage - don’t leave it to a junior business analyst to model the what-if scenarios and collect data for reserved-pricing decisions. If your capacity planning is accurate and your business horizon is reasonably certain, then a 3-year reservation is definitely worth it. Even 1-year reserved instances are well worth the exercise - they can easily shave over 30% off your VM instance bill.</p>
</li>
<li>
<p><strong>Age of the VM Service Type:</strong> Always find out how “old” the VM type being offered is. If it is an older generation, consider switching to a newer VM class before you buy reserved instances. After all, if you buy 3-year instances in a VM class that was released 3-4 years ago, you will later be stuck running your applications on 6-7 year old server hardware.</p>
</li>
</ol>
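<p>To make the forecast-accuracy point concrete, here is a what-if sketch comparing purchasing strategies over a 3-year horizon. The hourly rates are hypothetical, loosely mirroring the roughly 30% (1-year) and deeper 3-year discounts discussed above:</p>

```python
# What-if sketch for the forecast-accuracy point above.
# Hourly rates are hypothetical, not actual AWS pricing.
HOURS_PER_YEAR = 8760

def three_year_outlay(on_demand, reserved_1yr, reserved_3yr, years_needed):
    """Total spend over a 3-year horizon under three strategies, given
    how many whole years the capacity is actually needed."""
    return {
        # Pay as you go, only while the workload exists.
        "on_demand":   on_demand * HOURS_PER_YEAR * years_needed,
        # Chain of 1-year reservations, renewed only while needed.
        "reserve_1yr": reserved_1yr * HOURS_PER_YEAR * years_needed,
        # A 3-year reservation is paid for all 3 years regardless of need.
        "reserve_3yr": reserved_3yr * HOURS_PER_YEAR * 3,
    }

# Forecast says the workload lives for only 2 of the 3 years:
# the chain of 1-year reservations beats the deeper 3-year discount.
print(three_year_outlay(0.10, 0.07, 0.05, years_needed=2))
# With a full 3-year need, the 3-year reservation wins instead.
print(three_year_outlay(0.10, 0.07, 0.05, years_needed=3))
```

<p>The crossover between the two reservation strategies is exactly why the demand forecast deserves your best people.</p>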
<p>In addition to these two considerations, convertible reservations are worth evaluating - but first check the convertible premium for the reserved instances you are planning to use.</p>
<hr />
<p><em>Saksham Bhatnager is a Data Analyst.</em></p>
<p><em>Sachin Agarwal is a computer systems researcher and the founder of BigBitBus.</em></p>
<p><em><a href="https://www.bigbitbus.com">BigBitBus</a> is on a mission to bring greater transparency in public cloud and managed big data and analytics services. Talk to us about how you can architect your cloud IT assets to maximize your returns on investment.</em></p>
<h1 id="is-your-business-trapped-inside-your-public-cloud-vendor">Is Your Business Trapped Inside Your Public Cloud Vendor?</h1>
<p><em>Published 2019-02-20 - <a href="https://bigbitbus.com/2019/02/20/Is-Your-Business-Trapped-Inside-Your-Public-Cloud-Vendor">Permalink</a></em></p>
<p>It is easy to get locked into a specific cloud vendor. Learn how to keep your business free from cloud vendor lock-in.</p>
<p align="center">
<img src="/assets/post10/CSIRO_ScienceImage_1766_Venus_Fly_Trap.jpg" />
<small> Image courtesy - <a href="https://commons.wikimedia.org/wiki/File:CSIRO_ScienceImage_1766_Venus_Fly_Trap.jpg"> CSIRO Science Image </a> </small>
</p>
<p>I am sorry to break it to you, but it is easier than ever to get locked into a public cloud or big data vendor. Let’s see why cloud vendor lock-in is bad for your business, and then look at IT architecture patterns to mitigate the lock-in risk in your public cloud journey.</p>
<h2 id="why-is-cloud-vendor-lock-in-a-bad-thing">Why is Cloud Vendor Lock-In a Bad Thing?</h2>
<p>Public cloud providers are extremely innovative and forward-thinking, but do not confuse open-to-innovation with openness. 100s of closed-source databases, big-data systems, AI frameworks, IoT platforms, and SaaS services are launched by cloud providers every year. Every time you adopt a closed-source service the lock-in noose tightens. You are trading away your agility to migrate to a different cloud provider for the convenience and cost-saving of not building skills in your team to run the open-source counterpart of the managed service. This will come back to haunt you in the future:</p>
<ol>
<li><strong>Business Agility</strong>. Your business may need a cloud-provider migration. For example, if you are a startup getting acquired by a company that uses a different cloud provider, then the valuation of your IT assets will suffer if your processes, code and data assets are too sticky to your chosen cloud provider.</li>
<li><strong>Leverage</strong>. You have no leverage when you can’t migrate off a cloud provider. When it is time to renew your contract with the cloud provider their sales negotiation team will know you cannot move to the competition.</li>
<li><strong>Independence</strong>. Your business is hamstrung by the rate and direction of innovation of your cloud provider. Although cloud providers are highly innovative, their choice of direction will control your choice of IT technologies and may force you to choose their vision of what IT should look like.</li>
<li><strong>Reputational risk</strong>. Your business is taking on reputational risk when you can’t migrate off a cloud provider. Suppose your cloud provider suffers a serious data breach or falls out of favor with a jurisdiction’s government and your business needs to migrate off that cloud provider. How would you meet this challenge in a timely manner if you have not taken steps to avoid cloud vendor lock-in?</li>
<li><strong>Continuity risk</strong>. Your business will suffer if your cloud provider shuts down or sells the business. I know, how could I even contemplate Azure or AWS or GCP shutting down? But take a step back and think about all the products that came out of these companies and were pulled from the market. Remember blockbuster failures like Amazon’s Fire Phone, Microsoft’s Windows Phone, or Google Wave/Plus, for example? There will be cloud provider consolidation, and clouds will cease to exist.</li>
<li><strong>Hiring risk</strong>. Your business may find it impossible to hire skills in the future. If you have to stick with your cloud provider forever then finding skilled engineers may become challenging if your cloud provider becomes a niche player in the future.</li>
</ol>
<p>I hope you are convinced that there are valid business reasons to consider the “my-business-needs-to-migrate-off-a-cloud-provider” scenario. Let us look at ways in which you can reduce stickiness to your public cloud provider and retain the migration option.</p>
<h2 id="architect-against-cloud-provider-stickiness">Architect Against Cloud Provider Stickiness</h2>
<ol>
<li><strong>Minimize data stickiness</strong>. When you use closed-source databases and big data systems (e.g. Google Bigquery or AWS Redshift) make sure you regularly back up the data and schema metadata into files (on your cloud provider’s object store, for example). This is an often overlooked aspect - after all, the cloud provider is responsible for managed big-data service backups - but you will not have access to those backups when you migrate off the cloud provider. Making sure you have well-documented data+metadata dumps will enable a future migration.</li>
<li><strong>Favor open protocols, tools and APIs</strong>. Choose open-standard APIs when possible. Whenever your developers have to choose between APIs (e.g. a messaging queue client or a database adapter) ask if the end-point (server) can be substituted by one outside of the cloud provider. If the answer is no, then carefully consider other options.</li>
<li><strong>Clean, well defined interfaces</strong>. If you have to use a cloud-provider-specific service, ask your developers to build a clean interface (a separate cloud-module for example) and document all the interaction points between your code and the cloud provider. Future developers who will lead your cloud migration will thank you for this some day.</li>
<li><strong>Avoid provider-specific cloud orchestration tools</strong>. (e.g. AWS Cloud formation or Azure ARM) and instead use API-driven open-source orchestration systems like Terraform when possible. Although you still end up writing provider-specific orchestration code your developers can be clever about separating cloud-agnostic orchestration code into separate modules. This will minimize the orchestration code re-write when you migrate to another cloud provider.</li>
<li><strong>Avoid cloud provider SaaS tools</strong>. For example, discourage your developers from using a cloud-provider’s custom CI/CD and SCM tools and instead rely on industry standards like Jenkins/Git*/Atlassian, etc. I know it is more work dealing with extra vendors as compared to your one cloud provider but process tools (like your devops pipelines) are among the hardest to migrate off of.</li>
<li><strong>Choose third-party operational tools</strong>. Depending on how important cloud migration is, you may want to adopt third-party cloud-agnostic monitoring and logging tools instead of using native cloud-provider services.</li>
<li><strong>Talk to the competition</strong>. Call up a competitive cloud provider occasionally and have your IT architects and developers discuss how your business could migrate to their cloud. Almost all migration stickiness issues you fear have been dealt with earlier!</li>
</ol>
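<p>The clean-interface advice (point 3 above) is easiest to see in code. A minimal sketch of such a boundary; the class and method names are illustrative, not taken from any vendor SDK, and a real implementation would add listing, deletion, and error handling:</p>

```python
# Sketch of a clean interface isolating provider-specific calls, so a
# future migration only touches one module. Names here are illustrative.
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Everything the application needs from object storage; nothing
    provider-specific leaks past this boundary."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStore(ObjectStore):
    """Test double; an S3Store or GcsStore would wrap the vendor SDK
    behind the same two methods."""
    def __init__(self):
        self._objects = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]

# Application code depends only on ObjectStore, never on a vendor SDK.
store: ObjectStore = InMemoryStore()
store.put("backup/db.dump", b"\x00\x01")
assert store.get("backup/db.dump") == b"\x00\x01"
```

<p>Swapping cloud providers then means writing one new <code>ObjectStore</code> subclass rather than hunting SDK calls across the codebase.</p>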
<hr />
<p><em>Sachin Agarwal is a computer systems researcher and the founder of BigBitBus.</em></p>
<p><em>BigBitBus is on a mission to bring greater transparency in public cloud and managed big data and analytics services. Talk to us about how you can architect your cloud IT assets to minimize public-cloud stickiness and lock-in.</em></p>
<h1 id="lower-tier-blobs-and-their-higher-tier-counterparts-share-the-same-backend">Do Lower Tier Blobs/Objects and Their Higher Tier Counterparts Share the Same Backend?</h1>
<p><em>Published 2018-06-11 - <a href="https://bigbitbus.com/2018/06/11/Are-Cool-Storage-Objects-Really-Different-Performancewise">Permalink</a></em></p>
<p><em>We measured and compared the object storage latency of standard and lower-tier AWS S3, Google Cloud Storage and Azure blob storage in order to unearth performance differences between the storage tiers.</em></p>
<p><em>This article is an extension to the article on comparing object store performance across AWS S3, Google cloud storage and Azure blob storage; you may want to read <a href="/2018/06/05/Public-Cloud-Objectstores-Are-Very-Unequal/">that article</a> to get more context around the testing and methodology used for this extension.</em></p>
<p>AWS S3, Google Cloud Storage and Azure storage offer a “lower” tier for objects storing backups or archive data that are infrequently accessed. AWS S3 calls it <a href="https://aws.amazon.com/s3/storage-classes/">standard infrequently accessed storage</a>, Google cloud storage calls it <a href="https://cloud.google.com/storage-nearline/nearline-whitepaper">near-line storage</a>, and Azure calls it <a href="https://azure.microsoft.com/en-us/blog/introducing-azure-cool-storage/">cool blob storage</a>. Note we are not comparing Amazon Glacier in this article.</p>
<h2 id="results">Results</h2>
<p>We created different sized objects and measured creation, download, and deletion latencies for these objects; data presented here is averaged over 100 unique runs. We chose <em>ca-central-1</em>, <em>northamerica-northeast1</em> and <em>canadacentral</em> regions for AWS, Google cloud and Azure respectively.</p>
<p>We present 5 pairs of Figures below; each figure has 3 bar plots in it that show the latency of the corresponding object store operation on the AWS, Azure and GCP providers. The two figures in the pair correspond to the higher tier (Tier 1 - standard object-store tier) and the lower tier (Tier 2) - infrequent access, cool blobs and near-line storage for AWS, Azure and GCP respectively.</p>
<p>The key take-away from each pair is that the latency statistics for Tier 1 and Tier 2 objects are almost identical!</p>
<h4 id="small-object-sizes-upload">Small object sizes upload</h4>
<p align="center">
<b>Fig.1a: Higher tier small objects (up to 100kB) upload latency in Canada </b><br />
<img src="/assets/post7/BigBitBus_small_upload_Tier_1.png" />
</p>
<p align="center">
<b>Fig.1b: Lower tier small objects (up to 100kB) upload latency in Canada </b><br />
<img src="/assets/post7/BigBitBus_small_upload_Tier_2.png" />
</p>
<h4 id="small-object-sizes-download">Small object sizes download</h4>
<p align="center">
<b>Fig.2a: Higher tier small objects (up to 100kB) download latency in Canada </b><br />
<img src="/assets/post7/BigBitBus_small_download_Tier_1.png" />
</p>
<p align="center">
<b>Fig.2b: Lower tier small objects (up to 100kB) download latency in Canada </b><br />
<img src="/assets/post7/BigBitBus_small_download_Tier_2.png" />
</p>
<h4 id="large-object-sizes-upload">Large object sizes upload</h4>
<p align="center">
<b>Fig.3a: Higher tier large objects (1MB - 100MB) upload latency in Canada</b><br />
<img src="/assets/post7/BigBitBus_large_upload_Tier_1.png" />
</p>
<p align="center">
<b>Fig.3b: Lower tier large objects (1MB - 100MB) upload latency in Canada</b><br />
<img src="/assets/post7/BigBitBus_large_upload_Tier_2.png" />
</p>
<h4 id="large-object-sizes-download">Large object sizes download</h4>
<p align="center">
<b>Fig.4a: Higher tier large objects (1MB - 100MB) download latency in Canada</b><br />
<img src="/assets/post7/BigBitBus_large_download_Tier_1.png" />
</p>
<p align="center">
<b>Fig.4b: Lower tier large objects (1MB - 100MB) download latency in Canada</b><br />
<img src="/assets/post7/BigBitBus_large_download_Tier_2.png" />
</p>
<h4 id="object-deletion">Object Deletion</h4>
<p align="center">
<b>Fig.5a: Higher tier object deletion latency in Canada </b><br />
<img src="/assets/post7/BigBitBus_object_deletion_Tier_1.png" />
</p>
<p align="center">
<b>Fig.5b: Lower tier object deletion latency in Canada </b><br />
<img src="/assets/post7/BigBitBus_object_deletion_Tier_2.png" />
</p>
<h2 id="outlook">Outlook</h2>
<p>The upload, download and deletion latencies of lower-tier objects are almost identical to those of the tier-1 object store for all three cloud providers. Since the differences are not statistically significant, we believe the difference is purely in product pricing and positioning, and that the underlying object-store implementation of the Tier 1 and Tier 2 stores is shared.</p>
<p>If that is true, then we ask: why don’t cloud providers simply adjust users’ billing based on which tier each object fell into during each billing period, instead of making the user juggle objects to try to optimize each object’s tier? For example, charge infrequently accessed user objects at tier-2 prices and hotter objects at tier-1 prices, minimizing the user’s cost automatically. Free the overworked storage admins and developers from this menial task!</p>
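<p>Such a billing adjustment could be sketched as follows. The prices and the per-access fee below are invented for illustration; real tier pricing also involves retrieval charges and minimum storage durations:</p>

```python
# Sketch of the proposed automatic tier billing: charge each object at
# whichever tier minimizes the user's bill for the period, based on its
# observed access count. All prices below are made up for illustration.

def monthly_charge(size_gb: float, accesses: int,
                   tier1_storage: float = 0.023,    # $/GB-month, hot tier
                   tier2_storage: float = 0.0125,   # $/GB-month, cool tier
                   tier2_access_fee: float = 0.01   # $ per access, cool tier
                   ) -> float:
    """Bill the object at the cheaper of the two tiers."""
    tier1 = size_gb * tier1_storage
    tier2 = size_gb * tier2_storage + accesses * tier2_access_fee
    return min(tier1, tier2)

cold = monthly_charge(10, accesses=0)    # billed at tier-2 (cool) rates
hot = monthly_charge(10, accesses=100)   # tier-1 is cheaper for hot data
assert cold < hot
```

<p>The provider already tracks per-object access counts for billing, so from the user’s perspective this would be a pure win.</p>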
<p><em>Sachin Agarwal is a computer systems researcher and the founder of BigBitBus.</em></p>
<p><em>BigBitBus is on a mission to bring greater transparency in public cloud and managed big data and analytics services.</em></p>
<h1 id="public-cloud-object-stores-are-very-unequal">Public Cloud Object-stores Are Very Unequal</h1>
<p><em>Published 2018-06-05 - <a href="https://bigbitbus.com/2018/06/05/Public-Cloud-Objectstores-Are-Very-Unequal">Permalink</a></em></p>
<p><em>For this article we compared the object-store performance of Amazon Web Services (S3), Google Cloud Storage and Microsoft Azure Blobs in locally redundant configurations (without geo-replication). We found very significant performance differences that can have a direct impact on user applications.</em></p>
<p><a href="https://en.wikipedia.org/wiki/Object_storage">Object or blob store</a> services on the cloud offer content-addressable storage where users can save arbitrary files that can be accessed via a URL over HTTP(S) connections and simple CRUD semantics (GET to download, PUT to upload, etc.). Object storage is convenient and cheap, and this has made it the storage back-end of choice for everything from small configuration files of less than a few kilobytes to huge VM images or backup archives. It is also the most common storage option for persisting raw data files used in big data analyses.</p>
<p>Lower object-store latency (the time to upload and download files) is important in many use cases. For example, the time taken to download a backup copy of a database will be the dominant factor in the recovery time objective for disaster recovery planning. Big data applications such as Apache Spark may seem sluggish if the back-end object-store hosting the raw data has high file-serving latency. Many applications repeatedly and frequently read and write small files to object stores (e.g. image thumbnails); these will benefit from lower-latency small-object performance.</p>
<p>Our key findings are:</p>
<ol>
<li>Large blob downloads are significantly slower (up to 4x) in Azure as compared to Google cloud storage or AWS S3 large object downloads.</li>
<li>Small-sized Azure blobs have lower upload latency.</li>
<li>In general the (relatively newer) Canadian regions have lower latency for object store operations as compared to the older US east regions.</li>
</ol>
<h2 id="setup">Setup</h2>
<p>We set up locally redundant object-store buckets for AWS S3, Google Cloud Storage, and Azure blob storage in a cloud region and created one virtual machine (per provider) in the same cloud region. By “locally redundant” we mean that the objects were not geo-replicated to another region; we will analyze geo-replicated objects in another article.</p>
<p align="center">
<b>Fig.1: Test Setup for locally-redundant object-store testing. We report the upload and download latency of the client putting/getting objects to/from the object-store. </b><br />
<img src="/assets/post6/BigBitBus_objectstore_architecture.png" />
</p>
<p>A load-tester virtual machine was loaded with our custom-built open-source benchmarking program, <a href="https://github.com/bigbitbus/objectbench">objectbench</a>, which can upload and download different-sized randomly-generated files to and from the object-stores. The tool uses the Python SDK of each provider (so the client implementation is strictly per provider standards). It was set up to serially upload and download randomly-generated files ranging from 1kB to 100MB in size. We repeated the experiment 100 times and all our results are averaged over these 100 runs; we also show error bars in our plots.</p>
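<p>In outline, the measurement loop works like this. The sketch below is a simplified illustration, not objectbench’s actual code; a dict stands in for the provider SDK client so it runs anywhere (with boto3, for example, the two callables would wrap <code>put_object</code> and <code>get_object</code>):</p>

```python
# Simplified sketch of the serial upload/download timing loop described
# above. Illustration only; see github.com/bigbitbus/objectbench for the
# real tool, which drives each provider's Python SDK.
import os
import statistics
import time

def time_op(op, *args):
    """Wall-clock one object-store operation, in seconds."""
    start = time.perf_counter()
    op(*args)
    return time.perf_counter() - start

def benchmark(upload, download, sizes, runs=100):
    """Serially upload and download random payloads of each size; report
    the mean latency and stdev (the error bars) per size and operation."""
    results = {}
    for size in sizes:
        ups, downs = [], []
        for i in range(runs):
            key, payload = f"obj-{size}-{i}", os.urandom(size)
            ups.append(time_op(upload, key, payload))
            downs.append(time_op(download, key))
        results[size] = {
            "upload_mean": statistics.mean(ups),
            "upload_stdev": statistics.stdev(ups),
            "download_mean": statistics.mean(downs),
            "download_stdev": statistics.stdev(downs),
        }
    return results

# A dict plays the object store here; swap in real SDK calls (plus
# credentials and bucket setup, not shown) to benchmark a provider.
_store = {}
stats = benchmark(lambda k, d: _store.__setitem__(k, d),
                  lambda k: _store[k],
                  sizes=[1_000, 100_000], runs=10)
```

<p>Timing at the client like this captures the latency an application actually experiences, including SDK and network overheads, rather than server-side service time.</p>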
<h2 id="results">Results</h2>
<p>We measured latency as seen by an application which uploads and downloads objects from the object-store. We present results for a US east region and a Canadian region for each provider (the exact names differ across providers). By selecting two different regions for each provider we eliminated the possibility of a bad load testing VM client or a badly configured object store in a specific region. We also unearthed performance differences between the regions for the same cloud provider; users looking for the best performance on public cloud object-stores should carefully benchmark performance differences across regions before choosing a specific region. All cloud regions <em>do not</em> have the same performance.</p>
<h3 id="us-region">US Region</h3>
<p>We chose <em>us-east-1</em>, <em>us-east1</em> and <em>eastus</em> regions for AWS, Google cloud and Azure respectively (collectively referred to as USEast in the below plots). The load testing VMs were spun up in one of the zones belonging to these regions for each cloud provider.</p>
<h4 id="small-object-sizes">Small object sizes</h4>
<p>Figs.2 and 3 show small object upload and download latencies in US East regions. The Azure blob store offers significantly lower upload latency as compared to AWS S3 or Google Cloud Storage. It is hard to say why the difference is so stark without knowing the implementation. We have a controversial hypothesis: perhaps uploads (writes) to the Azure blob store are cached in memory (to be persisted on disk later), with the acknowledgement sent to the uploading client immediately.</p>
<p align="center">
<b>Fig.2: Small objects (up to 100kB) upload latency in US East </b><br />
<img src="/assets/post6/BigBitBus_small_upload_USEast.png" />
</p>
<p align="center">
<b>Fig.3: Small objects (up to 100kB) download latency in US East </b><br />
<img src="/assets/post6/BigBitBus_small_download_USEast.png" />
</p>
<h4 id="large-object-sizes">Large object sizes</h4>
<p>Figs.4 and 5 show large object upload and download latencies in US East regions. The performance of all three object-stores is very similar for uploads. The strikingly slower Azure download is the highlight here (Fig. 5). We think this is a serious problem in Azure - especially for the backup/restore use-case. The data says that a 100MB object takes over 4 seconds to download from the Azure blob-store, as compared to ~1 second in Google Cloud Storage. Downloading a 100GB backup set composed of 1000 such 100MB objects serially would therefore take well over an hour in Azure, compared to under 20 minutes in Google cloud. That is a huge hit on the recovery time objective for Azure users.</p>
<p align="center">
<b>Fig.4: Large objects (1MB - 100MB) upload latency in US East</b><br />
<img src="/assets/post6/BigBitBus_large_upload_USEast.png" />
</p>
<p align="center">
<b>Fig.5: Large objects (1MB - 100MB) download latency in US East</b><br />
<img src="/assets/post6/BigBitBus_large_download_USEast.png" />
</p>
<h4 id="object-deletion">Object Deletion</h4>
<p>Fig.6 shows the deletion latency for different-sized objects. The notable feature here is the consistency in the Google cloud (GCP) numbers.</p>
<p align="center">
<b>Fig.6: Object deletion latency in US East</b><br />
<img src="/assets/post6/BigBitBus_object_deletion_USEast.png" />
</p>
<h3 id="canadian-region">Canadian Region</h3>
<p>We repeated all the above experiments on Canadian public cloud regions. Figs.7-11 show the corresponding Canadian region numbers. Notice the different Y-axes on some of these graphs; in general the latency numbers are lower in Canadian regions than in US East regions. We hypothesize that this is because of the relative newness and lower utilization of the Canadian regions. The same superior small-object performance and dismal large-blob download performance of Azure blobs were seen in these results as well.</p>
<p>We chose <em>ca-central-1</em>, <em>northamerica-northeast1</em> and <em>canadacentral</em> regions for AWS, Google cloud and Azure respectively (collectively referred to as Canada in the below plots).</p>
<h4 id="small-object-sizes-1">Small object sizes</h4>
<p align="center">
<b>Fig.7: Small objects (up to 100kB) upload latency in Canada </b><br />
<img src="/assets/post6/BigBitBus_small_upload_Canada.png" />
</p>
<p align="center">
<b>Fig.8: Small objects (up to 100kB) download latency in Canada </b><br />
<img src="/assets/post6/BigBitBus_small_download_Canada.png" />
</p>
<h4 id="large-object-sizes-1">Large object sizes</h4>
<p align="center">
<b>Fig.9: Large objects (1MB - 100MB) upload latency in Canada</b><br />
<img src="/assets/post6/BigBitBus_large_upload_Canada.png" />
</p>
<p align="center">
<b>Fig.10: Large objects (1MB - 100MB) download latency in Canada</b><br />
<img src="/assets/post6/BigBitBus_large_download_Canada.png" />
</p>
<h4 id="object-deletion-1">Object Deletion</h4>
<p align="center">
<b>Fig.11: Object deletion latency in Canada </b><br />
<img src="/assets/post6/BigBitBus_object_deletion_Canada.png" />
</p>
<h2 id="outlook">Outlook</h2>
<p>The latency metrics reported in this article are critical for many user applications. Our results show a clear disadvantage when using the Azure blob store for large objects - operations like restoring backups, downloading large media files and VM images, etc. The Azure service wins for small object sizes - uploads were consistently faster than AWS S3 and Google cloud storage object stores. Object deletion time is important for applications that update, save and delete a large number of temporary objects. We were impressed by the consistency in the Google cloud deletion times as compared to other object stores.</p>
<p>Our aim was to capture performance differences due to different object-store implementations. Given the differences across implementations, we hope the engineering teams behind these services will tune and improve their systems to bring them on par with the best.</p>
<p>Stay tuned as we investigate geo-replicated object performance, cold-storage object stores and object metadata performance in this series.</p>
<p><em>Sachin Agarwal is a computer systems researcher and the founder of BigBitBus.</em></p>
<p><em>BigBitBus is on a mission to bring greater transparency in public cloud and managed big data and analytics services.</em></p>