
In our previous post, Optimizing your AWS Availability-Zone Architecture, we saw how a configuration change saved 60% ($8,000/month) and reduced latencies from 20ms to 1ms.

In this post we will show how a simple change in the code (one line, 25 characters) improved performance by a factor of 100 and saved $93,000 a year.

Today’s applications

Many modern applications process documents, whether JSON or XML, and in many cases they eventually persist those documents to a database.

Today most databases support these formats. Some, like MongoDB, OrientDB, and Couchbase, are optimized for documents, but recently traditional databases like MySQL and PostgreSQL have added support for JSON as well.

More info about document-oriented databases can be found here.

Architecture

In the following example I'm using MongoDB, but the point is valid for all databases.

Our content-management service (CMS), which manages our digital content, is deployed in AWS. The back-end database is MongoDB. The MongoDB deployment is standard, with each member of the replica set in a different AWS Availability Zone (more details can be found in Deploy MongoDB in AWS).
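For reference, here is a minimal PyMongo sketch of how a client might connect to such a replica set; the host names, replica-set name, and database name below are hypothetical placeholders:

    # Minimal sketch: connect to a 3-member replica set spread across
    # Availability Zones (all names here are hypothetical placeholders)
    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://mongo-az1:27017,mongo-az2:27017,mongo-az3:27017"
        "/?replicaSet=rs0"
    )
    db = client["cms"]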

Here is a screenshot of the architecture:

The implications of being lazy (being me …)

When an engineer would like to retrieve specific fields from a large record set, there are two options:

  • Select all fields: documents = db.collection.find()
  • Select specific fields: documents = db.collection.find({}, {field_1: 1, field_2: 1})

In both cases the next step is the same – fetch the data.
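In PyMongo, for example, the two options look like this (the connection string and field names are illustrative):

    # Illustrative PyMongo version of the two options above
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["cms"]  # placeholder connection

    # Option 1: fetch whole documents - every field crosses the wire
    documents = db.collection.find()

    # Option 2: fetch only the needed fields - the projection trims each
    # document on the server before it is transmitted
    documents = db.collection.find({}, {"field_1": 1, "field_2": 1})

    for doc in documents:
        pass  # fetch / process the (partial) documents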

Now let’s look under the hood

When returning the whole document, the database does not have to manipulate the document (to produce a partial one), which means less overhead for the database itself.

But when there are thousands of documents (or very big documents), it is much faster to trim the document at the DBMS level than to transmit the full document over the socket layer.
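One way to see the difference is to compare the encoded size of what actually comes back. Here is a rough sketch, assuming PyMongo and its bundled bson package (the connection string and collection names are placeholders):

    # Rough sketch: compare the BSON size of full vs. projected results
    import bson
    from pymongo import MongoClient

    coll = MongoClient("mongodb://localhost:27017")["cms"]["collection"]  # placeholders

    full_bytes = sum(len(bson.encode(doc)) for doc in coll.find())
    partial_bytes = sum(
        len(bson.encode(doc))
        for doc in coll.find({}, {"id": 1, "name": 1, "state": 1, "timestamp": 1})
    )
    print(f"full: {full_bytes:,} bytes, projected: {partial_bytes:,} bytes")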

Let’s do some Math

The engineer would like to get the values of 4 fields [id, name, state, timestamp] from a subset of documents:

  • There are 20,000 documents to retrieve (the MongoDB holds millions of records)
  • The average document size is 100KB (0.1 MB)
  • The same query is executed 6 times per second (6 EC2 instances, each executing the query every second)
  • The 4 fields together are 0.1KB (0.0001 MB)

Select all fields: 20,000 * 0.1 MB * 6/sec = 12,000 MB/s

Select specific fields: 20,000 * 0.0001 MB * 6/sec = 12 MB/s
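As a sanity check, here is the same arithmetic in a few lines of Python:

    # Back-of-the-envelope check of the bandwidth math above
    DOCS_PER_QUERY = 20_000
    QUERIES_PER_SEC = 6        # 6 EC2 instances, one query each per second
    FULL_DOC_MB = 0.1          # average document: 100KB
    PROJECTED_MB = 0.0001      # the 4 projected fields: 0.1KB

    print(DOCS_PER_QUERY * FULL_DOC_MB * QUERIES_PER_SEC)   # 12000.0 MB/s
    print(DOCS_PER_QUERY * PROJECTED_MB * QUERIES_PER_SEC)  # 12.0 MB/s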


OMG Moment (And some visualization)!

The “select all fields” query generated a huge load on the network and on the EC2 clients. At least the fix was easy – select only the specific fields. The next screenshot shows the performance before and after the change:

A nice surprise and a happy CFO

After a few weeks we found out that our AWS bill had dropped:

  • We reduced our cross-zone traffic by 8GB/s – saving $2,000/month
  • We can use m4.2xlarge ($0.23/hour) instead of c4.8xlarge ($0.94/hour) – saving $5,000/month
  • We probably don't need 2 of the replica-set members – saving $750/month

Together that's $7,750/month, or $93,000 a year – not bad for such a simple change!


About ITculate

ITculate.io provides a monitoring solution for DevOps environments. ITculate's solution captures not only raw and custom metrics but also the architecture of the customer's environment. ITculate's core technology tracks relationships between and within services. Understanding these relationships allows ITculate to provide context to the user, enables better visualization, and makes troubleshooting much faster. ITculate provides a more intuitive way of exploring data and dramatically improves the user experience of monitoring. Please check us out at ITculate.io to learn more!
