Exploring the NoSQL options on Azure


For the last couple of years, there’s been a lot of talk around NoSQL. It’s an exciting new world that is unfolding from the good ol’ days when SQL was the only data persistence option in town. Don’t worry, I’ll cover what NoSQL is and how Azure can help you get there.

What is NoSQL?

NoSQL is a very big umbrella. The name covers any type of database that is not relational in the tabular sense.

Because these databases are schema-less, there’s less overhead in maintaining data integrity through constraints such as foreign keys, data types, and required versus optional fields. Instead, it’s up to the application developers and data scientists to uphold data integrity.
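To make that trade-off concrete, here is a minimal Python sketch of the kind of check that moves into application code once the database no longer enforces a schema. The order document shape, `REQUIRED_FIELDS`, and `validate_order` are hypothetical names chosen for illustration; they are not part of any database SDK.

```python
from typing import Any

# Hypothetical schema for an "order" document. In SQL, column definitions and
# a foreign key would enforce these rules; here the application has to.
REQUIRED_FIELDS: dict[str, type] = {"id": str, "customer_id": str, "total": float}


def validate_order(doc: dict[str, Any], known_customers: set[str]) -> list[str]:
    """Return a list of integrity problems; an empty list means the document passes."""
    errors: list[str] = []
    # Required fields and data types (what NOT NULL and column types would check).
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    # Referential integrity (what a foreign key would check).
    customer = doc.get("customer_id")
    if isinstance(customer, str) and customer not in known_customers:
        errors.append(f"unknown customer: {customer}")
    return errors


print(validate_order({"id": "o1", "customer_id": "c9", "total": 10}, {"c1"}))
# → ['total should be float', 'unknown customer: c9']
```

The database would happily store the invalid document above; catching it is entirely the application's job.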

Many people aren’t aware that there are many types of NoSQL databases. NoSQL is a much broader classification than its RDBMS counterparts such as Oracle, SQL Server, and MySQL. I’ll talk about the main categories:

  1. Key-Value Stores
  2. Column-Family Stores
  3. Document Databases
  4. Graph Databases

Key-Value Stores – The Dictionaries

These databases use a key to access a unit of data (the value). In the simplest sense, they can be looked at as a distributed dictionary. They oftentimes offer great reliability and performance at the cost of consistency or scalability. These databases don’t have to worry about joining data together, managing relationships, or maintaining data integrity when new data is added. Some great use cases for these databases are:

  1. Distributed cache and user/session data
  2. Write-heavy applications like chats, shopping carts, and logs
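As a rough illustration of the dictionary analogy and the session-data use case, the sketch below is a single-process Python stand-in for a key-value store with optional per-key expiry (TTL). The `KeyValueStore` class and its methods are hypothetical; a real store such as Redis layers networking, persistence, and replication on top of the same get/set-by-key idea.

```python
import time
from typing import Any, Callable, Optional


class KeyValueStore:
    """Hypothetical toy key-value store: a dict plus optional per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # Each key maps to (value, absolute expiry time or None for "never").
        self._data: dict[str, tuple[Any, Optional[float]]] = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key: str, value: Any, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict an expired key when it is read
            return default
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


sessions = KeyValueStore()
sessions.set("session:42", {"user": "alice"}, ttl_seconds=1800)  # 30-minute session
print(sessions.get("session:42"))
```

Note what is absent: no joins, no relationships, no validation on write. That is exactly why these stores can be so fast for caches, sessions, and write-heavy workloads.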

Column-Family Stores – The Big Data Boys

These are databases that store data in columns instead of rows. In a simplistic sense, they are like a SQL database that doesn’t allow you to:

  1. Query by anything but the primary key
  2. Create indexes to find data faster
  3. Join data together

Why so many limitations? Because these databases often cater to the Big Data world. They are built to absorb data at a monstrous rate and provide reasonably fast queries given their size. These databases are often so large that they span multiple machines, so scaling is just a matter of adding another machine node.

By disallowing joins and queries on the content of the data, each machine only needs to worry about the keys that it’s storing. A query against one of these databases is therefore routed only to the node where the requested data lives, rather than being executed on every node in the cluster.
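One common way to route a key to the single node that owns it is a consistent-hash ring, similar in spirit to Cassandra’s token ring (HBase instead assigns contiguous row-key ranges, called regions, to its servers). The minimal Python sketch below uses hypothetical node names and assumes 64 virtual nodes per machine; it shows why a lookup by primary key touches only one node, and why adding a machine moves only a fraction of the keys.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping row keys to cluster nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        # Each physical node owns several points ("virtual nodes") on the ring,
        # which spreads keys more evenly across machines.
        self._ring: list[tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, row_key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._tokens, _hash(row_key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("sensor-17"))  # the only node this lookup needs to contact
```

Because adding `node-d` only inserts new points into the ring, the keys that change owner are exactly those now claimed by `node-d`. Every other key stays where it was, which is what makes "just add another machine" practical.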

Document Databases – The Lean and Mean

These databases store data in the form of documents such as JSON, BSON, or XML. They are schemaless, and they allow you to create indexes on the data. Some scale very easily and others offer ACID transactions. Each of these databases brings something different when it comes to features. However, these databases are the first candidates to step in for a traditional relational database. For the most part, they fill many of the same needs as traditional relational databases. Instead of using SQL to manipulate the data, these databases are typically accessed through APIs and SDKs.
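Here is a minimal sketch of the schemaless-plus-indexes idea, assuming a toy in-memory collection rather than any real product’s API: documents of any shape are accepted as JSON, and a secondary index on a chosen field lets lookups skip a full scan. `DocumentCollection`, `upsert`, `create_index`, and `find` are hypothetical names.

```python
import json
from collections import defaultdict
from typing import Any

SCALARS = (str, int, float, bool)  # only scalar field values are indexed in this sketch


class DocumentCollection:
    """Hypothetical in-memory collection of schemaless JSON documents."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}
        # field name -> (field value -> ids of documents having that value)
        self._indexes: dict[str, defaultdict] = {}

    def _index_doc(self, field: str, doc_id: str, doc: dict[str, Any]) -> None:
        value = doc.get(field)
        if isinstance(value, SCALARS):
            self._indexes[field][value].add(doc_id)

    def create_index(self, field: str) -> None:
        self._indexes[field] = defaultdict(set)
        for doc_id, doc in self._docs.items():
            self._index_doc(field, doc_id, doc)

    def upsert(self, doc_id: str, json_text: str) -> None:
        doc = json.loads(json_text)  # any shape is accepted: there is no schema
        if not isinstance(doc, dict):
            raise ValueError("a document must be a JSON object")
        old = self._docs.get(doc_id)
        if old is not None:  # drop stale index entries for the replaced version
            for field, index in self._indexes.items():
                value = old.get(field)
                if isinstance(value, SCALARS):
                    index[value].discard(doc_id)
        self._docs[doc_id] = doc
        for field in self._indexes:
            self._index_doc(field, doc_id, doc)

    def find(self, field: str, value: Any) -> list[dict[str, Any]]:
        if field in self._indexes:  # indexed: jump straight to matching ids
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        # Not indexed: fall back to scanning every document.
        return [doc for _, doc in sorted(self._docs.items()) if doc.get(field) == value]


users = DocumentCollection()
users.upsert("u1", '{"name": "Ana", "city": "Tampa"}')
users.upsert("u2", '{"name": "Bo", "city": "Miami", "tags": ["admin"]}')  # extra field is fine
users.create_index("city")
print(users.find("city", "Miami"))
```

The two documents have different shapes, yet both live in the same collection. Real document databases add query languages, persistence, and richer index types on top of this basic pattern.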

Whether these databases are a good fit depends on your use case and your development environment. Some of the most common value propositions are:

  1. Open-Source
  2. Faster Performance
  3. Faster Developer Agility (Due to APIs/SDKs and Schema-Less)
  4. Eventually Consistent
  5. Less need for optimization, aka less need for a DBA
  6. Lower hosting and licensing costs

Graph Databases – The Fancy Ones

Hard to believe, but these are more relational than traditional tabular relational databases that use SQL. Instead of treating relationships as a byproduct of the data, they place the emphasis on the relationships themselves rather than on the data. Social networks and recommendation engines make great use cases for graph databases. These databases shine in situations where the number of joins required would wreak havoc on a SQL database.
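For a sense of why graphs suit recommendations, here is a hypothetical Python sketch of a “people you may know” query over friendship edges stored as adjacency sets. In SQL, each hop would typically be another self-join on a friendships table; here it is just a matter of following stored edges. A graph database such as Neo4j would express this traversal in its own query language (Cypher); the `SocialGraph` class below is only an illustration and not Neo4j’s API.

```python
from collections import Counter, defaultdict


class SocialGraph:
    """Hypothetical graph of undirected "friend" edges stored as adjacency sets."""

    def __init__(self) -> None:
        self._adj: defaultdict = defaultdict(set)

    def add_friendship(self, a: str, b: str) -> None:
        # Store the relationship on both endpoints so traversal is a direct lookup.
        self._adj[a].add(b)
        self._adj[b].add(a)

    def recommend(self, person: str, limit: int = 3) -> list[str]:
        """Suggest friends-of-friends, ranked by number of mutual friends."""
        friends = self._adj.get(person, set())
        mutual_counts: Counter = Counter()
        for friend in friends:  # hop 1: my friends
            for candidate in self._adj[friend]:  # hop 2: their friends
                if candidate != person and candidate not in friends:
                    mutual_counts[candidate] += 1
        # Most mutual friends first; ties broken alphabetically for a stable order.
        ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
        return [name for name, _ in ranked[:limit]]


g = SocialGraph()
for a, b in [("ana", "bo"), ("ana", "cy"), ("bo", "dee"), ("cy", "dee"), ("cy", "eve")]:
    g.add_friendship(a, b)
print(g.recommend("ana"))  # → ['dee', 'eve']
```

Adding a third hop would be one more nested loop here, whereas in SQL it would mean yet another join over the whole friendships table.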

NoSQL on Azure

Azure offers several NoSQL options as PaaS. If it doesn’t have the particular solution you need, there are tons of offerings in the Azure Marketplace. If all else fails, you can host your own database on an Azure Virtual Machine. I’ll cover some of the options by database type, along with their alternatives.

Key-Value Stores

First Choice: Redis
Redis is the most popular key-value store out there. It is blazing fast, and its individual commands are atomic. It’s easy to use and highly reliable. Brian wrote a great post that you can read here.

Second Choice: Riak (Marketplace)
Unlike with Redis, your dataset doesn’t have to fit in memory. Riak is available as a service through the Azure Marketplace.

Column-Family Stores

First Choice: HDInsight
Microsoft’s answer to Big Data. It’s Microsoft’s managed offering of Hadoop and HBase. You can use it in conjunction with other Azure services like Machine Learning and Data Lake Analytics.

Second Choice: Cassandra (Virtual Machine)
This is not for the faint of heart, but you can host your own. Cassandra was originally developed at Facebook, and it’s an alternative to Hadoop/HBase.

Document Databases

First Choice: DocumentDB
Microsoft has been making great gains with this database. Because it’s a first-class citizen on Azure, it works with other great services such as Azure Machine Learning and IoT Hub. We will continue to see this database grow in terms of features and connectivity with other Azure offerings.

Second Choice: RavenDB (marketplace)
Initially inspired by CouchDB, it’s an opinionated database that steers developers toward writing performant applications. In the beginning, it mostly catered to .NET development; however, it now also offers HTTP and Java APIs.

Graph Databases

First Choice: Neo4j (marketplace)
This database is designed to supplement other databases. It embraces polyglot persistence, which is the concept of using multiple database technologies to store your data. Unfortunately, as of this writing, Azure does not offer a native graph database.

In summary, there are many great NoSQL offerings on Azure. In today’s world, SQL is not a one-size-fits-all solution. Applications can benefit from multiple database technologies to store their structured and unstructured data. Data persistence should be viewed holistically, and we can help you get there.

Here at Nebbia, we recognize that different database technologies solve different problems. We carry this belief in our application development to provide the best response times for the type of data being accommodated. Contact us today to find out how NoSQL on Azure can help you and your organization.

February 9th, 2017

About the Author:

I'm a Software Engineer at Nebbia Technology. I enjoy learning about all things cloud, software development, and agile. I'm always thinking about the next side project to work on.