
NoSQL Databases: What are They, and Why?

This is my first article about a technical issue. I initially wanted to steer clear of technical topics, but I just could not help it. It turns out that I am an unapologetic geek. So, my apologies to anyone not interested in the topic. ;)

If you think all databases are of the relational SQL kind (MySQL, Microsoft SQL Server, and PostgreSQL), then you have some reading to do. There has been a lot of “noise” about NoSQL databases in recent times.

What are NoSQL databases? That was the question I asked the first time I heard the term “NoSQL”.

The term “NoSQL” typically refers to non-relational, distributed databases that do not require fixed table schemas. After a little bit of “googling”, I realized that the concept of “NoSQL” is not exactly new and has been around for quite a while. Two of the best-known NoSQL implementations are Amazon’s Dynamo and Google’s BigTable. NoSQL is, basically, a rebellion against traditional Relational Database Management Systems (RDBMS). Dave Kellogg, in one of his blog articles, describes NoSQL as:

…an organic and rapidly-growing industry movement away from relational databases, driven by a number of factors including both technology and cost.

For quite a while, RDBMS products such as IBM DB2, MySQL, Microsoft SQL Server, PostgreSQL, and Oracle have been the dominant model for database management. But today, non-relational, “cloud”, or “NoSQL” databases are beginning to gain recognition as an alternative model. In fact, organizations that collect large amounts of unstructured data are increasingly turning to non-relational databases.

On a basic level, there are three popular types of NoSQL databases:

Key-value stores: data is stored as key-value pairs, and each value is retrieved by its key. These systems can hold both structured and unstructured data. Examples include Amazon’s SimpleDB and Riak.

Column-oriented databases: data is organized into column families, which are extendable groups of closely related columns, rather than into the strictly structured tables of rows and columns found in relational databases. These ColumnFamily databases stem from Google’s internally used BigTable. Other examples are Cassandra, HBase, and Hypertable.

Document-based stores: data is stored and organized as a collection of documents. Users can add any number of fields of any length to a document. Most of these systems store JSON-like documents. Examples of document databases include MongoDB, Apache CouchDB, and SimpleDB.
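To make the three models a little more concrete, here is a minimal sketch that uses plain Python structures as stand-ins. Nothing here is the API of any real database: the names `kv_store`, `column_store`, `documents`, and `find` are made up for illustration, and real systems add persistence, distribution, and indexing on top of these basic shapes.

```python
import json

# Key-value store: an opaque value looked up by its key.
kv_store = {}
kv_store["session:42"] = b"serialized session data"

# Column-family store: row key -> column family -> {column name: value}.
# Rows do not have to share the same families or columns.
column_store = {
    "user:1": {"profile": {"name": "Ada", "city": "Lagos"}},
    "user:2": {"profile": {"name": "Bola"}, "stats": {"logins": 7}},
}

# Document store: a collection of self-describing, JSON-like documents.
documents = [
    {"_id": 1, "title": "NoSQL intro", "tags": ["databases"]},
    {"_id": 2, "title": "Scaling out", "author": {"name": "Ada"}},
]


def find(collection, **criteria):
    """Return the documents whose top-level fields match every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(kv_store["session:42"])
print(column_store["user:2"]["stats"]["logins"])
print(json.dumps(find(documents, title="Scaling out")))
```

Notice that the two documents have different fields, and the two rows in the column-family example have different columns. That freedom is what "no fixed table schema" means in practice.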

Now, the question is, what are the advantages one can gain from using a NoSQL database? Outlined below are some of the more obvious advantages:

  • For years, database administrators have improved performance by buying bigger servers as the database load increased (scaling up), instead of distributing the database across multiple “hosts” (scaling out). RDBMS do not typically scale out easily, but the newer NoSQL databases are designed to expand onto new nodes, usually with low-cost commodity hardware in mind. As a result, NoSQL databases function well in a distributed setting: users can scale a single database by running it across additional inexpensive machines rather than on a single, more powerful, and more costly machine. Furthermore, NoSQL databases can deliver better performance, especially for write-intensive applications. This has been attested to by Opeyemi Obembe in a recent blog article. The performance increase can be attributed to their simpler data models, amongst other things.
  • High-end RDBMS can typically be maintained only with the assistance of highly trained Database Administrators (DBAs). NoSQL databases are generally designed from the ground up to require less management. Features such as automatic repair, automatic data distribution, and simpler data models should lead to lower administration and tuning requirements, or at least that is the expectation. Of course, someone will always have to be accountable for the performance and availability of any mission-critical data store. However, the human resource requirements for managing a NoSQL database are typically lower.
  • It is usually not easy to make big changes to the data model of an RDBMS. Changes have to be carefully managed and may even necessitate downtime or reduced service levels. NoSQL databases have less rigid, or even nonexistent, data model restrictions. Many of them allow new columns or fields to be added without much ado.
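The scaling-out idea in the first point can be sketched in a few lines. This is a toy, not how any particular product works: `ShardedStore` and `node_for` are hypothetical names, the "nodes" are just in-memory dictionaries, and the placement rule is a simple hash-modulo scheme.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node for a key by hashing the key (simple modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.data = {name: {} for name in self.node_names}

    def put(self, key, value):
        self.data[node_for(key, self.node_names)][key] = value

    def get(self, key):
        return self.data[node_for(key, self.node_names)].get(key)


store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

# Each node ends up holding only a share of the keys.
print({name: len(rows) for name, rows in store.data.items()})
print(store.get("user:42"))
```

Because every read and write touches only the node that owns the key, adding machines adds capacity. One caveat about this sketch: with modulo placement, adding a node moves most keys to a different node. Systems such as Dynamo use consistent hashing to keep that reshuffling small.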

Although NoSQL databases have a number of significant advantages, they also have a number of drawbacks. These challenges may not be as important to developers as they are to enterprises, but either way, they are worth noting.

  • Most NoSQL systems are still in pre-production versions, with many key features yet to be implemented. Therefore, caution should be exercised when deciding whether or not to use a NoSQL database. This is especially important for enterprises, as they have a lot more to lose than the more adventurous developers.
  • Enterprises rely heavily on the assurance that if a key system fails, they will be able to get timely and competent support. RDBMS vendors go to great lengths to provide a high level of enterprise support. Most NoSQL systems are open source projects, and the companies behind them are often small start-ups without the global reach, extensive support resources, or the kind of credibility that large RDBMS vendors like Oracle have.
  • NoSQL databases offer few facilities for queries and analysis, since they do not work with SQL. Tasks that need only a simple query in an RDBMS can require significant programming expertise in a NoSQL database. Furthermore, commonly used Business Intelligence (BI) tools do not provide connectivity to NoSQL. However, some work is being done to provide query capabilities for a variety of NoSQL databases.
  • Although the design goal for NoSQL may be a solution that requires little or no administration, NoSQL databases have not achieved that yet. Today, these databases still require a significant level of skill and effort to install and maintain.
  • At the moment, most developers are familiar only with RDBMS concepts and programming. This means that almost every NoSQL developer is in learning mode. This might change over time, but for now it is easier to find experienced RDBMS programmers or administrators than it is to find a NoSQL expert. However, NoSQL databases may be easier to work with for developers who are not familiar with the Structured Query Language (SQL).
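The point about queries is easier to see side by side. The sketch below computes the same per-customer total twice: once with a single SQL query (using Python's built-in `sqlite3` module as a stand-in RDBMS) and once by hand over a list of schema-less records. The `orders` data and the `totals_by_customer` helper are invented for this example, and the hand-written loop only stands in for the kind of application code a store without a query language would push onto the developer.

```python
import sqlite3
from collections import defaultdict

orders = [
    {"customer": "ada", "amount": 30},
    {"customer": "bola", "amount": 15},
    {"customer": "ada", "amount": 20},
]

# Relational side: one declarative query does the grouping and summing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (:customer, :amount)", orders)
sql_totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
conn.close()


# Schema-less side: the aggregation is written by hand in application code.
def totals_by_customer(docs):
    totals = defaultdict(int)
    for doc in docs:
        # A record may lack the field entirely, so supply a default.
        totals[doc["customer"]] += doc.get("amount", 0)
    return dict(totals)


doc_totals = totals_by_customer(orders)

print(sql_totals)
print(doc_totals)
```

Both approaches give the same totals here. The difference is where the logic lives: in the database for SQL, and in your own code otherwise.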

From what I have read, document databases are often best when dealing with collections of similar entities. ColumnFamily (column-oriented) databases seem to be best when scalability (particularly write scalability) is the main issue. The tradeoff is that developers must write more complicated code in order to do certain things explicitly. Graph databases (another type of NoSQL database) are often best in cases where the manner in which entities are related is very important. In an SQL database, unanticipated changes to the initial design often cause problems, because both the schema and the data already stored have to be updated. I have always wondered how applications that are already in use can be upgraded, especially when the upgrade changes database tables that already contain data. NoSQL databases seem to be the answer to this dilemma.
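Here is a small sketch of that upgrade dilemma, again using Python's built-in `sqlite3` module as the relational stand-in and a plain list of dictionaries as a stand-in for a document collection. The `users` table and the `twitter` field are invented for the example.

```python
import sqlite3

# Relational: the schema has to change before any row can carry the new field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")
conn.execute("ALTER TABLE users ADD COLUMN twitter TEXT")  # migration step
conn.execute("INSERT INTO users (name, twitter) VALUES ('Bola', '@bola')")
rows = conn.execute("SELECT name, twitter FROM users ORDER BY id").fetchall()
conn.close()

# Document style: old and new shapes simply coexist in the collection.
users = [{"name": "Ada"}]
users.append({"name": "Bola", "twitter": "@bola"})

# The reading code must cope with documents that lack the newer field.
handles = [(user["name"], user.get("twitter")) for user in users]

print(rows)
print(handles)
```

In this tiny case the `ALTER TABLE` is painless, but on a large, live table a migration like this is exactly the step that needs careful planning. The document version avoids the migration, at the price mentioned above: the application code has to handle records that do not have the new field.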

It is quite obvious that RDBMS are better at some things, particularly reporting. There are already a large number of reporting tools built around them. It is important that we use the right tool for each job. Only by doing this can we produce software that works best in the situations for which it is intended.

NoSQL databases, when used appropriately, can offer real benefits. However, caution should be exercised when adopting new technology. Everyone generally needs to be aware of the limitations and issues that are associated with these databases. This is especially important for enterprises.

The best days of relational databases may well be behind them, but these systems are not likely to die anytime soon. In fact, NoSQL databases are not likely to replace relational databases; instead, they will find their own niche in certain types of projects. As Dave Kellogg has rightly stated, some of the NoSQL hype is actually an over-reaction to the current situation, where a small number of RDBMS vendors control the vast majority of the database market. Nevertheless, some of the hype is also a reaction to the technological inadequacies of relational databases, as well as the conceptual and technical difficulties of programming on them.

I intend to make heavy use of a NoSQL database (probably MongoDB) in a new project of mine. I believe it just might solve some of my worries, even though I know it will create some new ones as well. :-)

8 thoughts on “NoSQL Databases: What are They, and Why?”

  1. This is an enlightenment on a trend many people (enterprises and individuals with an interest in IT especially in web design and development) may not be aware of. It brings the pieces together and saves one the effort of a personal research from scratch and considering the pros and cons, an enterprise, could dictate the option they are comfortable with. It is also foundational (informative wise) for any who may desire to venture or acquaint themselves with database stuffs and trends. Good stuff.

  2. I really think you failed to mention a very important issue with NoSQL DBs — their proprietary nature.

    If I code an entire script to manipulate my Google BigTable DB and for some reason, I decide to migrate to MongoDB sometime in the future, I’ll have to do all that coding again!

    I’m very sure that the standards behind SQL would keep RDBMS lingering for a very very long time while NoSQL DBs would play catch-up. Just like LISP playing catch-ups with C (and Java).
