To meet the demands of online applications, databases must manage ever more information, ever more quickly. In this context, NoSQL databases have earned their place beside relational databases. Here are the main differences that result from this pursuit of performance.
1. In memory vs SSD
A relational database constantly records data on SSD. This makes it easier to ensure that no data is lost (a guarantee of data persistence). However, it also caps performance, since an SSD cannot handle more than about 100,000 operations per second.
A NoSQL database constantly keeps data in memory. This makes it easier to offer one million operations per second. To sustain such performance, the client application and the NoSQL database must be on the same server or on a very fast local network (10 Gbps).
The challenge for a NoSQL database is to offer both high performance and persistence. This can be achieved by combining a sequential log of writes (logging) with periodic snapshots (or checkpoints).
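As an illustration of combining logging with snapshots, here is a minimal sketch in Python. The class name DurableStore, the file names and the JSON encoding are assumptions made for this example, not how any particular product works. Every write is appended to a log before it is applied in memory, a checkpoint writes the whole state and empties the log, and recovery loads the snapshot and then replays the log.

```python
import json
import os
import tempfile


class DurableStore:
    """In-memory dict persisted by an append-only log plus snapshots (hypothetical)."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "store.log")
        self.snapshot_path = os.path.join(directory, "store.snapshot")
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Load the last snapshot, then replay the writes logged after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def set(self, key, value):
        # Sequential append: the write reaches the log before memory is updated.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def checkpoint(self):
        # Write the full state to a temporary file, swap it in, then empty the log.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self.log.close()


# Usage: one write lands in the snapshot, one only in the log; both survive a restart.
with tempfile.TemporaryDirectory() as directory:
    store = DurableStore(directory)
    store.set("user:1", "Alice")
    store.checkpoint()
    store.set("user:2", "Bob")
    store.close()

    recovered = DurableStore(directory)
    print(recovered.get("user:1"), recovered.get("user:2"))
    recovered.close()
```

Reads never touch the disk in this sketch, and the only disk work on a write is a sequential append. Calling fsync on every write is the safest choice but also the slowest; a real store would probably batch these calls and accept a small window of possible loss.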
2. Transactions
A relational database offers transaction capabilities. A NoSQL database typically doesn’t. To understand what transaction capabilities imply for performance, let’s explain the ACID paradigm and the CAP theorem. These acronyms define the following properties:
- Consistency (C from ACID): the database checks every data modification to ensure that constraints are respected (data integrity).
- Consistency (C from CAP): all nodes in the network see the same data at the same time.
- AID (from ACID) is specific to transactions. Atomicity means that each transaction is done as a whole (all or nothing). Isolation means that transactions don’t interfere with one another. Durability means that once a transaction is committed, its effects are kept, even in case of a power failure.
- AP (from CAP) is specific to distributed systems. Availability means that each request gets an answer. Partition tolerance means that the database keeps operating even during a network failure.
A relational database is transactional if it meets all four ACID properties. These properties have a cost in database performance.
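To make atomicity and ACID consistency concrete, here is a small sketch using Python's built-in sqlite3 module. The accounts table, the CHECK constraint and the transfer helper are assumptions made for this example. The transfer credits the target first and then debits the source, so an overdraft fails on the second statement and the first one has to be undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # constraint (C from ACID)
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, source, target, amount):
    """Move money in one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
# The overdraft violates the constraint, so the credit to bob is undone as well.
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

After the failed transfer the balances are the same as before it: the transaction was applied as a whole or not at all. The checks and bookkeeping behind this guarantee are part of the performance cost mentioned above.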
A NoSQL database is distributed, so it has to pay more attention to the network. The CAP theorem underlines the tradeoff that must be made when dealing with a network: the database cannot provide all three CAP properties at the same time. So a NoSQL database relaxes consistency: it may first answer with stale data, and only later with the correct data. This behavior is called eventual consistency.
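Eventual consistency can be simulated in a few lines of Python. The sketch below is a toy model, not a real replication protocol: the class EventuallyConsistentStore and its methods are hypothetical names. A write is applied to the first replica immediately and queued for the others, so a read from a lagging replica returns the old value until replication catches up.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: replica 0 takes writes, the others catch up later."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to the other replicas

    def write(self, key, value):
        # The write is acknowledged as soon as replica 0 has it.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def replicate(self):
        # Apply queued updates in order; afterwards all replicas agree.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = EventuallyConsistentStore()
store.write("price", 10)
store.replicate()
store.write("price", 12)

stale = store.read("price", replica_index=1)  # replication has not run yet
store.replicate()
fresh = store.read("price", replica_index=1)
print(stale, fresh)  # prints: 10 12
```

Both reads are answered immediately, which is the availability side of the tradeoff. The price is that the first answer is out of date.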
A hybrid database, both transactional and NoSQL, aims to provide all of these properties, and slows down accordingly.
3. Query formulation
In a relational database, the data is accessed with SQL. On high volumes, processing times become prohibitive, even after tuning the database. SQL reaches its limits.
In a NoSQL database, the data is accessed with commands. This is sufficient for iterations, which are the most common operations. If the application needs the relationship between data (a join), there are two methods:
- The application performs the join itself. This method is effective and holds up under high volumes. It is more flexible, but more complex, than doing it in SQL.
- Or the application performs the main join in advance and stores the result in the database. This method is recommended for a join that is needed systematically, because it saves time, but it consumes more storage space. It is the preferred method in a column store.
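The two methods above can be sketched as follows, with a plain Python dict standing in for a key-value store. The key scheme ("user:1", "order:100", "order_with_user:100") and both function names are assumptions made for this example.

```python
# A dict stands in for the key-value store.
store = {
    "user:1": {"name": "Alice"},
    "user:2": {"name": "Bob"},
    "order:100": {"user_id": 1, "total": 25},
    "order:101": {"user_id": 2, "total": 40},
    "order:102": {"user_id": 1, "total": 15},
}


def join_in_application(store, order_ids):
    """Method 1: the application looks up both sides and combines them."""
    rows = []
    for order_id in order_ids:
        order = store[f"order:{order_id}"]
        user = store[f"user:{order['user_id']}"]
        rows.append(
            {"order_id": order_id, "name": user["name"], "total": order["total"]}
        )
    return rows


def precompute_join(store, order_ids):
    """Method 2: do the join once and store each result under its own key."""
    for row in join_in_application(store, order_ids):
        store[f"order_with_user:{row['order_id']}"] = row


print(join_in_application(store, [100, 102]))

precompute_join(store, [100, 101, 102])
# Later reads need a single lookup, at the cost of duplicated data.
print(store["order_with_user:101"])
```

With the second method, the stored copy must be refreshed whenever a user or an order changes. This is the usual price of trading storage space for read speed.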
4. Horizontal scalability
A relational database runs on a single server. The capacity of the server limits the capacity of the relational database. At the time of writing, about a terabyte is the most memory that a single server can offer. This is not enough if you want to manage a high volume of data in memory.
A NoSQL database runs on a cluster. The data is distributed across the servers (nodes) of the cluster. With this horizontal scalability, a distributed key-value store provides a capacity of several terabytes.
Of course, nodes can be added. If a NoSQL database can add nodes while it keeps operating, its scalability is said to be elastic.
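One common way to distribute keys across nodes so that nodes can be added cheaply is consistent hashing. It is shown here only as an illustration; the text does not say which technique any given product uses. The sketch below is a minimal version under stated assumptions: the class HashRing, the node names and the choice of 100 virtual nodes per server are hypothetical.

```python
import bisect
import hashlib


def stable_hash(text):
    # A hash that is identical across runs and machines (unlike built-in hash()).
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing: each node owns many points on a ring of hash values."""

    def __init__(self, nodes=(), virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.virtual_nodes):
            bisect.insort(self.ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # The key belongs to the first point at or after its hash, wrapping around.
        index = bisect.bisect_left(self.ring, (stable_hash(key), ""))
        return self.ring[index % len(self.ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")  # the cluster grows while the key space stays the same
after = {key: ring.node_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved, all of them to node-d")
```

Only the keys taken over by the new node move; every other key stays where it was. This is what makes it practical to add a node while the database keeps serving requests.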
The performance of a distributed database does not vary with the number of cluster nodes. It depends on the network bandwidth: the higher the bandwidth, the higher the throughput and the overall performance. This is why several network interface cards are used on each node.
Relational databases are designed for robustness, whereas NoSQL databases are designed for performance in speed and volume. Products such as Index64, Redis, Memcached and Riak implement these qualities.
At Index64, we have developed advanced technologies that further improve the performance of in-memory key-value stores.
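The link between bandwidth and throughput is simple arithmetic, sketched below. The function name and the figure of 1,250 bytes exchanged per operation are assumptions chosen for illustration; real message sizes depend on the keys, the values and the protocol overhead.

```python
def max_operations_per_second(bandwidth_gbps, bytes_per_operation, nic_count=1):
    """Upper bound on throughput when the network is the only bottleneck."""
    bits_per_second = bandwidth_gbps * 1_000_000_000 * nic_count
    return bits_per_second / (bytes_per_operation * 8)


# One 10 Gbps card, 1,250 bytes (10,000 bits) on the wire per operation.
print(max_operations_per_second(10, 1250))               # 1000000.0
# Two cards on the same node double the bound.
print(max_operations_per_second(10, 1250, nic_count=2))  # 2000000.0
```

Under these assumptions, a 10 Gbps link tops out at the one million operations per second mentioned earlier, and each extra network card raises the ceiling proportionally.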