Did you know relational databases can scale up, but have a hard time scaling out? NoSQL databases, on the other hand, are meant to scale out with commodity-grade hardware. Many organizations prefer using NoSQL over SQL databases as it offers a great set of features. There are different types of NoSQL databases available including Key-Value Store, Document Databases, Column-Family Databases, Graph Databases and Full-Text Search Engine Databases. Each type has its advantages and disadvantages and its usage depends on the business requirements. Let’s explore this more below.
NoSQL databases can store huge amounts of data, which makes them a strong choice for Big Data workloads. For Big Data in particular, Hadoop’s HDFS, Cassandra and HBase are often preferred over other storage mechanisms.
The NoSQL database market is huge, and some of the top database engines available are Cassandra, HBase, MongoDB, Neo4j, Redis, Oracle NoSQL, DynamoDB, Couchbase, Elasticsearch, CouchDB and Memcached, and the list goes on. Of all these databases, MongoDB is a favorite among companies of all sizes. It is used across industries and for a wide variety of applications. In this article, we will learn more about MongoDB as a database, its features and the benefits it offers.
Some of our consultants started working on MongoDB in 2010 (it was released in 2009). By 2012, many of our clients had started using MongoDB. Scalability and performance are top areas of focus for the Cazton team. Microsoft recognized the expertise of our CEO, Chander Dhall, and invited him to speak at the world class conference, Tech Ed Europe, in 2014. Here’s a link to his 5-star rated presentation on Microsoft Channel 9.
Later, he published his highly regarded book, Scalability Patterns. Scaling the digital transformation cannot be ignored. There are so many startups that have grown at an unprecedented pace. Not all of them recovered from implementing an architecture that just couldn’t scale. In this book, our CEO discusses everything you need to know to create a scalable solution.
MongoDB is a document-based distributed NoSQL database that can be used for modern, distributed and cloud-based applications. It got its name from the word humongous, which in the context of the database means it can store huge amounts of data. It stores data in BSON format, a binary encoding of JSON-like documents, and supports arrays and nested JSON objects. It offers SQL-like features such as ACID transactions, ad hoc queries, joins, indexing, and much more.
MongoDB as a Document Database: MongoDB stores data as a set of key-value pairs, which is called a Document. Compared to SQL databases, you can think of a document as a single row. Each document can store data with a dynamic schema, which means different documents can hold data with different structures or fields. A set of documents is stored together as a Collection. Think of a Collection as being equivalent to an RDBMS table. MongoDB allows storing embedded documents and arrays that represent complex hierarchical relationships. This feature allows developers to work with evolving data models.
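The document model described above can be sketched with plain Python dictionaries. This is only an illustration under assumptions: the collection is modelled as a list, and all field names and values are hypothetical. No MongoDB server or driver is involved.

```python
import json

# A "collection" is modelled here as a plain list of documents (dicts).
# All field names and values are hypothetical, chosen only for illustration.
customers = [
    {
        "_id": 1,
        "name": "Ada",
        "email": "ada@example.com",
    },
    {
        "_id": 2,
        "name": "Grace",
        # Embedded document: nested key-value pairs inside the parent document.
        "address": {"city": "Austin", "country": "USA"},
        # Array of embedded documents: a one-to-many relationship stored inline.
        "orders": [
            {"sku": "A-100", "qty": 2},
            {"sku": "B-200", "qty": 1},
        ],
    },
]


def fields_of(document):
    """Return the sorted top-level field names of a document."""
    return sorted(document.keys())


# Both documents sit in the same collection but have different fields.
for doc in customers:
    print(doc["_id"], fields_of(doc))

# Documents map naturally to JSON, the text counterpart of BSON.
print(json.dumps(customers[1]["orders"]))
```

In a relational design, the address and orders of the second customer would usually live in separate tables joined by keys. Here they travel with the parent document.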
MongoDB as a Distributed Database: When the amount of data increases, it becomes essential to scale your database. And with distributed computing, we can easily distribute data and grow our deployment over inexpensive hardware or in the cloud. Through distributed databases, we can enjoy benefits like parallel processing, data replication (for fault tolerance), increased reliability and availability and much more. MongoDB offers a feature called Sharding that takes care of the complexity of distributed computing and allows data distribution across the cluster.
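To make the idea of distributing data across a cluster concrete, here is a minimal sketch of hash-based routing. It is a simplification and not MongoDB's actual sharding implementation, which manages ranges of shard-key values and rebalances them across shards. The shard names, the `order_id` shard key and the helper functions are all assumptions made for this example.

```python
import hashlib

# Hypothetical shard names; a real cluster would define its own.
SHARDS = ["shard-a", "shard-b", "shard-c"]


def shard_for(shard_key_value, shards=SHARDS):
    """Pick a shard by hashing the shard key value (stable across runs)."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distribute(documents, shard_key):
    """Group documents by the shard that their shard key hashes to."""
    placement = {name: [] for name in SHARDS}
    for doc in documents:
        placement[shard_for(doc[shard_key])].append(doc)
    return placement


# Twelve hypothetical orders, routed by their order_id.
orders = [{"order_id": i, "total": 10 * i} for i in range(1, 13)]
placement = distribute(orders, "order_id")
for name, docs in placement.items():
    print(name, [d["order_id"] for d in docs])
```

Because the routing depends only on the shard key value, any node that knows the shard list can work out where a document lives without consulting the others.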
MongoDB as a Cache: It’s common knowledge that performing an operation in RAM is faster than performing it on disk. MongoDB has offered multiple storage engines, including WiredTiger, In-Memory and MMAPv1 (deprecated in MongoDB 4.0 and removed in 4.2). These engines typically serve data from an internal or filesystem cache where possible, and the In-Memory engine keeps its data in RAM instead of on disk. Writing data in memory is faster than writing it to disk, which makes this a useful feature. However, dedicated caching technologies like Redis and Memcached may outperform MongoDB when it comes to in-memory processing.
MongoDB as a Service: MongoDB offers you the option to run it as a service. This means you don't need to worry about setting up physical hardware, installing software or configuring for performance. MongoDB Inc., the company that makes MongoDB, has a service called Atlas. Alternatively, you can use MongoDB as a database service on a cloud platform like Google Cloud Platform, Azure or AWS. However, if you wish to keep it local, you can also install it as a service on Windows and Linux platforms.
Now that we have understood what MongoDB is, let's take a look at some of the features that make this database special.
The Cazton team has been immensely busy in making client applications scale to a completely different level. We have helped scale applications with more than a billion hits per day and increased performance exponentially. We have also worked with clients that have just thousands of users, but wanted to reduce costs by looking at a NoSQL alternative like MongoDB while preserving the ability to scale in the future.
Cazton has many other success stories like these in all major business domains including tech, finance and mortgage, insurance and healthcare, banking, e-commerce, telecom, airlines, logistics and supply chain.
This is an excerpt from the book Scalability Patterns written by our CEO, Chander Dhall.
No discussion of scalability is complete without the CAP (Consistency, Availability and Partition Tolerance) Theorem. It is also known as Brewer’s Theorem, after the computer scientist Eric Brewer. In layman’s terms, the CAP theorem states that in any distributed system, it’s only possible to get two out of the three guarantees, viz. Consistency, Availability and Partition Tolerance. Keep in mind that a distributed system is one that is made up of individual machines or nodes. These nodes communicate with each other via messages. A failure of a particular node may not mean the failure of the system. We all know about Master-Slave configurations. In a distributed system, we have multiple nodes, and in order to make our system resistant to failure, we may need to back up a Master node onto one or more slave nodes. Different systems can use different algorithms to achieve the same results.
However, for our understanding, let’s take an example of a distributed system where a Master node is the only node that the system can write to. Assume we get a request to add an order to the system. The order gets written to the Master node. Once the order is added to the Master node, let’s assume the system makes the Master node send a message to all slave nodes to add the new order. If a read request is made to a slave that has not been updated yet, the order won’t exist there and the system will be deemed inconsistent. But if the system makes sure that any subsequent read request is guaranteed to retrieve the latest write, it’s considered to be a consistent system. Relational Database Management Systems are consistent. Every read request returns the most current data.
Consistency: A distributed system that returns the most current data no matter which node the request was made to is considered to guarantee consistency. In layman’s terms, if a write or update request to any node in the system is replicated to other nodes in the system, before the read request, it’s a consistent system. So, the bottom line is that every read will return the most recent write. The system will not return stale data but the most recently updated data. In order to achieve consistency, the system has to update all the relevant nodes at each request, before allowing any reads from the system on that particular resource.
Availability: It is the ability of a node to respond to requests if the node hasn’t failed. Availability allows for failed nodes. However, if the node hasn’t failed and doesn’t respond to a legitimate request, it is considered to not be available. In order to achieve availability, the system needs to replicate data between different nodes.
Partition Tolerance: It is the guarantee of a system to respond to requests even when the system is partially down. No failure less than a complete failure of the system should allow the system to respond incorrectly. So, if the connections between some nodes in the system are lost, the system is partition tolerant, if and only if the system as a whole is still consistent and available.
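The Master and slave order example above can be sketched in a few lines. This is a toy model under stated assumptions, not a real replication protocol: the `Node` and `Cluster` classes, the `synchronous` flag and the order data are all hypothetical names introduced for illustration.

```python
class Node:
    """A single machine holding its own copy of the orders."""

    def __init__(self, name):
        self.name = name
        self.orders = {}


class Cluster:
    """Toy Master-slave cluster: writes go to the Master node only."""

    def __init__(self, slave_count=2):
        self.master = Node("master")
        self.slaves = [Node(f"slave-{i}") for i in range(slave_count)]
        self.pending = []  # writes not yet copied to the slaves

    def write(self, order_id, order, synchronous):
        self.master.orders[order_id] = order
        self.pending.append((order_id, order))
        if synchronous:
            # Consistent mode: replicate before the write is acknowledged.
            self.replicate()

    def replicate(self):
        """Deliver all pending writes to every slave."""
        for order_id, order in self.pending:
            for slave in self.slaves:
                slave.orders[order_id] = order
        self.pending.clear()

    def read_from_slave(self, index, order_id):
        return self.slaves[index].orders.get(order_id)


# Lazy replication: a read can reach a slave before the order arrives.
lazy = Cluster()
lazy.write(1, {"item": "book"}, synchronous=False)
stale = lazy.read_from_slave(0, 1)      # slave not updated yet
lazy.replicate()
caught_up = lazy.read_from_slave(0, 1)  # slave has the order now

# Synchronous replication: every read sees the most recent write.
strict = Cluster()
strict.write(1, {"item": "book"}, synchronous=True)
fresh = strict.read_from_slave(1, 1)

print(stale, caught_up, fresh)
```

The lazy cluster answers quickly but can return stale data, while the strict cluster pays for consistency by updating every slave before the write completes.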
To understand where MongoDB stands with respect to the CAP theorem, it is important to understand how MongoDB works and how you set it up in a distributed environment. Typically, in a distributed environment, MongoDB chooses Consistency over Availability. To understand this, you need to know how replica sets work in MongoDB.
Replication of data offers redundancy and high availability. MongoDB allows storing replicas of data sets on different nodes/partitions in a distributed environment. But when a failure occurs, MongoDB gives more preference to data consistency. Let’s try to understand this with a simple example.
A replica set typically has a primary node and multiple secondary nodes. The primary node is the preferred choice for read and write operations, whereas secondary nodes can serve read operations only. Let’s assume that a primary node that was about to perform a write operation goes down or gets disconnected. In this case, the secondary nodes elect a new primary node. There are protocols in place to detect whether all nodes are in sync with the primary node and whether a re-election is needed. In case of any such failure, MongoDB stops accepting writes until it believes that it can safely complete a consistent write operation. It allows reading data from secondary nodes until a new primary node is elected.
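The failover behavior described above can be illustrated with a toy model. This is a sketch under assumptions and not MongoDB's election protocol, which also weighs factors such as how up to date each member is. The `ReplicaSet` class, its method names and the node names are hypothetical.

```python
class ReplicaSet:
    """Toy replica set: one primary, the rest secondaries."""

    def __init__(self, members):
        self.members = list(members)
        self.primary = self.members[0]
        self.alive = set(self.members)
        # Every member holds a full copy of the data in this toy model.
        self.data = {m: {} for m in self.members}

    def fail(self, member):
        """Mark a member as down; the set loses its primary if that was it."""
        self.alive.discard(member)
        if member == self.primary:
            self.primary = None

    def elect(self):
        """Promote a surviving member if a majority of the set is reachable."""
        if self.primary is None and len(self.alive) > len(self.members) // 2:
            self.primary = sorted(self.alive)[0]
        return self.primary

    def write(self, key, value):
        if self.primary is None:
            raise RuntimeError("no primary: writes are rejected")
        for member in self.alive:  # replicate to every reachable member
            self.data[member][key] = value

    def read(self, key, member=None):
        target = member or self.primary
        if target not in self.alive:
            raise RuntimeError(f"{target} is not reachable")
        return self.data[target].get(key)


rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write("order-1", "book")
rs.fail("node-a")  # the primary goes down

try:
    rs.write("order-2", "pen")
    write_accepted = True
except RuntimeError:
    write_accepted = False  # no primary yet, so the write is refused

read_during_outage = rs.read("order-1", member="node-b")
new_primary = rs.elect()
rs.write("order-2", "pen")  # writes resume once a primary exists
print(write_accepted, read_during_outage, new_primary)
```

Between the failure and the election, the set still answers reads from its secondaries but refuses writes, which is the trade of availability for consistency that the text describes.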
This means MongoDB compromises on availability to preserve data consistency until all nodes are in sync. Exactly what MongoDB offers in terms of the CAP theorem is debatable. However, there are configurable settings that can be tweaked to balance consistency, availability and partition tolerance for a given deployment.
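Two of the settings commonly tuned for this trade-off are the write concern and the read preference, which can be set through the connection string options `w` and `readPreference`. The sketch below only assembles connection strings and does not connect to anything. The host names, the replica set name `rs0` and the `build_uri` helper are assumptions for illustration; the resulting string would typically be handed to a driver such as PyMongo's `MongoClient`.

```python
from urllib.parse import urlencode


def build_uri(hosts, replica_set, write_concern, read_preference):
    """Assemble a MongoDB connection string from its tunable options."""
    options = urlencode({
        "replicaSet": replica_set,
        "w": write_concern,
        "readPreference": read_preference,
    })
    return f"mongodb://{','.join(hosts)}/?{options}"


# Hypothetical hosts and replica set name.
HOSTS = [
    "db1.example.com:27017",
    "db2.example.com:27017",
    "db3.example.com:27017",
]

# Leans toward consistency: writes wait for a majority of members,
# and reads go to the primary only.
consistent_uri = build_uri(HOSTS, "rs0", "majority", "primary")

# Leans toward availability: writes are acknowledged by one member,
# and reads may be served by a secondary when one is available.
available_uri = build_uri(HOSTS, "rs0", "1", "secondaryPreferred")

print(consistent_uri)
print(available_uri)
```

Which combination is appropriate depends on whether the application can tolerate reading slightly stale data in exchange for staying responsive during a failover.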
At Cazton, we have a team of MongoDB Database Administrators, Database Engineers, Jr. & Sr. level MongoDB Developers, MongoDB Consultants, MongoDB Solutions Architects, Big Data Architects, Big Data Specialists and Data Scientists. We have specialized technical knowledge of the MongoDB platform and similar NoSQL technologies. Our experts can translate business requirements into technical specifications and build elegant, efficient, and scalable solutions based on specifications.
Our experts are well-versed in programming languages including but not limited to Java, C#, Python and Scala, and can help you develop scalable MongoDB and API solutions and migrate traditional apps to work with MongoDB. Our certified experts can provide deployment, configuration and management of MongoDB on-premises and on leading cloud platforms. We can assist your developers in detecting performance problems using MMS and MongoDB Profiler. Read some of our success stories about how our experts helped Fortune 500 companies with scalability issues.
Cazton is composed of technical professionals with expertise gained all over the world and in all fields of the tech industry, and we put this expertise to work for you. We serve all industries, including banking, finance, legal services, life sciences & healthcare, technology, media, and the public sector.
Cazton has expanded into a global company, servicing clients not only across the United States, but in Oslo, Norway; Stockholm, Sweden; London, England; Berlin, Germany; Frankfurt, Germany; Paris, France; Amsterdam, Netherlands; Brussels, Belgium; Rome, Italy; Quebec City, Toronto, Vancouver, Montreal, Ottawa, Calgary, Edmonton, Victoria, and Winnipeg as well. In the United States, we provide our consulting and training services across various cities like Austin, Dallas, Houston, New York, New Jersey, Irvine, Los Angeles, Denver, Boulder, Charlotte, Atlanta, Orlando, Miami, San Antonio, San Diego, Stamford and others. Contact us today to learn more about what our experts can do for you.