MongoDB Consulting

  • MongoDB is a distributed, document-based NoSQL database: it stores data as documents (sets of key-value pairs) and can be scaled horizontally across multiple nodes in a cluster. It is one of the most popular NoSQL databases in the industry.
  • MongoDB can be used for modern, distributed and cloud-based applications, and offers SQL-like features such as ACID transactions, ad hoc queries, joins, indexing, and much more.
  • At Cazton, we help Fortune 500, large and mid-size companies with database best practices, development, performance tuning, consulting, recruiting and hands-on training services.
  • Our database and scalability experts have a proven track record of delivering successful software projects while saving customers millions of dollars.

Did you know relational databases can scale up, but have a hard time scaling out? NoSQL databases, on the other hand, are designed to scale out on commodity-grade hardware. Many organizations prefer NoSQL over SQL databases because of the feature set it offers. There are different types of NoSQL databases available, including Key-Value Stores, Document Databases, Column-Family Databases, Graph Databases and Full-Text Search Engines. Each type has its advantages and disadvantages, and the right choice depends on the business requirements. Let's explore this more below.

Features of a NoSQL Database:

  • Schema Agnostic: Unlike SQL databases, where data must fit a specific schema, NoSQL databases are flexible and offer dynamic schema support. This makes them a great fit for structured, semi-structured or unstructured data. As mentioned above, there are different types of NoSQL databases available, so choosing the appropriate database for the application's requirements is a key design decision.
  • Highly Scalable: NoSQL databases are highly scalable (horizontally and vertically) and can run seamlessly on large clusters of computers. These databases are typically used for distributed applications where data is stored across multiple data centers.
  • Data Storage: NoSQL databases can scale to manage large volumes of data, from terabytes to petabytes. In addition to exceptional storage capacity, they offer high performance regardless of the variety and complexity of the data.
  • Cloud Friendly: As managing on-premises infrastructure becomes more difficult, many organizations plan to switch to cloud-based solutions that offer database as a service. NoSQL databases fit cloud platforms very well, and many run on Azure, AWS, Google Cloud and others.
  • Open Source: NoSQL databases are mostly open source, which means they are inexpensive and companies using them can avoid the large sums otherwise spent on licenses. Thanks to the enormous amount of contribution from the open source community, the core database platform capabilities keep improving.
  • Developer Friendly: NoSQL databases support all major operating systems and provide the drivers and tooling needed for all common programming languages.

NoSQL databases can store huge amounts of data, which makes them a strong choice for Big Data workloads. That said, for many Big Data scenarios, Hadoop's HDFS, Cassandra and HBase are often preferred over other storage mechanisms.

The NoSQL database market is huge, and some of the top database engines available are Cassandra, HBase, MongoDB, Neo4j, Redis, Oracle NoSQL, DynamoDB, Couchbase, Elasticsearch, CouchDB, Memcached and more. Of all these databases, MongoDB is a favorite among companies of all sizes. It is used across all industries and for a wide variety of applications. In this article, we will learn more about MongoDB as a database, its features and the benefits it offers.

Some of our consultants started working on MongoDB in 2010 (it was first released in 2009). By 2012, many of our clients had started using MongoDB. Scalability and performance are top areas of focus for the Cazton team. Microsoft recognized the expertise of our CEO, Chander Dhall, and invited him to speak at the world-class conference TechEd Europe in 2014. Here's a link to his 5-star rated presentation on Microsoft Channel 9.

Best Practices for Scaling Web Apps

Later, he published his highly regarded book, Scalability Patterns. Scalability cannot be ignored in any digital transformation. Many startups have grown at an unprecedented pace, and not all of them recovered from implementing an architecture that simply couldn't scale. In this book, our CEO discusses everything you need to know to create a scalable solution.


What is MongoDB?

MongoDB is a document-based, distributed NoSQL database that can be used for modern, distributed and cloud-based applications. It got its name from the word humongous, which in the context of a database means it can store huge amounts of data. It stores data in BSON format (a binary encoding of JSON-like documents) and supports arrays and nested JSON objects. It offers SQL-like features such as ACID transactions, ad hoc queries, joins, indexing, and much more.
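As a rough illustration of those ACID guarantees, here is a minimal sketch of a multi-document transaction using the Python driver, pymongo. It assumes a replica set (transactions require one); the connection string, database, collection and account names are hypothetical.

```python
# A minimal sketch of a multi-document ACID transaction in pymongo.
# Assumes a replica set; connection string, database, collection and
# document names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
accounts = client["bank"]["accounts"]

with client.start_session() as session:
    with session.start_transaction():
        # Both updates commit together, or the transaction is aborted.
        accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}}, session=session)
        accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}}, session=session)
```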

MongoDB as a Document Database: MongoDB stores data as a set of key-value pairs, called a Document. Compared to SQL databases, you can think of a document as a single row or record. Each document can have a dynamic schema, which means every document can hold data with different structures or fields. A set of documents is grouped and stored as a Collection; think of a Collection as the equivalent of an RDBMS table. MongoDB allows storing embedded documents and arrays that represent complex hierarchical relationships, which lets developers work with evolving data models, as the sketch below shows.
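The snippet below is a minimal sketch of this document model using pymongo; the database, collection and field names are illustrative only.

```python
# A minimal sketch of MongoDB's document model using pymongo.
# Database, collection and field names are illustrative only.
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

# A document is a set of key-value pairs; embedded documents and arrays
# capture hierarchical relationships in a single record.
orders.insert_one({
    "order_id": 1001,
    "customer": {"name": "Ada", "tier": "gold"},                     # embedded document
    "items": [{"sku": "A-42", "qty": 2}, {"sku": "B-7", "qty": 1}],  # embedded array
    "total": 59.90,
})

# Dynamic schema: another document in the same collection can have
# entirely different fields.
orders.insert_one({"order_id": 1002, "note": "free-form structure is allowed"})
```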

MongoDB as a Distributed Database: As the amount of data grows, it becomes essential to scale your database. With distributed computing, we can easily distribute data and grow our deployment over inexpensive hardware or in the cloud. Distributed databases give us benefits like parallel processing, data replication (for fault tolerance), increased reliability and availability, and much more. MongoDB offers a feature called Sharding that hides much of the complexity of distributed computing and distributes data across the cluster.
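As a hedged sketch of what enabling sharding can look like, the snippet below runs the standard enableSharding and shardCollection commands through pymongo. It assumes an already-deployed sharded cluster reachable through a mongos router; the host, database and shard key are hypothetical.

```python
# A minimal sketch of distributing a collection across shards.
# Assumes a connection to a mongos router of an existing sharded cluster;
# host, database and key names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.net:27017")

# Allow collections in the "shop" database to be sharded.
client.admin.command("enableSharding", "shop")

# Shard the orders collection on a hashed customer_id so documents
# are spread evenly across the cluster.
client.admin.command("shardCollection", "shop.orders",
                     key={"customer_id": "hashed"})
```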

MongoDB as a Cache: It's common knowledge that performing an operation in RAM is faster than performing it on disk. MongoDB offers multiple storage engines, such as WiredTiger, In-Memory and MMAPv1 (now deprecated), that lean heavily on memory: they store and manipulate data through an internal cache or the filesystem cache, so working with data in memory is much faster than going to disk. This is a great feature provided by MongoDB. However, technologies like Redis and Memcached may still outperform MongoDB for pure in-memory processing.

MongoDB as a Service: MongoDB can also be run as a managed service, which means you don't need to worry about setting up physical hardware, installing software or tuning the configuration. MongoDB Inc., the company behind MongoDB, offers such a service called Atlas. You can also leverage MongoDB as a managed database service on cloud platforms like Google Cloud Platform, Azure or AWS. However, if you wish to keep it local, you can install MongoDB as a service on Windows and Linux.
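Connecting an application to a hosted cluster such as Atlas is typically just a matter of using the cluster's SRV connection string, as in the hedged sketch below; the hostname and credentials are placeholders, not a real cluster.

```python
# A minimal sketch of connecting to a hosted MongoDB cluster (e.g. Atlas).
# The SRV connection string below is a placeholder, not a real cluster.
from pymongo import MongoClient

client = MongoClient(
    "mongodb+srv://appUser:<password>@cluster0.example.mongodb.net/"
    "?retryWrites=true&w=majority"
)

# Verify the connection with a simple ping.
print(client.admin.command("ping"))
```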

Now that we understand what MongoDB is, let's take a look at some of the features that make this database special.

Features of MongoDB:

  • Schema-free: As mentioned above, MongoDB stores data in a JSON-like document format and, unlike SQL databases, does not force documents to follow a specific schema. This makes MongoDB extremely flexible and removes the additional burden of setting up a database schema and mapping types between database and application objects.
  • Supports Ad-hoc Queries: MongoDB offers a wide variety of querying capabilities. You can query data by field, range and regular expression, and run dynamic queries on documents using a document-based query language that's nearly as powerful as SQL (see the sketch after this list).
  • Indexing: Similar to indexing in a SQL database, MongoDB can index any field in a document, which improves search performance.
  • Aggregation: MapReduce is well known for processing Big Data. With MongoDB, you can use aggregation, which functionally acts like MapReduce but typically performs better.
  • Built-in Replication: MongoDB supports replication in which a primary (master) node handles reads and writes while secondary (slave) nodes copy data from the primary and can serve read-only traffic or act as backups.
  • Sharding: MongoDB provides auto-sharding capabilities with minimal configuration for horizontal scaling. Through sharding, developers can easily set up and operate a clustered environment.
  • High Performance: MongoDB's in-memory storage and caching mechanisms enable faster access to data.
  • MongoDB Management Service: A powerful web tool that lets you monitor your databases and create backups. It also tracks hardware metrics for managing and optimizing MongoDB deployments and can send custom alerts when a MongoDB instance is affected.
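The sketch below ties several of these features together using pymongo: ad-hoc queries by field, range and regular expression, a secondary index, and a small aggregation pipeline. Collection and field names are hypothetical.

```python
# A minimal sketch of ad-hoc queries, indexing and aggregation in pymongo.
# Collection and field names are hypothetical.
from pymongo import ASCENDING, MongoClient

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

# Ad-hoc queries: by field, by range and by regular expression.
gold_orders = orders.find({"customer.tier": "gold"})
mid_priced = orders.find({"total": {"$gte": 50, "$lt": 100}})
a_series = orders.find({"items.sku": {"$regex": "^A-"}})

# Indexing: a secondary index on an embedded field speeds up the first query.
orders.create_index([("customer.tier", ASCENDING)])

# Aggregation pipeline: total revenue per customer tier, highest first.
pipeline = [
    {"$group": {"_id": "$customer.tier", "revenue": {"$sum": "$total"}}},
    {"$sort": {"revenue": -1}},
]
for row in orders.aggregate(pipeline):
    print(row)
```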

Some of our MongoDB success stories:

The Cazton team has been immensely busy helping client applications scale to a completely different level. We have helped scale applications with more than a billion hits per day and increased performance dramatically. We have also worked with clients that have just thousands of users but wanted to reduce costs by moving to a NoSQL alternative like MongoDB while preserving the ability to scale in the future.

  • The Cazton team helped a logistics company scale out from an RDBMS solution to polyglot persistence. The move involved microservices-based development using Docker and Kubernetes. The company's solution had been built on legacy technologies for more than 15 years and was upgraded to the latest technologies. MongoDB played a great role not just in reducing costs, but also in providing a way to scale the database load on commodity-level hardware. By greatly reducing the number of RDBMS servers running on Windows, we saved two kinds of licensing fees for our client: one for the Windows Servers and the other for the Oracle RDBMS. The whole move cost far less than the savings on licensing alone.
  • A Fortune 500 technology company had a lot of legacy data in an RDBMS. They certainly needed the RDBMS for ACID guarantees, but they needed real-world expertise in moving data out of it. Our team used the principles described in the book Scalability Patterns and, in a very cost-effective manner, helped the client move the data where it belongs. The final solution combined Redis for cached data, MongoDB for semi-structured and structured data, Elasticsearch for search-related data and SQL Server for critical structured data. We also helped the client with a cohesive Big Data and AI strategy using Spark, Hadoop, Kafka and Ignite.
  • A Fortune 500 financial client approached Cazton to help improve the performance of its web-based applications. During our analysis, our principal consultants found major architectural mistakes, including using an RDBMS for all kinds of data, inefficient algorithms in both the web tier and the API, and the lack of a cohesive performance strategy. Following the assessment, Cazton helped the client team by augmenting it with a Sr. Architect and a team of consultants. The team added MongoDB for improved performance and scalability where it was the best fit. In certain areas, the team showed that PostgreSQL performed better and was a better fit for the client's needs.

Cazton has many other success stories like these in all major business domains including tech, finance and mortgage, insurance and healthcare, banking, e-commerce, telecom, airlines, logistics and supply chain.

CAP Theorem:

This is an excerpt from the book Scalability Patterns written by our CEO, Chander Dhall.


No discussion of scalability is complete without the CAP (Consistency, Availability and Partition Tolerance) Theorem. It is also known as Brewer's Theorem, after the computer scientist Eric Brewer. In layman's terms, the CAP theorem states that in any distributed system, it's only possible to get two out of the three guarantees: Consistency, Availability and Partition Tolerance. Keep in mind, a distributed system is one that is made up of individual machines, or nodes, that communicate with each other via messages. A failure of a particular node may not mean the failure of the system. We all know about Master-Slave configurations. In a distributed system, we have multiple nodes, and in order to make our system resistant to failure, we may need to back up a Master node onto one or more slave nodes. Different systems can use different algorithms to achieve the same results.

However, for our understanding, let's take the example of a distributed system where the Master node is the only node the system can write to. Assume we get a request to add an order to the system. The order gets written to the Master node. Once the order is added, the Master node sends a message to all slave nodes to add the new order. If a read request is made to a slave that has not been updated yet, the order won't exist there, and the system would be deemed inconsistent. But if the system makes sure that a subsequent read request is guaranteed to retrieve the latest write, it's considered a consistent system. Relational Database Management Systems are consistent: every read request returns the most current data.

Consistency: A distributed system that returns the most current data no matter which node the request was made to is considered to guarantee consistency. In layman's terms, if a write or update to any node in the system is replicated to the other nodes before the next read request, it's a consistent system. The bottom line is that every read returns the most recent write; the system never returns stale data. To achieve consistency, the system has to update all relevant nodes on each write before allowing any reads of that particular resource.

Availability: It is the ability of a node to respond to requests as long as the node hasn't failed. Availability allows for failed nodes; however, if a node hasn't failed and doesn't respond to a legitimate request, the system is considered not available. To achieve availability, the system needs to replicate data between different nodes.

Partition Tolerance: It is the guarantee that a system responds to requests even when it is partially down. No failure short of a complete failure of the system should cause the system to respond incorrectly. So, if the connections between some nodes in the system are lost, the system is partition tolerant if and only if the system as a whole remains consistent and available.

What does MongoDB offer with respect to the CAP Theorem?

To understand where MongoDB stands with respect to the CAP theorem, it is important to understand how MongoDB works and how you set it up in a distributed environment. Typically, in a distributed environment, MongoDB chooses Consistency over Availability. To understand this, you need to know how replica sets work in MongoDB.

Replication of data offers redundancy and high availability. MongoDB stores replicas of data sets on different nodes/partitions in a distributed environment, but when a failure occurs, MongoDB gives preference to data consistency. Let's try to understand this with a simple example.

In a replica set, there is typically one primary node and multiple secondary nodes. The primary node receives all write operations and is the default choice for reads, while secondary nodes can only serve read operations. Let's assume the primary node goes down or gets disconnected before completing a write operation. In this case, the secondary nodes elect a new primary. There are protocols in place to detect whether all nodes are in sync with the primary and whether re-election is needed. During such failures, MongoDB stops accepting writes until it believes it can safely complete a consistent write operation, although it may still allow reads from secondary nodes until a new primary is elected.

This means MongoDB compromises on availability and preserves data consistency until all nodes are back in sync. It is debatable exactly what MongoDB offers according to the CAP theorem; however, there are configurable settings, such as read and write concerns and read preferences, that can be tuned to balance consistency, availability and partition tolerance, as the sketch below illustrates.
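As a hedged sketch of that tuning, the snippet below uses pymongo's write concern, read concern and read preference options against a hypothetical three-node replica set; the host names, replica-set name and collection are placeholders.

```python
# A minimal sketch of trading consistency against availability on a replica set.
# Host names, replica-set name and collection are placeholders.
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from pymongo.read_preferences import ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient(
    "mongodb://node1.example.net,node2.example.net,node3.example.net"
    "/?replicaSet=rs0"
)

# Favor consistency: acknowledge writes only after a majority of nodes have
# them, and read only majority-committed data.
strict_orders = client["shop"].get_collection(
    "orders",
    write_concern=WriteConcern(w="majority"),
    read_concern=ReadConcern("majority"),
)

# Favor availability and latency: allow reads from secondaries, accepting
# that the data returned may be slightly stale.
relaxed_orders = client["shop"].get_collection(
    "orders",
    read_preference=ReadPreference.SECONDARY_PREFERRED,
)
```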

Putting Cazton to work for you:

At Cazton, we have a team of MongoDB Database Administrators, Database Engineers, Jr. & Sr. level MongoDB Developers, MongoDB Consultants, MongoDB Solutions Architects, Big Data Architects, Big Data Specialists and Data Scientists. We have specialized technical knowledge of the MongoDB platform and similar NoSQL technologies. Our experts can translate business requirements into technical specifications and build elegant, efficient, and scalable solutions based on specifications.

Our experts are well-versed in programming languages including Java, C#, Python and Scala, and can help you develop scalable MongoDB and API solutions and migrate traditional apps to work with MongoDB. Our certified experts can provide deployment, configuration and management of MongoDB on-premises and on leading cloud platforms. We can assist your developers in detecting performance problems using MMS and the MongoDB Profiler. Read some of our success stories about how our experts helped Fortune 500 companies with scalability issues.