How to choose a database for backend
Each database management system has its pros and cons, they must be taken into account when choosing a particular database for your application. You need to evaluate how much data will be expected to store, whether complex relationships between data will be necessary, whether binary data or data in JSON format will be stored, whether sharding is supposed to be used, and so on.
The figure below shows the popularity of databases at the beginning of 2023.
More details are here.
One of the most common DBMS, allows you to implement projects of small and medium complexity. But of course you can try to use it on complex projects with a large amount of stored data, however, this may entail some difficulties.
|Open source.||Is not very efficient in handling very large databases.|
|Is a Relational Database Management System or RDBMS which means that it stores and presents data in tabular form, organized in rows and columns.||Versions less than 5.0 do not support COMMIT, stored procedure and ROLE.|
|Is more secure as it consists of a solid data security layer to protect sensitive data from intruders and passwords in MySQL are encrypted.||Does not support SQL check constraints.|
|Allows transactions to be rolled back.||Complex data replication mechanics.|
MariaDB is a fork of MySQL. In other words, it is an enhanced, drop-in replacement of MySQL and they have the similar pros and cons, let’s take a quick look at the differences between them.
|The performance of MySQL is not that great when compared to MariaDB.||The performance of MariaDB is far better than MySQL.|
|It does not provide a replacement for MariaDB.||It provides a drop-in replacement for MySQL.|
|Data masking is done in MySQL.||There is no data masking.|
|It can not handle large-sized data easily.||It can handle large-sized data easily.|
|OS X is supported in MySQL as a server.||OS X is not supported in MariaDB as a server.|
This DBMS allows using triggers and functions to implement its own methods for converting data stored in the database, it supports storing data in JSON format, but with large amount of data does not work very fast. It is often used in open source projects, it is easy to maintain and update, there are built-in mechanisms for implementing various replication schemes.
|Open source.||Expandable documentation only available in English.|
|Highly expandable.||Comparatively low reading speed.|
|Possible to process complex data types (e.g. geographical data).||Complex data replication mechanics and sometime unstable.|
|Flexible full text search.|
|Creation of own functions, triggers, data types, etc. possible.|
NoSQL database of the key-value family is popular in projects involving the storage of huge amounts of data with fast access to them. Supports sharding out of the box. Has some implementation quirks that sometimes lead to unexpected memory leaks.
|HBase can handle as well as stores large datasets on top of HDFS file storage. Moreover, it aggregates and analyzes billions of rows present in the HBase tables.||In HBase, there is no support for the transaction.|
|There is a time when relational databases break down, then HBase shines in the picture.||Instead of the database itself, JOINs are handled in the MapReduce layer.|
|As compared to traditional dataBase, data reading and processing in HBase will take a small amount of time.||As RDBMS can be indexed on some arbitrary field, HBase is indexed and sorted only on key.|
|Scalability is supported in both linear and modular form, in HBase.||There are no permissions or built-in authentication.|
|There is no concept of fixed columns schema in HBase because it is schema-less. Hence, it defines only column families.||As there is no support for SQL structure, it cannot contain any query optimizer.|
|In some time memory issues on the cluster, HBase is integrated with Pig and Hive jobs results.|
|Poor support for non-Java clients|
An open source column-oriented database management system developed by Yandex (it currently powers Yandex.Metrica, the world’s second-largest web analytics platform). Well suited for storing and uploading large amounts of small amounts of data, such as server logs, activity analytics, and so on.
|Open source.||No real delete/update support, and no transactions (same as Spark and most of the big data systems).|
|Parallel processing for single query (utilizing multiple cores).||No secondary keys (same as Spark and most of the big data systems).|
|Distributed processing on multiple servers.||Own protocol (no MySQL protocol support).|
|Good compression.||Limited SQL support, and the joins implementation is different. If you are migrating from MySQL or Spark, you will probably have to re-write all queries with joins.|
|SQL support (with limitations).||No window functions.|
|Good set of functions, including support for approximated calculations.|
|Great for structural log/event data as well as time series data (engine MergeTree requires date field).|
|Index support (primary key only, not all storage engines)|
A fork of the popular search engine Sphinx, the first release was released in 2017, it has a high speed, supports JSON, can work in HTTP server mode, and supports SQL-like query syntax. Well suited both for storing data that needs quick access, and in search engine mode.
|Open source.||Supports only positive integers for grouping/filtering.|
|Fast search on large (multi GB) datasets.||Not quite as simple as ALTER TABLE ADD FULLTEXT.|
|Very fast indexing and small index size.|
|Good relevance, can be tuned even further with field weighting and proximity limits.|
|Support for morphology; soundex & porter stemmers (EN & RU) included.|
|Simple data replication.|
One of the most popular key-value databases, which are usually used for caching the content of web pages and as well as for organizing a queuing system, for example, for chat applications. It can work in cluster mode, which, combined with an emphasis on in-memory, allows you to achieve very high speed and high availability.
|Open source.||Redis is an in-memory database, which means that the whole dataset should reside in the memory (RAM).|
|In-memory cache.||Isn’t intended for rich queries since it is a key-value database.|
|Pub/Sub – this is a special feature of Redis that uses the publish/subscribe messaging paradigm and it is suitable for implementing chatting related use cases.||Supports only basic security options.|
|Besides String data type, it supports more complex data structures such as lists, hashes, sets and sorted sets.||Persistence can impact performance since Redis will use memory dump to create snapshots used for persistence.|
|Provides a master-slave distributed system called Redis Sentinel to guarantee high availability.|