MySQL is a powerful, open source relational database that’s been around for decades. It’s a simple and effective workhorse for data management teams. It’s been used by some of the biggest websites on the internet and has become one of the most popular databases in use today.
Unfortunately, as much as data professionals love it, MySQL isn’t perfect. In fact, scaling MySQL for analytical queries can be challenging at times because it engine was designed for transactions (OLTP), and not analytics—namely big data queries against enormous datasets.
There are ways to address these limitations, though! In this post we’ll go over some common methods for scaling MySQL so you can keep your data performant, relevant, and safe from malicious attacks.
Scale Read Capacity With Primary-Replica Configuration
MySQL can handle big data analytics to a certain point. Using Primary-REplica is usually the first approach data engineers will take, but as tables and dataset increases in size, it becomes costly. Besides reporting, analytics uses different types of queries which are not efficient with InnoDB engine. This means that solutions like Singlestore will be more appropriate. Read on to learn more about why Singlestore works best for this type of data application.
To scale read capacity, you can use replication. In a Primary-Replica configuration, there is one primary database server and multiple replicas that replicate from the main database. This means that if your data set is large enough to warrant adding more read capacity, you can add new replicas without any additional maintenance overhead on the database side of things. AWS RDS is one of the most popular managed database services in the cloud and it’s not surprising that Azure, Digital Ocean and Google Cloud started providing managed database service.
Primary-replica replication also has some drawbacks. The biggest risk with using this type of configuration is replication lag; if something happens on your primary database or even during a planned maintenance event (such as upgrading MySQL), then it could take some time for your replica(s) to catch up with their transactions. This time lag depends on how much data has been inserted into or updated in those transactions since they were committed by the original transaction creator (in this case being committed by your application). Another solution to guarantee limited lag is to use AWS Aurora MySQL solution.
Another drawback of primary-replica replication is performance: because all reads must go through a single point, they will be slower than if they were coming directly from a sharded cluster where multiple nodes could serve them at once (like what we’re going to do below). However, this drawback may not matter depending on how often your queries may run and how long they take; if most queries are less than 1ms and only need one row at most times then it shouldn’t affect them too much!
Is MySQL the Best Solution for Analytical queries?
InnoDB is now the default storage engine for MySQL. This is because InnoDB is a transactional engine, which means that all transactions happen in isolation from other transactions on your database server. Additionally, InnoDB supports foreign keys, locking — handy features for ensuring consistency across tables — as well as clustering and replication. Also of note: MySQL has improved InnoDB engine partitioning, allowing it to detach partitions and move them to other tables efficiently.
However, this is not enough to optimize for the demands of analytical queries. When a database needs to ingest large amounts of data, by batches on micro-batches, there is a good chance your table will lock and you won’t be able to perform analytical type of queries. Large data batches will take time to replicate to your replica(s) so a data latency will occur and queries won’t be able to take advantage of them.
The InnoDB engine is not meant for analytical types of queries, especially complex ones. This is why the columnar DBMS was introduced a long time ago (1969) for OLAP type of queries.
Faster Queries withSSD Disk and High RPM
If you have a sufficient budget, you can opt to use a Solid-State Drive (SSD) in place of traditional hard drives. The reason you’d want this is simple: SSDs are faster than hard drives and can save you time when loading your data into MySQL. SSD disks also require less power than mechanical disks, which means they’re more energy efficient. They are also more reliable and durable because there are no moving parts inside the disk to break down over time. Finally, they have higher capacities than their mechanical counterparts so they can store more data on each drive unit at one time.
Evaluate the Indexes
Let’s talk about how best to use indexes. In some cases, indexes can be a major bottleneck for queries. If your database is over-indexed, you may be able to get it to perform faster by removing some of the redundant indexes or adding more RAM to your server so that MySQL can store more table data in memory and avoid searching through disk-based indexes.
To see what indexes are being used by MySQL during query execution (this is advanced stuff), use the EXPLAIN command:
EXPLAIN SELECT user_id FROM users WHERE firstname = ‘John’;
Efficiently Scaling MySQL Queries
If you need to scale up read capacity, consider using a primary-replica configuration. You can also scale the storage engine to InnoDB and use SSD disks with higher RPMs for better performance.
If you have indexes that are not well-designed, these can slow down queries or even cause them to fail. You should evaluate the indexes for any tables that are performing poorly and tune them accordingly.
Additionally, sharding is another way of scaling your database by distributing it across multiple nodes in your cluster so that each node has its own instance of MySQL installed on it. This allows each node to handle the full load independently while sharing data between nodes as necessary (i.e., when replicating).
Sharding, sometimes referred to as Horizontal Scaling, is a partitioning technique that splits data across multiple servers. It is used to distribute the data in a way that allows for parallel processing, and it scales out read capacity and write capacity.
For example, if you have 10 million users on your site and you are receiving 100 new user requests per second, sharding will help you distribute these requests among multiple shards so they can be processed more efficiently than if they had to go through one server. Sharding also allows you to scale out read capacity by allowing each individual shard to handle its own reads while still maintaining a central database server that handles all write operations.
Although some services provide a MySQL sharding solution, the best alternative is to use Singlestore. Singlestore supports the MySQL protocol and can scale or replace your MySQL, especially for real time analytics. Data-Sleek is a Singlestore partner and we have migrated several MySQL infrastructures to Singlestore to speed up MySQL queries running against tables with billions of rows. Contact us for more info.