We're hiring!

We found a long-lived MariaDB bug in our hosting stack

Article by Erlend Eide. Published on October 1, 2018

Last week we discovered a bug in MariaDB which has been with us on Servebolt for a very long time, and which will affect all others that use the same config setting. The bug has had a very negative impact on certain database queries. We rolled out a platform-wide workaround for the problem last week, and contributed information about the problem to MariaDB. We are now waiting for the amazing MariaDB team to get the bug fixed for one of the upcoming releases, and solve this for anyone that might have used the same configuration option.

We discovered the bug while moving in MyThemeShop.com. They have hundreds of thousands of users living in their WordPress. This makes a very large database, and especially the _user and _usermeta tables. WordPress Admin does several full count queries and a variety of join queries on these tables, on several different admin pages. These queries represent the typical scenario where indexes help speed up the query time a lot.

The symptoms of the bug were quite severe for the queries that were affected. We have not seen any front-end performance degradations for neither WordPress / WooCommerce or Magento, but certain back-end queries led us to the identification of the bug.

The bug and how it manifested itself

The bug is enabled in MariaDB if the innodb_stats_method is set to nulls_ignored. The MariaDB default config has this set to nulls_equal, so standard setups should not be affected. Our workaround for the problem has been to temporarily change this setting to the MariaDB default.

We opened a ticket with a detailed description and the way to reproduce the problem with MariaDB last week. The problem was that cardinality for indexes was equivalent to the table length, which made the database select the wrong indexes to use for certain queries. Cardinality is essential for the database to make the right decision on how to make a specific query as fast as possible. We also found that this bug has previously exited in the MySQL project, so this has somehow come back into MariaDB.

Join queries with large tables would suffer badly. Such queries are common in both cron jobs and certain pages in the back-end of WordPress and Magento. Cron jobs can easily go under the radar, as it is quite normal for large websites to run heavy queries in maintenance tasks. However, in the admin backend of WordPress – the problem manifested itself by using 10-20x more time than on otherwise configured databases.

Queries that were supposed to complete in a few seconds, and after that be cached in the query cache – spent more than a minute before the results were ready, and did not get cached.

Our most sincere apologies!

The Servebolt team is used to seeing a lot of “wrong” use of databases. Queries that do not use indexes is one of the most common database-related problems that plugins or developers make. Therefore, we might have responded to several of our clients that have experienced this problem that “your application is making bad database queries that do not use indexes correctly”, while the slowness has been caused by this bug in MariaDB. Sorry!

It is also possible that we have lost a few clients over this, because their front-end has been super performant like it always is on Servebolt, while they have discovered irregular slowness in the back-end, and not pointed this out to us.

The work-around has eliminated the problem – and for those who we have wrongly accused of using databases the wrong way, we are sincerely sorry. For those of you who might have tested us, and experienced any slowness – please come back and test again, because this issue is now resolved!

The upside of all of this is, that databases now work even faster and better than ever before, for all Servebolt hosted websites – and that the fantastic open source project MariaDB continues to improve!