The scope of MySQL optimization is very wide. One aspect of this complex field is the query performance and this article deals with it. MySQL provides powerful tools, that allow you to have a look inside the query parser & optimizer. To understand this article some MySQL server & SQL knowledge is required. Please be aware, that this article is just an overview about query performance. It can’t cover all aspects of it. This article is a part of the web-application performance series.
A simple calculation
On MySQL.com the company behind MySQL (MySQL AB) published a simple calculation to find out the estimated query performance:
- row_count: number of rows in your table
- index_block_length: normally 1024 bytes
- index_length: key value length (usually 3 bytes)
- data_pointer_length: data pointer (usually 4 bytes)
The result of this formula represents the number of disk seeks for a special query. This formula is not valid for all queries.
SELECT * is bad
SELECT * means that you need all columns from the result set. In most cases, you needn’t the full record (including all columns). But, some programmers are using the * because it is “comfortable”. Consider that a you need more (in some cases huge) space to store the data. The MySQL server spends time on things that are not necessary. In addition to that, PHP must allocate a lot of memory for data, that will never be used. So avoid using SELECT* when it is not necessary.
EXPLAIN is a powerful tool to analyze the query performance. You can use EXPLAIN for every query.
EXPLAIN SELECT * FROM employee WHERE id = 18273;
After sending this query to the server, the result set contains information of the MySQL Optimizer. The optimizer analyzes the query and decides which execution plan would be the fastest and/or cost-effective. Check approximatly all queries to make sure, that you don’t overlook a unindexed or slow query.
The output of the optimizer contains the following columns:
- table: involved tables (if you do a join, then this table contains more than one entry)
- type: type of query (e.g. SYSTEM | CONST | EQ_REF| REF | RANGE | INDEX | ALL, ordered by the fastest in a descending order)
- possible_keys: all possible keys for this query
- key: finally used key
- key_len: the length of the key (the length correlates with the performance)
- ref: shows which column or constant would used to compare the value of key
- rows: estimated number of rows to check
- Extra: additional information about how MySQL executes the query
If you want to use an index that MySQL don’t use in the execution plan, then you can force MySQL to use this index with:
SELCT lname, fname, birthday FROM employee FORCE INDEX(lname)...
You should use LIMIT, if you have a lot of rows in your result set. It saves a lot of time, if you’re using LIMIT. So, you can build your query, after your pagination variable is evaluated.
SELECT lname,fname FROM employee LIMIT 0, 30
or with the application of a pagination:
SELECT lname, fname FROM employee LIMIT ($page * $rows_per_page), $rows_per_page
Caching queries effectively
MySQL implements a query cache. This cache saves the execution plan for your queries and the results. Until the table is not changed, the query cache holds the queries in the cache (depends on the possible memory). The most common fault with the query cache is the innocence of how the query cache works. It works case-sensitive! This can be tricky if you use the same query more than one times. Then, you have to establish query coding-standards in your application, to keep the cache smart, efficently and economical.
SELECT * FROM employee WHERE id > 10 AND id < 10000
is not the same like:
select * from employee where id > 10 AND id < 10000
Split up complex queries
At a certain level, some queries can perform really slow. This will happen especially by joins, correlated queries, large tables and such complex constructions. Then you achieve a better performance with splitting this one query into more queries. Test this queries against the EXPLAIN command to check which query performs well and which not. But, be aware that before splitting a query up , the first option should be to check the indexes and the query cache.
SQL_CACHE vs SQL_NO_CACHE
If you are performing statistical queries to update user statistics and so one you should use the SQL_NO_CACHE directive. This directive is a part of the query cache. If you set the value of query_cache to 2, then you can use this directive. Use SQL_NO_CACHE for all queries, that you don’t need in your application. Because the space of your query cache is limited and shouln’t be wasted.
LIKE with %
‘%’ is known as the wildcard operator in MySQL. If you have an index on a column it can perform fast.
SELECT * FROM employee WHERE lname LIKE('New%');
This works fast, if you have an index on lname. The example below don’t works fast, because the index couldn’t be used:
SELECT * FROM employee WHERE lname LIKE('%man');
So, try to avoid the wildcard at the beginning of the like-clause. An considerable option can be to reverse the values of this column.
LIKE vs. Full-text
For simple searches you should use the LIKE construct. But, in case of a complex search (for instance a complete word), you should use the full-text search of MySQL. MyISAM is able to perform a fast full-text search on TEXT and VARCHAR() columns. The full-text indices are stored in a B-tree implementation with two attributes (VARCHAR and FLOAT). The VARCHAR-field contains the word and the FLOAT-field contains the weight of the word in this row. This indices can get really big with a large amount of data. But nontheless, this alternative is a very good performance boost.
select description, author from articles match (articles_text) against ('cip-labs');
select description, author from articles where articles_text LIKE ('%cip-labs%');
Some people prefer to use the INSERT_DELAY constant to handle all read-queries fast and after that, insert the new data. Sure, this sounds logical and optimized. But, this isn’t the case. When your load is high, the write with the INSERT_DELAY can wait up to infinity. In this case your users see data, that is deprecated.
When you send a simple SQL-statement to the server, it parses your query everytime. In MySQL this is a really costly process. After that, the MySQL optimizer creates an execution plan for this query. With prepared statements the server will do this just at the first time, when the prepared statement would be send to the server. For the following queries with just another value of the bind variable, the server skips the parsing, optimizing and creating an execution plan. For any information about prepared statements check out the MySQL manual.
- avoid calculated comparisons between values
- don’t use DISTINCT and GROUP BY together
- avoid correlated subqueries