Apr 20

MySQL is a very powerful and popular database server for web applications. It is an open source project with a big community, and a lot of people work on the performance of this server to make their applications faster even on small machines. This article deals with server tuning, especially the tuning you can do in MySQL's my.cnf configuration file. Be aware that the performance of a database server depends on more factors than tuning via the configuration file. This article is part of the web-application performance series.

Query cache

The query cache stores the text of SELECT statements together with their result sets. When a query comes in, the query parser parses it, then the query optimizer creates an execution plan (based on heuristics) and executes the query; if the same statement arrives again, the cached result can be returned without repeating that work. This cache is a very powerful tool that can bring an enormous speedup. Be aware that the query cache is case-sensitive, which means it differentiates between “SELECT” and “select”. A query that returns the same result set but starts with “SELECT…” in file A and with “Select” in file B gets two different entries in the query cache. This is bad, because the unnecessarily occupied space cannot be used for another important query. Below I have listed some important variables to control the behavior of the cache:

Query cache status

To get an idea of how the cache is performing, you can type:

SHOW STATUS LIKE "qc%";

into the query window or the shell to get status information about the cache. Here is the Qcache status information of my database server:

Variable_name Value
Qcache_free_blocks 493
Qcache_free_memory 20752864
Qcache_hits 130628
Qcache_inserts 246940
Qcache_lowmem_prunes 0
Qcache_not_cached 203444
Qcache_queries_in_cache 1160
Qcache_total_blocks 2885
  • Qcache_free_blocks: number of free memory blocks in the cache
  • Qcache_free_memory: amount of free memory
  • Qcache_hits: successful reads from the cache
  • Qcache_inserts: number of queries inserted into the cache
  • Qcache_lowmem_prunes: number of queries that were deleted from the cache because query_cache_size was reached
  • Qcache_not_cached: number of non-cached queries
  • Qcache_queries_in_cache: number of queries currently in the cache
  • Qcache_total_blocks: total number of blocks in the query cache

query_cache_type

This variable accepts three values.

  • 0 – The cache is turned off; nothing is cached.
  • 1 – The cache is turned on; everything is cached except queries that use non-deterministic functions such as NOW() or RAND()
  • 2 – The cache works on demand; it only caches queries that are marked with SQL_CACHE, for instance (SELECT SQL_CACHE id, name, telnr FROM employee;)

Value 2 is only useful if you run a lot of queries with large result sets that are merely needed to display statistics in a backend. On a production system you should cache only the queries you really need.

query_cache_size

query_cache_size sets the amount of memory the query cache may allocate. To find an optimal value you have to study the behavior of your server. In some cases 16M (16 megabytes) is enough, but some applications need more. To optimize this, check the status of the cache as described above and keep your focus on Qcache_lowmem_prunes, Qcache_free_blocks and Qcache_free_memory.

query_cache_limit

This variable sets the maximum size of a single cache entry, i.e. of one cached result set. The right value depends – again – on the size of your result sets. If your queries return very large result sets, it can be useful to increase this value. There is no general rule, but for small result sets 1M or 2M is a good value.
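A small my.cnf sketch combining the three query cache settings discussed above; the values are only examples and have to be tuned against your own cache status:

#query cache sketch - example values, adjust to your workload
query_cache_type = 1
query_cache_size = 32M
query_cache_limit = 1M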

Key buffer

The key buffer is the amount of memory MySQL can use to hold index blocks (for MyISAM tables). This buffer is very important and its value should be rather high (depending on your application), because the indexes of large tables can be large too. Study the server status output for more information. Important variables are Key_reads (number of key read operations that had to go to disk) and Key_writes (number of key write operations that had to go to disk). Both values should be as small as possible.

key_buffer_size = 32M
#for large tables
key_buffer_size = 512M

Table cache

The table cache keeps open tables in memory to allow quick access, so the server does not have to reopen table files over and over again. Keep your focus on Open_tables and Opened_tables. Open_tables is the number of tables currently open in the cache, i.e. currently held in memory. Its counterpart Opened_tables is the total number of tables that had to be (re)opened since server start. Your task is to increase the table cache until Opened_tables stops growing. That is the ideal case; in real-life applications this counter will never stay at zero.

#for small applications
table_cache = 128
#for huge applications
table_cache = 256

Some database administrators think that FLUSH TABLES is a good alternative to increasing the table_cache. This is a common mistake. FLUSH TABLES only closes all open tables and empties the table cache. That can be useful to reset the cache, but it is not an alternative to increasing the table cache.

Sort buffer

The sort buffer is used to perform sort operations like ORDER BY or GROUP BY in memory. If this buffer is too small, MySQL has to continue the sort operation on the hard drive, which is a very slow process and costs a lot of time. The indicator for the right sort buffer value is the Sort_merge_passes status variable. It shows how many merge passes MySQL needed, and it should be 0 for optimal performance. If Sort_merge_passes is not 0, you should increase the sort buffer. Be experimental and check the status of your server for sort buffer optimization.
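To check the counter, run SHOW STATUS LIKE 'Sort_merge_passes'; in the query window. If it keeps growing, raise the buffer in my.cnf; the value below is only an example, not a recommendation:

#per-session sort buffer - example value, verify against Sort_merge_passes
sort_buffer_size = 4M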

Read rnd buffer

The read_rnd_buffer is used after a sort operation to read the sorted rows. Sometimes the default value of 256K is not enough. A rule of thumb is to use 1M for every gigabyte of RAM your server has on board. If your tables are not too large, 1M is a good value.

read_rnd_buffer_size = 1M

Tmp tables

Temporary tables are important for getting a result set quickly. If an in-memory temporary table becomes too large, MySQL converts it into a temporary table on disk. That is really slow and not acceptable for high-performance applications. The value of tmp_table_size should therefore be generous. Created_tmp_disk_tables is the counterpart of tmp_table_size and counts the temporary tables that MySQL had to create on disk. Your goal is to keep this value at a minimum. After watching the current value of Created_tmp_disk_tables you can decide whether to increase tmp_table_size or not.

#default value
tmp_table_size = 32M
#for larger temporary result sets, this value would be better
tmp_table_size = 64M

Thread cache

Every connection to the MySQL server is handled by a thread. The thread cache comes into play when a client connection is closed: instead of destroying the thread, MySQL keeps it in the cache to avoid the overhead of instantiating a new one, and reuses it to handle the next incoming connection. You can check the performance of your thread cache by looking at Threads_cached and Threads_created. Threads_cached is the number of threads in the cache and Threads_created is the number of threads created so far. Increase thread_cache_size to improve performance. Note: the thread cache was built to handle high connection concurrency in a fast way.
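A my.cnf sketch for the thread cache; the value is an assumption for a moderately busy server, so watch Threads_created over time to find your own:

#number of idle threads kept for reuse - example value
thread_cache_size = 16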

Join buffer

If your query contains a join, the MySQL server builds the joined result using the join buffer whenever no index can be used for the join; a full join in particular needs a large join buffer. Increase its size so that the server does not have to fall back to temporary tables on disk, which would cost a lot of performance. Set join_buffer_size to a generous value. Joins are also a frequent entry in the slow query log, so try to avoid overly large queries and check the slow query log to detect slow ones.
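A hedged my.cnf example; note that the join buffer is allocated per join that cannot use an index, so don't set it excessively high:

#buffer for index-less joins, allocated per join - example value
join_buffer_size = 2M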

Max connections

max_connections is the maximum number of simultaneous connections the server accepts. If you have high concurrency on your database server, increase this value so that all clients can be served at the same time. But be aware that the number of file descriptors grows with the number of simultaneous connections, so be careful with this variable to avoid mishandled connections. The right value depends on the hardware in your box.

#for small applications
max_connections = 500
#for larger applications
max_connections = 5000
Apr 20

The scope of MySQL optimization is very wide. One aspect of this complex field is query performance, and this article deals with it. MySQL provides powerful tools that allow you to look inside the query parser and optimizer. Some MySQL server and SQL knowledge is required to follow along. Please be aware that this article is just an overview of query performance and cannot cover all aspects of it. This article is part of the web-application performance series.

A simple calculation

On MySQL.com, MySQL AB (the company behind MySQL) published a simple formula to estimate query cost:

log(row_count)/log(index_block_length/3*2/(index_length+data_pointer_length))+1
  • row_count: number of rows in your table
  • index_block_length: normally 1024 bytes
  • index_length: key value length (usually 3 bytes)
  • data_pointer_length: data pointer (usually 4 bytes)

The result of this formula is an estimate of the number of disk seeks needed for a single key lookup. The formula is not valid for all queries.
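As a worked example, for a table with 500,000 rows, the default index block length of 1024 bytes, a 3-byte key and a 4-byte data pointer, the estimate comes out at roughly 4 disk seeks:

log(500000) / log(1024/3*2 / (3+4)) + 1
= 5.70 / 1.99 + 1
≈ 3.9, i.e. about 4 seeks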

Using EXPLAIN

EXPLAIN is a powerful tool for analyzing query performance. You can prefix any query with it:

EXPLAIN SELECT * FROM employee WHERE id = 18273;

After sending this query to the server, the result set contains information from the MySQL optimizer. The optimizer analyzes the query and decides which execution plan would be the fastest and/or most cost-effective. Check as many queries as possible to make sure you don't overlook an unindexed or slow query.

The output of the optimizer contains the following columns (an illustrative example follows the list):

  • table: involved tables (if you do a join, then this table contains more than one entry)
  • type: the access type (e.g. SYSTEM | CONST | EQ_REF | REF | RANGE | INDEX | ALL, listed here from fastest to slowest)
  • possible_keys: all possible keys for this query
  • key: finally used key
  • key_len: the length of the key (the length correlates with the performance)
  • ref: shows which columns or constants are compared against the index named in key
  • rows: estimated number of rows to check
  • Extra: additional information about how MySQL executes the query
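For the employee query above, a purely illustrative EXPLAIN output could look like this, assuming a primary key on id (the values are made up):

table     type   possible_keys  key      key_len  ref    rows  Extra
employee  const  PRIMARY        PRIMARY  4        const  1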

If MySQL does not use an index in the execution plan that you want it to use, you can force MySQL to use this index with:

SELECT lname, fname, birthday FROM employee FORCE INDEX(lname)...

LIMIT

You should use LIMIT if your result set contains a lot of rows; it saves a lot of time. For example, you can build the query after your pagination variable has been evaluated:

SELECT lname,fname FROM employee LIMIT 0, 30

or with the application of a pagination:

SELECT lname, fname FROM employee LIMIT ($page * $rows_per_page), $rows_per_page

Caching queries effectively

MySQL implements a query cache. This cache stores queries together with their result sets. As long as the underlying tables do not change (and memory permits), the query cache keeps the entries. The most common mistake with the query cache is ignorance of how it works: it is case-sensitive! This can be tricky if you use the same query more than once. Establish query coding standards in your application to keep the cache effective, efficient and economical.

SELECT * FROM employee WHERE id > 10 AND id < 10000

is not the same as:

select * from employee where id > 10 AND id < 10000

Split up complex queries

At a certain level, some queries can become really slow. This happens especially with joins, correlated subqueries, large tables and similarly complex constructions. In such cases you can achieve better performance by splitting the one query into several. Run these queries through EXPLAIN to check which ones perform well and which do not. But before splitting a query up, the first option should be to check the indexes and the query cache.

SQL_CACHE vs SQL_NO_CACHE

If you are performing statistical queries, for example to update user statistics, you should use the SQL_NO_CACHE directive. This directive belongs to the query cache: with query_cache_type set to 1 it excludes a single query from the cache, while with query_cache_type set to 2 you would instead mark cacheable queries with SQL_CACHE. Use SQL_NO_CACHE for all queries whose results your application does not need again, because the space of your query cache is limited and shouldn't be wasted.
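A short example of both directives (the column names are made up for illustration):

-- excluded from the cache, e.g. a one-off statistics query
SELECT SQL_NO_CACHE COUNT(*) FROM employee WHERE last_login > '2009-01-01';
-- explicitly cached when query_cache_type = 2 (on demand)
SELECT SQL_CACHE id, name FROM employee WHERE id = 42;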

LIKE with %

‘%’ is the wildcard character in MySQL. A LIKE query can still perform fast if there is an index on the column:

SELECT * FROM employee WHERE lname LIKE('New%');

This works fast if you have an index on lname. The example below does not work fast, because the index cannot be used:

SELECT * FROM employee WHERE lname LIKE('%man');

So, try to avoid the wildcard at the beginning of the LIKE clause. A workable option can be to store the values of this column reversed, as the sketch below shows.
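A sketch of the reversing trick; the column lname_rev is hypothetical, has to be maintained by the application or a trigger, and needs its own index:

-- store the reversed value once ...
UPDATE employee SET lname_rev = REVERSE(lname);
-- ... then the suffix search becomes a prefix search that can use the index on lname_rev
SELECT * FROM employee WHERE lname_rev LIKE CONCAT(REVERSE('man'), '%');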

Other considerations

  • avoid calculated comparisons between values
  • don’t use DISTINCT and GROUP BY together
  • avoid correlated subqueries
  • use the FULLTEXT search and avoid LIKE
Apr 17

In this short article I focus on a specific field of PHP optimization: the php.ini.

The php.ini file is the configuration file of PHP. It contains directives and settings that affect the execution of PHP scripts. The possible modifications you can make to increase execution speed are quite specific, so you have to decide which of the settings listed below make sense for you and your application. This article is part of the web-application performance series.

realpath_cache_size & realpath_cache_ttl

PHP uses a realpath cache to cache the resolved paths of included/required files. This cache was introduced in PHP 5.1.0 and brought a big speed improvement. The default size of 16K is a really good starting point.
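The corresponding php.ini lines could look like this (the values shown are the usual defaults; applications with many includes may benefit from a bigger cache):

; size of the realpath cache and how long entries stay valid
realpath_cache_size = 16k
realpath_cache_ttl = 120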

register_globals

The default value since PHP 4.2.0 is Off; nevertheless you should convince yourself that it isn't set to On. On the one hand, register_globals is a well-known security issue, and on the other hand PHP then creates all variables from $_GET, $_POST, $_COOKIE, $_REQUEST, $_SERVER and $_SESSION as global variables too. This costs an unnecessary amount of time!

always_populate_raw_post_data

If always_populate_raw_post_data is set to On, PHP fills $HTTP_RAW_POST_DATA with the raw data that comes in with POST. Turn this directive Off if you're not planning something special with it; it saves time and memory.

register_long_arrays

This directive copies all request data into the outdated $HTTP_*_VARS arrays, even though the same data is already accessible via $_GET and friends. You should turn register_long_arrays Off to save time and memory.

expose_php

With expose_php enabled, PHP adds an information string to every response that tells the world which PHP version is installed. Set this directive to Off to avoid an unnecessary string copy. In addition, it is safer to reveal only the necessary information and not more.

register_argc_argv

If you run your PHP scripts behind a web server, you don't need argc (number of command line arguments) and argv (array containing the command line arguments). If register_argc_argv is set to On, PHP tries to parse and copy these variables into the symbol table of the script, which takes unnecessary time. Turn it Off to save time.

asp_tags

The asp_tags directive makes PHP also recognize ASP-style tags like <% and %>. Using ASP tags in PHP scripts is not best practice. If you don't use these tags, you should turn the asp_tags directive Off to save time.
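To sum up, here is a php.ini sketch with all the settings discussed above; check that they match the needs of your application before copying them:

register_globals = Off
always_populate_raw_post_data = Off
register_long_arrays = Off
expose_php = Off
register_argc_argv = Off
asp_tags = Off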

Apr 15

Lighttpd (often called Lighty) is a small and popular web server. Some large web applications use lighttpd to serve their content (YouTube, imageshack.us, MySpace). Its counterpart is Apache2, which is a process-based web server; the big advantage of Lighty is its event-driven model, which keeps request handling really slim. This article gives a short introduction to speeding up Lighty. It is part of the web-application performance series.

Maximize the number of Connections

Lighty allows at most 1024 open file descriptors by default. You can modify this in the configuration file lighttpd.conf. To run the server in a production environment under heavy load you have to increase this value; set it to 2048 or 4096 (depending on your hardware). Be aware that a single PHP request needs 3 file descriptors in lighttpd, so you can calculate how many file descriptors you need to serve your load. The 3 file descriptors are used for handling TCP/IP with the user, the socket to FastCGI, and checking whether a file exists.

#number of open file descriptors
server.max-fds = 2048

stat_cache

Lighty has intelligent stat handling. The stat command is used to get file system information about a specific file. The stat_cache caches these requests to avoid an unnecessary number of stat calls and the associated accesses to the hard drive. Another option for this kind of monitoring is FAM (File Alteration Monitor), which comes with a daemon that watches the files. You can enable the stat_cache with the following line in your configuration file:

#enable the simple stat_cache
server.stat-cache-engine = "simple"
#or with a running FAM daemon
server.stat-cache-engine  = "fam"

Keep-Alive vs. Close

If you are running a web application with high concurrency, you have to think about keep-alive requests. It is not good to keep your file descriptors alive and let them idle; you waste a lot of resources that way. It can be a huge performance improvement to set server.max-keep-alive-requests to 0 and avoid unused file descriptors and threads.

server.max-keep-alive-requests = 0

Using XCache

XCache is an opcode cache developed by the lighttpd developers. It improves the performance of lighttpd + PHP by caching the opcodes in a shared memory segment (SHM), so the interpreted PHP file comes directly from RAM. This is roughly 5 times faster, or more, than a plain PHP request on lighttpd.

Important configuration settings are xcache.ttl, xcache.size, xcache.cacher and xcache.optimizer. xcache.ttl is the time to live of a cached opcode entry before it is marked as invalid; if it is set to 0, the time to live is endless and the entry is never invalidated. The xcache.size option sets the total size of the opcode cache (the memory-mapped file). To turn the cache on or off you can use the xcache.cacher directive with "on" or "off".

Below you can see an example:

xcache.shm_scheme =                "mmap"
xcache.mmap_path =              "/var/cache/xcache.mmap"
xcache.size  =                  64M
xcache.count =                  1
xcache.ttl   =                  0
xcache.slots =                  8K
xcache.gc_interval =            0
xcache.readonly_protection =    Off
xcache.cacher =                 On
xcache.var_gc_interval =        300
xcache.stat =                   On
xcache.optimizer =              On
xcache.var_count =              1
xcache.var_size  =              0M
xcache.var_slots =              8K
xcache.var_ttl   =              0
xcache.var_maxttl   =           0

Mod_compress

This module is used to compress files with gzip, bzip2 or deflate. Compressing files reduces the amount of data transferred and therefore your bandwidth usage. Be aware that compression costs time and CPU load; to save CPU time, the results of the compression can be cached. For that, mod_compress needs a few configuration settings:

compress.allowed-encodings = ("bzip2", "gzip", "deflate")
# you have to create the cache directory! the web server
# is not able to do that for you
compress.cache-dir = "/var/www/cache/"
#add all file types you need (pictures, PHP output, etc.)
compress.filetype           = ("text/plain", "text/html")

For more information, check the manual.

Mod_expire

Mod_expire is used to control the cache headers sent by lighttpd. To cache your static files, this module is all you need. It accepts directives like the following:

expire.url = ( "/images/" => "access plus 20 days" )

To learn more about the syntax and how to use this module check out the manual.

Apr 14

This article introduces you to the optimization of the most-used web server, Apache2. It gives an overview of optimizing the configuration file and shows you how to enable useful modules. Of course there are other optimizations for Apache2, for instance code optimization, but they are out of the scope of this article. This article is part of the web-application performance series.

Hostname lookups

If HostNameLookups is set to On, Apache2 does a hostname lookup for every client IP. You usually don't need this functionality: it has no impact on the response and it costs a lot of time. So, set this directive to Off.

HostNameLookups Off

Keep-alive

Keeping an HTTP connection alive means leaving the file descriptor open to handle the next request (from the same user) faster. The idea of keep-alive is nice, but it only makes sense if you have a small number of users. Otherwise you have open file descriptors that consume resources unnecessarily. So it can be better to turn this mechanism Off to serve a high number of requests faster. Modify the configuration file:

KeepAlive Off

Adapt workers

The optimal number of workers (multi-thread and multi-process module) is important for a well-working web server. You can have too few workers on the one hand and too many on the other; neither case is desirable. The problem is that you have to test the server under realistic load. You can control the behavior of Apache2 with the following calculation:

# ServerLimit * ThreadsPerChild = MaxClients
ThreadLimit 50
ServerLimit 30
StartServers 5
MaxClients 1500
MinSpareThreads 30
MaxSpareThreads 50
ThreadsPerChild 50

The right settings depend on the power of the hardware. If you have a server that has only one task (answering HTTP requests), you can increase these settings to make the server more powerful.

Apache benchmark

To test your web server configuration and performance you can use ab (Apache Benchmark), which is delivered with Apache2 (if not, install it separately). You can use the benchmarking tool like this:

ab -n 1000 -c 100 http://example.com/
  • n: number of total requests
  • c: number of concurrent requests
  • url: url to test

The tool sends 1000 requests in total, 100 of them concurrently (roughly 10 waves of 100 requests). Note that you can plot the output with gnuplot.

Mod_expires

Apache2 handles the Expires header with mod_expires. The Expires header is an HTTP header that tells the browser how long the transferred file stays valid. You can enable this module by typing:

 a2enmod expires

into the shell. After that you can configure the web server (httpd.conf):


ExpiresActive On
ExpiresByType text/html "access plus 2 hours"
ExpiresByType text/xml  "access plus 2 hours"
ExpiresByType image/jpg "access plus 10 weeks"
ExpiresByType image/gif "access plus 10 weeks"
#add all types of files that you need

The web server then adds the Expires header to the files and automatically calculates the timestamp according to your settings.

Mod_headers

This module is used to append headers to the HTTP response. After enabling mod_headers with

a2enmod headers

you can use the functionality by adding the following line to the configuration file:

Header append Cache-Control "public"

Mod_deflate

This module is used to compress the server output. You have to enable it with a2enmod, and after that you can use it by adding this to the configuration file:

SetOutputFilter DEFLATE

AddOutputFilterByType DEFLATE text/plain text/html text/htm
AddOutputFilterByType DEFLATE application/javascript
#add all types that you need

Additional

In addition, Apache2 has some more modules to improve the performance of the server. Below I have listed these modules with links to the documentation:

Apr 13

This article deals with some PHP code optimizations. The list of tips and tricks is by no means complete and I will try to add more tests. I ran these tests on Windows and Linux environments to make sure the results are roughly valid. It is really important for a good PHP developer to be aware of some performance aspects of the code. The following tips are best-practice examples of writing smart PHP code. This article is part of the web-application performance series.

Echo vs. print and concatenation

The comparison of the two output options is the oldest in the PHP scene. It is wrong to assert that echo is faster than print because of procedural overhead in the print instruction: both are built-in constructs of the PHP language, not, as often assumed, print a function and echo a language construct. The only difference between echo and print is that echo has no return value while print always returns 1 (see: manual). The next problem is the often unknown difference between outputting a couple of substrings with echo and concatenating the substrings into one longer string. The first variant uses the comma operator, the second one the dot operator. Using echo with the dot operator is the common case, but it is not best practice.

$str1 = "Test1";
$str2 = "Test2";

echo "Hello world, this is " . $str1 . " & " . $str2 . "!";

If your substrings are long, the concatenation will take a long time. Below you can see the best-practice example:

$str1 = "Test1";
$str2 = "Test2";

echo "Hello world, this is " , $str1 , " & " , $str2 , "!";

Quotes

The myth of optimizing quotes is often not welcome in the PHP community, but benchmark tests show that single quotes are a bit faster than double quotes. The reason is that in double-quoted strings the parser looks for variables that might be embedded there, and this takes more time than parsing plain text. You should use double quotes only when you actually want to put variables into the string.
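A tiny example of the difference (both lines produce the same output):

$name = 'world';
echo "Hello $name";    // double quotes: the parser scans the string for variables
echo 'Hello ' . $name; // single quotes: plain text, the variable is concatenated explicitly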

Search strings

We compared strstr() and strpos() with each other; the task is to find a substring in a string. Below you can see the two variants:

$sub = '@';
$str = 'user@example.com'; // placeholder address

//the first way
if(strstr($str, $sub) !== FALSE) echo 'found';
else echo 'not found';

//the second way
if(strpos($str, $sub) !== FALSE) echo 'found';
else echo 'not found';

After testing these two ways in a loop, I noticed that strpos() works faster and uses less memory than strstr().

Comparison of the starting characters

To compare the first n characters of two strings you have two options: strncmp(), which is the smarter one, or substr(). The result should be clear: with just one PHP function call the script performs faster. The best-practice variant is strncmp():

$str1 = 'Teststring1';
$str2 = 'Test_string1';
$len = 4; //compare the first 4 characters

if(strncmp($str1, $str2, $len) === 0) return true;
else return false;

if(substr($str1, 0, $len) == substr($str2, 0, $len)) return true;
else return false;

The result of the test shows that strncmp() is faster than the substr() alternative, and the advantage of strncmp() increases with the length of the string. Another scenario would be to use strpos(…) like strncmp() to compare the starting characters. I did this test too, and the result shows that strncmp() was still faster than strpos() (but not significantly).

Counting operations (count(), sizeof() und strlen())

If you want to know the number of elements in an array, you can use sizeof() (or count()) to get it. It is common, but not best practice, to call these functions in the condition of a loop. It is better to get the number of elements before the loop starts; otherwise you count the elements in every single iteration. Note that count() is marginally faster than sizeof(), since sizeof() is just an alias for count().

//wrong way
for($i = 0; $i < count($arr); $i++){
    //magic
}

//right way
$size = count($arr);
for($i = 0; $i < $size; $i++){
    //magic
}

Read loops and modify loops

Often you have to modify data in complex structures such as arrays. Here you can use PHP's foreach() construct. Be aware that the performance of this code depends on the operation you want to perform. After testing a set of operations (reading, modifying, unsetting) I noticed that read operations perform better with foreach(), while modifying an array works better with a for loop. The reason is that PHP does more hash lookups if you iterate with foreach() and $key => $value. Note that the foreach() loop performs faster if you take $value by reference.

//best practice for reading
foreach($arr as &$value){
    //magic with $value
}

To modify an array you should use something like this.

//best practice for modification
$keys = array_keys($arr);
$c = count($keys);

for($i = 0; $i < $c; $i++){
    //magic with $arr[$keys[$i]]
}

Testing array index

To find out whether an array contains a certain element, you can use the in_array() function, or you can use isset() to test whether a key exists. in_array() expects two parameters: $needle (the value to find) and $haystack (the array); PHP searches $haystack for $needle. The isset() function (which just expects the variable) simply looks up in the symbol table whether that index is set or not.

in_array('test', $array);
isset($array['test']);

I found out that isset() is faster than in_array(), and the improvement is not insignificant: the speed advantage grows proportionally with the size of the array.

File handling

You have more than one option for file handling in PHP. A simple read from a file can be done with file() or file_get_contents(). Be aware that the first one returns an array with all lines; you have to implode this array with some glue (e.g. a whitespace) to use it as a string. The second function returns exactly this string. In most cases you need the content as a string, so use file_get_contents() for that job; it is more than 200% faster (depending on the length). If you need the individual lines for your task, then it is faster to use the file() function.
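A short comparison of the two functions (the file name is a placeholder):

// returns the whole file as one string - usually what you want
$content = file_get_contents('data.txt');

// returns an array of lines; join them only if you really need a single string
$lines = file('data.txt');
$content = implode('', $lines);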

Replacing strings

If you want to replace strings in PHP you can use str_replace() or preg_replace(). For simple patterns you should use str_replace(); it is much faster than preg_replace() because it does plain literal matching. But beware: preg_replace() is much faster if you have a complex pattern, since one call to preg_replace() is smarter than two or more calls to str_replace().
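For example (both calls operate on $html, which is assumed to hold some markup):

// simple literal replacement - str_replace() is the faster choice
$html = str_replace('<br>', '<br />', $html);

// complex pattern - one preg_replace() beats several str_replace() calls
$html = preg_replace('/\s+/', ' ', $html);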

Constants & configuration

Defining global constants is a powerful way to bring some flexibility into large web applications. For this purpose PHP provides the define() function. Since PHP supports OOP you can also define constants in classes. That approach is not only better structured but also faster than the define() alternative, and it is more comfortable to build a configuration class that contains all constants.

//old way
define('DB_HOST', 'localhost');
define('DB_USER', 'mysql');
define('DB_PW', '**************');
define('DB_DB', 'application');

//other way
class DB{
    const HOST = 'localhost';
    const USER = 'mysql';
    const PW = '**************';
    const DB = 'application';
}

Statistical probability (LazyEnd)

Sometimes you have to make decisions about the program flow of your application. Here you get a significant performance gain if you let the parser evaluate the cheap functions before the expensive ones.

if(expensive_function() || small_function()){
    //magic
}

//is significantly slower than

if(small_function() || expensive_function()){
    //magic
}

But why? The parser evaluates from left to right. When the function on the left returns true, the complete condition is already true and the second function is not executed, because it is not necessary. This works for the logical AND too: there, the complete condition becomes false as soon as one expression is false. So put the small function or the fastest expression on the left side.

Another statistical optimization aspect is to think carefully about the structure of if-else and switch statements. Keep the conditions that are most likely at the top of your if/else or switch construct to avoid work and save time.

If vs. switch vs. ternary operations

The general opinion about if vs. switch is that the if/else construct has the better implementation and therefore performs faster. I tried to shed some light on this by browsing the code and the internals of PHP. It is true that the if/else implementation looks smarter, so I can confirm the general opinion. But there is a construct that works even faster than if/else: the ternary operator. Below is an example of its simple usage (a smarter handling of if/else):

return ($condition)? $if_value : $else_value;

Increment operations

To increment an integer variable (e.g. during a loop iteration) you have a few options. PHP provides the pre- and post-increment operators for that, and here we put the spotlight on their performance. The fastest method is the pre-increment ++$i, because the increment happens first and PHP then returns the new value; this is done in roughly one step. The second fastest is the post-increment $i++, for which PHP uses a different procedure internally. The slowest is the ordinary assignment $i = $i + 1, because PHP has to handle a general addition that could involve any variable or number, which makes the operation trickier and slower to execute.

Dec 01

WordPress. A word that stands for an established system with a nice backend for writing and editing articles, but also for a frontend that is slow and heavy. On cip-labs I developed a cache for this system. The cache is very easy to install and does its caching in a very, very simple way.

How it works

It works quite simply. I modified the index.php of the WordPress installation. When a request comes in, an object of CIP_WP_Cache is created; this object represents the cache. It looks into the cache directory and tries to find the requested file. The files are stored in a directory defined by CIP_CACHE_DIR and are named like md5($_SERVER['REQUEST_URI']). If the file is found in the cache, the cache responds with it and then shuts down; the benefit is that the WordPress engine is never reached during the whole request. If the file is not found in the cache, CIP_WP_Cache captures the output buffer of the WordPress engine and stores the result in its cache, and for the next requests to this URI the cache serves the file from the cache.

That’s it; it does nothing more and nothing less. Of course you need more functionality to control the cache, and I will write a plugin for WordPress that lets you control its behavior. Below are a few benchmark results from my recent tests:

Benchmark

The results of the benchmark are clear: the heavy WordPress engine (avg. 13 MB RAM usage) battles against a static file. I used ab -n 1000 -c https://cip-labs.net/ to test the performance of the cache. I think the specification of the benchmark environment does not matter much for this benchmark.

without cip-cache

  • time taken for requests: 311.653 sec
  • requests per second: 3.21

with cip-cache

  • time taken for requests: 1.897 sec
  • requests per second: 526.88

To sum up this benchmark, you can see that cip-cache serves much faster. The cache works with a TTL directive (called CIP_CACHE_TTL) which you can configure in the cip-wp.php file. When cip-cache has to create a cache file, the request runs through the WordPress engine and the output buffer is then written to the cache directory.

How to Install

First, overwrite the index.php of your WordPress installation with the index.php from the source package (copy both files, cip-wp.php and index.php, into the directory that contains the index.php). The second step is to configure the cache with the CIP_CACHE_DIR and CIP_CACHE_TTL directives. Create a directory for the cached files and make sure that all required permissions are set (777).

  • CIP_CACHE_DIR: should be the directory that includes the generated cache files
  • CIP_CACHE_TTL: time to live (in seconds)

That’s it.

Wishlist

  • control the cache behavior with a small Wordpress plugin (clear cache [item or all], set TTL, set cache dir, add/remove routes to control paths to cache)
  • cache only for readers and not for admins or logged-in users, to avoid HTML fragments that should not be displayed in the readers’ view

goto: project page

goto: download version 1.0.0

Sep 22

For a long time I have been really interested in the performance aspects of content management systems and smaller blogging systems, and that’s the reason for the decision to put some of the popular systems on a workbench and gather some technical information about them. The test was quite simple: I installed the packages on my sandbox server and added a function (cip-bench()) to each installation. Then I ran the index page with the default template and configuration. The data I collected was limited to the raw index page right after installation. I picked 5 aspects for the test: the memory usage of the system, the execution time, the executed database queries, how many database tables exist, and how many files are required. It is interesting to see how differently some CMS solve their tasks. I was surprised by some results, for example the 399 database queries of Contenido.

To sum up this test, I was impressed by chyrp. It is delivered with an elegant backend and I think it has a lot of potential to become more popular and well known. The memory usage of WordPress seems to have improved compared to previous versions.

blog

name | memory | avg. time | queries | tables | required files
chyrp | 5.556 MB | 0.3 – 0.5 | 7 – 10 | 8 | 63
geeklog | 6.97 MB | 0.6 – 0.7 | 59 | 50 | 38
serendipity | 6.773 MB | 0.5 – 0.55 | 11 | 21 | 48
textpattern | 2.823 MB | 0.2 – 0.3 | 21 | 17 | 12
wordpress | 12.044 MB | 0.4 – 0.6 | 15 | 11 | 73

cms

name | memory | avg. time | queries | tables | required files
cmsmadesimple | 7.543 MB | 1.1 – 1.48 | 38 – 52 | 52 | 92
contenido | 9.562 MB | 0.6 – 0.9 | 254 – 265 (399) | 76 | 123
impressCMS | 10.938 MB | 0.5 – 0.6 | 53 – 55 | 57 | 139
joomla | 6.289 MB | 0.7 – 0.8 | 7 – 11 | 33 | 127