Sunday, July 17, 2011
Database Performance Optimisation Techniques
A while back I was working on a project which involved database underperformance of a web based application. Obviously, there are many ways through which to achieve this. Some of these methods are outlined below and include modern as well as well as traditional techniques.
Of late, there has been a trend towards so called non-relational/NoSQL type databases which scale horizontaly rather than vertically. The reasoning behind these thoughts seem to be based on a paper by Edward Schaffer that seems to imply that since the UNIX/Linux shell is capable of producing almost any result required through piping of various commands and since it doesn't have to be abstracted to run across multiple operating systems it should theoretically be able to build a high performance operating system off of this particular base. The exact quote is as follows.
"The UNIX file structure is the fastest and most readily-available database engine ever built: directories may be viewed as catalogs and tables as plain ASCII files. Commands are common UNIX utilities, such as grep, sed and awk. Nothing should be reinvented.
NoSQL was born with these ideas in mind: getting the most from the UNIX system, using some commands that glue together various standard tools. Although NoSQL is a good database system, this is not a panacea for all you problems. If you have to deal with a 10 gigabytes table that must be updated each second from various clients, NoSQL doesn't work for you since it lacks of performance on very big tables, and on frequent updates you must be in real time. For this case, I suggest you use a stronger solution based on Oracle, DB2 or such packages on a Linux cluster, AS/400 or mainframes.
However, if you have a web site containing much information and more reading occurs than writing, you will be surprised how fast is it. NoSQL (pronounced noseequel, as the author suggests) derives most of its code from the RDB database developed at RAND Organization, but more commands have been built in order to accomplish more tasks."
Obviously, there are problems with using this relatively these relatively immature technologies. Research seems to indicate that depending on the implementation being used they can be hard to modify, ad-hoc reporting can be difficult to produce, and that there is a general lack of expertise within the IT community in this field since almost organisations have no need to switch towards its use and existing relational type databases offer good enough performance as well as a mature toolset and other reporting capabilities.
Although I haven't personally experimented with in memory databases there's no questioning their usefullness. Obviously, a database that runs purely from RAM is going to be faster from both an access and transfer perspective than one that requires the use of secondary storage (even though this may change with the advancement of SSD technology and RAM based cards that have somehow been modified to act as secondary storage).
"Inefficient schemas. Adding indexes can help improve performance. However, their impact may be limited if your queries are inefficient because of poor table design that results in too many join operations or in inefficient join operations. Schema design is a key performance factor. It also provides information to the server that may be used to optimize query plans. Schema design is largely a tradeoff between good read performance and good write performance. Normalization helps write performance. Denormalization helps read performance."
Due to the fact that databases are 'abstractions' that run on top of existing software/hardware the usual performance bottlenecks should be looked at. These include, hardware specification, RAID level.operating system (and tuning options), and even network issues can play a factor especially if using distributed database being used and geography starts to become a problem. Strategies in dealing with these problems are outlined in the, 'Building a Cloud Computer Service' document, http://sites.google.com/site/dtbnguyen/
Obviously the type of join as well as the number of joins can impact on performance.
Use of indexes (whether they are partial, complete over table, compound, full-text, and/or column based only).
While there are two main database engines (InnoDB, MyISAM) there are also others that cater specifically to certain types of transactions.
"Yes, each MySQL storage engine will perform differently but the result will depend on the type of transaction. Reads, Writes, Transaction size, HA requirements are a few of the factors that would effect performance for each storage engine choice.
Prior to MySQL 5.5 the default storage engine is MyISAM which performs great on reads. With MySQL 5.5 the default storage engine is InnoDB which has improved performance over prior releases and allows a nice mix of performance and ACID compliance. These are the 2 most popular storage engines but there are other storage engines like memory, blackhole, ndb and others that offer a unique value for specific applications.
Generally speaking the safe bet is to use InnoDB for most of your tables and revisit the choice after you hit a performance bottleneck and are looking to tune things. The nice thing is it will be a simple task to change the storage engine for a table and provided the table size is not too great it shouldn't take too long to change and test each storage engine for your specific application."
While I have never personally used database proxy technology it is clear that they have their place. Like other proxy type technologies though they are obviously subject to local caching problems.
Creation of temporary tables on tables where other methods are unsuitable. Can be particularly useful when you have a resource intensive query that may be executed many times but whose actual data is rarely updated.
Finally, you can conduct tuning of the actual database server software itself. Something that you need to learn especially if you are working with limited resources and you aren't actually able to alter the database itself (can often be the case if the work has been outsourced to an external contractor).
Below is the result of one of these benchmarking/tuning scripts.
[root@db1 ~]# ./tuning-primer.sh
Using login values from ~/.my.cnf
- INITIAL LOGIN ATTEMPT FAILED -
Testing for stored webmin passwords: None Found
Could not auto detect login info!
Would you like to provide a different socket?: [y/N]
Do you have your login handy ? [y/N] : y
Would you like me to create a ~/.my.cnf file for you? [y/N] :
-- MYSQL PERFORMANCE TUNING PRIMER --
- By: Matthew Montgomery -
MySQL Version 5.0.45-log i686
Uptime = 0 days 1 hrs 20 min 24 sec
Avg. qps = 58980
Total Questions = 284523063
Threads Connected = 27
Warning: Server has not been running for at least 48hrs.
It may not be safe to use these recommendations
To find out more information on how each of these
runtime variables effects performance visit:
for info about MySQL's Enterprise Monitoring and Advisory Service
The slow query log is enabled.
Current long_query_time = 10 sec.
You have 93 out of 284727673 that take longer than 10 sec. to complete
Your long_query_time may be too high, I typically set this under 5 sec.
BINARY UPDATE LOG
The binary update log is NOT enabled.
You will not be able to do point in time recovery
Current thread_cache_size = 8
Current threads_cached = 3
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine
Current max_connections = 100
Current threads_connected = 27
Historic max_used_connections = 56
The number of used connections is 56% of the configured maximum.
Your max_connections variable seems to be fine.
Max Memory Ever Allocated : 1.08 G
Configured Max Per-thread Buffers : 1.20 G
Configured Max Global Buffers : 426 M
Configured Max Memory Limit : 1.61 G
Physical Memory : 3.95 G
Max memory limit seem to be within acceptable norms
Current MyISAM index space = 93 M
Current key_buffer_size = 384 M
Key cache miss rate is 1 : 162
Key buffer free ratio = 85 %
Your key_buffer_size seems to be fine
Query cache is enabled
Current query_cache_size = 32 M
Current query_cache_used = 7 M
Current query_cache_limit = 1 M
Current Query cache Memory fill ratio = 24.79 %
Current query_cache_min_res_unit = 4 K
Your query_cache_size seems to be too high.
Perhaps you can use these resources elsewhere
MySQL won't cache query results that are larger than query_cache_limit in size
Current sort_buffer_size = 2 M
Current read_rnd_buffer_size = 7 M
Sort buffer seems to be fine
Current join_buffer_size = 132.00 K
You have had 182 queries where a join could not use an index properly
You should enable "log-queries-not-using-indexes"
Then look for non indexed joins in the slow query log.
If you are unable to optimize your queries you may want to increase your
join_buffer_size to accommodate larger joins in one pass.
Note! This script will still suggest raising the join_buffer_size when
ANY joins not using indexes are found.
OPEN FILES LIMIT
Current open_files_limit = 1134 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine
Current table_cache value = 512 tables
You have a total of 26 tables
You have 81 open tables.
The table_cache value seems to be fine
Current max_heap_table_size = 16 M
Current tmp_table_size = 32 M
Of 319 temp tables, 3% were created on disk
Effective in-memory tmp_table_size is limited to max_heap_table_size.
Created disk tmp tables ratio seems fine
Current read_buffer_size = 1 M
Current table scan ratio = 44612 : 1
You have a high ratio of sequential access requests to SELECTs
You may benefit from raising read_buffer_size and/or improving your use of indexes.
Current Lock Wait ratio = 1 : 5
You may benefit from selective use of InnoDB.
If you have long running SELECT's against MyISAM tables and perform
frequent updates consider setting 'low_priority_updates=1'
If you have a high concurrency of inserts on Dynamic row-length tables
consider setting 'concurrent_insert=2'.
YouTube News Downloader: - if you're on a Not Really Broadband Network (NRBN) you somehow need to deal with networking/buffering issu...
- many people are often critical of choices that are often made in other countries. If you think carefully about it, massive loops have bee...
If you're like many others on the planet you probably can't be bothered purchasing a new piece of software in order to test an appl...