Binh Nguyen's Blog: Database Performance Optimisation Techniques

Sunday, July 17, 2011

Database Performance Optimisation Techniques

A while back I was working on a project which involved database underperformance of a web based application. Obviously, there are many ways through which to achieve this. Some of these methods are outlined below and include modern as well as well as traditional techniques.

Of late, there has been a trend towards so called non-relational/NoSQL type databases which scale horizontaly rather than vertically. The reasoning behind these thoughts seem to be based on a paper by Edward Schaffer that seems to imply that since the UNIX/Linux shell is capable of producing almost any result required through piping of various commands and since it doesn't have to be abstracted to run across multiple operating systems it should theoretically be able to build a high performance operating system off of this particular base. The exact quote is as follows.

"The UNIX file structure is the fastest and most readily-available database engine ever built: directories may be viewed as catalogs and tables as plain ASCII files. Commands are common UNIX utilities, such as grep, sed and awk. Nothing should be reinvented.

NoSQL was born with these ideas in mind: getting the most from the UNIX system, using some commands that glue together various standard tools. Although NoSQL is a good database system, this is not a panacea for all you problems. If you have to deal with a 10 gigabytes table that must be updated each second from various clients, NoSQL doesn't work for you since it lacks of performance on very big tables, and on frequent updates you must be in real time. For this case, I suggest you use a stronger solution based on Oracle, DB2 or such packages on a Linux cluster, AS/400 or mainframes.

However, if you have a web site containing much information and more reading occurs than writing, you will be surprised how fast is it. NoSQL (pronounced noseequel, as the author suggests) derives most of its code from the RDB database developed at RAND Organization, but more commands have been built in order to accomplish more tasks."

http://www.linuxjournal.com/article/3294

http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772

http://nosql-database.org/

http://nosql.mypopescu.com/

http://nosql.mypopescu.com/post/734816227/nosql-benchmarks-and-performance-evaluations

http://codemonkeyism.com/dark-side-nosql/

http://en.wikipedia.org/wiki/Column-oriented_DBMS

http://en.wikipedia.org/wiki/MapReduce

http://en.wikipedia.org/wiki/Apache_Hadoop

http://www.monash.com/

http://www.dbms2.com/

Obviously, there are problems with using this relatively these relatively immature technologies. Research seems to indicate that depending on the implementation being used they can be hard to modify, ad-hoc reporting can be difficult to produce, and that there is a general lack of expertise within the IT community in this field since almost organisations have no need to switch towards its use and existing relational type databases offer good enough performance as well as a mature toolset and other reporting capabilities.

Although I haven't personally experimented with in memory databases there's no questioning their usefullness. Obviously, a database that runs purely from RAM is going to be faster from both an access and transfer perspective than one that requires the use of secondary storage (even though this may change with the advancement of SSD technology and RAM based cards that have somehow been modified to act as secondary storage).

http://en.wikipedia.org/wiki/In-memory_database

http://hsqldb.org/

"Inefficient schemas. Adding indexes can help improve performance. However, their impact may be limited if your queries are inefficient because of poor table design that results in too many join operations or in inefficient join operations. Schema design is a key performance factor. It also provides information to the server that may be used to optimize query plans. Schema design is largely a tradeoff between good read performance and good write performance. Normalization helps write performance. Denormalization helps read performance."

http://msdn.microsoft.com/en-us/library/ff647793.aspx

http://stackoverflow.com/questions/30474/database-schema-diagram-design-tool

Due to the fact that databases are 'abstractions' that run on top of existing software/hardware the usual performance bottlenecks should be looked at. These include, hardware specification, RAID level.operating system (and tuning options), and even network issues can play a factor especially if using distributed database being used and geography starts to become a problem. Strategies in dealing with these problems are outlined in the, 'Building a Cloud Computer Service' document, http://sites.google.com/site/dtbnguyen/

Obviously the type of join as well as the number of joins can impact on performance.

http://stackoverflow.com/questions/3252272/sql-join-types-and-performance-cross-vs-inner

http://stackoverflow.com/questions/1018822/inner-join-versus-where-clause-any-difference

http://postgresql.1045698.n5.nabble.com/Questions-on-query-planner-join-types-and-work-mem-td2256549.html

Use of indexes (whether they are partial, complete over table, compound, full-text, and/or column based only).

http://use-the-index-luke.com/

While there are two main database engines (InnoDB, MyISAM) there are also others that cater specifically to certain types of transactions.

"Yes, each MySQL storage engine will perform differently but the result will depend on the type of transaction. Reads, Writes, Transaction size, HA requirements are a few of the factors that would effect performance for each storage engine choice.

Prior to MySQL 5.5 the default storage engine is MyISAM which performs great on reads. With MySQL 5.5 the default storage engine is InnoDB which has improved performance over prior releases and allows a nice mix of performance and ACID compliance. These are the 2 most popular storage engines but there are other storage engines like memory, blackhole, ndb and others that offer a unique value for specific applications.

Generally speaking the safe bet is to use InnoDB for most of your tables and revisit the choice after you hit a performance bottleneck and are looking to tune things. The nice thing is it will be a simple task to change the storage engine for a table and provided the table size is not too great it shouldn't take too long to change and test each storage engine for your specific application."

http://www.quora.com/Is-there-any-performance-impact-of-using-different-engines-for-tables-in-a-single-database-in-MYSQL

While I have never personally used database proxy technology it is clear that they have their place. Like other proxy type technologies though they are obviously subject to local caching problems.

http://forge.mysql.com/wiki/MySQL_Proxy

http://support.microsoft.com/kb/216415

Creation of temporary tables on tables where other methods are unsuitable. Can be particularly useful when you have a resource intensive query that may be executed many times but whose actual data is rarely updated.

http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html

http://forge.mysql.com/tools/tool.php?id=1

http://www.tutorialspoint.com/mysql/mysql-temporary-tables.htm

Finally, you can conduct tuning of the actual database server software itself. Something that you need to learn especially if you are working with limited resources and you aren't actually able to alter the database itself (can often be the case if the work has been outsourced to an external contractor).

http://forums.mysql.com/read.php?24,92131,92131

http://www.mysql.com/training/courses/performance_tuning.html

http://www.mysqlperformanceblog.com/2008/03/21/mysql-file-system-fragmentation-benchmarks/

http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/

http://forge.mysql.com/wiki/Recommended_Books

http://www.day32.com/MySQL/

http://weblogs.sqlteam.com/mladenp/archive/2007/11/20/Free-SQL-Server-tools-that-might-make-your-life-a.aspx

http://marcalff.blogspot.com/2011/06/performance-schema-overhead-tuning.html

http://msdn.microsoft.com/en-us/library/ms173494.aspx

http://en.wikipedia.org /wiki/Comparison_of_relational_database_management_systems

http://en.wikipedia.org/wiki/GoogleFS

Below is the result of one of these benchmarking/tuning scripts.

[root@db1 ~]# ./tuning-primer.sh

Using login values from ~/.my.cnf

- INITIAL LOGIN ATTEMPT FAILED -

Testing for stored webmin passwords: None Found

Could not auto detect login info!

Found Sockets:

/var/lib/mysql/mysql.sock

Using: /var/lib/mysql/mysql.sock

Would you like to provide a different socket?: [y/N]

Do you have your login handy ? [y/N] : y

User: root

Password:

Would you like me to create a ~/.my.cnf file for you? [y/N] :

-- MYSQL PERFORMANCE TUNING PRIMER --

- By: Matthew Montgomery -

MySQL Version 5.0.45-log i686

Uptime = 0 days 1 hrs 20 min 24 sec

Avg. qps = 58980

Total Questions = 284523063

Threads Connected = 27

Warning: Server has not been running for at least 48hrs.

It may not be safe to use these recommendations

To find out more information on how each of these

runtime variables effects performance visit:

http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html

Visit http://www.mysql.com/products/enterprise/advisors.html

for info about MySQL's Enterprise Monitoring and Advisory Service

SLOW QUERIES

The slow query log is enabled.

Current long_query_time = 10 sec.

You have 93 out of 284727673 that take longer than 10 sec. to complete

Your long_query_time may be too high, I typically set this under 5 sec.

BINARY UPDATE LOG

The binary update log is NOT enabled.

You will not be able to do point in time recovery

See http://dev.mysql.com/doc/refman/5.0/en/point-in-time-recovery.html

WORKER THREADS

Current thread_cache_size = 8

Current threads_cached = 3

Current threads_per_sec = 0

Historic threads_per_sec = 0

Your thread_cache_size is fine

MAX CONNECTIONS

Current max_connections = 100

Current threads_connected = 27

Historic max_used_connections = 56

The number of used connections is 56% of the configured maximum.

Your max_connections variable seems to be fine.

MEMORY USAGE

Max Memory Ever Allocated : 1.08 G

Configured Max Per-thread Buffers : 1.20 G

Configured Max Global Buffers : 426 M

Configured Max Memory Limit : 1.61 G

Physical Memory : 3.95 G

Max memory limit seem to be within acceptable norms

KEY BUFFER

Current MyISAM index space = 93 M

Current key_buffer_size = 384 M

Key cache miss rate is 1 : 162

Key buffer free ratio = 85 %

Your key_buffer_size seems to be fine

QUERY CACHE

Query cache is enabled

Current query_cache_size = 32 M

Current query_cache_used = 7 M

Current query_cache_limit = 1 M

Current Query cache Memory fill ratio = 24.79 %

Current query_cache_min_res_unit = 4 K

Your query_cache_size seems to be too high.

Perhaps you can use these resources elsewhere

MySQL won't cache query results that are larger than query_cache_limit in size

SORT OPERATIONS

Current sort_buffer_size = 2 M

Current read_rnd_buffer_size = 7 M

Sort buffer seems to be fine

JOINS

Current join_buffer_size = 132.00 K

You have had 182 queries where a join could not use an index properly

You should enable "log-queries-not-using-indexes"

Then look for non indexed joins in the slow query log.

If you are unable to optimize your queries you may want to increase your

join_buffer_size to accommodate larger joins in one pass.

Note! This script will still suggest raising the join_buffer_size when

ANY joins not using indexes are found.

OPEN FILES LIMIT

Current open_files_limit = 1134 files

The open_files_limit should typically be set to at least 2x-3x

that of table_cache if you have heavy MyISAM usage.

Your open_files_limit value seems to be fine

TABLE CACHE

Current table_cache value = 512 tables

You have a total of 26 tables

You have 81 open tables.

The table_cache value seems to be fine

TEMP TABLES

Current max_heap_table_size = 16 M

Current tmp_table_size = 32 M

Of 319 temp tables, 3% were created on disk

Effective in-memory tmp_table_size is limited to max_heap_table_size.

Created disk tmp tables ratio seems fine

TABLE SCANS

Current read_buffer_size = 1 M

Current table scan ratio = 44612 : 1

You have a high ratio of sequential access requests to SELECTs

You may benefit from raising read_buffer_size and/or improving your use of indexes.

TABLE LOCKING

Current Lock Wait ratio = 1 : 5

You may benefit from selective use of InnoDB.

If you have long running SELECT's against MyISAM tables and perform

frequent updates consider setting 'low_priority_updates=1'

If you have a high concurrency of inserts on Dynamic row-length tables

consider setting 'concurrent_insert=2'.

- as usual thanks to all of the individuals and groups who purchase and use my goods and services
http://sites.google.com/site/dtbnguyen/
http://dtbnguyen.blogspot.com.au/

Binh Nguyen's Blog

000webhost

Sunday, July 17, 2011

Database Performance Optimisation Techniques

3D Printing Background, Random Stuff, and More