Collecting statistics for large established database

Posted by:  vsevolod afanassiev (vsevolod.afanassi…@gmail.com)
Date: Wed, 19 Aug 2009

Let's say we have a relatively large database (2 TB or more). It is
being used by an application that went live more than 3 years ago, so
historic (transactional) tables contain more than 1000 days worth of
data. There have been no changes to the table structure, i.e. all
tables are the same as 3 years ago. The tables are not partitioned.
This is a classic OLTP application where most tables get populated by
online users (call centre staff). One year we may have 1000 people in
call centres, another year we may have 2000, but we are not going to
have one million. So transactional tables experience linear growth.

Question: why do we need to collect table/index statistics on a
regular basis (i.e. daily or weekly)? At the end of the day the
optimizer does not care about the absolute number of records in a
table; only the relative values matter. Let's say we have table A
with 1,000,000 rows and table B with 1,000 rows. One year later table
A may have 1,500,000 rows and table B may have 2,000 rows. It
shouldn't have any impact on the plan. Yes, it is possible that as a
result of table growth a different plan may become slightly more
efficient, but in a real-life scenario plan stability is more
important.
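As a side note, the row counts and last-analyzed dates the optimizer
works from can be checked in the dictionary. A minimal sketch,
assuming the application schema is called APP (the owner name is a
placeholder):

-- current optimizer statistics per table, largest first
SELECT table_name, num_rows, last_analyzed
  FROM dba_tables
 WHERE owner = 'APP'
 ORDER BY num_rows DESC;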

I think the only situation where regular statistics collection is
important is when a plan depends on the MIN or MAX value in a column.
A purge job, for example, may be executing something like

DELETE FROM TABLEA WHERE CREATE_DATE < SYSDATE - 180

If column CREATE_DATE is indexed then Oracle needs to know
MIN(CREATE_DATE) to decide whether to use this index or not. But this
is a rare situation. It may be simpler to add hints to the few
statements that rely on MIN/MAX values than to collect statistics
every day and risk plan instability.
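For example, the purge statement above could be pinned to the index
with a hint so it no longer depends on up-to-date statistics. A
sketch, assuming the index is called TABLEA_IDX_CREATE_DATE (a
made-up name):

-- force the range scan on the CREATE_DATE index regardless of stats
DELETE /*+ INDEX(TABLEA TABLEA_IDX_CREATE_DATE) */
  FROM TABLEA
 WHERE CREATE_DATE < SYSDATE - 180;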

In 10g the default job for statistics collection relies on table
monitoring. Tables get analyzed when the total number of inserted,
updated, and deleted rows exceeds 10% of the rows in the table. There
are two problems with this approach: (1) it makes statistics
collection unpredictable from the DBA's perspective; (2) the statement
updating the internal table OBJECT_USAGE is responsible for a
significant percentage (around 30%) of soft parses.
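The modification counters that drive the 10% threshold can be
inspected if you want to see which tables are about to be re-analyzed.
A sketch, again assuming schema APP:

-- flush the in-memory monitoring counters so the view is current
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- pending insert/update/delete counts per table
SELECT table_name, inserts, updates, deletes, timestamp
  FROM dba_tab_modifications
 WHERE table_owner = 'APP';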

It seems to me that for established applications the best approach is
to stop statistics collection (except for partitioned tables where new
partitions are created every day/week/month etc.). It will guarantee
plan stability. It is possible that performance of some statements
will deteriorate, but that will happen gradually. With regular
statistics collection there is always the threat of the optimizer
choosing a sub-optimal plan. I have seen quite a few of them;
typically it involves a switch from INDEX RANGE SCAN to FAST FULL
INDEX SCAN, or from nested loops to hash join, often much slower than
the original plan.
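For reference, freezing statistics in 10g would look roughly like
this. A sketch, assuming schema APP; locked statistics can still be
refreshed deliberately later (e.g. with the force option of
DBMS_STATS) if a table genuinely needs it:

-- disable the 10g automatic statistics collection job
EXEC DBMS_SCHEDULER.DISABLE('SYS.GATHER_STATS_JOB');

-- lock existing statistics so ad-hoc gathers cannot overwrite them
EXEC DBMS_STATS.LOCK_SCHEMA_STATS('APP');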
