The Bix Blog: September 2010

I just thought to mention 4 imp points about COUNT(*) function that we use very regularly on a table/data set. I have tested this on 11gR2.

1. How Count(*) works :

This is a function works on a set of rows- to count the no of ALL rows in a data set. When we ask Oracle to get count of the no of rows from a table using Count(*),Oracle will use the index on the primary key (or) index on non-null column of the table to get the count .If there are indexes only (unique /non-unique) on nullable columns (columns with no constraint ' NOT NULL' ) (OR) if there are no indexes at all on the table-Oracle will FULL SCAN the table to get the count of rows.
Basically it does not matter whether the index is unique or non-unique but the 'NOT NULL' constraint matters here when using Count(*).Oracle will use the index created on a column with non-null constraint when to find the count using Count(*) but it is different when we want to find the count of a coloumn values-such as Count(emp_name) [assume emp_name is not primary key] it will simply use the index on emp_name (whether the coloumn is defined as null or not null) if it is there else will do the full scan though there is primary key index.I believe all this is bit of 'common sence' related stuff.

SQL> select count(*) from emp;
COUNT(*)
----------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT       | | 1 | 41 (0) | 00:00:01 |
| 1 | SORT AGGREGATE            | | 1 | | |
| 2 | INDEX FAST FULL SCAN | SYS_C0024562 | 65535 | 41 (0)| 00:00:01 |
------------------------------------------------------------------------------

---Disable the constraint.
SQL> ALTER TABLE emp
2 disable CONSTRAINT SYS_C0024562;
SQL> select count(*) from emp;
COUNT(*)
----------
65535
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 2083865914
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 102 (0)| 00:00:02 |
| 1 | SORT AGGREGATE    | | 1 | | |
| 2 | TABLE ACCESS FULL | EMP | 65535 | 102 (0)| 00:00:02 |
-------------------------------------------------------------------
Note that : The Optimizer does not take the help of indexes created on nullable coloumns to get the count of ALL rows in a table .

2. SQL Rewrite :

The optimizer is more intelligent sometimes than us . See below I have written a query :
select count(*) from emp a , dept b where a.dept_no=b.dept_no
(dept_no is fk in the table emp and pk in the table dept)
The Optimizer then said to me that -'Hey , you are kidding , you really want the no of rows in just the emp table only - no need of join with parent'. The optimizer has simply re-written the query as
select count(dept_no) from emp (Note - it is not count(*) but count(dept_no)-join key)

See the below two plans:

SQL> select count(*) from emp a , dept b where a.dept_no=b.dept_no;
COUNT(*)
----------
65535
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 2083865914
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT   | | 1 | 3 | 103 (1)| 00:00:02 |
| 1 | SORT AGGREGATE        | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL | EMP | 65535 | 191K| 103 (1)| 00:00:02 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."DEPT_NO" IS NOT NULL)

SQL> select count(dept_no) from emp;
COUNT(DEPT_NO)
--------------
65535
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 2083865914
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 102 (0)| 00:00:02 |
| 1 | SORT AGGREGATE     | | 1 | 3 | | |
| 2 | TABLE ACCESS FULL | EMP | 65535 | 191K| 102 (0)| 00:00:02 |
---------------------------------------------------------------------------
If there is an index (unique/non-unique) on the join key (dept_no)- It will be used by the Optimizer to get the count

One more point is that -
select count(*) from (select * from emp a , dept b where a.dept_no=b.dept_no)
is same as
select count(*) from emp a, dept b where a.dept_no=b.dept_no
The Optimizer serves them in the same way.

3. Count(any_column) :
(This is column level )

select count(dept_no) from emp;
this will get the count of rows with dept_no column not NULL -the Optimizer will use the index to get the count if exists on that column else do the full scan.

SQL> create index dept_no_idx on emp(dept_no);
Index created.
SQL> select count(dept_no) from emp;
COUNT(DEPT_NO)
--------------
65535
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 2843651152
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT       | | 1 | 3 | 36 (0)| 00:00:01 |
| 1 | SORT AGGREGATE           | | 1 | 3 | | |
| 2 | INDEX FAST FULL SCAN | DEPT_NO_IDX | 65535 | 191K| 36 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

4 . Count(*),Count(1),Count(rowid),Count(pk) --> Which is faster ??

The Fact is -All are SAME internally , The Optimizer is smart enough to treat them 'same'.
See below one by one - the COST is same (logical i/o) in all the four cases .
Just use which suites you 'semantically' correct in your context.
I prefer count(*) - when I need the no of all the rows in a set.

Count(*)
----------

SQL> select count(*) from emp;
COUNT(*)
----------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT       | | 1 | 41 (0)| 00:00:01 |
| 1 | SORT AGGREGATE          | | 1 | | |
| 2 | INDEX FAST FULL SCAN | SYS_C0024562 | 65535 | 41 (0)| 00:00:01 |
------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
337 bytes sent via SQL*Net to client
337 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Count(rowid)
----------------

SQL> select count(rowid) from emp;
COUNT(ROWID)
------------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT     | | 1 | 12 | 41 (0)| 00:00:01 |
| 1 | SORT AGGREGATE          | | 1 | 12 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C0024562 | 65535 | 767K| 41 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
341 bytes sent via SQL*Net to client
337 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Count(1)
------------

SQL> select count(1) from emp;
COUNT(1)
----------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT     | | 1 | 41 (0)| 00:00:01 |
| 1 | SORT AGGREGATE          | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C0024562 | 65535 | 41 (0)| 00:00:01 |
------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
337 bytes sent via SQL*Net to client
337 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Count(pk)
-------------

SQL> select count(emp_no) from emp;
COUNT(EMP_NO)
-------------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT       | | 1 | 41 (0)| 00:00:01 |
| 1 | SORT AGGREGATE           | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C0024562 | 65535 | 41 (0)| 00:00:01 |
------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
342 bytes sent via SQL*Net to client
337 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Before posting technical stuff in coming days, I want to mention couple of basic points here .
See below -Tomkyte's Simple Mantra .

You should do it in a single SQL statement if at all possible. And believe it or not, it is almost always possible.
If you can’t do it in a single SQL Statement, do it in PL/SQL—as little PL/SQL as possible! Follow the saying that goes "more code = more bugs, less code = less bugs."
If you can’t do it in PL/SQL, try a Java stored procedure. The times this is necessary are extremely rare nowadays with Oracle9i and above.
If you can’t do it in Java, do it in a C external procedure. This is most frequently the approach when raw speed or using a third-party API written in C is needed.
If you can’t do it in a C external routine, you might want to seriously think about why it is you need to do it.

I feel this is something important specially for a developer .When talking about sql and pl/sql - the point here is to avoid doing something *procedurally* in pl/sql if that can be done in a single SQL statement .
I see many times similar type of the below pseudocode:

For each x in Cursor_C
loop
Select count(*) into v_count from t2 where colm1=x
If v_count >0 then do something .
End loop;

Look at the above code -Firstly - LOOP - this is something to be avoided if it can be ……….because this causes the 'context switches' between SQL engine and Pl/SQL engine . The switch may consume negligible time (approx -0.0001sec) but just assume if the loop repeats 1 million times …definitely the time taken for switches matters. In most of the cases -it is possible to avoid the loops- if we think the results in 'sets' .Secondly -the developer wanted to check the *existence* for a row where colm1=x , he should not have used here count(*).Count(*) is to count the no of rows ,might not be used to check for the existence of a row and moreover the optimizer in this case may need to full scan the table /index to get the count .But in this example -if we use 'exists' operator - it will nicely check for the existence of the row by using a smart technique called- 'semi join' with less effort .This type of statements could be re-written using - EXISTS /IN , MERGE etc depends on the requirement in a single SQL statement in most of the cases .

We can really do magic with SQL if we think the results in 'SETS'...
SO -
THINK IN 'SETS' …………………………..
THINK IN 'SETS '
..
THINK IN 'SETS '
and Avoid procedural logic if it can be done in SQL …………..

If you finally have come to conclusion that something can not be done in SQL - yes , Undoubtedly this is the time to go for PL/SQL but again - here also think in smart way .For example : when I want to retrieve a single row - why should I use explicit cursor ? I use sweet implicit cursor because I feel Oracle introduced implicit cursor just to avoid explicit cursor which may need more code (and which may probably introduce bugs sometimes) and to give some relief to developers when the developer wants to retrieve a single row.
Can I say something here - actually there is no concept of 'implicit' or 'explicit' in oracle database engine point of view, every SQL is an explicit for Oracle database engine. This implicit cursor concept is a pl/sql programming feature -this is a smart technique given to the developers by Pl/sql just to avoid more coding (explicit cursors) when the developer wants to retrieve a single row - once we submit our cursor to pl/sql (implicit or explicit) - every thing will be converted to explicit and sent to Oracle database engine.

Efficient coding depends on how we understand the programming constructs/features and tools and where and when we have to use them ..There is no one for all rule and most of the times it 'depends'... …………………..
I will be discussing some real technical stuff in coming days .

The Bix Blog

Thursday, September 23, 2010

About: Count(*)

Tuesday, September 21, 2010

THINK IN 'SETS'

Blog Archive