Thursday, September 23, 2010

About: Count(*)

I just thought to mention 4 imp points about COUNT(*) function that we use very regularly on a table/data set.  I have tested this on 11gR2.


1. How Count(*) works :

This is a function works on a set of rows- to count the no of ALL rows in a data set. When we ask Oracle to get count of the no of rows from a table using Count(*),Oracle will use the index on the primary key (or) index on non-null column of the table to get the count .If there are indexes only (unique /non-unique) on nullable columns (columns with no constraint ' NOT NULL' ) (OR) if there are no indexes at all on the table-Oracle will FULL SCAN the table to get the count of rows.
Basically it does not matter whether the index is unique or non-unique but the 'NOT NULL' constraint matters here when using Count(*).Oracle will use the index created on a column with non-null constraint when to find the count using Count(*) but it is different when we want to find the count of a coloumn values-such as Count(emp_name) [assume emp_name is not primary key]  it will simply use the index on emp_name (whether the coloumn is defined as null or not null) if it is there else will do the full scan though there is primary key index.I believe all this is bit of 'common sence' related stuff.

 
SQL> select count(*) from emp;
COUNT(*)
----------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT        | | 1 | 41 (0) | 00:00:01 |
| 1 | SORT AGGREGATE            | | 1 | | |
| 2 | INDEX FAST FULL SCAN | SYS_C0024562 | 65535 | 41 (0)| 00:00:01 |
------------------------------------------------------------------------------


---Disable the constraint.
SQL> ALTER TABLE emp
2 disable CONSTRAINT SYS_C0024562;
SQL> select count(*) from emp;
COUNT(*)
----------
65535
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 2083865914
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 102 (0)| 00:00:02 |
| 1 | SORT AGGREGATE    | | 1 | | |
| 2 | TABLE ACCESS FULL | EMP | 65535 | 102 (0)| 00:00:02 |
-------------------------------------------------------------------
Note that : The Optimizer does not take the help of indexes created on nullable coloumns to get the count of ALL rows in a table .



2. SQL Rewrite :

The optimizer is more intelligent sometimes than us . See below I have written a query :
select count(*) from emp a , dept b where a.dept_no=b.dept_no
(dept_no is fk in the table emp and pk in the table dept)
The Optimizer then said to me that -'Hey , you are kidding , you really want the no of rows in just the emp table only - no need of join with parent'. The optimizer has simply re-written the query as
select count(dept_no) from emp  (Note - it is not count(*) but count(dept_no)-join key)


See the below two plans:


SQL> select count(*) from emp a , dept b where a.dept_no=b.dept_no;
COUNT(*)
----------
65535
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 2083865914
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT   | | 1 | 3 | 103 (1)| 00:00:02 |
| 1 | SORT AGGREGATE        | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL  | EMP | 65535 | 191K| 103 (1)| 00:00:02 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."DEPT_NO" IS NOT NULL)


SQL> select count(dept_no) from emp;
COUNT(DEPT_NO)
--------------
65535
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 2083865914
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 102 (0)| 00:00:02 |
| 1 | SORT AGGREGATE     | | 1 | 3 | | |
| 2 | TABLE ACCESS FULL  | EMP | 65535 | 191K| 102 (0)| 00:00:02 |
---------------------------------------------------------------------------
If there is an index (unique/non-unique) on the join key (dept_no)- It will be used by the Optimizer to get the count 

One more point is that -
select count(*) from (select * from emp a , dept b where a.dept_no=b.dept_no)
is same as 
select count(*) from emp a, dept b where a.dept_no=b.dept_no
The Optimizer serves them in the same way.


3. Count(any_column) :
(This is column level ) 

select count(dept_no) from emp;
this will get the count of rows with dept_no column not NULL -the Optimizer will use the index to get the count if exists on that column else do the full scan.


SQL> create index dept_no_idx on emp(dept_no);
Index created.
SQL> select count(dept_no) from emp;
COUNT(DEPT_NO)
--------------
65535
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 2843651152
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT       | | 1 | 3 | 36 (0)| 00:00:01 |
| 1 | SORT AGGREGATE           | | 1 | 3 | | |
| 2 | INDEX FAST FULL SCAN | DEPT_NO_IDX | 65535 | 191K| 36 (0)| 00:00:01 |
-------------------------------------------------------------------------------------


4 . Count(*),Count(1),Count(rowid),Count(pk) --> Which is faster ??


The Fact is -All are SAME internally , The Optimizer is smart enough to treat them 'same'.
See below one by one - the COST is same (logical i/o) in all the four cases .
Just use which suites you 'semantically' correct in your context.
 I prefer count(*) - when I need the no of all the rows in a set.


Count(*)
----------


SQL> select count(*) from emp;
COUNT(*)
----------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT        | | 1 | 41 (0)| 00:00:01 |
| 1 | SORT AGGREGATE            | | 1 | | |
| 2 | INDEX FAST FULL SCAN | SYS_C0024562 | 65535 | 41 (0)| 00:00:01 |
------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
337 bytes sent via SQL*Net to client
337 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Count(rowid)
----------------


SQL> select count(rowid) from emp;
COUNT(ROWID)
------------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT     | | 1 | 12 | 41 (0)| 00:00:01 |
| 1 | SORT AGGREGATE          | | 1 | 12 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C0024562 | 65535 | 767K| 41 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
341 bytes sent via SQL*Net to client
337 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Count(1)
------------


SQL> select count(1) from emp;
COUNT(1)
----------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT     | | 1 | 41 (0)| 00:00:01 |
| 1 | SORT AGGREGATE          | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C0024562 | 65535 | 41 (0)| 00:00:01 |
------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
337 bytes sent via SQL*Net to client
337 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Count(pk)
-------------


SQL> select count(emp_no) from emp;
COUNT(EMP_NO)
-------------
65535
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3050108756
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT        | | 1 | 41 (0)| 00:00:01 |
| 1 | SORT AGGREGATE            | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SYS_C0024562 | 65535 | 41 (0)| 00:00:01 |
------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
153 consistent gets
0 physical reads
0 redo size
342 bytes sent via SQL*Net to client
337 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

No comments:

Post a Comment