Generating Histograms
Today I have been trying to generate a histogram for a table column as part of a larger statistics project. In fact it is quite a straightforward process although some of the syntax is a bit tricky.
The table I am working with is called GP.CAR and contains a list of all cars that have raced in Formula 1 since 1961. I am trying to build a frequency histogram based on the DRIVER_KEY column which is a CHAR(4). The histogram will contain the following data:
| MSCH | 91 |
| APRO | 51 |
| ASEN | 41 |
| NMAN | 31 |
| JSTE | 27 |
| NLAU | 25 |
| JCLA | 25 |
| NPIQ | 23 |
| FALO | 22 |
| DHIL | 22 |
| MHAK | 20 |
The histogram can be built using the following code:
DECLARE
l_statrec dbms_stats.statrec;
l_charvals dbms_stats.chararray;
l_bkvals dbms_stats.numarray;
BEGIN
NULL;
l_charvals := dbms_stats.chararray ();
l_charvals.extend (11);
l_bkvals := dbms_stats.numarray ();
l_bkvals.extend (11);
l_charvals(1) := 'MSCH'; l_bkvals(1) := 91;
l_charvals(2) := 'APRO'; l_bkvals(2) := 51;
l_charvals(3) := 'ASEN'; l_bkvals(3) := 41;
l_charvals(4) := 'NMAN'; l_bkvals(4) := 31;
l_charvals(5) := 'JSTE'; l_bkvals(5) := 27;
l_charvals(6) := 'NLAU'; l_bkvals(6) := 25;
l_charvals(7) := 'JCLA'; l_bkvals(7) := 25;
l_charvals(8) := 'NPIQ'; l_bkvals(8) := 23;
l_charvals(9) := 'FALO'; l_bkvals(9) := 22;
l_charvals(10) := 'DHIL'; l_bkvals(10) := 22;
l_charvals(11) := 'MHAK'; l_bkvals(11) := 20;
l_statrec.epc := 11;
l_statrec.bkvals := l_bkvals;
l_statrec.eavs := 0;
DBMS_STATS.PREPARE_COLUMN_VALUES (l_statrec,l_charvals);
DBMS_STATS.SET_COLUMN_STATS
(
ownname => 'GP',
tabname => 'CAR',
colname => 'DRIVER_KEY',
distcnt => 11,
density => 0.00210084,
nullcnt => 0,
srec => l_statrec,
avgclen => 4
);
END;
/
I have not yet worked out how to calculate the density. In the above example I used the value in DBA_TAB_COLUMNS generated by a previous GATHER_TABLE_STATS operation.
In order to test that the histogram had been correctly generated, I used the following code:
DECLARE
l_column_name VARCHAR2(30);
l_endpoint_number NUMBER;
l_endpoint_value NUMBER;
l_cardinality NUMBER;
CURSOR c1 IS
SELECT column_name
FROM dba_tab_columns
WHERE owner = 'GP'
AND table_name = 'CAR'
AND histogram = 'FREQUENCY'
ORDER BY column_id;
CURSOR c2 (p_column_name VARCHAR2) IS
SELECT endpoint_value,endpoint_number,
endpoint_number -
NVL (LAG (endpoint_number,1) OVER (ORDER BY endpoint_number),0)
FROM dba_histograms
WHERE owner = 'GP'
AND table_name = 'CAR'
AND column_name = p_column_name
ORDER BY endpoint_number;
BEGIN
OPEN c1;
LOOP
FETCH c1 INTO l_column_name;
EXIT WHEN c1%NOTFOUND;
DBMS_OUTPUT.PUT_LINE ('Column Name: '||l_column_name);
DBMS_OUTPUT.NEW_LINE;
DBMS_OUTPUT.PUT ('Endpoint Value ');
DBMS_OUTPUT.PUT (' ');
DBMS_OUTPUT.PUT ('Endpoint Number');
DBMS_OUTPUT.PUT (' ');
DBMS_OUTPUT.PUT ('Cardinality');
DBMS_OUTPUT.NEW_LINE;
OPEN c2 (l_column_name);
LOOP
FETCH c2 INTO l_endpoint_value,l_endpoint_number,l_cardinality;
EXIT WHEN c2%NOTFOUND;
DBMS_OUTPUT.PUT (LPAD (TO_CHAR(l_endpoint_value),32,' '));
DBMS_OUTPUT.PUT (' ');
DBMS_OUTPUT.PUT (LPAD (TO_CHAR(l_endpoint_number),12,' '));
DBMS_OUTPUT.PUT (' ');
DBMS_OUTPUT.PUT (LPAD (TO_CHAR(l_cardinality),12,' '));
DBMS_OUTPUT.NEW_LINE;
END LOOP;
CLOSE c2;
DBMS_OUTPUT.NEW_LINE;
END LOOP;
CLOSE c1;
END;
/
The values in the ENDPOINT_NUMBER column are cumulative. Therefore I have used a LAG analytic function to calculate the cardinality of each value. The above script generates output similar to the following:
Column Name: DRIVER_KEY
Endpoint Value Endpoint Number Cardinality
-------------- --------------- -----------
40149562867409400000000000000000 91 91
33912840973169800000000000000000 142 51
33918822668491100000000000000000 183 41
40656607447559200000000000000000 214 31
38592008404879700000000000000000 241 27
40654579423238300000000000000000 266 25
38559493043189900000000000000000 291 25
40662755645815800000000000000000 314 23
36478518251134200000000000000000 336 22
35454232704855500000000000000000 358 22
40127236464058400000000000000000 378 20
