Event Log Analysis

By Johan Appelgren

Analyzing Event Logs with Oracle OLAP

In addition to being used for business intelligence, online analytical processing is a powerful technology for the analysis of event logs of any kind, as found in e.g. call center, CRM or fault statistics applications.

In order to not disclose any real-world client information, this article uses the event log service in Microsoft Windows 2000 as the source for event data, but it should be obvious how this simple example extends to more realistic scenarios. The example shows how one can answer questions such as

Event Log

The event log service records system and application events from software and hardware components. Events are categorized by time, date, severity, originating component and an arbitrary numerical type code as well as a number of additional categories not considered in this example. The event log tool can be used to display the event log:


Figure 1. Event Log Tool displaying system events (in Swedish :-)

The tool is capable of dumping the event log in various formats, including comma-separated text (fragment):

Type,Date,Time,Source,Category,Event,User,Computer
Information,5/10/2004,8:02:50 AM,eventlog,None,6009,N/A,IXTREME
Information,5/10/2004,8:02:50 AM,eventlog,None,6005,N/A,IXTREME
Information,5/10/2004,3:13:59 PM,eventlog,None,6006,N/A,IXTREME
Information,5/10/2004,3:15:23 PM,eventlog,None,6009,N/A,IXTREME
Information,5/10/2004,3:15:23 PM,eventlog,None,6005,N/A,IXTREME
Warning,5/10/2004,4:19:25 PM,Print,None,20,SYSTEM,IXTREME
Information,5/10/2004,4:19:25 PM,Print,None,15,Administrator,IXTREME
Information,5/10/2004,4:19:25 PM,Print,None,15,Administrator,IXTREME

Data Loading

Each event can be represented as a row in a table with the definition below. Since the table can be expected to grow to enormous size, it needs to be partitioned in order to simplify pruning of obsolete data. (The actual partitioning values need to be determined dynamically but more partitions can easily be added later by splitting the "other" partition.)
CREATE TABLE log (
    type VARCHAR2(13) NOT NULL,
    datetime TIMESTAMP NOT NULL,
    source VARCHAR2(30) NOT NULL,
    category VARCHAR2(10) NOT NULL,
    event NUMBER NOT NULL,
    "USER" VARCHAR2(30) NOT NULL,
    computer VARCHAR2(30) NOT NULL
)
PARTITION BY RANGE (datetime) (
    PARTITION other VALUES LESS THAN (MAXVALUE)
);
Due to the regular format of the dumped data, data loading can be performed through the use of SQL Loader, given the following control file:
LOAD DATA
APPEND INTO TABLE log
FIELDS TERMINATED BY "," (
	type, 
	dat BOUNDFILLER,
	time BOUNDFILLER,
	datetime EXPRESSION "TO_TIMESTAMP(:dat||:time, 'MM/DD/YYYYHH:MI:SS AM')",
	source,
	category,
	event,
	"USER",
	computer
)

Dimensions and Measures

In the example, the measures (or facts) are identified by a calendar date and an event identifier (which is composed of a source string and a number).

Dimensions

Ignoring other categories such as "user" and "computer", the dimensions of the OLAP cube will be time and event. Following the guidelines in the Oracle documentation, the time dimension table is defined as:
CREATE TABLE days (
       day DATE NOT NULL PRIMARY KEY,
       day_timespan NUMBER NOT NULL,
       month DATE NOT NULL,
       month_timespan NUMBER NOT NULL,
       quarter DATE NOT NULL,
       quarter_timespan NUMBER NOT NULL,
       year DATE NOT NULL,
       year_timespan NUMBER NOT NULL
);

CREATE or replace TRIGGER days_before_insert
BEFORE INSERT ON days
FOR EACH ROW
BEGIN
	:NEW.day_timespan := 1;
	:NEW.month := LAST_DAY(:NEW.day); -- last day of month
	:NEW.month_timespan := EXTRACT(DAY FROM :NEW.month);
	:NEW.quarter := TRUNC(ADD_MONTHS(:NEW.day, 3), 'Q') - 1;
	:NEW.quarter_timespan := :NEW.quarter - ADD_MONTHS(:NEW.quarter, -3);
	:NEW.year := TRUNC(ADD_MONTHS(:NEW.day, 12), 'YY') - 1;
	:NEW.year_timespan := :NEW.year - ADD_MONTHS(:NEW.year, -12);
END;
/
The days_before_insert trigger is merely a convenience procedure for generating the required month-, quarter- and year-related fields.

Note: The way Oracle OLAP is designed makes the selected time resolution affect the values stored in the dimension table: A given month, quarter or year is here identified by its last day. If the resolution is changed to hours, a year will have to be identified by its last hour.
It would have been ever so much versatile if Oracle had made a given time period be identified by its exclusive upper limit, where e.g. 1 Jan, 2004 00:00 represents the year 2003, new year's eve 2003, the last minute of 2003 or even the last millisecond of 2003!

The other dimension table, events, follows:

CREATE TABLE events (
       id VARCHAR2(10) NOT NULL PRIMARY KEY,
       event NUMBER NOT NULL,
       source VARCHAR2(30) NOT NULL,
       first TIMESTAMP NOT NULL,
       UNIQUE (event, source)
);

CREATE SEQUENCE ids;

CREATE or replace TRIGGER events_before_insert
BEFORE INSERT ON events
FOR EACH ROW
BEGIN
	SELECT ids.nextval INTO :NEW.id FROM DUAL;
END;
/
The surrogate key is necessary since certain features of Ora OLAP (BI Beans) seem to break when encountering compound or non-character primary keys.

Measures

The actual measures are the number of errors, warnings and informations having occurred for an event id during the day as well as the number of first occurrences ever of the event id (will be one only once, zero thereafter).
CREATE TABLE measures (
       day DATE NOT NULL REFERENCES days(day),
       id VARCHAR2(10) NOT NULL REFERENCES events(id),
       errors NUMBER NOT NULL,
       warnings NUMBER NOT NULL,
       informations NUMBER NOT NULL,
       firsts NUMBER NOT NULL
)
PARTITION BY RANGE (day) (
    PARTITION other VALUES LESS THAN (MAXVALUE)
);
CREATE INDEX measures_day ON measures(day, id) LOCAL;
ALTER TABLE measures ADD PRIMARY KEY(day, id);

Data Transformation

With the table definitions in place, the transformation consists of the following steps:
  1. Select all events rows from the log table that have not been transformed yet into a temporary table.
  2. Using the temporary table, update the events table with event types which have not been encountered previously, and figure out the first time they where encountered.
  3. Make sure all days between the previous last day and the last day in the temporary table are in the days table.
  4. Finally, have the two preceding steps ensured that no foreign key constraints will be violated, sum the measures grouped by day and event type into the measures table.
INSERT INTO tmp
SELECT * FROM log 
WHERE datetime >= (SELECT NVL(MAX(day) + 1, TO_DATE(1, 'J')) FROM measures);

INSERT INTO events (source, event, first)
SELECT source, event, MIN(datetime) first FROM tmp
NATURAL LEFT OUTER JOIN events WHERE first IS NULL
GROUP BY source, event ORDER BY first;

DECLARE
	d0 DATE;
	d1 DATE;
	d DATE;
BEGIN
	SELECT MAX(day) + 1 INTO d0 FROM days;
	IF d0 IS NULL THEN SELECT TRUNC(MIN(datetime), 'DD') INTO d0 FROM tmp; END IF;
	SELECT TRUNC(MAX(datetime), 'DD') INTO d1 FROM tmp;
	d := d0;
	WHILE d <= d1 LOOP
	      INSERT INTO days (day) VALUES (d);
	      d := d + 1;
	END LOOP;
END;
/

INSERT INTO measures
SELECT TRUNC(datetime, 'DD') day,
id,
SUM(DECODE(type, 'Error', 1, 0)) errors,
SUM(DECODE(type, 'Warning', 1, 0)) warnings,
SUM(DECODE(type, 'Information', 1, 0)) informations,
MAX(DECODE(first, datetime, 1, 0)) firsts
FROM tmp NATURAL JOIN events
GROUP BY TRUNC(datetime, 'DD'), id;

EXECUTE cwm2_olap_metadata_refresh.mr_refresh;
COMMIT;

OLAP

Create the dimensions and the cube using either OEM or CWM commands.
  1. Create the time dimension with four levels: day-month-quarter-year.
  2. Create the event dimension with two levels: id-source.
  3. Create a cube dimensioned by time and event with measures errors, warnings, informations and firsts
Remember to execute mr_refresh and your cube will be visible from Cube Viewer and BI Beans. Enjoy!

A Cul-de-Sac

I have tried to change the time resolution to hours (changing from the days table to an hours table and making the aforementioned changes to the time level identifiers) and even augmented the CWM metadata to honor 'HOUR' in addition to 'DAYS' etc.

But it will not work perfectly. Especially BI Beans seem to be having problems with non-standard time levels :-(

To be continued...

Download

The files available below make you able to run the examples if you have at least Oracle 9i with Patch Set 4. All but the last one make changes in the current schema.
Copyright © 2004 by Johan Appelgren AB. All rights reserved.