Kevin

Log query search, tricky with efficient non-overlapping intervals

Jul 20 2008 4:36 PM
I'm building a log search function in C# for a management app and would like some help with the design.
Here is my solution (on paper) so far:

The log files are saved to disk, one file per date and one row "per logged event". Each row has a specific logged time
(within the file's date) with different keys and values.

The search filter query is built up via comboboxes and a DataGridView. The DataGridView is data-bound
to a DataTable with columns such as "Date", "Account" etc. The query can contain several search rows with different parameters
specified. The user should be able to choose the date range either by selecting 'All logs' (taking all log files available), by requesting
logs from the 'Last X weeks', or by specifying two DateTime values.
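
A minimal sketch of how the three date choices could be normalized into one inclusive date range before any merging is done. All names here (DateChoice, DateRangeResolver, Resolve) are hypothetical and not taken from the real app; 'All logs' is represented with DateTime.MinValue/MaxValue as stand-ins for the lowest/highest available date.

```csharp
using System;

// Hypothetical enum: the three ways the user can pick dates.
public enum DateChoice { AllLogs, LastWeeks, Explicit }

public static class DateRangeResolver
{
    // Returns an inclusive (from, to) pair of calendar dates for the chosen mode.
    // 'today' is passed in rather than read from DateTime.Now so the method is easy to test.
    public static Tuple<DateTime, DateTime> Resolve(
        DateChoice choice, DateTime today, int weeks,
        DateTime explicitFrom, DateTime explicitTo)
    {
        switch (choice)
        {
            case DateChoice.AllLogs:
                // Sentinels meaning "no lower/upper bound".
                return Tuple.Create(DateTime.MinValue.Date, DateTime.MaxValue.Date);

            case DateChoice.LastWeeks:
                // Assumption: "last X weeks" means the 7 * X days up to and including today.
                return Tuple.Create(today.Date.AddDays(-7 * weeks), today.Date);

            case DateChoice.Explicit:
                // Tolerate the two values being entered in reverse order.
                return explicitFrom <= explicitTo
                    ? Tuple.Create(explicitFrom.Date, explicitTo.Date)
                    : Tuple.Create(explicitTo.Date, explicitFrom.Date);

            default:
                throw new ArgumentOutOfRangeException("choice");
        }
    }
}
```

With every filter row reduced to a plain from/to pair like this, the later merging only has to deal with one kind of date range.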

All this works OK, but the issue comes when actually running the search for this query. I would like to merge
the filter rows into as few rows as possible such that no two rows overlap in date. The reason is that
when I loop through the log files and each line in the current file, I want to read each line only once, to avoid hurting
performance and run time.

So an example;

Row 1:  Date: a|----------------|d , "Orders=3", "Price=30", "Type=High"
Row 2:  Date:      b|-----|c       , "Orders=All", "Price=62", "Type=Medium"
Row 3:  Date: All logs             , "Orders=4", "Price=20", "Type=Medium"

This should be merged to this:

Row 1:  Date: -*|----|a,  "Orders=4", "Price=20", "Type=Medium"
Row 2:  Date: a|------|b, "Orders=3,4", "Price=30,20", "Type=High,Medium"
Row 3:  Date: b|------|c, "Orders=All", "Price=30,62,20", "Type=High,Medium"
Row 4:  Date: c|------|d, "Orders=3,4", "Price=30,20", "Type=High,Medium"
Row 5:  Date: d|------|+*, "Orders=4", "Price=20", "Type=Medium"

The '-*' and '+*' indicate the lowest/highest available date when 'All logs' is specified. Also
note that the 'All' value takes precedence over other values. This gives a set of non-overlapping intervals which
satisfies the whole query efficiently.
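
A rough sketch of one way the splitting described above could be done: collect every start/end date as a cut point, sort them, and for each pair of adjacent cut points merge the criteria of the rows that cover it. FilterRow, Segment and FilterMerger are hypothetical names, the ranges are assumed to be half-open [From, To), and DateTime.MinValue/MaxValue stand in for '-*' and '+*'. Merging is taken to mean the union of distinct values per key, with "All" replacing everything else.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one filter row from the grid.
public sealed class FilterRow
{
    public DateTime From;   // DateTime.MinValue stands in for '-*'
    public DateTime To;     // DateTime.MaxValue stands in for '+*'
    public Dictionary<string, string> Criteria = new Dictionary<string, string>();
}

// Hypothetical type: one non-overlapping interval plus the merged criteria for it.
public sealed class Segment
{
    public DateTime From;
    public DateTime To;
    public Dictionary<string, List<string>> Criteria = new Dictionary<string, List<string>>();
}

public static class FilterMerger
{
    public static List<Segment> Split(IList<FilterRow> rows)
    {
        // Every From/To is a potential cut point; de-duplicate and sort them.
        List<DateTime> cuts = rows.SelectMany(r => new[] { r.From, r.To })
                                  .Distinct()
                                  .OrderBy(d => d)
                                  .ToList();

        var segments = new List<Segment>();
        for (int i = 0; i + 1 < cuts.Count; i++)
        {
            DateTime segStart = cuts[i];
            DateTime segEnd = cuts[i + 1];

            // A row covers the segment if it spans it completely.
            List<FilterRow> covering =
                rows.Where(r => r.From <= segStart && r.To >= segEnd).ToList();
            if (covering.Count == 0) continue; // gap between rows: nothing to search

            var seg = new Segment { From = segStart, To = segEnd };
            foreach (FilterRow row in covering)
            {
                foreach (KeyValuePair<string, string> kv in row.Criteria)
                {
                    List<string> values;
                    if (!seg.Criteria.TryGetValue(kv.Key, out values))
                        seg.Criteria[kv.Key] = values = new List<string>();
                    if (!values.Contains(kv.Value)) values.Add(kv.Value);
                }
            }

            // "All" takes precedence: it replaces any other values for that key.
            foreach (List<string> values in seg.Criteria.Values)
            {
                if (values.Contains("All")) { values.Clear(); values.Add("All"); }
            }
            segments.Add(seg);
        }
        return segments;
    }
}
```

Fed the three example rows, this produces five segments, and each log line then falls into at most one of them, so every file only needs to be read once. Note that a plain union of values per key loosens the query (for example, it would also match Orders=3 with Price=20), so keeping the original rows attached to each segment may be the safer variant if exact matching matters.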

So, any hints on how to implement such a thing? Or should I consider using some other design approach?