Issue 3342 - RFE logconv.pl should have a replacement in CLI tools
Bug description:
	Perl DB is not available in RHEL 10, so the server access log
	analyzer tool (logconv.pl) needs to be ported to Python.

Fix description:
	Initial draft of logconv.py; this is a work in progress. This
	commit message will be updated following code review/rework.

Fixes: 389ds#3342

Reviewed by:
jchapma committed Dec 2, 2024
1 parent fd62700 commit c114f06
Showing 1 changed file with 1,879 additions and 0 deletions.

2 comments on commit c114f06

@progier389


Hi James:

A few remarks about the general architecture:

  1. A huge function is difficult to read; IMHO you would be better off splitting the if key == / elif chain by using a table of functions.
  2. Splitting the parsing and the statistics computation allows reusing the statistics function when parsing a JSON completed-operation object.

So in short it would look like:

        'RESULT_REGEX': (re.compile(r'''
                           ...
                        ''', re.VERBOSE), process_result_line),
        # i.e. each value is now a tuple of (regex, function to process it)

pending_conns = {}   # A dict of: conn_id -> dict whose 'ops' member is a dict of
                     # op_id -> merged match groups with the same opid/connid
                     # (so that, when closing the connection, we can delete all
                     # ops associated with it with: del pending_conns[conn_id])



    def match_line(self, line, bytes_read):
        for key, (pattern, action) in self.regexes.items():
            match = pattern.match(line)
            if not match:
                continue
            try:
                groups = match.groupdict()
                # The datetime library doesn't support nanoseconds, so we need
                # to "normalize" the timestamp
                ...
                groups['norm_timestamp'] = norm_timestamp
                action(self, groups)
            except IndexError as exc:
                print(f'Access log line {line} is probably truncated: {exc}')
                return
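
The timestamp normalization elided above could, for instance, truncate nanoseconds to microseconds before handing the value to datetime. A sketch, assuming a 389-ds-style access log timestamp such as 28/Oct/2024:10:52:34.366429130 -0400 (the helper name and exact format are illustrative):

```python
from datetime import datetime

def normalize_timestamp(ts):
    # datetime.strptime's %f only accepts up to 6 fractional digits,
    # so truncate the 9-digit nanosecond field to microseconds.
    head, rest = ts.split('.', 1)
    frac, tz = rest.split(' ', 1)
    return datetime.strptime(f'{head}.{frac[:6]} {tz}',
                             '%d/%b/%Y:%H:%M:%S.%f %z')
```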

def process_result_line(self, groups):
        conn_id = groups['conn_id']
        op_id = groups['op_id']
        try:
            conn = pending_conns[conn_id]
            op = conn['ops'][op_id]
        except KeyError:
            # Operation is not present (probably around the start of the log file)
            return
        op.update(groups)
        process_result_statistics(op)
        del conn['ops'][op_id]

def process_search_line(self, groups):
        conn_id = groups['conn_id']
        op_id = groups['op_id']
        try:
            conn = pending_conns[conn_id]
        except KeyError:
            conn = { 'conn_id': conn_id, 'ops': {} }
            pending_conns[conn_id] = conn
        op = groups
        # remainder of search processing (updating op)
        conn['ops'][op_id] = op

def process_result_statistics(self, op):
        # This could be reused when finding a JSON object for a completed operation.
        # Then you have most of the code that currently follows
        #         if key == 'RESULT_REGEX':
        # (replacing all match.group('x') with op['x'])
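Taken together, the snippets above could be sketched as a small runnable toy. Note this is only an illustration of the dispatch-table idea, assuming simplified placeholder regexes and invented names (Parser, results) rather than the real logconv.py patterns:

```python
import re

class Parser:
    def __init__(self):
        # conn_id -> {'conn_id': ..., 'ops': {op_id: merged match groups}}
        self.pending_conns = {}
        self.results = []   # completed operations, ready for statistics
        # Table of (compiled regex, handler) pairs replaces the if/elif chain.
        self.regexes = {
            'SRCH_REGEX': (re.compile(
                r'conn=(?P<conn_id>\d+) op=(?P<op_id>\d+) SRCH base="(?P<base>[^"]*)"'),
                Parser.process_search_line),
            'RESULT_REGEX': (re.compile(
                r'conn=(?P<conn_id>\d+) op=(?P<op_id>\d+) RESULT err=(?P<err>\d+)'),
                Parser.process_result_line),
        }

    def match_line(self, line):
        for key, (pattern, action) in self.regexes.items():
            match = pattern.search(line)
            if match:
                action(self, match.groupdict())
                return

    def process_search_line(self, groups):
        # Create the connection entry on first sight, then record the op.
        conn = self.pending_conns.setdefault(
            groups['conn_id'], {'conn_id': groups['conn_id'], 'ops': {}})
        conn['ops'][groups['op_id']] = groups

    def process_result_line(self, groups):
        try:
            conn = self.pending_conns[groups['conn_id']]
            op = conn['ops'][groups['op_id']]
        except KeyError:
            return   # operation started before the beginning of the log
        op.update(groups)
        self.process_result_statistics(op)
        del conn['ops'][groups['op_id']]

    def process_result_statistics(self, op):
        # Parsing and statistics are kept separate, so this method can be
        # reused when reading a completed-operation JSON object directly.
        self.results.append(op)
```

Feeding it a SRCH line followed by the matching RESULT line leaves one merged operation dict (base, err, conn_id, op_id) in results, and the op is removed from pending_conns.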

Apart from that, it looks fine.
I like the use of heapq.

@jchapma
Owner Author

@jchapma jchapma commented on c114f06 Dec 9, 2024


Hi @progier389

I think this is an excellent idea.
