Regular expression - bracket expression, corner case #114

Open
cspiel opened this issue Jun 1, 2018 · 1 comment

Comments

Collaborator

cspiel commented Jun 1, 2018

When trying to capture text enclosed in (non-nested) square brackets,
as in, for example,

mooplot.cc:28:26: warning: pass by value and use std::move [modernize-pass-by-value]

with a longest, leftmost regular-expression engine, I come up with

warning:[^[]*\[\([^]]*\)\]

relying on Section 9.3.5 #1 of the POSIX Standard, which states

The right-square-bracket (']') shall lose its special meaning and
represent itself in a bracket expression if it occurs first in the
list (after an initial circumflex ('^'), if any).

Thus I was surprised that osh(1) barfs:

Malformed regular expression 'warning:[^[]*\[\([^]]*\)\]': Lm_lexer: regex: mismatched parenthesis
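
For comparison, here is a minimal sketch in Python of how an engine that honors the first-position rule for ']' treats the same pattern. Python's re module is not a POSIX basic-regular-expression engine (groups are written (...) rather than \(...\)), so the pattern below is a transliteration; the transliteration and the variable names are assumptions made purely for illustration.

```python
import re

sample = ("mooplot.cc:28:26: warning: pass by value and use std::move "
          "[modernize-pass-by-value]")

# Transliteration of  warning:[^[]*\[\([^]]*\)\]  into Python syntax.
# '[^[]' excludes '[', and '[^]]' excludes ']' because the ']' comes
# first in the list (right after the '^') and so stands for itself.
pattern = re.compile(r"warning:[^[]*\[([^]]*)\]")

match = pattern.search(sample)
captured = match.group(1) if match else None
print(captured)
```

If the engine follows the rule, the capture group holds the text between the square brackets.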

Replacing the opening or closing square brackets inside the bracket
expressions with their octal or hexadecimal equivalents works ok.
This means that just the corner case of literal square brackets is not
covered. (I'm aware that Omake/Osh do not claim to be POSIX-compliant.)
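
The equivalence of the escaped and literal spellings can be sketched in Python, whose re module accepts octal and hexadecimal escapes inside bracket expressions. The patterns are again transliterated to Python's group syntax, and the dictionary keys are hypothetical labels; this only illustrates that all four spellings denote the same character sets, not how osh parses them.

```python
import re

sample = "warning: pass by value and use std::move [modernize-pass-by-value]"

# '[' is octal 133 / hex 5B and ']' is octal 135 / hex 5D in ASCII.
variants = {
    "octal": r"warning:[^\133]*\[([^\135]*)\]",
    "hex (lowercase)": r"warning:[^\x5b]*\[([^\x5d]*)\]",
    "hex (uppercase)": r"warning:[^\x5B]*\[([^\x5D]*)\]",
    "character": r"warning:[^[]*\[([^]]*)\]",
}

# Every variant should capture the same bracketed text.
results = {name: re.search(p, sample).group(1) for name, p in variants.items()}
for name, text in results.items():
    print(f"{name}: {text}")
```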

Full demo program:

###  match.osh

.LANGUAGE: program

print_newline() =
        print($'''
''')

show_sample_text(a_sample) =
        println($'Sample Text')
        println($'-----------')
        println($"'$(a_sample)'")
        print_newline()

match_string_against_patterns(a_sample, some_patterns) =
        println($'Lex-search')
        println($'----------')
        foreach(p => ..., some_patterns)
            println($"pattern '$p'")
            have_found_pattern = false
            channel = open-in-string(a_sample)
            lex-search(channel)
            case p
                println($"matched '$1'")
                have_found_pattern = true
                export
            if not(have_found_pattern)
                println($'pattern did NOT match!')
            close(channel)
            print_newline()

##  Excerpt from the ascii(7) manual page.
##
##      Oct   Dec   Hex   Char
##      ──────────────────────────
##      133   91    5B    [
##      134   92    5C    \  '\\'
##      135   93    5D    ]
patterns[] =
    $'warning:[^\133]*\[\([^\135]*\)\]'    # octal notation
    $'warning:[^\x5b]*\[\([^\x5d]*\)\]'    # hexadecimal notation (lowercase)
    $'warning:[^\x5B]*\[\([^\x5D]*\)\]'    # hexadecimal notation (uppercase)
    $'warning:[^[]*\[\([^]]*\)\]'          # character notation

sample = $'mooplot.cc:28:26: warning: pass by value and use std::move [modernize-pass-by-value]'

show_sample_text(sample)
match_string_against_patterns(sample, patterns)

Simply say osh match.osh to reproduce the error.


ANogin commented Jun 1, 2018

To be honest, I no longer have any clue how/why we ended up with our own custom regex engine in omake (or more precisely, libmojave). Perhaps there were no good alternatives back in the day. But it seems wrong or at least completely unnecessary today - I would recommend replacing it with a more standard library.
