  1. Default Regex - Is this possible?


    Desired Functionality:
    Using Python regular expressions (Perl-like), is it possible to write a pattern that only returns matches it has not already found?

    Examples:

    Input String:
    1 1 1 2 2 2 3 3 3 4 4 4 40 40 40 45 45 45

    Regex:
    ????

    Expected Output:
    ['1', '2', '3', '4', '40', '45']

    ---------------------------------

    Input String:
    the the quick quick brown brown fox fox jumped jumped over over the the lazy lazy dog dog

    Regex:
    ????

    Expected Output:
    ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'lazy', 'dog']

    ---------------------------------

    Input String:
    dog lazy the over jumped fox brown quick the the quick brown fox jumped over the lazy dog

    Regex:
    ????

    Expected Output:
    ['dog', 'lazy', 'the', 'over', 'jumped', 'fox', 'brown', 'quick']

  2. Default


    Do the words have to be in succession to count as repeats? Yes : No

  3. Default


    No.

    Input:
    1 2 3 4 5 3 2 4 5 1

    Regex:
    ????

    Output:
    ['1', '2', '3', '4', '5']

    That answer your question?

  4. Default


    (\w+\s)\1

    finds duplicate words in succession, but you have to call the regex in a loop until it doesn't find any more duplicates.

    Haven't tested it but you could try the following with a loop:

    (\w+\s).*\1
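A minimal sketch of the loop idea above, for the in-succession case only. The pattern here is an assumed variant of the one in the post: it uses `\b` anchors and `\s+` instead of a captured trailing space, so the last word in the string (which has no whitespace after it) is handled too. The function name `collapse_adjacent` is made up for illustration.

```python
import re

# Assumed variant of (\w+\s)\1: word boundaries instead of a captured space,
# so "dog dog" at the very end of the string still matches.
ADJACENT_DUP = re.compile(r'\b(\w+)\s+\1\b')

def collapse_adjacent(text):
    """Collapse 'word word' into 'word' repeatedly until nothing changes."""
    previous = None
    while previous != text:
        previous = text
        text = ADJACENT_DUP.sub(r'\1', text)
    return text

print(collapse_adjacent("1 1 1 2 2 2 3 3 3 4 4 4 40 40 40 45 45 45").split())
# ['1', '2', '3', '4', '40', '45']
```

Note that this only removes repeats that sit next to each other. In the second example from the first post, the later "the the" collapses to a single "the" but is not merged with the first one.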

  5. Default


    If you're not restricted to the regex itself, what you could do is iterate through all matches, adding them to a set. Then convert it to a list. Does not preserve order, though.

    A less efficient method that does preserve order would just check if the match is in the list (linear time) before adding it.
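A rough sketch of the two ideas above combined: a set for the "have I seen this?" check and a list to remember first-seen order, so the membership test is not a linear scan. The helper name `unique_matches` and the `\w+` pattern are assumptions for illustration, not something from the thread.

```python
import re

def unique_matches(pattern, text):
    """Return each distinct match once, in order of first appearance."""
    seen = set()      # fast membership test
    ordered = []      # preserves first-seen order
    for match in re.finditer(pattern, text):
        token = match.group(0)
        if token not in seen:
            seen.add(token)
            ordered.append(token)
    return ordered

text = "dog lazy the over jumped fox brown quick the the quick brown fox jumped over the lazy dog"
print(unique_matches(r'\w+', text))
# ['dog', 'lazy', 'the', 'over', 'jumped', 'fox', 'brown', 'quick']
```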

  6. Default


    You'd think a powerful parser like regex would be able to return only unique strings. I guess there's no management of that. :(

    I know I could do...

    list(set(re.findall(r'\d+', text)))

    But I was just wondering if there was a way to do this just through regex.
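For what it's worth, one regex-only approach is a negative lookahead with a backreference: match a word only if the same word does not appear again later in the string. This is a sketch under assumptions (single-line input, tokens made of `\w` characters), and it keeps the last occurrence of each word rather than the first, so the order can differ from the expected output in the first post. The lookahead also rescans the rest of the string for every candidate, so it is unlikely to be faster than using a set.

```python
import re

# Match a whole word only when no later copy of it exists in the string.
# Keeps the LAST occurrence of each word, not the first.
LAST_OCCURRENCE = re.compile(r'\b(\w+)\b(?!.*\b\1\b)')

print(LAST_OCCURRENCE.findall("1 1 1 2 2 2 3 3 3 4 4 4 40 40 40 45 45 45"))
# ['1', '2', '3', '4', '40', '45']

print(LAST_OCCURRENCE.findall("1 2 3 4 5 3 2 4 5 1"))
# ['3', '2', '4', '5', '1']
```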

  7. Default


    There may be; I don't know regex all that well (I'm more familiar with the less featureful Lua pattern matching). The question is, why would you need one?

  8. Default


    Because Python's regex engine is implemented in C and therefore should run much faster than doing it with Python methods.

  9. Default


    I haven't used C in a long time but for doing unique word counts in Perl, the easiest and fastest way to do it would be with a hash table. I forget how those are implemented in C.

  10. Default


    Python dictionary is basically a hash table, and a set is basically a dictionary that only holds empty values, so yeah.

    And regex is written in regex. (That or Python is written in C, take your pick.) Either way you'd still need some way to represent a set, or in the absence of one you'd check previous values by doing a linear traversal which is even worse.
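To illustrate the dictionary-as-hash-table point, here is a short sketch that uses dictionary keys as the set. It assumes Python 3.7 or later, where dictionaries are guaranteed to keep insertion order; on older versions the order of the result is not guaranteed. The `\w+` pattern is an assumption for illustration.

```python
import re

text = "dog lazy the over jumped fox brown quick the the quick brown fox jumped over the lazy dog"

# dict.fromkeys() keeps one key per distinct token; on Python 3.7+ the keys
# stay in the order they were first inserted.
unique_in_order = list(dict.fromkeys(re.findall(r'\w+', text)))
print(unique_in_order)
# ['dog', 'lazy', 'the', 'over', 'jumped', 'fox', 'brown', 'quick']
```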

  11. Default


    Wouldn't tokenizing while reading through the string and putting them in a hash-table work fine?

  12. Default


    Creating a lexical analyzer is way out of the scope of my project. I'll just work with what I have. Thanks!

  13. Default


    This is the best way of doing it. If you need to preserve order, use an ordered set data structure.

    Iterating through the matches and inserting each match into a set will not take much more time than the regex alone. Go ahead, try it.
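One way to try it is a quick `timeit` comparison like the sketch below. The test string, repeat counts, and function names are all made up for illustration, and the numbers will vary by machine and Python version, so no timings are claimed here.

```python
import re
import timeit

# Hypothetical test input: the sample sentence repeated many times.
text = "the quick brown fox jumped over the lazy dog " * 1000
WORD = re.compile(r'\w+')

def regex_only():
    # Just the regex scan, duplicates included.
    return WORD.findall(text)

def regex_plus_set():
    # Same scan, then deduplicate with a set.
    return set(WORD.findall(text))

t_regex = timeit.timeit(regex_only, number=50)
t_both = timeit.timeit(regex_plus_set, number=50)
print(f"findall only: {t_regex:.4f}s   findall + set: {t_both:.4f}s")
```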


  14. Default


    Why is it premature optimization to try to use a well-established regular expression parser rather than write my own method? I don't see the resemblance. All programmers ought to look for existing, standardized implementations before writing their own solutions, to avoid reinventing the wheel.

  15. Default


    I think the "wheel" of this problem is doing a unique sort on the list, and you will probably find library functions available in C for that task.
