[Asterisk-code-review] digium commits: Improve reliability of extracting user from ... (repotools[master])

Matt Jordan asteriskteam at digium.com
Tue Jun 9 23:07:36 CDT 2015


Matt Jordan has posted comments on this change.

Change subject: digium_commits: Improve reliability of extracting user from messages
......................................................................


Patch Set 1:

(2 comments)

https://gerrit.asterisk.org/#/c/608/1/digium_commits.py
File digium_commits.py:

Line 57:        realname = None
       :         email = None
       :         license = None
       : 
       :         name = raw_user.strip()
       :         if ':' in name:
       :             name.replace(':', '')
       :         if '<' in name:
       :             realname = name[:name.index('<')]
       :             email = name[(name.index('<') + 1):name.index('>')]
       :         if '(' in name:
       :             if not realname:
       :                 realname = name[:name.index('(')]
       :                 license = name[(name.index('(') + 1):name.index(')')]
       :         if not realname:
       :             realname = name
       :         if len(realname) == 0:
       :             return None
       : 
       :         realname = realname.strip()
       :         user = get_user_by_username(realname)
       :         if email:
       :             user.email = email.strip()
       :         if license:
       :             user.license = license.strip()
> Would a regex be a good fit for this? As it is right now, this seems prone 
I'd prefer not to use a regex, for two reasons:
1) The previous regexs for "Tested by" and "Reported by" had subtle bugs that were not easy to diagnose until we had a lot of commits messages pass through. It's easy to get regexs wrong, and we have a long and glorious history of getting them wrong enough.
2) If a commit message has "<mjordan at digium.com" or "(License 5000", we should be catching that in peer review. This doesn't have to accept completely arbitrary input; it has to accept peer reviewed input.

If index fails to find the item, it will raise an exception. That's not really a terrible thing, since we can't parse the name.


Line 163:             if 'uploaded by' in token.lower():
        :                 name = token[token.lower().index('uploaded by') + 11:]
        :             if 'submitted by' in token.lower():
        :                 name = token[token.lower().index('submitted by') + 12:]
> If you're normalizing these sorts of things, should "uploaded by" and "subm
Our commit message guidelines do not allow for hyphens, or colons.

https://wiki.asterisk.org/wiki/display/AST/Commit+Messages

The only reason we have "uploaded by" is because that is what we used to specify in our commit message guidelines.

There's only so much we can feasibly work around. If we allow for arbitrarily formatted commit messages, it will break the commit message parser at some point. I'm not sure how much it's worth trying to work around poorly formatted or incorrect messages.


-- 
To view, visit https://gerrit.asterisk.org/608
To unsubscribe, visit https://gerrit.asterisk.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id85d719aa2cc59015f1b49539563ddbb7b2b1928
Gerrit-PatchSet: 1
Gerrit-Project: repotools
Gerrit-Branch: master
Gerrit-Owner: Matt Jordan <mjordan at digium.com>
Gerrit-Reviewer: Mark Michelson <mmichelson at digium.com>
Gerrit-Reviewer: Matt Jordan <mjordan at digium.com>
Gerrit-HasComments: Yes



More information about the asterisk-code-review mailing list