Accessing Backreferences ruby
2008-03-28 12:37:58
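Copyright notice: this is an original work. Reproduction is permitted, provided that the original source, the author information, and this notice are indicated with a hyperlink; otherwise legal liability will be pursued. http://fsjoy.blog.51cto.com/318484/68556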
The special global variables $1, $2, and so on, can be used to reference matches:
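A minimal sketch of this, reusing the sample string and pattern from the examples later in this section. The helper name split_parts is hypothetical and exists only for illustration:

```ruby
# Hypothetical helper: returns the three captures, or nil when nothing matches.
def split_parts(str)
  if /(a\d+)(b\d+)(c\d+)/ =~ str
    # After a successful match, $1, $2, $3 hold the captured groups.
    [$1, $2, $3]
  end
end

p split_parts("a123b45c678")   # prints ["a123", "b45", "c678"]
```

The globals are only meaningful after a successful match, which is why they are read inside the if.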
Within the replacement-string argument of a substitution such as sub or gsub, however, these variables do not behave as one might expect:
str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/, "1st=#$1, 2nd=#$2, 3rd=#$3")
# "1st=, 2nd=, 3rd="
Why didn't this work? Because the arguments to sub are evaluated before sub is called. This code is equivalent:
str = "a123b45c678"
s2 = "1st=#$1, 2nd=#$2, 3rd=#$3"
reg = /(a\d+)(b\d+)(c\d+)/
str.sub(reg,s2)
# "1st=, 2nd=, 3rd="
This code, of course, makes it much clearer that the values $1 through $3 are unrelated to the match done inside the sub call.
In this kind of case, the special codes \1, \2, and so on, can be used:
str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/, '1st=\1, 2nd=\2, 3rd=\3')
# "1st=a123, 2nd=b45, 3rd=c678"
Notice that we used single quotes (hard quotes) in the preceding example. If we used double quotes (soft quotes) in a straightforward way, the backslashed items would be interpreted as octal escape sequences:
str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/, "1st=\1, 2nd=\2, 3rd=\3")
# "1st=\001, 2nd=\002, 3rd=\003"
Doubling each backslash inside the double quotes avoids this:
str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/, "1st=\\1, 2nd=\\2, 3rd=\\3")
# "1st=a123, 2nd=b45, 3rd=c678"
It's also possible to use the block form of a substitution, in which case the global variables may be used:
str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/) { "1st=#$1, 2nd=#$2, 3rd=#$3" }
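# "1st=a123, 2nd=b45, 3rd=c678"
When using a block in this way, it is not possible to use the special backslashed numbers inside a double-quoted string (or even a single-quoted one). This is reasonable if you think about it: the block's return value is inserted as the replacement text literally, with no further interpretation of backslash sequences.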
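The block-form behavior described above can be sketched as follows. Both helper names are hypothetical; the first shows that a backslashed number survives untouched, and the second shows that the block may run ordinary code on the captures:

```ruby
# Hypothetical helpers illustrating the block form of sub.

# The block's return value is inserted literally, so '\1' is not expanded.
def literal_backslash(str)
  str.sub(/(a\d+)(b\d+)(c\d+)/) { '1st=\1' }
end

# Inside the block, $1..$3 refer to the current match and can be processed.
def reversed_upcase(str)
  str.sub(/(a\d+)(b\d+)(c\d+)/) { "#$3#$2#$1".upcase }
end

p literal_backslash("a123b45c678")  # prints "1st=\\1"
p reversed_upcase("a123b45c678")    # prints "C678B45A123"
```

The ability to compute the replacement is what the block form offers over the string form.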
As an aside here, I will mention the possibility of noncapturing groups. Sometimes you may want to regard characters as a group for purposes of crafting a regular expression; but you may not need to capture the matched value for later use. In such a case, you can use a noncapturing group, denoted by the (?:...) syntax:
str = "a123b45c678"
str.sub(/(a\d+)(?:b\d+)(c\d+)/, "1st=\\1, 2nd=\\2, 3rd=\\3")
# "1st=a123, 2nd=c678, 3rd="
In the preceding example, the second grouping was thrown away, and what was the third submatch became the second.
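A common reason to group without capturing is to apply a repetition operator to several characters at once. The following is a minimal sketch with hypothetical helper names and a made-up input string, comparing the two kinds of group:

```ruby
# Hypothetical helpers comparing a capturing and a noncapturing group.

# (ab)+ is captured, so the digits end up in group 2.
def captures_with_group(str)
  m = /(ab)+(\d+)/.match(str)
  m ? m.captures : nil
end

# (?:ab)+ only groups for repetition, so the digits are group 1.
def captures_without_group(str)
  m = /(?:ab)+(\d+)/.match(str)
  m ? m.captures : nil
end

p captures_with_group("ababab42")     # prints ["ab", "42"]
p captures_without_group("ababab42")  # prints ["42"]
```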
I personally don't like either the \1 or the $1 notations. They are convenient sometimes, but it isn't ever necessary to use them. We can do it in a "prettier," more object-oriented way.
The class method Regexp.last_match returns an object of class MatchData (as does the instance method match). This object has instance methods that enable the programmer to access backreferences.
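As a rough sketch of how Regexp.last_match might be used after an ordinary =~ match (the helper name is hypothetical, and the sample string is the one used earlier in this section):

```ruby
# Hypothetical helper: inspects the MatchData left behind by the last match.
def last_match_parts(str)
  return nil unless str =~ /(a\d+)(b\d+)(c\d+)/
  md = Regexp.last_match             # MatchData for the most recent match
  # Regexp.last_match(n) is a shortcut for Regexp.last_match[n].
  [md.class, md[0], md[1], Regexp.last_match(2)]
end

p last_match_parts("a123b45c678")
# prints [MatchData, "a123b45c678", "a123", "b45"]
```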
The MatchData object is manipulated with a bracket notation as though it were an array of matches. The special element 0 contains the text of the entire matched string. Thereafter, element n refers to the nth match:
pat = /(.+[aiu])(.+[aiu])(.+[aiu])(.+[aiu])/i
# Four identical groups in this pattern
refs = pat.match("Fujiyama")
# refs.to_a is now: ["Fujiyama","Fu","ji","ya","ma"]
x = refs[1]
y = refs[2..3]
refs.to_a.each {|s| print "#{s}\n"}
Note that the object refs is not a true array. Thus when we want to treat it as one by using the iterator each, we must use to_a (as shown) to convert it to an array.
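When only the groups are of interest, MatchData also provides captures, which is like to_a without element 0. A minimal sketch, using a hypothetical helper name and the pattern from the example above:

```ruby
# Hypothetical helper: returns only the captured groups as a real Array.
def syllables(word)
  refs = /(.+[aiu])(.+[aiu])(.+[aiu])(.+[aiu])/i.match(word)
  # captures omits element 0 (the whole match); to_a would include it.
  refs ? refs.captures : []
end

p syllables("Fujiyama")   # prints ["Fu", "ji", "ya", "ma"]
```

Because the result is an ordinary array, each, map, and the other iterators can be used on it directly.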
We may use more than one technique to locate a matched substring within the original string. The methods begin and end return the beginning and ending offsets of a match. (It is important to realize that the ending offset is really the index of the next character after the match.)
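Because the ending offset is exclusive, it pairs naturally with a three-dot range when slicing the original string. A minimal sketch, with a hypothetical helper name and a shortened sample string:

```ruby
# Hypothetical helper: recovers group n by slicing with its offsets.
def slice_group(str, pat, n)
  m = pat.match(str)
  return nil unless m
  # end(n) is one past the last matched character, hence the exclusive range.
  str[m.begin(n)...m.end(n)]
end

p slice_group("alpha beta gamma", /(b[^ ]+)/, 1)   # prints "beta"
```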
str = "alpha beta gamma delta epsilon"
#      0....5....0....5....0....5....
#      (for your counting convenience)
pat = /(b[^ ]+ )(g[^ ]+ )(d[^ ]+ )/
# Three words, each one a single match
refs = pat.match(str)
# "beta "
p1 = refs.begin(1) # 6
p2 = refs.end(1) # 11
# "gamma "
p3 = refs.begin(2) # 11
p4 = refs.end(2) # 17
# "delta "
p5 = refs.begin(3) # 17
p6 = refs.end(3) # 23
# "beta gamma delta "
p7 = refs.begin(0) # 6
p8 = refs.end(0) # 23
Similarly, the offset method returns an array of two numbers, which are the beginning and ending offsets of that match. To continue the previous example:
range0 = refs.offset(0) # [6,23]
range1 = refs.offset(1) # [6,11]
range2 = refs.offset(2) # [11,17]
range3 = refs.offset(3) # [17,23]
The portions of the string before and after the matched substring can be retrieved by the instance methods pre_match and post_match, respectively. To continue the previous example:
before = refs.pre_match # "alpha "
after = refs.post_match # "epsilon"
This article comes from the "李骥平" blog on 51CTO.COM; please retain this attribution: http://fsjoy.blog.51cto.com/318484/68556


