Processing a String by Complex Separators and Patterns
Given a string with complex and variable separators, how do I break it into words?
Splitting a string on a single separator character is fairly simple in Go, but what about messier input? If a string uses several separators that should all be treated equally, then Regexp.Split, a method on a compiled regular expression from the regexp package, is the tool for you:
test_regexp_separator.go
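package main

import (
	"fmt"
	"regexp"
)

func main() {
	s := "one#two;three"
	// The character class [#;] treats '#' and ';' as equivalent separators.
	// A negative count asks Split to return every substring.
	words := regexp.MustCompile("[#;]").Split(s, -1)
	for i, word := range words {
		// Println already inserts a space between operands.
		fmt.Println(i, "=>", word)
	}
}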
$ go run test_regexp_separator.go
0 => one
1 => two
2 => three
However, some strings are more complex still, with tokens wrapped in surrounding patterns. For these situations, Regexp.FindAllStringSubmatch combined with regexp capture groups will do what you need.
FindAllStringSubmatch searches a string for every match of the regexp and returns a slice with one entry per match. Each entry is itself a slice of strings: the full matched text first, followed by the submatches found by the capture groups.
test_regexp_match.go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	s := "[one][two][three]"
	// (.*?) lazily captures the text between each pair of brackets.
	matches := regexp.MustCompile(`\[(.*?)\]`).FindAllStringSubmatch(s, -1)
	if matches == nil {
		fmt.Println("No matches found.")
		return
	}
	for i, match := range matches {
		full := match[0]        // the entire matched text, brackets included
		submatches := match[1:] // one entry per capture group
		fmt.Printf("%d => %q from %q\n", i, submatches[0], full)
	}
}
$ go run test_regexp_match.go
0 => "one" from "[one]"
1 => "two" from "[two]"
2 => "three" from "[three]"