The Go Cookbook

A community-built, community-contributed collection of practical recipes for real-world Golang development.

Processing a String by Complex Separators and Patterns

Given a string with complex and variable separators, how do I break it into words?

Splitting a string on a single separator character is fairly simple in Go, but what about messier strings? If a string uses several separators that should all be treated equally, then the Split method of a compiled regular expression (regexp.Regexp.Split) is the tool for you:

test_regexp_separator.go
package main

import (
	"fmt"
	"regexp"
)

func main() {
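	s := "one#two;three"
	// The character class [#;] matches either separator; -1 asks for all substrings.
	words := regexp.MustCompile("[#;]").Split(s, -1)
	// Ranging over a nil or empty slice is a no-op, so no nil check is needed.
	for i, word := range words {
		fmt.Println(i, " => ", word)
	}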
}
$ go run test_regexp_separator.go
0  =>  one
1  =>  two
2  =>  three
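
One detail the example above does not exercise: when separators appear back to back, a single-character class produces empty strings between them. The following is a minimal sketch of one way to handle that, assuming runs of separators should count as one. The helper name splitWords, the variable sepRun, and the sample input are illustrative, not part of the original recipe.

```go
package main

import (
	"fmt"
	"regexp"
)

// sepRun matches one or more consecutive separator characters.
// (Illustrative name; the "+" quantifier is the only change from "[#;]".)
var sepRun = regexp.MustCompile(`[#;]+`)

// splitWords is a hypothetical helper that splits s on runs of '#' or ';'.
func splitWords(s string) []string {
	return sepRun.Split(s, -1)
}

func main() {
	s := "one#;two;;three"

	// Single-character class: adjacent separators yield empty strings.
	fmt.Printf("%q\n", regexp.MustCompile(`[#;]`).Split(s, -1))

	// Run-matching class: adjacent separators collapse into one split point.
	fmt.Printf("%q\n", splitWords(s))
}
```

Run as written, this prints ["one" "" "two" "" "three"] for the single-character class and ["one" "two" "three"] for the run-matching one. A separator at the very start or end of the string still yields an empty first or last element, so filter those out if they matter.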

However, some strings are more complex still, with each token wrapped in a pattern of its own. For these situations, the FindAllStringSubmatch method of a compiled regexp, combined with capture groups, will do what you need.

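FindAllStringSubmatch searches a string for every match of the regexp and returns a slice with one entry per match. Each entry is itself a slice of strings: the full matched text first, followed by the submatch for each capture group. It returns nil when nothing matches.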

test_regexp_match.go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	s := "[one][two][three]"
	// (.*?) is a non-greedy capture group, so each match stops at the first "]".
	matches := regexp.MustCompile(`\[(.*?)\]`).FindAllStringSubmatch(s, -1)
	if matches == nil {
		fmt.Println("No matches found.")
		return
	}

	for i, match := range matches {
		full := match[0]        // the entire matched text, e.g. "[one]"
		submatches := match[1:] // one entry per capture group
		fmt.Printf("%v => \"%v\" from \"%v\"\n", i, submatches[0], full)
	}
}
$ go run test_regexp_match.go
0 => "one" from "[one]"
1 => "two" from "[two]"
2 => "three" from "[three]"
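
The same approach extends to patterns with more than one capture group, where each match carries several submatches. The sketch below assumes a made-up "<key:value>" token format; the pattern, the helper name parsePairs, and the sample input are all hypothetical and only meant to show how the inner slices are indexed.

```go
package main

import (
	"fmt"
	"regexp"
)

// pairPattern is an illustrative pattern with two capture groups:
// a word-like key and a numeric value wrapped as "<key:value>".
var pairPattern = regexp.MustCompile(`<(\w+):(\d+)>`)

// parsePairs is a hypothetical helper that returns the captured
// key/value pairs in the order they appear in s.
func parsePairs(s string) [][2]string {
	var pairs [][2]string
	for _, m := range pairPattern.FindAllStringSubmatch(s, -1) {
		// m[0] is the full match; m[1] and m[2] are the two capture groups.
		pairs = append(pairs, [2]string{m[1], m[2]})
	}
	return pairs
}

func main() {
	// Text between tokens is ignored because it does not match the pattern.
	for _, p := range parsePairs("<width:80> noise <height:24>") {
		fmt.Printf("%s = %s\n", p[0], p[1])
	}
}
```

Because ranging over a nil result is a no-op, parsePairs returns a nil slice when the input contains no tokens, and callers can check len(pairs) instead of comparing against nil.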