Processing a String One Word or Character at a Time
Given a string, how do I break it into words or characters and process each one in turn?
Each Character
Because of Go’s built-in support for Unicode code points (“runes”), processing a string one character at a time is straightforward: iterate over the string with range. Each iteration yields the byte offset at which a rune starts, followed by the rune itself:
package main

import "fmt"

func main() {
	// i is the byte offset of the rune, c is the rune itself.
	for i, c := range "abc" {
		fmt.Println(i, "=>", string(c))
	}
}
$ go run test_each_char.go
0 => a
1 => b
2 => c
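The index in the loop above is a byte offset, not a character count, which only becomes visible once the string contains non-ASCII text. The following is a minimal sketch of that behaviour; the helper name runeOffsets and the sample string are illustrative assumptions, not part of the original recipe.

```go
package main

import "fmt"

// runeOffsets (hypothetical helper) returns the byte offset at which each
// rune of s starts, exactly as reported by the index of a range loop.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "h\u00e9llo" // "héllo": the é occupies two bytes in UTF-8

	// The offsets skip from 1 to 3 because é is two bytes long.
	fmt.Println(runeOffsets(s)) // [0 1 3 4 5]

	// Converting to []rune first gives character positions instead.
	for i, c := range []rune(s) {
		fmt.Println(i, "=>", string(c))
	}
}
```

If you need character positions rather than byte offsets, ranging over []rune(s) is the simplest option, at the cost of copying the string.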
Each Word
Processing a string one word at a time is a bit more involved, and the right approach depends on your needs. If you’re fine with the unsophisticated approach of cutting the string into words wherever whitespace occurs, then you’re in luck: strings.Fields was built just for you. It splits on runs of one or more whitespace characters and never returns empty words:
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Fields splits around runs of whitespace; punctuation stays attached.
	words := strings.Fields("This, that, and the other.")
	for i, word := range words {
		fmt.Println(i, "=>", word)
	}
}
$ go run test_words.go
0 => This,
1 => that,
2 => and
3 => the
4 => other.
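When the text is large or arrives from a reader rather than sitting in a string, the same whitespace-based splitting can be done one word at a time with a bufio.Scanner and the bufio.ScanWords split function. This is only a sketch of that alternative; the helper name scanWords and the sample input are assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanWords (hypothetical helper) returns the whitespace-separated words
// of s, reading them one at a time through a bufio.Scanner.
func scanWords(s string) []string {
	var words []string
	scanner := bufio.NewScanner(strings.NewReader(s))
	scanner.Split(bufio.ScanWords) // tokenize on whitespace instead of lines
	for scanner.Scan() {
		words = append(words, scanner.Text())
	}
	return words
}

func main() {
	// Newlines and repeated spaces are treated like any other whitespace.
	for i, word := range scanWords("This, that,\n  and the other.") {
		fmt.Println(i, "=>", word)
	}
}
```

For a string that is already in memory, strings.Fields is shorter and does the same job; the scanner is mainly useful when the input is a file or a network stream.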
Without Punctuation
Most applications, however, need to deal with punctuation rather than leave it attached to the words. Here we have two options. The first is a strings.Replacer, which we create with the strings.NewReplacer function by passing it pairs of old and new strings:
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "This, that, and the other."
	// Arguments are old/new pairs: each punctuation mark becomes "".
	replacer := strings.NewReplacer(",", "", ".", "", ";", "")
	s = replacer.Replace(s)
	words := strings.Fields(s)
	for i, word := range words {
		fmt.Println(i, "=>", word)
	}
}
$ go run test_without_punctuation.go
0 => This
1 => that
2 => and
3 => the
4 => other
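Or we can achieve a bit more clarity with strings.Map, which calls a function on every rune of the string and drops any rune for which that function returns a negative value: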
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Returning -1 tells strings.Map to drop the rune.
	removePunctuation := func(r rune) rune {
		if strings.ContainsRune(".,:;", r) {
			return -1
		}
		return r
	}
	s := "This, that, and the other."
	s = strings.Map(removePunctuation, s)
	words := strings.Fields(s)
	for i, word := range words {
		fmt.Println(i, "=>", word)
	}
}
$ go run test_without_punctuation_using_map.go
0 => This
1 => that
2 => and
3 => the
4 => other
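Both approaches above require listing every punctuation mark you expect to meet. A possible alternative, sketched here under the assumption that a "word" is any run of letters or digits, is to let strings.FieldsFunc split on everything else in a single pass. The helper name wordsOnly is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// wordsOnly (hypothetical helper) splits s around every run of runes
// that are neither letters nor digits, so punctuation and whitespace
// both act as separators.
func wordsOnly(s string) []string {
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	}
	return strings.FieldsFunc(s, isSeparator)
}

func main() {
	for i, word := range wordsOnly("This, that, and the other.") {
		fmt.Println(i, "=>", word)
	}
}
```

The trade-off is that apostrophes and hyphens also count as separators under this definition, so a word such as don’t would be split in two; adjust the predicate if your text needs them kept.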
Special Separators
There are other situations where you’d want to split a string on a separator other than whitespace. The UNIX /etc/passwd file, for example, contains lines of fields separated by colons. Splitting each line into its fields is easy in Go with the strings.Split function. Unlike strings.Fields, strings.Split cuts at every occurrence of the separator you give it, so adjacent separators produce empty strings instead of being collapsed:
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "root:*:0:0:System Administrator:/root:/bin/sh"
	// Split cuts s at every ":" and returns the pieces in order.
	words := strings.Split(s, ":")
	for i, word := range words {
		fmt.Println(i, "=>", word)
	}
}
$ go run test_separator.go
0 => root
1 => *
2 => 0
3 => 0
4 => System Administrator
5 => /root
6 => /bin/sh
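One detail worth seeing in action is how strings.Split treats two separators in a row: it returns an empty string for the gap, which keeps later fields at their expected positions. The sketch below uses a made-up passwd-style line with an empty description field; both the line and the helper name passwdFields are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// passwdFields (hypothetical helper) splits a colon-separated line
// into its fields, keeping empty fields in place.
func passwdFields(line string) []string {
	return strings.Split(line, ":")
}

func main() {
	// Made-up entry whose fifth field (the description) is empty.
	fields := passwdFields("daemon:*:1:1::/var/root:/usr/bin/false")
	for i, field := range fields {
		fmt.Printf("%d => %q\n", i, field)
	}

	// By contrast, strings.Fields collapses repeated separators.
	fmt.Println(len(strings.Split("a  b", " "))) // 3
	fmt.Println(len(strings.Fields("a  b")))     // 2
}
```

Printing with %q makes the empty field visible as "" instead of a blank.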