Strings, bytes, and runes
In Go a string is an immutable sequence of bytes. String literals
are encoded in UTF-8. A rune is an alias for int32 and
represents a single Unicode code point.
Understanding the difference between bytes and runes is essential to avoid bugs when a string contains accented letters, emoji, ideographs, or any other non-ASCII character.
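To make the byte/rune distinction concrete, here is a minimal sketch that uses only the standard library's unicode/utf8 package. It shows that a rune is just an int32 code point and that its UTF-8 encoding can take from 1 to 4 bytes. The helper name describe is hypothetical, chosen for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe (hypothetical helper) returns a rune's code point and the
// number of bytes its UTF-8 encoding takes.
func describe(r rune) (codePoint int32, size int) {
	// rune is an alias for int32, so r can be returned as int32 directly.
	return r, utf8.RuneLen(r)
}

func main() {
	var b byte = 'c' // byte is an alias for uint8
	fmt.Println(b)   // 99

	// ASCII, accented letter, ideograph, emoji: 1, 2, 3 and 4 bytes.
	for _, r := range []rune{'c', 'ò', '日', '👋'} {
		cp, size := describe(r)
		fmt.Printf("%c U+%04X %d byte(s)\n", r, cp, size)
	}
}
```

Because rune and int32 are the same type, no conversion is needed; the difference is only in intent (a rune names a code point).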
string = bytes, not characters
s := "ciaò"
fmt.Println(len(s)) // 5 — not 4!
fmt.Println(s[0]) // 99 ('c' as a byte)
fmt.Println(s[3]) // 195 (first byte of 'ò' in UTF-8)
len(s) returns the number of bytes, not the number of characters.
Indexing s[i] returns the i-th byte (uint8), not the i-th character.
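If you need the character that starts at a given byte offset rather than the raw byte, a minimal sketch is to decode from that offset with utf8.DecodeRuneInString, and to count characters with utf8.RuneCountInString. The helper name runeAt is hypothetical, and it assumes 0 <= i <= len(s) (otherwise slicing panics).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt (hypothetical helper) decodes the character that starts at byte
// offset i of s and returns it with its width in bytes. If i does not point
// to the start of a valid UTF-8 sequence, DecodeRuneInString returns
// utf8.RuneError (U+FFFD) with width 1; for an empty remainder, width 0.
func runeAt(s string, i int) (rune, int) {
	return utf8.DecodeRuneInString(s[i:])
}

func main() {
	s := "ciaò"
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 5 4: bytes vs characters

	r, size := runeAt(s, 3)
	fmt.Printf("%c %d\n", r, size) // ò 2

	// Byte 4 is the second (continuation) byte of 'ò', not a character start.
	r, size = runeAt(s, 4)
	fmt.Println(r == utf8.RuneError, size) // true 1
}
```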
for range iterates runes
for i, r := range s decodes UTF-8 on the fly: i is the byte offset where
the rune starts, and r is the rune (int32):
s := "ciaò"
for i, r := range s {
	fmt.Printf("%d %c (U+%04X)\n", i, r, r)
}
// 0 c (U+0063)
// 1 i (U+0069)
// 2 a (U+0061)
// 3 ò (U+00F2) <- starts at byte 3, takes 2 bytes
[]rune(s): per-character indexing
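The explicit conversion []rune(s) decodes the whole string and allocates a new slice of runes. This costs time and memory proportional to the length of the string, but it is necessary when you need to index by character: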
runes := []rune("ciaò")
fmt.Println(len(runes)) // 4
fmt.Println(string(runes[3])) // ò
string(runes) is the inverse conversion: it takes a []rune and produces
a UTF-8 string.
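A classic use of the []rune conversion and its inverse string(runes) is reversing a string character by character. The sketch below is illustrative (reverseRunes is a hypothetical name); reversing the raw bytes instead would split the two-byte encoding of 'ò' and produce invalid UTF-8.

```go
package main

import "fmt"

// reverseRunes (hypothetical helper) reverses s rune by rune.
// Note: characters built from several runes (e.g. a letter plus a
// combining accent) are still reversed rune by rune.
func reverseRunes(s string) string {
	r := []rune(s) // decodes s and allocates a new slice
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r) // re-encodes the runes as UTF-8
}

func main() {
	fmt.Println(reverseRunes("ciaò")) // òaic
}
```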
Strings are immutable
s := "ciao"
// s[0] = 'C' // ERROR: cannot assign to s[0]
To mutate, convert to []byte or []rune (which copies the data), modify the copy, then convert back:
b := []byte(s)
b[0] = 'C'
s = string(b) // "Ciao"
Concatenation
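The + operator works, but since strings are immutable each + allocates a new string and copies both operands, so repeated concatenation in a loop copies the accumulated text again and again. For many concatenations use
strings.Builder, which appends to a growing internal buffer and avoids the intermediate strings: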
var sb strings.Builder
for i := 0; i < 5; i++ {
	sb.WriteString("ab")
}
fmt.Println(sb.String()) // "ababababab"
Useful packages
- strings: Contains, HasPrefix, Split, Join, ToUpper, Replace...
- strconv: Atoi, Itoa, ParseFloat, FormatInt...
- unicode: IsLetter, IsDigit, ToUpper...
- unicode/utf8: RuneCountInString (an alternative to len([]rune(s)) that does not allocate a slice).
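A short tour of a few of these functions, as a sketch: the input strings and the helper summarize are hypothetical, chosen only to exercise strings, strconv, unicode and unicode/utf8 together. Note that strings.ToUpper and the unicode predicates work on runes, so the accented 'ò' is handled correctly.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
	"unicode/utf8"
)

// summarize (hypothetical helper) counts letters, digits and total
// characters (runes) in s.
func summarize(s string) (letters, digits, runes int) {
	for _, r := range s {
		switch {
		case unicode.IsLetter(r):
			letters++
		case unicode.IsDigit(r):
			digits++
		}
	}
	return letters, digits, utf8.RuneCountInString(s)
}

func main() {
	parts := strings.Split("ciaò,42,mondo", ",")
	fmt.Println(strings.Join(parts, " | "))         // ciaò | 42 | mondo
	fmt.Println(strings.ToUpper(parts[0]))          // CIAÒ
	fmt.Println(strings.HasPrefix(parts[2], "mon")) // true

	n, err := strconv.Atoi(parts[1])
	if err != nil {
		fmt.Println("not a number:", err)
		return
	}
	fmt.Println(strconv.Itoa(n * 2)) // 84

	l, d, r := summarize("ciaò 42")
	fmt.Println(l, d, r) // 4 2 7
}
```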
Try it
Iterate the string s with range and print index and rune using %d %c.
Hint: range over a string returns the byte offset and the rune.
Convert s to []rune and print its length in characters.
Hint: use the explicit conversion `[]rune(s)`.
What is len("ciaò")?
s := "ciaò"
fmt.Println(len(s))
Recap
- string = immutable sequence of bytes, UTF-8 encoded.
- len(s) = bytes; s[i] = the i-th byte (uint8).
- for i, r := range s decodes runes; i is the byte offset.
- []rune(s) to index by character (costly, allocates).
- To mutate: go through []byte or []rune.
- To concatenate in a loop: strings.Builder.
- Key packages: strings, strconv, unicode, unicode/utf8.