eLearner.app
Module 4 · Lesson 4 of 5 · 19/50 in the course · ~12 min

Strings, bytes, and runes

In Go, a string is an immutable sequence of bytes. Go source files are UTF-8, so string literals are UTF-8 encoded unless they contain byte-level escapes. A rune is an alias for int32 and represents a single Unicode code point.

Understanding the difference between bytes and runes is essential to avoid bugs when a string contains accents, emojis, ideograms, or any other non-ASCII character.

string = bytes, not characters

Go
s := "ciaò"
fmt.Println(len(s))   // 5 — not 4!
fmt.Println(s[0])     // 99 ('c' as a byte)
fmt.Println(s[3])     // 195 (first byte of 'ò' in UTF-8)

len(s) returns the number of bytes, not the number of visible characters. Indexing s[i] returns the byte at index i (type byte, an alias for uint8), not the character.
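To make the byte view concrete, here is a minimal sketch that prints every byte of the string in hexadecimal and compares the byte count with the code point count. The helper name byteAndRuneCount is hypothetical, introduced only for this illustration; utf8.RuneCountInString comes from the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (hypothetical helper) reports the length of s
// in bytes and in Unicode code points.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "ciaò"
	// Index byte by byte: 'ò' shows up as two separate bytes, 0xC3 and 0xB2.
	for i := 0; i < len(s); i++ {
		fmt.Printf("s[%d] = 0x%02X\n", i, s[i])
	}
	bytes, runes := byteAndRuneCount(s)
	fmt.Println(bytes, runes) // 5 4
}
```

The loop uses a classic index loop on purpose: unlike range, it does not decode UTF-8 and visits each byte individually.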

for range iterates runes

for i, r := range s decodes UTF-8 on the fly: i is the byte offset at which the rune starts, and r is the rune itself (int32):

Go
s := "ciaò"
for i, r := range s {
    fmt.Printf("%d %c (U+%04X)\n", i, r, r)
}
// 0 c (U+0063)
// 1 i (U+0069)
// 2 a (U+0061)
// 3 ò (U+00F2)   <- starts at byte 3, takes 2 bytes
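The decoding that range performs can be approximated by hand with utf8.DecodeRuneInString, which returns a rune and its encoded width in bytes. The sketch below is only an illustration of the idea, not the compiler's actual implementation; the helper name runeOffsets is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets (hypothetical helper) returns the byte offset at which
// each rune of s starts, decoding one rune at a time.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		// size is the number of bytes the rune occupies (1 to 4).
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("ciaò!")) // [0 1 2 3 5]
}
```

The jump from 3 to 5 in the output is the two-byte encoding of 'ò': the offsets produced by range are not consecutive as soon as the string contains a multi-byte character.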

[]rune(s): per-character indexing

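The explicit conversion []rune(s) decodes the whole string and allocates a new slice with one element per code point. It costs time and memory proportional to the length of the string, but it is sometimes necessary: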

Go
runes := []rune("ciaò")
fmt.Println(len(runes))    // 4
fmt.Println(string(runes[3])) // ò

string(runes) is the inverse conversion: takes a []rune and produces a UTF-8 string.
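A typical case where the round trip through []rune pays off is reversing a string, since reversing the raw bytes would split multi-byte characters. This is a minimal sketch with a hypothetical helper named reverseRunes; it works on code points and makes no attempt to keep combining marks attached to their base character.

```go
package main

import "fmt"

// reverseRunes (hypothetical helper) returns s with its code points
// in reverse order.
func reverseRunes(s string) string {
	r := []rune(s) // one element per code point
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r) // re-encode as UTF-8
}

func main() {
	fmt.Println(reverseRunes("ciaò")) // òaic
}
```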

Strings are immutable

Go
s := "ciao"
// s[0] = 'C'   // compile error: cannot assign to s[0]

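To mutate, convert to []byte or []rune, operate, then convert back. []byte is enough when the edit touches only ASCII; use []rune to replace whole non-ASCII characters: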

Go
b := []byte(s)
b[0] = 'C'
s = string(b)   // "Ciao"
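When the character to change may not be ASCII, the same pattern can go through []rune instead. The sketch below assumes a hypothetical helper named capitalize and uses unicode.ToUpper from the standard library to upper-case the first code point.

```go
package main

import (
	"fmt"
	"unicode"
)

// capitalize (hypothetical helper) upper-cases the first code point of s.
func capitalize(s string) string {
	r := []rune(s)
	if len(r) == 0 {
		return s // nothing to change in an empty string
	}
	r[0] = unicode.ToUpper(r[0])
	return string(r)
}

func main() {
	fmt.Println(capitalize("èlite")) // Èlite
	fmt.Println(capitalize("ciao"))  // Ciao
}
```

With []byte, writing to index 0 of "èlite" would overwrite only the first of the two bytes of 'è' and leave invalid UTF-8 behind.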

Concatenation

The + operator works, but each concatenation builds a new string. For many concatenations in a loop, use strings.Builder, which appends to a growing internal buffer and so avoids most intermediate allocations:

Go
var sb strings.Builder
for i := 0; i < 5; i++ {
    sb.WriteString("ab")
}
fmt.Println(sb.String())   // "ababababab"
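When the final size is known in advance, the builder's Grow method can reserve the space in one step. The following is a sketch under that assumption; repeatWithSep is a hypothetical helper, not part of the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// repeatWithSep (hypothetical helper) writes word n times,
// separated by sep, using a single strings.Builder.
func repeatWithSep(word, sep string, n int) string {
	if n <= 0 {
		return ""
	}
	var sb strings.Builder
	// Reserve the exact final size so the buffer does not need to grow again.
	sb.Grow(n*len(word) + (n-1)*len(sep))
	for i := 0; i < n; i++ {
		if i > 0 {
			sb.WriteString(sep)
		}
		sb.WriteString(word)
	}
	return sb.String()
}

func main() {
	fmt.Println(repeatWithSep("ab", "-", 3)) // ab-ab-ab
}
```

For this particular task the standard library already offers strings.Repeat and strings.Join; the helper exists only to show the Builder pattern.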

Useful packages

  • strings: Contains, HasPrefix, Split, Join, ToUpper, Replace...
  • strconv: Atoi, Itoa, ParseFloat, FormatInt...
  • unicode: IsLetter, IsDigit, ToUpper...
  • unicode/utf8: RuneCountInString (alternative to len([]rune(s))).
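A few of these functions can be combined as in the sketch below. The helper names sumCSV and countLetters are hypothetical; strings.TrimSpace is not in the list above but belongs to the same strings package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// sumCSV (hypothetical helper) parses a comma-separated list of
// integers and returns their sum.
func sumCSV(csv string) (int, error) {
	total := 0
	for _, field := range strings.Split(csv, ",") {
		n, err := strconv.Atoi(strings.TrimSpace(field))
		if err != nil {
			return 0, err
		}
		total += n
	}
	return total, nil
}

// countLetters (hypothetical helper) counts the runes of s that
// unicode.IsLetter classifies as letters.
func countLetters(s string) int {
	count := 0
	for _, r := range s {
		if unicode.IsLetter(r) {
			count++
		}
	}
	return count
}

func main() {
	sum, err := sumCSV("1, 2, 39")
	fmt.Println(sum, err)                // 42 <nil>
	fmt.Println(countLetters("ciaò 42")) // 4
	fmt.Println(strings.ToUpper("ciaò")) // CIAÒ
}
```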

Try it

Exercise (go.m4.l4.e1)

Iterate the string s with range and print index and rune using %d %c.

Hint: range over a string returns the byte offset and the rune.


Exercise (go.m4.l4.e2)

Convert s to []rune and print its length in characters.

Hint: use the explicit conversion `[]rune(s)`.


Quiz (go.m4.l4.e3)

What is len("ciaò")?

Go
s := "ciaò"
fmt.Println(len(s))

Recap

  • string = immutable sequence of bytes; string literals are UTF-8 encoded.
  • len(s) = bytes; s[i] = the i-th byte (uint8).
  • for i, r := range s decodes runes; i is the byte offset.
  • []rune(s) to index by character (costly, allocates).
  • To mutate: go through []byte or []rune.
  • To concatenate in a loop: strings.Builder.
  • Key packages: strings, strconv, unicode, unicode/utf8.