Overview

  1. Overview of Strings, Runes, Code-points
  2. Common methods and JVM equivalents

Definitions

  1. Code point: a numeric value that Unicode assigns to a character; same as a rune
  2. Rune: Go's name for a code point (an alias for int32)
  3. String: read-only (immutable), slice of (arbitrary) bytes
  4. Char/Character: TODO

Key Concepts

  1. Go sources are always UTF-8
  2. Conversion between []byte and string is a simple cast ([]byte(s), string(b)), but it generally copies the bytes
  3. Strings can contain unprintable chars (can contain any bytes)
  4. Zero value is empty string: ""
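
A minimal sketch of the key concepts above. The helper name `describe` is made up for this example; everything else is standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length, the rune count, and whether s is valid UTF-8.
// (Hypothetical helper, only for illustration.)
func describe(s string) (byteLen, runeCount int, valid bool) {
	return len(s), utf8.RuneCountInString(s), utf8.ValidString(s)
}

func main() {
	var zero string         // the zero value of a string is ""
	fmt.Println(zero == "") // true

	fmt.Println(describe("h\u00e9llo")) // 6 5 true  (é takes two bytes)
	fmt.Println(describe("\xff\xfe"))   // 2 2 false (arbitrary bytes are allowed)

	// Converting to []byte yields an independent copy; the string stays unchanged.
	s := "abc"
	b := []byte(s)
	b[0] = 'X'
	fmt.Println(s, string(b)) // abc Xbc
}
```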

Literals

Iteration

// i == byte index at which the current rune starts
// c == current rune (a code point, not a string)
for i, c := range "foo" {
    fmt.Printf("%d %c\n", i, c) // 0 f, 1 o, 2 o
}
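
With non-ASCII text the index jumps by the encoded width of each rune. A small sketch; `runeOffsets` is a hypothetical helper name.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune of s starts.
// (Hypothetical helper, only for illustration.)
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "aよb" // よ is encoded as three bytes in UTF-8
	for i, c := range s {
		fmt.Printf("%d %c %U\n", i, c, c)
	}
	// 0 a U+0061
	// 1 よ U+3088
	// 4 b U+0062

	fmt.Println(runeOffsets(s)) // [0 1 4]
}
```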

Idioms

  1. Avoid bytes for presentation, use runes instead
    1. Runes are safe for multi-byte chars
    2. []byte and string are fine for serialization, storage, ...
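
A sketch of why runes are the safer choice for presentation: cutting by bytes can split a multi-byte character, cutting by runes cannot. `truncateRunes` is a hypothetical helper, and it works on code points only (it does not handle combining sequences).

```go
package main

import "fmt"

// truncateRunes returns at most n runes of s.
// (Hypothetical helper, only for illustration.)
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

func main() {
	s := "よこそ"

	fmt.Println(truncateRunes(s, 2)) // よこ

	// Slicing by bytes cuts into the first rune and leaves invalid UTF-8.
	fmt.Printf("%q\n", s[:2]) // "\xe3\x82"
}
```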

Common operations

| JVM method | Golang |
| --- | --- |
| String::charAt | "abc"[i] (a byte), "abc"[i:i+1] (a one-byte string), or []rune("abcd")[3] == 'd' |
| String::compareTo | s1 < s2 or strings.Compare(...) |
| String::contains | strings.Contains(haystack, needle) |
| String::endsWith | strings.HasSuffix(s, sfx) |
| String::equals | s1 == s2 |
| String::equalsIgnoreCase | strings.EqualFold(s1, s2) |
| String::format | fmt.Sprintf(s, ...) |
| String::getBytes | []byte(s) |
| String::indexOf | strings.Index(haystack, needle) |
| String::isBlank | strings.TrimSpace(s) == "" |
| String::isEmpty | len(s) == 0 |
| String::lastIndexOf | strings.LastIndex(haystack, needle) |
| String::length | len(s) (bytes) or utf8.RuneCountInString(s) (runes) |
| String::repeat | strings.Repeat(s, n) |
| String::replaceAll | strings.ReplaceAll(...) (literal match; use the regexp package for patterns) |
| String::split | strings.Split(s, sep) |
| String::startsWith | strings.HasPrefix(s, pfx) |
| String::substring | "abcde"[2:4] == "cd" |
| String::toCharArray | Use range in loops, or []rune(s) |
| String::toLowerCase | strings.ToLower(s) |
| String::toUpperCase | strings.ToUpper(s) |
| String::trim | strings.TrimSpace(s) |
| String::valueOf | strconv.Itoa(n) for ints, fmt.Sprint(foo) in general (string(n) yields the rune with code n, not its digits) |
| StringUtils::join | strings.Join(slice, sep) |
| StringUtils::containsAny | strings.ContainsAny(haystack, needles) |
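
A few of the rows above in runnable form, as a quick sanity check. `isBlank` is a hypothetical wrapper around the String::isBlank row; the sample strings are arbitrary.

```go
package main

import (
	"fmt"
	"strings"
)

// isBlank reports whether s is empty or contains only whitespace.
// (Hypothetical helper mirroring the String::isBlank row.)
func isBlank(s string) bool {
	return strings.TrimSpace(s) == ""
}

func main() {
	fmt.Println(strings.EqualFold("Go", "GO"))         // true
	fmt.Println(strings.HasPrefix("golang", "go"))     // true
	fmt.Println(strings.Index("chicken", "ken"))       // 4
	fmt.Println(strings.LastIndex("go gopher", "go"))  // 3
	fmt.Println(strings.Split("a,b,c", ","))           // [a b c]
	fmt.Println(strings.Join([]string{"a", "b"}, "-")) // a-b
	fmt.Println(strings.Repeat("ab", 3))               // ababab
	fmt.Println("abcde"[2:4])                          // cd
	fmt.Println(isBlank(" \t\n"))                      // true
}
```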

CaseFormat

  1. Changing among lowerCamel, UpperCamel, lower-kebab, lower_snake, UPPER_SNAKE, ...
  2. github.com/iancoleman/strcase
  3. github.com/bitly/nsq/internal/stringy
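
For a sense of what such a conversion involves, here is a standard-library-only sketch of one direction (snake case to lowerCamel). `snakeToLowerCamel` is a hypothetical helper, not the API of either package listed above, and it ignores acronyms and other special cases.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// snakeToLowerCamel converts lower_snake or UPPER_SNAKE input to lowerCamel.
// (Hypothetical helper, only for illustration.)
func snakeToLowerCamel(s string) string {
	var b strings.Builder
	first := true
	for _, part := range strings.Split(s, "_") {
		if part == "" {
			continue // skip empty parts from leading, trailing, or doubled underscores
		}
		part = strings.ToLower(part)
		if first {
			b.WriteString(part)
			first = false
			continue
		}
		r := []rune(part)
		r[0] = unicode.ToUpper(r[0])
		b.WriteString(string(r))
	}
	return b.String()
}

func main() {
	fmt.Println(snakeToLowerCamel("lower_snake_case")) // lowerSnakeCase
	fmt.Println(snakeToLowerCamel("UPPER_SNAKE"))      // upperSnake
}
```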

Char Codes

  1. See ascii table
  2. See unicode table
  3. Upper case letters from A == 65 to Z == 90
  4. Lower case letters from a == 97 to z == 122

Char code to letter

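// convert through rune: string(65) on a plain integer is flagged by go vet
upperA := string(rune(65))    // A
upperZ := string(rune(90))    // Z

lowerA := string(rune(97))    // a
lowerZ := string(rune(122))   // z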

Letter to char code

// -- upper case
codeForA := int('A')          // 65
firstByte := "ABC"[0]         // 65 (a byte; only correct for ASCII)
firstRune := []rune("ABC")[0] // 65 (a rune; works for any Unicode)
...
codeForZ := int('Z') // 90

// -- lower case
codeForLowerA := int('a') // 97
...
codeForLowerZ := int('z') // 122

String to runes (unicode)

s := "abc"
[]rune(s) // []rune{97, 98, 99}

c := "🐧"
[]rune(c)[0] == 128039

// https://unicode-table.com/en/3088/
c = "よ"
[]rune(c)[0] == 12424  // \u3088

c = "🤣"
[]rune(c)[0] == 129315
  • TODO: produce \u something (e.g. fmt.Sprintf("%+q", s))
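
One way to produce the escaped form, sketched with standard library calls only: the `%+q` verb and `strconv.QuoteToASCII` both emit ASCII-only quoted output, and `%U` prints the U+ notation. `escapeASCII` is a hypothetical wrapper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// escapeASCII renders s as a quoted, ASCII-only Go string literal.
// (Hypothetical helper, only for illustration.)
func escapeASCII(s string) string {
	return fmt.Sprintf("%+q", s)
}

func main() {
	fmt.Println(escapeASCII("よ"))          // "\u3088"
	fmt.Println(escapeASCII("🤣"))          // "\U0001f923"
	fmt.Println(strconv.QuoteToASCII("よ")) // "\u3088"
	fmt.Printf("%U\n", 'よ')                // U+3088
}
```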

Runes (unicode) to string

rs := []rune{97, 36}
string(rs) == "a$"

r := rune(129315)   // 129315 (dec) == '\U0001F923' (code point)
string(r) == "🤣"

r = '\u3088'        // single quotes for a rune; lowercase \u takes exactly 4 hex digits
string(r) == "よ"

r = '\U0001F923'    // uppercase \U takes 8 hex digits, needed for code points above U+FFFF
string(r) == "🤣"

Bytes to string

// base 10 == dec
b := []byte{119, 99}
string(b) == "wc"
  • TODO: hex example
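
A possible hex variant of the example above, using the same two bytes (0x77 == 119, 0x63 == 99). `hexToString` is a hypothetical helper built on `encoding/hex`.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hexToString decodes a hex string such as "7763" into the text those bytes spell.
// (Hypothetical helper, only for illustration.)
func hexToString(h string) (string, error) {
	b, err := hex.DecodeString(h)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// base 16 == hex
	b := []byte{0x77, 0x63}
	fmt.Println(string(b)) // wc

	fmt.Printf("%x\n", b)              // 7763
	fmt.Printf("% x\n", b)             // 77 63
	fmt.Println(hex.EncodeToString(b)) // 7763

	s, err := hexToString("7763")
	fmt.Println(s, err) // wc <nil>
}
```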

String to bytes

s := "ab"
bytes.Equal([]byte(s), []byte{97, 98})

Ascii code to String

asc := rune(115)  // base 10 == dec
string(asc) == "s"

b := []byte{115}
string(b) == "s"
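
A common pitfall here: converting a number with string(...) gives the character with that code, not the decimal digits. A short sketch of both directions; the helper names are made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// codeToLetter returns the character whose code is the given number.
// (Hypothetical helper, only for illustration.)
func codeToLetter(code int) string {
	return string(rune(code))
}

// codeToDigits returns the decimal text of the number itself.
// (Hypothetical helper, only for illustration.)
func codeToDigits(code int) string {
	return strconv.Itoa(code)
}

func main() {
	fmt.Println(codeToLetter(115)) // s
	fmt.Println(codeToDigits(115)) // 115
}
```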

String to Ascii code

s := "Q"
[]byte(s)[0] == 81 // base 10 == dec

Printing runes

r := []rune("😀")[0]
fmt.Printf("%d (dec) == %+q (code-point) == 0x%x (hex)",
    r, r, r)
// 128512 (dec) == '\U0001f600' (code-point) == 0x1f600 (hex)

TODO/Unorganized

  • TODO: StringUtils.abbreviate TODO

  • TODO: StringUtils.appendIfMissing ...TODO: ??...

  • TODO: StringUtils.capitalize TODO

  • TODO: StringUtils.isAlpha ...TODO: ??...

  • TODO: StringUtils.isNumeric ...TODO: ??...

  • TODO: StringUtils.isAsciiPrintable ...TODO: ??...

  • TODO: StringUtils.leftPad fmt.Printf("%06d", 12)

  • TODO: StringUtils.rightPad fmt.Printf("%-6d", 12) (pads with spaces on the right)

  • TODO: StringUtils.prependIfMissing ...TODO: ??...

  • TODO: StringUtils.remove/Delete ...TODO: ??...

  • TODO: StringUtils.replace ...TODO: ??...

  • TODO: StringUtils.substringLeft ...TODO: ??...

  • TODO: StringUtils.substringRight ...TODO: ??...

  • TODO: StringUtils.substringBefore see below

  • TODO: StringUtils.substringBeforeLast see below

  • TODO: StringUtils.substringAfter see below

  • TODO: StringUtils.substringAfterLast see below

  • TODO: StringUtils.uncapitalize see below

  • TODO: builder - https://yourbasic.org/golang/build-append-concatenate-strings-efficiently/

  • TODO: concatenation - https://yourbasic.org/golang/build-append-concatenate-strings-efficiently/

  • TODO: strconv.Itoa
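
A possible starting point for several of the TODO items above (substringBefore/After, padding, builder, strconv.Itoa), using only the standard library. All function names are hypothetical. `strings.Cut` needs Go 1.18 or newer. The behavior for a missing separator is meant to follow the usual Commons Lang convention, but edge cases such as an empty separator have not been checked against it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// substringBefore returns the text before the first sep, or s if sep is absent.
func substringBefore(s, sep string) string {
	before, _, _ := strings.Cut(s, sep)
	return before
}

// substringAfter returns the text after the first sep, or "" if sep is absent.
func substringAfter(s, sep string) string {
	_, after, _ := strings.Cut(s, sep)
	return after
}

// substringBeforeLast returns the text before the last sep, or s if sep is absent.
func substringBeforeLast(s, sep string) string {
	i := strings.LastIndex(s, sep)
	if i < 0 {
		return s
	}
	return s[:i]
}

// substringAfterLast returns the text after the last sep, or "" if sep is absent.
func substringAfterLast(s, sep string) string {
	i := strings.LastIndex(s, sep)
	if i < 0 {
		return ""
	}
	return s[i+len(sep):]
}

// joinNumbers concatenates 0..n-1 using strings.Builder and strconv.Itoa.
func joinNumbers(n int) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		b.WriteString(strconv.Itoa(i))
	}
	return b.String()
}

func main() {
	fmt.Println(substringBefore("a.b.c", "."))     // a
	fmt.Println(substringAfter("a.b.c", "."))      // b.c
	fmt.Println(substringBeforeLast("a.b.c", ".")) // a.b
	fmt.Println(substringAfterLast("a.b.c", "."))  // c

	fmt.Printf("[%6s]\n", "ab")  // [    ab]  left pad
	fmt.Printf("[%-6s]\n", "ab") // [ab    ]  right pad

	fmt.Println(joinNumbers(4)) // 0123
}
```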

Other Resources

  1. Official docs
  2. Official Language spec
  3. Official Language Spec for Rune literal
  4. Official Language Spec for String literal
  5. https://www.practical-go-lessons.com/chap-7-hexadecimal-octal-ascii-utf8-unicode-runes
  6. yourbasic