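I need to read a text file that is encoded in GBK. The standard library in the Go programming language assumes that all text is encoded in UTF-8.

How can I read files in other encodings?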

Recommended answer

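Previously (as mentioned in an older answer), the "easy" way to do this involved third-party packages that require cgo and wrap the iconv library. That is undesirable for many reasons. Thankfully, for quite a while now there has been a better all-Go way of doing this, using only packages provided by the Go authors (not in the main set of packages, but in the Go Sub-Repositories).

The golang.org/x/text/encoding package defines an interface for generic character encodings that can convert to and from UTF-8. The golang.org/x/text/encoding/simplifiedchinese sub-package provides GB18030, GBK, and HZ-GB2312 encoding implementations.

Here is an example of reading and writing a GBK-encoded file. Note that the io.Reader and io.Writer do the conversion "on the fly" as data is read or written.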

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

// Encoding to use. Since this implements the encoding.Encoding
// interface from golang.org/x/text/encoding you can trivially
// change this out for any of the other implemented encoders,
// e.g. `traditionalchinese.Big5`, `charmap.Windows1252`,
// `korean.EUCKR`, etc.
var enc = simplifiedchinese.GBK

func main() {
    const filename = "example_GBK_file"
    exampleWriteGBK(filename)
    exampleReadGBK(filename)
}

func exampleReadGBK(filename string) {
    // Read UTF-8 from a GBK encoded file.
    f, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }
    r := transform.NewReader(f, enc.NewDecoder())

    // Read converted UTF-8 from `r` as needed.
    // As an example we'll read line-by-line showing what was read:
    sc := bufio.NewScanner(r)
    for sc.Scan() {
        fmt.Printf("Read line: %s\n", sc.Bytes())
    }
    if err = sc.Err(); err != nil {
        log.Fatal(err)
    }

    if err = f.Close(); err != nil {
        log.Fatal(err)
    }
}

func exampleWriteGBK(filename string) {
    // Write UTF-8 to a GBK encoded file.
    f, err := os.Create(filename)
    if err != nil {
        log.Fatal(err)
    }
    w := transform.NewWriter(f, enc.NewEncoder())

    // Write UTF-8 to `w` as desired.
    // As an example we'll write some text from the Wikipedia
    // GBK page that includes Chinese.
    _, err = fmt.Fprintln(w,
        `In 1995, China National Information Technology Standardization
Technical Committee set down the Chinese Internal Code Specification
(Chinese: 汉字内码扩展规范(GBK); pinyin: Hànzì Nèimǎ
Kuòzhǎn Guīfàn (GBK)), Version 1.0, known as GBK 1.0, which is a
slight extension of Codepage 936. The newly added 95 characters were not
found in GB 13000.1-1993, and were provisionally assigned Unicode PUA
code points.`)
    if err != nil {
        log.Fatal(err)
    }

    if err = f.Close(); err != nil {
        log.Fatal(err)
    }
}

