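This actually works quite well. I have not checked the whole result for errors, but at first glance it all looks correct, including "prism_tm". The trick is to make the image very large, because tesseract seems to ignore small characters: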
library(magick)
#> Linking to ImageMagick 7.1.0.31
#> Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp, x11
#> Disabled features: fftw, ghostscript
#> Using 12 threads
library(tesseract)
input <- image_read("https://i.stack.imgur.com/JxGHc.png") %>%
  # preprocess image to make it easier to ocr
  image_convert(type = 'Grayscale') %>%
  image_deskew() %>%
  image_resize("2000x") %>%
  ocr()
df <- data.table::fread(text = input)
#> Warning in data.table::fread(text = input): Detected 11 column names but the
#> data has 12 columns (i.e. invalid file). Added 1 extra default column name for
#> the first column which is guessed to be row names or an index. Use setnames()
#> afterwards if this guess is not correct, or fix the file write command that
#> created the file to create a valid file.
df
#> V1 info tmax ACREAGE GLOBALID
#> 1: 1 PRISM_tm 30.3976 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 2: 2 PRISM_tm 26.0226 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 3: 3 PRISM_tm 27.1775 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 4: 4 PRISM_tm 24,164 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 5: 5 PRISM_tm 24.458 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 6: 6 PRISM_tm 26.118 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 7: 7 PRISM_tm 27.259 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 8: 8 PRISM_tm 30.105 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 9: 9 PRISM_tm 30.697 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 10: 10 PRISM_tm 32949 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 11: 11 PRISM_tm 32,966 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 12: 12 PRISM_tm 32.081 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 13: 13 PRISM_tm 29.847 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 14: 14 PRISM_tm 27.576 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 15: 15 PRISM_tm 24.671 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 16: 16 PRISM_tm 24.382 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 17: 17 PRISM_tm 24.382 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 18: 18 PRISM_tm 26.365 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 19: 19 PRISM_tm 29.246 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 20: 20 PRISM_tm 30.737 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 21: 21 PRISM_tm 31.658 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 22: 22 PRISM_tm 31.386 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 23: 23 PRISM_tm 32457 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 24: 24 PRISM_tm 32.093 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 25: 25 PRISM_tm 30.303 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 26: 26 PRISM_tm 26.231 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> 27: 27 PRISM_tm 25.956 783805 {257865E5-DA82-41F8-B679-169C60B2BB4D}
#> V1 info tmax ACREAGE GLOBALID
#> datasource variable datatype resolutior Date year month
#> 1: PRISM tmax provisional 4kmM3 2021-10 2021 10
#> 2: PRISM tmax provisional 4kmM3 2021-11 2021 11
#> 3: PRISM tmax provisional 4kmM3 2021-12 2021 12
#> 4: PRISM tmax stable 4kmM3 2005-01 2005 1
#> 5: PRISM tmax stable 4kmM3 2005-02 2005 2
#> 6: PRISM tmax stable 4kmM3 2005-03 2005 3
#> 7: PRISM tmax stable 4kmM3 2005-04 2005 4
#> 8: PRISM tmax stable 4kmM3 2005-05 2005 5
#> 9: PRISM tmax stable 4kmM3 2005-06 2005 6
#> 10: PRISM tmax stable 4kmM3 2005-07 2005 7
#> 11: PRISM tmax stable 4kmM3 2005-08 2005 8
#> 12: PRISM tmax stable 4kmM3 2005-09 2005 9
#> 13: PRISM tmax stable 4kmM3 2005-10 2005 10
#> 14: PRISM tmax stable 4kmM3 2005-11 2005 11
#> 15: PRISM tmax stable 4kmM3 2005-12 2005 12
#> 16: PRISM tmax stable 4kmM3 2006-01 2006 1
#> 17: PRISM tmax stable 4kmM3 2006-02 2006 2
#> 18: PRISM tmax stable 4kmM3 2006-03 2006 3
#> 19: PRISM tmax stable 4kmM3 2006-04 2006 4
#> 20: PRISM tmax stable 4kmM3 2006-05 2006 5
#> 21: PRISM tmax stable 4kmM3 2006-06 2006 6
#> 22: PRISM tmax stable 4kmM3 2006-07 2006 7
#> 23: PRISM tmax stable 4kmM3 2006-08 2006 8
#> 24: PRISM tmax stable 4kmM3 2006-09 2006 9
#> 25: PRISM tmax stable 4kmM3 2006-10 2006 10
#> 26: PRISM tmax stable 4kmM3 2006-11 2006 11
#> 27: PRISM tmax stable 4kmM3 2006-12 2006 12
#> datasource variable datatype resolutior Date year month
Created on 2022-08-10 by the reprex package (v2.0.1)
The warning from fread can be safely ignored, since it only complains about the missing header for the first column.
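A closer look at the printed tmax column shows a few OCR misreads, such as `24,164` (comma instead of a decimal point) and `32949` (decimal point dropped). Below is a minimal sketch of one way to repair them in base R. The helper `fix_ocr_number` and its `int_digits` argument are hypothetical names, not part of tesseract or data.table, and the rule "two digits before the decimal point" is an assumption that only holds because every tmax value here lies between 10 and 99.

```r
# Hypothetical helper: repair two common OCR misreads in a numeric column.
# Assumes every value has exactly `int_digits` digits before the decimal point.
fix_ocr_number <- function(x, int_digits = 2) {
  x <- as.character(x)
  # A comma read in place of the decimal point, e.g. "24,164" -> "24.164"
  x <- gsub(",", ".", x, fixed = TRUE)
  # A dropped decimal point, e.g. "32949" -> "32.949"
  missing_sep <- grepl("^[0-9]+$", x) & nchar(x) > int_digits
  x[missing_sep] <- paste0(
    substr(x[missing_sep], 1, int_digits),
    ".",
    substr(x[missing_sep], int_digits + 1, nchar(x[missing_sep]))
  )
  as.numeric(x)
}

# Sample values copied from the tmax column printed above
tmax_raw <- c("30.3976", "24,164", "32949", "32,966", "32457")
tmax_fixed <- fix_ocr_number(tmax_raw)
print(tmax_fixed)
```

Applied to the full table, this would be something like `df$tmax <- fix_ocr_number(df$tmax)`. The repaired values should still be spot-checked against the image, since a rule like this cannot detect a misread digit.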