由于各种原因,我的任务是将数据从大型GML文件导入到MySQL数据库中.我正在查看的GML文件是https://use-land-property-data.service.gov.uk/datasets/inspire/download的文件.

基本上,我感兴趣的是每个对象(多边形)的坐标.我可以导入坐标,就像它们出现在GML文件中一样(LAT1 LONG 1 LAT2 LONG 2...).我还需要将最大和最小纬度和经度作为四个单独的列(这需要计算;它不包括在文件中).

我到目前为止的try 如下:

#!/bin/bash

gmlfile=$1
gmldir="/home/user.name/gml_files"

gml_short=$(echo $gmlfile | sed 's/\.gml//')

echo "INSERT IGNORE INTO gmldata.uk (region,coordinates,maxLat,maxLong,minLat,minLong) VALUES" > gml_sql_query.sql

xmllint --xpath "//*[local-name()='posList']" "$gmldir/$gmlfile" | cut -d">" -f2 | cut -d"<" -f1 > "$gmldir/$gml_short-coords.txt" #Dump all coordinates to coordinate file

while read coordinates; do

    latitude=$(echo $coordinates | tr ' ' '\n' | paste -d' ' - - | cut -d" " -f1) #odd elements
    longitude=$(echo $coordinates | tr ' ' '\n' | paste -d' ' - - | cut -d" " -f2) #even elements

    max_lat=$(echo "$latitude" | tr " " '\n' | sort -n | tail -1)
    min_lat=$(echo "$latitude" | tr " " '\n' | sort -n | head -1)
    max_long=$(echo "$longitude" | tr " " '\n' | sort -n | tail -1)
    min_long=$(echo "$longitude" | tr " " '\n' | sort -n | head -1)


    echo "('$gml_short','$coordinates','$max_lat','$max_long','$min_lat','$min_long')," >> gml_sql_query.sql

done < "$gmldir/$gml_short-coords.txt"

sed -i '$s/,$/;/' gml_sql_query.sql #Replace last , with ;

mysql < gml_sql_query.sql

Xmllint行和MySQL导入本身都很快."gml_sql_query.sql"文件的填充速度非常非常慢;大约每30秒1000个.每一个GML文件都有数万个坐标,所以这需要很长很长的时间.我的脚本中哪些部分效率最低,有什么方法可以加速这一过程吗?

两个侧面:

  1. 我try 使用lat_longlat_process(等)变量来执行大部分管道操作,而只完成第一列/第二列的最后一个管道或其他操作.我没有发现这有任何改善.
  2. 我try 使用数组,但由于值是小数,所以速度也很慢:
mapfile -t lat_array <<< "$latitude"
    max_lat=0
    min_lat=9999999.0
    for num in "${lat_array[@]}"; do
        if [[ $(echo "$num > $max_lat" | bc -l) -eq 1 ]]; then
            max_lat=$num
        fi

        if [[ $(echo "$num < $min_lat" | bc -l) -eq 1 ]]; then
            min_lat=$num
        fi
    done

    mapfile -t long_array <<< "$longitude"
    max_long=0
    min_long=9999999.0
    for num in "${long_array[@]}"; do
        if [[ $(echo "$num > $max_long" | bc -l) -eq 1 ]]; then
            max_long=$num
        fi

        if [[ $(echo "$num < $min_long" | bc -l) -eq 1 ]]; then
            min_long=$num
        fi
    done

编辑:输入数据示例:

525873.122 104945.339 525873.13 104945.234 525872.916 104949.942 525873.308 104950.628 525874.582 104951.118 525875.268 104951.804 525875.301 104951.902 525871.972 104955.36 525847.95 104958.6 525824.152 104961.864 525821.7 104962.2 525798.7 104964.55 525778.14 104966.53 525773.63 104966.97 525778.15 104958.65 525781.863 104952.578 525783.98 104948.99 525797.06 104948.84 525819.82 104948.3 525845.38 104948.07 525856.52 104948.02 525871.5 104945.55 525871.69 104945.52 525872.5 104945.4 525873.122 104945.339
525661.75 105117.9 525685.085 105118.801 525684.99 105119.75 525684.5 105124.8 525684.47 105125.2 525681.819 105124.974 525680.35 105124.85 525671.99 105124.09 525664.24 105123.45 525661.29 105123.2 525661.75 105117.9
525685.1 105118.65 525686.35 105118.85 525698.949 105119.35 525700.4 105119.45 525701.699 105119.5 525703.15 105119.5 525709.949 105119.85 525709.29 105128.05 525707.25 105127.85 525699.55 105127.15 525697.9 105127.05 525689.8 105126.35 525685.819 105126 525684.686 105125.905 525684.406 105125.882 525684.47 105125.2 525684.5 105124.8 525684.99 105119.75 525685.1 105118.65

需要的MySQL INSERT语句:

INSERT IGNORE INTO gmldata.uk (region,coordinates,maxLat,maxLong,minLat,minLong) VALUES
('Adur_District_Council','525873.122 104945.339 525873.13 104945.234 525872.916 104949.942 525873.308 104950.628 525874.582 104951.118 525875.268 104951.804 525875.301 104951.902 525871.972 104955.36 525847.95 104958.6 525824.152 104961.864 525821.7 104962.2 525798.7 104964.55 525778.14 104966.53 525773.63 104966.97 525778.15 104958.65 525781.863 104952.578 525783.98 104948.99 525797.06 104948.84 525819.82 104948.3 525845.38 104948.07 525856.52 104948.02 525871.5 104945.55 525871.69 104945.52 525872.5 104945.4 525873.122 104945.339','525875.301','104966.97','525773.63','104945.234'),
('Adur_District_Council','525661.75 105117.9 525685.085 105118.801 525684.99 105119.75 525684.5 105124.8 525684.47 105125.2 525681.819 105124.974 525680.35 105124.85 525671.99 105124.09 525664.24 105123.45 525661.29 105123.2 525661.75 105117.9','525685.085','105125.2','525661.29','105117.9'),
('Adur_District_Council','525685.1 105118.65 525686.35 105118.85 525698.949 105119.35 525700.4 105119.45 525701.699 105119.5 525703.15 105119.5 525709.949 105119.85 525709.29 105128.05 525707.25 105127.85 525699.55 105127.15 525697.9 105127.05 525689.8 105126.35 525685.819 105126 525684.686 105125.905 525684.406 105125.882 525684.47 105125.2 525684.5 105124.8 525684.99 105119.75 525685.1 105118.65','525709.949','105128.05','525684.406','105118.65');

推荐答案

使用任何awk:

$ cat tst.sh
#!/usr/bin/env bash

gmlfile='Adur_District_Council.gml'
gml_short="${gmlfile%%.gml}"

awk -v q="'" -v gml_short="$gml_short" '
    BEGIN {
        OFS = q "," q
        print "INSERT IGNORE INTO gmldata.uk (region,coordinates,maxLat,maxLong,minLat,minLong) VALUES"
        ORS = ""
    }
    {
        max_lat = min_lat = $1
        max_lon = min_lon = $2
        for ( i=3; i<NF; i+=2 ) {
            lat = $i
            lon = $(i+1)
            max_lat = (lat > max_lat ? lat : max_lat)
            min_lat = (lat < min_lat ? lat : min_lat)
            max_lon = (lon > max_lon ? lon : max_lon)
            min_lon = (lon < min_lon ? lon : min_lon)
        }
        print sep "(" q gml_short, $0, max_lat, max_lon, min_lat, min_lon q ")"
        sep = ",\n"
    }
    END {
        print ";\n"
    }
' coords.txt

$ cat coords.txt
525873.122 104945.339 525873.13 104945.234 525872.916 104949.942 525873.308 104950.628 525874.582 104951.118 525875.268 104951.804 525875.301 104951.902 525871.972 104955.36 525847.95 104958.6 525824.152 104961.864 525821.7 104962.2 525798.7 104964.55 525778.14 104966.53 525773.63 104966.97 525778.15 104958.65 525781.863 104952.578 525783.98 104948.99 525797.06 104948.84 525819.82 104948.3 525845.38 104948.07 525856.52 104948.02 525871.5 104945.55 525871.69 104945.52 525872.5 104945.4 525873.122 104945.339
525661.75 105117.9 525685.085 105118.801 525684.99 105119.75 525684.5 105124.8 525684.47 105125.2 525681.819 105124.974 525680.35 105124.85 525671.99 105124.09 525664.24 105123.45 525661.29 105123.2 525661.75 105117.9
525685.1 105118.65 525686.35 105118.85 525698.949 105119.35 525700.4 105119.45 525701.699 105119.5 525703.15 105119.5 525709.949 105119.85 525709.29 105128.05 525707.25 105127.85 525699.55 105127.15 525697.9 105127.05 525689.8 105126.35 525685.819 105126 525684.686 105125.905 525684.406 105125.882 525684.47 105125.2 525684.5 105124.8 525684.99 105119.75 525685.1 105118.65

$ ./tst.sh
INSERT IGNORE INTO gmldata.uk (region,coordinates,maxLat,maxLong,minLat,minLong) VALUES
('Adur_District_Council','525873.122 104945.339 525873.13 104945.234 525872.916 104949.942 525873.308 104950.628 525874.582 104951.118 525875.268 104951.804 525875.301 104951.902 525871.972 104955.36 525847.95 104958.6 525824.152 104961.864 525821.7 104962.2 525798.7 104964.55 525778.14 104966.53 525773.63 104966.97 525778.15 104958.65 525781.863 104952.578 525783.98 104948.99 525797.06 104948.84 525819.82 104948.3 525845.38 104948.07 525856.52 104948.02 525871.5 104945.55 525871.69 104945.52 525872.5 104945.4 525873.122 104945.339','525875.301','104966.97','525773.63','104945.234'),
('Adur_District_Council','525661.75 105117.9 525685.085 105118.801 525684.99 105119.75 525684.5 105124.8 525684.47 105125.2 525681.819 105124.974 525680.35 105124.85 525671.99 105124.09 525664.24 105123.45 525661.29 105123.2 525661.75 105117.9','525685.085','105125.2','525661.29','105117.9'),
('Adur_District_Council','525685.1 105118.65 525686.35 105118.85 525698.949 105119.35 525700.4 105119.45 525701.699 105119.5 525703.15 105119.5 525709.949 105119.85 525709.29 105128.05 525707.25 105127.85 525699.55 105127.15 525697.9 105127.05 525689.8 105126.35 525685.819 105126 525684.686 105125.905 525684.406 105125.882 525684.47 105125.2 525684.5 105124.8 525684.99 105119.75 525685.1 105118.65','525709.949','105128.05','525684.406','105118.65');

因此,您的完整脚本将变为(未经测试):

#!/usr/bin/env bash

gmlfile="$1"
gmldir='/home/user.name/gml_files'
gml_short="${gmlfile%%.gml}"

xmllint --xpath "//*[local-name()='posList']" "$gmldir/$gmlfile" | cut -d">" -f2 | cut -d"<" -f1 > "$gmldir/$gml_short-coords.txt" #Dump all coordinates to coordinate file

awk -v q="'" -v gml_short="$gml_short" '
    BEGIN {
        OFS = q "," q
        print "INSERT IGNORE INTO gmldata.uk (region,coordinates,maxLat,maxLong,minLat,minLong) VALUES"
        ORS = ""
    }
    {
        max_lat = min_lat = $1
        max_lon = min_lon = $2
        for ( i=3; i<NF; i+=2 ) {
            lat = $i
            lon = $(i+1)
            max_lat = (lat > max_lat ? lat : max_lat)
            min_lat = (lat < min_lat ? lat : min_lat)
            max_lon = (lon > max_lon ? lon : max_lon)
            min_lon = (lon < min_lon ? lon : min_lon)
        }
        print sep "(" q gml_short, $0, max_lat, max_lon, min_lat, min_lon q ")"
        sep = ",\n"
    }
    END {
        print ";\n"
    }
' "$gmldir/$gml_short-coords.txt" > gml_sql_query.sql

mysql < gml_sql_query.sql

或者,如果你不想要那个显然是临时的gml_sql_query.sql文件:

$ cat tst.sh
#!/usr/bin/env bash

gmlfile="$1"
gmldir='/home/user.name/gml_files'
gml_short="${gmlfile%%.gml}"

xmllint --xpath "//*[local-name()='posList']" "$gmldir/$gmlfile" | cut -d">" -f2 | cut -d"<" -f1 > "$gmldir/$gml_short-coords.txt" #Dump all coordinates to coordinate file

mysql < <(
    awk -v q="'" -v gml_short="$gml_short" '
        BEGIN {
            OFS = q "," q
            print "INSERT IGNORE INTO gmldata.uk (region,coordinates,maxLat,maxLong,minLat,minLong) VALUES"
            ORS = ""
        }
        {
            max_lat = min_lat = $1
            max_lon = min_lon = $2
            for ( i=3; i<NF; i+=2 ) {
                lat = $i
                lon = $(i+1)
                max_lat = (lat > max_lat ? lat : max_lat)
                min_lat = (lat < min_lat ? lat : min_lat)
                max_lon = (lon > max_lon ? lon : max_lon)
                min_lon = (lon < min_lon ? lon : min_lon)
            }
            print sep "(" q gml_short, $0, max_lat, max_lon, min_lat, min_lon q ")"
            sep = ",\n"
        }
        END {
            print ";\n"
        }
    ' "$gmldir/$gml_short-coords.txt"
)

Mysql相关问答推荐

SQL中的搜索模式

JOOQ-如何在一个查询中引用多个(但不是所有)表中的所有字段

Google Sheet查询以提取数据并用值替换复选框.(差异)

不分组寻呼

比较2个mysql json数据类型列

在 MYSQL 8 中的 JSON 字段上的 SELECT 中返回所有键及其数组项

基于多个 where 子句在规范化表中查找公共 ID(值)

任何值的 SQL WHERE 子句?

SQL:如何为给定组中的所有记录 Select 一列与另一列不匹配的位置

有人可以帮我用 R 编程解决这个问题吗?

谁能帮我优化where子句

我在查询中没有得到正确的结果

用两个新列制作表格,按偶数或奇数对另一列进行分类,并将一个类别中的所有偶数和奇数相加

如何在 Mysql 中创建复合外键

MySQL 5.6 DATETIME 不接受毫秒/微秒

自动登录 phpMyAdmin

Doctrine2 迁移向下迁移并从浏览器而不是命令行迁移

#1030 - 从存储引擎 Aria 收到错误 176读取错误校验和的页面

考虑到 NodeJS,MySQL 和 MySQL2 有什么区别

警告:mysqli_connect(): (HY000/1045): Access denied for user 'username'@'localhost' (using password: YES)