由于各种原因,我的任务是将数据从大型GML文件导入到MySQL数据库中.我正在查看的GML文件是https://use-land-property-data.service.gov.uk/datasets/inspire/download的文件.
基本上,我感兴趣的是每个对象(多边形)的坐标.我可以导入坐标,就像它们出现在GML文件中一样(LAT1 LONG 1 LAT2 LONG 2...).我还需要将最大和最小纬度和经度作为四个单独的列(这需要计算;它不包括在文件中).
我到目前为止的try 如下:
#!/bin/bash
gmlfile=$1
gmldir="/home/user.name/gml_files"
gml_short=$(echo $gmlfile | sed 's/\.gml//')
echo "INSERT IGNORE INTO gmldata.uk (region,coordinates,maxLat,maxLong,minLat,minLong) VALUES" > gml_sql_query.sql
xmllint --xpath "//*[local-name()='posList']" "$gmldir/$gmlfile" | cut -d">" -f2 | cut -d"<" -f1 > "$gmldir/$gml_short-coords.txt" #Dump all coordinates to coordinate file
while read coordinates; do
latitude=$(echo $coordinates | tr ' ' '\n' | paste -d' ' - - | cut -d" " -f1) #odd elements
longitude=$(echo $coordinates | tr ' ' '\n' | paste -d' ' - - | cut -d" " -f2) #even elements
max_lat=$(echo "$latitude" | tr " " '\n' | sort -n | tail -1)
min_lat=$(echo "$latitude" | tr " " '\n' | sort -n | head -1)
max_long=$(echo "$longitude" | tr " " '\n' | sort -n | tail -1)
min_long=$(echo "$longitude" | tr " " '\n' | sort -n | head -1)
echo "('$gml_short','$coordinates','$max_lat','$max_long','$min_lat','$min_long')," >> gml_sql_query.sql
done < "$gmldir/$gml_short-coords.txt"
sed -i '$s/,$/;/' gml_sql_query.sql #Replace last , with ;
mysql < gml_sql_query.sql
Xmllint行和MySQL导入本身都很快."gml_sql_query.sql"文件的填充速度非常非常慢;大约每30秒1000个.每一个GML文件都有数万个坐标,所以这需要很长很长的时间.我的脚本中哪些部分效率最低,有什么方法可以加速这一过程吗?
两个侧面:
- 我try 使用
lat_long
和lat_process
(等)变量来执行大部分管道操作,而只完成第一列/第二列的最后一个管道或其他操作.我没有发现这有任何改善. - 我try 使用数组,但由于值是小数,所以速度也很慢:
mapfile -t lat_array <<< "$latitude"
max_lat=0
min_lat=9999999.0
for num in "${lat_array[@]}"; do
if [[ $(echo "$num > $max_lat" | bc -l) -eq 1 ]]; then
max_lat=$num
fi
if [[ $(echo "$num < $min_lat" | bc -l) -eq 1 ]]; then
min_lat=$num
fi
done
mapfile -t long_array <<< "$longitude"
max_long=0
min_long=9999999.0
for num in "${long_array[@]}"; do
if [[ $(echo "$num > $max_long" | bc -l) -eq 1 ]]; then
max_long=$num
fi
if [[ $(echo "$num < $min_long" | bc -l) -eq 1 ]]; then
min_long=$num
fi
done
编辑:输入数据示例:
525873.122 104945.339 525873.13 104945.234 525872.916 104949.942 525873.308 104950.628 525874.582 104951.118 525875.268 104951.804 525875.301 104951.902 525871.972 104955.36 525847.95 104958.6 525824.152 104961.864 525821.7 104962.2 525798.7 104964.55 525778.14 104966.53 525773.63 104966.97 525778.15 104958.65 525781.863 104952.578 525783.98 104948.99 525797.06 104948.84 525819.82 104948.3 525845.38 104948.07 525856.52 104948.02 525871.5 104945.55 525871.69 104945.52 525872.5 104945.4 525873.122 104945.339
525661.75 105117.9 525685.085 105118.801 525684.99 105119.75 525684.5 105124.8 525684.47 105125.2 525681.819 105124.974 525680.35 105124.85 525671.99 105124.09 525664.24 105123.45 525661.29 105123.2 525661.75 105117.9
525685.1 105118.65 525686.35 105118.85 525698.949 105119.35 525700.4 105119.45 525701.699 105119.5 525703.15 105119.5 525709.949 105119.85 525709.29 105128.05 525707.25 105127.85 525699.55 105127.15 525697.9 105127.05 525689.8 105126.35 525685.819 105126 525684.686 105125.905 525684.406 105125.882 525684.47 105125.2 525684.5 105124.8 525684.99 105119.75 525685.1 105118.65
需要的MySQL INSERT语句:
INSERT IGNORE INTO gmldata.uk (region,coordinates,maxLat,maxLong,minLat,minLong) VALUES
('Adur_District_Council','525873.122 104945.339 525873.13 104945.234 525872.916 104949.942 525873.308 104950.628 525874.582 104951.118 525875.268 104951.804 525875.301 104951.902 525871.972 104955.36 525847.95 104958.6 525824.152 104961.864 525821.7 104962.2 525798.7 104964.55 525778.14 104966.53 525773.63 104966.97 525778.15 104958.65 525781.863 104952.578 525783.98 104948.99 525797.06 104948.84 525819.82 104948.3 525845.38 104948.07 525856.52 104948.02 525871.5 104945.55 525871.69 104945.52 525872.5 104945.4 525873.122 104945.339','525875.301','104966.97','525773.63','104945.234'),
('Adur_District_Council','525661.75 105117.9 525685.085 105118.801 525684.99 105119.75 525684.5 105124.8 525684.47 105125.2 525681.819 105124.974 525680.35 105124.85 525671.99 105124.09 525664.24 105123.45 525661.29 105123.2 525661.75 105117.9','525685.085','105125.2','525661.29','105117.9'),
('Adur_District_Council','525685.1 105118.65 525686.35 105118.85 525698.949 105119.35 525700.4 105119.45 525701.699 105119.5 525703.15 105119.5 525709.949 105119.85 525709.29 105128.05 525707.25 105127.85 525699.55 105127.15 525697.9 105127.05 525689.8 105126.35 525685.819 105126 525684.686 105125.905 525684.406 105125.882 525684.47 105125.2 525684.5 105124.8 525684.99 105119.75 525685.1 105118.65','525709.949','105128.05','525684.406','105118.65');