I'm trying to use PowerShell to load a JSON file (output from Mongo's Export-MdbcData) into a SQL Server table. Sample data from the JSON file:
{ "code" : "0088", "name" : "BUTTON", "detail" : { "quantity" : 1 } }
{ "code" : "0081", "name" : "MATTERHORN", "detail" : { "quantity" : 2 } }
{ "code" : "0159", "name" : "BANKSTON", "detail" : { "quantity" : 1 } }
In the PowerShell script below, the file is read into an array, and the array is converted to a DataTable to load into the SQL Server table. Is there a better/faster way to read in the JSON file? For a small input file it only takes a few seconds to load the data, but for more than 4 million records the whole process takes hours.
$encoding = [System.Text.Encoding]::UTF8
$output = [System.Collections.ArrayList]::new()
foreach ($line in [System.IO.File]::ReadLines($pathToJsonFile, $encoding))
{
    $json = $line | ConvertFrom-Json
    foreach ($detail in $json.detail)
    {
        [void]$output.Add(
            [pscustomobject]@{
                code     = $json.code
                name     = $json.name
                quantity = $detail.quantity
            }
        )
    }
}
# Note: instantiating a DataTable first is redundant -- ConvertTo-DataTable
# returns a new one, so the assignment below is all that's needed.
$dataTable = $output | ConvertTo-DataTable
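One likely bottleneck is piping each line through ConvertFrom-Json, which pays pipeline overhead on every iteration, plus building a pscustomobject per row. A minimal sketch of an alternative, assuming the Newtonsoft.Json assembly is loadable (it ships with PowerShell 7), parses each line with Json.NET's JObject and adds rows straight into a pre-defined DataTable. The temp file here stands in for $pathToJsonFile; the column names and types are assumptions based on the sample data:

```powershell
# Sample input standing in for $pathToJsonFile from the question
$pathToJsonFile = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.json'
@'
{ "code" : "0088", "name" : "BUTTON", "detail" : { "quantity" : 1 } }
{ "code" : "0081", "name" : "MATTERHORN", "detail" : { "quantity" : 2 } }
'@ | Set-Content -Path $pathToJsonFile -Encoding UTF8

# Define the columns once so rows can be added without per-row pscustomobject
# construction or a later ConvertTo-DataTable pass.
$dataTable = [System.Data.DataTable]::new()
[void]$dataTable.Columns.Add('code', [string])
[void]$dataTable.Columns.Add('name', [string])
[void]$dataTable.Columns.Add('quantity', [int])

foreach ($line in [System.IO.File]::ReadLines($pathToJsonFile, [System.Text.Encoding]::UTF8))
{
    # Direct method call instead of the pipeline ($line | ConvertFrom-Json)
    $obj = [Newtonsoft.Json.Linq.JObject]::Parse($line)
    [void]$dataTable.Rows.Add([string]$obj['code'], [string]$obj['name'], [int]$obj['detail']['quantity'])
}
```

Even without Json.NET, calling `ConvertFrom-Json -InputObject $line` as a direct parameter rather than through the pipeline shaves some per-line cost.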
.
.
UPDATE:
I modified the script using @Charlieface's suggestion and removed the inner foreach statement to see if it would speed it up more. It loaded 4M+ records in about 17 minutes. I used batchsize = 80K and each insert iteration took about 14 seconds. However, compared to a CSV file input with the same batch size and record count, each insert iteration takes about 3 seconds. I'm guessing parsing the JSON takes longer than parsing a delimited file.
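The snippet below uses $dataTable, $bulkcopy, $batchsize, $i, and $elapsed, which are set up earlier in the script. A hypothetical sketch of that setup, for context (the connection string, destination table name, and column definitions are placeholders, not from the original post):

```powershell
$connectionString = 'Server=myServer;Database=myDb;Integrated Security=True'  # placeholder
$batchsize = 80000
$i = 0
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()

# Columns must match the Rows.Add(...) argument order used in the loop
$dataTable = [System.Data.DataTable]::new()
[void]$dataTable.Columns.Add('code', [string])
[void]$dataTable.Columns.Add('name', [string])
[void]$dataTable.Columns.Add('quantity', [int])

$bulkcopy = [System.Data.SqlClient.SqlBulkCopy]::new($connectionString,
    [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = 'dbo.MyTable'  # placeholder
$bulkcopy.BulkCopyTimeout = 0
$bulkcopy.BatchSize = $batchsize
```

TableLock is worth trying for large loads, since it lets SqlBulkCopy take a bulk update lock instead of row locks.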
foreach ($line in [System.IO.File]::ReadLines($pathToJsonFile, $encoding))
{
    $json = $line | ConvertFrom-Json
    [void]$dataTable.Rows.Add($json.code, $json.name, $json.detail.quantity)
    $i++
    if (($i % $batchsize) -eq 0) {
        $bulkcopy.WriteToServer($dataTable)
        Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
        $dataTable.Clear()
    }
}
# Flush any remaining rows that didn't fill a complete batch
if ($dataTable.Rows.Count -gt 0) {
    $bulkcopy.WriteToServer($dataTable)
    $dataTable.Clear()
}