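I've been trying to get Document AI to process documents in batch, but I've run into some difficulty. I had been using RawDocument to submit individual files and assumed I could simply iterate over my dataset (27k images), but I opted for Batch because it seems to be the more appropriate technique.
When I run my code I see an error: "Failed to process all documents". The first few lines of the debug information are: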
O:17:"Google\Rpc\Status":5:{s:7:"*code";i:3;s:10:"*message";s:32:"Failed to process all documents.";s:26:"Google\Rpc\Statusdetails";O:38:"Google\Protobuf\Internal\RepeatedField":4:{s:49:"Google\Protobuf\Internal\RepeatedFieldcontainer";a:0:{}s:44:"Google\Protobuf\Internal\RepeatedFieldtype";i:11;s:45:"Google\Protobuf\Internal\RepeatedFieldklass";s:19:"Google\Protobuf\Any";s:52:"Google\Protobuf\Internal\RepeatedFieldlegacy_klass";s:19:"Google\Protobuf\Any";}s:38:"Google\Protobuf\Internal\Messagedesc";O:35:"Google\Protobuf\Internal\Descriptor":13:{s:46:"Google\Protobuf\Internal\Descriptorfull_name";s:17:"google.rpc.Status";s:42:"Google\Protobuf\Internal\Descriptorfield";a:3:{i:1;O:40:"Google\Protobuf\Internal\FieldDescriptor":14:{s:46:"Google\Protobuf\Internal\FieldDescriptorname";s:4:"code";
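The support page for this error states that the cause is:
The gcsUriPrefix and gcsOutputConfig.gcsUri parameters need to begin with gs:// and end with a trailing forward slash (/). Review the configuration of the bucket URIs.
I'm not using gcsUriPrefix (should I? My bucket holds more files than the maximum batch limit), but my gcsOutputConfig.gcsUri meets these requirements. The file list I provide gives file names (pointing to the right bucket), so those should not have a trailing slash.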
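To make sure I am reading that rule correctly, here is a minimal sketch of the check as I understand it. The helper name `looksLikeGcsPrefix` is hypothetical (it is not part of any Google client library), and the regular expression is only my interpretation of the documented constraint: the URI starts with gs://, names a bucket, and ends with a forward slash.

```php
<?php
// Hypothetical helper for illustration; not part of the Google client libraries.
// Returns true when $uri starts with gs://, names a bucket, and ends with "/".
function looksLikeGcsPrefix(string $uri): bool
{
    return preg_match('#^gs://[^/]+/(.*/)?$#', $uri) === 1;
}

var_dump(looksLikeGcsPrefix('gs://my-output-bucket/'));    // true
var_dump(looksLikeGcsPrefix('gs://my-output-bucket/out')); // false: no trailing slash
var_dump(looksLikeGcsPrefix('my-output-bucket/'));         // false: no gs:// scheme
```

By this check, the output URI I pass ('gs://my-output-bucket/') looks well formed.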
Any suggestions are welcome.
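use Google\Cloud\DocumentAI\V1\BatchDocumentsInputConfig;
use Google\Cloud\DocumentAI\V1\DocumentOutputConfig;
use Google\Cloud\DocumentAI\V1\DocumentOutputConfig\GcsOutputConfig;
use Google\Cloud\DocumentAI\V1\DocumentProcessorServiceClient;
use Google\Cloud\DocumentAI\V1\GcsDocument;
use Google\Cloud\DocumentAI\V1\GcsDocuments;
use Google\Cloud\Storage\StorageClient;

function filesFromBucket($directoryPrefix) {
    // NOT recursive, does not search the structure
    $gcsDocumentList = [];
    // see https://cloud.google.com/storage/docs/samples/storage-list-files-with-prefix
    $bucketName = 'my-input-bucket';
    $storage = new StorageClient();
    $bucket = $storage->bucket($bucketName);
    $options = ['prefix' => $directoryPrefix];
    // build one GcsDocument per object found under the prefix
    foreach ($bucket->objects($options) as $object) {
        $doc = new GcsDocument();
        $doc->setGcsUri('gs://'.$object->name());
        $doc->setMimeType($object->info()['contentType']);
        $gcsDocumentList[] = $doc;
    }
    $gcsDocuments = new GcsDocuments();
    $gcsDocuments->setDocuments($gcsDocumentList);
    return $gcsDocuments;
}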
function batchJob() {
    $inputConfig = new BatchDocumentsInputConfig(['gcs_documents' => filesFromBucket('the-bucket-path/')]);
    // see https://cloud.google.com/php/docs/reference/cloud-document-ai/latest/V1.DocumentOutputConfig
    // nb: all uri paths must end with / or an error will be generated.
    $outputConfig = new DocumentOutputConfig([
        'gcs_output_config' => new GcsOutputConfig(['gcs_uri' => 'gs://my-output-bucket/']),
    ]);
    // see https://cloud.google.com/php/docs/reference/cloud-document-ai/latest/V1.DocumentProcessorServiceClient
    $documentProcessorServiceClient = new DocumentProcessorServiceClient();
    try {
        // derived from the prediction endpoint
        $name = 'projects/######/locations/us/processors/#######';
        $operationResponse = $documentProcessorServiceClient->batchProcessDocuments($name, [
            'inputDocuments' => $inputConfig,
            'documentOutputConfig' => $outputConfig,
        ]);
        // block until the long-running operation finishes
        $operationResponse->pollUntilComplete();
        if ($operationResponse->operationSucceeded()) {
            $result = $operationResponse->getResult();
            printf('<br>result: %s<br>', serialize($result));
            // doSomethingWith($result)
        } else {
            $error = $operationResponse->getError();
            printf('<br>error: %s<br>', serialize($error));
            // handleError($error)
        }
    } finally {
        $documentProcessorServiceClient->close();
    }
}
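
One thing I have not ruled out: as far as I understand, a Cloud Storage URI has the form gs://bucket-name/object-name, and `$object->name()` returns only the object's path inside the bucket, so the URIs I build above may be missing the bucket name. Below is a minimal sketch of how I think the URIs would be composed and then split into smaller groups. Both helpers (`gcsUri`, `chunkUris`) and the batch size of 2 are hypothetical and purely illustrative; I have not confirmed the actual batch limit.

```php
<?php
// Hypothetical helpers for illustration; not part of the Google client libraries.

// Build a full gs:// URI from a bucket name and an object name.
function gcsUri(string $bucketName, string $objectName): string
{
    return 'gs://' . $bucketName . '/' . ltrim($objectName, '/');
}

// Split a list of URIs into groups of at most $batchSize entries.
function chunkUris(array $uris, int $batchSize): array
{
    return array_chunk($uris, $batchSize);
}

// Object names as they would come back from a listing with a prefix.
$names = ['the-bucket-path/a.png', 'the-bucket-path/b.png', 'the-bucket-path/c.png'];
$uris = array_map(fn (string $n): string => gcsUri('my-input-bucket', $n), $names);

echo $uris[0], PHP_EOL;                   // gs://my-input-bucket/the-bucket-path/a.png
echo count(chunkUris($uris, 2)), PHP_EOL; // 2
```

If that reading is right, each group could be submitted as its own batch request.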