I'm new to both Snowflake and ELT in general. Is there a significant performance difference between using "INSERT INTO" and "COPY INTO" when loading data into Snowflake from an external stage?
Does "INSERT INTO" process the data row by row rather than in bulk? As far as I understand, that would hurt performance on a columnar database like Snowflake.
Using "INSERT INTO" to load data from the external stage into the Snowflake raw table:
-- BEGIN INSERT INTO PROCESS
INSERT INTO ORDERS.RAW.FACT_ORDERS (
ID,
ORDER_ID,
PRODUCT_ID,
PRODUCT_PRICE,
QUANTITY,
SALE_FACTOR,
FINAL_PRODUCT_PRICE,
PURCHASE_DATE,
ORDER_RETURN_FLAG,
RETURN_ID,
CUSTOMER_ID,
STORE_ID,
EMPLOYEE_ID,
_METADATA_PARTITION_DATE,
_METADATA_FILE_NAME,
_METADATA_CREATED_BATCH_ID,
_METADATA_UPDATED_BATCH_ID,
_METADATA_CREATED_DATE_TIME,
_METADATA_UPDATED_DATE_TIME
)
SELECT DISTINCT
stg.$1,
stg.$2,
stg.$3,
stg.$4,
stg.$5,
stg.$6,
stg.$7,
stg.$8,
stg.$9,
stg.$10,
stg.$11,
stg.$12,
stg.$13,
to_date($process_date, 'YYYYMMDD') as _METADATA_PARTITION_DATE,
METADATA$filename::varchar(512) as _METADATA_FILE_NAME,
$batch_id,
$batch_id,
$batch_timestamp,
$batch_timestamp
FROM
@ORDERS.RAW.STAGE (pattern => $file_pattern) stg
Using "COPY INTO" to load data from the external stage into the Snowflake raw table:
-- BEGIN COPY INTO PROCESS
COPY INTO ORDERS.RAW.FACT_ORDERS (
ID,
ORDER_ID,
PRODUCT_ID,
PRODUCT_PRICE,
QUANTITY,
SALE_FACTOR,
FINAL_PRODUCT_PRICE,
PURCHASE_DATE,
ORDER_RETURN_FLAG,
RETURN_ID,
CUSTOMER_ID,
STORE_ID,
EMPLOYEE_ID,
_METADATA_PARTITION_DATE,
_METADATA_FILE_NAME,
_METADATA_CREATED_BATCH_ID,
_METADATA_UPDATED_BATCH_ID,
_METADATA_CREATED_DATE_TIME,
_METADATA_UPDATED_DATE_TIME
)
FROM (
-- Note: COPY INTO transformations do not support DISTINCT;
-- deduplication would need to happen downstream.
SELECT
stg.$1,
stg.$2,
stg.$3,
stg.$4,
stg.$5,
stg.$6,
stg.$7,
stg.$8,
stg.$9,
stg.$10,
stg.$11,
stg.$12,
stg.$13,
to_date($process_date, 'YYYYMMDD') as _METADATA_PARTITION_DATE,
METADATA$filename::varchar(512) as _METADATA_FILE_NAME,
$batch_id,
$batch_id,
$batch_timestamp,
$batch_timestamp
FROM
@ORDERS.RAW.STAGE stg
)
PATTERN = $file_pattern
I'm trying to understand best practices for loading large amounts of data from AWS S3 into my Snowflake raw tables.
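For context, the baseline I'm comparing against is a plain bulk COPY with no SELECT transformation; a minimal sketch might look like the following (the target table, file format name, and pattern here are hypothetical placeholders, assuming the staged files' columns line up with the table):

```sql
-- Plain bulk load: no transformation subquery, so Snowflake can
-- parallelize ingestion across files on the stage.
COPY INTO ORDERS.RAW.FACT_ORDERS_STAGING
FROM @ORDERS.RAW.STAGE
PATTERN = '.*orders.*[.]csv'
FILE_FORMAT = (FORMAT_NAME = 'MY_CSV_FORMAT');
```

My understanding is that metadata columns and deduplication would then be handled in a separate downstream step, but I'd like to confirm whether that split actually matters for performance.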