我正在使用房利美抵押贷款数据试验MySQL和数据分析.为此,我创建了两个表("perf"和"acq")和几个python函数.我首先删除集合和表(如果存在),然后创建集合(抵押贷款分析)和两个表.然后,我建立一个文件列表,对应于我想要执行的分析年数.所有这些都很好用.
然后,我使用下面的函数将来自房利美的perf和acq文本文件的数据加载到表中.相同的函数用于加载两个表.它每次都与"perf"表一起工作,而从不与"acq"表一起工作.如果我接受SQL语句并在mySQL工作台中执行它们,那么这些语句每次都能工作.我被难倒了,需要一些帮助.
在工作台中工作但在Python中不工作的SQL语句是:
LOAD DATA INFILE '/Users/<my user name>/github/mortgage-analysis-example/data/text/acq/Acquisition_2000Q1.txt'
INTO TABLE acq
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(loan_id, orig_channel, seller_name, orig_interest_rate, orig_upb, orig_loan_term,
orig_date, first_pay_date, orig_ltv, orig_cltv, num_borrowers, dti,
borrower_credit_score, first_home_buyer, loan_purpose, property_type, num_units,
occupancy_status, property_state, zip, mortgage_insurance_percent, product_type,
coborrow_credit_score, mortgage_insurance_type, relocation_mortgage_ind);
加载此文件的python函数是:
def loadTable(env_in, template, file_list):
# env_in: (i know, uck global variable, holds info for this notebook common to all functions
# template: SQL template file
# file_list: python list element with fully qualified file names to use with SQL statement
env = env_in # environment info
from mysql.connector import Error
_file = open(env["base_path"]+env["path_separator"]+template, "r")
_template = _file.readlines()
try:
conn = mysql.connector.connect(host=env["mySQL"]["host"],user=env["mySQL"]["user"], passwd=env['pw'])
if conn.is_connected():
print('Connected to MySQL database')
except Error as e:
print(e)
cursor = conn.cursor()
cursor.execute("USE mortgage_analysis;")
cursor.execute("SET SESSION sql_mode = '';")
print("starting table load")
t0 = time.time()
res = []
for _file in file_list:
_sql = _template[0].format(_file)
print(f"\n{_sql}\n")
try:
res = cursor.execute(_sql)
warn = cursor.fetchwarnings()
#print(f"warn: {warn}")
except Error as e:
print(f"{_sql} \n{e}")
t1 = time.time()
print(f"Years: {env['years']} Table load time: {t1-t0}")
conn.close
return env
未识别错误(try always works),也未生成警告(fetchwarnings始终为空).
用于创建这两个表的SQL语句是:
DROP TABLE IF EXISTS acq;
CREATE TABLE acq (id DOUBLE AUTO_INCREMENT, loan_id DOUBLE, orig_channel VARCHAR(255), seller_name VARCHAR(255), orig_interest_rate DOUBLE, orig_upb DOUBLE, orig_loan_term DOUBLE, orig_date VARCHAR(255), first_pay_date VARCHAR(255), orig_ltv DOUBLE, orig_cltv DOUBLE, num_borrowers DOUBLE, dti DOUBLE, borrower_credit_score DOUBLE, first_home_buyer VARCHAR(255), loan_purpose VARCHAR(255), property_type VARCHAR(255), num_units DOUBLE, occupancy_status VARCHAR(255), property_state VARCHAR(255), zip DOUBLE, mortgage_insurance_percent DOUBLE, product_type VARCHAR(255), coborrow_credit_score DOUBLE, mortgage_insurance_type DOUBLE, relocation_mortgage_ind VARCHAR(255), PRIMARY KEY (id));
DROP TABLE IF EXISTS perf;
CREATE TABLE perf (id DOUBLE AUTO_INCREMENT, loan_id DOUBLE, monthly_reporting_period VARCHAR(255), servicer VARCHAR(255), interest_rate DECIMAL(6,3), current_actual_upb DECIMAL(12,2), loan_age DOUBLE, remaining_months_to_legal_maturity DOUBLE, adj_remaining_months_to_maturity DOUBLE, maturity_date VARCHAR(255), msa DOUBLE, current_loan_delinquency_status DOUBLE, mod_flag VARCHAR(255), zero_balance_code VARCHAR(255), zero_balance_effective_date VARCHAR(255), last_paid_installment_date VARCHAR(255), foreclosed_after VARCHAR(255), disposition_date VARCHAR(255), foreclosure_costs DOUBLE, prop_preservation_and_reair_costs DOUBLE, asset_recovery_costs DOUBLE, misc_holding_expenses DOUBLE, holding_taxes DOUBLE, net_sale_proceeds DOUBLE, credit_enhancement_proceeds DOUBLE, repurchase_make_whole_proceeds DOUBLE, other_foreclosure_proceeds DOUBLE, non_interest_bearing_upb DOUBLE, principal_forgiveness_upb VARCHAR(255), repurchase_make_whole_proceeds_flag VARCHAR(255), foreclosure_principal_write_off_amount VARCHAR(255), servicing_activity_indicator VARCHAR(255), PRIMARY KEY (id));