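I have a DataFrame containing a column of postal addresses (generated with
geopy.geocoders GoogleV3, which I used to parse my DataFrame). However, the
output of geolocator.geocode includes the country name, which I don't want.
It also includes the unit number, which I don't want either.
How can I remove both?
I tried: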
test_add['clean address'] = test_add.apply(lambda x: x['clean address'][:-5], axis = 1)
and
def remove_units(X):
    X = X.split()
    X_new = [x for x in X if not x.startswith("#")]
    return ' '.join(X_new)
test_add['parsed addresses'] = test_add['clean address'].apply(remove_units)
It works on:
data = ["941 Thorpe St, Rock Springs, WY 82901, USA",
"2809 Harris Dr, Antioch, CA 94509, USA",
"7 Eucalyptus, Newport Coast, CA 92657, USA",
"725 Mountain View St, Altadena, CA 91001, USA",
"1966 Clinton Ave #234, Calexico, CA 92231, USA",
"431 6th St, West Sacramento, CA 95605, USA",
"5574 Old Goodrich Rd, Clarence, NY 14031, USA",
"Valencia Way #1234, Valley Center, CA 92082, USA"]
test_df = pd.DataFrame(data, columns=['parsed addresses'])
But when I use a larger DataFrame with 150k addresses like these, I get an error: "AttributeError: 'float' object has no attribute 'split'".
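One possible cause, assuming some rows in the larger DataFrame have no geocoded address: pandas stores missing values as NaN, which is a Python float, so calling split() on those rows fails. The sketch below reproduces the error on a hypothetical two-row Series (the names `addresses`, `missing_types` and `error_message` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one valid address and one missing value.
# dtype=object keeps the raw Python objects so their types can be inspected.
addresses = pd.Series(
    ["1966 Clinton Ave #234, Calexico, CA 92231, USA", np.nan],
    dtype=object,
)

# Missing entries are float NaN, not strings.
missing_types = [type(v).__name__ for v in addresses[addresses.isna()]]

# Applying a string method to every row fails on the NaN row.
try:
    addresses.apply(lambda s: s.split())
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(missing_types)
print(error_message)
```

Checking `test_add['clean address'].isna().sum()` on the real data would show whether this assumption holds.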
Ultimately, I require only the street number, street name, city, state and zip code.
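A minimal sketch of one way to get that result, under two assumptions: unit numbers always look like "#234", and the country suffix is always ", USA" as in the sample data. It uses the pandas `.str` accessor, which passes NaN through instead of raising. The function name `clean_addresses` is hypothetical.

```python
import numpy as np
import pandas as pd

def clean_addresses(addresses: pd.Series) -> pd.Series:
    """Drop '#...' unit tokens and a trailing ', USA'; NaN rows stay NaN."""
    # Remove the unit token and the whitespace before it, but keep the
    # comma that follows so the street and city stay separated.
    without_units = addresses.str.replace(r"\s*#[^\s,]+", "", regex=True)
    # Assumption: the country is always 'USA' and always comes last.
    return without_units.str.replace(r",\s*USA$", "", regex=True)

sample = pd.Series([
    "1966 Clinton Ave #234, Calexico, CA 92231, USA",
    "941 Thorpe St, Rock Springs, WY 82901, USA",
    np.nan,  # a missing address, as assumed for the larger DataFrame
])
cleaned = clean_addresses(sample)
print(cleaned.tolist())
```

Unlike splitting on whitespace and dropping tokens that start with "#", the regular expression leaves the comma after the street in place. If a column contained only missing values it would not have string data at all, and the `.str` accessor would raise, so that case would need a separate check.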