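I'm working on a research study in Python that requires a distance matrix between every pair of coordinates in a dataset of roughly 60,000 data points. I have tried vectorizing the computation and using GeoPandas. The problem with GeoPandas is that, to run its distance function, I have to repeat the x and y data in lists: the x data repeats the whole set of 60,000 towers, and the y data repeats each coordinate 60,000 times in a row. That makes each list about 3.6e9 values long. My computer runs out of memory before this finishes, and when I try it on my school's remote desktop it runs for over half an hour and I still haven't gotten it to complete successfully. This is the code I am running: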
#Florida Tower Matrix
#take coordinates of Florida towers
#CHECK THE LAT/LONG order
import geojson
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
features = []
with open("/Users/katcn/Desktop/Spring 2024/Research/PleaseWork.geojson") as f:
    gj = geojson.load(f)
for i in range(59629):
    features.append(gj['features'][i]["geometry"]['coordinates'])
#OR make the X matrix all in one column
#make the Y matrix repeat each value 59000 times
longitude = []
latitude = []
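for i in range(len(features)):
    # tile: append the first coordinate of every tower, once per outer pass
    for j in range(len(features)):
        longitude.append(features[j][0])
    # repeat: append tower i's first coordinate len(features) times in a row
    # (GeoJSON order is [longitude, latitude], so index 0 is the longitude)
    for k in range(len(features)):
        latitude.append(features[i][0])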
dict = {"longitude" : longitude, "latitude" : latitude}
df = pd.DataFrame(dict)
dict2 = {"longitude" : longitude, "latitude" : latitude}
df2 = pd.DataFrame(dict2)
#calculate distance between two towers
geometry = [Point(xy) for xy in zip(df.longitude, df.latitude)]
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
geometry2 = [Point(xy) for xy in zip(df2.longitude, df2.latitude)]
gdf2 = gpd.GeoDataFrame(df2, crs="EPSG:4326", geometry=geometry2)
distances = gdf.geometry.distance(gdf2.geometry)
print(distances)
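For a sense of scale, here is a rough estimate of how large each repeated list gets. It is only arithmetic and allocates nothing. It assumes 8-byte float64 values, which is a lower bound, since a plain Python list of floats needs noticeably more per element. The variable names are just for illustration.

```python
# Back-of-the-envelope size of the all-pairs data (nothing is allocated here).
n_points = 59_629                  # tower count used in the loop above
n_pairs = n_points * n_points      # length of each repeated list
bytes_per_float64 = 8              # assumption: values stored as float64

# Size of a single float64 column of that length, in GiB.
column_gib = n_pairs * bytes_per_float64 / 2**30
print(f"{n_pairs:,} values per list, about {column_gib:.1f} GiB per float64 column")
```

That is tens of GiB for one column, before any `Point` objects or GeoDataFrames are built on top of it.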
Any suggestions on how to approach this differently, so that it runs in a more reasonable time, would be great.
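One direction that might be worth trying is to skip the repeated lists entirely and compute the matrix in row blocks with NumPy broadcasting. The sketch below is not a tested solution for this dataset. It assumes the coordinates are in GeoJSON `[longitude, latitude]` order in degrees, and it uses the haversine formula on a spherical Earth (a swapped-in technique, not what `GeoSeries.distance` does; as far as I understand, that method returns planar distances in CRS units, which would be degrees for EPSG:4326). The names `haversine_block`, `iter_distance_blocks`, `chunk_size`, and the `demo` coordinates are all made up for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def haversine_block(lon1, lat1, lon2, lat2):
    """Great-circle distances in km from every point in set 1 to every point in set 2."""
    lon1, lat1, lon2, lat2 = (
        np.radians(np.asarray(a, dtype=np.float64)) for a in (lon1, lat1, lon2, lat2)
    )
    # Broadcasting builds a (len(set 1), len(set 2)) grid without repeating the inputs.
    dlon = lon2[None, :] - lon1[:, None]
    dlat = lat2[None, :] - lat1[:, None]
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(lat1)[:, None] * np.cos(lat2)[None, :] * np.sin(dlon / 2) ** 2
    )
    # Clip guards against rounding pushing `a` slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def iter_distance_blocks(coords, chunk_size=1000):
    """Yield (start, block); block[r, c] is the distance from point start + r to point c."""
    coords = np.asarray(coords, dtype=np.float64)
    lon, lat = coords[:, 0], coords[:, 1]
    for start in range(0, len(coords), chunk_size):
        stop = start + chunk_size
        yield start, haversine_block(lon[start:stop], lat[start:stop], lon, lat)


# Tiny hypothetical input in [lon, lat] order, standing in for the `features` list.
demo = [(-80.19, 25.76), (-81.38, 28.54), (-82.46, 27.95)]
for start, block in iter_distance_blocks(demo, chunk_size=2):
    print(start, block.shape)
```

With `chunk_size=1000` and about 60,000 points, each block is roughly 1000 x 60,000 float64 values, which is a few hundred MB rather than tens of GiB. The full matrix is still just as large if it is kept, so each block would need to be reduced (for example to counts or nearest neighbours) or written to disk, perhaps through `np.memmap`, instead of being collected in memory.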