我已经下载了NYC Taxi Zones dataset(从SODA Api下载并保存为json文件-不是GeoJson或Shapetime).该数据集相当小,因此我使用了包含的全部信息.为了文章的方便,我展示了数据集的前2行:
应用unnest()
并 Select 一些列和前2行后,在数据集下方.
您可以使用以下命令使用Polars导入数据
import polars as pl
poc = pl.read_json("./data.json"))
我对多边形感兴趣.实际上,我正在try 使用shapely
模块使用的多多边形和wkt(众所周知的文本表示)方法重新计算shape_area
.
到目前为止,我所做的是使用列coordinates
并将其转换为MultiPolygon()对象-可由Shapely模块读取.
def flatten_list_of_lists(data):
return [subitem3 for subitem1 in data for subitem2 in subitem1 for subitem3 in subitem2]
该函数将list[list[list[list[f64]]]]
对象作为输入并转换为list[list[f64]]
对象.
flattened_lists = [flatten_list_of_lists(row) for row in poc["coordinates"].to_list()]
print(flattened_lists)
[[-74.18445299999996,40.69499599999904], [-74.184488999999,40.695094999999987], [-74.18449799999996,40.6951849999987], [-74.184380999999997,40.695877999999989], [-74.1842819999994,40.696210999999999], [-74.1840209999997,40.697074999999989].
然后我使用下面的函数应用字符串连接,基本上是:
- 将
list[list[f64]]
对象转换为字符串. - 在字符串前面添加关键字MultiPolygon.
- 将"["和"]"替换为"(".,')'分别.
def polygon_to_wkt(polygon):
# Convert each coordinate pair to a string formatted as "lon lat"
coordinates_str = ", ".join(f"{lon} {lat}" for lon, lat in polygon)
# Format into a WKT Multipolygon string (each polygon as a single polygon multipolygon)
return f"""MULTIPOLYGON ((({coordinates_str})))"""
formatted_wkt = [polygon_to_wkt(polygon) for polygon in flattened_lists]
poc = poc.with_columns(pl.Series("WKT", formatted_wkt))
最后,我使用方法wkt.loads("MultiPolygon ((()))").area
来计算多多边形对象的形状区域
def convert_to_shapely_area(wkt_str):
try:
return wkt.loads(wkt_str).area
except Exception as e:
print("Error converting to WKT:", e)
return None
poc = poc.with_columns(
pl.col("WKT").map_elements(convert_to_shapely_area).alias("shapely_geometry")
)
print(poc.head())
即使对于第一个形状,WKT正确返回对象的面积,而对于第二个MultiPolygon,方法返回以下错误:
IllegalArgument例外:LinearRing的点不形成封闭的行串
What I have noticed between the two rows, is that the multipolygon of Newark Airport is a continues object of list[list[f64]]] coordinates. Whereas, the Jamaika Bay has multiple sublists [list[list[f64]]] elements (check column coordinates to verify this). Also the screenshot below verify this statement.
因此,在应用WKT之前,有没有办法将牙买加湾的多多边形统一到一个地理对象中?
PS:GitHub上的许多解决方案都使用形状文件,但我想定期使用SODA API自动重新下载NYC区域.
要从SODA API下载原始.json文件(省略logger_object
,让我们假装它是print()
)
import requests
params = {
"$limit": geospatial_batch_size, #i.e. 50_000
"$$app_token": config.get("api-settings", "app_token")
}
response = requests.get(api_url, params=params)
if response.status_code == 200:
data = response.json()
if not data:
logger_object.info("No data found, please check the API connection.")
sys.exit()
with open("{0}/nyc_zone_districts_data.json".format(geospatial_data_storage), "w") as f:
json.dump(data, f, indent=4)
else:
logger_object.error("API request failed.")
logger_object.error("Error: {0}".format(response.status_code))
logger_object.error(response.text)