Load a saved model in PySpark
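print(spark.version)  # 2.4.3

# fit the cross-validated model
cvModel = cv_grid.fit(train_df)

# save the best model to the specified path
mPath = "/path/to/model/folder"
cvModel.bestModel.write().overwrite().save(mPath)

# load the persisted model via the Pipeline API
# (assumes the estimator passed to cv_grid was a Pipeline, so bestModel is a PipelineModel;
# Spark ML persists JSON metadata and Parquet data, not a pickle)
from pyspark.ml.pipeline import PipelineModel
persistedModel = PipelineModel.load(mPath)

# predict on the test set
predictionsDF = persistedModel.transform(test_df)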
Source: stackoverflow.com