I'm building a logistic regression model on a simple dataset in Python.
My goal is to predict whether or not someone survived. After cleaning the dataset and dropping the NaN values and the string columns, I used the following code to convert every column to float64 (the resulting dtypes are shown below as well):
titanic_data['Survived'] = titanic_data['Survived'].astype(float)
titanic_data['Sibling/Spouse'] = titanic_data['Sibling/Spouse'].astype(float)
titanic_data['Parents/Children'] = titanic_data['Parents/Children'].astype(float)
titanic_data['male'] = titanic_data['male'].astype(float)
titanic_data['Q'] = titanic_data['Q'].astype(float)
titanic_data['S'] = titanic_data['S'].astype(float)
titanic_data[2] = titanic_data[2].astype(float)
titanic_data[3] = titanic_data[3].astype(float)
Output after running the code above (column dtypes):
Survived float64
Age float64
Sibling/Spouse float64
Parents/Children float64
Fare float64
male float64
Q float64
S float64
2 float64
3 float64
dtype: object
When I run my logistic regression code (shown below), I get the error: mixed type of string and non-string is not supported.
My regression code:
# Logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split the dataset into features (x) and target (y)
x = titanic_data.drop("Survived", axis=1)
y = titanic_data["Survived"]

# train_test_split returns x_train, x_test, y_train, y_test (in that order)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)

logreg = LogisticRegression()
logreg.fit(x_train, y_train)
But as you can see, I have already converted every column to the same data type, so why am I getting this error, and what can I do to fix it?
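One thing that may be worth checking: the dtypes listing shows two columns labelled 2 and 3, which suggests that the column labels themselves (not the values) are a mix of strings and integers. Recent scikit-learn versions appear to reject DataFrames whose column names are not all strings, which would match the wording of the error. The sketch below uses a small made-up frame (`df` and its values are hypothetical, not the real dataset) to show how the label types can be inspected and then cast to strings.

```python
import pandas as pd

# Hypothetical miniature frame that mimics the column labels in the question:
# mostly strings, plus the integers 2 and 3. The values are invented.
df = pd.DataFrame(
    [[0.0, 22.0, 1.0, 0.0, 1.0], [1.0, 38.0, 0.0, 0.0, 0.0]],
    columns=["Survived", "Age", "male", 2, 3],
)

# The value dtypes are uniform, but the labels themselves are not.
label_types = {type(c).__name__ for c in df.columns}
print(label_types)

# Cast every label to str so downstream libraries see uniform names.
df.columns = df.columns.astype(str)
fixed_types = {type(c).__name__ for c in df.columns}
print(fixed_types)
```

If the same check on titanic_data reports more than one label type, running `titanic_data.columns = titanic_data.columns.astype(str)` before the split should make the names uniform.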