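The variable you want to log-transform may contain zeros (log(0) is -Inf) or negative values (log() returns NaN), and lm() cannot handle either. Example: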
df1[1, 1] <- 0
lm(Y ~ log(X1) + X2 + X3, df1)
# Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
# NA/NaN/Inf in 'x'
# In addition: Warning message:
# In log(X1) : NaNs produced
You could consider log1p, which computes log(1 + x). Note that log1p(X1 + 1), as used here, is therefore log(X1 + 2).
lm(Y ~ log1p(X1 + 1) + X2 + X3, df1)
# Call:
# lm(formula = Y ~ log1p(X1 + 1) + X2 + X3, data = df1)
#
# Coefficients:
#   (Intercept)  log1p(X1 + 1)             X2             X3
#        2.1257        -1.5689         0.5337         1.0699
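As a quick check of what log1p does, here is a minimal sketch with made-up values (the vector x is hypothetical and not part of df1):

```r
# Hypothetical values, including a zero and a negative value above -1
x <- c(0, 0.5, -0.5, 2)

# log1p(x) computes log(1 + x); it is finite for any x > -1
same_as_log <- isTRUE(all.equal(log1p(x), log(1 + x)))

# log1p(x + 1) is therefore log(x + 2), not log(x + 1)
shifted_twice <- isTRUE(all.equal(log1p(x + 1), log(x + 2)))

# Plain log() gives -Inf at zero and NaN (with a warning) for negatives
log_zero <- log(0)
log_negative <- suppressWarnings(log(-0.5))
```

So log1p(X1) alone already shifts the argument by 1; adding another 1 inside the call shifts it by 2 in total.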
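However, this changes the interpretation of the coefficient; see the related post on Cross Validated. In any case, you should decide how to handle the zero values.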
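Two possible ways of handling such values could be sketched as follows. This is only an illustration under assumptions: the data frame toy and all of its values are made up and are not the df1 from this answer, and which option is appropriate depends on your data.

```r
# Hypothetical toy data with one zero and one negative value in X1
set.seed(1)
toy <- data.frame(X1 = c(0, -0.5, runif(18, 0.1, 3)),
                  X2 = rnorm(20))
toy$Y <- 1 + 0.5 * toy$X2 + rnorm(20)

# Count the values for which log() is not finite
n_bad <- sum(toy$X1 <= 0)

# Option 1: fit only on rows where log(X1) is defined
fit_pos <- lm(Y ~ log(X1) + X2, data = subset(toy, X1 > 0))

# Option 2: keep all rows by using log1p(), which requires X1 > -1
fit_log1p <- lm(Y ~ log1p(X1) + X2, data = toy)
```

Option 1 drops observations, while option 2 keeps them but changes what the coefficient means, so neither is a neutral choice.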
Data:
df1 <- structure(list(X1 = c(0, -0.564698171396089, 0.363128411337339,
0.63286260496104, 0.404268323140999, -0.106124516091484, 1.51152199743894,
-0.0946590384130976, 2.01842371387704, -0.062714099052421), X2 = c(1.30486965422349,
2.28664539270111, -1.38886070111234, -0.278788766817371, -0.133321336393658,
0.635950398070074, -0.284252921416072, -2.65645542090478, -2.44046692857552,
1.32011334573019), X3 = c(-0.306638594078475, -1.78130843398,
-0.171917355759621, 1.2146746991726, 1.89519346126497, -0.4304691316062,
-0.25726938276893, -1.76316308519478, 0.460097354831271, -0.639994875960119
), Y = c(2.00627879909717, 1.08150911284604, 1.41465103918476,
1.37787039819613, 3.04863502238068, -0.828228728348569, 0.198328716326719,
-2.34295203837687, -1.61863179473641, 1.03962922460575)), row.names = c(NA,
-10L), class = "data.frame")