Python HOG + SVM classification takes too long

Posted on April 5

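I have a directory named Animal that contains two subdirectories (cats and dogs). Building the full path for each file does not take much time and returns quickly. Here is the sample code: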

import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.decomposition import PCA
import os

mypca = PCA(n_components=10)
parent = "Animal"
file_list = []
features = []
labels = []
subdirectory = os.listdir(parent)
for animal in subdirectory:
    full_directory = os.path.join(parent, animal)
    for file in os.listdir(full_directory):
        file_path = os.path.join(full_directory, file)
        file_list.append(file_path)
print(file_list)

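I saved the full path of every file to avoid a nested loop. Next I iterate over the list, read the images one by one, and apply the HOG feature extraction algorithm. Here is the code: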

for file in file_list:
    # Read the image and resize it to 512 x 256 (width x height)
    image = cv2.imread(file)
    image = cv2.resize(image, (128 * 4, 64 * 4))
    # Note: newer scikit-image releases use channel_axis=-1 instead of multichannel=True
    fd, hog_image = hog(image, orientations=9, pixels_per_cell=(8, 8),
                        cells_per_block=(4, 4), visualize=True, multichannel=True)
    features.append(fd)
    # Label 0 for paths containing "cats", 1 for everything else
    if file.find("cats") != -1:
        labels.append(0)
    else:
        labels.append(1)
labels = np.array(labels)
features = np.array(features)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=1)
# Fit PCA on the training split only, then apply it to the test split
X_train = mypca.fit_transform(X_train)
X_test = mypca.transform(X_test)
mymodel = SVC()
mymodel.fit(X_train, y_train)
print(mymodel.score(X_test, y_test))
print(features.shape)

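But this code takes far too long to run, even though my computer (an HP Omen) has 6 CPU cores. Can you suggest how to speed up the execution? The directory originally held 4000 images per class; I have already deleted more than 2000, but I still wait a long time for it to finish. I also tried applying PCA to reduce the dimensionality. What would you suggest?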
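To get a rough idea of where the time might be going, it can help to work out how large each HOG descriptor is with the settings above. The sketch below is only an estimate, assuming blocks slide one cell at a time (which is how scikit-image's `hog` lays out its output) and 8 bytes per feature value. The helper name `hog_feature_length` and the smaller 64 x 128 configuration are hypothetical, chosen only for comparison. Note also that `hog_image` is never used in the code above, so `visualize=True` is probably doing work that gets thrown away.

```python
def hog_feature_length(height, width, cell=8, block=4, orientations=9):
    """Estimate the HOG descriptor length for one image.

    Hypothetical helper: assumes square cells and blocks, and blocks
    that slide across the image one cell at a time.
    """
    cells_y = height // cell          # number of cells along the rows
    cells_x = width // cell           # number of cells along the columns
    blocks_y = cells_y - block + 1    # block positions along the rows
    blocks_x = cells_x - block + 1    # block positions along the columns
    return blocks_y * blocks_x * block * block * orientations


# Settings from the question: cv2.resize(image, (512, 256)) takes
# (width, height), so the image is 256 rows by 512 columns.
question_len = hog_feature_length(256, 512, cell=8, block=4)

# Hypothetical smaller setup for comparison: 128 rows by 64 columns, 2x2 blocks.
small_len = hog_feature_length(128, 64, cell=8, block=2)

# Assumption: each feature value is stored as an 8-byte float.
bytes_per_image = question_len * 8

print(question_len)           # 254736
print(small_len)              # 3780
print(bytes_per_image / 1e6)  # megabytes per image
```

With roughly a quarter of a million values per image, both the feature extraction and the later PCA fit have a lot of data to move, so shrinking the resize target or the block size before reaching for more CPU cores may be worth trying.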
