Comparing R with Matlab for Data Mining

Posted on January 27

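Instead of coding in Matlab, I recently started learning R, mainly because it is open source. I currently work in data mining and machine learning. I have found that many machine learning algorithms are implemented in R, and I am still exploring the different packages available for it.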

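I have a quick question: how would you compare R with Matlab for data mining applications in terms of popularity, pros and cons, and industry and academic acceptance? Which one would you choose, and why?

I have gone through various comparisons of Matlab and R on a range of metrics, but I am specifically interested in their applicability to data mining and ML.

I would appreciate any suggestions.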

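Recommended answer

For the past three years or so, I have used R daily, and the largest portion of that use has gone to machine learning/data mining problems.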

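I was an exclusive Matlab user while at university; at the time I thought it was [...]

[...] the Neural Network Toolbox, the Optimization Toolbox, the Statistics Toolbox, [...]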

My Top 5 list for Learning ML/Data Mining in R:

Mining Association Rules in R

This refers to a couple of things. First, a group of R packages whose names all begin with arules (available from CRAN); you can find the complete list (arules, arulesViz, etc.) on the project homepage. Second, all of these packages are based on a data-mining technique known as Market-Basket Analysis, or alternatively as Association Rules. In many respects, this family of algorithms is the essence of data mining: exhaustively traverse large transaction databases and find above-average associations or correlations among the fields (variables or features) in those databases. In practice, you connect them to a data source and let them run overnight. The central R package in the set mentioned above is called arules; on the CRAN package page for arules, you will find links to a couple of excellent secondary sources (vignettes, in R's lexicon) on the arules package and on the Association Rules technique in general.
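To make "above-average associations" concrete, here is a minimal brute-force sketch of the three usual rule measures (support, confidence, lift). It is written in plain Python only so that it is self-contained; it is not the arules API and not the Apriori algorithm those packages use, and the transaction data, the function names, and the thresholds are all invented for illustration.

```python
from itertools import combinations

# Hypothetical toy transaction database (invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, db):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def pair_rules(db, min_support=0.4, min_confidence=0.7):
    """Brute-force single-item rules {lhs} -> {rhs}, sorted by lift (descending)."""
    items = sorted(set().union(*db))
    rules = []
    for a, b in combinations(items, 2):
        both = support({a, b}, db)
        if both < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: share of transactions with lhs that also contain rhs
            confidence = both / support({lhs}, db)
            if confidence >= min_confidence:
                # lift > 1 means rhs shows up with lhs more often than its base rate
                lift = confidence / support({rhs}, db)
                rules.append((lhs, rhs, both, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


for lhs, rhs, s, c, l in pair_rules(transactions):
    print(f"{{{lhs}}} -> {{{rhs}}}  support={s:.2f}  confidence={c:.2f}  lift={l:.2f}")
```

A rule with lift above 1 is an "above-average" association in the sense described above. Real implementations avoid this exhaustive pairwise scan by pruning infrequent itemsets early, which is what makes overnight runs on large databases feasible.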

The standard reference, The Elements of Statistical Learning (ESL), by Hastie et al.

The most current edition of this book is available in digital form for free. Likewise, all data sets used in ESL are available for free download at the book's website. (As an aside, I have the free digital version; I also purchased the hardback version from BN.com; all of the color plots in the digital version are reproduced in the hardbound version.) ESL contains thorough introductions to at least one exemplar from most of the major ML rubrics, e.g., neural networks, SVM, KNN; unsupervised techniques (LDA, PCA, MDS, SOM, clustering); numerous flavors of regression; CART; Bayesian techniques; as well as model aggregation techniques (Boosting, Bagging) and model tuning (regularization). Finally, get the R package that accompanies the book from CRAN (which will save you the trouble of having to download and enter the datasets).

CRAN Task View: Machine Learning

The 3,500+ packages available for R are divided by domain into about 30 package families, or 'Task Views'. Machine Learning is one of these families. The Machine Learning Task View contains about 50 packages. Some of these packages are part of the core distribution, including e1071 (a sprawling ML package that includes working code for quite a few of the usual ML categories).

Revolution Analytics Blog

With particular focus on the posts tagged with Predictive Analytics

A tutorial consisting of a slide deck and R code by Josh Reich

A thorough study of the code would, by itself, be an excellent introduction to ML in R.

One final resource that I think is excellent but that didn't make the top 5:

A Guide to Getting Started in Machine Learning [in R]

posted on the blog A Beautiful WWW
