Introduction
Today I was studying machine learning. Ng mentioned the use of multicore computing in ML, which caught my attention, because in another course, Bioinformatics Algorithms (Part 1), clump finding is a heavy computation that can take hours to return the correct result. So I did a little digging on Google into parallel computing in R. One of the interesting results came from r-bloggers, so I reran Daniel’s code on my machine.
parallel(multicore)
This is one of the most popular packages for parallel computing. Since R 2.14.0 it has been included in R itself under the name parallel. Daniel’s code still uses the old name, so I had to rename every multicore to parallel.
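To get a feel for the renamed package, here is a minimal sketch of my own (not Daniel’s code). It assumes a machine with at least two cores; the squaring function and the choice of 2 workers are just placeholders.

```r
library(parallel)

detectCores()  # how many cores R can see on this machine

# mclapply() works by forking, so on Windows it only runs with mc.cores = 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L

squares_seq <- lapply(1:10, function(i) i^2)                       # plain loop
squares_par <- mclapply(1:10, function(i) i^2, mc.cores = cores)   # forked workers

# both calls should return the same list
identical(squares_seq, squares_par)
```

The only change from the old multicore style is the library() call; mclapply() keeps its name.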
snow
The snow (Simple Network of Workstations) package by Tierney et al. can use PVM, MPI, NWS as well as direct networking sockets. It provides an abstraction layer by hiding the communications details. The snowFT package provides fault-tolerance extensions to snow.
snowfall
The snowfall package by Knaus provides a more recent alternative to snow. Functions can be used in sequential or parallel mode.
- The two descriptions above are quoted from CRAN.
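To see how the two packages differ in practice, here is a small sketch of my own. It assumes both snow and snowfall are installed and that a two-worker socket cluster is fine on your machine; the square() helper is only for illustration.

```r
library(snow)
library(snowfall)

square <- function(i) i^2  # illustrative workload

# snow: you create the cluster object and pass it around yourself
cl <- makeCluster(2, type = "SOCK")
snow_res <- parLapply(cl, 1:10, square)
stopCluster(cl)

# snowfall: the cluster is managed behind sfInit()/sfStop();
# setting parallel = FALSE would run the same code sequentially
sfInit(parallel = TRUE, cpus = 2)
sf_res <- sfLapply(1:10, square)
sfStop()

# both should match the plain lapply() result
identical(snow_res, sf_res)
```

Switching the sfInit() call to parallel = FALSE is a handy way to debug before going parallel.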
lapply (function)
It is a commonly used looping function in R. We’ll use it as the baseline in the benchmark.
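For anyone who has not used it yet, a tiny example of my own: lapply() applies a function to each element and always returns a list.

```r
# apply an anonymous function to each element of 1:3
squares <- lapply(1:3, function(i) i^2)

str(squares)      # a list with one element per input
unlist(squares)   # flatten the list into a plain vector
```

The parallel variants keep this same "vector in, list out" shape, which is what makes them easy to swap in.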
code
- The code below was originally written by Daniel Marcelino.
Preparation for Packages
if(!require(rbenchmark)){
Create Benchmark Function
- r is the number of replications.
- n is the number of individual calculations in one run (the loop size).
- v stands for verification: it lets you check whether the results are identical.
res <-function(n = 1e3, r = 100, v = F) {
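To make the three arguments concrete, here is a rough sketch of what such a function could look like. This is not Daniel’s res(): the name bench_sketch, the sqrt() workload, the two-worker setting and the use of base system.time() instead of rbenchmark are all my own assumptions.

```r
library(parallel)

# Hypothetical stand-in for the benchmark function; not the original res()
bench_sketch <- function(n = 1e3, r = 100, v = FALSE) {
  work <- function(i) sqrt(i)                               # assumed workload
  cores <- if (.Platform$OS.type == "windows") 1L else 2L   # no forking on Windows

  if (v) {
    # verification: sequential and parallel results must be identical
    stopifnot(identical(lapply(seq_len(n), work),
                        mclapply(seq_len(n), work, mc.cores = cores)))
  }

  # total elapsed time of r repeated runs of f()
  time_it <- function(f) sum(replicate(r, system.time(f())[["elapsed"]]))

  data.frame(
    test = c("lapply", "mclapply"),
    n = n,
    replications = r,
    elapsed = c(
      time_it(function() lapply(seq_len(n), work)),
      time_it(function() mclapply(seq_len(n), work, mc.cores = cores))
    )
  )
}

out <- bench_sketch(n = 100, r = 3, v = TRUE)
out
```

Returning a data frame keeps the timings easy to bind together and plot later.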
Test Benchmark
- Here we use 10 replications to keep the running time short.
res1 = res(1e3, 10)
Plot the results
library(ggplot2)
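As a rough idea of how timings can be plotted, here is a sketch of my own, assuming ggplot2 is installed. It is not Daniel’s plotting code: the loop sizes are arbitrary and, to stay self-contained, it only times plain lapply() on the spot rather than reusing the benchmark results.

```r
library(ggplot2)

# arbitrary loop sizes, chosen only for illustration
sizes <- c(1e2, 1e3, 1e4)

# measure elapsed time of lapply() at each size
timings <- data.frame(
  n = sizes,
  elapsed = vapply(sizes, function(n) {
    system.time(lapply(seq_len(n), sqrt))[["elapsed"]]
  }, numeric(1)),
  test = "lapply"
)

# elapsed time against loop size, one line per method
p <- ggplot(timings, aes(x = n, y = elapsed, colour = test)) +
  geom_line() +
  geom_point() +
  scale_x_log10() +
  labs(x = "loop size (n)", y = "elapsed time (s)", colour = "method")
print(p)
```

With more rows in the data frame (one set per method), the colour mapping draws one line per method.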
Conclusion
According to this result, we should stick with a default function such as lapply until the loop count exceeds a hundred thousand. Also, as the data size increases, parallel computing becomes relatively cheaper.