Note: Check out our blog post about simple parallelization using Python, R, MATLAB, and Octave, or this webinar on parallelizing algorithms on Amazon's massive X1 instances.
Broadly, there are two ways to speed up your calculations with Domino.
- In your project's Settings, you can upgrade your hardware. Each run will use the number of cores and amount of RAM specified by the hardware tier you choose.
- You can start as many concurrent runs as you want, thereby "manually" parallelizing your work. Each run you start gets its own virtualized machine (container), so you can take advantage of dozens of machines at once.
Note: The more powerful the hardware, the more expensive your run becomes. Specifically, more powerful hardware costs more per minute. Be conscious of how long each run takes and how many concurrent runs you have going if budget is a consideration.
By the way, if you need to work more interactively, you can spin up an IPython/Jupyter Notebook session or an "R Notebook" session. These sessions use your current hardware tier, just as running a script would.
Pro tip: many of our machines have multiple cores, and there are many excellent libraries in R and Python that take advantage of multiple cores for faster computation.
We wrote a blog post about parallelizing your code, and here are a few more specific resources:
- IPython Notebook Clusters
- Scikit-learn supports native parallelism via its n_jobs parameter in a number of routines, including random forests, grid search, k-means, cross-validation, and more.
- Parallel package
- Foreach package
- The Caret package has native multicore support
- Examples of parallelized random forests
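To make the multicore idea concrete, here is a minimal sketch in Python using only the standard library's multiprocessing module. The function name `expensive_task` is a hypothetical stand-in for whatever CPU-bound computation your run performs; the same pattern applies inside a Domino run on a multi-core hardware tier.

```python
from multiprocessing import Pool, cpu_count

def expensive_task(x):
    # Hypothetical stand-in for a CPU-bound computation.
    return sum(i * i for i in range(x))

def run_parallel(inputs):
    # Spread the work across all available cores; each worker
    # process handles items from the input list as it becomes free.
    with Pool(processes=cpu_count()) as pool:
        return pool.map(expensive_task, inputs)

if __name__ == "__main__":
    print(run_parallel([10, 100, 1000]))
```

Because each run gets its own container, `cpu_count()` reflects the cores of the hardware tier you selected, so the same script automatically uses more parallelism on a larger tier.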