Docker uses cgroups to limit the amount of memory accessible to a container. This is not perfect, and a few of the resulting problems can be seen with:
$ free -h  # This is from a 1G instance, but reporting 2G
              total        used        free      shared  buff/cache   available
Mem:           1.9G        204M        1.5G         19M        231M        1.6G
Swap:          1.0G        607M        416M
This output displays the host's memory usage, not the pod's. The same problem affects R (and Python) when allocating memory - both read /proc/meminfo instead of the cgroup limits.
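As a workaround, you can read the container's actual limit from the cgroup filesystem directly. Below is a minimal R sketch; the helper name is illustrative, and the paths are the common cgroup v1 and v2 mount points, which may differ on your host.
# Read the container's memory limit from the cgroup filesystem rather
# than /proc/meminfo. Paths assume the usual cgroup v1/v2 mounts.
cgroup_mem_limit_mib <- function() {
  v1 <- "/sys/fs/cgroup/memory/memory.limit_in_bytes"  # cgroup v1
  v2 <- "/sys/fs/cgroup/memory.max"                    # cgroup v2
  path <- if (file.exists(v1)) v1 else v2
  raw <- trimws(readLines(path, n = 1L))
  if (raw == "max") return(Inf)  # cgroup v2 reports "max" when unlimited
  as.numeric(raw) / 1024^2       # bytes -> MiB
}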
You can verify whether your pod was killed for running out of memory; this may require help from your Domino Admin(s) to check on your run. Log in to the Central Server/Rancher and run:
# kubectl get po -n <Namespace> | grep <runId>
run-<runId>-<uniqueID> 3/4 OOMKilled 0 23h
What can I do to lessen the impact?
You can avoid crashing your Workspace by setting soft and hard memory limits inside your R application.
There are several discussions online about how to mitigate R memory usage, but this method allows your Workspace to remain operational even if an R memory allocation fails.
With the soft and hard limits set, the Docker container should keep running: a failed allocation raises an R error instead of killing the container, although your application will not complete its task.
Install these packages (the examples are for a Dockerfile):
RUN R --no-save -e "install.packages(c('httr'),repos='http://cran.r-project.org')"
RUN R --no-save -e "install.packages(c('devtools'),repos='http://cran.r-project.org')"
RUN R --no-save -e "devtools::install_github('krlmlr/ulimit')"
In your R application or run, set a memory limit (the value is in MiB) and then try to allocate memory; note that rep() allocates 4 bytes per integer, so rep(0L, 1e9) below requests roughly 3.7 GiB, well above the 1000 MiB limit. The memory_limit value must be lower than the HW Tier RAM specification, e.g. a 15GB RAM HW Tier should not use a value higher than 15*1024=15360.
> ulimit::memory_limit(1000)
> rep(0L, 1e9)
Error: cannot allocate vector of size 3.7 Gb
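Because the failed allocation now surfaces as an ordinary R error rather than an OOM kill, it can be caught and handled. A minimal sketch (the variable names are illustrative only):
# With the limit in place, an oversized allocation raises a catchable
# R error instead of the pod being OOMKilled, so the session survives.
result <- tryCatch(
  rep(0L, 1e9),
  error = function(e) {
    message("Allocation failed: ", conditionMessage(e))
    NULL  # return a sentinel and keep the Workspace alive
  }
)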
Setting the memory limit close to the HW Tier size should allow your work to proceed with fewer problems due to running out of memory. Note that the RStudio workspace may not have access to the cgroup information, making it harder to automate the calculation of real free memory.
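Where the cgroup files are readable, you can derive the limit automatically instead of hard-coding it, and fall back to a value matching your HW Tier otherwise. A sketch, assuming the ulimit package from above is installed (set_limit_from_cgroup and fallback_mib are hypothetical names):
# Derive memory_limit from the cgroup when readable, otherwise use a
# hard-coded fallback matching the HW Tier (here a 15GB tier).
set_limit_from_cgroup <- function(fallback_mib = 15 * 1024) {
  path <- "/sys/fs/cgroup/memory/memory.limit_in_bytes"  # cgroup v1
  limit_mib <- if (file.exists(path)) {
    as.numeric(readLines(path, n = 1L)) / 1024^2
  } else {
    fallback_mib
  }
  ulimit::memory_limit(floor(limit_mib * 0.9))  # leave ~10% headroom
}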
Using a shared HW Tier can also lead to memory exhaustion: 'neighbours' on the node may consume memory that would otherwise be available to the other pods using the shared node.