Version/Environment (if relevant):
Domino versions < 4.2
Issue:
If your code performs an operation that forces the underlying OS to terminate the Domino process unexpectedly (for example, an operation that needs several times the RAM available on the machine), your run ends in an "Error" execution state, and results from the run are not synced back to the project.
(Note: later versions of Domino impose limits on memory available to the run and prevent this problem from happening.)
Resolution:
Prevention
The impact of a crashed run is greatest when it happens in a long-running workspace session (such as Jupyter or RStudio) where work has not yet been synced. If you are interactively executing code that you expect to consume a lot of memory on the machine, sync your project files regularly during the session using the pull-out menu.
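As a rough illustration of the precaution above, the following sketch checks how much memory is free before you start a heavy operation. It assumes a Linux workspace where /proc/meminfo is readable; the function names mem_available_kb and has_headroom are hypothetical helpers, not part of Domino, and your own estimate of the memory an operation needs is only a guess.

```shell
# Hypothetical helper: print MemAvailable (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional path argument only eases testing.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: succeed only if at least NEEDED_KB is available.
# Usage: has_headroom NEEDED_KB [MEMINFO_FILE]
has_headroom() {
    needed_kb=$1
    available_kb=$(mem_available_kb "${2:-/proc/meminfo}")
    [ -n "$available_kb" ] && [ "$available_kb" -ge "$needed_kb" ]
}

# Example: warn before an operation estimated to need about 8 GB.
# has_headroom 8000000 || echo "Low memory: sync your project files first."
```

A check like this cannot prevent a crash, since memory use can grow after the check. It is only a prompt to sync before running something risky.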
Recovery
If you need to recover files from an Error'd run, an administrator of your deployment may be able to do so. Recovery is more likely if little time has passed since the Error'd run and if you haven't executed new runs since then.
The steps to recover the project are as follows. They must be performed by an administrator who has shell access to the Domino servers:
1. Get the run ID for the Error'd run by clicking on the run in the UI and noting the last part of the URL (a hash).
2. Identify which Executor executed the run, and locate it on the dispatcher dashboard. There are a few ways to do this:
Method A: If the run was recent, Ctrl+F search for the run ID on the dispatcher and see if you can find an indication of the machine.
Method B: Locate the application.log from when the run was executing, search for the run ID, and find a matching line that shows the executor machine's ID.
Method C: Confirm which hardware tier was used (shown in the run details in the UI). If only one or a few machines in this tier are active, you can check each one directly to see which contains the relevant run working directory (steps 4-5 explain how to do this).
3. Once you've identified the machine, put it into maintenance mode by finding it on the dispatcher, clicking "actions", and on the following page choosing "MM On". If it isn't currently running, click "Start" as well. This serves a couple of purposes: it prevents the machine from timing out and stopping while you're copying the files, and it also prevents a new run in the project from attempting to re-use the working directory and wipe out the changes.
4. Get the address of the executor (it's shown on the dispatcher dashboard) and SSH into the machine. The username for your SSH connection depends on your deployment; if you aren't sure what it is, try "ubuntu".
5. Your run's files are stored in: $DOMINO_INSTALLATION_DIR/executor/server/runs/$RUN_ID, substituting in the proper value for $DOMINO_INSTALLATION_DIR (it's usually "/domino", though in some deployments it's "/var/opt/domino").
6. Retrieve the files. If the project isn't too large, it's usually easiest to run a tar czf command to make an archive, and then scp that to your local machine. You may need to use sudo/chown to get around permissions errors.
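The command-line parts of steps 1, 2 (Method B), 5, and 6 can be sketched as shell functions. This is a minimal sketch, not an official Domino tool: the function names, the example URL, and EXECUTOR_ADDRESS are hypothetical, and the location of application.log is not assumed (pass in whatever path applies to your deployment).

```shell
# Step 1: the run ID is the last part of the run's URL.
# Strips any query string or fragment and a trailing slash first.
run_id_from_url() {
    url=${1%%[?#]*}
    url=${url%/}
    printf '%s\n' "${url##*/}"
}

# Step 2, Method B: print log lines that mention the run ID.
# Usage: find_run_in_log RUN_ID PATH_TO_APPLICATION_LOG
find_run_in_log() {
    grep -F -- "$1" "$2"
}

# Step 5: build the run's working directory path.
# The installation directory defaults to /domino; some deployments
# use /var/opt/domino instead.
run_dir() {
    printf '%s/executor/server/runs/%s\n' "${2:-/domino}" "$1"
}

# Step 6: archive the run directory on the executor.
# Usage: archive_run RUN_ID OUTPUT.tar.gz [INSTALLATION_DIR]
# Run with sudo if you hit permissions errors.
archive_run() {
    src=$(run_dir "$1" "${3:-/domino}")
    tar czf "$2" -C "$(dirname "$src")" "$(basename "$src")"
}

# Example, run on the executor (the URL is a made-up illustration):
#   RUN_ID=$(run_id_from_url 'https://domino.example.com/.../RUN_ID')
#   archive_run "$RUN_ID" "/tmp/$RUN_ID.tar.gz"
# Then, from your local machine:
#   scp ubuntu@EXECUTOR_ADDRESS:/tmp/RUN_ID.tar.gz .
```

Archiving with -C keeps only the run ID directory in the archive rather than the full server path, so the files extract cleanly on your local machine.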