To get GraphFrames working in a Jupyter notebook with Spark / PySpark, download the GraphFrames JAR, copy it into your $SPARK_HOME/jars folder, and then make a second copy of the same file with a .zip extension. Finally, make sure your PYTHONPATH environment variable includes the full path to that .zip file. The JAR bundles the graphframes Python package, and Python can import packages from a .zip archive on PYTHONPATH, which is why the renamed copy is needed.
Here is an example of setting this up in the Dockerfile instructions for an environment build using GraphFrames 0.5.0 and Spark 2.1:
RUN wget --quiet https://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar && \
    cp graphframes-0.5.0-spark2.1-s_2.11.jar /opt/spark-2.1.0-bin-hadoop2.6/jars && \
    cp graphframes-0.5.0-spark2.1-s_2.11.jar /opt/spark-2.1.0-bin-hadoop2.6/jars/graphframes-0.5.0-spark2.1-s_2.11.zip && \
    echo 'export PYTHONPATH=${PYTHONPATH:-}:${SPARK_HOME:-}/jars/graphframes-0.5.0-spark2.1-s_2.11.zip' >> /home/ubuntu/.domino-defaults
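Once the environment is built, a new notebook session should be able to import and use GraphFrames directly. The following is a minimal sketch of such a check; the application name and the toy vertex/edge data are illustrative only and not part of the setup above:

from pyspark.sql import SparkSession
from graphframes import GraphFrame   # importable because the .zip copy is on PYTHONPATH

spark = SparkSession.builder.appName("graphframes-check").getOrCreate()

# GraphFrames expects a vertex DataFrame with an "id" column and an
# edge DataFrame with "src" and "dst" columns.
vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()   # prints the in-degree of vertex "b" if everything is wired up

If the import fails with "No module named graphframes", double-check that the .zip copy exists in $SPARK_HOME/jars and that the PYTHONPATH export is being picked up by the notebook session.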