If you're running an on-demand Spark cluster with executor hardware tiers sized medium or larger, and your Spark job is not producing output, the cause may be a mismatch between the number of CPU cores Spark expects and what the hardware tier actually provides.
If you retrieve the support bundle from your Spark workspace (this requires admin permissions), you may see entries like the following in the Spark master logs:
2022-02-28T18:28:06.847027350Z 22/02/28 18:28:06 INFO Master: Registered app pyspark-shell with ID app-20220228182806-0000
2022-02-28T18:28:06.857629987Z 22/02/28 18:28:06 WARN Master: App app-20220228182806-0000 requires more resource than any of Workers could have.
A quick way to validate this is to run your Spark job on a small hardware tier. If you get output on a small hardware tier but not on larger tiers, the Spark executor configuration is likely requesting more hardware than the tier makes available.
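To see what the current session is actually requesting, you can read the executor settings directly from its configuration. This is a generic PySpark sketch; it assumes you run it inside the Spark workspace, where a session is already active:
from pyspark.sql import SparkSession

# Attach to the running session (pyspark-shell creates one automatically)
# and print the executor resources it is requesting from the cluster.
spark = SparkSession.builder.getOrCreate()
conf = spark.sparkContext.getConf()
print(conf.get("spark.executor.cores", "not set"))   # cores requested per executor
print(conf.get("spark.executor.memory", "not set"))  # memory requested per executor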
These settings can be adjusted when the Spark session is created, using the session builder's config() method. Executor cores and memory are application-level settings that take effect only when the session starts, so stop any running session and rebuild it with the new values:
from pyspark.sql import SparkSession

# You can examine the full configuration of the current session
spark.sparkContext.getConf().getAll()

# Executor settings apply at session start, so stop and rebuild the session
spark.stop()
spark = (SparkSession.builder
    .config("spark.executor.cores", "<some-integer>")  # CPU cores per executor
    .config("spark.executor.memory", "4g")             # memory: m for MB, g for GB
    .getOrCreate())
If you're still not getting output after trying smaller values, please retrieve the support bundle from your workspace and send it to us in a support ticket for further review.