How to Spark Submit a Python File (.py)
How do you submit a Python file (.py) with PySpark code to Spark? spark-submit is the script used to submit Spark applications written in Scala, Java, R, and Python to a cluster. In this article, I will cover a few examples of how to submit a Python (.py) file using several options and configurations. The Apache Spark binary distribution comes with the spark-submit script in the $SPARK_HOME/bin directory.
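As a minimal sketch, submitting a PySpark script can look like the following (here `wordcount.py` is a hypothetical script name, and Spark's `bin` directory is assumed to be on the PATH):

```shell
# Submit a hypothetical PySpark script in local mode with 2 cores.
# wordcount.py stands in for your own application file.
spark-submit \
  --master "local[2]" \
  wordcount.py
```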
1. Spark Submit Python File
When you want to spark-submit a PySpark application (Spark with Python), you need to specify the .py file you want to run, and specify the .egg or .zip file for dependency libraries. Below are some of the options and configurations specific to running a Python (.py) file with spark-submit; besides these, you can also use most of the options and configs covered in the sections below. Note: when you submit a Python file to spark-submit, make sure your Python file contains PySpark code.
Note: Files specified with --py-files are added to the PYTHONPATH of the executors.
2. Spark Submit Command Options
Below I have covered some of the most commonly used spark-submit options.
2.1 Deployment Modes (--deploy-mode)
Using --deploy-mode, you specify where the driver of your application runs: client (the default, where the driver runs on the machine you submit from) or cluster (where the driver runs on one of the nodes in the cluster).
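A sketch of both modes, assuming a hypothetical `wordcount.py` application file:

```shell
# Client mode (default): the driver runs on the submitting machine
spark-submit --deploy-mode client wordcount.py

# Cluster mode: the driver runs on a node inside the cluster
spark-submit --deploy-mode cluster wordcount.py
```

Cluster mode is generally preferred for production jobs so the driver is not tied to the machine you submit from.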
2.2 Cluster Managers (--master)
Using --master, you specify which cluster manager to submit the application to: yarn, a Spark standalone master (spark://HOST:PORT), Kubernetes (k8s://HOST:PORT), Mesos (mesos://HOST:PORT), or local[n] to run locally with n cores.
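Sketches for a few common cluster managers (the host names and `wordcount.py` are placeholder values):

```shell
# YARN
spark-submit --master yarn wordcount.py

# Spark standalone (placeholder host/port)
spark-submit --master spark://master-host:7077 wordcount.py

# Local mode with 4 cores
spark-submit --master "local[4]" wordcount.py
```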
2.3 Driver and Executor Resources (Cores & Memory)
While submitting an application, you can also specify how much memory and how many cores you want to give to the driver and to the executors.
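A sketch of the resource options (the specific sizes here are illustrative, not recommendations):

```shell
# Allocate resources for the driver and each executor.
# wordcount.py is a hypothetical application file.
spark-submit \
  --driver-memory 4g \
  --driver-cores 2 \
  --executor-memory 8g \
  --executor-cores 4 \
  --num-executors 5 \
  wordcount.py
```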
2.4 Other Options
Other commonly used options include --py-files (to ship additional .py, .zip, or .egg dependency files), --files (to place files in the working directory of each executor), and --verbose (to print fine-grained debug information).
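A sketch combining these options; `dependencies.zip` and `config.json` are placeholder file names:

```shell
# Ship zipped Python dependencies and a config file with the job
spark-submit \
  --py-files dependencies.zip \
  --files config.json \
  wordcount.py
```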
Example: The below example submits the application to the yarn cluster manager, using cluster deployment mode, with 8g driver memory, 16g memory and 2 cores for each executor.
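Expressed as a command (with `wordcount.py` as a placeholder application file):

```shell
# YARN, cluster mode, 8g driver memory, 16g/2-core executors
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 8g \
  --executor-memory 16g \
  --executor-cores 2 \
  wordcount.py
```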
3. Spark Submit Configurations
spark-submit supports setting several Spark configurations using the --conf key=value option.
Besides these, PySpark also supports many more configurations.
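A sketch of passing configurations with --conf (the specific values are illustrative):

```shell
# Pass Spark configurations as key=value pairs at submit time
spark-submit \
  --conf "spark.sql.shuffle.partitions=300" \
  --conf "spark.executor.memoryOverhead=1g" \
  wordcount.py
```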
Alternatively, you can also set these globally in $SPARK_HOME/conf/spark-defaults.conf so they apply to every Spark application.
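A sketch of what such global defaults could look like (illustrative values):

```
# $SPARK_HOME/conf/spark-defaults.conf
spark.master                    yarn
spark.executor.memory           8g
spark.sql.shuffle.partitions    300
```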
First preference goes to properties set directly on SparkConf in the application, then to flags passed to spark-submit, and finally to values in spark-defaults.conf.
Conclusion
In this article, I have explained how to submit a Python file using spark-submit to run it on a cluster, the different options you can use with a Python file, configurations, etc. Happy Learning !!