An init script is just a shell script that runs on each node of the cluster before the Apache Spark driver or executor JVM starts.
A cluster can have multiple init scripts if you want; they are executed in the order provided.
Cluster-scoped init script
If your cluster is not in edit mode, you will not see the button to add an init script. Click Edit on the cluster configuration page, and then you can add an init script to your cluster settings.
Note that if you use a workspace folder to store the init script, you do not need to specify the top-level /Workspace in the script path: in the cluster init setup, when you choose Workspace as the source, you fill in the path relative to /Workspace.
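As a hypothetical illustration (the user name and file name here are made up), a script stored at /Workspace/Users/someone@example.com/set-timezone.sh would be entered as /Users/someone@example.com/set-timezone.sh when Workspace is selected as the source.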
You can also store the init script at a DBFS or ABFSS path, although DBFS init scripts are deprecated by Databricks. Suppose, for example, the init script sets the node's timezone:
timedatectl set-timezone Asia/Shanghai
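To create and update that script at a DBFS path from a notebook, you can use dbutils.fs.put. A minimal sketch follows; the dbfs:/init-scripts/set-timezone.sh path is a placeholder of my own, not a required location:

```python
# Create (or update) the init script at a DBFS path from a notebook.
# The path is a placeholder; any DBFS location works.
dbutils.fs.put(
    "dbfs:/init-scripts/set-timezone.sh",
    """#!/bin/bash
# Runs on each node before the Spark JVM starts.
timedatectl set-timezone Asia/Shanghai
""",
    True,  # overwrite, so re-running the cell updates the script content
)
```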
For an ABFSS path, the setup is the same.
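For illustration, an ABFSS script path follows the standard Azure URI scheme, for example abfss://&lt;container&gt;@&lt;storage-account&gt;.dfs.core.windows.net/init-scripts/set-timezone.sh, where the container, storage account, and file names are placeholders.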
Databricks exposes some environment variables to init scripts. For the details, refer to the Databricks documentation.
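As a sketch of how these can be used, the following writes a script (again at a hypothetical DBFS path) that only acts on the driver node, using the documented DB_IS_DRIVER and DB_CLUSTER_ID variables:

```python
# Write an init script (hypothetical path) that uses Databricks-provided
# environment variables: DB_IS_DRIVER is "TRUE" on the driver node.
dbutils.fs.put(
    "dbfs:/init-scripts/driver-only.sh",
    """#!/bin/bash
if [[ "$DB_IS_DRIVER" = "TRUE" ]]; then
  echo "Configuring the driver of cluster $DB_CLUSTER_ID"
fi
""",
    True,  # overwrite on re-run
)
```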
Global init script
Databricks does not recommend using global init scripts: they run on every cluster in the workspace, so a faulty script can affect all of them. Prefer cluster-scoped init scripts where possible.
Init script logs
Init scripts can fail or produce errors, so it helps to know where their logs go.
The logging path for init scripts is configured in the advanced configuration section for the cluster. Go to the cluster's edit mode, expand Advanced options, and set the log path on the Logging tab.
For example, if you choose DBFS as the destination, the default path is dbfs:/cluster-logs. The init script log path then takes the format &lt;log-destination&gt;/&lt;cluster-id&gt;/init_scripts/, with a subdirectory per node. Under that path, there will be log files for both stdout and stderr.
You can check them using the dbutils.fs utilities.
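For example, a minimal sketch along these lines (the cluster ID in the path is a placeholder):

```python
# Inspect init script logs under the cluster log path.
# The cluster ID below is a placeholder.
log_dir = "dbfs:/cluster-logs/0101-123456-abcdefgh/init_scripts/"
for node_dir in dbutils.fs.ls(log_dir):      # one subdirectory per node
    for log_file in dbutils.fs.ls(node_dir.path):
        print(log_file.path)                 # ...stdout.log / ...stderr.log
        print(dbutils.fs.head(log_file.path, 1024))  # first 1 KB of the log
```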