
Change Timezone in Databricks Spark


A Databricks cluster uses UTC as its default timezone, so time-related code displays UTC rather than your local time, which is often inconvenient. This post shows how to change the timezone setting for a Databricks cluster.

Change the system timezone

In the notebook cell, run the following to check what Linux system Databricks is using:

%sh
lsb_release -a

The output shows that the underlying system is Ubuntu. On Ubuntu, we can use the timedatectl command-line tool to change the timezone.

List timezones:

%sh
timedatectl list-timezones

Set the timezone we want:

%sh
timedatectl set-timezone Asia/Shanghai

Note that the above command does not affect Spark's timezone setting, since Spark has already been started.


Change the timezone for Databricks Spark

For the current session

If you only want to set the timezone for the current Spark session, run the following statement:

spark.conf.set('spark.sql.session.timeZone', 'Asia/Shanghai')

The explanation for spark.sql.session.timeZone:

The ID of session local timezone in the format of either region-based zone IDs or zone offsets. Region IDs must have the form ‘area/city’, such as ‘America/Los_Angeles’. Zone offsets must be in the format ‘(+|-)HH’, ‘(+|-)HH:mm’ or ‘(+|-)HH:mm:ss’, e.g ‘-08’, ‘+01:00’ or ‘-13:33:33’. Also ‘UTC’ and ‘Z’ are supported as aliases of ‘+00:00’. Other short names are not recommended to use because they can be ambiguous.
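To make the accepted formats concrete, here is a minimal Python sketch that checks whether a string has one of the shapes described above. The function name `looks_like_documented_zone_id` and both regular expressions are my own assumptions for illustration. This is not Spark's actual parser, and it does not check that a region ID really exists.

```python
import re

# Offsets in the documented forms: (+|-)HH, (+|-)HH:mm, (+|-)HH:mm:ss
_OFFSET = re.compile(r"[+-]\d{2}(:\d{2}(:\d{2})?)?")
# Region IDs in the documented 'area/city' form, e.g. America/Los_Angeles
# (assumption: letters, digits, '_', '+', '-' separated by at least one '/')
_REGION = re.compile(r"[A-Za-z_]+(/[A-Za-z0-9_+-]+)+")


def looks_like_documented_zone_id(value: str) -> bool:
    """Rough shape check only; hypothetical helper, not part of Spark."""
    if value in ("UTC", "Z"):  # documented aliases of +00:00
        return True
    return bool(_OFFSET.fullmatch(value) or _REGION.fullmatch(value))
```

A short name such as 'PST' fails this check, which matches the advice above to avoid short names, even though Spark itself may still accept some of them.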

Then you can run display(spark.sql("select current_timezone()")) in a Databricks notebook cell to verify the change. However, the setting applies only to the current session: if you create a new notebook on the same cluster, the timezone setting does not persist.
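If you set the timezone in several notebooks, the set-and-verify steps can be wrapped in a small helper. This is only a sketch: `set_session_timezone` is a hypothetical name, and it assumes the `spark` object that Databricks notebooks provide is passed in as an argument.

```python
SESSION_TZ_KEY = "spark.sql.session.timeZone"


def set_session_timezone(spark, zone_id):
    """Set the session timezone and return the value the session reports back.

    `spark` is assumed to be the SparkSession available in the notebook.
    """
    spark.conf.set(SESSION_TZ_KEY, zone_id)
    return spark.conf.get(SESSION_TZ_KEY)


# In a notebook cell (not run here, since it needs a live session):
# set_session_timezone(spark, "Asia/Shanghai")
```

Because the setting is scoped to the session, the helper has to be called again in every new notebook.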


In the cluster config

In the Advanced options section of the cluster settings page, under the Spark tab, you can add the following config:

spark.sql.session.timeZone Asia/Shanghai

This ensures that every notebook attached to this cluster has the correct timezone.
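If you create clusters from a script rather than the UI, the same setting can go into the `spark_conf` map of the cluster definition. The sketch below converts the "key value" lines used in the Spark config box into such a map. The helper `parse_spark_config` is hypothetical, and the rest of the cluster definition is left out.

```python
def parse_spark_config(text: str) -> dict:
    """Turn 'key value' lines (one per line) into a dict of strings."""
    conf = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            key, value = parts
            conf[key] = value.strip()
    return conf


# Fragment of a cluster definition; other fields are omitted here.
cluster_fragment = {
    "spark_conf": parse_spark_config("spark.sql.session.timeZone Asia/Shanghai")
}
```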

Databricks cluster init script

You can also set the timezone in the cluster's init script. Just add something like the following to the script:

timedatectl set-timezone Asia/Shanghai

Precedence of different settings

The three methods above take precedence in the following order:

setting in notebook session > cluster config > init script
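The ordering can be modelled as a simple lookup that returns the first value that is set. This is only an illustration of the precedence stated above, not how Spark resolves the setting internally. The function name and parameters are hypothetical, and the "UTC" fallback reflects the cluster default mentioned at the start of this post.

```python
from typing import Optional


def effective_timezone(session: Optional[str] = None,
                       cluster_config: Optional[str] = None,
                       init_script: Optional[str] = None,
                       default: str = "UTC") -> str:
    """Return the first timezone that is set, checked in precedence order."""
    for candidate in (session, cluster_config, init_script):
        if candidate:
            return candidate
    return default  # nothing configured: fall back to the cluster default
```

For example, if the init script sets Europe/Berlin and the cluster config sets Asia/Shanghai, the notebook sees Asia/Shanghai until a session-level setting overrides it.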
