In Databricks, most work is done in notebooks. As a result, it is often not obvious how to run or reuse code from another notebook or from a Python module. In this post, I want to share how to do this in Databricks.
Run/import code from another notebook
Suppose that we have the following notebook, notebook1:
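```python
# Read the parameters passed in by the caller (exposed as widgets)
foo = dbutils.widgets.get("foo")
bar = dbutils.widgets.get("bar")
result = foo + '&' + bar

def show():
    print(result)

# Exit and hand `result` (a string) back to the caller
dbutils.notebook.exit(result)
```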
In another notebook, there are two ways to “run” the code in notebook1.
%run command
We can use the %run magic command to run the code in notebook1:
```
%run ./path/to/notebook1 $foo="hello" $bar="world"
```
- The path of notebook1 is its path relative to the current notebook.
- We can pass parameters to notebook1 via the `$` assignment. In notebook1, you can then retrieve the values with the dbutils.widgets.get() method.
- %run “imports” the functions and variables defined in notebook1 into the current notebook, so you can use them directly there. In this sense, it is like `from some_module import *` in Python.
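The plain-Python sketch below makes the `from some_module import *` comparison concrete. It is only an analogy and does not show how Databricks implements %run. A stand-in for notebook1's source is executed with `exec()` in the current module's globals, so `result` and `show` become directly usable afterwards. `NOTEBOOK1_SOURCE` and `params` are hypothetical. Because dbutils only exists on Databricks, the widget lookups are replaced by dict reads.

```python
# Plain-Python analogy for %run (NOT Databricks' implementation): run another
# "notebook's" source in the caller's global namespace.

# Hypothetical stand-in for notebook1. dbutils only exists on Databricks, so
# the widget lookups are replaced by reads from a `params` dict.
NOTEBOOK1_SOURCE = '''
foo = params["foo"]
bar = params["bar"]
result = foo + "&" + bar

def show():
    print(result)
'''

# Plays the role of `$foo="hello" $bar="world"` on the %run line.
params = {"foo": "hello", "bar": "world"}

# Execute in *our* globals, so every name notebook1 defines lands here,
# much like `from some_module import *`.
exec(NOTEBOOK1_SOURCE, globals())

show()          # prints: hello&world
print(result)   # prints: hello&world
```

The drawback of this design matches the drawback of %run. Names from the other notebook silently appear in (and can overwrite) the caller's namespace.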
Using dbutils.notebook.run()
We can also run notebook1 like this:
```python
dbutils.notebook.run("path/to/notebook1", 30, {"foo": "fooVal", "bar": "barVal"})
```
- The second argument (30) is the timeout in seconds. The parameters for notebook1 are provided in the third argument.
- This runs notebook1 in an ephemeral job. No functions or variables from that notebook are exposed to the current notebook.
- To return results from the called notebook, we can use dbutils.notebook.exit("result_str"). This is rather limited: it seems that currently only a string result is supported. You can be creative in how you interpret the returned string, though, e.g., as the name of a table.
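Only a string comes back, so one common trick is to pack several values into a JSON string. This is my assumption, not something this post prescribes. The sketch below imitates that flow locally:

- A separate Python process stands in for the ephemeral job and receives its parameters.
- It prints one JSON string in place of calling dbutils.notebook.exit().
- The caller gets only that string and decodes it.

`CHILD_SOURCE`, `run_isolated` and the `tmp_results` table name are hypothetical. On Databricks, the rough equivalent would be passing `json.dumps(...)` to dbutils.notebook.exit() and calling `json.loads()` on what dbutils.notebook.run() returns.

```python
import json
import subprocess
import sys

# Hypothetical child "notebook": reads its parameters from argv and, in place
# of dbutils.notebook.exit(), prints a single JSON string to stdout.
CHILD_SOURCE = '''
import json, sys
params = json.loads(sys.argv[1])
result = params["foo"] + "&" + params["bar"]
# Pack several values into ONE string, since only a string is returned.
print(json.dumps({"result": result, "table": "tmp_results"}))
'''


def run_isolated(source: str, params: dict, timeout: int = 30) -> str:
    """Run `source` in a separate Python process and return its stdout string."""
    completed = subprocess.run(
        [sys.executable, "-c", source, json.dumps(params)],
        capture_output=True,
        text=True,
        timeout=timeout,  # mirrors the timeout argument of dbutils.notebook.run()
        check=True,
    )
    return completed.stdout.strip()


returned = run_isolated(CHILD_SOURCE, {"foo": "fooVal", "bar": "barVal"})
payload = json.loads(returned)  # decode the single string into structured data
print(payload["result"])  # prints: fooVal&barVal
print(payload["table"])   # prints: tmp_results
```

Because the child runs in its own process, its `result` variable never reaches the caller. Only the printed string does, which mirrors the isolation of dbutils.notebook.run().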
Import a Python module
Since Databricks Runtime 11.3, we can import Python modules stored in the workspace. Create a normal Python module as you usually would, and you can then import it in a notebook. This conforms better to the Python philosophy of “explicit is better than implicit”.
I created a demo directory to test this feature. It has the following structure:
```
.
|____hello.py
|____my-notebook
|____utility
| |____math_ops.py
```
Content of math_ops.py:
```python
def power(x, y):
    return x**y
```
Content of hello.py:
```python
print("hello")
```
In the notebook my-notebook, we can run the following code without error:
```python
import hello  # runs hello.py, which prints "hello"
import utility.math_ops as m_op

m_op.power(2, 3)  # 8
```
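To try the same layout outside Databricks, the sketch below rebuilds it in a temporary directory and imports from it. The temporary directory and the `sys.path.insert` call are assumptions for a local run. The directory stands in for the workspace folder, which the runtime described above makes importable from the notebook without extra setup. Note that `utility` has no `__init__.py`, and Python 3 still imports it as a namespace package.

```python
import sys
import tempfile
from pathlib import Path

# Recreate the demo layout in a temporary directory (a local stand-in for
# the workspace folder that contains my-notebook).
root = Path(tempfile.mkdtemp())
(root / "hello.py").write_text('print("hello")\n')
(root / "utility").mkdir()
(root / "utility" / "math_ops.py").write_text(
    "def power(x, y):\n    return x**y\n"
)

# Assumption: locally we must put the folder on the import path ourselves.
sys.path.insert(0, str(root))

import hello                     # prints: hello
import utility.math_ops as m_op  # utility/ is a namespace package (no __init__.py)

print(m_op.power(2, 3))  # prints: 8
```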
autoreload
Since Databricks Runtime 11.0, notebooks use the IPython kernel under the hood (source here). As in a typical IPython notebook, an imported Python module is not reloaded by default when its code changes. For example, if you add a new function to the module or change an existing one, the notebook keeps using the old version.
Instead, we need to load the autoreload extension in the notebook (also mentioned in the official doc):
```
%load_ext autoreload
%autoreload 2
```
To check the documentation of autoreload, you can run %autoreload?.
If you don’t want to use the autoreload magic, you have to manually detach the notebook from the cluster and reattach it for module changes to take effect.
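What autoreload automates can also be done by hand with `importlib.reload()`, the approach discussed in the Ipython reload module reference. The sketch below reproduces the problem locally. It imports a module, edits the file on disk to add a function, and shows that the new function is missing until the module is reloaded. The module name `math_ops_demo` and the temporary directory are hypothetical. I assume `importlib.reload(m_op)` behaves the same way in a Databricks notebook, but I have not verified it there.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing .pyc files so the reload below always recompiles from source.
sys.dont_write_bytecode = True

# Hypothetical module, created only for this demo.
mod_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(mod_dir))
mod_file = mod_dir / "math_ops_demo.py"
mod_file.write_text("def power(x, y):\n    return x**y\n")

import math_ops_demo

# Edit the module on disk: add a new function.
mod_file.write_text(
    "def power(x, y):\n    return x**y\n\n"
    "def square(x):\n    return x * x\n"
)
print(hasattr(math_ops_demo, "square"))  # prints: False (old code still in memory)

importlib.reload(math_ops_demo)  # re-execute the module's updated source
print(math_ops_demo.square(3))   # prints: 9
```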
ref:
- import Python and R modules: https://learn.microsoft.com/en-us/azure/databricks/files/workspace-modules#import-python-and-r-modules
- autoreload doc: https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html#autoreload
- Ipython reload module: https://stackoverflow.com/q/5364050/6064933
References
- Databricks notebooks, a love-hate relationship: https://towardsdatascience.com/databricks-notebooks-a-love-hate-relationship-8f73e5b291fb
- difference between %run and dbutils.notebook.run(): https://community.databricks.com/s/question/0D53f00001GHVd5CAH/whats-the-difference-between-run-vs-dbutilsnotebookrun
- comparison of %run and dbutils.notebook.run(): https://learn.microsoft.com/en-gb/azure/databricks/notebooks/notebook-workflows#comparison-of-run-and-dbutilsnotebookrun
- share Python source file: https://learn.microsoft.com/en-gb/azure/databricks/notebooks/share-code
- best practice of Databricks notebook modularization: https://medium.com/@YuhengD/best-practice-of-databricks-notebook-modulization-d2797dd29dd3
- Databricks widgets: https://learn.microsoft.com/en-gb/azure/databricks/notebooks/widgets