For array type column, explode()
will convert it to n
rows, where n
is the number of elements in the array.
For map/dictionary type column, explode()
will convert it to nx2
shape, i.e., n rows, 2 columns (for key and value).
Explode for array#
CREATE TABLE IF NOT EXISTS my_table (name STRING, books ARRAY<STRING>, grades ARRAY<FLOAT>);
INSERT INTO my_table VALUES
("Tom", array('abc', 'efs'), array(85, 70, 91)),
('Alice', array('wer', 'mzdf'), array(70, 80, 90))
LATERAL VIEW explode
will generate the different combinations of exploded columns.
In the above case, column books
has 2 elements, and column grades
has 3 elements.
So for each name, you now have 2x3=6 rows.
SELECT
name, book_name, grade
FROM my_table
LATERAL VIEW explode(books) AS book_name
LATERAL VIEW explode(grades) AS grade
The generated table is like this:
name | book_name | grade |
---|---|---|
Alice | wer | 70 |
Alice | wer | 80 |
Alice | wer | 90 |
Alice | mzdf | 70 |
Alice | mzdf | 80 |
Alice | mzdf | 90 |
Tom | abc | 85 |
Tom | abc | 70 |
Tom | abc | 91 |
Tom | efs | 85 |
Tom | efs | 70 |
Tom | efs | 91 |
Explode for map#
CREATE TABLE IF NOT EXISTS demo_table (name STRING, cnt MAP<STRING, INT>);
INSERT INTO demo_table VALUES
("ABC", map_from_entries(array(("k1", 1), ("k2", 2)))),
("BCD", map_from_entries(array(("k1", 3), ("K2", 5)))),
("YUV", map_from_entries(array(("k1", 9), ("k2", 8), ("k3", 7))))
The demo_table
:
name | cnt |
---|---|
YUV | {“k1”:9,“k2”:8,“k3”:7} |
ABC | {“k1”:1,“k2”:2} |
BCD | {“k1”:3,“K2”:5} |
We can run LATERAL VIEW explode
on the column cnt
:
SELECT
name, key, val
FROM nn_ecp.demo_table
LATERAL VIEW explode(cnt) AS key, val
The generated table is like this:
name | key | val |
---|---|---|
YUV | k1 | 9 |
YUV | k2 | 8 |
YUV | k3 | 7 |
ABC | k1 | 1 |
ABC | k2 | 2 |
BCD | k1 | 3 |
BCD | K2 | 5 |
references#
- explode: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html
- lateral view: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-lateral-view.html
map_from_entries()
method: https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/map_from_entriesexplode()
vsexplode_outer()
vsposexplode()
vsposexplode_outer()
: https://www.youtube.com/watch?v=ZIWdx204-0E