NULL 語意
說明
表格由一組列組成,而每一列都包含一組欄位。欄位與資料類型相關聯,並表示實體的特定屬性(例如,age
是稱為 person
的實體的欄位)。有時候,特定於某一列的欄位值在該列產生時並不知道。在 SQL
中,此類值表示為 NULL
。本節詳細說明 NULL
值在各種運算子、運算式和其他 SQL
建構中的處理語意。
- 比較運算子中的 Null 處理
- 邏輯運算子中的 Null 處理
- 運算式中的 Null 處理
- WHERE、HAVING 和 JOIN 條件中的 Null 處理
- GROUP BY 和 DISTINCT 中的 Null 處理
- ORDER BY 中的 Null 處理
- UNION、INTERSECT、EXCEPT 中的 Null 處理
- EXISTS 和 NOT EXISTS 子查詢中的 Null 處理
- IN 和 NOT IN 子查詢中的 Null 處理
下列說明名為 person
的表格的結構配置和資料。資料在 age
欄位中包含 NULL
值,而此表格將用於下列各節中的各種範例。
表格:person
Id | Name | Age |
---|---|---|
100 | Joe | 30 |
200 | Marry | NULL |
300 | Mike | 18 |
400 | Fred | 50 |
500 | Albert | NULL |
600 | Michelle | 30 |
700 | Dan | 50 |
比較運算子
Apache Spark 支援標準的比較運算子,例如「>」、「>=」、「=」、「<」和「<=」。當運算元之一或兩個運算元都是未知或 NULL
時,這些運算子的結果為未知或 NULL
。為了比較 NULL
值是否相等,Spark 提供了一個空值安全相等運算子(「<=>」),當一個運算元為 NULL
時傳回 False
,當兩個運算元都是 NULL
時傳回 True
。下表說明當一個或兩個運算元為 NULL` 時,比較運算子的行為
左運算元 | 右運算元 | > | >= | = | < | <= | <=> |
---|---|---|---|---|---|---|---|
NULL | 任何值 | NULL | NULL | NULL | NULL | NULL | False |
任何值 | NULL | NULL | NULL | NULL | NULL | NULL | False |
NULL | NULL | NULL | NULL | NULL | NULL | NULL | True |
範例
-- Normal comparison operators return `NULL` when one of the operand is `NULL`.
SELECT 5 > null AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| null|
+-----------------+
-- Normal comparison operators return `NULL` when both the operands are `NULL`.
SELECT null = null AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| null|
+-----------------+
-- Null-safe equal operator return `False` when one of the operand is `NULL`
SELECT 5 <=> null AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| false|
+-----------------+
-- Null-safe equal operator return `True` when one of the operand is `NULL`
SELECT NULL <=> NULL;
+-----------------+
|expression_output|
+-----------------+
| true|
+-----------------+
邏輯運算子
Spark 支援標準的邏輯運算子,例如 AND
、OR
和 NOT
。這些運算子以 布林
敘述作為引數,並傳回 布林
值。
下表說明當一個或兩個運算元為 NULL
時,邏輯運算子的行為。
左運算元 | 右運算元 | OR | AND |
---|---|---|---|
True | NULL | True | NULL |
False | NULL | NULL | False |
NULL | True | True | NULL |
NULL | False | NULL | False |
NULL | NULL | NULL | NULL |
運算元 | NOT |
---|---|
NULL | NULL |
範例
-- Normal comparison operators return `NULL` when one of the operands is `NULL`.
SELECT (true OR null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| true|
+-----------------+
-- Normal comparison operators return `NULL` when both the operands are `NULL`.
SELECT (null OR false) AS expression_output
+-----------------+
|expression_output|
+-----------------+
| null|
+-----------------+
-- Null-safe equal operator returns `False` when one of the operands is `NULL`
SELECT NOT(null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| null|
+-----------------+
敘述
在 Spark 中,比較運算子和邏輯運算子被視為敘述。除了這兩種敘述之外,Spark 還支援其他形式的敘述,例如函數敘述、轉型敘述等。Spark 中的敘述可廣泛分類為
- 不允許 Null 的敘述
- 可以處理
NULL
值運算元的敘述- 這些敘述的結果取決於敘述本身。
不允許 Null 的敘述
當敘述的一個或多個引數為 NULL
時,不允許 Null 的敘述會傳回 NULL
,而且大多數的敘述都屬於此類別。
範例
SELECT concat('John', null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| null|
+-----------------+
SELECT positive(null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| null|
+-----------------+
SELECT to_date(null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| null|
+-----------------+
可處理 Null 值運算元的表達式
此類表達式旨在處理 NULL
值。表達式的結果取決於表達式本身。舉例來說,函數表達式 isnull
會在 null 輸入時傳回 true
,而在非 null 輸入時傳回 false
,而函數 coalesce
會在其運算元清單中傳回第一個非 NULL
值。不過,當 coalesce
的所有運算元都是 NULL
時,它會傳回 NULL
。以下是此類別中不完整的表達式清單。
- COALESCE
- NULLIF
- IFNULL
- NVL
- NVL2
- ISNAN
- NANVL
- ISNULL
- ISNOTNULL
- ATLEASTNNONNULLS
- IN
範例
SELECT isnull(null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| true|
+-----------------+
-- Returns the first occurrence of non `NULL` value.
SELECT coalesce(null, null, 3, null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| 3|
+-----------------+
-- Returns `NULL` as all its operands are `NULL`.
SELECT coalesce(null, null, null, null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| null|
+-----------------+
SELECT isnan(null) AS expression_output;
+-----------------+
|expression_output|
+-----------------+
| false|
+-----------------+
內建聚合表達式
聚合函數會透過處理一組輸入列來計算單一結果。以下是 NULL
值由聚合函數處理的方式規則。
- 所有聚合函數都會忽略
NULL
值,不予處理。- 此規則唯一的例外是 COUNT(*) 函數。
- 有些聚合函數會在所有輸入值都是
NULL
或輸入資料集為空時傳回NULL
。
這些函數的清單如下- MAX
- MIN
- SUM
- AVG
- EVERY
- ANY
- SOME
範例
-- `count(*)` does not skip `NULL` values.
SELECT count(*) FROM person;
+--------+
|count(1)|
+--------+
| 7|
+--------+
-- `NULL` values in column `age` are skipped from processing.
SELECT count(age) FROM person;
+----------+
|count(age)|
+----------+
| 5|
+----------+
-- `count(*)` on an empty input set returns 0. This is unlike the other
-- aggregate functions, such as `max`, which return `NULL`.
SELECT count(*) FROM person where 1 = 0;
+--------+
|count(1)|
+--------+
| 0|
+--------+
-- `NULL` values are excluded from computation of maximum value.
SELECT max(age) FROM person;
+--------+
|max(age)|
+--------+
| 50|
+--------+
-- `max` returns `NULL` on an empty input set.
SELECT max(age) FROM person where 1 = 0;
+--------+
|max(age)|
+--------+
| null|
+--------+
WHERE、HAVING 和 JOIN 子句中的條件表達式
WHERE
、HAVING
運算子會根據使用者指定的條件來篩選列。 JOIN
運算子用於根據聯結條件來結合兩個表格中的列。對於這三個運算子來說,條件表達式都是布林表達式,且可以傳回 True、False 或 Unknown (NULL)
。如果條件的結果為 True
,則表示它們「符合條件」。
範例
-- Persons whose age is unknown (`NULL`) are filtered out from the result set.
SELECT * FROM person WHERE age > 0;
+--------+---+
| name|age|
+--------+---+
|Michelle| 30|
| Fred| 50|
| Mike| 18|
| Dan| 50|
| Joe| 30|
+--------+---+
-- `IS NULL` expression is used in disjunction to select the persons
-- with unknown (`NULL`) records.
SELECT * FROM person WHERE age > 0 OR age IS NULL;
+--------+----+
| name| age|
+--------+----+
| Albert|null|
|Michelle| 30|
| Fred| 50|
| Mike| 18|
| Dan| 50|
| Marry|null|
| Joe| 30|
+--------+----+
-- Person with unknown(`NULL`) ages are skipped from processing.
SELECT age, count(*) FROM person GROUP BY age HAVING max(age) > 18;
+---+--------+
|age|count(1)|
+---+--------+
| 50| 2|
| 30| 2|
+---+--------+
-- A self join case with a join condition `p1.age = p2.age AND p1.name = p2.name`.
-- The persons with unknown age (`NULL`) are filtered out by the join operator.
SELECT * FROM person p1, person p2
WHERE p1.age = p2.age
AND p1.name = p2.name;
+--------+---+--------+---+
| name|age| name|age|
+--------+---+--------+---+
|Michelle| 30|Michelle| 30|
| Fred| 50| Fred| 50|
| Mike| 18| Mike| 18|
| Dan| 50| Dan| 50|
| Joe| 30| Joe| 30|
+--------+---+--------+---+
-- The age column from both legs of join are compared using null-safe equal which
-- is why the persons with unknown age (`NULL`) are qualified by the join.
SELECT * FROM person p1, person p2
WHERE p1.age <=> p2.age
AND p1.name = p2.name;
+--------+----+--------+----+
| name| age| name| age|
+--------+----+--------+----+
| Albert|null| Albert|null|
|Michelle| 30|Michelle| 30|
| Fred| 50| Fred| 50|
| Mike| 18| Mike| 18|
| Dan| 50| Dan| 50|
| Marry|null| Marry|null|
| Joe| 30| Joe| 30|
+--------+----+--------+----+
聚合運算子 (GROUP BY、DISTINCT)
如前一節 比較運算子 中所討論,兩個 NULL
值不相等。然而,為了分組和區別處理,兩個或更多具有 NULL 資料
的值會被分組到同一個區段中。此行為符合 SQL 標準和其他企業資料庫管理系統。
範例
-- `NULL` values are put in one bucket in `GROUP BY` processing.
SELECT age, count(*) FROM person GROUP BY age;
+----+--------+
| age|count(1)|
+----+--------+
|null| 2|
| 50| 2|
| 30| 2|
| 18| 1|
+----+--------+
-- All `NULL` ages are considered one distinct value in `DISTINCT` processing.
SELECT DISTINCT age FROM person;
+----+
| age|
+----+
|null|
| 50|
| 30|
| 18|
+----+
排序運算子 (ORDER BY 子句)
Spark SQL 支援在 ORDER BY
子句中指定 NULL 排序。Spark 透過將所有 NULL
值放在最前面或最後面來處理 ORDER BY
子句,具體取決於 NULL 排序規格。預設情況下,所有 NULL
值都會放在最前面。
範例
-- `NULL` values are shown at first and other values
-- are sorted in ascending way.
SELECT age, name FROM person ORDER BY age;
+----+--------+
| age| name|
+----+--------+
|null| Marry|
|null| Albert|
| 18| Mike|
| 30|Michelle|
| 30| Joe|
| 50| Fred|
| 50| Dan|
+----+--------+
-- Column values other than `NULL` are sorted in ascending
-- way and `NULL` values are shown at the last.
SELECT age, name FROM person ORDER BY age NULLS LAST;
+----+--------+
| age| name|
+----+--------+
| 18| Mike|
| 30|Michelle|
| 30| Joe|
| 50| Dan|
| 50| Fred|
|null| Marry|
|null| Albert|
+----+--------+
-- Columns other than `NULL` values are sorted in descending
-- and `NULL` values are shown at the last.
SELECT age, name FROM person ORDER BY age DESC NULLS LAST;
+----+--------+
| age| name|
+----+--------+
| 50| Fred|
| 50| Dan|
| 30|Michelle|
| 30| Joe|
| 18| Mike|
|null| Marry|
|null| Albert|
+----+--------+
設定運算子 (UNION、INTERSECT、EXCEPT)
在設定運算的背景下,NULL
值會以空值安全的方式進行相等性比較。這表示在比較列時,兩個 NULL
值會被視為相等,這與一般的 EqualTo
(=
) 運算子不同。
範例
CREATE VIEW unknown_age SELECT * FROM person WHERE age IS NULL;
-- Only common rows between two legs of `INTERSECT` are in the
-- result set. The comparison between columns of the row are done
-- in a null-safe manner.
SELECT name, age FROM person
INTERSECT
SELECT name, age from unknown_age;
+------+----+
| name| age|
+------+----+
|Albert|null|
| Marry|null|
+------+----+
-- `NULL` values from two legs of the `EXCEPT` are not in output.
-- This basically shows that the comparison happens in a null-safe manner.
SELECT age, name FROM person
EXCEPT
SELECT age FROM unknown_age;
+---+--------+
|age| name|
+---+--------+
| 30| Joe|
| 50| Fred|
| 30|Michelle|
| 18| Mike|
| 50| Dan|
+---+--------+
-- Performs `UNION` operation between two sets of data.
-- The comparison between columns of the row ae done in
-- null-safe manner.
SELECT name, age FROM person
UNION
SELECT name, age FROM unknown_age;
+--------+----+
| name| age|
+--------+----+
| Albert|null|
| Joe| 30|
|Michelle| 30|
| Marry|null|
| Fred| 50|
| Mike| 18|
| Dan| 50|
+--------+----+
EXISTS/NOT EXISTS 子查詢
在 Spark 中,WHERE 子句內允許使用 EXISTS 和 NOT EXISTS 函數。這些是布林函數,會傳回 TRUE
或 FALSE
。換句話說,EXISTS 是成員條件,當它所引用的子查詢傳回一或多列時,會傳回 TRUE
。類似地,NOT EXISTS 是非成員條件,當子查詢沒有傳回任何列或傳回零列時,會傳回 TRUE。
這兩個函數不受子查詢結果中是否存在 NULL 的影響。它們通常較快,因為它們可以在沒有特別考量空值感知的情況下轉換為半聯結/反半聯結。
範例
-- Even if subquery produces rows with `NULL` values, the `EXISTS` expression
-- evaluates to `TRUE` as the subquery produces 1 row.
SELECT * FROM person WHERE EXISTS (SELECT null);
+--------+----+
| name| age|
+--------+----+
| Albert|null|
|Michelle| 30|
| Fred| 50|
| Mike| 18|
| Dan| 50|
| Marry|null|
| Joe| 30|
+--------+----+
-- `NOT EXISTS` expression returns `FALSE`. It returns `TRUE` only when
-- subquery produces no rows. In this case, it returns 1 row.
SELECT * FROM person WHERE NOT EXISTS (SELECT null);
+----+---+
|name|age|
+----+---+
+----+---+
-- `NOT EXISTS` expression returns `TRUE`.
SELECT * FROM person WHERE NOT EXISTS (SELECT 1 WHERE 1 = 0);
+--------+----+
| name| age|
+--------+----+
| Albert|null|
|Michelle| 30|
| Fred| 50|
| Mike| 18|
| Dan| 50|
| Marry|null|
| Joe| 30|
+--------+----+
IN/NOT IN 子查詢
在 Spark 中,查詢的 WHERE 子句內允許使用 IN
和 NOT IN
函數。與 EXISTS
函數不同,IN
函數可以傳回 TRUE
、FALSE
或 UNKNOWN (NULL)
值。從概念上來說,IN
函數在語意上等同於一組相等條件,這些條件以析取運算子 (OR
) 分隔。例如,c1 IN (1, 2, 3) 在語意上等同於 (C1 = 1 OR c1 = 2 OR c1 = 3)
。
就處理 NULL
值而言,語意可以從比較運算子 (=
) 和邏輯運算子 (OR
) 中的 NULL
值處理中推論出來。總之,以下是計算 IN
函數結果的規則。
- 當在清單中找到非 NULL 值時,會傳回 TRUE
- 當在清單中找不到非 NULL 值,且清單不包含 NULL 值時,會傳回 FALSE
- 當值為
NULL
,或在清單中找不到非 NULL 值,且清單包含至少一個NULL
值時,會傳回 UNKNOWN
當清單包含 NULL
時,NOT IN 總是會傳回 UNKNOWN,與輸入值無關。這是因為如果值不在包含 NULL
的清單中,則 IN 會傳回 UNKNOWN,而 NOT UNKNOWN 又是 UNKNOWN。
範例
-- The subquery has only `NULL` value in its result set. Therefore,
-- the result of `IN` predicate is UNKNOWN.
SELECT * FROM person WHERE age IN (SELECT null);
+----+---+
|name|age|
+----+---+
+----+---+
-- The subquery has `NULL` value in the result set as well as a valid
-- value `50`. Rows with age = 50 are returned.
SELECT * FROM person
WHERE age IN (SELECT age FROM VALUES (50), (null) sub(age));
+----+---+
|name|age|
+----+---+
|Fred| 50|
| Dan| 50|
+----+---+
-- Since subquery has `NULL` value in the result set, the `NOT IN`
-- predicate would return UNKNOWN. Hence, no rows are
-- qualified for this query.
SELECT * FROM person
WHERE age NOT IN (SELECT age FROM VALUES (50), (null) sub(age));
+----+---+
|name|age|
+----+---+
+----+---+