CREATE FUNCTION

Description

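The CREATE FUNCTION statement creates a temporary or permanent function in Spark. Temporary functions are scoped to a single session, whereas permanent functions are created in the persistent catalog and are available to all sessions. The resources specified in the USING clause are made available to all executors the first time the function is executed. In addition to the SQL interface, Spark lets users create custom user-defined scalar and aggregate functions with the Scala, Python, and Java APIs. For more information, see Scalar UDFs and UDAFs.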

Syntax

CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ]
    function_name AS class_name [ resource_locations ]

Parameters

OR REPLACE

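If specified, the resources for the function are reloaded. This is mainly useful for picking up any changes made to the function's implementation. This parameter is mutually exclusive with IF NOT EXISTS; the two cannot be specified together.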
TEMPORARY

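Indicates the scope of the function being created. When TEMPORARY is specified, the created function is valid and visible in the current session. No persistent entry is created in the catalog for such functions.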
IF NOT EXISTS

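If specified, the function is created only if it does not already exist. If the specified function already exists in the system, the statement still succeeds (no error is thrown). This parameter is mutually exclusive with OR REPLACE; the two cannot be specified together.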
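As a minimal sketch of the IF NOT EXISTS behavior described above, the statement below can be run repeatedly without raising an error. It assumes the `SimpleUdf` class and the `/tmp/SimpleUdf.jar` path used in the Examples section; both are illustrative.

```sql
-- Assumes the `SimpleUdf` class is packaged in /tmp/SimpleUdf.jar (illustrative path).
-- If `simple_udf` already exists, the statement succeeds and leaves it untouched.
CREATE FUNCTION IF NOT EXISTS simple_udf AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Not valid: OR REPLACE and IF NOT EXISTS are mutually exclusive.
-- CREATE OR REPLACE FUNCTION IF NOT EXISTS simple_udf AS 'SimpleUdf'
--     USING JAR '/tmp/SimpleUdf.jar';
```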
function_name

Specifies the name of the function to be created. The function name may optionally be qualified with a database name.

Syntax: [ database_name. ] function_name
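The qualified form of the name might be used as in the sketch below. The database name `udf_db` is hypothetical, and the class and JAR path are the illustrative ones from the Examples section.

```sql
-- `udf_db` is a hypothetical database used only for illustration.
CREATE DATABASE IF NOT EXISTS udf_db;

-- Create a permanent function in `udf_db` rather than in the current database.
CREATE FUNCTION udf_db.simple_udf AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Invoke the function through its qualified name.
SELECT udf_db.simple_udf(1);
```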
class_name

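Specifies the name of the class that provides the implementation for the function to be created. The implementing class should extend one of the following base classes:
- UDF or UDAF in the org.apache.hadoop.hive.ql.exec package.
- AbstractGenericUDAFResolver, GenericUDF, or GenericUDTF in the org.apache.hadoop.hive.ql.udf.generic package.
- UserDefinedAggregateFunction in the org.apache.spark.sql.expressions package.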
resource_locations

Specifies the list of resources that contain the function's implementation and its dependencies.

Syntax: USING { { (JAR | FILE | ARCHIVE) resource_uri } , ... }
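Because the USING clause accepts a comma-separated list, a function can ship more than one resource. The sketch below is illustrative only: the function name `lookup_udf`, the class `com.example.LookupUdf`, and both paths are hypothetical.

```sql
-- Hypothetical function whose implementation JAR depends on an extra data file.
-- Every listed resource is made available to the executors.
CREATE FUNCTION lookup_udf AS 'com.example.LookupUdf'
    USING JAR '/tmp/LookupUdf.jar',
          FILE '/tmp/lookup.csv';
```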

Examples

-- 1. Create a simple UDF `SimpleUdf` that increments the supplied integral value by 10.
--    import org.apache.hadoop.hive.ql.exec.UDF;
--    public class SimpleUdf extends UDF {
--      public int evaluate(int value) {
--        return value + 10;
--      }
--    }
-- 2. Compile and place it in a JAR file called `SimpleUdf.jar` in /tmp.

-- Create a table called `test` and insert two rows.
CREATE TABLE test(c1 INT);
INSERT INTO test VALUES (1), (2);

-- Create a permanent function called `simple_udf`.
CREATE FUNCTION simple_udf AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Verify that the function is in the registry.
SHOW USER FUNCTIONS;
+------------------+
|          function|
+------------------+
|default.simple_udf|
+------------------+

-- Invoke the function. Every selected value should be incremented by 10.
SELECT simple_udf(c1) AS function_return_value FROM test;
+---------------------+
|function_return_value|
+---------------------+
|                   11|
|                   12|
+---------------------+

-- Create a temporary function.
CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Verify that the newly created temporary function is in the registry.
-- Please note that the temporary function does not have a qualified
-- database associated with it.
SHOW USER FUNCTIONS;
+------------------+
|          function|
+------------------+
|default.simple_udf|
|   simple_temp_udf|
+------------------+

-- 1. Create `SimpleUdfR`, a modified version of `SimpleUdf` that increments the supplied integral value by 20.
--    import org.apache.hadoop.hive.ql.exec.UDF;
--    public class SimpleUdfR extends UDF {
--      public int evaluate(int value) {
--        return value + 20;
--      }
--    }
-- 2. Compile and place it in a JAR file called `SimpleUdfR.jar` in /tmp.

-- Replace the implementation of `simple_udf`
CREATE OR REPLACE FUNCTION simple_udf AS 'SimpleUdfR'
    USING JAR '/tmp/SimpleUdfR.jar';

-- Invoke the function. Every selected value should be incremented by 20.
SELECT simple_udf(c1) AS function_return_value FROM test;
+---------------------+
|function_return_value|
+---------------------+
|                   21|
|                   22|
+---------------------+
