Updated on 2026-01-04 GMT+08:00

Feature Operators of Large Models

Feature operators of large models are packaged as an extension stored in DWS. After you create the extension using the CREATE EXTENSION command, large models can directly invoke the packaged Python UDFs. This feature is supported only by clusters of version 9.1.1.200 or later.

Example: Create a simple fault prediction model and use the root mean square value as the feature value.

  1. Create the bq_ops extension to load the feature operators of large models.

    CREATE EXTENSION bq_ops;

    After the extension is created, the Python UDFs are loaded automatically. The following table lists the feature operators of large models. All of them are Python UDFs, and each returns a double value that describes a feature of the input signal.

    Table 1 Functions

    | Signal Feature Type | Function | Description |
    |---|---|---|
    | Basic time domain feature indicators | get_sharpness(signal double precision[], fs int) | Obtains the sharpness, which indicates the amount of medium- and high-frequency components in a sound signal and how they affect human hearing. A larger value indicates a sharper sound. |
    | Basic time domain feature indicators | get_roughness(signal double precision[], fs int) | Obtains the roughness, which describes how rapid fluctuations of a sound signal over time affect human hearing. |
    | Basic time domain feature indicators | get_mean_square(signal double precision[]) | Obtains the mean square, which indicates the average energy level of a vibration signal in the time domain. |
    | Basic time domain feature indicators | get_rms(signal double precision[]) | Obtains the root mean square (RMS), which indicates the energy or intensity of a vibration signal. |
    | Basic time domain feature indicators | get_var(signal double precision[]) | Obtains the variance, which indicates how much a signal's values deviate from its average value. |
    | Basic time domain feature indicators | get_pk_pk(signal double precision[]) | Obtains the peak-to-peak value, which is the difference between the maximum and minimum values of a vibration signal's waveform. |
    | Basic time domain feature indicators | get_shape_factor(signal double precision[]) | Obtains the shape factor, a dimensionless metric of the waveform flatness of a vibration signal. It is calculated as the ratio of the RMS value to the mean absolute value. It characterizes how concentrated the amplitude distribution is, reflects how far the waveform deviates from an ideal sine wave, and is sensitive to impact components and non-Gaussian behavior in the signal. |
    | Basic time domain feature indicators | get_crest(signal double precision[]) | Obtains the crest factor (peak factor), which is the ratio of a signal's peak value to its RMS value. It describes the impact characteristics or waveform sharpness of a signal. |
    | Basic time domain feature indicators | get_impulse(signal double precision[]) | Obtains the impulse factor, which is the ratio of a signal's peak value to its average value. It is mainly used to evaluate the impact characteristics or instantaneous energy concentration of a signal. |
    | Basic time domain feature indicators | get_clearance(signal double precision[]) | Obtains the clearance factor, a dimensionless metric of the intensity of extreme shocks in a vibration signal. It is calculated as the ratio of the peak value to the square root amplitude. It represents how prominent the maximum peak is relative to the typical amplitude level, and is sensitive to strong impacts caused by severe local faults (such as severe spalling and fracture). |
    | Basic time domain feature indicators | get_skewness(signal double precision[]) | Obtains the skewness, which reflects the direction and degree of skew of a signal's amplitude distribution relative to its average value. |
    | Basic time domain feature indicators | get_kurt(signal double precision[]) | Obtains the kurtosis, which reflects how sharply peaked or heavy-tailed a vibration signal's amplitude distribution is relative to the normal distribution, and therefore indicates the presence of abnormal values (such as impacts or transient events). |
    | Basic time domain feature indicators | get_gini(signal double precision[]) | Obtains the Gini index, which indicates the uniformity or concentration of a signal's energy distribution. The index originates from income inequality measurement in economics and is widely used in signal processing to evaluate the energy distribution of signals. |
    | Frequency domain and acoustic feature indicators | get_spec_ctrd(signal double precision[], fs int) | Obtains the center-of-gravity frequency, which reflects where the energy of a signal is concentrated in the spectrum. |
    | Frequency domain and acoustic feature indicators | get_spec_ms(signal double precision[], fs int) | Obtains the mean square frequency, which is the weighted average of the squared frequency components of a signal's spectrum and reflects the frequency distribution of signal energy. |
    | Frequency domain and acoustic feature indicators | get_spec_rms(signal double precision[], fs int) | Obtains the root mean square frequency, which is the square root of the sum of squares of a signal's spectrum and reflects the overall distribution of the spectrum. |
    | Frequency domain and acoustic feature indicators | get_spec_var_ctrd(signal double precision[], fs int) | Obtains the frequency variance, that is, the variance of a signal's spectrum distribution, which reflects the dispersion of the signal's frequency components. |
    | Frequency domain and acoustic feature indicators | get_spec_std_ctrd(signal double precision[], fs int) | Obtains the frequency standard deviation, that is, the square root of the frequency variance, which describes the dispersion of a signal's spectrum distribution. |
    | Frequency domain and acoustic feature indicators | get_pse(signal double precision[], fs int) | Obtains the spectral entropy, which measures the complexity or disorder of a signal's energy distribution in the frequency domain. It is based on the concept of information entropy. |
    | Frequency domain and acoustic feature indicators | get_env_rms(signal double precision[]) | Obtains the envelope spectrum RMS, which represents the energy level of a signal's envelope spectrum and the overall energy intensity of modulation components (such as fault characteristic frequencies) in the signal. |
    | Frequency domain and acoustic feature indicators | get_ehr(signal double precision[]) | Obtains the harmonic ratio, which measures the strength of harmonic components in a signal and reflects its periodic or non-linear characteristics. |
    | Frequency domain and acoustic feature indicators | get_kurt_aver(signal double precision[]) | Obtains the average kurtosis of one or more signals, which measures long-term changes in the impact characteristics of a signal. |
    | Frequency domain and acoustic feature indicators | get_mpf(signal double precision[], fs int) | Obtains the rotational speed, which indicates the number of rotations per unit time. The rotational speed is generally expressed in revolutions per minute (RPM). |
    | Band-pass filtering segment indicators | get_bpf_0_500_rms(signal double precision[], fs int) | Obtains the RMS of the low-frequency signal. After 0-500 Hz low-pass filtering is applied to the signal, the RMS (effective value) is calculated to describe the energy or strength of the low-frequency signal. |
    | Band-pass filtering segment indicators | get_bpf_500_2000_rms(signal double precision[], fs int) | Obtains the RMS of the intermediate-frequency signal. After 500-2000 Hz band-pass filtering is applied to the signal, the RMS (effective value) is calculated to describe the energy or strength of the intermediate-frequency signal. |
    | Band-pass filtering segment indicators | get_bpf_2000_inf_rms(signal double precision[], fs int) | Obtains the RMS of the high-frequency signal. After high-pass filtering (2000 Hz to fs/2) is applied to the signal, the RMS (effective value) is calculated to describe the energy or strength of the high-frequency signal. |
    | Band-pass filtering segment indicators | get_bpf_0_500_kurt(signal double precision[], fs int) | Obtains the kurtosis of the low-frequency signal. After 0-500 Hz low-pass filtering is applied to the signal, the kurtosis is calculated to describe the impact strength of the low-frequency signal. |
    | Band-pass filtering segment indicators | get_bpf_500_2000_kurt(signal double precision[], fs int) | Obtains the kurtosis of the intermediate-frequency signal. After 500-2000 Hz band-pass filtering is applied to the signal, the kurtosis is calculated to describe the impact strength of the intermediate-frequency signal. |
    | Band-pass filtering segment indicators | get_bpf_2000_inf_kurt(signal double precision[], fs int) | Obtains the kurtosis of the high-frequency signal. After high-pass filtering (2000 Hz to fs/2) is applied to the signal, the kurtosis is calculated to describe the impact strength of the high-frequency signal. |
    | Band-pass filtering segment indicators | get_bpf_0_500_ehr(signal double precision[], fs int) | Obtains the harmonic ratio of the low-frequency signal. After 0-500 Hz low-pass filtering is applied to the signal, the harmonic-to-noise ratio is calculated to describe the strength ratio of the harmonic component to the noise component in the low-frequency signal. |
    | Band-pass filtering segment indicators | get_bpf_500_2000_ehr(signal double precision[], fs int) | Obtains the harmonic ratio of the intermediate-frequency signal. After 500-2000 Hz band-pass filtering is applied to the signal, the harmonic-to-noise ratio is calculated to describe the strength ratio of the harmonic component to the noise component in the intermediate-frequency signal. |
    | Band-pass filtering segment indicators | get_bpf_2000_inf_ehr(signal double precision[], fs int) | Obtains the harmonic ratio of the high-frequency signal. After high-pass filtering (2000 Hz to fs/2) is applied to the signal, the harmonic-to-noise ratio is calculated to describe the strength ratio of the harmonic component to the noise component in the high-frequency signal. |
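
The dimensionless time-domain indicators in Table 1 are easier to compare when written out. The following is a minimal Python sketch based on the common textbook definitions of these indicators. It is not the bq_ops implementation: the helper names are hypothetical, the variance is assumed to be the population variance, and the "average value" in the shape and impulse factors is assumed to be the mean of the absolute sample values.

```python
import math


def mean_square(signal):
    """Average of the squared samples (average energy level)."""
    return sum(x * x for x in signal) / len(signal)


def rms(signal):
    """Root mean square: square root of the mean square."""
    return math.sqrt(mean_square(signal))


def variance(signal):
    """Population variance (divides by N). Assumption: the operator's
    convention (population or sample variance) is not documented."""
    mu = sum(signal) / len(signal)
    return sum((x - mu) ** 2 for x in signal) / len(signal)


def pk_pk(signal):
    """Peak-to-peak value: maximum minus minimum."""
    return max(signal) - min(signal)


def peak(signal):
    """Largest absolute sample value."""
    return max(abs(x) for x in signal)


def mean_abs(signal):
    """Mean of the absolute sample values."""
    return sum(abs(x) for x in signal) / len(signal)


def shape_factor(signal):
    """RMS divided by the mean absolute value."""
    return rms(signal) / mean_abs(signal)


def crest_factor(signal):
    """Peak value divided by the RMS."""
    return peak(signal) / rms(signal)


def impulse_factor(signal):
    """Peak value divided by the mean absolute value."""
    return peak(signal) / mean_abs(signal)


def clearance_factor(signal):
    """Peak value divided by the square root amplitude,
    where the square root amplitude is (mean of sqrt(|x|)) squared."""
    sqrt_amplitude = (sum(math.sqrt(abs(x)) for x in signal) / len(signal)) ** 2
    return peak(signal) / sqrt_amplitude


# Hypothetical demo signal, chosen so the results are easy to check by hand.
demo = [2.0, 0.0, -2.0, 0.0]
for name, func in [("mean_square", mean_square), ("rms", rms),
                   ("variance", variance), ("pk_pk", pk_pk),
                   ("shape_factor", shape_factor), ("crest_factor", crest_factor),
                   ("impulse_factor", impulse_factor),
                   ("clearance_factor", clearance_factor)]:
    print(name, func(demo))
```

For the demo signal, the mean square and variance are 2, the peak-to-peak value is 4, the impulse factor is 2, and the clearance factor is about 4. Values returned by the actual operators may differ if they use other conventions.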

  2. Create a table named bq_col_table, where device_code is the device ID, measuring_point_code is the measurement point code, date_time is the signal collection time, and high_array stores the received signal.

    CREATE TABLE bq_col_table(device_code varchar, measuring_point_code text, date_time timestamp with time zone, high_array double precision[]) WITH (orientation=column, enable_hstore_opt=true);

    A 10-hour signal is collected every hour. Assume that the following data is imported to the database:

    INSERT INTO bq_col_table VALUES
    ('10098819','3a138131-344a-af96-9e9d-da049656d905','2024-07-13 17:59:59+08:00','{0.527995824813842,-0.62188184261322,-0.332374721765518,-0.139671847224235,-0.308928370475769,-0.165734529495239,0.137558653950691,-0.923967480659484,-0.398990541696548,0.620271801948547,0.366085141897201,-0.873452186584472,-1.00577819347381,-0.581831872463226,0.0675214752554893,0.789226412773132,-0.643114387989044,-0.779465496540069,0.913703441619873,1.33372521400451,-0.0830182060599327,0.621579945087432,1.48476803302764}');

  3. Call the get_rms() operator to calculate the RMS of the signal, which indicates the energy intensity of the sample signal. The result is as follows:

    SELECT device_code, measuring_point_code, date_time, get_rms(high_array) FROM bq_col_table;
     device_code |         measuring_point_code         |       date_time        |     get_rms
    -------------+--------------------------------------+------------------------+------------------
     10098819    | 3a138131-344a-af96-9e9d-da049656d905 | 2024-07-13 17:59:59+08 | .705324261533061
    (1 row)
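
The get_rms result above can be cross-checked outside the database. The following Python sketch assumes the usual RMS definition, the square root of the mean of the squared samples, and applies it to the sample signal from the INSERT statement in step 2. The function name rms is hypothetical and is not part of bq_ops.

```python
import math

# Sample signal copied from the INSERT statement in step 2.
signal = [
    0.527995824813842, -0.62188184261322, -0.332374721765518,
    -0.139671847224235, -0.308928370475769, -0.165734529495239,
    0.137558653950691, -0.923967480659484, -0.398990541696548,
    0.620271801948547, 0.366085141897201, -0.873452186584472,
    -1.00577819347381, -0.581831872463226, 0.0675214752554893,
    0.789226412773132, -0.643114387989044, -0.779465496540069,
    0.913703441619873, 1.33372521400451, -0.0830182060599327,
    0.621579945087432, 1.48476803302764,
]


def rms(values):
    """Square root of the mean of the squared samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Prints a value of about 0.7053, in line with the get_rms column above.
print(rms(signal))
```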

  4. Repeat the preceding steps to collect 10 days of signals as samples, and set a normal RMS range based on them, for example, [0.5, 0.8].
  5. If any abnormal signal is imported:

    INSERT INTO bq_col_table VALUES
    ('10098819','3a138131-344a-af96-9e9d-da049656d905','2024-07-13 07:59:59+08:00','{0.544054210186004,-0.769003570079803,-1.79972970485687,0.659896433353424,1.65061652660369,-0.221043065190315,-1.83933162689208,-2.58152985572814,-0.627029538154602,2.1537218093872,2.14685225486755,-0.0429693721234798,-1.21243667602539,-1.02749335765838,-0.526543200016021,-0.0408141687512397,1.96406400203704,2.1080584526062,0.257277429103851,-1.36532151699066,-2.31293749809265,-0.803890943527221,1.13646578788757}');

    The query result then contains an abnormal RMS value of about 1.43, which is outside the normal range. You can then analyze the abnormal signal.

    SELECT device_code, measuring_point_code, date_time, get_rms(high_array) FROM bq_col_table;
     device_code |         measuring_point_code         |       date_time        |     get_rms
    -------------+--------------------------------------+------------------------+------------------
     10098819    | 3a138131-344a-af96-9e9d-da049656d905 | 2024-07-13 17:59:59+08 | .705324261533061
     ……
     10098819    | 3a138131-344a-af96-9e9d-da049656d905 | 2024-07-13 07:59:59+08 | 1.43480152874657
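
The range check described in steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration: the range [0.5, 0.8] is the example value from step 4, the two rows are the first and last rows of the query result above, and the function name is_abnormal is hypothetical.

```python
# Example normal RMS range from step 4 (inclusive on both ends).
NORMAL_RMS_RANGE = (0.5, 0.8)


def is_abnormal(rms_value, normal_range=NORMAL_RMS_RANGE):
    """Return True when the RMS value falls outside the normal range."""
    low, high = normal_range
    return not (low <= rms_value <= high)


# (date_time, get_rms) pairs taken from the query result in step 5.
rows = [
    ("2024-07-13 17:59:59+08", 0.705324261533061),
    ("2024-07-13 07:59:59+08", 1.43480152874657),
]

# Keep only the collection times whose RMS is outside the normal range.
abnormal = [date_time for date_time, value in rows if is_abnormal(value)]
print(abnormal)
```

Only the 07:59:59 sample is flagged, because its RMS of about 1.43 exceeds the upper bound of 0.8.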

The following operators reference third-party libraries such as NumPy and SciPy, so UDF processes occupy more memory when they are used: get_skewness, get_kurt, get_env_rms, get_ehr, get_kurt_aver, get_bpf_0_500_rms, get_bpf_500_2000_rms, get_bpf_2000_inf_rms, get_bpf_0_500_kurt, get_bpf_500_2000_kurt, get_bpf_2000_inf_kurt, get_bpf_0_500_ehr, get_bpf_500_2000_ehr, and get_bpf_2000_inf_ehr. Before using them, ensure that udf_memory_limit is set to at least 2 GB.