Updated: 2026-01-28 GMT+08:00

In-Database Inference

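AI is spreading rapidly across industries, from natural language processing to computer vision and deep learning, and it is making data management and processing more intelligent, efficient, and accurate. As part of this trend toward intelligent databases, the pgai extension is an AI enhancement for databases that integrates AI capabilities directly into data operations.

DWS is a high-performance data warehouse service. As AI becomes more important to data analysis and decision-making, DWS integrates the pgai extension so that inference runs where the data is stored and data can be processed and inferred on in the same step. This speeds up data processing and helps enterprises gain insights from large datasets faster and make better decisions.

With pgai integrated, DWS users can call LLM and embedding models directly inside the database without relying on an external AI platform. This simplifies otherwise complex RAG pipelines and makes data analysis more efficient and flexible. It is also a step toward closer integration of AI and databases in DWS.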

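Notes

This feature is currently in Beta. To use it, contact technical support.
To enable in-database inference, contact technical support to add the value enable_pgai_extension to the feature_support_options parameter. Only cluster versions 9.1.1.200 and later are supported.
Because of limitations of the tiktoken library, the ai.openai_tokenize and ai.openai_detokenize functions work only with an OpenAI model service.
All other functions can be used directly after the base URL and API key are set.
We recommend purchasing the Huawei Cloud MaaS service, which supports OpenAI large models.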

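In-Database Inference Functions

Run "CREATE EXTENSION ai;" to create the pgai extension. The extension depends on pgvector, so create the pgvector extension first. DWS currently supports the following in-database inference functions:

Table 1 In-database inference functions
Function	Description	Remarks
ai.set_func_model	Sets the default model name used by a function.	-
ai.dws_pgai_encrypt_info	Sets and encrypts the base URL and API key used by all functions.	-
ai.openai_tokenize	Converts text to tokens.	Supported only with GPT-series model services provided by OpenAI.
ai.openai_detokenize	Converts tokens to text.	Supported only with GPT-series model services provided by OpenAI.
ai.openai_list_models	Lists the available models.	Some model services may not support this function.
ai.openai_list_models_with_raw_response	Lists the available models and returns the result as JSON.	Some model services may not support this function.
ai.openai_embed	Converts text to a vector.	Uses an embedding model.
ai.openai_embed_with_raw_response	Converts text to a vector and returns the result as JSON.	Uses an embedding model.
ai.openai_chat_complete	Interacts with an LLM.	Uses an LLM.
ai.openai_chat_complete_with_raw_response	Interacts with an LLM and returns the result as JSON.	Uses an LLM.
ai.openai_moderate	Classifies whether text is harmful.	Uses a moderation model.
ai.openai_moderate_with_raw_response	Classifies whether text is harmful and returns the result as JSON.	Uses a moderation model.
ai.chunk_text	Splits text into chunks using a single separator.	-
ai.chunk_text_recursively	Splits text into chunks recursively using multiple separators.	-
ai.similarity	Computes the similarity of two input texts.	Uses an embedding model.
ai.vector_cosine_similarity	Computes the similarity of two vectors.	-
ai.classify	Classifies text according to the input labels.	Uses an LLM.
ai.extract	Extracts the content of the given keywords from the input text.	Uses an LLM.
ai.mask	Masks the content of the given keywords in the input text.	Uses an LLM.
ai.fix_grammar	Corrects the grammar of the input text.	Uses an LLM.
ai.summarize	Generates a summary of the input text.	Uses an LLM.
ai.translate	Translates the input text into a specified language.	Uses an LLM.
ai.rank	Scores multiple texts for relevance to a topic.	Uses an LLM.
ai.sentiment	Performs sentiment analysis on the input text.	Uses an LLM.
ai.textfilter	Filters the input text by a given condition.	Uses an LLM.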

ai.set_func_model

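Description: Sets the default model name used by a function. Until a default is set, the function's default model name is NULL and the function cannot be used. Once set, the default model is used when the function is called.

Parameters:

funcname: text, required. Name of the function, for example openai_embed or openai_chat_complete, that is, any function that needs an LLM or embedding model service.
modelname: text, required. Name of the model to use, as given by the model service.

Return value: none.

Example:

SELECT ai.set_func_model('openai_embed', 'nomic-embed-text');

After setting the model, run the following statement to view the default model name currently used by each function.

SELECT * FROM ai.ai_model_info;

The query result is as follows: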

               func_name                |    model_name    
------------------------------------------------------------
 openai_tokenize                        | 
 openai_detokenize                      | 
 openai_embed_with_raw_response         | 
 openai_chat_complete                   | 
 openai_chat_complete_with_raw_response | 
 openai_moderate                        | 
 openai_moderate_with_raw_response      | 
 similarity                             | 
 classify                               | 
 extract                                | 
 mask                                   | 
 fix_grammar                            | 
 summarize                              | 
 translate                              | 
 rank                                   | 
 sentiment                              | 
 textfilter                             | 
 openai_embed                           | nomic-embed-text
(18 rows)

ai.dws_pgai_encrypt_info

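Description: Globally sets and encrypts the base URL and API key of the model service used by all functions.

Parameters:

base_url: text, required. Base URL of the model service.
api_key: text, required. API key of the model service.

Return value: none.

Example: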

SELECT ai.dws_pgai_encrypt_info('https://example.com', 'your_api_key');

ai.openai_tokenize

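Description: Converts text to tokens for a given model. Works only with model services provided by OpenAI.

Parameters:

text_input: text, required. Input text.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: the tokens, as int[].

Example:

SELECT ai.openai_tokenize('have a test');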

ai.openai_detokenize

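Description: Converts tokens to text for a given model. Works only with model services provided by OpenAI.

Parameters:

tokens: int[], required. Array of tokens to convert to text.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: the text, as text.

Example:

SELECT ai.openai_detokenize(array[15365, 23456, 29889, 11, 9906, 1917, 0]);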

ai.openai_list_models

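Description: Lists the models supported by the model service platform. Some platforms may not support this function.

No input parameters.

Return value: the model information, as text[].

Example:

SELECT ai.openai_list_models();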
                                                                             openai_list_models  
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"{'id': 'bge-large', 'created': datetime.datetime(2023, 2, 28, 18, 56, 42, tzinfo=datetime.timezone.utc), 'owned_by': 'openai'}","{'id': 'deepseek-r1-distill-qwen-1.5b', 'created': datetime.datetime(2023, 2, 28, 18, 56, 42, tzinfo=datetime.timezone.utc), 'owned_by': 'openai'}"}
(1 row)

ai.openai_list_models_with_raw_response

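Description: Lists the models supported by the model service platform and returns the raw response data.

No input parameters.

Return value: the model information, as text.

Example:

SELECT ai.openai_list_models_with_raw_response();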
                                                                     openai_list_models_with_raw_response
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"data": [{"id": "deepseek-v3-0324", "object": "model", "created": 1677610602, "owned_by": "openai"},{"id": "nomic-embed-text", "object": "model", "created": 1677610602, "owned_by": "openai"}], "object": "list"}
(1 row)

ai.openai_embed

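Description: Converts text or tokens (int[]) to a vector.

Table 2 Parameters
Parameter	Type	Required	Description
input_text | input_texts | input_tokens	text | text[] | int[]	Yes	Three input types are supported; the function adapts to the type of the input.
model	text	No	Model name. Defaults to the model set by ai.set_func_model.

Return value: text, jsonb, or text, corresponding to the input type (text, text[], or int[]).

Example:

Input of type text:

SELECT ai.openai_embed('have a test');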
                           openai_embed
------------------------------------------------------------------------
[0.012326963, 0.015280011, -0.17099911,..., -0.005275759, -0.03978255] 
(1 row)

Input of type text[]:

SELECT ai.openai_embed(array['have a test1', 'have a test2']);
                                                                                     openai_embed
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[{"index": 0, "embedding": [-.02023241, .01978845, -.1544127, ..., -.006075094, -.057604033]}","{"index": 1, "embedding": [-.0066108834, -.0045409035, -.15522258, ..., .0038487795, -.03174798]}]
(1 row)

ai.openai_embed_with_raw_response

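Description: Converts text or tokens to a vector and returns the raw response.

Table 3 Parameters
Parameter	Type	Required	Description
input_text | input_texts | input_tokens	text | text[] | int[]	Yes	Three input types are supported; the function adapts to the type of the input.
model	text	No	Model name. Defaults to the model set by ai.set_func_model.

Return value: jsonb, the unprocessed response from the API service.

Example:

Input of type text:

SELECT ai.openai_embed_with_raw_response('have a test');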
                                                                     openai_embed_with_raw_response
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {"data": [{"index": 0, "object": "embedding", "embedding": [.012326963, .015280011, ..., -.005275759, -.03978255]}], "model": "nomic-embed-text:latest", "usage": {"total_tokens": 3, "prompt_tokens": 3, "completion_tokens": 0, "prompt_tokens_details": null, "completion_tokens_details": null}, "object": "list"}
(1 row)

Input of type text[]:

SELECT ai.openai_embed_with_raw_response(array['have a test1', 'have a test2']);
                                                                     openai_embed_with_raw_response
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {"data": [{"index": 0, "object": "embedding", "embedding": [-.02023241, .01978845, ..., -.006075094, -.057604033]}, {"index": 1, "object": "embedding", "embedding": [-.0066108834, -.0045409035, ..., .0038487795, -.03174798]}], "model": "nomic-embed-text:latest", "usage": {"total_tokens": 8, "prompt_tokens": 8, "completion_tokens": 0, "prompt_tokens_details": null, "completion_tokens_details": null}, "object": "list"}
(1 row)

ai.openai_chat_complete

Description: Interacts with an LLM to generate text.

Table 4 Parameters
Parameter	Type	Required	Description
messages | input_text	jsonb | text	Yes	Two input types are supported for interacting with the LLM.
model	text	No	Model name. Defaults to the model set by ai.set_func_model.

Return value: text, the full content of the LLM's answer.

Example:

Input of type jsonb:

SELECT ai.openai_chat_complete(
'[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Do you know Huawei Cloud?"},
{"role": "assistant", "content": "Huawei Cloud is a cloud service provider."}
]'::jsonb
);
                                                                           openai_chat_complete
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Yes, **Huawei Cloud** is a significant player in the global cloud computing market, offering comprehensive cloud services powered by Huawei's expertise in ICT infrastructure. Here’s what I know about it:
 ### Core Offerings:
 1. **IaaS/PaaS/SaaS Solutions**:
    - Compute, storage, networking (e.g., Elastic Cloud Server, Object Storage Service).
    - Databases (GaussDB), AI/ML platforms (ModelArts), big data services, and container orchestration.
    - Industry-specific SaaS solutions (e.g., finance, healthcare, manufacturing).
 2. **Global Infrastructure**: 
    - Data centers in over 30 regions (including Asia-Pacific, Latin America, Africa, and Europe), with ~86 Availability Zones (AZs).
 3. **AI & Advanced Tech**:
    - **Pangu Models**: Large-scale AI models for industries like healthcare, mining, and meteorology.
    - AI-native development tools and platforms like **CodeArts Snap** (AI-assisted coding).
 4. **Hybrid & Multi-Cloud**:
    - Supports hybrid deployments (e.g., HCS Online/Stack) and integrations with third-party clouds.
 5. **Security & Compliance**: 
    - End-to-end security (hardware to application layers).
    - Compliance with global standards (GDPR, C5, etc.).
 6. **Industry Ecosystems**:
    - Partner programs (e.g., Huawei Cloud Startup Program) and industry-specific solutions (e.g., smart cities, autonomous driving). ### Key Strengths:    
    - **Hardware-Software Integration**: Optimization through in-house chips (e.g., Ascend AI processors) and hardware. 
    - **Edge Computing**: Kunpeng-based edge solutions for low-latency applications.
    - **Digital Sovereignty**: Tools for data localization and regulatory adherence.
 ### Recent Updates (2023–2024): 
    - **Pangu 3.0**: Enhanced multi-modal industrial AI models.
    - **MetaStudio**: Digital twin/metaverse development suite.
    - **Green Initiatives**: Focus on sustainable data centers.
 ### Need more details?
    Tell me what you’re curious about:
       - Specific services (e.g., AI tools, databases, pricing)? 
       - How it compares to AWS/Azure? 
       - Use
(1 row)

         

       

     
    

Input of type text:

SELECT ai.openai_chat_complete('Do you know Huawei Cloud?');
                                                                           openai_chat_complete
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Yes, I'm familiar with **Huawei Cloud**, which is the cloud services division of **Huawei Technologies**. Here's a quick overview:
 ### Key Highlights:
 1. **Services Offered**:
    - **Compute**: Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) for scalable computing.
    - **Storage**: High-performance storage solutions for data centers and cloud environments.
    - **Networking**: Next-generation networking and security services.
    - **AI/Big Data**: Integration with AI models, big data analytics, and cloud-native applications.
    - **Enterprise Solutions**: Customized cloud solutions for businesses, including hybrid and multi-cloud architectures.
 2. **Key Features**:
    - **Global Reach**: Operates in over 170 countries, providing cloud services worldwide.
    - **Security & Compliance**: Certifications like ISO 27001, SOC 2, and compliance with GDPR and other standards.
    - **AI Integration**: Partnerships with AI companies like **Tengda** for AI-driven cloud solutions.
    - **5G & Edge Computing**: Supports 5G networks and edge computing for low-latency applications.
 3. **Use Cases**:
    - Enterprise scalability, AI-driven analytics, IoT, and hybrid cloud deployments.

 ### Why Choose Huawei Cloud?
 - **Reliability**: Proven infrastructure and disaster recovery capabilities.
 - **Flexibility**: Support for both public and private cloud environments.
 - **Innovation**: Integration with cutting-edge technologies like AI and 5G.

 If you're looking to deploy applications or scale infrastructure, Huawei Cloud offers a robust platform with strong enterprise support. Let me know if you need specific details! 
(1 row)

         

       

     
    

ai.openai_chat_complete_with_raw_response

Description: Interacts with an LLM to generate text and returns the raw response.

Table 5 Parameters
Parameter	Type	Required	Description
messages | input	jsonb | text	Yes	Two input types are supported for interacting with the LLM.
model	text	No	Model name. Defaults to the model set by ai.set_func_model.

Return value: jsonb, the unprocessed response from the API service.

Example:

Input of type jsonb:

SELECT ai.openai_chat_complete_with_raw_response(
'[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Do you know Huawei Cloud?"},
{"role": "assistant", "content": "Huawei Cloud is a cloud service provider."}
]'::jsonb
);

                                                                   openai_chat_complete_with_raw_response
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"id": "chatcmpl-a9298cb4ef", "model": "hosted_vllm/deepseek-r1-671b", "usage": {"tpot": 68, "ttft": 228, "total_tokens": 698, "prompt_tokens": 27, "completion_tokens": 671}, "object": "chat.completion", "choices": [{"index": 0, "message": {"role": "assistant", "content": "\nYes, I'm familiar with **Huawei Cloud** (officially **HUAWEI CLOUD**). Here's a concise overview of its key aspects:  \n\n### Key Features & Offerings:\n1. **Global Infrastructure**:  \n   - 30+ regions and 84 availability zones worldwide.  \n   - Compliant with local regulations (e.g., GDPR in Europe).  \n\n2. **Hybrid & Multi-Cloud Solutions**:  \n   - **Huawei Cloud Stack**: On-premises extension for hybrid cloud.  \n   - Partnerships for interoperability (e.g., with AWS, Azure).  \n\n3. **AI-Centric Services**:  \n   - **Pangu Models**: Large AI models for industry-specific applications.  \n   - AI development platforms (**ModelArts**) and inferencing tools.  \n\n4. **Industry-Specific Solutions**:  \n   - Focus on **smart manufacturing**, **finance**, **healthcare**, and **smart cities**.  \n\n5. **Advanced Technologies**:  \n   - Cloud-native ecosystem (Kubernetes, serverless).  \n   - **Big Data**, **IoT**, and **5G integration**.  \n\n6. **Developer Ecosystem**:  \n   - Open-source contributions (e.g., openEuler OS, MindSpore AI framework).  \n   - Extensive SDKs/APIs and low-code tools.  \n\n### Competitive Edge:\n- **Cost-Effectiveness**: Competitive pricing, especially in Asia-Pacific markets.  \n- **Security/Compliance**: Emphasis on data sovereignty and end-to-end encryption.  \n- **5G Synergy**: Unique integration with Huawei’s 5G infrastructure.  \n\n### Market Position:\n- **#2 in China** (behind Alibaba Cloud), top 5 global IaaS market share.  \n- Rapid growth in emerging markets (Asia, Latin America, Africa).  \n\n### Controversies:\n- Geopolitical challenges (U.S. trade restrictions) impacting global expansion.  
\n\nHuawei Cloud is a strategic choice for enterprises needing AI/5G-driven solutions, especially in regions with strong Huawei presence. Let me know if you’d like deep dives into specific services!", "reasoning_content": "\nOkay, the user asked if I know Huawei Cloud. They probably want a quick confirmation or basic info. \n\nLooking at the history, my previous reply was super short—just calling it a \"cloud service provider.\" That's accurate but kinda barebones. Maybe the user expected more detail, since Huawei Cloud is a major player. \n\nThe user didn't specify their knowledge level, so they could be a newbie checking options or a tech pro comparing providers. Since they didn't ask for specifics, I'll keep it broad but add layers: technical depth (AI, infrastructure) and practical angles (use cases, global reach). \n\nHmm, should I mention controversy? Like US-China tensions affecting Huawei? Probably not—unless the user asks. They seem neutral, so stick to facts. \n\nNoting they're Tier 1: that's a credibility signal for enterprise users. Adding revenue rank (#2 in China) too—it’s impressive but not overselling. \n\nBalancing conciseness with usefulness: bullet points in reply will help, but avoid jargon avalanches. They can always ask follow-ups.\n"}, "finish_reason": "stop", "provider_specific_fields": {}}], "created": 1763017779450, "message": ""}
(1 row)

Input of type text:

SELECT ai.openai_chat_complete_with_raw_response('Do you know Huawei Cloud?');
                                                                     openai_chat_complete_with_raw_response
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"id": "chatcmpl-cdff97c366", "model": "hosted_vllm/deepseek-r1-671b", "usage": {"tpot": 69, "ttft": 185, "total_tokens": 736, "prompt_tokens": 15, "completion_tokens": 721}, "object": "chat.completion", "choices": [{"index": 0, "message": {"role": "assistant", "content": "\nYes, I'm familiar with **Huawei Cloud**! Here's a comprehensive overview:  \n\n### **What is Huawei Cloud?**  \nHuawei Cloud is the global cloud service arm of Huawei Technologies, launched in 2011. It’s one of China’s largest and the world’s fastest-growing cloud providers (Top 5 globally by market share).  \n\n### **Key Strengths**  \n1. **Full-Stack Services**:  \n   - **IaaS**: Virtual machines, storage, networking.  \n   - **PaaS**: Databases (GaussDB), AI development (ModelArts), container services (CCI/CCE).  \n   - **SaaS**: Collaboration tools (WeLink), industry-specific solutions.  \n\n2. **AI & Big Data Leadership**:  \n   - **Pangu AI Models**: Large language models (LLMs) rivaling GPT-3.5, tailored for industries (e.g., finance, healthcare).  \n   - **ModelArts**: One-stop AI development platform supporting training/deployment.  \n\n3. **Global Infrastructure**:  \n   - 88 availability zones across 30 regions (Asia-Pacific, LATAM, Africa, Europe).  \n   - Compliant with GDPR, CCPA, and local regulations (e.g., China’s DSL).  \n\n4. **Hybrid & Multi-Cloud Solutions**:  \n   - **Huawei Cloud Stack**: On-premises private cloud integration.  \n   - Partnerships for seamless multi-cloud management.  \n\n5. **Industry-Specific Solutions**:  \n   - Smart cities, autonomous driving, manufacturing (Industrial IoT), and fintech (distributed databases).  \n\n### **Competitive Edge**  \n- **Security**: End-to-end encryption + \"Zero Trust\" architecture.  \n- **Cost Efficiency**: Aggressive pricing, especially in emerging markets.  \n- **5G + Cloud Synergy**: Edge computing optimized for 5G latency (e.g., Intelligent EdgeFabric).  
\n- **Open Source**: Major contributor to Kubernetes, Spark, and EdgeGallery.  \n\n### **Market Position**  \n- Dominant in China (≈20% market share), expanding globally.  \n- Key clients include governments, telecoms (e.g., China Mobile), and enterprises (Ping An, SAIC Motor).  \n\n### **Free Tier & Trials**  \nHuawei Cloud offers free", "reasoning_content": "\nHmm, the user is asking about Huawei Cloud. They might be researching cloud service providers for personal or professional use, or perhaps evaluating options for a project. Since they kept it very general (\"Do you know\"), I should give a solid foundational overview.\n\nI recall Huawei Cloud is China's largest cloud provider with a global reach. Key highlights to cover: their \"Everything as a Service\" model, AI focus with Pangu models, infrastructure strength (they have hyper-scale regions), and industry solutions. The compliance point is crucial too - multinational users always care about data sovereignty. \n\nBetter structure this clearly: start with positioning, then major tech strengths (compute/storage/AI), deployment flexibility (hybrid/multi-cloud), and close with why businesses pick them. The free credits note could hook users exploring cost options. Should mention Pangu separately - it's becoming a major USP against Western clouds. \n\nUser seems technical given they named a specific cloud platform, but keeping explanations accessible. No signs of urgency in their tone, so a comprehensive (but not overwhelming) response fits. Might follow up with questions about their use case next.\n"}, "finish_reason": "timeout", "provider_specific_fields": {}}], "created": 1763017897477, "message": "Reach time limitation! Use stream request to avoid timeout."}
(1 row)
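The raw response carries the answer text together with metadata such as finish_reason and token usage. As a minimal client-side sketch in Python (not part of DWS or pgai), the following pulls those fields out of a response string. The sample is a trimmed, hypothetical stand-in shaped like the output above, and the helper name summarize_chat_response is an assumption made for illustration.

```python
import json

# Trimmed, hypothetical stand-in for a value returned by
# ai.openai_chat_complete_with_raw_response; only the fields used below are kept.
raw_response = """
{
  "model": "hosted_vllm/deepseek-r1-671b",
  "usage": {"total_tokens": 736, "prompt_tokens": 15, "completion_tokens": 721},
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Yes, I'm familiar with Huawei Cloud!"},
      "finish_reason": "timeout"
    }
  ]
}
"""


def summarize_chat_response(raw: str) -> dict:
    """Pull the answer text, finish reason, and token count out of a raw response."""
    body = json.loads(raw)
    choice = body["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Anything other than "stop" (for example "timeout") suggests a cut-off answer.
        "complete": choice["finish_reason"] == "stop",
        "total_tokens": body["usage"]["total_tokens"],
    }


result = summarize_chat_response(raw_response)
print(result["finish_reason"], result["complete"], result["total_tokens"])
```

In the second example above, finish_reason is "timeout" and the content stops mid-sentence, so checking this field before using the answer is a reasonable precaution.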

ai.openai_moderate

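Description: Determines whether text content is harmful.

Parameters:

input_text: text, required. Text to evaluate.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: jsonb, the evaluation result.

Example:

SELECT ai.openai_moderate('You''re such a fool.');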

ai.openai_moderate_with_raw_response

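Description: Determines whether text content is harmful and returns the raw response.

Parameters:

input_text: text, required. Text to evaluate.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: jsonb, the unprocessed response from the API service.

Example: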

SELECT ai.openai_moderate_with_raw_response('You are such a fool.');

ai.chunk_text

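Description: Splits text into chunks at a separator.

Parameters:

input: text, required. Text to split into chunks.
chunk_size: int, optional. Maximum length of each chunk. Defaults to NULL.
chunk_overlap: int, optional. Number of overlapping characters between adjacent chunks. Defaults to NULL.
separator: text, optional. Custom separator. Defaults to NULL.
is_separator_regex: boolean, optional. Whether separator is a regular expression. Defaults to false.

Return value: the chunks, as text[].

Example: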

      
           SELECT ai.chunk_text('A data warehouse is a centralized data storage system specifically designed to collect, store, process, and analyze large volumes of data from multiple heterogeneous data sources. By integrating information from different operating systems, transaction systems, and external data sources, it enables enterprises to perform complex data analysis and reporting. Unlike traditional database systems, data warehouses emphasize historical data and query efficiency, typically supporting business intelligence (BI) and decision support systems (DSS).', 100, 10, '.');
                                                                                    chunk_text
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"(0, 'A data warehouse is a centralized data storage system specifically designed to collect, store, process, and analyze large volumes of data from multiple heterogeneous data sources')","(1, 'By integrating information from different operating systems, transaction systems, and external data sources, it enables enterprises to perform complex data analysis and reporting')","(2, 'Unlike traditional database systems, data warehouses emphasize historical data and query efficiency, typically supporting business intelligence (BI) and decision support systems (DSS)')"}
(1 row)
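In the output above, every chunk is longer than the chunk_size of 100, which suggests that a piece containing no separator is kept whole rather than cut. The following Python sketch only approximates that observed behavior and is not the extension's implementation: the function name chunk_by_separator is hypothetical, overlap handling is omitted, and the greedy merge rule is an assumption.

```python
def chunk_by_separator(text: str, chunk_size: int, separator: str) -> list[str]:
    """Split on one separator, then greedily merge pieces up to chunk_size characters.

    Simplified illustration only: overlap handling is omitted, and a piece
    longer than chunk_size is kept whole rather than cut.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + " " + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would exceed the limit, so close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Same input text as the ai.chunk_text example above.
sample = (
    "A data warehouse is a centralized data storage system specifically designed to "
    "collect, store, process, and analyze large volumes of data from multiple "
    "heterogeneous data sources. By integrating information from different operating "
    "systems, transaction systems, and external data sources, it enables enterprises "
    "to perform complex data analysis and reporting. Unlike traditional database "
    "systems, data warehouses emphasize historical data and query efficiency, "
    "typically supporting business intelligence (BI) and decision support systems (DSS)."
)

chunks = chunk_by_separator(sample, 100, ".")
for index, chunk in enumerate(chunks):
    print(index, len(chunk), chunk[:40])
```

With this input each sentence exceeds 100 characters, so the sketch yields one chunk per sentence with the separator dropped, as in the example output.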

ai.chunk_text_recursively

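Description: Splits text into chunks recursively using multiple separators.

Parameters:

input: text, required. Text to split into chunks.
chunk_size: int, optional. Maximum length of each chunk. Defaults to NULL.
chunk_overlap: int, optional. Number of overlapping characters between adjacent chunks. Defaults to NULL.
separator: text[], optional. Custom separators. Defaults to NULL.
is_separator_regex: boolean, optional. Whether the separators are regular expressions. Defaults to false.

Return value: the chunks, as text[].

Example: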

     
          SELECT ai.chunk_text_recursively('
A data warehouse is a centralized data storage system specifically designed to collect, store, process, and analyze large volumes of data from multiple heterogeneous data sources. By integrating information from different operating systems, transaction systems, and external data sources, it enables enterprises to perform complex data analysis and reporting. Unlike traditional database systems, data warehouses emphasize historical data and query efficiency, typically supporting business intelligence (BI) and decision support systems (DSS).
', 100, 10, array[',', '.']);
                                                                           chunk_text_recursively
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"(0, 'A data warehouse is a centralized data storage system specifically designed to collect, store')","(1, ', store, process')","(2, ', and analyze large volumes of data from multiple heterogeneous data sources')","(3, '. By integrating information from different operating systems')","(4, ', transaction systems, and external data sources')","(5, ', it enables enterprises to perform complex data analysis and reporting')","(6, '. Unlike traditional database systems')","(7, ', data warehouses emphasize historical data and query efficiency')","(8, ', typically supporting business intelligence (BI) and decision support systems (DSS).')"}
(1 row)

ai.similarity

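Description: Computes the similarity of two input texts.

Parameters:

input_text1: text, required. First text to compare.
input_text2: text, required. Second text to compare.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: a float similarity score in the range [-1, 1]. The closer the value is to 1, the more similar the texts.

Example:

SELECT ai.similarity('I like apple.', 'I like pen.');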
similarity
------------
 .5875276
(1 row)

ai.vector_cosine_similarity

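Description: Computes the similarity of two input vectors.

Parameters:

input_vec1: text, required. First vector to compare.
input_vec2: text, required. Second vector to compare.

Return value: a float similarity score in the range [-1, 1]. The closer the value is to 1, the more similar the vectors.

Example:

SELECT ai.vector_cosine_similarity('[1.0, 2.0, 3.0]', '[-2.0, 5.0, -3.0]');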
vector_cosine_similarity
------------------------
-.043355495
(1 row)
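For reference, cosine similarity is the dot product of the two vectors divided by the product of their lengths. The Python sketch below is a plain restatement of that formula, not the extension's code, and applies it to the two vectors from the example. The function name cosine_similarity is chosen here for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), a value in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same vectors as the ai.vector_cosine_similarity example above.
score = cosine_similarity([1.0, 2.0, 3.0], [-2.0, 5.0, -3.0])
print(score)
```

Here the dot product is -1 and the product of the lengths is the square root of 532, which gives roughly -0.0434, in line with the query result above.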

ai.classify

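Description: Classifies the input text according to the specified labels.

Parameters:

input_text: text, required. Text to classify.
category: text[], required. Set of labels the text can be assigned to.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: text, one of the labels in category.

Example: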

     
          SELECT ai.classify('The football match between Team A and Team B was thrilling. The game ended with a 3-2 victory for Team A, and the players celebrated with their fans.', array['Sports', 'Technology', 'Culture']);
classify
---------
Sports
(1 row)

ai.extract

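Description: Extracts the information associated with specific labels from the input text.

Parameters:

input_text: text, required. Text to extract information from.
extract_keywords: text[], required. Keywords whose information is to be extracted from the text.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: jsonb, each label together with its extracted information.

Example: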

     
          SELECT ai.extract('Name: John Smith, Gender: Male, Age: 30, Email: john.smith@example.com, Phone: +1 555-1234, Experience: Highly experienced professional', array['Name', 'Gender', 'Email']);
                                  extract
----------------------------------------------------------------------------
{"Name": "John Smith", "Email": "john.smith@example.com", "Gender": "Male"}
(1 row)

ai.mask

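Description: Masks the specified key information in the input text.

Parameters:

input_text: text, required. Text to mask.
mask_keywords: text[], required. Key information to mask in the text.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: text, the input text with the key information masked.

Example: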

     
          SELECT ai.mask('Name: John Smith, Gender: Male, Age: 30, Email: john.smith@example.com, Phone: +1 555-1234, Experience: Highly experienced professional', array['Name', 'Gender', 'Email']);
                                                                mask
-------------------------------------------------------------------------------------------------------------------------------
Name: [#####], Gender: [#####], Age: 30, Email: [#####], Phone: +1 555-1234, Experience: Highly experienced professional
(1 row)

ai.fix_grammar

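Description: Corrects grammatical errors in the input text.

Parameters:

input_text: text, required. Text whose grammar is to be corrected.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: text, the text with its grammar corrected.

Example:

SELECT ai.fix_grammar('She do not like him.');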
      fix_grammar
-----------------------
She does not like him.
(1 row)

ai.summarize

Description: Generates a summary of the input text.

Parameters:

input_text: text, required. Text to summarize.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: text, the summary.

Example:

     
          SELECT ai.summarize('
Huawei Cloud GaussDB is a high-performance, distributed database service launched by Huawei, supporting multiple database engines including relational databases (such as MySQL and PostgreSQL) and non-relational databases. GaussDB delivers elasticity, high availability, and auto-scaling capabilities to meet enterprises'' demanding requirements for database performance, reliability, and flexibility.
As Huawei Cloud''s flagship database, GaussDB integrates an advanced distributed architecture that supports automatic fault recovery and data backup to ensure data security. Simultaneously, it optimizes query performance and reduces operational costs through intelligent scheduling and resource management technologies. GaussDB also excels in high concurrency and high throughput, making it ideal for large-scale data processing and high-load applications across industries such as finance, e-commerce, and manufacturing.
Furthermore, through deep integration with other Huawei Cloud services, GaussDB delivers comprehensive data processing solutions to empower enterprises in digital transformation and intelligent management. It not only fulfills fundamental database requirements but also supports AI and big data applications, further enhancing the value of enterprise data.
');
                                                                                      summarize
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Huawei Cloud GaussDB is a high-performance, distributed database service supporting relational and non-relational databases, offering elasticity, high availability, and auto-scaling to meet enterprise needs for performance, reliability, and flexibility. It integrates an advanced distributed architecture for automatic fault recovery, data backup, and optimized query performance with cost reduction. Its high concurrency and throughput make it suitable for large-scale data processing and high-load applications in industries like finance and e-commerce. Integrated with other Huawei Cloud services, it provides comprehensive data processing solutions, supporting AI and big data to drive digital transformation and intelligent management.
(1 row)

ai.translate

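Description: Translates the input text into a specified language.

Parameters:

input_text: text, required. Text to translate.
target_language: text, required. Target language of the translation.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: text, the translated text.

Example:

SELECT ai.translate('A database manages data objects', 'chinese');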
  translate
--------------------------
数据库用于管理各类数据对象
(1 row)

ai.rank

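Description: Scores multiple input texts for relevance to a topic.

Parameters:

input_topic: text, required. The topic.
input_texts: text[], required. The texts to score.
model: text, optional. Model name. Defaults to the model set by ai.set_func_model.

Return value: jsonb, a relevance score for each text. The documented range is [0, 1], and a larger value means higher relevance. Note that the scores in the example output below (3.0 to 8.0) fall outside this range.

Example: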

     
          SELECT ai.rank('The Future Development of Artificial Intelligence.', 
array['The rapid advancement of artificial intelligence has already made a significant impact across multiple industries, including healthcare and finance.', 
'Machine learning and deep learning are core technologies of artificial intelligence, and an increasing number of companies are beginning to apply these technologies.', 
'Forecast data for the global technology market in 2023 indicates that the cloud computing market is experiencing steady growth.']);
                                                                                     rank
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"Forecast data for the global technology market in 2023 indicates that the cloud computing market is experiencing steady growth.": 3.0, "The rapid advancement of artificial intelligence has already made a significant impact across multiple industries, including healthcare and finance.": 8.0, "Machine learning and deep learning are core technologies of artificial intelligence, and an increasing number of companies are beginning to apply these technologies.": 7.0}
(1 row)
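
Because ai.rank returns a single jsonb object keyed by input text, the result can be easier to read once it is expanded into rows. The sketch below assumes the standard jsonb_each_text function is available in your cluster version and uses two short placeholder texts; the aliases input_text and score are illustrative.

```sql
-- Sketch only: expand the jsonb result into one row per text, highest score first.
-- jsonb_each_text yields (key, value) pairs as text, so the score is cast to numeric.
SELECT
    key AS input_text,
    value::numeric AS score
FROM jsonb_each_text(
    ai.rank('The Future Development of Artificial Intelligence.',
            array['Machine learning and deep learning are core technologies of artificial intelligence.',
                  'The cloud computing market is experiencing steady growth.'])
)
ORDER BY score DESC;
```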

ai.sentiment

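Description: Performs sentiment analysis on the input text, classifying it as positive, negative, neutral, or mixed.

Parameters:

input_text: text, required. The text to analyze.
model: text, optional. The model to use. Defaults to the model set with ai.set_func_model.

Return value: text. The sentiment classification of the text.

Example:

SELECT ai.sentiment('You are such a kind person.');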
sentiment
-----------
positive
(1 row)
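
The same call can be applied to a column and aggregated. The following is a sketch under the assumption that public.pizza_reviews (the sample table from the RAG use case later on this page) already exists; the aliases sentiment and review_count are illustrative.

```sql
-- Sketch only: count reviews per sentiment label.
-- The inner query classifies each review; the outer query groups the labels.
SELECT
    sentiment,
    count(*) AS review_count
FROM (
    SELECT ai.sentiment(customer_message) AS sentiment
    FROM public.pizza_reviews
) AS s
GROUP BY sentiment
ORDER BY review_count DESC;
```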

ai.textfilter

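Description: Filters the input text against a given condition.

Parameters:

input_text: text, required. The text to filter.
filter_condition: text, required. The filter condition.
model: text, optional. The model to use. Defaults to the model set with ai.set_func_model.

Return value: boolean. Whether the text satisfies the filter condition.

Example:

SELECT ai.textfilter('The food at this restaurant is delicious!', 'Is this a negative review?');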
textfilter
-----------
f
(1 row)
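
Since the function returns a boolean, it can also serve directly as a WHERE predicate. A minimal sketch, again assuming the public.pizza_reviews sample table from the RAG use case later on this page:

```sql
-- Sketch only: keep the reviews that the model judges to be negative.
SELECT
    id,
    customer_message
FROM public.pizza_reviews
WHERE ai.textfilter(customer_message, 'Is this a negative review?')
ORDER BY id;
```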

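RAG Use Case

Scenario: generate a customer-review report about a pizza product from users' reviews of it.

Prerequisite: the pgvector extension must be available.

Create a sample table and insert data.

CREATE TABLE public.pizza_reviews (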
id bigserial NOT NULL,
product text NOT NULL,
customer_message text NULL,
text_length INTEGER,
CONSTRAINT pizza_reviews_pkey PRIMARY KEY (id)
);

INSERT INTO public.pizza_reviews values
(1, 'pizza','The best pizza I''ve ever eaten. The sauce was so tangy!'),
(2, 'pizza','The pizza was disgusting. I think the pepperoni was made from rats.'),
(3, 'pizza','I ordered a hot-dog and was given a pizza, but I ate it anyway.'),
(4, 'pizza','I hate pineapple on pizza. It is a disgrace. Somehow, it worked well on this pizza though.'),
(5, 'pizza','I ate 11 slices and threw up. The pizza was tasty in both directions.');

Create the pizza_reviews_embeddings table to store the intermediate results (vectors) and the ai_report table to record the final output.

CREATE TABLE public.pizza_reviews_embeddings (
id bigserial NOT NULL,
text_id text NOT NULL,
text_content text NOT NULL, -- same text as pizza_reviews.customer_message
model_name text NOT NULL,
ntoken int4 NULL,
nlength int4 NULL,
embedding vector NOT NULL,
CONSTRAINT pizza_reviews_embeddings_pkey PRIMARY KEY (id)
);

CREATE TABLE public.ai_report (
send_message text NULL,
chat_completion jsonb NULL,
final_report text NULL,
create_time timestamptz NULL
);

Vectorize the data and insert it into the pizza_reviews_embeddings table.

WITH tmp AS (
    SELECT
        tt.id, tt.customer_message,
        -- Label stored with each row; it should match the Embedding model that ai.openai_embed actually uses.
        'text-embedding-3-small'::text AS model_name,
        ai.openai_embed(tt.customer_message)::vector AS embedding
    FROM pizza_reviews AS tt
)
INSERT INTO pizza_reviews_embeddings
    (text_id, text_content, model_name, embedding)
SELECT
    id, customer_message, model_name, embedding
FROM tmp;

Vectorize the question and find the 3 most similar rows.

WITH business_question AS (
    SELECT question
    FROM (VALUES
        ('Why do customers not like our pizza?'::text)
    ) AS t(question)
),
embedding_question AS (
    SELECT
        question, ai.openai_embed(question)::vector AS embedding
    FROM business_question
)
SELECT
    eqt.question,
    emt.text_content,
    -- <-> returns a distance, so a smaller value means a closer match.
    emt.embedding <-> eqt.embedding AS distance
FROM pizza_reviews_embeddings emt CROSS JOIN embedding_question eqt
ORDER BY emt.embedding <-> eqt.embedding
LIMIT 3;
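
The <-> operator used above is pgvector's Euclidean (L2) distance. pgvector also defines <=> for cosine distance; assuming the DWS build of pgvector exposes that operator, a cosine-based variant of the same search might look like the sketch below, where 1 minus the distance gives a cosine similarity. The alias cosine_similarity is illustrative.

```sql
-- Sketch only: same retrieval, ranked by cosine distance instead of L2 distance.
-- Assumes the <=> operator is available in your pgvector installation.
WITH embedding_question AS (
    SELECT ai.openai_embed('Why do customers not like our pizza?')::vector AS embedding
)
SELECT
    emt.text_content,
    1 - (emt.embedding <=> eqt.embedding) AS cosine_similarity
FROM pizza_reviews_embeddings emt CROSS JOIN embedding_question eqt
ORDER BY emt.embedding <=> eqt.embedding
LIMIT 3;
```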

           

         

       
      

Generate the research report.

WITH embedding_question AS (
    -- Embed the business question.
    SELECT
        'Why do customers not like our pizza?'::text AS question,
        ai.openai_embed('Why do customers not like our pizza?')::vector AS embedding
),
reasons AS (
    -- Retrieve the reviews closest to the question (smaller distance = more similar).
    SELECT
        eqt.question,
        emt.text_content,
        emt.embedding <-> eqt.embedding AS distance
    FROM pizza_reviews_embeddings emt CROSS JOIN embedding_question eqt
    ORDER BY emt.embedding <-> eqt.embedding
    LIMIT 5
),
agg_reasons AS (
    -- Aggregate the retrieved reviews into one JSON array.
    SELECT
        question, jsonb_pretty(jsonb_agg(text_content)) AS reasons
    FROM reasons
    GROUP BY question
),
report_needs AS (
    -- Fixed prompt fragments.
    SELECT
        chr(10) || '// 1. requirements:
// 1.1 generate a business report to answer user question with provided data.
// 1.2 The report should be markdown format and less than 300 words' || chr(10) AS report_needs,
        chr(10) || '// 2. data' || chr(10) AS data_needs,
        chr(10) || '// 3. user question' || chr(10) AS user_question
),
report AS (
    -- Assemble the prompt and call the LLM.
    SELECT
        report_needs || data_needs || reasons || user_question || question AS send_message,
        ai.openai_chat_complete(
            jsonb_build_array(
                jsonb_build_object(
                    'role', 'user', 'content',
                    report_needs || data_needs || reasons || user_question || question)
            )) AS chat_completion
    FROM agg_reasons CROSS JOIN report_needs
)
INSERT INTO ai_report
    (send_message, chat_completion, final_report, create_time)
SELECT
    send_message,
    chat_completion,
    -- Assumes an OpenAI-style response layout; adjust the path if your model service differs.
    chat_completion::jsonb -> 'choices' -> 0 -> 'message' ->> 'content' AS final_report,
    now() AS create_time
FROM report;
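
To read the generated report back, a query along the following lines may help. It assumes the stored chat_completion follows the OpenAI chat completion layout (choices[0].message.content); if your model service returns a different structure, the JSON path needs adjusting. The alias report_text is illustrative.

```sql
-- Sketch only: fetch the text of the most recently generated report.
SELECT
    create_time,
    chat_completion -> 'choices' -> 0 -> 'message' ->> 'content' AS report_text
FROM public.ai_report
ORDER BY create_time DESC
LIMIT 1;
```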

           

         

       
      
