Updated: 2024-08-29 GMT+08:00

Document Q&A

Answers questions based on an existing knowledge base. Three strategies are available: stuff, refine, and map-reduce.

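  • Stuff: fills all retrieved documents directly into the prompt and submits it to the model for an answer. Suitable when there are few documents.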
    from pangukitsappdev.api.embeddings.factory import Embeddings
    from pangukitsappdev.api.llms.factory import LLMs
    from pangukitsappdev.api.memory.vector.factory import Vectors
    from pangukitsappdev.api.memory.vector.vector_config import VectorStoreConfig, ServerInfoCss
    from pangukitsappdev.skill.doc.ask import DocAskStuffSkill

    # Configure the CSS vector store: index, embedding model, and field mapping
    vector_store_config = VectorStoreConfig(store_name="css",
                                            index_name="your_index_name",
                                            embedding=Embeddings.of("css"),
                                            text_key="name",
                                            vector_fields=["description"],
                                            distance_strategy="inner_product",
                                            server_info=ServerInfoCss(env_prefix="sdk.memory.css"))
    vector_api = Vectors.of("css", vector_store_config)
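
    # Retrieval: fetch the documents most similar to the query
    # Query in English: "Du Fu's poems represent the peak of which school of poetic art?"
    query = "杜甫的诗代表了什么主义诗歌艺术的高峰?"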
    docs = vector_api.similarity_search(query, 4)
    
    # Q&A: answer the question from the retrieved documents using the Stuff strategy
    doc_skill = DocAskStuffSkill(LLMs.of("pangu"))
    
    print(doc_skill.execute({"documents": docs, "question": query}))
    
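  • Refine: produces an initial answer from the first document, then loops over the remaining documents to iteratively update that answer.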
    from pangukitsappdev.api.embeddings.factory import Embeddings
    from pangukitsappdev.api.llms.factory import LLMs
    from pangukitsappdev.api.memory.vector.factory import Vectors
    from pangukitsappdev.api.memory.vector.vector_config import VectorStoreConfig, ServerInfoCss
    from pangukitsappdev.skill.doc.ask import DocAskRefineSkill

    # Configure the CSS vector store: index, embedding model, and field mapping
    vector_store_config = VectorStoreConfig(store_name="css",
                                            index_name="your_index_name",
                                            embedding=Embeddings.of("css"),
                                            text_key="name",
                                            vector_fields=["description"],
                                            distance_strategy="inner_product",
                                            server_info=ServerInfoCss(env_prefix="sdk.memory.css"))
    vector_api = Vectors.of("css", vector_store_config)
    
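    # Retrieval: fetch the documents most similar to the query
    # Query in English: "Du Fu's poems represent the peak of which school of poetic art?"
    query = "杜甫的诗代表了什么主义诗歌艺术的高峰?"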
    docs = vector_api.similarity_search(query, 4)
    
    # Q&A: answer the question from the retrieved documents using the Refine strategy
    doc_skill = DocAskRefineSkill(LLMs.of("pangu"))
    
    print(doc_skill.execute({"documents": docs, "question": query}))
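
  • Map-Reduce: first summarizes each document individually, then submits the summaries to the model. The summarization step is repeated iteratively when necessary.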
    from pangukitsappdev.api.embeddings.factory import Embeddings
    from pangukitsappdev.api.llms.factory import LLMs
    from pangukitsappdev.api.memory.vector.factory import Vectors
    from pangukitsappdev.api.memory.vector.vector_config import VectorStoreConfig, ServerInfoCss
    from pangukitsappdev.skill.doc.ask import DocAskMapReduceSkill

    # Configure the CSS vector store: index, embedding model, and field mapping
    vector_store_config = VectorStoreConfig(store_name="css",
                                            index_name="your_index_name",
                                            embedding=Embeddings.of("css"),
                                            text_key="name",
                                            vector_fields=["description"],
                                            distance_strategy="inner_product",
                                            server_info=ServerInfoCss(env_prefix="sdk.memory.css"))
    vector_api = Vectors.of("css", vector_store_config)
    
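    # Retrieval: fetch the documents most similar to the query
    # Query in English: "Du Fu's poems represent the peak of which school of poetic art?"
    query = "杜甫的诗代表了什么主义诗歌艺术的高峰?"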
    docs = vector_api.similarity_search(query, 4)
    
    # Q&A: answer the question from the retrieved documents using the Map-Reduce strategy
    doc_skill = DocAskMapReduceSkill(LLMs.of("pangu"))
    
    print(doc_skill.execute({"documents": docs, "question": query}))
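
To make the difference between the three strategies concrete, the following is a minimal, SDK-independent sketch of how each one could drive a model. It is not the implementation behind DocAskStuffSkill, DocAskRefineSkill, or DocAskMapReduceSkill. The `LLM` callable type, the function names `ask_stuff`, `ask_refine`, and `ask_map_reduce`, the prompt wording, and the `max_chars` threshold are all hypothetical and exist only for illustration.

```python
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns a completion string.
LLM = Callable[[str], str]


def ask_stuff(llm: LLM, docs: List[str], question: str) -> str:
    """Stuff: place every document into a single prompt (one model call)."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def ask_refine(llm: LLM, docs: List[str], question: str) -> str:
    """Refine: answer from the first document, then revise with each later one."""
    if not docs:
        raise ValueError("at least one document is required")
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        # Each step sees the previous answer plus exactly one new document.
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{doc}\n\n"
            f"Question: {question}\n"
            "Update the existing answer if the new context helps."
        )
    return answer


def ask_map_reduce(llm: LLM, docs: List[str], question: str,
                   max_chars: int = 2000) -> str:
    """Map-Reduce: summarize each document, collapse summaries while they
    are still too long (assumed threshold: max_chars), then answer."""
    # Map step: one summary per document.
    summaries = [
        llm(f"Summarize what is relevant to '{question}':\n{doc}")
        for doc in docs
    ]
    # Optional iterative step: merge summaries pairwise until they fit.
    while len(summaries) > 1 and sum(len(s) for s in summaries) > max_chars:
        summaries = [
            llm("Summarize:\n" + "\n".join(summaries[i:i + 2]))
            for i in range(0, len(summaries), 2)
        ]
    # Reduce step: answer from the (possibly collapsed) summaries.
    return ask_stuff(llm, summaries, question)
```

Under these assumptions, Stuff costs one model call but the prompt grows with every document, Refine costs one call per document and must run sequentially, and Map-Reduce costs one call per document plus the final call, with extra calls only when the summaries still exceed the threshold.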
