Updated: 2024-10-16 GMT+08:00

Document Summarization

Summarizes documents retrieved from an existing knowledge base. Three strategies are supported: stuff, refine, and map-reduce.

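  • Stuff: Puts all documents directly into the prompt and submits it to the model in a single call. Suitable when there are only a few documents, so that they all fit in the prompt.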
    import com.huaweicloud.pangu.dev.sdk.api.llms.LLMs;
    import com.huaweicloud.pangu.dev.sdk.api.memory.bo.Document;
    import com.huaweicloud.pangu.dev.sdk.skill.DocSkill;
    import com.huaweicloud.pangu.dev.sdk.api.skill.Skills;
    import com.huaweicloud.pangu.dev.sdk.api.memory.vector.Vector;
    import com.huaweicloud.pangu.dev.sdk.api.memory.vector.Vectors;
    import com.huaweicloud.pangu.dev.sdk.api.memory.config.VectorStoreConfig;
    import com.huaweicloud.pangu.dev.sdk.api.embedings.Embeddings;
    
    import java.util.List;
    
    public class StuffSummarizeExample {
        public static void main(String[] args) throws Exception {
            // Connect to an existing CSS vector index that holds the knowledge-base documents
            Vector cssVector = Vectors.of(Vectors.CSS,
                    VectorStoreConfig.builder()
                        .embedding(Embeddings.of(Embeddings.CSS))
                        .indexName("test-stuff-document-062102")
                        .build());
    
            // Retrieve the documents most relevant to the query ("Du Fu")
            String query = "杜甫";
            List<Document> docs = cssVector.similaritySearch(query, 4, 105);
    
            // Summarize: stuff all retrieved documents into a single prompt
            DocSkill docSkill = Skills.Document.newDocSummarizeStuffSkill(LLMs.of(LLMs.PANGU));
    
            System.out.println(docSkill.executeWithDocs(docs));
        }
    }
    
  • Refine: Summarizes the first document, then iterates over the remaining documents one by one, updating the summary with each.
    import com.huaweicloud.pangu.dev.sdk.api.llms.LLMs;
    import com.huaweicloud.pangu.dev.sdk.api.memory.bo.Document;
    import com.huaweicloud.pangu.dev.sdk.skill.DocSkill;
    import com.huaweicloud.pangu.dev.sdk.api.skill.Skills;
    import com.huaweicloud.pangu.dev.sdk.api.memory.vector.Vector;
    import com.huaweicloud.pangu.dev.sdk.api.memory.vector.Vectors;
    import com.huaweicloud.pangu.dev.sdk.api.memory.config.VectorStoreConfig;
    import com.huaweicloud.pangu.dev.sdk.api.embedings.Embeddings;
    
    import java.util.List;
    
    public class RefineSummarizeExample {
        public static void main(String[] args) throws Exception {
            // Connect to an existing CSS vector index that holds the knowledge-base documents
            Vector cssVector = Vectors.of(Vectors.CSS,
                    VectorStoreConfig.builder()
                        .embedding(Embeddings.of(Embeddings.CSS))
                        .indexName("test-stuff-document-062102")
                        .build());
    
            // Retrieve the documents most relevant to the query ("Du Fu")
            String query = "杜甫";
            List<Document> docs = cssVector.similaritySearch(query, 4, 105);
    
            // Summarize: start from the first document and refine the summary with each later one
            DocSkill docSkill = Skills.Document.newDocSummarizeRefineSkill(LLMs.of(LLMs.PANGU));
    
            System.out.println(docSkill.executeWithDocs(docs));
        }
    }
    
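  • Map-Reduce: First summarizes each document separately, then submits the resulting summaries to the model to produce the final summary. If necessary, the summaries are themselves summarized again, iteratively.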
    import com.huaweicloud.pangu.dev.sdk.api.llms.LLMs;
    import com.huaweicloud.pangu.dev.sdk.api.memory.bo.Document;
    import com.huaweicloud.pangu.dev.sdk.skill.DocSkill;
    import com.huaweicloud.pangu.dev.sdk.api.skill.Skills;
    import com.huaweicloud.pangu.dev.sdk.api.memory.vector.Vector;
    import com.huaweicloud.pangu.dev.sdk.api.memory.vector.Vectors;
    import com.huaweicloud.pangu.dev.sdk.api.memory.config.VectorStoreConfig;
    import com.huaweicloud.pangu.dev.sdk.api.embedings.Embeddings;
    
    import java.util.List;
    
    public class MapReduceSummarizeExample {
        public static void main(String[] args) throws Exception {
            // Connect to an existing CSS vector index that holds the knowledge-base documents
            Vector cssVector = Vectors.of(Vectors.CSS,
                    VectorStoreConfig.builder()
                        .embedding(Embeddings.of(Embeddings.CSS))
                        .indexName("test-stuff-document-062102")
                        .build());
    
            // Retrieve the documents most relevant to the query ("Du Fu")
            String query = "杜甫";
            List<Document> docs = cssVector.similaritySearch(query, 4, 105);
    
            // Summarize: summarize each document separately, then combine the summaries
            DocSkill docSkill = Skills.Document.newDocSummarizeMapReduceSkill(LLMs.of(LLMs.PANGU));
    
            System.out.println(docSkill.executeWithDocs(docs));
        }
    }
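
The three strategies differ mainly in how documents are combined and how many model calls they make. The following is a minimal, self-contained Java sketch of the general techniques, not the Pangu SDK's implementation: the model is a hypothetical `UnaryOperator<String>` stub that only counts calls, and the prompt wording, the `maxChars` budget, and the pairwise collapse step for map-reduce are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch of the stuff, refine and map-reduce strategies.
// Not the Pangu SDK implementation: the "model" is any prompt -> text function,
// and the prompt wording and maxChars budget are hypothetical.
public class SummarizeStrategiesSketch {

    // Number of model invocations made by the stub model below.
    static int calls = 0;

    // Hypothetical stand-in for an LLM: counts calls and returns a fixed 10-character "summary".
    static final UnaryOperator<String> STUB_MODEL = prompt -> {
        calls++;
        return "summary-" + String.format("%02d", calls % 100);
    };

    // Stuff: put every document into one prompt and call the model once.
    static String stuff(List<String> docs, UnaryOperator<String> model) {
        return model.apply("Summarize:\n" + String.join("\n", docs));
    }

    // Refine: summarize the first document, then fold each later document into the
    // running summary, one model call per document. Assumes docs is non-empty.
    static String refine(List<String> docs, UnaryOperator<String> model) {
        String summary = model.apply("Summarize:\n" + docs.get(0));
        for (int i = 1; i < docs.size(); i++) {
            summary = model.apply("Existing summary:\n" + summary
                    + "\nRefine it using:\n" + docs.get(i));
        }
        return summary;
    }

    // Map-reduce: summarize each document independently (map). While the partial
    // summaries are too long in total, summarize them again in pairs (collapse).
    // Finally combine the remaining summaries in one call (reduce).
    static String mapReduce(List<String> docs, UnaryOperator<String> model, int maxChars) {
        List<String> summaries = new ArrayList<>();
        for (String doc : docs) {
            summaries.add(model.apply("Summarize:\n" + doc));
        }
        while (summaries.size() > 1 && totalLength(summaries) > maxChars) {
            List<String> collapsed = new ArrayList<>();
            // Pairing roughly halves the list each round, so the loop always terminates.
            for (int i = 0; i < summaries.size(); i += 2) {
                List<String> batch = summaries.subList(i, Math.min(i + 2, summaries.size()));
                collapsed.add(model.apply("Summarize:\n" + String.join("\n", batch)));
            }
            summaries = collapsed;
        }
        return model.apply("Combine these summaries:\n" + String.join("\n", summaries));
    }

    static int totalLength(List<String> texts) {
        int total = 0;
        for (String text : texts) {
            total += text.length();
        }
        return total;
    }

    // Resets the call counter, runs one strategy and returns how many model calls it made.
    static int callsFor(Runnable strategy) {
        calls = 0;
        strategy.run();
        return calls;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Doc A", "Doc B", "Doc C", "Doc D");
        System.out.println("stuff: " + callsFor(() -> stuff(docs, STUB_MODEL)));
        System.out.println("refine: " + callsFor(() -> refine(docs, STUB_MODEL)));
        System.out.println("map-reduce: " + callsFor(() -> mapReduce(docs, STUB_MODEL, 1000)));
        System.out.println("map-reduce, tight budget: "
                + callsFor(() -> mapReduce(docs, STUB_MODEL, 15)));
    }
}
```

With four documents, this sketch makes 1 model call for stuff, 4 for refine, and 5 for map-reduce (4 map calls plus 1 reduce call). With the tight 15-character budget, map-reduce needs two extra collapse rounds before the final reduce, for 8 calls in total. This illustrates the usual trade-off: stuff is cheapest but limited by prompt size, refine is sequential, and map-reduce scales to many documents at the cost of more calls.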
