Updated on 2024-11-29 GMT+08:00

Managing Knowledge Bases

You can manage knowledge bases on the LakeSearch web UI, including creating and deleting them. DOC, DOCX, and PDF documents and UTF-8 JSON data can be uploaded to a knowledge base, and FAQs can be imported from XLSX or XLS files. Due to HBase storage restrictions, a single uploaded document cannot exceed 10 MB.
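The format and size limits above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of LakeSearch: the function name `check_upload`, the `kind` labels, and the reading of 10 MB as 10 * 1024 * 1024 bytes are assumptions made for illustration, and the sketch assumes the size limit applies to every upload type.

```python
import os

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes; the service may count differently.
MAX_DOCUMENT_BYTES = 10 * 1024 * 1024

# File extensions accepted per upload type, as listed in this section.
# The keys ("document", "faq", "structured") are hypothetical labels.
ALLOWED_EXTENSIONS = {
    "document": {".doc", ".docx", ".pdf"},
    "faq": {".xlsx", ".xls"},
    "structured": {".json"},
}


def check_upload(filename, size_bytes, kind):
    """Return a list of problems; an empty list means the file passes these local checks."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS[kind]:
        problems.append(f"unsupported extension {extension!r} for {kind} upload")
    if size_bytes > MAX_DOCUMENT_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, above the {MAX_DOCUMENT_BYTES}-byte limit"
        )
    return problems
```

For a file on disk, the size can be obtained with `os.path.getsize(path)` and passed as `size_bytes`.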

Prerequisites

You have created a LakeSearch user, for example, lakeuser, and added it to the lakesearchgroup. For details about user creation and permission management, see Creating a LakeSearch Role.

Creating a Knowledge Base and Uploading Documents

  1. Log in to the LakeSearch web UI as lakeuser. For details, see Accessing the LakeSearch Web UI.
  2. Create a knowledge base.

    1. Choose Knowledge Bases on the left, and click Create Knowledge Base.
    2. Enter a knowledge base name and description, and click Confirm.

  3. Upload documents to the knowledge base.

    1. Click the ID of the knowledge base created in 2.b. Set the following basic parameters of the knowledge base.
      Table 1 Basic knowledge base parameters

      | Parameter | Description |
      | --- | --- |
      | Top K Recalls | Number of top-ranked vectors (top K) recalled by a vector query. A larger value recalls more vectors, which can improve precision but consumes more resources. Default value: 50. Value range: 10 to 300. |
      | Reference Documents | Number of reference documents passed to the model for dialogs. Documents are sorted by their relevance to questions and answers. Default value: 3. Value range: 1 to 10. |
      | Refined Ranking | Whether to use the refined ranking model to sort query results a second time. This function is disabled by default. Value range: off and on. |
      | Custom Prompt | Prompts guide the model to generate the expected results. Custom prompts are supported. Click Configure to view the default prompt and set a new one. |

    2. Upload data.
      • Upload documents. DOC, DOCX, and PDF formats are supported.

        In the Document Management tab, click Upload, and then Select Document. Select the document you want to upload, and click Confirm. When the Document Status changes to Normal, the upload is successful.

      • Create an FAQ (enter questions and answers).

        On the Q&A Management page, click Create, enter the standard question and answer, and click Confirm. The Q&A pair is used to answer the standard question and similar questions so that users can quickly find the desired answer.

      • Import FAQ data in batches in XLSX or XLS format.

        In the FAQs Import tab, click Upload, and then Select Document. Select the document you want to upload, and click Confirm. When the Document Status changes to Normal, the upload is successful.

      • Upload structured data. JSON documents using UTF-8 are supported.

        In the Structured Data tab, click Upload, and then Select Document. Select the document you want to upload, and click Confirm. When the Document Status changes to Normal, the upload is successful.

  4. Toggle on the switch to the right of Knowledge Base Status to enable the knowledge base.
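The value ranges in Table 1 can be checked before the parameters are entered on the web UI, for example when settings are kept in a deployment checklist. This is a minimal sketch under stated assumptions: the function and argument names are hypothetical, the ranges are treated as inclusive, and nothing here calls a LakeSearch API.

```python
# Value ranges from Table 1. Treating both ends as inclusive is an assumption.
TOP_K_RANGE = (10, 300)
REFERENCE_DOCUMENTS_RANGE = (1, 10)
REFINED_RANKING_OPTIONS = ("off", "on")


def check_basic_parameters(top_k=50, reference_documents=3, refined_ranking="off"):
    """Return a list of problems; the defaults mirror the default values in Table 1."""
    problems = []
    low, high = TOP_K_RANGE
    if not low <= top_k <= high:
        problems.append(f"Top K Recalls must be between {low} and {high}, got {top_k}")
    low, high = REFERENCE_DOCUMENTS_RANGE
    if not low <= reference_documents <= high:
        problems.append(
            f"Reference Documents must be between {low} and {high}, got {reference_documents}"
        )
    if refined_ranking not in REFINED_RANKING_OPTIONS:
        problems.append(f"Refined Ranking must be off or on, got {refined_ranking!r}")
    return problems
```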

FAQ Batch Import Table

  • Only Excel (XLSX and XLS) files can be imported.
  • A maximum of 1,000 FAQs (that is, 1,000 rows in an Excel file) can be imported.
  • You do not need to add a table header. You can directly enter answers and questions in the table.
  • The answer and question columns are mandatory. The similar question columns are optional.
Table 2 Supported data format

| Answer (Mandatory) | Question (Mandatory) | Similar Question (Optional) | Similar Question (Optional) | Similar Question (Optional) | Similar Question (Optional) | Similar Question (Optional) |
| --- | --- | --- | --- | --- | --- | --- |
| Answer 1 | Question 1 | Similar question A1 | Similar question B1 | Similar question C1 | Similar question D1 | Similar question E1 |
| Answer 2 | Question 2 | Similar question A2 | Similar question B2 | Similar question C2 | Similar question D2 | - |
| Answer 3 | Question 3 | Similar question A3 | - | - | - | - |
| Answer 4 | Question 4 | - | - | - | - | - |
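The import rules above can be checked on the rows before they are saved to an Excel file. The sketch below works on plain Python lists (one list per row, with the answer first and the question second, as in Table 2) and does not read or write Excel files itself. The function name is hypothetical, and the limit of five similar questions is an assumption based on the number of columns shown in Table 2.

```python
MAX_FAQ_ROWS = 1000        # stated limit per import
MAX_SIMILAR_QUESTIONS = 5  # assumption: Table 2 shows five similar-question columns


def check_faq_rows(rows):
    """Check rows shaped as [answer, question, similar_1, ...]; return a list of problems."""
    problems = []
    if len(rows) > MAX_FAQ_ROWS:
        problems.append(f"{len(rows)} rows exceed the limit of {MAX_FAQ_ROWS}")
    for number, row in enumerate(rows, start=1):
        # Treat missing cells and whitespace-only cells as empty.
        cells = [(cell or "").strip() for cell in row]
        answer = cells[0] if len(cells) > 0 else ""
        question = cells[1] if len(cells) > 1 else ""
        if not answer:
            problems.append(f"row {number}: answer is empty")
        if not question:
            problems.append(f"row {number}: question is empty")
        similar = [cell for cell in cells[2:] if cell]
        if len(similar) > MAX_SIMILAR_QUESTIONS:
            problems.append(f"row {number}: more than {MAX_SIMILAR_QUESTIONS} similar questions")
    return problems
```

Rows that pass can then be written to an XLSX file with any spreadsheet tool or library, without a header row.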

Structured Data Format

JSON documents encoded in UTF-8 format are supported. The documents must meet the field requirements of StructureData.

Table 3 Structured data fields

| Parameter | Mandatory | Description |
| --- | --- | --- |
| id | Yes | ID of each data record, which can contain 4 to 64 characters. |
| content | Yes. If cmd is DELETE, leave this parameter blank. | Content of each data record, which can contain 1 to 1,000 characters. |
| cmd | Yes | Operation. The options are as follows: ADD (default value): Add a data record. UPDATE: Update a data record. DELETE: Delete a data record. |
| title | No | Title, which can contain a maximum of 640 characters. |
| category | No | Data category, which can contain a maximum of 640 characters. |
| url | No | URL for uploading data, which can contain a maximum of 2,000 characters. The value must match the url format given after this table. |
| docTime | No | Time when a document is uploaded. The time is in YYYY-MM-DD HH:MM:SS format. |
| tags | No | Tag of each data record. Format: ["tag1","tag2","tag3"] |

url format: "((http|https)://)(www.)?[a-zA-Z0-9@: %._\\+~#?&//=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@: %._\\+~#?&//=]*)"
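The field requirements in Table 3 can be checked record by record before upload. The sketch below is illustrative only and is not the validation that LakeSearch performs. The function name `check_record` is hypothetical, and it assumes that id is a string, that lengths are counted in Python characters, that the whole URL must match the documented pattern, and that the doubled backslashes in that pattern are string-literal escapes.

```python
import re
from datetime import datetime

# Assumption: the doubled backslashes in the documented pattern are string-literal
# escapes, so the pattern is written here with single backslashes.
URL_PATTERN = re.compile(
    r"((http|https)://)(www.)?[a-zA-Z0-9@: %._\+~#?&//=]{2,256}"
    r"\.[a-z]{2,6}\b([-a-zA-Z0-9@: %._\+~#?&//=]*)"
)
COMMANDS = {"ADD", "UPDATE", "DELETE"}


def check_record(record):
    """Return a list of problems for one record; an empty list means it passes."""
    problems = []

    cmd = record.get("cmd")
    if cmd not in COMMANDS:
        problems.append("cmd must be ADD, UPDATE, or DELETE")

    record_id = record.get("id")
    if not isinstance(record_id, str) or not 4 <= len(record_id) <= 64:
        problems.append("id must be a string of 4 to 64 characters")

    content = record.get("content")
    if cmd == "DELETE":
        if content:
            problems.append("content must be left blank when cmd is DELETE")
    elif not isinstance(content, str) or not 1 <= len(content) <= 1000:
        problems.append("content must be a string of 1 to 1,000 characters")

    # Optional text fields with a maximum length.
    for field, limit in (("title", 640), ("category", 640), ("url", 2000)):
        value = record.get(field)
        if value is not None and (not isinstance(value, str) or len(value) > limit):
            problems.append(f"{field} must be a string of at most {limit} characters")

    url = record.get("url")
    if isinstance(url, str) and not URL_PATTERN.fullmatch(url):
        problems.append("url does not match the documented format")

    doc_time = record.get("docTime")
    if doc_time is not None:
        try:
            datetime.strptime(doc_time, "%Y-%m-%d %H:%M:%S")
        except (TypeError, ValueError):
            problems.append("docTime must use the YYYY-MM-DD HH:MM:SS format")

    tags = record.get("tags")
    if tags is not None and not (
        isinstance(tags, list) and all(isinstance(tag, str) for tag in tags)
    ):
        problems.append("tags must be a list of strings")

    return problems
```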

Example for structured data
[
  {
    "cmd": "ADD",
    "id": "100001",
    "content": "content for the first data"
  },
  {
    "cmd": "ADD",
    "id": "100002",
    "title": "title for the second data",
    "content": "content for the second data",
    "url": "https://www.xxx.com/intl/zh-cn/",
    "docTime":"2015-01-01 12:10:30",
    "category":"category1",
    "tags":["tag1","tag2","tag3"]
  },
  {
    "cmd": "UPDATE",
    "id": "100002",
    "content":"The content for the second data is updated",
    "category":"newCategory"
  },
  {
    "cmd": "DELETE",
    "id": "100001"
  }
]
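A file in this shape can also be generated by a script. The following is a minimal sketch that serializes a list of records as a UTF-8 JSON array and rejects a payload above the 10 MB document limit. The function name `encode_records` is hypothetical, and reading 10 MB as 10 * 1024 * 1024 bytes is an assumption.

```python
import json

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_DOCUMENT_BYTES = 10 * 1024 * 1024


def encode_records(records):
    """Serialize records as a JSON array in UTF-8, refusing payloads above the size limit."""
    # ensure_ascii=False keeps non-ASCII text as real UTF-8 instead of \uXXXX escapes.
    payload = json.dumps(records, ensure_ascii=False, indent=2).encode("utf-8")
    if len(payload) > MAX_DOCUMENT_BYTES:
        raise ValueError(
            f"payload is {len(payload)} bytes, above the {MAX_DOCUMENT_BYTES}-byte limit"
        )
    return payload
```

The returned bytes can be written to a file opened in binary mode, for example `open("records.json", "wb")`, and the file can then be uploaded on the Structured Data tab.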