
Evaluating an Intelligent Analysis Assistant

The system supports detailed analysis and evaluation of the intelligent analysis assistant's Q&A. The evaluation system covers a series of key indicators and provides annotation functions to ensure accuracy and comprehensiveness. In addition, DataArts Insight provides a bad case management mechanism for inaccurate Q&A identified during evaluation, which efficiently tracks and records Q&A marked as incorrect and facilitates subsequent problem resolution and improvement.

Evaluating the intelligent analysis assistant lets you quickly and accurately assess the effectiveness of its modules and track low-accuracy Q&A, supporting rapid optimization and iteration. This section describes how to evaluate Q&A effectively.

Notes and Constraints

  • To use this function, you must have administrator permissions.

Prerequisites

Procedure

Figure 1 Process of evaluating an intelligent analysis assistant
Table 1 Process description

  Step 1: Upload an Evaluation Set
    The uploaded evaluation set contains the questions to be asked of the intelligent analysis assistant. Subsequent evaluations are based on these questions.

  Step 2: Evaluate and Annotate
    Evaluation annotation quickly and accurately assesses the end-to-end and per-module effectiveness of the current intelligent analysis assistant.

  Step 3: (Optional) Compare Evaluation Results
    Run evaluation annotation for two intelligent analysis assistants against the same evaluation set to compare their Q&A accuracy and optimize the assistant configuration accordingly.

  Step 4: Manage Bad Cases
    Bad cases track the questions marked as incorrect during evaluation annotation, facilitating subsequent problem handling and improvement.

Step 1: Upload an Evaluation Set

  1. Log in to the DataArts Insight console.
  2. In the upper left corner of the management console, select a region. Then, select an enterprise project in the upper right corner.
  3. On the top menu of the console, click Project. On the displayed My Projects page, click the name of the desired project.
  4. In the navigation pane on the left, choose Q&A Management > Assessment Management.
  5. Click Upload Assessment Set. On the displayed page (Figure 2), enter the evaluation set name.

    Figure 2 Page for uploading evaluation sets

  6. Click Download Template, complete the evaluation set template, and click Select File to upload the evaluation file.
  7. Select an evaluation assistant and click OK.
    Table 2 Parameters for uploading an evaluation set

    Assessment Set Name
      Name of the evaluation set. This parameter is mandatory. The name can contain up to 32 characters, including only letters, digits, hyphens (-), and underscores (_).

    Uploading the Assessment File
      Evaluation file that subsequent evaluations are based on. The evaluation set template contains the following fields (see the sample sketch after this table):
      conversationId: ID of a conversation. For example, the first and second conversations with the intelligent analysis assistant have the IDs 1 and 2, respectively.
      seqId: ID of each question within a conversation. For example, the questions in the first conversation are numbered 1, 2, 3, and so on.
      question: The question itself, for example, "What is the percentage growth in sales of products in different regions?"

    Assessment Assistant
      Intelligent analysis assistant to be evaluated. You can select up to two assistants.
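
    The evaluation file fields above can be illustrated with a minimal sketch. It assumes, for illustration only, that the template is a CSV file with the three columns conversationId, seqId, and question; the template downloaded from the console is authoritative and may use a different file format or layout.

      import csv

      # Hypothetical example: two conversations, each with its own questions.
      # conversationId groups questions into a conversation; seqId orders them within it.
      rows = [
          {"conversationId": 1, "seqId": 1, "question": "What were the total sales in 2024?"},
          {"conversationId": 1, "seqId": 2, "question": "What is the percentage growth in sales of products in different regions?"},
          {"conversationId": 2, "seqId": 1, "question": "Which product category had the highest profit last quarter?"},
      ]

      with open("evaluation_set.csv", "w", newline="", encoding="utf-8") as f:
          writer = csv.DictWriter(f, fieldnames=["conversationId", "seqId", "question"])
          writer.writeheader()
          writer.writerows(rows)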

Step 2: Evaluate and Annotate

  1. In the navigation pane on the left, choose Q&A Management > Assessment Management.
  2. Click an evaluation set name. The intelligent analysis assistant associated with the evaluation set, that is, the assessment assistant selected in Step 1: Upload an Evaluation Set, is displayed on the left of the page. The top of the page shows the accuracy of the evaluation results, calculated as accuracy = 1 - errors/total steps. For example, if 2 of 20 annotated steps are marked as incorrect, the accuracy is 90%.
  3. In the dialogue list area, annotate each question in the evaluation set. For parameter descriptions, refer to Table 3.
  4. The left side of the dialogue list area shows different conversations corresponding to the conversationId in the evaluation set template. After completing the annotation of questions in a conversation, click Confirm to complete the evaluation annotation of that conversation (Figure 3).

    Questions marked as incorrect during evaluation annotation are automatically added to bad case management for subsequent tracking and improvement.

    Figure 3 Evaluation annotation
    Table 3 Evaluation annotation parameters

    Keyword Rewrite
      You can configure question and answer keywords in the assistant. After multi-turn rewriting, the system checks whether the question contains any user-defined keywords. If it does, the keywords in the question are replaced with the configured replacement content.

    Grounded
      The assistant searches for and retrieves the relevant data tables, fields, and enumerations based on your query. Compared with directly inputting all dataset schemas and enumeration information, this retrieval step simplifies the model input, improves the effectiveness of the NL2SQL model, and reduces inference latency.

    Prompt
      Based on the prompt templates, user instructions, and retrieval results, the assistant dynamically generates a prompt for each user query.

    Semantic SQL
      The foundation model performs NL2SQL inference based on the prompt and generates semantic SQL. Semantic SQL is generated against the dataset schema and is not directly executable physical SQL.

    Semantic DQE
      The semantic SQL is transformed into a data query expression (DQE). Compared with SQL, a DQE is a more structured representation and is the universal data query structure in DataArts Insight. During the transformation, the semantic SQL is post-processed to validate and correct hallucinations and errors, improving the accuracy of the entire data query.

    Physical SQL
      The DQE is further transformed into physical SQL that the target data source can execute. This transformation includes mapping the dataset schema to the physical table schema, adapting to the target data source dialect, and injecting default filter conditions and access control conditions. A hypothetical illustration of these stages follows this table.
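
    The Semantic SQL, Semantic DQE, and Physical SQL stages can be illustrated with a minimal sketch. The query, the dict-based DQE shape, the table mapping, and the to_physical_sql helper below are hypothetical and shown only to convey the idea; the actual DQE format and transformation logic are internal to DataArts Insight.

      # Hypothetical illustration only: the real DQE format is internal to DataArts Insight.
      # Semantic SQL (generated against the dataset schema, not directly executable):
      #   SELECT region, SUM(sales) FROM sales_dataset GROUP BY region

      # A DQE-like structured representation of the same query (assumed shape).
      dqe = {
          "dataset": "sales_dataset",
          "dimensions": ["region"],
          "measures": [{"field": "sales", "aggregation": "SUM"}],
      }

      def to_physical_sql(dqe, table_mapping, default_filter):
          """Render illustrative physical SQL: map the dataset to its physical table
          and inject a default filter or access control condition."""
          table = table_mapping[dqe["dataset"]]
          measures = ", ".join(f'{m["aggregation"]}({m["field"]})' for m in dqe["measures"])
          dims = ", ".join(dqe["dimensions"])
          return (f"SELECT {dims}, {measures} FROM {table} "
                  f"WHERE {default_filter} GROUP BY {dims}")

      print(to_physical_sql(dqe, {"sales_dataset": "dws.fact_sales"}, "year = 2024"))
      # SELECT region, SUM(sales) FROM dws.fact_sales WHERE year = 2024 GROUP BY region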

Step 3: (Optional) Compare Evaluation Results

Run evaluation annotation for two intelligent analysis assistants against the same evaluation set to compare their Q&A accuracy and optimize the assistant configuration accordingly.

  1. In the navigation pane on the left, choose Q&A Management > Assessment Management.
  2. Click Upload Assessment Set, select the two intelligent analysis assistants to be compared for evaluation, set parameters, and click Confirm.
  3. Click an evaluation set name. In the dialogue list area, annotate each question in the evaluation set. For parameter descriptions, refer to Table 3.
    Figure 4 Evaluation annotation

    Questions marked as incorrect during evaluation annotation are automatically added to bad case management for subsequent tracking and improvement.

Step 4: Manage Bad Cases

Bad case management tracks the questions marked as incorrect during evaluation annotation, facilitating subsequent problem handling and improvement.

  1. In the navigation pane on the left, choose Q&A Management > BadCase Management. On this page, you can query bad cases, filter them by question, stage, intelligent analysis assistant, owner, and other dimensions, and sort them by update time.
    Figure 5 Bad case management

    Bad case management displays the questions marked as incorrect during evaluation annotation, grouped by the stage of the parsing process in which the error occurred (keyword rewriting, retrieval results, prompts, semantic SQL, DQE, or SQL).

  2. Click View Details to enter the bad case details page, where you can modify the bad case status and the current owner. After modification, click OK to save.
    Figure 6 Bad case details

    Table 4 Bad case parameters

    Stage
      Stage of the intelligent analysis assistant's parsing process in which the bad case occurred: keyword rewriting, retrieval results, prompts, semantic SQL, DQE, or SQL.

    Intelligent Analysis Assistants
      Intelligent analysis assistant associated with the bad case, that is, the assistant selected during evaluation annotation.

    Owner
      Owner responsible for handling the bad case.

    Annotated by
      User who annotated the bad case during evaluation.

    Update Date
      Time when the bad case was last updated.

    Status
      Created: Default status, indicating that the bad case has been recorded and initially identified.
      Modifying: The bad case is under review or being modified.
      CCB: The bad case has been submitted to the Change Control Board (CCB) for review and is pending.
      Suspended: The bad case is temporarily shelved and awaits further processing.
      Completed: The bad case has been processed, the issue has been resolved, or necessary measures have been taken.
      Non-problem: Further analysis shows that the bad case is not actually an issue.

    • The bad case status can only be updated forward to the next stage and cannot be changed back to a previous stage. For example, the status cannot be changed from Completed back to Suspended. A sketch of this rule follows the table.
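
    The forward-only rule can be modeled with a minimal sketch. The status names come from Table 4; the linear ordering and the validation helper below are assumptions made for illustration only, not an API of DataArts Insight.

      # Illustrative only: models the forward-only status rule described above.
      STATUS_ORDER = ["Created", "Modifying", "CCB", "Suspended", "Completed", "Non-problem"]

      def can_update(current: str, new: str) -> bool:
          """Allow a status change only if it moves forward in the assumed sequence."""
          return STATUS_ORDER.index(new) > STATUS_ORDER.index(current)

      print(can_update("Created", "Modifying"))    # True
      print(can_update("Completed", "Suspended"))  # False: cannot move back to a previous stage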