
Evaluating an Intelligent Analysis Assistant

The system supports detailed analysis and evaluation of the intelligent analysis assistant's Q&A. The evaluation system covers a series of key indicators and provides annotation functions to ensure accuracy and comprehensiveness. In addition, DataArts Insight provides a bad case management mechanism for inaccurate Q&A identified during evaluation, which efficiently tracks and records Q&A marked as incorrect and facilitates subsequent problem resolution and improvement.

Evaluating the intelligent analysis assistant lets you quickly and accurately assess the effectiveness of its modules and track low-accuracy Q&A, supporting rapid optimization and iteration. This section describes how to evaluate Q&A effectively.

Notes and Constraints

  • To use this function, you must have administrator permissions.

Prerequisites

Procedure

Figure 1 Process of evaluating an intelligent analysis assistant
Table 1 Process description

  Step 1: Upload an Evaluation Set
    The uploaded evaluation set contains the questions to be asked of the intelligent analysis assistant. Subsequent evaluations are based on these questions.

  Step 2: Evaluate and Annotate
    Evaluation annotation quickly and accurately assesses the end-to-end and per-module effectiveness of the current intelligent analysis assistant.

  Step 3: (Optional) Compare Evaluation Results
    Run evaluation annotation for two intelligent analysis assistants against the same evaluation set to compare their Q&A accuracy and optimize the assistant configuration accordingly.

  Step 4: Manage Bad Cases
    Bad cases track the questions marked as incorrect during evaluation annotation, facilitating subsequent problem handling and improvement.

Step 1: Upload an Evaluation Set

  1. Log in to the DataArts Insight console.
  2. In the upper left corner of the management console, select a region. Then, select an enterprise project in the upper right corner.
  3. On the top menu of the console, click Project. On the displayed My Projects page, click the name of the desired project.
  4. In the navigation pane on the left, choose Q&A Management > Assessment Management.
  5. Click Upload Assessment Set. On the displayed page (Figure 2), enter the evaluation set name.

    Figure 2 Page for uploading evaluation sets

  6. Click Download Template, complete the evaluation set template, and click Select File to upload the evaluation file.
  7. Select an evaluation assistant and click OK.
    Table 2 Parameters for uploading an evaluation set

    Assessment Set Name
      Name of the evaluation set. This parameter is mandatory. The name can contain up to 32 characters, including only letters, digits, hyphens (-), and underscores (_).

    Uploading the Assessment File
      Evaluation file that subsequent evaluations are based on. The evaluation set template contains the following fields (see the sample sketch after this table):
      conversationId: ID of a conversation. For example, the first and second conversations with the intelligent analysis assistant have the IDs 1 and 2, respectively.
      seqId: ID of each question within a conversation. For example, the questions in the first conversation are numbered 1, 2, 3, and so on.
      question: The question itself, for example, "What is the percentage growth in sales of products in different regions?"

    Assessment Assistant
      Intelligent analysis assistant to be evaluated. You can select up to two assistants.
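
    The evaluation file fields above can be illustrated with a minimal sketch. It assumes, for illustration only, that the template is a CSV file with the three columns conversationId, seqId, and question; the template downloaded from the console is authoritative and may use a different file format or layout.

      import csv

      # Hypothetical example: two conversations, each with its own questions.
      # conversationId groups questions into a conversation; seqId orders them within it.
      rows = [
          {"conversationId": 1, "seqId": 1, "question": "What were the total sales in 2024?"},
          {"conversationId": 1, "seqId": 2, "question": "What is the percentage growth in sales of products in different regions?"},
          {"conversationId": 2, "seqId": 1, "question": "Which product category had the highest profit last quarter?"},
      ]

      with open("evaluation_set.csv", "w", newline="", encoding="utf-8") as f:
          writer = csv.DictWriter(f, fieldnames=["conversationId", "seqId", "question"])
          writer.writeheader()
          writer.writerows(rows)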

Step 2: Evaluate and Annotate

  1. In the navigation pane on the left, choose Q&A Management > Assessment Management.
  2. Click an evaluation set name. The intelligent analysis assistant associated with the evaluation set, that is, the assessment assistant selected in Step 1: Upload an Evaluation Set, is displayed on the left of the page. The top of the page shows the accuracy of the evaluation results, calculated as accuracy = 1 - errors/total steps. For example, if 2 of 20 annotated steps are marked as incorrect, the accuracy is 90%.
  3. In the dialogue list area, annotate each question in the evaluation set. For parameter descriptions, refer to Table 3.
  4. The left side of the dialogue list area shows different conversations corresponding to the conversationId in the evaluation set template. After completing the annotation of questions in a conversation, click Confirm to complete the evaluation annotation of that conversation (Figure 3).

    Questions marked as incorrect during evaluation annotation are automatically added to bad case management for subsequent tracking and improvement.

    Figure 3 Evaluation annotation
    Table 3 Evaluation annotation parameters

    Keyword Rewrite
      You can configure question and answer keywords in the assistant. After multi-turn rewriting, the system checks whether the question contains any user-defined keywords. If it does, the keywords in the question are replaced with the configured replacement content.

    Grounded
      The assistant searches for and retrieves the relevant data tables, fields, and enumerations based on your query. Compared with directly inputting all dataset schemas and enumeration information, this retrieval step simplifies the model input, improves the effectiveness of the NL2SQL model, and reduces inference latency.

    Prompt
      Based on the prompt templates, user instructions, and retrieval results, the assistant dynamically generates a prompt for each user query.

    Semantic SQL
      The foundation model performs NL2SQL inference based on the prompt and generates semantic SQL. Semantic SQL is generated against the dataset schema and is not directly executable physical SQL.

    Semantic DQE
      The semantic SQL is transformed into a data query expression (DQE). Compared with SQL, a DQE is a more structured representation and is the universal data query structure in DataArts Insight. During the transformation, the semantic SQL is post-processed to validate and correct hallucinations and errors, improving the accuracy of the entire data query.

    Physical SQL
      The DQE is further transformed into physical SQL that the target data source can execute. This transformation includes mapping the dataset schema to the physical table schema, adapting to the target data source dialect, and injecting default filter conditions and access control conditions. A hypothetical illustration of these stages follows this table.
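
    The Semantic SQL, Semantic DQE, and Physical SQL stages can be illustrated with a minimal sketch. The query, the dict-based DQE shape, the table mapping, and the to_physical_sql helper below are hypothetical and shown only to convey the idea; the actual DQE format and transformation logic are internal to DataArts Insight.

      # Hypothetical illustration only: the real DQE format is internal to DataArts Insight.
      # Semantic SQL (generated against the dataset schema, not directly executable):
      #   SELECT region, SUM(sales) FROM sales_dataset GROUP BY region

      # A DQE-like structured representation of the same query (assumed shape).
      dqe = {
          "dataset": "sales_dataset",
          "dimensions": ["region"],
          "measures": [{"field": "sales", "aggregation": "SUM"}],
      }

      def to_physical_sql(dqe, table_mapping, default_filter):
          """Render illustrative physical SQL: map the dataset to its physical table
          and inject a default filter or access control condition."""
          table = table_mapping[dqe["dataset"]]
          measures = ", ".join(f'{m["aggregation"]}({m["field"]})' for m in dqe["measures"])
          dims = ", ".join(dqe["dimensions"])
          return (f"SELECT {dims}, {measures} FROM {table} "
                  f"WHERE {default_filter} GROUP BY {dims}")

      print(to_physical_sql(dqe, {"sales_dataset": "dws.fact_sales"}, "year = 2024"))
      # SELECT region, SUM(sales) FROM dws.fact_sales WHERE year = 2024 GROUP BY region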

Step 3: (Optional) Compare Evaluation Results

Run evaluation annotation for two intelligent analysis assistants against the same evaluation set to compare their Q&A accuracy and optimize the assistant configuration accordingly.

  1. In the navigation pane on the left, choose Q&A Management > Assessment Management.
  2. Click Upload Assessment Set, select the two intelligent analysis assistants to be compared for evaluation, set parameters, and click Confirm.
  3. Click an evaluation set name. In the dialogue list area, annotate each question in the evaluation set. For parameter descriptions, refer to Table 3.
    Figure 4 Evaluation annotation

    Questions marked as incorrect during evaluation annotation are automatically added to bad case management for subsequent tracking and improvement.

Step 4: Manage Bad Cases

Bad case management tracks the questions marked as incorrect during evaluation annotation, facilitating subsequent problem handling and improvement.

  1. In the navigation pane on the left, choose Q&A Management > BadCase Management. On this page, you can query bad cases, filter them by question, stage, intelligent analysis assistant, owner, and other dimensions, and sort them by update time.
    Figure 5 Bad case management

    Bad case management displays the questions marked as incorrect during evaluation annotation, grouped by the stage of the parsing process in which the error occurred (keyword rewriting, retrieval results, prompts, semantic SQL, DQE, or SQL).

  2. Click View Details to enter the bad case details page, where you can modify the bad case status and the current owner. After modification, click OK to save.
    Figure 6 Bad case details

    Table 4 Bad case parameters

    Stage
      Stage of the intelligent analysis assistant's parsing process in which the bad case occurred: keyword rewriting, retrieval results, prompts, semantic SQL, DQE, or SQL.

    Intelligent Analysis Assistants
      Intelligent analysis assistant associated with the bad case, that is, the assistant selected during evaluation annotation.

    Owner
      Owner responsible for handling the bad case.

    Annotated by
      User who annotated the bad case during evaluation.

    Update Date
      Time when the bad case was last updated.

    Status
      Created: Default status, indicating that the bad case has been recorded and initially identified.
      Modifying: The bad case is under review or being modified.
      CCB: The bad case has been submitted to the Change Control Board (CCB) for review and is pending.
      Suspended: The bad case is temporarily shelved and awaits further processing.
      Completed: The bad case has been processed, the issue has been resolved, or necessary measures have been taken.
      Non-problem: Further analysis shows that the bad case is not actually an issue.

    • The bad case status can only be updated forward to the next stage and cannot be changed back to a previous stage. For example, the status cannot be changed from Completed back to Suspended. A sketch of this rule follows the table.
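
    The forward-only rule can be modeled with a minimal sketch. The status names come from Table 4; the linear ordering and the validation helper below are assumptions made for illustration only, not an API of DataArts Insight.

      # Illustrative only: models the forward-only status rule described above.
      STATUS_ORDER = ["Created", "Modifying", "CCB", "Suspended", "Completed", "Non-problem"]

      def can_update(current: str, new: str) -> bool:
          """Allow a status change only if it moves forward in the assumed sequence."""
          return STATUS_ORDER.index(new) > STATUS_ORDER.index(current)

      print(can_update("Created", "Modifying"))    # True
      print(can_update("Completed", "Suspended"))  # False: cannot move back to a previous stage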