Updated on 2025-11-19 GMT+08:00

Preprocessing Data

This section describes how to preprocess data for different scenarios before you upload it to the platform.

By referring to Obtaining Source Data, you can obtain text data, code, and dialogs in the same format as the pre-training data commonly used in the industry. You need to convert the text data into the JSONL format. Each line of the file is one JSON object serialized as a string, and each object contains only the text field. The value of the text field is your text data. Double quotation marks and line breaks inside the value must be escaped (as \" and \n) so that each record stays on a single line.

The following is an example:

{"text": "Recently, the province has launched a standardized campaign to clean up coal-related fees. Today, the provincial environmental protection department issued a notice incorporating the clearance of such fees into the annual performance evaluation for environmental protection agencies at all levels. Any agency that fails to complete the task will be subject to a \"one-vote veto,\" disqualifying it from annual excellence awards. Currently, the provincial environmental protection system is implementing the initiative in full, focusing particularly on the collection of coal-related sewage fees and environmental monitoring service fees. According to the notice, city- and county-level environmental protection departments are required to conduct intensive reviews on these two items, ensuring that sewage and monitoring service fees are collected in full compliance with regulations. All fee items without a legal basis must be eliminated immediately, and unauthorized or illegal charging practices must be corrected. Fees that exceed legal standards or actual costs must also be adjusted without delay. In cases where the charging process is non-compliant, relevant personnel are acting illegally, or required documents are incomplete, these issues must be standardized promptly. For problems such as over-scope or over-standard charging, environmental protection departments at all levels are mandated to improve their systems and mechanisms to prevent banned or abolished projects from reappearing, ensure that under-standard projects are properly handled, and that standardized projects are strictly implemented in accordance with policy. Efforts must be made to resolutely end over-standard and over-scope collections, unauthorized adjustments to collection methods, and any disguised attempts to increase the burden on enterprises. The ultimate goal is to ensure that all coal-related fee collection activities are conducted lawfully and in compliance with regulations.\nThe provincial environmental protection department has now officially included the clearance of coal-related fees in the annual target responsibility evaluation for all levels of environmental protection agencies. Failure to complete the task will trigger the \"one-vote veto\" mechanism. (Reporter: Xue Lin; Correspondent: Li Jingping)"}
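
The conversion to JSONL can be sketched in Python roughly as follows. This is a minimal illustration, not the platform's own tooling: the function name texts_to_jsonl, its parameters, and the choice to skip empty documents are assumptions made for this example. It relies on the standard json module to escape quotation marks and line breaks so that every record occupies exactly one line.

```python
import json


def texts_to_jsonl(texts, output_path):
    """Write each non-empty text as one JSON object per line.

    Hypothetical helper for illustration: every output line is an object
    that contains only the "text" field. Returns the number of records written.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as f:
        for text in texts:
            text = text.strip()
            if not text:
                continue  # assumption: empty documents are dropped, not written
            # json.dumps escapes inner double quotes and line breaks,
            # so the serialized record never spans more than one line.
            record = json.dumps({"text": text}, ensure_ascii=False)
            f.write(record + "\n")
            count += 1
    return count
```

For example, calling texts_to_jsonl(documents, "corpus.jsonl") with a list of strings writes one record per document. Reading the file back line by line with json.loads is a simple way to confirm that every line parses and contains only the text field before you upload it.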