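This document explains how to use the Training Data API to run the training data generation pipeline manually, one step at a time. Unlike document 5.1, it focuses on executing and controlling each of the four steps independently.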
Method | Endpoint | Description |
---|---|---|
POST | /api/v0/data_pipeline/tasks/{task_id}/execute | Execute a single step (step mode) |
Method | Endpoint | Description |
---|---|---|
GET | /api/v0/data_pipeline/tasks/{task_id} | Get task and step status |
GET | /api/v0/data_pipeline/tasks/{task_id}/logs | Get task execution logs |
GET | /api/v0/data_pipeline/tasks/{task_id}/files | List the files generated by the steps |
GET | /api/v0/data_pipeline/tasks/{task_id}/files/{file_name} | Download a file generated by a step |
Method | Endpoint | Description |
---|---|---|
POST | /api/v0/data_pipeline/tasks/{task_id}/files | Upload a modified file |
Note: for detailed usage of the following APIs, see 5.1 Training Dataset Automatic Generation and Loading Process API.
Method | Endpoint | Description |
---|---|---|
POST | /api/v0/data_pipeline/tasks | Create a training task |
POST | /api/v0/database/tables | Query the list of table names in the business database |
POST | /api/v0/data_pipeline/tasks/{task_id}/table-list | Submit the table name list online |
POST | /api/v0/data_pipeline/tasks/{task_id}/upload-table-list | Upload a table list file |
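Before starting step-by-step execution, complete the preparation work (creating the task and providing the table list, using the APIs in the table above). For details, see 5.1 Training Dataset Automatic Generation and Loading Process API.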
API: POST /api/v0/data_pipeline/tasks/{task_id}/execute
Request parameters:
{
"execution_mode": "step",
"step_name": "{step_name}"
}
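Parameter description:
- execution_mode (string, required): execution mode; must be "step" for step-by-step execution
- step_name (string, required): step name; one of the following four values:
  - "ddl_generation" - DDL/MD document generation
  - "qa_generation" - Question-SQL pair generation
  - "sql_validation" - SQL validation and repair
  - "training_load" - training data loading

Expected response: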
{
"code": 200,
"data": {
"execution_mode": "step",
"message": "任务正在后台执行,请通过状态接口查询进度",
"response": "任务执行已启动",
"step_name": "ddl_generation",
"task_id": "task_20250703_000820"
},
"message": "操作成功",
"success": true
}
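The request described above can be assembled programmatically. The following is a minimal Python sketch, not the project's own client: the function name `build_execute_request` and the `base_url` argument are illustrative assumptions, and only the endpoint path, the two body fields, and the four step names come from this document. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# The four step names accepted by step_name, as listed above.
STEP_NAMES = ("ddl_generation", "qa_generation", "sql_validation", "training_load")


def build_execute_request(base_url, task_id, step_name):
    """Build (but do not send) the POST request that runs a single step.

    build_execute_request and base_url are hypothetical names used for
    illustration; the path and the body fields come from this document.
    """
    if step_name not in STEP_NAMES:
        raise ValueError(f"unknown step_name: {step_name!r}")
    payload = {"execution_mode": "step", "step_name": step_name}
    return urllib.request.Request(
        url=f"{base_url}/api/v0/data_pipeline/tasks/{task_id}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_execute_request(
    "http://localhost:8084", "task_20250703_000820", "ddl_generation"
)
# To actually send it: urllib.request.urlopen(req)
```

Rejecting unknown step names on the client side is a design choice of this sketch; how the server responds to an invalid `step_name` is not described here.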
DDL/MD document generation (ddl_generation)
POST /api/v0/data_pipeline/tasks/{task_id}/execute
{
"execution_mode": "step",
"step_name": "ddl_generation"
}
Example request:
POST: http://localhost:8084/api/v0/data_pipeline/tasks/task_20250703_000820/execute
{
"execution_mode": "step",
"step_name": "ddl_generation"
}
Expected response:
{
"code": 200,
"data": {
"execution_mode": "step",
"message": "任务正在后台执行,请通过状态接口查询进度",
"response": "任务执行已启动",
"step_name": "ddl_generation",
"task_id": "task_20250703_000820"
},
"message": "操作成功",
"success": true
}
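Output files:
- {table_name}.ddl - CREATE TABLE statement with Chinese comments
- {table_name}_detail.md - detailed table structure documentation
- metadata.txt - table structure metadata summary

Next step: provides the base data for qa_generation.

Question-SQL pair generation (qa_generation)
POST /api/v0/data_pipeline/tasks/{task_id}/execute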
{
"execution_mode": "step",
"step_name": "qa_generation"
}
POST http://localhost:8084/api/v0/data_pipeline/tasks/task_20250703_000820/execute
{
"execution_mode": "step",
"step_name": "qa_generation"
}
Expected response:
{
"code": 200,
"data": {
"execution_mode": "step",
"message": "任务正在后台执行,请通过状态接口查询进度",
"response": "任务执行已启动",
"step_name": "qa_generation",
"task_id": "task_20250703_000820"
},
"message": "操作成功",
"success": true
}
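Output files:
- qs_{db_name}_{timestamp}_pair.json - question-SQL pair data file
- qs_{db_name}_{timestamp}_pair.json.backup - automatic backup file (created only if the LLM made automatic corrections)
- metadata_detail.md - business topic analysis details

Prerequisite: the ddl_generation step must have completed successfully.
Next step: provides the SQL to be validated by sql_validation.

SQL validation and repair (sql_validation)
POST /api/v0/data_pipeline/tasks/{task_id}/execute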
{
"execution_mode": "step",
"step_name": "sql_validation"
}
Example request:
POST http://localhost:8084/api/v0/data_pipeline/tasks/task_20250703_000820/execute
{
"execution_mode": "step",
"step_name": "sql_validation"
}
{
"code": 200,
"data": {
"execution_mode": "step",
"message": "任务正在后台执行,请通过状态接口查询进度",
"response": "任务执行已启动",
"step_name": "sql_validation",
"task_id": "task_20250703_000820"
},
"message": "操作成功",
"success": true
}
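Output files:
- sql_validation_{timestamp}_summary.log - validation result summary
- file_modifications_{timestamp}.log - file modification record (if modification is enabled)

Note: the following parameters are set when the task is created, and step-by-step execution uses that configuration. They are currently not exposed in the front-end UI, so the default values are always used.
- enable_sql_validation (boolean): whether to enable SQL validation
- enable_llm_repair (boolean): whether to enable LLM repair
- modify_original_file (boolean): whether to modify the original JSON file

Prerequisite: the qa_generation step must have completed successfully.
Next step: provides validated training data for training_load.

Training data loading (training_load)
POST /api/v0/data_pipeline/tasks/{task_id}/execute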
{
"execution_mode": "step",
"step_name": "training_load"
}
Example request:
POST http://localhost:8084/api/v0/data_pipeline/tasks/task_20250703_000820/execute
{
"execution_mode": "step",
"step_name": "training_load"
}
Response:
{
"code": 200,
"data": {
"execution_mode": "step",
"message": "任务正在后台执行,请通过状态接口查询进度",
"response": "任务执行已启动",
"step_name": "training_load",
"task_id": "task_20250703_000820"
},
"message": "操作成功",
"success": true
}
Training data files:
- DDL files (*.ddl)
- table documentation files (*_detail.md)
- question-SQL pair files (qs_*.json)

API: GET /api/v0/data_pipeline/tasks/{task_id}
Key fields:
{
"step_status": {
"ddl_generation": "completed",
"qa_generation": "running",
"sql_validation": "pending",
"training_load": "pending"
},
"current_step": {
"execution_id": "task_20250702_174000_step_qa_generation_exec_20250702_190410",
"step": "qa_generation",
"status": "running",
"started_at": "2025-07-02T19:04:09.933108"
}
}
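A `step_status` map like the one above can be used to decide which step to run next. The sketch below is illustrative: `next_step` is a hypothetical helper, and it assumes the four steps run in the order listed in this document, with each step requiring its predecessor to be completed. The document states this prerequisite explicitly for qa_generation and sql_validation; for training_load it is an assumption. Steps disabled at task creation are not handled here.

```python
# Step order as listed in this document.
STEP_ORDER = ["ddl_generation", "qa_generation", "sql_validation", "training_load"]


def next_step(step_status):
    """Return the first step that is not completed, or None if all are done.

    Raises RuntimeError if that step is currently running, because
    starting it again would overlap with the execution in progress.
    next_step is a hypothetical helper, not part of the API.
    """
    for name in STEP_ORDER:
        status = step_status.get(name, "pending")
        if status == "completed":
            continue
        if status == "running":
            raise RuntimeError(f"{name} is still running")
        return name  # pending or failed: this is the step to run or re-run
    return None


status = {
    "ddl_generation": "completed",
    "qa_generation": "failed",
    "sql_validation": "pending",
    "training_load": "pending",
}
print(next_step(status))  # qa_generation
```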
Status values:
- pending - waiting to run
- running - currently running
- completed - finished successfully
- failed - execution failed

Example request:
GET http://localhost:8084/api/v0/data_pipeline/tasks/task_20250703_000820
When execution starts: "step_status": { "ddl_generation": "running", ...
When execution finishes: "step_status": { "ddl_generation": "completed", ...
Response when execution is proceeding normally:
{
"code": 200,
"data": {
"completed_at": null,
"created_at": "2025-07-03T00:08:20.129529",
"current_step": {
"execution_id": "task_20250703_000820_step_ddl_generation_exec_20250703_001027",
"started_at": "2025-07-03T00:10:27.281031",
"status": "running",
"step": "ddl_generation"
},
"error_message": null,
"parameters": {
"business_context": "高速公路服务区管理系统",
"db_connection": "postgresql://postgres:postgres@192.168.67.1:6432/highway_db",
"enable_llm_repair": true,
"enable_sql_validation": true,
"enable_training_data_load": true,
"file_upload_mode": true,
"modify_original_file": true,
"table_list_file": "{task_directory}/table_list.txt"
},
"response": "获取任务状态成功",
"result": null,
"started_at": "2025-07-03T00:10:27.273943",
"status": "in_progress",
"step_status": {
"ddl_generation": "running",
"qa_generation": "pending",
"sql_validation": "pending",
"training_load": "pending"
}, ... ...
Response for a failed execution:
{
"code": 200,
"data": {
"completed_at": "2025-07-03T00:28:22.923340",
"created_at": "2025-07-03T00:08:20.129529",
"current_step": null,
"error_message": "文件验证失败: DDL文件数量(14)与表数量(7)不一致",
"parameters": {
"business_context": "高速公路服务区管理系统",
"db_connection": "postgresql://postgres:postgres@192.168.67.1:6432/highway_db",
"enable_llm_repair": true,
"enable_sql_validation": true,
"enable_training_data_load": true,
"file_upload_mode": true,
"modify_original_file": true,
"table_list_file": "{task_directory}/table_list.txt"
},
"response": "获取任务状态成功",
"result": null,
"started_at": "2025-07-03T00:10:27.273943",
"status": "failed",
"step_status": {
"ddl_generation": "completed",
"qa_generation": "failed",
"sql_validation": "pending",
"training_load": "pending"
},
"steps": [
{
"completed_at": "2025-07-03T00:27:35.026604",
"error_message": null,
"started_at": "2025-07-03T00:20:14.120309",
"step_name": "ddl_generation",
"step_status": "completed"
},
{
"completed_at": "2025-07-03T00:28:22.920372",
"error_message": "文件验证失败: DDL文件数量(14)与表数量(7)不一致",
"started_at": "2025-07-03T00:28:22.908887",
"step_name": "qa_generation",
"step_status": "failed"
},
{
"completed_at": null,
"error_message": null,
"started_at": null,
"step_name": "sql_validation",
"step_status": "pending"
},
{
"completed_at": null,
"error_message": null,
"started_at": null,
"step_name": "training_load",
"step_status": "pending"
}
],
"task_id": "task_20250703_000820",
"task_name": "服务区初始化数据加载",
"total_steps": 4
},
"message": "操作成功",
"success": true
}
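Note that the failed response above still carries `"code": 200` and `"success": true`, so a client has to inspect the body to detect the failure. The following Python sketch gathers the error messages from such a response. `collect_errors` is a hypothetical helper, and the field names it reads (`data.error_message`, `data.steps[].error_message`, `data.steps[].step_name`) are taken from the sample response; the sample used here is trimmed to those fields.

```python
def collect_errors(status_response):
    """Return (source, message) pairs for every error in a status response.

    The source is "task" for the task-level error_message, or the step
    name for a step-level one. collect_errors is a hypothetical helper.
    """
    data = status_response.get("data") or {}
    errors = []
    if data.get("error_message"):
        errors.append(("task", data["error_message"]))
    for step in data.get("steps") or []:
        if step.get("error_message"):
            errors.append((step["step_name"], step["error_message"]))
    return errors


# Trimmed copy of the failed response shown above.
sample = {
    "code": 200,
    "success": True,
    "data": {
        "status": "failed",
        "error_message": "文件验证失败: DDL文件数量(14)与表数量(7)不一致",
        "steps": [
            {"step_name": "ddl_generation", "error_message": None},
            {
                "step_name": "qa_generation",
                "error_message": "文件验证失败: DDL文件数量(14)与表数量(7)不一致",
            },
        ],
    },
}
for source, message in collect_errors(sample):
    print(source, message)
```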
API: GET /api/v0/data_pipeline/tasks/{task_id}/logs
Supports filtering by log level and limiting the number of lines. For detailed usage, see Automated Workflow Guide - 4.2 View Task Logs.
API: GET /api/v0/data_pipeline/tasks/{task_id}/files
Lists all files in the current task directory. For detailed usage, see Automated Workflow Guide - 4.3 View and Download Files.
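Once the file names in the task directory are known, they can be checked against the file name patterns used in this document before moving on to the next step. The sketch below is illustrative only: `count_by_pattern` is a hypothetical helper, the file names in the example are made up, and the response format of the files endpoint is not described here, so the function takes a plain list of names. A check like this may help catch the "DDL file count does not match table count" failure shown in the failed status response before qa_generation is started.

```python
from fnmatch import fnmatchcase

# File name patterns mentioned in this document.
PATTERNS = ("*.ddl", "*_detail.md", "qs_*.json")


def count_by_pattern(file_names, patterns=PATTERNS):
    """Count how many file names match each pattern (case-sensitive)."""
    return {
        pattern: sum(fnmatchcase(name, pattern) for name in file_names)
        for pattern in patterns
    }


# Hypothetical file names, for illustration only.
names = [
    "table_a.ddl",
    "table_a_detail.md",
    "table_b.ddl",
    "table_b_detail.md",
    "metadata.txt",
    "qs_db_20250703_pair.json",
    "qs_db_20250703_pair.json.backup",
]
counts = count_by_pattern(names)
table_count = 2  # number of tables submitted in the table list (assumed)
print(counts["*.ddl"] == table_count)  # True
```

The `.backup` file is not counted under `qs_*.json` because the pattern must match the whole name.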
# 1. Run DDL generation
curl -X POST http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000/execute \
-H "Content-Type: application/json" \
-d '{
"execution_mode": "step",
"step_name": "ddl_generation"
}'
# 2. Monitor the execution status
curl -X GET http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000
# 3. Check whether the generated DDL files are satisfactory
curl -X GET http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000/files
# 4. If they are, continue with Q&A generation
curl -X POST http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000/execute \
-H "Content-Type: application/json" \
-d '{
"execution_mode": "step",
"step_name": "qa_generation"
}'
# 5. Repeat the monitoring and checking process...
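# If the result of a step is not satisfactory, the step can be executed again
# Note: re-executing a step overwrites that step's output files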
curl -X POST http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000/execute \
-H "Content-Type: application/json" \
-d '{
"execution_mode": "step",
"step_name": "qa_generation"
}'
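The execute-then-monitor pattern from the curl walkthrough above can be sketched as a small Python loop. This is an illustration under stated assumptions, not the project's client: `run_step_and_wait`, its parameters, and the polling defaults are hypothetical. The two HTTP calls are passed in as callables (`execute` for POST .../execute, `get_status` for GET .../tasks/{task_id}), each returning the decoded JSON body, so the loop itself stays independent of any HTTP library.

```python
import time


def run_step_and_wait(execute, get_status, step_name,
                      poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Start one step, then poll until it is completed or failed.

    execute(step_name) and get_status() are caller-supplied callables
    that perform the HTTP requests and return the decoded JSON body.
    Returns the final state, "completed" or "failed".
    """
    started = execute(step_name)
    if not started.get("success"):
        raise RuntimeError(
            f"could not start {step_name}: {started.get('message')}"
        )
    for _ in range(max_polls):
        state = get_status()["data"]["step_status"][step_name]
        if state in ("completed", "failed"):
            return state
        sleep(poll_seconds)  # still pending or running: wait and retry
    raise TimeoutError(f"{step_name} did not finish after {max_polls} polls")
```

One caveat for re-execution: if the status endpoint still reports the previous run's `completed` state right after the new POST, this loop would return too early. Whether that can happen is not stated in this document, so a cautious caller may want to compare `current_step.execution_id` before and after as well.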
If some steps were disabled when the task was created (for example, enable_sql_validation: false), the system skips them automatically.
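A client that drives the steps manually can apply the same rule to plan its calls. The sketch below is a guess at the mapping, not a documented contract: it assumes `enable_sql_validation` controls sql_validation and `enable_training_data_load` controls training_load (both parameter names appear in the status response's `parameters` object), and that a missing flag means the step is enabled. `steps_to_run` is a hypothetical helper.

```python
STEP_ORDER = ["ddl_generation", "qa_generation", "sql_validation", "training_load"]

# Assumed mapping from a step to the task parameter that can disable it.
STEP_FLAGS = {
    "sql_validation": "enable_sql_validation",
    "training_load": "enable_training_data_load",
}


def steps_to_run(parameters):
    """Return the steps, in order, that are not disabled by the parameters."""
    steps = []
    for step in STEP_ORDER:
        flag = STEP_FLAGS.get(step)
        # Steps without a flag always run; a missing flag counts as enabled.
        if flag is None or parameters.get(flag, True):
            steps.append(step)
    return steps


print(steps_to_run({"enable_sql_validation": False}))
# ['ddl_generation', 'qa_generation', 'training_load']
```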
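Detailed error information is available from the following sources:
- the error_message field
- ERROR-level log entries
- the error_message of each step in the steps array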
When a step is re-executed, its old output files are cleaned up automatically; files from other steps are kept.
Generated files can be replaced manually through the file upload API (POST /api/v0/data_pipeline/tasks/{task_id}/files).
The system automatically creates backups of key files:
- qs_*.json.backup - backup of the question-SQL pair data