
Found a database connection bug; preparing to modify the singleton pattern for the data_pipeline module.

wangxq, 1 week ago
Commit d6ffe2fac0
62 changed files with 5040 additions and 72 deletions
  1. api_usage_examples.md (+233 -0)
  2. citu_app.py (+307 -3)
  3. data_pipeline/api/simple_db_manager.py (+34 -7)
  4. data_pipeline/api/simple_file_manager.py (+157 -1)
  5. data_pipeline/api/table_inspector_api.py (+315 -0)
  6. data_pipeline/config.py (+1 -1)
  7. data_pipeline/sql/init_tables.sql (+3 -0)
  8. data_pipeline/tools/base.py (+14 -22)
  9. data_pipeline/tools/comment_generator.py (+20 -3)
  10. data_pipeline/tools/data_sampler.py (+11 -8)
  11. data_pipeline/tools/ddl_generator.py (+43 -27)
  12. data_pipeline/training_data/task_20250702_144901/table_list.txt (+7 -0)
  13. data_pipeline/training_data/task_20250702_174000/bss_business_day_data.ddl (+31 -0)
  14. data_pipeline/training_data/task_20250702_174000/bss_business_day_data_detail.md (+31 -0)
  15. data_pipeline/training_data/task_20250702_174000/bss_car_day_count.ddl (+17 -0)
  16. data_pipeline/training_data/task_20250702_174000/bss_car_day_count_detail.md (+17 -0)
  17. data_pipeline/training_data/task_20250702_174000/bss_company.ddl (+15 -0)
  18. data_pipeline/training_data/task_20250702_174000/bss_company_detail.md (+15 -0)
  19. data_pipeline/training_data/task_20250702_174000/bss_section_route.ddl (+16 -0)
  20. data_pipeline/training_data/task_20250702_174000/bss_section_route_area_link.ddl (+7 -0)
  21. data_pipeline/training_data/task_20250702_174000/bss_section_route_area_link_detail.md (+7 -0)
  22. data_pipeline/training_data/task_20250702_174000/bss_section_route_detail.md (+16 -0)
  23. data_pipeline/training_data/task_20250702_174000/bss_service_area.ddl (+19 -0)
  24. data_pipeline/training_data/task_20250702_174000/bss_service_area_detail.md (+19 -0)
  25. data_pipeline/training_data/task_20250702_174000/bss_service_area_mapper.ddl (+18 -0)
  26. data_pipeline/training_data/task_20250702_174000/bss_service_area_mapper_detail.md (+18 -0)
  27. data_pipeline/training_data/task_20250702_174000/db_query_decision_prompt.txt (+1 -0)
  28. data_pipeline/training_data/task_20250702_174000/filename_mapping.txt (+10 -0)
  29. data_pipeline/training_data/task_20250702_174000/metadata.txt (+62 -0)
  30. data_pipeline/training_data/task_20250702_174000/metadata_detail.md (+20 -0)
  31. data_pipeline/training_data/task_20250702_174000/qs_highway_db_20250702_191655_pair.json (+190 -0)
  32. data_pipeline/training_data/task_20250702_174000/qs_highway_db_20250702_191655_pair.json.backup (+202 -0)
  33. data_pipeline/training_data/task_20250702_174000/table_list.txt (+11 -0)
  34. data_pipeline/training_data/task_20250702_174000/task_config.json (+15 -0)
  35. data_pipeline/training_data/task_20250702_174000/task_result.json (+117 -0)
  36. data_pipeline/training_data/task_20250702_194611/bss_business_day_data.ddl (+31 -0)
  37. data_pipeline/training_data/task_20250702_194611/bss_business_day_data_detail.md (+31 -0)
  38. data_pipeline/training_data/task_20250702_194611/bss_car_day_count.ddl (+17 -0)
  39. data_pipeline/training_data/task_20250702_194611/bss_car_day_count_detail.md (+17 -0)
  40. data_pipeline/training_data/task_20250702_194611/bss_company.ddl (+15 -0)
  41. data_pipeline/training_data/task_20250702_194611/bss_company_detail.md (+15 -0)
  42. data_pipeline/training_data/task_20250702_194611/bss_section_route.ddl (+16 -0)
  43. data_pipeline/training_data/task_20250702_194611/bss_section_route_area_link.ddl (+7 -0)
  44. data_pipeline/training_data/task_20250702_194611/bss_section_route_area_link_detail.md (+7 -0)
  45. data_pipeline/training_data/task_20250702_194611/bss_section_route_detail.md (+16 -0)
  46. data_pipeline/training_data/task_20250702_194611/bss_service_area.ddl (+19 -0)
  47. data_pipeline/training_data/task_20250702_194611/bss_service_area_detail.md (+19 -0)
  48. data_pipeline/training_data/task_20250702_194611/bss_service_area_mapper.ddl (+18 -0)
  49. data_pipeline/training_data/task_20250702_194611/bss_service_area_mapper_detail.md (+18 -0)
  50. data_pipeline/training_data/task_20250702_194611/db_query_decision_prompt.txt (+45 -0)
  51. data_pipeline/training_data/task_20250702_194611/filename_mapping.txt (+10 -0)
  52. data_pipeline/training_data/task_20250702_194611/metadata.txt (+62 -0)
  53. data_pipeline/training_data/task_20250702_194611/metadata_detail.md (+20 -0)
  54. data_pipeline/training_data/task_20250702_194611/qs_highway_db_20250702_200305_pair.json (+194 -0)
  55. data_pipeline/training_data/task_20250702_194611/qs_highway_db_20250702_200305_pair.json.backup (+202 -0)
  56. data_pipeline/training_data/task_20250702_194611/table_list.txt (+11 -0)
  57. data_pipeline/training_data/task_20250702_194611/task_config.json (+15 -0)
  58. data_pipeline/training_data/task_20250702_194611/task_result.json (+117 -0)
  59. docs/data_pipeline_api_auto_workflow_guide.md (+844 -0)
  60. docs/data_pipeline_api_workflow_guide.md (+271 -0)
  61. docs/database_table_api_guide.md (+615 -0)
  62. test_table_inspector_api.py (+369 -0)

+ 233 - 0
api_usage_examples.md

@@ -0,0 +1,233 @@
+# Table Inspection API Usage Guide
+
+This document describes how to use the newly developed database table inspection APIs.
+
+## 📋 API Overview
+
+### 1. Get table list
+- **Path**: `POST /api/v0/database/tables`
+- **Purpose**: Retrieve the list of tables in a database
+
+### 2. Get table DDL/documentation
+- **Path**: `POST /api/v0/database/table/ddl`
+- **Purpose**: Retrieve a table's DDL statement or Markdown documentation
+
+## 🔧 API 1: Get Table List
+
+### Request example
+
+```bash
+curl -X POST http://localhost:8084/api/v0/database/tables \
+  -H "Content-Type: application/json" \
+  -d '{
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+    "schema": "public,ods"
+  }'
+```
+
+### Parameters
+
+| Parameter | Type | Required | Description |
+|------|------|------|------|
+| db_connection | string | ✅ | Full PostgreSQL connection string |
+| schema | string | ❌ | Schema(s) to query; multiple values may be comma-separated; defaults to public |
+
+### Response example
+
+```json
+{
+  "success": true,
+  "code": 200,
+  "message": "获取表列表成功",
+  "data": {
+    "tables": [
+      "public.bss_company",
+      "public.bss_branch_copy",
+      "ods.raw_data"
+    ],
+    "total": 3,
+    "schemas": ["public", "ods"],
+    "db_connection_info": {
+      "database": "highway_db"
+    }
+  }
+}
+```
+
+## 📄 API 2: Get Table DDL/Documentation
+
+### Request example
+
+#### DDL format
+```bash
+curl -X POST http://localhost:8084/api/v0/database/table/ddl \
+  -H "Content-Type: application/json" \
+  -d '{
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+    "table": "public.bss_company",
+    "business_context": "高速公路服务区管理系统",
+    "type": "ddl"
+  }'
+```
+
+#### Markdown format
+```bash
+curl -X POST http://localhost:8084/api/v0/database/table/ddl \
+  -H "Content-Type: application/json" \
+  -d '{
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+    "table": "public.bss_company",
+    "business_context": "高速公路服务区管理系统",
+    "type": "md"
+  }'
+```
+
+#### Both DDL and Markdown
+```bash
+curl -X POST http://localhost:8084/api/v0/database/table/ddl \
+  -H "Content-Type: application/json" \
+  -d '{
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+    "table": "public.bss_company",
+    "business_context": "高速公路服务区管理系统",
+    "type": "both"
+  }'
+```
+
+### Parameters
+
+| Parameter | Type | Required | Description |
+|------|------|------|------|
+| db_connection | string | ✅ | Full PostgreSQL connection string |
+| table | string | ✅ | Table name in schema.tablename format |
+| business_context | string | ❌ | Business context description, used by the LLM to generate more accurate comments |
+| type | string | ❌ | Output type: ddl/md/both; defaults to ddl |
+
+### Response example
+
+```json
+{
+  "success": true,
+  "code": 200,
+  "message": "获取表DDL成功",
+  "data": {
+    "ddl": "-- 中文名: 服务区档口基础信息表\n-- 描述: 服务区档口基础信息表...\ncreate table public.bss_company (\n  id varchar(32) not null     -- 主键ID,\n  ...\n);",
+    "md": "## bss_company(服务区档口基础信息表)\n...",
+    "table_info": {
+      "table_name": "bss_company",
+      "schema_name": "public",
+      "full_name": "public.bss_company",
+      "comment": "服务区档口基础信息表",
+      "field_count": 15,
+      "row_count": 1000,
+      "table_size": "256 kB"
+    },
+    "fields": [
+      {
+        "name": "id",
+        "type": "varchar",
+        "nullable": false,
+        "comment": "主键ID",
+        "is_primary_key": true,
+        "is_foreign_key": false,
+        "default_value": null,
+        "is_enum": false,
+        "enum_values": []
+      }
+    ],
+    "generation_info": {
+      "business_context": "高速公路服务区管理系统",
+      "output_type": "both",
+      "has_llm_comments": true,
+      "database": "highway_db"
+    }
+  }
+}
+```
+
+## 🚀 Features
+
+### LLM-generated comments
+- When `business_context` is provided, the system calls an LLM to generate comments
+- The LLM combines the table structure, sample data, and business context to produce accurate Chinese comments
+- Enum-like fields are detected automatically, along with their possible values
+
+### Multiple output formats
+- **DDL**: a standard CREATE TABLE statement, including comments
+- **MD**: Markdown-formatted table documentation, suitable for documentation systems
+- **Both**: DDL and Markdown in one response
+
+### Performance
+- Reuses the existing `data_pipeline` module (90%+ code reuse)
+- Asynchronous processing with support for concurrent requests
+- Caching to avoid repeated computation
+
+## 🧪 Testing
+
+Run the test script:
+```bash
+python test_table_inspector_api.py
+```
+
+The test script covers:
+- The table-list API with various parameter combinations
+- The DDL/MD generation API
+- Error handling
+- Performance benchmarks
+
+## ⚠️ Notes
+
+1. **Connection string**: must contain the complete database information
+2. **LLM calls**: providing `business_context` triggers an LLM call, which increases response time
+3. **Permissions**: read access to the database is required
+4. **Timeouts**: DDL generation involves an LLM call; set a timeout of 60 seconds or more
+
+## 🔗 Integration Examples
+
+### JavaScript / frontend
+```javascript
+// fetch the table list
+const tables = await fetch('/api/v0/database/tables', {
+  method: 'POST',
+  headers: { 'Content-Type': 'application/json' },
+  body: JSON.stringify({
+    db_connection: 'postgresql://user:pass@host:5432/db',
+    schema: 'public'
+  })
+}).then(r => r.json());
+
+// fetch the table DDL
+const ddl = await fetch('/api/v0/database/table/ddl', {
+  method: 'POST',
+  headers: { 'Content-Type': 'application/json' },
+  body: JSON.stringify({
+    db_connection: 'postgresql://user:pass@host:5432/db',
+    table: 'public.users',
+    business_context: '用户管理系统',
+    type: 'both'
+  })
+}).then(r => r.json());
+```
+
+### Python
+```python
+import requests
+
+# fetch the table list
+response = requests.post('http://localhost:8084/api/v0/database/tables', 
+  json={
+    'db_connection': 'postgresql://user:pass@host:5432/db',
+    'schema': 'public'
+  })
+tables = response.json()
+
+# fetch the table DDL
+response = requests.post('http://localhost:8084/api/v0/database/table/ddl',
+  json={
+    'db_connection': 'postgresql://user:pass@host:5432/db', 
+    'table': 'public.users',
+    'business_context': '用户管理系统',
+    'type': 'ddl'
+  })
+ddl = response.json()
+``` 
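The `type` values documented above can be validated client-side before the request is sent; a minimal sketch (the helper name `build_ddl_request` is illustrative, not part of the API):

```python
VALID_TYPES = {"ddl", "md", "both"}

def build_ddl_request(db_connection: str, table: str,
                      business_context: str = "", output_type: str = "ddl") -> dict:
    # Compose a body for POST /api/v0/database/table/ddl, enforcing the
    # documented type values before the request is sent.
    if output_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    body = {"db_connection": db_connection, "table": table, "type": output_type}
    if business_context:
        # optional; including it triggers LLM comment generation server-side
        body["business_context"] = business_context
    return body
```

Rejecting an invalid `type` locally avoids a round trip that would end in a 400 response.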

+ 307 - 3
citu_app.py

@@ -2802,6 +2802,7 @@ def create_data_pipeline_task():
             business_context=req.get('business_context'),
             db_name=req.get('db_name'),  # optional: a specific database name
             db_connection=req.get('db_connection'),  # optional: a database connection string
+            task_name=req.get('task_name'),  # optional: a task name
             enable_sql_validation=req.get('enable_sql_validation', True),
             enable_llm_repair=req.get('enable_llm_repair', True),
             modify_original_file=req.get('modify_original_file', True),
@@ -2813,6 +2814,7 @@ def create_data_pipeline_task():
         
         response_data = {
             "task_id": task_id,
+            "task_name": task_info.get('task_name'),
             "status": task_info.get('status'),
             "created_at": task_info.get('created_at').isoformat() if task_info.get('created_at') else None
         }
@@ -2985,6 +2987,7 @@ def get_data_pipeline_task_status(task_id):
         
         response_data = {
             "task_id": task_info['task_id'],
+            "task_name": task_info.get('task_name'),
             "status": task_info['status'],
             "step_status": step_status_summary,
             "created_at": task_info['created_at'].isoformat() if task_info.get('created_at') else None,
@@ -3156,15 +3159,16 @@ def list_data_pipeline_tasks():
         formatted_tasks = []
         for task in tasks:
             formatted_tasks.append({
-                "task_id": task.get('id'),
+                "task_id": task.get('task_id'),
+                "task_name": task.get('task_name'),
                 "status": task.get('status'),
                 "step_status": task.get('step_status'),
                 "created_at": task['created_at'].isoformat() if task.get('created_at') else None,
                 "started_at": task['started_at'].isoformat() if task.get('started_at') else None,
                 "completed_at": task['completed_at'].isoformat() if task.get('completed_at') else None,
-                "created_by": task.get('created_by'),
+                "created_by": task.get('by_user'),
                 "db_name": task.get('db_name'),
-                "business_context": task.get('business_context')
+                "business_context": task.get('parameters', {}).get('business_context') if task.get('parameters') else None
             })
         
         response_data = {
@@ -3185,6 +3189,197 @@ def list_data_pipeline_tasks():
             response_text="获取任务列表失败,请稍后重试"
         )), 500
 
+# ==================== Table inspection API endpoints ====================
+
+import asyncio
+from data_pipeline.api.table_inspector_api import TableInspectorAPI
+
+@app.flask_app.route('/api/v0/database/tables', methods=['POST'])
+def get_database_tables():
+    """
+    Get the list of tables in a database
+    
+    Request body:
+    {
+        "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",  // optional; defaults to the configured database
+        "schema": "public,ods"  // optional; multiple schemas comma-separated; defaults to public
+    }
+    
+    Response:
+    {
+        "success": true,
+        "code": 200,
+        "message": "获取表列表成功",
+        "data": {
+            "tables": ["public.table1", "public.table2", "ods.table3"],
+            "total": 3,
+            "schemas": ["public", "ods"]
+        }
+    }
+    """
+    try:
+        req = request.get_json(force=True)
+        
+        # handle the optional db_connection parameter
+        db_connection = req.get('db_connection')
+        if not db_connection:
+            # fall back to the default database config from app_config
+            import app_config
+            db_params = app_config.APP_DB_CONFIG
+            db_connection = f"postgresql://{db_params['user']}:{db_params['password']}@{db_params['host']}:{db_params['port']}/{db_params['dbname']}"
+            logger.info("使用默认数据库配置获取表列表")
+        else:
+            logger.info("使用用户指定的数据库配置获取表列表")
+        
+        # optional parameter
+        schema = req.get('schema', '')
+        
+        # create a table inspector instance
+        table_inspector = TableInspectorAPI()
+        
+        # run the async method via asyncio
+        async def get_tables():
+            return await table_inspector.get_tables_list(db_connection, schema)
+        
+        # run it in a fresh event loop
+        try:
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+            tables = loop.run_until_complete(get_tables())
+        finally:
+            loop.close()
+        
+        # parse the schema info
+        parsed_schemas = table_inspector._parse_schemas(schema)
+        
+        response_data = {
+            "tables": tables,
+            "total": len(tables),
+            "schemas": parsed_schemas,
+            "db_connection_info": {
+                "database": db_connection.split('/')[-1].split('?')[0] if '/' in db_connection else "unknown"
+            }
+        }
+        
+        return jsonify(success_response(
+            response_text="获取表列表成功",
+            data=response_data
+        )), 200
+        
+    except Exception as e:
+        logger.error(f"获取数据库表列表失败: {str(e)}")
+        return jsonify(internal_error_response(
+            response_text=f"获取表列表失败: {str(e)}"
+        )), 500
+
+@app.flask_app.route('/api/v0/database/table/ddl', methods=['POST'])
+def get_table_ddl():
+    """
+    Get a table's DDL statement or Markdown documentation
+    
+    Request body:
+    {
+        "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",  // optional; defaults to the configured database
+        "table": "public.test",
+        "business_context": "这是高速公路服务区的相关数据",  // optional
+        "type": "ddl"  // optional; ddl/md/both, defaults to ddl
+    }
+    
+    Response:
+    {
+        "success": true,
+        "code": 200,
+        "message": "获取表DDL成功",
+        "data": {
+            "ddl": "create table public.test (...);",
+            "md": "## test表...",  // only returned when type is md or both
+            "table_info": {
+                "table_name": "test",
+                "schema_name": "public",
+                "full_name": "public.test",
+                "comment": "测试表",
+                "field_count": 10,
+                "row_count": 1000
+            },
+            "fields": [...]
+        }
+    }
+    """
+    try:
+        req = request.get_json(force=True)
+        
+        # parse parameters (table is required; db_connection is optional)
+        table = req.get('table')
+        db_connection = req.get('db_connection')
+        
+        if not table:
+            return jsonify(bad_request_response(
+                response_text="缺少必需参数:table",
+                missing_params=['table']
+            )), 400
+        
+        if not db_connection:
+            # fall back to the default database config from app_config
+            import app_config
+            db_params = app_config.APP_DB_CONFIG
+            db_connection = f"postgresql://{db_params['user']}:{db_params['password']}@{db_params['host']}:{db_params['port']}/{db_params['dbname']}"
+            logger.info("使用默认数据库配置获取表DDL")
+        else:
+            logger.info("使用用户指定的数据库配置获取表DDL")
+        
+        # optional parameters
+        business_context = req.get('business_context', '')
+        output_type = req.get('type', 'ddl')
+        
+        # validate the type parameter
+        valid_types = ['ddl', 'md', 'both']
+        if output_type not in valid_types:
+            return jsonify(bad_request_response(
+                response_text=f"无效的type参数: {output_type},支持的值: {valid_types}",
+                invalid_params=['type']
+            )), 400
+        
+        # create a table inspector instance
+        table_inspector = TableInspectorAPI()
+        
+        # run the async method via asyncio
+        async def get_ddl():
+            return await table_inspector.get_table_ddl(
+                db_connection=db_connection,
+                table=table,
+                business_context=business_context,
+                output_type=output_type
+            )
+        
+        # run it in a fresh event loop
+        try:
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+            result = loop.run_until_complete(get_ddl())
+        finally:
+            loop.close()
+        
+        response_data = {
+            **result,
+            "generation_info": {
+                "business_context": business_context,
+                "output_type": output_type,
+                "has_llm_comments": bool(business_context),
+                "database": db_connection.split('/')[-1].split('?')[0] if '/' in db_connection else "unknown"
+            }
+        }
+        
+        return jsonify(success_response(
+            response_text=f"获取表{output_type.upper()}成功",
+            data=response_data
+        )), 200
+        
+    except Exception as e:
+        logger.error(f"获取表DDL失败: {str(e)}")
+        return jsonify(internal_error_response(
+            response_text=f"获取表{output_type.upper() if 'output_type' in locals() else 'DDL'}失败: {str(e)}"
+        )), 500
+
 # ==================== Data Pipeline file management API ====================
 
 from flask import send_file
@@ -3425,5 +3620,114 @@ def get_table_list_info(task_id):
             response_text="获取表清单文件信息失败,请稍后重试"
         )), 500
 
+@app.flask_app.route('/api/v0/data_pipeline/tasks/<task_id>/table-list', methods=['POST'])
+def create_table_list_from_names(task_id):
+    """
+    通过POST方式提交表名列表并创建table_list.txt文件
+    
+    Request body:
+    {
+        "tables": ["table1", "schema.table2", "table3"]
+    }
+    or:
+    {
+        "tables": "table1,schema.table2,table3"
+    }
+    
+    Response:
+    {
+        "success": true,
+        "code": 200,
+        "message": "表清单已成功创建",
+        "data": {
+            "task_id": "task_20250701_123456",
+            "filename": "table_list.txt",
+            "table_count": 3,
+            "file_size": 45,
+            "file_size_formatted": "45 B",
+            "created_time": "2025-07-01T12:34:56"
+        }
+    }
+    """
+    try:
+        # verify that the task exists
+        manager = get_data_pipeline_manager()
+        task_info = manager.get_task_status(task_id)
+        if not task_info:
+            return jsonify(not_found_response(
+                response_text=f"任务不存在: {task_id}"
+            )), 404
+        
+        # read the request payload
+        req = request.get_json(force=True)
+        tables_param = req.get('tables')
+        
+        if not tables_param:
+            return jsonify(bad_request_response(
+                response_text="缺少必需参数:tables",
+                missing_params=['tables']
+            )), 400
+        
+        # accept both formats of the tables parameter
+        try:
+            if isinstance(tables_param, str):
+                # comma-separated string
+                table_names = [name.strip() for name in tables_param.split(',') if name.strip()]
+            elif isinstance(tables_param, list):
+                # array format
+                table_names = [str(name).strip() for name in tables_param if str(name).strip()]
+            else:
+                return jsonify(bad_request_response(
+                    response_text="tables参数格式错误,应为字符串(逗号分隔)或数组"
+                )), 400
+            
+            if not table_names:
+                return jsonify(bad_request_response(
+                    response_text="表名列表不能为空"
+                )), 400
+                
+        except Exception as e:
+            return jsonify(bad_request_response(
+                response_text=f"解析tables参数失败: {str(e)}"
+            )), 400
+        
+        try:
+            # create the table list file via the file manager
+            file_manager = get_data_pipeline_file_manager()
+            result = file_manager.create_table_list_from_names(task_id, table_names)
+            
+            response_data = {
+                "task_id": task_id,
+                "filename": result["filename"],
+                "table_count": result["table_count"],
+                "unique_table_count": result["unique_table_count"],
+                "file_size": result["file_size"],
+                "file_size_formatted": result["file_size_formatted"],
+                "created_time": result["created_time"].isoformat() if result.get("created_time") else None,
+                "original_count": len(table_names) if isinstance(table_names, list) else len(tables_param.split(','))
+            }
+            
+            return jsonify(success_response(
+                response_text=f"表清单已成功创建,包含 {result['table_count']} 个表",
+                data=response_data
+            )), 200
+            
+        except ValueError as e:
+            # table name validation error (bad format, count limit, etc.)
+            return jsonify(bad_request_response(
+                response_text=str(e)
+            )), 400
+        except Exception as e:
+            logger.error(f"创建表清单文件失败: {str(e)}")
+            return jsonify(internal_error_response(
+                response_text="创建表清单文件失败,请稍后重试"
+            )), 500
+        
+    except Exception as e:
+        logger.error(f"处理表清单创建请求失败: {str(e)}")
+        return jsonify(internal_error_response(
+            response_text="处理请求失败,请稍后重试"
+        )), 500
+
 logger.info("正在启动Flask应用: http://localhost:8084")
 app.run(host="0.0.0.0", port=8084, debug=True)
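Both table-inspection handlers above run a coroutine from synchronous Flask code by spinning up a fresh event loop and closing it in a `finally` block. The pattern in isolation, with a stand-in coroutine (the real `TableInspectorAPI` call needs a database, so `fake_get_tables` here is hypothetical):

```python
import asyncio

async def fake_get_tables() -> list:
    # stand-in for table_inspector.get_tables_list(); the real call
    # queries PostgreSQL through a connection pool
    await asyncio.sleep(0)
    return ["public.bss_company"]

def run_in_fresh_loop(coro):
    # mirror of the handlers above: new loop per call, always closed
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()

print(run_in_fresh_loop(fake_get_tables()))
```

On Python 3.7+, `asyncio.run()` provides the same create-run-close semantics in a single call.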

+ 34 - 7
data_pipeline/api/simple_db_manager.py

@@ -58,6 +58,7 @@ class SimpleTaskManager:
                    business_context: str = None,
                    db_name: str = None,
                    db_connection: str = None,
+                   task_name: str = None,
                    **kwargs) -> str:
         """Create a new task"""
         task_id = self.generate_task_id()
@@ -102,11 +103,12 @@ class SimpleTaskManager:
                 # insert the task record
                 cursor.execute("""
                     INSERT INTO data_pipeline_tasks (
-                        task_id, task_type, status, parameters, created_type, 
+                        task_id, task_name, task_type, status, parameters, created_type, 
                         by_user, db_name, output_directory
-                    ) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
+                    ) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
                 """, (
                     task_id, 
+                    task_name,
                     'data_workflow', 
                     'pending', 
                     Json(parameters),
@@ -301,8 +303,6 @@ class SimpleTaskManager:
             self.logger.error(f"获取步骤状态失败: {e}")
             raise
     
-
-    
     def get_tasks_list(self, limit: int = 50, offset: int = 0, status_filter: Optional[str] = None) -> List[Dict[str, Any]]:
         """Get the task list"""
         try:
@@ -312,15 +312,42 @@ class SimpleTaskManager:
                 params = []
                 
                 if status_filter:
-                    where_clause = "WHERE status = %s"
+                    where_clause = "WHERE t.status = %s"
                     params.append(status_filter)
                 
                 params.extend([limit, offset])
                 
+                # join with task steps to aggregate step status (excluding the result field)
                 cursor.execute(f"""
-                    SELECT * FROM data_pipeline_tasks 
+                    SELECT 
+                        t.task_id,
+                        t.task_name,
+                        t.task_type,
+                        t.status,
+                        t.parameters,
+                        t.error_message,
+                        t.created_at,
+                        t.started_at,
+                        t.completed_at,
+                        t.created_type,
+                        t.by_user,
+                        t.output_directory,
+                        t.db_name,
+                        CASE 
+                            WHEN COUNT(s.step_name) = 0 THEN NULL
+                            WHEN COUNT(s.step_name) FILTER (WHERE s.step_status = 'failed') > 0 THEN 'failed'
+                            WHEN COUNT(s.step_name) FILTER (WHERE s.step_status = 'running') > 0 THEN 'running'
+                            WHEN COUNT(s.step_name) FILTER (WHERE s.step_status = 'completed') = COUNT(s.step_name) THEN 'all_completed'
+                            WHEN COUNT(s.step_name) FILTER (WHERE s.step_status = 'completed') > 0 THEN 'partial_completed'
+                            ELSE 'pending'
+                        END as step_status
+                    FROM data_pipeline_tasks t
+                    LEFT JOIN data_pipeline_task_steps s ON t.task_id = s.task_id
                     {where_clause}
-                    ORDER BY created_at DESC 
+                    GROUP BY t.task_id, t.task_name, t.task_type, t.status, t.parameters, t.error_message, 
+                             t.created_at, t.started_at, t.completed_at, t.created_type, t.by_user, 
+                             t.output_directory, t.db_name
+                    ORDER BY t.created_at DESC 
                     LIMIT %s OFFSET %s
                 """, params)
                 

+ 157 - 1
data_pipeline/api/simple_file_manager.py

@@ -513,4 +513,160 @@ class SimpleFileManager:
             return {
                 "exists": False,
                 "error": str(e)
-            }
+            }
+    
+    def create_table_list_from_names(self, task_id: str, table_names: List[str]) -> Dict[str, Any]:
+        """
+        Create a table_list.txt file from a list of table names
+        
+        Args:
+            task_id: task ID
+            table_names: list of table names
+        
+        Returns:
+            Dict: creation result with filename, table_count, file_size, etc.
+        
+        Raises:
+            ValueError: table name validation failed (bad format, empty list, etc.)
+            IOError: file write failed
+        """
+        try:
+            # load configuration
+            from data_pipeline.config import SCHEMA_TOOLS_CONFIG
+            upload_config = SCHEMA_TOOLS_CONFIG.get("file_upload", {})
+            target_filename = upload_config.get("target_filename", "table_list.txt")
+            max_lines = upload_config.get("max_lines", 1000)
+            min_lines = upload_config.get("min_lines", 1)
+            
+            # validate input
+            if not table_names:
+                raise ValueError("表名列表不能为空")
+            
+            if not isinstance(table_names, list):
+                raise ValueError("表名必须是列表格式")
+            
+            # process and validate the table names
+            processed_tables = self._process_table_names(table_names)
+            
+            # enforce count limits
+            if len(processed_tables) < min_lines:
+                raise ValueError(f"表名数量不能少于 {min_lines} 个")
+            
+            if len(processed_tables) > max_lines:
+                raise ValueError(f"表名数量不能超过 {max_lines} 个")
+            
+            # ensure the task directory exists
+            task_dir = self.get_task_directory(task_id)
+            if not task_dir.exists():
+                task_dir.mkdir(parents=True, exist_ok=True)
+                self.logger.info(f"创建任务目录: {task_dir}")
+            
+            # resolve the target file path
+            target_file_path = task_dir / target_filename
+            
+            # generate the file content
+            file_content = self._generate_table_list_content(processed_tables)
+            
+            # write the file (overwrite mode)
+            with open(target_file_path, 'w', encoding='utf-8') as f:
+                f.write(file_content)
+            
+            # verify the file was written
+            if not target_file_path.exists():
+                raise IOError("文件创建失败")
+            
+            # collect file info
+            file_stat = target_file_path.stat()
+            created_time = datetime.fromtimestamp(file_stat.st_mtime)
+            
+            self.logger.info(f"成功创建表清单文件到任务 {task_id}: {target_file_path} ({len(processed_tables)} 个表)")
+            
+            return {
+                "filename": target_filename,
+                "table_count": len(processed_tables),
+                "unique_table_count": len(set(processed_tables)),
+                "file_size": file_stat.st_size,
+                "file_size_formatted": self._format_file_size(file_stat.st_size),
+                "created_time": created_time,
+                "target_path": str(target_file_path)
+            }
+            
+        except Exception as e:
+            self.logger.error(f"创建表清单文件失败: {e}")
+            raise
+    
+    def _process_table_names(self, table_names: List[str]) -> List[str]:
+        """
+        Process the table name list: validate format, de-duplicate, preserve input order
+        
+        Args:
+            table_names: raw table name list
+            
+        Returns:
+            List[str]: processed table name list
+            
+        Raises:
+            ValueError: table name format validation failed
+        """
+        processed_tables = []
+        invalid_tables = []
+        
+        for table_name in table_names:
+            # strip whitespace
+            table_name = table_name.strip()
+            
+            # skip empty strings
+            if not table_name:
+                continue
+            
+            # skip comment lines
+            if table_name.startswith('#') or table_name.startswith('--'):
+                continue
+            
+            # validate the table name format
+            if self._is_valid_table_name(table_name):
+                processed_tables.append(table_name)
+            else:
+                invalid_tables.append(table_name)
+        
+        # raise if any names were invalid
+        if invalid_tables:
+            raise ValueError(f"包含无效的表名格式: {', '.join(invalid_tables[:5])}")
+        
+        # de-duplicate while preserving order
+        seen = set()
+        unique_tables = []
+        for table in processed_tables:
+            if table not in seen:
+                seen.add(table)
+                unique_tables.append(table)
+        
+        return unique_tables
+    
+    def _generate_table_list_content(self, table_names: List[str]) -> str:
+        """
+        Generate the content of table_list.txt
+        
+        Args:
+            table_names: table name list
+            
+        Returns:
+            str: file content
+        """
+        lines = []
+        
+        # header comments
+        lines.append("# 表清单文件")
+        lines.append(f"# 生成时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+        lines.append(f"# 表数量: {len(table_names)}")
+        lines.append("")
+        
+        # 添加表名
+        for table_name in table_names:
+            lines.append(table_name)
+        
+        # 确保文件以换行符结束
+        if lines and lines[-1] != "":
+            lines.append("")
+        
+        return "\n".join(lines)

+ 315 - 0
data_pipeline/api/table_inspector_api.py

@@ -0,0 +1,315 @@
+"""
+表检查API模块
+
+复用data_pipeline中的数据库连接和查询功能,提供独立的表信息查询API
+"""
+
+import asyncio
+import asyncpg
+import logging
+from typing import List, Optional, Dict, Any
+from data_pipeline.tools.database_inspector import DatabaseInspectorTool
+
+
+class TableInspectorAPI:
+    """表检查API类,复用现有的数据库功能"""
+    
+    def __init__(self):
+        self.logger = logging.getLogger("TableInspectorAPI")
+        self.db_inspector = None
+    
+    async def get_tables_list(self, db_connection: str, schema: Optional[str] = None) -> List[str]:
+        """
+        获取数据库表列表
+        
+        Args:
+            db_connection: 完整的PostgreSQL连接字符串
+            schema: 可选的schema参数,支持多个schema用逗号分隔
+                   如果为None或空字符串,则只返回public schema的表
+        
+        Returns:
+            表名列表,格式为 schema.tablename
+        """
+        try:
+            # 创建数据库检查器实例
+            self.db_inspector = DatabaseInspectorTool(db_connection=db_connection)
+            
+            # 创建连接池
+            await self.db_inspector._create_connection_pool()
+            
+            # 解析schema参数
+            target_schemas = self._parse_schemas(schema)
+            
+            # 查询表列表
+            tables = await self._query_tables(target_schemas)
+            
+            return tables
+            
+        except Exception as e:
+            self.logger.error(f"获取表列表失败: {e}")
+            raise
+        finally:
+            # 清理连接池
+            if self.db_inspector and self.db_inspector.connection_pool:
+                await self.db_inspector.connection_pool.close()
+    
+    def _parse_schemas(self, schema: Optional[str]) -> List[str]:
+        """
+        解析schema参数
+        
+        Args:
+            schema: schema参数,可以是单个schema或逗号分隔的多个schema
+        
+        Returns:
+            schema列表
+        """
+        if not schema or schema.strip() == "":
+            # 如果没有指定schema,默认只查询public schema
+            return ["public"]
+        
+        # 解析逗号分隔的schema
+        schemas = [s.strip() for s in schema.split(",") if s.strip()]
+        
+        # 如果解析后为空,回退到public
+        if not schemas:
+            return ["public"]
+        
+        return schemas
+    
+    async def _query_tables(self, schemas: List[str]) -> List[str]:
+        """
+        查询指定schema中的表
+        
+        Args:
+            schemas: schema列表
+        
+        Returns:
+            表名列表,格式为 schema.tablename
+        """
+        tables = []
+        
+        async with self.db_inspector.connection_pool.acquire() as conn:
+            for schema in schemas:
+                # 查询指定schema中的表
+                query = """
+                SELECT schemaname, tablename 
+                FROM pg_tables 
+                WHERE schemaname = $1
+                ORDER BY tablename
+                """
+                
+                rows = await conn.fetch(query, schema)
+                
+                # 格式化表名为 schema.tablename
+                for row in rows:
+                    schema_name = row['schemaname']
+                    table_name = row['tablename']
+                    full_table_name = f"{schema_name}.{table_name}"
+                    tables.append(full_table_name)
+        
+        # 按名称排序
+        tables.sort()
+        
+        self.logger.info(f"查询到 {len(tables)} 个表,schemas: {schemas}")
+        
+        return tables
+    
+    async def get_table_ddl(self, db_connection: str, table: str, business_context: Optional[str] = None, output_type: str = "ddl") -> Dict[str, Any]:
+        """
+        获取表的DDL语句或MD文档
+        
+        Args:
+            db_connection: 数据库连接字符串
+            table: 表名,格式为 schema.tablename
+            business_context: 业务上下文描述
+            output_type: 输出类型,支持 "ddl", "md", "both"
+        
+        Returns:
+            包含DDL/MD内容的字典
+        """
+        try:
+            # 解析表名
+            schema_name, table_name = self._parse_table_name(table)
+            
+            # 导入必要的模块
+            from data_pipeline.tools.database_inspector import DatabaseInspectorTool
+            from data_pipeline.tools.comment_generator import CommentGeneratorTool
+            from data_pipeline.tools.ddl_generator import DDLGeneratorTool
+            from data_pipeline.tools.doc_generator import DocGeneratorTool
+            from data_pipeline.tools.data_sampler import DataSamplerTool
+            from data_pipeline.utils.data_structures import TableMetadata, TableProcessingContext
+            from core.vanna_llm_factory import create_vanna_instance
+            
+            # 创建数据库检查器实例
+            db_inspector = DatabaseInspectorTool(db_connection=db_connection)
+            await db_inspector._create_connection_pool()
+            
+            # 创建表元数据对象
+            table_metadata = TableMetadata(
+                table_name=table_name,
+                schema_name=schema_name,
+                full_name=f"{schema_name}.{table_name}",
+                fields=[],
+                comment=None,
+                sample_data=[]
+            )
+            
+            # 获取全局Vanna实例(仅用于LLM调用,不修改其数据库连接)
+            from common.vanna_instance import get_vanna_instance
+            vn = get_vanna_instance()
+            self.logger.info("使用全局Vanna单例实例进行LLM调用(不修改其数据库连接)")
+            
+            # 创建处理上下文
+            context = TableProcessingContext(
+                table_metadata=table_metadata,
+                business_context=business_context or "数据库管理系统",
+                output_dir="/tmp",  # 临时目录,API不会真正写文件
+                pipeline="api_direct",  # API直接调用标识
+                vn=vn,
+                file_manager=None,  # 不需要文件管理器
+                step_results={}
+            )
+            
+            # 第1步:获取表结构信息
+            self.logger.info(f"开始获取表结构: {table}")
+            inspect_result = await db_inspector.execute(context)
+            if not inspect_result.success:
+                raise Exception(f"获取表结构失败: {inspect_result.error_message}")
+            
+            # 第2步:获取样例数据(用于生成更好的注释)
+            self.logger.info("开始获取样例数据")
+            try:
+                data_sampler = DataSamplerTool(vn=vn, db_connection=db_connection)
+                sample_result = await data_sampler.execute(context)
+                if sample_result.success:
+                    self.logger.info("样例数据获取成功")
+                else:
+                    self.logger.warning(f"样例数据获取失败: {sample_result.error_message}")
+            except Exception as e:
+                self.logger.warning(f"样例数据获取异常: {e}")
+            
+            # 第3步:生成注释(调用LLM)
+            if business_context:
+                self.logger.info("开始生成LLM注释")
+                try:
+                    comment_generator = CommentGeneratorTool(
+                        vn=vn,
+                        business_context=business_context,
+                        db_connection=db_connection
+                    )
+                    comment_result = await comment_generator.execute(context)
+                    if comment_result.success:
+                        self.logger.info("LLM注释生成成功")
+                    else:
+                        self.logger.warning(f"LLM注释生成失败: {comment_result.error_message}")
+                except Exception as e:
+                    self.logger.warning(f"LLM注释生成异常: {e}")
+            
+            # 第4步:根据类型生成输出
+            result = {}
+            
+            if output_type in ["ddl", "both"]:
+                self.logger.info("开始生成DDL")
+                ddl_generator = DDLGeneratorTool()
+                ddl_result = await ddl_generator.execute(context)
+                if ddl_result.success:
+                    result["ddl"] = ddl_result.data.get("ddl_content", "")
+                    # 保存DDL结果供MD生成器使用
+                    context.step_results["ddl_generator"] = ddl_result
+                else:
+                    raise Exception(f"DDL生成失败: {ddl_result.error_message}")
+            
+            if output_type in ["md", "both"]:
+                self.logger.info("开始生成MD文档")
+                doc_generator = DocGeneratorTool()
+                
+                # 直接调用MD生成方法,不依赖文件系统
+                md_content = doc_generator._generate_md_content(
+                    table_metadata, 
+                    result.get("ddl", "")
+                )
+                result["md"] = md_content
+            
+            # 添加表信息摘要
+            result["table_info"] = {
+                "table_name": table_metadata.table_name,
+                "schema_name": table_metadata.schema_name,
+                "full_name": table_metadata.full_name,
+                "comment": table_metadata.comment,
+                "field_count": len(table_metadata.fields),
+                "row_count": table_metadata.row_count,
+                "table_size": table_metadata.table_size
+            }
+            
+            # 添加字段信息
+            result["fields"] = [
+                {
+                    "name": field.name,
+                    "type": field.type,
+                    "nullable": field.nullable,
+                    "comment": field.comment,
+                    "is_primary_key": field.is_primary_key,
+                    "is_foreign_key": field.is_foreign_key,
+                    "default_value": field.default_value,
+                    "is_enum": getattr(field, 'is_enum', False),
+                    "enum_values": getattr(field, 'enum_values', [])
+                }
+                for field in table_metadata.fields
+            ]
+            
+            self.logger.info(f"表DDL生成完成: {table}, 输出类型: {output_type}")
+            return result
+            
+        except Exception as e:
+            self.logger.error(f"获取表DDL失败: {e}")
+            raise
+        finally:
+            # 清理连接池
+            if 'db_inspector' in locals() and db_inspector.connection_pool:
+                await db_inspector.connection_pool.close()
+    
+    def _parse_table_name(self, table: str) -> tuple[str, str]:
+        """
+        解析表名
+        
+        Args:
+            table: 表名,格式为 schema.tablename 或 tablename
+        
+        Returns:
+            (schema_name, table_name) 元组
+        """
+        if "." in table:
+            parts = table.split(".", 1)
+            return parts[0], parts[1]
+        else:
+            # 如果没有指定schema,默认为public
+            return "public", table
+    
+    def _parse_db_connection(self, db_connection: str) -> Dict[str, Any]:
+        """
+        解析PostgreSQL连接字符串
+        
+        Args:
+            db_connection: PostgreSQL连接字符串,格式为 postgresql://user:password@host:port/dbname
+        
+        Returns:
+            包含数据库连接参数的字典
+        """
+        import re
+        
+        # 解析连接字符串的正则表达式
+        pattern = r'postgresql://([^:]+):([^@]+)@([^:]+):(\d+)/(.+)'
+        match = re.match(pattern, db_connection)
+        
+        if not match:
+            raise ValueError(f"无效的PostgreSQL连接字符串格式: {db_connection}")
+        
+        user, password, host, port, dbname = match.groups()
+        
+        return {
+            'user': user,
+            'password': password,
+            'host': host,
+            'port': int(port),
+            'dbname': dbname
+        } 
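
The regex in `_parse_db_connection` can be checked in isolation. A minimal sketch using the same pattern; note that it assumes the password contains no `@` and the host no `:`, so URL-encoded credentials or IPv6 hosts would need a different parser:

```python
import re

def parse_db_connection(db_connection):
    # Same pattern as _parse_db_connection above
    pattern = r'postgresql://([^:]+):([^@]+)@([^:]+):(\d+)/(.+)'
    match = re.match(pattern, db_connection)
    if not match:
        raise ValueError(f"无效的PostgreSQL连接字符串格式: {db_connection}")
    user, password, host, port, dbname = match.groups()
    return {'user': user, 'password': password, 'host': host,
            'port': int(port), 'dbname': dbname}

# hypothetical connection string for illustration
params = parse_db_connection("postgresql://dbuser:secret@localhost:5432/highway_db")
```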

+ 1 - 1
data_pipeline/config.py

@@ -54,7 +54,7 @@ SCHEMA_TOOLS_CONFIG = {
     
     # LLM配置
     "use_app_config_llm": True,                # 是否使用app_config中的LLM配置
-    "comment_generation_timeout": 30,          # LLM调用超时时间(秒)
+    "comment_generation_timeout": 120,          # LLM调用超时时间(秒)
     "max_llm_retries": 3,                      # LLM调用最大重试次数
     
     # 系统表过滤配置

+ 3 - 0
data_pipeline/sql/init_tables.sql

@@ -20,6 +20,7 @@ CREATE TABLE IF NOT EXISTS data_pipeline_tasks (
     task_id VARCHAR(32) PRIMARY KEY,               -- 'task_20250627_143052'
     
     -- 任务基本信息
+    task_name VARCHAR(255),                        -- 任务自定义名称(可选)
     task_type VARCHAR(50) NOT NULL DEFAULT 'data_workflow',
     status VARCHAR(20) NOT NULL DEFAULT 'pending', -- pending/in_progress/partial_completed/completed/failed
     
@@ -89,6 +90,7 @@ CREATE INDEX IF NOT EXISTS idx_tasks_created_at ON data_pipeline_tasks(created_a
 CREATE INDEX IF NOT EXISTS idx_tasks_db_name ON data_pipeline_tasks(db_name);
 CREATE INDEX IF NOT EXISTS idx_tasks_created_type ON data_pipeline_tasks(created_type);
 CREATE INDEX IF NOT EXISTS idx_tasks_task_type ON data_pipeline_tasks(task_type);
+CREATE INDEX IF NOT EXISTS idx_tasks_task_name ON data_pipeline_tasks(task_name);
 
 -- 步骤状态表索引
 CREATE INDEX IF NOT EXISTS idx_steps_task_id ON data_pipeline_task_steps(task_id);
@@ -189,6 +191,7 @@ $$ LANGUAGE plpgsql;
 CREATE OR REPLACE VIEW v_task_step_overview AS
 SELECT 
     t.task_id,
+    t.task_name,
     t.task_type,
     t.status as task_status,
     t.created_at,

+ 14 - 22
data_pipeline/tools/base.py

@@ -8,7 +8,6 @@ from data_pipeline.utils.data_structures import ProcessingResult, TableProcessin
 class ToolRegistry:
     """工具注册管理器"""
     _tools: Dict[str, Type['BaseTool']] = {}
-    _instances: Dict[str, 'BaseTool'] = {}
     
     @classmethod
     def register(cls, name: str):
@@ -22,33 +21,26 @@ class ToolRegistry:
     
     @classmethod
     def get_tool(cls, name: str, **kwargs) -> 'BaseTool':
-        """获取工具实例,支持单例模式"""
-        if name not in cls._instances:
-            if name not in cls._tools:
-                raise ValueError(f"工具 '{name}' 未注册")
-            
-            tool_class = cls._tools[name]
-            
-            # 自动注入vanna实例到需要LLM的工具
-            if hasattr(tool_class, 'needs_llm') and tool_class.needs_llm:
-                from core.vanna_llm_factory import create_vanna_instance
-                kwargs['vn'] = create_vanna_instance()
-                logger = logging.getLogger("ToolRegistry")
-                logger.debug(f"为工具 {name} 注入LLM实例")
-            
-            cls._instances[name] = tool_class(**kwargs)
+        """获取工具实例,每次返回新实例确保参数正确传递"""
+        if name not in cls._tools:
+            raise ValueError(f"工具 '{name}' 未注册")
+        
+        tool_class = cls._tools[name]
         
-        return cls._instances[name]
+        # 自动注入vanna实例到需要LLM的工具
+        if hasattr(tool_class, 'needs_llm') and tool_class.needs_llm:
+            from core.vanna_llm_factory import create_vanna_instance
+            kwargs['vn'] = create_vanna_instance()
+            logger = logging.getLogger("ToolRegistry")
+            logger.debug(f"为工具 {name} 注入LLM实例")
+        
+        # 直接返回新实例,不使用单例模式
+        return tool_class(**kwargs)
     
     @classmethod
     def list_tools(cls) -> List[str]:
         """列出所有已注册的工具"""
         return list(cls._tools.keys())
-    
-    @classmethod
-    def clear_instances(cls):
-        """清除所有工具实例(用于测试)"""
-        cls._instances.clear()
 
 class BaseTool(ABC):
     """工具基类"""

+ 20 - 3
data_pipeline/tools/comment_generator.py

@@ -13,6 +13,7 @@ class CommentGeneratorTool(BaseTool):
     def __init__(self, **kwargs):
         super().__init__(**kwargs)
         self.business_context = kwargs.get('business_context', '')
+        self.db_connection = kwargs.get('db_connection')  # 支持传入数据库连接字符串
         self.business_dictionary = self._load_business_dictionary()
     
     async def execute(self, context: TableProcessingContext) -> ProcessingResult:
@@ -342,13 +343,26 @@ class CommentGeneratorTool(BaseTool):
     
     async def _validate_enum_suggestions(self, table_metadata, enum_suggestions: List[Dict]) -> List[Dict]:
         """验证枚举建议"""
-        from data_pipeline.tools.database_inspector import DatabaseInspectorTool
+        import asyncpg
         from data_pipeline.config import SCHEMA_TOOLS_CONFIG
         
         validated_enums = []
-        inspector = ToolRegistry.get_tool("database_inspector")
         sample_limit = SCHEMA_TOOLS_CONFIG["enum_detection_sample_limit"]
         
+        # 获取数据库连接字符串 - 优先使用传入的连接字符串
+        db_connection = self.db_connection
+        
+        # 如果没有传入连接字符串,尝试从vanna实例获取
+        if not db_connection:
+            if hasattr(self.vn, 'connection_string'):
+                db_connection = self.vn.connection_string
+            elif hasattr(self.vn, '_connection_string'):
+                db_connection = self.vn._connection_string
+        
+        if not db_connection:
+            self.logger.warning("无法获取数据库连接字符串,跳过枚举验证")
+            return validated_enums
+        
         for enum_info in enum_suggestions:
             field_name = enum_info['field_name']
             
@@ -363,7 +377,8 @@ class CommentGeneratorTool(BaseTool):
                 LIMIT {sample_limit}
                 """
                 
-                async with inspector.connection_pool.acquire() as conn:
+                conn = await asyncpg.connect(db_connection)
+                try:
                     rows = await conn.fetch(query)
                     
                     actual_values = [str(row['value']) for row in rows]
@@ -381,6 +396,8 @@ class CommentGeneratorTool(BaseTool):
                         self.logger.info(f"确认字段 {field_name} 为枚举类型,包含 {len(actual_values)} 个值")
                     else:
                         self.logger.info(f"字段 {field_name} 不同值过多({len(actual_values)}),不认为是枚举")
+                finally:
+                    await conn.close()
                         
             except Exception as e:
                 self.logger.warning(f"验证字段 {field_name} 的枚举建议失败: {e}")

+ 11 - 8
data_pipeline/tools/data_sampler.py

@@ -51,27 +51,28 @@ class DataSamplerTool(BaseTool):
     
     async def _simple_sample(self, table_metadata: TableMetadata, limit: int) -> List[Dict[str, Any]]:
         """简单采样策略"""
-        from data_pipeline.tools.database_inspector import DatabaseInspectorTool
-        
-        # 复用数据库检查工具的连接
-        inspector = ToolRegistry.get_tool("database_inspector")
+        import asyncpg
         
+        # 直接使用数据库连接字符串创建连接
         query = f"SELECT * FROM {table_metadata.full_name} LIMIT {limit}"
         
-        async with inspector.connection_pool.acquire() as conn:
+        conn = await asyncpg.connect(self.db_connection)
+        try:
             rows = await conn.fetch(query)
             return [dict(row) for row in rows]
+        finally:
+            await conn.close()
     
     async def _smart_sample_large_table(self, table_metadata: TableMetadata, limit: int) -> List[Dict[str, Any]]:
         """智能采样策略(用于大表)"""
-        from data_pipeline.tools.database_inspector import DatabaseInspectorTool
+        import asyncpg
         
-        inspector = ToolRegistry.get_tool("database_inspector")
         samples_per_section = max(1, limit // 3)
         
         samples = []
         
-        async with inspector.connection_pool.acquire() as conn:
+        conn = await asyncpg.connect(self.db_connection)
+        try:
             # 1. 前N行采样
             front_query = f"SELECT * FROM {table_metadata.full_name} LIMIT {samples_per_section}"
             front_rows = await conn.fetch(front_query)
@@ -118,5 +119,7 @@ class DataSamplerTool(BaseTool):
                         samples.append(row_dict)
                 except Exception as e:
                     self.logger.warning(f"尾部采样失败: {e}")
+        finally:
+            await conn.close()
         
         return samples[:limit]  # 确保不超过限制
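
The three-way budget split in `_smart_sample_large_table` and the final trim can be sketched as pure functions:

```python
def samples_per_section(limit):
    # Row budget per section (front/middle/tail), as computed above
    return max(1, limit // 3)

def merge_sections(front, middle, tail, limit):
    # Concatenate section samples, then enforce the overall limit
    return (front + middle + tail)[:limit]

per = samples_per_section(20)                      # 6 rows per section
merged = merge_sections([1, 2], [3, 4], [5, 6], 5)  # trimmed to 5 rows
```

Because each section can return up to `limit // 3` rows plus the guaranteed minimum of 1, the concatenation may exceed `limit`, which is why the trailing `[:limit]` slice is needed.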

+ 43 - 27
data_pipeline/tools/ddl_generator.py

@@ -22,33 +22,49 @@ class DDLGeneratorTool(BaseTool):
             # 生成DDL内容
             ddl_content = self._generate_ddl_content(table_metadata)
             
-            # 确定文件名和路径
-            filename = context.file_manager.get_safe_filename(
-                table_metadata.schema_name,
-                table_metadata.table_name,
-                SCHEMA_TOOLS_CONFIG["ddl_file_suffix"]
-            )
-            
-            # 确定子目录
-            subdirectory = "ddl" if SCHEMA_TOOLS_CONFIG["create_subdirectories"] else None
-            filepath = context.file_manager.get_full_path(filename, subdirectory)
-            
-            # 写入文件
-            with open(filepath, 'w', encoding='utf-8') as f:
-                f.write(ddl_content)
-            
-            self.logger.info(f"DDL文件已生成: {filepath}")
-            
-            return ProcessingResult(
-                success=True,
-                data={
-                    'filename': filename,
-                    'filepath': filepath,
-                    'content_length': len(ddl_content),
-                    'ddl_content': ddl_content  # 保存内容供后续工具使用
-                },
-                metadata={'tool': self.tool_name}
-            )
+            # 如果有file_manager,则写入文件(正常的data_pipeline流程)
+            if context.file_manager:
+                # 确定文件名和路径
+                filename = context.file_manager.get_safe_filename(
+                    table_metadata.schema_name,
+                    table_metadata.table_name,
+                    SCHEMA_TOOLS_CONFIG["ddl_file_suffix"]
+                )
+                
+                # 确定子目录
+                subdirectory = "ddl" if SCHEMA_TOOLS_CONFIG["create_subdirectories"] else None
+                filepath = context.file_manager.get_full_path(filename, subdirectory)
+                
+                # 写入文件
+                with open(filepath, 'w', encoding='utf-8') as f:
+                    f.write(ddl_content)
+                
+                self.logger.info(f"DDL文件已生成: {filepath}")
+                
+                return ProcessingResult(
+                    success=True,
+                    data={
+                        'filename': filename,
+                        'filepath': filepath,
+                        'content_length': len(ddl_content),
+                        'ddl_content': ddl_content  # 保存内容供后续工具使用
+                    },
+                    metadata={'tool': self.tool_name}
+                )
+            else:
+                # 如果没有file_manager,只返回DDL内容(API调用场景)
+                self.logger.info("DDL内容已生成(API调用模式,不写入文件)")
+                
+                return ProcessingResult(
+                    success=True,
+                    data={
+                        'filename': f"{table_metadata.schema_name}_{table_metadata.table_name}.ddl",
+                        'filepath': None,  # 不写入文件
+                        'content_length': len(ddl_content),
+                        'ddl_content': ddl_content  # 保存内容供后续工具使用
+                    },
+                    metadata={'tool': self.tool_name}
+                )
             
         except Exception as e:
             self.logger.exception(f"DDL生成失败")

+ 7 - 0
data_pipeline/training_data/task_20250702_144901/table_list.txt

@@ -0,0 +1,7 @@
+# 表清单文件
+# 生成时间: 2025-07-02 15:32:41
+# 表数量: 3
+
+table1
+schema.table2
+table3

+ 31 - 0
data_pipeline/training_data/task_20250702_174000/bss_business_day_data.ddl

@@ -0,0 +1,31 @@
+-- 中文名: 高速公路服务区每日业务运营数据表
+-- 描述: 高速公路服务区每日业务运营数据表,记录交易及运营指标,支撑经营分析与决策。
+create table public.bss_business_day_data (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人,
+  oper_date date              -- 统计日期,
+  service_no varchar(255)     -- 服务区编码,
+  service_name varchar(255)   -- 服务区名称,
+  branch_no varchar(255)      -- 档口编码,
+  branch_name varchar(255)    -- 档口名称,
+  wx numeric(19,4)            -- 微信支付金额,
+  wx_order integer            -- 微信订单数量,
+  zfb numeric(19,4)           -- 支付宝支付金额,
+  zf_order integer            -- 支付宝订单数量,
+  rmb numeric(19,4)           -- 现金支付金额,
+  rmb_order integer           -- 现金订单数量,
+  xs numeric(19,4)            -- 行吧支付金额,
+  xs_order integer            -- 行吧支付订单数量,
+  jd numeric(19,4)            -- 金豆支付金额,
+  jd_order integer            -- 金豆订单数量,
+  order_sum integer           -- 订单总数,
+  pay_sum numeric(19,4)       -- 总支付金额,
+  source_type integer         -- 数据来源类别,
+  primary key (id)
+);

+ 31 - 0
data_pipeline/training_data/task_20250702_174000/bss_business_day_data_detail.md

@@ -0,0 +1,31 @@
+## bss_business_day_data(高速公路服务区每日业务运营数据表)
+bss_business_day_data 表:高速公路服务区每日业务运营数据表,记录交易及运营指标,支撑经营分析与决策。
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人
+- oper_date (date) - 统计日期
+- service_no (varchar(255)) - 服务区编码
+- service_name (varchar(255)) - 服务区名称
+- branch_no (varchar(255)) - 档口编码
+- branch_name (varchar(255)) - 档口名称
+- wx (numeric(19,4)) - 微信支付金额
+- wx_order (integer) - 微信订单数量
+- zfb (numeric(19,4)) - 支付宝支付金额
+- zf_order (integer) - 支付宝订单数量
+- rmb (numeric(19,4)) - 现金支付金额
+- rmb_order (integer) - 现金订单数量
+- xs (numeric(19,4)) - 行吧支付金额
+- xs_order (integer) - 行吧支付订单数量
+- jd (numeric(19,4)) - 金豆支付金额
+- jd_order (integer) - 金豆订单数量
+- order_sum (integer) - 订单总数
+- pay_sum (numeric(19,4)) - 总支付金额
+- source_type (integer) - 数据来源类别
+字段补充说明:
+- id 为主键

+ 17 - 0
data_pipeline/training_data/task_20250702_174000/bss_car_day_count.ddl

@@ -0,0 +1,17 @@
+-- 中文名: 每日服务区车辆类别数量统计表
+-- 描述: 每日服务区车辆类别数量统计表,用于交通流量分析及资源调度管理
+create table public.bss_car_day_count (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人,
+  customer_count bigint       -- 车辆数量,
+  car_type varchar(100)       -- 车辆类别,
+  count_date date             -- 统计日期,
+  service_area_id varchar(32) -- 服务区ID,
+  primary key (id)
+);

+ 17 - 0
data_pipeline/training_data/task_20250702_174000/bss_car_day_count_detail.md

@@ -0,0 +1,17 @@
+## bss_car_day_count(每日服务区车辆类别数量统计表)
+bss_car_day_count 表每日服务区车辆类别数量统计表,用于交通流量分析及资源调度管理
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人
+- customer_count (bigint) - 车辆数量
+- car_type (varchar(100)) - 车辆类别
+- count_date (date) - 统计日期
+- service_area_id (varchar(32)) - 服务区ID
+字段补充说明:
+- id 为主键

+ 15 - 0
data_pipeline/training_data/task_20250702_174000/bss_company.ddl

@@ -0,0 +1,15 @@
+-- 中文名: 业务支撑系统公司信息表
+-- 描述: 业务支撑系统公司信息表,存储服务区关联企业的基础信息及状态变更记录
+create table public.bss_company (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人ID,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人ID,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人ID,
+  company_name varchar(255)   -- 公司名称,
+  company_no varchar(255)     -- 公司编码,
+  primary key (id)
+);

+ 15 - 0
data_pipeline/training_data/task_20250702_174000/bss_company_detail.md

@@ -0,0 +1,15 @@
+## bss_company(业务支撑系统公司信息表)
+bss_company 表业务支撑系统公司信息表,存储服务区关联企业的基础信息及状态变更记录
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人ID
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人ID
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人ID
+- company_name (varchar(255)) - 公司名称
+- company_no (varchar(255)) - 公司编码
+字段补充说明:
+- id 为主键

+ 16 - 0
data_pipeline/training_data/task_20250702_174000/bss_section_route.ddl

@@ -0,0 +1,16 @@
+-- 中文名: 路段与路线关联关系表
+-- 描述: 存储路段与路线关联关系及操作记录
+create table public.bss_section_route (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人,
+  section_name varchar(255)   -- 路段名称,
+  route_name varchar(255)     -- 路线名称,
+  code varchar(255)           -- 路段编号,
+  primary key (id)
+);

+ 7 - 0
data_pipeline/training_data/task_20250702_174000/bss_section_route_area_link.ddl

@@ -0,0 +1,7 @@
+-- 中文名: BSS系统路线分段与服务区关联表
+-- 描述: BSS系统路线分段与服务区关联表,记录路线分段与服务区的绑定关系,支撑收费及服务设施管理。
+create table public.bss_section_route_area_link (
+  section_route_id varchar(32) not null -- 路段路线ID,主键,
+  service_area_id varchar(32) not null -- 关联服务区ID,主键,
+  primary key (section_route_id, service_area_id)
+);

+ 7 - 0
data_pipeline/training_data/task_20250702_174000/bss_section_route_area_link_detail.md

@@ -0,0 +1,7 @@
+## bss_section_route_area_link(BSS系统路线分段与服务区关联表)
+bss_section_route_area_link 表BSS系统路线分段与服务区关联表,记录路线分段与服务区的绑定关系,支撑收费及服务设施管理。
+字段列表:
+- section_route_id (varchar(32)) - 路段路线ID [主键, 非空]
+- service_area_id (varchar(32)) - 关联服务区ID [主键, 非空]
+字段补充说明:
+- 复合主键:section_route_id, service_area_id

+ 16 - 0
data_pipeline/training_data/task_20250702_174000/bss_section_route_detail.md

@@ -0,0 +1,16 @@
+## bss_section_route(路段与路线关联关系表)
+bss_section_route 表:存储路段与路线关联关系及操作记录
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人
+- section_name (varchar(255)) - 路段名称
+- route_name (varchar(255)) - 路线名称
+- code (varchar(255)) - 路段编号
+字段补充说明:
+- id 为主键

+ 19 - 0
data_pipeline/training_data/task_20250702_174000/bss_service_area.ddl

@@ -0,0 +1,19 @@
+-- 中文名: 业务支撑系统服务区主表
+-- 描述: 业务支撑系统服务区主表,存储名称、编码等基础信息,支撑服务区运营管理。
+create table public.bss_service_area (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人ID,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人ID,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人ID,
+  service_area_name varchar(255) -- 服务区名称,
+  service_area_no varchar(255) -- 服务区编码,
+  company_id varchar(32)      -- 运营管理公司ID,
+  service_position varchar(255) -- 地理位置坐标,
+  service_area_type varchar(50) -- 服务区类型,
+  service_state varchar(50)   -- 运营状态,
+  primary key (id)
+);

+ 19 - 0
data_pipeline/training_data/task_20250702_174000/bss_service_area_detail.md

@@ -0,0 +1,19 @@
+## bss_service_area(业务支撑系统服务区主表)
+bss_service_area 表业务支撑系统服务区主表,存储名称、编码等基础信息,支撑服务区运营管理。
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人ID
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人ID
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人ID
+- service_area_name (varchar(255)) - 服务区名称
+- service_area_no (varchar(255)) - 服务区编码
+- company_id (varchar(32)) - 运营管理公司ID
+- service_position (varchar(255)) - 地理位置坐标
+- service_area_type (varchar(50)) - 服务区类型
+- service_state (varchar(50)) - 运营状态
+字段补充说明:
+- id 为主键

+ 18 - 0
data_pipeline/training_data/task_20250702_174000/bss_service_area_mapper.ddl

@@ -0,0 +1,18 @@
+-- 中文名: BSS系统服务区名称与编码映射表
+-- 描述: BSS系统服务区名称与编码映射表,记录服务区基础信息及变更审计,支持统一管理和数据同步。
+create table public.bss_service_area_mapper (
+  id varchar(32) not null,        -- 主键ID,主键
+  version integer not null,       -- 版本号
+  create_ts timestamp,            -- 创建时间
+  created_by varchar(50),         -- 创建人
+  update_ts timestamp,            -- 更新时间
+  updated_by varchar(50),         -- 更新人
+  delete_ts timestamp,            -- 删除时间
+  deleted_by varchar(50),         -- 删除人
+  service_name varchar(255),      -- 服务区名称
+  service_no varchar(255),        -- 服务区编码
+  service_area_id varchar(32),    -- 服务区ID
+  source_system_type varchar(50), -- 数据来源类别名称
+  source_type integer,            -- 数据来源类别ID
+  primary key (id)
+);

+ 18 - 0
data_pipeline/training_data/task_20250702_174000/bss_service_area_mapper_detail.md

@@ -0,0 +1,18 @@
+## bss_service_area_mapper(BSS系统服务区名称与编码映射表)
+bss_service_area_mapper 表为BSS系统服务区名称与编码映射表,记录服务区基础信息及变更审计,支持统一管理和数据同步。
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人
+- service_name (varchar(255)) - 服务区名称
+- service_no (varchar(255)) - 服务区编码
+- service_area_id (varchar(32)) - 服务区ID
+- source_system_type (varchar(50)) - 数据来源类别名称
+- source_type (integer) - 数据来源类别ID
+字段补充说明:
+- id 为主键

+ 1 - 0
data_pipeline/training_data/task_20250702_174000/db_query_decision_prompt.txt

@@ -0,0 +1 @@
+{"business_scope":"当前数据库存储的是高速公路服务区运营管理的相关数据,主要涉及每日交易记录、车辆流量统计、服务区基础信息及路段关联关系,包含以下业务数据:","core_entities":[{"entity_type":"服务区","description":"高速公路服务区的基础信息及运营状态","key_fields":"service_area_name, service_area_no, company_id, service_position, service_area_type, service_state"},{"entity_type":"车辆类别","description":"不同类型的车辆数量统计","key_fields":"car_type, customer_count"},{"entity_type":"运营管理公司","description":"服务区所属公司的基础信息","key_fields":"company_name, company_no"},{"entity_type":"路段路线关联","description":"路段与路线的绑定关系及编号信息","key_fields":"section_name, route_name, code"}],"key_metrics":[{"metric_type":"支付交易分析","description":"按支付方式划分的金额(wx, zfb, rmb, xs, jd)和订单量(wx_order, zf_order, rmb_order, xs_order, jd_order)统计"},{"metric_type":"车辆流量监控","description":"按日期(count_date)和服务区(service_area_id)划分的车辆数量(customer_count)统计"},{"metric_type":"运营状态监控","description":"服务区运营状态(service_state)和服务区类型(service_area_type)的分布统计"},{"metric_type":"数据来源对比","description":"不同数据来源类别(source_type)的业务数据分布"}]}

+ 10 - 0
data_pipeline/training_data/task_20250702_174000/filename_mapping.txt

@@ -0,0 +1,10 @@
+# 文件名映射报告
+# 格式: 原始表名 -> 实际文件名
+
+public.bss_business_day_data -> bss_business_day_data_detail.md
+public.bss_car_day_count -> bss_car_day_count_detail.md
+public.bss_company -> bss_company_detail.md
+public.bss_section_route -> bss_section_route_detail.md
+public.bss_section_route_area_link -> bss_section_route_area_link_detail.md
+public.bss_service_area -> bss_service_area_detail.md
+public.bss_service_area_mapper -> bss_service_area_mapper_detail.md

+ 62 - 0
data_pipeline/training_data/task_20250702_174000/metadata.txt

@@ -0,0 +1,62 @@
+-- Schema Tools生成的主题元数据
+-- 业务背景: 高速公路服务区管理系统
+-- 生成时间: 2025-07-02 19:16:55
+-- 数据库: highway_db
+
+-- 创建表(如果不存在)
+CREATE TABLE IF NOT EXISTS metadata (
+    id SERIAL PRIMARY KEY,    -- 主键
+    topic_name VARCHAR(100) NOT NULL,  -- 业务主题名称
+    description TEXT,                  -- 业务主题说明
+    related_tables TEXT[],             -- 相关表名
+    biz_entities TEXT[],               -- 主要业务实体名称
+    biz_metrics TEXT[],                -- 主要业务指标名称
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP    -- 插入时间
+);
+
+-- 插入主题数据
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '日营业数据分析',
+  '分析各服务区每日营业收入、订单量及支付方式分布,监控经营趋势并优化档口管理',
+  ARRAY['bss_business_day_data'],
+  ARRAY['服务区','档口','支付方式'],
+  ARRAY['日收入趋势','订单量对比','支付方式占比']
+);
+
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '交通流量分析',
+  '通过车辆类别统计和服务区车流量变化,评估交通压力并优化基础设施配置',
+  ARRAY['bss_car_day_count','bss_service_area'],
+  ARRAY['车辆类型','服务区','统计日期'],
+  ARRAY['车流量趋势','高峰时段分析','车型占比排名']
+);
+
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '公司运营绩效',
+  '对比不同管理公司的服务区数量、运营状态及业务指标,评估企业经营效能',
+  ARRAY['bss_service_area','bss_company'],
+  ARRAY['运营管理公司','服务区类型','运营状态'],
+  ARRAY['服务区数量排名','区域覆盖率','业务指标对比']
+);
+
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '路线关联分析',
+  '分析路段路线与服务区的关联关系,评估路网服务能力并优化服务区布局',
+  ARRAY['bss_section_route','bss_section_route_area_link'],
+  ARRAY['路段路线','服务区','路段编号'],
+  ARRAY['服务区覆盖密度','路线流量分布','关联合理性评估']
+);
+
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '支付偏好研究',
+  '挖掘不同地区/档口的支付方式偏好,指导支付渠道优化与营销策略制定',
+  ARRAY['bss_business_day_data','bss_service_area'],
+  ARRAY['服务区','档口','支付类型'],
+  ARRAY['支付方式渗透率','区域偏好对比','档口支付结构分析']
+);
+

+ 20 - 0
data_pipeline/training_data/task_20250702_174000/metadata_detail.md

@@ -0,0 +1,20 @@
+## metadata(存储分析主题元数据)
+
+`metadata` 主要描述了当前数据库包含了哪些数据内容,哪些分析主题,哪些指标等等。
+
+字段列表:
+
+- `id` (serial) - 主键ID [主键, 非空]
+- `topic_name` (varchar(100)) - 业务主题名称 [非空]
+- `description` (text) - 业务主题说明
+- `related_tables` (text[]) - 涉及的数据表 [示例: bss_company, bss_car_day_count]
+- `biz_entities` (text[]) - 主要业务实体名称 [示例: 运营管理公司, 统计日期, 车辆类型]
+- `biz_metrics` (text[]) - 主要业务指标名称 [示例: 业务指标对比, 订单量对比, 车型占比排名]
+- `created_at` (timestamp) - 插入时间 [默认值: `CURRENT_TIMESTAMP`]
+
+字段补充说明:
+
+- `id` 为主键,自增;
+- `related_tables` 用于建立主题与具体明细表的依赖关系;
+- `biz_entities` 表示主题关注的核心对象,例如服务区、车辆、公司;
+- `biz_metrics` 表示该主题关注的业务分析指标,例如营收对比、趋势变化、占比结构等。
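
Since `related_tables` is a `text[]` column, a PostgreSQL lookup for "which topics touch a given table" would typically use array containment (`WHERE related_tables @> ARRAY['...']`). A hypothetical in-memory sketch of the same lookup, with rows mirroring the INSERT data — row contents and function name are illustrative:

```python
# Hypothetical rows mirroring the metadata table's topic_name / related_tables columns.
rows = [
    {"topic_name": "日营业数据分析", "related_tables": ["bss_business_day_data"]},
    {"topic_name": "交通流量分析", "related_tables": ["bss_car_day_count", "bss_service_area"]},
]

def topics_for_table(rows: list[dict], table: str) -> list[str]:
    """Return topic names whose related_tables array contains the given table."""
    return [r["topic_name"] for r in rows if table in r["related_tables"]]

print(topics_for_table(rows, "bss_service_area"))  # ['交通流量分析']
```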

+ 190 - 0
data_pipeline/training_data/task_20250702_174000/qs_highway_db_20250702_191655_pair.json

@@ -0,0 +1,190 @@
+[
+  {
+    "question": "统计最近7天各服务区日均营业收入及订单量,按日均收入降序排列",
+    "sql": "SELECT service_name AS 服务区名称, AVG(pay_sum) AS 日均营收总额, AVG(order_sum) AS 日均订单量 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date >= CURRENT_DATE - 7 GROUP BY service_name ORDER BY 日均营收总额 DESC;"
+  },
+  {
+    "question": "查询2023-10-01当日订单量TOP5档口及对应支付方式分布",
+    "sql": "SELECT branch_name AS 档口名称, order_sum AS 订单总量, wx AS 微信支付金额, zfb AS 支付宝支付金额, rmb AS 现金支付金额 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date = '2023-10-01' ORDER BY 订单总量 DESC LIMIT 5;"
+  },
+  {
+    "question": "分析本月各服务区微信支付占比变化趋势(按日维度)",
+    "sql": "SELECT oper_date AS 统计日期, service_name AS 服务区名称, (wx / pay_sum * 100)::numeric(5,2) AS 微信支付占比 FROM bss_business_day_data WHERE delete_ts IS NULL AND EXTRACT(MONTH FROM oper_date) = EXTRACT(MONTH FROM CURRENT_DATE) ORDER BY 统计日期;"
+  },
+  {
+    "question": "对比不同服务区现金支付比例(近30天数据),筛选现金支付占比超过20%的记录",
+    "sql": "SELECT service_name AS 服务区名称, (SUM(rmb) / SUM(pay_sum) * 100)::numeric(5,2) AS 现金支付占比 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date >= CURRENT_DATE - 30 GROUP BY service_name HAVING (SUM(rmb) / SUM(pay_sum) * 100) > 20 ORDER BY 现金支付占比 DESC;"
+  },
+  {
+    "question": "统计各档口平均订单金额(客单价)并筛选高于整体平均值的档口",
+    "sql": "WITH avg_data AS (SELECT AVG(pay_sum / nullif(order_sum,0)) AS global_avg FROM bss_business_day_data WHERE delete_ts IS NULL) SELECT branch_name AS 档口名称, (pay_sum / nullif(order_sum,0))::numeric(10,2) AS 客单价 FROM bss_business_day_data, avg_data WHERE delete_ts IS NULL AND (pay_sum / nullif(order_sum,0)) > global_avg;"
+  },
+  {
+    "question": "分析国庆期间(10.1-10.7)各支付方式交易总额及订单量对比",
+    "sql": "SELECT '微信' AS 支付方式, SUM(wx) AS 交易总额, SUM(wx_order) AS 订单量 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-10-01' AND '2023-10-07' UNION ALL SELECT '支付宝', SUM(zfb), SUM(zf_order) FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-10-01' AND '2023-10-07' UNION ALL SELECT '现金', SUM(rmb), SUM(rmb_order) FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-10-01' AND '2023-10-07';"
+  },
+  {
+    "question": "查询最近一天营业数据异常(订单量为0但存在支付金额)的记录",
+    "sql": "SELECT oper_date AS 统计日期, service_name AS 服务区名称, branch_name AS 档口名称, pay_sum AS 支付总额, order_sum AS 订单量 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date = (SELECT MAX(oper_date) FROM bss_business_day_data) AND order_sum = 0 AND pay_sum > 0;"
+  },
+  {
+    "question": "统计各服务区月度累计营收及环比增长率(按最近完整月份数据)",
+    "sql": "WITH monthly_data AS (SELECT service_name, EXTRACT(MONTH FROM oper_date) AS 月份, SUM(pay_sum) AS 月营收 FROM bss_business_day_data WHERE delete_ts IS NULL AND EXTRACT(MONTH FROM oper_date) = EXTRACT(MONTH FROM CURRENT_DATE) - 1 GROUP BY service_name, 月份) SELECT service_name AS 服务区名称, 月营收 AS 当前月营收, LAG(月营收) OVER(PARTITION BY service_name ORDER BY 月份) AS 上月营收, ((月营收 - LAG(月营收) OVER(PARTITION BY service_name ORDER BY 月份)) / LAG(月营收) OVER(PARTITION BY service_name ORDER BY 月份) * 100)::numeric(5,2) AS 环比增长率 FROM monthly_data;"
+  },
+  {
+    "question": "分析各档口非现金支付方式使用率(扫码支付占比)",
+    "sql": "SELECT branch_name AS 档口名称, (SUM(wx + zfb + xs + jd) / SUM(pay_sum) * 100)::numeric(5,2) AS 非现金支付占比, COUNT(*) AS 数据天数 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY branch_name HAVING SUM(pay_sum) > 0 ORDER BY 非现金支付占比 DESC;"
+  },
+  {
+    "question": "统计国庆黄金周(7天)各服务区营收排名及环比节前7天增长率",
+    "sql": "WITH holiday AS (SELECT service_name, SUM(pay_sum) AS 节日营收 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-10-01' AND '2023-10-07' GROUP BY service_name), pre_holiday AS (SELECT service_name, SUM(pay_sum) AS 节前营收 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-09-24' AND '2023-09-30' GROUP BY service_name) SELECT h.service_name AS 服务区名称, h.节日营收, p.节前营收, ((h.节日营收 - p.节前营收)/p.节前营收 * 100)::numeric(5,2) AS 增长率 FROM holiday h JOIN pre_holiday p ON h.service_name = p.service_name ORDER BY h.节日营收 DESC;"
+  },
+  {
+    "question": "各服务区过去一周日均车流量排名TOP10",
+    "sql": "SELECT b.service_area_name AS 服务区名称, AVG(a.customer_count) AS 日均车流量 FROM bss_car_day_count a JOIN bss_service_area b ON a.service_area_id = b.id AND b.delete_ts IS NULL WHERE a.count_date >= CURRENT_DATE - 7 GROUP BY b.service_area_name ORDER BY 日均车流量 DESC LIMIT 10;"
+  },
+  {
+    "question": "本月每日总车流量变化趋势分析",
+    "sql": "SELECT count_date AS 统计日期, SUM(customer_count) AS 当日总车流量 FROM bss_car_day_count WHERE count_date >= DATE_TRUNC('month', CURRENT_DATE) GROUP BY count_date ORDER BY count_date ASC;"
+  },
+  {
+    "question": "各车型占比排名(全量数据)",
+    "sql": "SELECT car_type AS 车辆类型, SUM(customer_count) AS 总车次, ROUND(SUM(customer_count)*100/(SELECT SUM(customer_count) FROM bss_car_day_count WHERE delete_ts IS NULL),2) AS 占比百分比 FROM bss_car_day_count WHERE delete_ts IS NULL GROUP BY car_type ORDER BY 总车次 DESC;"
+  },
+  {
+    "question": "某服务区近30天车流量环比增长率",
+    "sql": "WITH daily_count AS (SELECT count_date, SUM(customer_count) AS total_cars FROM bss_car_day_count WHERE service_area_id = 'SA001' AND count_date >= CURRENT_DATE - 30 GROUP BY count_date ORDER BY count_date) SELECT count_date, total_cars, LAG(total_cars) OVER(ORDER BY count_date) AS 前一日车流, ROUND((total_cars - LAG(total_cars) OVER(ORDER BY count_date))*100/LAG(total_cars) OVER(ORDER BY count_date),2) AS 环比增长率 FROM daily_count;"
+  },
+  {
+    "question": "各服务区不同类型车辆数量分布",
+    "sql": "SELECT b.service_area_name AS 服务区名称, a.car_type AS 车辆类型, SUM(a.customer_count) AS 车辆总数 FROM bss_car_day_count a JOIN bss_service_area b ON a.service_area_id = b.id AND b.delete_ts IS NULL GROUP BY b.service_area_name, a.car_type ORDER BY 服务区名称, 车辆总数 DESC;"
+  },
+  {
+    "question": "国庆黄金周与平日车流量对比分析",
+    "sql": "SELECT CASE WHEN count_date BETWEEN '2023-10-01' AND '2023-10-07' THEN '国庆假期' ELSE '普通工作日' END AS 日期类型, SUM(customer_count) AS 总车流量, COUNT(DISTINCT count_date) AS 天数, ROUND(AVG(customer_count),2) AS 日均车流 FROM bss_car_day_count WHERE count_date BETWEEN '2023-10-01' AND '2023-10-14' GROUP BY 日期类型;"
+  },
+  {
+    "question": "某服务区各星期日车流量分布情况",
+    "sql": "SELECT EXTRACT(ISODOW FROM count_date) AS 星期编号, TO_CHAR(count_date, 'Day') AS 星期名称, AVG(customer_count) AS 平均车流量 FROM bss_car_day_count WHERE service_area_id = 'SA001' AND count_date >= CURRENT_DATE - 90 GROUP BY 星期编号, 星期名称 ORDER BY 星期编号;"
+  },
+  {
+    "question": "年度车流量最高TOP10日期明细",
+    "sql": "SELECT count_date AS 统计日期, SUM(customer_count) AS 当日车流 FROM bss_car_day_count WHERE count_date >= DATE_TRUNC('year', CURRENT_DATE) GROUP BY count_date ORDER BY 当日车流 DESC LIMIT 10;"
+  },
+  {
+    "question": "某服务区新能源车占比月度变化趋势",
+    "sql": "SELECT DATE_TRUNC('month', count_date) AS 统计月份, SUM(CASE WHEN car_type IN ('电动客车','电动货车') THEN customer_count ELSE 0 END) AS 新能源车流量, SUM(customer_count) AS 总车流, ROUND(SUM(CASE WHEN car_type IN ('电动客车','电动货车') THEN customer_count ELSE 0 END)*100/SUM(customer_count),2) AS 新能源占比 FROM bss_car_day_count WHERE service_area_id = 'SA002' GROUP BY 统计月份 ORDER BY 统计月份;"
+  },
+  {
+    "question": "各区域公司管辖服务区平均车流量对比",
+    "sql": "SELECT c.company_name AS 运营公司, COUNT(DISTINCT b.id) AS 管辖服务区数, ROUND(AVG(a.customer_count),2) AS 日均车流量 FROM bss_car_day_count a JOIN bss_service_area b ON a.service_area_id = b.id AND b.delete_ts IS NULL JOIN bss_company c ON b.company_id = c.id WHERE a.count_date = CURRENT_DATE GROUP BY c.company_name;"
+  },
+  {
+    "question": "各运营管理公司的服务区数量排名情况如何?",
+    "sql": "SELECT bc.company_name AS 公司名称, COUNT(bsa.id) AS 服务区数量 FROM bss_service_area bsa JOIN bss_company bc ON bsa.company_id = bc.id WHERE bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 服务区数量 DESC LIMIT 10;"
+  },
+  {
+    "question": "当前各运营状态下的服务区数量分布情况?",
+    "sql": "SELECT service_state AS 运营状态, COUNT(*) AS 数量 FROM bss_service_area WHERE delete_ts IS NULL GROUP BY service_state ORDER BY 数量 DESC;"
+  },
+  {
+    "question": "XX公司管理的各类型服务区数量占比分析",
+    "sql": "SELECT service_area_type AS 服务区类型, COUNT(*) AS 数量 FROM bss_service_area WHERE company_id = (SELECT id FROM bss_company WHERE company_name = 'XX公司') AND delete_ts IS NULL GROUP BY service_area_type;"
+  },
+  {
+    "question": "最近一周新增的各公司服务区数量统计",
+    "sql": "SELECT bc.company_name AS 公司名称, COUNT(bsa.id) AS 新增数量 FROM bss_service_area bsa JOIN bss_company bc ON bsa.company_id = bc.id WHERE bsa.create_ts >= CURRENT_DATE - 7 AND bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 新增数量 DESC;"
+  },
+  {
+    "question": "各公司服务区日均订单数对比分析",
+    "sql": "SELECT bc.company_name AS 公司名称, AVG(bdd.order_sum) AS 日均订单数 FROM bss_business_day_data bdd JOIN bss_service_area bsa ON bdd.service_no = bsa.service_area_no JOIN bss_company bc ON bsa.company_id = bc.id WHERE bdd.delete_ts IS NULL AND bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 日均订单数 DESC;"
+  },
+  {
+    "question": "各公司正常运营与非正常运营服务区数量对比",
+    "sql": "SELECT bc.company_name AS 公司名称, bsa.service_state AS 运营状态, COUNT(*) AS 数量 FROM bss_service_area bsa JOIN bss_company bc ON bsa.company_id = bc.id WHERE bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name, bsa.service_state ORDER BY 公司名称, 数量 DESC;"
+  },
+  {
+    "question": "2023年Q2各公司服务区总支付金额环比分析",
+    "sql": "SELECT bc.company_name AS 公司名称, SUM(CASE WHEN EXTRACT(QUARTER FROM bdd.oper_date) = 2 THEN bdd.pay_sum ELSE 0 END) AS 第二季度金额, SUM(CASE WHEN EXTRACT(QUARTER FROM bdd.oper_date) = 1 THEN bdd.pay_sum ELSE 0 END) AS 第一季度金额 FROM bss_business_day_data bdd JOIN bss_service_area bsa ON bdd.service_no = bsa.service_area_no JOIN bss_company bc ON bsa.company_id = bc.id WHERE bdd.oper_date BETWEEN '2023-01-01' AND '2023-06-30' AND bdd.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 第二季度金额 DESC;"
+  },
+  {
+    "question": "各公司管理的服务区车辆流量TOP5统计",
+    "sql": "SELECT bc.company_name AS 公司名称, SUM(cc.customer_count) AS 总车流量 FROM bss_car_day_count cc JOIN bss_service_area bsa ON cc.service_area_id = bsa.id JOIN bss_company bc ON bsa.company_id = bc.id WHERE cc.count_date = CURRENT_DATE - 1 AND cc.delete_ts IS NULL AND bsa.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 总车流量 DESC LIMIT 5;"
+  },
+  {
+    "question": "查找近30天无业务数据的服务区清单",
+    "sql": "SELECT bsa.service_area_name AS 服务区名称, bc.company_name AS 管理公司 FROM bss_service_area bsa LEFT JOIN bss_business_day_data bdd ON bsa.service_area_no = bdd.service_no AND bdd.oper_date >= CURRENT_DATE - 30 JOIN bss_company bc ON bsa.company_id = bc.id WHERE bdd.id IS NULL AND bsa.delete_ts IS NULL AND bc.delete_ts IS NULL LIMIT 10;"
+  },
+  {
+    "question": "统计各路段路线关联的服务区数量,评估服务区覆盖密度",
+    "sql": "SELECT sr.route_name AS 路线名称, COUNT(link.service_area_id) AS 服务区数量 FROM bss_section_route sr LEFT JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL GROUP BY sr.route_name ORDER BY 服务区数量 DESC;"
+  },
+  {
+    "question": "查询最近一个月新增的路段路线与服务区关联关系",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, sr.section_name AS 路段名称, sr.route_name AS 路线名称 FROM bss_section_route_area_link link JOIN bss_section_route sr ON link.section_route_id = sr.id JOIN bss_service_area sa ON link.service_area_id = sa.id WHERE sr.create_ts >= NOW() - INTERVAL '1 month' AND sr.delete_ts IS NULL AND sa.delete_ts IS NULL ORDER BY sr.create_ts DESC LIMIT 10;"
+  },
+  {
+    "question": "分析各服务区关联的路段路线数量TOP10",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, COUNT(sr.id) AS 关联路段数 FROM bss_section_route_area_link link JOIN bss_service_area sa ON link.service_area_id = sa.id JOIN bss_section_route sr ON link.section_route_id = sr.id WHERE sa.delete_ts IS NULL GROUP BY sa.service_area_name ORDER BY 关联路段数 DESC LIMIT 10;"
+  },
+  {
+    "question": "统计无服务区覆盖的路段路线信息",
+    "sql": "SELECT sr.section_name AS 路段名称, sr.route_name AS 路线名称 FROM bss_section_route sr LEFT JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE link.section_route_id IS NULL AND sr.delete_ts IS NULL ORDER BY sr.create_ts DESC;"
+  },
+  {
+    "question": "分析不同路线名称对应的服务区平均覆盖密度",
+    "sql": "SELECT route_name AS 路线名称, AVG(service_count) AS 平均服务区密度 FROM (SELECT sr.route_name, COUNT(link.service_area_id) AS service_count FROM bss_section_route sr LEFT JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL GROUP BY sr.route_name, sr.id) sub GROUP BY route_name HAVING AVG(service_count) > 0 ORDER BY 平均服务区密度 DESC;"
+  },
+  {
+    "question": "查询包含服务区最多的3个路段编号及其覆盖情况",
+    "sql": "SELECT sr.code AS 路段编号, sr.section_name AS 路段名称, COUNT(link.service_area_id) AS 服务区数量 FROM bss_section_route sr JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL GROUP BY sr.code, sr.section_name ORDER BY 服务区数量 DESC LIMIT 3;"
+  },
+  {
+    "question": "分析服务区关联路段的创建时间分布情况",
+    "sql": "SELECT EXTRACT(MONTH FROM sr.create_ts) AS 月份, COUNT(*) AS 新增路段数 FROM bss_section_route sr JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL GROUP BY 月份 ORDER BY 月份;"
+  },
+  {
+    "question": "统计双向路线(上下行)的服务区覆盖对称性",
+    "sql": "SELECT sr.code AS 路段编号, COUNT(DISTINCT CASE WHEN sr.route_name LIKE '%上行%' THEN link.service_area_id END) AS 上行服务区数, COUNT(DISTINCT CASE WHEN sr.route_name LIKE '%下行%' THEN link.service_area_id END) AS 下行服务区数 FROM bss_section_route sr JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL AND (sr.route_name LIKE '%上行%' OR sr.route_name LIKE '%下行%') GROUP BY sr.code HAVING COUNT(DISTINCT CASE WHEN sr.route_name LIKE '%上行%' THEN link.service_area_id END) != COUNT(DISTINCT CASE WHEN sr.route_name LIKE '%下行%' THEN link.service_area_id END);"
+  },
+  {
+    "question": "分析不同运营状态服务区的路段覆盖分布",
+    "sql": "SELECT sa.service_state AS 运营状态, COUNT(DISTINCT sr.id) AS 覆盖路段数, COUNT(DISTINCT link.service_area_id) AS 服务区数量 FROM bss_section_route sr JOIN bss_section_route_area_link link ON sr.id = link.section_route_id JOIN bss_service_area sa ON link.service_area_id = sa.id WHERE sr.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sa.service_state ORDER BY 覆盖路段数 DESC;"
+  },
+  {
+    "question": "各服务区微信支付渗透率及订单占比分析(按订单量排序)",
+    "sql": "SELECT service_name AS \"服务区名称\", SUM(wx_order)/SUM(order_sum) AS \"微信支付渗透率\" FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name ORDER BY \"微信支付渗透率\" DESC;"
+  },
+  {
+    "question": "不同地区支付宝与现金支付金额对比(取平均值排序)",
+    "sql": "SELECT sa.service_area_type AS \"服务区类型\", AVG(bd.zfb) AS \"平均支付宝支付\", AVG(bd.rmb) AS \"平均现金支付\" FROM bss_business_day_data bd JOIN bss_service_area sa ON bd.service_no = sa.service_area_no WHERE bd.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sa.service_area_type ORDER BY \"平均支付宝支付\" DESC;"
+  },
+  {
+    "question": "档口支付方式金额占比TOP5(按微信支付优先级排序)",
+    "sql": "SELECT branch_name AS \"档口名称\", wx/SUM(pay_sum) OVER(PARTITION BY branch_name) AS \"微信占比\" FROM bss_business_day_data WHERE delete_ts IS NULL ORDER BY \"微信占比\" DESC LIMIT 5;"
+  },
+  {
+    "question": "最近7天各支付类型订单趋势变化(按日期聚合)",
+    "sql": "SELECT oper_date AS \"统计日期\", SUM(wx_order) AS \"微信订单\", SUM(zf_order) AS \"支付宝订单\", SUM(rmb_order) AS \"现金订单\" FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date >= CURRENT_DATE - INTERVAL '7 days' GROUP BY oper_date ORDER BY oper_date;"
+  },
+  {
+    "question": "现金支付占比超过30%的服务区及天数统计",
+    "sql": "SELECT service_name AS \"服务区名称\", COUNT(*) AS \"高现金支付天数\" FROM bss_business_day_data WHERE delete_ts IS NULL AND rmb_order/order_sum > 0.3 GROUP BY service_name ORDER BY \"高现金支付天数\" DESC;"
+  },
+  {
+    "question": "不同档口微信支付平均金额对比(取TOP10)",
+    "sql": "SELECT branch_name AS \"档口名称\", AVG(wx) AS \"平均微信支付金额\" FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY branch_name ORDER BY \"平均微信支付金额\" DESC LIMIT 10;"
+  },
+  {
+    "question": "服务区各支付方式渗透率对比(按服务类型分组)",
+    "sql": "SELECT sa.service_area_type AS \"服务区类型\", bd.service_name AS \"服务区名称\", SUM(xs_order)/SUM(order_sum) AS \"行吧支付渗透率\" FROM bss_business_day_data bd JOIN bss_service_area sa ON bd.service_no = sa.service_area_no WHERE bd.delete_ts IS NULL GROUP BY sa.service_area_type, bd.service_name ORDER BY sa.service_area_type, \"行吧支付渗透率\" DESC;"
+  },
+  {
+    "question": "支付宝订单占比最高的前三天数据明细",
+    "sql": "SELECT oper_date AS \"统计日期\", service_name AS \"服务区名称\", zf_order AS \"支付宝订单数\", order_sum AS \"总订单数\" FROM bss_business_day_data WHERE delete_ts IS NULL ORDER BY zf_order/order_sum DESC LIMIT 3;"
+  },
+  {
+    "question": "行吧支付使用率最低的五个服务区(按订单量)",
+    "sql": "SELECT service_name AS \"服务区名称\", SUM(xs_order)/SUM(order_sum) AS \"行吧支付渗透率\" FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name ORDER BY \"行吧支付渗透率\" ASC LIMIT 5;"
+  }
+]
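
The pair file above is a JSON array of `{question, sql}` objects; before loading it for training, a quick structural check can catch truncated or malformed entries. A minimal sketch — the function name and checks are illustrative, not part of the pipeline:

```python
import json

def validate_pairs(text: str) -> int:
    """Verify the file is a JSON list of {'question','sql'} pairs; return the count."""
    pairs = json.loads(text)
    assert isinstance(pairs, list), "top level must be a JSON array"
    for p in pairs:
        assert set(p) == {"question", "sql"}, f"unexpected keys: {set(p)}"
        assert p["sql"].rstrip().endswith(";"), "SQL should be a complete statement"
    return len(pairs)

sample = '[{"question": "各运营状态下的服务区数量?", ' \
         '"sql": "SELECT service_state, COUNT(*) FROM bss_service_area GROUP BY service_state;"}]'
print(validate_pairs(sample))  # 1
```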

+ 202 - 0
data_pipeline/training_data/task_20250702_174000/qs_highway_db_20250702_191655_pair.json.backup

@@ -0,0 +1,202 @@
+[
+  {
+    "question": "统计最近7天各服务区日均营业收入及订单量,按日均收入降序排列",
+    "sql": "SELECT service_name AS 服务区名称, AVG(pay_sum) AS 日均营收总额, AVG(order_sum) AS 日均订单量 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date >= CURRENT_DATE - 7 GROUP BY service_name ORDER BY 日均营收总额 DESC;"
+  },
+  {
+    "question": "查询2023-10-01当日订单量TOP5档口及对应支付方式分布",
+    "sql": "SELECT branch_name AS 档口名称, order_sum AS 订单总量, wx AS 微信支付金额, zfb AS 支付宝支付金额, rmb AS 现金支付金额 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date = '2023-10-01' ORDER BY 订单总量 DESC LIMIT 5;"
+  },
+  {
+    "question": "分析本月各服务区微信支付占比变化趋势(按日维度)",
+    "sql": "SELECT oper_date AS 统计日期, service_name AS 服务区名称, (wx / pay_sum * 100)::numeric(5,2) AS 微信支付占比 FROM bss_business_day_data WHERE delete_ts IS NULL AND EXTRACT(MONTH FROM oper_date) = EXTRACT(MONTH FROM CURRENT_DATE) ORDER BY 统计日期;"
+  },
+  {
+    "question": "对比不同服务区现金支付比例(近30天数据),筛选现金支付占比超过20%的记录",
+    "sql": "SELECT service_name AS 服务区名称, (SUM(rmb) / SUM(pay_sum) * 100)::numeric(5,2) AS 现金支付占比 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date >= CURRENT_DATE - 30 GROUP BY service_name HAVING (SUM(rmb) / SUM(pay_sum) * 100) > 20 ORDER BY 现金支付占比 DESC;"
+  },
+  {
+    "question": "统计各档口平均订单金额(客单价)并筛选高于整体平均值的档口",
+    "sql": "WITH avg_data AS (SELECT AVG(pay_sum / nullif(order_sum,0)) AS global_avg FROM bss_business_day_data WHERE delete_ts IS NULL) SELECT branch_name AS 档口名称, (pay_sum / nullif(order_sum,0))::numeric(10,2) AS 客单价 FROM bss_business_day_data, avg_data WHERE delete_ts IS NULL AND (pay_sum / nullif(order_sum,0)) > global_avg;"
+  },
+  {
+    "question": "分析国庆期间(10.1-10.7)各支付方式交易总额及订单量对比",
+    "sql": "SELECT '微信' AS 支付方式, SUM(wx) AS 交易总额, SUM(wx_order) AS 订单量 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-10-01' AND '2023-10-07' UNION ALL SELECT '支付宝', SUM(zfb), SUM(zf_order) FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-10-01' AND '2023-10-07' UNION ALL SELECT '现金', SUM(rmb), SUM(rmb_order) FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-10-01' AND '2023-10-07';"
+  },
+  {
+    "question": "查询最近一天营业数据异常(订单量为0但存在支付金额)的记录",
+    "sql": "SELECT oper_date AS 统计日期, service_name AS 服务区名称, branch_name AS 档口名称, pay_sum AS 支付总额, order_sum AS 订单量 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date = (SELECT MAX(oper_date) FROM bss_business_day_data) AND order_sum = 0 AND pay_sum > 0;"
+  },
+  {
+    "question": "统计各服务区月度累计营收及环比增长率(按最近完整月份数据)",
+    "sql": "WITH monthly_data AS (SELECT service_name, EXTRACT(MONTH FROM oper_date) AS 月份, SUM(pay_sum) AS 月营收 FROM bss_business_day_data WHERE delete_ts IS NULL AND EXTRACT(MONTH FROM oper_date) = EXTRACT(MONTH FROM CURRENT_DATE) - 1 GROUP BY service_name, 月份) SELECT service_name AS 服务区名称, 月营收 AS 当前月营收, LAG(月营收) OVER(PARTITION BY service_name ORDER BY 月份) AS 上月营收, ((月营收 - LAG(月营收) OVER(PARTITION BY service_name ORDER BY 月份)) / LAG(月营收) OVER(PARTITION BY service_name ORDER BY 月份) * 100)::numeric(5,2) AS 环比增长率 FROM monthly_data;"
+  },
+  {
+    "question": "分析各档口非现金支付方式使用率(扫码支付占比)",
+    "sql": "SELECT branch_name AS 档口名称, (SUM(wx + zfb + xs + jd) / SUM(pay_sum) * 100)::numeric(5,2) AS 非现金支付占比, COUNT(*) AS 数据天数 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY branch_name HAVING SUM(pay_sum) > 0 ORDER BY 非现金支付占比 DESC;"
+  },
+  {
+    "question": "统计国庆黄金周(7天)各服务区营收排名及环比节前7天增长率",
+    "sql": "WITH holiday AS (SELECT service_name, SUM(pay_sum) AS 节日营收 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-10-01' AND '2023-10-07' GROUP BY service_name), pre_holiday AS (SELECT service_name, SUM(pay_sum) AS 节前营收 FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date BETWEEN '2023-09-24' AND '2023-09-30' GROUP BY service_name) SELECT h.service_name AS 服务区名称, h.节日营收, p.节前营收, ((h.节日营收 - p.节前营收)/p.节前营收 * 100)::numeric(5,2) AS 增长率 FROM holiday h JOIN pre_holiday p ON h.service_name = p.service_name ORDER BY h.节日营收 DESC;"
+  },
+  {
+    "question": "各服务区过去一周日均车流量排名TOP10",
+    "sql": "SELECT b.service_area_name AS 服务区名称, AVG(a.customer_count) AS 日均车流量 FROM bss_car_day_count a JOIN bss_service_area b ON a.service_area_id = b.id AND b.delete_ts IS NULL WHERE a.count_date >= CURRENT_DATE - 7 GROUP BY b.service_area_name ORDER BY 日均车流量 DESC LIMIT 10;"
+  },
+  {
+    "question": "本月每日总车流量变化趋势分析",
+    "sql": "SELECT count_date AS 统计日期, SUM(customer_count) AS 当日总车流量 FROM bss_car_day_count WHERE count_date >= DATE_TRUNC('month', CURRENT_DATE) GROUP BY count_date ORDER BY count_date ASC;"
+  },
+  {
+    "question": "各车型占比排名(全量数据)",
+    "sql": "SELECT car_type AS 车辆类型, SUM(customer_count) AS 总车次, ROUND(SUM(customer_count)*100/(SELECT SUM(customer_count) FROM bss_car_day_count WHERE delete_ts IS NULL),2) AS 占比百分比 FROM bss_car_day_count WHERE delete_ts IS NULL GROUP BY car_type ORDER BY 总车次 DESC;"
+  },
+  {
+    "question": "某服务区近30天车流量环比增长率",
+    "sql": "WITH daily_count AS (SELECT count_date, SUM(customer_count) AS total_cars FROM bss_car_day_count WHERE service_area_id = 'SA001' AND count_date >= CURRENT_DATE - 30 GROUP BY count_date ORDER BY count_date) SELECT count_date, total_cars, LAG(total_cars) OVER(ORDER BY count_date) AS 前一日车流, ROUND((total_cars - LAG(total_cars) OVER(ORDER BY count_date))*100/LAG(total_cars) OVER(ORDER BY count_date),2) AS 环比增长率 FROM daily_count;"
+  },
+  {
+    "question": "各服务区不同类型车辆数量分布",
+    "sql": "SELECT b.service_area_name AS 服务区名称, a.car_type AS 车辆类型, SUM(a.customer_count) AS 车辆总数 FROM bss_car_day_count a JOIN bss_service_area b ON a.service_area_id = b.id AND b.delete_ts IS NULL GROUP BY b.service_area_name, a.car_type ORDER BY 服务区名称, 车辆总数 DESC;"
+  },
+  {
+    "question": "国庆黄金周与平日车流量对比分析",
+    "sql": "SELECT CASE WHEN count_date BETWEEN '2023-10-01' AND '2023-10-07' THEN '国庆假期' ELSE '普通工作日' END AS 日期类型, SUM(customer_count) AS 总车流量, COUNT(DISTINCT count_date) AS 天数, ROUND(AVG(customer_count),2) AS 日均车流 FROM bss_car_day_count WHERE count_date BETWEEN '2023-10-01' AND '2023-10-14' GROUP BY 日期类型;"
+  },
+  {
+    "question": "某服务区各星期日车流量分布情况",
+    "sql": "SELECT EXTRACT(ISODOW FROM count_date) AS 星期编号, TO_CHAR(count_date, 'Day') AS 星期名称, AVG(customer_count) AS 平均车流量 FROM bss_car_day_count WHERE service_area_id = 'SA001' AND count_date >= CURRENT_DATE - 90 GROUP BY 星期编号, 星期名称 ORDER BY 星期编号;"
+  },
+  {
+    "question": "年度车流量最高TOP10日期明细",
+    "sql": "SELECT count_date AS 统计日期, SUM(customer_count) AS 当日车流 FROM bss_car_day_count WHERE count_date >= DATE_TRUNC('year', CURRENT_DATE) GROUP BY count_date ORDER BY 当日车流 DESC LIMIT 10;"
+  },
+  {
+    "question": "某服务区新能源车占比月度变化趋势",
+    "sql": "SELECT DATE_TRUNC('month', count_date) AS 统计月份, SUM(CASE WHEN car_type IN ('电动客车','电动货车') THEN customer_count ELSE 0 END) AS 新能源车流量, SUM(customer_count) AS 总车流, ROUND(SUM(CASE WHEN car_type IN ('电动客车','电动货车') THEN customer_count ELSE 0 END)*100/SUM(customer_count),2) AS 新能源占比 FROM bss_car_day_count WHERE service_area_id = 'SA002' GROUP BY 统计月份 ORDER BY 统计月份;"
+  },
+  {
+    "question": "各区域公司管辖服务区平均车流量对比",
+    "sql": "SELECT c.company_name AS 运营公司, COUNT(DISTINCT b.id) AS 管辖服务区数, ROUND(AVG(a.customer_count),2) AS 日均车流量 FROM bss_car_day_count a JOIN bss_service_area b ON a.service_area_id = b.id AND b.delete_ts IS NULL JOIN bss_company c ON b.company_id = c.id WHERE a.count_date = CURRENT_DATE GROUP BY c.company_name;"
+  },
+  {
+    "question": "各运营管理公司的服务区数量排名情况如何?",
+    "sql": "SELECT bc.company_name AS 公司名称, COUNT(bsa.id) AS 服务区数量 FROM bss_service_area bsa JOIN bss_company bc ON bsa.company_id = bc.id WHERE bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 服务区数量 DESC LIMIT 10;"
+  },
+  {
+    "question": "当前各运营状态下的服务区数量分布情况?",
+    "sql": "SELECT service_state AS 运营状态, COUNT(*) AS 数量 FROM bss_service_area WHERE delete_ts IS NULL GROUP BY service_state ORDER BY 数量 DESC;"
+  },
+  {
+    "question": "XX公司管理的各类型服务区数量占比分析",
+    "sql": "SELECT service_area_type AS 服务区类型, COUNT(*) AS 数量 FROM bss_service_area WHERE company_id = (SELECT id FROM bss_company WHERE company_name = 'XX公司') AND delete_ts IS NULL GROUP BY service_area_type;"
+  },
+  {
+    "question": "最近一周新增的各公司服务区数量统计",
+    "sql": "SELECT bc.company_name AS 公司名称, COUNT(bsa.id) AS 新增数量 FROM bss_service_area bsa JOIN bss_company bc ON bsa.company_id = bc.id WHERE bsa.create_ts >= CURRENT_DATE - 7 AND bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 新增数量 DESC;"
+  },
+  {
+    "question": "各公司服务区日均订单数对比分析",
+    "sql": "SELECT bc.company_name AS 公司名称, AVG(bdd.order_sum) AS 日均订单数 FROM bss_business_day_data bdd JOIN bss_service_area bsa ON bdd.service_no = bsa.service_area_no JOIN bss_company bc ON bsa.company_id = bc.id WHERE bdd.delete_ts IS NULL AND bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 日均订单数 DESC;"
+  },
+  {
+    "question": "各区域服务区覆盖率(按路段关联数量统计)",
+    "sql": "SELECT bc.company_name AS 公司名称, COUNT(DISTINCT bsr.section_route_id) AS 覆盖路段数 FROM bss_section_route_area_link bsral JOIN bss_service_area bsa ON bsral.service_area_id = bsa.id JOIN bss_company bc ON bsa.company_id = bc.id WHERE bsral.delete_ts IS NULL AND bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 覆盖路段数 DESC;"
+  },
+  {
+    "question": "各公司正常运营与非正常运营服务区数量对比",
+    "sql": "SELECT bc.company_name AS 公司名称, bsa.service_state AS 运营状态, COUNT(*) AS 数量 FROM bss_service_area bsa JOIN bss_company bc ON bsa.company_id = bc.id WHERE bsa.delete_ts IS NULL AND bc.delete_ts IS NULL GROUP BY bc.company_name, bsa.service_state ORDER BY 公司名称, 数量 DESC;"
+  },
+  {
+    "question": "2023年Q2各公司服务区总支付金额环比分析",
+    "sql": "SELECT bc.company_name AS 公司名称, SUM(CASE WHEN EXTRACT(QUARTER FROM bdd.oper_date) = 2 THEN bdd.pay_sum ELSE 0 END) AS 第二季度金额, SUM(CASE WHEN EXTRACT(QUARTER FROM bdd.oper_date) = 1 THEN bdd.pay_sum ELSE 0 END) AS 第一季度金额 FROM bss_business_day_data bdd JOIN bss_service_area bsa ON bdd.service_no = bsa.service_area_no JOIN bss_company bc ON bsa.company_id = bc.id WHERE bdd.oper_date BETWEEN '2023-01-01' AND '2023-06-30' AND bdd.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 第二季度金额 DESC;"
+  },
+  {
+    "question": "各公司管理的服务区车辆流量TOP5统计",
+    "sql": "SELECT bc.company_name AS 公司名称, SUM(cc.customer_count) AS 总车流量 FROM bss_car_day_count cc JOIN bss_service_area bsa ON cc.service_area_id = bsa.id JOIN bss_company bc ON bsa.company_id = bc.id WHERE cc.count_date = CURRENT_DATE - 1 AND cc.delete_ts IS NULL AND bsa.delete_ts IS NULL GROUP BY bc.company_name ORDER BY 总车流量 DESC LIMIT 5;"
+  },
+  {
+    "question": "查找近30天无业务数据的服务区清单",
+    "sql": "SELECT bsa.service_area_name AS 服务区名称, bc.company_name AS 管理公司 FROM bss_service_area bsa LEFT JOIN bss_business_day_data bdd ON bsa.service_area_no = bdd.service_no AND bdd.oper_date >= CURRENT_DATE - 30 JOIN bss_company bc ON bsa.company_id = bc.id WHERE bdd.id IS NULL AND bsa.delete_ts IS NULL AND bc.delete_ts IS NULL LIMIT 10;"
+  },
+  {
+    "question": "统计各路段路线关联的服务区数量,评估服务区覆盖密度",
+    "sql": "SELECT sr.route_name AS 路线名称, COUNT(link.service_area_id) AS 服务区数量 FROM bss_section_route sr LEFT JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL GROUP BY sr.route_name ORDER BY 服务区数量 DESC;"
+  },
+  {
+    "question": "查询最近一个月新增的路段路线与服务区关联关系",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, sr.section_name AS 路段名称, sr.route_name AS 路线名称 FROM bss_section_route_area_link link JOIN bss_section_route sr ON link.section_route_id = sr.id JOIN bss_service_area sa ON link.service_area_id = sa.id WHERE sr.create_ts >= NOW() - INTERVAL '1 month' AND sr.delete_ts IS NULL AND sa.delete_ts IS NULL ORDER BY sr.create_ts DESC LIMIT 10;"
+  },
+  {
+    "question": "分析各服务区关联的路段路线数量TOP10",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, COUNT(sr.id) AS 关联路段数 FROM bss_section_route_area_link link JOIN bss_service_area sa ON link.service_area_id = sa.id JOIN bss_section_route sr ON link.section_route_id = sr.id WHERE sa.delete_ts IS NULL GROUP BY sa.service_area_name ORDER BY 关联路段数 DESC LIMIT 10;"
+  },
+  {
+    "question": "统计无服务区覆盖的路段路线信息",
+    "sql": "SELECT sr.section_name AS 路段名称, sr.route_name AS 路线名称 FROM bss_section_route sr LEFT JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE link.section_route_id IS NULL AND sr.delete_ts IS NULL ORDER BY sr.create_ts DESC;"
+  },
+  {
+    "question": "分析不同路线名称对应的服务区平均覆盖密度",
+    "sql": "SELECT route_name AS 路线名称, AVG(service_count) AS 平均服务区密度 FROM (SELECT sr.route_name, COUNT(link.service_area_id) AS service_count FROM bss_section_route sr LEFT JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL GROUP BY sr.route_name, sr.id) sub GROUP BY route_name HAVING AVG(service_count) > 0 ORDER BY 平均服务区密度 DESC;"
+  },
+  {
+    "question": "查询包含服务区最多的3个路段编号及其覆盖情况",
+    "sql": "SELECT sr.code AS 路段编号, sr.section_name AS 路段名称, COUNT(link.service_area_id) AS 服务区数量 FROM bss_section_route sr JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL GROUP BY sr.code, sr.section_name ORDER BY 服务区数量 DESC LIMIT 3;"
+  },
+  {
+    "question": "分析服务区关联路段的创建时间分布情况",
+    "sql": "SELECT EXTRACT(MONTH FROM sr.create_ts) AS 月份, COUNT(*) AS 新增路段数 FROM bss_section_route sr JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL GROUP BY 月份 ORDER BY 月份;"
+  },
+  {
+    "question": "统计双向路线(上下行)的服务区覆盖对称性",
+    "sql": "SELECT sr.code AS 路段编号, COUNT(DISTINCT CASE WHEN sr.route_name LIKE '%上行%' THEN link.service_area_id END) AS 上行服务区数, COUNT(DISTINCT CASE WHEN sr.route_name LIKE '%下行%' THEN link.service_area_id END) AS 下行服务区数 FROM bss_section_route sr JOIN bss_section_route_area_link link ON sr.id = link.section_route_id WHERE sr.delete_ts IS NULL AND (sr.route_name LIKE '%上行%' OR sr.route_name LIKE '%下行%') GROUP BY sr.code HAVING COUNT(DISTINCT CASE WHEN sr.route_name LIKE '%上行%' THEN link.service_area_id END) != COUNT(DISTINCT CASE WHEN sr.route_name LIKE '%下行%' THEN link.service_area_id END);"
+  },
+  {
+    "question": "查询最近7天内未产生业务数据的服务区关联路段",
+    "sql": "SELECT sr.section_name AS 路段名称, sa.service_area_name AS 服务区名称 FROM bss_section_route_area_link link JOIN bss_section_route sr ON link.section_route_id = sr.id JOIN bss_service_area sa ON link.service_area_id = sa.id LEFT JOIN bss_business_day_data business ON sa.id = business.service_no::uuid AND business.oper_date >= NOW() - INTERVAL '7 days' WHERE business.id IS NULL AND sr.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sr.section_name, sa.service_area_name;"
+  },
+  {
+    "question": "分析不同运营状态服务区的路段覆盖分布",
+    "sql": "SELECT sa.service_state AS 运营状态, COUNT(DISTINCT sr.id) AS 覆盖路段数, COUNT(DISTINCT link.service_area_id) AS 服务区数量 FROM bss_section_route sr JOIN bss_section_route_area_link link ON sr.id = link.section_route_id JOIN bss_service_area sa ON link.service_area_id = sa.id WHERE sr.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sa.service_state ORDER BY 覆盖路段数 DESC;"
+  },
+  {
+    "question": "各服务区微信支付渗透率及订单占比分析(按订单量排序)",
+    "sql": "SELECT service_name AS \"服务区名称\", SUM(wx_order)/SUM(order_sum) AS \"微信支付渗透率\" FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name ORDER BY \"微信支付渗透率\" DESC;"
+  },
+  {
+    "question": "不同地区支付宝与现金支付金额对比(取平均值排序)",
+    "sql": "SELECT sa.service_area_type AS \"服务区类型\", AVG(bd.zfb) AS \"平均支付宝支付\", AVG(bd.rmb) AS \"平均现金支付\" FROM bss_business_day_data bd JOIN bss_service_area sa ON bd.service_no = sa.service_area_no WHERE bd.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sa.service_area_type ORDER BY \"平均支付宝支付\" DESC;"
+  },
+  {
+    "question": "档口支付方式金额占比TOP5(按微信支付优先级排序)",
+    "sql": "SELECT branch_name AS \"档口名称\", wx/SUM(pay_sum) OVER(PARTITION BY branch_name) AS \"微信占比\" FROM bss_business_day_data WHERE delete_ts IS NULL ORDER BY \"微信占比\" DESC LIMIT 5;"
+  },
+  {
+    "question": "最近7天各支付类型订单趋势变化(按日期聚合)",
+    "sql": "SELECT oper_date AS \"统计日期\", SUM(wx_order) AS \"微信订单\", SUM(zf_order) AS \"支付宝订单\", SUM(rmb_order) AS \"现金订单\" FROM bss_business_day_data WHERE delete_ts IS NULL AND oper_date >= CURRENT_DATE - INTERVAL '7 days' GROUP BY oper_date ORDER BY oper_date;"
+  },
+  {
+    "question": "现金支付占比超过30%的服务区及天数统计",
+    "sql": "SELECT service_name AS \"服务区名称\", COUNT(*) AS \"高现金支付天数\" FROM bss_business_day_data WHERE delete_ts IS NULL AND rmb_order/order_sum > 0.3 GROUP BY service_name ORDER BY \"高现金支付天数\" DESC;"
+  },
+  {
+    "question": "不同档口微信支付平均金额对比(取TOP10)",
+    "sql": "SELECT branch_name AS \"档口名称\", AVG(wx) AS \"平均微信支付金额\" FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY branch_name ORDER BY \"平均微信支付金额\" DESC LIMIT 10;"
+  },
+  {
+    "question": "服务区各支付方式渗透率对比(按服务类型分组)",
+    "sql": "SELECT sa.service_area_type AS \"服务区类型\", bd.service_name AS \"服务区名称\", SUM(xs_order)/SUM(order_sum) AS \"行吧支付渗透率\" FROM bss_business_day_data bd JOIN bss_service_area sa ON bd.service_no = sa.service_area_no WHERE bd.delete_ts IS NULL GROUP BY sa.service_area_type, bd.service_name ORDER BY sa.service_area_type, \"行吧支付渗透率\" DESC;"
+  },
+  {
+    "question": "支付宝订单占比最高的前三天数据明细",
+    "sql": "SELECT oper_date AS \"统计日期\", service_name AS \"服务区名称\", zf_order AS \"支付宝订单数\", order_sum AS \"总订单数\" FROM bss_business_day_data WHERE delete_ts IS NULL ORDER BY zf_order/order_sum DESC LIMIT 3;"
+  },
+  {
+    "question": "行吧支付使用率最低的五个服务区(按订单量)",
+    "sql": "SELECT service_name AS \"服务区名称\", SUM(xs_order)/SUM(order_sum) AS \"行吧支付渗透率\" FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name ORDER BY \"行吧支付渗透率\" ASC LIMIT 5;"
+  },
+  {
+    "question": "档口支付结构稳定性分析(计算各支付方式金额方差)",
+    "sql": "SELECT branch_name AS \"档口名称\", VARIANCE(wx) AS \"微信支付方差\", VARIANCE(zf) AS \"支付宝支付方差\" FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY branch_name ORDER BY \"微信支付方差\" + \"支付宝支付方差\" DESC LIMIT 10;"
+  }
+]

+ 11 - 0
data_pipeline/training_data/task_20250702_174000/table_list.txt

@@ -0,0 +1,11 @@
+# 表清单文件
+# 生成时间: 2025-07-02 18:07:15
+# 表数量: 7
+
+bss_car_day_count
+bss_business_day_data
+bss_company
+bss_section_route
+bss_section_route_area_link
+bss_service_area
+bss_service_area_mapper

+ 15 - 0
data_pipeline/training_data/task_20250702_174000/task_config.json

@@ -0,0 +1,15 @@
+{
+  "task_id": "task_20250702_174000",
+  "created_at": "2025-07-02T17:40:00.268100",
+  "parameters": {
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:6432/highway_db",
+    "table_list_file": "{task_directory}/table_list.txt",
+    "business_context": "高速公路服务区管理系统",
+    "file_upload_mode": true,
+    "enable_llm_repair": true,
+    "modify_original_file": true,
+    "enable_sql_validation": true,
+    "enable_training_data_load": true
+  },
+  "output_directory": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000"
+}

+ 117 - 0
data_pipeline/training_data/task_20250702_174000/task_result.json

@@ -0,0 +1,117 @@
+{
+  "success": true,
+  "workflow_state": {
+    "start_time": null,
+    "end_time": null,
+    "current_step": "training_data_load",
+    "completed_steps": [
+      "ddl_md_generation",
+      "question_sql_generation",
+      "sql_validation",
+      "training_data_load"
+    ],
+    "failed_steps": [],
+    "artifacts": {
+      "ddl_md_generation": {
+        "total_tables": 7,
+        "processed_successfully": 0,
+        "failed": 7,
+        "files_generated": 0,
+        "duration": 368.9130046367645
+      },
+      "question_sql_generation": {
+        "output_file": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000\\qs_highway_db_20250702_191655_pair.json",
+        "total_questions": 50,
+        "total_themes": 5,
+        "successful_themes": 5,
+        "failed_themes": [],
+        "duration": 424.0814118385315
+      },
+      "sql_validation": {
+        "original_sql_count": 50,
+        "valid_sql_count": 47,
+        "invalid_sql_count": 3,
+        "success_rate": 0.94,
+        "repair_stats": {
+          "attempted": 3,
+          "successful": 0,
+          "failed": 3
+        },
+        "file_modification_stats": {
+          "modified": 0,
+          "deleted": 3,
+          "failed_modifications": 0
+        },
+        "average_execution_time": 0.051609673500061036,
+        "total_retries": 0,
+        "duration": 145.22257566452026
+      },
+      "training_data_load": {
+        "training_data_dir": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000",
+        "load_successful": true,
+        "total_records": 506,
+        "data_type_counts": {
+          "sql": 442,
+          "documentation": 34,
+          "ddl": 29,
+          "error_sql": 1
+        },
+        "duration": 73.11930394172668
+      }
+    },
+    "statistics": {
+      "step1_duration": 368.9130046367645,
+      "step2_duration": 424.0814118385315,
+      "step3_duration": 145.22257566452026,
+      "step4_duration": 73.11930394172668
+    }
+  },
+  "artifacts": {
+    "ddl_md_generation": {
+      "total_tables": 7,
+      "processed_successfully": 0,
+      "failed": 7,
+      "files_generated": 0,
+      "duration": 368.9130046367645
+    },
+    "question_sql_generation": {
+      "output_file": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000\\qs_highway_db_20250702_191655_pair.json",
+      "total_questions": 50,
+      "total_themes": 5,
+      "successful_themes": 5,
+      "failed_themes": [],
+      "duration": 424.0814118385315
+    },
+    "sql_validation": {
+      "original_sql_count": 50,
+      "valid_sql_count": 47,
+      "invalid_sql_count": 3,
+      "success_rate": 0.94,
+      "repair_stats": {
+        "attempted": 3,
+        "successful": 0,
+        "failed": 3
+      },
+      "file_modification_stats": {
+        "modified": 0,
+        "deleted": 3,
+        "failed_modifications": 0
+      },
+      "average_execution_time": 0.051609673500061036,
+      "total_retries": 0,
+      "duration": 145.22257566452026
+    },
+    "training_data_load": {
+      "training_data_dir": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000",
+      "load_successful": true,
+      "total_records": 506,
+      "data_type_counts": {
+        "sql": 442,
+        "documentation": 34,
+        "ddl": 29,
+        "error_sql": 1
+      },
+      "duration": 73.11930394172668
+    }
+  }
+}

+ 31 - 0
data_pipeline/training_data/task_20250702_194611/bss_business_day_data.ddl

@@ -0,0 +1,31 @@
+-- 中文名: 业务支撑系统每日业务统计表
+-- 描述: 业务支撑系统每日业务统计表
+create table public.bss_business_day_data (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人,
+  oper_date date              -- 统计日期,
+  service_no varchar(255)     -- 服务区编码,
+  service_name varchar(255)   -- 服务区名称,
+  branch_no varchar(255)      -- 档口编码,
+  branch_name varchar(255)    -- 档口名称,
+  wx numeric(19,4)            -- 微信支付金额,
+  wx_order integer            -- 微信订单数量,
+  zfb numeric(19,4)           -- 支付宝支付金额,
+  zf_order integer            -- 支付宝订单数量,
+  rmb numeric(19,4)           -- 现金支付金额,
+  rmb_order integer           -- 现金支付订单数量,
+  xs numeric(19,4)            -- 行吧支付金额,
+  xs_order integer            -- 行吧支付订单数,
+  jd numeric(19,4)            -- 金豆支付金额,
+  jd_order integer            -- 金豆支付订单数,
+  order_sum integer           -- 订单总数,
+  pay_sum numeric(19,4)       -- 支付总金额,
+  source_type integer         -- 数据来源类型,
+  primary key (id)
+);

+ 31 - 0
data_pipeline/training_data/task_20250702_194611/bss_business_day_data_detail.md

@@ -0,0 +1,31 @@
+## bss_business_day_data(业务支撑系统每日业务统计表)
+bss_business_day_data 表业务支撑系统每日业务统计表
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人
+- oper_date (date) - 统计日期
+- service_no (varchar(255)) - 服务区编码
+- service_name (varchar(255)) - 服务区名称
+- branch_no (varchar(255)) - 档口编码
+- branch_name (varchar(255)) - 档口名称
+- wx (numeric(19,4)) - 微信支付金额
+- wx_order (integer) - 微信订单数量
+- zfb (numeric(19,4)) - 支付宝支付金额
+- zf_order (integer) - 支付宝订单数量
+- rmb (numeric(19,4)) - 现金支付金额
+- rmb_order (integer) - 现金支付订单数量
+- xs (numeric(19,4)) - 行吧支付金额
+- xs_order (integer) - 行吧支付订单数
+- jd (numeric(19,4)) - 金豆支付金额
+- jd_order (integer) - 金豆支付订单数
+- order_sum (integer) - 订单总数
+- pay_sum (numeric(19,4)) - 支付总金额
+- source_type (integer) - 数据来源类型
+字段补充说明:
+- id 为主键

+ 17 - 0
data_pipeline/training_data/task_20250702_194611/bss_car_day_count.ddl

@@ -0,0 +1,17 @@
+-- 中文名: 车辆日统计表:按类别统计服务区每日车流量
+-- 描述: 车辆日统计表:按类别统计服务区每日车流量,支撑运营分析与资源调度
+create table public.bss_car_day_count (
+  id varchar(32) not null     -- 记录ID,主键,
+  version integer not null    -- 数据版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人,
+  customer_count bigint       -- 车辆数量,
+  car_type varchar(100)       -- 车辆类别,
+  count_date date             -- 统计日期,
+  service_area_id varchar(32) -- 服务区ID,
+  primary key (id)
+);

+ 17 - 0
data_pipeline/training_data/task_20250702_194611/bss_car_day_count_detail.md

@@ -0,0 +1,17 @@
+## bss_car_day_count(车辆日统计表:按类别统计服务区每日车流量)
+bss_car_day_count 表车辆日统计表:按类别统计服务区每日车流量,支撑运营分析与资源调度
+字段列表:
+- id (varchar(32)) - 记录ID [主键, 非空]
+- version (integer) - 数据版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人
+- customer_count (bigint) - 车辆数量
+- car_type (varchar(100)) - 车辆类别
+- count_date (date) - 统计日期
+- service_area_id (varchar(32)) - 服务区ID
+字段补充说明:
+- id 为主键

+ 15 - 0
data_pipeline/training_data/task_20250702_194611/bss_company.ddl

@@ -0,0 +1,15 @@
+-- 中文名: 服务区公司信息表
+-- 描述: 服务区公司信息表,存储运营主体基础数据,支持公司编码、名称及变更记录管理。
+create table public.bss_company (
+  id varchar(32) not null     -- 公司ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人ID,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人ID,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人ID,
+  company_name varchar(255)   -- 公司名称,
+  company_no varchar(255)     -- 公司编码,
+  primary key (id)
+);

+ 15 - 0
data_pipeline/training_data/task_20250702_194611/bss_company_detail.md

@@ -0,0 +1,15 @@
+## bss_company(服务区公司信息表)
+bss_company 表服务区公司信息表,存储运营主体基础数据,支持公司编码、名称及变更记录管理。
+字段列表:
+- id (varchar(32)) - 公司ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人ID
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人ID
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人ID
+- company_name (varchar(255)) - 公司名称
+- company_no (varchar(255)) - 公司编码
+字段补充说明:
+- id 为主键

+ 16 - 0
data_pipeline/training_data/task_20250702_194611/bss_section_route.ddl

@@ -0,0 +1,16 @@
+-- 中文名: 业务支撑系统路段路线关联表
+-- 描述: 业务支撑系统路段路线关联表,记录路段与路线名称对应关系,用于服务区位置管理及路网信息维护
+create table public.bss_section_route (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人,
+  section_name varchar(255)   -- 路段名称,
+  route_name varchar(255)     -- 路线名称,
+  code varchar(255)           -- 编号,
+  primary key (id)
+);

+ 7 - 0
data_pipeline/training_data/task_20250702_194611/bss_section_route_area_link.ddl

@@ -0,0 +1,7 @@
+-- 中文名: 存储路线段与服务区关联关系
+-- 描述: 存储路线段与服务区关联关系,管理高速线路与服务区归属
+create table public.bss_section_route_area_link (
+  section_route_id varchar(32) not null -- 路段路线ID,主键,
+  service_area_id varchar(32) not null -- 服务区编码,主键,
+  primary key (section_route_id, service_area_id)
+);

+ 7 - 0
data_pipeline/training_data/task_20250702_194611/bss_section_route_area_link_detail.md

@@ -0,0 +1,7 @@
+## bss_section_route_area_link(存储路线段与服务区关联关系)
+bss_section_route_area_link 表存储路线段与服务区关联关系,管理高速线路与服务区归属
+字段列表:
+- section_route_id (varchar(32)) - 路段路线ID [主键, 非空]
+- service_area_id (varchar(32)) - 服务区编码 [主键, 非空]
+字段补充说明:
+- 复合主键:section_route_id, service_area_id

+ 16 - 0
data_pipeline/training_data/task_20250702_194611/bss_section_route_detail.md

@@ -0,0 +1,16 @@
+## bss_section_route(业务支撑系统路段路线关联表)
+bss_section_route 表业务支撑系统路段路线关联表,记录路段与路线名称对应关系,用于服务区位置管理及路网信息维护
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人
+- section_name (varchar(255)) - 路段名称
+- route_name (varchar(255)) - 路线名称
+- code (varchar(255)) - 编号
+字段补充说明:
+- id 为主键

+ 19 - 0
data_pipeline/training_data/task_20250702_194611/bss_service_area.ddl

@@ -0,0 +1,19 @@
+-- 中文名: 存储高速公路服务区基础信息及管理记录
+-- 描述: 存储高速公路服务区基础信息及管理记录,包含服务区名称、编码、创建/更新时间等,用于统一管理服务区数据。
+create table public.bss_service_area (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人ID,
+  update_ts timestamp         -- 最后更新时间,
+  updated_by varchar(50)      -- 最后更新人ID,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人ID,
+  service_area_name varchar(255) -- 服务区名称,
+  service_area_no varchar(255) -- 服务区编码,
+  company_id varchar(32)      -- 所属公司ID,
+  service_position varchar(255) -- 服务区经纬度,
+  service_area_type varchar(50) -- 服务区类型,
+  service_state varchar(50)   -- 服务区状态,
+  primary key (id)
+);

+ 19 - 0
data_pipeline/training_data/task_20250702_194611/bss_service_area_detail.md

@@ -0,0 +1,19 @@
+## bss_service_area(存储高速公路服务区基础信息及管理记录)
+bss_service_area 表存储高速公路服务区基础信息及管理记录,包含服务区名称、编码、创建/更新时间等,用于统一管理服务区数据。
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人ID
+- update_ts (timestamp) - 最后更新时间
+- updated_by (varchar(50)) - 最后更新人ID
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人ID
+- service_area_name (varchar(255)) - 服务区名称
+- service_area_no (varchar(255)) - 服务区编码
+- company_id (varchar(32)) - 所属公司ID
+- service_position (varchar(255)) - 服务区经纬度
+- service_area_type (varchar(50)) - 服务区类型
+- service_state (varchar(50)) - 服务区状态
+字段补充说明:
+- id 为主键

+ 18 - 0
data_pipeline/training_data/task_20250702_194611/bss_service_area_mapper.ddl

@@ -0,0 +1,18 @@
+-- 中文名: 记录BSS与服务区编码的映射关系
+-- 描述: 记录BSS与服务区编码的映射关系,包含版本、维护人及状态,用于跨系统数据同步。
+create table public.bss_service_area_mapper (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人,
+  service_name varchar(255)   -- 服务区名称,
+  service_no varchar(255)     -- 服务区编码,
+  service_area_id varchar(32) -- 服务区ID,
+  source_system_type varchar(50) -- 数据来源系统类型,
+  source_type integer         -- 数据来源类别ID,
+  primary key (id)
+);

+ 18 - 0
data_pipeline/training_data/task_20250702_194611/bss_service_area_mapper_detail.md

@@ -0,0 +1,18 @@
+## bss_service_area_mapper(记录BSS与服务区编码的映射关系)
+bss_service_area_mapper 表记录BSS与服务区编码的映射关系,包含版本、维护人及状态,用于跨系统数据同步。
+字段列表:
+- id (varchar(32)) - 主键ID [主键, 非空]
+- version (integer) - 版本号 [非空]
+- create_ts (timestamp) - 创建时间
+- created_by (varchar(50)) - 创建人
+- update_ts (timestamp) - 更新时间
+- updated_by (varchar(50)) - 更新人
+- delete_ts (timestamp) - 删除时间
+- deleted_by (varchar(50)) - 删除人
+- service_name (varchar(255)) - 服务区名称
+- service_no (varchar(255)) - 服务区编码
+- service_area_id (varchar(32)) - 服务区ID
+- source_system_type (varchar(50)) - 数据来源系统类型
+- source_type (integer) - 数据来源类别ID
+字段补充说明:
+- id 为主键

+ 45 - 0
data_pipeline/training_data/task_20250702_194611/db_query_decision_prompt.txt

@@ -0,0 +1,45 @@
+{
+  "业务范围": "当前数据库存储的是高速公路服务区运营管理的相关数据,主要涉及服务区业务统计、车流量监测、基础信息维护及公司关联关系,包含以下业务数据:",
+  "数据范围": "包含服务区每日业务统计(支付金额/订单数)、车辆类型流量统计、服务区地理/运营信息、路段路线关联关系、运营公司信息等多维度数据",
+  "核心业务实体": [
+    {
+      "类型": "服务区",
+      "描述": "高速公路服务区基础信息及管理记录",
+      "字段": ["service_area_name", "service_area_no", "company_id", "service_position", "service_area_type", "service_state"]
+    },
+    {
+      "类型": "档口",
+      "描述": "服务区商户档口的经营单元",
+      "字段": ["branch_no", "branch_name"]
+    },
+    {
+      "类型": "车辆类型",
+      "描述": "服务区车流量分类统计维度",
+      "字段": ["car_type"]
+    },
+    {
+      "类型": "运营公司",
+      "描述": "服务区所属运营管理主体",
+      "字段": ["company_name", "company_no"]
+    },
+    {
+      "类型": "路段路线",
+      "描述": "高速公路路线段与服务区的空间关联关系",
+      "字段": ["section_name", "route_name", "code"]
+    }
+  ],
+  "关键业务指标": [
+    {
+      "类型": "支付分析",
+      "描述": "多支付渠道金额与订单统计(微信/支付宝/现金/行吧/金豆的支付金额及订单量,总支付金额与订单数)"
+    },
+    {
+      "类型": "车流监测",
+      "描述": "按车辆类型统计的服务区日车流量(customer_count)"
+    },
+    {
+      "类型": "运营状态",
+      "描述": "服务区运行状态分类(service_state)与数据来源类型(source_type)"
+    }
+  ]
+}

+ 10 - 0
data_pipeline/training_data/task_20250702_194611/filename_mapping.txt

@@ -0,0 +1,10 @@
+# 文件名映射报告
+# 格式: 原始表名 -> 实际文件名
+
+public.bss_business_day_data -> bss_business_day_data_detail.md
+public.bss_car_day_count -> bss_car_day_count_detail.md
+public.bss_company -> bss_company_detail.md
+public.bss_section_route -> bss_section_route_detail.md
+public.bss_section_route_area_link -> bss_section_route_area_link_detail.md
+public.bss_service_area -> bss_service_area_detail.md
+public.bss_service_area_mapper -> bss_service_area_mapper_detail.md

+ 62 - 0
data_pipeline/training_data/task_20250702_194611/metadata.txt

@@ -0,0 +1,62 @@
+-- Schema Tools生成的主题元数据
+-- 业务背景: 高速公路服务区管理系统
+-- 生成时间: 2025-07-02 20:03:05
+-- 数据库: highway_db
+
+-- 创建表(如果不存在)
+CREATE TABLE IF NOT EXISTS metadata (
+    id SERIAL PRIMARY KEY,    -- 主键
+    topic_name VARCHAR(100) NOT NULL,  -- 业务主题名称
+    description TEXT,                  -- 业务主体说明
+    related_tables TEXT[],			  -- 相关表名
+    biz_entities TEXT[],               -- 主要业务实体名称
+    biz_metrics TEXT[],                -- 主要业务指标名称
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP    -- 插入时间
+);
+
+-- 插入主题数据
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '日营收分析',
+  '分析各服务区/档口每日营收、支付方式分布及订单量变化趋势,优化经营策略',
+  'bss_business_day_data',
+  '服务区,档口,支付方式',
+  '收入趋势,支付分布,订单量对比'
+);
+
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '车流统计',
+  '统计各服务区不同车型车流量分布及日变化规律,指导设施规划与资源配置',
+  'bss_car_day_count,bss_service_area',
+  '服务区,车辆类型,统计日期',
+  '车流分布,高峰时段,环比增长'
+);
+
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '公司运营',
+  '对比不同运营公司管理的服务区数量、日均营收及车流量,评估运营效率差异',
+  'bss_company,bss_service_area,bss_business_day_data',
+  '运营公司,服务区,路段',
+  '营收排名,车流占比,单位效益'
+);
+
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '路线分布',
+  '分析不同高速路线对应服务区的车流量和消费活跃度,优化路网资源调配',
+  'bss_section_route,bss_section_route_area_link,bss_car_day_count',
+  '高速路线,服务区,统计日期',
+  '路线车流,消费热度,时段波动'
+);
+
+INSERT INTO metadata(topic_name, description, related_tables, biz_entities, biz_metrics) VALUES
+(
+  '支付偏好',
+  '研究各服务区不同支付方式的使用频率和金额占比,指导支付渠道优化决策',
+  'bss_business_day_data,bss_service_area',
+  '服务区,支付类型,档口',
+  '支付渗透率,金额占比,区域差异'
+);
+

+ 20 - 0
data_pipeline/training_data/task_20250702_194611/metadata_detail.md

@@ -0,0 +1,20 @@
+## metadata(存储分析主题元数据)
+
+`metadata` 主要描述了当前数据库包含了哪些数据内容,哪些分析主题,哪些指标等等。
+
+字段列表:
+
+- `id` (serial) - 主键ID [主键, 非空]
+- `topic_name` (varchar(100)) - 业务主题名称 [非空]
+- `description` (text) - 业务主题说明
+- `related_tables` (text[]) - 涉及的数据表 [示例: bss_car_day_count, bss_section_route]
+- `biz_entities` (text[]) - 主要业务实体名称 [示例: 路段, 车辆类型, 档口]
+- `biz_metrics` (text[]) - 主要业务指标名称 [示例: 营收排名, 时段波动, 消费热度]
+- `created_at` (timestamp) - 插入时间 [默认值: `CURRENT_TIMESTAMP`]
+
+字段补充说明:
+
+- `id` 为主键,自增;
+- `related_tables` 用于建立主题与具体明细表的依赖关系;
+- `biz_entities` 表示主题关注的核心对象,例如服务区、车辆、公司;
+- `biz_metrics` 表示该主题关注的业务分析指标,例如营收对比、趋势变化、占比结构等。

+ 194 - 0
data_pipeline/training_data/task_20250702_194611/qs_highway_db_20250702_200305_pair.json

@@ -0,0 +1,194 @@
+[
+  {
+    "question": "各服务区每日营收总额趋势分析(最近一周)",
+    "sql": "SELECT oper_date AS 统计日期, service_name AS 服务区名称, SUM(pay_sum) AS 总营收 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - INTERVAL '7 days' AND delete_ts IS NULL GROUP BY oper_date, service_name ORDER BY oper_date;"
+  },
+  {
+    "question": "某日各档口订单量TOP10",
+    "sql": "SELECT branch_name AS 档口名称, SUM(order_sum) AS 订单总量 FROM bss_business_day_data WHERE oper_date = '2023-10-05' AND delete_ts IS NULL GROUP BY branch_name ORDER BY 订单总量 DESC LIMIT 10;"
+  },
+  {
+    "question": "最近30天各支付方式金额分布占比",
+    "sql": "SELECT SUM(wx) AS 微信支付, SUM(zfb) AS 支付宝支付, SUM(rmb) AS 现金支付, SUM(xs) AS 行吧支付, SUM(jd) AS 金豆支付 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - INTERVAL '30 days' AND delete_ts IS NULL;"
+  },
+  {
+    "question": "最近一周日订单量变化趋势",
+    "sql": "SELECT oper_date AS 日期, SUM(order_sum) AS 日订单量 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - INTERVAL '7 days' AND delete_ts IS NULL GROUP BY oper_date ORDER BY 日期;"
+  },
+  {
+    "question": "月度营收最高的服务区TOP5",
+    "sql": "SELECT TO_CHAR(oper_date, 'YYYY-MM') AS 月份, service_name AS 服务区名称, SUM(pay_sum) AS 月度营收 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY TO_CHAR(oper_date, 'YYYY-MM'), service_name ORDER BY 月份, 月度营收 DESC LIMIT 5;"
+  },
+  {
+    "question": "各服务区现金支付比例分析",
+    "sql": "SELECT service_name AS 服务区名称, ROUND(SUM(rmb)/SUM(pay_sum)*100, 2) AS 现金支付占比 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - INTERVAL '30 days' AND delete_ts IS NULL GROUP BY service_name;"
+  },
+  {
+    "question": "国庆黄金周(10.1-10.7)每日营收与订单对比",
+    "sql": "SELECT oper_date AS 日期, SUM(pay_sum) AS 总营收, SUM(order_sum) AS 订单总量 FROM bss_business_day_data WHERE oper_date BETWEEN '2023-10-01' AND '2023-10-07' AND delete_ts IS NULL GROUP BY oper_date ORDER BY 日期;"
+  },
+  {
+    "question": "车流量与营收关联分析(按车辆类型)",
+    "sql": "SELECT c.car_type AS 车辆类型, SUM(b.pay_sum) AS 总营收, SUM(c.customer_count) AS 总车流量 FROM bss_business_day_data b JOIN bss_service_area_mapper m ON b.service_no = m.service_no JOIN bss_car_day_count c ON m.service_area_id = c.service_area_id AND b.oper_date = c.count_date WHERE b.delete_ts IS NULL GROUP BY c.car_type;"
+  },
+  {
+    "question": "异常支付数据检测(金额非零但订单数为零)",
+    "sql": "SELECT oper_date AS 统计日期, service_name AS 服务区名称, '微信' AS 支付方式 FROM bss_business_day_data WHERE wx > 0 AND wx_order = 0 AND delete_ts IS NULL UNION ALL SELECT oper_date, service_name, '支付宝' FROM bss_business_day_data WHERE zfb > 0 AND zf_order = 0 AND delete_ts IS NULL;"
+  },
+  {
+    "question": "各公司下属服务区月均营收排名",
+    "sql": "SELECT comp.company_name AS 公司名称, sa.service_area_name AS 服务区名称, AVG(bd.pay_sum) AS 日均营收 FROM bss_business_day_data bd JOIN bss_service_area sa ON bd.service_no = sa.service_area_no JOIN bss_company comp ON sa.company_id = comp.id WHERE bd.delete_ts IS NULL GROUP BY comp.company_name, sa.service_area_name ORDER BY 公司名称, 日均营收 DESC;"
+  },
+  {
+    "question": "统计2023年10月各服务区总车流量,按流量降序排列",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, SUM(car.customer_count) AS 总车流量 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id AND sa.delete_ts IS NULL WHERE car.count_date BETWEEN '2023-10-01' AND '2023-10-31' GROUP BY sa.service_area_name ORDER BY 总车流量 DESC;"
+  },
+  {
+    "question": "对比近30天不同车型的平均日车流量,找出最高车型",
+    "sql": "SELECT car_type AS 车型, AVG(customer_count) AS 平均日车流量 FROM bss_car_day_count WHERE count_date >= CURRENT_DATE - 30 GROUP BY car_type ORDER BY 平均日车流量 DESC LIMIT 1;"
+  },
+  {
+    "question": "分析最近7天每日车流量变化趋势",
+    "sql": "SELECT count_date AS 日期, SUM(customer_count) AS 日车流量 FROM bss_car_day_count WHERE count_date >= CURRENT_DATE - 7 GROUP BY count_date ORDER BY 日期;"
+  },
+  {
+    "question": "计算本月与上月总车流量的环比增长率",
+    "sql": "WITH this_month AS (SELECT SUM(customer_count) AS total FROM bss_car_day_count WHERE count_date >= date_trunc('month', CURRENT_DATE) AND count_date < date_trunc('month', CURRENT_DATE) + INTERVAL '1 month'), last_month AS (SELECT SUM(customer_count) AS total FROM bss_car_day_count WHERE count_date >= date_trunc('month', CURRENT_DATE) - INTERVAL '1 month' AND count_date < date_trunc('month', CURRENT_DATE)) SELECT (this_month.total - last_month.total) / last_month.total * 100 AS 环比增长率 FROM this_month, last_month;"
+  },
+  {
+    "question": "查询XX服务区各车型数量及占比",
+    "sql": "SELECT car.car_type AS 车型, SUM(car.customer_count) AS 数量, ROUND(SUM(car.customer_count)*100.0/(SELECT SUM(customer_count) FROM bss_car_day_count WHERE service_area_id = 'SA001'), 2) AS 占比百分比 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id WHERE sa.service_area_name = 'XX服务区' AND sa.delete_ts IS NULL GROUP BY car.car_type;"
+  },
+  {
+    "question": "找出上个月车流量最低的5个服务区",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, SUM(car.customer_count) AS 总车流量 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id WHERE car.count_date >= '2023-09-01' AND car.count_date <= '2023-09-30' AND sa.delete_ts IS NULL GROUP BY sa.service_area_name ORDER BY 总车流量 ASC LIMIT 5;"
+  },
+  {
+    "question": "统计国庆节前中后各一周总车流量分析节庆影响",
+    "sql": "SELECT '节前' AS 阶段, SUM(customer_count) AS 总流量 FROM bss_car_day_count WHERE count_date BETWEEN '2023-09-24' AND '2023-09-30' UNION ALL SELECT '节中', SUM(customer_count) FROM bss_car_day_count WHERE count_date BETWEEN '2023-10-01' AND '2023-10-07' UNION ALL SELECT '节后', SUM(customer_count) FROM bss_car_day_count WHERE count_date BETWEEN '2023-10-08' AND '2023-10-14';"
+  },
+  {
+    "question": "查询某公司下属各服务区车流分布及总流量",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, SUM(car.customer_count) AS 总车流量 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id JOIN bss_company com ON sa.company_id = com.id WHERE com.company_name = '某公司' AND com.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sa.service_area_name;"
+  },
+  {
+    "question": "分析连续三天车流量递增的服务区",
+    "sql": "SELECT DISTINCT t.service_area_name FROM (SELECT sa.id, sa.service_area_name, count_date, customer_count, LAG(customer_count, 1) OVER (PARTITION BY sa.id ORDER BY count_date) AS prev_day, LAG(customer_count, 2) OVER (PARTITION BY sa.id ORDER BY count_date) AS prev_prev_day FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id WHERE sa.delete_ts IS NULL AND count_date >= CURRENT_DATE - 5) t WHERE customer_count > prev_day AND prev_day > prev_prev_day;"
+  },
+  {
+    "question": "找出2023年同比增长率最高的月份",
+    "sql": "SELECT TO_CHAR(count_date, 'YYYY-MM') AS 月份, SUM(customer_count) AS 当月流量, SUM(customer_count) - LAG(SUM(customer_count), 12) OVER (ORDER BY TO_CHAR(count_date, 'YYYY-MM')) AS 同比增长 FROM bss_car_day_count GROUP BY TO_CHAR(count_date, 'YYYY-MM') ORDER BY 同比增长 DESC LIMIT 1;"
+  },
+  {
+    "question": "统计各运营公司所辖服务区数量,并按数量降序排列",
+    "sql": "SELECT b.company_name AS 公司名称, COUNT(a.id) AS 服务区数量 FROM bss_service_area a JOIN bss_company b ON a.company_id = b.id WHERE a.delete_ts IS NULL AND b.delete_ts IS NULL GROUP BY b.company_name ORDER BY 服务区数量 DESC;"
+  },
+  {
+    "question": "计算各公司最近一个月日均营收总额(万元),并显示环比上月增长率",
+    "sql": "WITH monthly AS (SELECT company_id, SUM(pay_sum) AS total, DATE_TRUNC('month', oper_date) AS mon FROM bss_business_day_data a JOIN bss_service_area b ON a.service_no = b.service_area_no WHERE oper_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month') GROUP BY company_id, mon), growth AS (SELECT company_id, LEAD(total) OVER(PARTITION BY company_id ORDER BY mon) / total - 1 AS growth_rate FROM monthly) SELECT b.company_name, m.total/10000 AS 本月营收, g.growth_rate AS 环比增长率 FROM monthly m JOIN growth g ON m.company_id = g.company_id JOIN bss_company b ON m.company_id = b.id WHERE m.mon = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month');"
+  },
+  {
+    "question": "对比不同运营公司管辖服务区的季度累计车流量(辆次)",
+    "sql": "SELECT c.company_name AS 公司名称, SUM(car.customer_count) AS 总车流量 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id JOIN bss_company c ON sa.company_id = c.id WHERE car.count_date BETWEEN DATE_TRUNC('quarter', CURRENT_DATE) AND CURRENT_DATE AND sa.delete_ts IS NULL GROUP BY c.company_name ORDER BY 总车流量 DESC;"
+  },
+  {
+    "question": "获取最近一周日均营收TOP10服务区及其所属公司",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, c.company_name AS 运营公司, AVG(bdd.pay_sum) AS 日均营收 FROM bss_business_day_data bdd JOIN bss_service_area sa ON bdd.service_no = sa.service_area_no JOIN bss_company c ON sa.company_id = c.id WHERE bdd.oper_date >= CURRENT_DATE - 7 GROUP BY sa.service_area_name, c.company_name ORDER BY 日均营收 DESC LIMIT 10;"
+  },
+  {
+    "question": "分析各运营公司单日营收波动情况(最大值、最小值、标准差)",
+    "sql": "SELECT sa.company_id, MAX(bdd.pay_sum) AS 最高营收, MIN(bdd.pay_sum) AS 最低营收, STDDEV(bdd.pay_sum) AS 营收波动度 FROM bss_business_day_data bdd JOIN bss_service_area sa ON bdd.service_no = sa.service_area_no WHERE bdd.delete_ts IS NULL GROUP BY sa.company_id ORDER BY 营收波动度 DESC;"
+  },
+  {
+    "question": "计算各运营公司车流量占比(占全路网比例)",
+    "sql": "WITH total AS (SELECT SUM(customer_count) AS all_count FROM bss_car_day_count WHERE count_date = CURRENT_DATE - 1), company_count AS (SELECT sa.company_id, SUM(car.customer_count) AS com_count FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id WHERE car.count_date = CURRENT_DATE - 1 GROUP BY sa.company_id) SELECT c.company_name, (com_count * 100.0 / t.all_count) || '%%' AS 车流占比 FROM company_count cc JOIN bss_company c ON cc.company_id = c.id CROSS JOIN total t ORDER BY 车流占比 DESC;"
+  },
+  {
+    "question": "比较不同运营公司节假日(周末)与工作日营收差异率",
+    "sql": "SELECT sa.company_id, AVG(CASE WHEN EXTRACT(ISODOW FROM bdd.oper_date) IN (6,7) THEN bdd.pay_sum ELSE 0 END) AS 周末均值, AVG(CASE WHEN EXTRACT(ISODOW FROM bdd.oper_date) NOT IN (6,7) THEN bdd.pay_sum ELSE 0 END) AS 工作日均值, (AVG(CASE WHEN EXTRACT(ISODOW FROM bdd.oper_date) IN (6,7) THEN bdd.pay_sum END) / AVG(CASE WHEN EXTRACT(ISODOW FROM bdd.oper_date) NOT IN (6,7) THEN bdd.pay_sum END) - 1) * 100 || '%%' AS 差异率 FROM bss_business_day_data bdd JOIN bss_service_area sa ON bdd.service_no = sa.service_area_no GROUP BY sa.company_id;"
+  },
+  {
+    "question": "查询连续3天营收下降的异常服务区(含运营公司信息)",
+    "sql": "WITH ranked AS (SELECT service_no, oper_date, pay_sum, LAG(pay_sum,1) OVER(PARTITION BY service_no ORDER BY oper_date) AS prev1, LAG(pay_sum,2) OVER(PARTITION BY service_no ORDER BY oper_date) AS prev2 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - 5), decline AS (SELECT service_no FROM ranked WHERE pay_sum < prev1 AND prev1 < prev2) SELECT d.service_no, sa.service_area_name, c.company_name FROM decline d JOIN bss_service_area sa ON d.service_no = sa.service_area_no JOIN bss_company c ON sa.company_id = c.id;"
+  },
+  {
+    "question": "计算各运营公司单位效益(万元营收/千辆车次)",
+    "sql": "SELECT sa.company_id, SUM(bdd.pay_sum)/10000 AS 总营收, SUM(car.customer_count)/1000 AS 总车流, (SUM(bdd.pay_sum)/SUM(car.customer_count)) * 1000 AS 单位效益 FROM bss_business_day_data bdd JOIN bss_service_area sa ON bdd.service_no = sa.service_area_no JOIN bss_car_day_count car ON sa.id = car.service_area_id AND bdd.oper_date = car.count_date GROUP BY sa.company_id ORDER BY 单位效益 DESC;"
+  },
+  {
+    "question": "统计各高速路线对应服务区的总车流量,并按车流量降序排列",
+    "sql": "SELECT r.route_name AS 路线名称, SUM(c.customer_count) AS 总车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE c.delete_ts IS NULL GROUP BY r.route_name ORDER BY 总车流量 DESC;"
+  },
+  {
+    "question": "查询最近一周每日各路线的平均车流量并观察时段波动",
+    "sql": "SELECT count_date AS 统计日期, route_name AS 路线名称, AVG(customer_count) AS 平均车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE count_date >= CURRENT_DATE - 7 AND c.delete_ts IS NULL GROUP BY count_date, route_name ORDER BY count_date;"
+  },
+  {
+    "question": "查找2023年度车流量最高TOP5服务区及其所属路线",
+    "sql": "SELECT s.service_area_name AS 服务区名称, r.route_name AS 路线名称, SUM(c.customer_count) AS 年度总车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id JOIN bss_service_area s ON c.service_area_id = s.id WHERE EXTRACT(YEAR FROM count_date) = 2023 AND c.delete_ts IS NULL GROUP BY s.service_area_name, r.route_name ORDER BY 年度总车流量 DESC LIMIT 5;"
+  },
+  {
+    "question": "对比不同月份各路线的月均车流量变化趋势",
+    "sql": "SELECT EXTRACT(MONTH FROM count_date) AS 月份, route_name AS 路线名称, AVG(customer_count) AS 月均车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE c.delete_ts IS NULL GROUP BY 月份, route_name ORDER BY 月份;"
+  },
+  {
+    "question": "查询特定日期(2023-10-01)各路线的车流量并按路线分类汇总",
+    "sql": "SELECT r.route_name AS 路线名称, SUM(c.customer_count) AS 当日车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE count_date = '2023-10-01' AND c.delete_ts IS NULL GROUP BY r.route_name;"
+  },
+  {
+    "question": "分析每个服务区关联的路线数量并找出覆盖路线最多的服务区",
+    "sql": "SELECT service_area_id AS 服务区ID, COUNT(section_route_id) AS 关联路线数 FROM bss_section_route_area_link GROUP BY service_area_id ORDER BY 关联路线数 DESC LIMIT 1;"
+  },
+  {
+    "question": "查询沪昆高速沿线各服务区2023年Q4的月均车流量",
+    "sql": "SELECT s.service_area_name AS 服务区名称, AVG(c.customer_count) AS 月均车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id JOIN bss_service_area s ON c.service_area_id = s.id WHERE r.route_name = '沪昆高速' AND count_date BETWEEN '2023-10-01' AND '2023-12-31' AND c.delete_ts IS NULL GROUP BY s.service_area_name;"
+  },
+  {
+    "question": "统计各路线车流量占全路网总车流量的比例",
+    "sql": "SELECT route_name AS 路线名称, SUM(customer_count) * 100.0 / (SELECT SUM(customer_count) FROM bss_car_day_count WHERE delete_ts IS NULL) AS 占比百分比 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE c.delete_ts IS NULL GROUP BY route_name ORDER BY 占比百分比 DESC;"
+  },
+  {
+    "question": "分析杭州湾跨海大桥服务区过去30天每日车流量变化趋势",
+    "sql": "SELECT count_date AS 统计日期, customer_count AS 当日车流量 FROM bss_car_day_count WHERE service_area_id = (SELECT id FROM bss_service_area WHERE service_area_name = '杭州湾跨海大桥服务区') AND count_date >= CURRENT_DATE - 30 AND delete_ts IS NULL ORDER BY count_date;"
+  },
+  {
+    "question": "统计各服务区微信支付渗透率(使用订单数占比)TOP10",
+    "sql": "SELECT service_name AS 服务区名称, SUM(wx_order) / SUM(order_sum) AS 微信渗透率 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name ORDER BY 微信渗透率 DESC LIMIT 10;"
+  },
+  {
+    "question": "分析2023年Q2各支付方式金额占比趋势变化",
+    "sql": "SELECT oper_date AS 统计日期, SUM(wx)/SUM(pay_sum) AS 微信占比, SUM(zfb)/SUM(pay_sum) AS 支付宝占比, SUM(rmb)/SUM(pay_sum) AS 现金占比 FROM bss_business_day_data WHERE oper_date BETWEEN '2023-04-01' AND '2023-06-30' AND delete_ts IS NULL GROUP BY oper_date ORDER BY 统计日期;"
+  },
+  {
+    "question": "对比不同区域服务区现金支付占比分布",
+    "sql": "SELECT sa.service_area_type AS 区域类型, SUM(bd.rmb)/SUM(bd.pay_sum) AS 现金占比 FROM bss_business_day_data bd JOIN bss_service_area sa ON bd.service_no = sa.service_area_no WHERE bd.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sa.service_area_type;"
+  },
+  {
+    "question": "找出微信支付平均单笔金额最高的前5个档口",
+    "sql": "SELECT branch_name AS 档口名称, SUM(wx)/SUM(wx_order) AS 平均单笔金额 FROM bss_business_day_data WHERE wx_order > 0 AND delete_ts IS NULL GROUP BY branch_name ORDER BY 平均单笔金额 DESC LIMIT 5;"
+  },
+  {
+    "question": "统计各服务区支付宝支付渗透率(使用订单数)低于10%的记录",
+    "sql": "SELECT service_name AS 服务区名称, SUM(zf_order)/SUM(order_sum) AS 支付宝渗透率 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name HAVING SUM(zf_order)/SUM(order_sum) < 0.1;"
+  },
+  {
+    "question": "分析节假日(春节假期)期间各支付方式交易金额环比变化",
+    "sql": "SELECT oper_date AS 统计日期, SUM(wx) AS 微信交易额, SUM(zfb) AS 支付宝交易额, SUM(rmb) AS 现金交易额 FROM bss_business_day_data WHERE oper_date BETWEEN '2023-01-20' AND '2023-01-30' AND delete_ts IS NULL GROUP BY oper_date ORDER BY 统计日期;"
+  },
+  {
+    "question": "统计不同档口类型(餐饮/零售)的支付方式偏好对比",
+    "sql": "SELECT CASE WHEN branch_name LIKE '%餐饮%' THEN '餐饮' ELSE '零售' END AS 档口类型, SUM(wx)/SUM(pay_sum) AS 微信占比, SUM(zfb)/SUM(pay_sum) AS 支付宝占比 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY 档口类型;"
+  },
+  {
+    "question": "计算各服务区行吧支付方式的月均交易次数",
+    "sql": "SELECT service_name AS 服务区名称, EXTRACT(MONTH FROM oper_date) AS 月份, AVG(xs_order) AS 月均交易次数 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY 服务区名称, 月份 ORDER BY 服务区名称, 月份;"
+  },
+  {
+    "question": "找出金豆支付占比超过30%的服务区记录",
+    "sql": "SELECT service_name AS 服务区名称, SUM(jd)/SUM(pay_sum) AS 金豆占比 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name HAVING SUM(jd)/SUM(pay_sum) > 0.3;"
+  },
+  {
+    "question": "统计各区域档口数量与支付订单数的线性关系",
+    "sql": "SELECT service_name AS 服务区名称, COUNT(DISTINCT branch_no) AS 档口数量, SUM(order_sum) AS 总订单数 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name ORDER BY 总订单数 DESC;"
+  }
+]

+ 202 - 0
data_pipeline/training_data/task_20250702_194611/qs_highway_db_20250702_200305_pair.json.backup

@@ -0,0 +1,202 @@
+[
+  {
+    "question": "各服务区每日营收总额趋势分析(最近一周)",
+    "sql": "SELECT oper_date AS 统计日期, service_name AS 服务区名称, SUM(pay_sum) AS 总营收 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - INTERVAL '7 days' AND delete_ts IS NULL GROUP BY oper_date, service_name ORDER BY oper_date;"
+  },
+  {
+    "question": "某日各档口订单量TOP10",
+    "sql": "SELECT branch_name AS 档口名称, SUM(order_sum) AS 订单总量 FROM bss_business_day_data WHERE oper_date = '2023-10-05' AND delete_ts IS NULL GROUP BY branch_name ORDER BY 订单总量 DESC LIMIT 10;"
+  },
+  {
+    "question": "最近30天各支付方式金额分布占比",
+    "sql": "SELECT SUM(wx) AS 微信支付, SUM(zfb) AS 支付宝支付, SUM(rmb) AS 现金支付, SUM(xs) AS 行吧支付, SUM(jd) AS 金豆支付 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - INTERVAL '30 days' AND delete_ts IS NULL;"
+  },
+  {
+    "question": "最近一周日订单量变化趋势",
+    "sql": "SELECT oper_date AS 日期, SUM(order_sum) AS 日订单量 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - INTERVAL '7 days' AND delete_ts IS NULL GROUP BY oper_date ORDER BY 日期;"
+  },
+  {
+    "question": "月度营收最高的服务区TOP5",
+    "sql": "SELECT TO_CHAR(oper_date, 'YYYY-MM') AS 月份, service_name AS 服务区名称, SUM(pay_sum) AS 月度营收 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY TO_CHAR(oper_date, 'YYYY-MM'), service_name ORDER BY 月份, 月度营收 DESC LIMIT 5;"
+  },
+  {
+    "question": "各服务区现金支付比例分析",
+    "sql": "SELECT service_name AS 服务区名称, ROUND(SUM(rmb)/SUM(pay_sum)*100, 2) AS 现金支付占比 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - INTERVAL '30 days' AND delete_ts IS NULL GROUP BY service_name;"
+  },
+  {
+    "question": "国庆黄金周(10.1-10.7)每日营收与订单对比",
+    "sql": "SELECT oper_date AS 日期, SUM(pay_sum) AS 总营收, SUM(order_sum) AS 订单总量 FROM bss_business_day_data WHERE oper_date BETWEEN '2023-10-01' AND '2023-10-07' AND delete_ts IS NULL GROUP BY oper_date ORDER BY 日期;"
+  },
+  {
+    "question": "车流量与营收关联分析(按车辆类型)",
+    "sql": "SELECT c.car_type AS 车辆类型, SUM(b.pay_sum) AS 总营收, SUM(c.customer_count) AS 总车流量 FROM bss_business_day_data b JOIN bss_service_area_mapper m ON b.service_no = m.service_no JOIN bss_car_day_count c ON m.service_area_id = c.service_area_id AND b.oper_date = c.count_date WHERE b.delete_ts IS NULL GROUP BY c.car_type;"
+  },
+  {
+    "question": "异常支付数据检测(金额非零但订单数为零)",
+    "sql": "SELECT oper_date AS 统计日期, service_name AS 服务区名称, '微信' AS 支付方式 FROM bss_business_day_data WHERE wx > 0 AND wx_order = 0 AND delete_ts IS NULL UNION ALL SELECT oper_date, service_name, '支付宝' FROM bss_business_day_data WHERE zfb > 0 AND zf_order = 0 AND delete_ts IS NULL;"
+  },
+  {
+    "question": "各公司下属服务区月均营收排名",
+    "sql": "SELECT comp.company_name AS 公司名称, sa.service_area_name AS 服务区名称, AVG(bd.pay_sum) AS 日均营收 FROM bss_business_day_data bd JOIN bss_service_area sa ON bd.service_no = sa.service_area_no JOIN bss_company comp ON sa.company_id = comp.id WHERE bd.delete_ts IS NULL GROUP BY comp.company_name, sa.service_area_name ORDER BY 公司名称, 日均营收 DESC;"
+  },
+  {
+    "question": "统计2023年10月各服务区总车流量,按流量降序排列",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, SUM(car.customer_count) AS 总车流量 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id AND sa.delete_ts IS NULL WHERE car.count_date BETWEEN '2023-10-01' AND '2023-10-31' GROUP BY sa.service_area_name ORDER BY 总车流量 DESC;"
+  },
+  {
+    "question": "对比近30天不同车型的平均日车流量,找出最高车型",
+    "sql": "SELECT car_type AS 车型, AVG(customer_count) AS 平均日车流量 FROM bss_car_day_count WHERE count_date >= CURRENT_DATE - 30 GROUP BY car_type ORDER BY 平均日车流量 DESC LIMIT 1;"
+  },
+  {
+    "question": "分析最近7天每日车流量变化趋势",
+    "sql": "SELECT count_date AS 日期, SUM(customer_count) AS 日车流量 FROM bss_car_day_count WHERE count_date >= CURRENT_DATE - 7 GROUP BY count_date ORDER BY 日期;"
+  },
+  {
+    "question": "计算本月与上月总车流量的环比增长率",
+    "sql": "WITH this_month AS (SELECT SUM(customer_count) AS total FROM bss_car_day_count WHERE count_date >= date_trunc('month', CURRENT_DATE) AND count_date < date_trunc('month', CURRENT_DATE) + INTERVAL '1 month'), last_month AS (SELECT SUM(customer_count) AS total FROM bss_car_day_count WHERE count_date >= date_trunc('month', CURRENT_DATE) - INTERVAL '1 month' AND count_date < date_trunc('month', CURRENT_DATE)) SELECT (this_month.total - last_month.total) / last_month.total * 100 AS 环比增长率 FROM this_month, last_month;"
+  },
+  {
+    "question": "查询XX服务区各车型数量及占比",
+    "sql": "SELECT car.car_type AS 车型, SUM(car.customer_count) AS 数量, ROUND(SUM(car.customer_count)*100.0/(SELECT SUM(customer_count) FROM bss_car_day_count WHERE service_area_id = 'SA001'), 2) AS 占比百分比 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id WHERE sa.service_area_name = 'XX服务区' AND sa.delete_ts IS NULL GROUP BY car.car_type;"
+  },
+  {
+    "question": "找出上个月车流量最低的5个服务区",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, SUM(car.customer_count) AS 总车流量 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id WHERE car.count_date >= '2023-09-01' AND car.count_date <= '2023-09-30' AND sa.delete_ts IS NULL GROUP BY sa.service_area_name ORDER BY 总车流量 ASC LIMIT 5;"
+  },
+  {
+    "question": "统计国庆节前中后各一周总车流量分析节庆影响",
+    "sql": "SELECT '节前' AS 阶段, SUM(customer_count) AS 总流量 FROM bss_car_day_count WHERE count_date BETWEEN '2023-09-24' AND '2023-09-30' UNION ALL SELECT '节中', SUM(customer_count) FROM bss_car_day_count WHERE count_date BETWEEN '2023-10-01' AND '2023-10-07' UNION ALL SELECT '节后', SUM(customer_count) FROM bss_car_day_count WHERE count_date BETWEEN '2023-10-08' AND '2023-10-14';"
+  },
+  {
+    "question": "查询某公司下属各服务区车流分布及总流量",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, SUM(car.customer_count) AS 总车流量 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id JOIN bss_company com ON sa.company_id = com.id WHERE com.company_name = '某公司' AND com.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sa.service_area_name;"
+  },
+  {
+    "question": "分析连续三天车流量递增的服务区",
+    "sql": "SELECT DISTINCT sa.service_area_name FROM (SELECT sa.id, sa.service_area_name, count_date, customer_count, LAG(customer_count, 1) OVER (PARTITION BY sa.id ORDER BY count_date) AS prev_day, LAG(customer_count, 2) OVER (PARTITION BY sa.id ORDER BY count_date) AS prev_prev_day FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id WHERE sa.delete_ts IS NULL AND count_date >= CURRENT_DATE - 5) t WHERE customer_count > prev_day AND prev_day > prev_prev_day;"
+  },
+  {
+    "question": "找出2023年同比增长率最高的月份",
+    "sql": "SELECT TO_CHAR(count_date, 'YYYY-MM') AS 月份, SUM(customer_count) AS 当月流量, SUM(customer_count) - LAG(SUM(customer_count), 12) OVER (ORDER BY TO_CHAR(count_date, 'YYYY-MM')) AS 同比增长 FROM bss_car_day_count GROUP BY TO_CHAR(count_date, 'YYYY-MM') ORDER BY 同比增长 DESC LIMIT 1;"
+  },
+  {
+    "question": "统计各运营公司所辖服务区数量,并按数量降序排列",
+    "sql": "SELECT b.company_name AS 公司名称, COUNT(a.id) AS 服务区数量 FROM bss_service_area a JOIN bss_company b ON a.company_id = b.id WHERE a.delete_ts IS NULL AND b.delete_ts IS NULL GROUP BY b.company_name ORDER BY 服务区数量 DESC;"
+  },
+  {
+    "question": "计算各公司最近一个月日均营收总额(万元),并显示环比上月增长率",
+    "sql": "WITH monthly AS (SELECT company_id, SUM(pay_sum) AS total, DATE_TRUNC('month', oper_date) AS mon FROM bss_business_day_data a JOIN bss_service_area b ON a.service_no = b.service_area_no WHERE oper_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month') GROUP BY company_id, mon), growth AS (SELECT company_id, LEAD(total) OVER(PARTITION BY company_id ORDER BY mon) / total - 1 AS growth_rate FROM monthly) SELECT b.company_name, m.total/10000 AS 本月营收, g.growth_rate AS 环比增长率 FROM monthly m JOIN growth g ON m.company_id = g.company_id JOIN bss_company b ON m.company_id = b.id WHERE m.mon = DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month');"
+  },
+  {
+    "question": "对比不同运营公司管辖服务区的季度累计车流量(辆次)",
+    "sql": "SELECT c.company_name AS 公司名称, SUM(car.customer_count) AS 总车流量 FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id JOIN bss_company c ON sa.company_id = c.id WHERE car.count_date BETWEEN DATE_TRUNC('quarter', CURRENT_DATE) AND CURRENT_DATE AND sa.delete_ts IS NULL GROUP BY c.company_name ORDER BY 总车流量 DESC;"
+  },
+  {
+    "question": "获取最近一周日均营收TOP10服务区及其所属公司",
+    "sql": "SELECT sa.service_area_name AS 服务区名称, c.company_name AS 运营公司, AVG(bdd.pay_sum) AS 日均营收 FROM bss_business_day_data bdd JOIN bss_service_area sa ON bdd.service_no = sa.service_area_no JOIN bss_company c ON sa.company_id = c.id WHERE bdd.oper_date >= CURRENT_DATE - 7 GROUP BY sa.service_area_name, c.company_name ORDER BY 日均营收 DESC LIMIT 10;"
+  },
+  {
+    "question": "分析各运营公司单日营收波动情况(最大值、最小值、标准差)",
+    "sql": "SELECT sa.company_id, MAX(bdd.pay_sum) AS 最高营收, MIN(bdd.pay_sum) AS 最低营收, STDDEV(bdd.pay_sum) AS 营收波动度 FROM bss_business_day_data bdd JOIN bss_service_area sa ON bdd.service_no = sa.service_area_no WHERE bdd.delete_ts IS NULL GROUP BY sa.company_id ORDER BY 营收波动度 DESC;"
+  },
+  {
+    "question": "计算各运营公司车流量占比(占全路网比例)",
+    "sql": "WITH total AS (SELECT SUM(customer_count) AS all_count FROM bss_car_day_count WHERE count_date = CURRENT_DATE - 1), company_count AS (SELECT sa.company_id, SUM(car.customer_count) AS com_count FROM bss_car_day_count car JOIN bss_service_area sa ON car.service_area_id = sa.id WHERE car.count_date = CURRENT_DATE - 1 GROUP BY sa.company_id) SELECT c.company_name, (com_count * 100.0 / t.all_count) || '%%' AS 车流占比 FROM company_count cc JOIN bss_company c ON cc.company_id = c.id CROSS JOIN total t ORDER BY 车流占比 DESC;"
+  },
+  {
+    "question": "比较不同运营公司节假日(周末)与工作日营收差异率",
+    "sql": "SELECT sa.company_id, AVG(CASE WHEN EXTRACT(ISODOW FROM bdd.oper_date) IN (6,7) THEN bdd.pay_sum ELSE 0 END) AS 周末均值, AVG(CASE WHEN EXTRACT(ISODOW FROM bdd.oper_date) NOT IN (6,7) THEN bdd.pay_sum ELSE 0 END) AS 工作日均值, (AVG(CASE WHEN EXTRACT(ISODOW FROM bdd.oper_date) IN (6,7) THEN bdd.pay_sum END) / AVG(CASE WHEN EXTRACT(ISODOW FROM bdd.oper_date) NOT IN (6,7) THEN bdd.pay_sum END) - 1) * 100 || '%%' AS 差异率 FROM bss_business_day_data bdd JOIN bss_service_area sa ON bdd.service_no = sa.service_area_no GROUP BY sa.company_id;"
+  },
+  {
+    "question": "查询连续3天营收下降的异常服务区(含运营公司信息)",
+    "sql": "WITH ranked AS (SELECT service_no, oper_date, pay_sum, LAG(pay_sum,1) OVER(PARTITION BY service_no ORDER BY oper_date) AS prev1, LAG(pay_sum,2) OVER(PARTITION BY service_no ORDER BY oper_date) AS prev2 FROM bss_business_day_data WHERE oper_date >= CURRENT_DATE - 5), decline AS (SELECT service_no FROM ranked WHERE pay_sum < prev1 AND prev1 < prev2) SELECT d.service_no, sa.service_area_name, c.company_name FROM decline d JOIN bss_service_area sa ON d.service_no = sa.service_area_no JOIN bss_company c ON sa.company_id = c.id;"
+  },
+  {
+    "question": "计算各运营公司单位效益(万元营收/千辆车次)",
+    "sql": "SELECT sa.company_id, SUM(bdd.pay_sum)/10000 AS 总营收, SUM(car.customer_count)/1000 AS 总车流, (SUM(bdd.pay_sum)/SUM(car.customer_count)) * 1000 AS 单位效益 FROM bss_business_day_data bdd JOIN bss_service_area sa ON bdd.service_no = sa.service_area_no JOIN bss_car_day_count car ON sa.id = car.service_area_id AND bdd.oper_date = car.count_date GROUP BY sa.company_id ORDER BY 单位效益 DESC;"
+  },
+  {
+    "question": "获取各运营公司最近季度新增服务区及营收贡献度",
+    "sql": "WITH new_sa AS (SELECT id, company_id FROM bss_service_area WHERE create_ts >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '3 months'), q_data AS (SELECT sa.company_id, COUNT(sa.id) AS 新增数量, SUM(bdd.pay_sum) AS 营收贡献 FROM new_sa JOIN bss_business_day_data bdd ON new_sa.id = bdd.service_no::uuid WHERE bdd.oper_date >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '3 months' GROUP BY sa.company_id) SELECT c.company_name, q.新增数量, q.营收贡献 FROM q_data q JOIN bss_company c ON q.company_id = c.id ORDER BY 营收贡献 DESC;"
+  },
+  {
+    "question": "统计各高速路线对应服务区的总车流量,并按车流量降序排列",
+    "sql": "SELECT r.route_name AS 路线名称, SUM(c.customer_count) AS 总车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE c.delete_ts IS NULL GROUP BY r.route_name ORDER BY 总车流量 DESC;"
+  },
+  {
+    "question": "查询最近一周每日各路线的平均车流量并观察时段波动",
+    "sql": "SELECT count_date AS 统计日期, route_name AS 路线名称, AVG(customer_count) AS 平均车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE count_date >= CURRENT_DATE - 7 AND c.delete_ts IS NULL GROUP BY count_date, route_name ORDER BY count_date;"
+  },
+  {
+    "question": "查找2023年度车流量最高TOP5服务区及其所属路线",
+    "sql": "SELECT s.service_area_name AS 服务区名称, r.route_name AS 路线名称, SUM(c.customer_count) AS 年度总车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id JOIN bss_service_area s ON c.service_area_id = s.id WHERE EXTRACT(YEAR FROM count_date) = 2023 AND c.delete_ts IS NULL GROUP BY s.service_area_name, r.route_name ORDER BY 年度总车流量 DESC LIMIT 5;"
+  },
+  {
+    "question": "对比不同月份各路线的月均车流量变化趋势",
+    "sql": "SELECT EXTRACT(MONTH FROM count_date) AS 月份, route_name AS 路线名称, AVG(customer_count) AS 月均车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE c.delete_ts IS NULL GROUP BY 月份, route_name ORDER BY 月份;"
+  },
+  {
+    "question": "查询特定日期(2023-10-01)各路线的车流量并按路线分类汇总",
+    "sql": "SELECT r.route_name AS 路线名称, SUM(c.customer_count) AS 当日车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE count_date = '2023-10-01' AND c.delete_ts IS NULL GROUP BY r.route_name;"
+  },
+  {
+    "question": "分析每个服务区关联的路线数量并找出覆盖路线最多的服务区",
+    "sql": "SELECT service_area_id AS 服务区ID, COUNT(section_route_id) AS 关联路线数 FROM bss_section_route_area_link WHERE delete_ts IS NULL GROUP BY service_area_id ORDER BY 关联路线数 DESC LIMIT 1;"
+  },
+  {
+    "question": "查询沪昆高速沿线各服务区2023年Q4的月均车流量",
+    "sql": "SELECT s.service_area_name AS 服务区名称, AVG(c.customer_count) AS 月均车流量 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id JOIN bss_service_area s ON c.service_area_id = s.id WHERE r.route_name = '沪昆高速' AND count_date BETWEEN '2023-10-01' AND '2023-12-31' AND c.delete_ts IS NULL GROUP BY s.service_area_name;"
+  },
+  {
+    "question": "统计各路线车流量占全路网总车流量的比例",
+    "sql": "SELECT route_name AS 路线名称, SUM(customer_count) * 100.0 / (SELECT SUM(customer_count) FROM bss_car_day_count WHERE delete_ts IS NULL) AS 占比百分比 FROM bss_car_day_count c JOIN bss_section_route_area_link l ON c.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE c.delete_ts IS NULL GROUP BY route_name ORDER BY 占比百分比 DESC;"
+  },
+  {
+    "question": "分析杭州湾跨海大桥服务区过去30天每日车流量变化趋势",
+    "sql": "SELECT count_date AS 统计日期, customer_count AS 当日车流量 FROM bss_car_day_count WHERE service_area_id = (SELECT id FROM bss_service_area WHERE service_area_name = '杭州湾跨海大桥服务区') AND count_date >= CURRENT_DATE - 30 AND delete_ts IS NULL ORDER BY count_date;"
+  },
+  {
+    "question": "查询消费热度最高的三个服务区及其对应路线(按订单总数统计)",
+    "sql": "SELECT s.service_area_name AS 服务区名称, r.route_name AS 路线名称, SUM(order_sum) AS 总订单数 FROM bss_business_day_data b JOIN bss_service_area_mapper m ON b.service_no = m.service_no JOIN bss_section_route_area_link l ON m.service_area_id = l.service_area_id JOIN bss_section_route r ON l.section_route_id = r.id WHERE b.delete_ts IS NULL GROUP BY s.service_area_name, r.route_name ORDER BY 总订单数 DESC LIMIT 3;"
+  },
+  {
+    "question": "统计各服务区微信支付渗透率(使用订单数占比)TOP10",
+    "sql": "SELECT service_name AS 服务区名称, SUM(wx_order) / SUM(order_sum) AS 微信渗透率 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name ORDER BY 微信渗透率 DESC LIMIT 10;"
+  },
+  {
+    "question": "分析2023年Q2各支付方式金额占比趋势变化",
+    "sql": "SELECT oper_date AS 统计日期, SUM(wx)/SUM(pay_sum) AS 微信占比, SUM(zfb)/SUM(pay_sum) AS 支付宝占比, SUM(rmb)/SUM(pay_sum) AS 现金占比 FROM bss_business_day_data WHERE oper_date BETWEEN '2023-04-01' AND '2023-06-30' AND delete_ts IS NULL GROUP BY oper_date ORDER BY 统计日期;"
+  },
+  {
+    "question": "对比不同区域服务区现金支付占比分布",
+    "sql": "SELECT sa.service_area_type AS 区域类型, SUM(bd.rmb)/SUM(bd.pay_sum) AS 现金占比 FROM bss_business_day_data bd JOIN bss_service_area sa ON bd.service_no = sa.service_area_no WHERE bd.delete_ts IS NULL AND sa.delete_ts IS NULL GROUP BY sa.service_area_type;"
+  },
+  {
+    "question": "找出微信支付平均单笔金额最高的前5个档口",
+    "sql": "SELECT branch_name AS 档口名称, SUM(wx)/SUM(wx_order) AS 平均单笔金额 FROM bss_business_day_data WHERE wx_order > 0 AND delete_ts IS NULL GROUP BY branch_name ORDER BY 平均单笔金额 DESC LIMIT 5;"
+  },
+  {
+    "question": "统计各服务区支付宝支付渗透率(使用订单数)低于10%的记录",
+    "sql": "SELECT service_name AS 服务区名称, SUM(zf_order)/SUM(order_sum) AS 支付宝渗透率 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name HAVING SUM(zf_order)/SUM(order_sum) < 0.1;"
+  },
+  {
+    "question": "分析节假日(春节假期)期间各支付方式交易金额环比变化",
+    "sql": "SELECT oper_date AS 统计日期, SUM(wx) AS 微信交易额, SUM(zfb) AS 支付宝交易额, SUM(rmb) AS 现金交易额 FROM bss_business_day_data WHERE oper_date BETWEEN '2023-01-20' AND '2023-01-30' AND delete_ts IS NULL GROUP BY oper_date ORDER BY 统计日期;"
+  },
+  {
+    "question": "统计不同档口类型(餐饮/零售)的支付方式偏好对比",
+    "sql": "SELECT CASE WHEN branch_name LIKE '%餐饮%' THEN '餐饮' ELSE '零售' END AS 档口类型, SUM(wx)/SUM(pay_sum) AS 微信占比, SUM(zfb)/SUM(pay_sum) AS 支付宝占比 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY 档口类型;"
+  },
+  {
+    "question": "计算各服务区行吧支付方式的月均交易次数",
+    "sql": "SELECT service_name AS 服务区名称, EXTRACT(MONTH FROM oper_date) AS 月份, AVG(xs_order) AS 月均交易次数 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY 服务区名称, 月份 ORDER BY 服务区名称, 月份;"
+  },
+  {
+    "question": "找出金豆支付占比超过30%的服务区记录",
+    "sql": "SELECT service_name AS 服务区名称, SUM(jd)/SUM(pay_sum) AS 金豆占比 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name HAVING SUM(jd)/SUM(pay_sum) > 0.3;"
+  },
+  {
+    "question": "统计各区域档口数量与支付订单数的线性关系",
+    "sql": "SELECT service_name AS 服务区名称, COUNT(DISTINCT branch_no) AS 档口数量, SUM(order_sum) AS 总订单数 FROM bss_business_day_data WHERE delete_ts IS NULL GROUP BY service_name ORDER BY 总订单数 DESC;"
+  }
+]

+ 11 - 0
data_pipeline/training_data/task_20250702_194611/table_list.txt

@@ -0,0 +1,11 @@
+# 表清单文件
+# 生成时间: 2025-07-02 18:07:15
+# 表数量: 7
+
+bss_car_day_count
+bss_business_day_data
+bss_company
+bss_section_route
+bss_section_route_area_link
+bss_service_area
+bss_service_area_mapper

+ 15 - 0
data_pipeline/training_data/task_20250702_194611/task_config.json

@@ -0,0 +1,15 @@
+{
+  "task_id": "task_20250702_194611",
+  "created_at": "2025-07-02T19:46:11.570606",
+  "parameters": {
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:6432/highway_db",
+    "table_list_file": "{task_directory}/table_list.txt",
+    "business_context": "高速公路服务区管理系统",
+    "file_upload_mode": true,
+    "enable_llm_repair": true,
+    "modify_original_file": true,
+    "enable_sql_validation": true,
+    "enable_training_data_load": true
+  },
+  "output_directory": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_194611"
+}

+ 117 - 0
data_pipeline/training_data/task_20250702_194611/task_result.json

@@ -0,0 +1,117 @@
+{
+  "success": true,
+  "workflow_state": {
+    "start_time": null,
+    "end_time": null,
+    "current_step": "training_data_load",
+    "completed_steps": [
+      "ddl_md_generation",
+      "question_sql_generation",
+      "sql_validation",
+      "training_data_load"
+    ],
+    "failed_steps": [],
+    "artifacts": {
+      "ddl_md_generation": {
+        "total_tables": 7,
+        "processed_successfully": 0,
+        "failed": 7,
+        "files_generated": 0,
+        "duration": 381.38542580604553
+      },
+      "question_sql_generation": {
+        "output_file": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_194611\\qs_highway_db_20250702_200305_pair.json",
+        "total_questions": 50,
+        "total_themes": 5,
+        "successful_themes": 5,
+        "failed_themes": [],
+        "duration": 550.6145713329315
+      },
+      "sql_validation": {
+        "original_sql_count": 50,
+        "valid_sql_count": 48,
+        "invalid_sql_count": 2,
+        "success_rate": 0.96,
+        "repair_stats": {
+          "attempted": 4,
+          "successful": 2,
+          "failed": 2
+        },
+        "file_modification_stats": {
+          "modified": 2,
+          "deleted": 2,
+          "failed_modifications": 0
+        },
+        "average_execution_time": 0.039087777137756345,
+        "total_retries": 0,
+        "duration": 169.87258434295654
+      },
+      "training_data_load": {
+        "training_data_dir": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_194611",
+        "load_successful": true,
+        "total_records": 568,
+        "data_type_counts": {
+          "sql": 489,
+          "documentation": 42,
+          "ddl": 36,
+          "error_sql": 1
+        },
+        "duration": 96.33159589767456
+      }
+    },
+    "statistics": {
+      "step1_duration": 381.38542580604553,
+      "step2_duration": 550.6145713329315,
+      "step3_duration": 169.87258434295654,
+      "step4_duration": 96.33159589767456
+    }
+  },
+  "artifacts": {
+    "ddl_md_generation": {
+      "total_tables": 7,
+      "processed_successfully": 0,
+      "failed": 7,
+      "files_generated": 0,
+      "duration": 381.38542580604553
+    },
+    "question_sql_generation": {
+      "output_file": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_194611\\qs_highway_db_20250702_200305_pair.json",
+      "total_questions": 50,
+      "total_themes": 5,
+      "successful_themes": 5,
+      "failed_themes": [],
+      "duration": 550.6145713329315
+    },
+    "sql_validation": {
+      "original_sql_count": 50,
+      "valid_sql_count": 48,
+      "invalid_sql_count": 2,
+      "success_rate": 0.96,
+      "repair_stats": {
+        "attempted": 4,
+        "successful": 2,
+        "failed": 2
+      },
+      "file_modification_stats": {
+        "modified": 2,
+        "deleted": 2,
+        "failed_modifications": 0
+      },
+      "average_execution_time": 0.039087777137756345,
+      "total_retries": 0,
+      "duration": 169.87258434295654
+    },
+    "training_data_load": {
+      "training_data_dir": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_194611",
+      "load_successful": true,
+      "total_records": 568,
+      "data_type_counts": {
+        "sql": 489,
+        "documentation": 42,
+        "ddl": 36,
+        "error_sql": 1
+      },
+      "duration": 96.33159589767456
+    }
+  }
+}

+ 844 - 0
docs/data_pipeline_api_auto_workflow_guide.md

@@ -0,0 +1,844 @@
+data_pipeline_api_auto_workflow_guide
+
+
+
+The complete execution steps, API calls, and expected responses are described below.
+
+### 1. Create a Training Task
+
+POST `/api/v0/data_pipeline/tasks`
+
+POST http://localhost:8084/api/v0/data_pipeline/tasks
+
+#### 1.1. Example Parameters
+
+Parameter sample 1:
+
+```JSON
+{
+    "task_name": "服务区初始化数据加载",
+    "db_name": "highway_db",
+    "business_context": "高速公路服务区管理系统"
+}
+```
+
+Parameter sample 2:
+
+```json
+{
+    "db_name": "highway_db",
+    "business_context": "高速公路服务区管理系统",
+    "enable_sql_validation": true,
+    "enable_llm_repair": true,
+    "modify_original_file": true,
+    "enable_training_data_load": true
+}
+```
+
+#### 1.2. Parameter Reference
+
+##### Basic parameters
+
+- table_list_file (string): path to the table list file. If omitted, the task enters file-upload mode; providing the path directly is now deprecated.
+- business_context (string): business context description. The default is "数据库管理系统" (database management system); keeping the default significantly hurts the LLM's accuracy when judging the business topics of the tables, so always supply a real description.
+- db_name (string): database name. If omitted, it is extracted from the connection string.
+- db_connection (string): full PostgreSQL connection string.
+
+##### Control parameters
+
+Note: none of the control parameters are currently exposed to users in the web UI; they all default to true.
+
+- enable_sql_validation (boolean, default true): whether to run SQL validation
+- enable_llm_repair (boolean, default true): whether to enable LLM repair
+- modify_original_file (boolean, default true): whether to modify the original file
+- enable_training_data_load (boolean, default true): whether to load the training data
+
+```markdown
+1. DDL/MD generation (required)
+   ↓
+2. Question-SQL generation (required)
+   ↓
+3. SQL validation (controlled by enable_sql_validation)
+   ├─ On validation failure → LLM repair (controlled by enable_llm_repair)
+   └─ File modification (controlled by modify_original_file)
+   ↓
+4. Training data load (controlled by enable_training_data_load)
+```
+
+**For the frontend UI**, the four main parameters are business_context, db_name, db_connection, and task_name. If the db_connection string already names the database, db_name can be omitted.
+
+#### 1.3. Expected Response
+
+POST http://localhost:8084/api/v0/data_pipeline/tasks
+
+```json
+{
+    "task_name": "服务区初始化数据加载",
+    "db_name": "highway_db",
+    "business_context": "高速公路服务区管理系统"
+}
+```
+
+On success, the response looks like the following. Note the "task_id": every subsequent operation needs it.
+
+```Json
+{
+    "code": 200,
+    "data": {
+        "created_at": "2025-07-02T17:40:00.268100",
+        "file_upload_mode": true,
+        "next_step": "POST /api/v0/data_pipeline/tasks/task_20250702_174000/upload-table-list",
+        "response": "任务创建成功,请上传表清单文件后再执行任务",
+        "status": "pending",
+        "task_id": "task_20250702_174000",
+        "task_name": "服务区初始化数据加载"
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
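A minimal Python sketch of this call, assuming the local base URL from the examples above; `extract_task_id` is a hypothetical helper around the standard `{code, data, message, success}` envelope, not part of the API itself:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8084"  # assumed local deployment, as in the examples

def extract_task_id(envelope: dict) -> str:
    """Pull task_id out of the standard {code, data, message, success} envelope."""
    if not envelope.get("success"):
        raise RuntimeError(envelope.get("message", "request failed"))
    return envelope["data"]["task_id"]

def create_task(task_name: str, db_name: str, business_context: str) -> str:
    """POST /api/v0/data_pipeline/tasks and return the new task_id."""
    payload = {
        "task_name": task_name,
        "db_name": db_name,
        "business_context": business_context,
    }
    req = urllib.request.Request(
        f"{BASE_URL}/api/v0/data_pipeline/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_task_id(json.load(resp))

# Parsing the sample response shown above:
sample = {"code": 200, "data": {"task_id": "task_20250702_174000"}, "success": True}
print(extract_task_id(sample))  # task_20250702_174000
```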
+
+
+
+### 2. Provide the Table List
+
+There are two ways to submit the table list. These are the tables that NL2SQL will later query; the training dataset is generated from their definitions and data. Also keep the task_id returned in the previous step, since every following step uses it.
+
+#### 2.1. Submit the table names directly
+
+##### a.) Fetch the table names of the current database via the API below (optional)
+
+**API**: `POST /api/v0/database/tables`
+
+Both of the following parameters are optional.
+Provide the db_connection string if the target database is not configured in app_config.py, or if it is not the business database.
+
+```json
+{
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+    "schema": "public,ods,dw"
+}
+```
+
+POST: http://localhost:8084/api/v0/database/tables
+
+Posting an empty body {} returns all tables under the public.* schemas of the business database configured in app_config.py.
+
+Expected response:
+
+```json
+{
+    "code": 200,
+    "data": {
+        "db_connection_info": {
+            "database": "highway_db"
+        },
+        "response": "获取表列表成功",
+        "schemas": [
+            "public"
+        ],
+        "tables": [
+            "public.bss_branch",
+            "public.bss_business_day_data",
+            "public.bss_car_day_count",
+            "public.bss_company",
+            "public.bss_section_route",
+            "public.bss_section_route_area_link",
+            "public.bss_service_area",
+            "public.bss_service_area_mapper",
+            "public.highway_metadata",
+            "public.qa_feedback"
+        ],
+        "total": 10
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
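If only some schemas should feed the training set, the `tables` array in this response can be filtered before building the table list. A small illustrative sketch; `filter_tables` is a hypothetical client-side helper, not an API:

```python
def filter_tables(tables, schemas):
    """Keep only entries whose schema prefix is in the given set."""
    return [t for t in tables if t.split(".", 1)[0] in schemas]

tables = [
    "public.bss_company",
    "public.bss_service_area",
    "ods.raw_events",
]
print(filter_tables(tables, {"public"}))
# ['public.bss_company', 'public.bss_service_area']
```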
+
+
+
+##### b.) Submit the table names as a string online
+
+API: POST /api/v0/data_pipeline/tasks/{task_id}/table-list
+
+POST http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_144901/table-list
+
+There is a single required parameter, tables: a comma-separated list of table names. The schema.table format is supported.
+
+```json
+{
+  "tables": "bss_car_day_count,bss_business_day_data,bss_company,bss_section_route,bss_section_route_area_link,bss_service_area,bss_service_area_mapper"
+}
+```
+
+Expected response:
+
+```json
+{
+    "code": 200,
+    "data": {
+        "created_time": "2025-07-02T18:07:15.596971",
+        "file_size": 220,
+        "file_size_formatted": "220.0 B",
+        "filename": "table_list.txt",
+        "original_count": 7,
+        "response": "表清单已成功创建,包含 7 个表",
+        "table_count": 7,
+        "task_id": "task_20250702_174000",
+        "unique_table_count": 7
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
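Submitting the list programmatically might look like the following sketch. `build_tables_param` is a hypothetical helper that produces the comma-separated `tables` string (blanks and duplicates dropped, order preserved); the endpoint path follows the examples above:

```python
import json
import urllib.request

def build_tables_param(table_names):
    """Join table names into the comma-separated string the API expects,
    dropping blanks and duplicates while keeping order."""
    seen = []
    for name in table_names:
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return ",".join(seen)

def submit_table_list(base_url, task_id, table_names):
    """POST /api/v0/data_pipeline/tasks/{task_id}/table-list."""
    payload = {"tables": build_tables_param(table_names)}
    req = urllib.request.Request(
        f"{base_url}/api/v0/data_pipeline/tasks/{task_id}/table-list",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(build_tables_param(["bss_company", " bss_company ", "bss_service_area"]))
# bss_company,bss_service_area
```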
+
+
+
+#### 2.2. Upload a table list file (*.txt)
+
+API: `POST /api/v0/data_pipeline/tasks/{task_id}/upload-table-list`
+
+Expected response:
+
+```json
+{
+    "code": 200,
+    "data": {
+        "file_size": 284,
+        "file_size_formatted": "284.0 B",
+        "filename": "table_list.txt",
+        "response": "表清单文件上传成功",
+        "task_id": "task_20250702_144901",
+        "upload_time": "2025-07-02T14:59:37.143754"
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
+
+#### 2.3. Verify the table list (optional)
+
+Mainly useful for troubleshooting; the frontend UI can ignore this API.
+
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}/table-list-info`
+
+GET http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000/table-list-info
+
+Expected response:
+
+```json
+{
+    "code": 200,
+    "data": {
+        "created_at": "2025-07-02T18:07:15.596353",
+        "exists": true,
+        "file_name": "table_list.txt",
+        "file_path": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000\\table_list.txt",
+        "file_size": 220,
+        "file_size_formatted": "220.0 B",
+        "has_file": true,
+        "is_readable": true,
+        "response": "获取表清单文件信息成功",
+        "table_count": 7,
+        "task_id": "task_20250702_174000",
+        "uploaded_at": "2025-07-02T18:07:15.596971"
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
+
+
+
+### 3. Generate and Load the Training Data Automatically (Complete Workflow)
+
+API: POST /api/v0/data_pipeline/tasks/{task_id}/execute
+
+Parameters for a complete run:
+
+```json
+{
+    "execution_mode": "complete"
+}
+```
+
+Expected response: the job runs asynchronously, so the API returns as soon as it has been scheduled successfully.
+
+```json
+{
+    "code": 200,
+    "data": {
+        "execution_mode": "complete",
+        "message": "任务正在后台执行,请通过状态接口查询进度",
+        "response": "任务执行已启动",
+        "step_name": null,
+        "task_id": "task_20250702_174000"
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
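Building the execute request can be sketched as follows; `build_execute_payload` is a hypothetical helper that also covers the per-step mode, and the resulting body is POSTed to `/api/v0/data_pipeline/tasks/{task_id}/execute` exactly like the earlier sketches:

```python
def build_execute_payload(execution_mode="complete", step_name=None):
    """Build the execute request body; step mode additionally needs a step_name."""
    if execution_mode == "step" and not step_name:
        raise ValueError("step mode requires a step_name")
    payload = {"execution_mode": execution_mode}
    if step_name is not None:
        payload["step_name"] = step_name
    return payload

print(build_execute_payload())  # {'execution_mode': 'complete'}
```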
+
+
+
+### 4. Monitoring and Logs
+
+#### 4.1. Task status monitoring
+
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}`
+
+GET: http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000
+
+Sample responses:
+
+a.) The first step is running:
+
+"ddl_generation": "running"
+
+```json
+{
+    "code": 200,
+    "data": {
+        "completed_at": null,
+        "created_at": "2025-07-02T17:40:00.268100",
+        "current_step": {
+            "execution_id": "task_20250702_174000_step_ddl_generation_exec_20250702_190410",
+            "started_at": "2025-07-02T19:04:09.933108",
+            "status": "running",
+            "step": "ddl_generation"
+        },
+        "error_message": null,
+        "parameters": {
+            "business_context": "高速公路服务区管理系统",
+            "db_connection": "postgresql://postgres:postgres@192.168.67.1:6432/highway_db",
+            "enable_llm_repair": true,
+            "enable_sql_validation": true,
+            "enable_training_data_load": true,
+            "file_upload_mode": true,
+            "modify_original_file": true,
+            "table_list_file": "{task_directory}/table_list.txt"
+        },
+        "response": "获取任务状态成功",
+        "result": null,
+        "started_at": "2025-07-02T19:04:09.925931",
+        "status": "in_progress",
+        "step_status": {
+            "ddl_generation": "running",
+            "qa_generation": "pending",
+            "sql_validation": "pending",
+            "training_load": "pending"
+        },
+        "steps": [
+            {
+                "completed_at": null,
+                "error_message": null,
+                "started_at": "2025-07-02T19:04:09.933108",
+                "step_name": "ddl_generation",
+                "step_status": "running"
+            },
+            {
+                "completed_at": null,
+                "error_message": null,
+                "started_at": null,
+                "step_name": "qa_generation",
+                "step_status": "pending"
+            },
+            {
+                "completed_at": null,
+                "error_message": null,
+                "started_at": null,
+                "step_name": "sql_validation",
+                "step_status": "pending"
+            },
+            {
+                "completed_at": null,
+                "error_message": null,
+                "started_at": null,
+                "step_name": "training_load",
+                "step_status": "pending"
+            }
+        ],
+        "task_id": "task_20250702_174000",
+        "task_name": "服务区初始化数据加载",
+        "total_steps": 4
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
+
+b.) All four steps completed:
+        "status": "completed",
+        "step_status": {
+            "ddl_generation": "completed",
+            "qa_generation": "completed",
+            "sql_validation": "completed",
+            "training_load": "completed"
+        },
+
+```json
+{
+    "code": 200,
+    "data": {
+        "completed_at": "2025-07-02T19:21:03.007862",
+        "created_at": "2025-07-02T17:40:00.268100",
+        "current_step": null,
+        "error_message": null,
+        "parameters": {
+            "business_context": "高速公路服务区管理系统",
+            "db_connection": "postgresql://postgres:postgres@192.168.67.1:6432/highway_db",
+            "enable_llm_repair": true,
+            "enable_sql_validation": true,
+            "enable_training_data_load": true,
+            "file_upload_mode": true,
+            "modify_original_file": true,
+            "table_list_file": "{task_directory}/table_list.txt"
+        },
+        "response": "获取任务状态成功",
+        "result": null,
+        "started_at": "2025-07-02T19:04:09.925931",
+        "status": "completed",
+        "step_status": {
+            "ddl_generation": "completed",
+            "qa_generation": "completed",
+            "sql_validation": "completed",
+            "training_load": "completed"
+        },
+        "steps": [
+            {
+                "completed_at": "2025-07-02T19:10:18.599375",
+                "error_message": null,
+                "started_at": "2025-07-02T19:04:09.933108",
+                "step_name": "ddl_generation",
+                "step_status": "completed"
+            },
+            {
+                "completed_at": "2025-07-02T19:17:23.449415",
+                "error_message": null,
+                "started_at": "2025-07-02T19:10:18.602632",
+                "step_name": "qa_generation",
+                "step_status": "completed"
+            },
+            {
+                "completed_at": "2025-07-02T19:19:48.712247",
+                "error_message": null,
+                "started_at": "2025-07-02T19:17:23.453558",
+                "step_name": "sql_validation",
+                "step_status": "completed"
+            },
+            {
+                "completed_at": "2025-07-02T19:21:03.002708",
+                "error_message": null,
+                "started_at": "2025-07-02T19:19:48.715083",
+                "step_name": "training_load",
+                "step_status": "completed"
+            }
+        ],
+        "task_id": "task_20250702_174000",
+        "task_name": "服务区初始化数据加载",
+        "total_steps": 4
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
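Because execution is asynchronous, a client typically polls this endpoint until a terminal status is reached. A minimal polling sketch; `summarize` and the terminal-status set are assumptions based on the responses above:

```python
import json
import time
import urllib.request

TERMINAL_STATUSES = {"completed", "failed", "partial_completed"}  # assumed terminal set

def summarize(status_data):
    """One-line progress summary built from the status payload shown above."""
    steps = status_data.get("step_status", {})
    done = sum(1 for s in steps.values() if s == "completed")
    return f"{status_data.get('status')}: {done}/{len(steps)} steps completed"

def wait_for_task(base_url, task_id, interval=10.0):
    """Poll GET /api/v0/data_pipeline/tasks/{task_id} until a terminal status."""
    url = f"{base_url}/api/v0/data_pipeline/tasks/{task_id}"
    while True:
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)["data"]
        print(summarize(data))
        if data.get("status") in TERMINAL_STATUSES:
            return data
        time.sleep(interval)

print(summarize({"status": "in_progress",
                 "step_status": {"ddl_generation": "completed",
                                 "qa_generation": "running",
                                 "sql_validation": "pending",
                                 "training_load": "pending"}}))
# in_progress: 1/4 steps completed
```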
+
+
+
+
+
+#### 4.2. View task logs
+
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}/logs`
+
+GET http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000/logs
+
+```json
+{
+    "code": 200,
+    "data": {
+        "log_file": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000\\data_pipeline.log",
+        "logs": [
+            {
+                "level": "INFO",
+                "logger": "TaskDir_task_20250702_174000",
+                "message": "任务目录日志初始化完成 - 任务ID: task_20250702_174000",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "TaskDir_task_20250702_174000",
+                "message": "任务参数: {\"db_connection\": \"postgresql://postgres:postgres@192.168.67.1:6432/highway_db\", \"table_list_file\": \"{task_directory}/table_list.txt\", \"business_context\": \"高速公路服务区管理系统\", \"file_upload_mode\": true, \"enable_llm_repair\": true, \"modify_original_file\": true, \"enable_sql_validation\": true, \"enable_training_data_load\": true}",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "TaskDir_task_20250702_174000",
+                "message": "完整工作流任务开始执行",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "TaskDir_task_20250702_174000",
+                "message": "[ddl_generation] 开始执行步骤: ddl_generation",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "TaskDir_task_20250702_174000",
+                "message": "[ddl_generation] 开始执行DDL/MD生成步骤\n2025-07-02 19:04:10 [INFO] [data_pipeline.SchemaWorkflowOrchestrator] schema_workflow.py:167 - ============================================================",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "data_pipeline.SchemaWorkflowOrchestrator",
+                "message": "============================================================",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "[data_pipeline.SchemaWorkflowOrchestrator] schema_workflow.py:168 - 📝 步骤1",
+                "message": "开始生成DDL和MD文件",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "data_pipeline.SchemaWorkflowOrchestrator",
+                "message": "📝 步骤1: 开始生成DDL和MD文件\n2025-07-02 19:04:10 [INFO] [data_pipeline.SchemaWorkflowOrchestrator] schema_workflow.py:169 - ============================================================",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "data_pipeline.SchemaWorkflowOrchestrator",
+                "message": "============================================================\n2025-07-02 19:04:10 [INFO] [data_pipeline.SchemaTrainingDataAgent] training_data_agent.py:68 - 🚀 开始生成Schema训练数据",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "[data_pipeline.SchemaTrainingDataAgent] training_data_agent.py:115 - 初始化完成,输出目录",
+                "message": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "[data_pipeline.SchemaTrainingDataAgent] training_data_agent.py:136 - 数据库权限检查完成",
+                "message": "{'connect': True, 'select_metadata': True, 'select_data': True, 'is_readonly': False}\n2025-07-02 19:04:10 [INFO] [data_pipeline.SchemaTrainingDataAgent] training_data_agent.py:142 - 📋 从清单文件读取到 7 个表",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "[data_pipeline.SchemaTrainingDataAgent] training_data_agent.py:164 - 🔄 开始并发处理 7 个表 (最大并发",
+                "message": "1)",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "INFO",
+                "logger": "[data_pipeline.SchemaTrainingDataAgent] training_data_agent.py:203 - 🔍 开始处理表",
+                "message": "public.bss_car_day_count",
+                "timestamp": "2025-07-02 19:04:10"
+            },
+            {
+                "level": "ERROR",
+                "logger": "[data_pipeline.SchemaTrainingDataAgent] training_data_agent.py:234 - ❌ 表 public.bss_car_day_count 处理失败,耗时",
+                "message": "55.71秒",
+                "timestamp": "2025-07-02 19:05:06"
+            },
+			... ...
+        ],
+        "response": "获取任务日志成功",
+        "source": "file",
+        "task_id": "task_20250702_174000",
+        "total": 23
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
+
+
+
+#### 4.3. View and download files
+
+##### a.) List the generated training data files
+
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}/files`
+
+GET: http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000/files
+
+```json
+{
+    "code": 200,
+    "data": {
+        "directory_info": {
+            "directory_path": "C:\\Projects\\cursor_projects\\Vanna-Chainlit-Chromadb\\data_pipeline\\training_data\\task_20250702_174000",
+            "exists": true,
+            "total_files": 26,
+            "total_size": 104982,
+            "total_size_formatted": "102.5 KB"
+        },
+        "files": [
+            {
+                "created_at": "2025-07-02T19:04:10.194958",
+                "file_name": "data_pipeline.log",
+                "file_size": 35951,
+                "file_size_formatted": "35.1 KB",
+                "file_type": "log",
+                "is_readable": true,
+                "modified_at": "2025-07-02T19:21:03.233582"
+            },
+            {
+                "created_at": "2025-07-02T19:21:03.230334",
+                "file_name": "task_result.json",
+                "file_size": 3601,
+                "file_size_formatted": "3.5 KB",
+                "file_type": "json",
+                "is_readable": true,
+                "modified_at": "2025-07-02T19:21:03.230878"
+            },
+            {
+                "created_at": "2025-07-02T19:19:48.483686",
+                "file_name": "sql_validation_20250702_191948_summary.log",
+                "file_size": 2839,
+                "file_size_formatted": "2.8 KB",
+                "file_type": "log",
+                "is_readable": true,
+                "modified_at": "2025-07-02T19:19:48.484199"
+            },
+			... ...
+            {
+                "created_at": "2025-07-02T18:07:15.596353",
+                "file_name": "table_list.txt",
+                "file_size": 220,
+                "file_size_formatted": "220.0 B",
+                "file_type": "text",
+                "is_readable": true,
+                "modified_at": "2025-07-02T18:07:15.596971"
+            }
+        ],
+        "response": "获取任务文件列表成功",
+        "task_id": "task_20250702_174000"
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
+
+##### b.) Download a generated file
+
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}/files/{file_name}`
+
+GET http://localhost:8084/api/v0/data_pipeline/tasks/task_20250702_174000/files/bss_company.ddl
+
+Returns the file content:
+
+```
+-- 中文名: 业务支撑系统公司信息表
+-- 描述: 业务支撑系统公司信息表,存储服务区关联企业的基础信息及状态变更记录
+create table public.bss_company (
+  id varchar(32) not null     -- 主键ID,主键,
+  version integer not null    -- 版本号,
+  create_ts timestamp         -- 创建时间,
+  created_by varchar(50)      -- 创建人ID,
+  update_ts timestamp         -- 更新时间,
+  updated_by varchar(50)      -- 更新人ID,
+  delete_ts timestamp         -- 删除时间,
+  deleted_by varchar(50)      -- 删除人ID,
+  company_name varchar(255)   -- 公司名称,
+  company_no varchar(255)     -- 公司编码,
+  primary key (id)
+);
+```
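Downloading a file programmatically is a plain GET; a small sketch, where `build_file_url` is a hypothetical helper that percent-encodes the file name:

```python
import urllib.parse
import urllib.request

def build_file_url(base_url, task_id, file_name):
    """Build the download URL, percent-encoding the file name just in case."""
    return (f"{base_url}/api/v0/data_pipeline/tasks/{task_id}"
            f"/files/{urllib.parse.quote(file_name)}")

def download_file(base_url, task_id, file_name, dest_path):
    """Fetch one generated file and write it to disk."""
    with urllib.request.urlopen(build_file_url(base_url, task_id, file_name)) as resp:
        with open(dest_path, "wb") as out:
            out.write(resp.read())

print(build_file_url("http://localhost:8084", "task_20250702_174000", "bss_company.ddl"))
```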
+
+
+
+#### 4.4. View the task history (admin)
+
+**API**: `GET /api/v0/data_pipeline/tasks`
+
+GET: http://localhost:8084/api/v0/data_pipeline/tasks
+
+Expected response:
+
+```json
+{
+    "code": 200,
+    "data": {
+        "limit": 50,
+        "offset": 0,
+        "response": "获取任务列表成功",
+        "tasks": [
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": "2025-07-02T19:21:03.007862",
+                "created_at": "2025-07-02T17:40:00.268100",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": "2025-07-02T19:04:09.925931",
+                "status": "completed",
+                "step_status": "all_completed",
+                "task_id": "task_20250702_174000",
+                "task_name": "服务区初始化数据加载"
+            },
+            {
+                "business_context": "测试向后兼容性",
+                "completed_at": null,
+                "created_at": "2025-07-02T17:39:31.751256",
+                "created_by": "guest",
+                "db_name": "test_db",
+                "started_at": null,
+                "status": "pending",
+                "step_status": "pending",
+                "task_id": "task_20250702_173932",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-02T17:39:30.680619",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": null,
+                "status": "pending",
+                "step_status": "pending",
+                "task_id": "task_20250702_173931",
+                "task_name": "测试任务_高速公路数据分析"
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-02T17:38:53.251452",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": null,
+                "status": "pending",
+                "step_status": "pending",
+                "task_id": "task_20250702_173852",
+                "task_name": "测试任务_高速公路数据分析"
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-02T17:06:35.438861",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": null,
+                "status": "pending",
+                "step_status": "pending",
+                "task_id": "task_20250702_170635",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-02T14:49:02.267179",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": null,
+                "status": "pending",
+                "step_status": "pending",
+                "task_id": "task_20250702_144901",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-02T01:09:52.930419",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": "2025-07-02T01:22:14.539878",
+                "status": "in_progress",
+                "step_status": "partial_completed",
+                "task_id": "task_20250702_010952",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": "2025-07-02T01:19:57.163044",
+                "created_at": "2025-07-01T23:18:50.085424",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": "2025-07-01T23:36:53.411362",
+                "status": "failed",
+                "step_status": "failed",
+                "task_id": "task_20250701_231850",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-01T22:40:37.182904",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": null,
+                "status": "pending",
+                "step_status": "pending",
+                "task_id": "task_20250701_224036",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-01T14:38:33.755737",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": null,
+                "status": "pending",
+                "step_status": "pending",
+                "task_id": "task_20250701_223833",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-01T14:20:42.631833",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": null,
+                "status": "pending",
+                "step_status": "pending",
+                "task_id": "task_20250701_222042",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": "2025-07-01T14:05:04.194755",
+                "created_at": "2025-07-01T13:34:35.478473",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": "2025-07-01T13:35:06.200488",
+                "status": "completed",
+                "step_status": "all_completed",
+                "task_id": "task_20250701_213434",
+                "task_name": null
+            },
+            {
+                "business_context": "高速公路服务区管理系统",
+                "completed_at": null,
+                "created_at": "2025-07-01T13:24:25.700551",
+                "created_by": "guest",
+                "db_name": "highway_db",
+                "started_at": "2025-07-01T13:25:59.712938",
+                "status": "in_progress",
+                "step_status": "pending",
+                "task_id": "task_20250701_212426",
+                "task_name": null
+            }
+        ],
+        "total": 13
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
+
+
+
+
+
+
+

+ 271 - 0
docs/data_pipeline_api_workflow_guide.md

@@ -0,0 +1,271 @@
+# Data Training API: Overview of Training Dataset Generation and Loading
+
+## Overview
+
+The Data Training API provides a complete REST-based solution for creating and loading training datasets. The process consists of four main steps and supports both a fully automatic mode and a step-by-step manual mode.
+
+## Core Processing Flow
+
+Creating and loading a training dataset involves the following four steps:
+
+1. **DDL/MD generation** (`ddl_generation`) - generate commented DDL files and Markdown documents from the table names
+2. **Question-SQL generation** (`qa_generation`) - generate a JSON file of question-answer pairs from the DDL and MD files
+3. **SQL validation and repair** (`sql_validation`) - validate the generated SQL statements and fix errors with the LLM
+4. **Training data load** (`training_load`) - load the generated files into the training database
+
+## Execution Modes
+
+### Fully automatic mode (`complete`)
+- Runs the complete four-step flow in one go
+- Loads the training data directly into the training database afterwards
+
+### Step-by-step mode (`step`)
+- Runs one stage at a time, so results can be reviewed manually after each step
+- Allows pausing at any step to adjust the training data
+
+## API Workflow
+
+### Phase 1: Task Preparation
+
+#### 1.1 Create a task
+**API**: `POST /api/v0/data_pipeline/tasks`
+
+Create a new data pipeline task and obtain the unique task_id used by all subsequent operations.
+
+**Two ways to create a task**:
+- **Submit the table names online**: provide the table names directly, comma-separated.
+- **File-upload mode**: omit table_list_file and provide the list later through the upload endpoint.
+
+#### 1.2 Provide the table list (choose one)
+
+If the task was created in file-upload mode, provide the table list in one of two ways:
+
+**Option 1: upload a table list file**
+**API**: `POST /api/v0/data_pipeline/tasks/{task_id}/upload-table-list`
+
+Upload a text file containing the target table names.
+
+**Option 2: submit the table names online**
+
+a.) Fetch the names of the target tables:
+**API**: `POST /api/v0/database/tables`
+
+b.) Submit the table names directly:
+**API**: `POST /api/v0/data_pipeline/tasks/{task_id}/table-list`
+
+Table names can be submitted as an array or as a comma-separated string; the system automatically creates table_list.txt in the task directory.
+
+#### 1.3 Get table list info (optional)
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}/table-list-info`
+
+Verify that the table list file was uploaded correctly and check the number of tables it contains.
+
+### Phase 2: Task Execution
+
+#### 2.1 Choose an execution mode
+Pick the execution mode that fits your needs:
+
+**Fully automatic execution**:
+```json
+POST /api/v0/data_pipeline/tasks/{task_id}/execute
+{
+  "execution_mode": "complete"
+}
+```
+
+**Step-by-step execution**:
+
+```json
+POST /api/v0/data_pipeline/tasks/{task_id}/execute
+{
+  "execution_mode": "step",
+  "step_name": "ddl_generation"
+}
+```
+
+这里的step_name只能写一个,它可以是:ddl_generation/qa_generation/sql_validation/training_load
+
+#### 2.2 Step Details
+
+**Step 1: DDL/MD generation** (`ddl_generation`)
+- Connects to the business database and reads table structures
+- Uses an LLM to generate Chinese comments
+- Outputs DDL files and detailed Markdown documentation
+- Generated files: `{table_name}.ddl`, `{table_name}_detail.md`
+
+**Step 2: Question-SQL generation** (`qa_generation`)
+
+- Analyzes the MD documents to extract business topics
+- Generates multiple question-answer pairs per topic
+- Outputs the Q&A data as JSON
+- Generated files: `qs_{db_name}_{timestamp}_pair.json`, `metadata.txt`, `metadata_detail.md`
+
+**Step 3: SQL validation and repair** (`sql_validation`)
+- Validates SQL syntax using PostgreSQL EXPLAIN
+- Calls the LLM to repair invalid SQL
+- Produces a validation report and repair statistics
+- Optionally modifies the original JSON file
+
+**Step 4: Training data loading** (`training_load`)
+- Loads the DDL, documentation, and Q&A pairs into the vector database
+- Builds the training data index
+- Verifies the load results
+### Phase 3: Monitoring and Management
+
+#### 3.1 Task Status Monitoring
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}`
+
+Shows the overall task status and the progress of each step in real time.
+
+**Status values**:
+- `pending` - waiting to run
+- `in_progress` - currently running
+- `completed` - finished successfully
+- `failed` - failed
+- `partial_completed` - partially completed
+
+#### 3.2 View Task Logs
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}/logs`
+
+Returns detailed execution logs, with optional filtering by level and a line-count limit.
+
+#### 3.3 File Management
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}/files`
+
+Lists all files generated by the task.
+
+**API**: `GET /api/v0/data_pipeline/tasks/{task_id}/files/{file_name}`
+
+Downloads the specified output file.
+
+### Phase 4: Auxiliary Features
+
+#### 4.1 Task List Management (admin)
+
+**API**: `GET /api/v0/data_pipeline/tasks`
+
+Lists historical tasks, with support for status filtering and pagination.
+
+#### 4.2 Database Table Query
+**API**: `POST /api/v0/database/tables`
+
+Queries the available tables in the business database, useful for building the table list file.
+
+#### 4.3 Get a Table's DDL and Field Comments
+
+**API:** `POST /api/v0/database/table/ddl`
+
+Returns the DDL definition of the specified table along with LLM-generated field comments.
+
+## Typical Usage Scenarios
+
+### Scenario 1: Fully Automatic Training Dataset Creation
+
+```
+1. POST /api/v0/data_pipeline/tasks
+   (create the task, providing table_list_file, business_context, and other parameters)
+
+2. Provide the table list (choose one): upload a *.txt file of table names, or submit a comma-separated string inline
+   Option A: POST /api/v0/data_pipeline/tasks/{task_id}/upload-table-list
+          (upload the table list file)
+   Option B: POST /api/v0/data_pipeline/tasks/{task_id}/table-list
+          (submit the table names directly)
+
+3. POST /api/v0/data_pipeline/tasks/{task_id}/execute
+   (execution_mode: "complete")
+
+4. GET /api/v0/data_pipeline/tasks/{task_id}
+   (poll to monitor execution status)
+
+5. GET /api/v0/data_pipeline/tasks/{task_id}/files
+   (list the generated files when finished)
+```
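The fully automatic sequence above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive client: the base URL, the polling interval, and the exact response envelope fields (`data.task_id`, `data.status`) are assumptions taken from the examples elsewhere in this guide.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8084"  # assumed host/port, matching the examples in this guide


def _call(method, path, payload=None):
    """Minimal JSON-over-HTTP helper using only the standard library."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode() if payload is not None else None,
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_terminal(status):
    """True once the task status API reports a final state and polling can stop."""
    return status in ("completed", "failed", "partial_completed")


def run_complete_pipeline(task_payload, interval=10):
    # 1. Create the task (payload fields as documented, e.g. business_context)
    task_id = _call("POST", "/api/v0/data_pipeline/tasks", task_payload)["data"]["task_id"]
    # 2. Start the fully automatic run
    _call("POST", f"/api/v0/data_pipeline/tasks/{task_id}/execute",
          {"execution_mode": "complete"})
    # 3. Poll the status endpoint until a terminal state is reached
    while True:
        status = _call("GET", f"/api/v0/data_pipeline/tasks/{task_id}")["data"]["status"]
        if is_terminal(status):
            return status
        time.sleep(interval)
```

Polling every 10 seconds is a reasonable default given that steps involve LLM calls; adjust to taste.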
+
+### Scenario 2: Step-by-Step Manual Control
+
+```
+1. POST /api/v0/data_pipeline/tasks
+   (create the task without table_list_file)
+
+2. Provide the table list (choose one): upload a *.txt file of table names, or submit a comma-separated string inline
+   Option A: POST /api/v0/data_pipeline/tasks/{task_id}/upload-table-list
+          (upload the table list file)
+   Option B: POST /api/v0/data_pipeline/tasks/{task_id}/table-list
+          (submit the table names directly)
+
+3. POST /api/v0/data_pipeline/tasks/{task_id}/execute
+   (execution_mode: "step", step_name: "ddl_generation")
+
+4. GET /api/v0/data_pipeline/tasks/{task_id}
+   (check the DDL generation result)
+
+5. POST /api/v0/data_pipeline/tasks/{task_id}/execute
+   (execution_mode: "step", step_name: "qa_generation")
+
+6. Repeat steps 4-5 for the remaining steps (sql_validation, training_load)
+```
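Each step-mode call carries exactly one step name, and the steps must run in order. A small helper can keep the order and payload shape consistent; this sketch is based only on the request bodies shown above, everything else is illustrative.

```python
# The four pipeline steps in their required execution order
PIPELINE_STEPS = ["ddl_generation", "qa_generation", "sql_validation", "training_load"]


def build_step_payload(step_name):
    """Request body for POST /api/v0/data_pipeline/tasks/{task_id}/execute in step mode."""
    if step_name not in PIPELINE_STEPS:
        raise ValueError(f"unknown step: {step_name}")
    return {"execution_mode": "step", "step_name": step_name}
```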
+
+## Dependencies
+
+### Inter-Step Dependencies
+- `qa_generation` requires `ddl_generation` to have completed
+- `sql_validation` requires `qa_generation` to have completed
+- `training_load` requires the first three steps to have completed
+
+### External Dependencies
+- **Business database connection**: for reading table structures and sample data
+- **LLM service**: for comment generation, topic extraction, and SQL repair
+- **Vector database**: for storing the final training data
+
+### Configuration Dependencies
+- Database connection configuration
+- LLM model configuration
+- File storage path configuration
+
+## Output Files
+
+Each task generates the following files under `./data_pipeline/training_data/{task_id}/`:
+
+### ddl_generation:
+
+- **DDL files**: `{table_name}.ddl` - commented CREATE TABLE statements
+- **Documentation files**: `{table_name}_detail.md` - detailed table structure descriptions
+
+### qa_generation:
+
+- **Q&A file**: `qs_{db_name}_{timestamp}_pair.json` - question-answer pair data
+- **Metadata files**: `metadata.txt`, `metadata_detail.md` - business topic metadata
+
+### sql_validation:
+
+- **Validation log**: sql_validation_20250701_234912_summary.log
+- **Q&A file modification log**: file_modifications_20250701_234912.log
+
+### training_load:
+
+- **Log file**: `data_pipeline.log` - training_load has no dedicated log; it writes to data_pipeline.log.
+
+**Other files:**
+
+- **Config file**: `task_config.json` - task configuration
+- **Log file**: `data_pipeline.log` - detailed execution log
+
+All generated files can be retrieved via the file download API for verification and further processing.
+
+## Error Handling
+
+### Task-Level Errors
+
+- Database connection failure
+- Insufficient permissions
+- Configuration errors
+
+### Step-Level Errors
+
+- Table does not exist or is not accessible
+- LLM call failure
+- SQL validation failure
+- File write failure
+
+Detailed error information for all failures is available through the task status API and the logs API, supporting diagnosis and troubleshooting.

+ 615 - 0
docs/database_table_api_guide.md

@@ -0,0 +1,615 @@
+# Database Table API Guide
+
+This document describes the database table query APIs, covering table listing and table structure analysis.
+
+## API Overview
+
+| Endpoint | Function | Required params | Optional params |
+|---------|------|----------|----------|
+| `POST /api/v0/database/tables` | List database tables | none | `db_connection`, `schema` |
+| `POST /api/v0/database/table/ddl` | Get table DDL and structure analysis | `table` | `db_connection`, `business_context`, `type` |
+
+## 1. List Database Tables
+
+### Endpoint
+- **URL**: `POST /api/v0/database/tables`
+- **Function**: returns the list of tables in the specified database
+- **Notes**: a pure database query; no AI involved
+
+### Request Parameters
+
+| Parameter | Type | Required | Description | Example |
+|--------|------|------|------|------|
+| `db_connection` | string | No | PostgreSQL connection string<br/>Defaults to the configured connection if omitted | `"postgresql://user:pass@host:port/db"` |
+| `schema` | string | No | Schema name(s), comma-separated<br/>Defaults to "public" | `"public,ods,dw"` |
+
+### Request Examples
+
+#### Using the default database configuration
+```json
+{
+    "schema": "public,ods"
+}
+```
+
+#### Using a specific database
+```json
+{
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/bank_db",
+    "schema": "public"
+}
+```
+
+#### Minimal call (all defaults)
+
+In this case the database connection configuration comes from app_config.py.
+
+```json
+{}
+```
+
+### Response Example
+
+```json
+{
+    "code": 200,
+    "data": {
+        "db_connection_info": {
+            "database": "highway_db"
+        },
+        "response": "获取表列表成功",
+        "schemas": [
+            "public"
+        ],
+        "tables": [
+            "public.bss_branch_copy",
+            "public.bss_business_day_data",
+            "public.bss_car_day_count",
+            "public.bss_company",
+            "public.bss_section_route",
+            "public.bss_section_route_area_link",
+            "public.bss_service_area",
+            "public.bss_service_area_mapper"
+        ],
+        "total": 8
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
+
+## 2. Get Table DDL and Structure Analysis
+
+### Endpoint
+- **URL**: `POST /api/v0/database/table/ddl`
+- **Function**: returns the table's DDL statement and/or Markdown documentation, with optional AI-generated comments
+- **Notes**: combines database queries with AI analysis
+
+### Request Parameters
+
+| Parameter | Type | Required | Description | Example |
+|--------|------|------|------|------|
+| `table` | string | **Yes** | Table name, schema.table format supported | `"public.bank_churners"` |
+| `db_connection` | string | No | PostgreSQL connection string<br/>Defaults to the configured connection if omitted | `"postgresql://user:pass@host:port/db"` |
+| `business_context` | string | No | Business context description<br/>Enables AI comment generation when provided | `"银行信用卡持卡人信息"` |
+| `type` | string | No | Output type: `ddl`/`md`/`both`<br/>Defaults to "ddl" | `"md"` |
+
+### Request Examples
+
+#### Basic DDL retrieval (default configuration)
+```json
+{
+    "table": "public.bank_churners"
+}
+```
+
+```json
+{
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/bank_db",
+    "table": "public.bank_churners",
+    "business_context": "银行信用卡用户统计表"
+}
+```
+
+Note: field comments are generated in the response only when a non-empty "business_context" parameter is provided.
+
+```json
+{
+    "code": 200,
+    "data": {
+        "ddl": "-- 中文名: 银行信用卡客户流失分析表\n-- 描述: 银行信用卡客户流失分析表,记录用户人口统计特征及流失状态,用于预测客户流失风险并制定客户保留策略。\ncreate table public.bank_churners (\n  client_num bigint not null  -- 客户编号,主键,\n  attrition_flag varchar(32)  -- 客户流失标识,\n  customer_age smallint       -- 客户年龄,\n  gender varchar(8)           -- 性别,\n  dependent_count smallint    -- 家属数量,\n  education_level varchar(32) -- 学历等级,\n  marital_status varchar(16)  -- 婚姻状况,\n  income_category varchar(32) -- 收入等级,\n  card_category varchar(16)   -- 信用卡类别,\n  months_on_book smallint     -- 开户月份数,\n  credit_limit numeric(12,2)  -- 信用额度,\n  total_revolving_bal numeric(12,2) -- 总循环余额,\n  avg_open_to_buy numeric(12,2) -- 平均可用额度,\n  total_amt_chng_q4_q1 double precision -- 季度交易金额变化率,\n  total_trans_amt numeric(12,2) -- 总交易金额,\n  total_trans_ct smallint     -- 总交易次数,\n  total_ct_chng_q4_q1 double precision -- 季度交易次数变化率,\n  avg_utilization_ratio double precision -- 平均利用率,\n  nb_classifier_attrition_flag_1 double precision -- 流失预测模型1得分,\n  nb_classifier_attrition_flag_2 double precision -- 流失预测模型2得分,\n  primary key (client_num)\n);",
+        "fields": [
+            {
+                "comment": "客户编号",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": true,
+                "name": "client_num",
+                "nullable": false,
+                "type": "bigint"
+            },
+            {
+                "comment": "客户流失标识",
+                "default_value": null,
+                "enum_values": [
+                    "Existing Customer",
+                    "Attrited Customer"
+                ],
+                "is_enum": true,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "attrition_flag",
+                "nullable": true,
+                "type": "character varying"
+            },
+            {
+                "comment": "客户年龄",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "customer_age",
+                "nullable": true,
+                "type": "smallint"
+            },
+            {
+                "comment": "性别",
+                "default_value": null,
+                "enum_values": [
+                    "F",
+                    "M"
+                ],
+                "is_enum": true,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "gender",
+                "nullable": true,
+                "type": "character varying"
+            },
+            {
+                "comment": "家属数量",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "dependent_count",
+                "nullable": true,
+                "type": "smallint"
+            },
+            {
+                "comment": "学历等级",
+                "default_value": null,
+                "enum_values": [
+                    "Graduate",
+                    "High School",
+                    "Unknown",
+                    "Uneducated",
+                    "College",
+                    "Post-Graduate",
+                    "Doctorate"
+                ],
+                "is_enum": true,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "education_level",
+                "nullable": true,
+                "type": "character varying"
+            },
+            {
+                "comment": "婚姻状况",
+                "default_value": null,
+                "enum_values": [
+                    "Married",
+                    "Single",
+                    "Unknown",
+                    "Divorced"
+                ],
+                "is_enum": true,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "marital_status",
+                "nullable": true,
+                "type": "character varying"
+            },
+            {
+                "comment": "收入等级",
+                "default_value": null,
+                "enum_values": [
+                    "Less than $40K",
+                    "$40K - $60K",
+                    "$80K - $120K",
+                    "$60K - $80K",
+                    "Unknown",
+                    "$120K +"
+                ],
+                "is_enum": true,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "income_category",
+                "nullable": true,
+                "type": "character varying"
+            },
+            {
+                "comment": "信用卡类别",
+                "default_value": null,
+                "enum_values": [
+                    "Blue",
+                    "Silver",
+                    "Gold",
+                    "Platinum"
+                ],
+                "is_enum": true,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "card_category",
+                "nullable": true,
+                "type": "character varying"
+            },
+            {
+                "comment": "开户月份数",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "months_on_book",
+                "nullable": true,
+                "type": "smallint"
+            },
+            {
+                "comment": "信用额度",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "credit_limit",
+                "nullable": true,
+                "type": "numeric"
+            },
+            {
+                "comment": "总循环余额",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "total_revolving_bal",
+                "nullable": true,
+                "type": "numeric"
+            },
+            {
+                "comment": "平均可用额度",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "avg_open_to_buy",
+                "nullable": true,
+                "type": "numeric"
+            },
+            {
+                "comment": "季度交易金额变化率",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "total_amt_chng_q4_q1",
+                "nullable": true,
+                "type": "double precision"
+            },
+            {
+                "comment": "总交易金额",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "total_trans_amt",
+                "nullable": true,
+                "type": "numeric"
+            },
+            {
+                "comment": "总交易次数",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "total_trans_ct",
+                "nullable": true,
+                "type": "smallint"
+            },
+            {
+                "comment": "季度交易次数变化率",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "total_ct_chng_q4_q1",
+                "nullable": true,
+                "type": "double precision"
+            },
+            {
+                "comment": "平均利用率",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "avg_utilization_ratio",
+                "nullable": true,
+                "type": "double precision"
+            },
+            {
+                "comment": "流失预测模型1得分",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "nb_classifier_attrition_flag_1",
+                "nullable": true,
+                "type": "double precision"
+            },
+            {
+                "comment": "流失预测模型2得分",
+                "default_value": null,
+                "enum_values": null,
+                "is_enum": false,
+                "is_foreign_key": false,
+                "is_primary_key": false,
+                "name": "nb_classifier_attrition_flag_2",
+                "nullable": true,
+                "type": "double precision"
+            }
+        ],
+        "generation_info": {
+            "business_context": "银行信用卡用户统计表",
+            "database": "bank_db",
+            "has_llm_comments": true,
+            "output_type": "ddl"
+        },
+        "response": "获取表DDL成功",
+        "table_info": {
+            "comment": "银行信用卡客户流失分析表,记录用户人口统计特征及流失状态,用于预测客户流失风险并制定客户保留策略。",
+            "field_count": 20,
+            "full_name": "public.bank_churners",
+            "row_count": 10127,
+            "schema_name": "public",
+            "table_name": "bank_churners",
+            "table_size": "2008 kB"
+        }
+    },
+    "message": "操作成功",
+    "success": true
+}
+```
+
+#### Markdown documentation with AI-generated comments
+
+```json
+{
+    "table": "public.bank_churners",
+    "business_context": "银行信用卡持卡人信息表,用于分析客户流失情况",
+    "type": "md"
+}
+```
+
+#### DDL and MD from a specified database
+```json
+{
+    "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/bank_db",
+    "table": "public.bank_churners",
+    "business_context": "银行信用卡持卡人信息",
+    "type": "both"
+}
+```
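When `type` is `both`, the response carries both renderings under `data`. A client can split them with a small helper; this is a hypothetical convenience function, with field names taken from the response examples in this guide (either field may be absent depending on the requested type).

```python
def split_ddl_md(response):
    """Extract the DDL text and Markdown document from a /database/table/ddl
    response body; returns (ddl, md), where either may be None."""
    data = response.get("data", {})
    return data.get("ddl"), data.get("md")
```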
+
+### Response Examples
+
+#### DDL mode response
+```json
+{
+    "success": true,
+    "code": 200,
+    "message": "获取表DDL成功",
+    "data": {
+        "ddl": "-- 中文名: 银行信用卡持卡人信息表\ncreate table public.bank_churners (\n  client_num bigint not null,\n  attrition_flag varchar(32),\n  ...\n);",
+        "table_info": {
+            "table_name": "bank_churners",
+            "schema_name": "public",
+            "full_name": "public.bank_churners",
+            "comment": "银行信用卡持卡人信息表,记录客户流失状态、人口统计特征及账户活跃度数据",
+            "field_count": 20,
+            "row_count": 10127,
+            "table_size": "2008 kB"
+        },
+        "fields": [
+            {
+                "name": "client_num",
+                "type": "bigint",
+                "nullable": false,
+                "comment": "客户编号",
+                "is_primary_key": true,
+                "is_foreign_key": false,
+                "default_value": null,
+                "is_enum": false,
+                "enum_values": null
+            },
+            {
+                "name": "attrition_flag",
+                "type": "character varying",
+                "nullable": true,
+                "comment": "客户流失标志",
+                "is_primary_key": false,
+                "is_foreign_key": false,
+                "default_value": null,
+                "is_enum": true,
+                "enum_values": ["Existing Customer", "Attrited Customer"]
+            }
+        ],
+        "generation_info": {
+            "business_context": "银行信用卡持卡人信息",
+            "output_type": "ddl",
+            "has_llm_comments": true,
+            "database": "bank_db"
+        }
+    }
+}
+```
+
+## Feature Notes
+
+### AI Comment Generation
+
+When the `business_context` parameter is provided, the system will:
+
+1. **Generate Chinese comments**: produce accurate Chinese comments for the table and its fields based on the business context
+2. **Detect enum fields**: automatically identify likely enum-type fields (status, type, level, etc.)
+3. **Verify enum values**: query the database for each field's actual enum values
+4. **Refine field descriptions**: combine field names, data types, and sample data to produce more accurate descriptions
+
+### Output Types
+
+| Type | Description | Typical use |
+|------|------|----------|
+| `ddl` | Returns the CREATE TABLE statement | Database migration, table structure replication |
+| `md` | Returns a Markdown document | Documentation generation, team sharing |
+| `both` | Returns both DDL and MD | Complete table analysis |
+
+### Default Database Configuration
+
+When the `db_connection` parameter is omitted, the system automatically uses the `APP_DB_CONFIG` settings from `app_config.py`:
+
+- Convenient for internal callers
+- Avoids passing connection parameters repeatedly
+- Keeps the database consistent with other services
+
+## Error Handling
+
+### Common Error Codes
+
+| Code | Description | Resolution |
+|--------|------|----------|
+| 400 | Missing required parameter | Check the request and supply all required parameters |
+| 500 | Database connection failure | Check the connection string and network connectivity |
+| 500 | Table does not exist | Verify the table name and that the schema and table exist |
+
+### Error Response Example
+
+```json
+{
+    "success": false,
+    "code": 400,
+    "message": "请求参数错误",
+    "data": {
+        "response": "缺少必需参数:table",
+        "missing_params": ["table"],
+        "timestamp": "2025-07-02T10:30:00"
+    }
+}
+```
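Given the response envelope shown above (`success`, `code`, `message`, `data`), clients can centralize error checking. This is a sketch of one such helper, not part of the API itself:

```python
def check_api_response(body):
    """Raise on an error envelope; return the data payload on success."""
    if not body.get("success"):
        raise RuntimeError(f"API error {body.get('code')}: {body.get('message')}")
    return body.get("data")
```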
+
+## Usage Notes
+
+### Performance
+
+1. **Table list query**: fast; suitable for frequent calls
+2. **DDL analysis**: involves AI processing, so responses take longer (5-30 seconds)
+3. **Large tables**: the system samples data intelligently to avoid performance problems
+
+### Security
+
+1. **Database connections**: each call uses an independent connection and does not affect other services
+2. **Permissions**: requires SELECT privileges on the database
+3. **Concurrency**: supports concurrent calls from multiple users without resource conflicts
+
+### Best Practices
+
+1. **Internal calls**: omit `db_connection` and use the default configuration
+2. **External calls**: pass `db_connection` explicitly
+3. **Documentation generation**: provide a detailed `business_context` for better AI comments
+4. **Batch processing**: call the table list API first to enumerate tables, then analyze them one by one
+
+## Example Code
+
+### Python Example
+
+```python
+import requests
+
+# Get the table list
+def get_tables(schema="public"):
+    url = "http://localhost:8084/api/v0/database/tables"
+    data = {"schema": schema}
+    response = requests.post(url, json=data)
+    return response.json()
+
+# Get a table's DDL
+def get_table_ddl(table, business_context=None, output_type="ddl"):
+    url = "http://localhost:8084/api/v0/database/table/ddl"
+    data = {
+        "table": table,
+        "type": output_type
+    }
+    if business_context:
+        data["business_context"] = business_context
+    
+    response = requests.post(url, json=data)
+    return response.json()
+
+# Usage example
+tables = get_tables("public")
+ddl = get_table_ddl("public.bank_churners", "银行客户信息", "md")
+```
+
+### JavaScript Example
+
+```javascript
+// Get the table list
+async function getTables(schema = 'public') {
+    const response = await fetch('/api/v0/database/tables', {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ schema })
+    });
+    return await response.json();
+}
+
+// Get a table's DDL
+async function getTableDDL(table, businessContext = null, type = 'ddl') {
+    const data = { table, type };
+    if (businessContext) {
+        data.business_context = businessContext;
+    }
+    
+    const response = await fetch('/api/v0/database/table/ddl', {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify(data)
+    });
+    return await response.json();
+}
+```
+
+## Changelog
+
+- **v1.0** (2025-07-02): initial release with basic table queries and DDL generation
+- **v1.1** (2025-07-02): added AI comment generation and enum field detection
+- **v1.2** (2025-07-02): `db_connection` is now optional, falling back to the default configuration
+
+---
+
+For questions or suggestions, please contact the development team.

+ 369 - 0
test_table_inspector_api.py

@@ -0,0 +1,369 @@
+#!/usr/bin/env python3
+"""
+表检查API测试脚本
+
+用于测试新实现的表列表获取API功能
+"""
+
+import requests
+import json
+
+# Test configuration
+API_BASE_URL = "http://localhost:8084"
+ENDPOINT = "/api/v0/database/tables"
+
+def test_get_tables():
+    """Test the table list API"""
+    
+    # Test data
+    test_cases = [
+        {
+            "name": "Default schema (public)",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db"
+            },
+            "expected_schemas": ["public"]
+        },
+        {
+            "name": "Single explicit schema",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "schema": "public"
+            },
+            "expected_schemas": ["public"]
+        },
+        {
+            "name": "Multiple schemas",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "schema": "public,information_schema"
+            },
+            "expected_schemas": ["public", "information_schema"]
+        },
+        {
+            "name": "Empty schema parameter",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "schema": ""
+            },
+            "expected_schemas": ["public"]
+        }
+    ]
+    
+    print("🧪 Testing the table inspector API")
+    print("=" * 50)
+    
+    for i, test_case in enumerate(test_cases, 1):
+        print(f"\n{i}. {test_case['name']}")
+        print("-" * 30)
+        
+        try:
+            # Send the request
+            response = requests.post(
+                f"{API_BASE_URL}{ENDPOINT}",
+                json=test_case["payload"],
+                headers={"Content-Type": "application/json"},
+                timeout=30
+            )
+            
+            print(f"📤 Request: {json.dumps(test_case['payload'], ensure_ascii=False)}")
+            print(f"📊 Status code: {response.status_code}")
+            
+            if response.status_code == 200:
+                data = response.json()
+                
+                if data.get("success"):
+                    result_data = data.get("data", {})
+                    tables = result_data.get("tables", [])
+                    schemas = result_data.get("schemas", [])
+                    
+                    print(f"✅ Success")
+                    print(f"📋 Tables returned: {len(tables)}")
+                    print(f"🏷️  Schemas queried: {schemas}")
+                    print(f"📝 First 5 tables: {tables[:5]}")
+                    
+                    # Verify the schemas
+                    if schemas == test_case["expected_schemas"]:
+                        print(f"✅ Schema check passed")
+                    else:
+                        print(f"❌ Schema check failed: expected {test_case['expected_schemas']}, got {schemas}")
+                        
+                else:
+                    print(f"❌ API returned failure: {data.get('message')}")
+            else:
+                print(f"❌ HTTP error: {response.status_code}")
+                try:
+                    error_data = response.json()
+                    print(f"   Error message: {error_data.get('message', 'N/A')}")
+                except Exception:
+                    print(f"   Response body: {response.text}")
+                    
+        except requests.exceptions.RequestException as e:
+            print(f"❌ Request exception: {e}")
+        except Exception as e:
+            print(f"❌ Other error: {e}")
+
+def test_error_cases():
+    """Test error handling"""
+    
+    print("\n\n🚨 Testing error cases")
+    print("=" * 50)
+    
+    error_test_cases = [
+        {
+            "name": "Missing db_connection parameter",
+            "payload": {
+                "schema": "public"
+            },
+            "expected_status": 400
+        },
+        {
+            "name": "Invalid database connection",
+            "payload": {
+                "db_connection": "postgresql://invalid:invalid@localhost:5432/invalid"
+            },
+            "expected_status": 500
+        }
+    ]
+    
+    for i, test_case in enumerate(error_test_cases, 1):
+        print(f"\n{i}. {test_case['name']}")
+        print("-" * 30)
+        
+        try:
+            response = requests.post(
+                f"{API_BASE_URL}{ENDPOINT}",
+                json=test_case["payload"],
+                headers={"Content-Type": "application/json"},
+                timeout=10
+            )
+            
+            print(f"📤 Request: {json.dumps(test_case['payload'], ensure_ascii=False)}")
+            print(f"📊 Status code: {response.status_code}")
+            
+            if response.status_code == test_case["expected_status"]:
+                print(f"✅ Error handled correctly")
+            else:
+                print(f"❌ Expected status {test_case['expected_status']}, got {response.status_code}")
+                
+            # Show the error message
+            try:
+                error_data = response.json()
+                print(f"📄 Error message: {error_data.get('message', 'N/A')}")
+            except Exception:
+                print(f"📄 Response body: {response.text[:200]}")
+                
+        except requests.exceptions.Timeout:
+            print(f"⏰ Request timed out (expected for an invalid connection)")
+        except Exception as e:
+            print(f"❌ Exception: {e}")
+
+def test_get_table_ddl():
+    """Test the table DDL API"""
+    
+    print("\n\n🧪 Testing the table DDL generation API")
+    print("=" * 50)
+    
+    # Test data
+    test_cases = [
+        {
+            "name": "DDL output",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "table": "public.bss_company",
+                "business_context": "高速公路服务区管理系统",
+                "type": "ddl"
+            }
+        },
+        {
+            "name": "MD output",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "table": "public.bss_company",
+                "business_context": "高速公路服务区管理系统",
+                "type": "md"
+            }
+        },
+        {
+            "name": "Both DDL and MD output",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "table": "public.bss_company",
+                "business_context": "高速公路服务区管理系统",
+                "type": "both"
+            }
+        },
+        {
+            "name": "No business context",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "table": "public.bss_company",
+                "type": "ddl"
+            }
+        }
+    ]
+    
+    endpoint = "/api/v0/database/table/ddl"
+    
+    for i, test_case in enumerate(test_cases, 1):
+        print(f"\n{i}. {test_case['name']}")
+        print("-" * 30)
+        
+        try:
+            # Send the request
+            response = requests.post(
+                f"{API_BASE_URL}{endpoint}",
+                json=test_case["payload"],
+                headers={"Content-Type": "application/json"},
+                timeout=60  # DDL generation may take longer (LLM call)
+            )
+            
+            print(f"📤 Request: {json.dumps(test_case['payload'], ensure_ascii=False)}")
+            print(f"📊 Status code: {response.status_code}")
+            
+            if response.status_code == 200:
+                data = response.json()
+                
+                if data.get("success"):
+                    result_data = data.get("data", {})
+                    table_info = result_data.get("table_info", {})
+                    generation_info = result_data.get("generation_info", {})
+                    
+                    print(f"✅ Success")
+                    print(f"📋 Table: {table_info.get('full_name')} ({table_info.get('field_count')} fields)")
+                    print(f"💡 Generation info: {generation_info}")
+                    
+                    # Check the output content
+                    output_type = test_case["payload"].get("type", "ddl")
+                    if output_type in ["ddl", "both"] and "ddl" in result_data:
+                        ddl_lines = result_data["ddl"].count('\n')
+                        print(f"🔧 DDL content: {ddl_lines} lines")
+                        # Show the first few lines of the DDL
+                        ddl_preview = '\n'.join(result_data["ddl"].split('\n')[:3])
+                        print(f"   Preview: {ddl_preview}...")
+                    
+                    if output_type in ["md", "both"] and "md" in result_data:
+                        md_lines = result_data["md"].count('\n')
+                        print(f"📄 MD content: {md_lines} lines")
+                        # Show the MD title line
+                        md_lines_list = result_data["md"].split('\n')
+                        if md_lines_list:
+                            print(f"   Title: {md_lines_list[0]}")
+                    
+                    if "fields" in result_data:
+                        print(f"🗂️  Field count: {len(result_data['fields'])}")
+                        
+                else:
+                    print(f"❌ API returned failure: {data.get('message')}")
+            else:
+                print(f"❌ HTTP error: {response.status_code}")
+                try:
+                    error_data = response.json()
+                    print(f"   Error message: {error_data.get('message', 'N/A')}")
+                except Exception:
+                    print(f"   Response body: {response.text[:200]}")
+                    
+        except requests.exceptions.Timeout:
+            print(f"⏰ Request timed out (LLM processing can take a while)")
+        except requests.exceptions.RequestException as e:
+            print(f"❌ Request exception: {e}")
+        except Exception as e:
+            print(f"❌ Other error: {e}")
+
+def test_ddl_error_cases():
+    """测试DDL API的错误情况"""
+    
+    print("\n\n🚨 测试DDL API错误情况")
+    print("=" * 50)
+    
+    endpoint = "/api/v0/database/table/ddl"
+    error_test_cases = [
+        {
+            "name": "Missing db_connection parameter",
+            "payload": {
+                "table": "public.test"
+            },
+            "expected_status": 400
+        },
+        {
+            "name": "Missing table parameter",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db"
+            },
+            "expected_status": 400
+        },
+        {
+            "name": "Invalid type parameter",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "table": "public.test",
+                "type": "invalid"
+            },
+            "expected_status": 400
+        },
+        {
+            "name": "Non-existent table",
+            "payload": {
+                "db_connection": "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
+                "table": "public.non_existent_table_12345"
+            },
+            "expected_status": 500
+        }
+    ]
+    
+    for i, test_case in enumerate(error_test_cases, 1):
+        print(f"\n{i}. {test_case['name']}")
+        print("-" * 30)
+        
+        try:
+            response = requests.post(
+                f"{API_BASE_URL}{endpoint}",
+                json=test_case["payload"],
+                headers={"Content-Type": "application/json"},
+                timeout=10
+            )
+            
+            print(f"📤 Request: {json.dumps(test_case['payload'], ensure_ascii=False)}")
+            print(f"📊 Status code: {response.status_code}")
+            
+            if response.status_code == test_case["expected_status"]:
+                print("✅ Error handled correctly")
+            else:
+                print(f"❌ Expected status {test_case['expected_status']}, got {response.status_code}")
+                
+            # Show the error message
+            try:
+                error_data = response.json()
+                print(f"📄 Error message: {error_data.get('message', 'N/A')}")
+            except Exception:
+                print(f"📄 Response body: {response.text[:200]}")
+                
+        except requests.exceptions.Timeout:
+            print("⏰ Request timed out (this may be expected)")
+        except Exception as e:
+            print(f"❌ Exception: {e}")
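The expected status codes above (400 for missing or invalid parameters, 500 for a non-existent table) assume server-side validation roughly like the following sketch. This is a hypothetical helper written for illustration, not the actual handler code in citu_app.py or table_inspector_api.py:

```python
def validate_ddl_request(payload: dict):
    """Basic payload validation for /api/v0/database/table/ddl.

    Returns (status_code, message); 200 means the payload passes basic
    checks. Whether the table actually exists is only discovered later,
    during DDL generation, which is why that case maps to HTTP 500.
    """
    if "db_connection" not in payload:
        return 400, "missing required parameter: db_connection"
    if "table" not in payload:
        return 400, "missing required parameter: table"
    output_type = payload.get("type", "ddl")
    if output_type not in ("ddl", "md", "both"):
        return 400, f"invalid type: {output_type!r} (expected ddl, md or both)"
    return 200, "ok"
```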
+
+if __name__ == "__main__":
+    print("🚀 Table inspector API tests starting")
+    print(f"🌐 API base URL: {API_BASE_URL}")
+    
+    # First, test the table list API
+    test_get_tables()
+    
+    # Then test error cases for the table list API
+    test_error_cases()
+    
+    # Test the DDL generation API
+    test_get_table_ddl()
+    
+    # Test error cases for the DDL API
+    test_ddl_error_cases()
+    
+    print("\n" + "=" * 50)
+    print("🏁 All tests finished")
+    print("\n💡 Usage notes:")
+    print("   - Table list API: POST /api/v0/database/tables")
+    print("   - Table DDL API: POST /api/v0/database/table/ddl")
+    print("   - If connection errors appear, make sure the database server is reachable")
+    print("   - DDL generation involves LLM calls and may take a while")
+    print("   - Three output formats are supported: ddl, md, both")
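For a quick one-off call outside this test script, the request can be assembled as in the sketch below. The base URL is an assumption (adjust it to your deployment), and actually sending the request requires a running server plus the `requests` package:

```python
API_BASE_URL = "http://localhost:8084"  # assumed default; adjust to your deployment

def build_ddl_request(db_connection: str, table: str, output_type: str = "both"):
    """Build the URL and JSON body for the table-DDL endpoint."""
    url = f"{API_BASE_URL}/api/v0/database/table/ddl"
    payload = {"db_connection": db_connection, "table": table, "type": output_type}
    return url, payload

# To actually send it against a running server:
#   import requests
#   url, payload = build_ddl_request(
#       "postgresql://postgres:postgres@192.168.67.1:5432/highway_db",
#       "public.bss_company",
#   )
#   resp = requests.post(url, json=payload, timeout=60)
#   print(resp.json())
```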