/ddl/parse 接口存在格式不一致的问题:
在 DDLParser.parse_ddl() 方法中存在两个不同的格式要求:
table_info + columns_optimize_ddl_prompt() 要求多表格式:{table_name: {name_zh, meta: [...]}}两个提示词相互冲突,导致LLM返回不确定的格式。
当前格式缺少参考文档中的重要字段:
is_primary - 是否主键comment - 注释内容nullable - 是否可为空修复前(第86-140行):
def parse_ddl(self, sql_content):
prompt = self._optimize_ddl_prompt()
payload = {
"model": self.model_name,
"messages": [
{
"role": "system",
"content": """请从上传的SQL DDL文件中提取数据库表的结构信息...
{
"table_info": {...},
"columns": [...]
}
"""
},
{
"role": "user",
"content": f"{prompt}\n\n{sql_content}"
}
]
}
修复后:
def parse_ddl(self, sql_content):
prompt = self._optimize_ddl_prompt()
payload = {
"model": self.model_name,
"messages": [
{
"role": "system",
"content": "你是一个专业的SQL DDL语句解析专家,擅长从DDL语句中提取表结构信息并转换为结构化的JSON格式。"
},
{
"role": "user",
"content": f"{prompt}\n\n{sql_content}"
}
]
}
_optimize_ddl_prompt() 函数修复前(第251-289行):
def _optimize_ddl_prompt(self):
return """
...
参考格式如下:
{
"users_table": {
"name_zh": "用户表",
"schema": "public",
"meta": [{
"name_en": "id",
"data_type": "integer",
"name_zh": "用户ID"
}]
}
}
"""
修复后:
def _optimize_ddl_prompt(self):
return """
请解析以下DDL建表语句,并按照指定的JSON格式返回结果:
规则说明:
1. 从DDL语句中识别所有表,可能会有多个表。每个表作为一个独立条目。
2. 表的英文名称(name_en)使用原始大小写,不要转换为小写。
3. 表的中文名称(name_zh)提取规则:
- 优先从COMMENT ON TABLE语句中提取
- 如果没有注释,则name_zh为空字符串
4. 对于每个表,提取所有字段信息到columns数组中,每个字段包含:
- name_zh: 字段中文名称
- name_en: 字段英文名称(保持原始大小写)
- data_type: 数据类型(包含长度信息,如VARCHAR(22))
- is_primary: 是否主键("是"或"否",从PRIMARY KEY约束判断)
- comment: 注释内容(从COMMENT ON COLUMN提取完整注释)
- nullable: 是否可为空("是"或"否",从NOT NULL约束判断)
返回格式(支持多表):
{
"TB_JC_KSDZB": {
"table_info": {
"name_zh": "科室对照表",
"name_en": "TB_JC_KSDZB"
},
"columns": [
{
"name_zh": "医疗机构代码",
"name_en": "YLJGDM",
"data_type": "VARCHAR(22)",
"is_primary": "是",
"comment": "医疗机构代码,复合主键",
"nullable": "否"
}
]
}
}
"""
[
{
"table_info": {
"name_zh": "科室对照表",
"name_en": "TB_JC_KSDZB"
},
"columns": [
{
"name_zh": "医疗机构代码",
"name_en": "YLJGDM",
"data_type": "VARCHAR(22)",
"is_primary": "是",
"comment": "医疗机构代码,复合主键",
"nullable": "否"
},
{
"name_zh": "HIS科室代码",
"name_en": "HISKSDM",
"data_type": "CHAR(20)",
"is_primary": "是",
"comment": "HIS科室代码,主键、唯一",
"nullable": "否"
}
]
}
]
[
{
"table_info": {
"name_zh": "科室对照表",
"name_en": "TB_JC_KSDZB"
},
"columns": [...]
},
{
"table_info": {
"name_zh": "用户表",
"name_en": "TB_USER"
},
"columns": [...]
}
]
{
"code": 200,
"data": [
{
"table_info": {
"name_zh": "科室对照表",
"name_en": "TB_JC_KSDZB"
},
"columns": [...]
}
],
"message": "操作成功"
}
_optimize_ddl_prompt() 来定义格式is_primary: 是否主键("是"/"否")comment: 完整的注释内容nullable: 是否可为空("是"/"否")table_info: 包含表的基本信息columns: 字段数组,每个字段包含所有必需属性TB_JC_KSDZB, YLJGDM)CREATE TABLE TB_JC_KSDZB (
YLJGDM VARCHAR(22) NOT NULL,
HISKSDM CHAR(20) NOT NULL,
HISKSMC CHAR(20) NOT NULL,
PRIMARY KEY (YLJGDM, HISKSDM)
);
COMMENT ON TABLE TB_JC_KSDZB IS '科室对照表';
COMMENT ON COLUMN TB_JC_KSDZB.YLJGDM IS '医疗机构代码,复合主键';
COMMENT ON COLUMN TB_JC_KSDZB.HISKSDM IS 'HIS科室代码,主键、唯一';
COMMENT ON COLUMN TB_JC_KSDZB.HISKSMC IS 'HIS科室名称';
[
{
"table_info": {
"name_zh": "科室对照表",
"name_en": "TB_JC_KSDZB"
},
"columns": [
{
"name_zh": "医疗机构代码",
"name_en": "YLJGDM",
"data_type": "VARCHAR(22)",
"is_primary": "是",
"comment": "医疗机构代码,复合主键",
"nullable": "否"
},
{
"name_zh": "HIS科室代码",
"name_en": "HISKSDM",
"data_type": "CHAR(20)",
"is_primary": "是",
"comment": "HIS科室代码,主键、唯一",
"nullable": "否"
},
{
"name_zh": "HIS科室名称",
"name_en": "HISKSMC",
"data_type": "CHAR(20)",
"is_primary": "否",
"comment": "HIS科室名称",
"nullable": "否"
}
]
}
]
table_info + columns 结构table_info 包含 name_zh 和 name_encolumns 是数组,每个元素包含:
name_zh - 字段中文名name_en - 字段英文名data_type - 数据类型(含长度)is_primary - 是否主键comment - 注释内容nullable - 是否可空# 方式1:上传SQL文件
curl -X POST http://localhost:5000/api/resource/ddl/parse \
-F "file=@科室对照表_原始.sql"
# 方式2:JSON请求
curl -X POST http://localhost:5000/api/resource/ddl/parse \
-H "Content-Type: application/json" \
-d '{"sql": "CREATE TABLE ..."}'
from app.core.llm.ddl_parser import DDLParser
# 读取SQL文件
with open('科室对照表_原始.sql', 'r', encoding='utf-8') as f:
sql_content = f.read()
# 解析DDL
parser = DDLParser()
result = parser.parse_ddl(sql_content)
# 结果格式(数组)
# [
# {
# "table_info": {...},
# "columns": [...]
# }
# ]
app/core/llm/ddl_parser.py
parse_ddl() 方法(第86-140行)_optimize_ddl_prompt() 方法(第251-289行)接口路由
app/api/data_resource/routes.py - /ddl/parse 接口(第602行)2024-11-27