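Add endpoints in data_parse for handling duplicate records. When adding a career path entry, use the main record's image path.
Add duplicate-record handling logic.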

hedpma 3 weeks ago
parent
commit
4e4ad42495
3 changed files with 1193 additions and 57 deletions
  1. 261 0
      API_DOCUMENTATION_DUPLICATE_RECORDS.md
  2. 260 1
      app/api/data_parse/routes.py
  3. 672 56
      app/core/data_parse/parse.py

+ 261 - 0
API_DOCUMENTATION_DUPLICATE_RECORDS.md

@@ -0,0 +1,261 @@
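+# Duplicate Record Handling API Documentation
+
+## Overview
+
+This document describes the DataOps platform API endpoints for handling duplicate business card records. When the system detects that a newly uploaded business card may duplicate an existing record, it creates a duplicate-record entry for an administrator to resolve.
+
+## API Endpoints
+
+### 1. Get the duplicate record list
+
+**Endpoint**: `GET /api/data-parse/get-duplicate-records`
+
+**Query parameters**:
+- `status` (optional): return only records with the given status
+  - `pending`: awaiting processing
+  - `processed`: processed
+  - `ignored`: ignored
+
+**Request example**: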
+```bash
+# Get all duplicate records
+GET /api/data-parse/get-duplicate-records
+
+# Get pending duplicate records
+GET /api/data-parse/get-duplicate-records?status=pending
+```
+
+**Response example**:
+```json
+{
+  "code": 200,
+  "success": true,
+  "message": "获取重复记录列表成功",
+  "count": 2,
+  "data": [
+    {
+      "id": 1,
+      "main_card_id": 123,
+      "suspected_duplicates": [
+        {
+          "id": 101,
+          "name_zh": "张三",
+          "mobile": "13812345678",
+          "hotel_zh": "北京丽思卡尔顿酒店",
+          "title_zh": "总监",
+          "created_at": "2024-01-15 10:30:00"
+        }
+      ],
+      "duplicate_reason": "姓名相同但手机号码不同:张三,新手机号:13987654321",
+      "processing_status": "pending",
+      "created_at": "2024-01-16 09:15:00",
+      "main_card": {
+        "id": 123,
+        "name_zh": "张三",
+        "mobile": "13987654321",
+        "hotel_zh": "上海丽思卡尔顿酒店",
+        "title_zh": "总经理",
+        "image_path": "abc123def456.jpg"
+      }
+    }
+  ]
+}
+```
+
+### 2. Process a duplicate record
+
+**Endpoint**: `POST /api/data-parse/process-duplicate-record/<duplicate_id>`
+
+**Path parameters**:
+- `duplicate_id`: ID of the duplicate record
+
+**Request body**:
+```json
+{
+  "action": "merge_to_suspected|keep_main|ignore",
+  "selected_duplicate_id": 101,  // required when action is merge_to_suspected
+  "processed_by": "admin",       // optional
+  "notes": "确认为同一人,合并记录"  // optional
+}
+```
+
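+**Actions**:
+- `merge_to_suspected`: merge into the selected suspected duplicate record and delete the main record
+- `keep_main`: keep the main record and mark the entry as processed
+- `ignore`: dismiss the duplicate alert and mark the entry as processed
+
+**Request example**: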
+```bash
+POST /api/data-parse/process-duplicate-record/1
+Content-Type: application/json
+
+{
+  "action": "merge_to_suspected",
+  "selected_duplicate_id": 101,
+  "processed_by": "admin",
+  "notes": "确认为同一人,职位有升职,合并到原记录"
+}
+```
+
+**Response example**:
+```json
+{
+  "code": 200,
+  "success": true,
+  "message": "重复记录处理成功,操作: merge_to_suspected",
+  "data": {
+    "duplicate_record": {
+      "id": 1,
+      "processing_status": "processed",
+      "processed_at": "2024-01-16 14:30:00",
+      "processed_by": "admin",
+      "processing_notes": "确认为同一人,职位有升职,合并到原记录"
+    },
+    "result": {
+      "id": 101,
+      "name_zh": "张三",
+      "mobile": "13987654321",
+      "hotel_zh": "上海丽思卡尔顿酒店",
+      "title_zh": "总经理",
+      "image_path": "abc123def456.jpg",
+      "career_path": [
+        {
+          "date": "2024-01-15",
+          "hotel_zh": "北京丽思卡尔顿酒店",
+          "title_zh": "总监",
+          "image_path": "old123image456.jpg",
+          "source": "business_card_creation"
+        },
+        {
+          "date": "2024-01-16",
+          "hotel_zh": "上海丽思卡尔顿酒店",
+          "title_zh": "总经理",
+          "image_path": "abc123def456.jpg",
+          "source": "business_card_update"
+        }
+      ]
+    }
+  }
+}
+```
+
+### 3. Get duplicate record details
+
+**Endpoint**: `GET /api/data-parse/get-duplicate-record-detail/<duplicate_id>`
+
+**Path parameters**:
+- `duplicate_id`: ID of the duplicate record
+
+**Request example**:
+```bash
+GET /api/data-parse/get-duplicate-record-detail/1
+```
+
+**Response example**:
+```json
+{
+  "code": 200,
+  "success": true,
+  "message": "获取重复记录详情成功",
+  "data": {
+    "id": 1,
+    "main_card_id": 123,
+    "suspected_duplicates": [
+      {
+        "id": 101,
+        "name_zh": "张三",
+        "name_en": "John Zhang",
+        "mobile": "13812345678",
+        "hotel_zh": "北京丽思卡尔顿酒店",
+        "hotel_en": "The Ritz-Carlton Beijing",
+        "title_zh": "总监",
+        "title_en": "Director",
+        "created_at": "2024-01-15 10:30:00"
+      }
+    ],
+    "duplicate_reason": "姓名相同但手机号码不同:张三,新手机号:13987654321",
+    "processing_status": "pending",
+    "created_at": "2024-01-16 09:15:00",
+    "main_card": {
+      "id": 123,
+      "name_zh": "张三",
+      "name_en": "John Zhang",
+      "mobile": "13987654321",
+      "hotel_zh": "上海丽思卡尔顿酒店",
+      "hotel_en": "The Ritz-Carlton Shanghai",
+      "title_zh": "总经理",
+      "title_en": "General Manager",
+      "image_path": "abc123def456.jpg",
+      "career_path": [
+        {
+          "date": "2024-01-16",
+          "hotel_zh": "上海丽思卡尔顿酒店",
+          "hotel_en": "The Ritz-Carlton Shanghai",
+          "title_zh": "总经理",
+          "title_en": "General Manager",
+          "image_path": "abc123def456.jpg",
+          "source": "business_card_creation"
+        }
+      ]
+    }
+  }
+}
+```
+
+## Business Flow
+
+### 1. Duplicate detection flow
+
+```mermaid
+graph TD
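+    A[Upload new business card] --> B[AI parses card information]
+    B --> C[Check for duplicate records]
+    C --> D{Duplicate found?}
+    D -->|No| E[Create new record]
+    D -->|Yes| F{Same name and mobile number?}
+    F -->|Yes| G[Automatically update existing record]
+    F -->|No| H[Create main record + duplicate-record entry]
+    H --> I[Wait for administrator]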
+```
+
+### 2. Duplicate handling flow
+
+```mermaid
+graph TD
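+    A[Administrator reviews duplicate record] --> B[Choose how to handle it]
+    B --> C{Action}
+    C -->|merge_to_suspected| D[Merge into suspected duplicate record]
+    C -->|keep_main| E[Keep main record]
+    C -->|ignore| F[Dismiss duplicate alert]
+    D --> G[Update target record]
+    G --> H[Add career path entry]
+    H --> I[Delete main record]
+    E --> J[Mark as processed]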
+    F --> J
+    I --> K[Done]
+    J --> K
+```
+
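+## Error Codes
+
+| HTTP status | Error code | Description |
+|-------------|------------|-------------|
+| 200 | 200 | Success |
+| 400 | 400 | Invalid request parameters |
+| 404 | 404 | Record not found |
+| 500 | 500 | Internal server error |
+
+## Notes
+
+1. **Access control**: duplicate-record handling should be restricted to administrators
+2. **Data consistency**: a merge deletes the main record, so back up the data before merging
+3. **Image path**: the merged record uses the image path of the most recent business card
+4. **Career path**: a merge automatically updates the career path, including the corresponding image path
+5. **Processing status**: once the status changes to `processed`, the record cannot be processed again
+
+## Usage Recommendations
+
+1. **Regular checks**: review pending duplicate records on a regular basis
+2. **Merge carefully**: merges are irreversible, so verify the details before merging
+3. **Add notes**: record detailed notes when processing so that decisions can be traced later
+4. **Batch processing**: large numbers of duplicate records can be processed in bulk with a script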
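
Review note (not part of this commit): the request rules documented for `POST /api/data-parse/process-duplicate-record/<duplicate_id>` can be checked on the client side before a request is sent, which helps when handling records in bulk from a script. The sketch below is only an illustration under that assumption; `build_process_payload` and `VALID_ACTIONS` are hypothetical names that do not exist in the repository, and the function mirrors only the two rules stated in the document (a valid `action`, and `selected_duplicate_id` being required for `merge_to_suspected`).

```python
from typing import Any, Dict, Optional

# The three actions listed in the API document.
VALID_ACTIONS = ("merge_to_suspected", "keep_main", "ignore")


def build_process_payload(
    action: str,
    selected_duplicate_id: Optional[int] = None,
    processed_by: Optional[str] = None,
    notes: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the JSON body for process-duplicate-record."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}")
    if action == "merge_to_suspected" and not selected_duplicate_id:
        raise ValueError("selected_duplicate_id is required for merge_to_suspected")

    payload: Dict[str, Any] = {"action": action}
    # Optional fields are included only when the caller supplied them.
    if selected_duplicate_id is not None:
        payload["selected_duplicate_id"] = selected_duplicate_id
    if processed_by is not None:
        payload["processed_by"] = processed_by
    if notes is not None:
        payload["notes"] = notes
    return payload
```

A script could pass the returned dict as the JSON body of the POST request; the server still performs its own validation, so this only catches mistakes earlier.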

+ 260 - 1
app/api/data_parse/routes.py

@@ -1,6 +1,6 @@
 from flask import jsonify, request, make_response, Blueprint, current_app, send_file
 from app.api.data_parse import bp
-from app.core.data_parse.parse import parse_data, process_business_card, update_business_card, get_business_cards, update_business_card_status, create_talent_tag, get_talent_tag_list, update_talent_tag, delete_talent_tag, query_neo4j_graph, talent_get_tags, talent_update_tags, get_business_card, get_hotel_positions_list, add_hotel_positions, update_hotel_positions, query_hotel_positions, delete_hotel_positions, get_hotel_group_brands_list, add_hotel_group_brands, update_hotel_group_brands, query_hotel_group_brands, delete_hotel_group_brands
+from app.core.data_parse.parse import parse_data, process_business_card, update_business_card, get_business_cards, update_business_card_status, create_talent_tag, get_talent_tag_list, update_talent_tag, delete_talent_tag, query_neo4j_graph, talent_get_tags, talent_update_tags, get_business_card, get_hotel_positions_list, add_hotel_positions, update_hotel_positions, query_hotel_positions, delete_hotel_positions, get_hotel_group_brands_list, add_hotel_group_brands, update_hotel_group_brands, query_hotel_group_brands, delete_hotel_group_brands, get_duplicate_records, process_duplicate_record, get_duplicate_record_detail
 from app.config.config import DevelopmentConfig, ProductionConfig
 import logging
 import boto3
@@ -12,6 +12,88 @@ import os
 import urllib.parse
 from minio import Minio
 
+"""
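+DataOps platform - data parsing API routes
+
+This module provides API endpoints for the following features:
+
+1. Business card parsing
+   - POST /business-card-parse: upload a business card image and parse it
+   - PUT /business-cards/<id>: update a business card
+   - GET /get-business-cards: list all business card records
+   - GET /get-business-card/<id>: get the business card record with the given ID
+   - PUT /update-business-cards/<id>/status: update a business card's status
+
+2. Duplicate record handling (new)
+   - GET /get-duplicate-records[?status=<status>]: list duplicate records
+   - POST /process-duplicate-record/<id>: process a duplicate record
+   - GET /get-duplicate-record-detail/<id>: get duplicate record details
+
+3. Talent tag management
+   - POST /create-talent-tag: create a talent tag
+   - GET /get-talent-tag-list: list talent tags
+   - PUT /update-talent-tag/<id>: update a talent tag
+   - DELETE /delete-talent-tag/<id>: delete a talent tag
+
+4. Talent-tag relationship management
+   - GET /talent-get-tags/<talent_id>: get the tags linked to a talent
+   - POST /talent-update-tags: batch-update talent-tag relationships
+
+5. Knowledge graph queries
+   - POST /query-kg: query the graph database in natural language
+
+6. Hotel position data management
+   - GET /get-hotel-positions-list: list hotel positions
+   - POST /add-hotel-positions: add a hotel position record
+   - PUT /update-hotel-positions/<id>: update a hotel position record
+   - GET /query-hotel-positions/<id>: get a specific position record
+   - DELETE /delete-hotel-positions/<id>: delete a position record
+
+7. Hotel group brand data management
+   - GET /get-hotel-group-brands-list: list hotel group brands
+   - POST /add-hotel-group-brands: add a hotel group brand record
+   - PUT /update-hotel-group-brands/<id>: update a hotel group brand record
+   - GET /query-hotel-group-brands/<id>: get a specific brand record
+   - DELETE /delete-hotel-group-brands/<id>: delete a brand record
+
+8. MinIO file management
+   - GET /business-cards/image/<path>: get a business card image
+   - GET /test-minio-connection: test the MinIO connection
+
+Duplicate record handling API in detail:
+═══════════════════════════════
+
+1. Get the duplicate record list
+   GET /get-duplicate-records[?status=<status>]
+   - Query parameter: status (optional): 'pending'/'processed'/'ignored'
+   - Returns: the duplicate record list, including the main record and the suspected duplicates
+
+2. Process a duplicate record
+   POST /process-duplicate-record/<duplicate_id>
+   - Path parameter: duplicate_id (required): ID of the duplicate record
+   - Request body:
+     * action (required): 'merge_to_suspected'/'keep_main'/'ignore'
+     * selected_duplicate_id (conditional): required when action is merge_to_suspected
+     * processed_by (optional): identifier of the person processing the record
+     * notes (optional): processing notes
+   - Actions:
+     * merge_to_suspected: merge into the selected suspected duplicate record and delete the main record
+     * keep_main: keep the main record and mark the entry as processed
+     * ignore: dismiss the duplicate alert and mark the entry as processed
+
+3. Get duplicate record details
+   GET /get-duplicate-record-detail/<duplicate_id>
+   - Path parameter: duplicate_id (required): ID of the duplicate record
+   - Returns: full details of the duplicate record, including the main record and all suspected duplicates
+
+Business flow:
+1. After a business card is uploaded, the system checks for duplicates automatically
+2. If a duplicate is found, it creates a main record and a duplicate-record entry
+3. An administrator reviews pending duplicate records through the API
+4. The administrator chooses an action: merge, keep or ignore
+5. The system carries out the chosen action and updates the status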
+"""
+
 # Define logger
 logger = logging.getLogger(__name__)
 
@@ -1070,3 +1152,180 @@ def delete_hotel_group_brands_route(brand_id):
             'data': None
         }), 500
 
+
+# ==================================
+# Duplicate record handling API endpoints
+# ==================================
+
+@bp.route('/get-duplicate-records', methods=['GET'])
+def get_duplicate_records_route():
+    """
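+    API endpoint that returns the duplicate record list.
+    
+    Query parameters:
+        - status: optional; filter by status ('pending', 'processed', 'ignored')
+    
+    Returns:
+        - JSON: the duplicate record list and processing status
+    """
+    try:
+        # Read the query parameter
+        status = request.args.get('status', None)
+        
+        # Validate the status parameter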
+        if status and status not in ['pending', 'processed', 'ignored']:
+            return jsonify({
+                'success': False,
+                'message': 'status参数无效,必须为 pending、processed 或 ignored',
+                'data': None
+            }), 400
+        
+        # Call the business logic to fetch the duplicate record list
+        result = get_duplicate_records(status)
+        
+        # Choose the HTTP status code from the result
+        status_code = 200 if result['success'] else 500
+        
+        return jsonify(result), status_code
+        
+    except Exception as e:
+        # Handle unexpected exceptions
+        error_msg = f"获取重复记录列表时发生错误: {str(e)}"
+        logger.error(error_msg, exc_info=True)
+        
+        return jsonify({
+            'success': False,
+            'message': error_msg,
+            'data': [],
+            'count': 0
+        }), 500
+
+
+@bp.route('/process-duplicate-record/<int:duplicate_id>', methods=['POST'])
+def process_duplicate_record_route(duplicate_id):
+    """
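+    API endpoint that processes a duplicate record.
+    
+    Path parameters:
+        - duplicate_id: ID of the duplicate record
+    
+    Request body:
+        - JSON with the following fields:
+            - action: action to take (required) ('merge_to_suspected', 'keep_main', 'ignore')
+            - selected_duplicate_id: ID of the chosen suspected duplicate record (required when action is 'merge_to_suspected', otherwise optional)
+            - processed_by: who processed the record (optional)
+            - notes: processing notes (optional)
+    
+    Returns:
+        - JSON: the processing result and status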
+    """
+    try:
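+        # Read the request body; silent=True returns None for missing or malformed JSON, so the check below answers 400 rather than 500
+        data = request.get_json(silent=True)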
+        
+        if not data:
+            return jsonify({
+                'success': False,
+                'message': '请求数据为空',
+                'data': None
+            }), 400
+        
+        # Validate required fields
+        action = data.get('action')
+        if not action:
+            return jsonify({
+                'success': False,
+                'message': '缺少必填字段: action',
+                'data': None
+            }), 400
+        
+        # Validate the action parameter
+        if action not in ['merge_to_suspected', 'keep_main', 'ignore']:
+            return jsonify({
+                'success': False,
+                'message': 'action参数无效,必须为 merge_to_suspected、keep_main 或 ignore',
+                'data': None
+            }), 400
+        
+        # Extract the remaining parameters
+        selected_duplicate_id = data.get('selected_duplicate_id')
+        processed_by = data.get('processed_by')
+        notes = data.get('notes')
+        
+        # Extra check: merge_to_suspected requires selected_duplicate_id
+        if action == 'merge_to_suspected' and not selected_duplicate_id:
+            return jsonify({
+                'success': False,
+                'message': '执行merge_to_suspected操作时必须提供selected_duplicate_id',
+                'data': None
+            }), 400
+        
+        # Call the business logic to process the duplicate record
+        result = process_duplicate_record(
+            duplicate_id=duplicate_id,
+            action=action,
+            selected_duplicate_id=selected_duplicate_id,
+            processed_by=processed_by,
+            notes=notes
+        )
+        
+        # Choose the HTTP status code from the result
+        if result['code'] == 200:
+            status_code = 200  # OK
+        elif result['code'] == 400:
+            status_code = 400  # Bad Request
+        elif result['code'] == 404:
+            status_code = 404  # Not Found
+        else:
+            status_code = 500  # Internal Server Error
+        
+        return jsonify(result), status_code
+        
+    except Exception as e:
+        # Handle unexpected exceptions
+        error_msg = f"处理重复记录时发生错误: {str(e)}"
+        logger.error(error_msg, exc_info=True)
+        
+        return jsonify({
+            'success': False,
+            'message': error_msg,
+            'data': None
+        }), 500
+
+
+@bp.route('/get-duplicate-record-detail/<int:duplicate_id>', methods=['GET'])
+def get_duplicate_record_detail_route(duplicate_id):
+    """
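+    API endpoint that returns the details of one duplicate record.
+    
+    Path parameters:
+        - duplicate_id: ID of the duplicate record
+    
+    Returns:
+        - JSON: the duplicate record details
+    """
+    try:
+        # Call the business logic to fetch the duplicate record details
+        result = get_duplicate_record_detail(duplicate_id)
+        
+        # Choose the HTTP status code from the result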
+        if result['code'] == 200:
+            status_code = 200  # OK
+        elif result['code'] == 404:
+            status_code = 404  # Not Found
+        else:
+            status_code = 500  # Internal Server Error
+        
+        return jsonify(result), status_code
+        
+    except Exception as e:
+        # Handle unexpected exceptions
+        error_msg = f"获取重复记录详情时发生错误: {str(e)}"
+        logger.error(error_msg, exc_info=True)
+        
+        return jsonify({
+            'success': False,
+            'message': error_msg,
+            'data': None
+        }), 500
+
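
Review note (not part of this commit): `process_duplicate_record_route` and `get_duplicate_record_detail_route` above each repeat an if/elif chain that copies a few business-layer codes into the HTTP status and maps everything else to 500. One possible consolidation is sketched below. It assumes the result dict carries an integer `code` key, as the two routes already rely on; `http_status_for` and its `passthrough` parameter are hypothetical names, not existing code.

```python
from typing import Any, Dict, Iterable


def http_status_for(
    result: Dict[str, Any],
    passthrough: Iterable[int] = (200, 400, 404),
) -> int:
    """Hypothetical helper: pick the HTTP status for a business-layer result dict."""
    code = result.get("code")
    # Only the listed codes pass through; anything else, or a missing code, becomes 500.
    return code if code in tuple(passthrough) else 500
```

The detail route only distinguishes 200 and 404, so it would call the helper with `passthrough=(200, 404)` to keep its current behavior.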

+ 672 - 56
app/core/data_parse/parse.py

@@ -16,25 +16,42 @@ import base64
 from openai import OpenAI
 from app.config.config import DevelopmentConfig, ProductionConfig
 
+"""
+Business card parsing module: upgrade notes
 
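-# Parse-data endpoint used for testing. Not actually used.      
-def parse_data(data: Dict[str, Any]) -> Dict[str, Any]:
-    """
-    Main function for parsing data
-    
-    Args:
-        data: the data to parse
-        
-    Returns:
-        the parsed data
-    """
-    # TODO: implement the data parsing logic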
-    return {
-        'code': 200,
-        'status': 'success',
-        'message': 'Data parsed successfully',
-        'data': data
-    } 
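+This module adds duplicate record handling, which consists of:
+
+1. New data model:
+   - DuplicateBusinessCard: stores duplicate-record handling information
+     * main_card_id: points to the newly created main record
+     * suspected_duplicates: list of suspected duplicate records, stored as JSON
+
+2. New functions:
+   - check_duplicate_business_card(): check whether a duplicate record exists
+   - update_career_path(): update the career path
+   - create_main_card_with_duplicates(): create a main record and store the suspected duplicates
+   - get_duplicate_records(): list duplicate records
+   - process_duplicate_record(): process a duplicate record
+   - get_duplicate_record_detail(): get duplicate record details
+
+3. Duplicate detection logic:
+   - Duplicates are detected by Chinese name and mobile number
+   - Same name and same mobile number: automatically update the existing record and add a career path entry
+   - Same name but a different or missing mobile number: create a new record as the main record and store the suspected duplicates as a JSON list
+
+4. Processing status:
+   - pending: awaiting processing
+   - processed: processed
+   - ignored: ignored
+
+5. Manual handling options:
+   - merge_to_suspected: merge into the selected suspected duplicate record and delete the main record
+   - keep_main: keep the main record and mark the entry as processed
+   - ignore: dismiss the duplicate alert
+
+The upgraded process_business_card() applies the duplicate check automatically.
+Advantage of the new logic: one new record may duplicate several existing records, and tracking them in a single entry is more efficient.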
+"""
 
 # Business card data model
 class BusinessCard(db.Model):
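
Review note (not part of this commit): the detection rule in item 3 of the module docstring above can be written as a small pure function, which makes the three outcomes easy to unit-test without a database. This is a sketch under stated assumptions: records are plain dicts rather than `BusinessCard` rows, and `classify_card` is a hypothetical name; the returned strings match the `action` values used by `check_duplicate_business_card()`.

```python
from typing import Any, Dict, List


def classify_card(new_card: Dict[str, Any], existing: List[Dict[str, Any]]) -> str:
    """Return 'update', 'create_with_duplicates' or 'create_new' for a parsed card."""
    # `or ""` also covers keys that are present but set to None.
    name_zh = (new_card.get("name_zh") or "").strip()
    mobile = (new_card.get("mobile") or "").strip()

    if not name_zh:
        return "create_new"  # nothing to match on

    same_name = [c for c in existing if c.get("name_zh") == name_zh]
    if not same_name:
        return "create_new"

    # Same name and same mobile number: treat as the same person.
    if mobile and any((c.get("mobile") or "").strip() == mobile for c in same_name):
        return "update"

    # Same name, but the mobile number differs or is missing: needs manual review.
    return "create_with_duplicates"
```

Keeping the rule separate from the ORM query is a design choice for testability only; the committed function combines both steps.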
@@ -96,8 +113,281 @@ class BusinessCard(db.Model):
         }
 
 
+# Data model for duplicate business card handling
+class DuplicateBusinessCard(db.Model):
+    __tablename__ = 'duplicate_business_cards'
+    
+    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
+    main_card_id = db.Column(db.Integer, db.ForeignKey('business_cards.id'), nullable=False)  # ID of the newly created main record
+    suspected_duplicates = db.Column(db.JSON, nullable=False)  # list of suspected duplicate records (JSON)
+    duplicate_reason = db.Column(db.String(200), nullable=False)  # reason the record was flagged
+    processing_status = db.Column(db.String(20), default='pending')  # processing status: pending/processed/ignored
+    created_at = db.Column(db.DateTime, default=datetime.now, nullable=False)
+    processed_at = db.Column(db.DateTime)  # time of processing
+    processed_by = db.Column(db.String(50))  # who processed it
+    processing_notes = db.Column(db.Text)  # processing notes
+    
+    # Relationship to the main record
+    main_card = db.relationship('BusinessCard', backref=db.backref('as_main_duplicate_records', lazy=True))
+    
+    def to_dict(self):
+        return {
+            'id': self.id,
+            'main_card_id': self.main_card_id,
+            'suspected_duplicates': self.suspected_duplicates,
+            'duplicate_reason': self.duplicate_reason,
+            'processing_status': self.processing_status,
+            'created_at': self.created_at.strftime('%Y-%m-%d %H:%M:%S') if self.created_at else None,
+            'processed_at': self.processed_at.strftime('%Y-%m-%d %H:%M:%S') if self.processed_at else None,
+            'processed_by': self.processed_by,
+            'processing_notes': self.processing_notes
+        }
+
+
 # Business card parsing functions
 
+def check_duplicate_business_card(extracted_data):
+    """
+    Check whether a duplicate business card record exists.
+    
+    Args:
+        extracted_data (dict): the extracted business card information
+        
+    Returns:
+        dict: the check result, in the form:
+            {
+                'is_duplicate': bool,
+                'action': str,  # 'update', 'create_with_duplicates' or 'create_new'
+                'existing_card': BusinessCard or None,
+                'suspected_duplicates': list,  # list of suspected duplicate records
+                'reason': str
+            }
+    """
+    try:
+        # Read the extracted Chinese name and mobile number (`or ''` also covers None values)
+        name_zh = (extracted_data.get('name_zh') or '').strip()
+        mobile = (extracted_data.get('mobile') or '').strip()
+        
+        if not name_zh:
+            return {
+                'is_duplicate': False,
+                'action': 'create_new',
+                'existing_card': None,
+                'suspected_duplicates': [],
+                'reason': '无中文姓名,创建新记录'
+            }
+        
+        # Find records with the same Chinese name
+        existing_cards = BusinessCard.query.filter_by(name_zh=name_zh).all()
+        
+        if not existing_cards:
+            return {
+                'is_duplicate': False,
+                'action': 'create_new',
+                'existing_card': None,
+                'suspected_duplicates': [],
+                'reason': '未找到同名记录,创建新记录'
+            }
+        
+        # Same-name records found; compare mobile numbers next
+        if mobile:
+            # The new card has a mobile number
+            for existing_card in existing_cards:
+                existing_mobile = existing_card.mobile.strip() if existing_card.mobile else ''
+                
+                if existing_mobile == mobile:
+                    # Same mobile number: update the existing record
+                    return {
+                        'is_duplicate': True,
+                        'action': 'update',
+                        'existing_card': existing_card,
+                        'suspected_duplicates': [],
+                        'reason': f'姓名和手机号码均相同:{name_zh} - {mobile}'
+                    }
+            
+            # Mobile number matches no existing record: create a new record and flag suspected duplicates
+            suspected_list = []
+            for card in existing_cards:
+                suspected_list.append({
+                    'id': card.id,
+                    'name_zh': card.name_zh,
+                    'name_en': card.name_en,
+                    'mobile': card.mobile,
+                    'hotel_zh': card.hotel_zh,
+                    'hotel_en': card.hotel_en,
+                    'title_zh': card.title_zh,
+                    'title_en': card.title_en,
+                    'created_at': card.created_at.strftime('%Y-%m-%d %H:%M:%S') if card.created_at else None
+                })
+            
+            return {
+                'is_duplicate': True,
+                'action': 'create_with_duplicates',
+                'existing_card': None,
+                'suspected_duplicates': suspected_list,
+                'reason': f'姓名相同但手机号码不同:{name_zh},新手机号:{mobile},发现{len(suspected_list)}条疑似重复记录'
+            }
+        else:
+            # No mobile number on the new card: create a new record and flag suspected duplicates
+            suspected_list = []
+            for card in existing_cards:
+                suspected_list.append({
+                    'id': card.id,
+                    'name_zh': card.name_zh,
+                    'name_en': card.name_en,
+                    'mobile': card.mobile,
+                    'hotel_zh': card.hotel_zh,
+                    'hotel_en': card.hotel_en,
+                    'title_zh': card.title_zh,
+                    'title_en': card.title_en,
+                    'created_at': card.created_at.strftime('%Y-%m-%d %H:%M:%S') if card.created_at else None
+                })
+            
+            return {
+                'is_duplicate': True,
+                'action': 'create_with_duplicates',
+                'existing_card': None,
+                'suspected_duplicates': suspected_list,
+                'reason': f'姓名相同但新记录无手机号码可比较:{name_zh},发现{len(suspected_list)}条疑似重复记录'
+            }
+            
+    except Exception as e:
+        logging.error(f"检查重复记录时发生错误: {str(e)}", exc_info=True)
+        return {
+            'is_duplicate': False,
+            'action': 'create_new',
+            'existing_card': None,
+            'suspected_duplicates': [],
+            'reason': f'检查过程出错,创建新记录: {str(e)}'
+        }
+
+
+def update_career_path(existing_card, new_data, image_path=None):
+    """
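+    Update the career path.
+    
+    Args:
+        existing_card (BusinessCard): the existing business card record
+        new_data (dict): the new business card information
+        image_path (str, optional): path of the corresponding image
+        
+    Returns:
+        list: the updated career path
+    """
+    try:
+        # Copy the existing career path so the ORM-tracked list is not mutated in place (an in-place change may go undetected on a plain JSON column)
+        career_path = list(existing_card.career_path) if existing_card.career_path else []
+        
+        # Build the new career path entry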
+        new_entry = {
+            'date': datetime.now().strftime('%Y-%m-%d'),
+            'hotel_zh': new_data.get('hotel_zh', ''),
+            'hotel_en': new_data.get('hotel_en', ''),
+            'title_zh': new_data.get('title_zh', ''),
+            'title_en': new_data.get('title_en', ''),
+            'image_path': image_path or '',  # image path for this entry
+            'source': 'business_card_update'
+        }
+        
+        # Skip the entry if an equivalent one already exists (avoids duplicates)
+        is_duplicate_entry = False
+        for entry in career_path:
+            if (entry.get('hotel_zh') == new_entry['hotel_zh'] and 
+                entry.get('title_zh') == new_entry['title_zh'] and
+                entry.get('date') == new_entry['date']):
+                is_duplicate_entry = True
+                break
+        
+        if not is_duplicate_entry:
+            career_path.append(new_entry)
+            logging.info(f"为名片ID {existing_card.id} 添加了新的职业轨迹条目,包含图片路径: {image_path}")
+        else:
+            logging.info(f"名片ID {existing_card.id} 的职业轨迹条目已存在,跳过添加")
+        
+        return career_path
+        
+    except Exception as e:
+        logging.error(f"更新职业轨迹时发生错误: {str(e)}", exc_info=True)
+        return existing_card.career_path if existing_card.career_path else []
+
+
+def create_main_card_with_duplicates(extracted_data, minio_path, suspected_duplicates, reason):
+    """
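+    Create a new main record and store the suspected duplicate records.
+    
+    Args:
+        extracted_data (dict): the extracted information of the new business card
+        minio_path (str): MinIO path of the new image
+        suspected_duplicates (list): list of suspected duplicate records
+        reason (str): reason the record was flagged as a duplicate
+        
+    Returns:
+        tuple: (main_card, duplicate_record), the main record and the duplicate-record entry
+    """
+    try:
+        # 1. Create the main record first
+        # Build the initial career path from the current card's details and image path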
+        initial_career_path = extracted_data.get('career_path', [])
+        if extracted_data.get('hotel_zh') or extracted_data.get('hotel_en') or extracted_data.get('title_zh') or extracted_data.get('title_en'):
+            initial_entry = {
+                'date': datetime.now().strftime('%Y-%m-%d'),
+                'hotel_zh': extracted_data.get('hotel_zh', ''),
+                'hotel_en': extracted_data.get('hotel_en', ''),
+                'title_zh': extracted_data.get('title_zh', ''),
+                'title_en': extracted_data.get('title_en', ''),
+                'image_path': minio_path or '',  # image path of the current business card
+                'source': 'business_card_creation'
+            }
+            initial_career_path.append(initial_entry)
+        
+        main_card = BusinessCard(
+            name_zh=extracted_data.get('name_zh', ''),
+            name_en=extracted_data.get('name_en', ''),
+            title_zh=extracted_data.get('title_zh', ''),
+            title_en=extracted_data.get('title_en', ''),
+            mobile=extracted_data.get('mobile', ''),
+            phone=extracted_data.get('phone', ''),
+            email=extracted_data.get('email', ''),
+            hotel_zh=extracted_data.get('hotel_zh', ''),
+            hotel_en=extracted_data.get('hotel_en', ''),
+            address_zh=extracted_data.get('address_zh', ''),
+            address_en=extracted_data.get('address_en', ''),
+            postal_code_zh=extracted_data.get('postal_code_zh', ''),
+            postal_code_en=extracted_data.get('postal_code_en', ''),
+            brand_zh=extracted_data.get('brand_zh', ''),
+            brand_en=extracted_data.get('brand_en', ''),
+            affiliation_zh=extracted_data.get('affiliation_zh', ''),
+            affiliation_en=extracted_data.get('affiliation_en', ''),
+            image_path=minio_path,  # latest image path
+            career_path=initial_career_path,  # career path including the image path
+            brand_group=extracted_data.get('brand_group', ''),
+            status='active',
+            updated_by='system'
+        )
+        
+        db.session.add(main_card)
+        db.session.flush()  # flush to obtain the main record's ID
+        
+        # 2. Create the duplicate-record entry
+        duplicate_record = DuplicateBusinessCard(
+            main_card_id=main_card.id,
+            suspected_duplicates=suspected_duplicates,
+            duplicate_reason=reason,
+            processing_status='pending'
+        )
+        
+        db.session.add(duplicate_record)
+        db.session.commit()
+        
+        logging.info(f"已创建主记录(ID: {main_card.id})并保存{len(suspected_duplicates)}条疑似重复记录信息(重复记录ID: {duplicate_record.id})")
+        return main_card, duplicate_record
+        
+    except Exception as e:
+        db.session.rollback()
+        logging.error(f"创建主记录和重复记录信息失败: {str(e)}", exc_info=True)
+        raise
+
+
 # DeepSeek API configuration
 DEEPSEEK_API_KEY = os.environ.get('DEEPSEEK_API_KEY', 'sk-2aea6e8b159b448aa3c1e29acd6f4349')
 DEEPSEEK_API_URL = os.environ.get('DEEPSEEK_API_URL', 'https://api.deepseek.com/v1/chat/completions')
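
Review note (not part of this commit): `update_career_path()` in the hunk above appends an entry unless one with the same hotel, title and date already exists. The sketch below shows the same rule as a pure function that never mutates its input, which avoids relying on how the ORM tracks changes to a JSON column. It is an illustration only: `append_career_entry` is a hypothetical name, and the `today` parameter is an assumption added so the date can be fixed in tests.

```python
import copy
from datetime import date
from typing import Any, Dict, List, Optional


def append_career_entry(
    career_path: Optional[List[Dict[str, Any]]],
    new_data: Dict[str, Any],
    image_path: Optional[str] = None,
    today: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return a new career path with at most one entry appended."""
    # Deep-copy so the caller's (possibly ORM-tracked) list is never mutated.
    path = copy.deepcopy(career_path) if career_path else []
    entry = {
        "date": today or date.today().strftime("%Y-%m-%d"),
        "hotel_zh": new_data.get("hotel_zh", ""),
        "hotel_en": new_data.get("hotel_en", ""),
        "title_zh": new_data.get("title_zh", ""),
        "title_en": new_data.get("title_en", ""),
        "image_path": image_path or "",
        "source": "business_card_update",
    }
    # Same hotel, title and date counts as an existing entry, as in the diff.
    exists = any(
        e.get("hotel_zh") == entry["hotel_zh"]
        and e.get("title_zh") == entry["title_zh"]
        and e.get("date") == entry["date"]
        for e in path
    )
    if not exists:
        path.append(entry)
    return path
```

Because the function returns a new list, assigning the result back to a model attribute always replaces the stored value with a different object.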
@@ -629,6 +919,20 @@ def process_business_card(image_file):
                 'data': None
             }
         
+        # Check for duplicate records
+        try:
+            duplicate_check = check_duplicate_business_card(extracted_data)
+            logging.info(f"重复记录检查结果: {duplicate_check['reason']}")
+        except Exception as e:
+            logging.error(f"重复记录检查失败: {str(e)}", exc_info=True)
+            # If the check fails, fall back to creating a new record
+            duplicate_check = {
+                'is_duplicate': False,
+                'action': 'create_new',
+                'existing_card': None,
+                'suspected_duplicates': [],
+                'reason': f'重复检查失败,创建新记录: {str(e)}'
+            }
+        
         try:
             # Generate a unique file name
             file_ext = os.path.splitext(image_file.filename)[1].lower()
@@ -663,43 +967,120 @@ def process_business_card(image_file):
             minio_path = None
         
         try:
-            # Save to the database
-            business_card = BusinessCard(
-                name_zh=extracted_data.get('name_zh', ''),
-                name_en=extracted_data.get('name_en', ''),
-                title_zh=extracted_data.get('title_zh', ''),
-                title_en=extracted_data.get('title_en', ''),
-                mobile=extracted_data.get('mobile', ''),
-                phone=extracted_data.get('phone', ''),
-                email=extracted_data.get('email', ''),
-                hotel_zh=extracted_data.get('hotel_zh', ''),
-                hotel_en=extracted_data.get('hotel_en', ''),
-                address_zh=extracted_data.get('address_zh', ''),
-                address_en=extracted_data.get('address_en', ''),
-                postal_code_zh=extracted_data.get('postal_code_zh', ''),
-                postal_code_en=extracted_data.get('postal_code_en', ''),
-                brand_zh=extracted_data.get('brand_zh', ''),
-                brand_en=extracted_data.get('brand_en', ''),
-                affiliation_zh=extracted_data.get('affiliation_zh', ''),
-                affiliation_en=extracted_data.get('affiliation_en', ''),
-                image_path=minio_path,  # store the relative path
-                career_path=extracted_data.get('career_path', []),  # add the career path
-                brand_group=extracted_data.get('brand_group', ''),  # add the brand group
-                status='active',
-                updated_by='system'
-            )
-            
-            db.session.add(business_card)
-            db.session.commit()
-            
-            logging.info(f"名片信息已保存到数据库,ID: {business_card.id}")
-            
-            return {
-                'code': 200,
-                'success': True,
-                'message': '名片解析成功',
-                'data': business_card.to_dict()
-            }
+            # Act on the duplicate check result
+            if duplicate_check['action'] == 'update':
+                # Update the existing record
+                existing_card = duplicate_check['existing_card']
+                
+                # Update the basic fields
+                existing_card.name_en = extracted_data.get('name_en', existing_card.name_en)
+                existing_card.title_zh = extracted_data.get('title_zh', existing_card.title_zh)
+                existing_card.title_en = extracted_data.get('title_en', existing_card.title_en)
+                existing_card.phone = extracted_data.get('phone', existing_card.phone)
+                existing_card.email = extracted_data.get('email', existing_card.email)
+                existing_card.hotel_zh = extracted_data.get('hotel_zh', existing_card.hotel_zh)
+                existing_card.hotel_en = extracted_data.get('hotel_en', existing_card.hotel_en)
+                existing_card.address_zh = extracted_data.get('address_zh', existing_card.address_zh)
+                existing_card.address_en = extracted_data.get('address_en', existing_card.address_en)
+                existing_card.postal_code_zh = extracted_data.get('postal_code_zh', existing_card.postal_code_zh)
+                existing_card.postal_code_en = extracted_data.get('postal_code_en', existing_card.postal_code_en)
+                existing_card.brand_zh = extracted_data.get('brand_zh', existing_card.brand_zh)
+                existing_card.brand_en = extracted_data.get('brand_en', existing_card.brand_en)
+                existing_card.affiliation_zh = extracted_data.get('affiliation_zh', existing_card.affiliation_zh)
+                existing_card.affiliation_en = extracted_data.get('affiliation_en', existing_card.affiliation_en)
+                existing_card.brand_group = extracted_data.get('brand_group', existing_card.brand_group)
+                existing_card.image_path = minio_path  # Point to the most recent image
+                existing_card.updated_by = 'system'
+                
+                # Update the career path, passing the image path along
+                existing_card.career_path = update_career_path(existing_card, extracted_data, minio_path)
+                
+                db.session.commit()
+                
+                logging.info(f"Updated existing business card record, ID: {existing_card.id}")
+                
+                return {
+                    'code': 200,
+                    'success': True,
+                    'message': f'Business card parsed successfully; existing record updated. {duplicate_check["reason"]}',
+                    'data': existing_card.to_dict()
+                }
+                
+            elif duplicate_check['action'] == 'create_with_duplicates':
+                # Create a new record as the main record and store the suspected duplicates
+                main_card, duplicate_record = create_main_card_with_duplicates(
+                    extracted_data, 
+                    minio_path, 
+                    duplicate_check['suspected_duplicates'],
+                    duplicate_check['reason']
+                )
+                
+                return {
+                    'code': 202,  # Accepted: the card is stored but still needs manual review
+                    'success': True,
+                    'message': f'New record created; suspected duplicates are pending review. {duplicate_check["reason"]}',
+                    'data': {
+                        'main_card': main_card.to_dict(),
+                        'duplicate_record_id': duplicate_record.id,
+                        'suspected_duplicates_count': len(duplicate_check['suspected_duplicates']),
+                        'processing_status': 'pending',
+                        'duplicate_reason': duplicate_record.duplicate_reason,
+                        'created_at': duplicate_record.created_at.strftime('%Y-%m-%d %H:%M:%S')
+                    }
+                }
+                
+            else:
+                # Create a new record
+                # Build the initial career path from the current card's details and image path
+                initial_career_path = extracted_data.get('career_path', [])
+                if extracted_data.get('hotel_zh') or extracted_data.get('hotel_en') or extracted_data.get('title_zh') or extracted_data.get('title_en'):
+                    initial_entry = {
+                        'date': datetime.now().strftime('%Y-%m-%d'),
+                        'hotel_zh': extracted_data.get('hotel_zh', ''),
+                        'hotel_en': extracted_data.get('hotel_en', ''),
+                        'title_zh': extracted_data.get('title_zh', ''),
+                        'title_en': extracted_data.get('title_en', ''),
+                        'image_path': minio_path or '',  # Image path of the current card
+                        'source': 'business_card_creation'
+                    }
+                    initial_career_path.append(initial_entry)
+                
+                business_card = BusinessCard(
+                    name_zh=extracted_data.get('name_zh', ''),
+                    name_en=extracted_data.get('name_en', ''),
+                    title_zh=extracted_data.get('title_zh', ''),
+                    title_en=extracted_data.get('title_en', ''),
+                    mobile=extracted_data.get('mobile', ''),
+                    phone=extracted_data.get('phone', ''),
+                    email=extracted_data.get('email', ''),
+                    hotel_zh=extracted_data.get('hotel_zh', ''),
+                    hotel_en=extracted_data.get('hotel_en', ''),
+                    address_zh=extracted_data.get('address_zh', ''),
+                    address_en=extracted_data.get('address_en', ''),
+                    postal_code_zh=extracted_data.get('postal_code_zh', ''),
+                    postal_code_en=extracted_data.get('postal_code_en', ''),
+                    brand_zh=extracted_data.get('brand_zh', ''),
+                    brand_en=extracted_data.get('brand_en', ''),
+                    affiliation_zh=extracted_data.get('affiliation_zh', ''),
+                    affiliation_en=extracted_data.get('affiliation_en', ''),
+                    image_path=minio_path,  # Most recent image path
+                    career_path=initial_career_path,  # Career path including the image path
+                    brand_group=extracted_data.get('brand_group', ''),  # Brand group
+                    status='active',
+                    updated_by='system'
+                )
+                
+                db.session.add(business_card)
+                db.session.commit()
+                
+                logging.info(f"Business card saved to database, ID: {business_card.id}")
+                
+                return {
+                    'code': 200,
+                    'success': True,
+                    'message': f'Business card parsed successfully. {duplicate_check["reason"]}',
+                    'data': business_card.to_dict()
+                }
         except Exception as e:
             db.session.rollback()
             error_msg = f"Failed to save business card to database: {str(e)}"
@@ -730,7 +1111,7 @@ def process_business_card(image_file):
                     'affiliation_zh': extracted_data.get('affiliation_zh', ''),
                     'affiliation_en': extracted_data.get('affiliation_en', ''),
                     'image_path': minio_path,  # Return the relative path
-                    'career_path': extracted_data.get('career_path', []),  # Career path
+                    'career_path': initial_career_path,  # Career path including the image path
                     'brand_group': extracted_data.get('brand_group', ''),  # Brand group
                     'created_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                     'updated_at': None,
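
The initial career-path entry added in the create-new branch above can be sketched as a small pure function. This is a minimal illustration, not the commit's code: `build_initial_career_path` and its `today` parameter are hypothetical names, and the sketch copies the parsed list rather than appending to `extracted_data['career_path']` in place. Computing the value once before the `if`/`elif`/`else` would also keep `initial_career_path` bound wherever it is read later, which matters if the dict in this hunk is built on a path that skipped the create-new branch.

```python
from datetime import date


def build_initial_career_path(extracted_data, image_path, today=None):
    """Return a new career-path list with an entry for the current card.

    Hypothetical helper (an assumption, not part of the commit).
    """
    # Copy so the caller's parsed data is not mutated
    career_path = list(extracted_data.get('career_path') or [])
    fields = ('hotel_zh', 'hotel_en', 'title_zh', 'title_en')
    # Only add an entry when the card carries hotel or title information
    if any(extracted_data.get(f) for f in fields):
        entry = {'date': (today or date.today()).strftime('%Y-%m-%d')}
        entry.update({f: extracted_data.get(f, '') for f in fields})
        entry['image_path'] = image_path or ''
        entry['source'] = 'business_card_creation'
        career_path.append(entry)
    return career_path
```

Passing `today` explicitly keeps the function deterministic in tests; production code would normally leave it unset.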
@@ -2569,6 +2950,241 @@ def delete_hotel_group_brands(brand_id):
         error_msg = f"Failed to delete brand record: {str(e)}"
         logging.error(error_msg, exc_info=True)
         
+        return {
+            'code': 500,
+            'success': False,
+            'message': error_msg,
+            'data': None
+        }
+
+def get_duplicate_records(status=None):
+    """
+    Get the list of duplicate records.
+    
+    Args:
+        status (str, optional): Only return records with this status ('pending', 'processed', 'ignored')
+        
+    Returns:
+        dict: The operation result and the list of duplicate records
+    """
+    try:
+        # Build the query
+        query = DuplicateBusinessCard.query
+        if status:
+            query = query.filter_by(processing_status=status)
+        
+        # Sort by creation time, newest first
+        duplicate_records = query.order_by(DuplicateBusinessCard.created_at.desc()).all()
+        
+        # Collect the details, including the main record
+        records_data = []
+        for record in duplicate_records:
+            record_dict = record.to_dict()
+            # Attach the main record
+            if record.main_card:
+                record_dict['main_card'] = record.main_card.to_dict()
+            records_data.append(record_dict)
+        
+        return {
+            'code': 200,
+            'success': True,
+            'message': 'Duplicate records retrieved successfully',
+            'data': records_data,
+            'count': len(records_data)
+        }
+    
+    except Exception as e:
+        error_msg = f"Failed to retrieve duplicate records: {str(e)}"
+        logging.error(error_msg, exc_info=True)
+        
+        return {
+            'code': 500,
+            'success': False,
+            'message': error_msg,
+            'data': [],
+            'count': 0
+        }
+
+
+def process_duplicate_record(duplicate_id, action, selected_duplicate_id=None, processed_by=None, notes=None):
+    """
+    Process a duplicate record.
+    
+    Args:
+        duplicate_id (int): ID of the duplicate record
+        action (str): Action to perform ('merge_to_suspected', 'keep_main', 'ignore')
+        selected_duplicate_id (int, optional): ID of the chosen suspected duplicate; required when action is 'merge_to_suspected'
+        processed_by (str, optional): Who processed the record
+        notes (str, optional): Processing notes
+        
+    Returns:
+        dict: The operation result
+    """
+    try:
+        # Look up the duplicate record
+        duplicate_record = DuplicateBusinessCard.query.get(duplicate_id)
+        if not duplicate_record:
+            return {
+                'code': 404,
+                'success': False,
+                'message': f'Duplicate record with ID {duplicate_id} not found',
+                'data': None
+            }
+        
+        if duplicate_record.processing_status != 'pending':
+            return {
+                'code': 400,
+                'success': False,
+                'message': f'Duplicate record has status {duplicate_record.processing_status} and cannot be processed',
+                'data': None
+            }
+        
+        main_card = duplicate_record.main_card
+        if not main_card:
+            return {
+                'code': 404,
+                'success': False,
+                'message': 'Main record not found',
+                'data': None
+            }
+        
+        result_data = None
+        
+        if action == 'merge_to_suspected':
+            # Merge into the selected suspected duplicate
+            if not selected_duplicate_id:
+                return {
+                    'code': 400,
+                    'success': False,
+                    'message': 'selected_duplicate_id is required for the merge action',
+                    'data': None
+                }
+            
+            # Look up the selected suspected duplicate
+            target_card = BusinessCard.query.get(selected_duplicate_id)
+            if not target_card:
+                return {
+                    'code': 404,
+                    'success': False,
+                    'message': f'Target record with ID {selected_duplicate_id} not found',
+                    'data': None
+                }
+            
+            # Merge the main record into the target: non-empty main-record values win, then update the career path
+            target_card.name_en = main_card.name_en or target_card.name_en
+            target_card.title_zh = main_card.title_zh or target_card.title_zh
+            target_card.title_en = main_card.title_en or target_card.title_en
+            target_card.mobile = main_card.mobile or target_card.mobile
+            target_card.phone = main_card.phone or target_card.phone
+            target_card.email = main_card.email or target_card.email
+            target_card.hotel_zh = main_card.hotel_zh or target_card.hotel_zh
+            target_card.hotel_en = main_card.hotel_en or target_card.hotel_en
+            target_card.address_zh = main_card.address_zh or target_card.address_zh
+            target_card.address_en = main_card.address_en or target_card.address_en
+            target_card.postal_code_zh = main_card.postal_code_zh or target_card.postal_code_zh
+            target_card.postal_code_en = main_card.postal_code_en or target_card.postal_code_en
+            target_card.brand_zh = main_card.brand_zh or target_card.brand_zh
+            target_card.brand_en = main_card.brand_en or target_card.brand_en
+            target_card.affiliation_zh = main_card.affiliation_zh or target_card.affiliation_zh
+            target_card.affiliation_en = main_card.affiliation_en or target_card.affiliation_en
+            target_card.brand_group = main_card.brand_group or target_card.brand_group
+            target_card.image_path = main_card.image_path  # Use the latest MinIO image path
+            target_card.updated_by = processed_by or 'system'
+            
+            # Update the career path using the main record's image path
+            new_data = {
+                'hotel_zh': main_card.hotel_zh,
+                'hotel_en': main_card.hotel_en,
+                'title_zh': main_card.title_zh,
+                'title_en': main_card.title_en
+            }
+            target_card.career_path = update_career_path(target_card, new_data, main_card.image_path)
+            
+            # Delete the main record
+            db.session.delete(main_card)
+            
+            result_data = target_card.to_dict()
+            
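+        elif action in ('keep_main', 'ignore'):
+            # Keep the main record unchanged; nothing is merged
+            result_data = main_card.to_dict()
+            
+        else:
+            # Reject unknown actions instead of silently marking the record as handled
+            return {'code': 400, 'success': False, 'message': f'Unsupported action: {action}', 'data': None}
+        
+        # Update the duplicate record's status; 'ignore' maps to the documented 'ignored' status
+        duplicate_record.processing_status = 'ignored' if action == 'ignore' else 'processed'
+        duplicate_record.processed_at = datetime.now()
+        duplicate_record.processed_by = processed_by or 'system'
+        duplicate_record.processing_notes = notes or f'Action performed: {action}'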
+        
+        db.session.commit()
+        
+        logging.info(f"Duplicate record processed, ID: {duplicate_id}, action: {action}")
+        
+        return {
+            'code': 200,
+            'success': True,
+            'message': f'Duplicate record processed successfully, action: {action}',
+            'data': {
+                'duplicate_record': duplicate_record.to_dict(),
+                'result': result_data
+            }
+        }
+    
+    except Exception as e:
+        db.session.rollback()
+        error_msg = f"Failed to process duplicate record: {str(e)}"
+        logging.error(error_msg, exc_info=True)
+        
+        return {
+            'code': 500,
+            'success': False,
+            'message': error_msg,
+            'data': None
+        }
+
+
+def get_duplicate_record_detail(duplicate_id):
+    """
+    Get the details of a single duplicate record.
+    
+    Args:
+        duplicate_id (int): ID of the duplicate record
+        
+    Returns:
+        dict: The duplicate record's details
+    """
+    try:
+        # Look up the duplicate record
+        duplicate_record = DuplicateBusinessCard.query.get(duplicate_id)
+        if not duplicate_record:
+            return {
+                'code': 404,
+                'success': False,
+                'message': f'Duplicate record with ID {duplicate_id} not found',
+                'data': None
+            }
+        
+        # Build the detail payload
+        record_dict = duplicate_record.to_dict()
+        
+        # Attach the main record
+        if duplicate_record.main_card:
+            record_dict['main_card'] = duplicate_record.main_card.to_dict()
+        
+        return {
+            'code': 200,
+            'success': True,
+            'message': 'Duplicate record details retrieved successfully',
+            'data': record_dict
+        }
+    
+    except Exception as e:
+        error_msg = f"Failed to retrieve duplicate record details: {str(e)}"
+        logging.error(error_msg, exc_info=True)
+        
         return {
             'code': 500,
             'success': False,