# DataOps Platform - Business Rules & Validation Standards

## Overview

This document defines the core business rules, validation standards, and processing workflows for the DataOps platform. These rules ensure data integrity, consistent API behavior, and reliable business logic execution.

## 1. Data Validation Rules

### 1.1 Talent Data Validation (Business Cards)

**Rule ID**: `TALENT_VALIDATION_001`

#### Required Fields

- `name_zh` (Chinese name) - MANDATORY
  - Must be a non-empty string
  - Maximum length: 100 characters

#### Recommended Fields

- `mobile` - Mobile phone number
- `title_zh` - Chinese job title
- `hotel_zh` - Chinese hotel name

#### Format Validation Rules

```python
# Mobile phone validation
# - Remove all non-digit characters for validation
# - Must match pattern: ^1[3-9]\d{9}$ (Chinese mobile format)
# - Invalid format generates WARNING, not ERROR

# Email validation
# - Must match pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
# - Invalid format generates ERROR

# Array fields validation
# - affiliation: Must be array type if present
# - career_path: Must be array type if present
```

### 1.2 Parse Task Validation

**Rule ID**: `PARSE_TASK_001`

#### Task Type Validation

```python
# Business card, resume, new appointment, recruitment, miscellaneous
ALLOWED_TASK_TYPES = ['名片', '简历', '新任命', '招聘', '杂项']
```

#### File Upload Rules

- **招聘 (Recruitment)** tasks: NO files required, data parameter mandatory
- **Other task types**: Files array mandatory and non-empty
- File format validation based on task type
- Maximum file size and allowed extensions per `BaseConfig.ALLOWED_EXTENSIONS`

#### Parameter Requirements

- `task_type`: Required, must be from `ALLOWED_TASK_TYPES`
- `created_by`: Optional, defaults to 'system'
- `files`: Required for non-recruitment tasks
- `data`: Required for recruitment tasks
- `publish_time`: Required for 新任命 (appointment) tasks

## 2. API Response Standards

### 2.1 Standard Response Format

**Rule ID**: `API_RESPONSE_001`

All API responses MUST follow this structure:

```json
{
  "success": boolean,
  "message": string,
  "data": any,
  "code": number (optional)
}
```

#### Success Response Example

```json
{
  "success": true,
  "message": "操作成功",
  "data": { ... }
}
```

#### Error Response Example

```json
{
  "success": false,
  "message": "详细错误描述",
  "data": null,
  "code": 400
}
```

### 2.2 HTTP Status Code Rules

**Rule ID**: `API_STATUS_001`

- `200`: Successful operation
- `400`: Bad request (validation errors, missing parameters)
- `404`: Resource not found
- `500`: Internal server error

### 2.3 Content-Type Headers

**Rule ID**: `API_HEADERS_001`

- All API responses: `application/json; charset=utf-8`
- File downloads: Preserve original content-type
- CORS headers automatically configured

## 3. Database Rules

### 3.1 Data Integrity Rules

**Rule ID**: `DB_INTEGRITY_001`

#### Duplicate Detection

- Business cards: Check for duplicates based on `name_zh` + `mobile` combination
- Create DuplicateBusinessCard record when duplicates detected
- Status tracking: 'pending' → 'processed' → 'ignored'

#### Timestamp Management

```python
# Use East Asia timezone for all timestamps
created_at = get_east_asia_time_naive()
```

#### Required Relationships

- BusinessCard ↔ ParsedTalent (one-to-many)
- DuplicateBusinessCard → BusinessCard (foreign key)

### 3.2 Data Model Rules

**Rule ID**: `DB_MODEL_001`

#### Field Constraints

```python
# String fields
#   name_zh: max_length=100, nullable=False
#   email: max_length=100, nullable=True
#   mobile: max_length=100, nullable=True

# JSON fields
#   career_path: JSON format for structured career data
#   origin_source: JSON format for source tracking
```

## 4. File Processing Rules

### 4.1 File Upload Rules

**Rule ID**: `FILE_UPLOAD_001`

#### Allowed Extensions

```python
ALLOWED_EXTENSIONS = {
    'txt', 'pdf', 'png', 'jpg', 'jpeg', 'gif',
    'xlsx', 'xls', 'csv', 'sql', 'dll'
}
```

#### Storage Rules

- Development: Local filesystem (`C:\tmp\upload`, `C:\tmp\archive`)
- Production: MinIO object storage
- File path tracking in database

#### Processing Workflow

1. Validate file extension
2. Upload to storage (MinIO/filesystem)
3. Create database record
4. Process file content (OCR, parsing)
5. Extract structured data
6. Validate extracted data
7. Store in appropriate tables

## 5. Business Logic Rules

### 5.1 Talent Processing Workflow

**Rule ID**: `BUSINESS_LOGIC_001`

#### Neo4j Graph Processing

1. Create or get talent node
2. Process career path relationships
3. Create WORK_AS, BELONGS_TO, WORK_FOR relationships
4. Maximum traversal depth: 10 levels
5. Duplicate node prevention

#### Data Enrichment

- Automatic brand group mapping
- Hotel position standardization
- Career path timeline construction

### 5.2 Query Processing Rules

**Rule ID**: `BUSINESS_LOGIC_002`

#### Graph Query Optimization

```python
# Use recursive traversal for label-based queries
# Pattern: (start_node)-[*1..10]->(end_node)
# Stop conditions: No outgoing relationships OR Talent node reached
```

## 6. Security Rules

### 6.1 Input Validation

**Rule ID**: `SECURITY_001`

#### Sanitization Requirements

- All user inputs MUST be validated
- SQL injection prevention through SQLAlchemy ORM
- XSS prevention through proper encoding
- File upload validation (extension, size, content-type)

#### Authentication & Authorization

- Environment variables for sensitive data
- API key validation for external services
- CORS configuration for cross-origin requests

### 6.2 Error Handling

**Rule ID**: `SECURITY_002`

#### Information Disclosure Prevention

- Generic error messages for production
- Detailed logging for debugging
- No sensitive data in error responses
- Stack traces only in development mode

## 7. Configuration Rules

### 7.1 Environment-Specific Rules

**Rule ID**: `CONFIG_001`

#### Development Environment

- Debug mode: ON
- Detailed logging: ON
- Local database connections
- Console logging: ON

#### Production Environment

- Debug mode: OFF
- Info-level logging only
- Remote database connections
- File logging only
- Security headers enforced

### 7.2 Service Integration Rules

**Rule ID**: `CONFIG_002`

#### External Service Configuration

```python
# LLM Services (Qwen API)
# - API key from environment variables
# - Fallback to default for development
# - Rate limiting and retry logic

# Database Services
# - Connection pooling enabled
# - Health check (pool_pre_ping: True)
# - Connection recycling (300 seconds)
```

## 8. Logging & Monitoring Rules

### 8.1 Logging Standards

**Rule ID**: `LOGGING_001`

#### Log Format

```python
LOG_FORMAT = '%(asctime)s - %(levelname)s - %(filename)s - %(funcName)s - %(lineno)s - %(message)s'
```

#### Log Levels

- **DEBUG**: Development detailed information
- **INFO**: General operational information
- **WARNING**: Validation warnings, non-critical issues
- **ERROR**: Error conditions, exceptions
- **CRITICAL**: System failures

#### Log Rotation

- Development: Console + file logging
- Production: File logging only
- UTF-8 encoding for Chinese character support

## 9. Performance Rules

### 9.1 Database Performance

**Rule ID**: `PERFORMANCE_001`

#### Query Optimization

- Use proper indexing for frequently queried fields
- Batch processing for large datasets (batch_size: 1000)
- Connection pooling (pool_size: 10, max_overflow: 20)

#### Caching Strategy

- Session-based caching for Neo4j queries
- File processing result caching
- API response caching for static data

## 10. Compliance & Audit Rules

### 10.1 Data Tracking

**Rule ID**: `AUDIT_001`

#### Change Tracking

- All data modifications logged with timestamp
- User attribution for all operations
- Source tracking in `origin_source` field

#### Data Retention

- Archive processed files
- Maintain processing history
- Duplicate detection records retention

---

## Rule Enforcement

### Implementation Guidelines

1. **Validation**: Implement validation functions following the patterns in `parse_menduner.py`
2. **Error Handling**: Use standardized error response format
3. **Testing**: Create unit tests for each business rule
4. **Documentation**: Update API documentation when rules change

### Rule Violation Handling

- **Critical violations**: Return HTTP 400/500 with detailed error message
- **Warning violations**: Log warning, continue processing
- **Data quality issues**: Create audit records for manual review

### Review Process

- Monthly review of business rules effectiveness
- Update rules based on operational feedback
- Version control for rule changes
- Impact assessment for rule modifications
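
### Example: Talent Validation Sketch

To illustrate the validation guideline, the format rules of `TALENT_VALIDATION_001` could be expressed roughly as follows. This is a minimal sketch, not the code in `parse_menduner.py`: the function name `validate_talent` and the message strings are hypothetical, and treating a non-array `affiliation` or `career_path` as an ERROR is an assumption, since the rule does not state a severity for that case.

```python
import re

# Patterns copied from the Format Validation Rules of TALENT_VALIDATION_001.
MOBILE_PATTERN = re.compile(r"^1[3-9]\d{9}$")
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def validate_talent(record):
    """Return (errors, warnings) for one talent record (hypothetical helper)."""
    errors, warnings = [], []

    # name_zh is mandatory: a non-empty string of at most 100 characters.
    name = record.get("name_zh")
    if not isinstance(name, str) or not name.strip():
        errors.append("name_zh is required")
    elif len(name) > 100:
        errors.append("name_zh exceeds 100 characters")

    # Mobile: strip non-digit characters first; a bad format is only a warning.
    mobile = record.get("mobile")
    if mobile:
        digits = re.sub(r"\D", "", str(mobile))
        if not MOBILE_PATTERN.fullmatch(digits):
            warnings.append("mobile has an invalid format")

    # Email: a bad format is an error.
    email = record.get("email")
    if email and not EMAIL_PATTERN.fullmatch(str(email)):
        errors.append("email has an invalid format")

    # Array fields must be lists when present.
    # Assumption: a wrong type is reported as an error.
    for field in ("affiliation", "career_path"):
        value = record.get(field)
        if value is not None and not isinstance(value, list):
            errors.append(f"{field} must be an array")

    return errors, warnings
```

Returning errors and warnings separately lets a caller reject the record on any error while still logging warnings and continuing, in line with the rule violation handling described above.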
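
### Example: Parse Task Parameter Check

The parameter requirements of `PARSE_TASK_001` can be sketched as a single function. The name `validate_parse_task`, the dictionary-based input, and the error strings are assumptions made for illustration; file size and per-type format checks are left out because the rule only refers to `BaseConfig` for those.

```python
ALLOWED_TASK_TYPES = ['名片', '简历', '新任命', '招聘', '杂项']


def validate_parse_task(params):
    """Return (normalized_params, errors) for a parse task request.

    Hypothetical helper: `params` is a plain dict of request parameters.
    """
    errors = []
    task_type = params.get("task_type")

    if task_type not in ALLOWED_TASK_TYPES:
        errors.append("task_type must be one of ALLOWED_TASK_TYPES")
    elif task_type == "招聘":
        # Recruitment tasks take no files; the data parameter is mandatory.
        if not params.get("data"):
            errors.append("data is required for 招聘 tasks")
    else:
        # Every other task type needs a non-empty files array.
        files = params.get("files")
        if not isinstance(files, list) or not files:
            errors.append("files must be a non-empty array")
        # Appointment tasks additionally need a publish time.
        if task_type == "新任命" and not params.get("publish_time"):
            errors.append("publish_time is required for 新任命 tasks")

    # created_by is optional and defaults to 'system'.
    normalized = dict(params, created_by=params.get("created_by") or "system")
    return normalized, errors
```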
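
### Example: Standard Response Helper

For the error handling guideline, a framework-neutral helper can produce the envelope from `API_RESPONSE_001` together with the status code and Content-Type header from `API_STATUS_001` and `API_HEADERS_001`. The function name `api_response` and its return shape (body, status, headers) are hypothetical; the platform's actual web framework may offer its own response objects.

```python
import json


def api_response(success, message, data=None, code=None, status=200):
    """Build (json_body, http_status, headers) in the standard format.

    Hypothetical helper, shown without any web framework dependency.
    """
    body = {"success": bool(success), "message": message, "data": data}
    # "code" is optional and only included when the caller provides one.
    if code is not None:
        body["code"] = code
    headers = {"Content-Type": "application/json; charset=utf-8"}
    # ensure_ascii=False keeps Chinese messages readable in the payload.
    return json.dumps(body, ensure_ascii=False), status, headers
```

A validation failure would then be returned as `api_response(False, "详细错误描述", code=400, status=400)`.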
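
### Example: File Extension Check

The first step of the processing workflow in `FILE_UPLOAD_001` (validate file extension) could look like the sketch below. The helper name `allowed_file` is an assumption, and the check covers the extension only; the size and content-type checks required by `SECURITY_001` would need to be added separately.

```python
ALLOWED_EXTENSIONS = {
    'txt', 'pdf', 'png', 'jpg', 'jpeg', 'gif',
    'xlsx', 'xls', 'csv', 'sql', 'dll'
}


def allowed_file(filename):
    """Return True when the filename ends in an allowed extension.

    Hypothetical helper: only the last extension is considered, and the
    comparison is case-insensitive.
    """
    if "." not in filename:
        return False
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS
```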
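
### Example: Batch Iteration

The batch processing rule in `PERFORMANCE_001` (batch_size: 1000) can be sketched with a small generator. The name `iter_batches` is hypothetical, and how each batch is written to the database is deliberately left open.

```python
from itertools import islice


def iter_batches(items, batch_size=1000):
    """Yield successive lists of at most batch_size items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Because the generator pulls from an iterator, it works for lists, query result cursors, and other iterables without loading the whole dataset at once.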