DataOps Platform - Business Rules & Validation Standards

Overview

This document defines the core business rules, validation standards, and processing workflows for the DataOps platform. These rules ensure data integrity, consistent API behavior, and reliable business logic execution.

1. Data Validation Rules

1.1 Talent Data Validation (Business Cards)

Rule ID: TALENT_VALIDATION_001

Required Fields

  • name_zh (Chinese name) - MANDATORY; must be a non-empty string of at most 100 characters

Recommended Fields

  • mobile - Mobile phone number
  • title_zh - Chinese job title
  • hotel_zh - Chinese hotel name

Format Validation Rules

# Mobile phone validation
- Remove all non-digit characters for validation
- Must match pattern: ^1[3-9]\d{9}$ (Chinese mobile format)
- Invalid format generates WARNING, not ERROR

# Email validation  
- Must match pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
- Invalid format generates ERROR

# Array fields validation
- affiliation: Must be array type if present
- career_path: Must be array type if present
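
The rules in this subsection can be sketched as one validation function. This is a minimal illustration, not the platform's actual implementation: the name validate_talent, the (errors, warnings) return shape, treating whitespace-only names as empty, and treating a non-array affiliation or career_path as an ERROR are all assumptions.

```python
import re

# Patterns copied from the format validation rules above.
MOBILE_RE = re.compile(r"^1[3-9]\d{9}$")
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def validate_talent(record):
    """Return (errors, warnings) for one talent record (hypothetical helper)."""
    errors, warnings = [], []

    # name_zh is mandatory: non-empty string, at most 100 characters.
    # Assumption: a whitespace-only name counts as empty.
    name = record.get("name_zh")
    if not isinstance(name, str) or not name.strip():
        errors.append("name_zh is required")
    elif len(name) > 100:
        errors.append("name_zh exceeds 100 characters")

    # Mobile: strip non-digit characters first; a bad format is only a WARNING.
    mobile = record.get("mobile")
    if mobile:
        digits = re.sub(r"\D", "", str(mobile))
        if not MOBILE_RE.match(digits):
            warnings.append("mobile does not match the Chinese mobile format")

    # Email: a bad format is an ERROR.
    email = record.get("email")
    if email and not EMAIL_RE.match(str(email)):
        errors.append("email format is invalid")

    # Array fields must be lists when present (None is treated as absent).
    for field in ("affiliation", "career_path"):
        value = record.get(field)
        if value is not None and not isinstance(value, list):
            errors.append(f"{field} must be an array")

    return errors, warnings
```

A record with a formatted number such as "138-0013-8000" passes, because the separators are removed before the pattern is applied.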

1.2 Parse Task Validation

Rule ID: PARSE_TASK_001

Task Type Validation

# Business card, résumé, new appointment, recruitment, miscellaneous
ALLOWED_TASK_TYPES = ['名片', '简历', '新任命', '招聘', '杂项']

File Upload Rules

  • 招聘 (Recruitment) tasks: NO files required, data parameter mandatory
  • Other task types: Files array mandatory and non-empty
  • File format validation based on task type
  • Maximum file size and allowed extensions per BaseConfig.ALLOWED_EXTENSIONS

Parameter Requirements

  • task_type: Required, must be from ALLOWED_TASK_TYPES
  • created_by: Optional, defaults to 'system'
  • files: Required for non-recruitment tasks
  • data: Required for recruitment tasks
  • publish_time: Required for 新任命 (appointment) tasks
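
As a rough sketch of how these parameter requirements combine, the function below checks one request. The function name, its signature, and the (params, errors) return value are hypothetical; only the rules themselves come from this section.

```python
ALLOWED_TASK_TYPES = ['名片', '简历', '新任命', '招聘', '杂项']


def validate_parse_task(task_type, files=None, data=None,
                        publish_time=None, created_by=None):
    """Return (params, errors) for a parse task request (hypothetical helper)."""
    errors = []

    if task_type not in ALLOWED_TASK_TYPES:
        errors.append("task_type must be one of ALLOWED_TASK_TYPES")
    elif task_type == '招聘':
        # Recruitment tasks carry no files; the data parameter is mandatory.
        if not data:
            errors.append("data is required for recruitment tasks")
    else:
        # Every other task type needs a non-empty files array.
        if not isinstance(files, (list, tuple)) or not files:
            errors.append("files must be a non-empty array")
        # Appointment tasks additionally need publish_time.
        if task_type == '新任命' and not publish_time:
            errors.append("publish_time is required for appointment tasks")

    params = {
        "task_type": task_type,
        "created_by": created_by or "system",  # documented default
        "files": files,
        "data": data,
        "publish_time": publish_time,
    }
    return params, errors
```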

2. API Response Standards

2.1 Standard Response Format

Rule ID: API_RESPONSE_001

All API responses MUST follow this structure:

{
    "success": boolean,
    "message": string,
    "data": any,
    "code": number (optional)
}

Success Response Example

{
    "success": true,
    "message": "操作成功",
    "data": { ... }
}

Error Response Example

{
    "success": false,
    "message": "详细错误描述",
    "data": null,
    "code": 400
}
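
One way to keep every endpoint on this structure is to build responses through a small helper. The sketch below is illustrative only; the function names are assumptions and are not taken from the platform's code.

```python
def make_response(success, message, data=None, code=None):
    """Build a response body in the standard format (hypothetical helper)."""
    body = {"success": success, "message": message, "data": data}
    # "code" is optional, so it is only included when supplied.
    if code is not None:
        body["code"] = code
    return body


def success_response(data, message="操作成功"):
    return make_response(True, message, data)


def error_response(message, code=400):
    return make_response(False, message, None, code)
```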

2.2 HTTP Status Code Rules

Rule ID: API_STATUS_001

  • 200: Successful operation
  • 400: Bad request (validation errors, missing parameters)
  • 404: Resource not found
  • 500: Internal server error

2.3 Content-Type Headers

Rule ID: API_HEADERS_001

  • All API responses: application/json; charset=utf-8
  • File downloads: Preserve original content-type
  • CORS headers automatically configured

3. Database Rules

3.1 Data Integrity Rules

Rule ID: DB_INTEGRITY_001

Duplicate Detection

  • Business cards: Check for duplicates based on name_zh + mobile combination
  • Create DuplicateBusinessCard record when duplicates detected
  • Status tracking: 'pending' → 'processed' → 'ignored'
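
The name_zh + mobile check can be illustrated with a small in-memory sketch. It assumes the mobile number is normalized by stripping non-digit characters (as in rule 1.1) and that a missing mobile compares as an empty string; the function names are hypothetical, and a real implementation would query the database instead of scanning a list.

```python
import re


def duplicate_key(card):
    """Build the (name_zh, mobile) key used for comparison (hypothetical)."""
    name = (card.get("name_zh") or "").strip()
    # Assumption: compare mobiles on digits only, so "138-0013-8000"
    # and "13800138000" are treated as the same number.
    mobile = re.sub(r"\D", "", str(card.get("mobile") or ""))
    return (name, mobile)


def find_duplicates(new_card, existing_cards):
    """Return the existing cards that share the new card's key."""
    key = duplicate_key(new_card)
    return [card for card in existing_cards if duplicate_key(card) == key]
```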

Timestamp Management

# Use East Asia timezone for all timestamps
created_at = get_east_asia_time_naive()
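
The body of get_east_asia_time_naive() is not shown in this document. A plausible sketch, assuming "East Asia timezone" means a fixed UTC+8 offset and that "naive" means the tzinfo is dropped before storage, could look like this:

```python
from datetime import datetime, timedelta, timezone

# Assumption: East Asia time is UTC+8 with no daylight saving.
EAST_ASIA_TZ = timezone(timedelta(hours=8))


def get_east_asia_time_naive():
    """Return the current UTC+8 wall-clock time without tzinfo."""
    return datetime.now(EAST_ASIA_TZ).replace(tzinfo=None)
```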

Required Relationships

  • BusinessCard ↔ ParsedTalent (one-to-many)
  • DuplicateBusinessCard → BusinessCard (foreign key)

3.2 Data Model Rules

Rule ID: DB_MODEL_001

Field Constraints

# String fields
name_zh: max_length=100, nullable=False
email: max_length=100, nullable=True
mobile: max_length=100, nullable=True

# JSON fields
career_path: JSON format for structured career data
origin_source: JSON format for source tracking

4. File Processing Rules

4.1 File Upload Rules

Rule ID: FILE_UPLOAD_001

Allowed Extensions

ALLOWED_EXTENSIONS = {
    'txt', 'pdf', 'png', 'jpg', 'jpeg', 'gif', 
    'xlsx', 'xls', 'csv', 'sql', 'dll'
}
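
A common way to apply such a set is a case-insensitive check on the text after the last dot. The helper name allowed_file is an assumption, not something this document confirms.

```python
ALLOWED_EXTENSIONS = {
    'txt', 'pdf', 'png', 'jpg', 'jpeg', 'gif',
    'xlsx', 'xls', 'csv', 'sql', 'dll'
}


def allowed_file(filename):
    """Return True if the filename has an allowed extension (hypothetical)."""
    if "." not in filename:
        return False
    # Compare the part after the last dot, ignoring case.
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS
```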

Storage Rules

  • Development: Local filesystem (C:\tmp\upload, C:\tmp\archive)
  • Production: MinIO object storage
  • File path tracking in database

Processing Workflow

  1. Validate file extension
  2. Upload to storage (MinIO/filesystem)
  3. Create database record
  4. Process file content (OCR, parsing)
  5. Extract structured data
  6. Validate extracted data
  7. Store in appropriate tables

5. Business Logic Rules

5.1 Talent Processing Workflow

Rule ID: BUSINESS_LOGIC_001

Neo4j Graph Processing

  1. Create or get talent node
  2. Process career path relationships
  3. Create WORK_AS, BELONGS_TO, WORK_FOR relationships
  4. Maximum traversal depth: 10 levels
  5. Duplicate node prevention

Data Enrichment

  • Automatic brand group mapping
  • Hotel position standardization
  • Career path timeline construction

5.2 Query Processing Rules

Rule ID: BUSINESS_LOGIC_002

Graph Query Optimization

# Use recursive traversal for label-based queries
# Pattern: (start_node)-[*1..10]->(end_node)
# Stop conditions: No outgoing relationships OR Talent node reached
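
To make the depth limit and stop conditions concrete, here is a pure-Python sketch over an in-memory adjacency map. It does not use the Neo4j driver; the graph and labels dictionaries are stand-ins for the real database, and the function name is hypothetical.

```python
from collections import deque

MAX_DEPTH = 10  # matches the [*1..10] bound in the pattern above


def traverse(graph, labels, start):
    """Collect nodes where traversal stops.

    graph:  dict mapping a node to its list of outgoing neighbors
    labels: dict mapping a node to its label, e.g. "Talent"
    """
    results = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        neighbors = graph.get(node, [])
        is_talent = labels.get(node) == "Talent"
        # Stop at a Talent node or a node with no outgoing relationships.
        if node != start and (is_talent or not neighbors):
            results.append(node)
            continue
        # Do not expand beyond the maximum traversal depth.
        if depth >= MAX_DEPTH:
            continue
        for neighbor in neighbors:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return results
```

The visited set keeps cycles in the graph from causing repeated expansion of the same node.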

6. Security Rules

6.1 Input Validation

Rule ID: SECURITY_001

Sanitization Requirements

  • All user inputs MUST be validated
  • SQL injection prevention through SQLAlchemy ORM
  • XSS prevention through proper encoding
  • File upload validation (extension, size, content-type)

Authentication & Authorization

  • Environment variables for sensitive data
  • API key validation for external services
  • CORS configuration for cross-origin requests

6.2 Error Handling

Rule ID: SECURITY_002

Information Disclosure Prevention

  • Generic error messages for production
  • Detailed logging for debugging
  • No sensitive data in error responses
  • Stack traces only in development mode
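
These four points can be sketched as a single error-to-response function. The function name, the debug flag, and the generic message text are assumptions; the response shape follows API_RESPONSE_001.

```python
import logging
import traceback

logger = logging.getLogger(__name__)


def build_error_response(exc, debug=False):
    """Turn an exception into a response body (hypothetical helper)."""
    # Always log full details for debugging, whatever the client sees.
    logger.error("Unhandled error",
                 exc_info=(type(exc), exc, exc.__traceback__))

    # Production default: a generic message with no internal details.
    body = {
        "success": False,
        "message": "Internal server error",
        "data": None,
        "code": 500,
    }
    # Development only: expose the message and stack trace.
    if debug:
        body["message"] = str(exc)
        body["data"] = {
            "traceback": traceback.format_exception(
                type(exc), exc, exc.__traceback__
            )
        }
    return body
```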

7. Configuration Rules

7.1 Environment-Specific Rules

Rule ID: CONFIG_001

Development Environment

  • Debug mode: ON
  • Detailed logging: ON
  • Local database connections
  • Console logging: ON

Production Environment

  • Debug mode: OFF
  • Logging at INFO level and above
  • Remote database connections
  • File logging only
  • Security headers enforced

7.2 Service Integration Rules

Rule ID: CONFIG_002

External Service Configuration

# LLM Services (Qwen API)
- API key from environment variables
- Fallback to default for development
- Rate limiting and retry logic

# Database Services
- Connection pooling enabled
- Health check (pool_pre_ping: True)
- Connection recycling (300 seconds)
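
If the platform configures its engine through Flask-SQLAlchemy (an assumption; this document only confirms that SQLAlchemy is used), the database settings above, together with the pool sizes from PERFORMANCE_001, would map onto engine options roughly as follows. The keys are standard SQLAlchemy create_engine() keyword arguments.

```python
# Hypothetical config fragment; values are taken from CONFIG_002 and
# PERFORMANCE_001 in this document.
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_pre_ping": True,  # health check before a connection is handed out
    "pool_recycle": 300,    # recycle connections older than 300 seconds
    "pool_size": 10,        # persistent connections kept in the pool
    "max_overflow": 20,     # extra connections allowed under load
}
```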

8. Logging & Monitoring Rules

8.1 Logging Standards

Rule ID: LOGGING_001

Log Format

LOG_FORMAT = '%(asctime)s - %(levelname)s - %(filename)s - %(funcName)s - %(lineno)s - %(message)s'

Log Levels

  • DEBUG: Detailed information for development
  • INFO: General operational information
  • WARNING: Validation warnings, non-critical issues
  • ERROR: Error conditions, exceptions
  • CRITICAL: System failures

Log Output

  • Development: Console + file logging
  • Production: File logging only
  • UTF-8 encoding for Chinese character support
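
A minimal sketch of these logging standards using the standard logging module is shown below. The logger name "dataops", the function name, and the production flag are assumptions; the format string and the handler choices follow this section.

```python
import logging
import sys

LOG_FORMAT = ('%(asctime)s - %(levelname)s - %(filename)s - '
              '%(funcName)s - %(lineno)s - %(message)s')


def configure_logging(log_path, production=False):
    """Configure and return the application logger (hypothetical helper)."""
    logger = logging.getLogger("dataops")  # hypothetical logger name
    logger.setLevel(logging.INFO if production else logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter(LOG_FORMAT)

    # File logging in every environment, UTF-8 so Chinese text is preserved.
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    # Console logging in development only.
    if not production:
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)

    return logger
```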

9. Performance Rules

9.1 Database Performance

Rule ID: PERFORMANCE_001

Query Optimization

  • Use proper indexing for frequently queried fields
  • Batch processing for large datasets (batch_size: 1000)
  • Connection pooling (pool_size: 10, max_overflow: 20)
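
Batch processing with the stated batch_size can be sketched with a small generator. The helper name iter_batches is hypothetical; the default of 1000 comes from the rule above.

```python
def iter_batches(items, batch_size=1000):
    """Yield lists of at most batch_size items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch
```

Each yielded batch can then be written in one transaction, so a failure affects at most one batch rather than the whole dataset.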

Caching Strategy

  • Session-based caching for Neo4j queries
  • File processing result caching
  • API response caching for static data

10. Compliance & Audit Rules

10.1 Data Tracking

Rule ID: AUDIT_001

Change Tracking

  • All data modifications logged with timestamp
  • User attribution for all operations
  • Source tracking in origin_source field

Data Retention

  • Archive processed files
  • Maintain processing history
  • Retain duplicate detection records

Rule Enforcement

Implementation Guidelines

  1. Validation: Implement validation functions following the patterns in parse_menduner.py
  2. Error Handling: Use standardized error response format
  3. Testing: Create unit tests for each business rule
  4. Documentation: Update API documentation when rules change

Rule Violation Handling

  • Critical violations: Return HTTP 400/500 with an error message in the standard response format (detailed in development, generic in production per SECURITY_002)
  • Warning violations: Log warning, continue processing
  • Data quality issues: Create audit records for manual review

Review Process

  • Monthly review of business rules effectiveness
  • Update rules based on operational feedback
  • Version control for rule changes
  • Impact assessment for rule modifications