# DataOps Platform - Business Rules & Validation Standards

## Overview

This document defines the core business rules, validation standards, and processing workflows for the DataOps platform. These rules ensure data integrity, consistent API behavior, and reliable business logic execution.

## 1. Data Validation Rules

### 1.1 General Field Validation

**Rule ID**: `VALIDATION_001`

#### Format Validation Rules

```python
# Email validation
# - Must match pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
# - Invalid format generates ERROR

# Array fields validation
# - Must be array type if present
```

## 2. API Response Standards

### 2.1 Standard Response Format

**Rule ID**: `API_RESPONSE_001`

All API responses MUST follow this structure:

```json
{
  "success": boolean,
  "message": string,
  "data": any,
  "code": number (optional)
}
```

#### Success Response Example

```json
{
  "success": true,
  "message": "Operation successful",
  "data": { ... }
}
```

#### Error Response Example

```json
{
  "success": false,
  "message": "Detailed error description",
  "data": null,
  "code": 400
}
```

### 2.2 HTTP Status Code Rules

**Rule ID**: `API_STATUS_001`

- `200`: Successful operation
- `400`: Bad request (validation errors, missing parameters)
- `404`: Resource not found
- `500`: Internal server error

### 2.3 Content-Type Headers

**Rule ID**: `API_HEADERS_001`

- All API responses: `application/json; charset=utf-8`
- File downloads: Preserve the original content type
- CORS headers are configured automatically

## 3. Database Rules

### 3.1 Data Integrity Rules

**Rule ID**: `DB_INTEGRITY_001`

#### Timestamp Management

```python
# Use East Asia timezone for all timestamps
from datetime import datetime
import pytz
```

### 3.2 Data Model Rules

**Rule ID**: `DB_MODEL_001`

#### Field Constraints

```python
# String fields
# name:  max_length=100, nullable=False
# email: max_length=100, nullable=True

# JSON fields - use for structured data
```

## 4. File Processing Rules

### 4.1 File Upload Rules

**Rule ID**: `FILE_UPLOAD_001`

#### Allowed Extensions

```python
ALLOWED_EXTENSIONS = {
    'txt', 'pdf', 'png', 'jpg', 'jpeg', 'gif',
    'xlsx', 'xls', 'csv', 'sql', 'dll'
}
```

#### Storage Rules

- Development: Local filesystem
- Production: MinIO object storage
- File paths are tracked in the database

## 5. Business Logic Rules

### 5.1 Graph Processing Rules

**Rule ID**: `BUSINESS_LOGIC_001`

#### Neo4j Graph Processing

- Maximum traversal depth: 10 levels
- Duplicate node prevention
- Proper relationship management

### 5.2 Query Processing Rules

**Rule ID**: `BUSINESS_LOGIC_002`

#### Graph Query Optimization

```python
# Use recursive traversal for label-based queries
# Pattern: (start_node)-[*1..10]->(end_node)
```

## 6. Security Rules

### 6.1 Input Validation

**Rule ID**: `SECURITY_001`

#### Sanitization Requirements

- All user inputs MUST be validated
- SQL injection prevention through the SQLAlchemy ORM
- XSS prevention through proper output encoding
- File upload validation (extension, size, content type)

#### Authentication & Authorization

- Environment variables for sensitive data
- API key validation for external services
- CORS configuration for cross-origin requests

### 6.2 Error Handling

**Rule ID**: `SECURITY_002`

#### Information Disclosure Prevention

- Generic error messages in production
- Detailed logging for debugging
- No sensitive data in error responses
- Stack traces only in development mode

## 7. Configuration Rules

### 7.1 Environment-Specific Rules

**Rule ID**: `CONFIG_001`

#### Development Environment

- Debug mode: ON
- Detailed logging: ON
- Local database connections
- Console logging: ON

#### Production Environment

- Debug mode: OFF
- Info-level logging only
- Remote database connections
- File logging only
- Security headers enforced

### 7.2 Service Integration Rules

**Rule ID**: `CONFIG_002`

#### External Service Configuration

```python
# LLM Services (Qwen API)
# - API key from environment variables
# - Fallback to default for development
# - Rate limiting and retry logic

# Database Services
# - Connection pooling enabled
# - Health check (pool_pre_ping: True)
# - Connection recycling (300 seconds)
```

## 8. Logging & Monitoring Rules

### 8.1 Logging Standards

**Rule ID**: `LOGGING_001`

#### Log Format

```python
LOG_FORMAT = '%(asctime)s - %(levelname)s - %(filename)s - %(funcName)s - %(lineno)s - %(message)s'
```

#### Log Levels

- **DEBUG**: Detailed information for development
- **INFO**: General operational information
- **WARNING**: Validation warnings, non-critical issues
- **ERROR**: Error conditions, exceptions
- **CRITICAL**: System failures

#### Log Rotation

- Development: Console + file logging
- Production: File logging only
- UTF-8 encoding for Chinese character support

## 9. Performance Rules

### 9.1 Database Performance

**Rule ID**: `PERFORMANCE_001`

#### Query Optimization

- Index frequently queried fields
- Batch processing for large datasets (batch_size: 1000)
- Connection pooling (pool_size: 10, max_overflow: 20)

#### Caching Strategy

- Session-based caching for Neo4j queries
- API response caching for static data

## 10. Compliance & Audit Rules

### 10.1 Data Tracking

**Rule ID**: `AUDIT_001`

#### Change Tracking

- All data modifications are logged with a timestamp
- User attribution for all operations

#### Data Retention

- Archive processed files
- Maintain processing history

---

## Rule Enforcement

### Implementation Guidelines

1. **Validation**: Implement validation functions
2. **Error Handling**: Use the standardized error response format
3. **Testing**: Create unit tests for each business rule
4. **Documentation**: Update API documentation when rules change

### Rule Violation Handling

- **Critical violations**: Return HTTP 400/500 with a detailed error message
- **Warning violations**: Log a warning and continue processing
- **Data quality issues**: Create audit records for manual review

### Review Process

- Monthly review of business rule effectiveness
- Update rules based on operational feedback
- Version control for rule changes
- Impact assessment for rule modifications
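
### Example: Violation Handling Sketch

As an illustration of how the Rule Violation Handling guidance and the standard response format (`API_RESPONSE_001`) might fit together, here is a minimal sketch. It is not the platform's actual implementation: the names `Violation`, `make_response`, `validate_email`, and `handle_violations`, the logger name, and the choice to map every critical violation to HTTP 400 are all assumptions made for the example. Audit records for data quality issues are not covered.

```python
import logging
import re
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

# Hypothetical logger name; the platform's real logger may differ.
logger = logging.getLogger("business_rules")

# Pattern taken from VALIDATION_001.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


@dataclass
class Violation:
    """Hypothetical record of a single rule violation."""
    rule_id: str
    severity: str  # "critical" or "warning" (assumed labels)
    message: str


def make_response(success: bool, message: str, data: Any = None,
                  code: Optional[int] = None) -> dict:
    """Build a body in the API_RESPONSE_001 shape; `code` is optional."""
    body = {"success": success, "message": message, "data": data}
    if code is not None:
        body["code"] = code
    return body


def validate_email(value: str) -> List[Violation]:
    """Return a critical violation if the email does not match the pattern."""
    if EMAIL_PATTERN.match(value):
        return []
    return [Violation("VALIDATION_001", "critical", "Invalid email format")]


def handle_violations(violations: List[Violation], data: Any) -> Tuple[int, dict]:
    """Log warnings and continue; stop with HTTP 400 on the first critical."""
    for v in violations:
        if v.severity != "critical":
            logger.warning("%s: %s", v.rule_id, v.message)
    critical = [v for v in violations if v.severity == "critical"]
    if critical:
        first = critical[0]
        message = f"{first.rule_id}: {first.message}"
        return 400, make_response(False, message, None, 400)
    return 200, make_response(True, "Operation successful", data)
```

A caller could run `handle_violations(validate_email(payload["email"]), payload)` and pass the returned status and body to whatever web framework is in use. Server-side failures would map to 500 rather than 400; that branch is omitted here for brevity.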