# DataOps Platform - Cursor Editor Rules

## Project Overview
This is a Flask-based DataOps platform for data management, processing, and analytics.
The platform uses the Neo4j graph database for relationship management and supports n8n workflow automation.

---

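# Python Coding Standards

## Code Style
- Use Ruff for linting and formatting (replaces Black + Flake8 + isort)
- Use Pyright for type checking (replaces MyPy)
- Line length limit: 88 characters
- Use double quotes as the default string quotes
- Indent with 4 spaces; do not use tabs

## Type Annotations
- All functions must include type annotations (parameters and return values)
- Use type syntax compatible with Python 3.8+
- Python 3.9+ features in annotations (such as `list[str]`) require `from __future__ import annotations`
- Use the `typing` module for complex types

## Imports
- Order imports as standard library, third-party, then local
- Use absolute imports; avoid relative imports
- Put each import on its own line
- Import commonly used utility functions (such as `create_or_get_talent_node`) at the top of the file

## Naming Conventions
- Class names use PascalCase
- Functions and variables use snake_case
- Constants use UPPER_SNAKE_CASE
- Private members use a single leading underscore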
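
As a minimal sketch of the naming rules above, the snippet below uses one name of each kind. `DatasetRegistry`, `MAX_BATCH_SIZE`, and every other identifier here are hypothetical and exist only for illustration; none of them come from the project.

```python
from __future__ import annotations

# Constant: UPPER_SNAKE_CASE. Hypothetical value, for illustration only.
MAX_BATCH_SIZE = 500


class DatasetRegistry:
    """Class name in PascalCase (hypothetical class)."""

    def __init__(self) -> None:
        # Private member: single leading underscore.
        self._datasets: dict[str, int] = {}

    def register_dataset(self, dataset_name: str, row_count: int) -> None:
        """Method and parameter names in snake_case."""
        self._datasets[dataset_name] = row_count

    def count_batches(self, dataset_name: str) -> int:
        """Return how many batches of MAX_BATCH_SIZE rows the dataset needs."""
        row_count = self._datasets[dataset_name]
        # Ceiling division using integers only.
        return -(-row_count // MAX_BATCH_SIZE)
```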

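## Docstrings
- All public functions, classes, and modules must include a docstring
- Use Google-style docstrings
- Describe parameters, return values, and exceptions (where applicable)

## Error Handling
- Catch specific exception types; avoid a bare `except:`
- Prefer context managers (`with` statements)
- Use Loguru to log exception details for debugging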
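
The first two rules can be sketched as follows. This is an illustration under stated assumptions, not project code: `ConfigError` and `load_json_config` are hypothetical names, and Loguru logging is left out so the sketch runs on the standard library alone. In the project, each `except` branch would presumably also log through Loguru before re-raising.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


class ConfigError(Exception):
    """Raised when a configuration file cannot be loaded (hypothetical)."""


def load_json_config(path: Path) -> dict[str, Any]:
    """Load a JSON object from a file.

    Args:
        path: Location of the JSON file.

    Returns:
        The parsed JSON object.

    Raises:
        ConfigError: If the file is missing, malformed, or not a JSON object.
    """
    try:
        # The context manager closes the file even if parsing fails.
        with path.open(encoding="utf-8") as handle:
            loaded = json.load(handle)
    except FileNotFoundError as exc:
        # Specific exception types instead of a bare ``except:``.
        raise ConfigError(f"Config file not found: {path}") from exc
    except json.JSONDecodeError as exc:
        raise ConfigError(f"Invalid JSON in {path}: {exc.msg}") from exc
    if not isinstance(loaded, dict):
        raise ConfigError(f"Expected a JSON object in {path}")
    return loaded
```

Chaining with `from exc` keeps the original traceback available, which helps when the error is logged further up the call stack.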

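## Logging
- Use Loguru for logging
- Log levels: DEBUG (debugging), INFO (information), WARNING (warnings), ERROR (errors)
- Avoid `print()` statements in production

## Code Quality
- Avoid `type: ignore` unless absolutely necessary, and add an explanation when you use it
- Keep functions short (50 lines or fewer is recommended)
- Avoid deep nesting (3 levels at most)
- Use list comprehensions and generator expressions, but keep them readable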
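
A small sketch of the last two points, assuming nothing about the project's own helpers: both function names below are hypothetical. Each function replaces a nested loop-and-if block with a single readable expression.

```python
from __future__ import annotations

from typing import Iterable


def total_valid_length(items: Iterable[str], max_length: int = 100) -> int:
    """Sum the lengths of the non-empty items that fit within max_length."""
    # A generator expression avoids building an intermediate list.
    return sum(len(item) for item in items if 0 < len(item) <= max_length)


def normalize_names(names: Iterable[str]) -> list[str]:
    """Strip and lowercase names, dropping blank entries."""
    # One filter condition in the comprehension keeps nesting flat.
    return [name.strip().lower() for name in names if name.strip()]
```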

## Example

```python
from __future__ import annotations

from loguru import logger


def process_data(
    items: list[str],
    max_length: int = 100,
    strict: bool = False,
) -> dict[str, int]:
    """
    Process a list of items and return statistics.

    Args:
        items: List of strings to process.
        max_length: Maximum allowed length for items.
        strict: Whether to raise an error on invalid items.

    Returns:
        Dictionary with the counts of valid and invalid items.

    Raises:
        ValueError: If strict mode is enabled and an invalid item is found.
    """
    result: dict[str, int] = {"valid": 0, "invalid": 0}
    logger.info(f"Processing {len(items)} items")
    try:
        # Illustrative rule: an item is invalid when it exceeds max_length.
        for item in items:
            if len(item) <= max_length:
                result["valid"] += 1
            elif strict:
                raise ValueError(f"Item exceeds {max_length} characters")
            else:
                result["invalid"] += 1
    except ValueError as e:
        logger.error(f"Processing failed: {e}")
        raise
    return result
```

---

## Architecture
- Flask application with modular structure
- SQLAlchemy for PostgreSQL database operations
- Neo4j for graph database and relationship management
- RESTful API design with Blueprint-based routing
- Configuration-based environment management
- n8n workflow integration via MCP servers

## File Organization
- `app/` - Main application code
  - `app/api/` - API endpoints and routes (Blueprint-based)
  - `app/core/` - Core business logic and domain services
  - `app/models/` - SQLAlchemy database models
  - `app/services/` - Shared services (Neo4j driver, utilities)
  - `app/config/` - Configuration files
  - `app/scripts/` - Database initialization scripts
- `database/` - SQL scripts and migrations
- `docs/` - Documentation
- `tests/` - Test files
- `scripts/` - Automation scripts
- `mcp-servers/` - MCP server implementations (e.g., task-manager)
- `logs/` - Application logs

## Dependencies
- Python >= 3.8
- Flask >= 2.3.0
- Flask-SQLAlchemy >= 3.1.0
- SQLAlchemy >= 2.0.0
- Neo4j Python Driver (for graph database)
- PostgreSQL (via psycopg2-binary)
- Loguru (for logging)
- Pandas & NumPy (for data processing)

## Development Tools
- Ruff (linting & formatting, replaces Black + Flake8 + isort)
- Pyright (type checking, replaces MyPy)
- Pytest (testing)

## Development Guidelines
- Always use a virtual environment
- Test API endpoints before committing
- Update documentation for API changes
- Use Loguru for logging, avoid print() statements
- Handle errors gracefully with proper logging

## API Conventions
- Use snake_case for Python functions and variables
- Use kebab-case for API endpoints
- Return consistent JSON responses with `code`, `message`, `data` structure
- Include proper HTTP status codes
- Validate input data
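
The `code`, `message`, `data` envelope could be produced by a small helper along these lines. This is a sketch under assumptions: `build_response` is a hypothetical name, and the rules above do not say whether `code` mirrors the HTTP status, so this example simply assumes it does.

```python
from __future__ import annotations

from typing import Any


def build_response(
    data: Any = None,
    message: str = "success",
    code: int = 200,
) -> tuple[dict[str, Any], int]:
    """Build a JSON-serializable body plus an HTTP status code.

    Args:
        data: Payload to return to the client.
        message: Human-readable status message.
        code: Status code; assumed here to match the HTTP status.

    Returns:
        A (body, status) tuple with the same three keys in every response.
    """
    body = {"code": code, "message": message, "data": data}
    return body, code
```

A Flask view can return a `(dict, status)` tuple directly, so a handler for a kebab-case route such as the hypothetical `/api/data-sources` could end with `return build_response(rows)`.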

## Database
- PostgreSQL for relational data
- Neo4j for graph data and relationships
- Use Flask-Migrate/Alembic for schema migrations
- Follow naming conventions for tables and columns
- Implement proper indexing
- Use transactions for data consistency

## Neo4j Graph Database
- Use `Neo4jDriverSingleton` for connection management
- Follow Cypher query best practices
- Use parameterized queries to prevent injection
- Close sessions properly after use
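
A parameterized Cypher query might look like the sketch below. It is written against a minimal session interface so it does not guess at how `Neo4jDriverSingleton` exposes the driver; the `Talent` label, the `name` property, and the function name are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Dict, Optional, Protocol


class CypherSession(Protocol):
    """Minimal session interface; the Neo4j driver's Session.run fits it."""

    def run(self, query: str, parameters: Optional[Dict[str, Any]] = None) -> Any:
        ...


# The query text is a constant; user input is only ever bound to $name.
FIND_TALENT_QUERY = "MATCH (n:Talent {name: $name}) RETURN n"


def find_talent_by_name(session: CypherSession, name: str) -> Any:
    """Run the lookup with the name passed as a query parameter.

    Args:
        session: An open session-like object.
        name: User-supplied value to match.

    Returns:
        Whatever the session's run method returns.
    """
    # No string concatenation or f-strings: the value cannot alter the query.
    return session.run(FIND_TALENT_QUERY, {"name": name})
```

With the official driver, opening the session as `with driver.session() as session:` closes it automatically when the block exits, which covers the last rule in the list.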

## Security
- Validate all user inputs
- Use environment variables for sensitive data (see `env.example`)
- Implement proper authentication
- Use parameterized queries for both SQL and Cypher
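
Reading sensitive settings from the environment can be sketched as follows. `require_env` is a hypothetical helper, and the variable name in the usage note is an assumption; the actual names are whatever `env.example` lists.

```python
from __future__ import annotations

import os


def require_env(name: str) -> str:
    """Return the value of an environment variable.

    Args:
        name: Name of the environment variable.

    Returns:
        The non-empty value of the variable.

    Raises:
        RuntimeError: If the variable is unset or empty.
    """
    value = os.environ.get(name)
    if not value:
        # Fail fast instead of silently falling back to a default secret.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

For example, `require_env("NEO4J_PASSWORD")` (a hypothetical variable name) would raise at startup if the secret is missing, instead of failing later on the first database call.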