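The Vector restore API provides complete data restore functionality for pgvector tables, covering both backup file listing and data restore operations. Together with the existing backup API, it forms a complete data management solution.
Service running: make sure the unified API service is running
python unified_api.py
Database connection: make sure the pgvector database connection is working
Backup files: make sure usable backup files exist (created through the backup API)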
GET /api/v0/data_pipeline/vector/restore/list
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
global_only | boolean | No | false | List global backups only (the training_data/vector_bak/ directory) |
task_id | string | No | - | List backup files for the specified task |
curl "http://localhost:8084/api/v0/data_pipeline/vector/restore/list"
curl "http://localhost:8084/api/v0/data_pipeline/vector/restore/list?global_only=true"
curl "http://localhost:8084/api/v0/data_pipeline/vector/restore/list?task_id=task_20250721_213627"
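The three queries above differ only in their query string. A minimal Python sketch of building those URLs follows; the helper name build_list_url is hypothetical, and the host and port are assumed from the curl examples.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8084"  # assumed from the curl examples
LIST_PATH = "/api/v0/data_pipeline/vector/restore/list"


def build_list_url(global_only=False, task_id=None, base_url=BASE_URL):
    """Build the backup-list URL (hypothetical helper, not part of the API)."""
    params = {}
    if global_only:
        params["global_only"] = "true"
    if task_id is not None:
        params["task_id"] = task_id
    query = urlencode(params)
    return f"{base_url}{LIST_PATH}" + (f"?{query}" if query else "")


print(build_list_url())
print(build_list_url(global_only=True))
print(build_list_url(task_id="task_20250721_213627"))
```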
{
"code": 200,
"success": true,
"message": "操作成功",
"data": {
"response": "成功扫描到 6 个备份位置,共 6 个备份集",
"backup_locations": [
{
"type": "global",
"relative_path": "./data_pipeline/training_data/vector_bak",
"backups": [
{
"timestamp": "20250722_010318",
"collection_file": "langchain_pg_collection_20250722_010318.csv",
"embedding_file": "langchain_pg_embedding_20250722_010318.csv",
"collection_size": "209 B",
"embedding_size": "819 KB",
"backup_date": "2025-07-22 01:03:18",
"has_log": true,
"log_file": "vector_backup_log.txt"
}
]
},
{
"type": "task",
"task_id": "task_20250721_213627",
"relative_path": "./data_pipeline/training_data/task_20250721_213627/vector_bak",
"backups": [
{
"timestamp": "20250721_215758",
"collection_file": "langchain_pg_collection_20250721_215758.csv",
"embedding_file": "langchain_pg_embedding_20250721_215758.csv",
"collection_size": "209 B",
"embedding_size": "764 KB",
"backup_date": "2025-07-21 21:57:58",
"has_log": true,
"log_file": "vector_backup_log.txt"
}
]
}
],
"summary": {
"total_locations": 6,
"total_backup_sets": 6,
"global_backups": 1,
"task_backups": 5,
"scan_time": "2025-07-22T11:28:25.156158"
},
"timestamp": "2025-07-22T11:28:25.156158"
}
}
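A client will often want the newest backup in each location. The sketch below assumes the response shape shown above; the function name latest_backups is hypothetical, and the sample is a trimmed copy of that response.

```python
def latest_backups(list_response):
    """Return (relative_path, timestamp) for the newest backup in each location."""
    result = []
    for location in list_response["data"]["backup_locations"]:
        backups = location.get("backups", [])
        if not backups:
            continue
        # YYYYMMDD_HHMMSS timestamps sort chronologically as plain strings.
        newest = max(backups, key=lambda b: b["timestamp"])
        result.append((location["relative_path"], newest["timestamp"]))
    return result


# Trimmed copy of the example response above.
sample = {
    "data": {
        "backup_locations": [
            {
                "type": "global",
                "relative_path": "./data_pipeline/training_data/vector_bak",
                "backups": [{"timestamp": "20250722_010318"}],
            },
            {
                "type": "task",
                "task_id": "task_20250721_213627",
                "relative_path": "./data_pipeline/training_data/task_20250721_213627/vector_bak",
                "backups": [{"timestamp": "20250721_215758"}],
            },
        ]
    }
}

for path, timestamp in latest_backups(sample):
    print(path, timestamp)
```

The returned pair maps directly onto the backup_path and timestamp parameters of the restore endpoint.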
Each backup location's type is either global or task.
POST /api/v0/data_pipeline/vector/restore
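Parameter | Type | Required | Default | Description |
---|---|---|---|---|
backup_path | string | Yes | - | Backup file directory (relative path) |
timestamp | string | Yes | - | Backup timestamp (YYYYMMDD_HHMMSS format) |
tables | array | No | null | Names of the tables to restore; if empty, all tables are restored |
db_connection | string | No | null | Custom PostgreSQL connection string |
truncate_before_restore | boolean | No | false | Whether to truncate the target tables before restoring |
backup_path examples: "./data_pipeline/training_data/vector_bak" (global backups), "./data_pipeline/training_data/task_20250721_213627/vector_bak" (task backups)
timestamp format: YYYYMMDD_HHMMSS, for example "20250721_215758"
tables values: ["langchain_pg_collection"], ["langchain_pg_embedding"], ["langchain_pg_collection", "langchain_pg_embedding"], or null (restore all tables)
db_connection format: "postgresql://user:password@host:port/database"
truncate_before_restore default: false
true: truncate the target tables before restoring (recommended for production)
false: append data directly (may cause primary key conflicts)
curl -X POST http://localhost:8084/api/v0/data_pipeline/vector/restore \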
-H "Content-Type: application/json" \
-d '{
"backup_path": "./data_pipeline/training_data/task_20250721_213627/vector_bak",
"timestamp": "20250721_215758",
"truncate_before_restore": true
}'
curl -X POST http://localhost:8084/api/v0/data_pipeline/vector/restore \
-H "Content-Type: application/json" \
-d '{
"backup_path": "./data_pipeline/training_data/vector_bak",
"timestamp": "20250722_010318",
"tables": ["langchain_pg_embedding"],
"truncate_before_restore": false
}'
curl -X POST http://localhost:8084/api/v0/data_pipeline/vector/restore \
-H "Content-Type: application/json" \
-d '{
"backup_path": "./data_pipeline/training_data/vector_bak",
"timestamp": "20250722_010318",
"db_connection": "postgresql://user:password@localhost:5432/target_db",
"truncate_before_restore": true
}'
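The same requests can be issued from Python. The sketch below uses only the standard library, validates the timestamp format on the client before sending, and leaves optional fields out when they are unset. The helper names build_restore_request and send are hypothetical, and the URL is assumed from the curl examples.

```python
import json
import re
import urllib.request
from datetime import datetime

RESTORE_URL = "http://localhost:8084/api/v0/data_pipeline/vector/restore"  # assumed


def build_restore_request(backup_path, timestamp, tables=None,
                          db_connection=None, truncate_before_restore=False,
                          url=RESTORE_URL):
    """Validate the arguments and return an unsent Request (hypothetical helper)."""
    # Enforce the documented YYYYMMDD_HHMMSS shape, then check it is a real date.
    if not re.fullmatch(r"\d{8}_\d{6}", timestamp):
        raise ValueError(f"timestamp must look like YYYYMMDD_HHMMSS: {timestamp!r}")
    datetime.strptime(timestamp, "%Y%m%d_%H%M%S")

    payload = {
        "backup_path": backup_path,
        "timestamp": timestamp,
        "truncate_before_restore": truncate_before_restore,
    }
    # Optional fields are omitted when unset so the server applies its defaults.
    if tables is not None:
        payload["tables"] = list(tables)
    if db_connection is not None:
        payload["db_connection"] = db_connection

    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON body; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_restore_request(
    "./data_pipeline/training_data/vector_bak",
    "20250722_010318",
    tables=["langchain_pg_embedding"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling send(request) performs the restore, so it is left out of the example run.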
{
"code": 200,
"success": true,
"message": "操作成功",
"data": {
"response": "Vector表恢复完成",
"restore_performed": true,
"truncate_performed": true,
"backup_info": {
"backup_path": "./data_pipeline/training_data/task_20250721_213627/vector_bak",
"timestamp": "20250721_215758",
"backup_date": "2025-07-21 21:57:58"
},
"truncate_results": {
"langchain_pg_collection": {
"success": true,
"rows_before": 4,
"rows_after": 0,
"duration": 0.025
},
"langchain_pg_embedding": {
"success": true,
"rows_before": 58,
"rows_after": 0,
"duration": 0.063
}
},
"restore_results": {
"langchain_pg_collection": {
"success": true,
"source_file": "langchain_pg_collection_20250721_215758.csv",
"rows_restored": 4,
"file_size": "209 B",
"duration": 0.145
},
"langchain_pg_embedding": {
"success": true,
"source_file": "langchain_pg_embedding_20250721_215758.csv",
"rows_restored": 58,
"file_size": "764 KB",
"duration": 0.678
}
},
"errors": [],
"duration": 0.911,
"timestamp": "2025-07-22T10:35:20+08:00"
}
}
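After a restore it is worth confirming that every table succeeded and that the row counts look right. A minimal sketch, assuming the success response shape shown above; summarize_restore is a hypothetical name and the sample is a trimmed copy of that response.

```python
def summarize_restore(response):
    """Condense a restore response into a few fields worth checking."""
    data = response["data"]
    results = data.get("restore_results", {})
    failed = [name for name, r in results.items() if not r.get("success")]
    return {
        "ok": bool(response.get("success")) and not failed and not data.get("errors"),
        "failed_tables": failed,
        "total_rows": sum(r.get("rows_restored", 0) for r in results.values()),
        "duration": data.get("duration"),
    }


# Trimmed copy of the example success response above.
sample = {
    "code": 200,
    "success": True,
    "data": {
        "restore_results": {
            "langchain_pg_collection": {"success": True, "rows_restored": 4},
            "langchain_pg_embedding": {"success": True, "rows_restored": 58},
        },
        "errors": [],
        "duration": 0.911,
    },
}

print(summarize_restore(sample))
```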
{
"code": 400,
"success": false,
"message": "请求参数错误",
"data": {
"response": "缺少必需参数: backup_path, timestamp",
"error_type": "MISSING_REQUIRED_PARAMS",
"missing_params": ["backup_path", "timestamp"],
"timestamp": "2025-07-22T10:35:20+08:00"
}
}
{
"code": 404,
"success": false,
"message": "资源未找到",
"data": {
"response": "备份目录不存在: ./data_pipeline/training_data/nonexistent",
"error_type": "RESOURCE_NOT_FOUND",
"timestamp": "2025-07-22T10:35:20+08:00"
}
}
{
"code": 500,
"success": false,
"message": "系统内部错误",
"data": {
"response": "数据库连接失败,请稍后重试",
"error_type": "DATABASE_ERROR",
"timestamp": "2025-07-22T10:35:20+08:00"
}
}
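The three error responses share one envelope and differ in error_type, so a client can branch on that field. The sketch below is one possible mapping; describe_error and the hint texts are hypothetical, and only the three error_type values shown above are assumed to exist.

```python
def describe_error(response):
    """Return a short hint for a failed response, or None on success."""
    if response.get("success"):
        return None
    data = response.get("data", {})
    error_type = data.get("error_type", "UNKNOWN")
    hints = {
        "MISSING_REQUIRED_PARAMS": "add the missing parameters: "
        + ", ".join(data.get("missing_params", [])),
        "RESOURCE_NOT_FOUND": "check backup_path against the list endpoint",
        "DATABASE_ERROR": "check the database connection and retry",
    }
    # Fall back to the log file for any error_type not listed above.
    hint = hints.get(error_type, "see logs/app.log")
    return f"{response.get('code')} {error_type}: {hint}"


missing = {
    "code": 400,
    "success": False,
    "data": {
        "error_type": "MISSING_REQUIRED_PARAMS",
        "missing_params": ["backup_path", "timestamp"],
    },
}
print(describe_error(missing))
```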
# Step 1: list the backups in the source environment
curl "http://source-server:8084/api/v0/data_pipeline/vector/restore/list"
# Step 2: copy the backup files to the target environment (manual step)
# scp source:/path/to/backups/* target:/path/to/backups/
# Step 3: restore the data in the target environment
curl -X POST http://target-server:8084/api/v0/data_pipeline/vector/restore \
-H "Content-Type: application/json" \
-d '{
"backup_path": "./data_pipeline/training_data/vector_bak",
"timestamp": "20250722_010318",
"truncate_before_restore": true
}'
# Step 1: find the rollback point
curl "http://localhost:8084/api/v0/data_pipeline/vector/restore/list?task_id=task_20250721_213627"
# Step 2: restore to the chosen point in time
curl -X POST http://localhost:8084/api/v0/data_pipeline/vector/restore \
-H "Content-Type: application/json" \
-d '{
"backup_path": "./data_pipeline/training_data/task_20250721_213627/vector_bak",
"timestamp": "20250721_215758",
"truncate_before_restore": true
}'
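Choosing the rollback point in step 1 can be automated when the goal is "the newest backup taken no later than a given moment". A small sketch under that assumption; pick_rollback_point is a hypothetical helper, and the middle sample entry is invented for illustration.

```python
def pick_rollback_point(backups, not_after):
    """Return the newest backup timestamp that is <= not_after, or None."""
    # YYYYMMDD_HHMMSS strings compare in chronological order.
    candidates = [b["timestamp"] for b in backups if b["timestamp"] <= not_after]
    return max(candidates) if candidates else None


backups = [
    {"timestamp": "20250721_215758"},
    {"timestamp": "20250721_230000"},  # hypothetical entry for illustration
    {"timestamp": "20250722_010318"},
]
print(pick_rollback_point(backups, "20250721_235959"))
```

The result can be passed as the timestamp parameter of the restore request in step 2.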
# Restore only the embedding table, leaving the collection table untouched
curl -X POST http://localhost:8084/api/v0/data_pipeline/vector/restore \
-H "Content-Type: application/json" \
-d '{
"backup_path": "./data_pipeline/training_data/vector_bak",
"timestamp": "20250722_010318",
"tables": ["langchain_pg_embedding"],
"truncate_before_restore": false
}'
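Use truncate_before_restore: true to ensure the restored data is clean
Use truncate_before_restore: false for data overlay (append) testing
Use the tables parameter to restore only selected tables
Use the db_connection parameter to specify the target database
Check the duration field to understand restore performance
Check the errors array to make sure no table failed to restore
Confirm that rows_restored matches the expected data volume
Reference restore performance, based on testing: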
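Data volume | Collection table | Embedding table | Total time | Notes |
---|---|---|---|---|
Small (< 100 rows) | < 0.1s | < 0.7s | < 1s | Development and test environments |
Medium (< 10K rows) | < 0.5s | < 3s | < 4s | Small production environments |
Large (< 100K rows) | < 2s | < 15s | < 20s | Medium production environments |
Very large (> 100K rows) | < 10s | < 60s | < 80s | Large production environments |
Note: actual performance depends on database configuration, hardware, and network conditions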
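The reference figures can serve as a rough sanity check on the duration a restore reports. The sketch below encodes the total-time column as a lookup; reference_total_seconds is a hypothetical helper, and treating exactly 100K rows as the top tier is an assumption, since the table does not say.

```python
# (exclusive upper bound on rows, reference total seconds) from the table above
TIERS = [(100, 1), (10_000, 4), (100_000, 20)]
LARGEST_TIER_SECONDS = 80  # tier for more than 100K rows


def reference_total_seconds(rows):
    """Return the reference upper bound on total restore time for a row count."""
    for limit, seconds in TIERS:
        if rows < limit:
            return seconds
    # Assumption: exactly 100K rows is treated as the largest tier.
    return LARGEST_TIER_SECONDS


print(reference_total_seconds(58))
print(reference_total_seconds(50_000))
```

These are reference values only, so a duration above the bound is a prompt to investigate rather than an error.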
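POST /api/v0/data_pipeline/vector/backup - create a vector table backup
GET /health - check the API service status
/api/v0/training_data/* - training data management
If you run into problems, check:
http://localhost:8084/health to confirm the service status
logs/app.log for detailed error information
Document version: v1.0
Last updated: 2025-07-22
Applies to: unified_api.py v1.0+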