Deployment Checklist for API Refactor

Pre-Deployment Steps

1. Environment Variables

Update your .env files with required variables:

# Required
MYSQL_URL=mysql://user:pass@localhost:3306/dbname
MONGO_URL=mongodb://localhost:27017/dbname
JWT_SECRET=your-secure-random-secret-min-32-chars

# Optional (with defaults)
JWT_EXPIRES_IN_SECONDS=43200    # 12 hours
PORT=3000
NODE_ENV=production
ENCRYPTION_KEY=your-encryption-key-for-2fa

Validation: Run the app locally to verify config validation works:

pnpm dev:mgnt

If any required variables are missing, you'll see a clear error message.
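
The startup check described above could look roughly like the following. This is a minimal sketch, not the project's actual validator: the function name `validateEnv` and the wording of the second message are hypothetical, while the variable names and the 32-character minimum come from the list above.

```typescript
// Hypothetical sketch of startup config validation; not the project's real code.
type EnvMap = Record<string, string | undefined>;

const REQUIRED_VARS = ['MYSQL_URL', 'MONGO_URL', 'JWT_SECRET'];

// Returns a list of human-readable problems; an empty list means the config is usable.
function validateEnv(env: EnvMap): string[] {
  const problems: string[] = [];

  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === '') {
      problems.push(`${name} should not be empty`);
    }
  }

  // The checklist asks for a JWT secret of at least 32 characters.
  const secret = env['JWT_SECRET'];
  if (secret !== undefined && secret.trim() !== '' && secret.length < 32) {
    problems.push('JWT_SECRET must be at least 32 characters');
  }

  return problems;
}
```

In a Node process you would pass `process.env` and refuse to start when the returned list is non-empty.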


2. Database Indexes (Recommended)

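Run these SQL statements to optimize performance. Note that CREATE INDEX IF NOT EXISTS is MariaDB syntax; MySQL rejects it, so on MySQL drop the IF NOT EXISTS clause and skip any index that already exists: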

-- User role lookups
CREATE INDEX IF NOT EXISTS idx_user_role_user_id
  ON sys_user_role(user_id);

CREATE INDEX IF NOT EXISTS idx_user_role_role_id
  ON sys_user_role(role_id);

-- Role menu lookups
CREATE INDEX IF NOT EXISTS idx_role_menu_role_id
  ON sys_role_menu(role_id);

CREATE INDEX IF NOT EXISTS idx_role_menu_menu_id
  ON sys_role_menu(menu_id);

-- Menu hierarchy traversal
CREATE INDEX IF NOT EXISTS idx_menu_parent_id
  ON sys_menu(parent_id);

-- Operation log queries
CREATE INDEX IF NOT EXISTS idx_operation_log_username
  ON sys_operation_log(username);

CREATE INDEX IF NOT EXISTS idx_operation_log_create_time
  ON sys_operation_log(create_time);

-- Login log queries
CREATE INDEX IF NOT EXISTS idx_login_log_username
  ON sys_login_log(username);

CREATE INDEX IF NOT EXISTS idx_login_log_create_time
  ON sys_login_log(create_time);

3. Frontend Changes Required

Update Response Handling

Old Code:

// ❌ Old response format
interface HttpResponse {
  error: string;
  status: 1 | 0;
  data: any;
}

// Check success
if (response.status === 1) {
  // success
}

New Code:

// ✅ New response format
interface ApiResponse<T> {
  success: boolean;
  code: string;
  message: string;
  data: T;
  timestamp: string;
}

// Check success
if (response.success) {
  // success
}
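
While call sites are being migrated, a small adapter and a runtime shape check can help. The sketch below assumes the two interface shapes shown above; `fromLegacy`, `isApiResponse`, the `'OK'` and `'LEGACY_ERROR'` codes, and the generated timestamp are hypothetical placeholders, not values the API defines.

```typescript
interface HttpResponse {
  error: string;
  status: 1 | 0;
  data: unknown;
}

interface ApiResponse<T> {
  success: boolean;
  code: string;
  message: string;
  data: T;
  timestamp: string;
}

// Wraps an old-format payload in the new shape (codes here are placeholders).
function fromLegacy<T>(legacy: HttpResponse): ApiResponse<T> {
  const success = legacy.status === 1;
  return {
    success,
    code: success ? 'OK' : 'LEGACY_ERROR',
    message: success ? '' : legacy.error,
    data: legacy.data as T,
    timestamp: new Date().toISOString(),
  };
}

// Runtime check for payloads of unknown origin, e.g. a cached old response.
function isApiResponse(value: unknown): value is ApiResponse<unknown> {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v['success'] === 'boolean' &&
    typeof v['code'] === 'string' &&
    typeof v['message'] === 'string' &&
    typeof v['timestamp'] === 'string' &&
    'data' in v
  );
}
```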

Update Error Handling

Old Code:

// ❌ All responses were HTTP 200
axios.post('/api/login', data).then((res) => {
  if (res.data.status === 1) {
    // success
  } else {
    // error - check res.data.error
  }
});

New Code:

// ✅ Proper HTTP status codes
axios
  .post('/api/login', data)
  .then((res) => {
    // HTTP 2xx - success
    if (res.data.success) {
      // success
    }
  })
  .catch((err) => {
    // HTTP 4xx/5xx - error
    // err.response is undefined on network failures and timeouts, so guard the lookup
    const { code, message } = err.response?.data ?? {};

    switch (code) {
      case 'UNAUTHORIZED':
        // redirect to login
        break;
      case 'RATE_LIMITED':
        // show rate limit message
        break;
      case 'MFA_REQUIRED':
        // redirect to MFA page
        break;
      default:
      // show generic error
    }
  });
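
One case the error handler has to cope with is a network failure or timeout, where the error carries no `response` at all. A minimal sketch of a helper that normalizes both cases follows; `toApiError`, `NETWORK_ERROR`, and `UNKNOWN_ERROR` are hypothetical names, and `ErrorLike` is a structural stand-in for the fields read from an Axios error, not the library's own type.

```typescript
interface ApiError {
  code: string;
  message: string;
  status: number | null; // null when no HTTP response was received
}

// Stand-in for the parts of an Axios error this helper reads (assumption).
interface ErrorLike {
  message?: string;
  response?: {
    status: number;
    data?: { code?: unknown; message?: unknown };
  };
}

function toApiError(err: ErrorLike): ApiError {
  const response = err.response;

  // No response: the request never completed (offline, DNS failure, timeout).
  if (response === undefined) {
    return { code: 'NETWORK_ERROR', message: err.message ?? 'Request failed', status: null };
  }

  // Response present: trust the body only where it has the expected types.
  const data: { code?: unknown; message?: unknown } = response.data ?? {};
  return {
    code: typeof data.code === 'string' ? data.code : 'UNKNOWN_ERROR',
    message: typeof data.message === 'string' ? data.message : 'Unexpected error',
    status: response.status,
  };
}
```

The `switch (code)` in the handler can then run on `toApiError(err).code` without risking a `TypeError` on a missing response.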

Add Correlation ID Support

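// Add correlation ID to all requests for tracing
// (uuidv4 comes from the `uuid` package: import { v4 as uuidv4 } from 'uuid')
axios.interceptors.request.use((config) => {
  // Reuse a stored ID if one exists; otherwise generate a fresh one for this request
  const correlationId = localStorage.getItem('correlationId') || uuidv4();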
  config.headers['x-correlation-id'] = correlationId;
  return config;
});

// Extract correlation ID from responses for logging
axios.interceptors.response.use(
  (response) => {
    const correlationId = response.headers['x-correlation-id'];
    if (correlationId) {
      console.log(`[${correlationId}] Response received`);
    }
    return response;
  },
  (error) => {
    const correlationId = error.response?.headers['x-correlation-id'];
    if (correlationId) {
      console.error(`[${correlationId}] Error:`, error.message);
    }
    return Promise.reject(error);
  },
);
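
The header logic in the request interceptor can also be kept as a small pure function, which makes it easy to unit test without Axios or a browser. This is a sketch only: `withCorrelationId` and `HeaderMap` are hypothetical names, and the ID generator is passed in so that `uuidv4` or any other source can be used.

```typescript
type HeaderMap = Record<string, string>;

// Returns headers that are guaranteed to carry an x-correlation-id.
// An ID supplied by the caller is preserved; otherwise newId() is used.
function withCorrelationId(headers: HeaderMap, newId: () => string): HeaderMap {
  const existing = headers['x-correlation-id'];
  if (existing !== undefined && existing !== '') {
    return headers;
  }
  return { ...headers, 'x-correlation-id': newId() };
}
```

Inside the interceptor this would be called with the outgoing request headers and the generator of your choice.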

4. Testing Before Deployment

Unit Tests

# Run existing tests (if any)
pnpm test

# Check for TypeScript errors
pnpm build:mgnt

Manual Testing Checklist

  • [ ] Login Flow

    • Successful login returns success: true
    • Failed login returns HTTP 401 with success: false
    • Rate limiting kicks in after 10 attempts
  • [ ] 2FA Flow

    • Setup 2FA endpoint rate limited
    • Enable 2FA endpoint rate limited
    • MFA guard blocks access when 2FA not verified
  • [ ] Error Responses

    • 400 errors return HTTP 400 (not 200)
    • 401 errors return HTTP 401
    • 404 errors return HTTP 404
    • 500 errors return HTTP 500
  • [ ] Correlation IDs

    • Response includes x-request-id header
    • Custom x-request-id is preserved
    • Logs include correlation ID
  • [ ] Configuration

    • App fails to start with missing JWT_SECRET
    • App fails to start with invalid MYSQL_URL
    • App starts successfully with all required vars

Deployment Steps

1. Backup Current System

# Backup database
mysqldump -u user -p dbname > backup_$(date +%Y%m%d).sql

# Tag current version
git tag -a v1.0.0-pre-refactor -m "Pre-refactor backup"
git push origin v1.0.0-pre-refactor

2. Deploy Backend

# Pull latest code
git pull origin main

# Install dependencies
pnpm install

# Generate Prisma clients
pnpm prisma:generate

# Build application
pnpm build:mgnt

# Update environment variables
# (copy from Pre-Deployment Steps #1)
nano .env.mgnt.dev

# Start application
pm2 restart box-mgnt-api
# OR
pnpm start:mgnt

3. Monitor Logs

# Watch application logs
pm2 logs box-mgnt-api

# Check for errors
tail -f logs/error.log

# Verify correlation IDs in logs
grep "x-request-id" logs/combined.log

4. Smoke Tests

# Health check
curl http://localhost:3000/health

# Test login (should be rate limited after 10 attempts)
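for i in {1..12}; do
  printf -- '--- Request %s ---\n' "$i"
  # -w prints the HTTP status after each response body so the 429s are visible
  curl -s -X POST http://localhost:3000/mgnt/auth/login \
    -H "Content-Type: application/json" \
    -H "x-request-id: smoke-test-$i" \
    -d '{"username":"testuser","password":"wrong"}' \
    -w '\nHTTP %{http_code}\n'
  sleep 1
done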

# Should see HTTP 429 on requests 11-12

Post-Deployment Monitoring

Metrics to Watch

  1. Error Rates

    • Monitor 4xx/5xx response rates
    • Alert on sudden spikes in 500 errors
  2. Rate Limiting

    • Track rate limit violations
    • Adjust limits if legitimate users affected
  3. Response Times

    • Monitor P50, P95, P99 latencies
    • Compare to pre-refactor baselines
  4. Authentication

    • Monitor failed login attempts
    • Track 2FA verification rates
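
For the latency comparison, percentiles can be computed from raw samples with the nearest-rank method. The helper below is a minimal sketch; `percentile` is a hypothetical name, and a monitoring stack would normally compute these values for you.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error('no samples');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in (0, 100]');
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1] as number;
}
```

Running it over the same endpoint before and after the refactor (same sample size, same time of day) gives a like-for-like P50/P95/P99 comparison.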

Log Analysis

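# Count rate limit violations
grep -c "Rate limit exceeded" logs/app.log

# Find slow requests (>=1s); "响应" ("response") is the literal text the app logs
# -E enables {4,}, and \+ matches the literal plus sign before the duration
grep -E "响应.*\+[0-9]{4,}ms" logs/app.log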

# Track correlation IDs for debugging
grep "550e8400-e29b-41d4-a716-446655440000" logs/app.log
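
When the threshold needs to be something other than "four or more digits", the same filter is easy to express in code. The sketch below assumes, based on the grep pattern above, that log lines carry the elapsed time as `+<n>ms`; `findSlowLines` is a hypothetical helper.

```typescript
// Keeps the lines whose "+<n>ms" duration is at or above the threshold.
// Lines without a duration are ignored.
function findSlowLines(lines: string[], thresholdMs: number): string[] {
  const elapsed = /\+(\d+)ms/;
  return lines.filter((line) => {
    const match = elapsed.exec(line);
    return match !== null && Number(match[1]) >= thresholdMs;
  });
}
```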

Rollback Procedure

If critical issues occur:

Quick Rollback

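# Revert the most recent commit
# (sufficient only if the refactor landed as a single, non-merge commit; otherwise use the full rollback)
git revert HEAD --no-edit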
pnpm install
pnpm build:mgnt
pm2 restart box-mgnt-api

Full Rollback

# Restore from backup tag
git reset --hard v1.0.0-pre-refactor
pnpm install
pnpm prisma:generate
pnpm build:mgnt
pm2 restart box-mgnt-api

Frontend Rollback (if deployed)

# Revert frontend changes to old response format
cd frontend-app
git revert <commit-hash>
pnpm install
pnpm build
pm2 restart frontend-app

Common Issues & Solutions

Issue: App won't start - "JWT_SECRET should not be empty"

Solution: Add JWT_SECRET to your .env file, set to a secure random string of at least 32 characters

Issue: Rate limiting too aggressive

Solution: Edit libs/common/src/guards/rate-limit.guard.ts:

private readonly limit = 20 // Increase from 10
private readonly windowMs = 120_000 // Increase to 2 minutes
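
To see how `limit` and `windowMs` interact, here is a minimal fixed-window counter. It is a sketch under the assumption that the guard counts requests per key inside a fixed window; the real guard in rate-limit.guard.ts may use a different algorithm. The class name and the injectable clock are hypothetical.

```typescript
class FixedWindowLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable so tests can control time
  }

  // Returns true when the request is allowed, false when it should get HTTP 429.
  allow(key: string): boolean {
    const t = this.now();
    const entry = this.hits.get(key);

    // No window yet, or the previous window has elapsed: start a new one.
    if (entry === undefined || t - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: t });
      return true;
    }

    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

Raising `limit` lets more requests through per window, while raising `windowMs` makes a blocked client wait longer before its counter resets, so the two changes pull in opposite directions for a user who has already been blocked.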

Issue: Frontend getting CORS errors

Solution: Ensure CORS is properly configured in main.ts:

app.enableCors({
  origin: true,
  credentials: true,
  exposedHeaders: ['x-request-id', 'x-correlation-id'],
});

Issue: Logs not showing correlation IDs

Solution: Update Pino config to include correlation ID:

// libs/common/src/config/pino.config.ts
serializers: {
  req(req) {
    return {
      ...req,
      correlationId: req.correlationId
    }
  }
}

Performance Tuning

Redis Rate Limiting (Production)

Replace in-memory rate limiter with Redis:

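# Install redis client (run in a shell)
pnpm add ioredis

// Update rate-limit.guard.ts
import { ExecutionContext, HttpException } from '@nestjs/common'
import Redis from 'ioredis'

const redis = new Redis(process.env.REDIS_URL)

async canActivate(context: ExecutionContext): Promise<boolean> {
  // Assumes an Express-style request object; adjust if the app runs on Fastify or behind a proxy
  const req = context.switchToHttp().getRequest()
  const key = `ratelimit:${req.ip}:${req.path}`
  const windowSeconds = Math.ceil(this.windowMs / 1000)
  const count = await redis.incr(key)

  // The first request in a window starts the expiry timer
  if (count === 1) {
    await redis.expire(key, windowSeconds)
  }

  if (count > this.limit) {
    const ttl = await redis.ttl(key) // seconds until the window resets
    throw new HttpException(...) // same arguments as the in-memory guard
  }

  return true
}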

Caching Layer

Add Redis caching for frequently accessed data:

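// Cache user permissions for 5 minutes
const cacheKey = `user:${userId}:permissions`;
const cached = await redis.get(cacheKey);

// Cache hit: the stored value is a JSON string, so parse it
if (cached) {
  return JSON.parse(cached);
}

// Cache miss: fetch, store as JSON, and return the object itself
// (calling JSON.parse on the freshly fetched object would throw)
const permissions = await this.fetchPermissions(userId);
await redis.setex(cacheKey, 300, JSON.stringify(permissions));

return permissions;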
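
The cache-aside pattern above can be wrapped in a reusable helper so each call site does not repeat the get/parse/set steps. The following is a sketch: `getOrLoad`, `TtlStore`, and `MemoryStore` are hypothetical names, and `TtlStore` only mirrors the two Redis calls used above (`get` and `setex`).

```typescript
// Minimal store interface shaped after the two Redis calls used above.
interface TtlStore {
  get(key: string): Promise<string | null>;
  setex(key: string, seconds: number, value: string): Promise<unknown>;
}

// Returns the cached value when present; otherwise loads, caches, and returns it.
async function getOrLoad<T>(
  store: TtlStore,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  const cached = await store.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as T;
  }
  const fresh = await load();
  await store.setex(key, ttlSeconds, JSON.stringify(fresh));
  return fresh;
}

// In-memory stand-in for Redis, for local experiments only (it ignores the TTL).
class MemoryStore implements TtlStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }

  async setex(key: string, _seconds: number, value: string): Promise<unknown> {
    this.data.set(key, value);
    return 'OK';
  }
}
```

Remember to delete the cached key whenever a user's roles or menus change; otherwise permission changes can take up to the TTL to become visible.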

Success Criteria

  • Application starts without errors
  • All endpoints return new response format
  • HTTP status codes match the outcome (error responses no longer return 200)
  • Rate limiting blocks excessive requests
  • Correlation IDs appear in logs and responses
  • Frontend successfully handles new response format
  • No performance degradation (response times similar)
  • Error rates remain stable or improve

Support Contacts

  • Backend Issues: [Your Team]
  • Frontend Integration: [Frontend Team]
  • DevOps/Deployment: [DevOps Team]
  • On-Call Escalation: [On-Call Contact]