MONGODB_REMOVAL_FEASIBILITY.md 10 KB

MongoDB Removal Feasibility

Feasibility Summary

  • Removing MongoDB + MongoStats is feasible but high-effort.
  • MongoDB is the primary store for ads, videos, categories, tags, channels, system params, and stats/events.
  • Code paths assume Mongo-native arrays/JSON and ObjectId semantics.
  • A MySQL-only approach requires a full schema replacement plus extensive query rewrites and migration tooling.

1) Inventory of Mongo/MongoStats Usage

Prisma clients

  • Mongo: @prisma/mongo/client via MongoPrismaService (libs/db/src/prisma/mongo-prisma.service.ts)
  • MongoStats: @prisma/mongo-stats/client via MongoStatsPrismaService (libs/db/src/prisma/mongo-stats-prisma.service.ts)
  • App-scoped services:
    • apps/box-app-api/src/prisma/prisma-mongo.service.ts
    • apps/box-app-api/src/prisma/prisma-mongo-stats.service.ts
    • apps/box-stats-api/src/prisma/prisma-mongo.service.ts

Mongo (main) schema usage

  • Ads / AdsModule (mgnt CRUD, app reads, recommendations, cache warmups)
    • apps/box-mgnt-api/src/mgnt-backend/feature/ads/ads.service.ts
    • apps/box-app-api/src/feature/ads/ad.service.ts
    • apps/box-app-api/src/feature/recommendation/recommendation.service.ts
    • libs/core/src/ad/*, libs/common/src/services/ad-pool.service.ts
  • Category / Tag / Channel (mgnt CRUD, app reads, cache builders)
    • apps/box-mgnt-api/src/mgnt-backend/feature/category/category.service.ts
    • apps/box-mgnt-api/src/mgnt-backend/feature/tag/tag.service.ts
    • apps/box-mgnt-api/src/mgnt-backend/feature/channel/channel.service.ts
    • apps/box-app-api/src/feature/video/video.service.ts
    • libs/core/src/cache/*
  • VideoMedia (mgnt CRUD, app reads, recommendation, stats aggregation, cache builders)
    • apps/box-mgnt-api/src/mgnt-backend/feature/video-media/video-media.service.ts
    • apps/box-app-api/src/feature/video/video.service.ts
    • apps/box-app-api/src/feature/recommendation/recommendation.service.ts
    • apps/box-stats-api/src/feature/stats-events/stats-aggregation.service.ts
    • libs/core/src/cache/video/*
  • SystemParam (mgnt CRUD + app reads)
    • apps/box-mgnt-api/src/mgnt-backend/feature/system-params/system-params.service.ts
    • apps/box-app-api/src/feature/sys-params/sys-params.service.ts
  • Sync services (write-heavy)
    • apps/box-mgnt-api/src/mgnt-backend/feature/sync-videomedia/sync-videomedia.service.ts
    • apps/box-mgnt-api/src/mgnt-backend/feature/provider-video-sync/provider-video-sync.service.ts
  • Cache tooling/dev tools
    • apps/box-mgnt-api/src/cache-sync/cache-sync.service.ts
    • apps/box-mgnt-api/src/dev/services/video-cache-coverage.service.ts
    • apps/box-mgnt-api/src/dev/services/video-stats.service.ts

MongoStats schema usage

  • User + login history
    • apps/box-stats-api/src/feature/user-login/user-login.service.ts
  • Raw events + processed message dedupe
    • apps/box-stats-api/src/feature/stats-events/stats-events.consumer.ts
  • Aggregated stats + Redis sync
    • apps/box-stats-api/src/feature/stats-events/stats-aggregation.service.ts

Mongo schemas defined but not used

  • Home collection (prisma/mongo/schema/home.prisma) — no code references found
  • AdsClickHistory (prisma/mongo-stats/schema/ads-click-history.prisma) — no code references found

2) Data Types Stored in Mongo and Usage Patterns

Core content (Mongo main)

  • Ads / AdsModule
    • Fields: ObjectId ids, adType enum, start/end BigInt epoch, content/cover/url
    • Patterns: mgnt CRUD; app reads for lists and recommendation; cache warmups populate Redis
  • Category
    • Fields: name, subtitle, seq, status, tagNames (String[]), tags (Json)
    • Patterns: mgnt CRUD; app reads categories; cache builders precompute lists
  • Tag
    • Fields: name, categoryId (ObjectId), seq/status
    • Patterns: mgnt CRUD; app reads tags by category; cache builders build tag lists
  • Channel
    • Fields: channelId (unique), name, URLs, tags/categories as Json/arrays
    • Patterns: mgnt CRUD; app auth uses channel lookup; cache builders
  • VideoMedia
    • Large document with arrays (categoryIds, tagIds, tags, actors, secondTags, appids, japanNames), string metadata, BigInt size, DateTime timestamps, tagsFlat
    • Patterns: mgnt CRUD & sync; app reads by IDs (often from Redis); recommendation queries; stats aggregation needs tagIds; cache builders depend on indices

System parameters (Mongo main)

  • SystemParam
    • Fields: id (int), side/type enums, name/value, BigInt timestamps
    • Patterns: mgnt CRUD; app reads for enums and ad modules

Stats/event data (MongoStats)

  • User & UserLoginHistory
    • Writes on login events (upsert user, insert login record)
    • Reads for stats aggregation/reporting (inferred via indexes)
  • Raw events: AdClickEvents, VideoClickEvents, AdImpressionEvents
    • Append-only writes from RabbitMQ consumer
    • Reads by aggregation jobs to compute scores and sync Redis
  • ProcessedMessage
    • Deduplication on messageId, created by consumer, deleted on failure
  • Aggregated stats: VideoGlobalStats, AdsGlobalStats
    • Upserted by aggregation jobs; read for scoring/sync to Redis
  • Unused: AdsClickHistory (no code usage)

Mongo-specific features used

  • ObjectId primary keys stored as strings in code (@db.ObjectId)
  • Arrays (String[], Int[]) and Json fields
  • aggregateRaw used in apps/box-app-api/src/feature/video/video.service.ts

3) Minimal MySQL Schema Proposal

Core content tables

  • ads
    • id CHAR(24) PRIMARY KEY, ads_module_id CHAR(24), advertiser, title, ads_content, ads_cover_img, ads_url, img_source, start_dt BIGINT, expiry_dt BIGINT, seq INT, status INT, create_at BIGINT, update_at BIGINT
    • Index: ads_module_id, status, start_dt, expiry_dt
  • ads_module
    • id CHAR(24) PRIMARY KEY, ad_type VARCHAR, ads_module VARCHAR, module_desc, seq INT
    • Unique: ad_type, ads_module
  • category
    • id CHAR(24) PRIMARY KEY, name, subtitle, seq, status, create_at BIGINT, update_at BIGINT
  • tag
    • id CHAR(24) PRIMARY KEY, name, category_id CHAR(24), seq, status, create_at BIGINT, update_at BIGINT
    • Index: category_id, status
  • channel
    • id CHAR(24) PRIMARY KEY, channel_id VARCHAR UNIQUE, name, landing_url, video_cdn, cover_cdn, client_name, client_notice, remark, create_at BIGINT, update_at BIGINT
  • system_param
    • id INT PRIMARY KEY, side, data_type, name, value, remark, create_at BIGINT, update_at BIGINT
    • Unique: (side, name)
  • video_media
    • id CHAR(24) PRIMARY KEY, core fields + tags_flat VARCHAR, list_status INT, edited_at BIGINT, img_source, provider fields, timestamps
    • Index: (list_status, tags_flat), added_time, updated_at

Join tables to replace arrays

  • video_media_category (video_media_id, category_id, PRIMARY KEY both)
    • Index on category_id
  • video_media_tag (video_media_id, tag_id, PRIMARY KEY both)
    • Index on tag_id
  • Optional denormalized tag names: video_media_tags as JSON if minimizing join complexity

Stats/event tables

  • user_stats
    • id CHAR(24) PRIMARY KEY, uid VARCHAR UNIQUE, ip, os, u_channel_id, machine, create_at BIGINT, last_login_at BIGINT
  • user_login_history
    • id CHAR(24) PRIMARY KEY, uid, ip, user_agent, app_version, os, u_channel_id, machine, create_at BIGINT, token_id
    • Index: (uid, create_at), (u_channel_id, create_at), (ip, create_at)
  • ad_click_events, video_click_events, ad_impression_events
    • Fields mirror MongoStats schema; index on (ad_id|video_id, time), (uid, time), (ad_type, time)
  • processed_messages
    • message_id VARCHAR UNIQUE, event_type, processed_at BIGINT, created_at BIGINT
  • ads_global_stats, video_global_stats
    • per-id unique rows, indexes on computed_score, computed_recency, computed_ctr, last_seen_at

Notes

  • Use CHAR(24) or BINARY(12) for former ObjectId values to preserve IDs used in APIs.
  • Arrays used in filters should become join tables; JSON columns can be a stopgap but complicate query parity.

4) Migration Plan (Phases, Risks, Rollback)

Phases

  1. Schema introduction
    • Add new MySQL tables (content + stats) while keeping Mongo/MongoStats active.
  2. Backfill
    • One-time migration from Mongo/MongoStats into MySQL (ads, videoMedia, tags, events, stats).
  3. Dual-write
    • Update write paths to write both Mongo and MySQL (mgnt CRUD, stats consumer, sync services).
  4. Read switch
    • Shift read paths (app APIs, cache builders, stats aggregation) to MySQL.
  5. Stabilization
    • Keep Mongo in read-only mode for verification.
  6. Decommission
    • Remove Mongo dependencies after parity signoff.

Key risks

  • Array semantics: Mongo arrays for categoryIds, tagIds, tagsFlat used in filters and Redis sync.
  • Aggregation logic: aggregateRaw in apps/box-app-api/src/feature/video/video.service.ts is Mongo-specific.
  • Event volume: stats events are append-heavy; MySQL indexing/partitioning needed for aggregation performance.
  • ID consistency: APIs and caches assume ObjectId string IDs.
  • Cache rebuilds depend on Mongo data shape and indexing; porting to MySQL must preserve ordering and filters.

Rollback strategy

  • Keep Mongo/MongoStats live during migration.
  • If errors: switch reads back to Mongo, stop MySQL writes, keep MySQL for postmortem.
  • Preserve dual-write toggles to flip back quickly.

5) Test Checklist for Parity

API parity

  • Ads API: /api/v1/ads/list, /api/v1/ads/click, /api/v1/ads/impression
  • Video API: /api/v1/video/list, /api/v1/video/search-by-tag, /api/v1/video/category/:id/latest, /api/v1/video/recommended
  • Homepage API: /api/v1/homepage (ads, categories, announcements, videos)
  • Recommendation APIs: /api/v1/recommend/*, /api/v1/recommendation/*

Mgnt parity

  • CRUD: ads, categories, tags, channels, system params, video media
  • Sync endpoints: provider sync and video sync flow
  • Cache rebuild endpoints: cache-sync and admin rebuild

Stats parity

  • RabbitMQ ingestion → event writes
  • Aggregation results: ads_global_stats, video_global_stats values and Redis scores
  • Redis sync results: ads:global:score, video:global:score, video:tag:*:score

Performance

  • Cache rebuild job duration and Redis write throughput
  • Video listing latency (paged list by category/tag)
  • Recommendation query latency (similar ads/videos)
  • Stats aggregation runtime with realistic event volumes

Data integrity

  • ID mapping unchanged (ObjectId format preserved)
  • Tags/categories relations consistent for video media
  • No loss of BigInt epoch precision in timestamps