# MongoDB Removal Feasibility ## Feasibility Summary - Removing MongoDB + MongoStats is feasible but high-effort. - MongoDB is the primary store for ads, videos, categories, tags, channels, system params, and stats/events. - Code paths assume Mongo-native arrays/JSON and ObjectId semantics. - A MySQL-only approach requires a full schema replacement plus extensive query rewrites and migration tooling. ## 1) Inventory of Mongo/MongoStats Usage ### Prisma clients - Mongo: `@prisma/mongo/client` via `MongoPrismaService` (`libs/db/src/prisma/mongo-prisma.service.ts`) - MongoStats: `@prisma/mongo-stats/client` via `MongoStatsPrismaService` (`libs/db/src/prisma/mongo-stats-prisma.service.ts`) - App-scoped services: - `apps/box-app-api/src/prisma/prisma-mongo.service.ts` - `apps/box-app-api/src/prisma/prisma-mongo-stats.service.ts` - `apps/box-stats-api/src/prisma/prisma-mongo.service.ts` ### Mongo (main) schema usage - Ads / AdsModule (mgnt CRUD, app reads, recommendations, cache warmups) - `apps/box-mgnt-api/src/mgnt-backend/feature/ads/ads.service.ts` - `apps/box-app-api/src/feature/ads/ad.service.ts` - `apps/box-app-api/src/feature/recommendation/recommendation.service.ts` - `libs/core/src/ad/*`, `libs/common/src/services/ad-pool.service.ts` - Category / Tag / Channel (mgnt CRUD, app reads, cache builders) - `apps/box-mgnt-api/src/mgnt-backend/feature/category/category.service.ts` - `apps/box-mgnt-api/src/mgnt-backend/feature/tag/tag.service.ts` - `apps/box-mgnt-api/src/mgnt-backend/feature/channel/channel.service.ts` - `apps/box-app-api/src/feature/video/video.service.ts` - `libs/core/src/cache/*` - VideoMedia (mgnt CRUD, app reads, recommendation, stats aggregation, cache builders) - `apps/box-mgnt-api/src/mgnt-backend/feature/video-media/video-media.service.ts` - `apps/box-app-api/src/feature/video/video.service.ts` - `apps/box-app-api/src/feature/recommendation/recommendation.service.ts` - `apps/box-stats-api/src/feature/stats-events/stats-aggregation.service.ts` - `libs/core/src/cache/video/*` - SystemParam (mgnt CRUD + app reads) - `apps/box-mgnt-api/src/mgnt-backend/feature/system-params/system-params.service.ts` - `apps/box-app-api/src/feature/sys-params/sys-params.service.ts` - Sync services (write-heavy) - `apps/box-mgnt-api/src/mgnt-backend/feature/sync-videomedia/sync-videomedia.service.ts` - `apps/box-mgnt-api/src/mgnt-backend/feature/provider-video-sync/provider-video-sync.service.ts` - Cache tooling/dev tools - `apps/box-mgnt-api/src/cache-sync/cache-sync.service.ts` - `apps/box-mgnt-api/src/dev/services/video-cache-coverage.service.ts` - `apps/box-mgnt-api/src/dev/services/video-stats.service.ts` ### MongoStats schema usage - User + login history - `apps/box-stats-api/src/feature/user-login/user-login.service.ts` - Raw events + processed message dedupe - `apps/box-stats-api/src/feature/stats-events/stats-events.consumer.ts` - Aggregated stats + Redis sync - `apps/box-stats-api/src/feature/stats-events/stats-aggregation.service.ts` ### Mongo schemas defined but not used - `Home` collection (`prisma/mongo/schema/home.prisma`) — no code references found - `AdsClickHistory` (`prisma/mongo-stats/schema/ads-click-history.prisma`) — no code references found ## 2) Data Types Stored in Mongo and Usage Patterns ### Core content (Mongo main) - Ads / AdsModule - Fields: ObjectId ids, adType enum, start/end BigInt epoch, content/cover/url - Patterns: mgnt CRUD; app reads for lists and recommendation; cache warmups populate Redis - Category - Fields: name, subtitle, seq, status, tagNames (String[]), tags (Json) - Patterns: mgnt CRUD; app reads categories; cache builders precompute lists - Tag - Fields: name, categoryId (ObjectId), seq/status - Patterns: mgnt CRUD; app reads tags by category; cache builders build tag lists - Channel - Fields: channelId (unique), name, URLs, tags/categories as Json/arrays - Patterns: mgnt CRUD; app auth uses channel lookup; cache builders - VideoMedia - Large document with arrays (categoryIds, tagIds, tags, actors, secondTags, appids, japanNames), string metadata, BigInt size, DateTime timestamps, tagsFlat - Patterns: mgnt CRUD & sync; app reads by IDs (often from Redis); recommendation queries; stats aggregation needs tagIds; cache builders depend on indices ### System parameters (Mongo main) - SystemParam - Fields: id (int), side/type enums, name/value, BigInt timestamps - Patterns: mgnt CRUD; app reads for enums and ad modules ### Stats/event data (MongoStats) - User & UserLoginHistory - Writes on login events (upsert user, insert login record) - Reads for stats aggregation/reporting (inferred via indexes) - Raw events: AdClickEvents, VideoClickEvents, AdImpressionEvents - Append-only writes from RabbitMQ consumer - Reads by aggregation jobs to compute scores and sync Redis - ProcessedMessage - Deduplication on messageId, created by consumer, deleted on failure - Aggregated stats: VideoGlobalStats, AdsGlobalStats - Upserted by aggregation jobs; read for scoring/sync to Redis - Unused: AdsClickHistory (no code usage) ### Mongo-specific features used - ObjectId primary keys stored as strings in code (`@db.ObjectId`) - Arrays (String[], Int[]) and Json fields - `aggregateRaw` used in `apps/box-app-api/src/feature/video/video.service.ts` ## 3) Minimal MySQL Schema Proposal ### Core content tables - `ads` - `id CHAR(24) PRIMARY KEY`, `ads_module_id CHAR(24)`, `advertiser`, `title`, `ads_content`, `ads_cover_img`, `ads_url`, `img_source`, `start_dt BIGINT`, `expiry_dt BIGINT`, `seq INT`, `status INT`, `create_at BIGINT`, `update_at BIGINT` - Index: `ads_module_id`, `status`, `start_dt`, `expiry_dt` - `ads_module` - `id CHAR(24) PRIMARY KEY`, `ad_type VARCHAR`, `ads_module VARCHAR`, `module_desc`, `seq INT` - Unique: `ad_type`, `ads_module` - `category` - `id CHAR(24) PRIMARY KEY`, `name`, `subtitle`, `seq`, `status`, `create_at BIGINT`, `update_at BIGINT` - `tag` - `id CHAR(24) PRIMARY KEY`, `name`, `category_id CHAR(24)`, `seq`, `status`, `create_at BIGINT`, `update_at BIGINT` - Index: `category_id`, `status` - `channel` - `id CHAR(24) PRIMARY KEY`, `channel_id VARCHAR UNIQUE`, `name`, `landing_url`, `video_cdn`, `cover_cdn`, `client_name`, `client_notice`, `remark`, `create_at BIGINT`, `update_at BIGINT` - `system_param` - `id INT PRIMARY KEY`, `side`, `data_type`, `name`, `value`, `remark`, `create_at BIGINT`, `update_at BIGINT` - Unique: `(side, name)` - `video_media` - `id CHAR(24) PRIMARY KEY`, core fields + `tags_flat VARCHAR`, `list_status INT`, `edited_at BIGINT`, `img_source`, provider fields, timestamps - Index: `(list_status, tags_flat)`, `added_time`, `updated_at` ### Join tables to replace arrays - `video_media_category` (`video_media_id`, `category_id`, PRIMARY KEY both) - Index on `category_id` - `video_media_tag` (`video_media_id`, `tag_id`, PRIMARY KEY both) - Index on `tag_id` - Optional denormalized tag names: `video_media_tags` as JSON if minimizing join complexity ### Stats/event tables - `user_stats` - `id CHAR(24) PRIMARY KEY`, `uid VARCHAR UNIQUE`, `ip`, `os`, `u_channel_id`, `machine`, `create_at BIGINT`, `last_login_at BIGINT` - `user_login_history` - `id CHAR(24) PRIMARY KEY`, `uid`, `ip`, `user_agent`, `app_version`, `os`, `u_channel_id`, `machine`, `create_at BIGINT`, `token_id` - Index: `(uid, create_at)`, `(u_channel_id, create_at)`, `(ip, create_at)` - `ad_click_events`, `video_click_events`, `ad_impression_events` - Fields mirror MongoStats schema; index on `(ad_id|video_id, time)`, `(uid, time)`, `(ad_type, time)` - `processed_messages` - `message_id VARCHAR UNIQUE`, `event_type`, `processed_at BIGINT`, `created_at BIGINT` - `ads_global_stats`, `video_global_stats` - per-id unique rows, indexes on `computed_score`, `computed_recency`, `computed_ctr`, `last_seen_at` ### Notes - Use `CHAR(24)` or `BINARY(12)` for former ObjectId values to preserve IDs used in APIs. - Arrays used in filters should become join tables; JSON columns can be a stopgap but complicate query parity. ## 4) Migration Plan (Phases, Risks, Rollback) ### Phases 1. Schema introduction - Add new MySQL tables (content + stats) while keeping Mongo/MongoStats active. 2. Backfill - One-time migration from Mongo/MongoStats into MySQL (ads, videoMedia, tags, events, stats). 3. Dual-write - Update write paths to write both Mongo and MySQL (mgnt CRUD, stats consumer, sync services). 4. Read switch - Shift read paths (app APIs, cache builders, stats aggregation) to MySQL. 5. Stabilization - Keep Mongo in read-only mode for verification. 6. Decommission - Remove Mongo dependencies after parity signoff. ### Key risks - Array semantics: Mongo arrays for `categoryIds`, `tagIds`, `tagsFlat` used in filters and Redis sync. - Aggregation logic: `aggregateRaw` in `apps/box-app-api/src/feature/video/video.service.ts` is Mongo-specific. - Event volume: stats events are append-heavy; MySQL indexing/partitioning needed for aggregation performance. - ID consistency: APIs and caches assume ObjectId string IDs. - Cache rebuilds depend on Mongo data shape and indexing; porting to MySQL must preserve ordering and filters. ### Rollback strategy - Keep Mongo/MongoStats live during migration. - If errors: switch reads back to Mongo, stop MySQL writes, keep MySQL for postmortem. - Preserve dual-write toggles to flip back quickly. ## 5) Test Checklist for Parity ### API parity - Ads API: `/api/v1/ads/list`, `/api/v1/ads/click`, `/api/v1/ads/impression` - Video API: `/api/v1/video/list`, `/api/v1/video/search-by-tag`, `/api/v1/video/category/:id/latest`, `/api/v1/video/recommended` - Homepage API: `/api/v1/homepage` (ads, categories, announcements, videos) - Recommendation APIs: `/api/v1/recommend/*`, `/api/v1/recommendation/*` ### Mgnt parity - CRUD: ads, categories, tags, channels, system params, video media - Sync endpoints: provider sync and video sync flow - Cache rebuild endpoints: cache-sync and admin rebuild ### Stats parity - RabbitMQ ingestion → event writes - Aggregation results: `ads_global_stats`, `video_global_stats` values and Redis scores - Redis sync results: `ads:global:score`, `video:global:score`, `video:tag:*:score` ### Performance - Cache rebuild job duration and Redis write throughput - Video listing latency (paged list by category/tag) - Recommendation query latency (similar ads/videos) - Stats aggregation runtime with realistic event volumes ### Data integrity - ID mapping unchanged (ObjectId format preserved) - Tags/categories relations consistent for video media - No loss of BigInt epoch precision in timestamps