MongoDB Removal Feasibility
Feasibility Summary
- Removing MongoDB + MongoStats is feasible but high-effort.
- MongoDB is the primary store for ads, videos, categories, tags, channels, system params, and stats/events.
- Code paths assume Mongo-native arrays/JSON and ObjectId semantics.
- A MySQL-only approach requires a full schema replacement plus extensive query rewrites and migration tooling.
1) Inventory of Mongo/MongoStats Usage
Prisma clients
- Mongo:
@prisma/mongo/client via MongoPrismaService (libs/db/src/prisma/mongo-prisma.service.ts)
- MongoStats:
@prisma/mongo-stats/client via MongoStatsPrismaService (libs/db/src/prisma/mongo-stats-prisma.service.ts)
- App-scoped services:
apps/box-app-api/src/prisma/prisma-mongo.service.ts
apps/box-app-api/src/prisma/prisma-mongo-stats.service.ts
apps/box-stats-api/src/prisma/prisma-mongo.service.ts
Mongo (main) schema usage
- Ads / AdsModule (mgnt CRUD, app reads, recommendations, cache warmups)
apps/box-mgnt-api/src/mgnt-backend/feature/ads/ads.service.ts
apps/box-app-api/src/feature/ads/ad.service.ts
apps/box-app-api/src/feature/recommendation/recommendation.service.ts
libs/core/src/ad/*, libs/common/src/services/ad-pool.service.ts
- Category / Tag / Channel (mgnt CRUD, app reads, cache builders)
apps/box-mgnt-api/src/mgnt-backend/feature/category/category.service.ts
apps/box-mgnt-api/src/mgnt-backend/feature/tag/tag.service.ts
apps/box-mgnt-api/src/mgnt-backend/feature/channel/channel.service.ts
apps/box-app-api/src/feature/video/video.service.ts
libs/core/src/cache/*
- VideoMedia (mgnt CRUD, app reads, recommendation, stats aggregation, cache builders)
apps/box-mgnt-api/src/mgnt-backend/feature/video-media/video-media.service.ts
apps/box-app-api/src/feature/video/video.service.ts
apps/box-app-api/src/feature/recommendation/recommendation.service.ts
apps/box-stats-api/src/feature/stats-events/stats-aggregation.service.ts
libs/core/src/cache/video/*
- SystemParam (mgnt CRUD + app reads)
apps/box-mgnt-api/src/mgnt-backend/feature/system-params/system-params.service.ts
apps/box-app-api/src/feature/sys-params/sys-params.service.ts
- Sync services (write-heavy)
apps/box-mgnt-api/src/mgnt-backend/feature/sync-videomedia/sync-videomedia.service.ts
apps/box-mgnt-api/src/mgnt-backend/feature/provider-video-sync/provider-video-sync.service.ts
- Cache tooling/dev tools
apps/box-mgnt-api/src/cache-sync/cache-sync.service.ts
apps/box-mgnt-api/src/dev/services/video-cache-coverage.service.ts
apps/box-mgnt-api/src/dev/services/video-stats.service.ts
MongoStats schema usage
- User + login history
apps/box-stats-api/src/feature/user-login/user-login.service.ts
- Raw events + processed message dedupe
apps/box-stats-api/src/feature/stats-events/stats-events.consumer.ts
- Aggregated stats + Redis sync
apps/box-stats-api/src/feature/stats-events/stats-aggregation.service.ts
Mongo schemas defined but not used
Home collection (prisma/mongo/schema/home.prisma) — no code references found
AdsClickHistory (prisma/mongo-stats/schema/ads-click-history.prisma) — no code references found
2) Data Types Stored in Mongo and Usage Patterns
Core content (Mongo main)
- Ads / AdsModule
- Fields: ObjectId ids, adType enum, start/end BigInt epoch, content/cover/url
- Patterns: mgnt CRUD; app reads for lists and recommendation; cache warmups populate Redis
- Category
- Fields: name, subtitle, seq, status, tagNames (String[]), tags (Json)
- Patterns: mgnt CRUD; app reads categories; cache builders precompute lists
- Tag
- Fields: name, categoryId (ObjectId), seq/status
- Patterns: mgnt CRUD; app reads tags by category; cache builders build tag lists
- Channel
- Fields: channelId (unique), name, URLs, tags/categories as Json/arrays
- Patterns: mgnt CRUD; app auth uses channel lookup; cache builders
- VideoMedia
- Large document with arrays (categoryIds, tagIds, tags, actors, secondTags, appids, japanNames), string metadata, BigInt size, DateTime timestamps, tagsFlat
- Patterns: mgnt CRUD & sync; app reads by IDs (often from Redis); recommendation queries; stats aggregation needs tagIds; cache builders depend on indices
System parameters (Mongo main)
- SystemParam
- Fields: id (int), side/type enums, name/value, BigInt timestamps
- Patterns: mgnt CRUD; app reads for enums and ad modules
Stats/event data (MongoStats)
- User & UserLoginHistory
- Writes on login events (upsert user, insert login record)
- Reads for stats aggregation/reporting (inferred via indexes)
- Raw events: AdClickEvents, VideoClickEvents, AdImpressionEvents
- Append-only writes from RabbitMQ consumer
- Reads by aggregation jobs to compute scores and sync Redis
- ProcessedMessage
- Deduplication on messageId, created by consumer, deleted on failure
- Aggregated stats: VideoGlobalStats, AdsGlobalStats
- Upserted by aggregation jobs; read for scoring/sync to Redis
- Unused: AdsClickHistory (no code usage)
Mongo-specific features used
- ObjectId primary keys stored as strings in code (
@db.ObjectId)
- Arrays (String[], Int[]) and Json fields
aggregateRaw used in apps/box-app-api/src/feature/video/video.service.ts
3) Minimal MySQL Schema Proposal
Core content tables
ads
id CHAR(24) PRIMARY KEY, ads_module_id CHAR(24), advertiser, title, ads_content, ads_cover_img, ads_url, img_source, start_dt BIGINT, expiry_dt BIGINT, seq INT, status INT, create_at BIGINT, update_at BIGINT
- Index:
ads_module_id, status, start_dt, expiry_dt
ads_module
id CHAR(24) PRIMARY KEY, ad_type VARCHAR, ads_module VARCHAR, module_desc, seq INT
- Unique:
ad_type, ads_module
category
id CHAR(24) PRIMARY KEY, name, subtitle, seq, status, create_at BIGINT, update_at BIGINT
tag
id CHAR(24) PRIMARY KEY, name, category_id CHAR(24), seq, status, create_at BIGINT, update_at BIGINT
- Index:
category_id, status
channel
id CHAR(24) PRIMARY KEY, channel_id VARCHAR UNIQUE, name, landing_url, video_cdn, cover_cdn, client_name, client_notice, remark, create_at BIGINT, update_at BIGINT
system_param
id INT PRIMARY KEY, side, data_type, name, value, remark, create_at BIGINT, update_at BIGINT
- Unique:
(side, name)
video_media
id CHAR(24) PRIMARY KEY, core fields + tags_flat VARCHAR, list_status INT, edited_at BIGINT, img_source, provider fields, timestamps
- Index:
(list_status, tags_flat), added_time, updated_at
Join tables to replace arrays
video_media_category (video_media_id, category_id, PRIMARY KEY both)
video_media_tag (video_media_id, tag_id, PRIMARY KEY both)
- Optional denormalized tag names:
video_media_tags as JSON if minimizing join complexity
Stats/event tables
user_stats
id CHAR(24) PRIMARY KEY, uid VARCHAR UNIQUE, ip, os, u_channel_id, machine, create_at BIGINT, last_login_at BIGINT
user_login_history
id CHAR(24) PRIMARY KEY, uid, ip, user_agent, app_version, os, u_channel_id, machine, create_at BIGINT, token_id
- Index:
(uid, create_at), (u_channel_id, create_at), (ip, create_at)
ad_click_events, video_click_events, ad_impression_events
- Fields mirror MongoStats schema; index on
(ad_id|video_id, time), (uid, time), (ad_type, time)
processed_messages
message_id VARCHAR UNIQUE, event_type, processed_at BIGINT, created_at BIGINT
ads_global_stats, video_global_stats
- per-id unique rows, indexes on
computed_score, computed_recency, computed_ctr, last_seen_at
Notes
- Use
CHAR(24) or BINARY(12) for former ObjectId values to preserve IDs used in APIs.
- Arrays used in filters should become join tables; JSON columns can be a stopgap but complicate query parity.
4) Migration Plan (Phases, Risks, Rollback)
Phases
- Schema introduction
- Add new MySQL tables (content + stats) while keeping Mongo/MongoStats active.
- Backfill
- One-time migration from Mongo/MongoStats into MySQL (ads, videoMedia, tags, events, stats).
- Dual-write
- Update write paths to write both Mongo and MySQL (mgnt CRUD, stats consumer, sync services).
- Read switch
- Shift read paths (app APIs, cache builders, stats aggregation) to MySQL.
- Stabilization
- Keep Mongo in read-only mode for verification.
- Decommission
- Remove Mongo dependencies after parity signoff.
Key risks
- Array semantics: Mongo arrays for
categoryIds, tagIds, tagsFlat used in filters and Redis sync.
- Aggregation logic:
aggregateRaw in apps/box-app-api/src/feature/video/video.service.ts is Mongo-specific.
- Event volume: stats events are append-heavy; MySQL indexing/partitioning needed for aggregation performance.
- ID consistency: APIs and caches assume ObjectId string IDs.
- Cache rebuilds depend on Mongo data shape and indexing; porting to MySQL must preserve ordering and filters.
Rollback strategy
- Keep Mongo/MongoStats live during migration.
- If errors: switch reads back to Mongo, stop MySQL writes, keep MySQL for postmortem.
- Preserve dual-write toggles to flip back quickly.
5) Test Checklist for Parity
API parity
- Ads API:
/api/v1/ads/list, /api/v1/ads/click, /api/v1/ads/impression
- Video API:
/api/v1/video/list, /api/v1/video/search-by-tag, /api/v1/video/category/:id/latest, /api/v1/video/recommended
- Homepage API:
/api/v1/homepage (ads, categories, announcements, videos)
- Recommendation APIs:
/api/v1/recommend/*, /api/v1/recommendation/*
Mgnt parity
- CRUD: ads, categories, tags, channels, system params, video media
- Sync endpoints: provider sync and video sync flow
- Cache rebuild endpoints: cache-sync and admin rebuild
Stats parity
- RabbitMQ ingestion → event writes
- Aggregation results:
ads_global_stats, video_global_stats values and Redis scores
- Redis sync results:
ads:global:score, video:global:score, video:tag:*:score
Performance
- Cache rebuild job duration and Redis write throughput
- Video listing latency (paged list by category/tag)
- Recommendation query latency (similar ads/videos)
- Stats aggregation runtime with realistic event volumes
Data integrity
- ID mapping unchanged (ObjectId format preserved)
- Tags/categories relations consistent for video media
- No loss of BigInt epoch precision in timestamps