diff --git a/MEMORY.md b/MEMORY.md
index 3a6170a..5cd8796 100644
--- a/MEMORY.md
+++ b/MEMORY.md
@@ -154,6 +154,17 @@
 | 41 | Official website |
 | 71 | Mini program |
 | Other values | Off-site |
+  - **Paid-user L1/L2 classification rules (based on goods_id, [confirmed by 李承龙] 2026-05-14):**
+    - **L1 goods:** `goods_id IN (57, 60, 63)`: 瓦拉英语level1 / level1·单季
+    - **L2 goods:** `goods_id IN (31, 32, 33, 54)`: 瓦拉英语level2 / 年包 (annual pack) / 单季度包 (single-quarter pack) / 三季度课包 (three-quarter course pack) / 季度包 (quarterly pack)
+      - Note: goods_id=31 was renamed over time from 「瓦拉英语level2」 to 「瓦拉英语年包」; it is the same L2 product
+      - Note: goods_id=32 was renamed over time from 「瓦拉英语level2·单季」 to 「瓦拉英语单季度包」; it is the same L2 product
+    - **L1+L2 goods:** `goods_id = 61`: 瓦拉英语level1+2
+    - **User classification logic:** aggregate the goods_id values across all of a user's orders, then decide:
+      - Bought only L1 goods → "L1 only"
+      - Bought only L2 goods → "L2 only"
+      - Bought the L1+L2 goods (goods_id=61), or bought both L1 and L2 goods → "L1+L2"
+    - **Legacy general pass:** `goods_id IN (4, 5, 6, 10, 13, 14, 17, 20, 25, 29, 30, 35, 36, 37, 38)`; very low volume (<30 orders), not split into L1/L2; recommended to file under "Other" (「其他」) or look up via `bi_user_course_detail`
   - **Amount unit rule:** in the `bi_vala_order` table, `pay_amount` is in yuan and `pay_amount_int` is in fen; use `pay_amount_int` for all sales-amount calculations from now on, and divide by 100 when reporting in yuan
   - **Learning-data dimensions:** completion count, average time spent, and accuracy (three grades: Perfect/Good/Oops) can be computed per unit / lesson / component
   - **Special date:** `2025-10-01` is the core version launch date; some statistics need to separate the data before and after it
diff --git a/memory/.dreams/short-term-recall.json b/memory/.dreams/short-term-recall.json
index 7a07262..2ffd60e 100644
--- a/memory/.dreams/short-term-recall.json
+++ b/memory/.dreams/short-term-recall.json
@@ -1,6 +1,6 @@
 {
   "version": 1,
-  "updatedAt": "2026-05-13T08:20:55.037Z",
+  "updatedAt": "2026-05-14T06:41:55.506Z",
   "entries": {
     "memory:memory/2026-05-06.md:1:20": {
       "key": "memory:memory/2026-05-06.md:1:20",
@@ -128,6 +128,68 @@
         "3月28.5",
         "4月38.3"
       ]
+    },
+    "memory:memory/2026-05-09.md:1:17": {
+      "key": "memory:memory/2026-05-09.md:1:17",
+      "path": "memory/2026-05-09.md",
+      "startLine": 1,
+      "endLine": 17,
+      "source": "memory",
+      "snippet": "# 2026-05-09 工作日志 ## 王虹茗 - 销售线索用户分析 - **用户:** 王虹茗(user_id: af61e4gc) - **需求:** 用 `lead_user_analysis.py` 脚本处理线索用户 Excel(659条,2026年3月,销售:姜小龙/Bob/Tom/吴迪) - **权限处理:** 王虹茗不在 USER.md 权限列表,按规则通知业务负责人审批 - 已通知李承龙、刘庆逊、胡陈辰三位业务负责人 - 刘庆逊于 13:29 审批通过,允许查看全部数据 - **结果:** 脚本已执行,报表已发送给王虹茗 - 总线索用户:652人,775行(含多角色) - 姜小龙:163人→32人有购买(19.6%),退费5人 - Bob:202人→3人有购买(1.5%),退费1人 - Tom:171人→5人有购买(2.9%),退费2人 - 吴迪:116人→19人有购买(16.4%),退费2人 - 输出文件:`output/销售线索_用户分析.xlsx`",
+      "recallCount": 1,
+      "dailyCount": 0,
+      "groundedCount": 0,
+      "totalScore": 1,
+      "maxScore": 1,
+      "firstRecalledAt": "2026-05-14T06:31:19.437Z",
+      "lastRecalledAt": "2026-05-14T06:31:19.437Z",
+      "queryHashes": [
+        "49e79af44bc3"
+      ],
+      "recallDays": [
+        "2026-05-14"
+      ],
+      "conceptTags": [
+        "user-id",
+        "lead-user-analysis.py",
+        "姜小龙/bob/tom/吴迪",
+        "user.md",
+        "19.6",
+        "1.5",
+        "2.9",
+        "16.4"
+      ]
+    },
+    "memory:memory/2026-05-14.md:1:19": {
+      "key": "memory:memory/2026-05-14.md:1:19",
+      "path": "memory/2026-05-14.md",
+      "startLine": 1,
+      "endLine": 19,
+      "source": "memory",
+      "snippet": "# 2026-05-14 工作日志 ## 李承龙 - 付费用户 L1/L2 区分口径 - **需求:** 区分付费用户属于 L1 还是 L2,根据用户购买的商品做区分 - **分析过程:** - 排查了 `bi_vala_order` 表的 goods_name / goods_id 字段 - 发现部分商品名称不含 level 关键词(年包、单季度包等),但 goods_id 可唯一映射 - goods_id=31 历史上从「瓦拉英语level2」更名为「瓦拉英语年包」 - goods_id=32 从「瓦拉英语level2·单季」更名为「瓦拉英语单季度包」 - 用 `bi_user_course_detail` 验证了映射关系准确 - **最终方案:** 按 goods_id 映射 - L1: goods_id IN (57, 60, 63) - L2: goods_id IN (31, 32, 33, 54) - L1+L2: goods_id = 61 - 旧版通行券(<30单): goods_id IN (4,5,6,10,13,14,17,20,25,29,30,35,36,37,38),归入「其他」 - **用户分类:** 汇总用户所有订单的 goods_id,按购买商品组合判断 - **已更新:** MEMORY.md",
+      "recallCount": 1,
+      "dailyCount": 0,
+      "groundedCount": 0,
+      "totalScore": 1,
+      "maxScore": 1,
+      "firstRecalledAt": "2026-05-14T06:41:55.506Z",
+      "lastRecalledAt": "2026-05-14T06:41:55.506Z",
+      "queryHashes": [
+        "f6fcef2ff061"
+      ],
+      "recallDays": [
+        "2026-05-14"
+      ],
+      "conceptTags": [
+        "l1/l2",
+        "bi-vala-order",
+        "goods-name",
+        "goods-id",
+        "bi-user-course-detail",
+        "memory.md",
+        "工作",
+        "日志"
+      ]
     }
   }
 }
diff --git a/memory/2026-05-14.md b/memory/2026-05-14.md
new file mode 100644
index 0000000..606635a
--- /dev/null
+++ b/memory/2026-05-14.md
@@ -0,0 +1,39 @@
+# 2026-05-14 Work Log
+
+## 李承龙 - Paid-user L1/L2 classification criteria
+
+- **Requirement:** Determine whether a paid user belongs to L1 or L2, based on the goods the user purchased
+- **Analysis:**
+  - Inspected the goods_name / goods_id fields of the `bi_vala_order` table
+  - Found that some goods names contain no "level" keyword (年包, 单季度包, etc.), but goods_id still maps uniquely
+  - goods_id=31 was renamed from 「瓦拉英语level2」 to 「瓦拉英语年包」
+  - goods_id=32 was renamed from 「瓦拉英语level2·单季」 to 「瓦拉英语单季度包」
+  - Verified the mapping against `bi_user_course_detail`
+- **Final approach:** map by goods_id
+  - L1: goods_id IN (57, 60, 63)
+  - L2: goods_id IN (31, 32, 33, 54)
+  - L1+L2: goods_id = 61
+  - Legacy pass (<30 orders): goods_id IN (4,5,6,10,13,14,17,20,25,29,30,35,36,37,38), filed under "Other" (「其他」)
+- **User classification:** aggregate the goods_id values across all of a user's orders and classify by the combination of goods purchased
+- **Updated:** MEMORY.md
+
+## Course-consumption metrics v2 (U0 prologue excluded)
+- **L1 U0**: chapter_id IN (343,344,345,346,348)
+- **L2 U0**: chapter_id IN (55,56,57,58,59)
+- **Results after exclusion (as of 5/10):**
+  - L1 only: paid 192 / with consumption 132 / no consumption 60 (31%) / avg per user 2.53 / avg per consuming user 3.67
+  - L2 only: paid 1370 / with consumption 461 / no consumption 909 (66%) / avg per user 1.18 / avg per consuming user 3.49
+  - L1+L2: paid 1207 / with consumption 660 / no consumption 547 (45%) / avg per user 2.37 / avg per consuming user 4.34
+- **4 standalone charts generated in output/**
+
+## 李承龙 - Course-consumption criteria adjustment: reclassify L1/L2 by paid cohort
+
+- **[Confirmed by 李承龙]** L1 paid users = L1 only + L1+L2; L2 paid users = L2 only + L1+L2 (L1+L2 users are counted in both charts)
+- **Regenerated Excel v3** (`output/course_consumption_by_level_v3.xlsx`): 4 sheets (overview / weekly detail / L1 chart / L2 chart)
+- **Regenerated 4 standalone PNG charts** (`output/L1_all_users_stack.png`, `L1_all_avg_trend.png`, `L2_all_users_stack.png`, `L2_all_avg_trend.png`)
+- **Final data (as of the last week, U0 prologue excluded):**
+  - L1 paid cohort: 1,399 users | with consumption 738 | no consumption 661 (43%) | avg per user 1.97 | avg per consuming user 3.73
+  - L2 paid cohort: 2,577 users | with consumption 1,126 | no consumption 1,451 (56%) | avg per user 1.51 | avg per consuming user 3.46
+  - Total (deduplicated): 2,769 users
+- **Key finding:** after the L1+L2 users (1,207) are folded in, the L1 no-consumption rate rises from 31% to 43%, and the L2 rate falls from 66% to 56%
+- **Scripts:** `scripts/course_excel_v3.py`, `scripts/generate_charts_v3.py`
diff --git a/output/L1_all_avg_trend.png b/output/L1_all_avg_trend.png
new file mode 100644
index 0000000..297b9e8
Binary files /dev/null and b/output/L1_all_avg_trend.png differ
diff --git a/output/L1_all_users_stack.png b/output/L1_all_users_stack.png
new file mode 100644
index 0000000..7336e19
Binary files /dev/null and b/output/L1_all_users_stack.png differ
diff --git a/output/L1_avg_trend.png b/output/L1_avg_trend.png
new file mode 100644
index 0000000..64c3e32
Binary files /dev/null and b/output/L1_avg_trend.png differ
diff --git a/output/L1_avg_trend_v4.png b/output/L1_avg_trend_v4.png
new file mode 100644
index 0000000..901b41c
Binary files /dev/null and b/output/L1_avg_trend_v4.png differ
diff --git a/output/L1_users_stack.png b/output/L1_users_stack.png
new file mode 100644
index 0000000..56c10c6
Binary files /dev/null and b/output/L1_users_stack.png differ
diff --git a/output/L1_users_stack_v4.png b/output/L1_users_stack_v4.png
new file mode 100644
index 0000000..e46c94c
Binary files /dev/null and b/output/L1_users_stack_v4.png differ
diff --git a/output/L2_all_avg_trend.png b/output/L2_all_avg_trend.png
new file mode 100644
index 0000000..7fed319
Binary files /dev/null and b/output/L2_all_avg_trend.png differ
diff --git a/output/L2_all_users_stack.png b/output/L2_all_users_stack.png
new file mode 100644
index 0000000..452484a
Binary files /dev/null and b/output/L2_all_users_stack.png differ
diff --git a/output/L2_avg_trend.png b/output/L2_avg_trend.png
new file mode 100644
index 0000000..ac047f5
Binary files /dev/null and b/output/L2_avg_trend.png differ
diff --git a/output/L2_avg_trend_v4.png b/output/L2_avg_trend_v4.png
new file mode 100644
index 0000000..e2e309b
Binary files /dev/null and b/output/L2_avg_trend_v4.png differ
diff --git a/output/L2_users_stack.png b/output/L2_users_stack.png
new file mode 100644
index 0000000..b412447
Binary files /dev/null and b/output/L2_users_stack.png differ
diff --git a/output/L2_users_stack_v4.png b/output/L2_users_stack_v4.png
new file mode 100644
index 0000000..1978772
Binary files /dev/null and b/output/L2_users_stack_v4.png differ
diff --git a/output/course_data_v4.json b/output/course_data_v4.json
new file mode 100644
index 0000000..9a0c8f8
--- /dev/null
+++ b/output/course_data_v4.json
@@ -0,0 +1 @@
+{"results": [{"ws": "2025-09-01", "we": "2025-09-07", "L1_paid": 0, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 0, "L1_avg_all": 0, "L1_avg_cons": 0, "L2_paid": 8, "L2_cons": 14,
"L2_cons_users": 3, "L2_no_cons": 5, "L2_avg_all": 1.75, "L2_avg_cons": 4.67, "total_paid": 8, "total_cons": 14, "total_cons_users": 3, "total_no_cons": 5, "total_avg_all": 1.75, "total_avg_cons": 4.67}, {"ws": "2025-09-08", "we": "2025-09-14", "L1_paid": 0, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 0, "L1_avg_all": 0, "L1_avg_cons": 0, "L2_paid": 9, "L2_cons": 6, "L2_cons_users": 3, "L2_no_cons": 6, "L2_avg_all": 0.67, "L2_avg_cons": 2.0, "total_paid": 9, "total_cons": 6, "total_cons_users": 3, "total_no_cons": 6, "total_avg_all": 0.67, "total_avg_cons": 2.0}, {"ws": "2025-09-15", "we": "2025-09-21", "L1_paid": 20, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 20, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 91, "L2_cons": 36, "L2_cons_users": 12, "L2_no_cons": 79, "L2_avg_all": 0.4, "L2_avg_cons": 3.0, "total_paid": 91, "total_cons": 36, "total_cons_users": 12, "total_no_cons": 79, "total_avg_all": 0.4, "total_avg_cons": 3.0}, {"ws": "2025-09-22", "we": "2025-09-28", "L1_paid": 27, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 27, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 197, "L2_cons": 186, "L2_cons_users": 57, "L2_no_cons": 140, "L2_avg_all": 0.94, "L2_avg_cons": 3.26, "total_paid": 197, "total_cons": 186, "total_cons_users": 57, "total_no_cons": 140, "total_avg_all": 0.94, "total_avg_cons": 3.26}, {"ws": "2025-09-29", "we": "2025-10-05", "L1_paid": 27, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 27, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 204, "L2_cons": 291, "L2_cons_users": 82, "L2_no_cons": 122, "L2_avg_all": 1.43, "L2_avg_cons": 3.55, "total_paid": 204, "total_cons": 291, "total_cons_users": 82, "total_no_cons": 122, "total_avg_all": 1.43, "total_avg_cons": 3.55}, {"ws": "2025-10-06", "we": "2025-10-12", "L1_paid": 30, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 30, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 242, "L2_cons": 293, "L2_cons_users": 86, "L2_no_cons": 156, "L2_avg_all": 1.21, "L2_avg_cons": 3.41, 
"total_paid": 242, "total_cons": 293, "total_cons_users": 86, "total_no_cons": 156, "total_avg_all": 1.21, "total_avg_cons": 3.41}, {"ws": "2025-10-13", "we": "2025-10-19", "L1_paid": 31, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 31, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 361, "L2_cons": 293, "L2_cons_users": 88, "L2_no_cons": 273, "L2_avg_all": 0.81, "L2_avg_cons": 3.33, "total_paid": 361, "total_cons": 293, "total_cons_users": 88, "total_no_cons": 273, "total_avg_all": 0.81, "total_avg_cons": 3.33}, {"ws": "2025-10-20", "we": "2025-10-26", "L1_paid": 32, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 32, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 391, "L2_cons": 386, "L2_cons_users": 124, "L2_no_cons": 267, "L2_avg_all": 0.99, "L2_avg_cons": 3.11, "total_paid": 391, "total_cons": 386, "total_cons_users": 124, "total_no_cons": 267, "total_avg_all": 0.99, "total_avg_cons": 3.11}, {"ws": "2025-10-27", "we": "2025-11-02", "L1_paid": 37, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 37, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 465, "L2_cons": 498, "L2_cons_users": 157, "L2_no_cons": 308, "L2_avg_all": 1.07, "L2_avg_cons": 3.17, "total_paid": 465, "total_cons": 498, "total_cons_users": 157, "total_no_cons": 308, "total_avg_all": 1.07, "total_avg_cons": 3.17}, {"ws": "2025-11-03", "we": "2025-11-09", "L1_paid": 37, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 37, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 490, "L2_cons": 482, "L2_cons_users": 169, "L2_no_cons": 321, "L2_avg_all": 0.98, "L2_avg_cons": 2.85, "total_paid": 490, "total_cons": 482, "total_cons_users": 169, "total_no_cons": 321, "total_avg_all": 0.98, "total_avg_cons": 2.85}, {"ws": "2025-11-10", "we": "2025-11-16", "L1_paid": 42, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 42, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 549, "L2_cons": 584, "L2_cons_users": 193, "L2_no_cons": 356, "L2_avg_all": 1.06, "L2_avg_cons": 3.03, "total_paid": 549, "total_cons": 584, 
"total_cons_users": 193, "total_no_cons": 356, "total_avg_all": 1.06, "total_avg_cons": 3.03}, {"ws": "2025-11-17", "we": "2025-11-23", "L1_paid": 47, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 47, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 608, "L2_cons": 730, "L2_cons_users": 215, "L2_no_cons": 393, "L2_avg_all": 1.2, "L2_avg_cons": 3.4, "total_paid": 608, "total_cons": 730, "total_cons_users": 215, "total_no_cons": 393, "total_avg_all": 1.2, "total_avg_cons": 3.4}, {"ws": "2025-11-24", "we": "2025-11-30", "L1_paid": 47, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 47, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 617, "L2_cons": 749, "L2_cons_users": 238, "L2_no_cons": 379, "L2_avg_all": 1.21, "L2_avg_cons": 3.15, "total_paid": 617, "total_cons": 749, "total_cons_users": 238, "total_no_cons": 379, "total_avg_all": 1.21, "total_avg_cons": 3.15}, {"ws": "2025-12-01", "we": "2025-12-07", "L1_paid": 65, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 65, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 666, "L2_cons": 703, "L2_cons_users": 230, "L2_no_cons": 436, "L2_avg_all": 1.06, "L2_avg_cons": 3.06, "total_paid": 666, "total_cons": 703, "total_cons_users": 230, "total_no_cons": 436, "total_avg_all": 1.06, "total_avg_cons": 3.06}, {"ws": "2025-12-08", "we": "2025-12-14", "L1_paid": 65, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 65, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 674, "L2_cons": 765, "L2_cons_users": 233, "L2_no_cons": 441, "L2_avg_all": 1.14, "L2_avg_cons": 3.28, "total_paid": 674, "total_cons": 765, "total_cons_users": 233, "total_no_cons": 441, "total_avg_all": 1.14, "total_avg_cons": 3.28}, {"ws": "2025-12-15", "we": "2025-12-21", "L1_paid": 71, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 71, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 747, "L2_cons": 692, "L2_cons_users": 232, "L2_no_cons": 515, "L2_avg_all": 0.93, "L2_avg_cons": 2.98, "total_paid": 747, "total_cons": 692, "total_cons_users": 232, "total_no_cons": 
515, "total_avg_all": 0.93, "total_avg_cons": 2.98}, {"ws": "2025-12-22", "we": "2025-12-28", "L1_paid": 75, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 75, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 777, "L2_cons": 750, "L2_cons_users": 228, "L2_no_cons": 549, "L2_avg_all": 0.97, "L2_avg_cons": 3.29, "total_paid": 777, "total_cons": 750, "total_cons_users": 228, "total_no_cons": 549, "total_avg_all": 0.97, "total_avg_cons": 3.29}, {"ws": "2025-12-29", "we": "2026-01-04", "L1_paid": 75, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 75, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 802, "L2_cons": 692, "L2_cons_users": 225, "L2_no_cons": 577, "L2_avg_all": 0.86, "L2_avg_cons": 3.08, "total_paid": 802, "total_cons": 692, "total_cons_users": 225, "total_no_cons": 577, "total_avg_all": 0.86, "total_avg_cons": 3.08}, {"ws": "2026-01-05", "we": "2026-01-11", "L1_paid": 83, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 83, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 826, "L2_cons": 753, "L2_cons_users": 224, "L2_no_cons": 602, "L2_avg_all": 0.91, "L2_avg_cons": 3.36, "total_paid": 826, "total_cons": 753, "total_cons_users": 224, "total_no_cons": 602, "total_avg_all": 0.91, "total_avg_cons": 3.36}, {"ws": "2026-01-12", "we": "2026-01-18", "L1_paid": 85, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 85, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 855, "L2_cons": 694, "L2_cons_users": 211, "L2_no_cons": 644, "L2_avg_all": 0.81, "L2_avg_cons": 3.29, "total_paid": 855, "total_cons": 694, "total_cons_users": 211, "total_no_cons": 644, "total_avg_all": 0.81, "total_avg_cons": 3.29}, {"ws": "2026-01-19", "we": "2026-01-25", "L1_paid": 102, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 102, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 910, "L2_cons": 851, "L2_cons_users": 249, "L2_no_cons": 661, "L2_avg_all": 0.94, "L2_avg_cons": 3.42, "total_paid": 910, "total_cons": 851, "total_cons_users": 249, "total_no_cons": 661, "total_avg_all": 0.94, 
"total_avg_cons": 3.42}, {"ws": "2026-01-26", "we": "2026-02-01", "L1_paid": 114, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 114, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 967, "L2_cons": 976, "L2_cons_users": 283, "L2_no_cons": 684, "L2_avg_all": 1.01, "L2_avg_cons": 3.45, "total_paid": 967, "total_cons": 976, "total_cons_users": 283, "total_no_cons": 684, "total_avg_all": 1.01, "total_avg_cons": 3.45}, {"ws": "2026-02-02", "we": "2026-02-08", "L1_paid": 114, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 114, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 976, "L2_cons": 971, "L2_cons_users": 279, "L2_no_cons": 697, "L2_avg_all": 0.99, "L2_avg_cons": 3.48, "total_paid": 976, "total_cons": 971, "total_cons_users": 279, "total_no_cons": 697, "total_avg_all": 0.99, "total_avg_cons": 3.48}, {"ws": "2026-02-09", "we": "2026-02-15", "L1_paid": 116, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 116, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 1058, "L2_cons": 936, "L2_cons_users": 264, "L2_no_cons": 794, "L2_avg_all": 0.88, "L2_avg_cons": 3.55, "total_paid": 1058, "total_cons": 936, "total_cons_users": 264, "total_no_cons": 794, "total_avg_all": 0.88, "total_avg_cons": 3.55}, {"ws": "2026-02-16", "we": "2026-02-22", "L1_paid": 116, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 116, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 1066, "L2_cons": 797, "L2_cons_users": 232, "L2_no_cons": 834, "L2_avg_all": 0.75, "L2_avg_cons": 3.44, "total_paid": 1066, "total_cons": 797, "total_cons_users": 232, "total_no_cons": 834, "total_avg_all": 0.75, "total_avg_cons": 3.44}, {"ws": "2026-02-23", "we": "2026-03-01", "L1_paid": 116, "L1_cons": 0, "L1_cons_users": 0, "L1_no_cons": 116, "L1_avg_all": 0.0, "L1_avg_cons": 0, "L2_paid": 1077, "L2_cons": 1163, "L2_cons_users": 286, "L2_no_cons": 791, "L2_avg_all": 1.08, "L2_avg_cons": 4.07, "total_paid": 1077, "total_cons": 1163, "total_cons_users": 286, "total_no_cons": 791, "total_avg_all": 1.08, "total_avg_cons": 
4.07}, {"ws": "2026-03-02", "we": "2026-03-08", "L1_paid": 247, "L1_cons": 200, "L1_cons_users": 49, "L1_no_cons": 198, "L1_avg_all": 0.81, "L1_avg_cons": 4.08, "L2_paid": 1216, "L2_cons": 944, "L2_cons_users": 290, "L2_no_cons": 926, "L2_avg_all": 0.78, "L2_avg_cons": 3.26, "total_paid": 1249, "total_cons": 1144, "total_cons_users": 332, "total_no_cons": 917, "total_avg_all": 0.92, "total_avg_cons": 3.45}, {"ws": "2026-03-09", "we": "2026-03-15", "L1_paid": 512, "L1_cons": 911, "L1_cons_users": 215, "L1_no_cons": 297, "L1_avg_all": 1.78, "L1_avg_cons": 4.24, "L2_paid": 1507, "L2_cons": 1123, "L2_cons_users": 321, "L2_no_cons": 1186, "L2_avg_all": 0.75, "L2_avg_cons": 3.5, "total_paid": 1555, "total_cons": 2034, "total_cons_users": 524, "total_no_cons": 1031, "total_avg_all": 1.31, "total_avg_cons": 3.88}, {"ws": "2026-03-16", "we": "2026-03-22", "L1_paid": 561, "L1_cons": 1398, "L1_cons_users": 338, "L1_no_cons": 223, "L1_avg_all": 2.49, "L1_avg_cons": 4.14, "L2_paid": 1553, "L2_cons": 1253, "L2_cons_users": 339, "L2_no_cons": 1214, "L2_avg_all": 0.81, "L2_avg_cons": 3.7, "total_paid": 1615, "total_cons": 2651, "total_cons_users": 651, "total_no_cons": 964, "total_avg_all": 1.64, "total_avg_cons": 4.07}, {"ws": "2026-03-23", "we": "2026-03-29", "L1_paid": 594, "L1_cons": 1558, "L1_cons_users": 374, "L1_no_cons": 220, "L1_avg_all": 2.62, "L1_avg_cons": 4.17, "L2_paid": 1588, "L2_cons": 1370, "L2_cons_users": 372, "L2_no_cons": 1216, "L2_avg_all": 0.86, "L2_avg_cons": 3.68, "total_paid": 1668, "total_cons": 2928, "total_cons_users": 720, "total_no_cons": 948, "total_avg_all": 1.76, "total_avg_cons": 4.07}, {"ws": "2026-03-30", "we": "2026-04-05", "L1_paid": 624, "L1_cons": 1599, "L1_cons_users": 397, "L1_no_cons": 227, "L1_avg_all": 2.56, "L1_avg_cons": 4.03, "L2_paid": 1626, "L2_cons": 1307, "L2_cons_users": 374, "L2_no_cons": 1252, "L2_avg_all": 0.8, "L2_avg_cons": 3.49, "total_paid": 1716, "total_cons": 2906, "total_cons_users": 745, "total_no_cons": 971, 
"total_avg_all": 1.69, "total_avg_cons": 3.9}, {"ws": "2026-04-06", "we": "2026-04-12", "L1_paid": 1068, "L1_cons": 2228, "L1_cons_users": 560, "L1_no_cons": 508, "L1_avg_all": 2.09, "L1_avg_cons": 3.98, "L2_paid": 2223, "L2_cons": 1945, "L2_cons_users": 518, "L2_no_cons": 1705, "L2_avg_all": 0.87, "L2_avg_cons": 3.75, "total_paid": 2340, "total_cons": 4173, "total_cons_users": 1033, "total_no_cons": 1307, "total_avg_all": 1.78, "total_avg_cons": 4.04}, {"ws": "2026-04-13", "we": "2026-04-19", "L1_paid": 1133, "L1_cons": 2692, "L1_cons_users": 660, "L1_no_cons": 473, "L1_avg_all": 2.38, "L1_avg_cons": 4.08, "L2_paid": 2302, "L2_cons": 2173, "L2_cons_users": 560, "L2_no_cons": 1742, "L2_avg_all": 0.94, "L2_avg_cons": 3.88, "total_paid": 2436, "total_cons": 4865, "total_cons_users": 1165, "total_no_cons": 1271, "total_avg_all": 2.0, "total_avg_cons": 4.18}, {"ws": "2026-04-20", "we": "2026-04-26", "L1_paid": 1205, "L1_cons": 2740, "L1_cons_users": 703, "L1_no_cons": 502, "L1_avg_all": 2.27, "L1_avg_cons": 3.9, "L2_paid": 2370, "L2_cons": 2062, "L2_cons_users": 571, "L2_no_cons": 1799, "L2_avg_all": 0.87, "L2_avg_cons": 3.61, "total_paid": 2530, "total_cons": 4802, "total_cons_users": 1216, "total_no_cons": 1314, "total_avg_all": 1.9, "total_avg_cons": 3.95}, {"ws": "2026-04-27", "we": "2026-05-03", "L1_paid": 1335, "L1_cons": 2691, "L1_cons_users": 705, "L1_no_cons": 630, "L1_avg_all": 2.02, "L1_avg_cons": 3.82, "L2_paid": 2506, "L2_cons": 1855, "L2_cons_users": 522, "L2_no_cons": 1984, "L2_avg_all": 0.74, "L2_avg_cons": 3.55, "total_paid": 2680, "total_cons": 4546, "total_cons_users": 1172, "total_no_cons": 1508, "total_avg_all": 1.7, "total_avg_cons": 3.88}, {"ws": "2026-05-04", "we": "2026-05-10", "L1_paid": 1399, "L1_cons": 2871, "L1_cons_users": 734, "L1_no_cons": 665, "L1_avg_all": 2.05, "L1_avg_cons": 3.91, "L2_paid": 2577, "L2_cons": 2044, "L2_cons_users": 564, "L2_no_cons": 2013, "L2_avg_all": 0.79, "L2_avg_cons": 3.62, "total_paid": 2769, "total_cons": 
4915, "total_cons_users": 1241, "total_no_cons": 1528, "total_avg_all": 1.78, "total_avg_cons": 3.96}], "L1_chapters": [384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383], "L2_chapters": [60, 61, 62, 63, 64, 70, 71, 72, 73, 74, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 125, 126, 127, 128, 129, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 244, 245, 246, 247, 248, 249, 250, 251, 252, 254, 255, 256, 257, 258, 259, 260, 261, 262, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331]} \ No newline at end of file diff --git a/scripts/charts_v4.py b/scripts/charts_v4.py new file mode 100644 index 0000000..fededf4 --- /dev/null +++ b/scripts/charts_v4.py @@ -0,0 +1,87 @@ +#!/usr/bin/env python3 +"""图表 v4: L1只看L1课程, L2只看L2课程""" +import json, os +from datetime import date, timedelta +import matplotlib +matplotlib.use('Agg') +import matplotlib.pyplot as plt +import 
matplotlib.dates as mdates
+import matplotlib.font_manager as fm
+import numpy as np
+
+fm.fontManager.addfont('/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc')
+plt.rcParams['font.family'] = fm.FontProperties(fname='/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc').get_name()
+plt.rcParams['axes.unicode_minus'] = False
+
+with open('/root/.openclaw/workspace/output/course_data_v4.json') as f:
+    data = json.load(f)
+results = data['results']
+
+out = '/root/.openclaw/workspace/output'
+
+configs = {
+    'L1': {'prefix': 'L1', 'color': '#4A90D9', 'light': '#A8CFF1', 'label': 'L1'},
+    'L2': {'prefix': 'L2', 'color': '#E85D47', 'light': '#F4A9A0', 'label': 'L2'},
+}
+
+for key, cfg in configs.items():
+    pfx = cfg['prefix']; color = cfg['color']; light = cfg['light']; label = cfg['label']
+    first = next(i for i, r in enumerate(results) if r[f'{pfx}_paid'] > 0)
+    data_sub = results[first:]
+
+    dates = [date.fromisoformat(r['ws']) for r in data_sub]
+    xs = [d + timedelta(days=3) for d in dates]
+    paid = [r[f'{pfx}_paid'] for r in data_sub]
+    cons_users = [r[f'{pfx}_cons_users'] for r in data_sub]
+    no_cons = [r[f'{pfx}_no_cons'] for r in data_sub]
+    avg_all = [r[f'{pfx}_avg_all'] for r in data_sub]
+    avg_cons = [r[f'{pfx}_avg_cons'] for r in data_sub]
+
+    # Chart 1: stacked bars
+    fig, ax = plt.subplots(figsize=(18, 8))
+    x_idx = np.arange(len(xs))
+    ax.bar(x_idx, cons_users, 0.65, color=light, label='有课消用户', zorder=3)
+    ax.bar(x_idx, no_cons, 0.65, bottom=cons_users, color='#D0D0D0', label='无课消用户', zorder=3)
+    step = max(1, len(data_sub)//10)
+    for i in range(0, len(data_sub), step):
+        ax.annotate(str(paid[i]), (i, paid[i]), textcoords='offset points', xytext=(0, 5),
+                    fontsize=7.5, ha='center', color='#333333', fontweight='bold')
+    ax.set_xticks(x_idx[::step])
+    ax.set_xticklabels([dates[i].strftime('%m/%d') for i in range(0, len(data_sub), step)], fontsize=8.5, rotation=45)
+    ax.set_ylabel('用户数', fontsize=13)
+    ax.set_title(f'{label}付费用户周课消分布(只看{label}课程,剔除U0)', fontsize=16, fontweight='bold')
+    ax.legend(fontsize=12, loc='upper left')
+    ax.grid(axis='y', alpha=0.3, zorder=0)
+    ax.set_xlim(-0.5, len(x_idx)-0.5)
+    no_rate = no_cons[-1]/paid[-1]*100 if paid[-1] else 0
+    ax.text(0.97, 0.95, f'付费{paid[-1]}人 | 无课消率{no_rate:.0f}%',
+            transform=ax.transAxes, fontsize=11, ha='right', va='top', color='#666666', fontstyle='italic')
+    plt.tight_layout()
+    plt.savefig(f'{out}/{pfx}_users_stack_v4.png', dpi=150, bbox_inches='tight', facecolor='white')
+    plt.close()
+    print(f' ✅ {pfx}_users_stack_v4.png')
+
+    # Chart 2: line chart
+    fig, ax = plt.subplots(figsize=(18, 8))
+    ax.plot(xs, avg_all, 'o-', color='#999999', linewidth=2.2, markersize=5, label='人均课消(全部付费用户)', markerfacecolor='white')
+    ax.plot(xs, avg_cons, 's-', color=color, linewidth=2.8, markersize=5, label='人均课消(有课消用户)', markerfacecolor='white')
+    ax.fill_between(xs, avg_all, avg_cons, alpha=0.08, color=color)
+    for i in range(0, len(data_sub), max(1, len(data_sub)//8)):
+        ax.annotate(f'{avg_all[i]:.1f}', (xs[i], avg_all[i]), textcoords='offset points',
+                    xytext=(0,-15), fontsize=7.5, color='#999999', ha='center')
+        ax.annotate(f'{avg_cons[i]:.1f}', (xs[i], avg_cons[i]), textcoords='offset points',
+                    xytext=(0,7), fontsize=7.5, color=color, ha='center', fontweight='bold')
+    ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
+    ax.xaxis.set_major_locator(mdates.MonthLocator())
+    plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, fontsize=9)
+    ax.set_ylabel('课消数(节/周)', fontsize=13)
+    ax.set_title(f'{label}付费用户周人均课消趋势(只看{label}课程,剔除U0)', fontsize=16, fontweight='bold')
+    ax.legend(fontsize=12, loc='upper left')
+    ax.grid(True, alpha=0.3)
+    ax.set_xlim(date(2025,8,30), date(2026,5,12))
+    plt.tight_layout()
+    plt.savefig(f'{out}/{pfx}_avg_trend_v4.png', dpi=150, bbox_inches='tight', facecolor='white')
+    plt.close()
+    print(f' ✅ {pfx}_avg_trend_v4.png')
+
+print('\n✅ 4张v4图表已生成')
diff --git a/scripts/course_analysis_v4.py b/scripts/course_analysis_v4.py
new file mode 100644
index
0000000..220c56a
--- /dev/null
+++ b/scripts/course_analysis_v4.py
@@ -0,0 +1,165 @@
+#!/usr/bin/env python3
+"""
+v4: count only L1 lessons for the L1 paid cohort, and only L2 lessons for the L2 paid cohort
+"""
+import psycopg2
+from collections import defaultdict
+from datetime import datetime, timedelta, date
+
+conn = psycopg2.connect(
+    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
+    port=28591, user="ai_member",
+    password="LdfjdjL83h3h3^$&**YGG*", dbname="vala_bi"
+)
+cur = conn.cursor()
+
+# Fetch the valid L1/L2 chapters (U0 excluded)
+cur.execute("SELECT id FROM bi_level_unit_lesson WHERE course_level='L1'")
+l1_chapters = set(r[0] for r in cur.fetchall())
+cur.execute("SELECT id FROM bi_level_unit_lesson WHERE course_level='L2'")
+l2_chapters = set(r[0] for r in cur.fetchall())
+u0 = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348}
+l1_chapters -= u0
+l2_chapters -= u0
+print(f"L1章节: {len(l1_chapters)} | L2章节: {len(l2_chapters)}")
+
+overall_start = date(2025, 9, 1)
+overall_end = date(2026, 5, 11)
+
+weeks = []
+d = overall_start
+while d < overall_end:
+    ws = d
+    we = d + timedelta(days=6 - d.weekday())
+    if we >= overall_end: we = overall_end - timedelta(days=1)
+    weeks.append((ws, we))
+    d = we + timedelta(days=1)
+
+print("分类付费用户...")
+cur.execute("""
+    SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date,
+           CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
+                WHEN o.goods_id = 61 THEN 'L1+L2'
+                WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
+                ELSE '其他' END as level_type
+    FROM bi_vala_order o
+    INNER JOIN bi_vala_app_account a ON o.account_id = a.id
+    WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
+""")
+orders = cur.fetchall()
+
+cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3")
+refund_trades = set(r[0] for r in cur.fetchall())
+
+user_levels = defaultdict(set)
+user_orders = defaultdict(list)
+for aid, trade_no, order_status, pay_date, lt in orders:
+    is_refunded = (order_status == 4 and trade_no in refund_trades)
+    user_levels[aid].add(lt)
+    user_orders[aid].append((pay_date.date(), is_refunded))
+
+def is_paid(aid, as_of):
+    return sum(1 for pd, ref in user_orders[aid] if pd <= as_of and not ref) > 0
+
+l1_pool = {aid for aid, lv in user_levels.items() if 'L1' in lv or 'L1+L2' in lv}
+l2_pool = {aid for aid, lv in user_levels.items() if 'L2' in lv or 'L1+L2' in lv}
+all_pool = l1_pool | l2_pool
+print(f"L1池: {len(l1_pool)}, L2池: {len(l2_pool)}, 合计: {len(all_pool)}")
+
+print("查询课消...")
+cons_map = {}
+for ti in range(8):
+    tbl = f"bi_user_chapter_play_record_{ti}"
+    cur.execute(f"""SELECT user_id, chapter_id, updated_at FROM {tbl}
+        WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11'""")
+    for uid, cid, ua in cur.fetchall():
+        if cid in u0: continue
+        # Keep only L1 or L2 lessons
+        if cid not in l1_chapters and cid not in l2_chapters: continue
+        key = (uid, cid)
+        d = ua.date() if hasattr(ua, 'date') else datetime.strptime(str(ua)[:10], '%Y-%m-%d').date()
+        if key not in cons_map or d < cons_map[key]:
+            cons_map[key] = d
+
+print("角色映射...")
+all_uids = list(set(k[0] for k in cons_map))
+char2acct = {}
+for i in range(0, len(all_uids), 500):
+    batch = all_uids[i:i+500]
+    ph = ','.join(['%s'] * len(batch))
+    cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch)
+    for cid, aid in cur.fetchall(): char2acct[cid] = aid
+
+print("按周汇总...")
+results = []
+for ws, we in weeks:
+    l1_paid = {aid for aid in l1_pool if is_paid(aid, we)}
+    l2_paid = {aid for aid in l2_pool if is_paid(aid, we)}
+    t_paid = {aid for aid in all_pool if is_paid(aid, we)}
+
+    l1_cons, l1_cu = 0, set()
+    l2_cons, l2_cu = 0, set()
+    t_cons, t_cu = 0, set()
+
+    for (uid, ch_id), cons_date in cons_map.items():
+        if not (ws <= cons_date <= we): continue
+        aid = char2acct.get(uid)
+        if not aid: continue
+
+        # L1 paid cohort and an L1 lesson
+        if aid in l1_paid and ch_id in l1_chapters:
+            l1_cons += 1
+            l1_cu.add(aid)
+        # L2 paid cohort and an L2 lesson
+        if aid in l2_paid and ch_id in l2_chapters:
+            l2_cons += 1
+            l2_cu.add(aid)
+        # Total: paid users' consumption on lessons of their own level
+        if aid in t_paid:
+            if (aid in l1_paid and ch_id in l1_chapters) or (aid in l2_paid and ch_id in l2_chapters):
+                t_cons += 1
+                t_cu.add(aid)
+
+    results.append({
+        'ws': ws, 'we': we,
+        'L1_paid': len(l1_paid), 'L1_cons': l1_cons, 'L1_cons_users': len(l1_cu),
+        'L1_no_cons': len(l1_paid) - len(l1_cu),
+        'L1_avg_all': round(l1_cons / len(l1_paid), 2) if l1_paid else 0,
+        'L1_avg_cons': round(l1_cons / len(l1_cu), 2) if l1_cu else 0,
+        'L2_paid': len(l2_paid), 'L2_cons': l2_cons, 'L2_cons_users': len(l2_cu),
+        'L2_no_cons': len(l2_paid) - len(l2_cu),
+        'L2_avg_all': round(l2_cons / len(l2_paid), 2) if l2_paid else 0,
+        'L2_avg_cons': round(l2_cons / len(l2_cu), 2) if l2_cu else 0,
+        'total_paid': len(t_paid), 'total_cons': t_cons, 'total_cons_users': len(t_cu),
+        'total_no_cons': len(t_paid) - len(t_cu),
+        'total_avg_all': round(t_cons / len(t_paid), 2) if t_paid else 0,
+        'total_avg_cons': round(t_cons / len(t_cu), 2) if t_cu else 0,
+    })
+
+    r = results[-1]
+    if (len(results) - 1) % 8 == 0 or len(results) == len(weeks):
+        print(f" W{len(results):2d} {ws}~{we} | L1:{r['L1_paid']}有消{r['L1_cons_users']} | L2:{r['L2_paid']}有消{r['L2_cons_users']}")
+
+cur.close()
+conn.close()
+
+# Print the final results
+last = results[-1]
+print(f"\n=== 最终数据(v4:L1只看L1课程, L2只看L2课程)===")
+print(f"L1付费群: {last['L1_paid']}人 | 有消{last['L1_cons_users']} | 无消{last['L1_no_cons']}({last['L1_no_cons']/last['L1_paid']*100:.0f}%) | 人均{last['L1_avg_all']} | 有消人均{last['L1_avg_cons']}")
+print(f"L2付费群: {last['L2_paid']}人 | 有消{last['L2_cons_users']} | 无消{last['L2_no_cons']}({last['L2_no_cons']/last['L2_paid']*100:.0f}%) | 人均{last['L2_avg_all']} | 有消人均{last['L2_avg_cons']}")
+print(f"合计(去重): {last['total_paid']}人 | 有消{last['total_cons_users']} | 无消{last['total_no_cons']}({last['total_no_cons']/last['total_paid']*100:.0f}%)")
+
+# Save the data to JSON for the chart script to use later
+import json
+out = '/root/.openclaw/workspace/output/course_data_v4.json'
+serializable = []
+for r in results:
+    d = {}
+    for k, v in r.items():
+        if isinstance(v, date): d[k] = v.isoformat()
+        else: d[k] = v
+    serializable.append(d)
+with open(out, 'w') as f:
+    json.dump({'results': serializable, 'L1_chapters': list(l1_chapters), 'L2_chapters': list(l2_chapters)}, f, ensure_ascii=False)
+print(f"\n数据已保存: {out}")
diff --git a/scripts/course_consumption_by_level.py b/scripts/course_consumption_by_level.py
index f665029..e8fa524 100644
--- a/scripts/course_consumption_by_level.py
+++ b/scripts/course_consumption_by_level.py
@@ -1,167 +1,191 @@
 #!/usr/bin/env python3
 """
-Course-consumption metrics: statistics by L1/L2 level
+Course-consumption metrics: weekly statistics for 2025-09-01 ~ 2026-05-10, split by L1/L2/L1+L2
 """
 import psycopg2
 from collections import defaultdict
-from datetime import date, timedelta, datetime
+from datetime import datetime, timedelta, date

 conn = psycopg2.connect(
     host="bj-postgres-16pob4sg.sql.tencentcdb.com",
-    port=28591,
-    user="ai_member",
-    password="LdfjdjL83h3h3^$&**YGG*",
-    dbname="vala_bi"
+    port=28591, user="ai_member",
+    password="LdfjdjL83h3h3^$&**YGG*", dbname="vala_bi"
 )
 cur = conn.cursor()

+# ===== Time parameters =====
 overall_start = date(2025, 9, 1)
 overall_end = date(2026, 5, 11)

-# Build the list of weeks
+# Build the list of weeks (Monday to Sunday)
 weeks = []
 d = overall_start
 while d < overall_end:
     ws = d
-    we = d + timedelta(days=6 - d.weekday())
+    days_to_sunday = 6 - d.weekday()
+    we = d + timedelta(days=days_to_sunday)
     if we >= overall_end: we = overall_end - timedelta(days=1)
     weeks.append((ws, we))
     d = we + timedelta(days=1)

-# ===== Fetch L1/L2 chapter_id =====
-u0_ids = {343, 344, 345, 346, 348, 55, 56, 57, 58, 59}
-cur.execute("SELECT DISTINCT id, course_level FROM bi_level_unit_lesson WHERE course_level IN ('L1','L2')")
-l1_chapters = set()
-l2_chapters = set()
-for cid, lv in cur.fetchall():
-    if cid in u0_ids:
-        continue
-    if lv == 'L1':
-        l1_chapters.add(cid)
-    elif lv == 'L2':
-        l2_chapters.add(cid)
+print(f"统计区间: {overall_start} ~ {overall_end - timedelta(days=1)}, 共 {len(weeks)} 周")

-print(f"L1 chapters: {len(l1_chapters)}, L2 chapters: {len(l2_chapters)}")
-
-# 
===== Step 1: 付费用户 ===== -print("Step 1: 查找付费用户...") +# ===== Step 1: 用户 L1/L2 分类 + 付费状态 ===== +print("\nStep 1: 分类付费用户...") cur.execute(""" - SELECT DISTINCT o.account_id + SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date, + CASE + WHEN o.goods_id IN (57, 60, 63) THEN 'L1' + WHEN o.goods_id = 61 THEN 'L1+L2' + WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2' + ELSE '其他' + END as level_type FROM bi_vala_order o INNER JOIN bi_vala_app_account a ON o.account_id = a.id - WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL - GROUP BY o.account_id - HAVING COUNT(CASE WHEN o.order_status != 4 - OR (o.order_status = 4 AND o.trade_no NOT IN ( - SELECT trade_no FROM bi_refund_order WHERE status=3 - )) THEN 1 END) > 0 -""") -paid_account_ids = [row[0] for row in cur.fetchall()] -print(f" 付费用户: {len(paid_account_ids)}") - -# 订单详情用于动态判断每周付费用户 -cur.execute(""" - SELECT o.account_id, o.trade_no, o.out_trade_no, o.pay_success_date, o.order_status - FROM bi_vala_order o - INNER JOIN bi_vala_app_account a ON o.account_id = a.id - WHERE a.status=1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL - AND o.pay_success_date >= '2025-01-01' + WHERE a.status = 1 AND a.deleted_at IS NULL + AND o.pay_success_date IS NOT NULL """) orders = cur.fetchall() -cur.execute("SELECT trade_no, status FROM bi_refund_order WHERE status=3") -refund_set = {r[0] for r in cur.fetchall() if r[0]} +print(f" 订单数: {len(orders)}") -account_orders = defaultdict(list) -for aid, tn, otn, psd, os in orders: - is_ref = os == 4 and tn in refund_set - account_orders[aid].append((psd, is_ref)) +cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3") +refund_trades = set(r[0] for r in cur.fetchall()) -def is_paid(aid, as_of): - return sum(1 for pd, ref in account_orders.get(aid, []) if pd.date() <= as_of and not ref) > 0 +# {account_id: {'levels': set, 'orders': [(pay_date, is_refunded, level), ...]}} +user_data = defaultdict(lambda: {'levels': set(), 
'orders': []}) +for aid, trade_no, order_status, pay_date, lt in orders: + is_refunded = (order_status == 4 and trade_no in refund_trades) + user_data[aid]['levels'].add(lt) + user_data[aid]['orders'].append((pay_date.date(), is_refunded, lt)) -# ===== Step 2: 课消(分L1/L2)===== -print("Step 2: 查询课消(分L1/L2)...") -l1_consumption = {} # (user_id, chapter_id) -> earliest date -l2_consumption = {} +# 确定每位用户的 L1/L2 分类 +def classify_user(levels): + has_l1 = 'L1' in levels + has_l2 = 'L2' in levels + has_l1l2 = 'L1+L2' in levels + if has_l1l2 or (has_l1 and has_l2): + return 'L1+L2' + elif has_l1: + return '仅L1' + elif has_l2: + return '仅L2' + return '其他' -for t in range(8): - tbl = f"bi_user_chapter_play_record_{t}" +for aid in user_data: + user_data[aid]['category'] = classify_user(user_data[aid]['levels']) + +# 统计各类用户数 +cats = defaultdict(int) +for aid, d in user_data.items(): + cats[d['category']] += 1 +print(f" 仅L1: {cats['仅L1']}, 仅L2: {cats['仅L2']}, L1+L2: {cats['L1+L2']}, 其他: {cats['其他']}") + +# 判断某用户截至某日是否为付费用户(至少有一笔未退款订单) +def is_paid_as_of(aid, as_of_date): + d = user_data[aid] + valid_orders = sum(1 for pd, ref, lt in d['orders'] if pd <= as_of_date and not ref) + return valid_orders > 0 + +# ===== Step 2: 课消记录 ===== +print("\nStep 2: 查询课消...") +consumption_map = {} # (user_id, chapter_id) -> earliest date + +for table_idx in range(8): + tbl = f"bi_user_chapter_play_record_{table_idx}" cur.execute(f""" - SELECT user_id, chapter_id, updated_at FROM {tbl} - WHERE play_status=1 AND updated_at>='2025-09-01' AND updated_at<'2026-05-11' + SELECT user_id, chapter_id, updated_at + FROM {tbl} + WHERE play_status = 1 + AND updated_at >= '2025-09-01' + AND updated_at < '2026-05-11' """) - for uid, cid, upd in cur.fetchall(): - if cid in l1_chapters: - k, m = (uid, cid), l1_consumption - elif cid in l2_chapters: - k, m = (uid, cid), l2_consumption - else: - continue - d = upd.date() if hasattr(upd, 'date') else upd - if k not in m or d < m[k]: - m[k] = d + cnt = 0 + for user_id, chapter_id, 
updated_at in cur.fetchall(): + key = (user_id, chapter_id) + d = updated_at.date() if hasattr(updated_at, 'date') else datetime.strptime(str(updated_at)[:10], '%Y-%m-%d').date() + if key not in consumption_map or d < consumption_map[key]: + consumption_map[key] = d + cnt += 1 + print(f" {tbl}: {cnt} 条") +print(f" 去重后: {len(consumption_map)} 条") -print(f" L1 课消(去重): {len(l1_consumption)}") -print(f" L2 课消(去重): {len(l2_consumption)}") - -# ===== Step 3: 角色映射 ===== -print("Step 3: 关联角色...") -all_uids = set(k[0] for k in l1_consumption) | set(k[0] for k in l2_consumption) -char_to_account = {} -for i in range(0, len(all_uids), 500): - batch = list(all_uids)[i:i+500] - ph = ','.join(['%s']*len(batch)) +# ===== Step 3: character -> account ===== +print("\nStep 3: 角色映射...") +all_uids = list(set(k[0] for k in consumption_map)) +char2acct = {} +bs = 500 +for i in range(0, len(all_uids), bs): + batch = all_uids[i:i+bs] + ph = ','.join(['%s'] * len(batch)) cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch) for cid, aid in cur.fetchall(): - char_to_account[cid] = aid + char2acct[cid] = aid +print(f" 映射: {len(char2acct)}") -# ===== Step 4: 按周汇总 ===== -print("Step 4: 按周汇总...") +# ===== Step 4: 按周 + 按分类汇总 ===== +print("\nStep 4: 按周汇总...\n") -def weekly_stats(consumption_map): - """返回每周的 (课消次数, 有消用户数)""" - results = [] - for ws, we in weeks: - cons = 0 - users = set() - for (uid, ch_id), d in consumption_map.items(): - if ws <= d <= we: - cons += 1 - aid = char_to_account.get(uid) - if aid: - users.add(aid) - results.append((ws, we, cons, len(users))) - return results - -l1_stats = weekly_stats(l1_consumption) -l2_stats = weekly_stats(l2_consumption) - -# 汇总 + 付费用户 results = [] -for i, (ws, we) in enumerate(weeks): - paid = set(aid for aid in account_orders if is_paid(aid, we)) - n_paid = len(paid) +for ws, we in weeks: + # 截至 we 的付费用户(按分类) + paid_by_cat = defaultdict(set) + for aid in user_data: + if is_paid_as_of(aid, we): + cat = 
user_data[aid]['category'] + paid_by_cat[cat].add(aid) - l1_cons, l1_users = l1_stats[i][2], l1_stats[i][3] - l2_cons, l2_users = l2_stats[i][2], l2_stats[i][3] + # 该周课消(付费用户) + cons_by_cat = defaultdict(int) + cons_users_by_cat = defaultdict(set) - l1_avg = l1_cons / n_paid if n_paid else 0 - l1_act_avg = l1_cons / l1_users if l1_users else 0 - l2_avg = l2_cons / n_paid if n_paid else 0 - l2_act_avg = l2_cons / l2_users if l2_users else 0 + for (uid, ch_id), cons_date in consumption_map.items(): + if ws <= cons_date <= we: + aid = char2acct.get(uid) + if aid: + cat = user_data.get(aid, {}).get('category', '其他') + if aid in paid_by_cat.get(cat, set()): + cons_by_cat[cat] += 1 + cons_users_by_cat[cat].add(aid) - results.append({ - 'week': f"{ws.strftime('%m/%d')}-{we.strftime('%m/%d')}", - 'ws': ws, 'we': we, 'paid': n_paid, - 'l1_cons': l1_cons, 'l1_users': l1_users, 'l1_avg': l1_avg, 'l1_act': l1_act_avg, - 'l2_cons': l2_cons, 'l2_users': l2_users, 'l2_avg': l2_avg, 'l2_act': l2_act_avg, - }) + week_label = f"{ws.strftime('%m/%d')}-{we.strftime('%m/%d')}" + row = {'week': week_label, 'ws': ws, 'we': we} + + for cat in ['仅L1', '仅L2', 'L1+L2', '其他', '合计']: + if cat == '合计': + n_paid = sum(len(v) for v in paid_by_cat.values()) + n_cons = sum(cons_by_cat.values()) + n_cons_users = len(set.union(*cons_users_by_cat.values())) if cons_users_by_cat else 0 + else: + n_paid = len(paid_by_cat.get(cat, set())) + n_cons = cons_by_cat.get(cat, 0) + n_cons_users = len(cons_users_by_cat.get(cat, set())) + + avg_all = n_cons / n_paid if n_paid > 0 else 0 + avg_cons = n_cons / n_cons_users if n_cons_users > 0 else 0 + + row[f'{cat}_paid'] = n_paid + row[f'{cat}_cons'] = n_cons + row[f'{cat}_users'] = n_cons_users + row[f'{cat}_avg_all'] = avg_all + row[f'{cat}_avg_cons'] = avg_cons + + results.append(row) + print(f" {week_label} | 合计:付费{row['合计_paid']} 课消{row['合计_cons']} " + f"人均{row['合计_avg_all']:.2f} | " + f"L1:{row['仅L1_avg_all']:.2f} L2:{row['仅L2_avg_all']:.2f} 
L1+L2:{row['L1+L2_avg_all']:.2f}") + +# ===== 输出完整表 ===== +print("\n" + "="*120) +header = f"{'周':<12} {'合计付费':>6} {'合计课消':>7} {'合计人均':>7} | {'L1付费':>6} {'L1课消':>6} {'L1人均':>6} {'L1有消人均':>7} | {'L2付费':>6} {'L2课消':>6} {'L2人均':>6} {'L2有消人均':>7} | {'L1L2付费':>7} {'L1L2课消':>7} {'L1L2人均':>7} {'L1L2有消人均':>8}" +print(header) +print("-"*120) -# 输出 -print(f"\n{'周':<16} {'付费':>6} {'L1课消':>7} {'L1有消':>7} {'L1人均':>7} {'L1有消人均':>9} {'L2课消':>7} {'L2有消':>7} {'L2人均':>7} {'L2有消人均':>9}") for r in results: - print(f"{r['week']:<16} {r['paid']:>6} {r['l1_cons']:>7} {r['l1_users']:>7} {r['l1_avg']:>7.2f} {r['l1_act']:>9.2f} {r['l2_cons']:>7} {r['l2_users']:>7} {r['l2_avg']:>7.2f} {r['l2_act']:>9.2f}") + print(f"{r['week']:<12} {r['合计_paid']:>6} {r['合计_cons']:>7} {r['合计_avg_all']:>7.2f} | " + f"{r['仅L1_paid']:>6} {r['仅L1_cons']:>6} {r['仅L1_avg_all']:>6.2f} {r['仅L1_avg_cons']:>7.2f} | " + f"{r['仅L2_paid']:>6} {r['仅L2_cons']:>6} {r['仅L2_avg_all']:>6.2f} {r['仅L2_avg_cons']:>7.2f} | " + f"{r['L1+L2_paid']:>7} {r['L1+L2_cons']:>7} {r['L1+L2_avg_all']:>7.2f} {r['L1+L2_avg_cons']:>8.2f}") cur.close() conn.close() diff --git a/scripts/course_consumption_v2.py b/scripts/course_consumption_v2.py new file mode 100644 index 0000000..3e6e06e --- /dev/null +++ b/scripts/course_consumption_v2.py @@ -0,0 +1,395 @@ +#!/usr/bin/env python3 +""" +课消指标 v2:剔除 U0 序章,4张图按 L1/L2 拆分 +""" +import psycopg2 +from collections import defaultdict +from datetime import datetime, timedelta, date +import openpyxl +from openpyxl.styles import Font, Alignment, PatternFill, Border, Side +from openpyxl.chart import LineChart, BarChart, Reference +from openpyxl.chart.series import DataPoint +from openpyxl.chart.label import DataLabelList +from openpyxl.utils import get_column_letter + +conn = psycopg2.connect( + host="bj-postgres-16pob4sg.sql.tencentcdb.com", + port=28591, user="ai_member", + password="LdfjdjL83h3h3^$&**YGG*", dbname="vala_bi" +) +cur = conn.cursor() + +# ===== U0 chapter_ids to exclude ===== +u0_chapters = 
{55, 56, 57, 58, 59, 343, 344, 345, 346, 348} +print(f"剔除 U0 序章: {sorted(u0_chapters)}") + +# ===== 时间参数 ===== +overall_start = date(2025, 9, 1) +overall_end = date(2026, 5, 11) + +weeks = [] +d = overall_start +while d < overall_end: + ws = d + days_to_sunday = 6 - d.weekday() + we = d + timedelta(days=days_to_sunday) + if we >= overall_end: + we = overall_end - timedelta(days=1) + weeks.append((ws, we)) + d = we + timedelta(days=1) + +# ===== Step 1: 用户分类 ===== +print("\nStep 1: 分类付费用户...") +cur.execute(""" + SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date, + CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1' + WHEN o.goods_id = 61 THEN 'L1+L2' + WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2' + ELSE '其他' END as level_type + FROM bi_vala_order o + INNER JOIN bi_vala_app_account a ON o.account_id = a.id + WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL +""") +orders = cur.fetchall() + +cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3") +refund_trades = set(r[0] for r in cur.fetchall()) + +user_data = defaultdict(lambda: {'levels': set(), 'orders': []}) +for aid, trade_no, order_status, pay_date, lt in orders: + is_refunded = (order_status == 4 and trade_no in refund_trades) + user_data[aid]['levels'].add(lt) + user_data[aid]['orders'].append((pay_date.date(), is_refunded, lt)) + +def classify_user(levels): + has_l1, has_l2 = 'L1' in levels, 'L2' in levels + return 'L1+L2' if ('L1+L2' in levels or (has_l1 and has_l2)) else ('仅L1' if has_l1 else ('仅L2' if has_l2 else '其他')) + +for aid in user_data: + user_data[aid]['category'] = classify_user(user_data[aid]['levels']) + +def is_paid_as_of(aid, as_of_date): + return sum(1 for pd, ref, lt in user_data[aid]['orders'] if pd <= as_of_date and not ref) > 0 + +# ===== Step 2: 课消 (剔除 U0) ===== +print("\nStep 2: 查询课消(剔除U0)...") +consumption_map = {} +u0_skipped = 0 +for table_idx in range(8): + tbl = f"bi_user_chapter_play_record_{table_idx}" + cur.execute(f""" + 
SELECT user_id, chapter_id, updated_at + FROM {tbl} + WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11' + """) + for user_id, chapter_id, updated_at in cur.fetchall(): + if chapter_id in u0_chapters: + u0_skipped += 1 + continue + key = (user_id, chapter_id) + d = updated_at.date() if hasattr(updated_at, 'date') else datetime.strptime(str(updated_at)[:10], '%Y-%m-%d').date() + if key not in consumption_map or d < consumption_map[key]: + consumption_map[key] = d + +print(f" 剔除U0课消: {u0_skipped} 条, 去重后: {len(consumption_map)} 条") + +# ===== Step 3: 角色映射 ===== +print("Step 3: 角色映射...") +all_uids = list(set(k[0] for k in consumption_map)) +char2acct = {} +bs = 500 +for i in range(0, len(all_uids), bs): + batch = all_uids[i:i+bs] + ph = ','.join(['%s'] * len(batch)) + cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch) + for cid, aid in cur.fetchall(): + char2acct[cid] = aid +print(f" 映射: {len(char2acct)}") + +# ===== Step 4: 按周汇总 ===== +print("Step 4: 按周汇总...") +results = [] +for ws, we in weeks: + paid_by_cat = defaultdict(set) + for aid in user_data: + if is_paid_as_of(aid, we): + paid_by_cat[user_data[aid]['category']].add(aid) + + cons_by_cat = defaultdict(int) + cons_users_by_cat = defaultdict(set) + + for (uid, ch_id), cons_date in consumption_map.items(): + if ws <= cons_date <= we: + aid = char2acct.get(uid) + if aid: + cat = user_data.get(aid, {}).get('category', '其他') + if aid in paid_by_cat.get(cat, set()): + cons_by_cat[cat] += 1 + cons_users_by_cat[cat].add(aid) + + row = {'ws': ws, 'we': we} + for cat in ['仅L1', '仅L2', 'L1+L2', '其他', '合计']: + if cat == '合计': + n_paid = sum(len(v) for v in paid_by_cat.values()) + n_cons = sum(cons_by_cat.values()) + n_cons_users = len(set.union(*cons_users_by_cat.values())) if cons_users_by_cat else 0 + else: + n_paid = len(paid_by_cat.get(cat, set())) + n_cons = cons_by_cat.get(cat, 0) + n_cons_users = len(cons_users_by_cat.get(cat, set())) + + 
row[f'{cat}_paid'] = n_paid + row[f'{cat}_cons'] = n_cons + row[f'{cat}_cons_users'] = n_cons_users + row[f'{cat}_no_cons'] = n_paid - n_cons_users + row[f'{cat}_avg_all'] = round(n_cons / n_paid, 2) if n_paid > 0 else 0 + row[f'{cat}_avg_cons'] = round(n_cons / n_cons_users, 2) if n_cons_users > 0 else 0 + + results.append(row) + +cur.close() +conn.close() + +# ===== 过滤: 仅保留有足够数据的周(付费人数>0)===== +for cat in ['仅L1', '仅L2', 'L1+L2']: + # 找到第一个付费>0的周 + first_idx = next((i for i, r in enumerate(results) if r[f'{cat}_paid'] > 0), 0) + print(f"{cat} 数据起于第 {first_idx+1} 周 ({results[first_idx]['ws']})") + +# ===== 生成 Excel ===== +print("\n生成 Excel...") +wb = openpyxl.Workbook() +wb.remove(wb.active) + +# 样式 +header_font = Font(name='微软雅黑', bold=True, size=9, color='FFFFFF') +header_fill = PatternFill(start_color='2F5496', end_color='2F5496', fill_type='solid') +data_font = Font(name='微软雅黑', size=9) +title_font = Font(name='微软雅黑', bold=True, size=14, color='2F5496') +subtitle_font = Font(name='微软雅黑', bold=True, size=11, color='2F5496') +border = Border(left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin')) +center = Alignment(horizontal='center', vertical='center') + +l1_color = '4A90D9' +l2_color = 'E85D47' +l1l2_color = '7B9E4B' + +def apply_cell(ws, row, col, value, font=data_font, fill=None, align=center, border_style=border): + c = ws.cell(row=row, column=col, value=value) + c.font, c.border, c.alignment = font, border_style, align + if fill: c.fill = fill + return c + +def apply_header(ws, row, col, value): + c = ws.cell(row=row, column=col, value=value) + c.font, c.fill, c.border, c.alignment = header_font, header_fill, border, center + return c + +# ===== Sheet 1: 概览 ===== +ws1 = wb.create_sheet("概览") +ws1.merge_cells('A1:H1') +apply_cell(ws1, 1, 1, "付费用户 L1/L2 课消分析(剔除U0序章)", font=title_font, border_style=None, align=Alignment(horizontal='left')) + +notes = [ + "口径:剔除L1/L2的U0序章课时(L1 U00: 343-348, L2 U00: 
55-59),仅统计U1及之后的课消", + "课消:用户首次完成某一课时;付费用户:status=1 + 未删除 + 有订单 + 未全部退款", +] +for i, n in enumerate(notes): + ws1.merge_cells(f'A{3+i}:H{3+i}') + apply_cell(ws1, 3+i, 1, n, font=Font(name='微软雅黑', size=9, color='666666'), border_style=None, align=Alignment(horizontal='left')) + +# ===== Sheet 2: 每周明细 ===== +ws2 = wb.create_sheet("每周明细") +headers_main = ['周', '周一起', '周日'] + ['合计付费', '合计有消', '合计无消', '合计课消', '合计人均', '合计有消人均', + '仅L1付费', '仅L1有消', '仅L1无消', '仅L1课消', '仅L1人均', '仅L1有消人均', + '仅L2付费', '仅L2有消', '仅L2无消', '仅L2课消', '仅L2人均', '仅L2有消人均', + 'L1+L2付费', 'L1+L2有消', 'L1+L2无消', 'L1+L2课消', 'L1+L2人均', 'L1+L2有消人均'] + +for j, h in enumerate(headers_main, 1): + apply_header(ws2, 1, j, h) + +for ri, r in enumerate(results): + row = ri + 2 + wl = f"{r['ws'].strftime('%m/%d')}-{r['we'].strftime('%m/%d')}" + apply_cell(ws2, row, 1, wl) + apply_cell(ws2, row, 2, r['ws'].strftime('%Y-%m-%d')) + apply_cell(ws2, row, 3, r['we'].strftime('%Y-%m-%d')) + col = 4 + for prefix in ['合计', '仅L1', '仅L2', 'L1+L2']: + for metric in ['paid', 'cons_users', 'no_cons', 'cons', 'avg_all', 'avg_cons']: + val = r[f'{prefix}_{metric}'] + apply_cell(ws2, row, col, val) + col += 1 + +for ci in range(1, len(headers_main)+1): + ws2.column_dimensions[get_column_letter(ci)].width = 11 if ci <= 3 else 10 +ws2.freeze_panes = 'D2' + +# ===== Sheet 3: L1 图表 ===== +sheet_names = {'仅L1': ('L1图表', 'L1', l1_color, '4A90D9'), '仅L2': ('L2图表', 'L2', l2_color, 'E85D47')} + +for cat, (sname, label, color, light_color) in sheet_names.items(): + ws_chart_data = wb.create_sheet(sname) + + # 只取该分类有付费用户的周 + first_idx = next((i for i, r in enumerate(results) if r[f'{cat}_paid'] > 0), 0) + cat_results = results[first_idx:] + + # Header + headers = ['周', '付费用户', '有课消用户', '无课消用户', '课消总数', '人均课消', '有消人均'] + for j, h in enumerate(headers, 1): + apply_header(ws_chart_data, 1, j, h) + + for ri, r in enumerate(cat_results): + row = ri + 2 + wl = f"{r['ws'].strftime('%m/%d')}" + apply_cell(ws_chart_data, 
row, 1, wl) + apply_cell(ws_chart_data, row, 2, r[f'{cat}_paid']) + apply_cell(ws_chart_data, row, 3, r[f'{cat}_cons_users']) + apply_cell(ws_chart_data, row, 4, r[f'{cat}_no_cons']) + apply_cell(ws_chart_data, row, 5, r[f'{cat}_cons']) + apply_cell(ws_chart_data, row, 6, r[f'{cat}_avg_all']) + apply_cell(ws_chart_data, row, 7, r[f'{cat}_avg_cons']) + + n_rows = len(cat_results) + cats_ref = Reference(ws_chart_data, min_col=1, min_row=2, max_row=n_rows+1) + + # --- Chart 1: 堆叠柱状图 (有课消/无课消) --- + chart1 = BarChart() + chart1.type = "col" + chart1.grouping = "stacked" + chart1.title = f"{label} 付费用户课消分布(剔除U0序章)" + chart1.style = 10 + chart1.width = 24 + chart1.height = 13 + + # 有课消用户 + ref1 = Reference(ws_chart_data, min_col=3, min_row=1, max_row=n_rows+1) + chart1.add_data(ref1, titles_from_data=True) + chart1.set_categories(cats_ref) + chart1.series[0].graphicalProperties.solidFill = light_color + + # 无课消用户 + ref2 = Reference(ws_chart_data, min_col=4, min_row=1, max_row=n_rows+1) + chart1.add_data(ref2, titles_from_data=True) + chart1.series[1].graphicalProperties.solidFill = 'D9D9D9' + + chart1.y_axis.title = '用户数' + chart1.legend.position = 'b' + ws_chart_data.add_chart(chart1, "A9") + + # --- Chart 2: 折线图 (人均课消 + 有消人均) --- + chart2 = LineChart() + chart2.title = f"{label} 周人均课消趋势(剔除U0序章)" + chart2.style = 10 + chart2.width = 24 + chart2.height = 13 + chart2.y_axis.title = '课消数(节/周)' + + ref3 = Reference(ws_chart_data, min_col=6, min_row=1, max_row=n_rows+1) + chart2.add_data(ref3, titles_from_data=True) + chart2.set_categories(cats_ref) + chart2.series[0].graphicalProperties.line.solidFill = '999999' + chart2.series[0].graphicalProperties.line.width = 20000 + chart2.series[0].graphicalProperties.line.dashStyle = 'dash' + + ref4 = Reference(ws_chart_data, min_col=7, min_row=1, max_row=n_rows+1) + chart2.add_data(ref4, titles_from_data=True) + chart2.series[1].graphicalProperties.line.solidFill = color + chart2.series[1].graphicalProperties.line.width = 28000 + + 
chart2.y_axis.scaling.min = 0 + chart2.legend.position = 'b' + ws_chart_data.add_chart(chart2, "A27") + + # Column widths + for ci in range(1, 8): + ws_chart_data.column_dimensions[get_column_letter(ci)].width = 12 + +# ===== Sheet 4: L1+L2 图表(第三个分类)===== +ws_l1l2 = wb.create_sheet("L1+L2图表") +cat = 'L1+L2' +color = l1l2_color +light_color = 'A8C88E' +first_idx = next((i for i, r in enumerate(results) if r[f'{cat}_paid'] > 0), 0) +cat_results = results[first_idx:] + +headers = ['周', '付费用户', '有课消用户', '无课消用户', '课消总数', '人均课消', '有消人均'] +for j, h in enumerate(headers, 1): + apply_header(ws_l1l2, 1, j, h) + +n_rows = len(cat_results) +for ri, r in enumerate(cat_results): + row = ri + 2 + wl = f"{r['ws'].strftime('%m/%d')}" + apply_cell(ws_l1l2, row, 1, wl) + apply_cell(ws_l1l2, row, 2, r[f'{cat}_paid']) + apply_cell(ws_l1l2, row, 3, r[f'{cat}_cons_users']) + apply_cell(ws_l1l2, row, 4, r[f'{cat}_no_cons']) + apply_cell(ws_l1l2, row, 5, r[f'{cat}_cons']) + apply_cell(ws_l1l2, row, 6, r[f'{cat}_avg_all']) + apply_cell(ws_l1l2, row, 7, r[f'{cat}_avg_cons']) + +cats_ref = Reference(ws_l1l2, min_col=1, min_row=2, max_row=n_rows+1) + +chart1 = BarChart() +chart1.type = "col" +chart1.grouping = "stacked" +chart1.title = f"L1+L2 付费用户课消分布(剔除U0序章)" +chart1.style = 10 +chart1.width = 24 +chart1.height = 13 + +ref1 = Reference(ws_l1l2, min_col=3, min_row=1, max_row=n_rows+1) +chart1.add_data(ref1, titles_from_data=True) +chart1.set_categories(cats_ref) +chart1.series[0].graphicalProperties.solidFill = light_color + +ref2 = Reference(ws_l1l2, min_col=4, min_row=1, max_row=n_rows+1) +chart1.add_data(ref2, titles_from_data=True) +chart1.series[1].graphicalProperties.solidFill = 'D9D9D9' + +chart1.y_axis.title = '用户数' +chart1.legend.position = 'b' +ws_l1l2.add_chart(chart1, "A9") + +chart2 = LineChart() +chart2.title = f"L1+L2 周人均课消趋势(剔除U0序章)" +chart2.style = 10 +chart2.width = 24 +chart2.height = 13 +chart2.y_axis.title = '课消数(节/周)' + +ref3 = Reference(ws_l1l2, min_col=6, min_row=1, 
max_row=n_rows+1) +chart2.add_data(ref3, titles_from_data=True) +chart2.set_categories(cats_ref) +chart2.series[0].graphicalProperties.line.solidFill = '999999' +chart2.series[0].graphicalProperties.line.width = 20000 +chart2.series[0].graphicalProperties.line.dashStyle = 'dash' + +ref4 = Reference(ws_l1l2, min_col=7, min_row=1, max_row=n_rows+1) +chart2.add_data(ref4, titles_from_data=True) +chart2.series[1].graphicalProperties.line.solidFill = color +chart2.series[1].graphicalProperties.line.width = 28000 + +chart2.y_axis.scaling.min = 0 +chart2.legend.position = 'b' +ws_l1l2.add_chart(chart2, "A27") + +for ci in range(1, 8): + ws_l1l2.column_dimensions[get_column_letter(ci)].width = 12 + +# 保存 +path = '/root/.openclaw/workspace/output/course_consumption_by_level_v2.xlsx' +wb.save(path) +print(f"\n✅ Excel v2 已保存: {path}") + +# 简要摘要 +last = results[-1] +print(f""" +=== 剔除U0后最终数据(截至5/10) === +仅L1: 付费{last['仅L1_paid']} 有消{last['仅L1_cons_users']} 无消{last['仅L1_no_cons']} 人均{last['仅L1_avg_all']} 有消人均{last['仅L1_avg_cons']} +仅L2: 付费{last['仅L2_paid']} 有消{last['仅L2_cons_users']} 无消{last['仅L2_no_cons']} 人均{last['仅L2_avg_all']} 有消人均{last['仅L2_avg_cons']} +L1+L2: 付费{last['L1+L2_paid']} 有消{last['L1+L2_cons_users']} 无消{last['L1+L2_no_cons']} 人均{last['L1+L2_avg_all']} 有消人均{last['L1+L2_avg_cons']} +合计: 付费{last['合计_paid']} 有消{last['合计_cons_users']} 无消{last['合计_no_cons']} 人均{last['合计_avg_all']} 有消人均{last['合计_avg_cons']} +""") diff --git a/scripts/course_excel_v3.py b/scripts/course_excel_v3.py new file mode 100644 index 0000000..c4c32de --- /dev/null +++ b/scripts/course_excel_v3.py @@ -0,0 +1,287 @@ +#!/usr/bin/env python3 +import psycopg2 +from collections import defaultdict +from datetime import datetime, timedelta, date +import openpyxl +from openpyxl.styles import Font, Alignment, PatternFill, Border, Side +from openpyxl.chart import LineChart, BarChart, Reference +from openpyxl.utils import get_column_letter + +conn = psycopg2.connect( + 
host="bj-postgres-16pob4sg.sql.tencentcdb.com", + port=28591, user="ai_member", + password="LdfjdjL83h3h3^$&**YGG*", dbname="vala_bi" +) +cur = conn.cursor() + +u0_chapters = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348} +overall_start = date(2025, 9, 1) +overall_end = date(2026, 5, 11) + +weeks = [] +d = overall_start +while d < overall_end: + ws = d + we = d + timedelta(days=6 - d.weekday()) + if we >= overall_end: we = overall_end - timedelta(days=1) + weeks.append((ws, we)) + d = we + timedelta(days=1) + +print("分类付费用户...") +cur.execute(""" + SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date, + CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1' + WHEN o.goods_id = 61 THEN 'L1+L2' + WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2' + ELSE '其他' END as level_type + FROM bi_vala_order o + INNER JOIN bi_vala_app_account a ON o.account_id = a.id + WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL +""") +orders = cur.fetchall() + +cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3") +refund_trades = set(r[0] for r in cur.fetchall()) + +user_levels = defaultdict(set) +user_orders = defaultdict(list) +for aid, trade_no, order_status, pay_date, lt in orders: + is_refunded = (order_status == 4 and trade_no in refund_trades) + user_levels[aid].add(lt) + user_orders[aid].append((pay_date.date(), is_refunded)) + +def is_paid(aid, as_of): + return sum(1 for pd, ref in user_orders[aid] if pd <= as_of and not ref) > 0 + +l1_pool = {aid for aid, lv in user_levels.items() if 'L1' in lv or 'L1+L2' in lv} +l2_pool = {aid for aid, lv in user_levels.items() if 'L2' in lv or 'L1+L2' in lv} +all_pool = l1_pool | l2_pool + +print(f"L1池: {len(l1_pool)}, L2池: {len(l2_pool)}, 合计: {len(all_pool)}") + +print("查询课消...") +cons_map = {} +for ti in range(8): + tbl = f"bi_user_chapter_play_record_{ti}" + cur.execute(f"""SELECT user_id, chapter_id, updated_at FROM {tbl} + WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < 
'2026-05-11'""") + for uid, cid, ua in cur.fetchall(): + if cid in u0_chapters: continue + key = (uid, cid) + d = ua.date() if hasattr(ua, 'date') else datetime.strptime(str(ua)[:10], '%Y-%m-%d').date() + if key not in cons_map or d < cons_map[key]: + cons_map[key] = d + +print("角色映射...") +all_uids = list(set(k[0] for k in cons_map)) +char2acct = {} +for i in range(0, len(all_uids), 500): + batch = all_uids[i:i+500] + ph = ','.join(['%s'] * len(batch)) + cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch) + for cid, aid in cur.fetchall(): char2acct[cid] = aid + +print("按周汇总...") +results = [] +for ws, we in weeks: + l1_paid = {aid for aid in l1_pool if is_paid(aid, we)} + l2_paid = {aid for aid in l2_pool if is_paid(aid, we)} + t_paid = {aid for aid in all_pool if is_paid(aid, we)} + + l1_cons, l1_cons_users = 0, set() + l2_cons, l2_cons_users = 0, set() + t_cons, t_cu = 0, set() + + for (uid, ch_id), cons_date in cons_map.items(): + if ws <= cons_date <= we: + aid = char2acct.get(uid) + if not aid: continue + if aid in l1_paid: + l1_cons += 1 + l1_cons_users.add(aid) + if aid in l2_paid: + l2_cons += 1 + l2_cons_users.add(aid) + if aid in t_paid: + t_cons += 1 + t_cu.add(aid) + + results.append({ + 'ws': ws, 'we': we, + 'L1_paid': len(l1_paid), 'L1_cons': l1_cons, 'L1_cons_users': len(l1_cons_users), + 'L1_no_cons': len(l1_paid) - len(l1_cons_users), + 'L1_avg_all': round(l1_cons / len(l1_paid), 2) if l1_paid else 0, + 'L1_avg_cons': round(l1_cons / len(l1_cons_users), 2) if l1_cons_users else 0, + 'L2_paid': len(l2_paid), 'L2_cons': l2_cons, 'L2_cons_users': len(l2_cons_users), + 'L2_no_cons': len(l2_paid) - len(l2_cons_users), + 'L2_avg_all': round(l2_cons / len(l2_paid), 2) if l2_paid else 0, + 'L2_avg_cons': round(l2_cons / len(l2_cons_users), 2) if l2_cons_users else 0, + 'total_paid': len(t_paid), 'total_cons': t_cons, 'total_cons_users': len(t_cu), + 'total_no_cons': len(t_paid) - len(t_cu), + 'total_avg_all': 
round(t_cons / len(t_paid), 2) if t_paid else 0, + 'total_avg_cons': round(t_cons / len(t_cu), 2) if t_cu else 0, + }) + +cur.close() +conn.close() + +print("\n生成 Excel...") +wb = openpyxl.Workbook() +wb.remove(wb.active) + +hfont = Font(name='微软雅黑', bold=True, size=9, color='FFFFFF') +hfill = PatternFill(start_color='2F5496', end_color='2F5496', fill_type='solid') +dfont = Font(name='微软雅黑', size=9) +tfont = Font(name='微软雅黑', bold=True, size=14, color='2F5496') +sfont = Font(name='微软雅黑', bold=True, size=11, color='2F5496') +bd = Border(left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin')) +ctr = Alignment(horizontal='center', vertical='center') + +def ac(ws, r, c, v, font=dfont, fill=None, align=ctr): + cl = ws.cell(row=r, column=c, value=v) + cl.font, cl.border, cl.alignment = font, bd, align + if fill: cl.fill = fill + return cl + +def ah(ws, r, c, v): + cl = ws.cell(row=r, column=c, value=v) + cl.font, cl.fill, cl.border, cl.alignment = hfont, hfill, bd, ctr + return cl + +# Sheet 1: 概览 +ws1 = wb.create_sheet("概览") +ws1.merge_cells('A1:H1') +ac(ws1, 1, 1, "付费用户课消分析(剔除U0序章)", font=tfont, fill=None, align=Alignment(horizontal='left')) + +notes = [ + "口径:L1付费用户 = 买过L1商品(含L1+L2)的付费用户 | L2付费用户 = 买过L2商品(含L1+L2)的付费用户", + "L1+L2用户同时出现在L1和L2两个视角中 | 合计为去重统计", + "课消:用户首次完成某一课时(剔除U0序章,仅U1+)", + "付费用户:status=1 + 未删除 + 有未退款订单", +] +for i, n in enumerate(notes): + ws1.merge_cells(f'A{3+i}:H{3+i}') + ac(ws1, 3+i, 1, n, font=Font(name='微软雅黑', size=9, color='666666'), fill=None, align=Alignment(horizontal='left')) + +row = 9 +ws1.merge_cells(f'A{row}:H{row}') +ac(ws1, row, 1, "汇总(截至最后一周)", font=sfont, fill=None, align=Alignment(horizontal='left')) +row += 1 + +for j, h in enumerate(['分类', '付费用户', '有课消', '无课消', '无课消率', '人均课消', '有消人均'], 1): + ah(ws1, row, j, h) +row += 1 + +last = results[-1] +summary = [ + ('L1付费群', last['L1_paid'], last['L1_cons_users'], last['L1_no_cons'], last['L1_avg_all'], last['L1_avg_cons'], '#A8CFF1'), + 
('L2付费群', last['L2_paid'], last['L2_cons_users'], last['L2_no_cons'], last['L2_avg_all'], last['L2_avg_cons'], '#F4A9A0'), + ('合计(去重)', last['total_paid'], last['total_cons_users'], last['total_no_cons'], last['total_avg_all'], last['total_avg_cons'], '#C8E6C9'), +] +for name, p, cu, nc, aa, ac_, clr in summary: + no_rate = f"{nc/p*100:.0f}%" if p else "0%" + fl = PatternFill(start_color='00'+clr[1:], end_color='00'+clr[1:], fill_type='solid') + for j, v in enumerate([name, p, cu, nc, no_rate, aa, ac_], 1): + f = Font(name='微软雅黑', bold=(j==1), size=10) + ac(ws1, row, j, v, font=f, fill=fl) + row += 1 + +# Sheet 2: 每周明细 +ws2 = wb.create_sheet("每周明细") +headers = ['周', '周一起', '周日'] +for prefix in ['合计', 'L1付费群', 'L2付费群']: + for m in ['付费', '有消', '无消', '课消', '人均', '有消人均']: + headers.append(f'{prefix}{m}') + +for j, h in enumerate(headers, 1): + ah(ws2, 1, j, h) + +for ri, r in enumerate(results): + rw = ri + 2 + ac(ws2, rw, 1, r['ws'].strftime('%m/%d')) + ac(ws2, rw, 2, r['ws'].strftime('%Y-%m-%d')) + ac(ws2, rw, 3, r['we'].strftime('%Y-%m-%d')) + col = 4 + for prefix in ['total', 'L1', 'L2']: + for k in ['paid', 'cons_users', 'no_cons', 'cons', 'avg_all', 'avg_cons']: + ac(ws2, rw, col, r[f'{prefix}_{k}']) + col += 1 + +for ci in range(1, len(headers)+1): + ws2.column_dimensions[get_column_letter(ci)].width = 11 if ci <= 3 else 10 +ws2.freeze_panes = 'D2' + +# Sheet 3: L1图表 +ws_l1 = wb.create_sheet("L1图表") +lh = ['周', '付费用户', '有课消用户', '无课消用户', '课消总数', '人均课消', '有消人均'] +first = next(i for i, r in enumerate(results) if r['L1_paid'] > 0) +l1d = results[first:] +for j, h in enumerate(lh, 1): ah(ws_l1, 1, j, h) +for ri, r in enumerate(l1d): + rw = ri + 2 + ac(ws_l1, rw, 1, r['ws'].strftime('%m/%d')) + for j, k in enumerate(['L1_paid','L1_cons_users','L1_no_cons','L1_cons','L1_avg_all','L1_avg_cons'], 2): + ac(ws_l1, rw, j, r[k]) + +n = len(l1d) +cr = Reference(ws_l1, min_col=1, min_row=2, max_row=n+1) + +ch1 = BarChart(); ch1.type = "col"; ch1.grouping = "stacked" 
+ch1.title = "L1付费用户周课消分布(剔除U0序章)"; ch1.style = 10; ch1.width = 24; ch1.height = 13 +r1 = Reference(ws_l1, min_col=3, min_row=1, max_row=n+1); ch1.add_data(r1, titles_from_data=True) +r2 = Reference(ws_l1, min_col=4, min_row=1, max_row=n+1); ch1.add_data(r2, titles_from_data=True) +ch1.set_categories(cr) +ch1.series[0].graphicalProperties.solidFill = 'A8CFF1' +ch1.series[1].graphicalProperties.solidFill = 'D9D9D9' +ch1.y_axis.title = '用户数'; ch1.legend.position = 'b' +ws_l1.add_chart(ch1, "A9") + +ch2 = LineChart(); ch2.title = "L1付费用户周人均课消趋势(剔除U0序章)"; ch2.style = 10; ch2.width = 24; ch2.height = 13 +r3 = Reference(ws_l1, min_col=6, min_row=1, max_row=n+1); ch2.add_data(r3, titles_from_data=True) +r4 = Reference(ws_l1, min_col=7, min_row=1, max_row=n+1); ch2.add_data(r4, titles_from_data=True) +ch2.set_categories(cr) +ch2.series[0].graphicalProperties.line.solidFill = '999999'; ch2.series[0].graphicalProperties.line.width = 20000 +ch2.series[1].graphicalProperties.line.solidFill = '4A90D9'; ch2.series[1].graphicalProperties.line.width = 28000 +ch2.y_axis.scaling.min = 0; ch2.y_axis.title = '课消数(节/周)'; ch2.legend.position = 'b' +ws_l1.add_chart(ch2, "A27") +for ci in range(1, 8): ws_l1.column_dimensions[get_column_letter(ci)].width = 12 + +# Sheet 4: L2图表 +ws_l2 = wb.create_sheet("L2图表") +first2 = next(i for i, r in enumerate(results) if r['L2_paid'] > 0) +l2d = results[first2:] +for j, h in enumerate(lh, 1): ah(ws_l2, 1, j, h) +for ri, r in enumerate(l2d): + rw = ri + 2 + ac(ws_l2, rw, 1, r['ws'].strftime('%m/%d')) + for j, k in enumerate(['L2_paid','L2_cons_users','L2_no_cons','L2_cons','L2_avg_all','L2_avg_cons'], 2): + ac(ws_l2, rw, j, r[k]) + +n2 = len(l2d) +cr2 = Reference(ws_l2, min_col=1, min_row=2, max_row=n2+1) + +ch3 = BarChart(); ch3.type = "col"; ch3.grouping = "stacked" +ch3.title = "L2付费用户周课消分布(剔除U0序章)"; ch3.style = 10; ch3.width = 24; ch3.height = 13 +r5 = Reference(ws_l2, min_col=3, min_row=1, max_row=n2+1); ch3.add_data(r5, titles_from_data=True) 
+r6 = Reference(ws_l2, min_col=4, min_row=1, max_row=n2+1); ch3.add_data(r6, titles_from_data=True) +ch3.set_categories(cr2) +ch3.series[0].graphicalProperties.solidFill = 'F4A9A0' +ch3.series[1].graphicalProperties.solidFill = 'D9D9D9' +ch3.y_axis.title = '用户数'; ch3.legend.position = 'b' +ws_l2.add_chart(ch3, "A9") + +ch4 = LineChart(); ch4.title = "L2付费用户周人均课消趋势(剔除U0序章)"; ch4.style = 10; ch4.width = 24; ch4.height = 13 +r7 = Reference(ws_l2, min_col=6, min_row=1, max_row=n2+1); ch4.add_data(r7, titles_from_data=True) +r8 = Reference(ws_l2, min_col=7, min_row=1, max_row=n2+1); ch4.add_data(r8, titles_from_data=True) +ch4.set_categories(cr2) +ch4.series[0].graphicalProperties.line.solidFill = '999999'; ch4.series[0].graphicalProperties.line.width = 20000 +ch4.series[1].graphicalProperties.line.solidFill = 'E85D47'; ch4.series[1].graphicalProperties.line.width = 28000 +ch4.y_axis.scaling.min = 0; ch4.y_axis.title = '课消数(节/周)'; ch4.legend.position = 'b' +ws_l2.add_chart(ch4, "A27") +for ci in range(1, 8): ws_l2.column_dimensions[get_column_letter(ci)].width = 12 + +path = '/root/.openclaw/workspace/output/course_consumption_by_level_v3.xlsx' +wb.save(path) +print(f"\n✅ {path}") +print(f"L1付费群: {last['L1_paid']}人 | L2付费群: {last['L2_paid']}人 | 合计(去重): {last['total_paid']}人") +print(f"L1无消率: {last['L1_no_cons']/last['L1_paid']*100:.0f}% | L2无消率: {last['L2_no_cons']/last['L2_paid']*100:.0f}%") diff --git a/scripts/excel_v4.py b/scripts/excel_v4.py new file mode 100644 index 0000000..d387e17 --- /dev/null +++ b/scripts/excel_v4.py @@ -0,0 +1,129 @@ +#!/usr/bin/env python3 +"""Excel v4: L1只看L1课程, L2只看L2课程""" +import json, openpyxl +from datetime import date +from openpyxl.styles import Font, Alignment, PatternFill, Border, Side +from openpyxl.chart import LineChart, BarChart, Reference +from openpyxl.utils import get_column_letter + +with open('/root/.openclaw/workspace/output/course_data_v4.json') as f: + raw = json.load(f) +results = raw['results'] + +for r in results: + 
r['ws'] = date.fromisoformat(r['ws']) + r['we'] = date.fromisoformat(r['we']) + +wb = openpyxl.Workbook() +wb.remove(wb.active) +hfont = Font(name='微软雅黑', bold=True, size=9, color='FFFFFF') +hfill = PatternFill(start_color='002F5496', end_color='002F5496', fill_type='solid') +dfont = Font(name='微软雅黑', size=9) +tfont = Font(name='微软雅黑', bold=True, size=14, color='002F5496') +sfont = Font(name='微软雅黑', bold=True, size=11, color='002F5496') +bd = Border(left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin')) +ctr = Alignment(horizontal='center', vertical='center') + +def ac(ws, r, c, v, font=dfont, fill=None, align=ctr): + cl = ws.cell(row=r, column=c, value=v) + cl.font, cl.border, cl.alignment = font, bd, align + if fill: cl.fill = fill + +def ah(ws, r, c, v): + cl = ws.cell(row=r, column=c, value=v) + cl.font, cl.fill, cl.border, cl.alignment = hfont, hfill, bd, ctr + +# Sheet 1 +ws1 = wb.create_sheet("概览") +ws1.merge_cells('A1:H1') +ac(ws1,1,1,"付费用户课消分析 v4(只看对应级别课程,剔除U0)",font=tfont,align=Alignment(horizontal='left')) +notes = [ + "口径:L1付费群 = 买过L1商品的付费用户, 只看L1课程课消 | L2付费群 = 买过L2商品的付费用户, 只看L2课程课消", + "L1+L2用户:在L1视角只统计L1课程课消, L2视角只统计L2课程课消", + "课消:用户首次完成某一课时(剔除U0序章)", + "付费用户:status=1 + 未删除 + 有未退款订单", +] +for i,n in enumerate(notes): + ws1.merge_cells(f'A{3+i}:H{3+i}') + ac(ws1,3+i,1,n,font=Font(name='微软雅黑',size=9,color='666666'),align=Alignment(horizontal='left')) + +row=9 +ws1.merge_cells(f'A{row}:H{row}') +ac(ws1,row,1,"汇总(截至最后一周)",font=sfont,align=Alignment(horizontal='left')) +row+=1 +for j,h in enumerate(['分类','付费用户','有课消','无课消','无课消率','人均课消','有消人均'],1): + ah(ws1,row,j,h) +row+=1 + +last=results[-1] +skus = [ + ('L1付费群(只看L1课程)', last['L1_paid'],last['L1_cons_users'],last['L1_no_cons'],last['L1_avg_all'],last['L1_avg_cons'], '00A8CFF1'), + ('L2付费群(只看L2课程)', last['L2_paid'],last['L2_cons_users'],last['L2_no_cons'],last['L2_avg_all'],last['L2_avg_cons'], '00F4A9A0'), + ('合计(去重)', 
last['total_paid'],last['total_cons_users'],last['total_no_cons'],last['total_avg_all'],last['total_avg_cons'], '00C8E6C9'), +] +for name,p,cu,nc,aa,ac_,clr in skus: + no_rate=f"{nc/p*100:.0f}%" if p else "0%" + fl=PatternFill(start_color=clr,end_color=clr,fill_type='solid') + for j,v in enumerate([name,p,cu,nc,no_rate,aa,ac_],1): + ac(ws1,row,j,v,font=Font(name='微软雅黑',bold=(j==1),size=10),fill=fl) + row+=1 + +# Sheet 2 +ws2=wb.create_sheet("每周明细") +headers=['周','周一起','周日'] +for pfx in ['合计','L1付费群','L2付费群']: + for m in ['付费','有消','无消','课消','人均','有消人均']: + headers.append(f'{pfx}{m}') +for j,h in enumerate(headers,1): ah(ws2,1,j,h) +for ri,r in enumerate(results): + rw=ri+2 + ac(ws2,rw,1,r['ws'].strftime('%m/%d')) + ac(ws2,rw,2,r['ws'].strftime('%Y-%m-%d')) + ac(ws2,rw,3,r['we'].strftime('%Y-%m-%d')) + col=4 + for prefix in ['total','L1','L2']: + for k in ['paid','cons_users','no_cons','cons','avg_all','avg_cons']: + ac(ws2,rw,col,r[f'{prefix}_{k}']) + col+=1 +for ci in range(1,len(headers)+1): + ws2.column_dimensions[get_column_letter(ci)].width=11 if ci<=3 else 10 +ws2.freeze_panes='D2' + +# Sheet 3+4: charts +for lvl, pf, clr in [('L1','L1','4A90D9'),('L2','L2','E85D47')]: + ws=wb.create_sheet(f"{pf}图表") + lh=['周','付费用户','有课消用户','无课消用户','课消总数','人均课消','有消人均'] + first=next(i for i,r in enumerate(results) if r[f'{pf}_paid']>0) + ld=results[first:] + for j,h in enumerate(lh,1): ah(ws,1,j,h) + for ri,r in enumerate(ld): + rw=ri+2 + ac(ws,rw,1,r['ws'].strftime('%m/%d')) + for j,k in enumerate([f'{pf}_paid',f'{pf}_cons_users',f'{pf}_no_cons',f'{pf}_cons',f'{pf}_avg_all',f'{pf}_avg_cons'],2): + ac(ws,rw,j,r[k]) + n=len(ld) + cr=Reference(ws,min_col=1,min_row=2,max_row=n+1) + + ch1=BarChart(); ch1.type="col"; ch1.grouping="stacked" + ch1.title=f"{pf}付费用户周课消分布(只看{pf}课程)"; ch1.style=10; ch1.width=24; ch1.height=13 + ch1.add_data(Reference(ws,min_col=3,min_row=1,max_row=n+1),titles_from_data=True) + 
ch1.add_data(Reference(ws,min_col=4,min_row=1,max_row=n+1),titles_from_data=True) + ch1.set_categories(cr) + ch1.series[0].graphicalProperties.solidFill='A8CFF1' if pf=='L1' else 'F4A9A0' + ch1.series[1].graphicalProperties.solidFill='D9D9D9' + ch1.y_axis.title='用户数'; ch1.legend.position='b' + ws.add_chart(ch1,"A9") + + ch2=LineChart(); ch2.title=f"{pf}付费用户周人均课消趋势(只看{pf}课程)"; ch2.style=10; ch2.width=24; ch2.height=13 + ch2.add_data(Reference(ws,min_col=6,min_row=1,max_row=n+1),titles_from_data=True) + ch2.add_data(Reference(ws,min_col=7,min_row=1,max_row=n+1),titles_from_data=True) + ch2.set_categories(cr) + ch2.series[0].graphicalProperties.line.solidFill='999999'; ch2.series[0].graphicalProperties.line.width=20000 + ch2.series[1].graphicalProperties.line.solidFill=clr; ch2.series[1].graphicalProperties.line.width=28000 + ch2.y_axis.scaling.min=0; ch2.y_axis.title='课消数(节/周)'; ch2.legend.position='b' + ws.add_chart(ch2,"A27") + for ci in range(1,8): ws.column_dimensions[get_column_letter(ci)].width=12 + +path='/root/.openclaw/workspace/output/course_consumption_by_level_v4.xlsx' +wb.save(path) +print(f'✅ {path}') diff --git a/scripts/generate_charts.py b/scripts/generate_charts.py new file mode 100644 index 0000000..2e9be6f --- /dev/null +++ b/scripts/generate_charts.py @@ -0,0 +1,247 @@ +#!/usr/bin/env python3 +""" +生成 4 张课消图表(剔除U0序章): +1. L1 付费用户课消分布(堆叠柱状图) +2. L2 付费用户课消分布(堆叠柱状图) +3. L1 周人均课消趋势(折线图) +4. 
L2 周人均课消趋势(折线图) +""" +import psycopg2 +from collections import defaultdict +from datetime import datetime, timedelta, date +import matplotlib +matplotlib.use('Agg') +import matplotlib.pyplot as plt +import matplotlib.dates as mdates +import matplotlib.ticker as ticker +import numpy as np + +# 中文字体 +import matplotlib.font_manager as fm +font_path = '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc' +fm.fontManager.addfont(font_path) +prop = fm.FontProperties(fname=font_path) +font_name = prop.get_name() +plt.rcParams['font.family'] = font_name +plt.rcParams['axes.unicode_minus'] = False +print(f'使用字体: {font_name}') + +conn = psycopg2.connect( + host="bj-postgres-16pob4sg.sql.tencentcdb.com", + port=28591, user="ai_member", + password="LdfjdjL83h3h3^$&**YGG*", dbname="vala_bi" +) +cur = conn.cursor() + +# ===== 配置 ===== +u0_chapters = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348} +overall_start = date(2025, 9, 1) +overall_end = date(2026, 5, 11) + +weeks = [] +d = overall_start +while d < overall_end: + ws = d + we = d + timedelta(days=6 - d.weekday()) + if we >= overall_end: + we = overall_end - timedelta(days=1) + weeks.append((ws, we)) + d = we + timedelta(days=1) + +# ===== 用户分类 ===== +print("分类付费用户...") +cur.execute(""" + SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date, + CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1' + WHEN o.goods_id = 61 THEN 'L1+L2' + WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2' + ELSE '其他' END as level_type + FROM bi_vala_order o + INNER JOIN bi_vala_app_account a ON o.account_id = a.id + WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL +""") +orders = cur.fetchall() + +cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3") +refund_trades = set(r[0] for r in cur.fetchall()) + +user_data = defaultdict(lambda: {'levels': set(), 'orders': []}) +for aid, trade_no, order_status, pay_date, lt in orders: + is_refunded = (order_status == 4 and trade_no in refund_trades) + 
user_data[aid]['levels'].add(lt) + user_data[aid]['orders'].append((pay_date.date(), is_refunded, lt)) + +def classify(levels): + h1, h2 = 'L1' in levels, 'L2' in levels + return 'L1+L2' if ('L1+L2' in levels or (h1 and h2)) else ('仅L1' if h1 else ('仅L2' if h2 else '其他')) + +for aid in user_data: + user_data[aid]['category'] = classify(user_data[aid]['levels']) + +def is_paid(aid, as_of): + return sum(1 for pd, ref, lt in user_data[aid]['orders'] if pd <= as_of and not ref) > 0 + +# ===== 课消 ===== +print("查询课消...") +cons_map = {} +for table_idx in range(8): + tbl = f"bi_user_chapter_play_record_{table_idx}" + cur.execute(f""" + SELECT user_id, chapter_id, updated_at + FROM {tbl} + WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11' + """) + for uid, cid, ua in cur.fetchall(): + if cid in u0_chapters: continue + key = (uid, cid) + d = ua.date() if hasattr(ua, 'date') else datetime.strptime(str(ua)[:10], '%Y-%m-%d').date() + if key not in cons_map or d < cons_map[key]: + cons_map[key] = d + +# 角色映射 +print("角色映射...") +all_uids = list(set(k[0] for k in cons_map)) +char2acct = {} +bs = 500 +for i in range(0, len(all_uids), bs): + batch = all_uids[i:i+bs] + ph = ','.join(['%s'] * len(batch)) + cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch) + for cid, aid in cur.fetchall(): + char2acct[cid] = aid + +# ===== 按周汇总 ===== +print("按周汇总...") +results = [] +for ws, we in weeks: + paid_by_cat = defaultdict(set) + for aid in user_data: + if is_paid(aid, we): + paid_by_cat[user_data[aid]['category']].add(aid) + + cons_by_cat = defaultdict(int) + cons_users_by_cat = defaultdict(set) + + for (uid, ch_id), cons_date in cons_map.items(): + if ws <= cons_date <= we: + aid = char2acct.get(uid) + if aid: + cat = user_data.get(aid, {}).get('category', '其他') + if aid in paid_by_cat.get(cat, set()): + cons_by_cat[cat] += 1 + cons_users_by_cat[cat].add(aid) + + row = {'ws': ws, 'we': we} + for cat in ['仅L1', '仅L2', 
'L1+L2']: + n_paid = len(paid_by_cat.get(cat, set())) + n_cons = cons_by_cat.get(cat, 0) + n_cons_users = len(cons_users_by_cat.get(cat, set())) + row[f'{cat}_paid'] = n_paid + row[f'{cat}_cons'] = n_cons + row[f'{cat}_cons_users'] = n_cons_users + row[f'{cat}_no_cons'] = n_paid - n_cons_users + row[f'{cat}_avg_all'] = round(n_cons / n_paid, 2) if n_paid > 0 else 0 + row[f'{cat}_avg_cons'] = round(n_cons / n_cons_users, 2) if n_cons_users > 0 else 0 + results.append(row) + +cur.close() +conn.close() + +# ===== 图表生成 ===== +print("\n生成图表...") +output_dir = '/root/.openclaw/workspace/output' + +configs = { + 'L1': {'cat': '仅L1', 'color': '#4A90D9', 'light': '#A8CFF1', 'label': 'L1'}, + 'L2': {'cat': '仅L2', 'color': '#E85D47', 'light': '#F4A9A0', 'label': 'L2'}, +} + +for key, cfg in configs.items(): + cat = cfg['cat'] + color = cfg['color'] + light = cfg['light'] + label = cfg['label'] + + # 过滤无数据周 + first = next(i for i, r in enumerate(results) if r[f'{cat}_paid'] > 0) + data = results[first:] + + xs = [r['ws'] + timedelta(days=3) for r in data] + labels = [r['ws'].strftime('%m/%d') for r in data] + paid = [r[f'{cat}_paid'] for r in data] + cons_users = [r[f'{cat}_cons_users'] for r in data] + no_cons = [r[f'{cat}_no_cons'] for r in data] + avg_all = [r[f'{cat}_avg_all'] for r in data] + avg_cons = [r[f'{cat}_avg_cons'] for r in data] + + # --- 图1: 堆叠柱状图 --- + fig, ax = plt.subplots(figsize=(18, 8)) + + x_idx = np.arange(len(xs)) + bar_w = 0.65 + + p1 = ax.bar(x_idx, cons_users, bar_w, color=light, label='有课消用户', zorder=3) + p2 = ax.bar(x_idx, no_cons, bar_w, bottom=cons_users, color='#D0D0D0', label='无课消用户', zorder=3) + + # 标注付费总数 + for i, (p, c, n) in enumerate(zip(paid, cons_users, no_cons)): + if i % max(1, len(data)//12) == 0: + ax.annotate(str(p), (i, p), textcoords='offset points', xytext=(0, 6), + fontsize=8, ha='center', color='#333333', fontweight='bold') + + ax.set_xticks(x_idx[::max(1, len(data)//12)]) + ax.set_xticklabels([labels[i] for i in range(0, 
len(data), max(1, len(data)//12))], fontsize=9, rotation=45) + + ax.set_ylabel('用户数', fontsize=13) + ax.set_title(f'{label} 付费用户周课消分布(剔除U0序章)', fontsize=16, fontweight='bold') + ax.legend(fontsize=12, loc='upper left') + ax.grid(axis='y', alpha=0.3, zorder=0) + ax.set_xlim(-0.5, len(x_idx) - 0.5) + + # 无消率标注 + no_rate = no_cons[-1] / paid[-1] * 100 if paid[-1] else 0 + ax.text(0.97, 0.95, f'无课消率: {no_rate:.0f}%', transform=ax.transAxes, + fontsize=11, ha='right', va='top', color='#999999', fontstyle='italic') + + plt.tight_layout() + path1 = f'{output_dir}/{key}_users_stack.png' + plt.savefig(path1, dpi=150, bbox_inches='tight', facecolor='white') + plt.close() + print(f' ✅ {path1}') + + # --- 图2: 折线图 --- + fig, ax = plt.subplots(figsize=(18, 8)) + + ax.plot(xs, avg_all, 'o-', color='#999999', linewidth=2.2, markersize=5, + label='周人均课消(全部付费用户)', linestyle='--', markerfacecolor='white') + ax.plot(xs, avg_cons, 's-', color=color, linewidth=2.8, markersize=5, + label='周有消人均课消', markerfacecolor='white') + + # 填色区域 + ax.fill_between(xs, avg_all, avg_cons, alpha=0.08, color=color) + + # 标注关键数据点 + for i in range(len(xs)): + if i % max(1, len(data)//8) == 0: + ax.annotate(f'{avg_all[i]:.1f}', (xs[i], avg_all[i]), textcoords='offset points', + xytext=(0, -16), fontsize=7.5, color='#999999', ha='center') + ax.annotate(f'{avg_cons[i]:.1f}', (xs[i], avg_cons[i]), textcoords='offset points', + xytext=(0, 8), fontsize=7.5, color=color, ha='center', fontweight='bold') + + ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d')) + ax.xaxis.set_major_locator(mdates.MonthLocator()) + plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, fontsize=9) + + ax.set_ylabel('课消数(节/周)', fontsize=13) + ax.set_title(f'{label} 周人均课消趋势(剔除U0序章)', fontsize=16, fontweight='bold') + ax.legend(fontsize=12, loc='upper left') + ax.grid(True, alpha=0.3) + ax.set_xlim(date(2025, 8, 30), date(2026, 5, 12)) + + plt.tight_layout() + path2 = f'{output_dir}/{key}_avg_trend.png' + plt.savefig(path2, 
dpi=150, bbox_inches='tight', facecolor='white') + plt.close() + print(f' ✅ {path2}') + +print('\n全部 4 张图表已生成!') diff --git a/scripts/generate_charts_v3.py b/scripts/generate_charts_v3.py new file mode 100644 index 0000000..555144a --- /dev/null +++ b/scripts/generate_charts_v3.py @@ -0,0 +1,218 @@ +#!/usr/bin/env python3 +""" +图表 v2:L1付费用户 = 仅L1 + L1+L2,L2付费用户 = 仅L2 + L1+L2 +""" +import psycopg2 +from collections import defaultdict +from datetime import datetime, timedelta, date +import matplotlib +matplotlib.use('Agg') +import matplotlib.pyplot as plt +import matplotlib.dates as mdates +import matplotlib.font_manager as fm +import numpy as np + +fm.fontManager.addfont('/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc') +plt.rcParams['font.family'] = fm.FontProperties(fname='/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc').get_name() +plt.rcParams['axes.unicode_minus'] = False + +conn = psycopg2.connect( + host="bj-postgres-16pob4sg.sql.tencentcdb.com", + port=28591, user="ai_member", + password="LdfjdjL83h3h3^$&**YGG*", dbname="vala_bi" +) +cur = conn.cursor() + +u0_chapters = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348} +overall_start = date(2025, 9, 1) +overall_end = date(2026, 5, 11) + +weeks = [] +d = overall_start +while d < overall_end: + ws = d + we = d + timedelta(days=6 - d.weekday()) + if we >= overall_end: we = overall_end - timedelta(days=1) + weeks.append((ws, we)) + d = we + timedelta(days=1) + +print("分类付费用户...") +cur.execute(""" + SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date, + CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1' + WHEN o.goods_id = 61 THEN 'L1+L2' + WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2' + ELSE '其他' END as level_type + FROM bi_vala_order o + INNER JOIN bi_vala_app_account a ON o.account_id = a.id + WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL +""") +orders = cur.fetchall() + +cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3") +refund_trades 
= set(r[0] for r in cur.fetchall()) + +user_levels = defaultdict(set) +user_orders = defaultdict(list) +for aid, trade_no, order_status, pay_date, lt in orders: + is_refunded = (order_status == 4 and trade_no in refund_trades) + user_levels[aid].add(lt) + user_orders[aid].append((pay_date.date(), is_refunded)) + +def is_paid(aid, as_of): + return sum(1 for pd, ref in user_orders[aid] if pd <= as_of and not ref) > 0 + +# 分组:L1群 = 仅L1 + L1+L2;L2群 = 仅L2 + L1+L2 +l1_group = set() # 买了L1的所有用户 +l2_group = set() # 买了L2的所有用户 +for aid, levels in user_levels.items(): + has_l1 = 'L1' in levels or 'L1+L2' in levels + has_l2 = 'L2' in levels or 'L1+L2' in levels + if has_l1: l1_group.add(aid) + if has_l2: l2_group.add(aid) + +print(f"L1付费群: {len(l1_group)}人, L2付费群: {len(l2_group)}人, 重叠(L1+L2): {len(l1_group & l2_group)}人") + +print("查询课消...") +cons_map = {} +for ti in range(8): + tbl = f"bi_user_chapter_play_record_{ti}" + cur.execute(f"""SELECT user_id, chapter_id, updated_at FROM {tbl} + WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11'""") + for uid, cid, ua in cur.fetchall(): + if cid in u0_chapters: continue + key = (uid, cid) + d = ua.date() if hasattr(ua, 'date') else datetime.strptime(str(ua)[:10], '%Y-%m-%d').date() + if key not in cons_map or d < cons_map[key]: + cons_map[key] = d + +print("角色映射...") +all_uids = list(set(k[0] for k in cons_map)) +char2acct = {} +for i in range(0, len(all_uids), 500): + batch = all_uids[i:i+500] + ph = ','.join(['%s'] * len(batch)) + cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch) + for cid, aid in cur.fetchall(): char2acct[cid] = aid + +print("按周汇总...") +results = [] +for ws, we in weeks: + # 截至 we 的付费用户 + l1_paid = {aid for aid in l1_group if is_paid(aid, we)} + l2_paid = {aid for aid in l2_group if is_paid(aid, we)} + + l1_cons, l1_cons_users = 0, set() + l2_cons, l2_cons_users = 0, set() + + for (uid, ch_id), cons_date in cons_map.items(): + if ws <= 
cons_date <= we: + aid = char2acct.get(uid) + if not aid: continue + if aid in l1_paid: + l1_cons += 1 + l1_cons_users.add(aid) + if aid in l2_paid: + l2_cons += 1 + l2_cons_users.add(aid) + + results.append({ + 'ws': ws, 'we': we, + 'L1_paid': len(l1_paid), 'L1_cons': l1_cons, 'L1_cons_users': len(l1_cons_users), + 'L1_no_cons': len(l1_paid) - len(l1_cons_users), + 'L1_avg_all': round(l1_cons / len(l1_paid), 2) if l1_paid else 0, + 'L1_avg_cons': round(l1_cons / len(l1_cons_users), 2) if l1_cons_users else 0, + 'L2_paid': len(l2_paid), 'L2_cons': l2_cons, 'L2_cons_users': len(l2_cons_users), + 'L2_no_cons': len(l2_paid) - len(l2_cons_users), + 'L2_avg_all': round(l2_cons / len(l2_paid), 2) if l2_paid else 0, + 'L2_avg_cons': round(l2_cons / len(l2_cons_users), 2) if l2_cons_users else 0, + }) + +cur.close() +conn.close() + +# ===== 生成图表 ===== +print("\n生成图表...") +out = '/root/.openclaw/workspace/output' + +configs = { + 'L1_all': {'prefix': 'L1', 'color': '#4A90D9', 'light': '#A8CFF1', 'label': 'L1'}, + 'L2_all': {'prefix': 'L2', 'color': '#E85D47', 'light': '#F4A9A0', 'label': 'L2'}, +} + +for key, cfg in configs.items(): + pfx = cfg['prefix'] + color = cfg['color'] + light = cfg['light'] + label = cfg['label'] + + first = next(i for i, r in enumerate(results) if r[f'{pfx}_paid'] > 0) + data = results[first:] + + xs = [r['ws'] + timedelta(days=3) for r in data] + dates = [r['ws'] for r in data] + paid = [r[f'{pfx}_paid'] for r in data] + cons_users = [r[f'{pfx}_cons_users'] for r in data] + no_cons = [r[f'{pfx}_no_cons'] for r in data] + avg_all = [r[f'{pfx}_avg_all'] for r in data] + avg_cons = [r[f'{pfx}_avg_cons'] for r in data] + + # 图1: 堆叠柱状 + fig, ax = plt.subplots(figsize=(18, 8)) + x_idx = np.arange(len(xs)) + bar_w = 0.65 + ax.bar(x_idx, cons_users, bar_w, color=light, label='有课消用户', zorder=3) + ax.bar(x_idx, no_cons, bar_w, bottom=cons_users, color='#D0D0D0', label='无课消用户', zorder=3) + + step = max(1, len(data)//10) + for i in range(0, len(data), step): 
+ ax.annotate(str(paid[i]), (i, paid[i]), textcoords='offset points', xytext=(0, 5), + fontsize=7.5, ha='center', color='#333333', fontweight='bold') + + ax.set_xticks(x_idx[::step]) + ax.set_xticklabels([dates[i].strftime('%m/%d') for i in range(0, len(data), step)], fontsize=8.5, rotation=45) + ax.set_ylabel('用户数', fontsize=13) + ax.set_title(f'{label}付费用户周课消分布(剔除U0序章)', fontsize=16, fontweight='bold') + ax.legend(fontsize=12, loc='upper left') + ax.grid(axis='y', alpha=0.3, zorder=0) + ax.set_xlim(-0.5, len(x_idx) - 0.5) + + no_rate = no_cons[-1] / paid[-1] * 100 if paid[-1] else 0 + ax.text(0.97, 0.95, f'付费{paid[-1]}人 | 无课消率{no_rate:.0f}%', transform=ax.transAxes, + fontsize=11, ha='right', va='top', color='#666666', fontstyle='italic') + + plt.tight_layout() + plt.savefig(f'{out}/{key}_users_stack.png', dpi=150, bbox_inches='tight', facecolor='white') + plt.close() + print(f' ✅ {key}_users_stack.png') + + # 图2: 折线 + fig, ax = plt.subplots(figsize=(18, 8)) + + ax.plot(xs, avg_all, 'o-', color='#999999', linewidth=2.2, markersize=5, + label='人均课消(全部付费用户)', markerfacecolor='white') + ax.plot(xs, avg_cons, 's-', color=color, linewidth=2.8, markersize=5, + label='人均课消(有课消用户)', markerfacecolor='white') + ax.fill_between(xs, avg_all, avg_cons, alpha=0.08, color=color) + + for i in range(0, len(data), max(1, len(data)//8)): + ax.annotate(f'{avg_all[i]:.1f}', (xs[i], avg_all[i]), textcoords='offset points', + xytext=(0, -15), fontsize=7.5, color='#999999', ha='center') + ax.annotate(f'{avg_cons[i]:.1f}', (xs[i], avg_cons[i]), textcoords='offset points', + xytext=(0, 7), fontsize=7.5, color=color, ha='center', fontweight='bold') + + ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d')) + ax.xaxis.set_major_locator(mdates.MonthLocator()) + plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, fontsize=9) + ax.set_ylabel('课消数(节/周)', fontsize=13) + ax.set_title(f'{label}付费用户周人均课消趋势(剔除U0序章)', fontsize=16, fontweight='bold') + ax.legend(fontsize=12, loc='upper left') + 
ax.grid(True, alpha=0.3) + ax.set_xlim(date(2025, 8, 30), date(2026, 5, 12)) + + plt.tight_layout() + plt.savefig(f'{out}/{key}_avg_trend.png', dpi=150, bbox_inches='tight', facecolor='white') + plt.close() + print(f' ✅ {key}_avg_trend.png') + +print('\n✅ 4张图表已生成') diff --git a/scripts/generate_excel.py b/scripts/generate_excel.py new file mode 100644 index 0000000..751c7bf --- /dev/null +++ b/scripts/generate_excel.py @@ -0,0 +1,385 @@ +#!/usr/bin/env python3 +""" +生成课消指标 Excel:按周 + 按 L1/L2 拆分 +""" +import psycopg2 +from collections import defaultdict +from datetime import datetime, timedelta, date +import openpyxl +from openpyxl.styles import Font, Alignment, PatternFill, Border, Side +from openpyxl.chart import LineChart, Reference +from openpyxl.utils import get_column_letter +from openpyxl.chart.label import DataLabelList +from openpyxl.chart.series import DataPoint + +conn = psycopg2.connect( + host="bj-postgres-16pob4sg.sql.tencentcdb.com", + port=28591, user="ai_member", + password="LdfjdjL83h3h3^$&**YGG*", dbname="vala_bi" +) +cur = conn.cursor() + +# ===== 时间参数 ===== +overall_start = date(2025, 9, 1) +overall_end = date(2026, 5, 11) + +weeks = [] +d = overall_start +while d < overall_end: + ws = d + days_to_sunday = 6 - d.weekday() + we = d + timedelta(days=days_to_sunday) + if we >= overall_end: + we = overall_end - timedelta(days=1) + weeks.append((ws, we)) + d = we + timedelta(days=1) + +# ===== Step 1: 用户分类 ===== +print("Step 1: 分类付费用户...") +cur.execute(""" + SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date, + CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1' + WHEN o.goods_id = 61 THEN 'L1+L2' + WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2' + ELSE '其他' END as level_type + FROM bi_vala_order o + INNER JOIN bi_vala_app_account a ON o.account_id = a.id + WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL +""") +orders = cur.fetchall() +print(f" 订单数: {len(orders)}") + +cur.execute("SELECT trade_no FROM 
bi_refund_order WHERE status = 3") +refund_trades = set(r[0] for r in cur.fetchall()) + +user_data = defaultdict(lambda: {'levels': set(), 'orders': []}) +for aid, trade_no, order_status, pay_date, lt in orders: + is_refunded = (order_status == 4 and trade_no in refund_trades) + user_data[aid]['levels'].add(lt) + user_data[aid]['orders'].append((pay_date.date(), is_refunded, lt)) + +def classify_user(levels): + has_l1, has_l2 = 'L1' in levels, 'L2' in levels + return 'L1+L2' if ('L1+L2' in levels or (has_l1 and has_l2)) else ('仅L1' if has_l1 else ('仅L2' if has_l2 else '其他')) + +for aid in user_data: + user_data[aid]['category'] = classify_user(user_data[aid]['levels']) + +def is_paid_as_of(aid, as_of_date): + return sum(1 for pd, ref, lt in user_data[aid]['orders'] if pd <= as_of_date and not ref) > 0 + +# ===== Step 2: 课消 ===== +print("Step 2: 查询课消...") +consumption_map = {} +for table_idx in range(8): + tbl = f"bi_user_chapter_play_record_{table_idx}" + cur.execute(f""" + SELECT user_id, chapter_id, updated_at + FROM {tbl} + WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11' + """) + for user_id, chapter_id, updated_at in cur.fetchall(): + key = (user_id, chapter_id) + d = updated_at.date() if hasattr(updated_at, 'date') else datetime.strptime(str(updated_at)[:10], '%Y-%m-%d').date() + if key not in consumption_map or d < consumption_map[key]: + consumption_map[key] = d + +print(f" 去重后: {len(consumption_map)} 条") + +# ===== Step 3: 角色映射 ===== +print("Step 3: 角色映射...") +all_uids = list(set(k[0] for k in consumption_map)) +char2acct = {} +bs = 500 +for i in range(0, len(all_uids), bs): + batch = all_uids[i:i+bs] + ph = ','.join(['%s'] * len(batch)) + cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch) + for cid, aid in cur.fetchall(): + char2acct[cid] = aid +print(f" 映射: {len(char2acct)}") + +# ===== Step 4: 按周汇总 ===== +print("Step 4: 按周汇总...") +results = [] +for ws, we in weeks: + paid_by_cat = 
defaultdict(set)
+    for aid in user_data:
+        if is_paid_as_of(aid, we):
+            paid_by_cat[user_data[aid]['category']].add(aid)
+
+    cons_by_cat = defaultdict(int)
+    cons_users_by_cat = defaultdict(set)
+
+    for (uid, ch_id), cons_date in consumption_map.items():
+        if ws <= cons_date <= we:
+            aid = char2acct.get(uid)
+            if aid:
+                cat = user_data.get(aid, {}).get('category', '其他')
+                if aid in paid_by_cat.get(cat, set()):
+                    cons_by_cat[cat] += 1
+                    cons_users_by_cat[cat].add(aid)
+
+    row = {'ws': ws, 'we': we}
+    for cat in ['仅L1', '仅L2', 'L1+L2', '其他', '合计']:
+        if cat == '合计':
+            n_paid = sum(len(v) for v in paid_by_cat.values())
+            n_cons = sum(cons_by_cat.values())
+            n_cons_users = len(set.union(*cons_users_by_cat.values())) if cons_users_by_cat else 0
+        else:
+            n_paid = len(paid_by_cat.get(cat, set()))
+            n_cons = cons_by_cat.get(cat, 0)
+            n_cons_users = len(cons_users_by_cat.get(cat, set()))
+
+        row[f'{cat}_paid'] = n_paid
+        row[f'{cat}_cons'] = n_cons
+        row[f'{cat}_cons_users'] = n_cons_users
+        row[f'{cat}_avg_all'] = round(n_cons / n_paid, 2) if n_paid > 0 else 0
+        row[f'{cat}_avg_cons'] = round(n_cons / n_cons_users, 2) if n_cons_users > 0 else 0
+
+    results.append(row)
+
+cur.close()
+conn.close()
+
+# ===== Generate Excel =====
+print("\n生成 Excel...")
+wb = openpyxl.Workbook()
+
+# Styles
+header_font = Font(name='微软雅黑', bold=True, size=10, color='FFFFFF')
+header_fill = PatternFill(start_color='2F5496', end_color='2F5496', fill_type='solid')
+data_font = Font(name='微软雅黑', size=10)
+title_font = Font(name='微软雅黑', bold=True, size=14, color='2F5496')
+subtitle_font = Font(name='微软雅黑', bold=True, size=11, color='2F5496')
+border = Border(left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin'))
+center = Alignment(horizontal='center', vertical='center')
+
+l1_fill = PatternFill(start_color='DAEEF3', end_color='DAEEF3', fill_type='solid')
+l2_fill = PatternFill(start_color='FDE9D9', end_color='FDE9D9', fill_type='solid')
+l1l2_fill = PatternFill(start_color='E4DFEC', end_color='E4DFEC', fill_type='solid')
+total_fill = PatternFill(start_color='D9EAD3', end_color='D9EAD3', fill_type='solid')
+
+def apply_cell(ws, row, col, value, font=data_font, fill=None, border_style=border, align=center):
+    c = ws.cell(row=row, column=col, value=value)
+    c.font, c.border, c.alignment = font, border_style, align
+    if fill: c.fill = fill
+    return c
+
+def apply_header(ws, row, col, value):
+    c = ws.cell(row=row, column=col, value=value)
+    c.font, c.fill, c.border, c.alignment = header_font, header_fill, border, center
+    return c
+
+# ===== Sheet 1: Overview =====
+ws1 = wb.active
+ws1.title = "概览"
+ws1.merge_cells('A1:G1')
+apply_cell(ws1, 1, 1, "付费用户 L1/L2 课消分析", font=title_font, border_style=Border(), align=Alignment(horizontal='left'))
+ws1.merge_cells('A2:G2')
+apply_cell(ws1, 2, 1, "数据区间: 2025-09-01 ~ 2026-05-10 | 更新日期: 2026-05-14", font=Font(name='微软雅黑', size=9, color='666666'), border_style=Border(), align=Alignment(horizontal='left'))
+
+# Metric definitions
+notes = [
+    "口径说明:",
+    "• 课消:用户首次完成某一课时(play_status=1,按(user_id,chapter_id)取最早updated_at)",
+    "• L1商品: goods_id IN (57,60,63) | L2商品: goods_id IN (31,32,33,54) | L1+L2商品: goods_id=61",
+    "• 付费用户:status=1 + deleted_at IS NULL + 有订单 + 未全部退款",
+    "• 人均课消 = 周内课消次数 / 付费用户数",
+    "• 有消用户人均 = 周内课消次数 / 至少完成1次课消的付费用户数",
+]
+for i, note in enumerate(notes):
+    apply_cell(ws1, 4+i, 1, note, font=Font(name='微软雅黑', size=9), border_style=Border(), align=Alignment(horizontal='left'))
+
+# Summary table
+row = 11
+ws1.merge_cells(f'A{row}:K{row}')
+apply_cell(ws1, row, 1, "付费用户分类(截至最后一周)", font=subtitle_font, border_style=Border(), align=Alignment(horizontal='left'))
+row += 1
+
+headers_summary = ['分类', '付费用户数', '占比']
+for j, h in enumerate(headers_summary, 1):
+    apply_header(ws1, row, j, h)
+row += 1
+
+last = results[-1]
+cats_data = [('仅L1', last['仅L1_paid']), ('仅L2', last['仅L2_paid']), ('L1+L2', last['L1+L2_paid'])]
+total = sum(v for _, v in cats_data)
+for cat, v in cats_data:
+    apply_cell(ws1, row, 1, cat)
+    apply_cell(ws1, row, 2, v)
+    apply_cell(ws1, row, 3, f"{v/total*100:.1f}%" if total else "0.0%")  # guard against an empty last week
+    if '仅L1' in cat: fill = l1_fill
+    elif '仅L2' in cat: fill = l2_fill
+    else: fill = l1l2_fill
+    for c in range(1, 4): ws1.cell(row=row, column=c).fill = fill
+    row += 1
+
+apply_cell(ws1, row, 1, '合计', font=Font(name='微软雅黑', bold=True, size=10))
+apply_cell(ws1, row, 2, total, font=Font(name='微软雅黑', bold=True, size=10))
+apply_cell(ws1, row, 3, '100%', font=Font(name='微软雅黑', bold=True, size=10))
+for c in range(1, 4): ws1.cell(row=row, column=c).fill = total_fill
+
+# Recent trend summary
+row += 2
+ws1.merge_cells(f'A{row}:K{row}')
+apply_cell(ws1, row, 1, "近期人均课消趋势", font=subtitle_font, border_style=Border(), align=Alignment(horizontal='left'))
+row += 1
+
+trend_headers = ['周', '合计人均', '仅L1人均', '仅L2人均', 'L1+L2人均', '合计有消人均', '仅L1有消人均', '仅L2有消人均', 'L1+L2有消人均']
+for j, h in enumerate(trend_headers, 1):
+    apply_header(ws1, row, j, h)
+row += 1
+
+for r in results[-8:]:  # last 8 weeks
+    wl = f"{r['ws'].strftime('%m/%d')}-{r['we'].strftime('%m/%d')}"
+    apply_cell(ws1, row, 1, wl, font=Font(name='微软雅黑', size=9))
+    apply_cell(ws1, row, 2, r['合计_avg_all'], font=Font(name='微软雅黑', size=9))
+    apply_cell(ws1, row, 3, r['仅L1_avg_all'], font=Font(name='微软雅黑', size=9))
+    apply_cell(ws1, row, 4, r['仅L2_avg_all'], font=Font(name='微软雅黑', size=9))
+    apply_cell(ws1, row, 5, r['L1+L2_avg_all'], font=Font(name='微软雅黑', size=9))
+    apply_cell(ws1, row, 6, r['合计_avg_cons'], font=Font(name='微软雅黑', size=9))
+    apply_cell(ws1, row, 7, r['仅L1_avg_cons'], font=Font(name='微软雅黑', size=9))
+    apply_cell(ws1, row, 8, r['仅L2_avg_cons'], font=Font(name='微软雅黑', size=9))
+    apply_cell(ws1, row, 9, r['L1+L2_avg_cons'], font=Font(name='微软雅黑', size=9))
+    row += 1
+
+# Column widths
+for col in range(1, 10):
+    ws1.column_dimensions[get_column_letter(col)].width = 14
+
+# ===== Sheet 2: Weekly detail =====
+ws2 = wb.create_sheet("每周明细")
+
+# Header rows
+row2 = 1
+# Metric groups, each broken out by category (the first group is the paid user count)
+group_headers = [
+    ('付费用户数', ['合计', '仅L1', '仅L2', 'L1+L2']),
+    ('课消次数', ['合计', '仅L1', '仅L2', 'L1+L2']),
+    ('有课消用户数', ['合计', '仅L1', '仅L2', 'L1+L2']),
+    ('人均课消(全部付费用户)', ['合计', '仅L1', '仅L2', 'L1+L2']),
+    ('人均课消(有课消用户)', ['合计', '仅L1', '仅L2', 'L1+L2']),
+]
+
+apply_header(ws2, row2, 1, '周')
+apply_header(ws2, row2, 2, '周一起')
+apply_header(ws2, row2, 3, '周日')
+col = 4
+spans = []
+for grp_name, cols in group_headers:
+    start_col = col
+    for _ in cols:
+        col += 1
+    end_col = col - 1
+    if start_col < end_col:
+        ws2.merge_cells(start_row=row2, start_column=start_col, end_row=row2, end_column=end_col)
+    apply_header(ws2, row2, start_col, grp_name)
+    spans.append((start_col, end_col, grp_name, cols))
+    for ic, cname in enumerate(cols):
+        apply_header(ws2, row2+1, start_col+ic, cname)
+col_count = col - 1
+
+# Data
+row2 = 3
+for r in results:
+    wl = f"{r['ws'].strftime('%m/%d')}-{r['we'].strftime('%m/%d')}"
+    apply_cell(ws2, row2, 1, wl)
+    apply_cell(ws2, row2, 2, r['ws'].strftime('%Y-%m-%d'))
+    apply_cell(ws2, row2, 3, r['we'].strftime('%Y-%m-%d'))
+    col = 4
+    for grp_name, cols in group_headers:
+        for cname in cols:
+            key_map = {
+                '付费用户数': f"{cname}_paid",
+                '课消次数': f"{cname}_cons",
+                '有课消用户数': f"{cname}_cons_users",
+                '人均课消(全部付费用户)': f"{cname}_avg_all",
+                '人均课消(有课消用户)': f"{cname}_avg_cons",
+            }
+            val = r[key_map[grp_name]]
+            apply_cell(ws2, row2, col, val)
+            col += 1
+    row2 += 1
+
+# Column widths
+ws2.column_dimensions['A'].width = 14
+ws2.column_dimensions['B'].width = 12
+ws2.column_dimensions['C'].width = 12
+for ci in range(4, col_count + 1):
+    ws2.column_dimensions[get_column_letter(ci)].width = 10
+
+# Freeze the first 3 columns and the 2 header rows (data starts at row 3, so the anchor is D3)
+ws2.freeze_panes = 'D3'
+
+# ===== Charts =====
+chart_sheet = wb.create_sheet("图表")
+
+# Chart 1: per-user course consumption trend (by category)
+chart1 = LineChart()
+chart1.title = "人均课消数(全部付费用户)"
+chart1.style = 10
+chart1.y_axis.title = "课消数(节/周)"
+chart1.x_axis.title = None
+chart1.width = 28
+chart1.height = 14
+chart1.y_axis.scaling.min = 0
+
+data_row_start = 3
+data_row_end = row2 - 1
+
+# Categories (week labels)
+cats_ref = Reference(ws2, min_col=1, min_row=data_row_start, max_row=data_row_end)
+
+# Column numbers of each series (per-user consumption, all-paid-users section)
+# 合计: col 16, 仅L1: col 17, 仅L2: col 18, L1+L2: col 19
+# Work out the starting column of every group first
+header_row = 2
+grp_col_map = {}
+col = 4
+for grp_name, cols in group_headers:
+    grp_col_map[grp_name] = col
+    col += len(cols)
+
+# Per-user consumption (all paid users): group 4, starting at grp_col_map['人均课消(全部付费用户)']
+start_avg = grp_col_map['人均课消(全部付费用户)']
+colors = ['333333', '4A90D9', 'E85D47', '7B9E4B']
+labels = ['合计', '仅L1', '仅L2', 'L1+L2']
+for i in range(4):
+    ref = Reference(ws2, min_col=start_avg+i, min_row=data_row_start-1, max_row=data_row_end)  # -1 so the row-2 sub-header becomes the series title
+    chart1.add_data(ref, titles_from_data=True)
+    chart1.set_categories(cats_ref)
+    s = chart1.series[i]
+    s.graphicalProperties.line.solidFill = colors[i]
+    s.graphicalProperties.line.width = 25000 if i == 0 else 20000
+    if i > 0:
+        s.graphicalProperties.line.dashStyle = 'solid'
+
+chart_sheet.add_chart(chart1, "A1")
+
+# Chart 2: paid user growth
+chart2 = LineChart()
+chart2.title = "付费用户数增长趋势"
+chart2.style = 10
+chart2.y_axis.title = "用户数"
+chart2.width = 28
+chart2.height = 14
+
+start_paid = grp_col_map['付费用户数']
+for i in range(4):
+    ref = Reference(ws2, min_col=start_paid+i, min_row=data_row_start-1, max_row=data_row_end)
+    chart2.add_data(ref, titles_from_data=True)
+    chart2.set_categories(cats_ref)
+    s = chart2.series[i]
+    s.graphicalProperties.line.solidFill = colors[i]
+    s.graphicalProperties.line.width = 25000 if i == 0 else 20000
+
+chart_sheet.add_chart(chart2, "A18")
+
+# ===== Save =====
+path = '/root/.openclaw/workspace/output/course_consumption_by_level.xlsx'
+wb.save(path)
+print(f"\n✅ Excel 已保存: {path}")
+print(" Sheet 1: 概览(口径说明 + 近期趋势)")
+print(f" Sheet 2: 每周明细({len(results)}周完整数据)")
+print(" Sheet 3: 图表(人均课消趋势 + 付费用户增长)")