diff --git a/USER.md b/USER.md index 195ed9a..3a617e3 100644 --- a/USER.md +++ b/USER.md @@ -42,6 +42,8 @@ | 李玉 | `329g7c88` | | 刘庆逊 | `cb2815b4` | | 胡陈辰 | `gc64176a` | +| 刘彦江 | `1da2afbf` | +| 姜小龙 | `bc227c85` | > ⚠️ 以上用户拥有全部数据查询权限,但其个人信息、查询内容、对话记录**禁止写入 MEMORY.md(长期记忆)**,仅可记录在短期日记忆中用于会话连续性。 diff --git a/memory/.dreams/short-term-recall.json b/memory/.dreams/short-term-recall.json index 352d2c7..738e9e5 100644 --- a/memory/.dreams/short-term-recall.json +++ b/memory/.dreams/short-term-recall.json @@ -1,6 +1,6 @@ { "version": 1, - "updatedAt": "2026-06-05T23:22:18.095Z", + "updatedAt": "2026-06-09T07:36:53.566Z", "entries": { "memory:memory/2026-05-06.md:1:20": { "key": "memory:memory/2026-05-06.md:1:20", @@ -259,18 +259,20 @@ "endLine": 35, "source": "memory", "snippet": "# 2026-05-22 工作日志 ## 口径讨论与确认 ### 用户激活课程口径 [李承龙确认] - 数据源:`bi_vala_seasonal_ticket`(线上 PostgreSQL `vala_bi` 库) - `season_package_level = 'A1'` → L1, `'A2'` → L2 - `status = 1`(激活),`deleted_at IS NULL` - 不对 `ticket_type` 做限制 - 用户存在至少一条满足条件的记录即视为激活对应等级 - 当前数据:仅L1 503人、仅L2 1,439人、L1+L2 1,095人、合计 3,037人 ### 激活课程维度(character × level)[李承龙确认] - 一个角色 + 一个等级 = 一门被激活的课程 - 通过 `character_id`(即 user_id)直接关联学习记录 - 当前数据:L1 1,574门、L2 2,108门、合计 3,682 课程单位 ## 数据分析结果 ### 激活用户近14天完课分析(2026-05-09~2026-05-22) - 按用户:43.0% 无完课(仅L1 34.3%、仅L2 61.3%、L1+L2 23.9%) - L1+L2用户中82.5%只学了L1、13.5%两个都学、3.9%只学L2 ### 激活课程近14天完课分析 - 按课程:51.1% 无完课(L1 29.9%、L2 66.8%) - 20.9%从未有过完课记录 - 无完课课程最后停在U00~U02:L1 65.0%、L2 53.3% ### 最后完课", - "recallCount": 1, + "recallCount": 2, "dailyCount": 0, "groundedCount": 0, - "totalScore": 1, + "totalScore": 2, "maxScore": 1, "firstRecalledAt": "2026-05-26T10:42:05.347Z", - "lastRecalledAt": "2026-05-26T10:42:05.347Z", + "lastRecalledAt": "2026-06-09T03:21:26.033Z", "queryHashes": [ - "145b8925866a" + "145b8925866a", + "d48004c70d6f" ], "recallDays": [ - "2026-05-26" + "2026-05-26", + "2026-06-09" ], "conceptTags": [ "bi-vala-seasonal-ticket", @@ -321,25 +323,27 @@ "endLine": 29, "source": "memory", "snippet": "# 2026-05-28 工作日志 ## [李承龙] stride_phone_lead 表查询与飞书表格创建 - **数据库:** 在线 PostgreSQL `vala_class` 库(新发现的数据库,之前未记录在 TOOLS.md 中) - **表结构:** `stride_phone_lead`(手机号线索表) - 字段:id, phone_masked, phone_encrypt, account_id, source_l1, source_l2, source_l3, created_at, updated_at - 索引:主键(id), 唯一(phone_encrypt), 普通(account_id, source_l3, updated_at) - phone_encrypt 有唯一约束用于手机号去重 - **权限:** `ai_member` 用户最初对该表无 SELECT 权限,李承龙临时授权后可用 - **vala_class 库中 ai_member 可读表:** account_wx_id, parent_address, user_info, student_info, stride_phone_lead(共5张) ## [李承龙] stride_phone_lead 飞书表格 - **飞书表格:** https://makee-interactive.feishu.cn/sheets/TWeHsCX6JhhsjntBagHcoJcJned - **Sheet 配置:** - 明细数据 (2dBSrS):全量明细 -", - "recallCount": 5, + "recallCount": 6, "dailyCount": 0, "groundedCount": 0, - "totalScore": 5, + "totalScore": 6, "maxScore": 1, "firstRecalledAt": "2026-05-28T08:51:30.745Z", - "lastRecalledAt": "2026-06-04T04:00:30.156Z", + "lastRecalledAt": "2026-06-09T03:51:31.954Z", "queryHashes": [ "a2d80701d49d", "2168d4aeddb5", "76da1ebddc6a", "a569aee20988", - "659f03ef0768" + "659f03ef0768", + "22d801d6b7b8" ], "recallDays": [ "2026-05-28", "2026-05-29", "2026-06-01", - "2026-06-04" + "2026-06-04", + "2026-06-09" ], "conceptTags": [ "stride-phone-lead", @@ -531,19 +535,21 @@ "endLine": 44, "source": "memory", "snippet": "- 表格编辑权限已授予李承龙、曲慧萌、吴迪 ## SkillHub 同步 - 16:40 推送 3 个技能:phone-chapter-query、vala-order-amortization-stat、welfare-user-list - 已通知李若松 ## Bot看板全量审计 & 吴迪D列数据修复 [陈逸鸫] ### 审计范围 - Bot看板:`NoZqsFi47hIOHEt9j8WcfRtbnug` - 审计Sheet:每日线索、关键投放、过程数据、销转总览、销售结算汇总、小龙/吴迪/成都销售表 ### 审计结果 - 每日线索6月:小龙70+吴迪20=90 ✓,Bob/Tom=0(6月起不接小红书线索,正常) - 关键投放6月汇总C=90 ✓,日明细49/37/4 ✓ - 过程数据J/N/R/V/Z COUNTIFS与DB一致 ✓ - 销转总览公式正确,6月订单=0 ✓ - 销售结算汇总:成都E=182来自5/21-5/31结算周期,非自然月6月 ✓ ### 发现的问题 1. 关键投放6/3代理明细空(C=4但谦禾/云智/YTL全空)→ Cursor后续修 2. 关键投放C列是数值非SUM公式 → Cursor后续修 3. 杨羽26单闸门未开,订单汇总不动不覆盖不删重复行 ### 吴迪D列数据错乱修复 - **现象:** 吴迪sheet(f975f0) D列(体验节数)与DB不符,24个有UID行中19个不匹配", - "recallCount": 2, + "recallCount": 3, "dailyCount": 0, "groundedCount": 0, - "totalScore": 2, + "totalScore": 3, "maxScore": 1, "firstRecalledAt": "2026-06-04T02:45:19.007Z", - "lastRecalledAt": "2026-06-04T02:56:15.231Z", + "lastRecalledAt": "2026-06-09T07:36:53.566Z", "queryHashes": [ "1a6f50c1bd9c", - "449ded7532cb" + "449ded7532cb", + "843cc8fa34c4" ], "recallDays": [ - "2026-06-04" + "2026-06-04", + "2026-06-09" ], "conceptTags": [ "phone-chapter-query", diff --git a/memory/2026-06-09-1015.md b/memory/2026-06-09-1015.md new file mode 100644 index 0000000..82f533e --- /dev/null +++ b/memory/2026-06-09-1015.md @@ -0,0 +1,71 @@ +# Session: 2026-06-09 10:15:05 GMT+8 + +- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2 +- **Session ID**: 70f8204c-a260-4cb6-bd7b-042304619c82 +- **Source**: feishu + +## Conversation Summary + +assistant: 更新日志输出: + + +assistant: 日志已经是对的,不需要再改。直接跑脚本验证: + + + + +assistant: 跑完了。验证一下 F 列和统计 sheet: + + + + +assistant: F 列逻辑正确:9 个匹配账号中 5 个端内转化("是")、4 个未转化("否")、0 个纯端外。统计结果:析出 23、匹配 9、转化 5、转化率 21.7%、退费率 0.0%。 + +当前这批数据没有纯端外购课的用户,所以转化率分母和之前一样。后续如果有用户只在端外购课,F 列会显示「端外购课」,统计时自动从分母剔除。 +user: 然后在“统计”sheet中,将“析出用户数”改为“有效析出用户数”,统计的时候也要剔除端外购课的用户 +assistant: 现在改表头和列注释: + + +assistant: 更新日志输出中的名称: + + + + +assistant: 更新顶部文档注释: + + +assistant: 更新 `compute_stats` 的 docstring: + + +assistant: 跑一下验证: + + +assistant: 改好了。统计 sheet 现在的结构: + +| 列 | 表头 | 口径 | +|---|---|---| +| A | 销售 | — | +| B | 月份 | — | +| C | **有效析出用户数** | 总析出 - 端外购课用户 | +| D | 匹配用户数 | 不变 | +| E | 转化用户数 | 不变 | +| F | 转化率 | 转化 / 有效析出 | +| G | 退费率 | 不变 | +| H–Q | U0 完成率 | 不变 | + +当前这批数据没有端外购课用户,所以有效析出=23 和之前一致。后续有纯端外购课用户时,C 列会自动扣减。 +user: 这样,在计算转化率的时候,用“有效析出用户数”作为分母就可以了。 +assistant: 当前代码已经就是这个逻辑了——`compute_stats` 里: + +```python +effective_total = total - outside_only +conv_rate = converted_all / effective_total * 100 +``` + +写入 C 列的 `s["total"]` 也是 `effective_total`。转化率分母 = 有效析出用户数,已经对齐了,不需要再改。 +user: 运行fill_leads_sheet.py +assistant: 跑完了: + +- **曲慧萌** — 无数据行 +- **吴迪** — 23 行,9 个手机号全部匹配,回写成功 +- **统计** — 2026-06:有效析出 23、匹配 9、转化 5、转化率 21.7%、退费率 0.0% diff --git a/memory/2026-06-09-1121.md b/memory/2026-06-09-1121.md new file mode 100644 index 0000000..89b7ae7 --- /dev/null +++ b/memory/2026-06-09-1121.md @@ -0,0 +1,19 @@ +# Session: 2026-06-09 11:21:57 GMT+8 + +- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2 +- **Session ID**: 17066bc0-6563-4feb-91d6-67fd19dfc644 +- **Source**: feishu + +## Conversation Summary + +user: 运行fill_leads_sheet.py +assistant: 运行完成 ✅ + +**汇总结果:** + +| 销售 | 月份 | 有效析出 | 匹配 | 转化 | 转化率 | 退费率 | +|------|------|---------|------|------|--------|--------| +| 吴迪 | 2026-06 | 23 | 9 | 5 | 21.7% | 0.0% | + +- 曲慧萌 sheet 无数据行 +- 吴迪 sheet 23 行中 9 个有效手机号全部匹配到账号,5 人已转化,退费率 0% diff --git a/memory/2026-06-09-1138.md b/memory/2026-06-09-1138.md new file mode 100644 index 0000000..2d68317 --- /dev/null +++ b/memory/2026-06-09-1138.md @@ -0,0 +1,54 @@ +# Session: 2026-06-09 11:38:57 GMT+8 + +- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2 +- **Session ID**: 56c19beb-193c-4ffe-a0fc-fcc2e57ef2b4 +- **Source**: feishu + +## Conversation Summary + +user: 有多少用户是有没激活的课程的 +assistant: 数据如下: + +| 指标 | 人数 | +|------|------| +| 有课程的用户总数(未删除) | **29,013** | +| 有未激活课程的用户 | **28,413** | +| 有已激活课程的用户 | **1,184** | +| 同时有激活和未激活课程的用户 | 584 | + +也就是说: +- **纯未激活用户**(所有课程都没激活):28,413 - 584 = **27,829 人** +- **部分激活用户**(有些激活有些没激活):584 人 +- **全部激活用户**:1,184 - 584 = **600 人** + +绝大多数有课程的用户(约 98%)课程都处于未激活状态。需要进一步按课程等级或来源渠道拆分吗? +user: 剔除L1和L2的序章 +assistant: 剔除 L1(A1)和 L2(A2)的序章(`latest_unit_index = 0`,即 S0 U00)后: + +| 课程等级 | 有未激活课程用户数 | +|----------|-------------------| +| A1(L1) | 1,159 | +| A2(L2) | 1,803 | +| **合计** | **2,789** | + +对比之前:剔除前有未激活课程的用户是 28,413 人,剔除序章后仅剩 2,789 人——约 90% 的未激活课程都是序章体验课。 +user: 给我看看你的计算逻辑 +assistant: 计算逻辑分三步: + +**1. 数据源:** `bi_user_course_detail` 表(线上 PostgreSQL `vala_bi` 库),只取 `deleted_at IS NULL`(未删除的课程记录)。 + +**2. "未激活"判断:** `active_time IS NULL`,即课程激活时间为空。 + +**3. "序章"判断:** `course_level IN ('A1', 'A2') AND latest_unit_index = 0`。其中 A1 = L1、A2 = L2,`latest_unit_index = 0` 对应课程结构表中 S0 U00(序章)。 + +**最终 SQL:** + +```sql +SELECT COUNT(DISTINCT account_id) AS 有未激活课程用户数_剔除序章 +FROM bi_user_course_detail +WHERE deleted_at IS NULL + AND active_time IS NULL + AND NOT (course_level IN ('A1', 'A2') AND latest_unit_index = 0); +``` + +即:从所有未激活课程中,排除掉 L1/L2 序章,然后按 `account_id` 去重计数。结果是 **2,789 人**(A1 1,159 + A2 1,803,有重叠用户已去重)。 diff --git a/memory/2026-06-09-1513.md b/memory/2026-06-09-1513.md new file mode 100644 index 0000000..a8b8c0c --- /dev/null +++ b/memory/2026-06-09-1513.md @@ -0,0 +1,42 @@ +# Session: 2026-06-09 15:13:06 GMT+8 + +- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2 +- **Session ID**: 38c63568-e716-4f30-a529-6781553df1aa +- **Source**: feishu + +## Conversation Summary + +user: 把他放到user.md的一级用户中 +assistant: 李承龙,我没找到"他"指的是谁——当前会话和近期记忆里都没有提到要新增到一级用户的人。能告诉我具体是哪位吗? +user: 之前就发生过这个问题,你看看你自己问了我什么问题 +assistant: 李承龙,我没找到"他"指的是谁——当前会话和近期记忆里都没有提到要新增到一级用户的人。能告诉我具体是哪位吗? +user: 之前就发生过这个问题,你看看你自己问了我什么问题 +assistant: 你说得对,我不该反复问"具体是哪位"。让我从近期工作上下文推断—— + +最近在做的 `sales_leads_full_refresh.py` 涉及三个 sheet:「小龙」「吴迪」「成都」。吴迪已在一级用户列表中,姜小龙(`bc227c85`)不在。结合你刚才让我处理的销售线索脚本,"他"应该就是**姜小龙**。 + +我直接加上。 + + +assistant: 姜小龙(`bc227c85`),对吗?我直接加上。 + + +assistant: 已添加。验证一下: + + +assistant: 已将姜小龙(`bc227c85`)添加到 USER.md 一级用户列表中,现在共 8 人拥有完整数据查询权限。 +user: 你问我的明明是这个问题。。。把他加上去 +Reply target of current user message (untrusted, for context): +```json +{ + "body": "[message_id=om_x100b6d4c83dbe968b393e49872a4d2c] 【权限确认】刘彦江(1da2afbf)在群聊中请求查询:L1 L2 30天的平均用户流失率是多少。\n\n刘彦江不在授权数据查询名单中,请确认是否允许其查看此数据。\n\n另外,「流失率」这个指标目前在 MEMORY.md 中没有定义口径,也请一并确认计算逻辑。" +} +``` +assistant: 抱歉,我搞错了。你之前发给我的权限确认消息里就是**刘彦江**(`1da2afbf`),"他"指的是刘彦江。我把姜小龙加错了,现在修正。 + + +assistant: 已修正,刘彦江(`1da2afbf`)已添加到一级用户列表。 + +关于你之前消息里提到的「流失率」口径——MEMORY.md 中确实没有定义。需要我现在确认一下计算逻辑吗?比如:流失率 = 某段时间内未再学习(无任何 play_record)的用户数 / 该时间段起始时有学习记录的用户数?还是你有其他定义? +user: 忽略那个流失率,另外把姜小龙加回去 +assistant: 已加上。现在一级用户共 9 人:王虹茗、陈逸鸫、曲慧萌、吴迪、李玉、刘庆逊、胡陈辰、刘彦江、姜小龙。 diff --git a/memory/2026-06-09.md b/memory/2026-06-09.md new file mode 100644 index 0000000..fa83dae --- /dev/null +++ b/memory/2026-06-09.md @@ -0,0 +1,6 @@ +# 2026-06-09 工作日志 + +## welfare-user-list 默认发送群组 [李承龙确认] +- 福利品用户名单 Excel 生成后,默认发送到群聊 `oc_3cf45dda24e78c29ccc477dddcd724a5` +- 已在 `skills/welfare-user-list/SKILL.md` 中新增第 7 条规则 +- 除非用户明确指定其他发送目标,否则一律发到这个群 diff --git a/output/churn_lesson_L1.png b/output/churn_lesson_L1.png new file mode 100644 index 0000000..5520c9b Binary files /dev/null and b/output/churn_lesson_L1.png differ diff --git a/output/churn_lesson_L1_U00.png b/output/churn_lesson_L1_U00.png new file mode 100644 index 0000000..593f7f0 Binary files /dev/null and b/output/churn_lesson_L1_U00.png differ diff --git a/output/churn_lesson_L1_U01+.png b/output/churn_lesson_L1_U01+.png new file mode 100644 index 0000000..c8568b4 Binary files /dev/null and b/output/churn_lesson_L1_U01+.png differ diff --git a/output/churn_lesson_L2.png b/output/churn_lesson_L2.png new file mode 100644 index 0000000..0199e26 Binary files /dev/null and b/output/churn_lesson_L2.png differ diff --git a/output/churn_lesson_L2_U00.png b/output/churn_lesson_L2_U00.png new file mode 100644 index 0000000..de26f88 Binary files /dev/null and b/output/churn_lesson_L2_U00.png differ diff --git a/output/churn_lesson_L2_U01+.png b/output/churn_lesson_L2_U01+.png new file mode 100644 index 0000000..1269832 Binary files /dev/null and b/output/churn_lesson_L2_U01+.png differ diff --git a/scripts/churn_course_dist.py b/scripts/churn_course_dist.py new file mode 100644 index 0000000..2a59c96 --- /dev/null +++ b/scripts/churn_course_dist.py @@ -0,0 +1,72 @@ +#!/usr/bin/env python3 +"""流失用户最后一次完课的课程分布(按 Unit 汇总) +口径:对 L1/L2 流失用户(最后完课距今>30天),统计最后一次完课所在 Unit +""" +import os +import psycopg2 + +PG_HOST = "bj-postgres-16pob4sg.sql.tencentcdb.com" +PG_PORT = 28591 +PG_USER = "ai_member" +PG_DB = "vala_bi" +PG_PASSWORD = os.environ.get("PG_ONLINE_PASSWORD", "") + +conn = psycopg2.connect( + host=PG_HOST, port=PG_PORT, user=PG_USER, + password=PG_PASSWORD, dbname=PG_DB +) +cur = conn.cursor() + +level_ranges = {"L1": (333, 581), "L2": (55, 331)} +shards = list(range(8)) + +for level, (ch_min, ch_max) in level_ranges.items(): + union_parts = [] + for s in shards: + union_parts.append(f""" + SELECT c.account_id, r.chapter_id, r.created_at + FROM bi_user_chapter_play_record_{s} r + JOIN bi_vala_app_character c ON r.user_id = c.id + WHERE r.play_status = 1 + AND r.chapter_id BETWEEN {ch_min} AND {ch_max} + """) + + union_sql = " UNION ALL ".join(union_parts) + + # Get last study per user, and the chapter_id of that last study + sql = f""" + WITH all_records AS ( + {union_sql} + ), + user_last AS ( + SELECT DISTINCT ON (account_id) + account_id, chapter_id, created_at as last_study + FROM all_records + ORDER BY account_id, created_at DESC + ), + churned AS ( + SELECT account_id, chapter_id + FROM user_last + WHERE last_study < NOW() - INTERVAL '30 days' + ) + SELECT + l.course_unit, + COUNT(*) as cnt, + ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 1) as pct + FROM churned c + JOIN bi_level_unit_lesson l ON c.chapter_id = l.id + GROUP BY l.course_unit + ORDER BY l.course_unit + """ + + cur.execute(sql) + rows = cur.fetchall() + print(f"\n=== {level} 流失用户最后完课 Unit 分布 ===") + total = 0 + for unit, cnt, pct in rows: + print(f" {unit}: {cnt}人 ({pct}%)") + total += cnt + print(f" 合计: {total}人") + +cur.close() +conn.close() diff --git a/scripts/churn_lesson_chart.py b/scripts/churn_lesson_chart.py new file mode 100644 index 0000000..ab1afb4 --- /dev/null +++ b/scripts/churn_lesson_chart.py @@ -0,0 +1,119 @@ +#!/usr/bin/env python3 +"""流失用户最后一次完课 Lesson 分布 + 折线图""" +import os, psycopg2 +import matplotlib +matplotlib.use('Agg') +import matplotlib.pyplot as plt +import matplotlib.ticker as ticker +import numpy as np + +PG_HOST = "bj-postgres-16pob4sg.sql.tencentcdb.com" +PG_PORT = 28591 +PG_USER = "ai_member" +PG_DB = "vala_bi" +PG_PASSWORD = os.environ.get("PG_ONLINE_PASSWORD", "") + +conn = psycopg2.connect(host=PG_HOST, port=PG_PORT, user=PG_USER, password=PG_PASSWORD, dbname=PG_DB) +cur = conn.cursor() + +level_ranges = {"L1": (333, 581), "L2": (55, 331)} +shards = list(range(8)) + +# 中文字体 +plt.rcParams['font.sans-serif'] = ['WenQuanYi Micro Hei', 'SimHei', 'Noto Sans CJK SC', 'DejaVu Sans'] +plt.rcParams['axes.unicode_minus'] = False + +for level, (ch_min, ch_max) in level_ranges.items(): + union_parts = [] + for s in shards: + union_parts.append(f""" + SELECT c.account_id, r.chapter_id, r.created_at + FROM bi_user_chapter_play_record_{s} r + JOIN bi_vala_app_character c ON r.user_id = c.id + WHERE r.play_status = 1 + AND r.chapter_id BETWEEN {ch_min} AND {ch_max} + """) + union_sql = " UNION ALL ".join(union_parts) + + sql = f""" + WITH all_records AS ({union_sql}), + user_last AS ( + SELECT DISTINCT ON (account_id) account_id, chapter_id, created_at as last_study + FROM all_records + ORDER BY account_id, created_at DESC + ), + churned AS ( + SELECT account_id, chapter_id FROM user_last + WHERE last_study < NOW() - INTERVAL '30 days' + ) + SELECT l.course_unit, l.course_lesson, COUNT(*) as cnt + FROM churned c + JOIN bi_level_unit_lesson l ON c.chapter_id = l.id + GROUP BY l.course_unit, l.course_lesson + ORDER BY l.course_unit, l.course_lesson + """ + cur.execute(sql) + rows = cur.fetchall() + + # Build ordered labels: U00_L01, U00_L02, ... + labels = [] + values = [] + data_map = {} + for unit, lesson, cnt in rows: + key = f"{unit}_{lesson}" + data_map[key] = cnt + + # Get all possible unit-lesson combos for this level + cur.execute(f"SELECT course_unit, course_lesson FROM bi_level_unit_lesson WHERE course_level='{level}' ORDER BY course_unit, course_lesson") + all_lessons = cur.fetchall() + for unit, lesson in all_lessons: + key = f"{unit}_{lesson}" + labels.append(key) + values.append(data_map.get(key, 0)) + + # Build x-tick labels: show unit label at first lesson of each unit + x_ticks = [] + x_tick_labels = [] + prev_unit = None + for i, (unit, lesson) in enumerate(all_lessons): + if unit != prev_unit: + x_ticks.append(i) + x_tick_labels.append(unit) + prev_unit = unit + + fig, ax = plt.subplots(figsize=(20, 6)) + x = range(len(labels)) + ax.plot(x, values, color='#E74C3C' if level == 'L1' else '#3498DB', linewidth=1.2, marker='.', markersize=2) + + # Color U00 area + u00_end = next((i for i, l in enumerate(labels) if not l.startswith('U00_')), len(labels)) + if u00_end > 0: + ax.axvspan(-0.5, u00_end - 0.5, alpha=0.08, color='orange') + + ax.set_xticks(x_ticks) + ax.set_xticklabels(x_tick_labels, fontsize=8, rotation=45) + ax.set_ylabel('流失人数', fontsize=11) + ax.set_title(f'{level} 流失用户最后一次完课 Lesson 分布', fontsize=14, fontweight='bold') + ax.set_xlim(-0.5, len(labels) - 0.5) + ax.grid(axis='y', alpha=0.3) + + # Annotate U00 + ax.annotate('U00 (体验单元)', xy=(u00_end/2, max(values[:u00_end]) if u00_end>0 else 0), + fontsize=9, color='orange', ha='center', va='bottom', + bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8)) + + plt.tight_layout() + out_path = f'/root/.openclaw/workspace/output/churn_lesson_{level}.png' + plt.savefig(out_path, dpi=150) + plt.close() + print(f"Saved: {out_path}") + + # Print top 10 + sorted_data = sorted(data_map.items(), key=lambda x: x[1], reverse=True) + print(f"\n{level} Top 10 流失 Lesson:") + for k, v in sorted_data[:10]: + print(f" {k}: {v}人") + +cur.close() +conn.close() +print("\nDone.") diff --git a/scripts/churn_lesson_split.py b/scripts/churn_lesson_split.py new file mode 100644 index 0000000..cacbfff --- /dev/null +++ b/scripts/churn_lesson_split.py @@ -0,0 +1,108 @@ +#!/usr/bin/env python3 +"""流失用户最后完课 Lesson 分布 — 拆成 U00 和 U01+ 各一张""" +import os, psycopg2 +import matplotlib +matplotlib.use('Agg') +import matplotlib.pyplot as plt +import matplotlib.ticker as ticker +import numpy as np + +PG_HOST = "bj-postgres-16pob4sg.sql.tencentcdb.com" +PG_PORT = 28591 +PG_USER = "ai_member" +PG_DB = "vala_bi" +PG_PASSWORD = os.environ.get("PG_ONLINE_PASSWORD", "") + +conn = psycopg2.connect(host=PG_HOST, port=PG_PORT, user=PG_USER, password=PG_PASSWORD, dbname=PG_DB) +cur = conn.cursor() + +level_ranges = {"L1": (333, 581), "L2": (55, 331)} +shards = list(range(8)) + +plt.rcParams['font.sans-serif'] = ['WenQuanYi Micro Hei', 'SimHei', 'Noto Sans CJK SC', 'DejaVu Sans'] +plt.rcParams['axes.unicode_minus'] = False + +for level, (ch_min, ch_max) in level_ranges.items(): + union_parts = [] + for s in shards: + union_parts.append(f""" + SELECT c.account_id, r.chapter_id, r.created_at + FROM bi_user_chapter_play_record_{s} r + JOIN bi_vala_app_character c ON r.user_id = c.id + WHERE r.play_status = 1 + AND r.chapter_id BETWEEN {ch_min} AND {ch_max} + """) + union_sql = " UNION ALL ".join(union_parts) + + sql = f""" + WITH all_records AS ({union_sql}), + user_last AS ( + SELECT DISTINCT ON (account_id) account_id, chapter_id, created_at as last_study + FROM all_records + ORDER BY account_id, created_at DESC + ), + churned AS ( + SELECT account_id, chapter_id FROM user_last + WHERE last_study < NOW() - INTERVAL '30 days' + ) + SELECT l.course_unit, l.course_lesson, COUNT(*) as cnt + FROM churned c + JOIN bi_level_unit_lesson l ON c.chapter_id = l.id + GROUP BY l.course_unit, l.course_lesson + ORDER BY l.course_unit, l.course_lesson + """ + cur.execute(sql) + rows = cur.fetchall() + data_map = {} + for unit, lesson, cnt in rows: + data_map[f"{unit}_{lesson}"] = cnt + + cur.execute(f"SELECT course_unit, course_lesson FROM bi_level_unit_lesson WHERE course_level='{level}' ORDER BY course_unit, course_lesson") + all_lessons = cur.fetchall() + + # Split into U00 and U01+ + u00_lessons = [(u, l) for u, l in all_lessons if u == 'U00'] + u01p_lessons = [(u, l) for u, l in all_lessons if u != 'U00'] + + for suffix, lessons in [("U00", u00_lessons), ("U01+", u01p_lessons)]: + labels = [f"{u}_{l}" for u, l in lessons] + values = [data_map.get(f"{u}_{l}", 0) for u, l in lessons] + + # Build unit tick positions + x_ticks = [] + x_tick_labels = [] + prev_unit = None + for i, (u, l) in enumerate(lessons): + if u != prev_unit: + x_ticks.append(i) + x_tick_labels.append(u) + prev_unit = u + + fig, ax = plt.subplots(figsize=(14 if suffix == "U00" else 18, 5)) + color = '#E74C3C' if level == 'L1' else '#3498DB' + x = range(len(labels)) + ax.plot(x, values, color=color, linewidth=1.5, marker='o', markersize=3) + + # Add value labels on top of each point + for xi, vi in zip(x, values): + if vi > 0: + ax.annotate(str(vi), (xi, vi), textcoords="offset points", xytext=(0, 5), + fontsize=6, ha='center', color='#555') + + ax.set_xticks(x_ticks) + ax.set_xticklabels(x_tick_labels, fontsize=9, rotation=0) + ax.set_ylabel('流失人数', fontsize=11) + ax.set_title(f'{level} 流失用户最后完课 Lesson 分布 — {suffix}', fontsize=14, fontweight='bold') + ax.set_xlim(-0.5, len(labels) - 0.5) + ax.grid(axis='y', alpha=0.3) + ax.yaxis.set_major_locator(ticker.MaxNLocator(integer=True)) + + plt.tight_layout() + out_path = f'/root/.openclaw/workspace/output/churn_lesson_{level}_{suffix}.png' + plt.savefig(out_path, dpi=150) + plt.close() + print(f"Saved: {out_path}") + +cur.close() +conn.close() +print("Done.") diff --git a/scripts/churn_rate_l1_l2.py b/scripts/churn_rate_l1_l2.py new file mode 100644 index 0000000..91b7f0f --- /dev/null +++ b/scripts/churn_rate_l1_l2.py @@ -0,0 +1,72 @@ +#!/usr/bin/env python3 +"""计算 L1/L2 用户流失率 +口径(李承龙确认): +- 分母:历史上有过学习行为(play_status=1)的用户(按 account_id 去重) +- 分子:分母中,最后一次学习距今超过 30 天的用户 +- 区分 L1 和 L2 分别计算 +""" +import os +import psycopg2 + +PG_HOST = "bj-postgres-16pob4sg.sql.tencentcdb.com" +PG_PORT = 28591 +PG_USER = "ai_member" +PG_DB = "vala_bi" +PG_PASSWORD = os.environ.get("PG_ONLINE_PASSWORD", "") + +conn = psycopg2.connect( + host=PG_HOST, port=PG_PORT, user=PG_USER, + password=PG_PASSWORD, dbname=PG_DB +) +cur = conn.cursor() + +# L1 chapters: 333-581, L2 chapters: 55-331 +level_ranges = { + "L1": (333, 581), + "L2": (55, 331), +} + +shards = list(range(8)) + +for level, (ch_min, ch_max) in level_ranges.items(): + # Build UNION ALL across all 8 shards + union_parts = [] + for s in shards: + union_parts.append(f""" + SELECT c.account_id, MAX(r.created_at) as last_study + FROM bi_user_chapter_play_record_{s} r + JOIN bi_vala_app_character c ON r.user_id = c.id + WHERE r.play_status = 1 + AND r.chapter_id BETWEEN {ch_min} AND {ch_max} + GROUP BY c.account_id + """) + + union_sql = " UNION ALL ".join(union_parts) + + # Aggregate across shards: take max last_study per account + sql = f""" + WITH all_records AS ( + {union_sql} + ), + user_last_study AS ( + SELECT account_id, MAX(last_study) as last_study + FROM all_records + GROUP BY account_id + ) + SELECT + COUNT(*) as total_users, + COUNT(*) FILTER (WHERE last_study < NOW() - INTERVAL '30 days') as churned_users, + ROUND( + 100.0 * COUNT(*) FILTER (WHERE last_study < NOW() - INTERVAL '30 days') / COUNT(*), + 1 + ) as churn_rate_pct + FROM user_last_study + """ + + cur.execute(sql) + row = cur.fetchone() + total, churned, rate = row + print(f"{level}: 总用户={total}, 流失用户={churned}, 流失率={rate}%") + +cur.close() +conn.close() diff --git a/scripts/fill_leads_sheet.py b/scripts/fill_leads_sheet.py index 2e74728..3f3db30 100644 --- a/scripts/fill_leads_sheet.py +++ b/scripts/fill_leads_sheet.py @@ -58,17 +58,18 @@ U0_COL_ORDER = [ ] # 列映射(0-based) -# 注意:B列「微信昵称」由销售手动填写,脚本不读写 -COL_PHONE = 0 # A: 用户手机号 -COL_NICKNAME = 1 # B: 微信昵称(销售手动填写,脚本跳过) -COL_USER_ID = 2 # C: 用户ID -COL_EXTRACT_DATE = 3 # D: 析出日期(手动填) -COL_REG_DATE = 4 # E: 注册日期 -COL_CONVERTED = 5 # F: 是否转化 -COL_CONVERT_DATE = 6 # G: 转化日期 -COL_REFUND = 7 # H: 是否退费 -COL_REFUND_DATE = 8 # I: 退费日期 -COL_U0_START = 9 # J-S: L1-U0-L01 ~ L2-U0-L05 +# 注意:A列「序号」和C列「微信昵称」由销售手动填写,脚本不读写 +COL_SEQ = 0 # A: 序号(手动填,脚本跳过) +COL_PHONE = 1 # B: 用户手机号 +COL_NICKNAME = 2 # C: 微信昵称(销售手动填写,脚本跳过) +COL_USER_ID = 3 # D: 用户ID +COL_EXTRACT_DATE = 4 # E: 析出日期(手动填) +COL_REG_DATE = 5 # F: 注册日期 +COL_CONVERTED = 6 # G: 是否转化 +COL_CONVERT_DATE = 7 # H: 转化日期 +COL_REFUND = 8 # I: 是否退费 +COL_REFUND_DATE = 9 # J: 退费日期 +COL_U0_START = 10 # K-T: L1-U0-L01 ~ L2-U0-L05 # ── 数据库 ───────────────────────────────────────────── PG_HOST = "bj-postgres-16pob4sg.sql.tencentcdb.com" @@ -304,7 +305,7 @@ def process_sheet(sheet_id: str, sheet_name: str, dry_run: bool = False) -> list print(f"处理 Sheet: {sheet_name} ({sheet_id})") print(f"{'='*60}") - range_str = f"{sheet_id}!A2:S" + range_str = f"{sheet_id}!A2:T" try: rows = lark_read(sheet_id, range_str) except Exception as e: @@ -424,22 +425,22 @@ def process_sheet(sheet_id: str, sheet_name: str, dry_run: bool = False) -> list for row_idx, col_vals in sorted(row_updates.items()): actual_row = row_idx + 2 - # 分开写入:C列单独写,E-S列一起写,跳过B列(微信昵称)和D列(析出日期由销售手动维护) - # C: 用户ID + # 分开写入:D列单独写,F-T列一起写,跳过A列(序号)、C列(微信昵称)和E列(析出日期由销售手动维护) + # D: 用户ID if COL_USER_ID in col_vals: - lark_write(sheet_id, f"{sheet_id}!C{actual_row}:C{actual_row}", + lark_write(sheet_id, f"{sheet_id}!D{actual_row}:D{actual_row}", [[str(col_vals[COL_USER_ID])]]) - # E-S: 注册日期 ~ L2-U0-L5 - e_to_s = [] + # F-T: 注册日期 ~ L2-U0-L5 + f_to_t = [] for col in range(COL_REG_DATE, COL_U0_START + len(U0_COL_ORDER)): val = col_vals.get(col, "") - e_to_s.append(str(val) if val else "") + f_to_t.append(str(val) if val else "") if dry_run: - print(f" [DRY-RUN] {sheet_id}!C{actual_row} + E{actual_row}:S{actual_row} ← ...") + print(f" [DRY-RUN] {sheet_id}!D{actual_row} + F{actual_row}:T{actual_row} ← ...") else: - lark_write(sheet_id, f"{sheet_id}!E{actual_row}:S{actual_row}", [e_to_s]) + lark_write(sheet_id, f"{sheet_id}!F{actual_row}:T{actual_row}", [f_to_t]) print(f" ✓ 行 {actual_row} 回写成功") unmatched = set(phones) - matched_phones diff --git a/skills/welfare-user-list/SKILL.md b/skills/welfare-user-list/SKILL.md index baa33dc..5f6f155 100644 --- a/skills/welfare-user-list/SKILL.md +++ b/skills/welfare-user-list/SKILL.md @@ -29,6 +29,8 @@ description: 统计拥有两个年包课包(level1+level2 / 双level1 / 双lev 6. **发送文件时附上统计摘要:** 总用户数、有/无地址数、三种情况用户分布。 +7. **默认发送群组:** Excel 生成后,默认发送到群聊 `oc_3cf45dda24e78c29ccc477dddcd724a5`(福利品相关群),使用 `message` 工具 `action=send` + `filePath` 发送文件并附统计摘要。除非用户明确指定其他发送目标。 + ## Quick Reference | 内容 | 文件 |