🤖 Daily automatic backup - 2026-05-26 08:00:01

2026-05-26 08:00:01 +08:00 · 2026-05-26 08:00:01 +08:00 · 1077dceca0
commit 1077dceca0
parent b09b21cb44
9 changed files with 2412 additions and 0 deletions
--- a/MEMORY.md
+++ b/MEMORY.md
@ -234,3 +234,8 @@
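    2. Duration field: `bi_user_component_play_record.interval_time`, in milliseconds
    3. Calculation: sum `interval_time` over all components under the given `chapter_unique_id`, divide by 60000 to convert to minutes, and round to 1 decimal place
    4. Note: count only lessons completed at the lesson level (`play_status=1`); exclude component practice records from lessons that were not fully completed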
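A minimal sketch of the duration rule above, assuming the component rows have already been fetched as `(chapter_unique_id, interval_time)` pairs and the set of completed lessons (`play_status=1`) is known. The function name and the sample data are illustrative, not part of the original scripts.

```python
from collections import defaultdict

MS_PER_MINUTE = 60_000


def lesson_minutes(component_rows, completed_ids):
    """Sum interval_time (ms) per chapter_unique_id and convert to minutes.

    component_rows: iterable of (chapter_unique_id, interval_time_ms)
    completed_ids:  chapter_unique_id values whose lesson has play_status = 1
    """
    totals_ms = defaultdict(int)
    for chapter_unique_id, interval_ms in component_rows:
        # Skip components that belong to lessons that were never completed
        if chapter_unique_id in completed_ids:
            totals_ms[chapter_unique_id] += interval_ms or 0
    return {cuid: round(ms / MS_PER_MINUTE, 1) for cuid, ms in totals_ms.items()}


# Hypothetical sample: c3 is excluded because its lesson was not completed
rows = [("c1", 90_000), ("c1", 30_000), ("c2", 48_000), ("c3", 600_000)]
print(lesson_minutes(rows, completed_ids={"c1", "c2"}))
```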
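 ## growth_activity_behavior counting rules [confirmed by 李承龙, 2026-05-25]
 - **Identifier:** all statistics use `userId` as the unique user identifier and ignore `accountId`
 - **Reason:** `accountId=0` marks an anonymous user (for example at the `H5 页面曝光时` exposure step); `userId` is a device/session-level identifier present in both anonymous and logged-in states, so it is better suited to linking funnel steps
 - **Exception:** the `领课加微` events have no `userId` field and can only be counted at the event level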
--- a/memory/2026-05-25-1027.md
+++ b/memory/2026-05-25-1027.md
@ -0,0 +1,287 @@
 # Session: 2026-05-25 10:27:55 GMT+8
 - **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2
 - **Session ID**: 57030bd7-443e-4c5e-91e6-8d406ceb9925
 - **Source**: feishu
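 ## Conversation Summary
 assistant: Looking up lesson-completion records touches several tables, so here is the core logic first:
 ---
 ## Query path for a user's lesson-completion records
 ### 1. account_id → user_id (character ID)
 One account can have several characters; fetch them from `bi_vala_app_character`:
 ```sql
 SELECT id AS user_id FROM bi_vala_app_character WHERE account_id = <your_account_id>;
 ```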
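 ### 2. Pick the shard table
 `bi_user_chapter_play_record` is split into 8 tables (`_0` to `_7`) by `user_id % 8`; compute the modulus first and substitute it into the table name:
 ```sql
 -- {user_id % 8} is a placeholder, not SQL: replace it with the computed shard number
 SELECT * FROM bi_user_chapter_play_record_{user_id % 8}
 WHERE user_id = <your_user_id> AND play_status = 1;
 ```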
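Because the table suffix has to be computed outside SQL, a small helper can build the shard name. This is a sketch under the sharding rule stated above (8 shards, `user_id % 8`); the helper names are hypothetical and the `%s` placeholder assumes a psycopg2-style driver.

```python
SHARD_COUNT = 8  # tables _0 .. _7, as stated in the text


def shard_table(base_name: str, user_id: int, shard_count: int = SHARD_COUNT) -> str:
    """Return the shard table that holds rows for this user_id."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"{base_name}_{user_id % shard_count}"


def completed_lessons_query(user_id: int):
    """Build (sql, params) for the completed-lesson lookup of one character."""
    table = shard_table("bi_user_chapter_play_record", user_id)
    # The table name comes from an integer modulus, the value is passed as a parameter
    sql = f"SELECT * FROM {table} WHERE user_id = %s AND play_status = 1"
    return sql, (user_id,)


print(completed_lessons_query(13))
```

The same helper applies to `bi_user_component_play_record`, which follows the same modulus rule.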
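 ### 3. Core fields
 | Field | Meaning |
 |------|------|
 | `play_status` | 1 = completed normally / 2 = skipped / 3 = replayed |
 | `chapter_id` | Lesson ID; join `bi_level_unit_lesson.id` to resolve it to a position such as L1 S0 U01 L03 |
 | `chapter_unique_id` | Unique lesson-play identifier, used to join component records |
 | `created_at` | Completion time |
 ### 4. Resolve the course name
 Translate `chapter_id` into a readable course position:
 ```sql
 SELECT id, course_level, course_season, course_unit, course_lesson
 FROM bi_level_unit_lesson
 WHERE id = <your_chapter_id>;
 ```
 ### 5. Component-level detail (optional)
 Using the same modulus rule, join the component table through `chapter_unique_id`:
 ```sql
 SELECT * FROM bi_user_component_play_record_{user_id % 8}
 WHERE chapter_unique_id = '<your_chapter_unique_id>';
 ```
 ---
 **In one sentence:** account_id → character_id (user_id) → shard by modulus → `bi_user_chapter_play_record` (lesson completion) → `bi_level_unit_lesson` (course position) → `bi_user_component_play_record` (component detail).
 Would you like me to look up the completion records of a specific user?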
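 user: Now I want to see: among activated users, what share had no lesson completion in the last 14 days? Of those, how many activated only L1, how many only L2, and how many activated both L1 and L2? For users who activated only L1, look only at L1 learning records; for users who activated only L2, look only at L2 learning records.
 assistant: Let me confirm a few boundaries before running anything:
 1. **Last-14-days window:** 2026-05-09 ~ 2026-05-22 (14 calendar days)
 2. **L1+L2 users:** a completion in either L1 or L2 counts as having a completion
 3. **User → character:** an account may have several characters; a completion by any character counts
 The logic is fairly involved, so I'll run it as a script.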
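 assistant: The results are in:
 ---
 ## 📊 Activated users: lesson completion in the last 14 days (2026-05-09 ~ 2026-05-22)
 ### Overview
 | Metric | Users | Share |
 |------|------|------|
 | Total activated users | 3,037 | 100% |
 | **No completion in last 14 days** | **1,307** | **43.0%** |
 | Completion in last 14 days | 1,698 | 55.9% |
 | No character data | 32 | 1.1% |
 ### By activation type
 | Activation type | Total | No completion | No-completion share | With completion | With-completion share |
 |----------|--------|--------|-----------|--------|-----------|
 | L1 only | 502 | **172** | **34.3%** | 330 | 65.7% |
 | L2 only | 1,436 | **880** | **61.3%** | 556 | 38.7% |
 | L1 + L2 | 1,067 | **255** | **23.9%** | 812 | 76.1% |
 > Note: 32 activated users have no character data (they never created a character) and are excluded from the per-type breakdown. L1-only and L2-only users are judged only on completions at their own level; L1+L2 users count as having a completion if they completed a lesson at either level.
 ---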
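The scoping rule in the note above (single-level users are judged only on their own level, dual-level users on either level) can be sketched as follows. The input shapes are assumptions for illustration: each account maps to the set of levels it activated, and to the set of levels in which any of its characters completed a lesson inside the window.

```python
def classify_accounts(activated, completed):
    """Return {account_id: (activation_type, has_completion)}.

    activated: {account_id: set of activated levels, a non-empty subset of {"L1", "L2"}}
    completed: {account_id: set of levels with at least one completion in the window}
    """
    result = {}
    for account_id, levels in activated.items():
        if levels == {"L1"}:
            kind = "L1 only"
        elif levels == {"L2"}:
            kind = "L2 only"
        else:
            kind = "L1+L2"
        # Only completions at an activated level are in scope for this account
        in_scope = completed.get(account_id, set()) & levels
        result[account_id] = (kind, bool(in_scope))
    return result


# Hypothetical data: account 1 activated only L1 but completed an L2 lesson,
# so it still counts as "no completion" under the rule
activated = {1: {"L1"}, 2: {"L2"}, 3: {"L1", "L2"}, 4: {"L1"}}
completed = {1: {"L2"}, 2: {"L2"}, 3: {"L1"}}
print(classify_accounts(activated, completed))
```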
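 A few observations:
 - **L2 only** has the highest silent rate (61.3%): about three in five single-L2 users had no learning activity in the last 14 days
 - **L1 + L2** is the most active group, with only 23.9% having no completion
 - **L1 only** sits in between, at 34.3% with no completion
 user: For the L1+L2 group, split it into users with completions only in L1, only in L2, and in both L1 and L2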
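 assistant: The completion split for L1+L2 users is quite distinctive:
 ---
 ## 📊 Full results
 ### Overview
 | Metric | Users | Share |
 |------|------|------|
 | Total activated users | 3,037 | 100% |
 | No character data | 32 | 1.1% |
 | **No completion in last 14 days** | **1,307** | **43.0%** |
 | Completion in last 14 days | 1,698 | 55.9% |
 ### By activation type
 | Activation type | Total | No completion | Share | With completion | Share |
 |----------|--------|--------|------|--------|------|
 | L1 only | 502 | 172 | 34.3% | 330 | 65.7% |
 | L2 only | 1,436 | 880 | 61.3% | 556 | 38.7% |
 | L1+L2 | 1,067 | 255 | 23.9% | 812 | 76.1% |
 ### L1+L2 users with a completion (812 users)
 | Segment | Users | Share of users with a completion |
 |----------|------|-------------|
 | **Completion in L1 only** | **670** | **82.5%** |
 | Completion in both L1 and L2 | 110 | 13.5% |
 | Completion in L2 only | 32 | 3.9% |
 ---
 Although L1+L2 users activated both levels, 82.5% of those with a completion in the last 14 days studied only L1, and just 13.5% were studying both levels. Users who studied only L2 are a small group (32 users, 3.9%). Dual-activated users are currently concentrating heavily on L1.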
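 user: We now have the concept of activation, and we know one user can have several characters. So one character plus one level equals one course. I want to see how many courses have been activated, and how many of those activated courses had a lesson completion in the last 14 days, split by L1 and L2.
 assistant: Understood: the perspective shifts from "user" to "course" (character + level). Running the query.
 assistant: The results are in, compared against the earlier user-level numbers:
 ---
 ## 📊 Activated courses (character × level): completion in the last 14 days
 ### Overall
 | Metric | L1 | L2 | Total |
 |------|-----|-----|------|
 | Activated courses | **1,574** | **2,108** | **3,682** |
 | Completion in last 14 days | 1,103 (70.1%) | 699 (33.2%) | 1,802 (48.9%) |
 | No completion in last 14 days | 471 (29.9%) | 1,409 (66.8%) | 1,880 (51.1%) |
 ### Character activation distribution
 | Type | Characters |
 |------|--------|
 | L1 only activated | 1,321 |
 | L2 only activated | 1,855 |
 | L1+L2 both activated | 253 |
 ### Completion split for dual-activated characters (253)
 | Segment | Characters | Share |
 |------|--------|------|
 | Completion in L1 only | 154 | 60.9% |
 | Completion in both L1 and L2 | 21 | 8.3% |
 | Completion in L2 only | 10 | 4.0% |
 | No completion in either | 68 | 26.9% |
 ---
 ### 🔍 Comparing the two perspectives
 | Dimension | L1 no-completion rate | L2 no-completion rate | Overall no-completion rate |
 |------|------------|------------|------------|
 | By user (earlier; L1-only and L2-only groups) | 34.3% | 61.3% | 43.0% |
 | **By course (character × level)** | **29.9%** | **66.8%** | **51.1%** |
 The overall no-completion rate is higher at the course level (51.1%) than at the user level (43.0%). A user can have several activated courses and counts as "with completion" as soon as one of them has a completion, whereas the course-level view scores every course separately, so the large number of silent L2 courses pulls the overall figure up.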
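A toy illustration of why the two rates diverge, assuming each activated course is given as `(account_id, character_id, level, has_completion)`. The data is made up, and the user-level rule is simplified to "any course with a completion"; it ignores the per-level scoping used in the real analysis.

```python
def silent_rates(courses):
    """Return (user_level_silent_pct, course_level_silent_pct), each to 1 decimal."""
    # Course level: every (character, level) pair is scored on its own
    silent_courses = sum(1 for *_ids, done in courses if not done)
    course_pct = round(100 * silent_courses / len(courses), 1)

    # User level: one completed course is enough to mark the whole account active
    active_by_user = {}
    for account_id, _character_id, _level, done in courses:
        active_by_user[account_id] = active_by_user.get(account_id, False) or done
    silent_users = sum(1 for done in active_by_user.values() if not done)
    user_pct = round(100 * silent_users / len(active_by_user), 1)
    return user_pct, course_pct


# Hypothetical: account A has one active and one silent course
courses = [
    ("A", 1, "L1", True),
    ("A", 1, "L2", False),
    ("B", 2, "L2", False),
    ("C", 3, "L1", True),
]
print(silent_rates(courses))
```

Account A hides its silent L2 course at the user level, which is the same effect that separates 43.0% from 51.1% in the table.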
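 user: For the 1,880 activated courses with no completion in the last 14 days, in which Unit of L1 and L2 did their last completion record fall?
 assistant: Results below:
 ---
 ## 📊 Activated courses with no recent completion (1,879 courses): Unit of last completion
 > Note: this is 1 fewer than the earlier 1,880; the gap is in L2 (1,408 here vs. 1,409 before). The original run attributed it to a rounding difference in the bucketed query, which was not verified.
 ### Overview
 | Metric | L1 | L2 |
 |------|-----|-----|
 | Courses with no recent completion | 471 | 1,408 |
 | Never had a completion record | 76 (16.1%) | 316 (22.4%) |
 | Have a historical completion record | 395 (83.9%) | 1,092 (77.6%) |
 ### L1: Unit of last completion
 | Unit | Courses | Share | Cumulative |
 |------|--------|------|------|
 | **U00** | **149** | **31.6%** | 31.6% |
 | U01 | 94 | 20.0% | 51.6% |
 | U02 | 63 | 13.4% | 65.0% |
 | U03 | 32 | 6.8% | 71.8% |
 | U04 | 16 | 3.4% | 75.2% |
 | U05~U09 | 41 | 8.7% | 83.9% |
 | Never completed | 76 | 16.1% | n/a |
 ### L2: Unit of last completion
 | Unit | Courses | Share | Cumulative |
 |------|--------|------|------|
 | **U00** | **351** | **24.9%** | 24.9% |
 | U01 | 242 | 17.2% | 42.1% |
 | U02 | 157 | 11.2% | 53.3% |
 | U03 | 87 | 6.2% | 59.4% |
 | U04 | 65 | 4.6% | 64.1% |
 | U05~U29 | 190 | 13.5% | 77.6% |
 | Never completed | 316 | 22.4% | n/a |
 ---
 The main conclusion: **activated courses with no recent completion are concentrated in the earliest units (U00~U02)**. In L1, 65.0% last completed a lesson in U02 or earlier; in L2, 53.3%. U00 alone accounts for 31.6% of L1 and 24.9% of L2, and a further 16.1% (L1) and 22.4% (L2) never completed a lesson at all.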
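The share and cumulative columns can be reproduced from the raw counts. The sketch below uses the L1 counts reported above and takes the 471 silent L1 courses as the denominator, so the "never completed" group is part of the base; the function name is illustrative.

```python
def distribution(counts, total):
    """Return rows of (label, count, share_pct, cumulative_pct), rounded to 1 decimal."""
    rows, running = [], 0
    for label, n in counts:
        running += n
        rows.append((label, n, round(100 * n / total, 1), round(100 * running / total, 1)))
    return rows


# L1 counts from the table above; total includes the 76 courses that never completed a lesson
l1_counts = [("U00", 149), ("U01", 94), ("U02", 63), ("U03", 32), ("U04", 16), ("U05~U09", 41)]
for row in distribution(l1_counts, total=471):
    print(row)
```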
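 user: Run one more for me: for the 149 courses whose last completion was in L1 U00 and the 94 in L1 U01, plus the 351 in L2 U00 and the 242 in L2 U01, how are those last completions distributed across Lesson 1 to Lesson 5 of U00 and U01?
 assistant: The results are in:
 ---
 ## 📊 Silent courses that stopped in U00/U01: Lesson of last completion
 ### L1 U00 (149 courses)
 | Lesson | Courses | Share |
 |--------|--------|------|
 | L01 | 29 | 19.5% |
 | L02 | 28 | 18.8% |
 | L03 | 23 | 15.4% |
 | L04 | 15 | 10.1% |
 | **L05** | **54** | **36.2%** |
 ### L1 U01 (94 courses)
 | Lesson | Courses | Share |
 |--------|--------|------|
 | L01 | 21 | 22.3% |
 | L02 | 18 | 19.1% |
 | L03 | 17 | 18.1% |
 | L04 | 13 | 13.8% |
 | **L05** | **25** | **26.6%** |
 ### L2 U00 (351 courses)
 | Lesson | Courses | Share |
 |--------|--------|------|
 | L01 | 101 | 28.8% |
 | L02 | 64 | 18.2% |
 | L03 | 44 | 12.5% |
 | L04 | 38 | 10.8% |
 | **L05** | **104** | **29.6%** |
 ### L2 U01 (242 courses)
 | Lesson | Courses | Share |
 |--------|--------|------|
 | L01 | 53 | 21.9% |
 | L02 | 45 | 18.6% |
 | L03 | 52 | 21.5% |
 | L04 | 35 | 14.5% |
 | **L05** | **57** | **23.6%** |
 ---
 The four groups show a consistent pattern:
 1. **L05 is the largest stopping point** in all four groups (23.6%~36.2%): many users finish the whole Unit and then stop, a "unit-level break"
 2. **L01 is second** (19.5%~28.8%): a large share also stop after the first lesson, dropping out right at entry
 3. **L02 to L04 mostly decline**: fewer users stop at the later mid-unit lessons, which suggests that users who get past L01 tend to continue to the end of the unit. The one exception is L2 U01, where L03 (52) is higher than L02 (45)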
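The "last completion" grouping behind these tables can be sketched as: keep the most recent completion per course, then count by (unit, lesson). The record shape and the sample values are assumptions for illustration; ISO-formatted timestamps are used so that string comparison matches chronological order.

```python
from collections import Counter


def last_stop_counts(records):
    """Count courses by the (unit, lesson) of their most recent completion.

    records: iterable of (course_key, unit, lesson, completed_at)
    """
    latest = {}
    for course_key, unit, lesson, completed_at in records:
        # Keep only the newest completion seen so far for this course
        if course_key not in latest or completed_at > latest[course_key][0]:
            latest[course_key] = (completed_at, unit, lesson)
    return Counter((unit, lesson) for _ts, unit, lesson in latest.values())


# Hypothetical records: course c1 completed L01 and later L05 of U00
records = [
    ("c1", "U00", "L01", "2026-04-01"),
    ("c1", "U00", "L05", "2026-04-03"),
    ("c2", "U00", "L05", "2026-03-20"),
    ("c3", "U01", "L01", "2026-04-10"),
]
print(last_stop_counts(records))
```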
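 user: Write the data-extraction logic and the analysis logic above into long-term memory
 assistant: Everything has been written to long-term memory and today's log:
 - **`MEMORY.md`**: two new rule entries:
  - Definition of the activated-course dimension (character × level = one course)
  - Completion-analysis rules for activated courses (user and course dimensions, method for analysing the Unit/Lesson of the last completion, and the current results for reference)
 - **`memory/2026-05-22.md`**: today's full work log, including the confirmed rules, results for each dimension, and the list of script files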
--- a/memory/2026-05-25-1116.md
+++ b/memory/2026-05-25-1116.md
@ -0,0 +1,260 @@
 # Session: 2026-05-25 11:16:08 GMT+8
 - **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2
 - **Session ID**: e942f03c-fd78-4ddc-b742-d05ded398d38
 - **Source**: feishu
 ## Conversation Summary
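 assistant: Key finding: **all `H5 页面曝光时` (H5 page exposure) events come from anonymous users (accountId=0)**, but they do carry a `userId` field. I re-ran the funnel with `userId` as the unified identifier.
 assistant: Here is the full funnel analysis for the H5 purchase flow:
 ---
 ## H5 purchase funnel
 ### 📊 Event-level funnel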
 ```
 H5 页面曝光时    20,891  (100.0%)
       │
       ▼  -54.4%
 商品切换点击      9,530  ( 45.6%)
       │
       ▼  -99.0%
 立即购买按钮点击      94  (  0.4%)
       │
       ▼  -1.1%
 支付成功             93  (  0.4%)
 ```
 ### 👤 User-level funnel (deduplicated by userId)
 ```
 H5 页面曝光时     6,245 人  (100.0%)   人均3.3条
       │
       ▼  -71.2%
 商品切换点击      1,801 人  ( 28.8%)   人均5.3条
       │
       ▼  -94.8%
 立即购买按钮点击      93 人  (  1.5%)   人均1.0条
       │
       ▼  -1.1%
 支付成功             92 人  (  1.5%)   人均1.0条
 ```
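 ### 🔑 Key findings
 | Metric | Value |
 |---|---|
 | **Exposure → switch conversion** | 28.8% (1,801/6,245) |
 | **Switch → buy-click conversion** | 5.2% (93/1,801) |
 | **Buy-click → payment success rate** | 98.9% (92/93) |
 | **Users completing the full chain** | **84** |
 | **Exposure only, no switch** | 4,444 users (71.2% drop-off) |
 | **Switch only, no buy click** | 1,716 users (95.3% of the 1,801 who switched) |
 ### ⚠️ Important notes
 - **Step 1, `H5 页面曝光时` (exposure), is entirely anonymous** (accountId=0): it is an exposure event fired on the H5 landing page before login
 - **From step 2 onward all users are logged in** (accountId>0): users log in somewhere between exposure and switching products
 - This also means `userId` may be discontinuous between step 1 and the later steps, so **the true exposure → switch conversion may be higher than 28.8%** (some anonymous userIds may become new userIds after login)
 - The main bottleneck is **switch → buy click**: only 5.2% of users who switched products went on to click the buy button
 - Payment performs well: 98.9% of users who clicked buy completed payment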
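The percentages in the user-level funnel follow directly from the four deduplicated counts. A minimal sketch, with English step labels chosen here for readability (the underlying events are `H5 页面曝光时`, `商品切换点击`, `立即购买按钮点击` and `支付成功`):

```python
def funnel(steps):
    """Return rows of (name, users, share_of_first_pct, drop_from_previous_pct)."""
    first = steps[0][1]
    rows, previous = [], None
    for name, users in steps:
        share = round(100 * users / first, 1)
        # Drop-off is measured against the step directly before this one
        drop = None if previous is None else round(100 * (1 - users / previous), 1)
        rows.append((name, users, share, drop))
        previous = users
    return rows


# Deduplicated userId counts reported above
user_steps = [("exposure", 6245), ("switch", 1801), ("buy_click", 93), ("paid", 92)]
for row in funnel(user_steps):
    print(row)
```

This is a strict step-order funnel: it assumes everyone at a step came through the previous one, which is only approximately true for this flow.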
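 user: Let's confirm the relationship between accountId and userId: are they always 0 at the same time?
 assistant: The result is clear:
 ## accountId × userId relationship
 | accountId | userId | Records | Share |
 |:---:|:---:| ---:| ---:|
 | =0 | =0 | 56 | 0.2% |
 | =0 | >0 | **25,919** | **72.6%** |
 | >0 | =0 | 0 | 0.0% |
 | >0 | >0 | 9,716 | 27.2% |
 > Note: the `>0` bucket is computed as "not equal to 0", so the 25,919 records include the 5,083 `领课加微` records that have no `userId` field at all (2,731 + 2,352, see the per-step table); 20,836 records carry a real non-zero `userId`.
 **Conclusion: they are not always 0 together. The two fields mean different things:**
 - **accountId = 0** → an **anonymous user** who is not logged in
 - **userId > 0** → a device/session-level identifier that **exists in both anonymous and logged-in states**
 ### Breakdown by step
 | Step | accountId=0 | accountId>0 |
 |---|---|---|
 | `H5 页面曝光时` (exposure) | 20,891 (all anonymous) | 0 |
 | `商品切换点击` (switch) | 1 | 9,529 (all logged in) |
 | `立即购买按钮点击` (buy click) | 0 | 94 |
 | `支付成功` (paid) | 0 | 93 |
 | `领课加微` / firstScreen | 2,731 (no userId field) | 0 |
 | `领课加微` / secondScreen | 2,352 (no userId field) | 0 |
 ### Main findings
 1. **Exposure**: the user is not logged in yet, so accountId=0, but userId is a device-level identifier (non-zero)
 2. **From the switch step onward**: the user is logged in and accountId becomes the real account ID
 3. **userId persists across login states but the linkage is incomplete**: the 28.8% exposure → switch overlap computed earlier (1,801/6,245) was linked through userId, and because userId may change at login, **the true conversion may be higher**
 4. **`领课加微`** is a special case: accountId=0 and no userId field at all, so it is a purely anonymous event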
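When reproducing the accountId × userId table, it helps to keep "field missing" apart from "value is 0": in Elasticsearch a `must_not` term filter on `userId: 0` also matches documents that have no `userId` at all. A small sketch over already-fetched documents, with made-up sample records:

```python
from collections import Counter


def id_combo(doc):
    """Label a record by its accountId / userId state, keeping a missing userId separate."""
    account = "a=0" if doc.get("accountId", 0) == 0 else "a>0"
    if "userId" not in doc:
        user = "u missing"
    elif doc["userId"] == 0:
        user = "u=0"
    else:
        user = "u>0"
    return f"{account},{user}"


# Hypothetical records covering the cases described above
docs = [
    {"accountId": 0, "userId": 7001},  # anonymous exposure with a device-level userId
    {"accountId": 0, "userId": 0},
    {"accountId": 0},                  # no userId field at all
    {"accountId": 42, "userId": 7001},
]
print(Counter(id_combo(d) for d in docs))
```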
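 user: From now on, count growth_activity_behavior by userId and ignore accountId
 assistant: Got it, recorded.
 assistant: Saved to long-term memory. From now on, all `growth_activity_behavior` statistics deduplicate users by `userId` and do not rely on `accountId`.
 user: Then re-run the full funnel for the sub-behaviors of the H5 purchase flow
 assistant: Under the new rule (deduplicate by userId), the full H5 purchase funnel:
 ---
 ## H5 purchase funnel (counted by userId)
 ### 📊 Event-level funnel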
 ```
 H5 页面曝光时     20,891  ██████████████████████████ 100.0%
       ▼  -54.4%
 商品切换点击       9,530  ████████████               45.6%
       ▼  -99.0%
 立即购买按钮点击       94  ▏                           0.4%
       ▼  -1.1%
 支付成功              93  ▏                           0.4%
 ```
 ### 👤 User-level funnel (deduplicated by userId)
 ```
 H5 页面曝光时      6,245  ██████████████████████████ 100.0%
       ▼  -71.2%
 商品切换点击       1,801  ████████                    28.8%
       ▼  -94.8%
 立即购买按钮点击       93  ▏                            1.5%
       ▼  -1.1%
 支付成功              92  ▏                            1.5%
 ```
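 ### 🔗 User overlap between steps
 | Link | Overlapping users | Note |
 |---|---|---|
 | Exposure → switch | 1,801 | 28.8% of the 6,245 exposed users went on to switch |
 | Switch → buy click | 85 | 4.7% of the 1,801 switching users clicked buy |
 | Buy click → paid | 92 | 98.9% of the 93 buying users completed payment |
 | **Full chain** | **84** | Completed all 4 steps |
 ### 📋 User behavior paths
 | Path | Users | Share |
 |---|---|---|
 | Exposure → leave | 4,436 | 71.0% |
 | Exposure → switch → leave | 1,716 | 27.5% |
 | Exposure → switch → buy → paid ✅ | 84 | 1.3% |
 | Exposure → buy → paid (skipped switch) | 8 | 0.1% |
 | Exposure → switch → buy → not paid | 1 | 0.02% |
 ---
 **Key bottleneck:** the switch → buy step loses the most users (95.3% drop-off): 1,716 users switched between products but never clicked the buy button. Exposure → switch also loses 71.2%, but since exposure is broad landing-page traffic, that rate is within a normal range.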
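The path table can be derived by collecting the set of steps each `userId` reached and rendering it in canonical step order. This is a sketch with English step labels and made-up events; like the table, it ignores repeat events and the actual time order within a user.

```python
from collections import Counter, defaultdict

STEP_ORDER = ("exposure", "switch", "buy", "pay")


def path_counts(events):
    """Count users by the ordered set of funnel steps they reached.

    events: iterable of (user_id, step) with step in STEP_ORDER
    """
    steps_by_user = defaultdict(set)
    for user_id, step in events:
        steps_by_user[user_id].add(step)
    # Render each user's step set in funnel order, e.g. "exposure > buy > pay"
    return Counter(
        " > ".join(s for s in STEP_ORDER if s in seen)
        for seen in steps_by_user.values()
    )


# Hypothetical events: user 3 buys without ever switching products
events = [
    (1, "exposure"), (1, "switch"), (1, "switch"),
    (2, "exposure"),
    (3, "exposure"), (3, "buy"), (3, "pay"),
]
print(path_counts(events))
```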
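 user: Users do not have to go through the switch step before clicking the buy button; they may also click buy directly after exposure
 assistant: Right, the path distribution above already shows this: **8 users went straight from exposure to buy** without passing through the product-switch step.
 It also means the conversion rates computed in strict step order were on the conservative side. The real purchase funnel should merge the two paths:
 ```
                 exposure: 6,245 users
                /                     \
     switch: 1,801 users        direct buy: 8 users
           |                          |
     buy: 85 users                 8 users
           |                          |
     paid: 84 users                8 users
 ```
 ### Re-aggregated by final outcome:
 | Funnel layer | userId count | Share of exposure |
 |---|---|---|
 | Exposure | 6,245 | 100.0% |
 | Any switch behavior | 1,801 | 28.8% |
 | Any buy behavior (via switch + direct) | 93 | **1.5%** |
 | Any payment behavior | 92 | **1.5%** |
 The buy step really has 93 users: 85 who reached it through switching plus 8 who went straight from exposure. Buy → paid converts at 98.9%, almost everyone.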
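Aggregating "by final outcome" amounts to summing every path that contains a given step, regardless of what came before it. Using the path counts reported earlier in this session (English labels are chosen here for readability):

```python
# Users per path, as reported in the behavior-path table
PATH_COUNTS = {
    ("exposure",): 4436,
    ("exposure", "switch"): 1716,
    ("exposure", "switch", "buy", "pay"): 84,
    ("exposure", "buy", "pay"): 8,
    ("exposure", "switch", "buy"): 1,
}


def reach(step, path_counts=PATH_COUNTS):
    """Number of users whose path contains the step, whatever route they took."""
    return sum(users for path, users in path_counts.items() if step in path)


for step in ("exposure", "switch", "buy", "pay"):
    print(step, reach(step))
```

The buy layer sums the switch route (84 + 1) and the direct route (8), which is where the 93 comes from.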
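 user: What values does the `info` field contain?
 assistant: The `info` field is **empty in all** 35,691 records, across all six sub-behaviors. The field is defined but has never been written to, so it can be ignored.
 user: Next I want to see the data structure of the records with buryingPointId 1000 in the index user_behavior_buried_points
 assistant: Here is the full data structure for `buryingPointId=1000`: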
 ---
 ## buryingPointId=1000 (name not set; 23,447 records)
 ### Field list
 | Field | Non-empty rate | Description |
 |---|---|---|
 | `accountId` | 100% | Account ID |
 | `accountName` | 100% | Masked phone number |
 | `characterId` | 100% | Character ID |
 | `characterName` | 100% | Character name |
 | `buryingPointId` | 100% | Always 1000 |
 | `buryingPointName` | **0%** | Always empty |
 | `buryingPointSubId` | 100% | Sub-event ID (see below) |
 | `buryingPointSubName` | **0%** | Always empty |
 | `cType` | **62%** | Component type |
 | `cId` | **62%** | Component number |
 | `componentId` | 100% | **Always 0** |
 | `componentName` | **62%** | Component name |
 | `lessonId` / `lessonName` | **62%** | Lesson information |
 | `unitId` / `unitName` | **84%** | Unit information |
 | `seasonPackageId` / `seasonPackageName` | **84%** | Course package |
 | `ex2` | 100% | Integer, 1~12 |
 | `ex3` | **45%** | Float (present only when subId=4) |
 | `stepDuration` | 100% | Step duration (ms); median 5 ms, max 388 s |
 | `versionId` | 100% | Version number |
 | `activeTime` | 100% | Timestamp (seconds) |
 ### buryingPointSubId distribution
 | subId | Count | Presumed meaning |
 |:---:| ---:| --- |
 | 1 | 10,967 | Start / enter |
 | 4 | 10,637 | Finish / submit (ex3 is populated here) |
 | 2 | 1,396 | Intermediate step 1 |
 | 3 | 447 | Intermediate step 2 |
 **subId 1 and 4 appear roughly in pairs** (10,967 vs 10,637), and `ex3` has a float value only when subId=4, which looks like an "enter → finish" event pair; 2 and 3 are intermediate states.
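If subId 1 and 4 really are start/finish markers (this is only the presumed meaning above), the pairing could be checked by matching each start with the next finish that shares the same key. The grouping key is an assumption: the text does not say which fields identify one interaction, so a `(characterId, cId)`-style key is used purely for illustration.

```python
def pair_start_finish(events):
    """Pair each subId=1 event with the next subId=4 event for the same key.

    events: iterable of (key, sub_id, timestamp); key is a hypothetical grouping
    Returns (pairs, keys_with_open_start).
    """
    open_start = {}
    pairs = []
    for key, sub_id, ts in sorted(events, key=lambda e: e[2]):
        if sub_id == 1:
            # A newer start replaces an older unmatched one for the same key
            open_start[key] = ts
        elif sub_id == 4 and key in open_start:
            pairs.append((key, open_start.pop(key), ts))
    return pairs, sorted(open_start)


# Hypothetical events: "b" starts but never finishes, "a" starts a second time
events = [("a", 1, 10), ("a", 4, 25), ("b", 1, 12), ("a", 1, 30)]
print(pair_start_finish(events))
```

The count of unmatched starts would be one way to examine the gap between 10,967 starts and 10,637 finishes.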
 ### cType distribution (top 10)
 | cType | Count |
 |---|---|
 | (empty) | 7,835 |
 | mid_dialog_select | 4,103 |
 | mid_dialog_fillin | 3,770 |
 | core_listening_drag | 2,231 |
 | mid_grammar_cloze | 1,982 |
 | core_listening_choose | 1,561 |
 | mid_vocab_item | 761 |
 | mid_grammar_sentence | 595 |
 | mid_sentence_dialogue | 557 |
 | mid_dialog_repeat | 37 |
 ### versionId distribution
 | Version | Count |
 |---|---|
 | 1.2.2 | 8,897 |
 | 1.3.1 | 7,777 |
 | 1.2.4 | 5,041 |
 | 1.2.3 | 1,732 |
 ### seasonPackageId distribution
 | seasonPackageId | Count | Note |
 |:---:| ---:| --- |
 | 1 | 13,488 | Prologue |
 | 0 | 8,767 | Not in any course package |
 | 2 | 1,179 | Season 2 package |
 | 12 | 12 | |
 | 14 | 1 | |
 ---
 **Summary:** this is a component-level interaction tracking event covering dialog selection, fill-in, listening drag, grammar cloze, and other question types. subId 1→4 looks like a typical "start → finish" event pair, with ex3 recording a float metric at finish. About 37% of records (8,767 of 23,447, seasonPackageId=0) have no course-package context and may come from standalone practice outside the courses.
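The non-empty rates in the field list can be computed per field over the fetched documents. A minimal sketch, assuming "empty" means missing, `None`, or an empty string, so a value of 0 still counts as present (consistent with `componentId` being 100% filled while always 0). The sample documents are made up.

```python
def fill_rates(docs, fields):
    """Return {field: percent of docs where the field is present and non-empty}."""
    def filled(value):
        # 0 is a real value; only None and "" count as empty
        return value is not None and value != ""

    total = len(docs)
    return {
        field: round(100 * sum(1 for d in docs if filled(d.get(field))) / total)
        for field in fields
    }


# Hypothetical documents
docs = [
    {"cType": "mid_dialog_select", "componentId": 0, "ex3": 0.5},
    {"cType": "", "componentId": 0},
    {"cType": "core_listening_drag", "componentId": 0},
]
print(fill_rates(docs, ["cType", "componentId", "ex3"]))
```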
--- a/memory/2026-05-25.md
+++ b/memory/2026-05-25.md
@ -0,0 +1,12 @@
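 ## Batch query of course progress records (1,676 rows)
 [requested by 陈逸鸫, permission confirmed by 李承龙]
 - Sheet: https://makee-interactive.feishu.cn/sheets/RFIJsXT8FhGHhctY4RwczcOfnac (Sheet1, sheet_id=55b0eb)
 - Columns written: column D (`小溪返回行课记录`, the returned progress record) + column E (update time); column C (request status → `已返回`, returned)
 - Phone matching (column J): 961 rows had a user ID and were queried directly; 715 rows had no ID and needed phone matching, of which 119 matched and 596 did not
 - Database query: 1,077 distinct users, 1,065 with a course assignment, 1,192 characters with progress records, 201 paid
 - Output format: `销售:xxx | 用户:xxx | 当前:{level}{type}-{chapter} | 最近行课:date | 学习{X}min | {付费状态}`
 - Script: scripts/batch_course_record_query.py
 - All 1,676 pending rows were processed and written back successfully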
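Phone matching in `scripts/batch_course_record_query.py` (added in this same commit) normalizes spreadsheet values and compares them in masked form. The sketch below restates that idea in isolation; it is a simplified version, not the script's exact code, and the sample number is the one used in the script's own comment.

```python
import re


def clean_phone(raw):
    """Normalize a spreadsheet cell to an 11-digit mobile number, or None if invalid."""
    text = str(raw).strip()
    # Spreadsheets often deliver numbers as floats, e.g. 13003632311.0
    if text.endswith(".0"):
        text = text[:-2]
    return text if re.fullmatch(r"1\d{10}", text) else None


def mask_phone(phone):
    """Mask the middle four digits: first 3 + **** + last 4."""
    return f"{phone[:3]}****{phone[-4:]}"


print(mask_phone(clean_phone("13003632311.0")))
```

Because only 7 of the 11 digits survive masking, two different numbers can share one mask, so a masked match is weaker evidence than an exact ID match.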
--- a/scripts/batch_course_record_query.py
+++ b/scripts/batch_course_record_query.py
@ -0,0 +1,450 @@
 #!/usr/bin/env python3
 """
 批量行课记录查询 - 查用户当前进度、最近行课、学习时长、付费状态
 输入: /tmp/pending_rows.json (从飞书表格拉取的待查询行)
 输出: /tmp/results.jsonl (每行一个JSON, 包含 row + result 字段)
 """
 import json
 import os
 import re
 import sys
 from collections import defaultdict
 try:
    import psycopg2
    import psycopg2.extras
 except ImportError:
    print("ERROR: 需要 psycopg2")
    sys.exit(1)
 DB_HOST = "bj-postgres-16pob4sg.sql.tencentcdb.com"
 DB_PORT = 28591
 DB_USER = "ai_member"
 DB_NAME = "vala_bi"
 SHARD_COUNT = 8
 COURSE_LEVEL_MAP = {"A1": "L1", "A2": "L2"}
 def get_password():
    pw = os.environ.get("PG_ONLINE_PASSWORD", "")
    if pw:
        return pw
    secrets_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "secrets.env")
    if os.path.exists(secrets_path):
        with open(secrets_path) as f:
            for line in f:
                if line.startswith("PG_ONLINE_PASSWORD="):
                    return line.strip().split("=", 1)[1].strip("'\"")
    raise RuntimeError("PG_ONLINE_PASSWORD not found")
 def get_conn():
    return psycopg2.connect(
        host=DB_HOST, port=DB_PORT, user=DB_USER,
        password=get_password(), dbname=DB_NAME, connect_timeout=30
    )
 def mask_phone(phone):
    return f"{phone[:3]}****{phone[-4:]}"
 def batch_query_in(cur, sql_template, params, chunk_size=500):
    """Run an IN query in chunks; the single %s in sql_template is replaced by the comma-separated placeholder list."""
    results = []
    for i in range(0, len(params), chunk_size):
        chunk = params[i:i + chunk_size]
        placeholders = ",".join(["%s"] * len(chunk))
        sql = sql_template % placeholders
        cur.execute(sql, chunk)
        results.extend(cur.fetchall())
    return results
 def match_phones(phones, conn):
    if not phones:
        return {}, [], []
    # 清理手机号：移除空格、小数点等
    def clean_phone(p):
        p = str(p).strip()
        # 处理浮点数格式: 13003632311.0 -> 13003632311
        if '.' in p:
            parts = p.split('.')
            if parts[1] in ('0', '00'):
                p = parts[0]
        # 验证是11位手机号
        if re.match(r'^1\d{10}$', p):
            return p
        return None
    cleaned = [(clean_phone(p), p) for p in phones]
    valid_phones = [(cp, op) for cp, op in cleaned if cp]
    if not valid_phones:
        return {}, [], []
    masked_to_phones = {}
    for cp, op in valid_phones:
        m = mask_phone(cp)
        masked_to_phones.setdefault(m, []).append(op)
    masks = list(masked_to_phones.keys())
    cur = conn.cursor()
    placeholders = ",".join(["%s"] * len(masks))
    cur.execute(f"""
        SELECT id AS account_id, tel
        FROM bi_vala_app_account
        WHERE tel IN ({placeholders}) AND status = 1 AND deleted_at IS NULL
    """, masks)
    rows = cur.fetchall()
    cur.close()
    # masked → account_id
    masked_to_account = {}
    for aid, masked in rows:
        if masked not in masked_to_account:
            masked_to_account[masked] = aid
    # original phone → account_id
    phone_to_account = {}
    for op in phones:
        cp = clean_phone(op)
        if cp:
            m = mask_phone(cp)
            if m in masked_to_account:
                phone_to_account[op] = masked_to_account[m]
    account_ids = list(set(phone_to_account.values()))
    unmatched = [p for p in phones if p not in phone_to_account]
    return phone_to_account, account_ids, unmatched
 def query_course_details(cur, account_ids):
    if not account_ids:
        return {}
    rows = batch_query_in(cur, """
        SELECT account_id, course_level,
               CASE WHEN expire_time IS NULL THEN '体验课' ELSE '正式课' END AS course_type
        FROM bi_user_course_detail
        WHERE account_id IN (%s) AND deleted_at IS NULL
    """, account_ids)
    result = {}
    for aid, cl, ct in rows:
        # Normalize the key before the membership test so only the first row per account is kept
        aid = int(aid)
        if aid not in result:
            result[aid] = (cl, ct)
    return result
 def query_chapter_play_records(cur, char_ids):
    """查询每个角色最近一次行课记录和总学习时间"""
    if not char_ids:
        return {}, {}
    # 构建 UNION ALL SQL - 使用 CTE 方式先创建 ID 列表
    # 创建临时ID数组
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS _tmp_char_ids (id int PRIMARY KEY) ON COMMIT DROP")
    cur.execute("DELETE FROM _tmp_char_ids")
    psycopg2.extras.execute_values(cur,
        "INSERT INTO _tmp_char_ids (id) VALUES %s",
        [(cid,) for cid in char_ids]
    )
    # 查询每个角色的最近行课记录
    union_parts = []
    for i in range(SHARD_COUNT):
        union_parts.append(
            f"SELECT cpr.user_id, cpr.chapter_id, cpr.chapter_unique_id, cpr.created_at, cpr.play_status "
            f"FROM bi_user_chapter_play_record_{i} cpr "
            f"JOIN _tmp_char_ids t ON cpr.user_id = t.id "
            f"WHERE cpr.deleted_at IS NULL"
        )
    all_plays_sql = " UNION ALL ".join(union_parts)
    cur.execute(f"""
        WITH all_plays AS ({all_plays_sql})
        , latest AS (
            SELECT DISTINCT ON (user_id) user_id, chapter_id, chapter_unique_id, created_at AS latest_date
            FROM all_plays
            ORDER BY user_id, created_at DESC
        )
        SELECT user_id, chapter_id, chapter_unique_id, latest_date FROM latest
    """)
    char_latest = {}
    chapter_ids = set()
    for uid, ch_id, cu_id, ldate in cur.fetchall():
        char_latest[uid] = (ch_id, ldate)
        chapter_ids.add(ch_id)
    # 查询每个角色所有 play_status=1 的 chapter_unique_id
    cur.execute(f"""
        WITH all_plays AS ({all_plays_sql})
        SELECT user_id, chapter_unique_id
        FROM all_plays
        WHERE play_status = 1
    """)
    char_cuids = defaultdict(set)
    all_cuids = set()
    for uid, cuid in cur.fetchall():
        char_cuids[uid].add(cuid)
        all_cuids.add(cuid)
    # 查询组件耗时
    char_total_ms = defaultdict(int)
    if all_cuids:
        cuid_list = list(all_cuids)
        # 创建临时表存 cuids
        cur.execute("CREATE TEMP TABLE IF NOT EXISTS _tmp_cuids (cuid text PRIMARY KEY) ON COMMIT DROP")
        cur.execute("DELETE FROM _tmp_cuids")
        psycopg2.extras.execute_values(cur,
            "INSERT INTO _tmp_cuids (cuid) VALUES %s",
            [(c,) for c in cuid_list]
        )
        comp_parts = []
        for i in range(SHARD_COUNT):
            comp_parts.append(
                f"SELECT comp.chapter_unique_id, SUM(comp.interval_time) AS total_ms "
                f"FROM bi_user_component_play_record_{i} comp "
                f"JOIN _tmp_cuids t ON comp.chapter_unique_id = t.cuid "
                f"GROUP BY comp.chapter_unique_id"
            )
        comp_union = " UNION ALL ".join(comp_parts)
        cur.execute(f"""
            SELECT chapter_unique_id, SUM(total_ms) AS total_ms
            FROM ({comp_union}) t2
            GROUP BY chapter_unique_id
        """)
        cuid_time = {}
        for cuid, tms in cur.fetchall():
            cuid_time[cuid] = (tms or 0)
        for uid, cuids in char_cuids.items():
            total = sum(cuid_time.get(c, 0) for c in cuids)
            char_total_ms[uid] = total
    return char_latest, char_total_ms
 def query_chapter_course_map(cur, chapter_ids):
    if not chapter_ids:
        return {}
    rows = batch_query_in(cur, """
        SELECT id, course_level, course_season, course_unit, course_lesson
        FROM bi_level_unit_lesson WHERE id IN (%s)
    """, list(chapter_ids))
    return {r[0]: (r[1], r[2], r[3], r[4]) for r in rows}
 def query_payment_status(cur, account_ids):
    if not account_ids:
        return {}
    rows = batch_query_in(cur, """
        SELECT account_id, MAX(CASE WHEN order_status IN (3,4) THEN 1 ELSE 0 END) AS has_order
        FROM bi_vala_order WHERE account_id IN (%s) AND deleted_at IS NULL
        GROUP BY account_id
    """, account_ids)
    status = {int(aid): "已付费" if ho == 1 else "未付费" for aid, ho in rows}
    for aid in account_ids:
        if int(aid) not in status:
            status[int(aid)] = "未付费"
    return status
 def query_characters(cur, account_ids):
    if not account_ids:
        return {}
    rows = batch_query_in(cur, """
        SELECT account_id, id AS character_id
        FROM bi_vala_app_character
        WHERE account_id IN (%s) AND nickname IS NOT NULL AND nickname != '' AND deleted_at IS NULL
    """, account_ids)
    result = defaultdict(list)
    for aid, cid in rows:
        result[int(aid)].append(int(cid))
    return result
 def main():
    print("加载待查询行...")
    with open("/tmp/pending_rows.json") as f:
        pending_rows = json.load(f)
    print(f"共 {len(pending_rows)} 条待查询")
    with_id = []
    without_id = []
    for row in pending_rows:
        # str() guards against numeric JSON values, which have no .strip()
        uid_str = str(row.get("user_id") or "").strip()
        phone = str(row.get("phone") or "").strip()
        if uid_str:
            try:
                row["account_id"] = int(float(uid_str))
                with_id.append(row)
                continue
            except (ValueError, TypeError):
                pass  # unparseable ID: fall back to phone matching
        if phone:
            row["phone_clean"] = phone
            without_id.append(row)
    print(f"有用户ID: {len(with_id)}, 需手机号匹配: {len(without_id)}")
    conn = get_conn()
    cur = conn.cursor()
    # Step 1: 手机号匹配
    phone_to_account = {}
    if without_id:
        phones = [r["phone_clean"] for r in without_id]
        phone_to_account, matched_ids, unmatched = match_phones(phones, conn)
        print(f"手机号匹配: {len(phone_to_account)} 成功, {len(unmatched)} 未匹配")
        for row in without_id:
            p = row["phone_clean"]
            if p in phone_to_account:
                row["account_id"] = int(phone_to_account[p])
                with_id.append(row)
    print(f"总计可查询用户: {len(with_id)}")
    all_account_ids = list(set(r["account_id"] for r in with_id))
    print(f"去重 account_id: {len(all_account_ids)}")
    if not all_account_ids:
        print("没有可查询的用户，退出")
        conn.close()
        return
    # Step 2: 查询课程分配
    print("查询课程分配...")
    course_details = query_course_details(cur, all_account_ids)
    print(f"  有课程: {len(course_details)}")
    # Step 3: 查询角色
    print("查询角色...")
    account_chars = query_characters(cur, all_account_ids)
    all_char_ids = []
    for chars in account_chars.values():
        all_char_ids.extend(chars)
    print(f"  角色数: {len(all_char_ids)}")
    # Step 4: 查询行课记录
    print("查询行课记录 (可能需要几分钟)...")
    char_latest, char_total_ms = query_chapter_play_records(cur, all_char_ids)
    print(f"  有行课记录: {len(char_latest)}")
    # Step 5: chapter 映射
    all_chapter_ids = set()
    for ch_id, _ in char_latest.values():
        all_chapter_ids.add(ch_id)
    print("查询课程映射...")
    course_chapter_map = query_chapter_course_map(cur, all_chapter_ids)
    print(f"  chapter: {len(course_chapter_map)}")
    # Step 6: 付费状态
    print("查询付费状态...")
    pay_status = query_payment_status(cur, all_account_ids)
    print(f"  付费: {sum(1 for v in pay_status.values() if v == '已付费')}")
    conn.close()
    # Step 7: 组装结果
    print("组装结果...")
    results = []
    # One result per input row; rows are not merged even when they share an account
    for row in with_id:
        rn = row["row"]
        aid = row["account_id"]
        sales = row.get("sales", "").strip()
        # 课程描述
        cd = course_details.get(aid)
        if cd:
            cl, ct = cd
            level_display = COURSE_LEVEL_MAP.get(cl, cl)
            course_desc = f"{level_display}{ct}"
        else:
            course_desc = "暂无课程"
        # 最近行课和当前章节
        chars = account_chars.get(aid, [])
        latest_date = None
        chapter_pos = ""
        for cid in chars:
            if cid in char_latest:
                ch_id, ldate = char_latest[cid]
                # Only the character with the most recent record sets the chapter,
                # so the reported position always matches latest_date
                if ldate and (latest_date is None or ldate > latest_date):
                    latest_date = ldate
                    chapter_pos = ""
                    if ch_id in course_chapter_map:
                        lvl, season, unit, lesson = course_chapter_map[ch_id]
                        chapter_pos = f"{lvl}-{season}-{unit}-{lesson}"
        if not chapter_pos:
            chapter_pos = "无记录"
        date_str = latest_date.strftime("%Y-%m-%d") if latest_date else "无记录"
        # Study time: per-character totals are in milliseconds; convert to whole minutes
        total_ms = sum((char_total_ms.get(cid, 0) or 0) for cid in chars)
        total_min = round(total_ms / 60000)
        # 付费状态
        pay = pay_status.get(aid, "未付费")
        result_text = f"销售:{sales} | 用户:{aid} | 当前:{course_desc}-{chapter_pos} | 最近行课:{date_str} | 学习{total_min}min | {pay}"
        results.append({
            "row": rn,
            "result": result_text,
            "aid": aid
        })
    # 未匹配的
    for row in without_id:
        p = row.get("phone_clean", "")
        if p not in phone_to_account:
            results.append({
                "row": row["row"],
                "result": f"未匹配到用户",
                "aid": 0
            })
    # 输出
    with open("/tmp/results.jsonl", "w") as f:
        for r in results:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
    print(f"\n✅ 查询完成: {len(results)} 条结果")
    unrecorded = sum(1 for r in results if "无记录" in r.get("result","") and "未匹配" not in r.get("result",""))
    unmatched_count = sum(1 for r in results if "未匹配" in r.get("result",""))
    paid_count = sum(1 for r in results if "已付费" in r.get("result",""))
    print(f"   有记录: {len(results) - unrecorded - unmatched_count}")
    print(f"   无记录: {unrecorded}")
    print(f"   未匹配: {unmatched_count}")
    print(f"   已付费: {paid_count}")
 if __name__ == "__main__":
    main()
--- a/scripts/check_account_userid.py
+++ b/scripts/check_account_userid.py
@ -0,0 +1,91 @@
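 import json, os, subprocess
 ES_URL = "https://es-7vd7jcu9.public.tencentelasticsearch.com:9200"
 # Credentials come from the environment instead of being committed with the script.
 # ES_AUTH is an assumed variable name; it must hold "user:password".
 AUTH = os.environ["ES_AUTH"]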
 def count_query(must_clauses):
    q = {"query": {"bool": {"must": must_clauses}}}
    cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
           f"{ES_URL}/growth_activity_behavior/_count", "-d", json.dumps(q)]
    r = subprocess.run(cmd, capture_output=True, text=True)
    return json.loads(r.stdout).get("count", 0)
 total = count_query([])
 # 四种组合
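 # must_not alone also matches documents where the field is missing, so require the field to exist
 nonzero_account = {"bool": {"filter": [{"exists": {"field": "accountId"}}], "must_not": [{"term": {"accountId": 0}}]}}
 nonzero_user = {"bool": {"filter": [{"exists": {"field": "userId"}}], "must_not": [{"term": {"userId": 0}}]}}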
 zero_account = {"term": {"accountId": 0}}
 zero_user = {"term": {"userId": 0}}
 a0_u0 = count_query([zero_account, zero_user])
 a0_ux = count_query([zero_account, nonzero_user])
 ax_u0 = count_query([nonzero_account, zero_user])
 ax_ux = count_query([nonzero_account, nonzero_user])
 print("=== growth_activity_behavior 中 accountId x userId 组合 ===\n")
 print(f"{'accountId':>12} {'userId':>12} {'条数':>10} {'占比':>10}")
 print("-" * 48)
 print(f"{'=0':>12} {'=0':>12} {a0_u0:>10} {a0_u0/total*100:>9.1f}%")
 print(f"{'=0':>12} {'>0':>12} {a0_ux:>10} {a0_ux/total*100:>9.1f}%")
 print(f"{'>0':>12} {'=0':>12} {ax_u0:>10} {ax_u0/total*100:>9.1f}%")
 print(f"{'>0':>12} {'>0':>12} {ax_ux:>10} {ax_ux/total*100:>9.1f}%")
 print(f"{'总计':>12} {'':>12} {total:>10}")
 # 按 subBehavior 细分 - ES aggregation
 print("\n=== 按 behavior + subBehavior 细分 ===\n")
 agg_query = {
    "size": 0,
    "aggs": {
        "behaviors": {
            "terms": {"field": "behavior", "size": 10},
            "aggs": {
                "subs": {
                    "terms": {"field": "subBehavior", "size": 10},
                    "aggs": {
                        "a0_u0": {"filter": {"bool": {"must": [zero_account, zero_user]}}},
                        "a0_ux": {"filter": {"bool": {"must": [zero_account, nonzero_user]}}},
                        "ax_u0": {"filter": {"bool": {"must": [nonzero_account, zero_user]}}},
                        "ax_ux": {"filter": {"bool": {"must": [nonzero_account, nonzero_user]}}}
                    }
                }
            }
        }
    }
 }
 cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
       f"{ES_URL}/growth_activity_behavior/_search", "-d", json.dumps(agg_query)]
 r = subprocess.run(cmd, capture_output=True, text=True)
 data = json.loads(r.stdout)
 header = f"{'behavior / subBehavior':<40} {'总计':>8} {'a=0,u=0':>8} {'a=0,u>0':>8} {'a>0,u=0':>8} {'a>0,u>0':>8}"
 print(header)
 print("-" * len(header))
 for bb in data["aggregations"]["behaviors"]["buckets"]:
    bname = bb["key"]
    for sb in bb["subs"]["buckets"]:
        sname = sb["key"]
        n = sb["doc_count"]
        a0u0 = sb["a0_u0"]["doc_count"]
        a0ux = sb["a0_ux"]["doc_count"]
        axu0 = sb["ax_u0"]["doc_count"]
        axux = sb["ax_ux"]["doc_count"]
        label = f"{bname} / {sname}"
        print(f"{label:<40} {n:>8} {a0u0:>8} {a0ux:>8} {axu0:>8} {axux:>8}")
 # userId 不存在的记录
 print(f"\n=== userId 字段不存在 的记录 ===")
 u_nonexist = count_query([{"bool": {"must_not": [{"exists": {"field": "userId"}}]}}])
 print(f"userId 字段不存在: {u_nonexist}")
 # 抽 userId=0 的样本
 cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
       f"{ES_URL}/growth_activity_behavior/_search?size=3",
       "-d", '{"query":{"term":{"userId":0}},"_source":true}']
 r = subprocess.run(cmd, capture_output=True, text=True)
 data = json.loads(r.stdout)
 print(f"\nuserId=0 记录数: {data['hits']['total']['value']}")
 for h in data['hits']['hits']:
    print(f"  {json.dumps(h['_source'], ensure_ascii=False)}")
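The nested `terms` aggregation above returns one bucket per behavior/subBehavior pair, each carrying four `filter` sub-aggregations. Below is a minimal sketch of flattening such a response into plain rows, assuming the response shape mirrors the query. The `flatten_buckets` helper and the sample response are hypothetical and not part of the committed script.

```python
def flatten_buckets(response):
    """Flatten behaviors -> subs buckets into tuples of
    (behavior, subBehavior, total, a0_u0, a0_ux, ax_u0, ax_ux)."""
    filter_names = ("a0_u0", "a0_ux", "ax_u0", "ax_ux")
    rows = []
    for bb in response["aggregations"]["behaviors"]["buckets"]:
        for sb in bb["subs"]["buckets"]:
            counts = tuple(sb[name]["doc_count"] for name in filter_names)
            rows.append((bb["key"], sb["key"], sb["doc_count"]) + counts)
    return rows


# Hypothetical response with a single behavior/subBehavior pair.
sample = {
    "aggregations": {
        "behaviors": {
            "buckets": [
                {
                    "key": "view",
                    "doc_count": 10,
                    "subs": {
                        "buckets": [
                            {
                                "key": "h5",
                                "doc_count": 10,
                                "a0_u0": {"doc_count": 1},
                                "a0_ux": {"doc_count": 2},
                                "ax_u0": {"doc_count": 3},
                                "ax_ux": {"doc_count": 4},
                            }
                        ]
                    },
                }
            ]
        }
    }
}

rows = flatten_buckets(sample)
```

Keeping the rows as tuples separates the data extraction from the table printing, so the same rows could be written to CSV or checked for consistency (the four filter counts should add up to the bucket total when both fields are always present).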
--- a/scripts/endor_purchase_analysis.py
+++ b/scripts/endor_purchase_analysis.py
@ -0,0 +1,437 @@
 #!/usr/bin/env python3
 """
 In-app purchase user behavior analysis: how registration date, prologue completion date, and purchase date relate to each other
 """
 import psycopg2
 from datetime import datetime, date
 from collections import defaultdict, Counter
 import os
 PG_PASS = os.environ['PG_ONLINE_PASSWORD']  # required; credentials must not be hard-coded in the repo
 conn = psycopg2.connect(
    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
    port=28591,
    user="ai_member",
    password=PG_PASS,
    database="vala_bi"
 )
 cur = conn.cursor()
 # U0 序章 chapter_id
 L1_U0 = [343, 344, 345, 346, 348]
 L2_U0 = [55, 56, 57, 58, 59]
 print("=" * 80)
 print("端内购买用户行为分析")
 print("=" * 80)
 # ========== Step 1: 端内付费用户 ==========
 print("\n[Step 1] 查询端内付费用户...")
 cur.execute("""
    SELECT DISTINCT ON (o.account_id)
        o.account_id,
        o.goods_id,
        o.pay_success_date as purchase_date,
        CASE 
            WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
            WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
            ELSE 'other'
        END as buy_level
    FROM bi_vala_order o
    JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
    WHERE o.key_from = 'app-active-h5-0-0'
      AND o.order_status IN (3, 4)
      AND o.pay_success_date IS NOT NULL
    ORDER BY o.account_id, o.pay_success_date
 """)
 endo_users = {}
 for row in cur.fetchall():
    account_id, goods_id, purchase_date, buy_level = row
    if buy_level == 'other':
        continue
    endo_users[account_id] = {
        'goods_id': goods_id,
        'purchase_date': purchase_date,
        'buy_level': buy_level
    }
 print(f"  端内付费用户数: {len(endo_users)}")
 l1_users = {k: v for k, v in endo_users.items() if v['buy_level'] == 'L1'}
 l2_users = {k: v for k, v in endo_users.items() if v['buy_level'] == 'L2'}
 print(f"  - L1 购买用户: {len(l1_users)}")
 print(f"  - L2 购买用户: {len(l2_users)}")
 # ========== Step 2: 注册时间 ==========
 print("\n[Step 2] 查询用户注册时间...")
 all_ids = list(endo_users.keys())
 batch_size = 500
 register_map = {}
 for i in range(0, len(all_ids), batch_size):
    batch = all_ids[i:i+batch_size]
    cur.execute(
        "SELECT id, created_at FROM bi_vala_app_account WHERE id IN %s AND status = 1",
        (tuple(batch),)
    )
    for row in cur.fetchall():
        register_map[row[0]] = row[1]
 print(f"  找到注册时间的用户: {len(register_map)}")
 # ========== Step 3: account_id -> user_id (character) 映射 ==========
 print("\n[Step 3] 查询用户角色映射...")
 char_map = {}  # account_id -> [user_id, ...]
 for i in range(0, len(all_ids), batch_size):
    batch = all_ids[i:i+batch_size]
    cur.execute(
        "SELECT account_id, id FROM bi_vala_app_character WHERE account_id IN %s",
        (tuple(batch),)
    )
    for row in cur.fetchall():
        acct = row[0]
        uid = row[1]
        if acct not in char_map:
            char_map[acct] = []
        char_map[acct].append(uid)
 print(f"  有角色的用户数: {len(char_map)}")
 # ========== Step 4: 序章完课时间 ==========
 print("\n[Step 4] 查询序章完课时间（8张分表）...")
 # 收集所有 user_id，按等级分
 l1_user_ids = set()
 l2_user_ids = set()
 for acct in l1_users:
    if acct in char_map:
        for uid in char_map[acct]:
            l1_user_ids.add(uid)
 for acct in l2_users:
    if acct in char_map:
        for uid in char_map[acct]:
            l2_user_ids.add(uid)
 print(f"  L1 相关 user_id 数: {len(l1_user_ids)}")
 print(f"  L2 相关 user_id 数: {len(l2_user_ids)}")
 def query_prologue_completion(cur, user_ids, chapter_ids, table_prefix="bi_user_chapter_play_record"):
    """查询序章完课最早时间，返回 {user_id: earliest_date}"""
    result = {}
    if not user_ids:
        return result
    # 分表查询
    union_parts = []
    for t in range(8):
        union_parts.append(f"""
            SELECT user_id, MIN(created_at) as first_done
            FROM {table_prefix}_{t}
            WHERE chapter_id IN ({','.join(map(str, chapter_ids))})
              AND play_status = 1
              AND user_id IN %(user_ids)s
            GROUP BY user_id
        """)
    sql = " UNION ALL ".join(union_parts)
    sql = f"""
        SELECT user_id, MIN(first_done) as earliest
        FROM ({sql}) sub
        GROUP BY user_id
    """
    batch_size_uid = 2000
    uid_list = list(user_ids)
    for i in range(0, len(uid_list), batch_size_uid):
        batch = tuple(uid_list[i:i+batch_size_uid])
        cur.execute(sql, {'user_ids': batch})
        for row in cur.fetchall():
            result[row[0]] = row[1]
    return result
 # L1 序章
 l1_prologue = query_prologue_completion(cur, l1_user_ids, L1_U0)
 print(f"  L1 序章有完课记录的 user_id: {len(l1_prologue)}")
 # L2 序章
 l2_prologue = query_prologue_completion(cur, l2_user_ids, L2_U0)
 print(f"  L2 序章有完课记录的 user_id: {len(l2_prologue)}")
 # ========== Step 5: 关联分析 ==========
 print("\n[Step 5] 关联分析...")
 # 对每个端内购买用户，找到他的角色中对应等级序章最早完课时间
 def get_earliest_prologue(account_id, buy_level):
    """获取用户对应等级序章的最早完课时间"""
    if account_id not in char_map:
        return None
    user_ids = char_map[account_id]
    earliest = None
    prologue_map = l1_prologue if buy_level == 'L1' else l2_prologue
    for uid in user_ids:
        if uid in prologue_map:
            d = prologue_map[uid]
            if earliest is None or d < earliest:
                earliest = d
    return earliest
 # 构建分析数据
 records = []
 for account_id, info in endo_users.items():
    register_date = register_map.get(account_id)
    purchase_date = info['purchase_date']
    buy_level = info['buy_level']
    if register_date is None:
        continue
    prologue_date = get_earliest_prologue(account_id, buy_level)
    # 计算天数差
    if isinstance(register_date, datetime):
        reg_d = register_date.date()
    else:
        reg_d = register_date
    # datetime is a subclass of date, so only datetime needs converting
    pur_d = purchase_date.date() if isinstance(purchase_date, datetime) else purchase_date
    if prologue_date:
        if isinstance(prologue_date, datetime):
            pro_d = prologue_date.date()
        else:
            pro_d = prologue_date
        reg_to_pro = (pro_d - reg_d).days
        pro_to_pur = (pur_d - pro_d).days
        reg_to_pur = (pur_d - reg_d).days
        has_prologue = True
    else:
        reg_to_pro = None
        pro_to_pur = None
        reg_to_pur = (pur_d - reg_d).days
        has_prologue = False
    records.append({
        'account_id': account_id,
        'level': buy_level,
        'register_date': reg_d,
        'prologue_date': pro_d if has_prologue else None,
        'purchase_date': pur_d,
        'reg_to_pro': reg_to_pro,
        'pro_to_pur': pro_to_pur,
        'reg_to_pur': reg_to_pur,
        'has_prologue': has_prologue
    })
 # ========== Step 6: 统计输出 ==========
 print(f"\n总分析用户数: {len(records)}")
 # --- 总体统计 ---
 with_prologue = [r for r in records if r['has_prologue']]
 without_prologue = [r for r in records if not r['has_prologue']]
 print(f"\n{'='*80}")
 print("总体概览")
 print(f"{'='*80}")
 print(f"端内付费用户: {len(records)}")
 print(f"  有L1序章完课记录: {len([r for r in with_prologue if r['level']=='L1'])}")
 print(f"  有L2序章完课记录: {len([r for r in with_prologue if r['level']=='L2'])}")
 print(f"  无对应序章完课记录: {len(without_prologue)}")
 # --- 天数分布 ---
 def percentile(sorted_vals, p):
    if not sorted_vals:
        return None
    idx = int(len(sorted_vals) * p / 100)
    return sorted_vals[min(idx, len(sorted_vals)-1)]
 def print_stats(name, vals, unit='天'):
    if not vals:
        print(f"  {name}: 无数据")
        return
    s = sorted(vals)
    print(f"  {name}:")
    print(f"    样本数: {len(s)}")
    print(f"    中位数: {percentile(s, 50):.1f}{unit}")
    print(f"    平均值: {sum(s)/len(s):.1f}{unit}")
    print(f"    P25: {percentile(s, 25):.1f}{unit}")
    print(f"    P75: {percentile(s, 75):.1f}{unit}")
    print(f"    P90: {percentile(s, 90):.1f}{unit}")
 print(f"\n{'='*80}")
 print("全量统计：注册→购课 天数")
 print(f"{'='*80}")
 print_stats("全量 注册→购课", [r['reg_to_pur'] for r in records])
 print(f"\n{'='*80}")
 print("有序章完课用户统计")
 print(f"{'='*80}")
 print_stats("序章完课 注册→完课 天数", [r['reg_to_pro'] for r in with_prologue])
 print_stats("序章完课 完课→购课 天数", [r['pro_to_pur'] for r in with_prologue])
 print_stats("序章完课 注册→购课 天数", [r['reg_to_pur'] for r in with_prologue])
 # --- 按等级拆分 ---
 for level in ['L1', 'L2']:
    lvl = [r for r in records if r['level'] == level]
    lvl_pro = [r for r in lvl if r['has_prologue']]
    lvl_no = [r for r in lvl if not r['has_prologue']]
    print(f"\n{'='*80}")
    print(f"{level} 购买用户统计")
    print(f"{'='*80}")
    print(f"  {level} 总用户: {len(lvl)}")
    print(f"  有{level}序章完课: {len(lvl_pro)}")
    print(f"  无{level}序章完课: {len(lvl_no)}")
    print()
    print_stats(f"  {level} 全量 注册→购课", [r['reg_to_pur'] for r in lvl])
    if lvl_pro:
        print_stats(f"  {level} 序章完课 注册→完课", [r['reg_to_pro'] for r in lvl_pro])
        print_stats(f"  {level} 序章完课 完课→购课", [r['pro_to_pur'] for r in lvl_pro])
        print_stats(f"  {level} 序章完课 注册→购课", [r['reg_to_pur'] for r in lvl_pro])
 # --- 完课→购课的时间段分布 ---
 print(f"\n{'='*80}")
 print("有完课用户：完课→购课 时间段分布")
 print(f"{'='*80}")
 buckets = {
    '0天（当天购课）': (0, 0),
    '1-3天': (1, 3),
    '4-7天': (4, 7),
    '8-14天': (8, 14),
    '15-30天': (15, 30),
    '31-60天': (31, 60),
    '61-90天': (61, 90),
    '90天以上': (91, 99999),
    '负数（购课在完课前）': (-99999, -1),
 }
 for level in ['L1', 'L2', 'ALL']:
    if level == 'ALL':
        data = with_prologue
    else:
        data = [r for r in with_prologue if r['level'] == level]
    if not data:
        continue
    vals = [r['pro_to_pur'] for r in data]
    print(f"\n[{level}] 完课→购课 天数分布:")
    for label, (lo, hi) in buckets.items():
        cnt = sum(1 for v in vals if lo <= v <= hi)
        pct = cnt / len(vals) * 100
        bar = '█' * int(pct / 2)
        print(f"  {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
 # --- 注册→完课时间段分布 ---
 print(f"\n{'='*80}")
 print("有完课用户：注册→完课 时间段分布")
 print(f"{'='*80}")
 for level in ['L1', 'L2', 'ALL']:
    if level == 'ALL':
        data = with_prologue
    else:
        data = [r for r in with_prologue if r['level'] == level]
    if not data:
        continue
    vals = [r['reg_to_pro'] for r in data]
    print(f"\n[{level}] 注册→完课 天数分布:")
    for label, (lo, hi) in buckets.items():
        if hi < 0:
            continue
        cnt = sum(1 for v in vals if lo <= v <= hi)
        pct = cnt / len(vals) * 100
        bar = '█' * int(pct / 2)
        print(f"  {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
 # --- 注册→购课 时间段分布 ---
 print(f"\n{'='*80}")
 print("注册→购课 时间段分布")
 print(f"{'='*80}")
 for level in ['L1', 'L2', 'ALL']:
    if level == 'ALL':
        data = records
    else:
        data = [r for r in records if r['level'] == level]
    if not data:
        continue
    vals = [r['reg_to_pur'] for r in data]
    print(f"\n[{level}] 注册→购课 天数分布:")
    extended_buckets = {
        '0天（当天购课）': (0, 0),
        '1天': (1, 1),
        '2天': (2, 2),
        '3天': (3, 3),
        '4-7天': (4, 7),
        '8-14天': (8, 14),
        '15-30天': (15, 30),
        '31-60天': (31, 60),
        '61-90天': (61, 90),
        '90-180天': (91, 180),
        '180天以上': (181, 99999),
    }
    for label, (lo, hi) in extended_buckets.items():
        cnt = sum(1 for v in vals if lo <= v <= hi)
        pct = cnt / len(vals) * 100
        bar = '█' * int(pct / 2)
        print(f"  {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
 # --- 完课在购课之前 vs 之后 ---
 print(f"\n{'='*80}")
 print("完课 vs 购课 时间关系")
 print(f"{'='*80}")
 for level in ['L1', 'L2', 'ALL']:
    if level == 'ALL':
        data = with_prologue
    else:
        data = [r for r in with_prologue if r['level'] == level]
    if not data:
        continue
    pro_before_pur = [r for r in data if r['pro_to_pur'] >= 0]
    pro_after_pur = [r for r in data if r['pro_to_pur'] < 0]
    pro_same_day = [r for r in data if r['pro_to_pur'] == 0]
    print(f"\n[{level}]:")
    print(f"  完课在购课之前或同一天: {len(pro_before_pur)} ({len(pro_before_pur)/len(data)*100:.1f}%)")
    print(f"    其中同一天: {len(pro_same_day)} ({len(pro_same_day)/len(data)*100:.1f}%)")
    print(f"  完课在购课之后: {len(pro_after_pur)} ({len(pro_after_pur)/len(data)*100:.1f}%)")
 # ========== 导出详细CSV ==========
 print(f"\n{'='*80}")
 print("导出详细数据...")
 print(f"{'='*80}")
 import csv
 output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis.csv'
 with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['account_id', '等级', '注册日期', '序章完课日期', '购课日期', 
                     '注册→完课(天)', '完课→购课(天)', '注册→购课(天)', '有序章完课'])
    for r in sorted(records, key=lambda x: x['purchase_date']):
        writer.writerow([
            r['account_id'], r['level'], r['register_date'], r['prologue_date'], r['purchase_date'],
            r['reg_to_pro'] if r['reg_to_pro'] is not None else '',
            r['pro_to_pur'] if r['pro_to_pur'] is not None else '',
            r['reg_to_pur'], '是' if r['has_prologue'] else '否'
        ])
 print(f"  已导出到: {output_path}")
 cur.close()
 conn.close()
 print("\n分析完成!")
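The distribution sections of this script repeat the same counting loop three times. Below is a sketch of one shared helper, assuming inclusive `(lo, hi)` bounds as in the script's `buckets` dict. `bucket_counts` and the demo data are hypothetical names used only for illustration.

```python
def bucket_counts(values, buckets):
    """Count values falling in each inclusive (lo, hi) range.

    Returns a list of (label, count, percent) in the dict's insertion order.
    An empty input yields 0.0 percent instead of dividing by zero.
    """
    total = len(values)
    result = []
    for label, (lo, hi) in buckets.items():
        cnt = sum(1 for v in values if lo <= v <= hi)
        pct = cnt / total * 100 if total else 0.0
        result.append((label, cnt, pct))
    return result


# Hypothetical day differences and a shortened bucket table.
demo_buckets = {
    "negative": (-99999, -1),
    "0 days": (0, 0),
    "1-3 days": (1, 3),
    "4-7 days": (4, 7),
}
dist = bucket_counts([0, 0, 2, 5, -1], demo_buckets)
```

Each caller could then format `dist` however it likes, and sections that do not want the negative bucket could pass a filtered dict rather than skipping entries inside the loop.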
--- a/scripts/endor_purchase_analysis_v2.py
+++ b/scripts/endor_purchase_analysis_v2.py
@ -0,0 +1,500 @@
 #!/usr/bin/env python3
 """
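 In-app purchase user behavior analysis, V2
 Metric definition changes:
  1. Order granularity: a user who bought both L1 and L2 yields two records
  2. Prologue completion = all five U0 lessons (L01~L05) completed; the date of the last lesson completed is used
  3. New: how many prologue lessons were completed at purchase time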
 """
 import psycopg2
 from datetime import datetime, date
 from collections import defaultdict, Counter
 import os
 import csv
 PG_PASS = os.environ['PG_ONLINE_PASSWORD']  # required; credentials must not be hard-coded in the repo
 conn = psycopg2.connect(
    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
    port=28591,
    user="ai_member",
    password=PG_PASS,
    database="vala_bi"
 )
 cur = conn.cursor()
 # U0 序章 chapter_id -> lesson_name
 L1_U0_LESSONS = {343: 'L01', 344: 'L02', 345: 'L03', 346: 'L04', 348: 'L05'}
 L2_U0_LESSONS = {55: 'L01', 56: 'L02', 57: 'L03', 58: 'L04', 59: 'L05'}
 LEVEL_MAP = {
    57: 'L1', 60: 'L1', 63: 'L1',
    31: 'L2', 32: 'L2', 33: 'L2', 54: 'L2',
    61: 'L1+L2',
 }
 print("=" * 80)
 print("端内购买用户行为分析 V2")
 print("口径：序章完课 = U0全部五节完成 | 订单维度 | 购课时进度分析")
 print("=" * 80)
 # ========== Step 1: 端内付费订单（按订单维度） ==========
 print("\n[Step 1] 查询端内付费订单...")
 cur.execute("""
    SELECT 
        o.id as order_id,
        o.account_id,
        o.goods_id,
        o.pay_success_date,
        o.trade_no
    FROM bi_vala_order o
    JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
    WHERE o.key_from = 'app-active-h5-0-0'
      AND o.order_status IN (3, 4)
      AND o.pay_success_date IS NOT NULL
    ORDER BY o.pay_success_date
 """)
 orders = []  # list of {order_id, account_id, goods_id, purchase_date, level}
 for row in cur.fetchall():
    order_id, account_id, goods_id, purchase_date, trade_no = row
    level = LEVEL_MAP.get(goods_id)
    if level is None or level == 'L1+L2':
        continue
    if isinstance(purchase_date, datetime):
        purchase_date = purchase_date.date()
    orders.append({
        'order_id': order_id,
        'account_id': account_id,
        'goods_id': goods_id,
        'purchase_date': purchase_date,
        'level': level
    })
 print(f"  端内付费订单数: {len(orders)}")
 print(f"  - L1 订单: {sum(1 for o in orders if o['level']=='L1')}")
 print(f"  - L2 订单: {sum(1 for o in orders if o['level']=='L2')}")
 # 统计有多少用户同时买了L1+L2
 user_levels = defaultdict(set)
 for o in orders:
    user_levels[o['account_id']].add(o['level'])
 dual_users = [uid for uid, lvls in user_levels.items() if len(lvls) > 1]
 print(f"  同时购买L1+L2的用户: {len(dual_users)}")
 # ========== Step 2: 注册时间 ==========
 print("\n[Step 2] 查询用户注册时间...")
 all_accounts = list(set(o['account_id'] for o in orders))
 register_map = {}
 batch_size = 500
 for i in range(0, len(all_accounts), batch_size):
    batch = all_accounts[i:i+batch_size]
    cur.execute(
        "SELECT id, created_at FROM bi_vala_app_account WHERE id IN %s",
        (tuple(batch),)
    )
    for row in cur.fetchall():
        d = row[1]
        # Only datetime has .date(); a plain date would raise AttributeError
        register_map[row[0]] = d.date() if isinstance(d, datetime) else d
 print(f"  找到注册时间: {len(register_map)}")
 # ========== Step 3: account_id -> user_id 映射 ==========
 print("\n[Step 3] 查询用户角色映射...")
 char_map = defaultdict(list)
 for i in range(0, len(all_accounts), batch_size):
    batch = all_accounts[i:i+batch_size]
    cur.execute(
        "SELECT account_id, id FROM bi_vala_app_character WHERE account_id IN %s",
        (tuple(batch),)
    )
    for row in cur.fetchall():
        char_map[row[0]].append(row[1])
 print(f"  有角色的用户: {sum(1 for uid, chars in char_map.items() if chars)}")
 # ========== Step 4: 序章每节课的最早完成时间 ==========
 print("\n[Step 4] 查询序章每节课完成时间（8张分表）...")
 def query_lesson_completion(cur, user_ids, chapter_ids_dict):
    """
    Query the earliest completion time of each chapter_id for each user_id.
    Returns: {user_id: {chapter_id: earliest_date}}
    """
    result = defaultdict(dict)
    if not user_ids:
        return result
    # 先按user_ids批量查询
    all_chapter_ids = list(chapter_ids_dict.keys())
    union_parts = []
    for t in range(8):
        union_parts.append(f"""
            SELECT user_id, chapter_id, MIN(created_at) as first_done
            FROM bi_user_chapter_play_record_{t}
            WHERE chapter_id IN ({','.join(map(str, all_chapter_ids))})
              AND play_status = 1
              AND user_id IN %(user_ids)s
            GROUP BY user_id, chapter_id
        """)
    sql = " UNION ALL ".join(union_parts)
    sql = f"""
        SELECT user_id, chapter_id, MIN(first_done) as earliest
        FROM ({sql}) sub
        GROUP BY user_id, chapter_id
    """
    uid_list = list(user_ids)
    for i in range(0, len(uid_list), 2000):
        batch = tuple(uid_list[i:i+2000])
        cur.execute(sql, {'user_ids': batch})
        for row in cur.fetchall():
            uid, cid, dt = row
            # Only datetime has .date(); a plain date would raise AttributeError
            result[uid][cid] = dt.date() if isinstance(dt, datetime) else dt
    return result
 # 收集所有user_id
 all_user_ids = set()
 for acct in all_accounts:
    for uid in char_map.get(acct, []):
        all_user_ids.add(uid)
 print(f"  总 user_id 数: {len(all_user_ids)}")
 # L1 序章
 l1_lesson_data = query_lesson_completion(cur, all_user_ids, L1_U0_LESSONS)
 # L2 序章
 l2_lesson_data = query_lesson_completion(cur, all_user_ids, L2_U0_LESSONS)
 print(f"  L1 序章有完成记录的 user_id: {len(l1_lesson_data)}")
 print(f"  L2 序章有完成记录的 user_id: {len(l2_lesson_data)}")
 # ========== Step 5: 对每个订单分析序章进度 ==========
 print("\n[Step 5] 逐订单分析序章进度...")
 def analyze_user_prologue(user_ids, lesson_data, lesson_map):
    """
    Merge prologue lesson completions across all characters of one account.
    Returns: {chapter_id: earliest_completion_date} (completed lessons only)
    """
    merged = {}
    for uid in user_ids:
        if uid not in lesson_data:
            continue
        for cid, dt in lesson_data[uid].items():
            if cid not in merged or dt < merged[cid]:
                merged[cid] = dt
    return merged
 def get_prologue_status(lesson_completions, lesson_map, purchase_date):
    """
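    Analyze prologue completion status:
    - total_done: number of lessons completed in total (regardless of time)
    - done_before_purchase: lessons completed on or before the purchase date
    - done_after_purchase: lessons completed after the purchase date
    - all_five_done: whether all five lessons are completed
    - all_five_done_date: date all five were completed (date of the last lesson)
    - progress_at_purchase: progress at purchase time (number of lessons completed)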
    """
    total_done = len(lesson_completions)
    done_before = {}
    done_after = {}
    for cid, dt in lesson_completions.items():
        if dt <= purchase_date:
            done_before[cid] = dt
        else:
            done_after[cid] = dt
    # 是否五节全完成
    all_five_done = total_done == 5
    all_five_done_date = max(lesson_completions.values()) if all_five_done else None
    # 购课时完成了几节
    progress_at_purchase = len(done_before)
    # 购课时哪些节已完成
    lessons_done_at_purchase = sorted(
        [lesson_map[cid] for cid in done_before],
        key=lambda x: int(x[1:])  # L01, L02, ...
    )
    return {
        'total_done': total_done,
        'done_before_count': len(done_before),
        'done_after_count': len(done_after),
        'all_five_done': all_five_done,
        'all_five_done_date': all_five_done_date,
        'progress_at_purchase': progress_at_purchase,
        'lessons_done_at_purchase': lessons_done_at_purchase,
        'done_before': done_before,
        'done_after': done_after,
    }
 records = []
 for o in orders:
    account_id = o['account_id']
    level = o['level']
    purchase_date = o['purchase_date']
    register_date = register_map.get(account_id)
    if register_date is None:
        continue
    user_ids = char_map.get(account_id, [])
    lesson_map = L1_U0_LESSONS if level == 'L1' else L2_U0_LESSONS
    lesson_data = l1_lesson_data if level == 'L1' else l2_lesson_data
    user_completions = analyze_user_prologue(user_ids, lesson_data, lesson_map)
    status = get_prologue_status(user_completions, lesson_map, purchase_date)
    # 注册→购课天数
    reg_to_pur = (purchase_date - register_date).days
    # 注册→序章全完成天数
    reg_to_all5 = None
    if status['all_five_done_date']:
        reg_to_all5 = (status['all_five_done_date'] - register_date).days
    # 序章全完成→购课天数
    all5_to_pur = None
    if status['all_five_done_date']:
        all5_to_pur = (purchase_date - status['all_five_done_date']).days
    records.append({
        'order_id': o['order_id'],
        'account_id': account_id,
        'level': level,
        'register_date': register_date,
        'purchase_date': purchase_date,
        'reg_to_pur': reg_to_pur,
        # 序章进度
        'total_lessons_done': status['total_done'],
        'lessons_done_before_purchase': status['done_before_count'],
        'lessons_done_after_purchase': status['done_after_count'],
        'progress_at_purchase': status['progress_at_purchase'],
        'lessons_at_purchase': ','.join(status['lessons_done_at_purchase']),
        # 五节全完成
        'all_five_done': status['all_five_done'],
        'all_five_done_date': status['all_five_done_date'],
        'reg_to_all5': reg_to_all5,
        'all5_to_pur': all5_to_pur,
    })
 print(f"  分析记录数: {len(records)}")
 # ========== Step 6: 统计分析 ==========
 print(f"\n{'='*80}")
 print("【总体概览】")
 print(f"{'='*80}")
 total_records = len(records)
 all5_complete = [r for r in records if r['all_five_done']]
 all5_incomplete = [r for r in records if not r['all_five_done']]
 l1_records = [r for r in records if r['level'] == 'L1']
 l2_records = [r for r in records if r['level'] == 'L2']
 print(f"总记录数（订单维度）: {total_records}")
 print(f"  - L1 订单: {len(l1_records)}")
 print(f"  - L2 订单: {len(l2_records)}")
 print(f"  序章五节全完成: {len(all5_complete)}")
 print(f"  序章五节未全完成: {len(all5_incomplete)}")
 # ========== 购课时序章进度分布 ==========
 print(f"\n{'='*80}")
 print("【核心指标：购课时序章完成进度】")
 print(f"{'='*80}")
 for level in ['L1', 'L2', 'ALL']:
    if level == 'ALL':
        data = records
    else:
        data = [r for r in records if r['level'] == level]
    if not data:
        continue
    print(f"\n[{level}] 购课时完成了序章第几节:")
    progress_dist = Counter(r['progress_at_purchase'] for r in data)
    for p in range(6):  # 0到5
        cnt = progress_dist.get(p, 0)
        pct = cnt / len(data) * 100
        label = f"  完成{p}节(共5节)" if p > 0 else "  完成0节（未开始）"
        bar = '█' * int(pct)
        print(f"  {label:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
    # 具体到哪几节
    print(f"\n[{level}] 购课时已完成哪些课时的分布:")
    lesson_dist = Counter()
    for r in data:
        if r['progress_at_purchase'] == 0:
            lesson_dist['(未完成任何节)'] += 1
        else:
            lesson_dist[r['lessons_at_purchase']] += 1
    for lesson_combo, cnt in lesson_dist.most_common(15):
        pct = cnt / len(data) * 100
        label = f"  {lesson_combo}"
        print(f"  {label:24s}: {cnt:4d} ({pct:5.1f}%)")
 # ========== 购课后完成的情况 ==========
 print(f"\n{'='*80}")
 print("【购课后继续完成序章的情况】")
 print(f"{'='*80}")
 for level in ['L1', 'L2', 'ALL']:
    if level == 'ALL':
        data = records
    else:
        data = [r for r in records if r['level'] == level]
    if not data:
        continue
    print(f"\n[{level}]:")
    print(f"  购课时序章已全部完成(5/5): {sum(1 for r in data if r['progress_at_purchase']==5)} ({sum(1 for r in data if r['progress_at_purchase']==5)/len(data)*100:.1f}%)")
    print(f"  购课后续完成序章余下课时: {sum(1 for r in data if r['lessons_done_after_purchase']>0)} ({sum(1 for r in data if r['lessons_done_after_purchase']>0)/len(data)*100:.1f}%)")
    # 购课后完成了多少节
    after_dist = Counter(r['lessons_done_after_purchase'] for r in data if r['progress_at_purchase'] < 5)
    if after_dist:
        print(f"  购课时未完成的用户，购课后又完成节数分布:")
        for n in sorted(after_dist.keys()):
            cnt = after_dist[n]
            pct = cnt / sum(1 for r in data if r['progress_at_purchase'] < 5) * 100
            print(f"    又完成{n}节: {cnt} ({pct:.1f}%)")
 # ========== 注册→购课天数 ==========
 def percentile_vals(vals):
    if not vals:
        return {}
    s = sorted(vals)
    return {
        'P10': s[int(len(s)*0.10)],
        'P25': s[int(len(s)*0.25)],
        'P50': s[int(len(s)*0.50)],
        'P75': s[int(len(s)*0.75)],
        'P90': s[int(len(s)*0.90)],
        'mean': sum(s)/len(s),
        'count': len(s),
    }
 print(f"\n{'='*80}")
 print("【注册→购课 天数统计】")
 print(f"{'='*80}")
 for level in ['L1', 'L2', 'ALL']:
    if level == 'ALL':
        data = records
    else:
        data = [r for r in records if r['level'] == level]
    stat = percentile_vals([r['reg_to_pur'] for r in data])
    if stat:
        print(f"\n[{level}] 注册→购课:")
        print(f"  样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
        print(f"  P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
 # ========== 序章五节全完成的用户统计 ==========
 print(f"\n{'='*80}")
 print("【序章五节全完成的用户（限定已完成5/5的用户）】")
 print(f"{'='*80}")
 all5_l1 = [r for r in all5_complete if r['level']=='L1']
 all5_l2 = [r for r in all5_complete if r['level']=='L2']
 print(f"\n五节全完成用户: {len(all5_complete)}")
 print(f"  L1: {len(all5_l1)}")
 print(f"  L2: {len(all5_l2)}")
 # 注册→全完成
 for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
    vals = [r['reg_to_all5'] for r in data if r['reg_to_all5'] is not None]
    stat = percentile_vals(vals)
    if stat:
        print(f"\n[{name}] 注册→序章五节全完成:")
        print(f"  样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
        print(f"  P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
 # 全完成→购课
 for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
    vals = [r['all5_to_pur'] for r in data if r['all5_to_pur'] is not None]
    stat = percentile_vals(vals)
    if stat:
        print(f"\n[{name}] 序章五节全完成→购课:")
        print(f"  样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
        print(f"  P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
 # 全完成→购课时间段分布
 print(f"\n【五节全完成→购课 时间段分布】")
 print(f"{'='*60}")
 buckets = {
    '负数（先购课后完课）': (-99999, -1),
    '0天(当天)': (0, 0),
    '1~3天': (1, 3),
    '4~7天': (4, 7),
    '8~14天': (8, 14),
    '15~30天': (15, 30),
    '31~60天': (31, 60),
    '61~90天': (61, 90),
    '90天以上': (91, 99999),
 }
 for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
    vals = [r['all5_to_pur'] for r in data if r['all5_to_pur'] is not None]
    if not vals:
        continue
    print(f"\n[{name}]:")
    for label, (lo, hi) in buckets.items():
        cnt = sum(1 for v in vals if lo <= v <= hi)
        pct = cnt / len(vals) * 100
        bar = '█' * int(pct / 2)
        print(f"  {label:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
 # ========== 时序关系 ==========
 print(f"\n{'='*80}")
 print("【五节全完成的用户：序章 vs 购课 时序关系】")
 print(f"{'='*80}")
 for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
    if not data:
        continue  # avoid ZeroDivisionError when no order has all five lessons completed
    pro_before = sum(1 for r in data if r['all5_to_pur'] is not None and r['all5_to_pur'] >= 0)
    pro_after = sum(1 for r in data if r['all5_to_pur'] is not None and r['all5_to_pur'] < 0)
    pro_same = sum(1 for r in data if r['all5_to_pur'] == 0)
    print(f"\n[{name}]:")
    print(f"  序章全部完成在购课之前/当天: {pro_before} ({pro_before/len(data)*100:.1f}%)")
    print(f"    其中当天: {pro_same} ({pro_same/len(data)*100:.1f}%)")
    print(f"  序章全部完成在购课之后: {pro_after} ({pro_after/len(data)*100:.1f}%)")
 # ========== 导出CSV ==========
 output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis_v2.csv'
 with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow([
        'order_id', 'account_id', '等级', '注册日期', '购课日期',
        '注册→购课(天)',
        '序章完成节数(总计)', '购课前进度(节)', '购课后完成(节)',
        '购课时已完成', '五节全完成', '五节全完成日期',
        '注册→全完成(天)', '全完成→购课(天)'
    ])
    for r in sorted(records, key=lambda x: x['purchase_date']):
        writer.writerow([
            r['order_id'], r['account_id'], r['level'],
            r['register_date'].isoformat() if r['register_date'] else '',
            r['purchase_date'].isoformat() if r['purchase_date'] else '',
            r['reg_to_pur'],
            r['total_lessons_done'], r['progress_at_purchase'],
            r['lessons_done_after_purchase'],
            r['lessons_at_purchase'],
            '是' if r['all_five_done'] else '否',
            r['all_five_done_date'].isoformat() if r['all_five_done_date'] else '',
            r['reg_to_all5'] if r['reg_to_all5'] is not None else '',
            r['all5_to_pur'] if r['all5_to_pur'] is not None else '',
        ])
 print(f"\n{'='*80}")
 print(f"详细数据已导出: {output_path}")
 print(f"{'='*80}")
 cur.close()
 conn.close()
 print("\n分析完成!")
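This script sends the full `user_id` list to all eight `bi_user_chapter_play_record_{n}` tables. If rows are routed by `user_id % 8`, as the session notes earlier in this commit describe, each table only needs the IDs that map to it. Below is a sketch under that assumption. `group_by_shard` is a hypothetical helper and the IDs are made up.

```python
from collections import defaultdict


def group_by_shard(user_ids, shard_count=8):
    """Group user IDs by the shard table they would live in (user_id % shard_count)."""
    shards = defaultdict(list)
    for uid in user_ids:
        shards[uid % shard_count].append(uid)
    return dict(shards)


# Hypothetical IDs: 1 and 9 map to shard 1, 8 and 16 to shard 0, 15 to shard 7.
shards = group_by_shard([1, 8, 9, 15, 16])

# Each shard's IDs could then be passed to a query against only that table.
table_names = {n: f"bi_user_chapter_play_record_{n}" for n in shards}
```

Querying per shard trades one large `UNION ALL` for up to eight smaller statements with shorter `IN` lists. Whether that is faster depends on the indexes and table sizes, which the source does not state.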
--- a/scripts/endor_purchase_analysis_v3.py
+++ b/scripts/endor_purchase_analysis_v3.py
@ -0,0 +1,370 @@
 #!/usr/bin/env python3
 """
 In-app purchase user behavior analysis, V3: detailed cross-tabulated statistics
 """
 import psycopg2
 from datetime import datetime, date
 from collections import defaultdict, Counter
 import csv
 import os
 PG_PASS = os.environ['PG_ONLINE_PASSWORD']  # required; credentials must not be hard-coded in the repo
 conn = psycopg2.connect(
    host="bj-postgres-16pob4sg.sql.tencentcdb.com", port=28591,
    user="ai_member", password=PG_PASS, database="vala_bi"
 )
 cur = conn.cursor()
 L1_U0_LESSONS = {343: 'L01', 344: 'L02', 345: 'L03', 346: 'L04', 348: 'L05'}
 L2_U0_LESSONS = {55: 'L01', 56: 'L02', 57: 'L03', 58: 'L04', 59: 'L05'}
 LEVEL_MAP = {57:'L1',60:'L1',63:'L1', 31:'L2',32:'L2',33:'L2',54:'L2'}
 print("=" * 80)
 print("端内购买用户行为分析 V3 — 详细交叉统计")
 print("=" * 80)
 # ============ 查询数据 ============
 print("\n[数据加载]")
 cur.execute("""
    SELECT o.id, o.account_id, o.goods_id, o.pay_success_date
    FROM bi_vala_order o
    JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
    WHERE o.key_from = 'app-active-h5-0-0' AND o.order_status IN (3, 4) AND o.pay_success_date IS NOT NULL
    ORDER BY o.pay_success_date
 """)
 orders = []
 for row in cur.fetchall():
    lv = LEVEL_MAP.get(row[2])
    if not lv:
        continue
    pd_ = row[3]
    orders.append({'order_id': row[0], 'account_id': row[1], 'goods_id': row[2],
                   'purchase_date': pd_.date() if isinstance(pd_, datetime) else pd_, 'level': lv})
 print(f"  订单: {len(orders)}")
 all_accounts = list(set(o['account_id'] for o in orders))
 register_map = {}
 for i in range(0,len(all_accounts),500):
    cur.execute("SELECT id,created_at FROM bi_vala_app_account WHERE id IN %s",(tuple(all_accounts[i:i+500]),))
    for r in cur.fetchall():
        d = r[1]
        # Only datetime has .date(); a plain date would raise AttributeError
        register_map[r[0]] = d.date() if isinstance(d, datetime) else d
 print(f"  注册时间: {len(register_map)}")
 char_map = defaultdict(list)
 for i in range(0,len(all_accounts),500):
    cur.execute("SELECT account_id,id FROM bi_vala_app_character WHERE account_id IN %s",(tuple(all_accounts[i:i+500]),))
    for r in cur.fetchall(): char_map[r[0]].append(r[1])
 print(f"  角色映射: {sum(1 for k,v in char_map.items() if v)} 用户")
 all_user_ids = set()
 for acct in all_accounts:
    for uid in char_map.get(acct,[]): all_user_ids.add(uid)
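 def query_lesson_completion(user_ids, chapter_ids_dict):
    """Return {user_id: {chapter_id: first completion date}} across the 8 sharded tables."""
    result = defaultdict(dict)
    # An empty chapter list would render as the invalid SQL "IN ()", so bail out early
    if not user_ids or not chapter_ids_dict:
        return result
    cids = list(chapter_ids_dict.keys())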
    union = []
    for t in range(8):
        union.append(f"""
            SELECT user_id, chapter_id, MIN(created_at) as fd
            FROM bi_user_chapter_play_record_{t}
            WHERE chapter_id IN ({','.join(map(str,cids))})
              AND play_status=1 AND user_id IN %(uids)s
            GROUP BY user_id, chapter_id
        """)
    sql = f"SELECT user_id, chapter_id, MIN(fd) FROM ({' UNION ALL '.join(union)}) sub GROUP BY user_id, chapter_id"
    for i in range(0, len(user_ids), 2000):
        cur.execute(sql, {'uids': tuple(user_ids[i:i+2000])})
        for r in cur.fetchall():
            fd = r[2]
            # Test for datetime only: a plain date has no .date() method
            result[r[0]][r[1]] = fd.date() if isinstance(fd, datetime) else fd
    return result
 l1_data = query_lesson_completion(list(all_user_ids), L1_U0_LESSONS)
 l2_data = query_lesson_completion(list(all_user_ids), L2_U0_LESSONS)
 print(f"  L1 prologue completed: {len(l1_data)} user_ids, L2 prologue completed: {len(l2_data)} user_ids")
 # ============ Per-order analysis ============
 records = []
 for o in orders:
    lesson_map = L1_U0_LESSONS if o['level']=='L1' else L2_U0_LESSONS
    lesson_data = l1_data if o['level']=='L1' else l2_data
    reg_d = register_map.get(o['account_id'])
    if not reg_d: continue
    pur_d = o['purchase_date']
    # Merge prologue completions across all characters of the account (earliest date wins)
    merged = {}
    for uid in char_map.get(o['account_id'],[]):
        if uid not in lesson_data: continue
        for cid, dt in lesson_data[uid].items():
            if cid not in merged or dt < merged[cid]: merged[cid]=dt
    # Completion date of each lesson and days elapsed since registration
    lesson_dates = {}
    for cid, dt in merged.items():
        lesson_dates[lesson_map[cid]] = {'date': dt, 'days_from_reg': (dt - reg_d).days}
    # Lessons completed on or before the purchase date vs. after it
    done_before = {ln: info for ln, info in lesson_dates.items() if info['date'] <= pur_d}
    done_after = {ln: info for ln, info in lesson_dates.items() if info['date'] > pur_d}
    all5_done = len(lesson_dates) == 5
    all5_date = max(info['date'] for info in lesson_dates.values()) if all5_done else None
    all5_to_pur = (pur_d - all5_date).days if all5_date else None
    reg_to_all5 = (all5_date - reg_d).days if all5_date else None
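    # Progress at purchase: how many lessons were done, and which ones
    progress = len(done_before)
    lessons_done = sorted(done_before.keys(), key=lambda x: int(x[1:]))
    # Which pair of lessons the purchase falls between,
    # e.g. with L01 and L02 done, the purchase came after L02: "L02之后(L03之前)"
    if progress == 0:
        purchase_position = "L01之前"
    elif lessons_done[-1] == 'L05':
        # L05 is the last prologue lesson, so there is no "L06之前" bucket,
        # even when an earlier lesson was skipped
        purchase_position = "L05之后"
    else:
        purchase_position = f"{lessons_done[-1]}之后(L{int(lessons_done[-1][1:])+1:02d}之前)"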
    records.append({
        'order_id': o['order_id'], 'account_id': o['account_id'],
        'level': o['level'], 'register_date': reg_d, 'purchase_date': pur_d,
        'reg_to_pur': (pur_d - reg_d).days,
        'progress': progress, 'lessons_done': lessons_done,
        'done_before': done_before, 'done_after': done_after,
        'all5_done': all5_done, 'all5_date': all5_date,
        'reg_to_all5': reg_to_all5, 'all5_to_pur': all5_to_pur,
        'purchase_position': purchase_position,
        'lesson_dates': lesson_dates,
    })
 print(f"  Analysis records: {len(records)}")
 l1_recs = [r for r in records if r['level']=='L1']
 l2_recs = [r for r in records if r['level']=='L2']
 all5_recs = [r for r in records if r['all5_done']]
 # ====================================================================
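 # Cross-tab 1: lessons completed at purchase × days from registration to purchase
 # ====================================================================
 print(f"\n{'='*80}")
 print("[Cross-tab 1] Lessons completed at purchase × days from registration to purchase")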
 print(f"{'='*80}")
 for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
    print(f"\n[{level}]")
    # Build the cross-tab
    day_ranges = [(0,1),(2,3),(4,7),(8,14),(15,30),(31,60),(61,999)]
    print(f"  {'Progress':<20}", end='')
    for lo, hi in day_ranges:
        label = f"{lo}d" if lo==hi else f"{lo}-{hi}d"
        print(f"{label:>10}", end='')
    print(f"  {'Total':>8}")
    for prog in range(6):
        subset = [r for r in data if r['progress']==prog]
        label = f"  {prog} done"
        print(f"{label:<20}", end='')
        total = 0
        for lo, hi in day_ranges:
            cnt = sum(1 for r in subset if lo <= r['reg_to_pur'] <= hi)
            print(f"{cnt:>10}", end='')
            total += cnt
        print(f"  {total:>8}")
 # ====================================================================
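 # Cross-tab 2: lessons completed at purchase × days from finishing all five to purchase (users who finished all five only)
 # ====================================================================
 print(f"\n{'='*80}")
 print("[Cross-tab 2] Lessons completed at purchase × days from finishing all five to purchase (users who finished all five only)")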
 print(f"{'='*80}")
 all5_day_ranges = [(-999,-1),(0,0),(1,3),(4,7),(8,14),(15,30),(31,60),(61,999)]
 for level, data in [('L1',[r for r in l1_recs if r['all5_done']]),
                     ('L2',[r for r in l2_recs if r['all5_done']]),
                     ('ALL',all5_recs)]:
    print(f"\n[{level}]")
    print(f"  {'Progress':<20}", end='')
    for lo, hi in all5_day_ranges:
        if lo==-999: label="bought first"  # purchased before finishing all five
        elif lo==hi: label=f"{lo}d"
        else: label=f"{lo}-{hi}d"
        print(f"{label:>14}", end='')
    print(f"  {'Total':>8}")
    for prog in range(6):
        subset = [r for r in data if r['progress']==prog]
        label = f"  {prog} done"
        print(f"{label:<20}", end='')
        total = 0
        for lo, hi in all5_day_ranges:
            cnt = sum(1 for r in subset if r['all5_to_pur'] is not None and lo <= r['all5_to_pur'] <= hi)
            print(f"{cnt:>14}", end='')
            total += cnt
        print(f"  {total:>8}")
 # ====================================================================
 # Cross-tab 3: where the purchase falls between prologue lessons
 # ====================================================================
 print(f"\n{'='*80}")
 print("[Cross-tab 3] Purchase timing: after which prologue lesson")
 print(f"{'='*80}")
 for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
    dist = Counter(r['purchase_position'] for r in data)
    print(f"\n[{level}]")
    positions = ['L01之前','L01之后(L02之前)','L02之后(L03之前)','L03之后(L04之前)','L04之后(L05之前)','L05之后']
    for pos in positions:
        cnt = dist.get(pos, 0)
        pct = cnt/len(data)*100 if data else 0.0  # avoid ZeroDivisionError when a level has no orders
        bar = '█' * int(pct)
        print(f"  {pos:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
 # ====================================================================
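 # Cross-tab 4: per-lesson completion time distribution
 # ====================================================================
 print(f"\n{'='*80}")
 print("[Cross-tab 4] Each prologue lesson: distribution of days from registration to completion")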
 print(f"{'='*80}")
 lesson_days_ranges = [(0,0),(1,1),(2,3),(4,7),(8,14),(15,30),(31,999)]
 lesson_order = ['L01','L02','L03','L04','L05']
 for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
    print(f"\n[{level}] ({len(data)} records)")
    print(f"  {'Lesson':<8}", end='')
    for lo, hi in lesson_days_ranges:
        label = f"{lo}d" if lo==hi else f"{lo}-{hi}d"
        print(f"{label:>12}", end='')  # 12 wide to match the "count(pct%)" cells in each row
    print(f"  {'Completed':>10}  {'Median':>8}  {'Mean':>8}")
    for ln in lesson_order:
        vals = []
        for r in data:
            if ln in r['lesson_dates']:
                vals.append(r['lesson_dates'][ln]['days_from_reg'])
        if not vals:
            print(f"  {ln:<8} {'(no data)':>60}")
            continue
        sv = sorted(vals)
        med = sv[len(sv)//2]  # upper median for even-sized samples
        avg = sum(vals)/len(vals)
        print(f"  {ln:<8}", end='')
        for lo, hi in lesson_days_ranges:
            cnt = sum(1 for v in vals if lo <= v <= hi)
            pct = cnt / len(vals) * 100
            print(f"{cnt:>5}({pct:4.1f}%)", end='')
        print(f"  {len(vals):>10}  {med:>8}d  {avg:>8.1f}d")
 # ====================================================================
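 # Cross-tab 5: lessons completed at purchase × L1/L2 × whether the rest was completed later
 # ====================================================================
 print(f"\n{'='*80}")
 print("[Cross-tab 5] Users who had not finished all five at purchase: completion of the remaining prologue lessons afterwards")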
 print(f"{'='*80}")
 not_all5_at_purchase = [r for r in records if r['progress'] < 5]
 for level, data in [('L1',[r for r in l1_recs if r['progress']<5]),
                     ('L2',[r for r in l2_recs if r['progress']<5]),
                     ('ALL',not_all5_at_purchase)]:
    if not data: continue
    print(f"\n[{level}] Not all five done at purchase: {len(data)} users")
    for prog in range(5):
        subset = [r for r in data if r['progress']==prog]
        if not subset: continue
        after_done_dist = Counter(r['all5_done'] for r in subset)
        all5_cnt = after_done_dist.get(True, 0)
        not5_cnt = after_done_dist.get(False, 0)
        print(f"  {prog} done at purchase: {len(subset)} users → eventually finished all 5: {all5_cnt}, still unfinished: {not5_cnt}")
 # ====================================================================
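 # Cross-tab 6: details for users who purchased before finishing the prologue
 # ====================================================================
 print(f"\n{'='*80}")
 print("[Cross-tab 6] Users who finished the prologue only after purchasing: details")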
 print(f"{'='*80}")
 buy_before_finish = [r for r in all5_recs if r['all5_to_pur'] is not None and r['all5_to_pur'] < 0]
 print(f"\n{len(buy_before_finish)} users finished all five prologue lessons only after purchasing\n")
 print(f"  {'Level':<6} {'Progress':<14} {'Purchased':<14} {'All5 done':<14} {'Buy→done':>10} {'Reg→buy':>10} {'Reg→done':>10}")
 for r in sorted(buy_before_finish, key=lambda x: x['all5_to_pur']):
    print(f"  {r['level']:<6} {str(r['progress'])+' done':<14} {str(r['purchase_date']):<14} {str(r['all5_date']):<14} {abs(r['all5_to_pur']):>10}d {r['reg_to_pur']:>10}d {r['reg_to_all5']:>10}d")
 # ====================================================================
 # Cross-tab 7: monthly trend of lessons completed at purchase
 # ====================================================================
 print(f"\n{'='*80}")
 print("[Cross-tab 7] Monthly trend: share of orders with 5/5 lessons completed at purchase")
 print(f"{'='*80}")
 monthly = defaultdict(lambda: {'total':0,'all5':0,'partial':0})
 for r in records:
    m = r['purchase_date'].strftime('%Y-%m')
    monthly[m]['total'] += 1
    if r['progress'] == 5:
        monthly[m]['all5'] += 1
    else:
        monthly[m]['partial'] += 1
 for m in sorted(monthly.keys()):
    d = monthly[m]
    pct = d['all5']/d['total']*100
    print(f"  {m}: {d['total']:>4} orders | all 5 done at purchase={d['all5']:>3} ({pct:5.1f}%) | not all done={d['partial']:>2}")
 # ====================================================================
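 # Cross-tab 8: registration-to-purchase days, stratified by progress at purchase
 # ====================================================================
 print(f"\n{'='*80}")
 print("[Cross-tab 8] Stratified by progress at purchase: registration-to-purchase day statistics per group")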
 print(f"{'='*80}")
 # Named percentile() so it no longer shares a name with the `pct` percentage variable used earlier
 def percentile(vals, p):
    """Simple rank-based percentile; returns '-' for an empty list."""
    if not vals: return '-'
    s = sorted(vals)
    return s[min(int(len(s)*p/100), len(s)-1)]  # clamp so p=100 stays in range
 for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
    print(f"\n[{level}]")
    print(f"  {'Progress':<16} {'Users':>6} {'P10':>6} {'P25':>6} {'Median':>6} {'P75':>6} {'P90':>6} {'Mean':>8}")
    for prog in range(6):
        subset = [r for r in data if r['progress']==prog]
        if not subset: continue
        vals = [r['reg_to_pur'] for r in subset]
        print(f"  {str(prog)+' done':<16} {len(subset):>6} "
              f"{percentile(vals,10):>6} {percentile(vals,25):>6} {percentile(vals,50):>6} "
              f"{percentile(vals,75):>6} {percentile(vals,90):>6} {sum(vals)/len(vals):>8.1f}")
 # ====================================================================
 # Export CSV (with per-lesson prologue completion details)
 # ====================================================================
 output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis_v3.csv'
 with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
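    writer.writerow([
        'order_id','account_id','level','register_date','purchase_date','reg_to_purchase_days',
        'prologue_progress_at_purchase','lessons_done_at_purchase',
        'L01_date','L01_days_from_reg','L02_date','L02_days_from_reg',
        'L03_date','L03_days_from_reg','L04_date','L04_days_from_reg',
        'L05_date','L05_days_from_reg',
        'all5_done','all5_date','reg_to_all5_days','all5_to_purchase_days',
        'purchase_position'
    ])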
    for r in sorted(records, key=lambda x: x['purchase_date']):
        row = [r['order_id'],r['account_id'],r['level'],
               r['register_date'].isoformat(),r['purchase_date'].isoformat(),r['reg_to_pur'],
               r['progress'],','.join(r['lessons_done'])]
        for ln in lesson_order:
            # info is None when the lesson was never completed
            info = r['lesson_dates'].get(ln)
            row.append(info['date'].isoformat() if info else '')
            row.append(info['days_from_reg'] if info else '')
        row.append('yes' if r['all5_done'] else 'no')
        row.append(r['all5_date'].isoformat() if r['all5_date'] else '')
        row.append(r['reg_to_all5'] if r['reg_to_all5'] is not None else '')
        row.append(r['all5_to_pur'] if r['all5_to_pur'] is not None else '')
        row.append(r['purchase_position'])
        writer.writerow(row)
 print(f"\n{'='*80}")
 print(f"Exported: {output_path}")
 print(f"{'='*80}")
 cur.close()
 conn.close()
 print("Analysis complete!")