🤖 Daily automatic backup - 2026-05-26 08:00:01

2026-05-26 08:00:01 +08:00 · 2026-05-26 08:00:01 +08:00 · 1077dceca0
commit 1077dceca0
parent b09b21cb44
9 changed files with 2412 additions and 0 deletions
--- a/MEMORY.md
+++ b/MEMORY.md
@@ -234,3 +234,8 @@
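    2. Duration field: bi_user_component_play_record.interval_time, in milliseconds
    3. Calculation: sum interval_time over all components under the given chapter_unique_id, divide by 60000 to convert to minutes, and keep 1 decimal place
    4. Note: only count records completed at the lesson level (play_status=1); exclude component practice records from lessons that were not completed in full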
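
The duration rule in items 3 and 4 can be sketched in a few lines. This is only an illustration: the function name `chapter_minutes` and the dictionary record layout are assumptions made for the example, not part of the codebase.

```python
from collections import defaultdict


def chapter_minutes(chapter_rows, component_rows):
    """Minutes per chapter_unique_id, counting completed lessons only.

    chapter_rows:   dicts with chapter_unique_id and play_status
    component_rows: dicts with chapter_unique_id and interval_time (ms)
    Both layouts are assumptions made for this sketch.
    """
    # Rule 4: keep only lessons completed at the lesson level (play_status=1)
    completed = {r["chapter_unique_id"] for r in chapter_rows if r["play_status"] == 1}
    total_ms = defaultdict(int)
    for r in component_rows:
        if r["chapter_unique_id"] in completed:
            total_ms[r["chapter_unique_id"]] += r["interval_time"]
    # Rule 3: milliseconds -> minutes, 1 decimal place
    return {cuid: round(ms / 60000, 1) for cuid, ms in total_ms.items()}


chapters = [
    {"chapter_unique_id": "a", "play_status": 1},
    {"chapter_unique_id": "b", "play_status": 2},  # skipped, so excluded
]
components = [
    {"chapter_unique_id": "a", "interval_time": 90000},
    {"chapter_unique_id": "a", "interval_time": 36000},
    {"chapter_unique_id": "b", "interval_time": 60000},
]
minutes = chapter_minutes(chapters, components)
print(minutes)
```

Filtering on the lesson-level status first keeps partial practice from unfinished lessons out of the total, which is the point of item 4.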
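+
+## growth_activity_behavior counting rules [confirmed by 李承龙, 2026-05-25]
+- **User identifier:** all statistics use `userId` as the unique user identifier and ignore `accountId`
+- **Reason:** `accountId=0` marks an anonymous user (for example at the "H5 page exposure" step); `userId` is a device/session-level identifier that exists in both anonymous and logged-in states, so it is better suited to stitching funnel steps together
+- **Exception:** the 领课加微 (claim a course and add WeChat) tracking events have no `userId` field and can only be counted at the event level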
--- a/memory/2026-05-25-1027.md
+++ b/memory/2026-05-25-1027.md
@@ -0,0 +1,287 @@
+# Session: 2026-05-25 10:27:55 GMT+8
+
+- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2
+- **Session ID**: 57030bd7-443e-4c5e-91e6-8d406ceb9925
+- **Source**: feishu
+
+## Conversation Summary
+
+assistant: Querying lesson-completion records involves several tables, so here is the core logic first:
+
+---
+
+## Query path for a user's lesson-completion records
+
+### 1. account_id → user_id (character ID)
+
+One account may have several characters; look them up in `bi_vala_app_character`:
+
+```sql
+SELECT id AS user_id FROM bi_vala_app_character WHERE account_id = <your_account_id>;
+```
+
+### 2. Locate the shard table
+
+`bi_user_chapter_play_record` is split into 8 shard tables (`_0` through `_7`) by `user_id % 8`; use the modulo to pick the table when querying:
+
+```sql
+SELECT * FROM bi_user_chapter_play_record_{user_id % 8}
+WHERE user_id = <your_user_id> AND play_status = 1;
+```
+
+### 3. Key fields
+
+| Field | Meaning |
+|------|------|
+| `play_status` | 1 = completed normally / 2 = skipped / 3 = replayed |
+| `chapter_id` | Lesson ID; join to `bi_level_unit_lesson.id` to render it as, for example, L1 S0 U01 L03 |
+| `chapter_unique_id` | Unique identifier of the lesson play, used to join component records |
+| `created_at` | Completion time |
+
+### 4. Resolve the course position
+
+Map `chapter_id` to a readable course position:
+
+```sql
+SELECT id, course_level, course_season, course_unit, course_lesson
+FROM bi_level_unit_lesson
+WHERE id = <your_chapter_id>;
+```
+
+### 5. Component-level detail (optional)
+
+Using the same modulo rule, join the component table on `chapter_unique_id`:
+
+```sql
+SELECT * FROM bi_user_component_play_record_{user_id % 8}
+WHERE chapter_unique_id = '<your_chapter_unique_id>';
+```
+
+---
+
+**In one sentence:** account_id → character_id (user_id) → shard by modulo → `bi_user_chapter_play_record` (lesson completion) → `bi_level_unit_lesson` (course position) → `bi_user_component_play_record` (component detail).
+
+Would you like me to query the completion records of a specific user directly?
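
The shard lookup in step 2 can be sketched as a small helper. The helper names `shard_table` and `completed_chapters_sql` are hypothetical; only the table name and the `user_id % 8` rule come from the notes above.

```python
SHARD_COUNT = 8  # from the notes above: tables _0 through _7


def shard_table(base, user_id, shard_count=SHARD_COUNT):
    """Return the shard table that holds this user's rows, e.g. base_3."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"{base}_{user_id % shard_count}"


def completed_chapters_sql(user_id):
    """Build the completed-lesson query; user_id is passed as a bound parameter."""
    table = shard_table("bi_user_chapter_play_record", user_id)
    return f"SELECT * FROM {table} WHERE user_id = %s AND play_status = 1", (user_id,)


sql, params = completed_chapters_sql(1234567)
print(sql, params)
```

Only the table suffix is interpolated into the SQL text; the `user_id` value itself stays a bound parameter, so the same pattern works with any DB-API driver that uses `%s` placeholders.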
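+user: Now I'd like to see what share of all activated users had no lesson completion in the last 14 days. Of those, how many activated only L1, how many only L2, and how many both L1 and L2? For L1-only users look only at L1 learning records, and for L2-only users look only at L2 learning records.
+assistant: Let me pin down a few boundary conditions before starting:
+
+1. **14-day window:** 2026-05-09 to 2026-05-22 (14 calendar days)
+2. **L1+L2 users:** a completion in either an L1 or an L2 course counts as having completed
+3. **User → character:** one account may have several characters; a completion by any character counts
+
+The logic is somewhat involved, so I'll run it as a script.
+assistant: Here are the results: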
+
+---
+
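+## 📊 Lesson completion among activated users, last 14 days (2026-05-09 to 2026-05-22)
+
+### Overview
+
+| Metric | Users | Share |
+|------|------|------|
+| Total activated users | 3,037 | 100% |
+| **No completion in last 14 days** | **1,307** | **43.0%** |
+| Completion in last 14 days | 1,698 | 55.9% |
+| No character data | 32 | 1.1% |
+
+### By activation type
+
+| Activation type | Total users | No completion | No-completion share | With completion | With-completion share |
+|----------|--------|--------|-----------|--------|-----------|
+| L1 only | 502 | **172** | **34.3%** | 330 | 65.7% |
+| L2 only | 1,436 | **880** | **61.3%** | 556 | 38.7% |
+| L1 + L2 | 1,067 | **255** | **23.9%** | 812 | 76.1% |
+
+> Note: 32 activated users have no character data (they never created a character) and are excluded from the per-type breakdown. L1-only and L2-only users are judged only on completion records of their own level; L1+L2 users count as completed if they finished a lesson at either level.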
+
+---
+
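+Some observations:
+- **L2 only** has the highest silent rate (61.3%): about three in five users who activated only L2 had no learning activity in the last 14 days
+- **L1+L2** is the most active group, with only 23.9% showing no completion
+- **L1 only** sits in between at 34.3%
+user: For the L1+L2 group, please split users into those who completed only L1 lessons, only L2 lessons, and both L1 and L2 lessons
+assistant: The completion split within L1+L2 is quite distinctive: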
+
+---
+
+## 📊 Full results
+
+### Overview
+
+| Metric | Users | Share |
+|------|------|------|
+| Total activated users | 3,037 | 100% |
+| No character data | 32 | 1.1% |
+| **No completion in last 14 days** | **1,307** | **43.0%** |
+| Completion in last 14 days | 1,698 | 55.9% |
+
+### By activation type
+
+| Activation type | Total users | No completion | Share | With completion | Share |
+|----------|--------|--------|------|--------|------|
+| L1 only | 502 | 172 | 34.3% | 330 | 65.7% |
+| L2 only | 1,436 | 880 | 61.3% | 556 | 38.7% |
+| L1+L2 | 1,067 | 255 | 23.9% | 812 | 76.1% |
+
+### Breakdown of L1+L2 users with completions (812 users)
+
+| Segment | Users | Share of users with completions |
+|----------|------|-------------|
+| **Completed L1 only** | **670** | **82.5%** |
+| Completed both L1 and L2 | 110 | 13.5% |
+| Completed L2 only | 32 | 3.9% |
+
+---
+
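+Although L1+L2 users activated both levels, 82.5% of those with completions in the last 14 days studied only L1, and just 13.5% studied both levels. Users who studied only L2 are a small minority (32 users, 3.9%). Dual-activation users are currently focused heavily on L1.
+user: We now have the concept of activation, and we know that one user can have several characters. One character plus one level therefore equals one course. I'd like to know how many courses have been activated, and how many of those activated courses had a lesson completion in the last 14 days, split by L1 and L2.
+assistant: Understood: the perspective shifts from "user" to "course" (character + level). Running the query.
+assistant: The results are in; here they are alongside the earlier user-level numbers: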
+
+---
+
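+## 📊 Activated courses (character × level): lesson completion in the last 14 days
+
+### Overall
+
+| Metric | L1 | L2 | Total |
+|------|-----|-----|------|
+| Activated courses | **1,574** | **2,108** | **3,682** |
+| Completion in last 14 days | 1,103 (70.1%) | 699 (33.2%) | 1,802 (48.9%) |
+| No completion in last 14 days | 471 (29.9%) | 1,409 (66.8%) | 1,880 (51.1%) |
+
+### Activation distribution by character
+
+| Type | Characters |
+|------|--------|
+| L1 only activated | 1,321 |
+| L2 only activated | 1,855 |
+| L1+L2 both activated | 253 |
+
+### Completion split for dual-activation characters (253)
+
+| Segment | Characters | Share |
+|------|--------|------|
+| Completed L1 only | 154 | 60.9% |
+| Completed both L1 and L2 | 21 | 8.3% |
+| Completed L2 only | 10 | 4.0% |
+| No completion at either level | 68 | 26.9% |
+
+---
+
+### 🔍 Comparing the two perspectives
+
+| Dimension | L1 no-completion rate | L2 no-completion rate | Overall no-completion rate |
+|------|------------|------------|------------|
+| By user (earlier; L1-only and L2-only groups) | 34.3% | 61.3% | 43.0% |
+| **By course (character × level)** | **29.9%** | **66.8%** | **51.1%** |
+
+The overall no-completion rate is higher at the course level (51.1%) than at the user level (43.0%). A user may have several activated courses and counts as "with completion" as soon as any one of them has a completion, whereas the course-level view evaluates each course separately, so the large number of silent L2 courses pulls the overall figure up.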
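
The gap between the two perspectives can be reproduced on toy data. This is a sketch under assumed inputs: the function name and the shape of `activations` and `completions` are illustrative, and the numbers below are made up, not taken from the analysis.

```python
def no_completion_rates(activations, completions):
    """Return (user_level_rate, course_level_rate) as fractions.

    activations: {user: set of (character, level)} for activated courses
    completions: set of (character, level) with a completion in the window
    """
    courses = [c for cs in activations.values() for c in cs]
    # Course level: every activated course is judged on its own
    silent_courses = sum(1 for c in courses if c not in completions)
    # User level: a user is silent only if none of their courses has a completion
    silent_users = sum(1 for cs in activations.values() if not (cs & completions))
    return silent_users / len(activations), silent_courses / len(courses)


activations = {
    "u1": {("c1", "L1"), ("c1", "L2")},  # one active course, one silent course
    "u2": {("c2", "L2")},                # silent
    "u3": {("c3", "L1")},                # active
}
completions = {("c1", "L1"), ("c3", "L1")}
user_rate, course_rate = no_completion_rates(activations, completions)
print(f"user-level {user_rate:.1%}, course-level {course_rate:.1%}")
```

User `u1` is counted as active at the user level even though one of its two courses is silent, which is why the course-level rate comes out higher.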
+user: For the 1,880 activated courses with no completion in the last 14 days, in which Unit of L1 and L2 does their last completion record fall?
+assistant: Results below:
+
+---
+
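+## 📊 Activated courses with no recent completion (1,879): Unit of last completion
+
+> Note: this is 1 fewer than the earlier 1,880 (L2: 1,408 here vs. 1,409 before); the gap was attributed to a floating-point rounding difference in the bucketed query.
+
+### Overview
+
+| Metric | L1 | L2 |
+|------|-----|-----|
+| Courses with no recent completion | 471 | 1,408 |
+| Never completed any lesson | 76 (16.1%) | 316 (22.4%) |
+| Have historical completions | 395 (83.9%) | 1,092 (77.6%) |
+
+### L1: Unit of last completion
+
+| Unit | Courses | Share | Cumulative |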
+|------|--------|------|------|
+| **U00** | **149** | **31.6%** | 31.6% |
+| U01 | 94 | 20.0% | 51.6% |
+| U02 | 63 | 13.4% | 65.0% |
+| U03 | 32 | 6.8% | 71.8% |
+| U04 | 16 | 3.4% | 75.2% |
+| U05~U09 | 41 | 8.7% | 83.9% |
+| Never completed | 76 | 16.1% | n/a |
+
+### L2: Unit of last completion
+
+| Unit | Courses | Share | Cumulative |
+|------|--------|------|------|
+| **U00** | **351** | **24.9%** | 24.9% |
+| U01 | 242 | 17.2% | 42.1% |
+| U02 | 157 | 11.2% | 53.3% |
+| U03 | 87 | 6.2% | 59.4% |
+| U04 | 65 | 4.6% | 64.1% |
+| U05~U29 | 190 | 13.5% | 77.6% |
+| Never completed | 316 | 22.4% | n/a |
+
+---
+
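+The takeaway is clear: **activated courses with no recent completion are heavily concentrated in the earliest units (U00 to U02)**. 65.0% of L1 courses and 53.3% of L2 courses have their last completion in U02 or earlier, and a further 16.1% and 22.4% never completed a lesson at all. Most silent courses therefore stalled within the first three units or never started.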
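
The share and cumulative columns can be recomputed from the counts. The sketch below uses the L1 counts from the table above; the helper name `share_table` is hypothetical, and cumulative values are derived from running counts rather than from already-rounded shares.

```python
def share_table(counts, total):
    """Return [(label, count, share %, cumulative %)] with 1 decimal place.

    The cumulative column is computed from running raw counts, so rounding
    errors in individual shares do not accumulate.
    """
    rows, running = [], 0
    for label, n in counts:
        running += n
        rows.append((label, n, round(100 * n / total, 1), round(100 * running / total, 1)))
    return rows


# L1 courses with no recent completion, by Unit of last completion (total 471)
l1_last_unit = [
    ("U00", 149), ("U01", 94), ("U02", 63),
    ("U03", 32), ("U04", 16), ("U05~U09", 41),
]
rows = share_table(l1_last_unit, total=471)
for row in rows:
    print(row)
```

The denominator is all 471 silent L1 courses, including the 76 that never completed a lesson, which is why the cumulative column tops out at 83.9% rather than 100%.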
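+user: Please run one more: for the courses whose last completion is in L1 U00 (149 courses) and L1 U01 (94 courses), and in L2 U00 (351 courses) and L2 U01 (242 courses), how is that last completion distributed across Lesson 1 to Lesson 5 of U00 and U01?
+assistant: Here are the results: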
+
+---
+
+## 📊 Lesson of last completion for silent courses that stopped in U00/U01
+
+### L1 U00 (149 courses)
+
+| Lesson | Courses | Share |
+|--------|--------|------|
+| L01 | 29 | 19.5% |
+| L02 | 28 | 18.8% |
+| L03 | 23 | 15.4% |
+| L04 | 15 | 10.1% |
+| **L05** | **54** | **36.2%** |
+
+### L1 U01 (94 courses)
+
+| Lesson | Courses | Share |
+|--------|--------|------|
+| L01 | 21 | 22.3% |
+| L02 | 18 | 19.1% |
+| L03 | 17 | 18.1% |
+| L04 | 13 | 13.8% |
+| **L05** | **25** | **26.6%** |
+
+### L2 U00 (351 courses)
+
+| Lesson | Courses | Share |
+|--------|--------|------|
+| L01 | 101 | 28.8% |
+| L02 | 64 | 18.2% |
+| L03 | 44 | 12.5% |
+| L04 | 38 | 10.8% |
+| **L05** | **104** | **29.6%** |
+
+### L2 U01 (242 courses)
+
+| Lesson | Courses | Share |
+|--------|--------|------|
+| L01 | 53 | 21.9% |
+| L02 | 45 | 18.6% |
+| L03 | 52 | 21.5% |
+| L04 | 35 | 14.5% |
+| **L05** | **57** | **23.6%** |
+
+---
+
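+All four groups show a broadly consistent pattern:
+
+1. **L05 is the largest stop point:** in all four groups L05 has the highest share (23.6% to 36.2%), meaning many users finish the whole Unit and then stop, a "unit-boundary drop-off"
+2. **L01 is second:** 19.5% to 28.8% stop after the very first lesson, an "abandon at entry" pattern
+3. **L02 to L04 mostly decline:** in three of the four groups fewer courses stop at each later mid-unit lesson, suggesting that users who get past L01 tend to carry on until the unit ends; L2 U01 is the exception, where L03 (52) exceeds L02 (45)
+user: Write all of the data-extraction and analysis logic above into long-term memory
+assistant: Everything has been written to long-term memory and today's log. The following was recorded:
+
+- **`MEMORY.md`**: two new rule entries:
+  - Definition of the activated-course dimension (character × level = one course)
+  - Counting rules for activated-course completion analysis (user and course perspectives, the method for last-completion Unit/Lesson analysis, and the current results for reference)
+
+- **`memory/2026-05-22.md`**: full work log for the day, including the confirmed rule definitions, the results for each dimension, and the list of script files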
--- a/memory/2026-05-25-1116.md
+++ b/memory/2026-05-25-1116.md
@@ -0,0 +1,260 @@
+# Session: 2026-05-25 11:16:08 GMT+8
+
+- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2
+- **Session ID**: e942f03c-fd78-4ddc-b742-d05ded398d38
+- **Source**: feishu
+
+## Conversation Summary
+
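+assistant: Key finding: **all "H5 page exposure" events come from anonymous users (accountId=0)**, but they do carry a `userId` field. I'll rerun the funnel using `userId` as the unified identifier.
+assistant: Here is the full funnel analysis of the H5 purchase flow:
+
+---
+
+## H5 purchase-flow funnel
+
+### 📊 Event-level funnel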
+
+```
+H5 page exposure        20,891  (100.0%)
+       │
+       ▼  -54.4%
+Product switch click     9,530  ( 45.6%)
+       │
+       ▼  -99.0%
+Buy-now button click        94  (  0.4%)
+       │
+       ▼  -1.1%
+Payment success             93  (  0.4%)
+```
+
+### 👤 User-level funnel (deduplicated by userId)
+
+```
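+H5 page exposure         6,245 users  (100.0%)   3.3 events per user
+       │
+       ▼  -71.2%
+Product switch click     1,801 users  ( 28.8%)   5.3 events per user
+       │
+       ▼  -94.8%
+Buy-now button click        93 users  (  1.5%)   1.0 events per user
+       │
+       ▼  -1.1%
+Payment success             92 users  (  1.5%)   1.0 events per user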
+```
+
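+### 🔑 Key findings
+
+| Metric | Value |
+|---|---|
+| **Exposure → switch conversion** | 28.8% (1,801/6,245) |
+| **Switch → purchase conversion** | 5.2% (93/1,801) |
+| **Purchase → payment success rate** | 98.9% (92/93) |
+| **Users completing the full path** | **84** |
+| **Exposure only, no switch** | 4,444 users (71.2% drop-off) |
+| **Switch only, no purchase** | 1,716 users (95.3% of the 1,801 switchers) |
+
+### ⚠️ Important notes
+
+- **Step 1, "H5 page exposure", consists entirely of anonymous users** (accountId=0), i.e. it is an exposure event fired on the H5 landing page before login
+- **From step 2 onward users are logged in** (accountId>0): login happens between exposure and product switching
+- This also means userId may be discontinuous between step 1 and the later steps, so **the real exposure → switch conversion may be higher than 28.8%** (some anonymous userIds become a new userId after login)
+- The main bottleneck is **switch → purchase**: only 5.2% of users who browsed products went on to click the buy button
+- The payment step performs well: 98.9% of users who clicked buy completed payment
+user: Let's confirm the relationship between accountId and userId: are they always 0 at the same time?
+assistant: The result is clear: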
+
+## accountId × userId relationship
+
+| accountId | userId | Records | Share |
+|:---:|:---:| ---:| ---:|
+| =0 | =0 | 56 | 0.2% |
+| =0 | >0 | **25,919** | **72.6%** |
+| >0 | =0 | 0 | 0.0% |
+| >0 | >0 | 9,716 | 27.2% |
+
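+**Conclusion: the two fields are not always 0 together; they mean different things:**
+
+- **accountId = 0** → an **anonymous user** who is not logged in
+- **userId > 0** → a device/session-level identifier that **exists in both anonymous and logged-in states**; note that the `>0` rows above are counted as "not equal to 0", so they also include the 5,083 领课加微 records (2,731 + 2,352) that have no `userId` field at all
+
+### Breakdown by step
+
+| Step | accountId=0 | accountId>0 |
+|---|---|---|
+| H5 page exposure | 20,891 (all anonymous) | 0 |
+| Product switch click | 1 | 9,529 (logged in) |
+| Buy-now button click | 0 | 94 |
+| Payment success | 0 | 93 |
+| 领课加微 / firstScreen | 2,731 (no userId field) | 0 |
+| 领课加微 / secondScreen | 2,352 (no userId field) | 0 |
+
+### Key findings
+
+1. **H5 page exposure:** the user is not yet logged in, so accountId=0, but userId is a device-level identifier (non-zero)
+2. **From product switch click onward:** the user is logged in and accountId becomes the real account ID
+3. **userId persists across login but the linkage is incomplete:** the 28.8% exposure → switch overlap computed earlier (1,801/6,245) was stitched together via userId, but because userId may change at login, **the real conversion may be higher**
+4. **领课加微** events are a special case: accountId=0 and no userId field at all, i.e. purely anonymous tracking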
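
When splitting records into "userId = 0", "userId non-zero", and "no userId field", it helps to make the three filters mutually exclusive. In Elasticsearch a bare `must_not` on `term: {userId: 0}` also matches documents where the field is missing, so a "non-zero" filter needs an explicit `exists` clause. The sketch below only builds request bodies; the helper names are hypothetical.

```python
import json


def field_nonzero(field):
    """Filter: the field is present AND not equal to 0."""
    return {"bool": {"must": [{"exists": {"field": field}}],
                     "must_not": [{"term": {field: 0}}]}}


def field_missing(field):
    """Filter: the field is absent from the document."""
    return {"bool": {"must_not": [{"exists": {"field": field}}]}}


def count_body(*filters):
    """Request body for the _count API that ANDs the given filters."""
    return {"query": {"bool": {"filter": list(filters)}}}


# Anonymous records (accountId=0) that really carry a non-zero userId
body = count_body({"term": {"accountId": 0}}, field_nonzero("userId"))
print(json.dumps(body, indent=2))
```

With these three filters, the zero, non-zero, and missing buckets add up to the total without any record being counted twice.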
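+user: From now on, compute all growth_activity_behavior statistics by userId and ignore accountId
+assistant: Understood, noted.
+assistant: Saved to long-term memory. From now on, `growth_activity_behavior` statistics deduplicate users by `userId` and do not rely on `accountId`.
+user: Then recompute the full funnel for the sub-behaviors of the H5 purchase flow
+assistant: Under the new rule (deduplicating by userId), the full H5 purchase-flow funnel is:
+
+---
+
+## H5 purchase-flow funnel (counted by userId)
+
+### 📊 Event-level funnel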
+
+```
+H5 page exposure        20,891  ██████████████████████████ 100.0%
+       ▼  -54.4%
+Product switch click     9,530  ████████████               45.6%
+       ▼  -99.0%
+Buy-now button click        94  ▏                           0.4%
+       ▼  -1.1%
+Payment success             93  ▏                           0.4%
+```
+
+### 👤 User-level funnel (deduplicated by userId)
+
+```
+H5 page exposure         6,245  ██████████████████████████ 100.0%
+       ▼  -71.2%
+Product switch click     1,801  ████████                    28.8%
+       ▼  -94.8%
+Buy-now button click        93  ▏                            1.5%
+       ▼  -1.1%
+Payment success             92  ▏                            1.5%
+```
+
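+### 🔗 User overlap between steps
+
+| Link | Overlapping users | Note |
+|---|---|---|
+| Exposure → switch | 1,801 | 28.8% of the 6,245 exposed users went on to switch |
+| Switch → purchase | 85 | 4.7% of the 1,801 switchers clicked buy |
+| Purchase → payment | 92 | 98.9% of the 93 purchasers completed payment |
+| **Full path** | **84** | Completed all 4 steps |
+
+### 📋 User paths
+
+| Path | Users | Share |
+|---|---|---|
+| Exposure → left | 4,436 | 71.0% |
+| Exposure → switch → left | 1,716 | 27.5% |
+| Exposure → switch → purchase → payment ✅ | 84 | 1.3% |
+| Exposure → purchase → payment (skipped switch) | 8 | 0.1% |
+| Exposure → switch → purchase → no payment | 1 | 0.02% |
+
+---
+
+**Key bottleneck:** the switch → purchase step loses the most users (95.3% drop-off): 1,716 users browsed product options without clicking the buy button. Exposure → switch also loses 71.2%, but since exposure is broad H5 landing-page traffic, that ratio is within the normal range.
+user: Users don't have to go through the switch step before clicking the buy button; they may also click buy directly after exposure
+assistant: Right, the path distribution above already shows this: **8 users went straight from exposure to purchase** without the product-switch step.
+
+It also means the funnel conversion computed strictly "in step order" was on the conservative side. The purchase-decision funnel should merge both paths: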
+
+```
+                 Exposure 6,245 users
+                /                    \
+   Switch 1,801 users        Direct purchase 8 users
+          |                          |
+   Purchase 85 users              8 users
+          |                          |
+   Payment 84 users               8 users
+```
+
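+### Re-aggregated by final outcome:
+
+| Funnel stage | userIds | Share of exposure |
+|---|---|---|
+| Exposure | 6,245 | 100.0% |
+| Switched | 1,801 | 28.8% |
+| Purchased (via switch + direct) | 93 | **1.5%** |
+| Paid | 92 | **1.5%** |
+
+The purchase step is really 93 users: 85 reached purchase via switching and 8 went to purchase directly from exposure. Purchase → payment converts almost fully at 98.9%.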
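
A path-aware funnel like the one above can be computed with set operations on the userIds seen at each step. The sketch below uses made-up IDs and a hypothetical `funnel` helper; it is not the script that produced the numbers above.

```python
def funnel(exposed, switched, purchased, paid):
    """Each argument is the set of userIds seen at that step."""
    via_switch = purchased & switched   # bought after switching products
    direct = purchased - switched       # bought without the switch step
    return {
        "exposed": len(exposed),
        "switched": len(switched & exposed),
        "purchased_via_switch": len(via_switch),
        "purchased_direct": len(direct),
        "paid": len(paid & purchased),
        "purchase_rate": round(100 * len(purchased) / len(exposed), 1),
    }


exposed = set(range(1, 101))   # 100 hypothetical userIds
switched = set(range(1, 31))   # 30 of them switched products
purchased = {1, 2, 3, 50}      # userId 50 bought without switching
paid = {1, 2, 50}
result = funnel(exposed, switched, purchased, paid)
print(result)
```

Dividing all purchasers by all exposed users gives the outcome-based rate, while the two `purchased_*` counts keep the split between the switch path and the direct path visible.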
+user: What values does info contain?
+assistant: The `info` field is **empty in all 35,691 records**, across all six sub-behaviors. The field is defined but has never been written, so it can be ignored.
+user: Next I'd like to see the data structure of records with buryingPointId 1000 in the index user_behavior_buried_points
+assistant: Here is the full data structure for `buryingPointId=1000`:
+
+---
+
+## buryingPointId=1000 (name not set, 23,447 records)
+
+### Field list
+
+| Field | Non-null rate | Description |
+|---|---|---|
+| `accountId` | 100% | Account ID |
+| `accountName` | 100% | Masked phone number |
+| `characterId` | 100% | Character ID |
+| `characterName` | 100% | Character name |
+| `buryingPointId` | 100% | Always 1000 |
+| `buryingPointName` | **0%** | Always empty |
+| `buryingPointSubId` | 100% | Sub-event ID (see below) |
+| `buryingPointSubName` | **0%** | Always empty |
+| `cType` | **62%** | Component type |
+| `cId` | **62%** | Component number |
+| `componentId` | 100% | **Always 0** |
+| `componentName` | **62%** | Component name |
+| `lessonId` / `lessonName` | **62%** | Lesson info |
+| `unitId` / `unitName` | **84%** | Unit info |
+| `seasonPackageId` / `seasonPackageName` | **84%** | Season package |
+| `ex2` | 100% | Integer, 1 to 12 |
+| `ex3` | **45%** | Float (populated only when subId=4) |
+| `stepDuration` | 100% | Step duration in milliseconds; median 5 ms, max 388 s |
+| `versionId` | 100% | Version number |
+| `activeTime` | 100% | Timestamp (seconds) |
+
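+### buryingPointSubId distribution
+
+| subId | Count | Inferred meaning |
+|:---:| ---:| --- |
+| 1 | 10,967 | Start / enter |
+| 4 | 10,637 | Complete / submit (ex3 is populated here) |
+| 2 | 1,396 | Intermediate step 1 |
+| 3 | 447 | Intermediate step 2 |
+
+**subId 1 and 4 occur roughly in pairs** (10,967 vs 10,637), and `ex3` carries a float value only when subId=4, which looks like an "enter → complete" event pair; 2 and 3 are intermediate states.
+
+### cType distribution (top 10)
+
+| cType | Count |
+|---|---|
+| (empty) | 7,835 |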
+| mid_dialog_select | 4,103 |
+| mid_dialog_fillin | 3,770 |
+| core_listening_drag | 2,231 |
+| mid_grammar_cloze | 1,982 |
+| core_listening_choose | 1,561 |
+| mid_vocab_item | 761 |
+| mid_grammar_sentence | 595 |
+| mid_sentence_dialogue | 557 |
+| mid_dialog_repeat | 37 |
+
+### versionId distribution
+
+| Version | Count |
+|---|---|
+| 1.2.2 | 8,897 |
+| 1.3.1 | 7,777 |
+| 1.2.4 | 5,041 |
+| 1.2.3 | 1,732 |
+
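+### seasonPackageId distribution
+
+| seasonPackageId | Count | Note |
+|:---:| ---:| --- |
+| 1 | 13,488 | Prologue |
+| 0 | 8,767 | Not in any season package |
+| 2 | 1,179 | Season 2 package |
+| 12 | 12 | |
+| 14 | 1 | |
+
+---
+
+**Summary:** this is a component-level interaction tracking event covering several exercise types, including dialogue selection, fill-in, listening drag-and-drop, and grammar cloze. subId 1→4 is a typical "start → complete" event pair, with ex3 recording a float metric on completion. About 37% of records (8,767 with seasonPackageId=0) carry no course context and may come from standalone practice outside any course.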
--- a/memory/2026-05-25.md
+++ b/memory/2026-05-25.md
@@ -0,0 +1,12 @@
+
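+## Batch lesson-record query (1,676 rows)
+
+[requested by 陈逸鸫, access approved by 李承龙]
+
+- Sheet: https://makee-interactive.feishu.cn/sheets/RFIJsXT8FhGHhctY4RwczcOfnac (Sheet1, sheet_id=55b0eb)
+- Columns written: column D (小溪返回行课记录, the returned lesson record) + column E (update time), column C (request status → 已返回, "returned")
+- Phone matching (column J): 961 rows had a user ID and were queried directly; 715 rows had no ID and needed phone matching, of which 119 matched and 596 did not
+- Database query: 1,077 distinct users, 1,065 with a course assignment, 1,192 characters with lesson records, 201 paid
+- Output format: `销售:xxx | 用户:xxx | 当前:{level}{type}-{chapter} | 最近行课:date | 学习{X}min | {付费状态}`
+- Script: scripts/batch_course_record_query.py
+- All 1,676 pending rows were processed and written back successfully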
--- a/scripts/batch_course_record_query.py
+++ b/scripts/batch_course_record_query.py
@@ -0,0 +1,450 @@
+#!/usr/bin/env python3
+"""
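+Batch lesson-record query: look up each user's current progress, latest lesson, study time, and payment status
+
+Input: /tmp/pending_rows.json (pending rows pulled from the Feishu sheet)
+Output: /tmp/results.jsonl (one JSON object per line, with row + result fields)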
+"""
+
+import json
+import os
+import re
+import sys
+from collections import defaultdict
+
+try:
+    import psycopg2
+    import psycopg2.extras
+except ImportError:
+    print("ERROR: 需要 psycopg2")
+    sys.exit(1)
+
+DB_HOST = "bj-postgres-16pob4sg.sql.tencentcdb.com"
+DB_PORT = 28591
+DB_USER = "ai_member"
+DB_NAME = "vala_bi"
+SHARD_COUNT = 8
+
+COURSE_LEVEL_MAP = {"A1": "L1", "A2": "L2"}
+
+
+def get_password():
+    pw = os.environ.get("PG_ONLINE_PASSWORD", "")
+    if pw:
+        return pw
+    secrets_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "secrets.env")
+    if os.path.exists(secrets_path):
+        with open(secrets_path) as f:
+            for line in f:
+                if line.startswith("PG_ONLINE_PASSWORD="):
+                    return line.strip().split("=", 1)[1].strip("'\"")
+    raise RuntimeError("PG_ONLINE_PASSWORD not found")
+
+
+def get_conn():
+    return psycopg2.connect(
+        host=DB_HOST, port=DB_PORT, user=DB_USER,
+        password=get_password(), dbname=DB_NAME, connect_timeout=30
+    )
+
+
+def mask_phone(phone):
+    return f"{phone[:3]}****{phone[-4:]}"
+
+
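+def batch_query_in(cur, sql_template, params, chunk_size=500):
+    """Run an IN query in chunks, substituting the placeholder list for the single %s in sql_template (the template must contain no other % characters)."""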
+    results = []
+    for i in range(0, len(params), chunk_size):
+        chunk = params[i:i + chunk_size]
+        placeholders = ",".join(["%s"] * len(chunk))
+        sql = sql_template % placeholders
+        cur.execute(sql, chunk)
+        results.extend(cur.fetchall())
+    return results
+
+
+def match_phones(phones, conn):
+    if not phones:
+        return {}, [], []
+
+    # 清理手机号：移除空格、小数点等
+    def clean_phone(p):
+        p = str(p).strip()
+        # 处理浮点数格式: 13003632311.0 -> 13003632311
+        if '.' in p:
+            parts = p.split('.')
+            if parts[1] in ('0', '00'):
+                p = parts[0]
+        # 验证是11位手机号
+        if re.match(r'^1\d{10}$', p):
+            return p
+        return None
+
+    cleaned = [(clean_phone(p), p) for p in phones]
+    valid_phones = [(cp, op) for cp, op in cleaned if cp]
+    if not valid_phones:
+        return {}, [], []
+
+    masked_to_phones = {}
+    for cp, op in valid_phones:
+        m = mask_phone(cp)
+        masked_to_phones.setdefault(m, []).append(op)
+
+    masks = list(masked_to_phones.keys())
+    cur = conn.cursor()
+    placeholders = ",".join(["%s"] * len(masks))
+    cur.execute(f"""
+        SELECT id AS account_id, tel
+        FROM bi_vala_app_account
+        WHERE tel IN ({placeholders}) AND status = 1 AND deleted_at IS NULL
+    """, masks)
+    rows = cur.fetchall()
+    cur.close()
+
+    # masked → account_id
+    masked_to_account = {}
+    for aid, masked in rows:
+        if masked not in masked_to_account:
+            masked_to_account[masked] = aid
+
+    # original phone → account_id
+    phone_to_account = {}
+    for op in phones:
+        cp = clean_phone(op)
+        if cp:
+            m = mask_phone(cp)
+            if m in masked_to_account:
+                phone_to_account[op] = masked_to_account[m]
+
+    account_ids = list(set(phone_to_account.values()))
+    unmatched = [p for p in phones if p not in phone_to_account]
+    return phone_to_account, account_ids, unmatched
+
+
+def query_course_details(cur, account_ids):
+    if not account_ids:
+        return {}
+    rows = batch_query_in(cur, """
+        SELECT account_id, course_level,
+               CASE WHEN expire_time IS NULL THEN '体验课' ELSE '正式课' END AS course_type
+        FROM bi_user_course_detail
+        WHERE account_id IN (%s) AND deleted_at IS NULL
+    """, account_ids)
+    result = {}
+    for aid, cl, ct in rows:
+        # Normalize the key before the membership test so the first row per account wins
+        result.setdefault(int(aid), (cl, ct))
+    return result
+
+
+def query_chapter_play_records(cur, char_ids):
+    """Return each character's latest lesson record and total study time (ms)."""
+    if not char_ids:
+        return {}, {}
+
+    # Load the character IDs into a temp table, then UNION ALL the shard
+    # tables joined against it (the temp table is dropped on commit)
+    cur.execute("CREATE TEMP TABLE IF NOT EXISTS _tmp_char_ids (id int PRIMARY KEY) ON COMMIT DROP")
+    cur.execute("DELETE FROM _tmp_char_ids")
+    psycopg2.extras.execute_values(cur,
+        "INSERT INTO _tmp_char_ids (id) VALUES %s",
+        [(cid,) for cid in char_ids]
+    )
+
+    # 查询每个角色的最近行课记录
+    union_parts = []
+    for i in range(SHARD_COUNT):
+        union_parts.append(
+            f"SELECT cpr.user_id, cpr.chapter_id, cpr.chapter_unique_id, cpr.created_at, cpr.play_status "
+            f"FROM bi_user_chapter_play_record_{i} cpr "
+            f"JOIN _tmp_char_ids t ON cpr.user_id = t.id "
+            f"WHERE cpr.deleted_at IS NULL"
+        )
+
+    all_plays_sql = " UNION ALL ".join(union_parts)
+
+    cur.execute(f"""
+        WITH all_plays AS ({all_plays_sql})
+        , latest AS (
+            SELECT DISTINCT ON (user_id) user_id, chapter_id, chapter_unique_id, created_at AS latest_date
+            FROM all_plays
+            ORDER BY user_id, created_at DESC
+        )
+        SELECT user_id, chapter_id, chapter_unique_id, latest_date FROM latest
+    """)
+
+    char_latest = {}
+    chapter_ids = set()
+    for uid, ch_id, cu_id, ldate in cur.fetchall():
+        char_latest[uid] = (ch_id, ldate)
+        chapter_ids.add(ch_id)
+
+    # 查询每个角色所有 play_status=1 的 chapter_unique_id
+    cur.execute(f"""
+        WITH all_plays AS ({all_plays_sql})
+        SELECT user_id, chapter_unique_id
+        FROM all_plays
+        WHERE play_status = 1
+    """)
+
+    char_cuids = defaultdict(set)
+    all_cuids = set()
+    for uid, cuid in cur.fetchall():
+        char_cuids[uid].add(cuid)
+        all_cuids.add(cuid)
+
+    # 查询组件耗时
+    char_total_ms = defaultdict(int)
+    if all_cuids:
+        cuid_list = list(all_cuids)
+
+        # 创建临时表存 cuids
+        cur.execute("CREATE TEMP TABLE IF NOT EXISTS _tmp_cuids (cuid text PRIMARY KEY) ON COMMIT DROP")
+        cur.execute("DELETE FROM _tmp_cuids")
+        psycopg2.extras.execute_values(cur,
+            "INSERT INTO _tmp_cuids (cuid) VALUES %s",
+            [(c,) for c in cuid_list]
+        )
+
+        comp_parts = []
+        for i in range(SHARD_COUNT):
+            comp_parts.append(
+                f"SELECT comp.chapter_unique_id, SUM(comp.interval_time) AS total_ms "
+                f"FROM bi_user_component_play_record_{i} comp "
+                f"JOIN _tmp_cuids t ON comp.chapter_unique_id = t.cuid "
+                f"GROUP BY comp.chapter_unique_id"
+            )
+        comp_union = " UNION ALL ".join(comp_parts)
+
+        cur.execute(f"""
+            SELECT chapter_unique_id, SUM(total_ms) AS total_ms
+            FROM ({comp_union}) t2
+            GROUP BY chapter_unique_id
+        """)
+
+        cuid_time = {}
+        for cuid, tms in cur.fetchall():
+            cuid_time[cuid] = (tms or 0)
+
+        for uid, cuids in char_cuids.items():
+            total = sum(cuid_time.get(c, 0) for c in cuids)
+            char_total_ms[uid] = total
+
+    return char_latest, char_total_ms
+
+
+def query_chapter_course_map(cur, chapter_ids):
+    if not chapter_ids:
+        return {}
+    rows = batch_query_in(cur, """
+        SELECT id, course_level, course_season, course_unit, course_lesson
+        FROM bi_level_unit_lesson WHERE id IN (%s)
+    """, list(chapter_ids))
+    return {r[0]: (r[1], r[2], r[3], r[4]) for r in rows}
+
+
+def query_payment_status(cur, account_ids):
+    if not account_ids:
+        return {}
+    rows = batch_query_in(cur, """
+        SELECT account_id, MAX(CASE WHEN order_status IN (3,4) THEN 1 ELSE 0 END) AS has_order
+        FROM bi_vala_order WHERE account_id IN (%s) AND deleted_at IS NULL
+        GROUP BY account_id
+    """, account_ids)
+    status = {int(aid): "已付费" if ho == 1 else "未付费" for aid, ho in rows}
+    for aid in account_ids:
+        if int(aid) not in status:
+            status[int(aid)] = "未付费"
+    return status
+
+
+def query_characters(cur, account_ids):
+    if not account_ids:
+        return {}
+    rows = batch_query_in(cur, """
+        SELECT account_id, id AS character_id
+        FROM bi_vala_app_character
+        WHERE account_id IN (%s) AND nickname IS NOT NULL AND nickname != '' AND deleted_at IS NULL
+    """, account_ids)
+    result = defaultdict(list)
+    for aid, cid in rows:
+        result[int(aid)].append(int(cid))
+    return result
+
+
+def main():
+    print("加载待查询行...")
+    with open("/tmp/pending_rows.json") as f:
+        pending_rows = json.load(f)
+    print(f"共 {len(pending_rows)} 条待查询")
+
+    with_id = []
+    without_id = []
+    for row in pending_rows:
+        uid_str = str(row.get("user_id") or "").strip()  # str(): JSON cells may hold numbers, not only strings
+        if uid_str:
+            try:
+                row["account_id"] = int(float(uid_str))
+                with_id.append(row)
+            except (ValueError, TypeError):
+                phone = str(row.get("phone") or "").strip()
+                if phone:
+                    row["phone_clean"] = phone
+                    without_id.append(row)
+        else:
+            phone = str(row.get("phone") or "").strip()
+            if phone:
+                row["phone_clean"] = phone
+                without_id.append(row)
+
+    print(f"有用户ID: {len(with_id)}, 需手机号匹配: {len(without_id)}")
+
+    conn = get_conn()
+    cur = conn.cursor()
+
+    # Step 1: 手机号匹配
+    phone_to_account = {}
+    if without_id:
+        phones = [r["phone_clean"] for r in without_id]
+        phone_to_account, matched_ids, unmatched = match_phones(phones, conn)
+        print(f"手机号匹配: {len(phone_to_account)} 成功, {len(unmatched)} 未匹配")
+
+        for row in without_id:
+            p = row["phone_clean"]
+            if p in phone_to_account:
+                row["account_id"] = int(phone_to_account[p])
+                with_id.append(row)
+
+    print(f"总计可查询用户: {len(with_id)}")
+
+    all_account_ids = list(set(r["account_id"] for r in with_id))
+    print(f"去重 account_id: {len(all_account_ids)}")
+
+    if not all_account_ids:
+        print("没有可查询的用户，退出")
+        conn.close()
+        return
+
+    # Step 2: 查询课程分配
+    print("查询课程分配...")
+    course_details = query_course_details(cur, all_account_ids)
+    print(f"  有课程: {len(course_details)}")
+
+    # Step 3: 查询角色
+    print("查询角色...")
+    account_chars = query_characters(cur, all_account_ids)
+    all_char_ids = []
+    for chars in account_chars.values():
+        all_char_ids.extend(chars)
+    print(f"  角色数: {len(all_char_ids)}")
+
+    # Step 4: 查询行课记录
+    print("查询行课记录 (可能需要几分钟)...")
+    char_latest, char_total_ms = query_chapter_play_records(cur, all_char_ids)
+    print(f"  有行课记录: {len(char_latest)}")
+
+    # Step 5: chapter 映射
+    all_chapter_ids = set()
+    for ch_id, _ in char_latest.values():
+        all_chapter_ids.add(ch_id)
+    print("查询课程映射...")
+    course_chapter_map = query_chapter_course_map(cur, all_chapter_ids)
+    print(f"  chapter: {len(course_chapter_map)}")
+
+    # Step 6: 付费状态
+    print("查询付费状态...")
+    pay_status = query_payment_status(cur, all_account_ids)
+    print(f"  付费: {sum(1 for v in pay_status.values() if v == '已付费')}")
+
+    conn.close()
+
+    # Step 7: 组装结果
+    print("组装结果...")
+    results = []
+
+    # 按 row 去重，同一用户多行合并
+    row_aid_map = {}
+    for row in with_id:
+        rn = row["row"]
+        aid = row["account_id"]
+        sales = row.get("sales", "").strip()
+        if rn not in row_aid_map:
+            row_aid_map[rn] = (aid, sales)
+
+    for row in with_id:
+        rn = row["row"]
+        aid = row["account_id"]
+        sales = row.get("sales", "").strip()
+
+        # 课程描述
+        cd = course_details.get(aid)
+        if cd:
+            cl, ct = cd
+            level_display = COURSE_LEVEL_MAP.get(cl, cl)
+            course_desc = f"{level_display}{ct}"
+        else:
+            course_desc = "暂无课程"
+
+        # 最近行课和当前章节
+        chars = account_chars.get(aid, [])
+        latest_date = None
+        chapter_pos = ""
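+        for cid in chars:
+            ch_id, ldate = char_latest.get(cid, (None, None))
+            # Fix: take the chapter from the same record that supplies the newest
+            # date; previously the last character iterated overwrote chapter_pos.
+            if ldate and (latest_date is None or ldate > latest_date):
+                latest_date = ldate
+                pos = course_chapter_map.get(ch_id)
+                # Same "level-season-unit-lesson" layout as before; empty if unmapped
+                chapter_pos = "-".join(str(p) for p in pos) if pos else ""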
+
+        if not chapter_pos:
+            chapter_pos = "无记录"
+        date_str = latest_date.strftime("%Y-%m-%d") if latest_date else "无记录"
+
+        # 学习时长
+        total_min = 0
+        for cid in chars:
+            total_min += (char_total_ms.get(cid, 0) or 0)
+        total_min = round(total_min / 1000 / 60)
+
+        # 付费状态
+        pay = pay_status.get(aid, "未付费")
+
+        result_text = f"销售:{sales} | 用户:{aid} | 当前:{course_desc}-{chapter_pos} | 最近行课:{date_str} | 学习{total_min}min | {pay}"
+
+        results.append({
+            "row": rn,
+            "result": result_text,
+            "aid": aid
+        })
+
+    # 未匹配的
+    for row in without_id:
+        p = row.get("phone_clean", "")
+        if p not in phone_to_account:
+            results.append({
+                "row": row["row"],
+                "result": f"未匹配到用户",
+                "aid": 0
+            })
+
+    # 输出
+    with open("/tmp/results.jsonl", "w") as f:
+        for r in results:
+            f.write(json.dumps(r, ensure_ascii=False) + "\n")
+
+    print(f"\n✅ 查询完成: {len(results)} 条结果")
+    unrecorded = sum(1 for r in results if "无记录" in r.get("result","") and "未匹配" not in r.get("result",""))
+    unmatched_count = sum(1 for r in results if "未匹配" in r.get("result",""))
+    paid_count = sum(1 for r in results if "已付费" in r.get("result",""))
+    print(f"   有记录: {len(results) - unrecorded - unmatched_count}")
+    print(f"   无记录: {unrecorded}")
+    print(f"   未匹配: {unmatched_count}")
+    print(f"   已付费: {paid_count}")
+
+
+if __name__ == "__main__":
+    main()
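
The phone handling in `match_phones` can be exercised on its own. The sketch below restates the cleaning and masking logic as standalone functions so that the edge cases are easy to test; it mirrors the script's behavior rather than replacing it.

```python
import re


def clean_phone(raw):
    """Normalize a spreadsheet cell to an 11-digit mobile number, or None."""
    text = str(raw).strip()
    # Spreadsheet exports may turn numbers into floats: "13003632311.0"
    head, dot, tail = text.partition(".")
    if dot and tail in ("0", "00"):
        text = head
    return text if re.fullmatch(r"1\d{10}", text) else None


def mask_phone(phone):
    """Keep the first 3 and last 4 digits, as in the masked tel column."""
    return f"{phone[:3]}****{phone[-4:]}"


for raw in ["13003632311.0", " 13003632311 ", "12345", 13003632311.0]:
    cleaned = clean_phone(raw)
    print(repr(raw), "->", mask_phone(cleaned) if cleaned else None)
```

Because matching is done on the masked value, two different numbers that share the first three and last four digits map to the same mask, so a match identifies a candidate account rather than a guaranteed one.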
--- a/scripts/check_account_userid.py
+++ b/scripts/check_account_userid.py
@@ -0,0 +1,91 @@
+import json, os, subprocess
+
+ES_URL = "https://es-7vd7jcu9.public.tencentelasticsearch.com:9200"
+AUTH = os.environ["ES_AUTH"]  # "user:password"; ES_AUTH is an assumed variable name, keeps credentials out of the repo
+
+def count_query(must_clauses):
+    q = {"query": {"bool": {"must": must_clauses}}}
+    cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
+           f"{ES_URL}/growth_activity_behavior/_count", "-d", json.dumps(q)]
+    r = subprocess.run(cmd, capture_output=True, text=True)
+    return json.loads(r.stdout).get("count", 0)
+
+total = count_query([])
+
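+# Four combinations. A bare must_not also matches documents that lack the field, so the "non-zero" filters require the field to exist.
+nonzero_account = {"bool": {"must": [{"exists": {"field": "accountId"}}], "must_not": [{"term": {"accountId": 0}}]}}
+nonzero_user = {"bool": {"must": [{"exists": {"field": "userId"}}], "must_not": [{"term": {"userId": 0}}]}}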
+zero_account = {"term": {"accountId": 0}}
+zero_user = {"term": {"userId": 0}}
+
+a0_u0 = count_query([zero_account, zero_user])
+a0_ux = count_query([zero_account, nonzero_user])
+ax_u0 = count_query([nonzero_account, zero_user])
+ax_ux = count_query([nonzero_account, nonzero_user])
+
+print("=== growth_activity_behavior 中 accountId x userId 组合 ===\n")
+print(f"{'accountId':>12} {'userId':>12} {'条数':>10} {'占比':>10}")
+print("-" * 48)
+print(f"{'=0':>12} {'=0':>12} {a0_u0:>10} {a0_u0/total*100:>9.1f}%")
+print(f"{'=0':>12} {'>0':>12} {a0_ux:>10} {a0_ux/total*100:>9.1f}%")
+print(f"{'>0':>12} {'=0':>12} {ax_u0:>10} {ax_u0/total*100:>9.1f}%")
+print(f"{'>0':>12} {'>0':>12} {ax_ux:>10} {ax_ux/total*100:>9.1f}%")
+print(f"{'总计':>12} {'':>12} {total:>10}")
+
+# 按 subBehavior 细分 - ES aggregation
+print("\n=== 按 behavior + subBehavior 细分 ===\n")
+agg_query = {
+    "size": 0,
+    "aggs": {
+        "behaviors": {
+            "terms": {"field": "behavior", "size": 10},
+            "aggs": {
+                "subs": {
+                    "terms": {"field": "subBehavior", "size": 10},
+                    "aggs": {
+                        "a0_u0": {"filter": {"bool": {"must": [zero_account, zero_user]}}},
+                        "a0_ux": {"filter": {"bool": {"must": [zero_account, nonzero_user]}}},
+                        "ax_u0": {"filter": {"bool": {"must": [nonzero_account, zero_user]}}},
+                        "ax_ux": {"filter": {"bool": {"must": [nonzero_account, nonzero_user]}}}
+                    }
+                }
+            }
+        }
+    }
+}
+
+cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
+       f"{ES_URL}/growth_activity_behavior/_search", "-d", json.dumps(agg_query)]
+r = subprocess.run(cmd, capture_output=True, text=True)
+data = json.loads(r.stdout)
+
+header = f"{'behavior / subBehavior':<40} {'总计':>8} {'a=0,u=0':>8} {'a=0,u>0':>8} {'a>0,u=0':>8} {'a>0,u>0':>8}"
+print(header)
+print("-" * len(header))
+
+for bb in data["aggregations"]["behaviors"]["buckets"]:
+    bname = bb["key"]
+    for sb in bb["subs"]["buckets"]:
+        sname = sb["key"]
+        n = sb["doc_count"]
+        a0u0 = sb["a0_u0"]["doc_count"]
+        a0ux = sb["a0_ux"]["doc_count"]
+        axu0 = sb["ax_u0"]["doc_count"]
+        axux = sb["ax_ux"]["doc_count"]
+        label = f"{bname} / {sname}"
+        print(f"{label:<40} {n:>8} {a0u0:>8} {a0ux:>8} {axu0:>8} {axux:>8}")
+
+# Records where the userId field is missing
+print("\n=== Records with no userId field ===")
+u_nonexist = count_query([{"bool": {"must_not": [{"exists": {"field": "userId"}}]}}])
+print(f"userId field missing: {u_nonexist}")
+
+# Sample a few records with userId=0
+cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
+       f"{ES_URL}/growth_activity_behavior/_search?size=3",
+       "-d", '{"query":{"term":{"userId":0}},"_source":true}']
+r = subprocess.run(cmd, capture_output=True, text=True)
+data = json.loads(r.stdout)
+print(f"\nuserId=0 记录数: {data['hits']['total']['value']}")
+for h in data['hits']['hits']:
+    print(f"  {json.dumps(h['_source'], ensure_ascii=False)}")
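
Review sketch: the nested `behaviors` → `subs` → filter response handled above can be flattened by a pure function, which would make the table logic checkable without a live cluster. The function name `flatten_combo_buckets` and the sample response are hypothetical; only the bucket keys (`behaviors`, `subs`, `a0_u0`, `a0_ux`, `ax_u0`, `ax_ux`) are taken from the script.

```python
def flatten_combo_buckets(response):
    """Flatten the behaviors -> subs aggregation into one dict per (behavior, subBehavior)."""
    combo_keys = ("a0_u0", "a0_ux", "ax_u0", "ax_ux")
    rows = []
    for bb in response["aggregations"]["behaviors"]["buckets"]:
        for sb in bb["subs"]["buckets"]:
            row = {
                "behavior": bb["key"],
                "subBehavior": sb["key"],
                "total": sb["doc_count"],
            }
            for key in combo_keys:
                row[key] = sb[key]["doc_count"]
            # A non-zero remainder means some documents matched none of the
            # four filters (which documents depends on how the filters are defined).
            row["unmatched"] = row["total"] - sum(row[k] for k in combo_keys)
            rows.append(row)
    return rows


# Made-up response with the same shape as an Elasticsearch aggregation result
sample = {
    "aggregations": {
        "behaviors": {
            "buckets": [
                {
                    "key": "view",
                    "doc_count": 10,
                    "subs": {
                        "buckets": [
                            {
                                "key": "h5_expose",
                                "doc_count": 10,
                                "a0_u0": {"doc_count": 1},
                                "a0_ux": {"doc_count": 5},
                                "ax_u0": {"doc_count": 0},
                                "ax_ux": {"doc_count": 3},
                            }
                        ]
                    },
                }
            ]
        }
    }
}

rows = flatten_combo_buckets(sample)
print(rows[0])
```

The `unmatched` column is an addition for illustration: it surfaces rows where the four combinations do not add up to the bucket total, which the printed table would otherwise hide.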
--- a/scripts/endor_purchase_analysis.py
+++ b/scripts/endor_purchase_analysis.py
@ -0,0 +1,437 @@
+#!/usr/bin/env python3
+"""
+In-app purchaser behavior analysis: the relationship between registration date,
+prologue completion date, and course purchase date
+"""
+import psycopg2
+from datetime import datetime, date
+from collections import defaultdict, Counter
+import os
+
+PG_PASS = os.environ["PG_ONLINE_PASSWORD"]  # required; no password fallback is kept in source
+conn = psycopg2.connect(
+    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
+    port=28591,
+    user="ai_member",
+    password=PG_PASS,
+    database="vala_bi"
+)
+cur = conn.cursor()
+
+# chapter_id values of the U0 prologue lessons
+L1_U0 = [343, 344, 345, 346, 348]
+L2_U0 = [55, 56, 57, 58, 59]
+
+print("=" * 80)
+print("端内购买用户行为分析")
+print("=" * 80)
+
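+# ========== Step 1: in-app paying users ==========
+print("\n[Step 1] Querying in-app paying users...")
+# DISTINCT ON (account_id) with this ORDER BY keeps each account's earliest paid
+# order; accounts whose earliest order is not an L1/L2 product are skipped below.
+cur.execute("""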
+    SELECT DISTINCT ON (o.account_id)
+        o.account_id,
+        o.goods_id,
+        o.pay_success_date as purchase_date,
+        CASE 
+            WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
+            WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
+            ELSE 'other'
+        END as buy_level
+    FROM bi_vala_order o
+    JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
+    WHERE o.key_from = 'app-active-h5-0-0'
+      AND o.order_status IN (3, 4)
+      AND o.pay_success_date IS NOT NULL
+    ORDER BY o.account_id, o.pay_success_date
+""")
+endo_users = {}
+for row in cur.fetchall():
+    account_id, goods_id, purchase_date, buy_level = row
+    if buy_level == 'other':
+        continue
+    endo_users[account_id] = {
+        'goods_id': goods_id,
+        'purchase_date': purchase_date,
+        'buy_level': buy_level
+    }
+
+print(f"  端内付费用户数: {len(endo_users)}")
+l1_users = {k: v for k, v in endo_users.items() if v['buy_level'] == 'L1'}
+l2_users = {k: v for k, v in endo_users.items() if v['buy_level'] == 'L2'}
+print(f"  - L1 购买用户: {len(l1_users)}")
+print(f"  - L2 购买用户: {len(l2_users)}")
+
+# ========== Step 2: registration time ==========
+print("\n[Step 2] Querying user registration times...")
+all_ids = list(endo_users.keys())
+batch_size = 500
+register_map = {}
+for i in range(0, len(all_ids), batch_size):
+    batch = all_ids[i:i+batch_size]
+    cur.execute(
+        "SELECT id, created_at FROM bi_vala_app_account WHERE id IN %s AND status = 1",
+        (tuple(batch),)
+    )
+    for row in cur.fetchall():
+        register_map[row[0]] = row[1]
+
+print(f"  找到注册时间的用户: {len(register_map)}")
+
+# ========== Step 3: account_id -> user_id (character) mapping ==========
+print("\n[Step 3] Querying account-to-character mapping...")
+char_map = {}  # account_id -> [user_id, ...]
+for i in range(0, len(all_ids), batch_size):
+    batch = all_ids[i:i+batch_size]
+    cur.execute(
+        "SELECT account_id, id FROM bi_vala_app_character WHERE account_id IN %s",
+        (tuple(batch),)
+    )
+    for row in cur.fetchall():
+        acct = row[0]
+        uid = row[1]
+        if acct not in char_map:
+            char_map[acct] = []
+        char_map[acct].append(uid)
+
+print(f"  有角色的用户数: {len(char_map)}")
+
+# ========== Step 4: prologue completion time ==========
+print("\n[Step 4] Querying prologue completion times (8 sharded tables)...")
+
+# Collect all user_ids, grouped by purchased level
+l1_user_ids = set()
+l2_user_ids = set()
+for acct in l1_users:
+    if acct in char_map:
+        for uid in char_map[acct]:
+            l1_user_ids.add(uid)
+for acct in l2_users:
+    if acct in char_map:
+        for uid in char_map[acct]:
+            l2_user_ids.add(uid)
+
+print(f"  L1 相关 user_id 数: {len(l1_user_ids)}")
+print(f"  L2 相关 user_id 数: {len(l2_user_ids)}")
+
+def query_prologue_completion(cur, user_ids, chapter_ids, table_prefix="bi_user_chapter_play_record"):
+    """Return {user_id: earliest_date}, the earliest completion of any prologue chapter."""
+    result = {}
+    if not user_ids:
+        return result
+    
+    # 分表查询
+    union_parts = []
+    for t in range(8):
+        union_parts.append(f"""
+            SELECT user_id, MIN(created_at) as first_done
+            FROM {table_prefix}_{t}
+            WHERE chapter_id IN ({','.join(map(str, chapter_ids))})
+              AND play_status = 1
+              AND user_id IN %(user_ids)s
+            GROUP BY user_id
+        """)
+    
+    sql = " UNION ALL ".join(union_parts)
+    sql = f"""
+        SELECT user_id, MIN(first_done) as earliest
+        FROM ({sql}) sub
+        GROUP BY user_id
+    """
+    
+    batch_size_uid = 2000
+    uid_list = list(user_ids)
+    for i in range(0, len(uid_list), batch_size_uid):
+        batch = tuple(uid_list[i:i+batch_size_uid])
+        cur.execute(sql, {'user_ids': batch})
+        for row in cur.fetchall():
+            result[row[0]] = row[1]
+    
+    return result
+
+# L1 序章
+l1_prologue = query_prologue_completion(cur, l1_user_ids, L1_U0)
+print(f"  L1 序章有完课记录的 user_id: {len(l1_prologue)}")
+
+# L2 序章
+l2_prologue = query_prologue_completion(cur, l2_user_ids, L2_U0)
+print(f"  L2 序章有完课记录的 user_id: {len(l2_prologue)}")
+
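+# ========== Step 5: join and analyze ==========
+print("\n[Step 5] Joining data...")
+
+# For each in-app purchaser, find the earliest prologue completion across
+# all of the account's characters, for the level that was purchased
+def get_earliest_prologue(account_id, buy_level):
+    """Return the earliest prologue completion time for the purchased level, or None."""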
+    if account_id not in char_map:
+        return None
+    user_ids = char_map[account_id]
+    
+    earliest = None
+    prologue_map = l1_prologue if buy_level == 'L1' else l2_prologue
+    for uid in user_ids:
+        if uid in prologue_map:
+            d = prologue_map[uid]
+            if earliest is None or d < earliest:
+                earliest = d
+    return earliest
+
+# 构建分析数据
+records = []
+for account_id, info in endo_users.items():
+    register_date = register_map.get(account_id)
+    purchase_date = info['purchase_date']
+    buy_level = info['buy_level']
+    
+    if register_date is None:
+        continue
+    
+    prologue_date = get_earliest_prologue(account_id, buy_level)
+    
+    # Normalize to date objects before computing day differences
+    # (datetime is a subclass of date, so only datetime needs converting)
+    reg_d = register_date.date() if isinstance(register_date, datetime) else register_date
+    pur_d = purchase_date.date() if isinstance(purchase_date, datetime) else purchase_date
+    
+    if prologue_date:
+        if isinstance(prologue_date, datetime):
+            pro_d = prologue_date.date()
+        else:
+            pro_d = prologue_date
+        
+        reg_to_pro = (pro_d - reg_d).days
+        pro_to_pur = (pur_d - pro_d).days
+        reg_to_pur = (pur_d - reg_d).days
+        has_prologue = True
+    else:
+        reg_to_pro = None
+        pro_to_pur = None
+        reg_to_pur = (pur_d - reg_d).days
+        has_prologue = False
+    
+    records.append({
+        'account_id': account_id,
+        'level': buy_level,
+        'register_date': reg_d,
+        'prologue_date': pro_d if has_prologue else None,
+        'purchase_date': pur_d,
+        'reg_to_pro': reg_to_pro,
+        'pro_to_pur': pro_to_pur,
+        'reg_to_pur': reg_to_pur,
+        'has_prologue': has_prologue
+    })
+
+# ========== Step 6: 统计输出 ==========
+print(f"\n总分析用户数: {len(records)}")
+
+# --- 总体统计 ---
+with_prologue = [r for r in records if r['has_prologue']]
+without_prologue = [r for r in records if not r['has_prologue']]
+
+print(f"\n{'='*80}")
+print("总体概览")
+print(f"{'='*80}")
+print(f"端内付费用户: {len(records)}")
+print(f"  有L1序章完课记录: {len([r for r in with_prologue if r['level']=='L1'])}")
+print(f"  有L2序章完课记录: {len([r for r in with_prologue if r['level']=='L2'])}")
+print(f"  无对应序章完课记录: {len(without_prologue)}")
+
+# --- 天数分布 ---
+def percentile(sorted_vals, p):
+    """Return the p-th percentile of an ascending list by index truncation (no interpolation)."""
+    if not sorted_vals:
+        return None
+    idx = int(len(sorted_vals) * p / 100)
+    return sorted_vals[min(idx, len(sorted_vals)-1)]
+
+def print_stats(name, vals, unit='天'):
+    if not vals:
+        print(f"  {name}: 无数据")
+        return
+    s = sorted(vals)
+    print(f"  {name}:")
+    print(f"    样本数: {len(s)}")
+    print(f"    中位数: {percentile(s, 50):.1f}{unit}")
+    print(f"    平均值: {sum(s)/len(s):.1f}{unit}")
+    print(f"    P25: {percentile(s, 25):.1f}{unit}")
+    print(f"    P75: {percentile(s, 75):.1f}{unit}")
+    print(f"    P90: {percentile(s, 90):.1f}{unit}")
+
+print(f"\n{'='*80}")
+print("全量统计：注册→购课 天数")
+print(f"{'='*80}")
+print_stats("全量 注册→购课", [r['reg_to_pur'] for r in records])
+
+print(f"\n{'='*80}")
+print("有序章完课用户统计")
+print(f"{'='*80}")
+print_stats("序章完课 注册→完课 天数", [r['reg_to_pro'] for r in with_prologue])
+print_stats("序章完课 完课→购课 天数", [r['pro_to_pur'] for r in with_prologue])
+print_stats("序章完课 注册→购课 天数", [r['reg_to_pur'] for r in with_prologue])
+
+# --- 按等级拆分 ---
+for level in ['L1', 'L2']:
+    lvl = [r for r in records if r['level'] == level]
+    lvl_pro = [r for r in lvl if r['has_prologue']]
+    lvl_no = [r for r in lvl if not r['has_prologue']]
+    
+    print(f"\n{'='*80}")
+    print(f"{level} 购买用户统计")
+    print(f"{'='*80}")
+    print(f"  {level} 总用户: {len(lvl)}")
+    print(f"  有{level}序章完课: {len(lvl_pro)}")
+    print(f"  无{level}序章完课: {len(lvl_no)}")
+    
+    print()
+    print_stats(f"{level} all users: registration → purchase", [r['reg_to_pur'] for r in lvl])
+    
+    if lvl_pro:
+        print_stats(f"  {level} prologue completers: registration → completion", [r['reg_to_pro'] for r in lvl_pro])
+        print_stats(f"  {level} prologue completers: completion → purchase", [r['pro_to_pur'] for r in lvl_pro])
+        print_stats(f"  {level} prologue completers: registration → purchase", [r['reg_to_pur'] for r in lvl_pro])
+
+# --- 完课→购课的时间段分布 ---
+print(f"\n{'='*80}")
+print("有完课用户：完课→购课 时间段分布")
+print(f"{'='*80}")
+
+buckets = {
+    '0天（当天购课）': (0, 0),
+    '1-3天': (1, 3),
+    '4-7天': (4, 7),
+    '8-14天': (8, 14),
+    '15-30天': (15, 30),
+    '31-60天': (31, 60),
+    '61-90天': (61, 90),
+    '90天以上': (91, 99999),
+    '负数（购课在完课前）': (-99999, -1),
+}
+
+for level in ['L1', 'L2', 'ALL']:
+    if level == 'ALL':
+        data = with_prologue
+    else:
+        data = [r for r in with_prologue if r['level'] == level]
+    
+    if not data:
+        continue
+    
+    vals = [r['pro_to_pur'] for r in data]
+    print(f"\n[{level}] 完课→购课 天数分布:")
+    for label, (lo, hi) in buckets.items():
+        cnt = sum(1 for v in vals if lo <= v <= hi)
+        pct = cnt / len(vals) * 100
+        bar = '█' * int(pct / 2)
+        print(f"  {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
+
+# --- 注册→完课时间段分布 ---
+print(f"\n{'='*80}")
+print("有完课用户：注册→完课 时间段分布")
+print(f"{'='*80}")
+
+for level in ['L1', 'L2', 'ALL']:
+    if level == 'ALL':
+        data = with_prologue
+    else:
+        data = [r for r in with_prologue if r['level'] == level]
+    
+    if not data:
+        continue
+    
+    vals = [r['reg_to_pro'] for r in data]
+    print(f"\n[{level}] 注册→完课 天数分布:")
+    for label, (lo, hi) in buckets.items():
+        if hi < 0:
+            continue
+        cnt = sum(1 for v in vals if lo <= v <= hi)
+        pct = cnt / len(vals) * 100
+        bar = '█' * int(pct / 2)
+        print(f"  {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
+
+# --- 注册→购课 时间段分布 ---
+print(f"\n{'='*80}")
+print("注册→购课 时间段分布")
+print(f"{'='*80}")
+
+for level in ['L1', 'L2', 'ALL']:
+    if level == 'ALL':
+        data = records
+    else:
+        data = [r for r in records if r['level'] == level]
+    
+    if not data:
+        continue
+    
+    vals = [r['reg_to_pur'] for r in data]
+    print(f"\n[{level}] 注册→购课 天数分布:")
+    extended_buckets = {
+        '0天（当天购课）': (0, 0),
+        '1天': (1, 1),
+        '2天': (2, 2),
+        '3天': (3, 3),
+        '4-7天': (4, 7),
+        '8-14天': (8, 14),
+        '15-30天': (15, 30),
+        '31-60天': (31, 60),
+        '61-90天': (61, 90),
+        '91-180天': (91, 180),
+        '180天以上': (181, 99999),
+    }
+    for label, (lo, hi) in extended_buckets.items():
+        cnt = sum(1 for v in vals if lo <= v <= hi)
+        pct = cnt / len(vals) * 100
+        bar = '█' * int(pct / 2)
+        print(f"  {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
+
+# --- 完课在购课之前 vs 之后 ---
+print(f"\n{'='*80}")
+print("完课 vs 购课 时间关系")
+print(f"{'='*80}")
+
+for level in ['L1', 'L2', 'ALL']:
+    if level == 'ALL':
+        data = with_prologue
+    else:
+        data = [r for r in with_prologue if r['level'] == level]
+    
+    if not data:
+        continue
+    
+    pro_before_pur = [r for r in data if r['pro_to_pur'] >= 0]
+    pro_after_pur = [r for r in data if r['pro_to_pur'] < 0]
+    pro_same_day = [r for r in data if r['pro_to_pur'] == 0]
+    
+    print(f"\n[{level}]:")
+    print(f"  完课在购课之前或同一天: {len(pro_before_pur)} ({len(pro_before_pur)/len(data)*100:.1f}%)")
+    print(f"    其中同一天: {len(pro_same_day)} ({len(pro_same_day)/len(data)*100:.1f}%)")
+    print(f"  完课在购课之后: {len(pro_after_pur)} ({len(pro_after_pur)/len(data)*100:.1f}%)")
+
+# ========== 导出详细CSV ==========
+print(f"\n{'='*80}")
+print("导出详细数据...")
+print(f"{'='*80}")
+
+import csv
+output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis.csv'
+os.makedirs(os.path.dirname(output_path), exist_ok=True)  # make sure the output directory exists
+with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
+    writer = csv.writer(f)
+    writer.writerow(['account_id', '等级', '注册日期', '序章完课日期', '购课日期', 
+                     '注册→完课(天)', '完课→购课(天)', '注册→购课(天)', '有序章完课'])
+    for r in sorted(records, key=lambda x: x['purchase_date']):
+        writer.writerow([
+            r['account_id'], r['level'], r['register_date'], r['prologue_date'], r['purchase_date'],
+            r['reg_to_pro'] if r['reg_to_pro'] is not None else '',
+            r['pro_to_pur'] if r['pro_to_pur'] is not None else '',
+            r['reg_to_pur'], '是' if r['has_prologue'] else '否'
+        ])
+
+print(f"  已导出到: {output_path}")
+
+cur.close()
+conn.close()
+print("\n分析完成!")
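
Review sketch: the script above normalizes `datetime` and `date` values in several places before subtracting them. A single helper could carry that rule. This is a minimal sketch under assumptions, not the script's own code: the names `to_date` and `days_between` are hypothetical, and the sample dates are made up.

```python
from datetime import date, datetime


def to_date(value):
    """Return a plain date for datetime input; pass date and None through unchanged."""
    # datetime is a subclass of date, and only datetime objects have .date(),
    # so the check must be against datetime specifically.
    if isinstance(value, datetime):
        return value.date()
    return value


def days_between(start, end):
    """Whole-day difference end - start, or None when either side is missing."""
    start_d, end_d = to_date(start), to_date(end)
    if start_d is None or end_d is None:
        return None
    return (end_d - start_d).days


registered = datetime(2026, 5, 1, 23, 30)  # made-up registration timestamp
purchased = date(2026, 5, 4)               # made-up purchase date

print(days_between(registered, purchased))  # 3
print(days_between(registered, None))       # None
```

Comparing at date granularity means a late-evening registration and a purchase early the next morning count as one day apart, which matches how the script computes its day differences.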
--- a/scripts/endor_purchase_analysis_v2.py
+++ b/scripts/endor_purchase_analysis_v2.py
@ -0,0 +1,500 @@
+#!/usr/bin/env python3
+"""
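+In-app purchaser behavior analysis, V2
+Metric definition changes:
+  1. Order-level granularity: a user who bought both L1 and L2 yields two records
+     (orders for the combined L1+L2 product, goods_id 61, are excluded)
+  2. Prologue completion = all five U0 lessons (L01-L05) completed; the date used
+     is that of the last lesson completed
+  3. New: how many prologue lessons were completed at the time of purchase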
+"""
+import psycopg2
+from datetime import datetime, date
+from collections import defaultdict, Counter
+import os
+import csv
+
+PG_PASS = os.environ["PG_ONLINE_PASSWORD"]  # required; no password fallback is kept in source
+conn = psycopg2.connect(
+    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
+    port=28591,
+    user="ai_member",
+    password=PG_PASS,
+    database="vala_bi"
+)
+cur = conn.cursor()
+
+# U0 序章 chapter_id -> lesson_name
+L1_U0_LESSONS = {343: 'L01', 344: 'L02', 345: 'L03', 346: 'L04', 348: 'L05'}
+L2_U0_LESSONS = {55: 'L01', 56: 'L02', 57: 'L03', 58: 'L04', 59: 'L05'}
+
+LEVEL_MAP = {
+    57: 'L1', 60: 'L1', 63: 'L1',
+    31: 'L2', 32: 'L2', 33: 'L2', 54: 'L2',
+    61: 'L1+L2',
+}
+
+print("=" * 80)
+print("端内购买用户行为分析 V2")
+print("口径：序章完课 = U0全部五节完成 | 订单维度 | 购课时进度分析")
+print("=" * 80)
+
+# ========== Step 1: 端内付费订单（按订单维度） ==========
+print("\n[Step 1] 查询端内付费订单...")
+cur.execute("""
+    SELECT 
+        o.id as order_id,
+        o.account_id,
+        o.goods_id,
+        o.pay_success_date,
+        o.trade_no
+    FROM bi_vala_order o
+    JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
+    WHERE o.key_from = 'app-active-h5-0-0'
+      AND o.order_status IN (3, 4)
+      AND o.pay_success_date IS NOT NULL
+    ORDER BY o.pay_success_date
+""")
+
+orders = []  # list of {order_id, account_id, goods_id, purchase_date, level}
+for row in cur.fetchall():
+    order_id, account_id, goods_id, purchase_date, trade_no = row
+    level = LEVEL_MAP.get(goods_id)
+    # Skip unmapped products and the combined L1+L2 bundle
+    if level is None or level == 'L1+L2':
+        continue
+    if isinstance(purchase_date, datetime):
+        purchase_date = purchase_date.date()
+    orders.append({
+        'order_id': order_id,
+        'account_id': account_id,
+        'goods_id': goods_id,
+        'purchase_date': purchase_date,
+        'level': level
+    })
+
+print(f"  端内付费订单数: {len(orders)}")
+print(f"  - L1 订单: {sum(1 for o in orders if o['level']=='L1')}")
+print(f"  - L2 订单: {sum(1 for o in orders if o['level']=='L2')}")
+
+# Count accounts that bought both L1 and L2 (as separate orders)
+user_levels = defaultdict(set)
+for o in orders:
+    user_levels[o['account_id']].add(o['level'])
+dual_users = [uid for uid, lvls in user_levels.items() if len(lvls) > 1]
+print(f"  同时购买L1+L2的用户: {len(dual_users)}")
+
+# ========== Step 2: 注册时间 ==========
+print("\n[Step 2] 查询用户注册时间...")
+all_accounts = list(set(o['account_id'] for o in orders))
+register_map = {}
+batch_size = 500
+for i in range(0, len(all_accounts), batch_size):
+    batch = all_accounts[i:i+batch_size]
+    cur.execute(
+        "SELECT id, created_at FROM bi_vala_app_account WHERE id IN %s",
+        (tuple(batch),)
+    )
+    for row in cur.fetchall():
+        d = row[1]
+        # Only datetime has .date(); calling it on a plain date raises AttributeError
+        register_map[row[0]] = d.date() if isinstance(d, datetime) else d
+
+print(f"  找到注册时间: {len(register_map)}")
+
+# ========== Step 3: account_id -> user_id 映射 ==========
+print("\n[Step 3] 查询用户角色映射...")
+char_map = defaultdict(list)
+for i in range(0, len(all_accounts), batch_size):
+    batch = all_accounts[i:i+batch_size]
+    cur.execute(
+        "SELECT account_id, id FROM bi_vala_app_character WHERE account_id IN %s",
+        (tuple(batch),)
+    )
+    for row in cur.fetchall():
+        char_map[row[0]].append(row[1])
+
+print(f"  有角色的用户: {sum(1 for uid, chars in char_map.items() if chars)}")
+
+# ========== Step 4: 序章每节课的最早完成时间 ==========
+print("\n[Step 4] 查询序章每节课完成时间（8张分表）...")
+
+def query_lesson_completion(cur, user_ids, chapter_ids_dict):
+    """
+    Query the earliest completion time of each chapter_id for each user_id.
+    Returns: {user_id: {chapter_id: earliest_date}}
+    """
+    result = defaultdict(dict)
+    if not user_ids:
+        return result
+    
+    # 先按user_ids批量查询
+    all_chapter_ids = list(chapter_ids_dict.keys())
+    
+    union_parts = []
+    for t in range(8):
+        union_parts.append(f"""
+            SELECT user_id, chapter_id, MIN(created_at) as first_done
+            FROM bi_user_chapter_play_record_{t}
+            WHERE chapter_id IN ({','.join(map(str, all_chapter_ids))})
+              AND play_status = 1
+              AND user_id IN %(user_ids)s
+            GROUP BY user_id, chapter_id
+        """)
+    
+    sql = " UNION ALL ".join(union_parts)
+    sql = f"""
+        SELECT user_id, chapter_id, MIN(first_done) as earliest
+        FROM ({sql}) sub
+        GROUP BY user_id, chapter_id
+    """
+    
+    uid_list = list(user_ids)
+    for i in range(0, len(uid_list), 2000):
+        batch = tuple(uid_list[i:i+2000])
+        cur.execute(sql, {'user_ids': batch})
+        for row in cur.fetchall():
+            uid, cid, dt = row
+            # Only datetime has .date(); a plain date is stored as-is
+            result[uid][cid] = dt.date() if isinstance(dt, datetime) else dt
+    
+    return result
+
+# 收集所有user_id
+all_user_ids = set()
+for acct in all_accounts:
+    for uid in char_map.get(acct, []):
+        all_user_ids.add(uid)
+
+print(f"  总 user_id 数: {len(all_user_ids)}")
+
+# L1 序章
+l1_lesson_data = query_lesson_completion(cur, all_user_ids, L1_U0_LESSONS)
+# L2 序章
+l2_lesson_data = query_lesson_completion(cur, all_user_ids, L2_U0_LESSONS)
+
+print(f"  L1 序章有完成记录的 user_id: {len(l1_lesson_data)}")
+print(f"  L2 序章有完成记录的 user_id: {len(l2_lesson_data)}")
+
+# ========== Step 5: 对每个订单分析序章进度 ==========
+print("\n[Step 5] 逐订单分析序章进度...")
+
+def analyze_user_prologue(user_ids, lesson_data, lesson_map):
+    """
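+    Merge prologue lesson completions across all of an account's characters.
+    Returns: {chapter_id: earliest_completion_date} (completed lessons only).
+    The lesson_map argument is currently unused.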
+    """
+    merged = {}
+    for uid in user_ids:
+        if uid not in lesson_data:
+            continue
+        for cid, dt in lesson_data[uid].items():
+            if cid not in merged or dt < merged[cid]:
+                merged[cid] = dt
+    return merged
+
+def get_prologue_status(lesson_completions, lesson_map, purchase_date):
+    """
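+    Summarize prologue completion for one order:
+    - total_done: number of lessons completed (at any time)
+    - done_before: lessons completed on or before the purchase date
+    - done_after: lessons completed after the purchase date
+    - all_five_done: whether all five lessons are completed
+    - all_five_done_date: date the fifth lesson was completed (the latest one)
+    - progress_at_purchase: number of lessons completed at the time of purchase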
+    """
+    total_done = len(lesson_completions)
+    
+    done_before = {}
+    done_after = {}
+    for cid, dt in lesson_completions.items():
+        if dt <= purchase_date:
+            done_before[cid] = dt
+        else:
+            done_after[cid] = dt
+    
+    # 是否五节全完成
+    all_five_done = total_done == 5
+    all_five_done_date = max(lesson_completions.values()) if all_five_done else None
+    
+    # 购课时完成了几节
+    progress_at_purchase = len(done_before)
+    
+    # 购课时哪些节已完成
+    lessons_done_at_purchase = sorted(
+        [lesson_map[cid] for cid in done_before],
+        key=lambda x: int(x[1:])  # L01, L02, ...
+    )
+    
+    return {
+        'total_done': total_done,
+        'done_before_count': len(done_before),
+        'done_after_count': len(done_after),
+        'all_five_done': all_five_done,
+        'all_five_done_date': all_five_done_date,
+        'progress_at_purchase': progress_at_purchase,
+        'lessons_done_at_purchase': lessons_done_at_purchase,
+        'done_before': done_before,
+        'done_after': done_after,
+    }
+
+records = []
+for o in orders:
+    account_id = o['account_id']
+    level = o['level']
+    purchase_date = o['purchase_date']
+    register_date = register_map.get(account_id)
+    
+    if register_date is None:
+        continue
+    
+    user_ids = char_map.get(account_id, [])
+    lesson_map = L1_U0_LESSONS if level == 'L1' else L2_U0_LESSONS
+    lesson_data = l1_lesson_data if level == 'L1' else l2_lesson_data
+    
+    user_completions = analyze_user_prologue(user_ids, lesson_data, lesson_map)
+    status = get_prologue_status(user_completions, lesson_map, purchase_date)
+    
+    # 注册→购课天数
+    reg_to_pur = (purchase_date - register_date).days
+    
+    # 注册→序章全完成天数
+    reg_to_all5 = None
+    if status['all_five_done_date']:
+        reg_to_all5 = (status['all_five_done_date'] - register_date).days
+    
+    # 序章全完成→购课天数
+    all5_to_pur = None
+    if status['all_five_done_date']:
+        all5_to_pur = (purchase_date - status['all_five_done_date']).days
+    
+    records.append({
+        'order_id': o['order_id'],
+        'account_id': account_id,
+        'level': level,
+        'register_date': register_date,
+        'purchase_date': purchase_date,
+        'reg_to_pur': reg_to_pur,
+        # 序章进度
+        'total_lessons_done': status['total_done'],
+        'lessons_done_before_purchase': status['done_before_count'],
+        'lessons_done_after_purchase': status['done_after_count'],
+        'progress_at_purchase': status['progress_at_purchase'],
+        'lessons_at_purchase': ','.join(status['lessons_done_at_purchase']),
+        # 五节全完成
+        'all_five_done': status['all_five_done'],
+        'all_five_done_date': status['all_five_done_date'],
+        'reg_to_all5': reg_to_all5,
+        'all5_to_pur': all5_to_pur,
+    })
+
+print(f"  分析记录数: {len(records)}")
+
+# ========== Step 6: 统计分析 ==========
+print(f"\n{'='*80}")
+print("【总体概览】")
+print(f"{'='*80}")
+
+total_records = len(records)
+all5_complete = [r for r in records if r['all_five_done']]
+all5_incomplete = [r for r in records if not r['all_five_done']]
+l1_records = [r for r in records if r['level'] == 'L1']
+l2_records = [r for r in records if r['level'] == 'L2']
+
+print(f"总记录数（订单维度）: {total_records}")
+print(f"  - L1 订单: {len(l1_records)}")
+print(f"  - L2 订单: {len(l2_records)}")
+print(f"  序章五节全完成: {len(all5_complete)}")
+print(f"  序章五节未全完成: {len(all5_incomplete)}")
+
+# ========== 购课时序章进度分布 ==========
+print(f"\n{'='*80}")
+print("【核心指标：购课时序章完成进度】")
+print(f"{'='*80}")
+
+for level in ['L1', 'L2', 'ALL']:
+    if level == 'ALL':
+        data = records
+    else:
+        data = [r for r in records if r['level'] == level]
+    
+    if not data:
+        continue
+    
+    print(f"\n[{level}] 购课时完成了序章第几节:")
+    progress_dist = Counter(r['progress_at_purchase'] for r in data)
+    for p in range(6):  # 0到5
+        cnt = progress_dist.get(p, 0)
+        pct = cnt / len(data) * 100
+        label = f"  完成{p}节(共5节)" if p > 0 else "  完成0节（未开始）"
+        bar = '█' * int(pct)
+        print(f"  {label:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
+    
+    # 具体到哪几节
+    print(f"\n[{level}] 购课时已完成哪些课时的分布:")
+    lesson_dist = Counter()
+    for r in data:
+        if r['progress_at_purchase'] == 0:
+            lesson_dist['(未完成任何节)'] += 1
+        else:
+            lesson_dist[r['lessons_at_purchase']] += 1
+    for lesson_combo, cnt in lesson_dist.most_common(15):
+        pct = cnt / len(data) * 100
+        label = f"  {lesson_combo}"
+        print(f"  {label:24s}: {cnt:4d} ({pct:5.1f}%)")
+
+# ========== 购课后完成的情况 ==========
+print(f"\n{'='*80}")
+print("【购课后继续完成序章的情况】")
+print(f"{'='*80}")
+
+for level in ['L1', 'L2', 'ALL']:
+    if level == 'ALL':
+        data = records
+    else:
+        data = [r for r in records if r['level'] == level]
+    
+    if not data:
+        continue
+    
+    print(f"\n[{level}]:")
+    n_full = sum(1 for r in data if r['progress_at_purchase'] == 5)
+    n_after = sum(1 for r in data if r['lessons_done_after_purchase'] > 0)
+    print(f"  Prologue fully completed at purchase (5/5): {n_full} ({n_full/len(data)*100:.1f}%)")
+    print(f"  Completed remaining prologue lessons after purchase: {n_after} ({n_after/len(data)*100:.1f}%)")
+    
+    # Distribution of lessons completed after purchase, among orders below 5/5 at purchase
+    after_dist = Counter(r['lessons_done_after_purchase'] for r in data if r['progress_at_purchase'] < 5)
+    if after_dist:
+        n_partial = sum(after_dist.values())
+        print("  Orders below 5/5 at purchase, lessons completed afterwards:")
+        for n in sorted(after_dist):
+            cnt = after_dist[n]
+            pct = cnt / n_partial * 100
+            print(f"    {n} more lesson(s): {cnt} ({pct:.1f}%)")
+
+# ========== 注册→购课天数 ==========
+def percentile_vals(vals):
+    if not vals:
+        return {}
+    s = sorted(vals)
+    return {
+        'P10': s[int(len(s)*0.10)],
+        'P25': s[int(len(s)*0.25)],
+        'P50': s[int(len(s)*0.50)],
+        'P75': s[int(len(s)*0.75)],
+        'P90': s[int(len(s)*0.90)],
+        'mean': sum(s)/len(s),
+        'count': len(s),
+    }
+
+print(f"\n{'='*80}")
+print("【注册→购课 天数统计】")
+print(f"{'='*80}")
+
+for level in ['L1', 'L2', 'ALL']:
+    if level == 'ALL':
+        data = records
+    else:
+        data = [r for r in records if r['level'] == level]
+    
+    stat = percentile_vals([r['reg_to_pur'] for r in data])
+    if stat:
+        print(f"\n[{level}] 注册→购课:")
+        print(f"  样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
+        print(f"  P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
+
+# ========== 序章五节全完成的用户统计 ==========
+print(f"\n{'='*80}")
+print("【序章五节全完成的用户（限定已完成5/5的用户）】")
+print(f"{'='*80}")
+
+all5_l1 = [r for r in all5_complete if r['level']=='L1']
+all5_l2 = [r for r in all5_complete if r['level']=='L2']
+
+print(f"\n五节全完成用户: {len(all5_complete)}")
+print(f"  L1: {len(all5_l1)}")
+print(f"  L2: {len(all5_l2)}")
+
+# 注册→全完成
+for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
+    vals = [r['reg_to_all5'] for r in data if r['reg_to_all5'] is not None]
+    stat = percentile_vals(vals)
+    if stat:
+        print(f"\n[{name}] 注册→序章五节全完成:")
+        print(f"  样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
+        print(f"  P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
+
+# 全完成→购课
+for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
+    vals = [r['all5_to_pur'] for r in data if r['all5_to_pur'] is not None]
+    stat = percentile_vals(vals)
+    if stat:
+        print(f"\n[{name}] 序章五节全完成→购课:")
+        print(f"  样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
+        print(f"  P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
+
+# 全完成→购课时间段分布
+print(f"\n【五节全完成→购课 时间段分布】")
+print(f"{'='*60}")
+buckets = {
+    '负数（先购课后完课）': (-99999, -1),
+    '0天(当天)': (0, 0),
+    '1~3天': (1, 3),
+    '4~7天': (4, 7),
+    '8~14天': (8, 14),
+    '15~30天': (15, 30),
+    '31~60天': (31, 60),
+    '61~90天': (61, 90),
+    '90天以上': (91, 99999),
+}
+for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
+    vals = [r['all5_to_pur'] for r in data if r['all5_to_pur'] is not None]
+    if not vals:
+        continue
+    print(f"\n[{name}]:")
+    for label, (lo, hi) in buckets.items():
+        cnt = sum(1 for v in vals if lo <= v <= hi)
+        pct = cnt / len(vals) * 100
+        bar = '█' * int(pct / 2)
+        print(f"  {label:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
+
+# ========== 时序关系 ==========
+print(f"\n{'='*80}")
+print("【五节全完成的用户：序章 vs 购课 时序关系】")
+print(f"{'='*80}")
+
+for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
+    if not data:
+        continue  # avoid ZeroDivisionError when a level has no fully completed prologue
+    pro_before = sum(1 for r in data if r['all5_to_pur'] is not None and r['all5_to_pur'] >= 0)
+    pro_after = sum(1 for r in data if r['all5_to_pur'] is not None and r['all5_to_pur'] < 0)
+    pro_same = sum(1 for r in data if r['all5_to_pur'] == 0)
+    print(f"\n[{name}]:")
+    print(f"  序章全部完成在购课之前/当天: {pro_before} ({pro_before/len(data)*100:.1f}%)")
+    print(f"    其中当天: {pro_same} ({pro_same/len(data)*100:.1f}%)")
+    print(f"  序章全部完成在购课之后: {pro_after} ({pro_after/len(data)*100:.1f}%)")
+
+# ========== 导出CSV ==========
+output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis_v2.csv'
+with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
+    writer = csv.writer(f)
+    writer.writerow([
+        'order_id', 'account_id', '等级', '注册日期', '购课日期',
+        '注册→购课(天)',
+        '序章完成节数(总计)', '购课前进度(节)', '购课后完成(节)',
+        '购课时已完成', '五节全完成', '五节全完成日期',
+        '注册→全完成(天)', '全完成→购课(天)'
+    ])
+    for r in sorted(records, key=lambda x: x['purchase_date']):
+        writer.writerow([
+            r['order_id'], r['account_id'], r['level'],
+            r['register_date'].isoformat() if r['register_date'] else '',
+            r['purchase_date'].isoformat() if r['purchase_date'] else '',
+            r['reg_to_pur'],
+            r['total_lessons_done'], r['progress_at_purchase'],
+            r['lessons_done_after_purchase'],
+            r['lessons_at_purchase'],
+            '是' if r['all_five_done'] else '否',
+            r['all_five_done_date'].isoformat() if r['all_five_done_date'] else '',
+            r['reg_to_all5'] if r['reg_to_all5'] is not None else '',
+            r['all5_to_pur'] if r['all5_to_pur'] is not None else '',
+        ])
+
+print(f"\n{'='*80}")
+print(f"详细数据已导出: {output_path}")
+print(f"{'='*80}")
+
+cur.close()
+conn.close()
+print("\n分析完成!")
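
Review sketch: the "progress at purchase" rule used in V2 (a lesson counts as done before purchase when its completion date is on or before the purchase date) can be exercised without a database. The function name `split_by_purchase` and the dates are hypothetical; the chapter ids 343 to 345 are the first three L1 prologue lessons from `L1_U0_LESSONS`.

```python
from datetime import date


def split_by_purchase(completions, purchase_date):
    """Split {chapter_id: completion_date} into lessons done on/before vs after purchase."""
    before = {cid for cid, done in completions.items() if done <= purchase_date}
    after = set(completions) - before
    return before, after


# Made-up completion dates for three L1 prologue lessons
completions = {
    343: date(2026, 5, 1),
    344: date(2026, 5, 2),
    345: date(2026, 5, 5),
}

before, after = split_by_purchase(completions, date(2026, 5, 2))
print(sorted(before), sorted(after))  # [343, 344] [345]
```

Because the comparison is at date granularity, a lesson finished on the purchase day counts as completed before purchase even if it was actually finished a few hours after paying; comparing full timestamps would be needed to separate those cases.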
--- a/scripts/endor_purchase_analysis_v3.py
+++ b/scripts/endor_purchase_analysis_v3.py
@ -0,0 +1,370 @@
+#!/usr/bin/env python3
+"""
+In-app purchaser behavior analysis, V3: detailed cross-tabulation
+"""
+import psycopg2
+from datetime import datetime, date
+from collections import defaultdict, Counter
+import csv
+import os
+
+PG_PASS = os.environ["PG_ONLINE_PASSWORD"]  # required; no password fallback is kept in source
+conn = psycopg2.connect(
+    host="bj-postgres-16pob4sg.sql.tencentcdb.com", port=28591,
+    user="ai_member", password=PG_PASS, database="vala_bi"
+)
+cur = conn.cursor()
+
+L1_U0_LESSONS = {343: 'L01', 344: 'L02', 345: 'L03', 346: 'L04', 348: 'L05'}
+L2_U0_LESSONS = {55: 'L01', 56: 'L02', 57: 'L03', 58: 'L04', 59: 'L05'}
+LEVEL_MAP = {57:'L1',60:'L1',63:'L1', 31:'L2',32:'L2',33:'L2',54:'L2'}
+
+print("=" * 80)
+print("端内购买用户行为分析 V3 — 详细交叉统计")
+print("=" * 80)
+
+# ============ Load data ============
+print("\n[数据加载]")
+
+cur.execute("""
+    SELECT o.id, o.account_id, o.goods_id, o.pay_success_date
+    FROM bi_vala_order o
+    JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
+    WHERE o.key_from = 'app-active-h5-0-0' AND o.order_status IN (3, 4) AND o.pay_success_date IS NOT NULL
+    ORDER BY o.pay_success_date
+""")
+orders = []
+for row in cur.fetchall():
+    lv = LEVEL_MAP.get(row[2])
+    if not lv: continue
+    pd_ = row[3]; orders.append({'order_id':row[0],'account_id':row[1],'goods_id':row[2],'purchase_date':pd_.date() if isinstance(pd_,datetime) else pd_,'level':lv})
+print(f"  订单: {len(orders)}")
+
+all_accounts = list(set(o['account_id'] for o in orders))
+register_map = {}
+for i in range(0,len(all_accounts),500):
+    cur.execute("SELECT id,created_at FROM bi_vala_app_account WHERE id IN %s",(tuple(all_accounts[i:i+500]),))
+    for r in cur.fetchall():
+        d=r[1];register_map[r[0]]=d.date() if isinstance(d,datetime) else d  # a plain date has no .date() method
+print(f"  注册时间: {len(register_map)}")
+
+char_map = defaultdict(list)
+for i in range(0,len(all_accounts),500):
+    cur.execute("SELECT account_id,id FROM bi_vala_app_character WHERE account_id IN %s",(tuple(all_accounts[i:i+500]),))
+    for r in cur.fetchall(): char_map[r[0]].append(r[1])
+print(f"  角色映射: {sum(1 for k,v in char_map.items() if v)} 用户")
+
+all_user_ids = set()
+for acct in all_accounts:
+    for uid in char_map.get(acct,[]): all_user_ids.add(uid)
+
+def query_lesson_completion(user_ids, chapter_ids_dict):
+    result = defaultdict(dict)
+    if not user_ids: return result
+    cids = list(chapter_ids_dict.keys())
+    union = []
+    for t in range(8):
+        union.append(f"""
+            SELECT user_id, chapter_id, MIN(created_at) as fd
+            FROM bi_user_chapter_play_record_{t}
+            WHERE chapter_id IN ({','.join(map(str,cids))})
+              AND play_status=1 AND user_id IN %(uids)s
+            GROUP BY user_id, chapter_id
+        """)
+    sql = f"SELECT user_id, chapter_id, MIN(fd) FROM ({' UNION ALL '.join(union)}) sub GROUP BY user_id, chapter_id"
+    for i in range(0,len(user_ids),2000):
+        cur.execute(sql,{'uids':tuple(user_ids[i:i+2000])})
+        for r in cur.fetchall(): result[r[0]][r[1]]=r[2].date() if isinstance(r[2],datetime) else r[2]
+    return result
+
+l1_data = query_lesson_completion(list(all_user_ids), L1_U0_LESSONS)
+l2_data = query_lesson_completion(list(all_user_ids), L2_U0_LESSONS)
+print(f"  L1序章完成: {len(l1_data)} user_ids, L2序章完成: {len(l2_data)} user_ids")
+
+# ============ Per-order analysis ============
+records = []
+for o in orders:
+    lesson_map = L1_U0_LESSONS if o['level']=='L1' else L2_U0_LESSONS
+    lesson_data = l1_data if o['level']=='L1' else l2_data
+    reg_d = register_map.get(o['account_id'])
+    if not reg_d: continue
+    pur_d = o['purchase_date']
+    
+    # Merge prologue completions across all characters of the account (earliest date per lesson)
+    merged = {}
+    for uid in char_map.get(o['account_id'],[]):
+        if uid not in lesson_data: continue
+        for cid, dt in lesson_data[uid].items():
+            if cid not in merged or dt < merged[cid]: merged[cid]=dt
+    
+    # Completion date of each lesson and days since registration
+    lesson_dates = {}
+    for cid, dt in merged.items():
+        lesson_dates[lesson_map[cid]] = {'date': dt, 'days_from_reg': (dt - reg_d).days}
+    
+    # Lessons completed on or before the purchase date vs. after it
+    done_before = {ln: info for ln, info in lesson_dates.items() if info['date'] <= pur_d}
+    done_after = {ln: info for ln, info in lesson_dates.items() if info['date'] > pur_d}
+    
+    all5_done = len(lesson_dates) == 5
+    all5_date = max(info['date'] for info in lesson_dates.values()) if all5_done else None
+    all5_to_pur = (pur_d - all5_date).days if all5_date else None
+    reg_to_all5 = (all5_date - reg_d).days if all5_date else None
+    
+    # Progress at purchase: how many lessons were completed, and which ones
+    progress = len(done_before)
+    lessons_done = sorted(done_before.keys(), key=lambda x: int(x[1:]))
+    
+    # Which two lessons the purchase falls between
+    # e.g. with L01 and L02 completed, the purchase came after L02 -> "L02之后(L03之前)"
+    if progress == 0:
+        purchase_position = "L01之前"
+    elif progress == 5:
+        purchase_position = "L05之后"
+    else:
+        purchase_position = f"{lessons_done[-1]}之后(L{int(lessons_done[-1][1:])+1:02d}之前)"
+    
+    records.append({
+        'order_id': o['order_id'], 'account_id': o['account_id'],
+        'level': o['level'], 'register_date': reg_d, 'purchase_date': pur_d,
+        'reg_to_pur': (pur_d - reg_d).days,
+        'progress': progress, 'lessons_done': lessons_done,
+        'done_before': done_before, 'done_after': done_after,
+        'all5_done': all5_done, 'all5_date': all5_date,
+        'reg_to_all5': reg_to_all5, 'all5_to_pur': all5_to_pur,
+        'purchase_position': purchase_position,
+        'lesson_dates': lesson_dates,
+    })
+
+print(f"  分析记录: {len(records)}")
+l1_recs = [r for r in records if r['level']=='L1']
+l2_recs = [r for r in records if r['level']=='L2']
+all5_recs = [r for r in records if r['all5_done']]
+
+# ====================================================================
+# Cross-tab 1: lessons completed at purchase × days from registration to purchase
+# ====================================================================
+print(f"\n{'='*80}")
+print("【交叉1】购课时完成节数 × 注册→购课天数")
+print(f"{'='*80}")
+
+for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
+    print(f"\n[{level}]")
+    # Build the cross-tab
+    day_ranges = [(0,1),(2,3),(4,7),(8,14),(15,30),(31,60),(61,999)]
+    print(f"  {'购课进度':<20}", end='')
+    for lo, hi in day_ranges:
+        label = f"{lo}天" if lo==hi else f"{lo}-{hi}天"
+        print(f"{label:>10}", end='')
+    print(f"  {'合计':>8}")
+    
+    for prog in range(6):
+        subset = [r for r in data if r['progress']==prog]
+        label = f"  完成{prog}节"
+        print(f"{label:<20}", end='')
+        total = 0
+        for lo, hi in day_ranges:
+            cnt = sum(1 for r in subset if lo <= r['reg_to_pur'] <= hi)
+            print(f"{cnt:>10}", end='')
+            total += cnt
+        print(f"  {total:>8}")
+
+# ====================================================================
+# Cross-tab 2: lessons completed at purchase × days from full completion to purchase (users who completed all five lessons only)
+# ====================================================================
+print(f"\n{'='*80}")
+print("【交叉2】购课时完成节数 × 全完成→购课天数（仅五节全完成用户）")
+print(f"{'='*80}")
+
+all5_day_ranges = [(-999,-1),(0,0),(1,3),(4,7),(8,14),(15,30),(31,60),(61,999)]
+
+for level, data in [('L1',[r for r in l1_recs if r['all5_done']]),
+                     ('L2',[r for r in l2_recs if r['all5_done']]),
+                     ('ALL',all5_recs)]:
+    print(f"\n[{level}]")
+    print(f"  {'购课进度':<20}", end='')
+    for lo, hi in all5_day_ranges:
+        if lo==-999: label="先购课后完课"
+        elif lo==hi: label=f"{lo}天"
+        else: label=f"{lo}-{hi}天"
+        print(f"{label:>14}", end='')
+    print(f"  {'合计':>8}")
+    
+    for prog in range(6):
+        subset = [r for r in data if r['progress']==prog]
+        label = f"  完成{prog}节"
+        print(f"{label:<20}", end='')
+        total = 0
+        for lo, hi in all5_day_ranges:
+            cnt = sum(1 for r in subset if r['all5_to_pur'] is not None and lo <= r['all5_to_pur'] <= hi)
+            print(f"{cnt:>14}", end='')
+            total += cnt
+        print(f"  {total:>8}")
+
+# ====================================================================
+# Cross-tab 3: distribution of where the purchase falls among the prologue lessons
+# ====================================================================
+print(f"\n{'='*80}")
+print("【交叉3】购课时机：落在序章第几节之后")
+print(f"{'='*80}")
+
+for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
+    dist = Counter(r['purchase_position'] for r in data)
+    print(f"\n[{level}]")
+    positions = ['L01之前','L01之后(L02之前)','L02之后(L03之前)','L03之后(L04之前)','L04之后(L05之前)','L05之后']
+    for pos in positions:
+        cnt = dist.get(pos, 0)
+        pct = cnt/len(data)*100 if data else 0.0  # avoid ZeroDivisionError when a level has no records
+        bar = '█' * int(pct)
+        print(f"  {pos:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
+
+# ====================================================================
+# Cross-tab 4: per-lesson distribution of days from registration to completion
+# ====================================================================
+print(f"\n{'='*80}")
+print("【交叉4】序章每节课：从注册到完成的天数分布")
+print(f"{'='*80}")
+
+lesson_days_ranges = [(0,0),(1,1),(2,3),(4,7),(8,14),(15,30),(31,999)]
+lesson_order = ['L01','L02','L03','L04','L05']
+
+for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
+    print(f"\n[{level}] ({len(data)} 条记录)")
+    print(f"  {'课时':<8}", end='')
+    for lo, hi in lesson_days_ranges:
+        label = f"{lo}天" if lo==hi else f"{lo}-{hi}天"
+        print(f"{label:>10}", end='')
+    print(f"  {'完成人数':>10}  {'中位数':>8}  {'均值':>8}")
+    
+    for ln in lesson_order:
+        vals = []
+        for r in data:
+            if ln in r['lesson_dates']:
+                vals.append(r['lesson_dates'][ln]['days_from_reg'])
+        if not vals:
+            print(f"  {ln:<8} {'(无数据)':>60}")
+            continue
+        sv = sorted(vals)
+        med = sv[len(sv)//2]
+        avg = sum(vals)/len(vals)
+        print(f"  {ln:<8}", end='')
+        for lo, hi in lesson_days_ranges:
+            cnt = sum(1 for v in vals if lo <= v <= hi)
+            pct = cnt / len(vals) * 100
+            print(f"{cnt:>5}({pct:4.1f}%)", end='')
+        print(f"  {len(vals):>10}  {med:>8}天  {avg:>8.1f}天")
+
+# ====================================================================
+# Cross-tab 5: lessons completed at purchase × L1/L2 × whether the remaining lessons were completed later
+# ====================================================================
+print(f"\n{'='*80}")
+print("【交叉5】购课时未全部完成的用户：购课后完成余下序章的情况")
+print(f"{'='*80}")
+
+not_all5_at_purchase = [r for r in records if r['progress'] < 5]
+for level, data in [('L1',[r for r in l1_recs if r['progress']<5]),
+                     ('L2',[r for r in l2_recs if r['progress']<5]),
+                     ('ALL',not_all5_at_purchase)]:
+    if not data: continue
+    print(f"\n[{level}] 购课时未全部完成: {len(data)} 人")
+    for prog in range(5):
+        subset = [r for r in data if r['progress']==prog]
+        if not subset: continue
+        after_done_dist = Counter(r['all5_done'] for r in subset)
+        all5_cnt = after_done_dist.get(True, 0)
+        not5_cnt = after_done_dist.get(False, 0)
+        print(f"  购课时完成{prog}节: {len(subset)}人 → 最终完成5节: {all5_cnt}人, 仍未完成: {not5_cnt}人")
+
+# ====================================================================
+# Cross-tab 6: details for users who purchased before finishing the prologue
+# ====================================================================
+print(f"\n{'='*80}")
+print("【交叉6】先购课后才完成序章的用户详细分析")
+print(f"{'='*80}")
+
+buy_before_finish = [r for r in all5_recs if r['all5_to_pur'] is not None and r['all5_to_pur'] < 0]
+print(f"\n共 {len(buy_before_finish)} 人，购课后才完成序章全部五节\n")
+print(f"  {'等级':<6} {'购课时进度':<14} {'购课日期':<14} {'全完成日期':<14} {'购课→完课':>10} {'注册→购课':>10} {'注册→全完':>10}")
+for r in sorted(buy_before_finish, key=lambda x: x['all5_to_pur']):
+    print(f"  {r['level']:<6} {'完成'+str(r['progress'])+'节':<14} {str(r['purchase_date']):<14} {str(r['all5_date']):<14} {abs(r['all5_to_pur']):>10}天 {r['reg_to_pur']:>10}天 {r['reg_to_all5']:>10}天")
+
+# ====================================================================
+# Cross-tab 7: monthly trend of lessons completed at purchase
+# ====================================================================
+print(f"\n{'='*80}")
+print("【交叉7】逐月趋势：购课时完成 5/5 节的比例")
+print(f"{'='*80}")
+
+monthly = defaultdict(lambda: {'total':0,'all5':0,'partial':0})
+for r in records:
+    m = r['purchase_date'].strftime('%Y-%m')
+    monthly[m]['total'] += 1
+    if r['progress'] == 5:
+        monthly[m]['all5'] += 1
+    else:
+        monthly[m]['partial'] += 1
+
+for m in sorted(monthly.keys()):
+    d = monthly[m]
+    pct = d['all5']/d['total']*100
+    print(f"  {m}: 总{d['total']:>4}单 | 购课时已完成5节={d['all5']:>3}人({pct:5.1f}%) | 未完成={d['partial']:>2}人")
+
+# ====================================================================
+# Cross-tab 8: registration-to-purchase days, stratified by progress at purchase
+# ====================================================================
+print(f"\n{'='*80}")
+print("【交叉8】按购课时进度分层：各组注册→购课 天数统计")
+print(f"{'='*80}")
+
+def pct(vals, p):
+    if not vals: return '-'
+    s = sorted(vals)
+    return s[int(len(s)*p/100)]
+
+for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
+    print(f"\n[{level}]")
+    print(f"  {'购课时进度':<16} {'人数':>6} {'P10':>6} {'P25':>6} {'中位':>6} {'P75':>6} {'P90':>6} {'均值':>8}")
+    for prog in range(6):
+        subset = [r for r in data if r['progress']==prog]
+        if not subset: continue
+        vals = [r['reg_to_pur'] for r in subset]
+        print(f"  {'完成'+str(prog)+'节':<16} {len(subset):>6} "
+              f"{pct(vals,10):>6} {pct(vals,25):>6} {pct(vals,50):>6} "
+              f"{pct(vals,75):>6} {pct(vals,90):>6} {sum(vals)/len(vals):>8.1f}")
+
+# ====================================================================
+# Export CSV (with per-lesson prologue completion details)
+# ====================================================================
+output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis_v3.csv'
+with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
+    writer = csv.writer(f)
+    writer.writerow([
+        'order_id','account_id','等级','注册日期','购课日期','注册→购课(天)',
+        '购课时序章进度(节)','购课时已完成哪些课',
+        'L01完成日期','L01注册后几天','L02完成日期','L02注册后几天',
+        'L03完成日期','L03注册后几天','L04完成日期','L04注册后几天',
+        'L05完成日期','L05注册后几天',
+        '五节全完成','五节全完成日期','注册→全完成(天)','全完成→购课(天)',
+        '购课时机'
+    ])
+    for r in sorted(records, key=lambda x: x['purchase_date']):
+        row = [r['order_id'],r['account_id'],r['level'],
+               r['register_date'].isoformat(),r['purchase_date'].isoformat(),r['reg_to_pur'],
+               r['progress'],','.join(r['lessons_done'])]
+        for ln in lesson_order:
+            info = r['lesson_dates'].get(ln, {})
+            row.append(info.get('date','').isoformat() if info.get('date') else '')
+            row.append(info.get('days_from_reg','') if info else '')
+        row.append('是' if r['all5_done'] else '否')
+        row.append(r['all5_date'].isoformat() if r['all5_date'] else '')
+        row.append(r['reg_to_all5'] if r['reg_to_all5'] is not None else '')
+        row.append(r['all5_to_pur'] if r['all5_to_pur'] is not None else '')
+        row.append(r['purchase_position'])
+        writer.writerow(row)
+
+print(f"\n{'='*80}")
+print(f"已导出: {output_path}")
+print(f"{'='*80}")
+cur.close()
+conn.close()
+print("分析完成!")