🤖 Daily automatic backup - 2026-05-26 08:00:01
This commit is contained in:
parent
b09b21cb44
commit
1077dceca0
@ -234,3 +234,8 @@
2. Duration field: `bi_user_component_play_record.interval_time`, in milliseconds
3. Calculation: sum `interval_time` over all components under the given `chapter_unique_id`, then divide by 60000 to convert to minutes, keeping 1 decimal place
4. Special note: only records completed at the lesson level (`play_status=1`) are counted; component practice records from lessons that were not completed as a whole are excluded
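
A minimal sketch of the calculation rule above, assuming the component rows have already been fetched as `(chapter_unique_id, interval_time)` pairs. The function name `lesson_minutes` and the sample rows are hypothetical; only the field names, the `play_status=1` filter, and the divide-by-60000 rule come from the notes.

```python
from collections import defaultdict


def lesson_minutes(component_rows, completed_cuids):
    """Sum interval_time (ms) per chapter_unique_id and convert to minutes.

    component_rows: iterable of (chapter_unique_id, interval_time_ms) pairs.
    completed_cuids: chapter_unique_ids whose lesson has play_status = 1.
    """
    total_ms = defaultdict(int)
    for cuid, interval_ms in component_rows:
        # Skip components that belong to lessons not completed as a whole.
        if cuid in completed_cuids:
            total_ms[cuid] += interval_ms or 0
    # 60000 ms per minute, keep one decimal place.
    return {cuid: round(ms / 60000, 1) for cuid, ms in total_ms.items()}


rows = [("c1", 90_000), ("c1", 30_000), ("c2", 48_000)]
print(lesson_minutes(rows, completed_cuids={"c1"}))
```

Here `c2` is dropped because its lesson is not in the completed set, which mirrors the special note in item 4.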
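
## growth_activity_behavior statistical conventions [confirmed by 李承龙, 2026-05-25]

- **Statistical identifier:** all statistics use `userId` as the unique user identifier and ignore `accountId`
- **Reason:** `accountId=0` denotes an anonymous user (for example at the H5 page exposure step, `H5 页面曝光时`); `userId` is a device/session-level identifier that exists in both anonymous and logged-in states, so it is better suited to linking funnel steps
- **Exception:** the "claim course and add WeChat" (`领课加微`) tracking events have no `userId` field, so only event-level statistics are possible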
287
memory/2026-05-25-1027.md
Normal file
@ -0,0 +1,287 @@
# Session: 2026-05-25 10:27:55 GMT+8

- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2
- **Session ID**: 57030bd7-443e-4c5e-91e6-8d406ceb9925
- **Source**: feishu

## Conversation Summary

assistant: The lesson-completion query path involves several tables, so let me lay out the core logic first:

---

## User lesson-completion record query path

### 1. account_id → user_id (character ID)

One account may have multiple characters; look them up in `bi_vala_app_character`:

```sql
SELECT id AS user_id FROM bi_vala_app_character WHERE account_id = <your_account_id>;
```

### 2. Determine the shard table

`bi_user_chapter_play_record` is split into 8 tables by `user_id % 8` (`_0` through `_7`); use the modulo to locate the right table when querying:

```sql
-- {user_id % 8} is a template placeholder: substitute the computed suffix (0-7)
SELECT * FROM bi_user_chapter_play_record_{user_id % 8}
WHERE user_id = <your_user_id> AND play_status = 1;
```
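
The shard lookup above can be sketched in Python as follows. The helper names `shard_table` and `completed_lessons_sql` are hypothetical, and the `%s` placeholder style assumes a psycopg2-like driver; only the table name, the modulo-8 rule, and the `play_status = 1` filter come from the notes.

```python
SHARD_COUNT = 8  # the notes state the tables are split into _0 .. _7


def shard_table(base_name, user_id, shard_count=SHARD_COUNT):
    """Return the physical table name for a user, e.g. base_3."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"{base_name}_{user_id % shard_count}"


def completed_lessons_sql(user_id):
    """Build a parameterised query for completed lessons (play_status = 1)."""
    table = shard_table("bi_user_chapter_play_record", int(user_id))
    # The table suffix is derived from an int, so interpolating it is safe;
    # the user_id value itself is passed as a bound parameter.
    sql = f"SELECT * FROM {table} WHERE user_id = %s AND play_status = 1"
    return sql, (int(user_id),)


print(completed_lessons_sql(10035))
```

The same `shard_table` helper applies to `bi_user_component_play_record`, which the notes say follows the same modulo rule.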

### 3. Core fields

| Field | Meaning |
|------|------|
| `play_status` | 1 = completed normally / 2 = skipped / 3 = replayed |
| `chapter_id` | Lesson ID; join to `bi_level_unit_lesson.id` to render it as, e.g., L1 S0 U01 L03 |
| `chapter_unique_id` | Unique lesson identifier, used to join component records |
| `created_at` | Completion time |

### 4. Join the course name

Translate `chapter_id` into a readable course name:

```sql
SELECT id, course_level, course_season, course_unit, course_lesson
FROM bi_level_unit_lesson
WHERE id = <your_chapter_id>;
```

### 5. View component-level details (optional)

Using the same modulo rule, join the component table through `chapter_unique_id`:

```sql
SELECT * FROM bi_user_component_play_record_{user_id % 8}
WHERE chapter_unique_id = '<your_chapter_unique_id>';
```

---

**One-line summary:** account_id → character_id (user_id) → shard by modulo → `bi_user_chapter_play_record` (lesson completion) → `bi_level_unit_lesson` (translate the lesson name) → `bi_user_component_play_record` (component details).

Would you like me to look up the completion records of a specific user directly?

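user: Now I'd like to see: over the last 14 days, what share of all activated users had no lesson completion? Of those, how many activated only L1, how many only L2, and how many activated both L1 and L2? For users who activated only L1, look only at L1 learning records; for users who activated only L2, look only at L2 learning records.

assistant: Let me pin down a few boundary conditions before starting:

1. **Last-14-day window:** 2026-05-09 ~ 2026-05-22 (14 calendar days in total)
2. **L1+L2 users:** a completion in either an L1 or an L2 course counts as having a completion
3. **User → character:** one account may have multiple characters; a completion by any of them counts

The logic is somewhat involved, so I'll run it with a script.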
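
As a quick check of the window boundaries stated above, here is a small sketch that derives an inclusive 14-day window from its end date. The helper name `trailing_window` is hypothetical; the end date 2026-05-22 is taken from the conversation.

```python
from datetime import date, timedelta


def trailing_window(end, days=14):
    """Return (start, end) of an inclusive window covering `days` calendar days."""
    return end - timedelta(days=days - 1), end


start, end = trailing_window(date(2026, 5, 22))
print(start, end, (end - start).days + 1)
```

Because both endpoints are included, the start date is the end date minus 13 days, not 14.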

assistant: Here are the results:

---

## 📊 Lesson completion of activated users over the last 14 days (2026-05-09 ~ 2026-05-22)

### Overview

| Metric | Users | Share |
|------|------|------|
| Total activated users | 3,037 | 100% |
| **No completion in the last 14 days** | **1,307** | **43.0%** |
| Completion in the last 14 days | 1,698 | 55.9% |
| No character data | 32 | 1.1% |

### Split by activation type

| Activation type | Total users | No completion | No-completion share | With completion | With-completion share |
|----------|--------|--------|-----------|--------|-----------|
| L1 only | 502 | **172** | **34.3%** | 330 | 65.7% |
| L2 only | 1,436 | **880** | **61.3%** | 556 | 38.7% |
| L1 + L2 | 1,067 | **255** | **23.9%** | 812 | 76.1% |

> Fallback note: 32 activated users have no character data (they never created a character) and are excluded from the per-type split. Users who activated only L1 or only L2 are evaluated only on completions in the corresponding level; for L1+L2 users, a completion in either level counts.

---

A few observations:

- **L2 only** has the highest silent rate (61.3%): roughly three in five single-activation L2 users had no lesson completion in the last 14 days
- **L1+L2** is the most active group, with only 23.9% without a completion
- **L1 only** sits in between, with 34.3% without a completion

user: For the L1+L2 group, please break it down into users with completions only in L1, only in L2, and in both L1 and L2

assistant: The completion distribution of the L1+L2 group is quite distinctive:

---

## 📊 Full results

### Overview

| Metric | Users | Share |
|------|------|------|
| Total activated users | 3,037 | 100% |
| No character data | 32 | 1.1% |
| **No completion in the last 14 days** | **1,307** | **43.0%** |
| Completion in the last 14 days | 1,698 | 55.9% |

### By activation type

| Activation type | Total users | No completion | Share | With completion | Share |
|----------|--------|--------|------|--------|------|
| L1 only | 502 | 172 | 34.3% | 330 | 65.7% |
| L2 only | 1,436 | 880 | 61.3% | 556 | 38.7% |
| L1+L2 | 1,067 | 255 | 23.9% | 812 | 76.1% |

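### Breakdown of L1+L2 users with completions (812 users)

| Category | Users | Share of users with completions |
|----------|------|-------------|
| **Completions in L1 only** | **670** | **82.5%** |
| Completions in both L1 and L2 | 110 | 13.5% |
| Completions in L2 only | 32 | 3.9% |

---

Although L1+L2 users activated both levels, 82.5% of those with a completion in the last 14 days studied only L1, and only 13.5% were actually studying both levels. Those studying only L2 are almost negligible (32 users). This suggests that the learning focus of dual-activation users is currently heavily skewed toward L1.
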
user: We now have the concept of activation, and we know that one user can have multiple characters. So one character plus one level equals one course. Now I'd like to see how many courses have been activated and, within the last 14 days, how many of those activated courses had a lesson completion, split by L1 and L2.

assistant: Understood, the perspective switches from "user" to "course" (character + level). Running the query.

assistant: The results are in; here they are alongside the earlier user-level results:

---

## 📊 Lesson completion of activated courses (character × level) over the last 14 days

### Overall

| Metric | L1 | L2 | Total |
|------|-----|-----|------|
| Activated courses | **1,574** | **2,108** | **3,682** |
| Completion in the last 14 days | 1,103 (70.1%) | 699 (33.2%) | 1,802 (48.9%) |
| No completion in the last 14 days | 471 (29.9%) | 1,409 (66.8%) | 1,880 (51.1%) |

### Character activation distribution

| Type | Characters |
|------|--------|
| L1 only activated | 1,321 |
| L2 only activated | 1,855 |
| L1+L2 dual activation | 253 |

### Completion distribution of dual-activation characters (253)

| Category | Characters | Share |
|------|--------|------|
| Completions in L1 only | 154 | 60.9% |
| Completions in both L1 and L2 | 21 | 8.3% |
| Completions in L2 only | 10 | 4.0% |
| No completions in either | 68 | 26.9% |

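---

### 🔍 Comparison of the two perspectives

| Dimension | L1 no-completion rate | L2 no-completion rate | Overall no-completion rate |
|------|------------|------------|------------|
| By user (earlier; L1/L2 columns are the L1-only and L2-only groups) | 34.3% | 61.3% | 43.0% |
| **By course (character×level)** | **29.9%** | **66.8%** | **51.1%** |

The overall no-completion rate at the course level (51.1%) is higher than at the user level (43.0%). One user may have several activated courses, and a completion in any one of them makes that user count as "with completion", whereas the course-level view evaluates each course separately, so the large number of silent L2 courses pulls the overall figure up.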
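
To illustrate why the two perspectives diverge, here is a toy sketch, not the script that produced the figures above. The function name `no_completion_rates` and the three sample rows are hypothetical.

```python
def no_completion_rates(courses):
    """courses: list of (user, character, level, has_completion) tuples."""
    # Course view: every (character, level) pair is evaluated on its own.
    course_rate = sum(1 for *_, done in courses if not done) / len(courses)
    # User view: a user is silent only if none of their courses has a completion.
    by_user = {}
    for user, _char, _level, done in courses:
        by_user[user] = by_user.get(user, False) or done
    user_rate = sum(1 for done in by_user.values() if not done) / len(by_user)
    return user_rate, course_rate


toy = [
    ("u1", "c1", "L1", True),
    ("u1", "c1", "L2", False),  # silent L2 course hidden behind an active L1 course
    ("u2", "c2", "L2", False),
]
print(no_completion_rates(toy))
```

In the toy data one of two users is silent (50%), but two of three courses are silent (about 67%), which is the same direction of gap as 43.0% versus 51.1%.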

user: So for the 1,880 activated courses with no completion in the last 14 days, in which Unit of L1 and L2 does their last completion record fall?

assistant: The results are as follows:

---

## 📊 Unit distribution of the last completion for activated courses with no recent completion (1,879 courses)

> Note: this is 1 fewer than the earlier 1,880 (L2 shows 1,408 here rather than 1,409). The gap was attributed to a rounding difference in the bucketed query.

### Overview

| Metric | L1 | L2 |
|------|-----|-----|
| Courses with no recent completion | 471 | 1,408 |
| Never had a completion record | 76 (16.1%) | 316 (22.4%) |
| Have historical completion records | 395 (83.9%) | 1,092 (77.6%) |

### L1: Unit of the last completion

| Unit | Courses | Share | Cumulative |
|------|--------|------|------|
| **U00** | **149** | **31.6%** | 31.6% |
| U01 | 94 | 20.0% | 51.6% |
| U02 | 63 | 13.4% | 65.0% |
| U03 | 32 | 6.8% | 71.8% |
| U04 | 16 | 3.4% | 75.2% |
| U05~U09 | 41 | 8.7% | 83.9% |
| Never completed | 76 | 16.1% | - |

### L2: Unit of the last completion

| Unit | Courses | Share | Cumulative |
|------|--------|------|------|
| **U00** | **351** | **24.9%** | 24.9% |
| U01 | 242 | 17.2% | 42.1% |
| U02 | 157 | 11.2% | 53.3% |
| U03 | 87 | 6.2% | 59.4% |
| U04 | 65 | 4.6% | 64.1% |
| U05~U29 | 190 | 13.5% | 77.6% |
| Never completed | 316 | 22.4% | - |

---

The core conclusion is clear: **activated courses with no recent completion are heavily concentrated in the earliest units (U00~U02)**. For 65.0% of L1 courses and 53.3% of L2 courses, the last completion was at or before U02. Most silent courses stalled within these first few units.
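
The "last completion per course, bucketed by Unit" method can be sketched as below. This is an illustration under stated assumptions rather than the script that was actually run: the function name `last_unit_distribution` and the tuple layouts are hypothetical, and a course is taken to be a (character, level) pair as defined earlier in the conversation.

```python
from collections import Counter


def last_unit_distribution(activated, completions):
    """Bucket each activated course by the unit of its latest completion.

    activated: iterable of (character_id, level) pairs.
    completions: iterable of (character_id, level, unit, completed_at) tuples,
        where completed_at is any sortable timestamp.
    """
    latest = {}
    for char_id, level, unit, ts in completions:
        key = (char_id, level)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, unit)
    dist = Counter()
    for key in activated:
        # Courses with no completion at all get their own bucket.
        dist[latest[key][1] if key in latest else "never"] += 1
    return dist


activated = [(1, "L1"), (1, "L2"), (2, "L1")]
completions = [(1, "L1", "U00", 1), (1, "L1", "U02", 5), (2, "L1", "U01", 3)]
print(last_unit_distribution(activated, completions))
```

Keeping "never" as a separate bucket matches the tables above, where courses without any completion record are reported on their own row.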

user: Please run one more: for the courses whose last completion record is in L1 U00 (149 courses) or L1 U01 (94 courses), and in L2 U00 (351 courses) or L2 U01 (242 courses), how are those last completion records distributed across Lesson 1 to Lesson 5 of U00 and U01?

assistant: Here are the results:

---

## 📊 Lesson distribution for silent courses that last stopped in U00/U01

### L1 U00 (149 courses)

| Lesson | Courses | Share |
|--------|--------|------|
| L01 | 29 | 19.5% |
| L02 | 28 | 18.8% |
| L03 | 23 | 15.4% |
| L04 | 15 | 10.1% |
| **L05** | **54** | **36.2%** |

### L1 U01 (94 courses)

| Lesson | Courses | Share |
|--------|--------|------|
| L01 | 21 | 22.3% |
| L02 | 18 | 19.1% |
| L03 | 17 | 18.1% |
| L04 | 13 | 13.8% |
| **L05** | **25** | **26.6%** |

### L2 U00 (351 courses)

| Lesson | Courses | Share |
|--------|--------|------|
| L01 | 101 | 28.8% |
| L02 | 64 | 18.2% |
| L03 | 44 | 12.5% |
| L04 | 38 | 10.8% |
| **L05** | **104** | **29.6%** |

### L2 U01 (242 courses)

| Lesson | Courses | Share |
|--------|--------|------|
| L01 | 53 | 21.9% |
| L02 | 45 | 18.6% |
| L03 | 52 | 21.5% |
| L04 | 35 | 14.5% |
| **L05** | **57** | **23.6%** |

---

The four groups show a largely consistent pattern:

1. **L05 has the highest share**: in all four groups L05 is the largest stopping point (23.6%~36.2%), which suggests many users stopped after finishing the whole Unit, a "unit-level break point"
2. **L01 is second**: at 19.5%~28.8%, a large share also stopped after the very first lesson, an "abandon on entry" pattern
3. **L02→L04 generally decline**: in the middle lessons, fewer users stop the further they go, which suggests users who get past L01 tend to carry on until the end of the unit. The one exception is L2 U01, where L03 (21.5%) is higher than L02 (18.6%)

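user: Write all of the data-retrieval logic and analysis logic above into long-term memory

assistant: Everything has been written to long-term memory and today's log. The following was recorded:

- **`MEMORY.md`**: added two convention entries:
  - Definition of the activated-course dimension (character × level = one course)
  - Conventions for activated-course completion analysis (including the user/course dual perspective, the method for analysing the Unit/Lesson of the last completion, and the current results for reference)

- **`memory/2026-05-22.md`**: today's full work log, including the confirmed conventions, the results for each dimension, and the list of script files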
260
memory/2026-05-25-1116.md
Normal file
@ -0,0 +1,260 @@
# Session: 2026-05-25 11:16:08 GMT+8

- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2
- **Session ID**: e942f03c-fd78-4ddc-b742-d05ded398d38
- **Source**: feishu

## Conversation Summary

assistant: Key finding: **the "H5 page exposure" events (`H5 页面曝光时`) all come from anonymous users (accountId=0)**, but they do carry a `userId` field. I'll re-run the funnel using `userId` as the unified identifier.

assistant: Here is the full funnel analysis of the H5 purchase flow:

---

## H5 purchase flow funnel

### 📊 Event-level funnel

```
H5 page exposure        20,891  (100.0%)
    │
    ▼  -54.4%
Product switch click     9,530  ( 45.6%)
    │
    ▼  -99.0%
Buy-now button click        94  (  0.4%)
    │
    ▼  -1.1%
Payment success             93  (  0.4%)
```

### 👤 User-level funnel (deduplicated by userId)

```
H5 page exposure        6,245 users  (100.0%)  3.3 events per user
    │
    ▼  -71.2%
Product switch click    1,801 users  ( 28.8%)  5.3 events per user
    │
    ▼  -94.8%
Buy-now button click       93 users  (  1.5%)  1.0 events per user
    │
    ▼  -1.1%
Payment success            92 users  (  1.5%)  1.0 events per user
```

### 🔑 Key findings

| Metric | Value |
|---|---|
| **Exposure → switch conversion** | 28.8% (1,801/6,245) |
| **Switch → buy conversion** | 5.2% (93/1,801) |
| **Buy → payment success rate** | 98.9% (92/93) |
| **Users completing the full path** | **84** |
| **Exposed but did not switch** | 4,444 users (71.2% drop-off) |
| **Switched but did not buy** | 1,716 users (95.3% of the 1,801 who switched) |

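### ⚠️ Important notes

- **Step one, "H5 page exposure", consists entirely of anonymous users** (accountId=0), which indicates an exposure event fired on the H5 landing page before login
- **From step two onward all users are logged in** (accountId>0): users log in somewhere between exposure and switching products
- This also means `userId` is discontinuous between step one and the later steps, so **the real exposure → switch conversion may be higher than 28.8%** (some anonymous userIds become a new userId after login)
- The core bottleneck is the **switch → buy** step: only 5.2% of browsing users ended up clicking the buy button
- The payment step performs well: 98.9% of users who clicked buy completed payment
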
user: Let's confirm the relationship between accountId and userId: are they always 0 at the same time?

assistant: The result is clear:

## accountId × userId relationship

| accountId | userId | Records | Share |
|:---:|:---:| ---:| ---:|
| =0 | =0 | 56 | 0.2% |
| =0 | >0 | **25,919** | **72.6%** |
| >0 | =0 | 0 | 0.0% |
| >0 | >0 | 9,716 | 27.2% |

**Conclusion: they are not 0 at the same time. The two fields mean different things:**

- **accountId = 0** → an **anonymous user** who has not logged in
- **userId > 0** → a device/session-level identifier that **exists in both the anonymous and logged-in states**

### Breakdown by step

| Step | accountId=0 | accountId>0 |
|---|---|---|
| H5 page exposure | 20,891 (all anonymous) | 0 |
| Product switch click | 1 | 9,529 (all logged in) |
| Buy-now button click | 0 | 94 |
| Payment success | 0 | 93 |
| Claim course and add WeChat (`领课加微`) / firstScreen | 2,731 (and no userId field) | 0 |
| Claim course and add WeChat (`领课加微`) / secondScreen | 2,352 (and no userId field) | 0 |

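### Core findings

1. **H5 page exposure**: the user has not logged in yet, so accountId=0, but userId is a device-level identifier (non-zero)
2. **From the product switch click onward**: the user is logged in and accountId becomes the real account ID
3. **userId exists across login states, but the linkage is incomplete**: the 28.8% exposure → switch overlap computed earlier (1,801/6,245) was linked through userId, but because userId may change across login, **the real conversion rate may be higher**
4. **Claim course and add WeChat (`领课加微`)** is a special case: accountId=0 and there is no userId field at all, so it is purely anonymous tracking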
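
The accountId × userId cross-tabulation above can be reproduced with a few lines. This is a sketch only: it assumes the records have been loaded as plain dicts, and the function name `id_crosstab` and the "missing" bucket label are hypothetical.

```python
from collections import Counter


def id_crosstab(records):
    """Count records by the (accountId, userId) zero / non-zero / missing pattern."""
    def bucket(value):
        if value is None:
            return "missing"  # field absent, as reported for the 领课加微 events
        return "=0" if value == 0 else ">0"

    return Counter(
        (bucket(r.get("accountId")), bucket(r.get("userId"))) for r in records
    )


sample = [
    {"accountId": 0, "userId": 5},   # anonymous, device-level userId present
    {"accountId": 0},                # anonymous, no userId field
    {"accountId": 7, "userId": 9},   # logged in
    {"accountId": 0, "userId": 0},
]
print(id_crosstab(sample))
```

Separating "missing" from "=0" keeps events without a `userId` field from being mixed into either userId column.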

user: From now on, compute all growth_activity_behavior statistics by userId and ignore accountId

assistant: Got it, recorded.

assistant: Written to long-term memory. From now on, statistics on `growth_activity_behavior` deduplicate users by `userId` and do not rely on `accountId`.

user: Then recompute the full funnel data for the sub-behaviors of the H5 purchase flow

assistant: Under the new convention (deduplicated by userId), the full H5 purchase flow funnel is:

---

## H5 purchase flow funnel (by userId)

### 📊 Event-level funnel

```
H5 page exposure        20,891  ██████████████████████████  100.0%
                                ▼ -54.4%
Product switch click     9,530  ████████████                 45.6%
                                ▼ -99.0%
Buy-now button click        94  ▏                             0.4%
                                ▼ -1.1%
Payment success             93  ▏                             0.4%
```

### 👤 User-level funnel (deduplicated by userId)

```
H5 page exposure         6,245  ██████████████████████████  100.0%
                                ▼ -71.2%
Product switch click     1,801  ████████                     28.8%
                                ▼ -94.8%
Buy-now button click        93  ▏                             1.5%
                                ▼ -1.1%
Payment success             92  ▏                             1.5%
```

### 🔗 User overlap between steps

| Link | Overlapping users | Notes |
|---|---|---|
| Exposure → switch | 1,801 | 28.8% of the 6,245 exposed users went on to switch |
| Switch → buy | 85 | 4.7% of the 1,801 switching users clicked buy |
| Buy → pay | 92 | 98.9% of the 93 buying users completed payment |
| **Full path** | **84** | Completed all 4 steps |

### 📋 User behavior paths

| Path | Users | Share |
|---|---|---|
| Exposure → leave | 4,436 | 71.0% |
| Exposure → switch → leave | 1,716 | 27.5% |
| Exposure → switch → buy → pay ✅ | 84 | 1.3% |
| Exposure → buy → pay (skipping switch) | 8 | 0.1% |
| Exposure → switch → buy → not paid | 1 | 0.02% |

---

**Key bottleneck:** the switch → buy step loses the most users (95.3% drop-off): 1,716 users browsed product details but did not click the buy button. Exposure → switch also loses 71.2%, but since exposure is broad landing-page traffic, that ratio is within the normal range.
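
user: Users do not necessarily have to go through the switch step before clicking the buy button; they may also click the buy button directly after exposure

assistant: Indeed, the path distribution above already shows this: **8 users went straight from exposure to buy** without going through the product switch step.

This also means the funnel conversion computed earlier in strict step order was on the conservative side. The actual purchase-decision funnel should merge the two paths:

```
            Exposure: 6,245 users
              /                \
  Switch: 1,801 users     Direct buy: 8 users
        |                       |
   Buy: 85 users             8 users
        |                       |
   Pay: 84 users             8 users
```
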
### Re-aggregated by final outcome:

| Funnel level | userId count | Share of exposure |
|---|---|---|
| Exposure | 6,245 | 100.0% |
| Had a switch action | 1,801 | 28.8% |
| Had a buy action (via switch + direct) | 93 | **1.5%** |
| Had a payment action | 92 | **1.5%** |

The buy step is actually 93 users: 85 who reached buy through switch plus 8 who went to buy directly from exposure. Buy → pay converts almost completely, at 98.9%.
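
The merged-path counting above can be sketched with set operations on `userId`. This is a hedged illustration, not the query that produced the figures: the function name `user_funnel` and the step labels `exposure`, `switch`, `buy`, `pay` are hypothetical stand-ins for the real sub-behavior names.

```python
def user_funnel(events):
    """events: iterable of (user_id, step) pairs, step in the four funnel steps."""
    users = {"exposure": set(), "switch": set(), "buy": set(), "pay": set()}
    for user_id, step in events:
        users[step].add(user_id)
    exposed = users["exposure"]
    buyers = users["buy"] & exposed
    return {
        "exposed": len(exposed),
        "switched": len(users["switch"] & exposed),
        # Buyers are counted whether or not they switched products first.
        "bought_via_switch": len(buyers & users["switch"]),
        "bought_direct": len(buyers - users["switch"]),
        "paid": len(users["pay"] & buyers),
    }


events = [
    (1, "exposure"), (2, "exposure"), (3, "exposure"),
    (1, "switch"), (1, "buy"), (1, "pay"),
    (2, "buy"), (2, "pay"),  # user 2 buys directly, skipping the switch step
]
print(user_funnel(events))
```

Counting buyers as the union of the two paths avoids understating the buy step when users skip product switching.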

user: What values does `info` contain?

assistant: The `info` field is **empty in all** 35,691 records, across all six sub-behaviors without exception. The field is defined but has never been written to, so it can be ignored.

user: Next I'd like to see the data structure of the records with buryingPointId 1000 in the index user_behavior_buried_points

assistant: Here is the full data structure for `buryingPointId=1000`:

---

## buryingPointId=1000 (name not set, 23,447 records in total)

### Field list

| Field | Non-empty rate | Notes |
|---|---|---|
| `accountId` | 100% | Account ID |
| `accountName` | 100% | Masked phone number |
| `characterId` | 100% | Character ID |
| `characterName` | 100% | Character name |
| `buryingPointId` | 100% | Always 1000 |
| `buryingPointName` | **0%** | All empty |
| `buryingPointSubId` | 100% | Sub-event ID (see below) |
| `buryingPointSubName` | **0%** | All empty |
| `cType` | **62%** | Component type |
| `cId` | **62%** | Component number |
| `componentId` | 100% | **Always 0** |
| `componentName` | **62%** | Component name |
| `lessonId` / `lessonName` | **62%** | Lesson information |
| `unitId` / `unitName` | **84%** | Unit information |
| `seasonPackageId` / `seasonPackageName` | **84%** | Course package |
| `ex2` | 100% | Integer, 1~12 |
| `ex3` | **45%** | Float (only populated when subId=4) |
| `stepDuration` | 100% | Step duration (milliseconds); median 5 ms, maximum 388 s |
| `versionId` | 100% | Version number |
| `activeTime` | 100% | Timestamp (seconds) |
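
A minimal sketch of how a non-empty rate per field could be computed, assuming the index documents have been loaded as plain dicts. The function name `non_empty_rates` and the treatment of `None` and `""` as empty are assumptions; the actual profiling query is not shown in the log.

```python
def non_empty_rates(docs, fields):
    """Share of documents in which each field is present and non-empty."""
    total = len(docs)
    rates = {}
    for field in fields:
        # 0 counts as a value, so a field that is always 0 is still 100% non-empty.
        filled = sum(1 for d in docs if d.get(field) not in (None, ""))
        rates[field] = filled / total if total else 0.0
    return rates


docs = [
    {"cType": "mid_dialog_select", "componentId": 0},
    {"cType": "", "componentId": 0},
]
print(non_empty_rates(docs, ["cType", "componentId", "ex3"]))
```

Under this definition `componentId` reports 100% even though every value is 0, which is consistent with how the table above describes that field.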

### buryingPointSubId distribution

| subId | Count | Inferred meaning |
|:---:| ---:| --- |
| 1 | 10,967 | Start / enter |
| 4 | 10,637 | Complete / submit (ex3 is populated here) |
| 2 | 1,396 | Intermediate step 1 |
| 3 | 447 | Intermediate step 2 |

**1 and 4 occur roughly in pairs** (10,967 vs 10,637), and `ex3` has a float value only when subId=4, which looks like an "enter → complete" event pair; 2 and 3 are intermediate states.

### cType distribution (top 10)

| cType | Count |
|---|---|
| (empty) | 7,835 |
| mid_dialog_select | 4,103 |
| mid_dialog_fillin | 3,770 |
| core_listening_drag | 2,231 |
| mid_grammar_cloze | 1,982 |
| core_listening_choose | 1,561 |
| mid_vocab_item | 761 |
| mid_grammar_sentence | 595 |
| mid_sentence_dialogue | 557 |
| mid_dialog_repeat | 37 |

### versionId distribution

| Version | Count |
|---|---|
| 1.2.2 | 8,897 |
| 1.3.1 | 7,777 |
| 1.2.4 | 5,041 |
| 1.2.3 | 1,732 |

### seasonPackageId distribution

| seasonPackageId | Count | Notes |
|:---:| ---:| --- |
| 1 | 13,488 | Prologue |
| 0 | 8,767 | Not in any course package |
| 2 | 1,179 | Season 2 package |
| 12 | 12 | |
| 14 | 1 | |

---

**Summary:** this is a component-level interaction tracking event covering several exercise types, including dialogue selection, fill-in, listening drag-and-drop, and grammar cloze. subId 1→4 is a typical "start → complete" event pair, and ex3 records a float metric on completion. About 38% of records carry no course context at all (seasonPackageId=0), possibly from standalone practice outside a course.
12
memory/2026-05-25.md
Normal file
@ -0,0 +1,12 @@
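
## Batch lesson-record query (1,676 rows)

[requested by 陈逸鸫, permission confirmed by 李承龙]

- Sheet: https://makee-interactive.feishu.cn/sheets/RFIJsXT8FhGHhctY4RwczcOfnac (Sheet1, sheet_id=55b0eb)
- Columns: column D (lesson record returned by 小溪) + column E (update time); column C (request status → `已返回`, i.e. returned)
- Phone matching (column J): 961 rows have a user ID and are queried directly; 715 rows have no ID and need phone matching, of which 119 matched and 596 did not
- Database query: 1,077 deduplicated users, 1,065 with a course assignment, 1,192 characters with lesson records, 201 paid
- Output format (sales | user | current position | latest lesson | study minutes | payment status): `销售:xxx | 用户:xxx | 当前:{level}{type}-{chapter} | 最近行课:date | 学习{X}min | {付费状态}`
- Script: scripts/batch_course_record_query.py
- All 1,676 pending rows were processed and written back successfully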
450
scripts/batch_course_record_query.py
Normal file
@ -0,0 +1,450 @@
#!/usr/bin/env python3
"""
Batch lesson-record query: look up each user's current progress, latest lesson,
study time, and payment status.

Input:  /tmp/pending_rows.json (pending rows pulled from the Feishu sheet)
Output: /tmp/results.jsonl (one JSON object per line, with row + result fields)
"""

import json
import os
import re
import sys
from collections import defaultdict

try:
    import psycopg2
    import psycopg2.extras
except ImportError:
    print("ERROR: psycopg2 is required")
    sys.exit(1)

DB_HOST = "bj-postgres-16pob4sg.sql.tencentcdb.com"
DB_PORT = 28591
DB_USER = "ai_member"
DB_NAME = "vala_bi"
SHARD_COUNT = 8

COURSE_LEVEL_MAP = {"A1": "L1", "A2": "L2"}


def get_password():
    # Prefer the environment variable; fall back to ../secrets.env relative to this script.
    pw = os.environ.get("PG_ONLINE_PASSWORD", "")
    if pw:
        return pw
    secrets_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "secrets.env")
    if os.path.exists(secrets_path):
        with open(secrets_path) as f:
            for line in f:
                if line.startswith("PG_ONLINE_PASSWORD="):
                    return line.strip().split("=", 1)[1].strip("'\"")
    raise RuntimeError("PG_ONLINE_PASSWORD not found")


def get_conn():
    return psycopg2.connect(
        host=DB_HOST, port=DB_PORT, user=DB_USER,
        password=get_password(), dbname=DB_NAME, connect_timeout=30
    )


def mask_phone(phone):
    # 13003632311 -> 130****2311, the form match_phones compares against bi_vala_app_account.tel
    return f"{phone[:3]}****{phone[-4:]}"


def batch_query_in(cur, sql_template, params, chunk_size=500):
    """Run an IN query in chunks.

    The single %s in sql_template is replaced with a comma-separated list of
    placeholders, so the template must not contain any other % characters.
    """
    results = []
    for i in range(0, len(params), chunk_size):
        chunk = params[i:i + chunk_size]
        placeholders = ",".join(["%s"] * len(chunk))
        sql = sql_template % placeholders
        cur.execute(sql, chunk)
        results.extend(cur.fetchall())
    return results


def match_phones(phones, conn):
    if not phones:
        return {}, [], []

    # Clean a phone number: strip whitespace and a trailing ".0" / ".00"
    def clean_phone(p):
        p = str(p).strip()
        # Handle float formatting: 13003632311.0 -> 13003632311
        if '.' in p:
            parts = p.split('.')
            if parts[1] in ('0', '00'):
                p = parts[0]
        # Validate that it is an 11-digit mobile number
        if re.match(r'^1\d{10}$', p):
            return p
        return None

    cleaned = [(clean_phone(p), p) for p in phones]
    valid_phones = [(cp, op) for cp, op in cleaned if cp]
    if not valid_phones:
        # Nothing is queryable, so every input phone is unmatched.
        return {}, [], list(phones)

    masked_to_phones = {}
    for cp, op in valid_phones:
        m = mask_phone(cp)
        masked_to_phones.setdefault(m, []).append(op)

    masks = list(masked_to_phones.keys())
    cur = conn.cursor()
    placeholders = ",".join(["%s"] * len(masks))
    cur.execute(f"""
        SELECT id AS account_id, tel
        FROM bi_vala_app_account
        WHERE tel IN ({placeholders}) AND status = 1 AND deleted_at IS NULL
    """, masks)
    rows = cur.fetchall()
    cur.close()

    # masked phone -> account_id
    # Note: the mask keeps only the first 3 and last 4 digits, so it is not unique;
    # when several accounts share a mask, the first row returned wins.
    masked_to_account = {}
    for aid, masked in rows:
        if masked not in masked_to_account:
            masked_to_account[masked] = aid

    # original phone -> account_id
    phone_to_account = {}
    for op in phones:
        cp = clean_phone(op)
        if cp:
            m = mask_phone(cp)
            if m in masked_to_account:
                phone_to_account[op] = masked_to_account[m]

    account_ids = list(set(phone_to_account.values()))
    unmatched = [p for p in phones if p not in phone_to_account]
    return phone_to_account, account_ids, unmatched


def query_course_details(cur, account_ids):
    if not account_ids:
        return {}
    # course_type values ('体验课' = trial course, '正式课' = paid course) are used as-is in the output.
    rows = batch_query_in(cur, """
        SELECT account_id, course_level,
               CASE WHEN expire_time IS NULL THEN '体验课' ELSE '正式课' END AS course_type
        FROM bi_user_course_detail
        WHERE account_id IN (%s) AND deleted_at IS NULL
    """, account_ids)
    result = {}
    for aid, cl, ct in rows:
        # Normalise the key first so the membership test and the assignment
        # use the same type; the first row returned per account wins.
        aid = int(aid)
        if aid not in result:
            result[aid] = (cl, ct)
    return result


def query_chapter_play_records(cur, char_ids):
    """Query each character's latest lesson record and total study time."""
    if not char_ids:
        return {}, {}

    # Load the character IDs into a temp table so every shard can join against it.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS _tmp_char_ids (id int PRIMARY KEY) ON COMMIT DROP")
    cur.execute("DELETE FROM _tmp_char_ids")
    psycopg2.extras.execute_values(cur,
        "INSERT INTO _tmp_char_ids (id) VALUES %s",
        [(cid,) for cid in char_ids]
    )

    # Build a UNION ALL over all shards of the lesson play-record table.
    union_parts = []
    for i in range(SHARD_COUNT):
        union_parts.append(
            f"SELECT cpr.user_id, cpr.chapter_id, cpr.chapter_unique_id, cpr.created_at, cpr.play_status "
            f"FROM bi_user_chapter_play_record_{i} cpr "
            f"JOIN _tmp_char_ids t ON cpr.user_id = t.id "
            f"WHERE cpr.deleted_at IS NULL"
        )

    all_plays_sql = " UNION ALL ".join(union_parts)

    # Latest lesson record per character (any play_status).
    cur.execute(f"""
        WITH all_plays AS ({all_plays_sql})
        , latest AS (
            SELECT DISTINCT ON (user_id) user_id, chapter_id, chapter_unique_id, created_at AS latest_date
            FROM all_plays
            ORDER BY user_id, created_at DESC
        )
        SELECT user_id, chapter_id, chapter_unique_id, latest_date FROM latest
    """)

    char_latest = {}
    chapter_ids = set()
    for uid, ch_id, _cu_id, ldate in cur.fetchall():
        char_latest[uid] = (ch_id, ldate)
        chapter_ids.add(ch_id)

    # All chapter_unique_ids with play_status = 1 for each character.
    cur.execute(f"""
        WITH all_plays AS ({all_plays_sql})
        SELECT user_id, chapter_unique_id
        FROM all_plays
        WHERE play_status = 1
    """)

    char_cuids = defaultdict(set)
    all_cuids = set()
    for uid, cuid in cur.fetchall():
        char_cuids[uid].add(cuid)
        all_cuids.add(cuid)

    # Component durations.
    char_total_ms = defaultdict(int)
    if all_cuids:
        cuid_list = list(all_cuids)

        # Temp table holding the chapter_unique_ids.
        cur.execute("CREATE TEMP TABLE IF NOT EXISTS _tmp_cuids (cuid text PRIMARY KEY) ON COMMIT DROP")
        cur.execute("DELETE FROM _tmp_cuids")
        psycopg2.extras.execute_values(cur,
            "INSERT INTO _tmp_cuids (cuid) VALUES %s",
            [(c,) for c in cuid_list]
        )

        comp_parts = []
        for i in range(SHARD_COUNT):
            comp_parts.append(
                f"SELECT comp.chapter_unique_id, SUM(comp.interval_time) AS total_ms "
                f"FROM bi_user_component_play_record_{i} comp "
                f"JOIN _tmp_cuids t ON comp.chapter_unique_id = t.cuid "
                f"GROUP BY comp.chapter_unique_id"
            )
        comp_union = " UNION ALL ".join(comp_parts)

        cur.execute(f"""
            SELECT chapter_unique_id, SUM(total_ms) AS total_ms
            FROM ({comp_union}) t2
            GROUP BY chapter_unique_id
        """)

        cuid_time = {}
        for cuid, tms in cur.fetchall():
            cuid_time[cuid] = (tms or 0)

        for uid, cuids in char_cuids.items():
            total = sum(cuid_time.get(c, 0) for c in cuids)
            char_total_ms[uid] = total

    return char_latest, char_total_ms


def query_chapter_course_map(cur, chapter_ids):
    if not chapter_ids:
        return {}
    rows = batch_query_in(cur, """
        SELECT id, course_level, course_season, course_unit, course_lesson
        FROM bi_level_unit_lesson WHERE id IN (%s)
    """, list(chapter_ids))
    return {r[0]: (r[1], r[2], r[3], r[4]) for r in rows}


def query_payment_status(cur, account_ids):
    if not account_ids:
        return {}
    rows = batch_query_in(cur, """
        SELECT account_id, MAX(CASE WHEN order_status IN (3,4) THEN 1 ELSE 0 END) AS has_order
        FROM bi_vala_order WHERE account_id IN (%s) AND deleted_at IS NULL
        GROUP BY account_id
    """, account_ids)
    # '已付费' = paid, '未付费' = unpaid; these labels go into the output as-is.
    status = {int(aid): "已付费" if ho == 1 else "未付费" for aid, ho in rows}
    for aid in account_ids:
        if int(aid) not in status:
            status[int(aid)] = "未付费"
    return status


def query_characters(cur, account_ids):
    if not account_ids:
        return {}
    rows = batch_query_in(cur, """
        SELECT account_id, id AS character_id
        FROM bi_vala_app_character
        WHERE account_id IN (%s) AND nickname IS NOT NULL AND nickname != '' AND deleted_at IS NULL
    """, account_ids)
    result = defaultdict(list)
    for aid, cid in rows:
        result[int(aid)].append(int(cid))
    return result


def main():
    print("Loading pending rows...")
    with open("/tmp/pending_rows.json") as f:
        pending_rows = json.load(f)
    print(f"{len(pending_rows)} rows pending")

    with_id = []
    without_id = []
    for row in pending_rows:
        # Defensive: coerce to str in case the sheet export yields numbers.
        uid_str = str(row.get("user_id") or "").strip()
        if uid_str:
            try:
                row["account_id"] = int(float(uid_str))
                with_id.append(row)
                continue
            except (ValueError, TypeError):
                pass  # unparsable user ID: fall back to phone matching
        phone = str(row.get("phone") or "").strip()
        if phone:
            row["phone_clean"] = phone
            without_id.append(row)

    print(f"With user ID: {len(with_id)}, need phone matching: {len(without_id)}")

conn = get_conn()
|
||||||
|
cur = conn.cursor()
|
||||||
|
|
||||||
|
# Step 1: 手机号匹配
|
||||||
|
phone_to_account = {}
|
||||||
|
if without_id:
|
||||||
|
phones = [r["phone_clean"] for r in without_id]
|
||||||
|
phone_to_account, matched_ids, unmatched = match_phones(phones, conn)
|
||||||
|
print(f"手机号匹配: {len(phone_to_account)} 成功, {len(unmatched)} 未匹配")
|
||||||
|
|
||||||
|
for row in without_id:
|
||||||
|
p = row["phone_clean"]
|
||||||
|
if p in phone_to_account:
|
||||||
|
row["account_id"] = int(phone_to_account[p])
|
||||||
|
with_id.append(row)
|
||||||
|
|
||||||
|
print(f"总计可查询用户: {len(with_id)}")
|
||||||
|
|
||||||
|
all_account_ids = list(set(r["account_id"] for r in with_id))
|
||||||
|
print(f"去重 account_id: {len(all_account_ids)}")
|
||||||
|
|
||||||
|
if not all_account_ids:
|
||||||
|
print("没有可查询的用户,退出")
|
||||||
|
conn.close()
|
||||||
|
return
|
||||||
|
|
||||||
|
# Step 2: 查询课程分配
|
||||||
|
print("查询课程分配...")
|
||||||
|
course_details = query_course_details(cur, all_account_ids)
|
||||||
|
print(f" 有课程: {len(course_details)}")
|
||||||
|
|
||||||
|
# Step 3: 查询角色
|
||||||
|
print("查询角色...")
|
||||||
|
account_chars = query_characters(cur, all_account_ids)
|
||||||
|
all_char_ids = []
|
||||||
|
for chars in account_chars.values():
|
||||||
|
all_char_ids.extend(chars)
|
||||||
|
print(f" 角色数: {len(all_char_ids)}")
|
||||||
|
|
||||||
|
# Step 4: 查询行课记录
|
||||||
|
print("查询行课记录 (可能需要几分钟)...")
|
||||||
|
char_latest, char_total_ms = query_chapter_play_records(cur, all_char_ids)
|
||||||
|
print(f" 有行课记录: {len(char_latest)}")
|
||||||
|
|
||||||
|
# Step 5: chapter 映射
|
||||||
|
all_chapter_ids = set()
|
||||||
|
for ch_id, _ in char_latest.values():
|
||||||
|
all_chapter_ids.add(ch_id)
|
||||||
|
print("查询课程映射...")
|
||||||
|
course_chapter_map = query_chapter_course_map(cur, all_chapter_ids)
|
||||||
|
print(f" chapter: {len(course_chapter_map)}")
|
||||||
|
|
||||||
|
# Step 6: 付费状态
|
||||||
|
print("查询付费状态...")
|
||||||
|
pay_status = query_payment_status(cur, all_account_ids)
|
||||||
|
print(f" 付费: {sum(1 for v in pay_status.values() if v == '已付费')}")
|
||||||
|
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
# Step 7: 组装结果
|
||||||
|
print("组装结果...")
|
||||||
|
results = []
|
||||||
|
|
||||||
|
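# Deduplicate by sheet row: keep the first (account_id, sales) seen per row. Note: row_aid_map is not read again below.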
|
||||||
|
row_aid_map = {}
|
||||||
|
for row in with_id:
|
||||||
|
rn = row["row"]
|
||||||
|
aid = row["account_id"]
|
||||||
|
sales = (row.get("sales") or "").strip()
|
||||||
|
if rn not in row_aid_map:
|
||||||
|
row_aid_map[rn] = (aid, sales)
|
||||||
|
|
||||||
|
for row in with_id:
|
||||||
|
rn = row["row"]
|
||||||
|
aid = row["account_id"]
|
||||||
|
sales = (row.get("sales") or "").strip()
|
||||||
|
|
||||||
|
# 课程描述
|
||||||
|
cd = course_details.get(aid)
|
||||||
|
if cd:
|
||||||
|
cl, ct = cd
|
||||||
|
level_display = COURSE_LEVEL_MAP.get(cl, cl)
|
||||||
|
course_desc = f"{level_display}{ct}"
|
||||||
|
else:
|
||||||
|
course_desc = "暂无课程"
|
||||||
|
|
||||||
|
# 最近行课和当前章节
|
||||||
|
chars = account_chars.get(aid, [])
|
||||||
|
latest_date = None
|
||||||
|
chapter_pos = ""
|
||||||
|
for cid in chars:
|
||||||
|
if cid in char_latest:
|
||||||
|
ch_id, ldate = char_latest[cid]
|
||||||
|
if ldate:
|
||||||
|
if latest_date is None or ldate > latest_date:
|
||||||
|
latest_date = ldate
|
||||||
|
if ch_id in course_chapter_map:
|
||||||
|
lvl, season, unit, lesson = course_chapter_map[ch_id]
|
||||||
|
chapter_pos = f"{lvl}-{season}-{unit}-{lesson}"
|
||||||
|
|
||||||
|
if not chapter_pos:
|
||||||
|
chapter_pos = "无记录"
|
||||||
|
date_str = latest_date.strftime("%Y-%m-%d") if latest_date else "无记录"
|
||||||
|
|
||||||
|
# 学习时长
|
||||||
|
total_min = 0
|
||||||
|
for cid in chars:
|
||||||
|
total_min += (char_total_ms.get(cid, 0) or 0)
|
||||||
|
total_min = round(total_min / 1000 / 60)
|
||||||
|
|
||||||
|
# 付费状态
|
||||||
|
pay = pay_status.get(aid, "未付费")
|
||||||
|
|
||||||
|
result_text = f"销售:{sales} | 用户:{aid} | 当前:{course_desc}-{chapter_pos} | 最近行课:{date_str} | 学习{total_min}min | {pay}"
|
||||||
|
|
||||||
|
results.append({
|
||||||
|
"row": rn,
|
||||||
|
"result": result_text,
|
||||||
|
"aid": aid
|
||||||
|
})
|
||||||
|
|
||||||
|
# 未匹配的
|
||||||
|
for row in without_id:
|
||||||
|
p = row.get("phone_clean", "")
|
||||||
|
if p not in phone_to_account:
|
||||||
|
results.append({
|
||||||
|
"row": row["row"],
|
||||||
|
"result": "未匹配到用户",
|
||||||
|
"aid": 0
|
||||||
|
})
|
||||||
|
|
||||||
|
# 输出
|
||||||
|
with open("/tmp/results.jsonl", "w") as f:
|
||||||
|
for r in results:
|
||||||
|
f.write(json.dumps(r, ensure_ascii=False) + "\n")
|
||||||
|
|
||||||
|
print(f"\n✅ 查询完成: {len(results)} 条结果")
|
||||||
|
unrecorded = sum(1 for r in results if "无记录" in r.get("result","") and "未匹配" not in r.get("result",""))
|
||||||
|
unmatched_count = sum(1 for r in results if "未匹配" in r.get("result",""))
|
||||||
|
paid_count = sum(1 for r in results if "已付费" in r.get("result",""))
|
||||||
|
print(f" 有记录: {len(results) - unrecorded - unmatched_count}")
|
||||||
|
print(f" 无记录: {unrecorded}")
|
||||||
|
print(f" 未匹配: {unmatched_count}")
|
||||||
|
print(f" 已付费: {paid_count}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
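In `main()` above, a row whose `user_id` cannot be parsed and whose `phone` is empty lands in neither `with_id` nor `without_id`, so it never reaches `/tmp/results.jsonl`. The following is a minimal sketch of the same partitioning step that also returns such rows. The helper name `split_rows` and the third `dropped` list are hypothetical and not part of the committed script.

```python
def split_rows(rows):
    """Partition input rows into (with_id, without_id, dropped).

    with_id    : rows whose user_id parses to an integer account_id
    without_id : rows that only carry a phone number
    dropped    : rows with neither a usable id nor a phone number
    """
    with_id, without_id, dropped = [], [], []
    for row in rows:
        uid_str = str(row.get("user_id") or "").strip()
        phone = str(row.get("phone") or "").strip()

        account_id = None
        if uid_str:
            try:
                # Spreadsheet exports often render ids as "123.0"
                account_id = int(float(uid_str))
            except (ValueError, OverflowError):
                account_id = None

        if account_id is not None:
            with_id.append({**row, "account_id": account_id})
        elif phone:
            without_id.append({**row, "phone_clean": phone})
        else:
            dropped.append(row)
    return with_id, without_id, dropped
```

Returning the dropped rows explicitly makes it possible to report them alongside the unmatched phone numbers instead of losing them silently.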
91
scripts/check_account_userid.py
Normal file
@ -0,0 +1,91 @@
|
|||||||
|
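import json, os, subprocess

ES_URL = "https://es-7vd7jcu9.public.tencentelasticsearch.com:9200"
# Credentials in "user:password" form, read from the environment rather than hardcoded.
# The variable name ES_AUTH is an assumption; use whatever name your deployment defines.
AUTH = os.environ["ES_AUTH"]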
|
||||||
|
|
||||||
|
def count_query(must_clauses):
|
||||||
|
q = {"query": {"bool": {"must": must_clauses}}}
|
||||||
|
cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
|
||||||
|
f"{ES_URL}/growth_activity_behavior/_count", "-d", json.dumps(q)]
|
||||||
|
r = subprocess.run(cmd, capture_output=True, text=True)
|
||||||
|
return json.loads(r.stdout).get("count", 0)
|
||||||
|
|
||||||
|
total = count_query([])
|
||||||
|
|
||||||
|
# 四种组合
|
||||||
|
nonzero_account = {"bool": {"must_not": [{"term": {"accountId": 0}}]}}
|
||||||
|
nonzero_user = {"bool": {"must_not": [{"term": {"userId": 0}}]}}
|
||||||
|
zero_account = {"term": {"accountId": 0}}
|
||||||
|
zero_user = {"term": {"userId": 0}}
|
||||||
|
|
||||||
|
a0_u0 = count_query([zero_account, zero_user])
|
||||||
|
a0_ux = count_query([zero_account, nonzero_user])
|
||||||
|
ax_u0 = count_query([nonzero_account, zero_user])
|
||||||
|
ax_ux = count_query([nonzero_account, nonzero_user])
|
||||||
|
|
||||||
|
print("=== growth_activity_behavior 中 accountId x userId 组合 ===\n")
|
||||||
|
print(f"{'accountId':>12} {'userId':>12} {'条数':>10} {'占比':>10}")
|
||||||
|
print("-" * 48)
|
||||||
|
print(f"{'=0':>12} {'=0':>12} {a0_u0:>10} {a0_u0/total*100:>9.1f}%")
|
||||||
|
print(f"{'=0':>12} {'>0':>12} {a0_ux:>10} {a0_ux/total*100:>9.1f}%")
|
||||||
|
print(f"{'>0':>12} {'=0':>12} {ax_u0:>10} {ax_u0/total*100:>9.1f}%")
|
||||||
|
print(f"{'>0':>12} {'>0':>12} {ax_ux:>10} {ax_ux/total*100:>9.1f}%")
|
||||||
|
print(f"{'总计':>12} {'':>12} {total:>10}")
|
||||||
|
|
||||||
|
# 按 subBehavior 细分 - ES aggregation
|
||||||
|
print("\n=== 按 behavior + subBehavior 细分 ===\n")
|
||||||
|
agg_query = {
|
||||||
|
"size": 0,
|
||||||
|
"aggs": {
|
||||||
|
"behaviors": {
|
||||||
|
"terms": {"field": "behavior", "size": 10},
|
||||||
|
"aggs": {
|
||||||
|
"subs": {
|
||||||
|
"terms": {"field": "subBehavior", "size": 10},
|
||||||
|
"aggs": {
|
||||||
|
"a0_u0": {"filter": {"bool": {"must": [zero_account, zero_user]}}},
|
||||||
|
"a0_ux": {"filter": {"bool": {"must": [zero_account, nonzero_user]}}},
|
||||||
|
"ax_u0": {"filter": {"bool": {"must": [nonzero_account, zero_user]}}},
|
||||||
|
"ax_ux": {"filter": {"bool": {"must": [nonzero_account, nonzero_user]}}}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
|
||||||
|
f"{ES_URL}/growth_activity_behavior/_search", "-d", json.dumps(agg_query)]
|
||||||
|
r = subprocess.run(cmd, capture_output=True, text=True)
|
||||||
|
data = json.loads(r.stdout)
|
||||||
|
|
||||||
|
header = f"{'behavior / subBehavior':<40} {'总计':>8} {'a=0,u=0':>8} {'a=0,u>0':>8} {'a>0,u=0':>8} {'a>0,u>0':>8}"
|
||||||
|
print(header)
|
||||||
|
print("-" * len(header))
|
||||||
|
|
||||||
|
for bb in data["aggregations"]["behaviors"]["buckets"]:
|
||||||
|
bname = bb["key"]
|
||||||
|
for sb in bb["subs"]["buckets"]:
|
||||||
|
sname = sb["key"]
|
||||||
|
n = sb["doc_count"]
|
||||||
|
a0u0 = sb["a0_u0"]["doc_count"]
|
||||||
|
a0ux = sb["a0_ux"]["doc_count"]
|
||||||
|
axu0 = sb["ax_u0"]["doc_count"]
|
||||||
|
axux = sb["ax_ux"]["doc_count"]
|
||||||
|
label = f"{bname} / {sname}"
|
||||||
|
print(f"{label:<40} {n:>8} {a0u0:>8} {a0ux:>8} {axu0:>8} {axux:>8}")
|
||||||
|
|
||||||
|
# userId 不存在的记录
|
||||||
|
print(f"\n=== userId 字段不存在 的记录 ===")
|
||||||
|
u_nonexist = count_query([{"bool": {"must_not": [{"exists": {"field": "userId"}}]}}])
|
||||||
|
print(f"userId 字段不存在: {u_nonexist}")
|
||||||
|
|
||||||
|
# 抽 userId=0 的样本
|
||||||
|
cmd = ["curl", "-s", "-u", AUTH, "-H", "Content-Type: application/json",
|
||||||
|
f"{ES_URL}/growth_activity_behavior/_search?size=3",
|
||||||
|
"-d", '{"track_total_hits":true,"query":{"term":{"userId":0}},"_source":true}']
|
||||||
|
r = subprocess.run(cmd, capture_output=True, text=True)
|
||||||
|
data = json.loads(r.stdout)
|
||||||
|
print(f"\nuserId=0 记录数: {data['hits']['total']['value']}")
|
||||||
|
for h in data['hits']['hits']:
|
||||||
|
print(f" {json.dumps(h['_source'], ensure_ascii=False)}")
|
||||||
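One caveat about the script above: a `must_not` on `{"term": {"userId": 0}}` also matches documents that have no `userId` field at all, so the `>0` columns include those documents. The sketch below builds the same four `accountId` x `userId` combinations with an added `exists` clause, and guards the percentage against an empty index. The helper names (`zero`, `nonzero`, `combo_queries`, `share`) are hypothetical, and requiring `exists` is an assumption about the intended semantics.

```python
def zero(field):
    """Clause matching documents where `field` equals 0."""
    return {"term": {field: 0}}


def nonzero(field):
    """Clause matching documents where `field` is present and not 0."""
    return {
        "bool": {
            "must": [{"exists": {"field": field}}],
            "must_not": [{"term": {field: 0}}],
        }
    }


def combo_queries(account_field="accountId", user_field="userId"):
    """Return the four _count request bodies, keyed like the script's variables."""
    combos = {
        "a0_u0": [zero(account_field), zero(user_field)],
        "a0_ux": [zero(account_field), nonzero(user_field)],
        "ax_u0": [nonzero(account_field), zero(user_field)],
        "ax_ux": [nonzero(account_field), nonzero(user_field)],
    }
    return {name: {"query": {"bool": {"must": clauses}}}
            for name, clauses in combos.items()}


def share(count, total):
    """Percentage of `count` in `total`; 0.0 when the index is empty."""
    return count / total * 100 if total else 0.0
```

With this variant the four counts no longer need to sum to the index total; the remainder is the set of documents missing one of the two fields.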
437
scripts/endor_purchase_analysis.py
Normal file
@ -0,0 +1,437 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
In-app purchase user behavior analysis: how registration date, prologue completion date, and purchase date relate to each other
|
||||||
|
"""
|
||||||
|
import psycopg2
|
||||||
|
from datetime import datetime, date
|
||||||
|
from collections import defaultdict, Counter
|
||||||
|
import os
|
||||||
|
|
||||||
|
PG_PASS = os.environ['PG_ONLINE_PASSWORD']  # read from the environment only; no hardcoded fallback
|
||||||
|
conn = psycopg2.connect(
|
||||||
|
host="bj-postgres-16pob4sg.sql.tencentcdb.com",
|
||||||
|
port=28591,
|
||||||
|
user="ai_member",
|
||||||
|
password=PG_PASS,
|
||||||
|
database="vala_bi"
|
||||||
|
)
|
||||||
|
cur = conn.cursor()
|
||||||
|
|
||||||
|
# U0 序章 chapter_id
|
||||||
|
L1_U0 = [343, 344, 345, 346, 348]
|
||||||
|
L2_U0 = [55, 56, 57, 58, 59]
|
||||||
|
|
||||||
|
print("=" * 80)
|
||||||
|
print("端内购买用户行为分析")
|
||||||
|
print("=" * 80)
|
||||||
|
|
||||||
|
# ========== Step 1: 端内付费用户 ==========
|
||||||
|
print("\n[Step 1] 查询端内付费用户...")
|
||||||
|
cur.execute("""
|
||||||
|
SELECT DISTINCT ON (o.account_id)
|
||||||
|
o.account_id,
|
||||||
|
o.goods_id,
|
||||||
|
o.pay_success_date as purchase_date,
|
||||||
|
CASE
|
||||||
|
WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
|
||||||
|
WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
|
||||||
|
ELSE 'other'
|
||||||
|
END as buy_level
|
||||||
|
FROM bi_vala_order o
|
||||||
|
JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
|
||||||
|
WHERE o.key_from = 'app-active-h5-0-0'
|
||||||
|
AND o.order_status IN (3, 4)
|
||||||
|
AND o.pay_success_date IS NOT NULL
|
||||||
|
ORDER BY o.account_id, o.pay_success_date
|
||||||
|
""")
|
||||||
|
endo_users = {}
|
||||||
|
for row in cur.fetchall():
|
||||||
|
account_id, goods_id, purchase_date, buy_level = row
|
||||||
|
if buy_level == 'other':
|
||||||
|
continue
|
||||||
|
endo_users[account_id] = {
|
||||||
|
'goods_id': goods_id,
|
||||||
|
'purchase_date': purchase_date,
|
||||||
|
'buy_level': buy_level
|
||||||
|
}
|
||||||
|
|
||||||
|
print(f" 端内付费用户数: {len(endo_users)}")
|
||||||
|
l1_users = {k: v for k, v in endo_users.items() if v['buy_level'] == 'L1'}
|
||||||
|
l2_users = {k: v for k, v in endo_users.items() if v['buy_level'] == 'L2'}
|
||||||
|
print(f" - L1 购买用户: {len(l1_users)}")
|
||||||
|
print(f" - L2 购买用户: {len(l2_users)}")
|
||||||
|
|
||||||
|
# ========== Step 2: 注册时间 ==========
|
||||||
|
print("\n[Step 2] 查询用户注册时间...")
|
||||||
|
all_ids = list(endo_users.keys())
|
||||||
|
batch_size = 500
|
||||||
|
register_map = {}
|
||||||
|
for i in range(0, len(all_ids), batch_size):
|
||||||
|
batch = all_ids[i:i+batch_size]
|
||||||
|
cur.execute(
|
||||||
|
"SELECT id, created_at FROM bi_vala_app_account WHERE id IN %s AND status = 1",
|
||||||
|
(tuple(batch),)
|
||||||
|
)
|
||||||
|
for row in cur.fetchall():
|
||||||
|
register_map[row[0]] = row[1]
|
||||||
|
|
||||||
|
print(f" 找到注册时间的用户: {len(register_map)}")
|
||||||
|
|
||||||
|
# ========== Step 3: account_id -> user_id (character) 映射 ==========
|
||||||
|
print("\n[Step 3] 查询用户角色映射...")
|
||||||
|
char_map = {} # account_id -> [user_id, ...]
|
||||||
|
for i in range(0, len(all_ids), batch_size):
|
||||||
|
batch = all_ids[i:i+batch_size]
|
||||||
|
cur.execute(
|
||||||
|
"SELECT account_id, id FROM bi_vala_app_character WHERE account_id IN %s",
|
||||||
|
(tuple(batch),)
|
||||||
|
)
|
||||||
|
for row in cur.fetchall():
|
||||||
|
acct = row[0]
|
||||||
|
uid = row[1]
|
||||||
|
if acct not in char_map:
|
||||||
|
char_map[acct] = []
|
||||||
|
char_map[acct].append(uid)
|
||||||
|
|
||||||
|
print(f" 有角色的用户数: {len(char_map)}")
|
||||||
|
|
||||||
|
# ========== Step 4: 序章完课时间 ==========
|
||||||
|
print("\n[Step 4] 查询序章完课时间(8张分表)...")
|
||||||
|
|
||||||
|
# 收集所有 user_id,按等级分
|
||||||
|
l1_user_ids = set()
|
||||||
|
l2_user_ids = set()
|
||||||
|
for acct in l1_users:
|
||||||
|
if acct in char_map:
|
||||||
|
for uid in char_map[acct]:
|
||||||
|
l1_user_ids.add(uid)
|
||||||
|
for acct in l2_users:
|
||||||
|
if acct in char_map:
|
||||||
|
for uid in char_map[acct]:
|
||||||
|
l2_user_ids.add(uid)
|
||||||
|
|
||||||
|
print(f" L1 相关 user_id 数: {len(l1_user_ids)}")
|
||||||
|
print(f" L2 相关 user_id 数: {len(l2_user_ids)}")
|
||||||
|
|
||||||
|
def query_prologue_completion(cur, user_ids, chapter_ids, table_prefix="bi_user_chapter_play_record"):
|
||||||
|
"""Query the earliest prologue completion time per user; returns {user_id: earliest_date}."""
|
||||||
|
result = {}
|
||||||
|
if not user_ids:
|
||||||
|
return result
|
||||||
|
|
||||||
|
# 分表查询
|
||||||
|
union_parts = []
|
||||||
|
for t in range(8):
|
||||||
|
union_parts.append(f"""
|
||||||
|
SELECT user_id, MIN(created_at) as first_done
|
||||||
|
FROM {table_prefix}_{t}
|
||||||
|
WHERE chapter_id IN ({','.join(map(str, chapter_ids))})
|
||||||
|
AND play_status = 1
|
||||||
|
AND user_id IN %(user_ids)s
|
||||||
|
GROUP BY user_id
|
||||||
|
""")
|
||||||
|
|
||||||
|
sql = " UNION ALL ".join(union_parts)
|
||||||
|
sql = f"""
|
||||||
|
SELECT user_id, MIN(first_done) as earliest
|
||||||
|
FROM ({sql}) sub
|
||||||
|
GROUP BY user_id
|
||||||
|
"""
|
||||||
|
|
||||||
|
batch_size_uid = 2000
|
||||||
|
uid_list = list(user_ids)
|
||||||
|
for i in range(0, len(uid_list), batch_size_uid):
|
||||||
|
batch = tuple(uid_list[i:i+batch_size_uid])
|
||||||
|
cur.execute(sql, {'user_ids': batch})
|
||||||
|
for row in cur.fetchall():
|
||||||
|
result[row[0]] = row[1]
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
# L1 序章
|
||||||
|
l1_prologue = query_prologue_completion(cur, l1_user_ids, L1_U0)
|
||||||
|
print(f" L1 序章有完课记录的 user_id: {len(l1_prologue)}")
|
||||||
|
|
||||||
|
# L2 序章
|
||||||
|
l2_prologue = query_prologue_completion(cur, l2_user_ids, L2_U0)
|
||||||
|
print(f" L2 序章有完课记录的 user_id: {len(l2_prologue)}")
|
||||||
|
|
||||||
|
# ========== Step 5: 关联分析 ==========
|
||||||
|
print("\n[Step 5] 关联分析...")
|
||||||
|
|
||||||
|
# 对每个端内购买用户,找到他的角色中对应等级序章最早完课时间
|
||||||
|
def get_earliest_prologue(account_id, buy_level):
|
||||||
|
"""Return the earliest prologue completion time for the purchased level across all of the account's characters."""
|
||||||
|
if account_id not in char_map:
|
||||||
|
return None
|
||||||
|
user_ids = char_map[account_id]
|
||||||
|
|
||||||
|
earliest = None
|
||||||
|
prologue_map = l1_prologue if buy_level == 'L1' else l2_prologue
|
||||||
|
for uid in user_ids:
|
||||||
|
if uid in prologue_map:
|
||||||
|
d = prologue_map[uid]
|
||||||
|
if earliest is None or d < earliest:
|
||||||
|
earliest = d
|
||||||
|
return earliest
|
||||||
|
|
||||||
|
# 构建分析数据
|
||||||
|
records = []
|
||||||
|
for account_id, info in endo_users.items():
|
||||||
|
register_date = register_map.get(account_id)
|
||||||
|
purchase_date = info['purchase_date']
|
||||||
|
buy_level = info['buy_level']
|
||||||
|
|
||||||
|
if register_date is None:
|
||||||
|
continue
|
||||||
|
|
||||||
|
prologue_date = get_earliest_prologue(account_id, buy_level)
|
||||||
|
|
||||||
|
# 计算天数差
|
||||||
|
if isinstance(register_date, datetime):
|
||||||
|
reg_d = register_date.date()
|
||||||
|
else:
|
||||||
|
reg_d = register_date
|
||||||
|
|
||||||
|
if isinstance(purchase_date, datetime):
|
||||||
|
pur_d = purchase_date.date()
|
||||||
|
elif isinstance(purchase_date, date):
|
||||||
|
pur_d = purchase_date
|
||||||
|
else:
|
||||||
|
pur_d = purchase_date
|
||||||
|
|
||||||
|
if prologue_date:
|
||||||
|
if isinstance(prologue_date, datetime):
|
||||||
|
pro_d = prologue_date.date()
|
||||||
|
else:
|
||||||
|
pro_d = prologue_date
|
||||||
|
|
||||||
|
reg_to_pro = (pro_d - reg_d).days
|
||||||
|
pro_to_pur = (pur_d - pro_d).days
|
||||||
|
reg_to_pur = (pur_d - reg_d).days
|
||||||
|
has_prologue = True
|
||||||
|
else:
|
||||||
|
reg_to_pro = None
|
||||||
|
pro_to_pur = None
|
||||||
|
reg_to_pur = (pur_d - reg_d).days
|
||||||
|
has_prologue = False
|
||||||
|
|
||||||
|
records.append({
|
||||||
|
'account_id': account_id,
|
||||||
|
'level': buy_level,
|
||||||
|
'register_date': reg_d,
|
||||||
|
'prologue_date': prologue_date.date() if has_prologue and isinstance(prologue_date, datetime) else (prologue_date if has_prologue else None),
|
||||||
|
'purchase_date': pur_d,
|
||||||
|
'reg_to_pro': reg_to_pro,
|
||||||
|
'pro_to_pur': pro_to_pur,
|
||||||
|
'reg_to_pur': reg_to_pur,
|
||||||
|
'has_prologue': has_prologue
|
||||||
|
})
|
||||||
|
|
||||||
|
# ========== Step 6: 统计输出 ==========
|
||||||
|
print(f"\n总分析用户数: {len(records)}")
|
||||||
|
|
||||||
|
# --- 总体统计 ---
|
||||||
|
with_prologue = [r for r in records if r['has_prologue']]
|
||||||
|
without_prologue = [r for r in records if not r['has_prologue']]
|
||||||
|
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("总体概览")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
print(f"端内付费用户: {len(records)}")
|
||||||
|
print(f" 有L1序章完课记录: {len([r for r in with_prologue if r['level']=='L1'])}")
|
||||||
|
print(f" 有L2序章完课记录: {len([r for r in with_prologue if r['level']=='L2'])}")
|
||||||
|
print(f" 无对应序章完课记录: {len(without_prologue)}")
|
||||||
|
|
||||||
|
# --- 天数分布 ---
|
||||||
|
def percentile(sorted_vals, p):
|
||||||
|
if not sorted_vals:
|
||||||
|
return None
|
||||||
|
idx = int(len(sorted_vals) * p / 100)
|
||||||
|
return sorted_vals[min(idx, len(sorted_vals)-1)]
|
||||||
|
|
||||||
|
def print_stats(name, vals, unit='天'):
|
||||||
|
if not vals:
|
||||||
|
print(f" {name}: 无数据")
|
||||||
|
return
|
||||||
|
s = sorted(vals)
|
||||||
|
print(f" {name}:")
|
||||||
|
print(f" 样本数: {len(s)}")
|
||||||
|
print(f" 中位数: {percentile(s, 50):.1f}{unit}")
|
||||||
|
print(f" 平均值: {sum(s)/len(s):.1f}{unit}")
|
||||||
|
print(f" P25: {percentile(s, 25):.1f}{unit}")
|
||||||
|
print(f" P75: {percentile(s, 75):.1f}{unit}")
|
||||||
|
print(f" P90: {percentile(s, 90):.1f}{unit}")
|
||||||
|
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("全量统计:注册→购课 天数")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
print_stats("全量 注册→购课", [r['reg_to_pur'] for r in records])
|
||||||
|
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("有序章完课用户统计")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
print_stats("序章完课 注册→完课 天数", [r['reg_to_pro'] for r in with_prologue])
|
||||||
|
print_stats("序章完课 完课→购课 天数", [r['pro_to_pur'] for r in with_prologue])
|
||||||
|
print_stats("序章完课 注册→购课 天数", [r['reg_to_pur'] for r in with_prologue])
|
||||||
|
|
||||||
|
# --- 按等级拆分 ---
|
||||||
|
for level in ['L1', 'L2']:
|
||||||
|
lvl = [r for r in records if r['level'] == level]
|
||||||
|
lvl_pro = [r for r in lvl if r['has_prologue']]
|
||||||
|
lvl_no = [r for r in lvl if not r['has_prologue']]
|
||||||
|
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print(f"{level} 购买用户统计")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
print(f" {level} 总用户: {len(lvl)}")
|
||||||
|
print(f" 有{level}序章完课: {len(lvl_pro)}")
|
||||||
|
print(f" 无{level}序章完课: {len(lvl_no)}")
|
||||||
|
|
||||||
|
print_stats(f"\n {level} 全量 注册→购课", [r['reg_to_pur'] for r in lvl])
|
||||||
|
|
||||||
|
if lvl_pro:
|
||||||
|
print_stats(f" {level} 序章完课 注册→完课", [r['reg_to_pro'] for r in lvl_pro])
|
||||||
|
print_stats(f" {level} 序章完课 完课→购课", [r['pro_to_pur'] for r in lvl_pro])
|
||||||
|
print_stats(f" {level} 序章完课 注册→购课", [r['reg_to_pur'] for r in lvl_pro])
|
||||||
|
|
||||||
|
# --- 完课→购课的时间段分布 ---
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("有完课用户:完课→购课 时间段分布")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
buckets = {
|
||||||
|
'0天(当天购课)': (0, 0),
|
||||||
|
'1-3天': (1, 3),
|
||||||
|
'4-7天': (4, 7),
|
||||||
|
'8-14天': (8, 14),
|
||||||
|
'15-30天': (15, 30),
|
||||||
|
'31-60天': (31, 60),
|
||||||
|
'61-90天': (61, 90),
|
||||||
|
'90天以上': (91, 99999),
|
||||||
|
'负数(购课在完课前)': (-99999, -1),
|
||||||
|
}
|
||||||
|
|
||||||
|
for level in ['L1', 'L2', 'ALL']:
|
||||||
|
if level == 'ALL':
|
||||||
|
data = with_prologue
|
||||||
|
else:
|
||||||
|
data = [r for r in with_prologue if r['level'] == level]
|
||||||
|
|
||||||
|
if not data:
|
||||||
|
continue
|
||||||
|
|
||||||
|
vals = [r['pro_to_pur'] for r in data]
|
||||||
|
print(f"\n[{level}] 完课→购课 天数分布:")
|
||||||
|
for label, (lo, hi) in buckets.items():
|
||||||
|
cnt = sum(1 for v in vals if lo <= v <= hi)
|
||||||
|
pct = cnt / len(vals) * 100
|
||||||
|
bar = '█' * int(pct / 2)
|
||||||
|
print(f" {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
|
||||||
|
|
||||||
|
# --- 注册→完课时间段分布 ---
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("有完课用户:注册→完课 时间段分布")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for level in ['L1', 'L2', 'ALL']:
|
||||||
|
if level == 'ALL':
|
||||||
|
data = with_prologue
|
||||||
|
else:
|
||||||
|
data = [r for r in with_prologue if r['level'] == level]
|
||||||
|
|
||||||
|
if not data:
|
||||||
|
continue
|
||||||
|
|
||||||
|
vals = [r['reg_to_pro'] for r in data]
|
||||||
|
print(f"\n[{level}] 注册→完课 天数分布:")
|
||||||
|
for label, (lo, hi) in buckets.items():
|
||||||
|
if hi < 0:
|
||||||
|
continue
|
||||||
|
cnt = sum(1 for v in vals if lo <= v <= hi)
|
||||||
|
pct = cnt / len(vals) * 100
|
||||||
|
bar = '█' * int(pct / 2)
|
||||||
|
print(f" {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
|
||||||
|
|
||||||
|
# --- 注册→购课 时间段分布 ---
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("注册→购课 时间段分布")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for level in ['L1', 'L2', 'ALL']:
|
||||||
|
if level == 'ALL':
|
||||||
|
data = records
|
||||||
|
else:
|
||||||
|
data = [r for r in records if r['level'] == level]
|
||||||
|
|
||||||
|
if not data:
|
||||||
|
continue
|
||||||
|
|
||||||
|
vals = [r['reg_to_pur'] for r in data]
|
||||||
|
print(f"\n[{level}] 注册→购课 天数分布:")
|
||||||
|
extended_buckets = {
|
||||||
|
'0天(当天购课)': (0, 0),
|
||||||
|
'1天': (1, 1),
|
||||||
|
'2天': (2, 2),
|
||||||
|
'3天': (3, 3),
|
||||||
|
'4-7天': (4, 7),
|
||||||
|
'8-14天': (8, 14),
|
||||||
|
'15-30天': (15, 30),
|
||||||
|
'31-60天': (31, 60),
|
||||||
|
'61-90天': (61, 90),
|
||||||
|
'90-180天': (91, 180),
|
||||||
|
'180天以上': (181, 99999),
|
||||||
|
}
|
||||||
|
for label, (lo, hi) in extended_buckets.items():
|
||||||
|
cnt = sum(1 for v in vals if lo <= v <= hi)
|
||||||
|
pct = cnt / len(vals) * 100
|
||||||
|
bar = '█' * int(pct / 2)
|
||||||
|
print(f" {label:20s}: {cnt:4d} ({pct:5.1f}%) {bar}")
|
||||||
|
|
||||||
|
# --- 完课在购课之前 vs 之后 ---
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("完课 vs 购课 时间关系")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for level in ['L1', 'L2', 'ALL']:
|
||||||
|
if level == 'ALL':
|
||||||
|
data = with_prologue
|
||||||
|
else:
|
||||||
|
data = [r for r in with_prologue if r['level'] == level]
|
||||||
|
|
||||||
|
if not data:
|
||||||
|
continue
|
||||||
|
|
||||||
|
pro_before_pur = [r for r in data if r['pro_to_pur'] >= 0]
|
||||||
|
pro_after_pur = [r for r in data if r['pro_to_pur'] < 0]
|
||||||
|
pro_same_day = [r for r in data if r['pro_to_pur'] == 0]
|
||||||
|
|
||||||
|
print(f"\n[{level}]:")
|
||||||
|
print(f" 完课在购课之前或同一天: {len(pro_before_pur)} ({len(pro_before_pur)/len(data)*100:.1f}%)")
|
||||||
|
print(f" 其中同一天: {len(pro_same_day)} ({len(pro_same_day)/len(data)*100:.1f}%)")
|
||||||
|
print(f" 完课在购课之后: {len(pro_after_pur)} ({len(pro_after_pur)/len(data)*100:.1f}%)")
|
||||||
|
|
||||||
|
# ========== 导出详细CSV ==========
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("导出详细数据...")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
import csv
|
||||||
|
output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis.csv'
|
||||||
|
with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
|
||||||
|
writer = csv.writer(f)
|
||||||
|
writer.writerow(['account_id', '等级', '注册日期', '序章完课日期', '购课日期',
|
||||||
|
'注册→完课(天)', '完课→购课(天)', '注册→购课(天)', '有序章完课'])
|
||||||
|
for r in sorted(records, key=lambda x: x['purchase_date']):
|
||||||
|
writer.writerow([
|
||||||
|
r['account_id'], r['level'], r['register_date'], r['prologue_date'], r['purchase_date'],
|
||||||
|
r['reg_to_pro'] if r['reg_to_pro'] is not None else '',
|
||||||
|
r['pro_to_pur'] if r['pro_to_pur'] is not None else '',
|
||||||
|
r['reg_to_pur'], '是' if r['has_prologue'] else '否'
|
||||||
|
])
|
||||||
|
|
||||||
|
print(f" 已导出到: {output_path}")
|
||||||
|
|
||||||
|
cur.close()
|
||||||
|
conn.close()
|
||||||
|
print("\n分析完成!")
|
||||||
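The `percentile` helper in the script above picks an element by index rather than interpolating, so for an even-sized sample its "median" is the upper of the two middle values. The sketch below restates that helper and the inclusive-range bucket counting so that both behaviors are easy to check in isolation. `bucket_counts` is a hypothetical name for the counting done inline in the script's loops.

```python
import statistics


def percentile(sorted_vals, p):
    """Index-based percentile, as in the script: no interpolation."""
    if not sorted_vals:
        return None
    idx = int(len(sorted_vals) * p / 100)
    return sorted_vals[min(idx, len(sorted_vals) - 1)]


def bucket_counts(vals, buckets):
    """Count values per bucket; each bucket is an inclusive (lo, hi) range."""
    return {
        label: sum(1 for v in vals if lo <= v <= hi)
        for label, (lo, hi) in buckets.items()
    }


if __name__ == "__main__":
    days = [1, 2, 3, 4]
    # Index-based rule vs. the interpolating median from the standard library
    print(percentile(days, 50), statistics.median(days))
    print(bucket_counts([0, 2, 5, -1], {"0": (0, 0), "1-3": (1, 3),
                                         "4-7": (4, 7), "negative": (-99999, -1)}))
```

If the reported medians are meant to match spreadsheet or SQL results, `statistics.median` (or `statistics.quantiles`) may be the closer equivalent; which convention the report should use is a judgment call for the analyst.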
500
scripts/endor_purchase_analysis_v2.py
Normal file
@ -0,0 +1,500 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
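In-app purchase user behavior analysis, V2
Changes to metric definitions:
1. Order-level records: a user who bought both L1 and L2 produces two records
2. Prologue completion = all five U0 lessons (L01~L05) completed; the completion date of the last lesson is used
3. New: how many prologue lessons had been completed at purchase time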
|
||||||
|
"""
|
||||||
|
import psycopg2
|
||||||
|
from datetime import datetime, date
|
||||||
|
from collections import defaultdict, Counter
|
||||||
|
import os
|
||||||
|
import csv
|
||||||
|
|
||||||
|
PG_PASS = os.environ['PG_ONLINE_PASSWORD']  # read from the environment only; no hardcoded fallback
|
||||||
|
conn = psycopg2.connect(
|
||||||
|
host="bj-postgres-16pob4sg.sql.tencentcdb.com",
|
||||||
|
port=28591,
|
||||||
|
user="ai_member",
|
||||||
|
password=PG_PASS,
|
||||||
|
database="vala_bi"
|
||||||
|
)
|
||||||
|
cur = conn.cursor()
|
||||||
|
|
||||||
|
# U0 序章 chapter_id -> lesson_name
|
||||||
|
L1_U0_LESSONS = {343: 'L01', 344: 'L02', 345: 'L03', 346: 'L04', 348: 'L05'}
|
||||||
|
L2_U0_LESSONS = {55: 'L01', 56: 'L02', 57: 'L03', 58: 'L04', 59: 'L05'}
|
||||||
|
|
||||||
|
LEVEL_MAP = {
|
||||||
|
57: 'L1', 60: 'L1', 63: 'L1',
|
||||||
|
31: 'L2', 32: 'L2', 33: 'L2', 54: 'L2',
|
||||||
|
61: 'L1+L2',
|
||||||
|
}
|
||||||
|
|
||||||
|
print("=" * 80)
|
||||||
|
print("端内购买用户行为分析 V2")
|
||||||
|
print("口径:序章完课 = U0全部五节完成 | 订单维度 | 购课时进度分析")
|
||||||
|
print("=" * 80)
|
||||||
|
|
||||||
|
# ========== Step 1: 端内付费订单(按订单维度) ==========
|
||||||
|
print("\n[Step 1] 查询端内付费订单...")
|
||||||
|
cur.execute("""
|
||||||
|
SELECT
|
||||||
|
o.id as order_id,
|
||||||
|
o.account_id,
|
||||||
|
o.goods_id,
|
||||||
|
o.pay_success_date,
|
||||||
|
o.trade_no
|
||||||
|
FROM bi_vala_order o
|
||||||
|
JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
|
||||||
|
WHERE o.key_from = 'app-active-h5-0-0'
|
||||||
|
AND o.order_status IN (3, 4)
|
||||||
|
AND o.pay_success_date IS NOT NULL
|
||||||
|
ORDER BY o.pay_success_date
|
||||||
|
""")
|
||||||
|
|
||||||
|
orders = [] # list of {order_id, account_id, goods_id, purchase_date, level}
|
||||||
|
for row in cur.fetchall():
|
||||||
|
order_id, account_id, goods_id, purchase_date, trade_no = row
|
||||||
|
level = LEVEL_MAP.get(goods_id)
|
||||||
|
if level is None or level == 'L1+L2':
|
||||||
|
continue
|
||||||
|
if isinstance(purchase_date, datetime):
|
||||||
|
purchase_date = purchase_date.date()
|
||||||
|
orders.append({
|
||||||
|
'order_id': order_id,
|
||||||
|
'account_id': account_id,
|
||||||
|
'goods_id': goods_id,
|
||||||
|
'purchase_date': purchase_date,
|
||||||
|
'level': level
|
||||||
|
})
|
||||||
|
|
||||||
|
print(f" 端内付费订单数: {len(orders)}")
|
||||||
|
print(f" - L1 订单: {sum(1 for o in orders if o['level']=='L1')}")
|
||||||
|
print(f" - L2 订单: {sum(1 for o in orders if o['level']=='L2')}")
|
||||||
|
|
||||||
|
# 统计有多少用户同时买了L1+L2
|
||||||
|
user_levels = defaultdict(set)
|
||||||
|
for o in orders:
|
||||||
|
user_levels[o['account_id']].add(o['level'])
|
||||||
|
dual_users = [uid for uid, lvls in user_levels.items() if len(lvls) > 1]
|
||||||
|
print(f" 同时购买L1+L2的用户: {len(dual_users)}")
|
||||||
|
|
||||||
|
# ========== Step 2: 注册时间 ==========
|
||||||
|
print("\n[Step 2] 查询用户注册时间...")
|
||||||
|
all_accounts = list(set(o['account_id'] for o in orders))
|
||||||
|
register_map = {}
|
||||||
|
batch_size = 500
|
||||||
|
for i in range(0, len(all_accounts), batch_size):
|
||||||
|
batch = all_accounts[i:i+batch_size]
|
||||||
|
cur.execute(
|
||||||
|
"SELECT id, created_at FROM bi_vala_app_account WHERE id IN %s",
|
||||||
|
(tuple(batch),)
|
||||||
|
)
|
||||||
|
for row in cur.fetchall():
|
||||||
|
d = row[1]
|
||||||
|
register_map[row[0]] = d.date() if isinstance(d, datetime) else d  # plain date objects have no .date() method
|
||||||
|
|
||||||
|
print(f" 找到注册时间: {len(register_map)}")
|
||||||
|
|
||||||
|
# ========== Step 3: account_id -> user_id 映射 ==========
|
||||||
|
print("\n[Step 3] 查询用户角色映射...")
|
||||||
|
char_map = defaultdict(list)
|
||||||
|
for i in range(0, len(all_accounts), batch_size):
|
||||||
|
batch = all_accounts[i:i+batch_size]
|
||||||
|
cur.execute(
|
||||||
|
"SELECT account_id, id FROM bi_vala_app_character WHERE account_id IN %s",
|
||||||
|
(tuple(batch),)
|
||||||
|
)
|
||||||
|
for row in cur.fetchall():
|
||||||
|
char_map[row[0]].append(row[1])
|
||||||
|
|
||||||
|
print(f" 有角色的用户: {sum(1 for uid, chars in char_map.items() if chars)}")
|
||||||
|
|
||||||
|
# ========== Step 4: 序章每节课的最早完成时间 ==========
|
||||||
|
print("\n[Step 4] 查询序章每节课完成时间(8张分表)...")
|
||||||
|
|
||||||
|
def query_lesson_completion(cur, user_ids, chapter_ids_dict):
|
||||||
|
"""
|
||||||
|
查询每个 user_id 对每个 chapter_id 的最早完成时间
|
||||||
|
返回: {user_id: {chapter_id: earliest_date}}
|
||||||
|
"""
|
||||||
|
result = defaultdict(dict)
|
||||||
|
if not user_ids:
|
||||||
|
return result
|
||||||
|
|
||||||
|
# 先按user_ids批量查询
|
||||||
|
all_chapter_ids = list(chapter_ids_dict.keys())
|
||||||
|
|
||||||
|
union_parts = []
|
||||||
|
for t in range(8):
|
||||||
|
union_parts.append(f"""
|
||||||
|
SELECT user_id, chapter_id, MIN(created_at) as first_done
|
||||||
|
FROM bi_user_chapter_play_record_{t}
|
||||||
|
WHERE chapter_id IN ({','.join(map(str, all_chapter_ids))})
|
||||||
|
AND play_status = 1
|
||||||
|
AND user_id IN %(user_ids)s
|
||||||
|
GROUP BY user_id, chapter_id
|
||||||
|
""")
|
||||||
|
|
||||||
|
sql = " UNION ALL ".join(union_parts)
|
||||||
|
sql = f"""
|
||||||
|
SELECT user_id, chapter_id, MIN(first_done) as earliest
|
||||||
|
FROM ({sql}) sub
|
||||||
|
GROUP BY user_id, chapter_id
|
||||||
|
"""
|
||||||
|
|
||||||
|
uid_list = list(user_ids)
|
||||||
|
for i in range(0, len(uid_list), 2000):
|
||||||
|
batch = tuple(uid_list[i:i+2000])
|
||||||
|
cur.execute(sql, {'user_ids': batch})
|
||||||
|
for row in cur.fetchall():
|
||||||
|
uid, cid, dt = row
|
||||||
|
result[uid][cid] = dt.date() if isinstance(dt, datetime) else dt  # plain date objects have no .date() method
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
# 收集所有user_id
|
||||||
|
all_user_ids = set()
|
||||||
|
for acct in all_accounts:
|
||||||
|
for uid in char_map.get(acct, []):
|
||||||
|
all_user_ids.add(uid)
|
||||||
|
|
||||||
|
print(f" 总 user_id 数: {len(all_user_ids)}")
|
||||||
|
|
||||||
|
# L1 序章
|
||||||
|
l1_lesson_data = query_lesson_completion(cur, all_user_ids, L1_U0_LESSONS)
|
||||||
|
# L2 序章
|
||||||
|
l2_lesson_data = query_lesson_completion(cur, all_user_ids, L2_U0_LESSONS)
|
||||||
|
|
||||||
|
print(f" L1 序章有完成记录的 user_id: {len(l1_lesson_data)}")
|
||||||
|
print(f" L2 序章有完成记录的 user_id: {len(l2_lesson_data)}")
|
||||||
|
|
||||||
|
# ========== Step 5: 对每个订单分析序章进度 ==========
|
||||||
|
print("\n[Step 5] 逐订单分析序章进度...")
|
||||||
|
|
||||||
|
def analyze_user_prologue(user_ids, lesson_data, lesson_map):
|
||||||
|
"""
|
||||||
|
汇总用户所有角色在序章各节的完成情况
|
||||||
|
返回: {chapter_id: earliest_completion_date} (只包含完成的)
|
||||||
|
"""
|
||||||
|
merged = {}
|
||||||
|
for uid in user_ids:
|
||||||
|
if uid not in lesson_data:
|
||||||
|
continue
|
||||||
|
for cid, dt in lesson_data[uid].items():
|
||||||
|
if cid not in merged or dt < merged[cid]:
|
||||||
|
merged[cid] = dt
|
||||||
|
return merged
|
||||||
|
|
||||||
|
def get_prologue_status(lesson_completions, lesson_map, purchase_date):
    """
    Summarize prologue completion relative to the purchase date:
    - total_done: number of lessons completed at any time
    - done_before / done_before_count: lessons completed on or before the purchase date
    - done_after / done_after_count: lessons completed after the purchase date
    - all_five_done: whether all five lessons are complete
    - all_five_done_date: date the last of the five lessons was completed
    - progress_at_purchase: number of lessons completed at purchase time
    - lessons_done_at_purchase: labels (L01, L02, ...) of those lessons
    """
    total_done = len(lesson_completions)

    done_before = {}
    done_after = {}
    for cid, dt in lesson_completions.items():
        # A lesson finished on the purchase day itself counts as "before"
        if dt <= purchase_date:
            done_before[cid] = dt
        else:
            done_after[cid] = dt

    # Whether all five lessons are complete
    all_five_done = total_done == 5
    all_five_done_date = max(lesson_completions.values()) if all_five_done else None

    # Number of lessons completed at purchase time
    progress_at_purchase = len(done_before)

    # Which lessons were complete at purchase time
    lessons_done_at_purchase = sorted(
        [lesson_map[cid] for cid in done_before],
        key=lambda x: int(x[1:])  # L01, L02, ...
    )

    return {
        'total_done': total_done,
        'done_before_count': len(done_before),
        'done_after_count': len(done_after),
        'all_five_done': all_five_done,
        'all_five_done_date': all_five_done_date,
        'progress_at_purchase': progress_at_purchase,
        'lessons_done_at_purchase': lessons_done_at_purchase,
        'done_before': done_before,
        'done_after': done_after,
    }
|
||||||
|
|
||||||
|
records = []
|
||||||
|
for o in orders:
|
||||||
|
account_id = o['account_id']
|
||||||
|
level = o['level']
|
||||||
|
purchase_date = o['purchase_date']
|
||||||
|
register_date = register_map.get(account_id)
|
||||||
|
|
||||||
|
if register_date is None:
|
||||||
|
continue
|
||||||
|
|
||||||
|
user_ids = char_map.get(account_id, [])
|
||||||
|
lesson_map = L1_U0_LESSONS if level == 'L1' else L2_U0_LESSONS
|
||||||
|
lesson_data = l1_lesson_data if level == 'L1' else l2_lesson_data
|
||||||
|
|
||||||
|
user_completions = analyze_user_prologue(user_ids, lesson_data, lesson_map)
|
||||||
|
status = get_prologue_status(user_completions, lesson_map, purchase_date)
|
||||||
|
|
||||||
|
# 注册→购课天数
|
||||||
|
reg_to_pur = (purchase_date - register_date).days
|
||||||
|
|
||||||
|
# 注册→序章全完成天数
|
||||||
|
reg_to_all5 = None
|
||||||
|
if status['all_five_done_date']:
|
||||||
|
reg_to_all5 = (status['all_five_done_date'] - register_date).days
|
||||||
|
|
||||||
|
# 序章全完成→购课天数
|
||||||
|
all5_to_pur = None
|
||||||
|
if status['all_five_done_date']:
|
||||||
|
all5_to_pur = (purchase_date - status['all_five_done_date']).days
|
||||||
|
|
||||||
|
records.append({
|
||||||
|
'order_id': o['order_id'],
|
||||||
|
'account_id': account_id,
|
||||||
|
'level': level,
|
||||||
|
'register_date': register_date,
|
||||||
|
'purchase_date': purchase_date,
|
||||||
|
'reg_to_pur': reg_to_pur,
|
||||||
|
# 序章进度
|
||||||
|
'total_lessons_done': status['total_done'],
|
||||||
|
'lessons_done_before_purchase': status['done_before_count'],
|
||||||
|
'lessons_done_after_purchase': status['done_after_count'],
|
||||||
|
'progress_at_purchase': status['progress_at_purchase'],
|
||||||
|
'lessons_at_purchase': ','.join(status['lessons_done_at_purchase']),
|
||||||
|
# 五节全完成
|
||||||
|
'all_five_done': status['all_five_done'],
|
||||||
|
'all_five_done_date': status['all_five_done_date'],
|
||||||
|
'reg_to_all5': reg_to_all5,
|
||||||
|
'all5_to_pur': all5_to_pur,
|
||||||
|
})
|
||||||
|
|
||||||
|
print(f" 分析记录数: {len(records)}")
|
||||||
|
|
||||||
|
# ========== Step 6: 统计分析 ==========
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【总体概览】")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
total_records = len(records)
|
||||||
|
all5_complete = [r for r in records if r['all_five_done']]
|
||||||
|
all5_incomplete = [r for r in records if not r['all_five_done']]
|
||||||
|
l1_records = [r for r in records if r['level'] == 'L1']
|
||||||
|
l2_records = [r for r in records if r['level'] == 'L2']
|
||||||
|
|
||||||
|
print(f"总记录数(订单维度): {total_records}")
|
||||||
|
print(f" - L1 订单: {len(l1_records)}")
|
||||||
|
print(f" - L2 订单: {len(l2_records)}")
|
||||||
|
print(f" 序章五节全完成: {len(all5_complete)}")
|
||||||
|
print(f" 序章五节未全完成: {len(all5_incomplete)}")
|
||||||
|
|
||||||
|
# ========== 购课时序章进度分布 ==========
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【核心指标:购课时序章完成进度】")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for level in ['L1', 'L2', 'ALL']:
|
||||||
|
if level == 'ALL':
|
||||||
|
data = records
|
||||||
|
else:
|
||||||
|
data = [r for r in records if r['level'] == level]
|
||||||
|
|
||||||
|
if not data:
|
||||||
|
continue
|
||||||
|
|
||||||
|
print(f"\n[{level}] 购课时完成了序章第几节:")
|
||||||
|
progress_dist = Counter(r['progress_at_purchase'] for r in data)
|
||||||
|
for p in range(6): # 0到5
|
||||||
|
cnt = progress_dist.get(p, 0)
|
||||||
|
pct = cnt / len(data) * 100
|
||||||
|
label = f" 完成{p}节(共5节)" if p > 0 else " 完成0节(未开始)"
|
||||||
|
bar = '█' * int(pct)
|
||||||
|
print(f" {label:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
|
||||||
|
|
||||||
|
# 具体到哪几节
|
||||||
|
print(f"\n[{level}] 购课时已完成哪些课时的分布:")
|
||||||
|
lesson_dist = Counter()
|
||||||
|
for r in data:
|
||||||
|
if r['progress_at_purchase'] == 0:
|
||||||
|
lesson_dist['(未完成任何节)'] += 1
|
||||||
|
else:
|
||||||
|
lesson_dist[r['lessons_at_purchase']] += 1
|
||||||
|
for lesson_combo, cnt in lesson_dist.most_common(15):
|
||||||
|
pct = cnt / len(data) * 100
|
||||||
|
label = f" {lesson_combo}"
|
||||||
|
print(f" {label:24s}: {cnt:4d} ({pct:5.1f}%)")
|
||||||
|
|
||||||
|
# ========== 购课后完成的情况 ==========
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【购课后继续完成序章的情况】")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for level in ['L1', 'L2', 'ALL']:
|
||||||
|
if level == 'ALL':
|
||||||
|
data = records
|
||||||
|
else:
|
||||||
|
data = [r for r in records if r['level'] == level]
|
||||||
|
|
||||||
|
if not data:
|
||||||
|
continue
|
||||||
|
|
||||||
|
print(f"\n[{level}]:")
|
||||||
|
print(f" 购课时序章已全部完成(5/5): {sum(1 for r in data if r['progress_at_purchase']==5)} ({sum(1 for r in data if r['progress_at_purchase']==5)/len(data)*100:.1f}%)")
|
||||||
|
print(f" 购课后续完成序章余下课时: {sum(1 for r in data if r['lessons_done_after_purchase']>0)} ({sum(1 for r in data if r['lessons_done_after_purchase']>0)/len(data)*100:.1f}%)")
|
||||||
|
|
||||||
|
# 购课后完成了多少节
|
||||||
|
after_dist = Counter(r['lessons_done_after_purchase'] for r in data if r['progress_at_purchase'] < 5)
|
||||||
|
if after_dist:
|
||||||
|
print(f" 购课时未完成的用户,购课后又完成节数分布:")
|
||||||
|
for n in sorted(after_dist.keys()):
|
||||||
|
cnt = after_dist[n]
|
||||||
|
pct = cnt / sum(1 for r in data if r['progress_at_purchase'] < 5) * 100
|
||||||
|
print(f" 又完成{n}节: {cnt} ({pct:.1f}%)")
|
||||||
|
|
||||||
|
# ========== 注册→购课天数 ==========
|
||||||
|
def percentile_vals(vals):
    """Index-based percentiles (no interpolation) plus mean and count; {} if vals is empty."""
    if not vals:
        return {}
    s = sorted(vals)
    return {
        'P10': s[int(len(s)*0.10)],
        'P25': s[int(len(s)*0.25)],
        'P50': s[int(len(s)*0.50)],
        'P75': s[int(len(s)*0.75)],
        'P90': s[int(len(s)*0.90)],
        'mean': sum(s)/len(s),
        'count': len(s),
    }
|
||||||
|
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【注册→购课 天数统计】")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for level in ['L1', 'L2', 'ALL']:
|
||||||
|
if level == 'ALL':
|
||||||
|
data = records
|
||||||
|
else:
|
||||||
|
data = [r for r in records if r['level'] == level]
|
||||||
|
|
||||||
|
stat = percentile_vals([r['reg_to_pur'] for r in data])
|
||||||
|
if stat:
|
||||||
|
print(f"\n[{level}] 注册→购课:")
|
||||||
|
print(f" 样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
|
||||||
|
print(f" P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
|
||||||
|
|
||||||
|
# ========== 序章五节全完成的用户统计 ==========
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【序章五节全完成的用户(限定已完成5/5的用户)】")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
all5_l1 = [r for r in all5_complete if r['level']=='L1']
|
||||||
|
all5_l2 = [r for r in all5_complete if r['level']=='L2']
|
||||||
|
|
||||||
|
print(f"\n五节全完成用户: {len(all5_complete)}")
|
||||||
|
print(f" L1: {len(all5_l1)}")
|
||||||
|
print(f" L2: {len(all5_l2)}")
|
||||||
|
|
||||||
|
# 注册→全完成
|
||||||
|
for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
|
||||||
|
vals = [r['reg_to_all5'] for r in data if r['reg_to_all5'] is not None]
|
||||||
|
stat = percentile_vals(vals)
|
||||||
|
if stat:
|
||||||
|
print(f"\n[{name}] 注册→序章五节全完成:")
|
||||||
|
print(f" 样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
|
||||||
|
print(f" P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
|
||||||
|
|
||||||
|
# 全完成→购课
|
||||||
|
for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
|
||||||
|
vals = [r['all5_to_pur'] for r in data if r['all5_to_pur'] is not None]
|
||||||
|
stat = percentile_vals(vals)
|
||||||
|
if stat:
|
||||||
|
print(f"\n[{name}] 序章五节全完成→购课:")
|
||||||
|
print(f" 样本: {stat['count']}, 中位数: {stat['P50']}天, 均值: {stat['mean']:.1f}天")
|
||||||
|
print(f" P10={stat['P10']}天 | P25={stat['P25']}天 | P75={stat['P75']}天 | P90={stat['P90']}天")
|
||||||
|
|
||||||
|
# 全完成→购课时间段分布
|
||||||
|
print(f"\n【五节全完成→购课 时间段分布】")
|
||||||
|
print(f"{'='*60}")
|
||||||
|
buckets = {
|
||||||
|
'负数(先购课后完课)': (-99999, -1),
|
||||||
|
'0天(当天)': (0, 0),
|
||||||
|
'1~3天': (1, 3),
|
||||||
|
'4~7天': (4, 7),
|
||||||
|
'8~14天': (8, 14),
|
||||||
|
'15~30天': (15, 30),
|
||||||
|
'31~60天': (31, 60),
|
||||||
|
'61~90天': (61, 90),
|
||||||
|
'90天以上': (91, 99999),
|
||||||
|
}
|
||||||
|
for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
|
||||||
|
vals = [r['all5_to_pur'] for r in data if r['all5_to_pur'] is not None]
|
||||||
|
if not vals:
|
||||||
|
continue
|
||||||
|
print(f"\n[{name}]:")
|
||||||
|
for label, (lo, hi) in buckets.items():
|
||||||
|
cnt = sum(1 for v in vals if lo <= v <= hi)
|
||||||
|
pct = cnt / len(vals) * 100
|
||||||
|
bar = '█' * int(pct / 2)
|
||||||
|
print(f" {label:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
|
||||||
|
|
||||||
|
# ========== 时序关系 ==========
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【五节全完成的用户:序章 vs 购课 时序关系】")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for name, data in [('L1', all5_l1), ('L2', all5_l2), ('ALL', all5_complete)]:
    if not data:
        continue  # skip empty groups to avoid dividing by zero below
    pro_before = sum(1 for r in data if r['all5_to_pur'] is not None and r['all5_to_pur'] >= 0)
    pro_after = sum(1 for r in data if r['all5_to_pur'] is not None and r['all5_to_pur'] < 0)
    pro_same = sum(1 for r in data if r['all5_to_pur'] == 0)
    print(f"\n[{name}]:")
    print(f" 序章全部完成在购课之前/当天: {pro_before} ({pro_before/len(data)*100:.1f}%)")
    print(f" 其中当天: {pro_same} ({pro_same/len(data)*100:.1f}%)")
    print(f" 序章全部完成在购课之后: {pro_after} ({pro_after/len(data)*100:.1f}%)")
|
||||||
|
|
||||||
|
# ========== 导出CSV ==========
|
||||||
|
output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis_v2.csv'
|
||||||
|
with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
|
||||||
|
writer = csv.writer(f)
|
||||||
|
writer.writerow([
|
||||||
|
'order_id', 'account_id', '等级', '注册日期', '购课日期',
|
||||||
|
'注册→购课(天)',
|
||||||
|
'序章完成节数(总计)', '购课前进度(节)', '购课后完成(节)',
|
||||||
|
'购课时已完成', '五节全完成', '五节全完成日期',
|
||||||
|
'注册→全完成(天)', '全完成→购课(天)'
|
||||||
|
])
|
||||||
|
for r in sorted(records, key=lambda x: x['purchase_date']):
|
||||||
|
writer.writerow([
|
||||||
|
r['order_id'], r['account_id'], r['level'],
|
||||||
|
r['register_date'].isoformat() if r['register_date'] else '',
|
||||||
|
r['purchase_date'].isoformat() if r['purchase_date'] else '',
|
||||||
|
r['reg_to_pur'],
|
||||||
|
r['total_lessons_done'], r['progress_at_purchase'],
|
||||||
|
r['lessons_done_after_purchase'],
|
||||||
|
r['lessons_at_purchase'],
|
||||||
|
'是' if r['all_five_done'] else '否',
|
||||||
|
r['all_five_done_date'].isoformat() if r['all_five_done_date'] else '',
|
||||||
|
r['reg_to_all5'] if r['reg_to_all5'] is not None else '',
|
||||||
|
r['all5_to_pur'] if r['all5_to_pur'] is not None else '',
|
||||||
|
])
|
||||||
|
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print(f"详细数据已导出: {output_path}")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
cur.close()
|
||||||
|
conn.close()
|
||||||
|
print("\n分析完成!")
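The completion query in this script unions all eight `bi_user_chapter_play_record_{n}` tables for every batch of IDs. The notes earlier in this commit state that rows are sharded by `user_id % 8`, so one alternative is to group the IDs by shard first and query only the matching table. The following is a minimal sketch under that assumption; `group_by_shard` and `shard_table` are hypothetical helpers, not part of the script.

```python
from collections import defaultdict


def group_by_shard(user_ids, shard_count=8):
    """Map shard index -> sorted user IDs, assuming rows live in table user_id % shard_count."""
    shards = defaultdict(list)
    for uid in sorted(set(user_ids)):
        shards[uid % shard_count].append(uid)
    return dict(shards)


def shard_table(shard, base="bi_user_chapter_play_record"):
    """Name of the shard table for a given shard index (naming taken from the script)."""
    return f"{base}_{shard}"


if __name__ == "__main__":
    # Each shard's IDs could then be sent to a single-table query.
    for shard, ids in group_by_shard([8, 9, 17, 23, 16]).items():
        print(shard_table(shard), ids)
```

Whether this is faster than the `UNION ALL` form depends on the database and on how many shards a batch touches, so it is worth measuring before switching.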
|
||||||
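`percentile_vals` in this script picks `s[int(len(s) * p)]` from the sorted values, an index-based estimate with no interpolation. The sketch below restates that rule as a standalone helper so its behavior on small samples is easy to check. `index_percentile` is a hypothetical name, and the clamp for `p = 100` is an addition the script does not need because it never asks for P100.

```python
def index_percentile(vals, p):
    """Return the sorted value at index int(len * p / 100), or None if vals is empty."""
    if not vals:
        return None
    s = sorted(vals)
    # Clamp so that p=100 still maps to the last element.
    idx = min(int(len(s) * p / 100), len(s) - 1)
    return s[idx]


if __name__ == "__main__":
    days = [3, 1, 2, 5, 4]
    print(index_percentile(days, 50), index_percentile(days, 90))
```

For an even-sized sample this rule returns the upper of the two middle values as the median, which differs slightly from an interpolated median.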
370
scripts/endor_purchase_analysis_v3.py
Normal file
@ -0,0 +1,370 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
In-app purchase user behavior analysis V3 - detailed cross-tabulation
|
||||||
|
"""
|
||||||
|
import psycopg2
|
||||||
|
from datetime import datetime, date
|
||||||
|
from collections import defaultdict, Counter
|
||||||
|
import csv
|
||||||
|
import os
|
||||||
|
|
||||||
|
PG_PASS = os.environ['PG_ONLINE_PASSWORD']  # read the password from the environment only; never commit a fallback secret
|
||||||
|
conn = psycopg2.connect(
|
||||||
|
host="bj-postgres-16pob4sg.sql.tencentcdb.com", port=28591,
|
||||||
|
user="ai_member", password=PG_PASS, database="vala_bi"
|
||||||
|
)
|
||||||
|
cur = conn.cursor()
|
||||||
|
|
||||||
|
L1_U0_LESSONS = {343: 'L01', 344: 'L02', 345: 'L03', 346: 'L04', 348: 'L05'}
|
||||||
|
L2_U0_LESSONS = {55: 'L01', 56: 'L02', 57: 'L03', 58: 'L04', 59: 'L05'}
|
||||||
|
LEVEL_MAP = {57:'L1',60:'L1',63:'L1', 31:'L2',32:'L2',33:'L2',54:'L2'}
|
||||||
|
|
||||||
|
print("=" * 80)
|
||||||
|
print("端内购买用户行为分析 V3 — 详细交叉统计")
|
||||||
|
print("=" * 80)
|
||||||
|
|
||||||
|
# ============ 查询数据 ============
|
||||||
|
print("\n[数据加载]")
|
||||||
|
|
||||||
|
cur.execute("""
|
||||||
|
SELECT o.id, o.account_id, o.goods_id, o.pay_success_date
|
||||||
|
FROM bi_vala_order o
|
||||||
|
JOIN bi_vala_app_account a ON o.account_id = a.id AND a.status = 1
|
||||||
|
WHERE o.key_from = 'app-active-h5-0-0' AND o.order_status IN (3, 4) AND o.pay_success_date IS NOT NULL
|
||||||
|
ORDER BY o.pay_success_date
|
||||||
|
""")
|
||||||
|
orders = []
|
||||||
|
for row in cur.fetchall():
    lv = LEVEL_MAP.get(row[2])
    if not lv:
        continue  # goods_id is not in LEVEL_MAP
    pd_ = row[3]
    orders.append({
        'order_id': row[0],
        'account_id': row[1],
        'goods_id': row[2],
        'purchase_date': pd_.date() if isinstance(pd_, datetime) else pd_,
        'level': lv,
    })
|
||||||
|
print(f" 订单: {len(orders)}")
|
||||||
|
|
||||||
|
all_accounts = list(set(o['account_id'] for o in orders))
|
||||||
|
register_map = {}
|
||||||
|
for i in range(0, len(all_accounts), 500):
    cur.execute("SELECT id,created_at FROM bi_vala_app_account WHERE id IN %s", (tuple(all_accounts[i:i+500]),))
    for r in cur.fetchall():
        d = r[1]
        # Test for datetime only: a plain date has no .date() method.
        register_map[r[0]] = d.date() if isinstance(d, datetime) else d
|
||||||
|
print(f" 注册时间: {len(register_map)}")
|
||||||
|
|
||||||
|
char_map = defaultdict(list)
|
||||||
|
for i in range(0,len(all_accounts),500):
|
||||||
|
cur.execute("SELECT account_id,id FROM bi_vala_app_character WHERE account_id IN %s",(tuple(all_accounts[i:i+500]),))
|
||||||
|
for r in cur.fetchall(): char_map[r[0]].append(r[1])
|
||||||
|
print(f" 角色映射: {sum(1 for k,v in char_map.items() if v)} 用户")
|
||||||
|
|
||||||
|
all_user_ids = set()
|
||||||
|
for acct in all_accounts:
|
||||||
|
for uid in char_map.get(acct,[]): all_user_ids.add(uid)
|
||||||
|
|
||||||
|
def query_lesson_completion(user_ids, chapter_ids_dict):
    """Return {user_id: {chapter_id: earliest completion date}} across the 8 shard tables."""
    result = defaultdict(dict)
    if not user_ids:
        return result
    cids = list(chapter_ids_dict.keys())
    union = []
    for t in range(8):
        union.append(f"""
            SELECT user_id, chapter_id, MIN(created_at) as fd
            FROM bi_user_chapter_play_record_{t}
            WHERE chapter_id IN ({','.join(map(str,cids))})
              AND play_status=1 AND user_id IN %(uids)s
            GROUP BY user_id, chapter_id
        """)
    sql = f"SELECT user_id, chapter_id, MIN(fd) FROM ({' UNION ALL '.join(union)}) sub GROUP BY user_id, chapter_id"
    # Query in batches of 2000 IDs so the IN (...) list stays a manageable size.
    for i in range(0, len(user_ids), 2000):
        cur.execute(sql, {'uids': tuple(user_ids[i:i+2000])})
        for r in cur.fetchall():
            # Test for datetime only: a plain date has no .date() method.
            result[r[0]][r[1]] = r[2].date() if isinstance(r[2], datetime) else r[2]
    return result
|
||||||
|
|
||||||
|
l1_data = query_lesson_completion(list(all_user_ids), L1_U0_LESSONS)
|
||||||
|
l2_data = query_lesson_completion(list(all_user_ids), L2_U0_LESSONS)
|
||||||
|
print(f" L1序章完成: {len(l1_data)} user_ids, L2序章完成: {len(l2_data)} user_ids")
|
||||||
|
|
||||||
|
# ============ 逐订单分析 ============
|
||||||
|
records = []
|
||||||
|
for o in orders:
|
||||||
|
lesson_map = L1_U0_LESSONS if o['level']=='L1' else L2_U0_LESSONS
|
||||||
|
lesson_data = l1_data if o['level']=='L1' else l2_data
|
||||||
|
reg_d = register_map.get(o['account_id'])
|
||||||
|
if not reg_d: continue
|
||||||
|
pur_d = o['purchase_date']
|
||||||
|
|
||||||
|
# 汇总所有角色的序章完成情况
|
||||||
|
merged = {}
|
||||||
|
for uid in char_map.get(o['account_id'],[]):
|
||||||
|
if uid not in lesson_data: continue
|
||||||
|
for cid, dt in lesson_data[uid].items():
|
||||||
|
if cid not in merged or dt < merged[cid]: merged[cid]=dt
|
||||||
|
|
||||||
|
# 每节课的完成日期和距注册天数
|
||||||
|
lesson_dates = {}
|
||||||
|
for cid, dt in merged.items():
|
||||||
|
lesson_dates[lesson_map[cid]] = {'date': dt, 'days_from_reg': (dt - reg_d).days}
|
||||||
|
|
||||||
|
# 购课前完成哪些
|
||||||
|
done_before = {ln: info for ln, info in lesson_dates.items() if info['date'] <= pur_d}
|
||||||
|
done_after = {ln: info for ln, info in lesson_dates.items() if info['date'] > pur_d}
|
||||||
|
|
||||||
|
all5_done = len(lesson_dates) == 5
|
||||||
|
all5_date = max(info['date'] for info in lesson_dates.values()) if all5_done else None
|
||||||
|
all5_to_pur = (pur_d - all5_date).days if all5_date else None
|
||||||
|
reg_to_all5 = (all5_date - reg_d).days if all5_date else None
|
||||||
|
|
||||||
|
# 购课时进度:完成了几节,已完成哪些节
|
||||||
|
progress = len(done_before)
|
||||||
|
lessons_done = sorted(done_before.keys(), key=lambda x: int(x[1:]))
|
||||||
|
|
||||||
|
# 购课落在哪两节之间
|
||||||
|
# e.g. 完成了 L01 L02,则购课时间在 L02完成后 → "L02之后"
|
||||||
|
if progress == 0:
|
||||||
|
purchase_position = "L01之前"
|
||||||
|
elif progress == 5:
|
||||||
|
purchase_position = "L05之后"
|
||||||
|
else:
|
||||||
|
purchase_position = f"{lessons_done[-1]}之后(L{int(lessons_done[-1][1:])+1:02d}之前)"
|
||||||
|
|
||||||
|
records.append({
|
||||||
|
'order_id': o['order_id'], 'account_id': o['account_id'],
|
||||||
|
'level': o['level'], 'register_date': reg_d, 'purchase_date': pur_d,
|
||||||
|
'reg_to_pur': (pur_d - reg_d).days,
|
||||||
|
'progress': progress, 'lessons_done': lessons_done,
|
||||||
|
'done_before': done_before, 'done_after': done_after,
|
||||||
|
'all5_done': all5_done, 'all5_date': all5_date,
|
||||||
|
'reg_to_all5': reg_to_all5, 'all5_to_pur': all5_to_pur,
|
||||||
|
'purchase_position': purchase_position,
|
||||||
|
'lesson_dates': lesson_dates,
|
||||||
|
})
|
||||||
|
|
||||||
|
print(f" 分析记录: {len(records)}")
|
||||||
|
l1_recs = [r for r in records if r['level']=='L1']
|
||||||
|
l2_recs = [r for r in records if r['level']=='L2']
|
||||||
|
all5_recs = [r for r in records if r['all5_done']]
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 交叉统计1: 购课时完成节数 × 注册→购课天数
|
||||||
|
# ====================================================================
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【交叉1】购课时完成节数 × 注册→购课天数")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
|
||||||
|
print(f"\n[{level}]")
|
||||||
|
# 构建交叉表
|
||||||
|
day_ranges = [(0,1),(2,3),(4,7),(8,14),(15,30),(31,60),(61,999)]
|
||||||
|
print(f" {'购课进度':<20}", end='')
|
||||||
|
for lo, hi in day_ranges:
|
||||||
|
label = f"{lo}天" if lo==hi else f"{lo}-{hi}天"
|
||||||
|
print(f"{label:>10}", end='')
|
||||||
|
print(f" {'合计':>8}")
|
||||||
|
|
||||||
|
for prog in range(6):
|
||||||
|
subset = [r for r in data if r['progress']==prog]
|
||||||
|
label = f" 完成{prog}节" if prog>0 else f" 完成0节"
|
||||||
|
print(f"{label:<20}", end='')
|
||||||
|
total = 0
|
||||||
|
for lo, hi in day_ranges:
|
||||||
|
cnt = sum(1 for r in subset if lo <= r['reg_to_pur'] <= hi)
|
||||||
|
print(f"{cnt:>10}", end='')
|
||||||
|
total += cnt
|
||||||
|
print(f" {total:>8}")
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 交叉统计2: 购课时完成节数 × 完课→购课天数(仅五节全完成用户)
|
||||||
|
# ====================================================================
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【交叉2】购课时完成节数 × 全完成→购课天数(仅五节全完成用户)")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
all5_day_ranges = [(-999,-1),(0,0),(1,3),(4,7),(8,14),(15,30),(31,60),(61,999)]
|
||||||
|
|
||||||
|
for level, data in [('L1',[r for r in l1_recs if r['all5_done']]),
|
||||||
|
('L2',[r for r in l2_recs if r['all5_done']]),
|
||||||
|
('ALL',all5_recs)]:
|
||||||
|
print(f"\n[{level}]")
|
||||||
|
print(f" {'购课进度':<20}", end='')
|
||||||
|
for lo, hi in all5_day_ranges:
|
||||||
|
if lo==-999: label="先购课后完课"
|
||||||
|
elif lo==hi: label=f"{lo}天"
|
||||||
|
else: label=f"{lo}-{hi}天"
|
||||||
|
print(f"{label:>14}", end='')
|
||||||
|
print(f" {'合计':>8}")
|
||||||
|
|
||||||
|
for prog in range(6):
|
||||||
|
subset = [r for r in data if r['progress']==prog]
|
||||||
|
label = f" 完成{prog}节" if prog>0 else f" 完成0节"
|
||||||
|
print(f"{label:<20}", end='')
|
||||||
|
total = 0
|
||||||
|
for lo, hi in all5_day_ranges:
|
||||||
|
cnt = sum(1 for r in subset if r['all5_to_pur'] is not None and lo <= r['all5_to_pur'] <= hi)
|
||||||
|
print(f"{cnt:>14}", end='')
|
||||||
|
total += cnt
|
||||||
|
print(f" {total:>8}")
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 交叉统计3: 购课落在序章各节之间的分布
|
||||||
|
# ====================================================================
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【交叉3】购课时机:落在序章第几节之后")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
    if not data:
        continue  # skip empty groups to avoid dividing by zero below
    dist = Counter(r['purchase_position'] for r in data)
    print(f"\n[{level}]")
    positions = ['L01之前','L01之后(L02之前)','L02之后(L03之前)','L03之后(L04之前)','L04之后(L05之前)','L05之后']
    for pos in positions:
        cnt = dist.get(pos, 0)
        pct = cnt/len(data)*100
        bar = '█' * int(pct)
        print(f" {pos:24s}: {cnt:4d} ({pct:5.1f}%) {bar}")
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 交叉统计4: 逐节课完成时间分布
|
||||||
|
# ====================================================================
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【交叉4】序章每节课:从注册到完成的天数分布")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
lesson_days_ranges = [(0,0),(1,1),(2,3),(4,7),(8,14),(15,30),(31,999)]
|
||||||
|
lesson_order = ['L01','L02','L03','L04','L05']
|
||||||
|
|
||||||
|
for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
|
||||||
|
print(f"\n[{level}] ({len(data)} 条记录)")
|
||||||
|
print(f" {'课时':<8}", end='')
|
||||||
|
for lo, hi in lesson_days_ranges:
|
||||||
|
label = f"{lo}天" if lo==hi else f"{lo}-{hi}天"
|
||||||
|
print(f"{label:>10}", end='')
|
||||||
|
print(f" {'完成人数':>10} {'中位数':>8} {'均值':>8}")
|
||||||
|
|
||||||
|
for ln in lesson_order:
|
||||||
|
vals = []
|
||||||
|
for r in data:
|
||||||
|
if ln in r['lesson_dates']:
|
||||||
|
vals.append(r['lesson_dates'][ln]['days_from_reg'])
|
||||||
|
if not vals:
|
||||||
|
print(f" {ln:<8} {'(无数据)':>60}")
|
||||||
|
continue
|
||||||
|
sv = sorted(vals)
|
||||||
|
med = sv[len(sv)//2]
|
||||||
|
avg = sum(vals)/len(vals)
|
||||||
|
print(f" {ln:<8}", end='')
|
||||||
|
for lo, hi in lesson_days_ranges:
|
||||||
|
cnt = sum(1 for v in vals if lo <= v <= hi)
|
||||||
|
pct = cnt / len(vals) * 100
|
||||||
|
print(f"{cnt:>5}({pct:4.1f}%)", end='')
|
||||||
|
print(f" {len(vals):>10} {med:>8}天 {avg:>8.1f}天")
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 交叉统计5: 购课时完成节数 × L1/L2 × 是否有后续完成
|
||||||
|
# ====================================================================
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【交叉5】购课时未全部完成的用户:购课后完成余下序章的情况")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
not_all5_at_purchase = [r for r in records if r['progress'] < 5]
|
||||||
|
for level, data in [('L1',[r for r in l1_recs if r['progress']<5]),
|
||||||
|
('L2',[r for r in l2_recs if r['progress']<5]),
|
||||||
|
('ALL',not_all5_at_purchase)]:
|
||||||
|
if not data: continue
|
||||||
|
print(f"\n[{level}] 购课时未全部完成: {len(data)} 人")
|
||||||
|
for prog in range(5):
|
||||||
|
subset = [r for r in data if r['progress']==prog]
|
||||||
|
if not subset: continue
|
||||||
|
after_done_dist = Counter(r['all5_done'] for r in subset)
|
||||||
|
all5_cnt = after_done_dist.get(True, 0)
|
||||||
|
not5_cnt = after_done_dist.get(False, 0)
|
||||||
|
print(f" 购课时完成{prog}节: {len(subset)}人 → 最终完成为5节: {all5_cnt}人, 仍未完成: {not5_cnt}人")
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 交叉统计6: 先购课后完成序章的用户详细分析
|
||||||
|
# ====================================================================
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【交叉6】先购课后才完成序章的用户详细分析")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
buy_before_finish = [r for r in all5_recs if r['all5_to_pur'] is not None and r['all5_to_pur'] < 0]
|
||||||
|
print(f"\n共 {len(buy_before_finish)} 人,购课后才完成序章全部五节\n")
|
||||||
|
print(f" {'等级':<6} {'购课时进度':<14} {'购课日期':<14} {'全完成日期':<14} {'购课→完课':>10} {'注册→购课':>10} {'注册→全完':>10}")
|
||||||
|
for r in sorted(buy_before_finish, key=lambda x: x['all5_to_pur']):
|
||||||
|
print(f" {r['level']:<6} {'完成'+str(r['progress'])+'节':<14} {str(r['purchase_date']):<14} {str(r['all5_date']):<14} {abs(r['all5_to_pur']):>10}天 {r['reg_to_pur']:>10}天 {r['reg_to_all5']:>10}天")
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 交叉统计7: 逐月趋势 — 购课时完成节数分布
|
||||||
|
# ====================================================================
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【交叉7】逐月趋势:购课时完成 5/5 节的比例")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
monthly = defaultdict(lambda: {'total':0,'all5':0,'partial':0})
|
||||||
|
for r in records:
|
||||||
|
m = r['purchase_date'].strftime('%Y-%m')
|
||||||
|
monthly[m]['total'] += 1
|
||||||
|
if r['progress'] == 5:
|
||||||
|
monthly[m]['all5'] += 1
|
||||||
|
else:
|
||||||
|
monthly[m]['partial'] += 1
|
||||||
|
|
||||||
|
for m in sorted(monthly.keys()):
|
||||||
|
d = monthly[m]
|
||||||
|
pct = d['all5']/d['total']*100
|
||||||
|
print(f" {m}: 总{d['total']:>4}单 | 购课时已完成5节={d['all5']:>3}人({pct:5.1f}%) | 未完成={d['partial']:>2}人")
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 交叉统计8: 注册→购课 按购课时进度分层
|
||||||
|
# ====================================================================
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print("【交叉8】按购课时进度分层:各组注册→购课 天数统计")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
|
||||||
|
def pct(vals, p):
|
||||||
|
if not vals: return '-'
|
||||||
|
s = sorted(vals)
|
||||||
|
return s[int(len(s)*p/100)]
|
||||||
|
|
||||||
|
for level, data in [('L1',l1_recs),('L2',l2_recs),('ALL',records)]:
|
||||||
|
print(f"\n[{level}]")
|
||||||
|
print(f" {'购课时进度':<16} {'人数':>6} {'P10':>6} {'P25':>6} {'中位':>6} {'P75':>6} {'P90':>6} {'均值':>8}")
|
||||||
|
for prog in range(6):
|
||||||
|
subset = [r for r in data if r['progress']==prog]
|
||||||
|
if not subset: continue
|
||||||
|
vals = [r['reg_to_pur'] for r in subset]
|
||||||
|
print(f" {'完成'+str(prog)+'节':<16} {len(subset):>6} "
|
||||||
|
f"{pct(vals,10):>6} {pct(vals,25):>6} {pct(vals,50):>6} "
|
||||||
|
f"{pct(vals,75):>6} {pct(vals,90):>6} {sum(vals)/len(vals):>8.1f}")
|
||||||
|
|
||||||
|
# ====================================================================
|
||||||
|
# 导出CSV(带序章每节课完成明细)
|
||||||
|
# ====================================================================
|
||||||
|
output_path = '/root/.openclaw/workspace/output/endor_purchase_analysis_v3.csv'
|
||||||
|
with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
|
||||||
|
writer = csv.writer(f)
|
||||||
|
writer.writerow([
|
||||||
|
'order_id','account_id','等级','注册日期','购课日期','注册→购课(天)',
|
||||||
|
'购课时序章进度(节)','购课时已完成哪些课',
|
||||||
|
'L01完成日期','L01注册后几天','L02完成日期','L02注册后几天',
|
||||||
|
'L03完成日期','L03注册后几天','L04完成日期','L04注册后几天',
|
||||||
|
'L05完成日期','L05注册后几天',
|
||||||
|
'五节全完成','五节全完成日期','注册→全完成(天)','全完成→购课(天)',
|
||||||
|
'购课时机'
|
||||||
|
])
|
||||||
|
for r in sorted(records, key=lambda x: x['purchase_date']):
|
||||||
|
row = [r['order_id'],r['account_id'],r['level'],
|
||||||
|
r['register_date'].isoformat(),r['purchase_date'].isoformat(),r['reg_to_pur'],
|
||||||
|
r['progress'],','.join(r['lessons_done'])]
|
||||||
|
for ln in lesson_order:
|
||||||
|
info = r['lesson_dates'].get(ln, {})
|
||||||
|
row.append(info.get('date','').isoformat() if info.get('date') else '')
|
||||||
|
row.append(info.get('days_from_reg','') if info else '')
|
||||||
|
row.append('是' if r['all5_done'] else '否')
|
||||||
|
row.append(r['all5_date'].isoformat() if r['all5_date'] else '')
|
||||||
|
row.append(r['reg_to_all5'] if r['reg_to_all5'] is not None else '')
|
||||||
|
row.append(r['all5_to_pur'] if r['all5_to_pur'] is not None else '')
|
||||||
|
row.append(r['purchase_position'])
|
||||||
|
writer.writerow(row)
|
||||||
|
|
||||||
|
print(f"\n{'='*80}")
|
||||||
|
print(f"已导出: {output_path}")
|
||||||
|
print(f"{'='*80}")
|
||||||
|
cur.close()
|
||||||
|
conn.close()
|
||||||
|
print("分析完成!")
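The `purchase_position` label in this script is derived from the lessons completed on or before the purchase date. The sketch below mirrors that branch logic as a pure function so it can be tested without a database. `purchase_position_label` is a hypothetical name, and the `total=5` default assumes the five-lesson prologue used throughout the script.

```python
def purchase_position_label(lessons_done, total=5):
    """Label where the purchase falls, given lesson labels such as 'L01' completed by then."""
    done = sorted(lessons_done, key=lambda x: int(x[1:]))
    if not done:
        return "L01之前"  # purchased before finishing any lesson
    if len(done) == total:
        return f"L{total:02d}之后"  # purchased after the whole prologue
    last = int(done[-1][1:])
    # Labeled by the highest completed lesson, as in the script
    return f"{done[-1]}之后(L{last + 1:02d}之前)"


if __name__ == "__main__":
    print(purchase_position_label(["L02", "L01"]))
```

As in the script, a gap (for example L01 and L03 done but not L02) is labeled by the highest completed lesson, so the label describes the furthest point reached, not a strictly sequential position.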
|
||||||