auto-sync: refund-user-learning-analysis 2026-04-15_19:06
commit 812e47763c

SKILL.md
@@ -0,0 +1,85 @@
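---
name: refund-user-learning-analysis
description: |
  Analysis tool for the U0 learning data of refunded users. It reports how users who bought a course and then refunded it within a given time range performed in the U0 stage,
  including accuracy and time spent on lesson review (Review), participation in and completion of unit reinforcement (Summary), and per-dimension scores on the unit challenge (Challenge).
  It also removes dirty data automatically and generates a multi-sheet Excel report.

  **Trigger scenarios**:
  (1) Requests for the learning data or learning status of refunded users
  (2) Analysis of refunded users' Review/Summary/Challenge performance in the U0 stage
  (3) Questions about how many refunded users completed the U0 lessons
  (4) Statistics on refunded users' completion rate and accuracy
  (5) Any message that combines "退费用户" (refunded users) with "学习数据/巩固/强化/挑战" (learning data/review/reinforcement/challenge)
---
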
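# Refunded-user U0 learning data analysis

## Analysis workflow

### Step 1: Confirm parameters

Confirm the following with the user:
- **Time range**: start and end dates of order payment (default: the current month). The end date is exclusive.
- **Exclude users who still hold a valid order**: excluded by default
- **Review-duration outlier threshold**: default 60 minutes; longer records are treated as dirty data
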
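The "current month" default could be resolved as in the sketch below. It assumes the end date is an exclusive bound, which matches the `pay_success_date < end` filter in the query script; `default_month_range` is a hypothetical helper and is not part of the skill's scripts.

```python
from datetime import date

def default_month_range(today: date) -> tuple[str, str]:
    """Return (start, end) for the calendar month that contains `today`.

    `end` is the first day of the following month, i.e. an exclusive bound.
    """
    start = today.replace(day=1)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start.isoformat(), end.isoformat()

print(default_month_range(date(2026, 4, 15)))  # ('2026-04-01', '2026-05-01')
```

The two strings can be passed directly as `--start` and `--end`.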
### Step 2: Run the data query

Run the query script with the confirmed parameters:

```bash
python3 scripts/query_refund_learning.py \
  --start 2026-04-01 --end 2026-05-01 \
  --output /tmp/refund_learning_report.json \
  --pure true --outlier 60
```

The script automatically:
1. Selects users who bought a course in the time range and refunded it (`order_status=4` plus refund `status=3`)
2. Optionally excludes users who still hold a valid order (`order_status=3`)
3. Joins the character table to find each `user_id`, then checks the 8 sharded tables to decide whether all five U0 lessons are complete
4. Computes Review duration and accuracy (parsing `isRight` from the `question_list` JSON)
5. Computes Summary entry counts and completion per knowledge module
6. Computes Challenge participation and the Perfect/Good/Oops distribution for each dimension (up to four; L1-U0 has two)
7. Detects and removes Review records with abnormal duration

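The Review accuracy calculation in step 4 runs in SQL inside the script. The sketch below shows the same idea in plain Python, assuming `question_list` is a JSON array of objects that each carry a boolean `isRight` field; `right_rate_pct` is a hypothetical name used only for illustration.

```python
import json

def right_rate_pct(question_list_json: str):
    """Return the percentage of questions answered correctly.

    Returns None for an empty list, mirroring the NULLIF guard in the SQL.
    """
    questions = json.loads(question_list_json)
    if not questions:
        return None
    right = sum(1 for q in questions if q.get("isRight") is True)
    return right / len(questions) * 100

sample = '[{"isRight": true}, {"isRight": false}, {"isRight": true}, {"isRight": true}]'
print(right_rate_pct(sample))  # 75.0
```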
### Step 3: Generate the Excel report

```bash
python3 scripts/generate_excel.py \
  --input /tmp/refund_learning_report.json \
  --output /tmp/退费用户U0学习数据统计.xlsx
```

This produces up to five sheets: 总览 (overview), 课程巩固 (lesson review), 单元强化 (unit reinforcement), 单元挑战 (unit challenge), and 剔除的异常数据 (removed outliers). The last sheet is created only when outlier records exist.

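Before generating the workbook it can help to confirm that the JSON report has the sections `generate_excel.py` reads: `funnel`, `review`, `summary`, and `challenge` are required, while `outliers` is optional. A minimal sketch follows; `missing_report_keys` is a hypothetical helper, not something the scripts provide.

```python
import json

# Top-level keys that generate_excel.py accesses unconditionally.
REQUIRED_KEYS = ("funnel", "review", "summary", "challenge")

def missing_report_keys(report):
    """Return the required top-level keys that are absent from the report."""
    return [key for key in REQUIRED_KEYS if key not in report]

sample = json.loads('{"funnel": {}, "review": [], "summary": []}')
print(missing_report_keys(sample))  # ['challenge']
```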
### Step 4: Send the file

Use the `feishu-send-file` skill to send the Excel file to the user.

## Data definitions

- **Refunded user**: `bi_vala_order.order_status = 4` and `bi_refund_order.status = 3`, joined on `out_trade_no`
- **Pure refunded user**: a refunded user with no valid order (`order_status = 3`) at all
- **Completed U0**: the user finished all 5 lessons of at least one of L1-U0 or L2-U0 (`play_status = 1`)
- **Review accuracy**: number of `question_list` JSON entries with `isRight=true` / total questions × 100
- **Summary completed**: the user finished every knowledge module of the unit (L1 = 3, L2 = 4)
- **Challenge score**: the `score_text` (Perfect/Good/Oops) of the first attempt in each dimension
- **Test accounts**: filtered out by keeping only `bi_vala_app_account.status = 1` (normal accounts; `status = 2` marks test accounts)

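The "completed U0" rule can be read as a set-containment check. The sketch below uses the chapter IDs listed in `references/data-model.md`; the function name `completed_u0` is hypothetical, and the script itself applies the rule in SQL by counting distinct completed chapters per level.

```python
# U0 lesson IDs per level, taken from references/data-model.md.
L1_U0 = {343, 344, 345, 346, 348}
L2_U0 = {55, 56, 57, 58, 59}

def completed_u0(done_chapter_ids) -> bool:
    """True when every lesson of L1-U0 or every lesson of L2-U0 is done."""
    done = set(done_chapter_ids)
    return L1_U0 <= done or L2_U0 <= done

print(completed_u0([343, 344, 345, 346, 348]))  # True
print(completed_u0([343, 344, 345, 346, 55]))   # False
```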
## Extending to other units

To analyze another unit, change the chapter_id and story_id mappings in the script.
The mappings are documented in `references/data-model.md`.

Query the `bi_level_unit_lesson` table to get the chapter_id values of any unit:
```sql
SELECT * FROM bi_level_unit_lesson WHERE course_unit = 'U01' ORDER BY course_level;
```

Query the story_id:
```sql
SELECT DISTINCT story_id, level FROM bi_user_unit_review_question_result
WHERE chapter_id IN (<target_chapter_ids>) LIMIT 5;
```
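
Once the chapter IDs of another unit are known, the per-shard completion query can be rebuilt from them. The following is a simplified sketch of that step: it omits the join against refunded users that the real script performs, `build_play_union` is a hypothetical helper, and the IDs 101 and 102 are placeholders rather than real lessons.

```python
def build_play_union(chapter_ids, shards=8):
    """Build a UNION ALL over the sharded play-record tables for the given lessons."""
    # Coerce to int so that only numeric IDs can reach the SQL text.
    id_list = ",".join(str(int(c)) for c in chapter_ids)
    return " UNION ALL ".join(
        f"SELECT user_id, chapter_id FROM bi_user_chapter_play_record_{i} "
        f"WHERE play_status = 1 AND chapter_id IN ({id_list})"
        for i in range(shards)
    )

sql = build_play_union([101, 102])  # placeholder chapter IDs
print(sql.count("UNION ALL"))  # 7
```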

references/data-model.md
@@ -0,0 +1,86 @@
# Refunded-user learning data - data model reference

## Database connections

| Database | Purpose | Host | Port | User | Password source |
|------|------|------|------|------|----------|
| vala_bi (PG) | BI statistics tables and synced tables | bj-postgres-16pob4sg.sql.tencentcdb.com | 28591 | ai_member | secrets.env → PG_ONLINE_PASSWORD |
| vala (PG) | Source tables for user learning records | same as above | same as above | same as above | same as above |

## Core tables

### Orders (vala_bi)

| Table | Description |
|------|------|
| `bi_vala_order` | Order table. `order_status=3` paid, `order_status=4` refunded, `pay_success_date` payment time |
| `bi_refund_order` | Refund table. Joined to orders on `out_trade_no`; `status=3` means the refund succeeded |
| `bi_vala_app_account` | Account table. `status=1` normal, `status=2` test account |
| `bi_vala_app_character` | Character table. `account_id` links to the account; one account can have multiple characters |

### Lesson completion (vala_bi)

| Table | Description |
|------|------|
| `bi_user_chapter_play_record_{0-7}` | Lesson play records, sharded by `user_id % 8`. `play_status=1` completed, `chapter_id` lesson ID |
| `bi_level_unit_lesson` | Course structure mapping table. `id` = chapter_id; contains course_level/season/unit/lesson |

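Because the play-record tables are sharded by `user_id % 8`, a lookup for a single user only needs one of them. A minimal sketch, with `play_record_table` as a hypothetical helper name:

```python
def play_record_table(user_id: int, shards: int = 8) -> str:
    """Return the sharded play-record table that holds rows for `user_id`."""
    return f"bi_user_chapter_play_record_{user_id % shards}"

print(play_record_table(10))  # bi_user_chapter_play_record_2
```

The query script does not route per user; it unions all eight tables, which is simpler when many users are checked at once.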
### Review (vala_bi)

| Table | Description |
|------|------|
| `bi_user_unit_review_question_result` | Lesson review records. `chapter_id` lesson, `play_time` duration (ms), `question_list` JSON containing `isRight` |

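Since `play_time` is stored in milliseconds while the outlier threshold is given in minutes, the two have to be brought to the same unit before filtering. The sketch below mirrors the script's approach (threshold converted to milliseconds, records at or below it kept); `split_outliers` and `to_minutes` are hypothetical names.

```python
def to_minutes(play_time_ms: int) -> float:
    """Convert a duration in milliseconds to minutes, rounded to one decimal."""
    return round(play_time_ms / 1000 / 60, 1)

def split_outliers(play_times_ms, threshold_min: float = 60.0):
    """Split durations into (kept, dropped) around the threshold in minutes."""
    limit_ms = int(threshold_min * 60 * 1000)
    kept = [t for t in play_times_ms if t <= limit_ms]
    dropped = [t for t in play_times_ms if t > limit_ms]
    return kept, dropped

kept, dropped = split_outliers([120000, 3600000, 3600001])
print(kept, dropped)      # [120000, 3600000] [3600001]
print(to_minutes(90000))  # 1.5
```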
### Unit reinforcement (vala_bi)

| Table | Description |
|------|------|
| `bi_user_unit_summary_km_result` | Reinforcement practice records. `story_id` is GameInfo.ID, `km_type` is the knowledge module type (vocab/pron/sentence/grammar) |

### Unit challenge (vala_bi)

| Table | Description |
|------|------|
| `bi_user_unit_challenge_question_result` | Challenge records. `story_id` is GameInfo.ID, `category` is the dimension (listening/speaking/reading/writing), `score_text` is the rating (Perfect/Good/Oops) |

### Completion records (vala database)

| Table | Description |
|------|------|
| `user_learn_record_report_summary_{3-7}` | Sharded learning-completion summary tables. `learn_card_type=1,record_type=3` reinforcement completed, `learn_card_type=1,record_type=4` challenge completed |

## Key U0 ID mappings

### Chapter ID (lesson)

| Level | Lessons | chapter_id |
|------|------|-----------|
| L1-U0 | L01~L05 | 343, 344, 345, 346, 348 |
| L2-U0 | L01~L05 | 55, 56, 57, 58, 59 |

### Story ID (unit)

| Level | story_id |
|------|----------|
| L1-U0 | 65 |
| L2-U0 | 8 |

### Knowledge modules

| Level | km_type list | Total |
|------|-------------|------|
| L1-U0 | vocab, pron, sentence | 3 |
| L2-U0 | vocab, pron, sentence, grammar | 4 |

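The table above implies a simple completion test for the reinforcement (Summary) stage. The sketch below checks set containment; note that the query script instead counts distinct `km_type` values (at least 3 for L1, at least 4 for L2), which gives the same answer as long as only the listed module types occur. `summary_completed` is a hypothetical name.

```python
# Required knowledge modules per level, from the table above.
REQUIRED_KM = {
    "L1": {"vocab", "pron", "sentence"},
    "L2": {"vocab", "pron", "sentence", "grammar"},
}

def summary_completed(level: str, done_km_types) -> bool:
    """True when every knowledge module required for `level` has been done."""
    return REQUIRED_KM[level] <= set(done_km_types)

print(summary_completed("L1", ["vocab", "pron", "sentence"]))  # True
print(summary_completed("L2", ["vocab", "pron", "sentence"]))  # False
```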
### Challenge dimensions

| Level | category list |
|------|--------------|
| L1-U0 | listening, speaking |
| L2-U0 | listening, speaking, reading, writing |

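For each level and dimension, the report expresses the Perfect/Good/Oops counts as whole-number percentages of their sum. A sketch of that calculation, following the rounding used in the query script (`score_distribution` is a hypothetical name):

```python
def score_distribution(perfect: int, good: int, oops: int) -> dict:
    """Return each rating's share of the total as a rounded percentage."""
    counts = {"Perfect": perfect, "Good": good, "Oops": oops}
    total = sum(counts.values())
    if total == 0:
        # No attempts in this dimension: report zeros instead of dividing by zero.
        return {name: 0 for name in counts}
    return {name: round(count / total * 100) for name, count in counts.items()}

print(score_distribution(5, 3, 2))  # {'Perfect': 50, 'Good': 30, 'Oops': 20}
```

Because each share is rounded independently, the three percentages may not always add up to exactly 100.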
## Course structure mapping formulas

- `UnitIndex = (SeasonOfQuarter - 1) * 12 + GameInfo.Index`
- `ChapterIndex = UnitIndex * 5 + Chapter.Index`
- U0 corresponds to `season_package_index = 0, unit_index = 0`
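
The two formulas can be written out directly. In this sketch the parameter names are assumptions chosen to match the symbols above (`SeasonOfQuarter`, `GameInfo.Index`, `Chapter.Index`), and the inputs in the usage lines are arbitrary examples rather than real course data.

```python
def unit_index(season_of_quarter: int, game_info_index: int) -> int:
    """UnitIndex = (SeasonOfQuarter - 1) * 12 + GameInfo.Index"""
    return (season_of_quarter - 1) * 12 + game_info_index

def chapter_index(unit_idx: int, chapter_idx: int) -> int:
    """ChapterIndex = UnitIndex * 5 + Chapter.Index"""
    return unit_idx * 5 + chapter_idx

u = unit_index(2, 3)
print(u)                    # 15
print(chapter_index(u, 2))  # 77
```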

scripts/generate_excel.py
@@ -0,0 +1,92 @@
#!/usr/bin/env python3
"""
Generate the Excel report from the JSON results.
Usage: python3 generate_excel.py --input /tmp/report.json --output /tmp/report.xlsx
"""
import argparse
import json
import openpyxl
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from openpyxl.utils import get_column_letter

def style_sheet(ws):
    """Style the header row, add borders, center cells, and size the columns."""
    hfont = Font(bold=True, size=11)
    hfill = PatternFill(start_color="D9E1F2", end_color="D9E1F2", fill_type="solid")
    halign = Alignment(horizontal="center", vertical="center", wrap_text=True)
    calign = Alignment(horizontal="center", vertical="center")
    border = Border(left=Side(style='thin'), right=Side(style='thin'),
                    top=Side(style='thin'), bottom=Side(style='thin'))
    # Header row
    for col in range(1, ws.max_column + 1):
        cell = ws.cell(row=1, column=col)
        cell.font, cell.fill, cell.alignment, cell.border = hfont, hfill, halign, border
    # Data rows
    for row in range(2, ws.max_row + 1):
        for col in range(1, ws.max_column + 1):
            cell = ws.cell(row=row, column=col)
            cell.alignment, cell.border = calign, border
    # Column width: longest cell text plus padding, with a minimum of 10
    for col in range(1, ws.max_column + 1):
        mx = max((len(str(ws.cell(r, col).value or "")) for r in range(1, ws.max_row + 1)), default=5)
        ws.column_dimensions[get_column_letter(col)].width = max(mx + 4, 10)

def main():
    p = argparse.ArgumentParser()
    p.add_argument("--input", required=True)
    p.add_argument("--output", required=True)
    args = p.parse_args()

    with open(args.input, encoding="utf-8") as f:
        data = json.load(f)

    wb = openpyxl.Workbook()

    # Sheet 1: Overview (sheet and column labels stay in Chinese for the report readers)
    ws = wb.active
    ws.title = "总览"
    ws.append(["指标", "数值"])
    fun = data["funnel"]
    ws.append(["购课退费用户总数", fun["total_refund"]])
    ws.append(["剔除仍有有效订单后", fun["pure_refund"]])
    ws.append(["其中完成U0全部5节课", fun["completed_u0"]])
    ws.append(["  - 仅完成L1-U0", fun["l1_only"]])
    ws.append(["  - 仅完成L2-U0", fun["l2_only"]])
    ws.append(["  - L1+L2都完成", fun["both"]])
    # Guard against division by zero when no refunded users remain.
    pure = fun["pure_refund"]
    u0_pct = round(fun["completed_u0"] / pure * 100, 1) if pure else 0.0
    ws.append(["完成U0占比", f"{u0_pct}%"])
    style_sheet(ws)

    # Sheet 2: Review
    ws2 = wb.create_sheet("课程巩固(Review)")
    ws2.append(["等级", "课时", "做了巩固的人数", "平均用时(分钟)", "平均正确率"])
    for r in data["review"]:
        ws2.append([r["course"], r["lesson"], r["review_count"],
                    r["avg_duration_min"], f"{r['avg_right_rate_pct']}%"])
    style_sheet(ws2)

    # Sheet 3: Summary
    ws3 = wb.create_sheet("单元强化(Summary)")
    ws3.append(["等级", "知识模块总数", "进入人数", "全部完成", "做1个", "做2个", "做3个", "做4个"])
    for r in data["summary"]:
        ws3.append([r["course"], r["total_km"], r["enter_count"], r["all_done"],
                    r["done_1"], r["done_2"], r["done_3"], r["done_4"]])
    style_sheet(ws3)

    # Sheet 4: Challenge
    ws4 = wb.create_sheet("单元挑战(Challenge)")
    ws4.append(["等级", "维度", "参与人数", "Perfect", "Perfect%", "Good", "Good%", "Oops", "Oops%"])
    for r in data["challenge"]:
        ws4.append([r["course"], r["category"], r["enter_count"],
                    r["perfect"], f"{r['perfect_pct']}%", r["good"], f"{r['good_pct']}%",
                    r["oops"], f"{r['oops_pct']}%"])
    style_sheet(ws4)

    # Sheet 5: Outliers (created only when outlier records exist)
    if data.get("outliers"):
        ws5 = wb.create_sheet("剔除的异常数据")
        ws5.append(["等级", "课时", "user_id", "巩固用时(分钟)", "play_time(ms)", "记录时间"])
        for r in data["outliers"]:
            ws5.append([r["course"], r["lesson"], r["user_id"],
                        r["duration_min"], r["play_time_ms"], r["created_at"]])
        style_sheet(ws5)

    wb.save(args.output)
    print(f"Excel saved: {args.output}")

if __name__ == "__main__":
    main()

scripts/query_refund_learning.py
@@ -0,0 +1,236 @@
#!/usr/bin/env python3
"""
Query script for refunded-user learning data.
Usage: python3 query_refund_learning.py --start 2026-04-01 --end 2026-05-01 --output /tmp/report.json
Arguments:
  --start    Start date of order payment, inclusive (YYYY-MM-DD)
  --end      End date of order payment, exclusive (YYYY-MM-DD)
  --output   Output path for the JSON results
  --pure     Whether to exclude users who still hold a valid order (default true)
  --outlier  Review-duration outlier threshold in minutes; longer records are treated as dirty data (default 60)
"""
import argparse
import json
import os
import subprocess
import sys
from datetime import date

def get_pg_password():
    """Read PG_ONLINE_PASSWORD from secrets.env."""
    secrets_path = os.path.expanduser("~/.openclaw/workspace/secrets.env")
    with open(secrets_path) as f:
        for line in f:
            if line.startswith("PG_ONLINE_PASSWORD="):
                value = line.split("=", 1)[1].strip()
                # The value is expected to be single-quoted; fall back to the raw value otherwise.
                parts = value.split("'")
                return parts[1] if len(parts) >= 3 else value
    raise RuntimeError("PG_ONLINE_PASSWORD not found in secrets.env")

def run_pg(db, sql, password):
    """Run one SQL statement through psql and return rows as lists of strings."""
    env = os.environ.copy()
    env["PGPASSWORD"] = password
    r = subprocess.run(
        ["psql", "-h", "bj-postgres-16pob4sg.sql.tencentcdb.com", "-p", "28591",
         "-U", "ai_member", "-d", db, "-t", "-A", "-F", "\t", "-c", sql],
        capture_output=True, text=True, env=env, timeout=120
    )
    if r.returncode != 0:
        print(f"SQL ERROR: {r.stderr}", file=sys.stderr)
        sys.exit(1)
    rows = [line.split("\t") for line in r.stdout.strip().split("\n") if line.strip()]
    return rows

def iso_date(value):
    """argparse type: accept a valid ISO date and return it as YYYY-MM-DD.

    The dates are interpolated into SQL text, so anything that is not a
    real date is rejected here.
    """
    try:
        return date.fromisoformat(value).isoformat()
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {value!r}")

def main():
    p = argparse.ArgumentParser()
    p.add_argument("--start", required=True, type=iso_date)
    p.add_argument("--end", required=True, type=iso_date)
    p.add_argument("--output", default="/tmp/refund_learning_report.json")
    p.add_argument("--pure", default="true")
    p.add_argument("--outlier", type=float, default=60.0)
    args = p.parse_args()

    pw = get_pg_password()
    pure_clause = ""
    if args.pure.lower() == "true":
        pure_clause = "WHERE NOT EXISTS (SELECT 1 FROM bi_vala_order o2 WHERE o2.account_id = ra.account_id AND o2.order_status = 3)"

    # --- Chapter ID mappings ---
    # L1-U0: 343,344,345,346,348 | L2-U0: 55,56,57,58,59
    l1_ids = "343,344,345,346,348"
    l2_ids = "55,56,57,58,59"
    all_ids = f"{l1_ids},{l2_ids}"

    # One SELECT per sharded play-record table (user_id % 8), combined with UNION ALL
    chapter_play_union = " UNION ALL ".join([
        f"SELECT r.user_id, r.chapter_id FROM bi_user_chapter_play_record_{i} r JOIN refund_users ru ON r.user_id = ru.user_id WHERE r.play_status = 1 AND r.chapter_id IN ({all_ids})"
        for i in range(8)
    ])

    base_cte = f"""
    WITH refund_accounts AS (
      SELECT DISTINCT o.account_id FROM bi_vala_order o
      JOIN bi_vala_app_account a ON a.id = o.account_id AND a.status = 1
      JOIN bi_refund_order r ON r.out_trade_no = o.out_trade_no AND r.status = 3
      WHERE o.order_status = 4 AND o.pay_success_date >= '{args.start}' AND o.pay_success_date < '{args.end}'
    ),
    pure_refund_accounts AS (
      SELECT ra.account_id FROM refund_accounts ra {pure_clause}
    ),
    refund_users AS (
      SELECT c.id AS user_id, c.account_id FROM bi_vala_app_character c
      JOIN pure_refund_accounts pra ON c.account_id = pra.account_id WHERE c.deleted_at IS NULL
    ),
    all_done AS ({chapter_play_union}),
    user_done_count AS (
      SELECT user_id,
        COUNT(DISTINCT CASE WHEN chapter_id IN ({l1_ids}) THEN chapter_id END) AS l1_done,
        COUNT(DISTINCT CASE WHEN chapter_id IN ({l2_ids}) THEN chapter_id END) AS l2_done
      FROM (SELECT DISTINCT user_id, chapter_id FROM all_done) t GROUP BY user_id
    ),
    qualified_users AS (
      SELECT ru.user_id, ru.account_id FROM user_done_count udc
      JOIN refund_users ru ON udc.user_id = ru.user_id WHERE udc.l1_done = 5 OR udc.l2_done = 5
    )"""

    result = {}

    # 1. Funnel counts
    print("Querying funnel counts...")
    rows = run_pg("vala_bi", f"""
    {base_cte}
    SELECT
      (SELECT COUNT(*) FROM refund_accounts),
      (SELECT COUNT(*) FROM pure_refund_accounts),
      (SELECT COUNT(DISTINCT account_id) FROM qualified_users),
      (SELECT COUNT(DISTINCT account_id) FROM qualified_users qu
        JOIN user_done_count udc ON qu.user_id = udc.user_id AND udc.l1_done = 5 AND udc.l2_done < 5),
      (SELECT COUNT(DISTINCT account_id) FROM qualified_users qu
        JOIN user_done_count udc ON qu.user_id = udc.user_id AND udc.l2_done = 5 AND udc.l1_done < 5),
      (SELECT COUNT(DISTINCT account_id) FROM qualified_users qu
        JOIN user_done_count udc ON qu.user_id = udc.user_id AND udc.l1_done = 5 AND udc.l2_done = 5)
    """, pw)
    r = rows[0]
    result["funnel"] = {
        "total_refund": int(r[0]), "pure_refund": int(r[1]),
        "completed_u0": int(r[2]), "l1_only": int(r[3]),
        "l2_only": int(r[4]), "both": int(r[5])
    }

    # 2. Review data (with outlier filtering)
    print("Querying review data...")
    outlier_ms = int(args.outlier * 60 * 1000)
    rows = run_pg("vala_bi", f"""
    {base_cte},
    review_with_rate AS (
      SELECT rv.level, rv.chapter_id, rv.user_id, rv.play_time,
        (SELECT COUNT(*) FROM jsonb_array_elements(rv.question_list::jsonb) q WHERE (q->>'isRight')::boolean = true)::numeric
        / NULLIF((SELECT COUNT(*) FROM jsonb_array_elements(rv.question_list::jsonb))::numeric, 0) * 100 AS right_rate,
        ROW_NUMBER() OVER (PARTITION BY rv.user_id, rv.chapter_id ORDER BY rv.id) AS rn
      FROM bi_user_unit_review_question_result rv
      JOIN qualified_users qu ON rv.user_id = qu.user_id
      WHERE rv.chapter_id IN ({all_ids}) AND rv.deleted_at IS NULL AND rv.play_time <= {outlier_ms}
    )
    SELECT level, chapter_id,
      COUNT(DISTINCT user_id),
      ROUND(AVG(play_time / 1000.0 / 60)::numeric, 1),
      ROUND(AVG(right_rate)::numeric, 1)
    FROM review_with_rate WHERE rn = 1
    GROUP BY level, chapter_id ORDER BY level, chapter_id
    """, pw)
    chapter_map = {
        "343": "U0-L01", "344": "U0-L02", "345": "U0-L03", "346": "U0-L04", "348": "U0-L05",
        "55": "U0-L01", "56": "U0-L02", "57": "U0-L03", "58": "U0-L04", "59": "U0-L05"
    }
    result["review"] = []
    for r in rows:
        result["review"].append({
            "course": "L1" if r[0] == "A1" else "L2",
            "lesson": chapter_map.get(r[1], r[1]),
            "review_count": int(r[2]),
            "avg_duration_min": float(r[3]),
            "avg_right_rate_pct": float(r[4])
        })

    # 3. Summary (reinforcement) data
    print("Querying summary data...")
    rows = run_pg("vala_bi", f"""
    {base_cte},
    summary_data AS (
      SELECT s.level, s.user_id, COUNT(DISTINCT s.km_type) AS km_types_done
      FROM bi_user_unit_summary_km_result s
      JOIN qualified_users qu ON s.user_id = qu.user_id
      WHERE s.story_id IN (65, 8) AND s.deleted_at IS NULL
      GROUP BY s.level, s.user_id
    )
    SELECT
      level,
      COUNT(DISTINCT user_id),
      COUNT(DISTINCT CASE WHEN (level = 'A1' AND km_types_done >= 3) OR (level = 'A2' AND km_types_done >= 4) THEN user_id END),
      COUNT(DISTINCT CASE WHEN km_types_done = 1 THEN user_id END),
      COUNT(DISTINCT CASE WHEN km_types_done = 2 THEN user_id END),
      COUNT(DISTINCT CASE WHEN km_types_done = 3 THEN user_id END),
      COUNT(DISTINCT CASE WHEN km_types_done = 4 THEN user_id END)
    FROM summary_data GROUP BY level ORDER BY level
    """, pw)
    result["summary"] = []
    for r in rows:
        result["summary"].append({
            "course": "L1" if r[0] == "A1" else "L2",
            "total_km": 3 if r[0] == "A1" else 4,
            "enter_count": int(r[1]), "all_done": int(r[2]),
            "done_1": int(r[3]), "done_2": int(r[4]),
            "done_3": int(r[5]), "done_4": int(r[6])
        })

    # 4. Challenge data (first attempt per user, level, and dimension)
    print("Querying challenge data...")
    rows = run_pg("vala_bi", f"""
    {base_cte},
    challenge_first AS (
      SELECT ch.level, ch.category, ch.score_text, ch.user_id,
        ROW_NUMBER() OVER (PARTITION BY ch.user_id, ch.level, ch.category ORDER BY ch.id) AS rn
      FROM bi_user_unit_challenge_question_result ch
      JOIN qualified_users qu ON ch.user_id = qu.user_id
      WHERE ch.story_id IN (65, 8) AND ch.deleted_at IS NULL
    )
    SELECT level, category,
      COUNT(DISTINCT user_id),
      COUNT(DISTINCT CASE WHEN score_text = 'Perfect' THEN user_id END),
      COUNT(DISTINCT CASE WHEN score_text = 'Good' THEN user_id END),
      COUNT(DISTINCT CASE WHEN score_text = 'Oops' THEN user_id END)
    FROM challenge_first WHERE rn = 1
    GROUP BY level, category ORDER BY level, category
    """, pw)
    result["challenge"] = []
    for r in rows:
        total = int(r[3]) + int(r[4]) + int(r[5])
        result["challenge"].append({
            "course": "L1" if r[0] == "A1" else "L2",
            "category": r[1],
            "enter_count": int(r[2]),
            "perfect": int(r[3]), "good": int(r[4]), "oops": int(r[5]),
            "perfect_pct": round(int(r[3]) / total * 100) if total else 0,
            "good_pct": round(int(r[4]) / total * 100) if total else 0,
            "oops_pct": round(int(r[5]) / total * 100) if total else 0,
        })

    # 5. Outlier records (Review records above the duration threshold)
    print("Querying outliers...")
    rows = run_pg("vala_bi", f"""
    {base_cte}
    SELECT rv.level, rv.chapter_id, rv.user_id,
      ROUND((rv.play_time / 1000.0 / 60)::numeric, 1), rv.play_time, rv.created_at
    FROM bi_user_unit_review_question_result rv
    JOIN qualified_users qu ON rv.user_id = qu.user_id
    WHERE rv.chapter_id IN ({all_ids}) AND rv.deleted_at IS NULL AND rv.play_time > {outlier_ms}
    ORDER BY rv.play_time DESC
    """, pw)
    result["outliers"] = []
    for r in rows:
        result["outliers"].append({
            "course": "L1" if r[0] == "A1" else "L2",
            "lesson": chapter_map.get(r[1], r[1]),
            "user_id": int(r[2]),
            "duration_min": float(r[3]),
            "play_time_ms": int(r[4]),
            "created_at": r[5]
        })

    # ensure_ascii=False can emit non-ASCII text, so write the file as UTF-8.
    with open(args.output, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    print(f"Done. Output: {args.output}")

if __name__ == "__main__":
    main()