🤖 Daily automatic backup - 2026-05-15 08:00:01

小溪 2026-05-15 08:00:01 +08:00
parent ead09c9530
commit 83160895d0
25 changed files with 2169 additions and 119 deletions


@@ -154,6 +154,17 @@
| 41 | Official website |
| 71 | Mini program |
| Other values | Off-site |
- **Paid-user L1/L2 classification rule (based on goods_id) [confirmed by 李承龙] 2026-05-14**
  - **L1 goods:** `goods_id IN (57, 60, 63)`: 瓦拉英语level1 / level1·单季
  - **L2 goods:** `goods_id IN (31, 32, 33, 54)`: 瓦拉英语level2 / 年包 (annual pack) / 单季度包 (single-quarter pack) / 三季度课包 (three-quarter course pack) / 季度包 (quarter pack)
    - Note: goods_id=31 was renamed over time from 「瓦拉英语level2」 to 「瓦拉英语年包」; it is the same L2 product
    - Note: goods_id=32 was renamed over time from 「瓦拉英语level2·单季」 to 「瓦拉英语单季度包」; it is the same L2 product
  - **L1+L2 goods:** `goods_id = 61`: 瓦拉英语level1+2
  - **User classification logic:** collect the goods_id of all of a user's orders, then decide:
    - bought only L1 goods → 「仅L1」 (L1 only)
    - bought only L2 goods → 「仅L2」 (L2 only)
    - bought the L1+L2 product (goods_id=61), or bought both L1 and L2 goods → 「L1+L2」
  - **Legacy general pass vouchers:** `goods_id IN (4, 5, 6, 10, 13, 14, 17, 20, 25, 29, 30, 35, 36, 37, 38)`; very low volume (<30 orders) and not split into L1/L2; file them under "other" or look them up via `bi_user_course_detail`
- **Amount unit rule:** in table `bi_vala_order`, `pay_amount` is in yuan and `pay_amount_int` is in fen; use `pay_amount_int` for all sales amounts from now on, and divide by 100 when reporting in yuan
- **Learning-data dimensions:** completion count, average time spent and accuracy (three grades: Perfect/Good/Oops) can be reported per unit, lesson or component
- **Special date:** `2025-10-01` is the launch date of the core version; some statistics need to separate the data before and after it
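
The goods_id rule and the amount-unit rule above can be sketched in a few lines of Python. This is only an illustration: it assumes one user's purchased goods_id values are already available as a list, and the names `classify_paid_user`, `fen_to_yuan` and the three constants are hypothetical, not taken from the repository.

```python
# goods_id sets copied from the rule above
L1_GOODS = {57, 60, 63}
L2_GOODS = {31, 32, 33, 54}
L1_L2_GOODS = {61}


def classify_paid_user(goods_ids):
    """Classify one user from the goods_id values of all their orders."""
    ids = set(goods_ids)
    has_l1 = bool(ids & L1_GOODS)
    has_l2 = bool(ids & L2_GOODS)
    has_bundle = bool(ids & L1_L2_GOODS)
    if has_bundle or (has_l1 and has_l2):
        return 'L1+L2'
    if has_l1:
        return '仅L1'
    if has_l2:
        return '仅L2'
    return '其他'  # legacy vouchers and anything unmapped


def fen_to_yuan(pay_amount_int):
    """Convert an amount stored in fen (pay_amount_int) to yuan."""
    return pay_amount_int / 100
```

A user who bought goods 60 and 32 in separate orders ends up in 「L1+L2」, the same bucket as a user who bought the bundle (goods_id=61).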


@@ -1,6 +1,6 @@
{
"version": 1,
- "updatedAt": "2026-05-13T08:20:55.037Z",
+ "updatedAt": "2026-05-14T06:41:55.506Z",
"entries": {
"memory:memory/2026-05-06.md:1:20": {
"key": "memory:memory/2026-05-06.md:1:20",
@@ -128,6 +128,68 @@
"3月28.5",
"4月38.3"
]
},
"memory:memory/2026-05-09.md:1:17": {
"key": "memory:memory/2026-05-09.md:1:17",
"path": "memory/2026-05-09.md",
"startLine": 1,
"endLine": 17,
"source": "memory",
"snippet": "# 2026-05-09 工作日志 ## 王虹茗 - 销售线索用户分析 - **用户:** 王虹茗user_id: af61e4gc - **需求:** 用 `lead_user_analysis.py` 脚本处理线索用户 Excel659条2026年3月销售姜小龙/Bob/Tom/吴迪) - **权限处理:** 王虹茗不在 USER.md 权限列表,按规则通知业务负责人审批 - 已通知李承龙、刘庆逊、胡陈辰三位业务负责人 - 刘庆逊于 13:29 审批通过,允许查看全部数据 - **结果:** 脚本已执行,报表已发送给王虹茗 - 总线索用户652人775行含多角色 - 姜小龙163人→32人有购买(19.6%)退费5人 - Bob202人→3人有购买(1.5%)退费1人 - Tom171人→5人有购买(2.9%)退费2人 - 吴迪116人→19人有购买(16.4%)退费2人 - 输出文件:`output/销售线索_用户分析.xlsx`",
"recallCount": 1,
"dailyCount": 0,
"groundedCount": 0,
"totalScore": 1,
"maxScore": 1,
"firstRecalledAt": "2026-05-14T06:31:19.437Z",
"lastRecalledAt": "2026-05-14T06:31:19.437Z",
"queryHashes": [
"49e79af44bc3"
],
"recallDays": [
"2026-05-14"
],
"conceptTags": [
"user-id",
"lead-user-analysis.py",
"姜小龙/bob/tom/吴迪",
"user.md",
"19.6",
"1.5",
"2.9",
"16.4"
]
},
"memory:memory/2026-05-14.md:1:19": {
"key": "memory:memory/2026-05-14.md:1:19",
"path": "memory/2026-05-14.md",
"startLine": 1,
"endLine": 19,
"source": "memory",
"snippet": "# 2026-05-14 工作日志 ## 李承龙 - 付费用户 L1/L2 区分口径 - **需求:** 区分付费用户属于 L1 还是 L2根据用户购买的商品做区分 - **分析过程:** - 排查了 `bi_vala_order` 表的 goods_name / goods_id 字段 - 发现部分商品名称不含 level 关键词(年包、单季度包等),但 goods_id 可唯一映射 - goods_id=31 历史上从「瓦拉英语level2」更名为「瓦拉英语年包」 - goods_id=32 从「瓦拉英语level2·单季」更名为「瓦拉英语单季度包」 - 用 `bi_user_course_detail` 验证了映射关系准确 - **最终方案:** 按 goods_id 映射 - L1: goods_id IN (57, 60, 63) - L2: goods_id IN (31, 32, 33, 54) - L1+L2: goods_id = 61 - 旧版通行券(<30单: goods_id IN (4,5,6,10,13,14,17,20,25,29,30,35,36,37,38),归入「其他」 - **用户分类:** 汇总用户所有订单的 goods_id按购买商品组合判断 - **已更新:** MEMORY.md",
"recallCount": 1,
"dailyCount": 0,
"groundedCount": 0,
"totalScore": 1,
"maxScore": 1,
"firstRecalledAt": "2026-05-14T06:41:55.506Z",
"lastRecalledAt": "2026-05-14T06:41:55.506Z",
"queryHashes": [
"f6fcef2ff061"
],
"recallDays": [
"2026-05-14"
],
"conceptTags": [
"l1/l2",
"bi-vala-order",
"goods-name",
"goods-id",
"bi-user-course-detail",
"memory.md",
"工作",
"日志"
]
}
}
}

memory/2026-05-14.md (new file, 39 lines)

@@ -0,0 +1,39 @@
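# 2026-05-14 Work log
## 李承龙 - Paid-user L1/L2 classification rule
- **Requirement:** tell whether a paid user belongs to L1 or L2, based on the goods the user purchased
- **Analysis:**
  - Inspected the goods_name / goods_id fields of table `bi_vala_order`
  - Found that some goods names contain no "level" keyword (年包, 单季度包, etc.), but goods_id still maps uniquely
  - goods_id=31 was renamed from 「瓦拉英语level2」 to 「瓦拉英语年包」
  - goods_id=32 was renamed from 「瓦拉英语level2·单季」 to 「瓦拉英语单季度包」
  - Verified the mapping against `bi_user_course_detail`
- **Final approach:** map by goods_id
  - L1: goods_id IN (57, 60, 63)
  - L2: goods_id IN (31, 32, 33, 54)
  - L1+L2: goods_id = 61
  - Legacy pass vouchers (<30 orders): goods_id IN (4,5,6,10,13,14,17,20,25,29,30,35,36,37,38), filed under "other"
- **User classification:** collect the goods_id of all of a user's orders and classify by the combination purchased
- **Updated:** MEMORY.md
## Course-consumption metrics v2 (U0 prologue excluded)
- **L1 U0**: chapter_id IN (343,344,345,346,348)
- **L2 U0**: chapter_id IN (55,56,57,58,59)
- **Results after exclusion (as of 5/10):**
  - 仅L1 (L1 only): 192 paid / 132 with consumption / 60 without (31%) / 2.53 lessons per paid user / 3.67 per consuming user
  - 仅L2 (L2 only): 1370 paid / 461 with consumption / 909 without (66%) / 1.18 lessons per paid user / 3.49 per consuming user
  - L1+L2: 1207 paid / 660 with consumption / 547 without (45%) / 2.37 lessons per paid user / 4.34 per consuming user
- **4 standalone charts generated in output/**
## 李承龙 - Course-consumption rule adjustment (L1/L2 regrouped by paid cohort)
- **[Confirmed by 李承龙]** L1 paid users = 仅L1 + L1+L2; L2 paid users = 仅L2 + L1+L2; L1+L2 users are counted in both charts
- **Regenerated Excel v3** (`output/course_consumption_by_level_v3.xlsx`): 4 sheets (overview / weekly detail / L1 chart / L2 chart)
- **Regenerated 4 standalone PNG charts** (`output/L1_all_users_stack.png`, `L1_all_avg_trend.png`, `L2_all_users_stack.png`, `L2_all_avg_trend.png`)
- **Final data (as of the last week, U0 prologue excluded):**
  - L1 paid cohort: 1,399 users | 738 with consumption | 661 without (43%) | 1.97 lessons per paid user | 3.73 per consuming user
  - L2 paid cohort: 2,577 users | 1,126 with consumption | 1,451 without (56%) | 1.51 lessons per paid user | 3.46 per consuming user
  - Total (deduplicated): 2,769 users
- **Key finding:** once the 1,207 L1+L2 users are added in, the L1 no-consumption rate rises from 31% to 43% and the L2 rate falls from 66% to 56%
- **Scripts:** `scripts/course_excel_v3.py`, `scripts/generate_charts_v3.py`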
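
A small arithmetic check of the regrouping, using only the v2 counts recorded earlier in this log. The dictionary layout and the function name `pooled` are illustrative. Pooling each exclusive category with the L1+L2 category reproduces the paid totals (1,399 and 2,577) and the 43% / 56% no-consumption rates quoted in the key finding. The with/without-consumption counts logged for v3 come from the script run itself and are not re-derived here.

```python
# (paid users, users without consumption) per exclusive category, from the v2 results
v2 = {
    '仅L1': (192, 60),
    '仅L2': (1370, 909),
    'L1+L2': (1207, 547),
}


def pooled(category):
    """Pool an exclusive category with the L1+L2 category.

    Returns (paid users, no-consumption rate in whole percent).
    """
    paid = v2[category][0] + v2['L1+L2'][0]
    no_cons = v2[category][1] + v2['L1+L2'][1]
    return paid, round(no_cons / paid * 100)


total_paid = sum(paid for paid, _ in v2.values())  # deduplicated total

print(pooled('仅L1'), pooled('仅L2'), total_paid)
```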

BIN output/L1_all_avg_trend.png (new binary file, not shown; 201 KiB)
BIN (file name not captured; binary file, not shown; 80 KiB)
BIN output/L1_avg_trend.png (new binary file, not shown; 124 KiB)
BIN output/L1_avg_trend_v4.png (new binary file, not shown; 110 KiB)
BIN output/L1_users_stack.png (new binary file, not shown; 70 KiB)
BIN (file name not captured; binary file, not shown; 82 KiB)
BIN output/L2_all_avg_trend.png (new binary file, not shown; 174 KiB)
BIN (file name not captured; binary file, not shown; 81 KiB)
BIN output/L2_avg_trend.png (new binary file, not shown; 168 KiB)
BIN output/L2_avg_trend_v4.png (new binary file, not shown; 170 KiB)
BIN output/L2_users_stack.png (new binary file, not shown; 82 KiB)
BIN (file name not captured; binary file, not shown; 83 KiB)

File diff suppressed because one or more lines are too long

scripts/charts_v4.py (new file, 87 lines)

@@ -0,0 +1,87 @@
#!/usr/bin/env python3
"""Charts v4: the L1 cohort counts only L1 lessons, the L2 cohort only L2 lessons."""
import json
from datetime import date, timedelta

import matplotlib
matplotlib.use('Agg')  # headless backend: render straight to files
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.font_manager as fm
import numpy as np

# Register a CJK font so the Chinese labels and titles render correctly
FONT_PATH = '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc'
fm.fontManager.addfont(FONT_PATH)
plt.rcParams['font.family'] = fm.FontProperties(fname=FONT_PATH).get_name()
plt.rcParams['axes.unicode_minus'] = False

with open('/root/.openclaw/workspace/output/course_data_v4.json') as f:
    data = json.load(f)
results = data['results']
out = '/root/.openclaw/workspace/output'

configs = {
    'L1': {'prefix': 'L1', 'color': '#4A90D9', 'light': '#A8CFF1', 'label': 'L1'},
    'L2': {'prefix': 'L2', 'color': '#E85D47', 'light': '#F4A9A0', 'label': 'L2'},
}

for key, cfg in configs.items():
    pfx, color, light, label = cfg['prefix'], cfg['color'], cfg['light'], cfg['label']

    # Skip the leading weeks in which this cohort has no paid users yet
    first = next(i for i, r in enumerate(results) if r[f'{pfx}_paid'] > 0)
    data_sub = results[first:]
    dates = [date.fromisoformat(r['ws']) for r in data_sub]
    xs = [d + timedelta(days=3) for d in dates]  # plot each week at its midpoint
    paid = [r[f'{pfx}_paid'] for r in data_sub]
    cons_users = [r[f'{pfx}_cons_users'] for r in data_sub]
    no_cons = [r[f'{pfx}_no_cons'] for r in data_sub]
    avg_all = [r[f'{pfx}_avg_all'] for r in data_sub]
    avg_cons = [r[f'{pfx}_avg_cons'] for r in data_sub]

    # Chart 1: stacked bars (users with vs. without consumption)
    fig, ax = plt.subplots(figsize=(18, 8))
    x_idx = np.arange(len(xs))
    ax.bar(x_idx, cons_users, 0.65, color=light, label='有课消用户', zorder=3)
    ax.bar(x_idx, no_cons, 0.65, bottom=cons_users, color='#D0D0D0', label='无课消用户', zorder=3)
    step = max(1, len(data_sub)//10)  # label only every `step`-th week
    for i in range(0, len(data_sub), step):
        ax.annotate(str(paid[i]), (i, paid[i]), textcoords='offset points', xytext=(0, 5),
                    fontsize=7.5, ha='center', color='#333333', fontweight='bold')
    ax.set_xticks(x_idx[::step])
    ax.set_xticklabels([dates[i].strftime('%m/%d') for i in range(0, len(data_sub), step)], fontsize=8.5, rotation=45)
    ax.set_ylabel('用户数', fontsize=13)
    ax.set_title(f'{label}付费用户周课消分布(只看{label}课程,剔除U0)', fontsize=16, fontweight='bold')
    ax.legend(fontsize=12, loc='upper left')
    ax.grid(axis='y', alpha=0.3, zorder=0)
    ax.set_xlim(-0.5, len(x_idx)-0.5)
    no_rate = no_cons[-1]/paid[-1]*100 if paid[-1] else 0
    ax.text(0.97, 0.95, f'付费{paid[-1]}人 | 无课消率{no_rate:.0f}%',
            transform=ax.transAxes, fontsize=11, ha='right', va='top', color='#666666', fontstyle='italic')
    plt.tight_layout()
    plt.savefig(f'{out}/{pfx}_users_stack_v4.png', dpi=150, bbox_inches='tight', facecolor='white')
    plt.close()
    print(f'{pfx}_users_stack_v4.png')

    # Chart 2: line chart of average lessons consumed per user
    fig, ax = plt.subplots(figsize=(18, 8))
    ax.plot(xs, avg_all, 'o-', color='#999999', linewidth=2.2, markersize=5, label='人均课消(全部付费用户)', markerfacecolor='white')
    ax.plot(xs, avg_cons, 's-', color=color, linewidth=2.8, markersize=5, label='人均课消(有课消用户)', markerfacecolor='white')
    ax.fill_between(xs, avg_all, avg_cons, alpha=0.08, color=color)
    for i in range(0, len(data_sub), max(1, len(data_sub)//8)):
        ax.annotate(f'{avg_all[i]:.1f}', (xs[i], avg_all[i]), textcoords='offset points',
                    xytext=(0,-15), fontsize=7.5, color='#999999', ha='center')
        ax.annotate(f'{avg_cons[i]:.1f}', (xs[i], avg_cons[i]), textcoords='offset points',
                    xytext=(0,7), fontsize=7.5, color=color, ha='center', fontweight='bold')
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, fontsize=9)
    ax.set_ylabel('课消数(节/周)', fontsize=13)
    ax.set_title(f'{label}付费用户周人均课消趋势(只看{label}课程,剔除U0)', fontsize=16, fontweight='bold')
    ax.legend(fontsize=12, loc='upper left')
    ax.grid(True, alpha=0.3)
    ax.set_xlim(date(2025,8,30), date(2026,5,12))
    plt.tight_layout()
    plt.savefig(f'{out}/{pfx}_avg_trend_v4.png', dpi=150, bbox_inches='tight', facecolor='white')
    plt.close()
    print(f'{pfx}_avg_trend_v4.png')

print('\n✅ 4张v4图表已生成')
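
Two small steps in the chart script are easy to misread: dropping the leading weeks in which a cohort has no paid users, and labelling only every `step`-th week. The sketch below isolates both without matplotlib. The function names `trim_leading_empty` and `label_indices` are hypothetical, and unlike the script the trimming here returns an empty list instead of raising when no week has paid users.

```python
def trim_leading_empty(rows, key):
    """Drop the leading rows whose `key` value is zero."""
    first = next((i for i, r in enumerate(rows) if r[key] > 0), len(rows))
    return rows[first:]


def label_indices(n, divisor=10):
    """Indices of the weeks that get a label: every (n // divisor)-th one."""
    step = max(1, n // divisor)
    return list(range(0, n, step))


rows = [{'L1_paid': 0}, {'L1_paid': 0}, {'L1_paid': 3}, {'L1_paid': 5}]
kept = trim_leading_empty(rows, 'L1_paid')
ticks = label_indices(36)
```

Because the step is an integer division, the number of labels is only roughly `divisor`: 36 weeks give a step of 3 and therefore 12 labels.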


@@ -0,0 +1,165 @@
#!/usr/bin/env python3
"""
v4: consumption of the L1 paid cohort counts only L1 lessons;
consumption of the L2 paid cohort counts only L2 lessons.
"""
import json
import os
from collections import defaultdict
from datetime import datetime, timedelta, date

import psycopg2

conn = psycopg2.connect(
    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
    port=28591, user="ai_member",
    # Hardcoded credential removed; VALA_BI_PASSWORD is an assumed variable name
    password=os.environ["VALA_BI_PASSWORD"], dbname="vala_bi"
)
cur = conn.cursor()

# Load the L1/L2 chapter ids, then drop the U0 prologue chapters
cur.execute("SELECT id FROM bi_level_unit_lesson WHERE course_level='L1'")
l1_chapters = set(r[0] for r in cur.fetchall())
cur.execute("SELECT id FROM bi_level_unit_lesson WHERE course_level='L2'")
l2_chapters = set(r[0] for r in cur.fetchall())
u0 = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348}
l1_chapters -= u0
l2_chapters -= u0
print(f"L1章节: {len(l1_chapters)} | L2章节: {len(l2_chapters)}")

# Weekly windows ending on Sunday; overall_end is exclusive
overall_start = date(2025, 9, 1)
overall_end = date(2026, 5, 11)
weeks = []
d = overall_start
while d < overall_end:
    ws = d
    we = d + timedelta(days=6 - d.weekday())
    if we >= overall_end:
        we = overall_end - timedelta(days=1)
    weeks.append((ws, we))
    d = we + timedelta(days=1)

print("分类付费用户...")
cur.execute("""
    SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date,
        CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
             WHEN o.goods_id = 61 THEN 'L1+L2'
             WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
             ELSE '其他' END as level_type
    FROM bi_vala_order o
    INNER JOIN bi_vala_app_account a ON o.account_id = a.id
    WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
""")
orders = cur.fetchall()
cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3")
refund_trades = set(r[0] for r in cur.fetchall())

user_levels = defaultdict(set)    # account_id -> set of level types bought
user_orders = defaultdict(list)   # account_id -> [(pay_date, is_refunded), ...]
for aid, trade_no, order_status, pay_date, lt in orders:
    is_refunded = (order_status == 4 and trade_no in refund_trades)
    user_levels[aid].add(lt)
    user_orders[aid].append((pay_date.date(), is_refunded))


def is_paid(aid, as_of):
    """True if the account has at least one non-refunded order paid on or before as_of."""
    return sum(1 for pd, ref in user_orders[aid] if pd <= as_of and not ref) > 0


# L1+L2 buyers belong to both pools
l1_pool = {aid for aid, lv in user_levels.items() if 'L1' in lv or 'L1+L2' in lv}
l2_pool = {aid for aid, lv in user_levels.items() if 'L2' in lv or 'L1+L2' in lv}
all_pool = l1_pool | l2_pool
print(f"L1池: {len(l1_pool)}, L2池: {len(l2_pool)}, 合计: {len(all_pool)}")

print("查询课消...")
cons_map = {}  # (user_id, chapter_id) -> earliest completion date
for ti in range(8):
    tbl = f"bi_user_chapter_play_record_{ti}"
    cur.execute(f"""SELECT user_id, chapter_id, updated_at FROM {tbl}
        WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11'""")
    for uid, cid, ua in cur.fetchall():
        if cid in u0:
            continue
        # Keep only L1 or L2 lessons
        if cid not in l1_chapters and cid not in l2_chapters:
            continue
        key = (uid, cid)
        d = ua.date() if hasattr(ua, 'date') else datetime.strptime(str(ua)[:10], '%Y-%m-%d').date()
        if key not in cons_map or d < cons_map[key]:
            cons_map[key] = d

print("角色映射...")
# Map character ids (user_id in the play records) to account ids, in batches of 500
all_uids = list(set(k[0] for k in cons_map))
char2acct = {}
for i in range(0, len(all_uids), 500):
    batch = all_uids[i:i+500]
    ph = ','.join(['%s'] * len(batch))
    cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch)
    for cid, aid in cur.fetchall():
        char2acct[cid] = aid

print("按周汇总...")
results = []
for ws, we in weeks:
    l1_paid = {aid for aid in l1_pool if is_paid(aid, we)}
    l2_paid = {aid for aid in l2_pool if is_paid(aid, we)}
    t_paid = {aid for aid in all_pool if is_paid(aid, we)}
    l1_cons, l1_cu = 0, set()
    l2_cons, l2_cu = 0, set()
    t_cons, t_cu = 0, set()
    for (uid, ch_id), cons_date in cons_map.items():
        if not (ws <= cons_date <= we):
            continue
        aid = char2acct.get(uid)
        if not aid:
            continue
        # L1 paid cohort and an L1 lesson
        if aid in l1_paid and ch_id in l1_chapters:
            l1_cons += 1
            l1_cu.add(aid)
        # L2 paid cohort and an L2 lesson
        if aid in l2_paid and ch_id in l2_chapters:
            l2_cons += 1
            l2_cu.add(aid)
        # Total: consumption by paid users on lessons of their own level
        if aid in t_paid:
            if (aid in l1_paid and ch_id in l1_chapters) or (aid in l2_paid and ch_id in l2_chapters):
                t_cons += 1
                t_cu.add(aid)
    results.append({
        'ws': ws, 'we': we,
        'L1_paid': len(l1_paid), 'L1_cons': l1_cons, 'L1_cons_users': len(l1_cu),
        'L1_no_cons': len(l1_paid) - len(l1_cu),
        'L1_avg_all': round(l1_cons / len(l1_paid), 2) if l1_paid else 0,
        'L1_avg_cons': round(l1_cons / len(l1_cu), 2) if l1_cu else 0,
        'L2_paid': len(l2_paid), 'L2_cons': l2_cons, 'L2_cons_users': len(l2_cu),
        'L2_no_cons': len(l2_paid) - len(l2_cu),
        'L2_avg_all': round(l2_cons / len(l2_paid), 2) if l2_paid else 0,
        'L2_avg_cons': round(l2_cons / len(l2_cu), 2) if l2_cu else 0,
        'total_paid': len(t_paid), 'total_cons': t_cons, 'total_cons_users': len(t_cu),
        'total_no_cons': len(t_paid) - len(t_cu),
        'total_avg_all': round(t_cons / len(t_paid), 2) if t_paid else 0,
        'total_avg_cons': round(t_cons / len(t_cu), 2) if t_cu else 0,
    })
    r = results[-1]
    if (len(results) - 1) % 8 == 0 or len(results) == len(weeks):
        print(f" W{len(results):2d} {ws}~{we} | L1:{r['L1_paid']}有消{r['L1_cons_users']} | L2:{r['L2_paid']}有消{r['L2_cons_users']}")

cur.close()
conn.close()

# Print the final (last-week) figures
last = results[-1]
print(f"\n=== 最终数据 (v4: L1只看L1课程, L2只看L2课程) ===")
print(f"L1付费群: {last['L1_paid']}人 | 有消{last['L1_cons_users']} | 无消{last['L1_no_cons']}({last['L1_no_cons']/last['L1_paid']*100:.0f}%) | 人均{last['L1_avg_all']} | 有消人均{last['L1_avg_cons']}")
print(f"L2付费群: {last['L2_paid']}人 | 有消{last['L2_cons_users']} | 无消{last['L2_no_cons']}({last['L2_no_cons']/last['L2_paid']*100:.0f}%) | 人均{last['L2_avg_all']} | 有消人均{last['L2_avg_cons']}")
print(f"合计(去重): {last['total_paid']}人 | 有消{last['total_cons_users']} | 无消{last['total_no_cons']}({last['total_no_cons']/last['total_paid']*100:.0f}%)")

# Save the data as JSON for the chart script; dates become ISO strings
out = '/root/.openclaw/workspace/output/course_data_v4.json'
serializable = []
for r in results:
    d = {}
    for k, v in r.items():
        d[k] = v.isoformat() if isinstance(v, date) else v
    serializable.append(d)
with open(out, 'w') as f:
    json.dump({'results': serializable, 'L1_chapters': list(l1_chapters), 'L2_chapters': list(l2_chapters)}, f, ensure_ascii=False)
print(f"\n数据已保存: {out}")
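
The week-splitting loop used by the scripts in this commit can be isolated as a pure function, which makes its edge cases easy to check. This is a sketch under the same conventions (windows end on Sunday, the end date is exclusive); the function name `build_weeks` is hypothetical.

```python
from datetime import date, timedelta


def build_weeks(start, end):
    """Split [start, end) into windows that each end on a Sunday.

    The first window may be shorter if `start` is not a Monday, and the
    last one is clipped to the day before `end`.
    """
    weeks = []
    d = start
    while d < end:
        week_end = d + timedelta(days=6 - d.weekday())  # Monday=0 ... Sunday=6
        week_end = min(week_end, end - timedelta(days=1))
        weeks.append((d, week_end))
        d = week_end + timedelta(days=1)
    return weeks


weeks = build_weeks(date(2025, 9, 1), date(2026, 5, 11))
```

With the dates used in the scripts, both boundaries fall on a Monday, so the range splits into 36 full weeks and the last window ends on 2026-05-10.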


@@ -1,167 +1,191 @@
 #!/usr/bin/env python3
 """
-Course-consumption metrics: L1/L2 level statistics
+Weekly course-consumption metrics, 2025-09-01 ~ 2026-05-10, by L1 / L2 / L1+L2
 """
 import os
 import psycopg2
 from collections import defaultdict
-from datetime import date, timedelta, datetime
+from datetime import datetime, timedelta, date
 conn = psycopg2.connect(
     host="bj-postgres-16pob4sg.sql.tencentcdb.com",
-    port=28591,
-    user="ai_member",
-    password=os.environ["VALA_BI_PASSWORD"],  # credential removed; variable name is an assumption
-    dbname="vala_bi"
+    port=28591, user="ai_member",
+    # Hardcoded credential removed; VALA_BI_PASSWORD is an assumed variable name
+    password=os.environ["VALA_BI_PASSWORD"], dbname="vala_bi"
 )
 cur = conn.cursor()
+# ===== Time parameters =====
 overall_start = date(2025, 9, 1)
 overall_end = date(2026, 5, 11)
-# Build the list of weeks
+# Build the list of weeks (Monday to Sunday)
 weeks = []
 d = overall_start
 while d < overall_end:
     ws = d
-    we = d + timedelta(days=6 - d.weekday())
+    days_to_sunday = 6 - d.weekday()
+    we = d + timedelta(days=days_to_sunday)
     if we >= overall_end:
         we = overall_end - timedelta(days=1)
     weeks.append((ws, we))
     d = we + timedelta(days=1)
-# ===== Load the L1/L2 chapter_id sets =====
-u0_ids = {343, 344, 345, 346, 348, 55, 56, 57, 58, 59}
-cur.execute("SELECT DISTINCT id, course_level FROM bi_level_unit_lesson WHERE course_level IN ('L1','L2')")
-l1_chapters = set()
-l2_chapters = set()
-for cid, lv in cur.fetchall():
-    if cid in u0_ids:
-        continue
-    if lv == 'L1':
-        l1_chapters.add(cid)
-    elif lv == 'L2':
-        l2_chapters.add(cid)
-print(f"L1 chapters: {len(l1_chapters)}, L2 chapters: {len(l2_chapters)}")
-# ===== Step 1: paid users =====
-print("Step 1: 查找付费用户...")
+print(f"统计区间: {overall_start} ~ {overall_end - timedelta(days=1)}, 共 {len(weeks)}")
+# ===== Step 1: classify users as L1/L2 and determine paid status =====
+print("\nStep 1: 分类付费用户...")
 cur.execute("""
-    SELECT DISTINCT o.account_id
+    SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date,
+        CASE
+            WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
+            WHEN o.goods_id = 61 THEN 'L1+L2'
+            WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
+            ELSE '其他'
+        END as level_type
     FROM bi_vala_order o
     INNER JOIN bi_vala_app_account a ON o.account_id = a.id
-    WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
-    GROUP BY o.account_id
-    HAVING COUNT(CASE WHEN o.order_status != 4
-        OR (o.order_status = 4 AND o.trade_no NOT IN (
-            SELECT trade_no FROM bi_refund_order WHERE status=3
-        )) THEN 1 END) > 0
-""")
-paid_account_ids = [row[0] for row in cur.fetchall()]
-print(f" 付费用户: {len(paid_account_ids)}")
-# Order details, used to decide week by week who counts as a paid user
-cur.execute("""
-    SELECT o.account_id, o.trade_no, o.out_trade_no, o.pay_success_date, o.order_status
-    FROM bi_vala_order o
-    INNER JOIN bi_vala_app_account a ON o.account_id = a.id
-    WHERE a.status=1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
-    AND o.pay_success_date >= '2025-01-01'
+    WHERE a.status = 1 AND a.deleted_at IS NULL
+    AND o.pay_success_date IS NOT NULL
 """)
 orders = cur.fetchall()
-cur.execute("SELECT trade_no, status FROM bi_refund_order WHERE status=3")
-refund_set = {r[0] for r in cur.fetchall() if r[0]}
-account_orders = defaultdict(list)
-for aid, tn, otn, psd, os in orders:
-    is_ref = os == 4 and tn in refund_set
-    account_orders[aid].append((psd, is_ref))
-def is_paid(aid, as_of):
-    return sum(1 for pd, ref in account_orders.get(aid, []) if pd.date() <= as_of and not ref) > 0
-# ===== Step 2: consumption (split by L1/L2) =====
-print("Step 2: 查询课消(分L1/L2)...")
-l1_consumption = {}  # (user_id, chapter_id) -> earliest date
-l2_consumption = {}
-for t in range(8):
-    tbl = f"bi_user_chapter_play_record_{t}"
+print(f" 订单数: {len(orders)}")
+cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3")
+refund_trades = set(r[0] for r in cur.fetchall())
+# {account_id: {'levels': set, 'orders': [(pay_date, is_refunded, level), ...]}}
+user_data = defaultdict(lambda: {'levels': set(), 'orders': []})
+for aid, trade_no, order_status, pay_date, lt in orders:
+    is_refunded = (order_status == 4 and trade_no in refund_trades)
+    user_data[aid]['levels'].add(lt)
+    user_data[aid]['orders'].append((pay_date.date(), is_refunded, lt))
+# Determine each user's L1/L2 category
+def classify_user(levels):
+    has_l1 = 'L1' in levels
+    has_l2 = 'L2' in levels
+    has_l1l2 = 'L1+L2' in levels
+    if has_l1l2 or (has_l1 and has_l2):
+        return 'L1+L2'
+    elif has_l1:
+        return '仅L1'
+    elif has_l2:
+        return '仅L2'
+    return '其他'
+for aid in user_data:
+    user_data[aid]['category'] = classify_user(user_data[aid]['levels'])
+# Count users per category
+cats = defaultdict(int)
+for aid, d in user_data.items():
+    cats[d['category']] += 1
+print(f" 仅L1: {cats['仅L1']}, 仅L2: {cats['仅L2']}, L1+L2: {cats['L1+L2']}, 其他: {cats['其他']}")
+# Whether a user counts as paid as of a given date
+def is_paid_as_of(aid, as_of_date):
+    d = user_data[aid]
+    unpaid = sum(1 for pd, ref, lt in d['orders'] if pd <= as_of_date and not ref)
+    return unpaid > 0
+# ===== Step 2: consumption records =====
+print("\nStep 2: 查询课消...")
+consumption_map = {}  # (user_id, chapter_id) -> earliest date
+for table_idx in range(8):
+    tbl = f"bi_user_chapter_play_record_{table_idx}"
     cur.execute(f"""
-        SELECT user_id, chapter_id, updated_at FROM {tbl}
-        WHERE play_status=1 AND updated_at>='2025-09-01' AND updated_at<'2026-05-11'
+        SELECT user_id, chapter_id, updated_at
+        FROM {tbl}
+        WHERE play_status = 1
+        AND updated_at >= '2025-09-01'
+        AND updated_at < '2026-05-11'
     """)
-    for uid, cid, upd in cur.fetchall():
-        if cid in l1_chapters:
-            k, m = (uid, cid), l1_consumption
-        elif cid in l2_chapters:
-            k, m = (uid, cid), l2_consumption
-        else:
-            continue
-        d = upd.date() if hasattr(upd, 'date') else upd
-        if k not in m or d < m[k]:
-            m[k] = d
-print(f" L1 课消(去重): {len(l1_consumption)}")
-print(f" L2 课消(去重): {len(l2_consumption)}")
-# ===== Step 3: character mapping =====
-print("Step 3: 关联角色...")
-all_uids = set(k[0] for k in l1_consumption) | set(k[0] for k in l2_consumption)
-char_to_account = {}
-for i in range(0, len(all_uids), 500):
-    batch = list(all_uids)[i:i+500]
+    cnt = 0
+    for user_id, chapter_id, updated_at in cur.fetchall():
+        key = (user_id, chapter_id)
+        d = updated_at.date() if hasattr(updated_at, 'date') else datetime.strptime(str(updated_at)[:10], '%Y-%m-%d').date()
+        if key not in consumption_map or d < consumption_map[key]:
+            consumption_map[key] = d
+        cnt += 1
+    print(f" {tbl}: {cnt}")
+print(f" 去重后: {len(consumption_map)}")
+# ===== Step 3: character -> account =====
+print("\nStep 3: 角色映射...")
+all_uids = list(set(k[0] for k in consumption_map))
+char2acct = {}
+bs = 500
+for i in range(0, len(all_uids), bs):
+    batch = all_uids[i:i+bs]
     ph = ','.join(['%s'] * len(batch))
     cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch)
     for cid, aid in cur.fetchall():
-        char_to_account[cid] = aid
-# ===== Step 4: weekly aggregation =====
-print("Step 4: 按周汇总...")
-def weekly_stats(consumption_map):
-    """Return (consumption count, number of consuming users) for each week."""
-    results = []
-    for ws, we in weeks:
-        cons = 0
-        users = set()
-        for (uid, ch_id), d in consumption_map.items():
-            if ws <= d <= we:
-                cons += 1
-                aid = char_to_account.get(uid)
-                if aid:
-                    users.add(aid)
-        results.append((ws, we, cons, len(users)))
-    return results
-l1_stats = weekly_stats(l1_consumption)
-l2_stats = weekly_stats(l2_consumption)
-# Combine with the paid users
-results = []
-for i, (ws, we) in enumerate(weeks):
-    paid = set(aid for aid in account_orders if is_paid(aid, we))
-    n_paid = len(paid)
-    l1_cons, l1_users = l1_stats[i][2], l1_stats[i][3]
-    l2_cons, l2_users = l2_stats[i][2], l2_stats[i][3]
-    l1_avg = l1_cons / n_paid if n_paid else 0
-    l1_act_avg = l1_cons / l1_users if l1_users else 0
-    l2_avg = l2_cons / n_paid if n_paid else 0
-    l2_act_avg = l2_cons / l2_users if l2_users else 0
-    results.append({
-        'week': f"{ws.strftime('%m/%d')}-{we.strftime('%m/%d')}",
-        'ws': ws, 'we': we, 'paid': n_paid,
-        'l1_cons': l1_cons, 'l1_users': l1_users, 'l1_avg': l1_avg, 'l1_act': l1_act_avg,
-        'l2_cons': l2_cons, 'l2_users': l2_users, 'l2_avg': l2_avg, 'l2_act': l2_act_avg,
-    })
-# Output
-print(f"\n{'':<16} {'付费':>6} {'L1课消':>7} {'L1有消':>7} {'L1人均':>7} {'L1有消人均':>9} {'L2课消':>7} {'L2有消':>7} {'L2人均':>7} {'L2有消人均':>9}")
+        char2acct[cid] = aid
+print(f" 映射: {len(char2acct)}")
+# ===== Step 4: aggregate by week and by category =====
+print("\nStep 4: 按周汇总...\n")
+results = []
+for ws, we in weeks:
+    # Paid users as of `we`, by category
+    paid_by_cat = defaultdict(set)
+    for aid in user_data:
+        if is_paid_as_of(aid, we):
+            cat = user_data[aid]['category']
+            paid_by_cat[cat].add(aid)
+    # Consumption during this week (paid users only)
+    cons_by_cat = defaultdict(int)
+    cons_users_by_cat = defaultdict(set)
+    for (uid, ch_id), cons_date in consumption_map.items():
+        if ws <= cons_date <= we:
+            aid = char2acct.get(uid)
+            if aid:
+                cat = user_data.get(aid, {}).get('category', '其他')
+                if aid in paid_by_cat.get(cat, set()):
+                    cons_by_cat[cat] += 1
+                    cons_users_by_cat[cat].add(aid)
+    week_label = f"{ws.strftime('%m/%d')}-{we.strftime('%m/%d')}"
+    row = {'week': week_label, 'ws': ws, 'we': we}
+    for cat in ['仅L1', '仅L2', 'L1+L2', '其他', '合计']:
+        if cat == '合计':
+            n_paid = sum(len(v) for v in paid_by_cat.values())
+            n_cons = sum(cons_by_cat.values())
+            n_cons_users = len(set.union(*cons_users_by_cat.values())) if cons_users_by_cat else 0
+        else:
+            n_paid = len(paid_by_cat.get(cat, set()))
+            n_cons = cons_by_cat.get(cat, 0)
+            n_cons_users = len(cons_users_by_cat.get(cat, set()))
+        avg_all = n_cons / n_paid if n_paid > 0 else 0
+        avg_cons = n_cons / n_cons_users if n_cons_users > 0 else 0
+        row[f'{cat}_paid'] = n_paid
+        row[f'{cat}_cons'] = n_cons
+        row[f'{cat}_users'] = n_cons_users
+        row[f'{cat}_avg_all'] = avg_all
+        row[f'{cat}_avg_cons'] = avg_cons
+    results.append(row)
+    print(f" {week_label} | 合计:付费{row['合计_paid']} 课消{row['合计_cons']} "
+          f"人均{row['合计_avg_all']:.2f} | "
+          f"L1:{row['仅L1_avg_all']:.2f} L2:{row['仅L2_avg_all']:.2f} L1+L2:{row['L1+L2_avg_all']:.2f}")
+# ===== Print the full table =====
+print("\n" + "="*120)
+header = f"{'':<12} {'合计付费':>6} {'合计课消':>7} {'合计人均':>7} | {'L1付费':>6} {'L1课消':>6} {'L1人均':>6} {'L1有消人均':>7} | {'L2付费':>6} {'L2课消':>6} {'L2人均':>6} {'L2有消人均':>7} | {'L1L2付费':>7} {'L1L2课消':>7} {'L1L2人均':>7} {'L1L2有消人均':>8}"
+print(header)
+print("-"*120)
 for r in results:
-    print(f"{r['week']:<16} {r['paid']:>6} {r['l1_cons']:>7} {r['l1_users']:>7} {r['l1_avg']:>7.2f} {r['l1_act']:>9.2f} {r['l2_cons']:>7} {r['l2_users']:>7} {r['l2_avg']:>7.2f} {r['l2_act']:>9.2f}")
+    print(f"{r['week']:<12} {r['合计_paid']:>6} {r['合计_cons']:>7} {r['合计_avg_all']:>7.2f} | "
+          f"{r['仅L1_paid']:>6} {r['仅L1_cons']:>6} {r['仅L1_avg_all']:>6.2f} {r['仅L1_avg_cons']:>7.2f} | "
+          f"{r['仅L2_paid']:>6} {r['仅L2_cons']:>6} {r['仅L2_avg_all']:>6.2f} {r['仅L2_avg_cons']:>7.2f} | "
+          f"{r['L1+L2_paid']:>7} {r['L1+L2_cons']:>7} {r['L1+L2_avg_all']:>7.2f} {r['L1+L2_avg_cons']:>8.2f}")
 cur.close()
 conn.close()
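
The consumption counting in this script rests on two ideas: a lesson counts once per (character, chapter) pair, on the earliest date it was completed, and a week's figures are then derived from those first-completion dates. The sketch below shows both on in-memory tuples instead of database rows. The function names and the sample data are hypothetical, and the paid-user filter is left out for brevity.

```python
from datetime import date


def earliest_completions(records):
    """Keep the earliest completion date per (user_id, chapter_id) pair."""
    first_seen = {}
    for user_id, chapter_id, completed_on in records:
        key = (user_id, chapter_id)
        if key not in first_seen or completed_on < first_seen[key]:
            first_seen[key] = completed_on
    return first_seen


def weekly_count(first_seen, char_to_account, week_start, week_end):
    """Return (lessons consumed, distinct accounts) for one inclusive week."""
    lessons, accounts = 0, set()
    for (user_id, _chapter_id), completed_on in first_seen.items():
        if not (week_start <= completed_on <= week_end):
            continue
        account = char_to_account.get(user_id)
        if account is None:
            continue  # character without a known account is skipped
        lessons += 1
        accounts.add(account)
    return lessons, len(accounts)


records = [
    (1, 10, date(2025, 9, 3)),
    (1, 10, date(2025, 9, 1)),   # same lesson replayed: only the earliest date counts
    (1, 11, date(2025, 9, 9)),
    (2, 10, date(2025, 9, 5)),
    (3, 10, date(2025, 9, 6)),   # character 3 has no account mapping
]
first_seen = earliest_completions(records)
week1 = weekly_count(first_seen, {1: 'A', 2: 'A'}, date(2025, 9, 1), date(2025, 9, 7))
```

Characters 1 and 2 belong to the same account here, so the first week counts two lessons but only one consuming account.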


@ -0,0 +1,395 @@
#!/usr/bin/env python3
"""
课消指标 v2剔除 U0 序章4张图按 L1/L2 拆分
"""
import psycopg2
from collections import defaultdict
from datetime import datetime, timedelta, date
import openpyxl
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from openpyxl.chart import LineChart, BarChart, Reference
from openpyxl.chart.series import DataPoint
from openpyxl.chart.label import DataLabelList
from openpyxl.utils import get_column_letter
conn = psycopg2.connect(
host="bj-postgres-16pob4sg.sql.tencentcdb.com",
port=28591, user="ai_member",
password="LdfjdjL83h3h3^$&**YGG*", dbname="vala_bi"
)
cur = conn.cursor()
# ===== U0 chapter_ids to exclude =====
u0_chapters = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348}
print(f"剔除 U0 序章: {sorted(u0_chapters)}")
# ===== 时间参数 =====
overall_start = date(2025, 9, 1)
overall_end = date(2026, 5, 11)
weeks = []
d = overall_start
while d < overall_end:
ws = d
days_to_sunday = 6 - d.weekday()
we = d + timedelta(days=days_to_sunday)
if we >= overall_end:
we = overall_end - timedelta(days=1)
weeks.append((ws, we))
d = we + timedelta(days=1)
# ===== Step 1: 用户分类 =====
print("\nStep 1: 分类付费用户...")
cur.execute("""
SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date,
CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
WHEN o.goods_id = 61 THEN 'L1+L2'
WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
ELSE '其他' END as level_type
FROM bi_vala_order o
INNER JOIN bi_vala_app_account a ON o.account_id = a.id
WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
""")
orders = cur.fetchall()
cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3")
refund_trades = set(r[0] for r in cur.fetchall())
user_data = defaultdict(lambda: {'levels': set(), 'orders': []})
for aid, trade_no, order_status, pay_date, lt in orders:
is_refunded = (order_status == 4 and trade_no in refund_trades)
user_data[aid]['levels'].add(lt)
user_data[aid]['orders'].append((pay_date.date(), is_refunded, lt))
def classify_user(levels):
has_l1, has_l2 = 'L1' in levels, 'L2' in levels
return 'L1+L2' if ('L1+L2' in levels or (has_l1 and has_l2)) else ('仅L1' if has_l1 else ('仅L2' if has_l2 else '其他'))
for aid in user_data:
user_data[aid]['category'] = classify_user(user_data[aid]['levels'])
def is_paid_as_of(aid, as_of_date):
return sum(1 for pd, ref, lt in user_data[aid]['orders'] if pd <= as_of_date and not ref) > 0
# ===== Step 2: 课消 (剔除 U0) =====
print("\nStep 2: 查询课消剔除U0...")
consumption_map = {}
u0_skipped = 0
for table_idx in range(8):
tbl = f"bi_user_chapter_play_record_{table_idx}"
cur.execute(f"""
SELECT user_id, chapter_id, updated_at
FROM {tbl}
WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11'
""")
for user_id, chapter_id, updated_at in cur.fetchall():
if chapter_id in u0_chapters:
u0_skipped += 1
continue
key = (user_id, chapter_id)
d = updated_at.date() if hasattr(updated_at, 'date') else datetime.strptime(str(updated_at)[:10], '%Y-%m-%d').date()
if key not in consumption_map or d < consumption_map[key]:
consumption_map[key] = d
print(f" 剔除U0课消: {u0_skipped} 条, 去重后: {len(consumption_map)}")
# ===== Step 3: 角色映射 =====
print("Step 3: 角色映射...")
all_uids = list(set(k[0] for k in consumption_map))
char2acct = {}
bs = 500
for i in range(0, len(all_uids), bs):
batch = all_uids[i:i+bs]
ph = ','.join(['%s'] * len(batch))
cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch)
for cid, aid in cur.fetchall():
char2acct[cid] = aid
print(f" 映射: {len(char2acct)}")
# ===== Step 4: weekly aggregation =====
print("Step 4: 按周汇总...")
results = []
for ws, we in weeks:
    # paid accounts as of the week's last day, grouped by category
    paid_by_cat = defaultdict(set)
    for aid in user_data:
        if is_paid_as_of(aid, we):
            paid_by_cat[user_data[aid]['category']].add(aid)
    # first completions that fall inside this week, counted per category
    cons_by_cat = defaultdict(int)
    cons_users_by_cat = defaultdict(set)
    for (uid, ch_id), cons_date in consumption_map.items():
        if ws <= cons_date <= we:
            aid = char2acct.get(uid)
            if aid:
                cat = user_data.get(aid, {}).get('category', '其他')
                if aid in paid_by_cat.get(cat, set()):
                    cons_by_cat[cat] += 1
                    cons_users_by_cat[cat].add(aid)
    row = {'ws': ws, 'we': we}
    for cat in ['仅L1', '仅L2', 'L1+L2', '其他', '合计']:
        if cat == '合计':
            n_paid = sum(len(v) for v in paid_by_cat.values())
            n_cons = sum(cons_by_cat.values())
            n_cons_users = len(set.union(*cons_users_by_cat.values())) if cons_users_by_cat else 0
        else:
            n_paid = len(paid_by_cat.get(cat, set()))
            n_cons = cons_by_cat.get(cat, 0)
            n_cons_users = len(cons_users_by_cat.get(cat, set()))
        row[f'{cat}_paid'] = n_paid
        row[f'{cat}_cons'] = n_cons
        row[f'{cat}_cons_users'] = n_cons_users
        row[f'{cat}_no_cons'] = n_paid - n_cons_users
        row[f'{cat}_avg_all'] = round(n_cons / n_paid, 2) if n_paid > 0 else 0
        row[f'{cat}_avg_cons'] = round(n_cons / n_cons_users, 2) if n_cons_users > 0 else 0
    results.append(row)
cur.close()
conn.close()
# ===== Filter: report the first week with data (paid users > 0) per category =====
for cat in ['仅L1', '仅L2', 'L1+L2']:
    # index of the first week with paid users; falls back to 0 if there is none
    first_idx = next((i for i, r in enumerate(results) if r[f'{cat}_paid'] > 0), 0)
    print(f"{cat} 数据起于第 {first_idx+1} 周 ({results[first_idx]['ws']})")
# ===== 生成 Excel =====
print("\n生成 Excel...")
wb = openpyxl.Workbook()
wb.remove(wb.active)
# 样式
header_font = Font(name='微软雅黑', bold=True, size=9, color='FFFFFF')
header_fill = PatternFill(start_color='2F5496', end_color='2F5496', fill_type='solid')
data_font = Font(name='微软雅黑', size=9)
title_font = Font(name='微软雅黑', bold=True, size=14, color='2F5496')
subtitle_font = Font(name='微软雅黑', bold=True, size=11, color='2F5496')
border = Border(left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin'))
center = Alignment(horizontal='center', vertical='center')
l1_color = '4A90D9'
l2_color = 'E85D47'
l1l2_color = '7B9E4B'
def apply_cell(ws, row, col, value, font=data_font, fill=None, align=center, border_style=border):
    c = ws.cell(row=row, column=col, value=value)
    c.font, c.alignment = font, align
    # border_style=None means "keep the default border" rather than assigning None
    if border_style is not None:
        c.border = border_style
    if fill:
        c.fill = fill
    return c
def apply_header(ws, row, col, value):
    c = ws.cell(row=row, column=col, value=value)
    c.font, c.fill, c.border, c.alignment = header_font, header_fill, border, center
    return c
# ===== Sheet 1: overview =====
ws1 = wb.create_sheet("概览")
ws1.merge_cells('A1:H1')
apply_cell(ws1, 1, 1, "付费用户 L1/L2 课消分析 (剔除U0序章)", font=title_font, border_style=None, align=Alignment(horizontal='left'))
notes = [
    "口径: 剔除L1/L2的U0序章课时 (L1 U00: 343-348, L2 U00: 55-59), 仅统计U1及之后的课消",
    "课消: 用户首次完成某一课时; 付费用户: status=1 + 未删除 + 有订单 + 未全部退款",
]
for i, n in enumerate(notes):
    ws1.merge_cells(f'A{3+i}:H{3+i}')
    apply_cell(ws1, 3+i, 1, n, font=Font(name='微软雅黑', size=9, color='666666'), border_style=None, align=Alignment(horizontal='left'))
# ===== Sheet 2: weekly detail =====
ws2 = wb.create_sheet("每周明细")
headers_main = ['', '周一起', '周日'] + [
    '合计付费', '合计有消', '合计无消', '合计课消', '合计人均', '合计有消人均',
    '仅L1付费', '仅L1有消', '仅L1无消', '仅L1课消', '仅L1人均', '仅L1有消人均',
    '仅L2付费', '仅L2有消', '仅L2无消', '仅L2课消', '仅L2人均', '仅L2有消人均',
    'L1+L2付费', 'L1+L2有消', 'L1+L2无消', 'L1+L2课消', 'L1+L2人均', 'L1+L2有消人均',
]
for j, h in enumerate(headers_main, 1):
    apply_header(ws2, 1, j, h)
for ri, r in enumerate(results):
    row = ri + 2
    wl = f"{r['ws'].strftime('%m/%d')}-{r['we'].strftime('%m/%d')}"
    apply_cell(ws2, row, 1, wl)
    apply_cell(ws2, row, 2, r['ws'].strftime('%Y-%m-%d'))
    apply_cell(ws2, row, 3, r['we'].strftime('%Y-%m-%d'))
    col = 4
    for prefix in ['合计', '仅L1', '仅L2', 'L1+L2']:
        for metric in ['paid', 'cons_users', 'no_cons', 'cons', 'avg_all', 'avg_cons']:
            apply_cell(ws2, row, col, r[f'{prefix}_{metric}'])
            col += 1
for ci in range(1, len(headers_main)+1):
    ws2.column_dimensions[get_column_letter(ci)].width = 11 if ci <= 3 else 10
ws2.freeze_panes = 'D2'
# ===== Sheets 3-5: one chart sheet per category (仅L1, 仅L2, L1+L2) =====
sheet_names = {
    '仅L1': ('L1图表', 'L1', l1_color, '4A90D9'),
    '仅L2': ('L2图表', 'L2', l2_color, 'E85D47'),
    'L1+L2': ('L1+L2图表', 'L1+L2', l1l2_color, 'A8C88E'),
}
headers = ['', '付费用户', '有课消用户', '无课消用户', '课消总数', '人均课消', '有消人均']
metrics = ['paid', 'cons_users', 'no_cons', 'cons', 'avg_all', 'avg_cons']
for cat, (sname, label, color, light_color) in sheet_names.items():
    ws_chart_data = wb.create_sheet(sname)
    # start at the first week in which this category has paid users
    first_idx = next((i for i, r in enumerate(results) if r[f'{cat}_paid'] > 0), 0)
    cat_results = results[first_idx:]
    for j, h in enumerate(headers, 1):
        apply_header(ws_chart_data, 1, j, h)
    for ri, r in enumerate(cat_results):
        row = ri + 2
        apply_cell(ws_chart_data, row, 1, r['ws'].strftime('%m/%d'))
        for j, metric in enumerate(metrics, 2):
            apply_cell(ws_chart_data, row, j, r[f'{cat}_{metric}'])
    n_rows = len(cat_results)
    cats_ref = Reference(ws_chart_data, min_col=1, min_row=2, max_row=n_rows+1)
    # --- Chart 1: stacked columns (users with / without consumption) ---
    chart1 = BarChart()
    chart1.type = "col"
    chart1.grouping = "stacked"
    chart1.overlap = 100  # stacked bar charts need full overlap to render as one column
    chart1.title = f"{label} 付费用户课消分布 (剔除U0序章)"
    chart1.style = 10
    chart1.width = 24
    chart1.height = 13
    # users with consumption
    ref1 = Reference(ws_chart_data, min_col=3, min_row=1, max_row=n_rows+1)
    chart1.add_data(ref1, titles_from_data=True)
    chart1.set_categories(cats_ref)
    chart1.series[0].graphicalProperties.solidFill = light_color
    # users without consumption
    ref2 = Reference(ws_chart_data, min_col=4, min_row=1, max_row=n_rows+1)
    chart1.add_data(ref2, titles_from_data=True)
    chart1.series[1].graphicalProperties.solidFill = 'D9D9D9'
    chart1.y_axis.title = '用户数'
    chart1.legend.position = 'b'
    ws_chart_data.add_chart(chart1, "A9")
    # --- Chart 2: lines (average per paid user, average per active user) ---
    chart2 = LineChart()
    chart2.title = f"{label} 周人均课消趋势 (剔除U0序章)"
    chart2.style = 10
    chart2.width = 24
    chart2.height = 13
    chart2.y_axis.title = '课消数(节/周)'
    ref3 = Reference(ws_chart_data, min_col=6, min_row=1, max_row=n_rows+1)
    chart2.add_data(ref3, titles_from_data=True)
    chart2.set_categories(cats_ref)
    chart2.series[0].graphicalProperties.line.solidFill = '999999'
    chart2.series[0].graphicalProperties.line.width = 20000
    chart2.series[0].graphicalProperties.line.dashStyle = 'dash'
    ref4 = Reference(ws_chart_data, min_col=7, min_row=1, max_row=n_rows+1)
    chart2.add_data(ref4, titles_from_data=True)
    chart2.series[1].graphicalProperties.line.solidFill = color
    chart2.series[1].graphicalProperties.line.width = 28000
    chart2.y_axis.scaling.min = 0
    chart2.legend.position = 'b'
    ws_chart_data.add_chart(chart2, "A27")
    # column widths
    for ci in range(1, 8):
        ws_chart_data.column_dimensions[get_column_letter(ci)].width = 12
# 保存
path = '/root/.openclaw/workspace/output/course_consumption_by_level_v2.xlsx'
wb.save(path)
print(f"\n✅ Excel v2 已保存: {path}")
# 简要摘要
last = results[-1]
print(f"""
=== 剔除U0后最终数据 (截至5/10) ===
仅L1: 付费{last['仅L1_paid']} 有消{last['仅L1_cons_users']} 无消{last['仅L1_no_cons']} 人均{last['仅L1_avg_all']} 有消人均{last['仅L1_avg_cons']}
仅L2: 付费{last['仅L2_paid']} 有消{last['仅L2_cons_users']} 无消{last['仅L2_no_cons']} 人均{last['仅L2_avg_all']} 有消人均{last['仅L2_avg_cons']}
L1+L2: 付费{last['L1+L2_paid']} 有消{last['L1+L2_cons_users']} 无消{last['L1+L2_no_cons']} 人均{last['L1+L2_avg_all']} 有消人均{last['L1+L2_avg_cons']}
合计: 付费{last['合计_paid']} 有消{last['合计_cons_users']} 无消{last['合计_no_cons']} 人均{last['合计_avg_all']} 有消人均{last['合计_avg_cons']}
""")

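The weekly aggregation in the script above rescans the whole consumption map once per week. A minimal sketch of doing the deduplication and the weekly grouping in a single pass, assuming completion records arrive as `(user_id, chapter_id, date)` tuples and that weeks start on Monday as they do in the script. `first_completions` and `bucket_by_week` are hypothetical helper names and are not part of this commit.

```python
from collections import defaultdict
from datetime import date, timedelta


def first_completions(records):
    """Keep the earliest completion date per (user_id, chapter_id)."""
    first = {}
    for user_id, chapter_id, day in records:
        key = (user_id, chapter_id)
        if key not in first or day < first[key]:
            first[key] = day
    return first


def bucket_by_week(first):
    """Group first completions under the Monday of the week they fall in."""
    buckets = defaultdict(list)
    for key, day in first.items():
        monday = day - timedelta(days=day.weekday())
        buckets[monday].append(key)
    return buckets


if __name__ == "__main__":
    # Illustrative records only, not real data.
    records = [
        (1, 100, date(2025, 9, 3)),
        (1, 100, date(2025, 9, 1)),   # earlier completion of the same lesson wins
        (2, 100, date(2025, 9, 8)),
        (1, 101, date(2025, 9, 10)),
    ]
    for monday, keys in sorted(bucket_by_week(first_completions(records)).items()):
        print(monday, sorted(keys))
```

With the buckets built once, each week's loop only needs to look up its own Monday key instead of filtering every record by date.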
287
scripts/course_excel_v3.py Normal file

@ -0,0 +1,287 @@
#!/usr/bin/env python3
import psycopg2
from collections import defaultdict
from datetime import datetime, timedelta, date
import openpyxl
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from openpyxl.chart import LineChart, BarChart, Reference
from openpyxl.utils import get_column_letter
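import os  # stdlib; used to read the database password from the environment
conn = psycopg2.connect(
    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
    port=28591, user="ai_member",
    # VALA_BI_PASSWORD is an assumed variable name; credentials should not live in version control
    password=os.environ["VALA_BI_PASSWORD"], dbname="vala_bi"
)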
cur = conn.cursor()
u0_chapters = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348}
overall_start = date(2025, 9, 1)
overall_end = date(2026, 5, 11)
weeks = []
d = overall_start
while d < overall_end:
    ws = d
    we = d + timedelta(days=6 - d.weekday())
    if we >= overall_end:
        we = overall_end - timedelta(days=1)
    weeks.append((ws, we))
    d = we + timedelta(days=1)
print("分类付费用户...")
cur.execute("""
SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date,
CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
WHEN o.goods_id = 61 THEN 'L1+L2'
WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
ELSE '其他' END as level_type
FROM bi_vala_order o
INNER JOIN bi_vala_app_account a ON o.account_id = a.id
WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
""")
orders = cur.fetchall()
cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3")
refund_trades = set(r[0] for r in cur.fetchall())
user_levels = defaultdict(set)
user_orders = defaultdict(list)
for aid, trade_no, order_status, pay_date, lt in orders:
    # refunded = order_status 4 and a matching bi_refund_order row with status = 3
    is_refunded = (order_status == 4 and trade_no in refund_trades)
    user_levels[aid].add(lt)
    user_orders[aid].append((pay_date.date(), is_refunded))
def is_paid(aid, as_of):
    # paid = at least one non-refunded order paid on or before as_of
    return any(pd <= as_of and not ref for pd, ref in user_orders[aid])
# L1+L2 bundle buyers belong to both pools
l1_pool = {aid for aid, lv in user_levels.items() if 'L1' in lv or 'L1+L2' in lv}
l2_pool = {aid for aid, lv in user_levels.items() if 'L2' in lv or 'L1+L2' in lv}
all_pool = l1_pool | l2_pool
print(f"L1池: {len(l1_pool)}, L2池: {len(l2_pool)}, 合计: {len(all_pool)}")
print("查询课消...")
cons_map = {}  # (character id, chapter id) -> first completion date
for ti in range(8):
    tbl = f"bi_user_chapter_play_record_{ti}"
    cur.execute(f"""SELECT user_id, chapter_id, updated_at FROM {tbl}
        WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11'""")
    for uid, cid, ua in cur.fetchall():
        if cid in u0_chapters:
            continue
        key = (uid, cid)
        d = ua.date() if hasattr(ua, 'date') else datetime.strptime(str(ua)[:10], '%Y-%m-%d').date()
        if key not in cons_map or d < cons_map[key]:
            cons_map[key] = d
print("角色映射...")
all_uids = list(set(k[0] for k in cons_map))
char2acct = {}
for i in range(0, len(all_uids), 500):
    batch = all_uids[i:i+500]
    ph = ','.join(['%s'] * len(batch))
    cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch)
    for cid, aid in cur.fetchall():
        char2acct[cid] = aid
print("按周汇总...")
results = []
for ws, we in weeks:
    l1_paid = {aid for aid in l1_pool if is_paid(aid, we)}
    l2_paid = {aid for aid in l2_pool if is_paid(aid, we)}
    t_paid = {aid for aid in all_pool if is_paid(aid, we)}
    l1_cons, l1_cons_users = 0, set()
    l2_cons, l2_cons_users = 0, set()
    t_cons, t_cu = 0, set()
    for (uid, ch_id), cons_date in cons_map.items():
        if ws <= cons_date <= we:
            aid = char2acct.get(uid)
            if not aid:
                continue
            if aid in l1_paid:
                l1_cons += 1
                l1_cons_users.add(aid)
            if aid in l2_paid:
                l2_cons += 1
                l2_cons_users.add(aid)
            if aid in t_paid:
                t_cons += 1
                t_cu.add(aid)
    results.append({
        'ws': ws, 'we': we,
        'L1_paid': len(l1_paid), 'L1_cons': l1_cons, 'L1_cons_users': len(l1_cons_users),
        'L1_no_cons': len(l1_paid) - len(l1_cons_users),
        'L1_avg_all': round(l1_cons / len(l1_paid), 2) if l1_paid else 0,
        'L1_avg_cons': round(l1_cons / len(l1_cons_users), 2) if l1_cons_users else 0,
        'L2_paid': len(l2_paid), 'L2_cons': l2_cons, 'L2_cons_users': len(l2_cons_users),
        'L2_no_cons': len(l2_paid) - len(l2_cons_users),
        'L2_avg_all': round(l2_cons / len(l2_paid), 2) if l2_paid else 0,
        'L2_avg_cons': round(l2_cons / len(l2_cons_users), 2) if l2_cons_users else 0,
        'total_paid': len(t_paid), 'total_cons': t_cons, 'total_cons_users': len(t_cu),
        'total_no_cons': len(t_paid) - len(t_cu),
        'total_avg_all': round(t_cons / len(t_paid), 2) if t_paid else 0,
        'total_avg_cons': round(t_cons / len(t_cu), 2) if t_cu else 0,
    })
cur.close()
conn.close()
print("\n生成 Excel...")
wb = openpyxl.Workbook()
wb.remove(wb.active)
hfont = Font(name='微软雅黑', bold=True, size=9, color='FFFFFF')
hfill = PatternFill(start_color='2F5496', end_color='2F5496', fill_type='solid')
dfont = Font(name='微软雅黑', size=9)
tfont = Font(name='微软雅黑', bold=True, size=14, color='2F5496')
sfont = Font(name='微软雅黑', bold=True, size=11, color='2F5496')
bd = Border(left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin'))
ctr = Alignment(horizontal='center', vertical='center')
def ac(ws, r, c, v, font=dfont, fill=None, align=ctr):
    cl = ws.cell(row=r, column=c, value=v)
    cl.font, cl.border, cl.alignment = font, bd, align
    if fill:
        cl.fill = fill
    return cl
def ah(ws, r, c, v):
    cl = ws.cell(row=r, column=c, value=v)
    cl.font, cl.fill, cl.border, cl.alignment = hfont, hfill, bd, ctr
    return cl
# Sheet 1: overview
ws1 = wb.create_sheet("概览")
ws1.merge_cells('A1:H1')
ac(ws1, 1, 1, "付费用户课消分析 (剔除U0序章)", font=tfont, fill=None, align=Alignment(horizontal='left'))
notes = [
    "口径: L1付费用户 = 买过L1商品(含L1+L2)的付费用户 | L2付费用户 = 买过L2商品(含L1+L2)的付费用户",
    "L1+L2用户同时出现在L1和L2两个视角中 | 合计为去重统计",
    "课消: 用户首次完成某一课时 (剔除U0序章, 仅U1+)",
    "付费用户: status=1 + 未删除 + 有未退款订单",
]
for i, n in enumerate(notes):
    ws1.merge_cells(f'A{3+i}:H{3+i}')
    ac(ws1, 3+i, 1, n, font=Font(name='微软雅黑', size=9, color='666666'), fill=None, align=Alignment(horizontal='left'))
row = 9
ws1.merge_cells(f'A{row}:H{row}')
ac(ws1, row, 1, "汇总(截至最后一周)", font=sfont, fill=None, align=Alignment(horizontal='left'))
row += 1
for j, h in enumerate(['分类', '付费用户', '有课消', '无课消', '无课消率', '人均课消', '有消人均'], 1):
    ah(ws1, row, j, h)
row += 1
last = results[-1]
summary = [
    ('L1付费群', last['L1_paid'], last['L1_cons_users'], last['L1_no_cons'], last['L1_avg_all'], last['L1_avg_cons'], '#A8CFF1'),
    ('L2付费群', last['L2_paid'], last['L2_cons_users'], last['L2_no_cons'], last['L2_avg_all'], last['L2_avg_cons'], '#F4A9A0'),
    ('合计(去重)', last['total_paid'], last['total_cons_users'], last['total_no_cons'], last['total_avg_all'], last['total_avg_cons'], '#C8E6C9'),
]
for name, p, cu, nc, aa, ac_, clr in summary:
    no_rate = f"{nc/p*100:.0f}%" if p else "0%"
    # openpyxl colours are aRGB hex strings, so prefix the RGB value with an alpha byte
    fl = PatternFill(start_color='00'+clr[1:], end_color='00'+clr[1:], fill_type='solid')
    for j, v in enumerate([name, p, cu, nc, no_rate, aa, ac_], 1):
        f = Font(name='微软雅黑', bold=(j==1), size=10)
        ac(ws1, row, j, v, font=f, fill=fl)
    row += 1
# Sheet 2: weekly detail
ws2 = wb.create_sheet("每周明细")
headers = ['', '周一起', '周日']
for prefix in ['合计', 'L1付费群', 'L2付费群']:
    for m in ['付费', '有消', '无消', '课消', '人均', '有消人均']:
        headers.append(f'{prefix}{m}')
for j, h in enumerate(headers, 1):
    ah(ws2, 1, j, h)
for ri, r in enumerate(results):
    rw = ri + 2
    ac(ws2, rw, 1, r['ws'].strftime('%m/%d'))
    ac(ws2, rw, 2, r['ws'].strftime('%Y-%m-%d'))
    ac(ws2, rw, 3, r['we'].strftime('%Y-%m-%d'))
    col = 4
    for prefix in ['total', 'L1', 'L2']:
        for k in ['paid', 'cons_users', 'no_cons', 'cons', 'avg_all', 'avg_cons']:
            ac(ws2, rw, col, r[f'{prefix}_{k}'])
            col += 1
for ci in range(1, len(headers)+1):
    ws2.column_dimensions[get_column_letter(ci)].width = 11 if ci <= 3 else 10
ws2.freeze_panes = 'D2'
# Sheets 3-4: one chart sheet per paid group (L1, L2)
lh = ['', '付费用户', '有课消用户', '无课消用户', '课消总数', '人均课消', '有消人均']
for pf, fill_color, line_color in [('L1', 'A8CFF1', '4A90D9'), ('L2', 'F4A9A0', 'E85D47')]:
    ws_c = wb.create_sheet(f"{pf}图表")
    # start at the first week in which this group has paid users
    first = next(i for i, r in enumerate(results) if r[f'{pf}_paid'] > 0)
    data = results[first:]
    for j, h in enumerate(lh, 1):
        ah(ws_c, 1, j, h)
    for ri, r in enumerate(data):
        rw = ri + 2
        ac(ws_c, rw, 1, r['ws'].strftime('%m/%d'))
        for j, k in enumerate(['paid', 'cons_users', 'no_cons', 'cons', 'avg_all', 'avg_cons'], 2):
            ac(ws_c, rw, j, r[f'{pf}_{k}'])
    n = len(data)
    cr = Reference(ws_c, min_col=1, min_row=2, max_row=n+1)
    # stacked columns: users with / without consumption
    ch1 = BarChart()
    ch1.type = "col"
    ch1.grouping = "stacked"
    ch1.overlap = 100  # stacked bar charts need full overlap to render as one column
    ch1.title = f"{pf}付费用户周课消分布 (剔除U0序章)"
    ch1.style = 10
    ch1.width = 24
    ch1.height = 13
    ch1.add_data(Reference(ws_c, min_col=3, min_row=1, max_row=n+1), titles_from_data=True)
    ch1.add_data(Reference(ws_c, min_col=4, min_row=1, max_row=n+1), titles_from_data=True)
    ch1.set_categories(cr)
    ch1.series[0].graphicalProperties.solidFill = fill_color
    ch1.series[1].graphicalProperties.solidFill = 'D9D9D9'
    ch1.y_axis.title = '用户数'
    ch1.legend.position = 'b'
    ws_c.add_chart(ch1, "A9")
    # lines: average per paid user and average per active user
    ch2 = LineChart()
    ch2.title = f"{pf}付费用户周人均课消趋势 (剔除U0序章)"
    ch2.style = 10
    ch2.width = 24
    ch2.height = 13
    ch2.add_data(Reference(ws_c, min_col=6, min_row=1, max_row=n+1), titles_from_data=True)
    ch2.add_data(Reference(ws_c, min_col=7, min_row=1, max_row=n+1), titles_from_data=True)
    ch2.set_categories(cr)
    ch2.series[0].graphicalProperties.line.solidFill = '999999'
    ch2.series[0].graphicalProperties.line.width = 20000
    ch2.series[1].graphicalProperties.line.solidFill = line_color
    ch2.series[1].graphicalProperties.line.width = 28000
    ch2.y_axis.scaling.min = 0
    ch2.y_axis.title = '课消数(节/周)'
    ch2.legend.position = 'b'
    ws_c.add_chart(ch2, "A27")
    for ci in range(1, 8):
        ws_c.column_dimensions[get_column_letter(ci)].width = 12
path = '/root/.openclaw/workspace/output/course_consumption_by_level_v3.xlsx'
wb.save(path)
print(f"\n{path}")
print(f"L1付费群: {last['L1_paid']}人 | L2付费群: {last['L2_paid']}人 | 合计(去重): {last['total_paid']}")
print(f"L1无消率: {last['L1_no_cons']/last['L1_paid']*100:.0f}% | L2无消率: {last['L2_no_cons']/last['L2_paid']*100:.0f}%")
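The script above builds its reporting weeks inline with a `while` loop. A minimal sketch of the same logic as a reusable function, assuming a half-open range `[start, end)` in which each week ends on Sunday or on the day before `end`, whichever comes first. `week_ranges` is a hypothetical name and is not part of this commit.

```python
from datetime import date, timedelta


def week_ranges(start, end):
    """Return (first_day, last_day) pairs covering [start, end).

    Each range ends on a Sunday, except that the final range is clipped
    to the day before `end`.
    """
    weeks = []
    day = start
    while day < end:
        sunday = day + timedelta(days=6 - day.weekday())
        last = min(sunday, end - timedelta(days=1))
        weeks.append((day, last))
        day = last + timedelta(days=1)
    return weeks


if __name__ == "__main__":
    # Same bounds as overall_start / overall_end in the script.
    weeks = week_ranges(date(2025, 9, 1), date(2026, 5, 11))
    print(len(weeks), weeks[0], weeks[-1])
```

Because 2025-09-01 is a Monday and the range spans 252 days, the script's bounds yield 36 full Monday-to-Sunday weeks, the last one ending on 2026-05-10.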

129
scripts/excel_v4.py Normal file

@ -0,0 +1,129 @@
#!/usr/bin/env python3
"""Excel v4: the L1 group counts only L1 lessons, the L2 group only L2 lessons."""
import json, openpyxl
from datetime import date
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from openpyxl.chart import LineChart, BarChart, Reference
from openpyxl.utils import get_column_letter
with open('/root/.openclaw/workspace/output/course_data_v4.json') as f:
    raw = json.load(f)
results = raw['results']
for r in results:
    r['ws'] = date.fromisoformat(r['ws'])
    r['we'] = date.fromisoformat(r['we'])
wb = openpyxl.Workbook()
wb.remove(wb.active)
hfont = Font(name='微软雅黑', bold=True, size=9, color='FFFFFF')
hfill = PatternFill(start_color='002F5496', end_color='002F5496', fill_type='solid')
dfont = Font(name='微软雅黑', size=9)
tfont = Font(name='微软雅黑', bold=True, size=14, color='002F5496')
sfont = Font(name='微软雅黑', bold=True, size=11, color='002F5496')
bd = Border(left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin'))
ctr = Alignment(horizontal='center', vertical='center')
def ac(ws, r, c, v, font=dfont, fill=None, align=ctr):
    cl = ws.cell(row=r, column=c, value=v)
    cl.font, cl.border, cl.alignment = font, bd, align
    if fill:
        cl.fill = fill
def ah(ws, r, c, v):
    cl = ws.cell(row=r, column=c, value=v)
    cl.font, cl.fill, cl.border, cl.alignment = hfont, hfill, bd, ctr
# Sheet 1: overview
ws1 = wb.create_sheet("概览")
ws1.merge_cells('A1:H1')
ac(ws1, 1, 1, "付费用户课消分析 v4 (只看对应级别课程, 剔除U0)", font=tfont, align=Alignment(horizontal='left'))
notes = [
    "口径: L1付费群 = 买过L1商品的付费用户, 只看L1课程课消 | L2付费群 = 买过L2商品的付费用户, 只看L2课程课消",
    "L1+L2用户在L1视角只统计L1课程课消, L2视角只统计L2课程课消",
    "课消: 用户首次完成某一课时 (剔除U0序章)",
    "付费用户: status=1 + 未删除 + 有未退款订单",
]
for i, n in enumerate(notes):
    ws1.merge_cells(f'A{3+i}:H{3+i}')
    ac(ws1, 3+i, 1, n, font=Font(name='微软雅黑', size=9, color='666666'), align=Alignment(horizontal='left'))
row = 9
ws1.merge_cells(f'A{row}:H{row}')
ac(ws1, row, 1, "汇总(截至最后一周)", font=sfont, align=Alignment(horizontal='left'))
row += 1
for j, h in enumerate(['分类', '付费用户', '有课消', '无课消', '无课消率', '人均课消', '有消人均'], 1):
    ah(ws1, row, j, h)
row += 1
last = results[-1]
skus = [
    ('L1付费群(只看L1课程)', last['L1_paid'], last['L1_cons_users'], last['L1_no_cons'], last['L1_avg_all'], last['L1_avg_cons'], '00A8CFF1'),
    ('L2付费群(只看L2课程)', last['L2_paid'], last['L2_cons_users'], last['L2_no_cons'], last['L2_avg_all'], last['L2_avg_cons'], '00F4A9A0'),
    ('合计(去重)', last['total_paid'], last['total_cons_users'], last['total_no_cons'], last['total_avg_all'], last['total_avg_cons'], '00C8E6C9'),
]
for name, p, cu, nc, aa, ac_, clr in skus:
    no_rate = f"{nc/p*100:.0f}%" if p else "0%"
    fl = PatternFill(start_color=clr, end_color=clr, fill_type='solid')
    for j, v in enumerate([name, p, cu, nc, no_rate, aa, ac_], 1):
        ac(ws1, row, j, v, font=Font(name='微软雅黑', bold=(j==1), size=10), fill=fl)
    row += 1
# Sheet 2: weekly detail
ws2 = wb.create_sheet("每周明细")
headers = ['', '周一起', '周日']
for pfx in ['合计', 'L1付费群', 'L2付费群']:
    for m in ['付费', '有消', '无消', '课消', '人均', '有消人均']:
        headers.append(f'{pfx}{m}')
for j, h in enumerate(headers, 1):
    ah(ws2, 1, j, h)
for ri, r in enumerate(results):
    rw = ri + 2
    ac(ws2, rw, 1, r['ws'].strftime('%m/%d'))
    ac(ws2, rw, 2, r['ws'].strftime('%Y-%m-%d'))
    ac(ws2, rw, 3, r['we'].strftime('%Y-%m-%d'))
    col = 4
    for prefix in ['total', 'L1', 'L2']:
        for k in ['paid', 'cons_users', 'no_cons', 'cons', 'avg_all', 'avg_cons']:
            ac(ws2, rw, col, r[f'{prefix}_{k}'])
            col += 1
for ci in range(1, len(headers)+1):
    ws2.column_dimensions[get_column_letter(ci)].width = 11 if ci <= 3 else 10
ws2.freeze_panes = 'D2'
# Sheets 3-4: one chart sheet per paid group
for pf, clr in [('L1', '4A90D9'), ('L2', 'E85D47')]:
    ws = wb.create_sheet(f"{pf}图表")
    lh = ['', '付费用户', '有课消用户', '无课消用户', '课消总数', '人均课消', '有消人均']
    # start at the first week in which this group has paid users
    first = next(i for i, r in enumerate(results) if r[f'{pf}_paid'] > 0)
    ld = results[first:]
    for j, h in enumerate(lh, 1):
        ah(ws, 1, j, h)
    for ri, r in enumerate(ld):
        rw = ri + 2
        ac(ws, rw, 1, r['ws'].strftime('%m/%d'))
        for j, k in enumerate([f'{pf}_paid', f'{pf}_cons_users', f'{pf}_no_cons', f'{pf}_cons', f'{pf}_avg_all', f'{pf}_avg_cons'], 2):
            ac(ws, rw, j, r[k])
    n = len(ld)
    cr = Reference(ws, min_col=1, min_row=2, max_row=n+1)
    # stacked columns: users with / without consumption
    ch1 = BarChart()
    ch1.type = "col"
    ch1.grouping = "stacked"
    ch1.overlap = 100  # stacked bar charts need full overlap to render as one column
    ch1.title = f"{pf}付费用户周课消分布(只看{pf}课程)"
    ch1.style = 10
    ch1.width = 24
    ch1.height = 13
    ch1.add_data(Reference(ws, min_col=3, min_row=1, max_row=n+1), titles_from_data=True)
    ch1.add_data(Reference(ws, min_col=4, min_row=1, max_row=n+1), titles_from_data=True)
    ch1.set_categories(cr)
    ch1.series[0].graphicalProperties.solidFill = 'A8CFF1' if pf == 'L1' else 'F4A9A0'
    ch1.series[1].graphicalProperties.solidFill = 'D9D9D9'
    ch1.y_axis.title = '用户数'
    ch1.legend.position = 'b'
    ws.add_chart(ch1, "A9")
    # lines: average per paid user and average per active user
    ch2 = LineChart()
    ch2.title = f"{pf}付费用户周人均课消趋势(只看{pf}课程)"
    ch2.style = 10
    ch2.width = 24
    ch2.height = 13
    ch2.add_data(Reference(ws, min_col=6, min_row=1, max_row=n+1), titles_from_data=True)
    ch2.add_data(Reference(ws, min_col=7, min_row=1, max_row=n+1), titles_from_data=True)
    ch2.set_categories(cr)
    ch2.series[0].graphicalProperties.line.solidFill = '999999'
    ch2.series[0].graphicalProperties.line.width = 20000
    ch2.series[1].graphicalProperties.line.solidFill = clr
    ch2.series[1].graphicalProperties.line.width = 28000
    ch2.y_axis.scaling.min = 0
    ch2.y_axis.title = '课消数(节/周)'
    ch2.legend.position = 'b'
    ws.add_chart(ch2, "A27")
    for ci in range(1, 8):
        ws.column_dimensions[get_column_letter(ci)].width = 12
path='/root/.openclaw/workspace/output/course_consumption_by_level_v4.xlsx'
wb.save(path)
print(f'{path}')
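The v4 script reads precomputed weekly rows from JSON and derives a no-consumption rate for the overview sheet. A minimal sketch of those two steps in isolation, assuming the JSON has a top-level `results` list whose rows carry ISO-formatted `ws` and `we` dates, as the script's own parsing implies. `load_weekly_rows` and `no_consumption_rate` are hypothetical names, and the sample numbers are made up for illustration.

```python
import json
from datetime import date


def load_weekly_rows(text):
    """Parse the JSON payload and convert week bounds to date objects."""
    rows = json.loads(text)["results"]
    for row in rows:
        row["ws"] = date.fromisoformat(row["ws"])
        row["we"] = date.fromisoformat(row["we"])
    return rows


def no_consumption_rate(no_cons, paid):
    """Format the share of paid users without consumption; '0%' when nobody has paid."""
    return f"{no_cons / paid * 100:.0f}%" if paid else "0%"


if __name__ == "__main__":
    # Illustrative payload only, not real data.
    sample = '{"results": [{"ws": "2026-05-04", "we": "2026-05-10", "L1_paid": 40, "L1_no_cons": 10}]}'
    last = load_weekly_rows(sample)[-1]
    print(last["ws"], last["we"], no_consumption_rate(last["L1_no_cons"], last["L1_paid"]))
```

Guarding the division keeps the overview row printable for a group that has no paid users yet.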

247
scripts/generate_charts.py Normal file

@ -0,0 +1,247 @@
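#!/usr/bin/env python3
"""
Generate 4 lesson-consumption charts (U0 prologue excluded):
1. L1 paid users: consumption distribution, stacked bar chart
2. L2 paid users: consumption distribution, stacked bar chart
3. L1 weekly average consumption trend, line chart
4. L2 weekly average consumption trend, line chart
"""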
import psycopg2
from collections import defaultdict
from datetime import datetime, timedelta, date
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
import numpy as np
# 中文字体
import matplotlib.font_manager as fm
font_path = '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc'
fm.fontManager.addfont(font_path)
prop = fm.FontProperties(fname=font_path)
font_name = prop.get_name()
plt.rcParams['font.family'] = font_name
plt.rcParams['axes.unicode_minus'] = False
print(f'使用字体: {font_name}')
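import os  # stdlib; used to read the database password from the environment
conn = psycopg2.connect(
    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
    port=28591, user="ai_member",
    # VALA_BI_PASSWORD is an assumed variable name; credentials should not live in version control
    password=os.environ["VALA_BI_PASSWORD"], dbname="vala_bi"
)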
cur = conn.cursor()
# ===== 配置 =====
u0_chapters = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348}
overall_start = date(2025, 9, 1)
overall_end = date(2026, 5, 11)
weeks = []
d = overall_start
while d < overall_end:
    ws = d
    we = d + timedelta(days=6 - d.weekday())
    if we >= overall_end:
        we = overall_end - timedelta(days=1)
    weeks.append((ws, we))
    d = we + timedelta(days=1)
# ===== 用户分类 =====
print("分类付费用户...")
cur.execute("""
SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date,
CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
WHEN o.goods_id = 61 THEN 'L1+L2'
WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
ELSE '其他' END as level_type
FROM bi_vala_order o
INNER JOIN bi_vala_app_account a ON o.account_id = a.id
WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
""")
orders = cur.fetchall()
cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3")
refund_trades = set(r[0] for r in cur.fetchall())
user_data = defaultdict(lambda: {'levels': set(), 'orders': []})
for aid, trade_no, order_status, pay_date, lt in orders:
    # refunded = order_status 4 and a matching bi_refund_order row with status = 3
    is_refunded = (order_status == 4 and trade_no in refund_trades)
    user_data[aid]['levels'].add(lt)
    user_data[aid]['orders'].append((pay_date.date(), is_refunded, lt))
def classify(levels):
    # L1+L2 if the bundle was bought, or both an L1 and an L2 product
    h1, h2 = 'L1' in levels, 'L2' in levels
    if 'L1+L2' in levels or (h1 and h2):
        return 'L1+L2'
    if h1:
        return '仅L1'
    if h2:
        return '仅L2'
    return '其他'
for aid in user_data:
    user_data[aid]['category'] = classify(user_data[aid]['levels'])
def is_paid(aid, as_of):
    # paid = at least one non-refunded order paid on or before as_of
    return any(pd <= as_of and not ref for pd, ref, lt in user_data[aid]['orders'])
# ===== Lesson consumption =====
print("查询课消...")
cons_map = {}  # (character id, chapter id) -> first completion date
for table_idx in range(8):
    tbl = f"bi_user_chapter_play_record_{table_idx}"
    cur.execute(f"""
        SELECT user_id, chapter_id, updated_at
        FROM {tbl}
        WHERE play_status = 1 AND updated_at >= '2025-09-01' AND updated_at < '2026-05-11'
    """)
    for uid, cid, ua in cur.fetchall():
        if cid in u0_chapters:
            continue
        key = (uid, cid)
        d = ua.date() if hasattr(ua, 'date') else datetime.strptime(str(ua)[:10], '%Y-%m-%d').date()
        if key not in cons_map or d < cons_map[key]:
            cons_map[key] = d
# map character ids to account ids
print("角色映射...")
all_uids = list(set(k[0] for k in cons_map))
char2acct = {}
bs = 500
for i in range(0, len(all_uids), bs):
    batch = all_uids[i:i+bs]
    ph = ','.join(['%s'] * len(batch))
    cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch)
    for cid, aid in cur.fetchall():
        char2acct[cid] = aid
# ===== Weekly aggregation =====
print("按周汇总...")
results = []
for ws, we in weeks:
    paid_by_cat = defaultdict(set)
    for aid in user_data:
        if is_paid(aid, we):
            paid_by_cat[user_data[aid]['category']].add(aid)
    cons_by_cat = defaultdict(int)
    cons_users_by_cat = defaultdict(set)
    for (uid, ch_id), cons_date in cons_map.items():
        if ws <= cons_date <= we:
            aid = char2acct.get(uid)
            if aid:
                cat = user_data.get(aid, {}).get('category', '其他')
                if aid in paid_by_cat.get(cat, set()):
                    cons_by_cat[cat] += 1
                    cons_users_by_cat[cat].add(aid)
    row = {'ws': ws, 'we': we}
    for cat in ['仅L1', '仅L2', 'L1+L2']:
        n_paid = len(paid_by_cat.get(cat, set()))
        n_cons = cons_by_cat.get(cat, 0)
        n_cons_users = len(cons_users_by_cat.get(cat, set()))
        row[f'{cat}_paid'] = n_paid
        row[f'{cat}_cons'] = n_cons
        row[f'{cat}_cons_users'] = n_cons_users
        row[f'{cat}_no_cons'] = n_paid - n_cons_users
        row[f'{cat}_avg_all'] = round(n_cons / n_paid, 2) if n_paid > 0 else 0
        row[f'{cat}_avg_cons'] = round(n_cons / n_cons_users, 2) if n_cons_users > 0 else 0
    results.append(row)
cur.close()
conn.close()
# ===== Chart generation =====
print("\n生成图表...")
output_dir = '/root/.openclaw/workspace/output'
configs = {
    'L1': {'cat': '仅L1', 'color': '#4A90D9', 'light': '#A8CFF1', 'label': 'L1'},
    'L2': {'cat': '仅L2', 'color': '#E85D47', 'light': '#F4A9A0', 'label': 'L2'},
}
for key, cfg in configs.items():
    cat = cfg['cat']
    color = cfg['color']
    light = cfg['light']
    label = cfg['label']
    # drop leading weeks without paid users
    first = next(i for i, r in enumerate(results) if r[f'{cat}_paid'] > 0)
    data = results[first:]
    xs = [r['ws'] + timedelta(days=3) for r in data]  # plot each week at its midpoint
    labels = [r['ws'].strftime('%m/%d') for r in data]
    paid = [r[f'{cat}_paid'] for r in data]
    cons_users = [r[f'{cat}_cons_users'] for r in data]
    no_cons = [r[f'{cat}_no_cons'] for r in data]
    avg_all = [r[f'{cat}_avg_all'] for r in data]
    avg_cons = [r[f'{cat}_avg_cons'] for r in data]
    # --- Figure 1: stacked bar chart ---
    fig, ax = plt.subplots(figsize=(18, 8))
    x_idx = np.arange(len(xs))
    bar_w = 0.65
    tick_step = max(1, len(data)//12)  # label roughly 12 weeks across the axis
    p1 = ax.bar(x_idx, cons_users, bar_w, color=light, label='有课消用户', zorder=3)
    p2 = ax.bar(x_idx, no_cons, bar_w, bottom=cons_users, color='#D0D0D0', label='无课消用户', zorder=3)
    # annotate the paid-user total above every labelled bar
    for i, (p, c, n) in enumerate(zip(paid, cons_users, no_cons)):
        if i % tick_step == 0:
            ax.annotate(str(p), (i, p), textcoords='offset points', xytext=(0, 6),
                        fontsize=8, ha='center', color='#333333', fontweight='bold')
    ax.set_xticks(x_idx[::tick_step])
    ax.set_xticklabels([labels[i] for i in range(0, len(data), tick_step)], fontsize=9, rotation=45)
    ax.set_ylabel('用户数', fontsize=13)
    ax.set_title(f'{label} 付费用户周课消分布 (剔除U0序章)', fontsize=16, fontweight='bold')
    ax.legend(fontsize=12, loc='upper left')
    ax.grid(axis='y', alpha=0.3, zorder=0)
    ax.set_xlim(-0.5, len(x_idx) - 0.5)
    # no-consumption rate of the last week
    no_rate = no_cons[-1] / paid[-1] * 100 if paid[-1] else 0
    ax.text(0.97, 0.95, f'无课消率: {no_rate:.0f}%', transform=ax.transAxes,
            fontsize=11, ha='right', va='top', color='#999999', fontstyle='italic')
    plt.tight_layout()
    path1 = f'{output_dir}/{key}_users_stack.png'
    plt.savefig(path1, dpi=150, bbox_inches='tight', facecolor='white')
    plt.close()
    print(f'{path1}')
    # --- Figure 2: line chart ---
    fig, ax = plt.subplots(figsize=(18, 8))
    label_step = max(1, len(data)//8)
    # 'o--' sets marker and dashed line in the format string, so no separate linestyle keyword is needed
    ax.plot(xs, avg_all, 'o--', color='#999999', linewidth=2.2, markersize=5,
            label='周人均课消(全部付费用户)', markerfacecolor='white')
    ax.plot(xs, avg_cons, 's-', color=color, linewidth=2.8, markersize=5,
            label='周有消人均课消', markerfacecolor='white')
    # shade the gap between the two averages
    ax.fill_between(xs, avg_all, avg_cons, alpha=0.08, color=color)
    # annotate selected data points
    for i in range(len(xs)):
        if i % label_step == 0:
            ax.annotate(f'{avg_all[i]:.1f}', (xs[i], avg_all[i]), textcoords='offset points',
                        xytext=(0, -16), fontsize=7.5, color='#999999', ha='center')
            ax.annotate(f'{avg_cons[i]:.1f}', (xs[i], avg_cons[i]), textcoords='offset points',
                        xytext=(0, 8), fontsize=7.5, color=color, ha='center', fontweight='bold')
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, fontsize=9)
    ax.set_ylabel('课消数(节/周)', fontsize=13)
    ax.set_title(f'{label} 周人均课消趋势 (剔除U0序章)', fontsize=16, fontweight='bold')
    ax.legend(fontsize=12, loc='upper left')
    ax.grid(True, alpha=0.3)
    ax.set_xlim(date(2025, 8, 30), date(2026, 5, 12))
    plt.tight_layout()
    path2 = f'{output_dir}/{key}_avg_trend.png'
    plt.savefig(path2, dpi=150, bbox_inches='tight', facecolor='white')
    plt.close()
    print(f'{path2}')
print('\nAll 4 charts generated!')


@ -0,0 +1,218 @@
#!/usr/bin/env python3
"""
Charts v2: L1 paid users = L1-only + L1+L2; L2 paid users = L2-only + L1+L2.
"""
import psycopg2
from collections import defaultdict
from datetime import datetime, timedelta, date
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.font_manager as fm
import numpy as np
fm.fontManager.addfont('/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc')
plt.rcParams['font.family'] = fm.FontProperties(fname='/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc').get_name()
plt.rcParams['axes.unicode_minus'] = False
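import os

# Credentials should not be committed to the repository; read the password from
# the environment instead. VALA_BI_PASSWORD is an assumed variable name.
conn = psycopg2.connect(
    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
    port=28591, user="ai_member",
    password=os.environ["VALA_BI_PASSWORD"], dbname="vala_bi"
)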
cur = conn.cursor()
u0_chapters = {55, 56, 57, 58, 59, 343, 344, 345, 346, 348}
overall_start = date(2025, 9, 1)
overall_end = date(2026, 5, 11)
weeks = []
d = overall_start
while d < overall_end:
    ws = d
    # Each bucket ends on Sunday; the last one is clipped to overall_end - 1 day
    we = d + timedelta(days=6 - d.weekday())
    if we >= overall_end:
        we = overall_end - timedelta(days=1)
    weeks.append((ws, we))
    d = we + timedelta(days=1)
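# Sketch only (not part of the original script): the week-bucketing loop above,
# wrapped as a reusable function so it can be checked in isolation.
# `split_weeks` is a hypothetical name. It assumes `end` is exclusive and that
# every bucket runs through Sunday, with the last bucket clipped to end - 1 day.
from datetime import date, timedelta


def split_weeks(start, end):
    """Return (week_start, week_end) pairs covering the half-open range [start, end)."""
    buckets = []
    day = start
    while day < end:
        week_end = day + timedelta(days=6 - day.weekday())
        if week_end >= end:
            week_end = end - timedelta(days=1)
        buckets.append((day, week_end))
        day = week_end + timedelta(days=1)
    return buckets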
print("Classifying paid users...")
cur.execute("""
    SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date,
           CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
                WHEN o.goods_id = 61 THEN 'L1+L2'
                WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
                ELSE '其他' END AS level_type
    FROM bi_vala_order o
    INNER JOIN bi_vala_app_account a ON o.account_id = a.id
    WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
""")
orders = cur.fetchall()
cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3")
refund_trades = set(r[0] for r in cur.fetchall())
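user_levels = defaultdict(set)
user_orders = defaultdict(list)
for aid, trade_no, order_status, pay_date, lt in orders:
    # Treated as refunded only when order_status is 4 and the trade_no
    # appears in bi_refund_order with status = 3
    is_refunded = (order_status == 4 and trade_no in refund_trades)
    user_levels[aid].add(lt)
    user_orders[aid].append((pay_date.date(), is_refunded))


def is_paid(aid, as_of):
    """True if the account has at least one non-refunded order paid on or before as_of."""
    return any(pd <= as_of and not ref for pd, ref in user_orders[aid])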
# Grouping: L1 group = L1-only + L1+L2; L2 group = L2-only + L1+L2
l1_group = set()  # every account that bought an L1 product
l2_group = set()  # every account that bought an L2 product
for aid, levels in user_levels.items():
    has_l1 = 'L1' in levels or 'L1+L2' in levels
    has_l2 = 'L2' in levels or 'L1+L2' in levels
    if has_l1:
        l1_group.add(aid)
    if has_l2:
        l2_group.add(aid)
print(f"L1 paid group: {len(l1_group)}, L2 paid group: {len(l2_group)}, overlap (L1+L2): {len(l1_group & l2_group)}")
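print("Querying course consumption...")
cons_map = {}  # (user_id, chapter_id) -> earliest completion date
for ti in range(8):  # play records are split across tables _0 .. _7
    tbl = f"bi_user_chapter_play_record_{ti}"
    # Pass the date bounds as query parameters instead of repeating the literals
    cur.execute(f"""SELECT user_id, chapter_id, updated_at FROM {tbl}
        WHERE play_status = 1 AND updated_at >= %s AND updated_at < %s""",
                (overall_start, overall_end))
    for uid, cid, ua in cur.fetchall():
        if cid in u0_chapters:
            continue  # exclude the U0 prologue chapters
        key = (uid, cid)
        d = ua.date() if hasattr(ua, 'date') else datetime.strptime(str(ua)[:10], '%Y-%m-%d').date()
        if key not in cons_map or d < cons_map[key]:
            cons_map[key] = d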
print("Mapping characters to accounts...")
all_uids = list(set(k[0] for k in cons_map))
char2acct = {}
for i in range(0, len(all_uids), 500):
    batch = all_uids[i:i+500]
    ph = ','.join(['%s'] * len(batch))
    cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch)
    for cid, aid in cur.fetchall():
        char2acct[cid] = aid
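# Sketch only (not part of the original script): the fixed-size batching used
# above for the IN (...) lookup, written as a generic helper. `chunked` is a
# hypothetical name; a size of 500 would mirror the loop above.
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]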
print("Aggregating by week...")
results = []
for ws, we in weeks:
    # Paid users as of the end of this week
    l1_paid = {aid for aid in l1_group if is_paid(aid, we)}
    l2_paid = {aid for aid in l2_group if is_paid(aid, we)}
    l1_cons, l1_cons_users = 0, set()
    l2_cons, l2_cons_users = 0, set()
    for (uid, ch_id), cons_date in cons_map.items():
        if ws <= cons_date <= we:
            aid = char2acct.get(uid)
            if not aid:
                continue
            # An L1+L2 account belongs to both groups, so it is counted in each
            if aid in l1_paid:
                l1_cons += 1
                l1_cons_users.add(aid)
            if aid in l2_paid:
                l2_cons += 1
                l2_cons_users.add(aid)
    results.append({
        'ws': ws, 'we': we,
        'L1_paid': len(l1_paid), 'L1_cons': l1_cons, 'L1_cons_users': len(l1_cons_users),
        'L1_no_cons': len(l1_paid) - len(l1_cons_users),
        'L1_avg_all': round(l1_cons / len(l1_paid), 2) if l1_paid else 0,
        'L1_avg_cons': round(l1_cons / len(l1_cons_users), 2) if l1_cons_users else 0,
        'L2_paid': len(l2_paid), 'L2_cons': l2_cons, 'L2_cons_users': len(l2_cons_users),
        'L2_no_cons': len(l2_paid) - len(l2_cons_users),
        'L2_avg_all': round(l2_cons / len(l2_paid), 2) if l2_paid else 0,
        'L2_avg_cons': round(l2_cons / len(l2_cons_users), 2) if l2_cons_users else 0,
    })
cur.close()
conn.close()
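# Sketch only (not part of the original script): the two weekly averages
# computed above, isolated so the zero-denominator guards are easy to check.
# `weekly_averages` and its parameter names are hypothetical.
def weekly_averages(n_cons, n_paid, n_active):
    """Return (average per paid user, average per user with consumption)."""
    avg_all = round(n_cons / n_paid, 2) if n_paid else 0
    avg_active = round(n_cons / n_active, 2) if n_active else 0
    return avg_all, avg_active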
# ===== Generate charts =====
print("\nGenerating charts...")
out = '/root/.openclaw/workspace/output'
configs = {
    'L1_all': {'prefix': 'L1', 'color': '#4A90D9', 'light': '#A8CFF1', 'label': 'L1'},
    'L2_all': {'prefix': 'L2', 'color': '#E85D47', 'light': '#F4A9A0', 'label': 'L2'},
}
for key, cfg in configs.items():
    pfx = cfg['prefix']
    color = cfg['color']
    light = cfg['light']
    label = cfg['label']
    # Skip leading weeks that have no paid users yet (start at week 0 if none qualify)
    first = next((i for i, r in enumerate(results) if r[f'{pfx}_paid'] > 0), 0)
    data = results[first:]
    xs = [r['ws'] + timedelta(days=3) for r in data]  # mid-week x positions
    dates = [r['ws'] for r in data]
    paid = [r[f'{pfx}_paid'] for r in data]
    cons_users = [r[f'{pfx}_cons_users'] for r in data]
    no_cons = [r[f'{pfx}_no_cons'] for r in data]
    avg_all = [r[f'{pfx}_avg_all'] for r in data]
    avg_cons = [r[f'{pfx}_avg_cons'] for r in data]

    # Chart 1: stacked bars (users with vs. without course consumption)
    fig, ax = plt.subplots(figsize=(18, 8))
    x_idx = np.arange(len(xs))
    bar_w = 0.65
    ax.bar(x_idx, cons_users, bar_w, color=light, label='有课消用户', zorder=3)
    ax.bar(x_idx, no_cons, bar_w, bottom=cons_users, color='#D0D0D0', label='无课消用户', zorder=3)
    step = max(1, len(data) // 10)
    for i in range(0, len(data), step):
        ax.annotate(str(paid[i]), (i, paid[i]), textcoords='offset points', xytext=(0, 5),
                    fontsize=7.5, ha='center', color='#333333', fontweight='bold')
    ax.set_xticks(x_idx[::step])
    ax.set_xticklabels([dates[i].strftime('%m/%d') for i in range(0, len(data), step)], fontsize=8.5, rotation=45)
    ax.set_ylabel('用户数', fontsize=13)
    ax.set_title(f'{label}付费用户周课消分布（剔除U0序章）', fontsize=16, fontweight='bold')
    ax.legend(fontsize=12, loc='upper left')
    ax.grid(axis='y', alpha=0.3, zorder=0)
    ax.set_xlim(-0.5, len(x_idx) - 0.5)
    # Paid-user count and no-consumption rate of the latest week
    no_rate = no_cons[-1] / paid[-1] * 100 if paid[-1] else 0
    ax.text(0.97, 0.95, f'付费{paid[-1]}人 | 无课消率{no_rate:.0f}%', transform=ax.transAxes,
            fontsize=11, ha='right', va='top', color='#666666', fontstyle='italic')
    plt.tight_layout()
    plt.savefig(f'{out}/{key}_users_stack.png', dpi=150, bbox_inches='tight', facecolor='white')
    plt.close()
    print(f'{key}_users_stack.png')

    # Chart 2: line chart (average weekly consumption)
    fig, ax = plt.subplots(figsize=(18, 8))
    ax.plot(xs, avg_all, 'o-', color='#999999', linewidth=2.2, markersize=5,
            label='人均课消(全部付费用户)', markerfacecolor='white')
    ax.plot(xs, avg_cons, 's-', color=color, linewidth=2.8, markersize=5,
            label='人均课消(有课消用户)', markerfacecolor='white')
    ax.fill_between(xs, avg_all, avg_cons, alpha=0.08, color=color)
    for i in range(0, len(data), max(1, len(data) // 8)):
        ax.annotate(f'{avg_all[i]:.1f}', (xs[i], avg_all[i]), textcoords='offset points',
                    xytext=(0, -15), fontsize=7.5, color='#999999', ha='center')
        ax.annotate(f'{avg_cons[i]:.1f}', (xs[i], avg_cons[i]), textcoords='offset points',
                    xytext=(0, 7), fontsize=7.5, color=color, ha='center', fontweight='bold')
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, fontsize=9)
    ax.set_ylabel('课消数(节/周)', fontsize=13)
    ax.set_title(f'{label}付费用户周人均课消趋势（剔除U0序章）', fontsize=16, fontweight='bold')
    ax.legend(fontsize=12, loc='upper left')
    ax.grid(True, alpha=0.3)
    ax.set_xlim(date(2025, 8, 30), date(2026, 5, 12))
    plt.tight_layout()
    plt.savefig(f'{out}/{key}_avg_trend.png', dpi=150, bbox_inches='tight', facecolor='white')
    plt.close()
    print(f'{key}_avg_trend.png')
print('\n✅ 4 charts generated')

385
scripts/generate_excel.py Normal file

@ -0,0 +1,385 @@
#!/usr/bin/env python3
"""
Generate the course-consumption metrics Excel workbook (weekly, split by L1/L2).
"""
import psycopg2
from collections import defaultdict
from datetime import datetime, timedelta, date
import openpyxl
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from openpyxl.chart import LineChart, Reference
from openpyxl.utils import get_column_letter
from openpyxl.chart.label import DataLabelList
from openpyxl.chart.series import DataPoint
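import os

# Credentials should not be committed to the repository; read the password from
# the environment instead. VALA_BI_PASSWORD is an assumed variable name.
conn = psycopg2.connect(
    host="bj-postgres-16pob4sg.sql.tencentcdb.com",
    port=28591, user="ai_member",
    password=os.environ["VALA_BI_PASSWORD"], dbname="vala_bi"
)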
cur = conn.cursor()
# ===== Time parameters =====
overall_start = date(2025, 9, 1)
overall_end = date(2026, 5, 11)  # exclusive upper bound
weeks = []
d = overall_start
while d < overall_end:
    ws = d
    # Each bucket ends on Sunday; the last one is clipped to overall_end - 1 day
    days_to_sunday = 6 - d.weekday()
    we = d + timedelta(days=days_to_sunday)
    if we >= overall_end:
        we = overall_end - timedelta(days=1)
    weeks.append((ws, we))
    d = we + timedelta(days=1)
# ===== Step 1: classify users =====
print("Step 1: classifying paid users...")
cur.execute("""
    SELECT o.account_id, o.trade_no, o.order_status, o.pay_success_date,
           CASE WHEN o.goods_id IN (57, 60, 63) THEN 'L1'
                WHEN o.goods_id = 61 THEN 'L1+L2'
                WHEN o.goods_id IN (31, 32, 33, 54) THEN 'L2'
                ELSE '其他' END AS level_type
    FROM bi_vala_order o
    INNER JOIN bi_vala_app_account a ON o.account_id = a.id
    WHERE a.status = 1 AND a.deleted_at IS NULL AND o.pay_success_date IS NOT NULL
""")
orders = cur.fetchall()
print(f"  Orders: {len(orders)}")
cur.execute("SELECT trade_no FROM bi_refund_order WHERE status = 3")
refund_trades = set(r[0] for r in cur.fetchall())
user_data = defaultdict(lambda: {'levels': set(), 'orders': []})
for aid, trade_no, order_status, pay_date, lt in orders:
    # Treated as refunded only when order_status is 4 and the trade_no
    # appears in bi_refund_order with status = 3
    is_refunded = (order_status == 4 and trade_no in refund_trades)
    user_data[aid]['levels'].add(lt)
    user_data[aid]['orders'].append((pay_date.date(), is_refunded, lt))


def classify_user(levels):
    """Map the set of purchased level types to a single category label."""
    has_l1, has_l2 = 'L1' in levels, 'L2' in levels
    if 'L1+L2' in levels or (has_l1 and has_l2):
        return 'L1+L2'
    if has_l1:
        return '仅L1'  # L1 only
    if has_l2:
        return '仅L2'  # L2 only
    return '其他'  # other


for aid in user_data:
    user_data[aid]['category'] = classify_user(user_data[aid]['levels'])


def is_paid_as_of(aid, as_of_date):
    """True if the account has at least one non-refunded order paid on or before as_of_date."""
    return any(pd <= as_of_date and not ref for pd, ref, lt in user_data[aid]['orders'])
# ===== Step 2: course consumption =====
print("Step 2: querying course consumption...")
consumption_map = {}  # (user_id, chapter_id) -> earliest completion date
for table_idx in range(8):  # play records are split across tables _0 .. _7
    tbl = f"bi_user_chapter_play_record_{table_idx}"
    # Pass the date bounds as query parameters instead of repeating the literals
    cur.execute(f"""
        SELECT user_id, chapter_id, updated_at
        FROM {tbl}
        WHERE play_status = 1 AND updated_at >= %s AND updated_at < %s
    """, (overall_start, overall_end))
    for user_id, chapter_id, updated_at in cur.fetchall():
        key = (user_id, chapter_id)
        d = updated_at.date() if hasattr(updated_at, 'date') else datetime.strptime(str(updated_at)[:10], '%Y-%m-%d').date()
        if key not in consumption_map or d < consumption_map[key]:
            consumption_map[key] = d
print(f"  After dedup: {len(consumption_map)}")
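# Sketch only (not part of the original script): the dedup rule used in Step 2,
# keeping the earliest completion date per (user_id, chapter_id), shown on
# in-memory rows. `earliest_completions` is a hypothetical name and the rows
# in any example call are made up.
from datetime import date


def earliest_completions(records):
    """records: iterable of (user_id, chapter_id, day); returns {(user_id, chapter_id): earliest day}."""
    earliest = {}
    for user_id, chapter_id, day in records:
        key = (user_id, chapter_id)
        if key not in earliest or day < earliest[key]:
            earliest[key] = day
    return earliest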
# ===== Step 3: map characters to accounts =====
print("Step 3: mapping characters to accounts...")
all_uids = list(set(k[0] for k in consumption_map))
char2acct = {}
bs = 500  # batch size for the IN (...) lookup
for i in range(0, len(all_uids), bs):
    batch = all_uids[i:i+bs]
    ph = ','.join(['%s'] * len(batch))
    cur.execute(f"SELECT id, account_id FROM bi_vala_app_character WHERE id IN ({ph})", batch)
    for cid, aid in cur.fetchall():
        char2acct[cid] = aid
print(f"  Mapped: {len(char2acct)}")
# ===== Step 4: aggregate by week =====
print("Step 4: aggregating by week...")
results = []
for ws, we in weeks:
    # Paid users as of the end of this week, grouped by category
    paid_by_cat = defaultdict(set)
    for aid in user_data:
        if is_paid_as_of(aid, we):
            paid_by_cat[user_data[aid]['category']].add(aid)
    cons_by_cat = defaultdict(int)
    cons_users_by_cat = defaultdict(set)
    for (uid, ch_id), cons_date in consumption_map.items():
        if ws <= cons_date <= we:
            aid = char2acct.get(uid)
            if aid:
                cat = user_data.get(aid, {}).get('category', '其他')
                if aid in paid_by_cat.get(cat, set()):
                    cons_by_cat[cat] += 1
                    cons_users_by_cat[cat].add(aid)
    row = {'ws': ws, 'we': we}
    for cat in ['仅L1', '仅L2', 'L1+L2', '其他', '合计']:
        if cat == '合计':  # grand total across all categories, including '其他'
            n_paid = sum(len(v) for v in paid_by_cat.values())
            n_cons = sum(cons_by_cat.values())
            n_cons_users = len(set.union(*cons_users_by_cat.values())) if cons_users_by_cat else 0
        else:
            n_paid = len(paid_by_cat.get(cat, set()))
            n_cons = cons_by_cat.get(cat, 0)
            n_cons_users = len(cons_users_by_cat.get(cat, set()))
        row[f'{cat}_paid'] = n_paid
        row[f'{cat}_cons'] = n_cons
        row[f'{cat}_cons_users'] = n_cons_users
        row[f'{cat}_avg_all'] = round(n_cons / n_paid, 2) if n_paid > 0 else 0
        row[f'{cat}_avg_cons'] = round(n_cons / n_cons_users, 2) if n_cons_users > 0 else 0
    results.append(row)
cur.close()
conn.close()
# ===== Build the Excel workbook =====
print("\nBuilding Excel workbook...")
wb = openpyxl.Workbook()
# Styles
header_font = Font(name='微软雅黑', bold=True, size=10, color='FFFFFF')
header_fill = PatternFill(start_color='2F5496', end_color='2F5496', fill_type='solid')
data_font = Font(name='微软雅黑', size=10)
title_font = Font(name='微软雅黑', bold=True, size=14, color='2F5496')
subtitle_font = Font(name='微软雅黑', bold=True, size=11, color='2F5496')
border = Border(left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin'))
center = Alignment(horizontal='center', vertical='center')
l1_fill = PatternFill(start_color='DAEEF3', end_color='DAEEF3', fill_type='solid')
l2_fill = PatternFill(start_color='FDE9D9', end_color='FDE9D9', fill_type='solid')
l1l2_fill = PatternFill(start_color='E4DFEC', end_color='E4DFEC', fill_type='solid')
total_fill = PatternFill(start_color='D9EAD3', end_color='D9EAD3', fill_type='solid')


def apply_cell(ws, row, col, value, font=data_font, fill=None, border_style=border, align=center):
    """Write a value and apply font, border, alignment and an optional fill."""
    c = ws.cell(row=row, column=col, value=value)
    c.font, c.border, c.alignment = font, border_style, align
    if fill:
        c.fill = fill
    return c


def apply_header(ws, row, col, value):
    """Write a value styled as a table header cell."""
    c = ws.cell(row=row, column=col, value=value)
    c.font, c.fill, c.border, c.alignment = header_font, header_fill, border, center
    return c
# ===== Sheet 1: overview =====
ws1 = wb.active
ws1.title = "概览"
ws1.merge_cells('A1:G1')
apply_cell(ws1, 1, 1, "付费用户 L1/L2 课消分析", font=title_font, border_style=Border(), align=Alignment(horizontal='left'))
ws1.merge_cells('A2:G2')
# The data range is derived from the time parameters; overall_end is exclusive
apply_cell(ws1, 2, 1, f"数据区间: {overall_start} ~ {overall_end - timedelta(days=1)} | 更新日期: 2026-05-14", font=Font(name='微软雅黑', size=9, color='666666'), border_style=Border(), align=Alignment(horizontal='left'))
# Metric definitions shown on the sheet
notes = [
    "口径说明:",
    "• 课消：用户首次完成某一课时（play_status=1），按(user_id,chapter_id)取最早updated_at",
    "• L1商品: goods_id IN (57,60,63) | L2商品: goods_id IN (31,32,33,54) | L1+L2商品: goods_id=61",
    "• 付费用户：status=1 + deleted_at IS NULL + 有订单 + 未全部退款",
    "• 人均课消 = 周内课消次数 / 付费用户数",
    "• 有消用户人均 = 周内课消次数 / 至少完成1次课消的付费用户数",
]
for i, note in enumerate(notes):
    apply_cell(ws1, 4+i, 1, note, font=Font(name='微软雅黑', size=9), border_style=Border(), align=Alignment(horizontal='left'))
# Summary table: paid users per category as of the last week
row = 11
ws1.merge_cells(f'A{row}:K{row}')
apply_cell(ws1, row, 1, "付费用户分类(截至最后一周)", font=subtitle_font, border_style=Border(), align=Alignment(horizontal='left'))
row += 1
headers_summary = ['分类', '付费用户数', '占比']
for j, h in enumerate(headers_summary, 1):
    apply_header(ws1, row, j, h)
row += 1
last = results[-1]
cats_data = [('仅L1', last['仅L1_paid']), ('仅L2', last['仅L2_paid']), ('L1+L2', last['L1+L2_paid'])]
total = sum(v for _, v in cats_data)
for cat, v in cats_data:
    apply_cell(ws1, row, 1, cat)
    apply_cell(ws1, row, 2, v)
    share = v / total * 100 if total else 0  # avoid dividing by zero when there are no paid users
    apply_cell(ws1, row, 3, f"{share:.1f}%")
    if '仅L1' in cat:
        fill = l1_fill
    elif '仅L2' in cat:
        fill = l2_fill
    else:
        fill = l1l2_fill
    for c in range(1, 4):
        ws1.cell(row=row, column=c).fill = fill
    row += 1
bold_font = Font(name='微软雅黑', bold=True, size=10)
apply_cell(ws1, row, 1, '合计', font=bold_font)
apply_cell(ws1, row, 2, total, font=bold_font)
apply_cell(ws1, row, 3, '100%', font=bold_font)
for c in range(1, 4):
    ws1.cell(row=row, column=c).fill = total_fill
# Recent trend summary
row += 2
ws1.merge_cells(f'A{row}:K{row}')
apply_cell(ws1, row, 1, "近期人均课消趋势", font=subtitle_font, border_style=Border(), align=Alignment(horizontal='left'))
row += 1
trend_headers = ['', '合计人均', '仅L1人均', '仅L2人均', 'L1+L2人均', '合计有消人均', '仅L1有消人均', '仅L2有消人均', 'L1+L2有消人均']
for j, h in enumerate(trend_headers, 1):
    apply_header(ws1, row, j, h)
row += 1
# Result keys in the same order as trend_headers[1:]
trend_keys = ['合计_avg_all', '仅L1_avg_all', '仅L2_avg_all', 'L1+L2_avg_all',
              '合计_avg_cons', '仅L1_avg_cons', '仅L2_avg_cons', 'L1+L2_avg_cons']
small_font = Font(name='微软雅黑', size=9)
for r in results[-8:]:  # last 8 weeks
    wl = f"{r['ws'].strftime('%m/%d')}-{r['we'].strftime('%m/%d')}"
    apply_cell(ws1, row, 1, wl, font=small_font)
    for j, k in enumerate(trend_keys, 2):
        apply_cell(ws1, row, j, r[k], font=small_font)
    row += 1
# Column widths
for col in range(1, 10):
    ws1.column_dimensions[get_column_letter(col)].width = 14
# ===== Sheet 2: weekly detail =====
ws2 = wb.create_sheet("每周明细")
# Header rows: row 1 holds the metric group names, row 2 the per-category sub-headers
row2 = 1
group_headers = [
    ('付费用户数', ['合计', '仅L1', '仅L2', 'L1+L2']),
    ('课消次数', ['合计', '仅L1', '仅L2', 'L1+L2']),
    ('有课消用户数', ['合计', '仅L1', '仅L2', 'L1+L2']),
    ('人均课消(全部付费用户)', ['合计', '仅L1', '仅L2', 'L1+L2']),
    ('人均课消(有课消用户)', ['合计', '仅L1', '仅L2', 'L1+L2']),
]
apply_header(ws2, row2, 1, '')
apply_header(ws2, row2, 2, '周一起')
apply_header(ws2, row2, 3, '周日')
col = 4
spans = []
for grp_name, cols in group_headers:
    start_col = col
    col += len(cols)
    end_col = col - 1
    if start_col < end_col:
        ws2.merge_cells(start_row=row2, start_column=start_col, end_row=row2, end_column=end_col)
    apply_header(ws2, row2, start_col, grp_name)
    spans.append((start_col, end_col, grp_name, cols))
    for ic, cname in enumerate(cols):
        apply_header(ws2, row2+1, start_col+ic, cname)
col_count = col - 1
# Data rows start at row 3, below the two header rows
suffix_by_group = {
    '付费用户数': 'paid',
    '课消次数': 'cons',
    '有课消用户数': 'cons_users',
    '人均课消(全部付费用户)': 'avg_all',
    '人均课消(有课消用户)': 'avg_cons',
}
row2 = 3
for r in results:
    wl = f"{r['ws'].strftime('%m/%d')}-{r['we'].strftime('%m/%d')}"
    apply_cell(ws2, row2, 1, wl)
    apply_cell(ws2, row2, 2, r['ws'].strftime('%Y-%m-%d'))
    apply_cell(ws2, row2, 3, r['we'].strftime('%Y-%m-%d'))
    col = 4
    for grp_name, cols in group_headers:
        for cname in cols:
            val = r[f"{cname}_{suffix_by_group[grp_name]}"]
            apply_cell(ws2, row2, col, val)
            col += 1
    row2 += 1
# Column widths
ws2.column_dimensions['A'].width = 14
ws2.column_dimensions['B'].width = 12
ws2.column_dimensions['C'].width = 12
for ci in range(4, col_count + 1):
    ws2.column_dimensions[get_column_letter(ci)].width = 10
# Freeze the first 3 columns and the 2 header rows. The original used 'D4',
# which also froze the first data row.
ws2.freeze_panes = 'D3'
# ===== Charts =====
chart_sheet = wb.create_sheet("图表")
# Chart 1: average consumption per paid user, by category
chart1 = LineChart()
chart1.title = "人均课消数(全部付费用户)"
chart1.style = 10
chart1.y_axis.title = "课消数(节/周)"
chart1.x_axis.title = None
chart1.width = 28
chart1.height = 14
chart1.y_axis.scaling.min = 0
data_row_start = 3
data_row_end = row2 - 1
# Categories (week labels)
cats_ref = Reference(ws2, min_col=1, min_row=data_row_start, max_row=data_row_end)
# First column of each metric group on the detail sheet. For the
# "average per paid user" group: total = col 16, L1-only = 17, L2-only = 18, L1+L2 = 19.
grp_col_map = {}
col = 4
for grp_name, cols in group_headers:
    grp_col_map[grp_name] = col
    col += len(cols)
start_avg = grp_col_map['人均课消(全部付费用户)']
colors = ['333333', '4A90D9', 'E85D47', '7B9E4B']  # total, L1-only, L2-only, L1+L2
for i in range(4):
    # min_row includes the sub-header in row 2 so it becomes the series title
    ref = Reference(ws2, min_col=start_avg+i, min_row=data_row_start-1, max_row=data_row_end)
    chart1.add_data(ref, titles_from_data=True)
    chart1.set_categories(cats_ref)
    s = chart1.series[i]
    s.graphicalProperties.line.solidFill = colors[i]
    s.graphicalProperties.line.width = 25000 if i == 0 else 20000
    if i > 0:
        s.graphicalProperties.line.dashStyle = 'solid'
chart_sheet.add_chart(chart1, "A1")
# Chart 2: growth in paid users
chart2 = LineChart()
chart2.title = "付费用户数增长趋势"
chart2.style = 10
chart2.y_axis.title = "用户数"
chart2.width = 28
chart2.height = 14
start_paid = grp_col_map['付费用户数']
for i in range(4):
    ref = Reference(ws2, min_col=start_paid+i, min_row=data_row_start-1, max_row=data_row_end)
    chart2.add_data(ref, titles_from_data=True)
    chart2.set_categories(cats_ref)
    s = chart2.series[i]
    s.graphicalProperties.line.solidFill = colors[i]
    s.graphicalProperties.line.width = 25000 if i == 0 else 20000
chart_sheet.add_chart(chart2, "A18")
# ===== Save =====
path = '/root/.openclaw/workspace/output/course_consumption_by_level.xlsx'
wb.save(path)
print(f"\n✅ Excel saved: {path}")
print("   Sheet 1: overview (metric definitions + recent trend)")
print(f"   Sheet 2: weekly detail ({len(results)} weeks of data)")
print("   Sheet 3: charts (average consumption trend + paid-user growth)")