🤖 Daily automatic backup - 2026-05-13 08:00:01

小溪 2026-05-13 08:00:01 +08:00
parent 4ec91e964a
commit 7447ed7c16
6 changed files with 719 additions and 6 deletions


@ -102,12 +102,23 @@
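- **In-app order criteria:** `key_from IN ('app-active-h5-0-0', 'app-sales-bj-qhm-0')`, `pay_success_date IS NOT NULL`, `order_status IN (3, 4)`
- **External order criteria:** `key_from` not in the in-app list, `pay_success_date IS NOT NULL`, `order_status = 3`
- Built on the clean-version denominator, the conversion rate and the 7-, 14- and 20-day conversion rates keep their definitions; only the denominator shrinks to clean-version users.
- **Fitted-version conversion rate [confirmed by 李承龙 (Li Chenglong), 2026-05-11]**
  - **Denominator:** effective registrations, obtained by fitting a LOESS baseline to daily new registrations and removing the registration spikes caused by marketing campaigns
  - **Method:** activity days and spillover days → replace the actual value with the fitted value; non-activity days where the actual value is below the fit → keep the actual value (floor rule)
- **Fitted-version conversion rate [confirmed by Li Chenglong 2026-05-11; implementation details added 2026-05-12]**
  - **Purpose:** remove campaign-driven registration spikes so the rate reflects "denoised" conversion efficiency
  - **How the three versions relate:** generally raw < clean < fitted; the denominator narrows at each step, so the conversion rate rises
  - **Denominator: five-step method**
    1. **LOESS fit:** run a LOESS regression (frac ≈ 0.236) on daily registrations from clean days only (neither activity nor spillover days) to obtain the natural-growth baseline
    2. **Weekday correction:** on clean days, divide each weekday's average registrations by the overall mean and use that factor to correct the LOESS baseline (weekends usually register more users than weekdays; factors range roughly from 0.85 to 1.25)
    3. **Activity + spillover days:** replace actual registrations with the weekday-corrected LOESS fit (suppressing campaign-driven inflation)
    4. **Non-activity days:** keep actual registrations (no suppression; registrations on these days are treated as genuine)
    5. **Monthly aggregation:** sum daily effective registrations by month to get the fitted-version denominator
  - **Activity calendar (activity + spillover days) [confirmed by Li Chenglong]**
    - 2025: 9/9-10, 9/19-23, 10/13-14, 10/16-17, 11/2, 11/7, 11/10, 11/12, 11/19, 12/3
    - 2026: 1/28-29 (1 spillover day), 2/11, 2/26-3/2 (4 spillover days), 3/5-8 (3 spillover days), 3/9, 3/12-13, 4/3-7 (4 spillover days), 4/8-10 (2 spillover days), 4/22-23 (1 spillover day), 4/28, 5/6-7
    - 45 activity/spillover days in total, 18% of the 254-day window
    - Spillover days: the N days after an activity day on which registrations still overflow from the campaign; they are replaced as well
  - **External orders ignored:** the fitted denominator uses the fitted effective registrations directly, without further removing external-only users
  - **Numerator:** in-app paying users, defined as in the raw version (`key_from IN` the in-app list, `order_status IN (3,4)`)
  - The fitted version removes more "noise", so its conversion rates are higher and more volatile, reflecting conversion efficiency once low-quality campaign traffic is stripped out
  - **Numerator:** in-app paying users, defined as in the raw version (`key_from IN` the in-app list, `order_status IN (3,4)`), excluding users whose in-app orders were all refunded
  - **Fitted denominator reference values (2025-09 to 2026-05):** Sep 966 / Oct 1992 / Nov 2541 / Dec 3430 / Jan 1789 / Feb 1285 / Mar 2938 / Apr 3358 / May 869
- **Keyword order-count rule:** when a query looks like "how many orders did XX sell" or "XX channel sales" (XX being a specific name, keyword or channel), return four metrics together: total order count, GMV, GSV and refund rate.
  1. Counting logic: select all orders in the `bi_vala_order` table whose `key_from` field contains the keyword
  2. Metric definitions: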
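The activity calendar above can be checked mechanically. The sketch below expands the inclusive date ranges (copied from the calendar, spillover days included) into a set of days and compares the count with the analysis window. It assumes the window runs from 2025-09-01 through 2026-05-12, which is what the quoted 254 days implies; `ACTIVITY_RANGES` and `expand_ranges` are illustrative names only.

```python
from datetime import date, timedelta

# Inclusive (start, end) ranges from the activity calendar, spillover days included.
ACTIVITY_RANGES = [
    ("2025-09-09", "2025-09-10"), ("2025-09-19", "2025-09-23"),
    ("2025-10-13", "2025-10-14"), ("2025-10-16", "2025-10-17"),
    ("2025-11-02", "2025-11-02"), ("2025-11-07", "2025-11-07"),
    ("2025-11-10", "2025-11-10"), ("2025-11-12", "2025-11-12"),
    ("2025-11-19", "2025-11-19"), ("2025-12-03", "2025-12-03"),
    ("2026-01-28", "2026-01-29"), ("2026-02-11", "2026-02-11"),
    ("2026-02-26", "2026-03-02"), ("2026-03-05", "2026-03-08"),
    ("2026-03-09", "2026-03-09"), ("2026-03-12", "2026-03-13"),
    ("2026-04-03", "2026-04-07"), ("2026-04-08", "2026-04-10"),
    ("2026-04-22", "2026-04-23"), ("2026-04-28", "2026-04-28"),
    ("2026-05-06", "2026-05-07"),
]


def expand_ranges(ranges):
    """Expand inclusive ISO-date ranges into a set of datetime.date objects."""
    days = set()
    for start_s, end_s in ranges:
        day, end = date.fromisoformat(start_s), date.fromisoformat(end_s)
        while day <= end:
            days.add(day)
            day += timedelta(days=1)
    return days


activity_days = expand_ranges(ACTIVITY_RANGES)
# Assumed window: 2025-09-01 .. 2026-05-12 inclusive
window_days = (date(2026, 5, 12) - date(2025, 9, 1)).days + 1
print(len(activity_days), window_days, f"{len(activity_days) / window_days:.0%}")  # 45 254 18%
```

Because 2026 is not a leap year, the range 2/26-3/2 covers five days (Feb 26, 27, 28, Mar 1, 2), which is what makes the total come out at 45.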


@ -1,6 +1,6 @@
{
"version": 1,
"updatedAt": "2026-05-11T02:51:50.465Z",
"updatedAt": "2026-05-12T07:45:29.297Z",
"entries": {
"memory:memory/2026-05-06.md:1:20": {
"key": "memory:memory/2026-05-06.md:1:20",
@ -94,6 +94,38 @@
"skill-builder",
"first"
]
},
"memory:memory/2026-05-11.md:41:49": {
"key": "memory:memory/2026-05-11.md:41:49",
"path": "memory/2026-05-11.md",
"startLine": 41,
"endLine": 49,
"source": "memory",
"snippet": "- **Monthly effective registrations:** Sep 989 / Oct 2012 / Nov 2555 / Dec 3451 / Jan 1798 / Feb 1268 / Mar 2978 / Apr 3499 - **Removal rate:** Sep 35.3% / Oct 16.6% / Nov 14.0% / Dec 2.0% / Jan 7.2% / Feb 27.3% / Mar 28.5% / Apr 38.3% - **Fitted-version conversion rate:** Sep 1.72% / Oct 1.69% / Nov 0.82% / Dec 0.72% / Jan 1.50% / Feb 1.26% / Mar 2.69% / Apr 1.86% - The three versions trend consistently (raw < clean < fitted); the fitted version amplifies fluctuations, reflecting conversion efficiency after denoising ### Activity markers (for fitting) - 2025: 9/9-10, 9/19-23, 10/13-14, 10/16-17, 11/2, 11/7, 11/10, 11/12, 11/19, 12/3 - 2026: 1/28 (1 spillover day), 2/11, 2/26 (4 spillover days), 3/5 (3 spillover days), 3/9, 3/12-13, 4/3 (4 spillover days), 4/8 (2 spillover days), 4/22 (1 spillover day), 4/28",
"recallCount": 2,
"dailyCount": 0,
"groundedCount": 0,
"totalScore": 2,
"maxScore": 1,
"firstRecalledAt": "2026-05-12T07:44:48.437Z",
"lastRecalledAt": "2026-05-12T07:45:29.297Z",
"queryHashes": [
"1a57dbf6ec49",
"ccf7251a9472"
],
"recallDays": [
"2026-05-12"
],
"conceptTags": [
"Sep 35.3",
"Oct 16.6",
"Nov 14.0",
"Dec 2.0",
"Jan 7.2",
"Feb 27.3",
"Mar 28.5",
"Apr 38.3"
]
}
}
}

memory/2026-05-12-1511.md (new file, 124 lines)

@ -0,0 +1,124 @@
# Session: 2026-05-12 15:11:44 GMT+8
- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2
- **Session ID**: 79f968a4-e253-4482-8a33-d2f5aa30dcbb
- **Source**: feishu
## Conversation Summary
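user: We once fitted new-user registrations to strip out the registrations driven by campaigns. Do you remember?
assistant: Found it! Let me look at the earlier analysis notes.
assistant: Yes. In that analysis we used **LOESS regression** to fit a baseline trend to daily new registrations, removing the registration spikes caused by marketing campaigns to obtain "effective registrations".
Core logic:
1. **LOESS smoothing:** fit the natural daily growth trend using clean data outside activity days
2. **Weekly tidal effect:** proportionally correct weekend/weekday swings (Saturdays and Sundays usually see more registrations)
3. **Activity-day marking:** replace registrations on known activity days and spillover days with the fitted value
4. **Floor rule:** when the actual value is below the fitted value, keep the actual value (only campaign-driven inflation is replaced)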
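The steps above can be sketched in a few lines of NumPy. This is a hedged illustration, not the analysis script: `loess_1d` is a simplified stand-in for statsmodels' `lowess` (local linear fit with tricube weights, but without the `it=3` robustness iterations the script uses), and `fitted_registrations` is a hypothetical helper. It follows the five-step version recorded in memory: activity days take the weekday-corrected fit, all other days keep their actual count. It assumes the daily arrays are sorted by day and contain at least three clean days.

```python
import numpy as np


def loess_1d(x, y, frac):
    """Local-linear LOESS with tricube weights and no robustness iterations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # points in each local window
    fitted = np.empty(n)
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        h = np.sort(dist)[k - 1]  # distance to the k-th nearest point
        u = dist / h if h > 0 else np.zeros(n)
        w = np.where(u < 1, (1 - u**3) ** 3, 0.0)  # tricube weights
        # Weighted least squares for y ~ a + b * (x - x0); a is the fit at x0.
        sw = np.sqrt(w)
        X = np.column_stack([np.ones(n), x - x0])
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted


def fitted_registrations(day_num, weekday, reg_count, is_activity, frac=0.236):
    """Daily effective registrations (steps 1-4); summing by month gives step 5."""
    day_num = np.asarray(day_num, dtype=float)
    weekday = np.asarray(weekday)
    reg_count = np.asarray(reg_count, dtype=float)
    is_activity = np.asarray(is_activity, dtype=bool)
    clean = ~is_activity
    # Step 1: LOESS on clean days only, linearly interpolated onto every day
    clean_fit = loess_1d(day_num[clean], reg_count[clean], frac)
    baseline = np.interp(day_num, day_num[clean], clean_fit)
    # Step 2: weekday factor = clean-day mean for that weekday / overall clean-day mean
    overall = reg_count[clean].mean()
    factor = np.ones_like(baseline)
    for wd in np.unique(weekday):
        sel = clean & (weekday == wd)
        if sel.any():
            factor[weekday == wd] = reg_count[sel].mean() / overall
    # Steps 3-4: activity days take the corrected fit, other days keep actual values
    return np.where(is_activity, baseline * factor, reg_count)


# Toy series: flat 10 registrations/day with a campaign spike on day 10
days = np.arange(28)
counts = np.full(28, 10.0)
counts[10] = 90.0
print(fitted_registrations(days, days % 7, counts, days == 10)[10])  # ≈ 10.0
```

Fitting only on clean days keeps the spike from dragging the baseline up; interpolating the clean-day fit onto activity days is what lets the campaign value be replaced without ever being seen by the smoother.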
Conclusions at the time (2025-09 to 2026-04):
| Month | Actual registrations | Effective registrations | Removed | Removal rate |
|------|:---:|:---:|:---:|:---:|
| 2025-09 | — | — | — | — |
| 2025-10 | — | — | — | — |
| 2026-03 | — | 4,167 | — | — |
| 2026-04 | — | 4,283 | — | 15.2% |
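Chenglong, do you want to apply "effective registrations" to today's conversion-rate calculation as well, i.e. replace the "clean-version registered users" denominator with "clean-version registered users with campaign effects removed"?
user: We're building a fitted-version conversion rate whose denominator is the fitted effective registrations. There is no need to consider whether users have external orders; just use the fitted count as the denominator.
assistant: Understood: the fitted-version denominator is simply the fitted effective registration count, regardless of external orders. I'll pull the daily registration data, run the fit, and then compute the conversion rates.
assistant: The 7D/14D/20D converted-user counts used the first day of the month instead of each user's registration date as the start of the window, so they need fixing. Let me recompute: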
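The window bug described in the last message is easy to reproduce on toy data. The sketch below is illustrative only: `converted_within` is a hypothetical helper, the user records are made up, and it assumes an inclusive window (day 0 through day N after registration), which the transcript does not specify.

```python
from datetime import date


def converted_within(reg_date, first_pay_date, n_days):
    """True if the first paid order falls within n_days of reg_date (inclusive)."""
    if first_pay_date is None:
        return False
    return 0 <= (first_pay_date - reg_date).days <= n_days


users = [  # hypothetical (registration date, first paid order date) pairs
    (date(2026, 3, 20), date(2026, 3, 26)),
    (date(2026, 3, 2), date(2026, 3, 30)),
    (date(2026, 3, 15), None),
]
month_start = date(2026, 3, 1)

# Correct: each user's window starts at their own registration date
by_reg = sum(converted_within(reg, pay, 7) for reg, pay in users)
# Buggy: every window starts at the first day of the registration month
by_month_start = sum(converted_within(month_start, pay, 7) for _, pay in users)
print(by_reg, by_month_start)  # 1 0
```

Anchoring on the month start undercounts anyone who registered late in the month, which is why the 7D/14D/20D figures had to be recomputed.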
assistant: Data corrected:
## 📊 Fitted-version conversion rate
| Month | Fitted registrations | Conv% | 7D% | 14D% | 20D% |
|------|:---:|:---:|:---:|:---:|:---:|
| 2025-09 | 989 | 1.72% | 0.40% | 0.51% | 0.81% |
| 2025-10 | 2,012 | 1.69% | 0.75% | 0.89% | 0.99% |
| 2025-11 | 2,555 | 0.82% | 0.43% | 0.55% | 0.63% |
| 2025-12 | 3,451 | 0.72% | 0.29% | 0.55% | 0.58% |
| 2026-01 | 1,798 | 1.50% | 0.89% | 1.06% | 1.28% |
| 2026-02 | 1,268 | 1.26% | 0.79% | 0.87% | 0.87% |
| **2026-03** | 2,978 | **2.69%** | 1.24% | 1.95% | 2.38% |
| 2026-04 | 3,499 | 1.86% | 1.26% | 1.71% | 1.80% |
### Three-version comparison
| Month | Raw | Clean | Fitted |
|------|:---:|:---:|:---:|
| 2025-09 | 1.11% | 1.34% | **1.72%** |
| 2025-10 | 1.41% | 1.59% | 1.69% |
| 2025-11 | 0.71% | 0.74% | 0.82% |
| 2025-12 | 0.71% | 0.75% | 0.72% |
| 2026-01 | 1.39% | 1.49% | 1.50% |
| 2026-02 | 0.92% | 1.01% | 1.26% |
| 2026-03 | 1.92% | 2.15% | **2.69%** |
| 2026-04 | 1.15% | 1.35% | 1.86% |
---
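All three versions move in the same direction. Because the fitted version strips out low-quality campaign traffic, its conversion rates are higher and swing more. The fitted rate reached 2.69% in March, which suggests solid underlying conversion efficiency once the noise is removed.
Should I save this to memory?
user: Save it to memory
assistant: Saved. Today we settled on three conversion-rate definitions:
| Version | Denominator | Purpose |
|------|------|------|
| **Raw** | All registered users | Baseline metric |
| **Clean** | Excludes external-only users | Focuses on users with in-app conversion potential |
| **Fitted** | LOESS-fitted effective registrations | Removes campaign noise to reflect true conversion efficiency |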
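user: For all three versions, compute a combined total from September 2025 to date
assistant: ## 📊 Three-version totals, 2025-09 to 2026-05
| Version | Denominator | Converted users | Conversion rate | 7-day | 14-day | 20-day |
|------|:---:|:---:|:---:|:---:|:---:|:---:|
| Raw | 24,805 | 292 | **1.18%** | 0.62% | 0.85% | 0.96% |
| Clean | 22,321 | 292 | **1.31%** | 0.69% | 0.95% | 1.07% |
| Fitted | 19,397 | 292 | **1.51%** | 0.79% | 1.09% | 1.23% |
The numerator is the same (292 users) while the denominator narrows step by step: raw 24,805 → clean 22,321 (external-only users removed) → fitted 19,397 (campaign noise removed as well). The conversion rate rises from 1.18% to 1.31% and then to 1.51%, a clear denoising effect.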
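The totals table is simple arithmetic on a shared numerator, which the short check below reproduces from the figures in the table (the variable names are illustrative).

```python
numerator = 292  # converted users, identical for all three versions
denominators = {"raw": 24805, "clean": 22321, "fitted": 19397}

rates = {version: numerator / denom for version, denom in denominators.items()}
for version, rate in rates.items():
    print(f"{version}: {rate:.2%}")
# raw: 1.18%
# clean: 1.31%
# fitted: 1.51%
```

Because the numerator is fixed, the ratio between any two versions' rates is just the inverse ratio of their denominators: the fitted rate is 24,805 / 19,397 ≈ 1.28 times the raw rate.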
user: Compute the average price of orders with key_from app-active-h5-0-0
and the average price of orders with key_from app-sales-bj-qhm-0
assistant: | Channel | Orders | Mean price | Median | Min | Max |
|------|:---:|:---:|:---:|:---:|:---:|
| `app-active-h5-0-0` | 296 | ¥1,092.92 | ¥599.00 | ¥599.00 | ¥1,999.00 |
| `app-sales-bj-qhm-0` | 34 | ¥2,251.71 | ¥1,999.00 | ¥599.00 | ¥3,598.00 |
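`app-sales-bj-qhm-0` has an average price roughly twice that of `app-active-h5-0-0` (¥2,251.71 vs ¥1,092.92), and its median sits at the ¥1,999 tier. It looks like a sales channel that pushes higher-priced products: low volume, but clearly higher value per order.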
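A per-channel price summary like the table above amounts to grouping order amounts by `key_from` and taking count, mean, median, min and max. The sketch uses only the standard library and hypothetical sample orders; it assumes amounts are already in yuan and that the order filter (which statuses count) has been applied beforehand, neither of which the transcript states.

```python
from collections import defaultdict
from statistics import mean, median


def price_summary(orders):
    """Group (key_from, amount) rows; return count/mean/median/min/max per key_from."""
    by_key = defaultdict(list)
    for key_from, amount in orders:
        by_key[key_from].append(amount)
    return {
        key: {
            "orders": len(amounts),
            "mean": round(mean(amounts), 2),
            "median": median(amounts),
            "min": min(amounts),
            "max": max(amounts),
        }
        for key, amounts in by_key.items()
    }


sample = [  # hypothetical orders, amounts in yuan
    ("app-active-h5-0-0", 599), ("app-active-h5-0-0", 599), ("app-active-h5-0-0", 1999),
    ("app-sales-bj-qhm-0", 1999), ("app-sales-bj-qhm-0", 1999), ("app-sales-bj-qhm-0", 3598),
]
for key, stats in price_summary(sample).items():
    print(key, stats)
```

Reporting the median next to the mean is what exposes skew: a mean well above the median, as for `app-active-h5-0-0`, means a minority of high-priced orders is pulling the average up.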
user: I'd like to see the three versions' conversion rates broken down by download channel
assistant: ## 📊 Three-version conversion rates by download channel (2025-09 to 2026-05)
### Raw & clean versions
| Download channel | Registered users | Clean users | Converted | Raw conversion rate | Clean conversion rate | 20-day conversion rate |
|------|:---:|:---:|:---:|:---:|:---:|:---:|
| Apple App Store | 4,949 | 4,351 | 101 | **2.04%** | **2.32%** | 1.70% |
| Huawei AppGallery | 3,095 | 2,699 | 57 | 1.84% | 2.11% | 1.49% |
| Tencent MyApp (应用宝) | 1,288 | 1,156 | 22 | 1.71% | 1.90% | 1.24% |
| Xiaomi | 1,218 | 1,102 | 20 | 1.64% | 1.81% | 1.31% |
| vivo | 430 | 405 | 10 | **2.33%** | **2.47%** | 1.86% |
| Honor | 520 | 456 | 7 | 1.35% | 1.54% | 1.35% |
| Meizu | 75 | 72 | 2 | 2.67% | 2.78% | 2.67% |
| Xueersi (学而思) | 3,888 | 3,593 | 31 | 0.80% | 0.86% | 0.67% |
| iFlytek (科大讯飞) | 5,080 | 5,038 | 24 | 0.47% | 0.48% | 0.41% |
| BBK (步步高) | 336 | 304 | 5 | 1.49% | 1.64% | 1.19% |
| Zuoyebang (作业帮) | 238 | 219 | 2 | 0.84% | 0.91% | 0.84% |
| BOE (京东方) | 362 | 274 | 3 | 0.83% | 1.09% | 0.83% |
| oppo | 443 | 410 | 4 | 0.90% | 0.98% | 0.68% |
| Seewo (希沃) | 845 | 671 | 4 | 0.47% | 0.60% | 0.12% |
| Xiaodu (小度) | 609 | 604 | 0 | 0.00% | 0.00% | 0.00% |
| (empty) | 1,396 | 934 | 0 | 0.00% | 0.00% | 0.00% |
| Official website | 33 | 32 | 0 | 0.00% | 0.00% | 0.00% |
---
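A few observations:
- **Phone-maker app stores (Apple/Huawei/vivo/Xiaomi/Meizu) convert clearly better than learning-tablet channels** (Apple 2.04%, vivo 2.33%)
- **Among learning-tablet brands only BBK does reasonably well** (1.49%); Xueersi (0.80%) and iFlytek (0.47%) convert poorly, yet iFlytek has the most registrations (5,080 users)
- **The (empty) channel has 1,396 users and zero conversions**: worth investigating; it may be an unclassified source of non-target users
The fitted version would need a separate LOESS fit per channel, but daily data for small channels is too sparse to fit well. Should the fitted version apply the overall fitted ratio to every channel, or fit the main channels separately and pool the small ones?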
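The first option in that question (apply the overall fitted ratio to every channel) is what the analysis script below does month by month. A minimal sketch of it, with `scale_fitted_by_channel` as a hypothetical helper and made-up counts:

```python
def scale_fitted_by_channel(channel_counts, fitted_total):
    """Split a month's fitted denominator across channels in proportion to actual registrations."""
    actual_total = sum(channel_counts.values())
    ratio = fitted_total / actual_total if actual_total > 0 else 1.0
    return {ch: int(round(n * ratio)) for ch, n in channel_counts.items()}


# Hypothetical month: 1,000 actual registrations, 800 after fitting
counts = {"Apple App Store": 600, "iFlytek": 400}
fitted = scale_fitted_by_channel(counts, 800)
print(fitted)  # {'Apple App Store': 480, 'iFlytek': 320}
print(f"{12 / fitted['Apple App Store']:.2%}")  # 2.50% for 12 hypothetical paying users
```

The trade-off: this assumes a campaign inflates every channel by the same proportion. A campaign aimed at one partner channel would be under-corrected there and over-corrected everywhere else, and per-channel rounding can make the channel sum differ from the monthly total by a few users.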

Binary file not shown (image, 218 KiB).

output/font_test.png (new binary file, 12 KiB; not shown)


@ -0,0 +1,546 @@
#!/usr/bin/env python3
"""
Conversion-rate analysis for three versions (raw / clean / fitted), 2025-09 to date.
Dimensions: overall / by download channel (download_channel) / by key_from.
Grouped by month.
"""
import os
import warnings

import numpy as np
import pandas as pd
import psycopg2
from statsmodels.nonparametric.smoothers_lowess import lowess

warnings.filterwarnings('ignore')
# ========== Database connection ==========
# The password is read from the environment instead of being committed;
# VALA_BI_PASSWORD is a placeholder variable name.
conn = psycopg2.connect(
    host='bj-postgres-16pob4sg.sql.tencentcdb.com',
    port=28591,
    user='ai_member',
    password=os.environ['VALA_BI_PASSWORD'],
    dbname='vala_bi'
)
START_DATE = '2025-09-01'
END_DATE = '2026-05-13'  # exclusive upper bound; the window includes today (2026-05-12)
print("=" * 70)
print("Step 1: extracting registered users...")
# ========== Extract registered users ==========
reg_sql = f"""
SELECT
a.id AS account_id,
a.created_at::date AS reg_date,
a.download_channel
FROM bi_vala_app_account a
WHERE a.created_at >= '{START_DATE}'
AND a.created_at < '{END_DATE}'
AND a.status = 1
AND a.deleted_at IS NULL
"""
reg_df = pd.read_sql(reg_sql, conn)
reg_df['reg_date'] = pd.to_datetime(reg_df['reg_date'])
reg_df['reg_month'] = reg_df['reg_date'].dt.to_period('M')
print(f"  Registered users: {len(reg_df)}")
print(f"  By month: {reg_df['reg_month'].value_counts().sort_index().to_dict()}")
# ========== Extract these users' orders ==========
account_ids = reg_df['account_id'].unique()
print(f"\nStep 2: extracting orders... ({len(account_ids)} accounts)")
# Query in batches
batch_size = 50000
all_orders = []
for i in range(0, len(account_ids), batch_size):
    batch = account_ids[i:i + batch_size]
    ids_str = ','.join(str(x) for x in batch)
    order_sql = f"""
    SELECT
        o.account_id,
        o.key_from,
        o.sale_channel,
        o.pay_success_date::date AS pay_date,
        o.order_status,
        o.pay_amount_int,
        o.trade_no,
        o.out_trade_no
    FROM bi_vala_order o
    WHERE o.account_id IN ({ids_str})
      AND o.pay_success_date IS NOT NULL
    """
    batch_df = pd.read_sql(order_sql, conn)
    all_orders.append(batch_df)
    if (i // batch_size + 1) % 5 == 0:
        print(f"  Processed {min(i + batch_size, len(account_ids))}/{len(account_ids)} accounts...")
order_df = pd.concat(all_orders, ignore_index=True) if all_orders else pd.DataFrame()
print(f"  Total orders: {len(order_df)}")
if len(order_df) > 0:
    print(f"  key_from distribution: {order_df['key_from'].value_counts().head(10).to_dict()}")
# ========== Extract refund records ==========
print("\nStep 3: extracting refunds...")
if len(order_df) > 0:
    # Refunds are matched on trade_no only; out_trade_no is collected but unused
    trade_nos = order_df['trade_no'].dropna().unique()
    out_trade_nos = order_df['out_trade_no'].dropna().unique()
    all_refunds = []
    for i in range(0, len(trade_nos), batch_size):
        batch = trade_nos[i:i + batch_size]
        ids_str = ','.join(f"'{x}'" for x in batch)
        refund_sql = f"""
        SELECT trade_no, out_trade_no, status
        FROM bi_refund_order
        WHERE trade_no IN ({ids_str}) AND status = 3
        """
        # Query errors propagate: a failed batch must stop the run rather than
        # silently understate the refund count
        all_refunds.append(pd.read_sql(refund_sql, conn))
    refund_df = pd.concat(all_refunds, ignore_index=True) if all_refunds else pd.DataFrame()
    print(f"  Refund records: {len(refund_df)}")
else:
    refund_df = pd.DataFrame()
conn.close()
# ========== Preprocessing ==========
print("\nStep 4: preprocessing...")
# In-app vs external key_from values
INTERNAL_KF = ['app-active-h5-0-0', 'app-sales-bj-qhm-0']
if len(order_df) > 0:
    order_df['is_internal'] = order_df['key_from'].isin(INTERNAL_KF)
    # Completed orders (status 3 or 4)
    order_df['is_completed'] = order_df['order_status'].isin([3, 4])
    # Refund flag, matched on trade_no
    if len(refund_df) > 0:
        refund_trade = set(refund_df['trade_no'].dropna())
        order_df['is_refunded'] = order_df['trade_no'].isin(refund_trade)
    else:
        order_df['is_refunded'] = False
    # Valid in-app paid orders (completed, refunded ones included)
    internal_orders = order_df[order_df['is_internal'] & order_df['is_completed']]
    # Valid external paid orders (status 3 only)
    external_orders = order_df[~order_df['is_internal'] & (order_df['order_status'] == 3)]
    # Per-user sets
    internal_paid_users = set(internal_orders['account_id'].unique())
    external_paid_users = set(external_orders['account_id'].unique())
    # In-app paying users, excluding users whose in-app orders were all refunded.
    # Orders are counted on is_refunded (never null) so a missing trade_no
    # cannot shrink the total and misflag a user as fully refunded.
    user_internal_orders = internal_orders.groupby('account_id').agg(
        total_orders=('is_refunded', 'count'),
        refunded_orders=('is_refunded', 'sum')
    ).reset_index()
    user_internal_orders['all_refunded'] = user_internal_orders['total_orders'] == user_internal_orders['refunded_orders']
    internal_paid_users_norefund = set(user_internal_orders[~user_internal_orders['all_refunded']]['account_id'])
    print(f"  In-app paying users: {len(internal_paid_users)}")
    print(f"  In-app paying users (excluding fully refunded): {len(internal_paid_users_norefund)}")
    print(f"  External paying users: {len(external_paid_users)}")
else:
    internal_paid_users = set()
    internal_paid_users_norefund = set()
    external_paid_users = set()
# Tag registered users
reg_df['has_internal'] = reg_df['account_id'].isin(internal_paid_users)
reg_df['has_internal_norefund'] = reg_df['account_id'].isin(internal_paid_users_norefund)
reg_df['has_external'] = reg_df['account_id'].isin(external_paid_users)
reg_df['has_no_order'] = ~reg_df['has_internal'] & ~reg_df['has_external']
# Clean version: drop users who have external orders but no in-app orders.
# has_external already rules out has_no_order, so that term is not needed.
reg_df['only_external'] = reg_df['has_external'] & ~reg_df['has_internal']
reg_df['is_clean_user'] = ~reg_df['only_external']
# ========== LOESS fit ==========
print("\nStep 5: LOESS fit of daily registrations (with activity calendar)...")
# ===== Activity calendar =====
# 2025: 9/9-10, 9/19-23, 10/13-14, 10/16-17, 11/2, 11/7, 11/10, 11/12, 11/19, 12/3
# 2026: 1/28 (+1 spillover day), 2/11, 2/26 (+4), 3/5 (+3), 3/9, 3/12-13,
#       4/3 (+4), 4/8 (+2), 4/22 (+1), 4/28, 5/6-7
activity_ranges = [
    ('2025-09-09', '2025-09-10'),
    ('2025-09-19', '2025-09-23'),
    ('2025-10-13', '2025-10-14'),
    ('2025-10-16', '2025-10-17'),
    ('2025-11-02', '2025-11-02'),
    ('2025-11-07', '2025-11-07'),
    ('2025-11-10', '2025-11-10'),
    ('2025-11-12', '2025-11-12'),
    ('2025-11-19', '2025-11-19'),
    ('2025-12-03', '2025-12-03'),
    # 2026
    ('2026-01-28', '2026-01-29'),  # 1 spillover day
    ('2026-02-11', '2026-02-11'),
    ('2026-02-26', '2026-03-02'),  # 4 spillover days
    ('2026-03-05', '2026-03-08'),  # 3 spillover days
    ('2026-03-09', '2026-03-09'),
    ('2026-03-12', '2026-03-13'),
    ('2026-04-03', '2026-04-07'),  # 4 spillover days
    ('2026-04-08', '2026-04-10'),  # 2 spillover days
    ('2026-04-22', '2026-04-23'),  # 1 spillover day
    ('2026-04-28', '2026-04-28'),
    ('2026-05-06', '2026-05-07'),
]
# Expand the inclusive ranges into a set of dates
activity_dates = set()
for start_s, end_s in activity_ranges:
    for d in pd.date_range(start_s, end_s):
        activity_dates.add(d.date())
print(f"  Activity + spillover days: {len(activity_dates)}")
daily_reg = reg_df.groupby('reg_date')['account_id'].count().reset_index()
daily_reg.columns = ['reg_date', 'reg_count']
daily_reg = daily_reg.sort_values('reg_date')
daily_reg['is_activity'] = daily_reg['reg_date'].apply(lambda d: d.date() in activity_dates)
# Weekday factor (computed from clean days)
daily_reg['weekday'] = daily_reg['reg_date'].dt.dayofweek  # 0 = Monday
daily_reg['is_weekend'] = daily_reg['weekday'] >= 5
# Day index used as the LOESS x-axis
start_dt = pd.Timestamp(START_DATE)
daily_reg['day_num'] = (daily_reg['reg_date'] - start_dt).dt.days
# Clean days = neither activity days nor spillover days
clean_days = daily_reg[~daily_reg['is_activity']]
if len(clean_days) > 0:
    overall_avg = clean_days['reg_count'].mean()
    weekday_avg = clean_days.groupby('weekday')['reg_count'].mean()
    daily_reg['weekday_factor'] = daily_reg['weekday'].map(
        weekday_avg / overall_avg
    ).fillna(1.0)
else:
    daily_reg['weekday_factor'] = 1.0
# LOESS fit on clean days only
if len(clean_days) >= 5:
    # frac = 60 days / total days, clamped to [0.1, 0.3] (60 / 254 ≈ 0.236)
    frac_val = min(0.3, 60.0 / len(daily_reg))
    frac_val = max(frac_val, 0.1)
    loess_result = lowess(
        clean_days['reg_count'].values,
        clean_days['day_num'].values,
        frac=frac_val,
        it=3
    )
    # lowess returns [x, fitted] rows sorted by x; linearly interpolate the
    # baseline onto every day, activity days included
    all_loess = np.interp(daily_reg['day_num'].values, loess_result[:, 0], loess_result[:, 1])
    daily_reg['loess_baseline'] = all_loess
else:
    frac_val = float('nan')  # too few clean days: no fit, baseline = actual
    daily_reg['loess_baseline'] = daily_reg['reg_count']
# Weekday-corrected fitted value
daily_reg['corrected_fitted'] = daily_reg['loess_baseline'] * daily_reg['weekday_factor']
# Rule: activity/spillover days -> fitted value; all other days -> actual value
daily_reg['fitted_reg'] = np.where(
    daily_reg['is_activity'],
    daily_reg['corrected_fitted'],  # activity day: replace with the fitted value
    daily_reg['reg_count']          # non-activity day: keep the actual value
)
# Optional floor so fitted values never go below 0 (currently disabled)
# daily_reg['fitted_reg'] = np.maximum(daily_reg['fitted_reg'], 0)
act_count = daily_reg['is_activity'].sum()
print(f"  Clean days: {len(clean_days)}, activity/spillover days: {act_count}")
print(f"  LOESS frac: {frac_val:.4f}")
print(f"  Weekday factor range: {daily_reg['weekday_factor'].min():.2f} ~ {daily_reg['weekday_factor'].max():.2f}")
# Print actual vs fitted registrations by month
daily_reg['reg_month'] = daily_reg['reg_date'].dt.to_period('M')
month_fit = daily_reg.groupby('reg_month')[['reg_count', 'fitted_reg']].sum()
print(f"  {'Month':<8} {'Actual':>6} {'Fitted':>6} {'Removed':>7} {'Rate':>7}")
for m, row in month_fit.iterrows():
    o, f = row['reg_count'], row['fitted_reg']
    d = o - f
    r = d / o * 100 if o > 0 else 0
    print(f"  {str(m):<8} {int(o):>6} {int(f):>6} {int(d):>7} {r:>6.1f}%")
# Monthly fitted totals (the fitted-version denominator)
monthly_fitted = daily_reg.groupby('reg_month')['fitted_reg'].sum().reset_index()
monthly_fitted.columns = ['reg_month', 'fitted_total']
# ========== Conversion rates ==========
print("\nStep 6: computing conversion rates for each version and dimension...")
# Column and label values stay in Chinese because they become the Excel headers:
# 月份 month, 维度 dimension, 渠道 channel, 版本 version (原始版 raw / 纯净版 clean /
# 拟合版 fitted), 注册用户数 registered users, 付费用户数 paying users,
# 转化率 conversion rate, 整体 overall, 全部 all, 未知 unknown.
results = []
months = sorted(reg_df['reg_month'].unique())
for month in months:
    month_users = reg_df[reg_df['reg_month'] == month]
    # ---- Raw version ----
    denom_orig = len(month_users)
    num_orig = month_users['has_internal_norefund'].sum()
    # ---- Clean version ----
    clean_users = month_users[~month_users['only_external']]
    denom_clean = len(clean_users)
    num_clean = clean_users['has_internal_norefund'].sum()
    # ---- Fitted version (numerator unchanged) ----
    fitted_row = monthly_fitted[monthly_fitted['reg_month'] == month]
    denom_fitted = fitted_row['fitted_total'].values[0] if len(fitted_row) > 0 else denom_orig
    num_fitted = num_orig
    # Overall dimension; the fitted rate uses the unrounded denominator
    for version, denom, num in [('原始版', denom_orig, num_orig),
                                ('纯净版', denom_clean, num_clean),
                                ('拟合版', denom_fitted, num_fitted)]:
        results.append({
            '月份': str(month),
            '维度': '整体',
            '渠道': '全部',
            '版本': version,
            '注册用户数': int(round(denom)),
            '付费用户数': num,
            '转化率': f"{num / denom * 100:.2f}%" if denom > 0 else '0%'
        })
    # ---- By download channel ----
    # Fitted version: scale each channel by this month's overall fitted ratio
    fitted_ratio = denom_fitted / denom_orig if denom_orig > 0 else 1.0
    # dropna=False keeps users whose channel is NULL; they are labelled 未知
    for ch, ch_users in month_users.groupby('download_channel', dropna=False):
        ch_label = ch if pd.notna(ch) and ch != '' else '未知'
        d_orig = len(ch_users)
        n_orig = ch_users['has_internal_norefund'].sum()
        ch_clean = ch_users[~ch_users['only_external']]
        d_clean = len(ch_clean)
        n_clean = ch_clean['has_internal_norefund'].sum()
        d_fitted = int(round(d_orig * fitted_ratio))
        for version, denom, num in [('原始版', d_orig, n_orig),
                                    ('纯净版', d_clean, n_clean),
                                    ('拟合版', d_fitted, n_orig)]:
            results.append({
                '月份': str(month),
                '维度': 'download_channel',
                '渠道': ch_label,
                '版本': version,
                '注册用户数': denom,
                '付费用户数': num,
                '转化率': f"{num / denom * 100:.2f}%" if denom > 0 else '0%'
            })
print(f"  Overall rows: {len([r for r in results if r['维度'] == '整体'])}")
print(f"  download_channel rows: {len([r for r in results if r['维度'] == 'download_channel'])}")
# ---- By key_from ----
# Paying users per registration month x key_from
if len(order_df) > 0:
    # Completed in-app orders, joined to each user's registration month
    internal_completed = order_df[order_df['is_internal'] & order_df['is_completed']]
    user_month_map = reg_df[['account_id', 'reg_month']].drop_duplicates('account_id')
    internal_completed = internal_completed.merge(user_month_map, on='account_id', how='inner')
    # Drop users whose in-app orders were all refunded
    user_refund_stat = internal_completed.groupby('account_id').agg(
        total=('is_refunded', 'count'),
        refunded=('is_refunded', 'sum')
    ).reset_index()
    all_refunded_users = set(
        user_refund_stat.loc[user_refund_stat['total'] == user_refund_stat['refunded'], 'account_id']
    )
    internal_completed_norefund = internal_completed[~internal_completed['account_id'].isin(all_refunded_users)]
    kf_stats = internal_completed_norefund.groupby(['reg_month', 'key_from'])['account_id'].nunique().reset_index()
    kf_stats.columns = ['reg_month', 'key_from', 'paid_users']
    # Monthly denominators: raw, clean and fitted
    month_denom = reg_df.groupby('reg_month')['account_id'].nunique().reset_index()
    month_denom.columns = ['reg_month', 'total_users']
    clean_reg = reg_df[~reg_df['only_external']]
    month_clean = clean_reg.groupby('reg_month')['account_id'].nunique().reset_index()
    month_clean.columns = ['reg_month', 'clean_users']
    month_denom = month_denom.merge(month_clean, on='reg_month', how='left')
    month_denom['clean_users'] = month_denom['clean_users'].fillna(0).astype(int)
    month_denom = month_denom.merge(monthly_fitted, on='reg_month', how='left')
    month_denom['fitted_total'] = month_denom['fitted_total'].fillna(month_denom['total_users']).round(0).astype(int)
    kf_stats = kf_stats.merge(month_denom, on='reg_month', how='left')
    # External paying users (status 3), pooled under the label 端外合计 (external total)
    external_completed = order_df[(~order_df['is_internal']) & (order_df['order_status'] == 3)]
    external_completed = external_completed.merge(user_month_map, on='account_id', how='inner')
    ext_stats = external_completed.groupby('reg_month')['account_id'].nunique().reset_index()
    ext_stats.columns = ['reg_month', 'paid_users']
    ext_stats = ext_stats.merge(month_denom, on='reg_month', how='left')
    ext_stats['key_from'] = '端外合计'
    # Three versions for every key_from row and for the external total
    for _, row in pd.concat([kf_stats, ext_stats], ignore_index=True).iterrows():
        paid = int(row['paid_users'])
        for version, denom in [('原始版', int(row['total_users'])),
                               ('纯净版', int(row['clean_users'])),
                               ('拟合版', int(row['fitted_total']))]:
            results.append({
                '月份': str(row['reg_month']),
                '维度': 'key_from',
                '渠道': row['key_from'],
                '版本': version,
                '注册用户数': denom,
                '付费用户数': paid,
                '转化率': f"{paid / denom * 100:.2f}%" if denom > 0 else '0%'
            })
print(f"  key_from rows: {len([r for r in results if r['维度'] == 'key_from'])}")
# ========== Write to Excel ==========
result_df = pd.DataFrame(results)
print(f"\nTotal rows: {len(result_df)}")
# Sort by month, then dimension, version and channel
dim_order = {'整体': 0, 'download_channel': 1, 'key_from': 2}
result_df['dim_sort'] = result_df['维度'].map(dim_order)
result_df = result_df.sort_values(['月份', 'dim_sort', '版本', '渠道']).drop(columns=['dim_sort'])
output_path = '/root/.openclaw/workspace/output/conversion_rate_202509_202605.xlsx'
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
    # Sheet 1: all rows (sheet name 全部数据 = "all data")
    result_df.to_excel(writer, sheet_name='全部数据', index=False)
    # One sheet per dimension
    for dim in ['整体', 'download_channel', 'key_from']:
        dim_df = result_df[result_df['维度'] == dim].drop(columns=['维度'])
        dim_df.to_excel(writer, sheet_name=dim[:31], index=False)  # Excel sheet names max 31 chars
    # Pivot: overall raw vs clean vs fitted (sheet name 整体对比 = "overall comparison")
    overall_df = result_df[result_df['维度'] == '整体']
    pivot = overall_df.pivot_table(
        index='月份', columns='版本', values='转化率', aggfunc='first'
    )
    pivot.to_excel(writer, sheet_name='整体对比')
print(f"\nOutput file: {output_path}")
print("Done!")
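The "exclude users whose in-app orders were all refunded" rule in Step 4 is the subtlest part of the numerator. The standard-library sketch below restates it on toy data so the edge cases are visible. `paying_users_excluding_full_refunds` is a hypothetical name, and it assumes the input is already limited to completed in-app orders, as `internal_orders` is in the script.

```python
from collections import defaultdict


def paying_users_excluding_full_refunds(orders):
    """orders: iterable of (account_id, is_refunded) for completed in-app orders.

    A user counts as paying unless every one of their orders was refunded.
    """
    total = defaultdict(int)
    refunded = defaultdict(int)
    for account_id, is_refunded in orders:
        total[account_id] += 1
        refunded[account_id] += int(bool(is_refunded))
    return {acct for acct, n in total.items() if refunded[acct] < n}


orders = [(1, False), (2, True), (2, True), (3, True), (3, False)]
print(sorted(paying_users_excluding_full_refunds(orders)))  # [1, 3]
```

User 3 stays in the numerator despite one refund because a non-refunded order remains; only user 2, with every order refunded, is dropped. This mirrors the `total_orders == refunded_orders` comparison in the script.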