diff --git a/memory/2026-06-08-1517.md b/memory/2026-06-08-1517.md new file mode 100644 index 0000000..5fe1e98 --- /dev/null +++ b/memory/2026-06-08-1517.md @@ -0,0 +1,86 @@ +# Session: 2026-06-08 15:17:08 GMT+8 + +- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2 +- **Session ID**: eaac1a29-586f-4c15-a7ff-8bebd52407ac +- **Source**: feishu + +## Conversation Summary + +assistant: 现在重新运行: + + +assistant: 已更新,运行结果: + +| 指标 | 之前 | 现在 | +|------|------|------| +| 分母 | 7(仅匹配手机号) | 16(全部行) | +| 转化率 | 28.6% | 12.5% | +| 退费率 | 0.0% | 0.0% | + +改动: +- **分母**:sheet 中所有行都纳入统计,不再只统计匹配到手机号的行 +- **未匹配行**:转化状态读取 F 列已有值(无手机号的行由销售手动标注) +- **转化率口径**:`已转化leads / 总leads`(不再剔除退费) +user: 另外,在匹配订单的时候,所有端内的订单都要进行匹配,也就是三个keyfrom都有算进来 +assistant: 脚本中已经是这样配置的,`INNER_CHANNELS` 包含了三个端内渠道: + +```python +INNER_CHANNELS = [ + "app-active-h5-0-0", + "app-sales-bj-qhm-0", + "app-sales-bj-wd-0", +] +``` + +`query_conversion` 查询转化信息时用的就是 `WHERE o.key_from IN ({channels_str})`,三个渠道的订单都会参与匹配,没有遗漏。 +user: 那你查一下29956这个用户有没有转化 +assistant: 用户 29956 有转化,明细如下: + +| 字段 | 值 | +|------|-----| +| 订单ID | 5799 | +| 交易号 | 430003087746530 | +| 渠道 | `app-active-h5-0-0`(端内) | +| 支付时间 | 2026-06-07 13:45 | +| 金额 | ¥599 | +| 状态 | 已完成 | +| 是否退费 | 否 | + +这笔订单渠道是 `app-active-h5-0-0`,属于端内三个渠道之一,脚本的 `query_conversion` 会匹配到。如果这个用户在吴迪的 sheet 里且手机号匹配成功,F 列应该已经回写为「是」。 +user: 那你看一下f列写的事情 +assistant: F 列写的是「否」,但我们刚查到 29956 在 6月7日有一笔端内订单(5799,`app-active-h5-0-0`,¥599,已完成)。这说明脚本的转化查询可能有问题,我来排查一下。 + + +assistant: 查询确认 29956 有端内订单(6月7日,`app-active-h5-0-0`,¥599),但 F 列却是「否」。这不对,我重新跑一次脚本看看。 + + + + +assistant: F 列还是「否」,但数据库确认有转化。加调试看看写入的数据。 + + + + +assistant: 找到根因了。`pg_query` 返回的行只有 3 个元素 `['29956', '2026-06-07', 'f']`,因为 `first_refund_date` 为 NULL 时 psql 不输出尾部空字段。但 `query_conversion` 里判断 `len(row) >= 4` 才处理,所以整行被跳过了。 + + + + +assistant: 已修复。F 列现在是「是」,G 列是 `2026-06-07`。 + +**根因:** `query_conversion` 里 SQL 返回 4 列,但用户没有退费时 `first_refund_date` 为 NULL,psql 不输出尾部空字段,导致行只有 3 个元素。原代码 `if len(row) >= 4` 
把所有未退费用户的转化记录都跳过了。 + +**修复:** 改为 `len(row) >= 3` 即可处理,第 4 列可选。 + +转化率也从 12.5% 更新为 18.8%(3/16)。 +user: 看一下sheet曲慧萌是不是空的,为什么统计里面还有曲慧萌的统计数据 +assistant: 曲慧萌 sheet 确实是空的(0 行),但统计 sheet 第 3 行有旧数据「曲慧萌 2026-05」。问题在于 `write_all_stats` 只追加写入,不清除旧行。修复一下。 + + + + +assistant: 已修复。统计 sheet 现在只有「吴迪 2026-06」一行,旧的「曲慧萌 2026-05」已清除。 + +**原因:** 脚本之前只追加写入,不清除旧行。曲慧萌 sheet 是空的(0 行数据),但之前跑过的旧统计残留了下来。 + +**修复:** 写入统计前先清空 A2:N50 区域,避免残留。 diff --git a/memory/2026-06-08-1608.md b/memory/2026-06-08-1608.md new file mode 100644 index 0000000..0063fdf --- /dev/null +++ b/memory/2026-06-08-1608.md @@ -0,0 +1,115 @@ +# Session: 2026-06-08 16:08:24 GMT+8 + +- **Session Key**: agent:main:feishu:direct:ou_e63ce6b760ad39382852472f28fbe2a2 +- **Session ID**: d4693f57-ecd6-4d66-b7e4-5d69f293dc60 +- **Source**: feishu + +## Conversation Summary + +user: https://makee-interactive.feishu.cn/sheets/NoZqsFi47hIOHEt9j8WcfRtbnug这个表是你创建的,你应该具有完整的读和写的权限,现在我们要针对sheet“小龙”“吴迪”“成都”“订单汇总”,创建一个脚本。 +首先,对于sheet“小龙”“吴迪”“成都”,我需要你读取E列的手机号,然后通过XXTEA加密,去用户表里面通过加密手机号进行关联匹配,然后填上H列到V列的数据,第V列为操作更新的时间。 +然后将上述三个sheet里面有下单的用户,整理到sheet“订单汇总”中。 +有任何问题可以向我提。先不要动表格,先生成脚本,并且向我解释表格里面每一列你的理解和计算口径。 +assistant: 让我先读取表格结构和现有的加密脚本,理解清楚后再动手。 + + +assistant: 让我先读取四个 sheet 的表头和几行数据,理解列结构。 + + +assistant: 现在我对表格结构有了清晰的理解。让我再看看成都 sheet 的数据和现有的 phone_encrypt 依赖。 + + +assistant: 现在我已经完整理解了表格结构和现有逻辑。让我先确认一下 XXTEA 加密依赖是否可用,然后生成脚本。 + + +assistant: 现在我对表格结构、现有逻辑和加密方式都有了完整理解。让我生成脚本。 + + +assistant: 文件被截断了,让我继续完成脚本。 + + +assistant: 脚本被截断了,让我重新完整写入。 + + +assistant: 脚本已生成并通过语法检查。下面是完整的列说明。 + +--- + +## 📊 表格列结构说明 + +四个 sheet 的 A~V 列结构完全一致(共 22 列),订单汇总额外有 W/X 公式列。 + +### 👤 手填列(销售手动填写,脚本不动) + +| 列 | 名称 | 说明 | +|----|------|------| +| **A** | 销售归属 | 销售姓名:小龙 / 吴迪 / Tom / Bob | +| **B** | 微信昵称 | 用户微信昵称 | +| **C** | 进线日期 | 线索进入日期,格式如 `6月7日`,脚本会解析为 `2026-06-07` 用于日期比较 | +| **E** | 手机号 | 11 位明文手机号,**脚本的核心输入** | +| **F** | 用户年级 | 如"大班""一年级" | +| **G** | 课史/跟进 | 课程历史/跟进备注,**脚本不动此列** | + +### 🤖 自动列(脚本计算填写) + +| 列 | 名称 | 计算口径 | +|----|------|----------| +| **D** | 
体验节数 | 从 `bi_user_course_detail` 统计 `expire_time IS NULL`(体验课)的记录数。**只补空,不覆盖已有值** | +| **H** | 用户ID | E 列手机号 → XXTEA 加密 → `bi_vala_app_account.tel_encrypt` 精确匹配 → 取 `id`。查不到留空 | +| **I** | 注册日期 | `bi_vala_app_account.created_at`,格式 `YYYY-MM-DD`。**只补空** | +| **J** | 下载渠道 | `bi_vala_app_account.download_channel`,经中文映射表转换(如 `Apple App Store` → `苹果`)。**只补空** | +| **K** | 是否下单 | 三个条件同时满足才写"是":①有支付成功订单 ②非全额退清 ③**下单日 ≥ 进线日期**。全额退清则清空 | +| **L** | 下单日期 | 最近一笔订单的 `pay_success_date`,格式 `YYYY-MM-DD` | +| **M** | 成交渠道 | 订单的 `key_from` 原始值(如 `sales-adp-bj-wd-0`、`app-active-h5-0-0`) | +| **N** | 产品 | 通过 `goods_id` 映射商品名(如 `瓦拉英语level1+2`、`瓦拉英语年包`) | +| **O** | 下单金额(GMV) | 该用户所有订单 `pay_amount_int` 之和 ÷ 100(元),含退费。0 留空 | +| **P** | 退款金额 | 该用户所有退费成功(`status=3`)的 `refund_amount_int` 之和 ÷ 100(元),取整。0 留空 | +| **Q** | 实际收入(GSV) | GMV − 退款金额。0 留空 | +| **R** | 激活课程 | 从 `bi_vala_seasonal_ticket` 取 `season_package_level`,A1→`A1体验课`,A2→`A2体验课` | +| **S** | 当前行课进度 | 最近完成的课时,格式 `L1-S0-U00-L03`(从 `bi_level_unit_lesson` 映射) | +| **T** | 最近行课时间 | 最近完成课时的日期,格式 `YYYY-MM-DD` | +| **U** | 累计学习时长(min) | 所有组件 `interval_time` 求和 ÷ 60000(分钟) | +| **V** | 更新时间 | 脚本执行时间,格式 `YYYY-MM-DD HH:MM:SS` | + +### 订单汇总额外列 + +| 列 | 名称 | 说明 | +|----|------|------| +| **W** | 渠道归属 | 🔵 公式列,自动判断:销转 / 达人 / 端内 / 直购 | +| **X** | 有效成单 | 🔵 公式列,1=计单,0=全额退 | + +--- + +## 🔧 脚本核心规则 + +1. **E→H**:XXTEA 精确匹配,查不到留空(不用前三后四模糊匹配) +2. **D/I/J 只补空**:已有值不覆盖 +3. **K=是**:有订单 + 非全额退清 + L(下单日) ≥ C(进线日期) +4. **全额退清**:GMV = 退款金额 且 > 0 → K/O/P/Q 全部清空 +5. **O/P/Q**:值为 0 时留空不写 +6. **G 列不动** + +## 📁 脚本位置 + +`scripts/sales_leads_full_refresh.py` + +运行方式: +```bash +python3 scripts/sales_leads_full_refresh.py +``` + +--- + +有几个问题想确认: + +1. **订单汇总的写入方式**:目前设计是**全量覆盖**(每次运行把 A~V 列从第 3 行开始全部重写),W/X 公式列不动。这样每次运行后订单汇总就是最新的下单用户列表。这个逻辑对吗?还是需要**追加**(只加新的,不删旧的)? + +2. **日期解析**:C 列进线日期目前是 `6月7日` 这种格式,脚本按当年解析为 `2026-06-07`。跨年场景(如 12 月→1 月)需要特殊处理吗? + +3. **订单汇总中同一用户多行**:如果同一个手机号在三个 sheet 中都出现且都下单了,订单汇总会出现多行(每个销售归属一行)。这是期望行为吗? 
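A minimal sketch of the two rules described above: parsing a column C lead date such as `6月7日` against a fixed year, and deciding whether column K should be written as 「是」. The function names `parse_lead_date` and `is_order_counted` are hypothetical and do not come from `scripts/sales_leads_full_refresh.py`; treating "refund ≥ GMV" as a full refund is also an assumption made for illustration.

```python
import re
from datetime import date
from typing import Optional


def parse_lead_date(text: str, year: int) -> Optional[date]:
    """Parse a 'M月D日' string into a date in the given year (hypothetical helper)."""
    m = re.fullmatch(r"\s*(\d{1,2})月(\d{1,2})日\s*", text or "")
    if not m:
        return None
    try:
        return date(year, int(m.group(1)), int(m.group(2)))
    except ValueError:
        # e.g. '2月30日' is not a real calendar day
        return None


def is_order_counted(lead_day: Optional[date], order_day: Optional[date],
                     gmv: float, refund: float) -> bool:
    """K=是 only if there is a paid order, it is not fully refunded,
    and the order date is on or after the lead date."""
    if lead_day is None or order_day is None or gmv <= 0:
        return False
    fully_refunded = refund >= gmv  # assumption: refund covering GMV means full refund
    return (not fully_refunded) and order_day >= lead_day


if __name__ == "__main__":
    lead = parse_lead_date("6月7日", 2026)
    print(lead, is_order_counted(lead, date(2026, 6, 8), 599, 0))
```

The year is passed in explicitly so the "always use the current year" decision stays visible at the call site rather than hidden inside the parser.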
+user: 1、全部覆盖 +2、不需要特殊处理,就都按照当年处理 +3、是的 +assistant: 好的,三点都确认了,脚本逻辑完全对齐。现在可以运行吗? +user: 脚本名字发给我 +assistant: `scripts/sales_leads_full_refresh.py` diff --git a/memory/2026-06-08-1637.md b/memory/2026-06-08-1637.md new file mode 100644 index 0000000..7bfb0f4 --- /dev/null +++ b/memory/2026-06-08-1637.md @@ -0,0 +1,284 @@ +# Session: 2026-06-08 16:37:07 GMT+8 + +- **Session Key**: agent:main:feishu:direct:ou_7ef247331804edf255f53e3be15963f4 +- **Session ID**: 1a2b1348-a300-4882-8473-fef8fda4e1ab +- **Source**: feishu + +## Conversation Summary + +user: 根据教育行业季节性趋势,预测接下来6月到12月的收入情况,考虑现有渠道的复播频次和新拓达人可能性 +assistant: 王虹茗,以下是基于数据库实际数据 + 教育行业季节性规律 + 达人运营趋势的 6-12 月收入预测分析: + +--- + +## 📊 瓦拉英语 2026年6-12月收入预测 + +### 一、预测模型方法论 + +预测基于三层模型: +1. **达人直播** = 核心达人 + 学霸系 + 其他复发达人 + 新拓达人 +2. **非达人渠道** = 端内 + 端外销售渠道(月环比增长) +3. **季节性因子** 调整各月预期 + +### 二、历史回顾(2025.09-2026.05) + +| 阶段 | 月份 | 月均 GMV | 达人特征 | +|------|------|----------|----------| +| 🧪 探索期 | 9-12月 | ¥58万 | 晚柠+念妈验证模式,7-15位达人探索 | +| ❄️ 淡季 | 1-2月 | ¥41万 | 春节+寒假,仅3-6位达人维持 | +| 🚀 爆发期 | 3-5月 | ¥325万 | 学霸系入驻,达人扩至25人,GMV飙升 | + +### 三、季节性因子 + +| 月份 | 因子 | 原因 | +|------|------|------| +| **6月** | 0.75 | 期末/中高考,家长关注考试,教育消费低谷 | +| **7月** | 1.40 | 暑假启动,需求爆发 | +| **8月** | 1.50 | 暑假高峰,全年峰值 | +| **9月** | 1.00 | 开学季,需求回归 | +| **10月** | 1.15 | 国庆黄金周+双11预热 | +| **11月** | 1.25 | 双11大促 | +| **12月** | 1.05 | 年末+寒假预售 | + +### 四、三场景预测 + +| 月份 | 保守 GMV | 基准 GMV | 乐观 GMV | +|------|----------|----------|----------| +| 6月 | ¥110万 | **¥148万** | ¥195万 | +| 7月 | ¥208万 | **¥289万** | ¥410万 | +| 8月 | ¥239万 | **¥344万** | ¥434万 | +| 9月 | ¥161万 | **¥225万** | ¥309万 | +| 10月 | ¥182万 | **¥276万** | ¥378万 | +| 11月 | ¥199万 | **¥299万** | ¥427万 | +| 12月 | ¥178万 | **¥242万** | ¥360万 | +| **合计** | **¥1,277万** | **¥1,823万** | **¥2,512万** | +| 月均 | ¥182万 | ¥260万 | ¥359万 | + +### 五、关键假设(基准场景) + +**达人维度:** +- 晚柠 + 念妈(核心):月均 GMV ¥75万,覆盖全月 +- 学霸系(学霸老王+三人行):月均 GMV ¥35万 +- 其他复发达人(百克力/小花生/盈姐/神奇瓜妈等):月均 ¥15万并逐月增长 +- 每月新拓 **6 位**达人,首场均价 ¥8万,次月留存率 35% + +**非达人渠道:** +- 端内 + 端外销售渠道月环比增长 
10% +- 5月基准 ¥29万 → 12月预计 ¥56万 + +**退款率控制:** +- 目标 35%(当前波动 27-45%) +- 暑假旺季退款率预计偏高(+5pp) +- 双11退款率可控(-3pp) + +### 六、核心洞察 + +1. **7-8月是关键窗口**:暑假两个月贡献全年预测 GMV 的 35%,需全力储备达人资源和排期 +2. **达人集中度仍高**:晚柠+念妈+学霸系四人贡献约 65% 的达人 GMV,存在依赖风险 +3. **6月是蓄力期**:考试季天然低谷,应聚焦达人签约和暑假排期,而非追求 GMV +4. **非达人渠道在崛起**:端内 + 端外销售从 9月几乎为零增长到 5月 ¥29万,预计年底可达 ¥56万/月 +5. **退款率是利润杠杆**:将退款率从 40% 降至 35%,7 个月可多留存 ¥100-150万 GSV + +### ⚠️ 主要风险 +- 学霸系合作稳定性(目前仅合作 2 个月) +- 新达人首场表现方差大(历史上从 ¥599 到 ¥94万) +- 6月考试季实际可能比预测更低 +- 政策/平台规则变化影响达人直播生态 + +--- + +预测图表已生成: + +MEDIA:/root/.openclaw/workspace/output/revenue_forecast_2026.png + +需要我调整某个假设参数(如达人数、留存率、季节性因子)重新跑一版,或者按渠道维度做更细的拆解吗? +user: 只分析达播渠道,看看达人拓展的状况和合作产出,注意订单中可能有测试订单注意剔除 +assistant: 王虹茗,以下是纯达播渠道的深度分析,已剔除 9 笔测试订单(goods_id=32 的 ¥599 异常价格订单): + +--- + +## 📊 达播渠道深度分析(2025.09 - 2026.05) + +### 一、测试订单剔除说明 + +| 类型 | 订单数 | 涉及账户 | +|------|--------|----------| +| 测试账户(status≠1)| 6 单 | account 51, 316, 298 | +| 真实账户测试价(¥599)| 3 单 | account 1809, 2465, 4661 | +| **合计剔除** | **9 单** | ¥5,391 | + +> 特征:均为 goods_id=32、¥599,通过有赞渠道下单。以下所有数据均不含这 9 笔。 + +--- + +### 二、达人拓展全貌 + +| 月份 | 总达人数 | 新达人 | 复发达人 | 订单 | GMV(万) | +|------|----------|--------|----------|------|-----------| +| 9月 | 5 | 5 | 0 | 299 | ¥59.8 | +| 10月 | 10 | 7 | 3 | 383 | ¥76.6 | +| 11月 | 14 | 7 | 7 | 226 | ¥45.2 | +| 12月 | 4 | 0 | 4 | 89 | ¥17.8 | +| 1月 | 6 | 2 | 4 | 109 | ¥21.8 | +| 2月 | 3 | 2 | 1 | 177 | ¥35.4 | +| 3月 | 10 | 2 | 8 | 767 | ¥252.1 | +| **4月** 🔥 | **25** | **20** | **5** | **1,429** | **¥449.7** | +| 5月 | 17 | 7 | 10 | 490 | ¥160.5 | + +**达人生命周期关键发现:** +- 30 位达人合作过,但 **14 人(47%)仅活跃 1 个月**就再无产出 +- 仅 2 人活跃 ≥7 个月(晚柠 9 月全勤、念妈 7 月) +- **4 月大爆发**:新签 20 位达人,但 5 月仅 5 人复播(新达人留存率 25%) + +--- + +### 三、达人合作产出 + +| 排名 | 达人 | 订单 | GMV(万) | 活跃月 | 主平台 | GMV占比 | +|------|------|------|-----------|--------|--------|---------| +| 1 | **晚柠** | 914 | ¥278.8 | 9月 | 小红书 | 25.0% | +| 2 | **念妈** | 880 | ¥234.0 | 7月 | 多平台 | 20.9% | +| 3 | **学霸老王** | 611 | ¥186.9 | 2月 | 多平台 | 16.7% | +| 4 | **学霸三人行** | 477 | ¥149.7 | 2月 | 多平台 
| 13.4% | +| 5 | 神奇瓜妈 | 156 | ¥52.1 | 2月 | 视频号 | 4.7% | +| 6 | 小花生 | 146 | ¥36.5 | 4月 | 视频号 | 3.3% | +| 7 | 老狼聊育儿 | 110 | ¥35.3 | 2月 | 视频号 | 3.2% | +| 8 | 小小鹰萱妈 | 174 | ¥34.8 | 1月 | 抖音 | 3.1% | +| 9 | 百克力 | 151 | ¥30.2 | 4月 | 多平台 | 2.7% | +| 10 | 开心妈妈 | 62 | ¥12.4 | 2月 | 小红书 | 1.1% | + +> TOP3 贡献 **63%** GMV,TOP5 贡献 **81%**。集中度极高。 + +--- + +### 四、平台效能对比 + +| 平台 | GMV(万) | 退款率 | GSV(万) | 净收率 | 达人渠道数 | +|------|-----------|--------|-----------|--------|------------| +| 抖音 | ¥393.6 | **48.5%** 🔴 | ¥202.8 | 51.5% | 14 | +| 小红书 | ¥392.1 | **29.3%** 🟢 | ¥277.0 | 70.7% | 10 | +| 视频号 | ¥332.7 | **34.4%** 🟡 | ¥218.3 | 65.6% | 18 | +| 有赞 | ¥0.4 | 0% | ¥0.4 | 100% | 2 | + +> 小红书 GMV 与抖音持平,但 GSV 高出 **¥74.2 万**(+37%),是实际回报最高的平台。 + +--- + +### 五、核心洞察 + +**🟢 达人拓展在加速,但质量在稀释** +- 达人总数从 5 人扩到 17 人,但「一次性达人」占比 47% +- 新达人首月 GMV 中位数仅 ¥3,598,远低于头部达人的 ¥60 万+ +- 4 月签了 20 位新达人,5 月仅 5 人复播——大量签约达人"首播即告别" + +**🔴 学霸系断崖式衰减——核心风险** +- 学霸老王:4 月 ¥165.1 万 → 5 月 ¥21.8 万(**-87%**) +- 学霸三人行:4 月 ¥148.9 万 → 5 月 ¥0.8 万(**-99.5%**) +- 两人的 4 月「首发专属」渠道占当月 GMV 的 38%,5 月该渠道完全消失 +- 学霸系 5 月仅在常规渠道有微量产出,合作模式不稳定 + +**🟡 抖音流量大但利润薄** +- 退款率 48.5%,意味着每卖出 ¥100,最终只收回 ¥51.5 +- 2 月小小鹰萱妈单场 ¥34.8 万但退款率 56% +- 抖音适合冲量,不适合利润考核 + +**🟢 晚柠是稳定器** +- 9 个月全勤,月均 GMV ¥31 万 +- 小红书单一平台,退款率可控 +- 3 月单月 ¥152.6 万创纪录(产品升级推动?) + +--- + +### 六、建议 + +1. **提高达人留存**:对新达人建立「首月辅导+次月跟进」机制,目标将月度留存率从 50% 提升至 70% +2. **优化平台结构**:加大小红书投放比例,控制抖音预算(退款率太高),视频号作为达人多样性池 +3. **学霸系风险对冲**:不依赖单一大达人,培养 3-5 个「中腰部达人」(月 GMV ¥10-30 万级别) +4. **退款率专项治理**:针对抖音渠道建立订单质量监控,设定 40% 退款率红线 + +--- + +MEDIA:/root/.openclaw/workspace/output/daren_deep_analysis.png + +需要我针对某个达人、某个平台或某个月度做更细的拆解吗? 
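The platform comparison above rests on simple arithmetic: GSV ≈ GMV × (1 − refund rate). The sketch below only re-derives the table's figures from its own rounded inputs, so small differences (a few tenths of 万) are expected; the function name `gsv` is hypothetical.

```python
def gsv(gmv_wan: float, refund_rate: float) -> float:
    """Net revenue (万) left after refunds, given GMV (万) and a refund rate in [0, 1]."""
    return gmv_wan * (1.0 - refund_rate)


if __name__ == "__main__":
    platforms = {
        "抖音": (393.6, 0.485),
        "小红书": (392.1, 0.293),
        "视频号": (332.7, 0.344),
    }
    for name, (gmv_wan, rate) in platforms.items():
        print(f"{name}: GSV ≈ {gsv(gmv_wan, rate):.1f}万")
```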
+user: 测试订单包括所有keyfrom中有且仅有5笔以内订单,且均已退款 +assistant: 已按新定义重新分析。测试订单剔除规则:**key_from 总订单 ≤5 且全部已退款**。 + +--- + +## 📊 达播渠道深度分析 v2(测试订单已按新定义剔除) + +### 测试订单剔除明细 + +| 被剔除的 key_from | 订单 | GMV | 原因 | +|---|---|---|---| +| `wxxd-城市阅读-0` | 2 | ¥3,998 | ≤5单且全退 | +| `douyin-读书学习吧-0` | 1 | ¥1,949 | ≤5单且全退 | +| `xhs-念妈讲学习规划-0` | 1 | ¥3,598 | ≤5单且全退 | +| `xhs-瓦拉英语-0` | 1 | ¥3,598 | ≤5单且全退 | +| **合计** | **5单** | **¥13,143** | | + +> 注:9月有 2 笔 ¥599 订单(刘敏生物、张声涛 Nelson)来自真实账户且未退款,按新定义**保留**。 + +--- + +### 核心指标(剔除后) + +| 指标 | v1(旧定义) | **v2(新定义)** | +|------|-------------|-----------------| +| 总订单 | 3,969 | **3,967** | +| 总 GMV | ¥1,118.8万 | **¥1,117.7万** | +| 总 GSV | — | **¥698.7万** | +| 整体退款率 | — | **37.5%** | +| 达人渠道 | 55 | **51** | +| 达人人数 | 30 | **约36位** | + +变化极小(仅 5 单差异),核心结论不变。 + +--- + +### 达人拓展状况 + +**月度趋势:** +| 月份 | 总达人 | 新达人 | 复发达人 | 新渠道数 | 新渠道首月均GMV | +|------|--------|--------|----------|----------|-----------------| +| 9月 | 7 | 7 | 0 | 7 | ¥9.3万 | +| 10月 | 10 | 7 | 3 | 7 | ¥9.1万 | +| 11月 | 15 | 8 | 7 | 8 | ¥3.1万 | +| 12月 | 4 | 0 | 4 | 0 | — | +| 1月 | 6 | 2 | 4 | 2 | ¥3.7万 | +| 2月 | 2 | 1 | 1 | 1 | ¥34.8万 | +| 3月 | 10 | 2 | 8 | 2 | ¥77.5万 | +| 4月 | 23 | 18 | 5 | 18 | ¥22.0万 | +| 5月 | 16 | 6 | 10 | 6 | ¥5.2万 | + +**关键发现:** +- **52% 的达人仅活跃 1 个月**就再无产出(一次性达人) +- 4 月签了 18 位新达人,5 月仅 6 人复播——**新达人留存率 33%** +- 达人月度留存率波动在 **25%-100%**,均值约 65% + +--- + +### 合作产出 + +**TOP10 达人:** +| 达人 | 订单 | GMV | 活跃月 | 主平台 | 占比 | +|------|------|------|--------|--------|------| +| 晚柠 | 914 | ¥279万 | 9月 | 小红书 | 25.0% | +| 念妈 | 879 | ¥234万 | 7月 | 抖音+视频号 | 20.9% | +| 学霸老王 | 611 | ¥187万 | 2月 | 多平台 | 16.7% | +| 学霸三人行 | 477 | ¥150万 | 2月 | 多平台 | 13.4% | +| 神奇瓜妈 | 156 | ¥52万 | 2月 | 视频号 | 4.7% | + +**平台效能:** +| 平台 | GMV | 退款率 | GSV | 净收率 | +|------|------|--------|------|--------| +| 抖音 | ¥393万 | 48.5% | ¥203万 | 51.5% | +| 小红书 | ¥392万 | 29.3% | ¥277万 | 70.7% | +| 视频号 | ¥332万 | 34.4% | ¥218万 | 65.6% | + +--- + +MEDIA:/root/.openclaw/workspace/output/daren_deep_analysis_v2.png + 
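The v2 exclusion rule (a `key_from` counts as test traffic when it has at most 5 orders in total and every one of them was refunded) can be sketched as follows. This is an illustration under assumed inputs, not the query that was actually run: the `(key_from, is_refunded)` tuple shape and the name `find_test_key_froms` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def find_test_key_froms(orders: Iterable[Tuple[str, bool]], max_orders: int = 5) -> Set[str]:
    """Return key_from values with <= max_orders orders, all of them refunded."""
    stats = defaultdict(lambda: [0, 0])  # key_from -> [order count, refunded count]
    for key_from, refunded in orders:
        stats[key_from][0] += 1
        stats[key_from][1] += int(bool(refunded))
    return {k for k, (total, refunded) in stats.items()
            if total <= max_orders and refunded == total}


if __name__ == "__main__":
    sample = [
        ("wxxd-城市阅读-0", True), ("wxxd-城市阅读-0", True),  # 2 orders, all refunded
        ("channel-a", True), ("channel-a", False),              # partly refunded: kept
        ("channel-b", True), ("channel-b", True), ("channel-b", True),
        ("channel-b", True), ("channel-b", True), ("channel-b", True),  # 6 orders: kept
    ]
    print(sorted(find_test_key_froms(sample)))
```

Counting per `key_from` before filtering keeps the rule independent of price or goods, which matches the switch away from the earlier ¥599-based definition.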
+需要针对某个具体维度(如学霸系衰减分析、新达人质量评估、平台退款率对比)做进一步深挖吗? diff --git a/scripts/age_learning_payment_analysis.py b/scripts/age_learning_payment_analysis.py new file mode 100644 index 0000000..176f84f --- /dev/null +++ b/scripts/age_learning_payment_analysis.py @@ -0,0 +1,263 @@ +#!/usr/bin/env python3 +""" +按孩子年龄区段分析:用户付费情况、上课时长、退费率 +输出 Excel 报表 +""" +import psycopg2 +import pandas as pd +from datetime import datetime, date +import os + +PG_CONFIG = { + 'host': 'bj-postgres-16pob4sg.sql.tencentcdb.com', + 'port': 28591, + 'user': 'ai_member', + 'password': 'LdfjdjL83h3h3^$&**YGG*', + 'dbname': 'vala_bi', +} + +OUTPUT = '/root/.openclaw/workspace/output/age_learning_payment_analysis.xlsx' + +def get_conn(): + return psycopg2.connect(**PG_CONFIG) + +def calc_age(birthday_str): + """从生日字符串计算当前年龄""" + if not birthday_str or birthday_str == '': + return None + try: + # 尝试多种日期格式 + for fmt in ['%Y-%m-%d', '%Y-%m-%d %H:%M:%S', '%Y/%m/%d', '%Y-%m-%dT%H:%M:%S']: + try: + bd = datetime.strptime(birthday_str.strip(), fmt).date() + break + except ValueError: + continue + else: + # 尝试 YYYY-M-D 格式 + parts = birthday_str.strip().split('-') + if len(parts) == 3: + bd = date(int(parts[0]), int(parts[1]), int(parts[2])) + else: + return None + + today = date.today() + age = today.year - bd.year - ((today.month, today.day) < (bd.month, bd.day)) + return age + except: + return None + +def age_group(age): + """年龄分组""" + if age is None: + return '未知' + if age <= 3: + return '0-3岁' + elif age <= 5: + return '4-5岁' + elif age <= 7: + return '6-7岁' + elif age <= 9: + return '8-9岁' + elif age <= 11: + return '10-11岁' + elif age <= 14: + return '12-14岁' + else: + return '15岁以上' + +def main(): + conn = get_conn() + + print("1/5 获取角色生日数据...") + chars_df = pd.read_sql(""" + SELECT c.id AS char_id, c.account_id, c.birthday, c.status, c.deleted_at, + a.status AS account_status + FROM bi_vala_app_character c + JOIN bi_vala_app_account a ON c.account_id = a.id + WHERE c.status = 1 AND c.deleted_at IS NULL + AND 
a.status = 1 AND a.deleted_at IS NULL + AND c.birthday IS NOT NULL AND c.birthday != '' + """, conn) + print(f" 有效角色数: {len(chars_df)}") + + # 计算年龄 + chars_df['age'] = chars_df['birthday'].apply(calc_age) + chars_df['age_group'] = chars_df['age'].apply(age_group) + + print("2/5 获取订单数据...") + orders_df = pd.read_sql(""" + SELECT o.account_id, o.id AS order_id, o.trade_no, + o.pay_amount_int::numeric/100 AS pay_amount, + o.order_status, o.key_from, o.pay_success_date + FROM bi_vala_order o + JOIN bi_vala_app_account a ON o.account_id = a.id + WHERE a.status = 1 AND a.deleted_at IS NULL + AND o.order_status IN (3, 4) + AND o.pay_success_date IS NOT NULL + """, conn) + print(f" 订单数: {len(orders_df)}") + + print("3/5 获取退款数据...") + refunds_df = pd.read_sql(""" + SELECT r.trade_no, r.refund_amount_int::numeric/100 AS refund_amount, + r.status AS refund_status + FROM bi_refund_order r + WHERE r.status = 3 + """, conn) + print(f" 退款数: {len(refunds_df)}") + + print("4/5 获取学习时长数据...") + learning_df = pd.read_sql(""" + SELECT ul.user_id AS char_id, SUM(ul.learning_time) AS total_learning_seconds + FROM user_learning ul + GROUP BY ul.user_id + """, conn) + print(f" 有学习记录的角色数: {len(learning_df)}") + + conn.close() + + print("5/5 关联计算...") + + # 关联订单到角色(通过 account_id) + # 一个 account 可能有多个角色,这里按 account 维度统计付费 + account_orders = orders_df.groupby('account_id').agg( + order_count=('order_id', 'count'), + total_gmv=('pay_amount', 'sum'), + has_order=('order_id', lambda x: 1), + ).reset_index() + + # 退款关联 + refund_trade_nos = set(refunds_df['trade_no'].tolist()) + orders_df['is_refunded'] = orders_df['trade_no'].apply(lambda x: x in refund_trade_nos) + + # 按 account 统计退款 + account_refund = orders_df.groupby('account_id').agg( + refund_order_count=('is_refunded', 'sum'), + total_refund_amount=('pay_amount', lambda x: x[orders_df.loc[x.index, 'is_refunded']].sum()), + ).reset_index() + account_refund['all_refunded'] = ( + account_refund['refund_order_count'] >= 
account_refund['account_id'].map(orders_df.groupby('account_id').size()) + ).astype(int) + + # 合并到角色 + chars_df = chars_df.merge(account_orders[['account_id', 'order_count', 'total_gmv', 'has_order']], + on='account_id', how='left') + chars_df = chars_df.merge(account_refund[['account_id', 'refund_order_count', 'total_refund_amount']], + on='account_id', how='left') + + chars_df['order_count'] = chars_df['order_count'].fillna(0).astype(int) + chars_df['total_gmv'] = chars_df['total_gmv'].fillna(0) + chars_df['has_order'] = chars_df['has_order'].fillna(0).astype(int) + chars_df['refund_order_count'] = chars_df['refund_order_count'].fillna(0).astype(int) + chars_df['total_refund_amount'] = chars_df['total_refund_amount'].fillna(0) + chars_df['gsv'] = chars_df['total_gmv'] - chars_df['total_refund_amount'] + chars_df['is_paid'] = (chars_df['has_order'] == 1).astype(int) + chars_df['is_all_refunded'] = ((chars_df['order_count'] > 0) & (chars_df['refund_order_count'] >= chars_df['order_count'])).astype(int) + + # 关联学习时长 + chars_df = chars_df.merge(learning_df[['char_id', 'total_learning_seconds']], + on='char_id', how='left') + chars_df['total_learning_seconds'] = chars_df['total_learning_seconds'].fillna(0) + chars_df['total_learning_min'] = chars_df['total_learning_seconds'] / 60.0 + + # 按年龄组汇总 + age_order = ['0-3岁', '4-5岁', '6-7岁', '8-9岁', '10-11岁', '12-14岁', '15岁以上', '未知'] + + results = [] + for ag in age_order: + subset = chars_df[chars_df['age_group'] == ag] + if len(subset) == 0: + continue + + total_chars = len(subset) + total_accounts = subset['account_id'].nunique() + paid_accounts = subset[subset['is_paid'] == 1]['account_id'].nunique() + all_refunded_accounts = subset[subset['is_all_refunded'] == 1]['account_id'].nunique() + + # 付费率 = 付费account数 / 总account数 + pay_rate = paid_accounts / total_accounts * 100 if total_accounts > 0 else 0 + + # 退费率 = 全部退款的account数 / 付费account数 + refund_rate = all_refunded_accounts / paid_accounts * 100 if paid_accounts > 0 else 0 + + # 
GMV / GSV(按角色汇总的account去重) + 
paid_subset = subset[subset['is_paid'] == 1] + # 按account去重取GMV + account_gmv = paid_subset.groupby('account_id')['total_gmv'].first().sum() + account_gsv = paid_subset.groupby('account_id')['gsv'].first().sum() + + # 人均GMV + avg_gmv_per_paid = account_gmv / paid_accounts if paid_accounts > 0 else 0 + + # 学习时长 + avg_learning_min = subset['total_learning_min'].mean() + median_learning_min = subset['total_learning_min'].median() + learned_chars = (subset['total_learning_seconds'] > 0).sum() + learn_rate = learned_chars / total_chars * 100 if total_chars > 0 else 0 + + # 有学习的角色平均学习时长 + learned_subset = subset[subset['total_learning_seconds'] > 0] + avg_learn_min_learned = learned_subset['total_learning_min'].mean() if len(learned_subset) > 0 else 0 + + results.append({ + '年龄组': ag, + '角色数': total_chars, + '用户数(account)': total_accounts, + '付费用户数': paid_accounts, + '付费率': round(pay_rate, 1), + '全部退款用户数': all_refunded_accounts, + '退费率(全额退)': round(refund_rate, 1), + 'GMV(元)': round(account_gmv, 0), + 'GSV(元)': round(account_gsv, 0), + '人均GMV(付费用户)': round(avg_gmv_per_paid, 0), + '有学习记录角色数': learned_chars, + '学习参与率': round(learn_rate, 1), + '全员平均学习时长(分钟)': round(avg_learning_min, 1), + '有学习角色平均时长(分钟)': round(avg_learn_min_learned, 1), + '中位学习时长(分钟)': round(median_learning_min, 1), + }) + + result_df = pd.DataFrame(results) + + # 汇总行 + total_row = { + '年龄组': '合计', + '角色数': chars_df['char_id'].nunique(), + '用户数(account)': chars_df['account_id'].nunique(), + '付费用户数': chars_df[chars_df['is_paid'] == 1]['account_id'].nunique(), + '付费率': round(chars_df[chars_df['is_paid'] == 1]['account_id'].nunique() / chars_df['account_id'].nunique() * 100, 1), + '全部退款用户数': chars_df[chars_df['is_all_refunded'] == 1]['account_id'].nunique(), + '退费率(全额退)': round(chars_df[chars_df['is_all_refunded'] == 1]['account_id'].nunique() / max(chars_df[chars_df['is_paid'] == 1]['account_id'].nunique(), 1) * 100, 1), + 'GMV(元)': round(chars_df[chars_df['is_paid'] == 
1].groupby('account_id')['total_gmv'].first().sum(), 0), + 'GSV(元)': round(chars_df[chars_df['is_paid'] == 1].groupby('account_id')['gsv'].first().sum(), 0), + '人均GMV(付费用户)': round(chars_df[chars_df['is_paid'] == 1].groupby('account_id')['total_gmv'].first().mean(), 0), + '有学习记录角色数': (chars_df['total_learning_seconds'] > 0).sum(), + '学习参与率': round((chars_df['total_learning_seconds'] > 0).sum() / len(chars_df) * 100, 1), + '全员平均学习时长(分钟)': round(chars_df['total_learning_min'].mean(), 1), + '有学习角色平均时长(分钟)': round(chars_df[chars_df['total_learning_seconds'] > 0]['total_learning_min'].mean(), 1), + '中位学习时长(分钟)': round(chars_df['total_learning_min'].median(), 1), + } + result_df = pd.concat([result_df, pd.DataFrame([total_row])], ignore_index=True) + + # 写入 Excel + with pd.ExcelWriter(OUTPUT, engine='openpyxl') as writer: + result_df.to_excel(writer, sheet_name='年龄分析', index=False) + + # 年龄分布明细 + age_dist = chars_df.groupby('age').agg( + 角色数=('char_id', 'count'), + 付费角色数=('is_paid', 'sum'), + ).reset_index() + age_dist['付费率'] = round(age_dist['付费角色数'] / age_dist['角色数'] * 100, 1) + age_dist = age_dist.sort_values('age') + age_dist.to_excel(writer, sheet_name='年龄分布明细', index=False) + + print(f"\n✅ 报表已生成: {OUTPUT}") + print("\n=== 按年龄组汇总 ===") + print(result_df.to_string(index=False)) + + return OUTPUT + +if __name__ == '__main__': + main() diff --git a/scripts/fill_leads_sheet.py b/scripts/fill_leads_sheet.py index 9a7116d..2e74728 100644 --- a/scripts/fill_leads_sheet.py +++ b/scripts/fill_leads_sheet.py @@ -9,7 +9,7 @@ 4. 
按析出月份汇总统计 → 写入"统计" sheet 统计口径: -- 转化率 = 未退费转化leads / 总leads +- 转化率 = 已转化leads / 有效析出leads(剔除端外购课用户) - 退费率 = 退费leads / 已转化leads - 完成率 = 完成该课的leads / 总leads @@ -84,6 +84,16 @@ import base64 XXTEA_KEY = "K1pNOZ5O5+ZqTPSHA2kzPdoNOMOGcv6g" +def excel_serial_to_date(serial: int) -> str: + """Excel 序列日期 → YYYY-MM-DD 字符串""" + from datetime import datetime, timedelta + if serial >= 61: + dt = datetime(1899, 12, 30) + timedelta(days=serial) + else: + dt = datetime(1899, 12, 31) + timedelta(days=serial) + return dt.strftime("%Y-%m-%d") + + def encrypt_phone(phone: str) -> str: encrypted = xxtea.encrypt(phone.encode(), XXTEA_KEY.encode()) result = base64.b64encode(encrypted).decode() @@ -194,8 +204,9 @@ def query_conversion(account_ids: list[str]) -> dict[str, dict]: GROUP BY o.account_id """ for row in pg_query(sql): - if len(row) >= 4: - acc_id, first_pay, has_refund, first_refund = row[0], row[1], row[2], row[3] + if len(row) >= 3: + acc_id, first_pay, has_refund = row[0], row[1], row[2] + first_refund = row[3] if len(row) >= 4 else "" results[acc_id] = { "converted": "是" if first_pay else "否", "convert_date": first_pay or "", @@ -205,6 +216,30 @@ def query_conversion(account_ids: list[str]) -> dict[str, dict]: return results +def query_outside_conversion(account_ids: list[str]) -> set[str]: + """查询哪些账号有端外购课(key_from 不在端内渠道中)""" + if not account_ids: + return set() + BATCH_SIZE = 100 + results = set() + channels_str = ",".join(f"'{c}'" for c in INNER_CHANNELS) + for i in range(0, len(account_ids), BATCH_SIZE): + batch = account_ids[i:i + BATCH_SIZE] + ids_str = ",".join(batch) + sql = f""" + SELECT DISTINCT o.account_id::text + FROM bi_vala_order o + WHERE o.account_id IN ({ids_str}) + AND o.key_from NOT IN ({channels_str}) + AND o.pay_success_date IS NOT NULL + AND o.order_status IN (3, 4) + """ + for row in pg_query(sql): + if len(row) >= 1: + results.add(row[0]) + return results + + def query_learning(account_ids: list[str]) -> dict[str, dict[str, str]]: if not 
account_ids: return {} @@ -306,11 +341,15 @@ def process_sheet(sheet_id: str, sheet_name: str, dry_run: bool = False) -> list print("→ 查询转化信息...") conv_info = query_conversion(matched_accounts) + print("→ 查询端外购课...") + outside_accounts = query_outside_conversion(matched_accounts) + print("→ 查询 U0 学习进度...") learn_info = query_learning(matched_accounts) updates = [] lead_data = [] + matched_row_indices = set() for phone, row_indices in phone_to_row.items(): info = acc_info.get(phone) @@ -321,14 +360,28 @@ def process_sheet(sheet_id: str, sheet_name: str, dry_run: bool = False) -> list learn = learn_info.get(acc_id, {}) for row_idx in row_indices: + matched_row_indices.add(row_idx) extract_date = "" if len(rows[row_idx]) > COL_EXTRACT_DATE and rows[row_idx][COL_EXTRACT_DATE]: extract_date = str(rows[row_idx][COL_EXTRACT_DATE]).strip() + # F 列逻辑:端内购课 > 端外购课 > 否 + is_inner = conv.get("converted", "否") == "是" + is_outside = acc_id in outside_accounts + if is_inner: + f_value = "是" + convert_date = conv.get("convert_date", "") + elif is_outside: + f_value = "端外购课" + convert_date = "" + else: + f_value = "否" + convert_date = "" + updates.append((row_idx, COL_USER_ID, acc_id)) updates.append((row_idx, COL_REG_DATE, info.get("created_at", ""))) - updates.append((row_idx, COL_CONVERTED, conv.get("converted", "否"))) - updates.append((row_idx, COL_CONVERT_DATE, conv.get("convert_date", ""))) + updates.append((row_idx, COL_CONVERTED, f_value)) + updates.append((row_idx, COL_CONVERT_DATE, convert_date)) updates.append((row_idx, COL_REFUND, conv.get("refunded", "否"))) updates.append((row_idx, COL_REFUND_DATE, conv.get("refund_date", ""))) for col_offset, lesson_name in enumerate(U0_COL_ORDER): @@ -336,11 +389,31 @@ def process_sheet(sheet_id: str, sheet_name: str, dry_run: bool = False) -> list lead_data.append({ "extract_date": extract_date, - "converted": conv.get("converted", "否"), + "converted": f_value, "refunded": conv.get("refunded", "否"), "lessons": {k: learn.get(k, "") 
for k in U0_COL_ORDER}, + "has_phone": True, }) + # 未匹配手机号的行也纳入统计,转化状态取 F 列已有值 + for row_idx, row in enumerate(rows): + if row_idx in matched_row_indices: + continue + extract_date = "" + if len(row) > COL_EXTRACT_DATE and row[COL_EXTRACT_DATE]: + extract_date = str(row[COL_EXTRACT_DATE]).strip() + # 读取 F 列已有值作为转化状态 + existing_converted = "否" + if len(row) > COL_CONVERTED and row[COL_CONVERTED]: + existing_converted = str(row[COL_CONVERTED]).strip() + lead_data.append({ + "extract_date": extract_date, + "converted": existing_converted, + "refunded": "否", + "lessons": {k: "" for k in U0_COL_ORDER}, + "has_phone": False, + }) + # 回写 row_updates = {} for row_idx, col, val in updates: @@ -384,7 +457,8 @@ def compute_stats(lead_data: list[dict]) -> dict[str, dict]: """ 按析出月份汇总统计 口径: - - 转化率 = 未退费转化leads / 总leads + - 有效析出用户数 = 总leads - 端外购课leads + - 转化率 = 已转化leads / 有效析出用户数 - 退费率 = 退费leads / 已转化leads - 完成率 = 完成该课的leads / 总leads """ @@ -393,11 +467,23 @@ def compute_stats(lead_data: list[dict]) -> dict[str, dict]: extract = lead.get("extract_date", "") if not extract: continue - m = re.match(r'(\d{4})[-/](\d{1,2})', extract) - if m: - month = f"{m.group(1)}-{m.group(2).zfill(2)}" + # 支持三种格式: YYYY-MM-DD / YYYY/MM/DD / M月D日 / Excel序列数字 + extract_str = str(extract).strip() + # 尝试 Excel 序列数字 + if extract_str.isdigit(): + month = excel_serial_to_date(int(extract_str))[:7] else: - continue + m = re.match(r'(\d{4})[-/](\d{1,2})', extract_str) + if m: + month = f"{m.group(1)}-{m.group(2).zfill(2)}" + else: + m = re.match(r'(\d{1,2})月\d{1,2}日', extract_str) + if m: + from datetime import datetime + year = datetime.now().year + month = f"{year}-{m.group(1).zfill(2)}" + else: + continue month_groups[month].append(lead) if not month_groups: @@ -406,11 +492,15 @@ def compute_stats(lead_data: list[dict]) -> dict[str, dict]: result = {} for month, leads in sorted(month_groups.items()): total = len(leads) + matched = sum(1 for l in leads if l.get("has_phone", False)) + outside_only = 
sum(1 for l in leads if l["converted"] == "端外购课") converted_all = sum(1 for l in leads if l["converted"] == "是") refunded = sum(1 for l in leads if l["refunded"] == "是") converted_unrefunded = sum(1 for l in leads if l["converted"] == "是" and l["refunded"] != "是") - conv_rate = converted_unrefunded / total * 100 if total > 0 else 0 + # 转化率分母剔除端外购课用户 + effective_total = total - outside_only + conv_rate = converted_all / effective_total * 100 if effective_total > 0 else 0 refund_rate = refunded / converted_all * 100 if converted_all > 0 else 0 lesson_rates = {} @@ -419,7 +509,9 @@ def compute_stats(lead_data: list[dict]) -> dict[str, dict]: lesson_rates[lesson_name] = completed / total * 100 if total > 0 else 0 result[month] = { - "total": total, + "total": effective_total, + "matched": matched, + "outside_only": outside_only, "converted_all": converted_all, "converted_unrefunded": converted_unrefunded, "refunded": refunded, @@ -438,6 +530,12 @@ def write_all_stats(all_stats: dict[str, dict[str, dict]], dry_run: bool = False 按 销售+月份 逐行写入,从第2行开始 """ + # 先写表头 + header = ["销售", "月份", "有效析出用户数", "匹配用户数", "转化用户数", "转化率", "退费率"] + \ + [f"{name}完成率" for name in U0_COL_ORDER] + if not dry_run: + lark_write(SHEET_STAT, f"{SHEET_STAT}!A1:Q1", [header]) + # 构建有序行列表: [(sales_name, month, stats), ...] 
rows_data = [] for sales_name in ["曲慧萌", "吴迪"]: @@ -445,6 +543,10 @@ def write_all_stats(all_stats: dict[str, dict[str, dict]], dry_run: bool = False for month in sorted(stats.keys()): rows_data.append((sales_name, month, stats[month])) + # 先清除统计 sheet 旧数据(A2:Q50),避免残留旧行 + print(" → 清除统计 sheet 旧数据...") + lark_write(SHEET_STAT, f"{SHEET_STAT}!A2:Q50", [[""] * 17] * 49) + if not rows_data: print(" 无统计数据") return @@ -456,16 +558,22 @@ def write_all_stats(all_stats: dict[str, dict[str, dict]], dry_run: bool = False lark_write(SHEET_STAT, f"{SHEET_STAT}!A{row_num}:A{row_num}", [[sales_name]]) # B: 月份 lark_write(SHEET_STAT, f"{SHEET_STAT}!B{row_num}:B{row_num}", [[month]]) - # C: 转化率(小数,配合百分比格式显示) - lark_write(SHEET_STAT, f"{SHEET_STAT}!C{row_num}:C{row_num}", [[round(s["conv_rate"] / 100, 3)]]) - # D: 退费率 - lark_write(SHEET_STAT, f"{SHEET_STAT}!D{row_num}:D{row_num}", [[round(s["refund_rate"] / 100, 3)]]) - # E-N: 完成率 + # C: 有效析出用户数(剔除端外购课) + lark_write(SHEET_STAT, f"{SHEET_STAT}!C{row_num}:C{row_num}", [[s["total"]]]) + # D: 匹配用户数 + lark_write(SHEET_STAT, f"{SHEET_STAT}!D{row_num}:D{row_num}", [[s["matched"]]]) + # E: 转化用户数 + lark_write(SHEET_STAT, f"{SHEET_STAT}!E{row_num}:E{row_num}", [[s["converted_all"]]]) + # F: 转化率(小数,配合百分比格式显示) + lark_write(SHEET_STAT, f"{SHEET_STAT}!F{row_num}:F{row_num}", [[round(s["conv_rate"] / 100, 3)]]) + # G: 退费率 + lark_write(SHEET_STAT, f"{SHEET_STAT}!G{row_num}:G{row_num}", [[round(s["refund_rate"] / 100, 3)]]) + # H-Q: 完成率 lesson_vals = [round(s["lesson_rates"][name] / 100, 3) for name in U0_COL_ORDER] - lark_write(SHEET_STAT, f"{SHEET_STAT}!E{row_num}:N{row_num}", [lesson_vals]) + lark_write(SHEET_STAT, f"{SHEET_STAT}!H{row_num}:Q{row_num}", [lesson_vals]) - print(f" ✓ {sales_name} {month}: 转化率={s['conv_rate']:.1f}% " - f"退费率={s['refund_rate']:.1f}% 总leads={s['total']}") + print(f" ✓ {sales_name} {month}: 有效析出={s['total']} 匹配={s['matched']} 转化={s['converted_all']} " + f"转化率={s['conv_rate']:.1f}% 退费率={s['refund_rate']:.1f}%") # ── 主流程 
────────────────────────────────────────────── @@ -492,7 +600,7 @@ def main(): if dry_run: for sales_name, stats in all_stats.items(): for month, s in stats.items(): - print(f" [DRY-RUN] {sales_name} {month}: 转化率={s['conv_rate']:.1f}% 退费率={s['refund_rate']:.1f}%") + print(f" [DRY-RUN] {sales_name} {month}: 有效析出={s['total']} 匹配={s['matched']} 转化={s['converted_all']} 转化率={s['conv_rate']:.1f}% 退费率={s['refund_rate']:.1f}%") else: write_all_stats(all_stats, dry_run) diff --git a/scripts/sales_leads_full_refresh.py b/scripts/sales_leads_full_refresh.py new file mode 100644 index 0000000..8e59c10 --- /dev/null +++ b/scripts/sales_leads_full_refresh.py @@ -0,0 +1,751 @@ +#!/usr/bin/env python3 +""" +销售线索全量刷新脚本 — XXTEA 精确匹配版 + +功能: + 1. 读取「小龙」「吴迪」「成都」三个 sheet 的 E 列手机号 + 2. XXTEA 加密 → bi_vala_app_account.tel_encrypt 精确匹配 → 获取 account_id + 3. 查询 PostgreSQL 获取用户订单/学习数据 + 4. 填写 H~V 列(自动列),V 列为操作更新时间 + 5. 将三个 sheet 中 K=是(已下单)的用户汇总到「订单汇总」sheet + +规则(沿用 S2 规则): + ① E→H: XXTEA 精确匹配, 查不到留空 + ② H→D/I/J: 只补空, 不覆盖已有值 + ③ K=是: 仅当 L(下单日) >= C(线索日期) + ④ 全额退清: 所有订单都退费 → K/O/P/Q 全部清空 + ⑤ O/P/Q 0 留空, P 整元 + ⑥ G 列不动 + +用法: + python3 scripts/sales_leads_full_refresh.py +""" + +import json, re, time, sys, os, requests, psycopg2 +from datetime import datetime +from collections import defaultdict + +SCRIPTS_DIR = os.path.dirname(os.path.abspath(__file__)) +WORKSPACE = os.path.dirname(SCRIPTS_DIR) +CRED_DIR = "/root/.openclaw/credentials/xiaoxi" + +sys.path.insert(0, SCRIPTS_DIR) +from phone_encrypt import encrypt_phone + +SPREADSHEET_TOKEN = "NoZqsFi47hIOHEt9j8WcfRtbnug" + +SALES_SHEETS = [ + ("qJF4I", "小龙", "A3:V2607"), + ("f975f0", "吴迪", "A3:V8149"), + ("qJF4J", "成都", "A3:V2500"), +] + +SUMMARY_SHEET_ID = "2smjwA" + +CS_MAP = {"吴迪": "吴迪", "小龙": "小龙", "Tom": "Tom", "Bob": "Bob"} + +GOODS_NAMES = { + 57: "瓦拉英语level1·单季", 60: "瓦拉英语level1", 63: "瓦拉英语level1·单季", + 31: "瓦拉英语年包", 32: "瓦拉英语单季度包", 33: "瓦拉英语level2", 54: "瓦拉英语季度包", + 61: "瓦拉英语level1+2", +} + +CHANNEL_MAP = { + "Apple App Store": "苹果", 
"科大讯飞学习机": "讯飞", "学而思学习机": "学而思", + "华为应用市场": "华为", "小米应用市场": "小米", "应用宝应用市场": "应用宝", + "希沃学习机": "希沃", "荣耀应用市场": "荣耀", "小度学习机": "小度", + "oppo应用市场": "OPPO", "vivo应用市场": "VIVO", "京东方学习机": "京东方", + "步步高学习机": "步步高", "作业帮学习机": "作业帮", "魅族应用市场": "魅族", + "官网": "官网", +} + +LOG_FILE = "/var/log/xiaoxi_full_refresh.log" + + +def log(msg): + ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + line = f"[{ts}] {msg}" + print(line) + with open(LOG_FILE, "a") as f: + f.write(line + "\n") + + +def get_secret(key): + with open(os.path.join(WORKSPACE, "secrets.env")) as f: + for line in f: + if line.startswith(f"{key}="): + return line.strip().split("=", 1)[1].strip("'\"") + + +def get_fs_token(): + with open(os.path.join(CRED_DIR, "config.json")) as f: + cfg = json.load(f) + resp = requests.post( + "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal", + json={"app_id": cfg["apps"][0]["appId"], "app_secret": cfg["apps"][0]["appSecret"]}, + timeout=15 + ) + return resp.json()["tenant_access_token"] + + +def read_sheet(token, sheet_id, range_str=None): + url = f"https://open.feishu.cn/open-apis/sheets/v2/spreadsheets/{SPREADSHEET_TOKEN}/values/{sheet_id}" + if range_str: + url += f"!{range_str}" + resp = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=30) + data = resp.json() + if data.get("code") != 0: + raise RuntimeError(f"读取失败 {sheet_id}: {data}") + return data["data"]["valueRange"]["values"] + + +def put_values(token, sheet_id, range_str, values): + url = f"https://open.feishu.cn/open-apis/sheets/v2/spreadsheets/{SPREADSHEET_TOKEN}/values" + body = {"valueRange": {"range": f"{sheet_id}!{range_str}", "values": values}} + resp = requests.put(url, headers={ + "Authorization": f"Bearer {token}", + "Content-Type": "application/json" + }, json=body, timeout=30) + r = resp.json() + if r.get("code") != 0: + log(f" ❌ {range_str}: code={r.get('code')} msg={r.get('msg')}") + return False + return True + + +def batch_in(cur, sql_tpl, params, 
chunk=500): + results = [] + for i in range(0, len(params), chunk): + batch = params[i:i + chunk] + ph = ",".join(["%s"] * len(batch)) + cur.execute(sql_tpl % ph, batch) + results.extend(cur.fetchall()) + return results + + +def safe_cell(row, idx): + """安全获取单元格值,数字转整数字符串""" + if len(row) > idx and row[idx] is not None: + try: + if isinstance(row[idx], (int, float)): + if row[idx] == int(row[idx]): + return str(int(row[idx])) + return str(row[idx]).strip() + except (ValueError, TypeError): + return str(row[idx]).strip() + return "" + + +def parse_date_str(s): + """'6月7日' → '2026-06-07', YYYY-MM-DD 原样返回""" + if not s: + return "" + s = s.strip() + if re.match(r'^\d{4}-\d{2}-\d{2}$', s): + return s + m = re.match(r'^(\d{1,2})月(\d{1,2})日$', s) + if m: + year = datetime.now().year + return f"{year}-{int(m.group(1)):02d}-{int(m.group(2)):02d}" + return s + + +# ═══ Step 1: 解析三个销售 sheet ═══ + +def parse_sales_sheets(token): + all_data = {} + for sid, sname, rng in SALES_SHEETS: + rows = read_sheet(token, sid, rng) + entries = [] + for idx, row in enumerate(rows): + row_num = idx + 3 + if not row or all(not cell for cell in row): + continue + + a_val = safe_cell(row, 0) + sales = None + for k, v in CS_MAP.items(): + if k in a_val: + sales = v + break + if not sales: + continue + + phone = "" + if len(row) > 4 and row[4]: + try: + phone = str(int(float(str(row[4])))) + except (ValueError, TypeError): + phone = str(row[4]).strip() + + entries.append({ + "row": row_num, + "sales": sales, + "nickname": safe_cell(row, 1), + "clue_date": safe_cell(row, 2), + "clue_date_parsed": parse_date_str(safe_cell(row, 2)), + "phone": phone, + "grade": safe_cell(row, 5), + "history": safe_cell(row, 6), + "existing": { + "D": safe_cell(row, 3), + "H": safe_cell(row, 7), + "I": safe_cell(row, 8), + "J": safe_cell(row, 9), + "K": safe_cell(row, 10), + "L": safe_cell(row, 11), + "M": safe_cell(row, 12), + "N": safe_cell(row, 13), + "O": safe_cell(row, 14), + "P": safe_cell(row, 15), + "Q": 
safe_cell(row, 16), + "R": safe_cell(row, 17), + "S": safe_cell(row, 18), + "T": safe_cell(row, 19), + "U": safe_cell(row, 20), + "V": safe_cell(row, 21), + }, + }) + + all_data[sid] = entries + phone_cnt = sum(1 for e in entries if re.match(r'^\d{11}$', e["phone"])) + uid_cnt = sum(1 for e in entries if e["existing"]["H"] and e["existing"]["H"].isdigit()) + log(f" [{sname}] {len(entries)}行, 手机号{phone_cnt}, 已有UID{uid_cnt}") + + return all_data + + +# ═══ Step 2: XXTEA 加密 → PG tel_encrypt 精确匹配 ═══ + +def phone_to_uid_xxtea(all_entries): + phone_set = set() + for entries in all_entries.values(): + for e in entries: + if re.match(r'^\d{11}$', e["phone"]): + phone_set.add(e["phone"]) + + if not phone_set: + log(" 无有效手机号") + return {} + + log(f" XXTEA 加密匹配: {len(phone_set)} 个唯一手机号") + + phone_enc_map = {} + for phone in phone_set: + try: + phone_enc_map[encrypt_phone(phone)] = phone + except Exception as ex: + log(f" 加密失败 {phone}: {ex}") + + log(f" 加密完成, 唯一密文: {len(phone_enc_map)}") + + conn = psycopg2.connect( + host="bj-postgres-16pob4sg.sql.tencentcdb.com", port=28591, + user="ai_member", password=get_secret("PG_ONLINE_PASSWORD"), + dbname="vala_bi", connect_timeout=30 + ) + cur = conn.cursor() + + enc_list = list(phone_enc_map.keys()) + phone_to_uid = {} + for i in range(0, len(enc_list), 500): + chunk = enc_list[i:i + 500] + ph = ",".join(["%s"] * len(chunk)) + cur.execute( + f"SELECT id, tel_encrypt FROM bi_vala_app_account " + f"WHERE tel_encrypt IN ({ph}) AND status=1 AND deleted_at IS NULL", + chunk + ) + for uid, tel_enc in cur.fetchall(): + plain = phone_enc_map.get(tel_enc) + if plain: + phone_to_uid[plain] = str(uid) + time.sleep(0.05) + + cur.close() + conn.close() + log(f" 精确匹配到 {len(phone_to_uid)} 个 UID (via XXTEA)") + return phone_to_uid + + +# ═══ Step 3: PostgreSQL 批量查询 ═══ + +def query_all_pg(all_entries, phone_map): + uid_set = set() + for entries in all_entries.values(): + for e in entries: + if re.match(r'^\d{11}$', e["phone"]) and e["phone"] in 
phone_map: + uid_set.add(int(phone_map[e["phone"]])) + h_val = e["existing"]["H"] + if h_val and h_val.isdigit() and int(h_val) > 0: + uid_set.add(int(h_val)) + + uid_list = list(uid_set) + log(f" 有效 user_id: {len(uid_list)}") + + if not uid_list: + return {} + + conn = psycopg2.connect( + host="bj-postgres-16pob4sg.sql.tencentcdb.com", port=28591, + user="ai_member", password=get_secret("PG_ONLINE_PASSWORD"), + dbname="vala_bi", connect_timeout=30 + ) + cur = conn.cursor() + + info = {uid: { + "reg_date": "", "download_channel": "", "trial_count": 0, + "has_order": False, "order_date": "", "order_channel": "", "product": "", + "gmv": 0, "refund": 0, "gsv": 0, + "activation": "", "lesson_progress": "", "lesson_time": "", "lesson_minutes": 0, + } for uid in uid_set} + + # 3a. 注册信息 + log(" 查询注册信息...") + reg_info = batch_in(cur, + "SELECT id, created_at, download_channel FROM bi_vala_app_account " + "WHERE id IN (%s) AND status=1 AND deleted_at IS NULL", + uid_list + ) + for aid, created_at, dc in reg_info: + if aid in info: + info[aid]["reg_date"] = created_at.strftime("%Y-%m-%d") if created_at else "" + raw_ch = dc or "" + info[aid]["download_channel"] = CHANNEL_MAP.get(raw_ch, raw_ch) + + # 3b. 体验节数 + log(" 查询体验节数...") + trial_info = batch_in(cur, + "SELECT account_id, COUNT(*) FROM bi_user_course_detail " + "WHERE account_id IN (%s) AND expire_time IS NULL AND deleted_at IS NULL " + "GROUP BY account_id", + uid_list + ) + for aid, cnt in trial_info: + if aid in info: + info[aid]["trial_count"] = cnt + + # 3c. 
订单信息 + log(" 查询订单信息...") + orders = batch_in(cur, + "SELECT account_id, trade_no, pay_success_date, key_from, goods_id, pay_amount_int, order_status " + "FROM bi_vala_order WHERE account_id IN (%s) AND pay_success_date IS NOT NULL " + "AND order_status IN (3,4) ORDER BY pay_success_date DESC", + uid_list + ) + user_orders = defaultdict(list) + for o in orders: + user_orders[o[0]].append(o) + + trade_nos = [o[1] for o in orders if o[1]] + refund_map = {} + if trade_nos: + refunds = batch_in(cur, + "SELECT trade_no, refund_amount_int FROM bi_refund_order " + "WHERE trade_no IN (%s) AND status=3", + trade_nos + ) + for tn, amt in refunds: + refund_map[tn] = amt + + for aid, olist in user_orders.items(): + if aid not in info: + continue + info[aid]["has_order"] = True + latest = olist[0] + info[aid]["order_date"] = latest[2].strftime("%Y-%m-%d") if latest[2] else "" + info[aid]["order_channel"] = latest[3] or "" + info[aid]["product"] = GOODS_NAMES.get(latest[4], f"商品{latest[4]}") + total_gmv = sum(o[5] for o in olist) / 100.0 + total_refund = sum(refund_map.get(o[1], 0) for o in olist) / 100.0 + info[aid]["gmv"] = total_gmv + info[aid]["refund"] = total_refund + info[aid]["gsv"] = total_gmv - total_refund + + # 3d. 激活课程 + log(" 查询激活课程...") + try: + activations = batch_in(cur, + "SELECT account_id, season_package_level FROM bi_vala_seasonal_ticket " + "WHERE account_id IN (%s) AND status=1 AND deleted_at IS NULL " + "AND season_package_level IN ('A1','A2')", + uid_list + ) + for aid, lvl in activations: + if aid in info: + info[aid]["activation"] = lvl + except Exception as ex: + log(f" 激活查询异常: {ex}") + + # 3e. 
角色信息 + log(" 查询角色信息...") + char_info = batch_in(cur, + "SELECT account_id, id FROM bi_vala_app_character " + "WHERE account_id IN (%s) AND deleted_at IS NULL", + uid_list + ) + account_chars = defaultdict(list) + char_to_account = {} + for aid, cid in char_info: + account_chars[aid].append(cid) + char_to_account[cid] = aid + char_ids = list(char_to_account.keys()) + log(f" 角色数: {len(char_ids)}") + + # 3f. 课程映射 + cur.execute("SELECT id, course_level, course_season, course_unit, course_lesson FROM bi_level_unit_lesson") + chapter_map = {} + for ch_id, cl, cs, cu, cl2 in cur.fetchall(): + chapter_map[ch_id] = (cl or "", cs or "", cu or "", cl2 or "") + + # 3g. 课时完成记录 + log(" 查询课时完成记录...") + char_plays = defaultdict(lambda: {"latest_time": None, "latest_chapter": None, "total_ms": 0}) + for tbl_idx in range(8): + table = f"bi_user_chapter_play_record_{tbl_idx}" + try: + cur.execute( + f"SELECT user_id, chapter_id, created_at FROM {table} " + f"WHERE play_status=1 AND deleted_at IS NULL AND user_id = ANY(%s)", + (char_ids,) + ) + for uid, ch_id, created_at in cur.fetchall(): + ch_data = chapter_map.get(ch_id) + if not ch_data: + continue + rec = char_plays[uid] + if rec["latest_time"] is None or created_at > rec["latest_time"]: + rec["latest_time"] = created_at + rec["latest_chapter"] = (ch_id, ch_data) + except Exception as ex: + log(f" 警告 {table}: {ex}") + + # 3h. 
学习总耗时 + log(" 查询学习耗时...") + for tbl_idx in range(8): + table = f"bi_user_component_play_record_{tbl_idx}" + try: + cur.execute( + f"SELECT user_id, SUM(COALESCE(interval_time,0)) FROM {table} " + f"WHERE user_id = ANY(%s) AND deleted_at IS NULL GROUP BY user_id", + (char_ids,) + ) + for uid, total_ms in cur.fetchall(): + if uid in char_plays: + char_plays[uid]["total_ms"] += (total_ms or 0) + except Exception as ex: + log(f" 警告 {table}: {ex}") + + cur.close() + conn.close() + + # 汇总到 account 级别 + for aid in uid_set: + chars = account_chars.get(aid, []) + best_time = None + best_ch = None + total_ms = 0 + for cid in chars: + play = char_plays.get(cid) + if not play: + continue + if play["latest_chapter"]: + if best_time is None or play["latest_time"] > best_time: + best_time = play["latest_time"] + best_ch = play["latest_chapter"] + total_ms += play["total_ms"] + + info[aid]["lesson_minutes"] = round(total_ms / 60000, 1) + if info[aid]["lesson_minutes"] == int(info[aid]["lesson_minutes"]): + info[aid]["lesson_minutes"] = int(info[aid]["lesson_minutes"]) + + if best_ch: + ch_id, (cl, cs, cu, cl2) = best_ch + info[aid]["lesson_progress"] = f"{cl}-{cs}-{cu}-{cl2}" + info[aid]["lesson_time"] = best_time.strftime("%Y-%m-%d") if best_time else "" + + log(f" 数据库查询完成") + return info + + +# ═══ Step 4: 写入销售三表 H~V 列 ═══ + +def write_sales_sheets(token, all_entries, phone_map, db_info): + now_str = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + + for sid, sname, _ in SALES_SHEETS: + entries = all_entries[sid] + log(f" 写入 {sname} ({sid})...") + + # 按连续行分组 + groups = [] + cur_grp = [] + for e in entries: + if not cur_grp or e["row"] == cur_grp[-1]["row"] + 1: + cur_grp.append(e) + else: + groups.append(cur_grp) + cur_grp = [e] + if cur_grp: + groups.append(cur_grp) + + for g in groups: + sr, er = g[0]["row"], g[-1]["row"] + + d_vals, h_vals, i_vals, j_vals = [], [], [], [] + k_vals, l_vals, m_vals, n_vals = [], [], [], [] + o_vals, p_vals, q_vals, r_vals = [], [], [], [] + 
s_vals, t_vals, u_vals, v_vals = [], [], [], [] + + for e in g: + phone = e["phone"] + existing = e["existing"] + clue_date = e["clue_date_parsed"] + + # 确定 UID + aid = 0 + uid_str = "" + if re.match(r'^\d{11}$', phone) and phone in phone_map: + uid_str = phone_map[phone] + aid = int(uid_str) + elif existing["H"] and existing["H"].isdigit() and int(existing["H"]) > 0: + uid_str = existing["H"] + aid = int(existing["H"]) + + # H: UID — XXTEA 匹配到就写,否则留空 + if re.match(r'^\d{11}$', phone) and phone in phone_map: + h_vals.append([phone_map[phone]]) + elif re.match(r'^\d{11}$', phone): + h_vals.append([""]) + elif existing["H"] and existing["H"].isdigit(): + h_vals.append([existing["H"]]) + else: + h_vals.append([""]) + + if aid > 0 and aid in db_info: + di = db_info[aid] + + # D: 体验节数 — 只补空 + if existing["D"]: + d_vals.append([existing["D"]]) + else: + tc = di["trial_count"] + d_vals.append([tc if tc > 0 else ""]) + + # I: 注册日 — 只补空 + if existing["I"]: + i_vals.append([existing["I"]]) + else: + i_vals.append([di["reg_date"]]) + + # J: 下载渠道 — 只补空 + if existing["J"]: + j_vals.append([existing["J"]]) + else: + j_vals.append([di["download_channel"]]) + + # 全额退清判定 + gmv_int = int(di["gmv"]) + refund_int = int(di["refund"]) + gsv_int = int(di["gsv"]) + is_full_refund = (gmv_int > 0 and gmv_int == refund_int) + + # K=是: 有订单 且 非全额退清 且 L(下单日) >= C(线索日期) + order_date = di["order_date"] + should_k_yes = di["has_order"] and not is_full_refund + + # 日期比较: L >= C + if should_k_yes and clue_date and order_date: + if order_date < clue_date: + should_k_yes = False + + if is_full_refund: + k_vals.append([""]) + o_vals.append([""]) + p_vals.append([""]) + q_vals.append([""]) + else: + k_vals.append(["是" if should_k_yes else ""]) + o_vals.append([gmv_int if gmv_int > 0 else ""]) + p_vals.append([refund_int if refund_int > 0 else ""]) + q_vals.append([gsv_int if gsv_int > 0 else ""]) + + l_vals.append([order_date]) + m_vals.append([di["order_channel"]]) + n_vals.append([di["product"] if 
di["has_order"] else ""]) + + act = di["activation"] + if act: + r_vals.append([f"{act}体验课" if act in ("A1", "A2") else act]) + else: + r_vals.append([""]) + + s_vals.append([di["lesson_progress"] if di["lesson_progress"] else ""]) + t_vals.append([di["lesson_time"]]) + lm = di["lesson_minutes"] + u_vals.append([lm if lm > 0 else ""]) + else: + for arr in [d_vals, i_vals, j_vals, k_vals, l_vals, m_vals, n_vals, + o_vals, p_vals, q_vals, r_vals, s_vals, t_vals, u_vals]: + arr.append([""]) + + v_vals.append([now_str]) + + cols = [ + ("D", d_vals), ("H", h_vals), ("I", i_vals), ("J", j_vals), + ("K", k_vals), ("L", l_vals), ("M", m_vals), ("N", n_vals), + ("O", o_vals), ("P", p_vals), ("Q", q_vals), ("R", r_vals), + ("S", s_vals), ("T", t_vals), ("U", u_vals), ("V", v_vals), + ] + for col_letter, vals in cols: + put_values(token, sid, f"{col_letter}{sr}:{col_letter}{er}", vals) + time.sleep(0.1) + + log(f" {sname}: {len(entries)} 行写入完成") + + +# ═══ Step 5: 汇总到「订单汇总」sheet ═══ + +def write_summary_sheet(token, all_entries, phone_map, db_info): + """ + 将三个销售 sheet 中 K=是(已下单)的行汇总到「订单汇总」sheet。 + 订单汇总 sheet 的列结构(A~X): + A: 销售归属, B: 微信昵称, C: 进线日期, D: 体验节数, E: 手机号, + F: 用户年级, G: 课史/跟进, H: 用户ID, I: 注册日期, J: 下载渠道, + K: 是否下单, L: 下单日期, M: 成交渠道, N: 产品, + O: 下单金额(GMV), P: 退款金额, Q: 实际收入(GSV), R: 激活课程, + S: 当前行课进度, T: 最近行课时间, U: 累计学习时长(min), V: 更新时间, + W: 渠道归属(公式), X: 有效成单(公式) + """ + log(" 汇总订单数据...") + + # 收集所有 K=是 的行 + summary_rows = [] + for sid, sname, _ in SALES_SHEETS: + entries = all_entries[sid] + for e in entries: + phone = e["phone"] + existing = e["existing"] + + # 确定 UID 和 db_info + aid = 0 + if re.match(r'^\d{11}$', phone) and phone in phone_map: + aid = int(phone_map[phone]) + elif existing["H"] and existing["H"].isdigit() and int(existing["H"]) > 0: + aid = int(existing["H"]) + + di = db_info.get(aid, {}) if aid > 0 else {} + + # 判断是否下单 + gmv_int = int(di.get("gmv", 0)) + refund_int = int(di.get("refund", 0)) + gsv_int = int(di.get("gsv", 0)) + is_full_refund = 
(gmv_int > 0 and gmv_int == refund_int) + has_order = di.get("has_order", False) and not is_full_refund + + # 日期比较 + order_date = di.get("order_date", "") + clue_date = e["clue_date_parsed"] + if has_order and clue_date and order_date: + if order_date < clue_date: + has_order = False + + if not has_order: + continue + + # 构建汇总行 + row_data = [ + e["sales"], # A: 销售归属 + e["nickname"], # B: 微信昵称 + e["clue_date"], # C: 进线日期 + di.get("trial_count", 0) or "", # D: 体验节数 + phone, # E: 手机号 + e["grade"], # F: 用户年级 + e["history"], # G: 课史/跟进 + str(aid) if aid > 0 else "", # H: 用户ID + di.get("reg_date", ""), # I: 注册日期 + di.get("download_channel", ""), # J: 下载渠道 + "是", # K: 是否下单 + order_date, # L: 下单日期 + di.get("order_channel", ""), # M: 成交渠道 + di.get("product", ""), # N: 产品 + gmv_int if gmv_int > 0 else "", # O: GMV + refund_int if refund_int > 0 else "", # P: 退款金额 + gsv_int if gsv_int > 0 else "", # Q: GSV + (f"{di['activation']}体验课" if di.get("activation") in ("A1", "A2") else di.get("activation", "")), # R: 激活课程 + di.get("lesson_progress", ""), # S: 行课进度 + di.get("lesson_time", ""), # T: 最近行课时间 + di.get("lesson_minutes", 0) or "", # U: 学习时长 + datetime.now().strftime("%Y-%m-%d %H:%M:%S"), # V: 更新时间 + # W: 渠道归属 — 公式 + # X: 有效成单 — 公式 + ] + summary_rows.append(row_data) + + log(f" 共 {len(summary_rows)} 条下单记录待汇总") + + if not summary_rows: + log(" 无下单记录,跳过汇总") + return + + # 写入订单汇总 sheet(从第3行开始,覆盖 A~V 列,W/X 列保留公式) + now_str = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + + # 分批写入,每批最多 500 行 + chunk_size = 500 + for chunk_start in range(0, len(summary_rows), chunk_size): + chunk = summary_rows[chunk_start:chunk_start + chunk_size] + start_row = chunk_start + 3 # 从第3行开始 + + # 构建 A~V 的值数组(22列) + values = [] + for row_data in chunk: + # 确保每行22列(A~V) + padded = row_data[:22] + while len(padded) < 22: + padded.append("") + values.append(padded) + + range_str = f"A{start_row}:V{start_row + len(chunk) - 1}" + put_values(token, SUMMARY_SHEET_ID, range_str, values) + time.sleep(0.2) + 
log(f" 写入 A{start_row}:V{start_row + len(chunk) - 1} ({len(chunk)}行)") + + log(f" 订单汇总写入完成, 共 {len(summary_rows)} 行") + + +# ═══ Main ═══ + +def main(): + log("=" * 60) + log("销售线索全量刷新 (XXTEA精确匹配版) 启动") + + try: + token = get_fs_token() + + log("Step 1: 解析销售三表") + all_entries = parse_sales_sheets(token) + + log("Step 2: XXTEA 加密 → PG tel_encrypt 精确匹配") + phone_map = phone_to_uid_xxtea(all_entries) + + log("Step 3: PostgreSQL 批量查询") + db_info = query_all_pg(all_entries, phone_map) + + log("Step 4: 写入销售三表 H~V 列") + write_sales_sheets(token, all_entries, phone_map, db_info) + + log("Step 5: 汇总到「订单汇总」sheet") + write_summary_sheet(token, all_entries, phone_map, db_info) + + log("✅ 全量刷新完成") + return 0 + except Exception as e: + log(f"❌ ERROR: {e}") + import traceback + traceback.print_exc() + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/scripts/ti_pool_split_20260608.py b/scripts/ti_pool_split_20260608.py new file mode 100644 index 0000000..76197c4 --- /dev/null +++ b/scripts/ti_pool_split_20260608.py @@ -0,0 +1,146 @@ +#!/usr/bin/env python3 +""" +TI沉淀/新进拆分 - 线索明细补全 +需求:基于 xiaoxi_xhs_lead_detail.csv 进线月 3-5 月,join bi_vala_app_account 补全字段 +输出:ti_pool_split_20260608.csv +""" + +import csv +import psycopg2 +from datetime import datetime, date +from collections import defaultdict + +# ── 1. 读取线索明细 ────────────────────────────────────────── +leads = [] +with open('/root/.openclaw/workspace/tmp/xiaoxi_xhs_lead_detail.csv', 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + intake_month = row.get('进线月', '').strip() + if intake_month in ('2026-03', '2026-04', '2026-05'): + leads.append(row) + +print(f"进线月 3-5 月共 {len(leads)} 条线索") + +# ── 2. 收集所有用户ID ──────────────────────────────────────── +user_ids = set() +for row in leads: + uid = row.get('用户ID', '').strip() + if uid and uid.isdigit(): + user_ids.add(int(uid)) + +print(f"去重用户ID: {len(user_ids)} 个") + +# ── 3. 
查询 bi_vala_app_account ────────────────────────────── +conn = psycopg2.connect( + host='bj-postgres-16pob4sg.sql.tencentcdb.com', + port=28591, + user='ai_member', + password='LdfjdjL83h3h3^$&**YGG*', + dbname='vala_bi' +) +cur = conn.cursor() + +# 批量查询 +account_map = {} # id -> {created_at, key_from} +batch_size = 500 +uid_list = list(user_ids) +for i in range(0, len(uid_list), batch_size): + batch = uid_list[i:i+batch_size] + placeholders = ','.join(['%s'] * len(batch)) + cur.execute(f""" + SELECT id, created_at, key_from + FROM bi_vala_app_account + WHERE id IN ({placeholders}) + """, batch) + for row in cur.fetchall(): + account_map[row[0]] = { + 'created_at': row[1], + 'key_from': row[2] or '' + } + +print(f"bi_vala_app_account 匹配到 {len(account_map)} 个用户") + +# ── 4. 计算 prior_lead_same_phone ──────────────────────────── +# 读取全部线索(含2月/6月),按手机号找最早进线月 +all_leads = [] +with open('/root/.openclaw/workspace/tmp/xiaoxi_xhs_lead_detail.csv', 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + all_leads.append(row) + +phone_earliest_month = {} +for row in all_leads: + phone = row.get('手机号', '').strip() + intake_month = row.get('进线月', '').strip() + if phone and intake_month: + if phone not in phone_earliest_month or intake_month < phone_earliest_month[phone]: + phone_earliest_month[phone] = intake_month + +print(f"去重手机号: {len(phone_earliest_month)} 个") + +# ── 5. 
Assemble output ──────────────────────────────────────────────
+output_rows = []
+for row in leads:
+    uid_str = row.get('用户ID', '').strip()
+    uid = int(uid_str) if uid_str.isdigit() else None
+    phone = row.get('手机号', '').strip()
+    intake_month = row.get('进线月', '').strip()
+    intake_date_str = row.get('进线日期', '').strip()
+
+    acct = account_map.get(uid, {})
+    create_time = acct.get('created_at')
+    key_from = acct.get('key_from', '')
+
+    # register_before_intake: whether registration predates the first day of the intake month
+    register_before_intake = ''
+    days_register_to_intake = ''
+    if create_time and intake_month:
+        try:
+            intake_first_day = datetime.strptime(intake_month + '-01', '%Y-%m-%d').date()
+            create_date = create_time.date() if hasattr(create_time, 'date') else datetime.strptime(str(create_time)[:10], '%Y-%m-%d').date()
+            register_before_intake = '是' if create_date < intake_first_day else '否'
+            # days from registration to intake date
+            if intake_date_str:
+                intake_date = datetime.strptime(intake_date_str, '%Y-%m-%d').date()
+                days_register_to_intake = (intake_date - create_date).days
+        except (ValueError, TypeError):
+            pass
+
+    # prior_lead_same_phone: whether the same phone left an earlier lead before the intake month
+    prior_lead_same_phone = ''
+    if phone and intake_month and phone in phone_earliest_month:
+        earliest = phone_earliest_month[phone]
+        prior_lead_same_phone = '是' if earliest < intake_month else '否'
+
+    out = dict(row)
+    out['create_time'] = str(create_time)[:19] if create_time else ''
+    out['key_from'] = key_from
+    out['register_before_intake'] = register_before_intake
+    out['days_register_to_intake'] = days_register_to_intake
+    out['prior_lead_same_phone'] = prior_lead_same_phone
+    output_rows.append(out)
+
+# ── 6. 
Write CSV ────────────────────────────────────────────────
+fieldnames = list(leads[0].keys()) + ['create_time', 'key_from', 'register_before_intake', 'days_register_to_intake', 'prior_lead_same_phone']
+
+with open('/root/.openclaw/workspace/output/ti_pool_split_20260608.csv', 'w', encoding='utf-8-sig', newline='') as f:
+    writer = csv.DictWriter(f, fieldnames=fieldnames)
+    writer.writeheader()
+    writer.writerows(output_rows)
+
+print(f"\nOutput: output/ti_pool_split_20260608.csv, {len(output_rows)} rows")
+
+# ── 7. Summary statistics ──────────────────────────────────────────────
+reg_before = sum(1 for r in output_rows if r['register_before_intake'] == '是')
+reg_after = sum(1 for r in output_rows if r['register_before_intake'] == '否')
+reg_unknown = sum(1 for r in output_rows if r['register_before_intake'] == '')
+prior_yes = sum(1 for r in output_rows if r['prior_lead_same_phone'] == '是')
+prior_no = sum(1 for r in output_rows if r['prior_lead_same_phone'] == '否')
+prior_unknown = sum(1 for r in output_rows if r['prior_lead_same_phone'] == '')
+
+print(f"\n── Summary ──")
+print(f"Registered before intake month: yes={reg_before}, no={reg_after}, unknown={reg_unknown}")
+print(f"Earlier lead with same phone: yes={prior_yes}, no={prior_no}, unknown={prior_unknown}")
+
+cur.close()
+conn.close()
diff --git a/scripts/ti_pool_split_20260608_v2.py b/scripts/ti_pool_split_20260608_v2.py
new file mode 100644
index 0000000..f7c6eb0
--- /dev/null
+++ b/scripts/ti_pool_split_20260608_v2.py
@@ -0,0 +1,183 @@
+#!/usr/bin/env python3
+"""
+TI pool / new-lead split - lead detail enrichment v2
+Based on xiaoxi_xhs_lead_detail.csv, intake months March-May; joined with bi_vala_app_account
+Join keys: phone -> tel (primary), UID -> id (fallback)
+Output: ti_pool_split_20260608_v2.csv
+"""
+
+import csv
+import psycopg2
+from datetime import datetime, date
+from collections import defaultdict
+
+# ── 1. 
读取线索明细 ────────────────────────────────────────── +leads = [] +with open('/root/.openclaw/workspace/tmp/xiaoxi_xhs_lead_detail.csv', 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + intake_month = row.get('进线月', '').strip() + if intake_month in ('2026-03', '2026-04', '2026-05'): + leads.append(row) + +print(f"进线月 3-5 月共 {len(leads)} 条线索") + +# ── 2. 收集所有手机号和用户ID ──────────────────────────────── +phones = set() +user_ids = set() +for row in leads: + phone = row.get('手机号', '').strip() + uid_str = row.get('用户ID', '').strip() + if phone: + phones.add(phone) + if uid_str and uid_str.isdigit(): + user_ids.add(int(uid_str)) + +print(f"去重手机号: {len(phones)} 个, 去重用户ID: {len(user_ids)} 个") + +# ── 3. 查询 bi_vala_app_account ────────────────────────────── +conn = psycopg2.connect( + host='bj-postgres-16pob4sg.sql.tencentcdb.com', + port=28591, + user='ai_member', + password='LdfjdjL83h3h3^$&**YGG*', + dbname='vala_bi' +) +cur = conn.cursor() + +# 3a. 按 tel 匹配 +tel_map = {} # tel -> {id, created_at, key_from} +batch_size = 500 +phone_list = list(phones) +for i in range(0, len(phone_list), batch_size): + batch = phone_list[i:i+batch_size] + placeholders = ','.join(['%s'] * len(batch)) + cur.execute(f""" + SELECT id, tel, created_at, key_from + FROM bi_vala_app_account + WHERE tel IN ({placeholders}) + """, batch) + for row in cur.fetchall(): + tel = row[1] + if tel: + tel_map[tel] = {'id': row[0], 'created_at': row[2], 'key_from': row[3] or ''} + +print(f"tel 匹配到 {len(tel_map)} 个用户") + +# 3b. 
Match by id (fills in leads not matched via tel)
+uid_list = list(user_ids)
+id_map = {}  # id -> {created_at, key_from}
+for i in range(0, len(uid_list), batch_size):
+    batch = uid_list[i:i+batch_size]
+    placeholders = ','.join(['%s'] * len(batch))
+    cur.execute(f"""
+        SELECT id, created_at, key_from
+        FROM bi_vala_app_account
+        WHERE id IN ({placeholders})
+    """, batch)
+    for row in cur.fetchall():
+        id_map[row[0]] = {'created_at': row[1], 'key_from': row[2] or ''}
+
+print(f"id matched {len(id_map)} users")
+
+# ── 4. Compute prior_lead_same_phone ────────────────────────────
+all_leads = []
+with open('/root/.openclaw/workspace/tmp/xiaoxi_xhs_lead_detail.csv', 'r', encoding='utf-8') as f:
+    reader = csv.DictReader(f)
+    for row in reader:
+        all_leads.append(row)
+
+phone_earliest_month = {}
+for row in all_leads:
+    phone = row.get('手机号', '').strip()
+    intake_month = row.get('进线月', '').strip()
+    if phone and intake_month:
+        if phone not in phone_earliest_month or intake_month < phone_earliest_month[phone]:
+            phone_earliest_month[phone] = intake_month
+
+print(f"Distinct phone numbers across all leads: {len(phone_earliest_month)}")
+
+# ── 5. 
Assemble output ──────────────────────────────────────────────
+output_rows = []
+for row in leads:
+    phone = row.get('手机号', '').strip()
+    uid_str = row.get('用户ID', '').strip()
+    uid = int(uid_str) if uid_str.isdigit() else None
+    intake_month = row.get('进线月', '').strip()
+    intake_date_str = row.get('进线日期', '').strip()
+
+    # Join logic: prefer tel, fall back to id
+    acct = None
+    join_method = ''
+    if phone and phone in tel_map:
+        acct = tel_map[phone]
+        join_method = 'tel'
+    elif uid and uid in id_map:
+        acct = id_map[uid]
+        join_method = 'id'
+
+    create_time = acct['created_at'] if acct else None
+    key_from = acct['key_from'] if acct else ''
+
+    # register_before_intake: Y/N
+    register_before_intake = ''
+    days_register_to_intake = ''
+    if create_time and intake_month:
+        try:
+            intake_first_day = datetime.strptime(intake_month + '-01', '%Y-%m-%d').date()
+            create_date = create_time.date() if hasattr(create_time, 'date') else datetime.strptime(str(create_time)[:10], '%Y-%m-%d').date()
+            register_before_intake = 'Y' if create_date < intake_first_day else 'N'
+            if intake_date_str:
+                intake_date = datetime.strptime(intake_date_str, '%Y-%m-%d').date()
+                days_register_to_intake = (intake_date - create_date).days
+        except (ValueError, TypeError):
+            pass
+
+    # prior_lead_same_phone: Y/N
+    prior_lead_same_phone = ''
+    if phone and intake_month and phone in phone_earliest_month:
+        earliest = phone_earliest_month[phone]
+        prior_lead_same_phone = 'Y' if earliest < intake_month else 'N'
+
+    out = dict(row)
+    out['create_time'] = str(create_time)[:19] if create_time else ''
+    out['key_from'] = key_from
+    out['register_before_intake'] = register_before_intake
+    out['days_register_to_intake'] = days_register_to_intake
+    out['prior_lead_same_phone'] = prior_lead_same_phone
+    out['join_method'] = join_method
+    output_rows.append(out)
+
+# ── 6. 
Write CSV ────────────────────────────────────────────────
+fieldnames = list(leads[0].keys()) + ['create_time', 'key_from', 'register_before_intake', 'days_register_to_intake', 'prior_lead_same_phone', 'join_method']
+
+with open('/root/.openclaw/workspace/output/ti_pool_split_20260608_v2.csv', 'w', encoding='utf-8-sig', newline='') as f:
+    writer = csv.DictWriter(f, fieldnames=fieldnames)
+    writer.writeheader()
+    writer.writerows(output_rows)
+
+print(f"\nOutput: output/ti_pool_split_20260608_v2.csv, {len(output_rows)} rows")
+
+# ── 7. Summary statistics ──────────────────────────────────────────────
+reg_y = sum(1 for r in output_rows if r['register_before_intake'] == 'Y')
+reg_n = sum(1 for r in output_rows if r['register_before_intake'] == 'N')
+reg_empty = sum(1 for r in output_rows if r['register_before_intake'] == '')
+prior_y = sum(1 for r in output_rows if r['prior_lead_same_phone'] == 'Y')
+prior_n = sum(1 for r in output_rows if r['prior_lead_same_phone'] == 'N')
+prior_empty = sum(1 for r in output_rows if r['prior_lead_same_phone'] == '')
+tel_join = sum(1 for r in output_rows if r['join_method'] == 'tel')
+id_join = sum(1 for r in output_rows if r['join_method'] == 'id')
+no_join = sum(1 for r in output_rows if r['join_method'] == '')
+
+print(f"\n── Summary ──")
+print(f"Join method: tel={tel_join}, id={id_join}, unmatched={no_join}")
+print(f"Registered before intake month: Y={reg_y}, N={reg_n}, empty={reg_empty}")
+print(f"Earlier lead with same phone: Y={prior_y}, N={prior_n}, empty={prior_empty}")
+
+# Distribution by intake month
+from collections import Counter
+months = Counter(r['进线月'] for r in output_rows)
+print(f"Intake month distribution: {dict(months)}")
+
+cur.close()
+conn.close()
diff --git a/scripts/ti_pool_split_20260608_v3.py b/scripts/ti_pool_split_20260608_v3.py
new file mode 100644
index 0000000..312deb2
--- /dev/null
+++ b/scripts/ti_pool_split_20260608_v3.py
@@ -0,0 +1,197 @@
+#!/usr/bin/env python3
+"""
+TI pool / new-lead split - lead detail enrichment v3
+Based on xiaoxi_xhs_lead_detail.csv, intake months March-May; joined with bi_vala_app_account
+Join keys: phone, XXTEA-encrypted -> tel_encrypt (primary), UID -> id (fallback)
+Output: ti_pool_split_20260608_v3.csv
+"""
+
+import csv
+import sys
+sys.path.insert(0, '/root/.openclaw/workspace/scripts')
+from phone_encrypt import encrypt_phone
+import psycopg2
+from datetime import datetime
+from collections import defaultdict
+
+# ── 1. Read lead details ──────────────────────────────────────────
+leads = []
+with open('/root/.openclaw/workspace/tmp/xiaoxi_xhs_lead_detail.csv', 'r', encoding='utf-8') as f:
+    reader = csv.DictReader(f)
+    for row in reader:
+        intake_month = row.get('进线月', '').strip()
+        if intake_month in ('2026-03', '2026-04', '2026-05'):
+            leads.append(row)
+
+print(f"进线月 3-5 月共 {len(leads)} 条线索")
+
+# ── 2. Collect all phone numbers and user IDs ────────────────────────────────
+phones = set()
+user_ids = set()
+for row in leads:
+    phone = row.get('手机号', '').strip()
+    uid_str = row.get('用户ID', '').strip()
+    if phone:
+        phones.add(phone)
+    if uid_str and uid_str.isdigit():
+        user_ids.add(int(uid_str))
+
+print(f"去重手机号: {len(phones)} 个, 去重用户ID: {len(user_ids)} 个")
+
+# ── 3. Encrypt phone numbers ────────────────────────────────────────────
+phone_enc_map = {}  # ciphertext -> plaintext
+for p in phones:
+    try:
+        enc = encrypt_phone(p)
+        phone_enc_map[enc] = p
+    except Exception as e:
+        print(f" 加密失败: {p}: {e}")
+
+print(f"加密成功: {len(phone_enc_map)} 个")
+
+# ── 4. Query bi_vala_app_account ──────────────────────────────
+conn = psycopg2.connect(
+    host='bj-postgres-16pob4sg.sql.tencentcdb.com',
+    port=28591,
+    user='ai_member',
+    password='LdfjdjL83h3h3^$&**YGG*',
+    dbname='vala_bi'
+)
+cur = conn.cursor()
+
+# 4a. 
Match by tel_encrypt
+enc_map = {}  # plaintext phone -> {id, created_at, key_from}
+batch_size = 500
+enc_list = list(phone_enc_map.keys())
+for i in range(0, len(enc_list), batch_size):
+    batch = enc_list[i:i+batch_size]
+    placeholders = ','.join(['%s'] * len(batch))
+    cur.execute(f"""
+        SELECT id, tel_encrypt, created_at, key_from
+        FROM bi_vala_app_account
+        WHERE tel_encrypt IN ({placeholders})
+    """, batch)
+    for row in cur.fetchall():
+        enc_val = row[1]
+        if enc_val in phone_enc_map:
+            plain = phone_enc_map[enc_val]
+            enc_map[plain] = {'id': row[0], 'created_at': row[2], 'key_from': row[3] or ''}
+
+print(f"tel_encrypt 匹配到 {len(enc_map)} 个用户")
+
+# 4b. Match by id (supplementary)
+uid_list = list(user_ids)
+id_map = {}
+for i in range(0, len(uid_list), batch_size):
+    batch = uid_list[i:i+batch_size]
+    placeholders = ','.join(['%s'] * len(batch))
+    cur.execute(f"""
+        SELECT id, created_at, key_from
+        FROM bi_vala_app_account
+        WHERE id IN ({placeholders})
+    """, batch)
+    for row in cur.fetchall():
+        id_map[row[0]] = {'created_at': row[1], 'key_from': row[2] or ''}
+
+print(f"id 匹配到 {len(id_map)} 个用户")
+
+# ── 5. Compute prior_lead_same_phone ────────────────────────────
+all_leads = []
+with open('/root/.openclaw/workspace/tmp/xiaoxi_xhs_lead_detail.csv', 'r', encoding='utf-8') as f:
+    reader = csv.DictReader(f)
+    for row in reader:
+        all_leads.append(row)
+
+phone_earliest_month = {}
+for row in all_leads:
+    phone = row.get('手机号', '').strip()
+    intake_month = row.get('进线月', '').strip()
+    if phone and intake_month:
+        if phone not in phone_earliest_month or intake_month < phone_earliest_month[phone]:
+            phone_earliest_month[phone] = intake_month
+
+print(f"全部线索去重手机号: {len(phone_earliest_month)} 个")
+
+# ── 6. 
Assemble output ──────────────────────────────────────────────
+output_rows = []
+for row in leads:
+    phone = row.get('手机号', '').strip()
+    uid_str = row.get('用户ID', '').strip()
+    uid = int(uid_str) if uid_str.isdigit() else None
+    intake_month = row.get('进线月', '').strip()
+    intake_date_str = row.get('进线日期', '').strip()
+
+    # Join logic: prefer tel_encrypt, fall back to id
+    acct = None
+    join_method = ''
+    if phone and phone in enc_map:
+        acct = enc_map[phone]
+        join_method = 'tel'
+    elif uid and uid in id_map:
+        acct = id_map[uid]
+        join_method = 'id'
+
+    create_time = acct['created_at'] if acct else None
+    key_from = acct['key_from'] if acct else ''
+
+    # register_before_intake: Y/N
+    register_before_intake = ''
+    days_register_to_intake = ''
+    if create_time and intake_month:
+        try:
+            intake_first_day = datetime.strptime(intake_month + '-01', '%Y-%m-%d').date()
+            create_date = create_time.date() if hasattr(create_time, 'date') else datetime.strptime(str(create_time)[:10], '%Y-%m-%d').date()
+            register_before_intake = 'Y' if create_date < intake_first_day else 'N'
+            if intake_date_str:
+                intake_date = datetime.strptime(intake_date_str, '%Y-%m-%d').date()
+                days_register_to_intake = (intake_date - create_date).days
+        except ValueError:  # malformed date string: keep whatever was computed so far
+            pass
+
+    # prior_lead_same_phone: Y/N
+    prior_lead_same_phone = ''
+    if phone and intake_month and phone in phone_earliest_month:
+        earliest = phone_earliest_month[phone]
+        prior_lead_same_phone = 'Y' if earliest < intake_month else 'N'
+
+    out = dict(row)
+    out['create_time'] = str(create_time)[:19] if create_time else ''
+    out['key_from'] = key_from
+    out['register_before_intake'] = register_before_intake
+    out['days_register_to_intake'] = days_register_to_intake
+    out['prior_lead_same_phone'] = prior_lead_same_phone
+    out['join_method'] = join_method
+    output_rows.append(out)
+
+# ── 7. 
Write CSV ────────────────────────────────────────────────
+fieldnames = list(leads[0].keys()) + ['create_time', 'key_from', 'register_before_intake', 'days_register_to_intake', 'prior_lead_same_phone', 'join_method']
+
+with open('/root/.openclaw/workspace/output/ti_pool_split_20260608_v3.csv', 'w', encoding='utf-8-sig', newline='') as f:
+    writer = csv.DictWriter(f, fieldnames=fieldnames)
+    writer.writeheader()
+    writer.writerows(output_rows)
+
+print(f"\n输出: output/ti_pool_split_20260608_v3.csv, {len(output_rows)} 行")
+
+# ── 8. Summary statistics ──────────────────────────────────────────────
+reg_y = sum(1 for r in output_rows if r['register_before_intake'] == 'Y')
+reg_n = sum(1 for r in output_rows if r['register_before_intake'] == 'N')
+reg_empty = sum(1 for r in output_rows if r['register_before_intake'] == '')
+prior_y = sum(1 for r in output_rows if r['prior_lead_same_phone'] == 'Y')
+prior_n = sum(1 for r in output_rows if r['prior_lead_same_phone'] == 'N')
+prior_empty = sum(1 for r in output_rows if r['prior_lead_same_phone'] == '')
+tel_join = sum(1 for r in output_rows if r['join_method'] == 'tel')
+id_join = sum(1 for r in output_rows if r['join_method'] == 'id')
+no_join = sum(1 for r in output_rows if r['join_method'] == '')
+
+print(f"\n── 统计摘要 ──")
+print(f"Join方式: tel={tel_join}, id={id_join}, 未匹配={no_join}")
+print(f"注册早于进线月: Y={reg_y}, N={reg_n}, 空={reg_empty}")
+print(f"同手机号早前进线: Y={prior_y}, N={prior_n}, 空={prior_empty}")
+
+from collections import Counter
+months = Counter(r['进线月'] for r in output_rows)
+print(f"进线月分布: {dict(months)}")
+
+cur.close()
+conn.close()
diff --git a/scripts/verify_disputed_orders.py b/scripts/verify_disputed_orders.py
new file mode 100644
index 0000000..89ddc27
--- /dev/null
+++ b/scripts/verify_disputed_orders.py
@@ -0,0 +1,168 @@
+#!/usr/bin/env python3
+"""
+Disputed-order verification: write the processing status back to the Feishu sheet
+"""
+import json, requests, os
+
+CRED_DIR = "/root/.openclaw/credentials/xiaoxi"
+WORKSPACE = 
os.path.dirname(os.path.abspath(__file__)).rsplit('/', 1)[0]
+
+def get_secret(key):
+    with open(os.path.join(WORKSPACE, "secrets.env")) as f:
+        for line in f:
+            if line.startswith(f"{key}="):
+                return line.strip().split("=", 1)[1].strip("'\"")
+
+def get_fs_token():
+    with open(os.path.join(CRED_DIR, "config.json")) as f:
+        cfg = json.load(f)
+    resp = requests.post(
+        "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal",
+        json={"app_id": cfg["apps"][0]["appId"], "app_secret": cfg["apps"][0]["appSecret"]},
+        timeout=15
+    )
+    return resp.json()["tenant_access_token"]
+
+SPREADSHEET_TOKEN = "NoZqsFi47hIOHEt9j8WcfRtbnug"
+SHEET_ID = "1jpQNa"
+
+def put_values(token, range_str, values):
+    url = f"https://open.feishu.cn/open-apis/sheets/v2/spreadsheets/{SPREADSHEET_TOKEN}/values"
+    body = {"valueRange": {"range": f"{SHEET_ID}!{range_str}", "values": values}}
+    resp = requests.put(url, headers={
+        "Authorization": f"Bearer {token}",
+        "Content-Type": "application/json"
+    }, json=body, timeout=30)
+    r = resp.json()
+    if r.get("code") != 0:
+        print(f" ❌ {range_str}: {r.get('code')} {r.get('msg')}")
+        return False
+    return True
+
+# Verification results: (seq, processing status, remarks)
+# Class A, 12 orders: all 口径OK (metric definitions agree); DB GSV = old-sheet amount
+# Class G, 6 orders: BotQ already matches DB GSV; the old-sheet amount is wrong
+# Class B, 8 orders: zero orders in DB
+# Class C, 7 orders: not registered, or registered with no orders
+# Class D, 6 orders: awaiting sales confirmation of C/L
+# Class E, 19 orders: pending decision on whether to add rows
+# Class H, 1 order: DB GSV=1999 ≠ BotQ=3598
+
+RESULTS = [
+    # Class A
+    (1, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+    (2, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+    (3, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+    (4, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+    (5, "口径OK", "DB GSV=599=旧表599,旧表记GSV非GMV,非争议"),
+    (6, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+    (7, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+    (8, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+    (9, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+    (10, "口径OK", "DB GSV=1999=旧表1999,旧表记GSV非GMV,非争议"),
+    (11, "口径OK", "DB GSV=1999=旧表1999,旧表记GSV非GMV,非争议"),
+    (12, "口径OK", "DB GSV=3598=旧表3598,旧表记GSV非GMV,非争议"),
+
# Class G: BotQ already matches DB GSV for all
+    (13, "口径OK", "DB GSV=1999=BotQ=1999,旧表3598为GMV;Bot已正确"),
+    (14, "口径OK", "DB GSV=3598=BotQ=3598,旧表1999为部分金额;Bot已正确(3笔: 退3598+退1999+正常3598)"),
+    (15, "口径OK", "DB GSV=7196=BotQ=7196,旧表3598为部分金额;Bot已正确(2笔各3598)"),
+    (16, "口径OK", "DB GSV=2098=BotQ=2098,旧表599为部分金额;Bot已正确(端内599+1499)"),
+    (17, "口径OK", "DB GSV=3598=BotQ=3598,旧表1999为部分金额;Bot已正确(1999正常+3598退1999)"),
+    (18, "口径OK", "DB GSV=7196=BotQ=7196,旧表1999为部分金额;Bot已正确(2笔各3598)"),
+    # Class B: all have zero orders in DB
+    (19, "DB零单", "UID=17581 DB已注册但无任何订单,UID可能录错或虚假"),
+    (20, "DB零单", "UID=28808 DB已注册但无任何订单,UID可能录错或虚假"),
+    (21, "DB零单", "UID=26398 DB已注册但无任何订单,UID可能录错或虚假"),
+    (22, "DB零单", "UID=19504 DB已注册但无任何订单,UID可能录错或虚假"),
+    (23, "DB零单", "UID=26857 DB已注册但无任何订单,UID可能录错或虚假"),
+    (24, "DB零单", "UID=25717 DB已注册但无任何订单,UID可能录错或虚假"),
+    (25, "DB零单", "UID=18808 DB已注册但无任何订单,UID可能录错或虚假"),
+    (26, "DB零单", "UID=18577 DB已注册但无任何订单,UID可能录错或虚假"),
+    # Class C
+    (27, "未注册", "无UID,需补注册/手机号/班主任凭证"),
+    (28, "未注册", "无UID(E列待补号),需补注册/手机号/班主任凭证"),
+    (29, "未注册", "无UID(E列空),需补注册/手机号/班主任凭证"),
+    (30, "未注册", "无UID(E列空),且L