auto-sync: kids-english-script-production 2026-04-01

2026-04-01 23:17:01 +08:00 · 2026-04-01 23:17:01 +08:00 · d401f344af
commit d401f344af
12 changed files with 514 additions and 0 deletions
--- a/kids-english-script-production/.DS_Store
+++ b/kids-english-script-production/.DS_Store
--- a/kids-english-script-production/SKILL.md
+++ b/kids-english-script-production/SKILL.md
@@ -0,0 +1,50 @@
+---
+name: kids-english-script-production
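+description: Standardized English dialogue production tool for children aged 4-8. Accepts Chinese, English, or mixed Chinese-English input and generates natural, level-appropriate English lines under built-in production rules. Typical uses include batch production of animation or course dialogue, difficulty adaptation of existing scripts, standardized translation of mixed-language scripts, and automatic dialogue review.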
+---
+
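+# Kids' English Script Production Skill
+
+## Core Features
+- ✅ Input normalization: accepts scripts in Chinese, English, or mixed Chinese-English and converts them to a standard format while preserving all plot information
+- ✅ Automatic AR preprocessing: 7 built-in sentence-splitting rules plus 4 preservation mechanisms simplify complex content without dropping plot points
+- ✅ Leveled generation: 4 difficulty stages (S1-S4) covering children aged 4-8 at different proficiency levels
+- ✅ Automatic validation: a four-part compliance check plus an L1 core-word whitelist check on the output
+- ✅ Out-of-vocabulary reminders: at S1/S2, out-of-vocabulary words are detected and listed so reviewers can find them quickly
+- ✅ Naturalization: tunes emotion words, long-sentence splitting, and colloquial synonym replacement to avoid translationese and match how children speak, while staying faithful to the source script with nothing added or removed
+- ✅ Sci-fi term simplification: a configurable mapping table converts complex sci-fi terms into child-friendly expressions
+- ✅ Batch processing: handles a single file or a whole directory and saves results to a specified path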
+
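+## Execution Flow
+1. Input parsing: load the content to process according to the input type (text, file, or directory)
+2. Input normalization: convert everything to the standard Chinese "Role: line" format, keeping all core plot information
+3. AR preprocessing: split complex sentences by rule, assign AR levels, and filter out content beyond the target cognitive level
+4. Leveled generation: generate natural English lines that meet the target stage's vocabulary, syntax, and sentence-length requirements
+5. Automatic validation: check AR level, difficulty, naturalness, and content compliance
+6. Output: print to the console or save to a specified directory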
+
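+## Parameters
+| Parameter | Required | Format | Description |
+| ---- | ---- | ---- | ---- |
+| --input | One of `--input` / `--path` | String | Script text passed directly on the command line |
+| --path | One of `--input` / `--path` | File or directory path | A single script `.txt` file, or a directory of `.txt` scripts for batch processing |
+| --stage | Yes | S1/S2/S3/S4 | Target difficulty stage:<br>S1 = ages 4-5, no prior English<br>S2 = ages 5-6, beginner<br>S3 = ages 6-7, intermediate<br>S4 = ages 7-8, upper-intermediate |
+| --output | No | Directory path | Output directory. Results are always printed to the console; when this is set, they are also saved to the directory |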
+
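+## Error Handling
+- Input path missing or no `.txt` files found: exit with a clear error message
+- Config file fails to load: exit with an error and a prompt to check the config file format
+- LLM call fails: the failing script reports an error, and the remaining scripts in the batch continue
+- Invalid arguments: print the usage message showing the correct arguments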
+
+## Usage Examples
+```bash
+# 1. Generate from inline text and print the result to the console
+openclaw skill run kids-english-script-production --input "角色A: 光有水不行，得先拿上毛巾。角色B: 好的，我现在去拿" --stage S2
+
+# 2. Process a single file and save the result to ./result
+openclaw skill run kids-english-script-production --path ./script.txt --stage S3 --output ./result
+
+# 3. Batch-process every .txt script in a directory and save the results to ./batch_result
+openclaw skill run kids-english-script-production --path ./scripts_dir --stage S1 --output ./batch_result
+```
--- a/kids-english-script-production/assets/expression_map.yaml
+++ b/kids-english-script-production/assets/expression_map.yaml
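@@ -0,0 +1,27 @@
+# Customizable expression mappings. Curriculum staff can edit this file directly without touching code.
+# Emotion word map: Chinese emotional expression → natural English expression for children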
+emotion_map:
+  天呐: Oh my
+  呜呼: Woo-hoo
+  太棒了: Awesome
+  哇: Wow
+  哦不: Oh no
+  耶: Yay
+  嘿: Hey
+  等等: Wait
+
+# Synonym replacement table: standard expression → more colloquial child-friendly expression (same meaning)
+synonym_replace:
+  it is so nice: it is beautiful
+  Do you?: Wanna see?
+  Let's get in: Here we go
+  Let's start: Let's go
+  very good: Great
+  I like it: I love it
+  very fast: So fast
+  very slow: So slow
+
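+# Split rules: common long-sentence patterns that should be split into two sentences
+split_rules:
+  - Split sentences that contain two actions
+  - Prefer splitting clauses joined by "，" into single-idea sentences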
--- a/kids-english-script-production/assets/prompt_config.yaml
+++ b/kids-english-script-production/assets/prompt_config.yaml
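@@ -0,0 +1,21 @@
+# Prompt generation settings. Curriculum staff can tune generation behavior here without changing code.
+# Sampling temperature per stage: higher is more flexible, lower follows the rules more strictly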
+temperature:
+  S1: 0.2
+  S2: 0.3
+  S3: 0.4
+  S4: 0.5
+
+# Naturalization switches
+naturalization:
+  enable_emotion_word: true # apply the emotion word map
+  enable_synonym_replace: true # apply colloquial synonym replacement
+  enable_long_sentence_split: true # split long sentences
+  enable_exclamation_mark: true # use exclamation marks on strongly emotional sentences
+  allow_repeat_expression: true # allow natural repetition (e.g. It is dirty. Very dirty.)
+
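+# Script fidelity switches (core rules, change with care)
+script_fidelity:
+  strictly_no_add: true # never add content that is not in the source script
+  strictly_no_delete: true # never delete content that is in the source script
+  allow_detail_optimization: true # allow synonymous detail polishing that keeps the core information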
--- a/kids-english-script-production/assets/sci_fi_map.yaml
+++ b/kids-english-script-production/assets/sci_fi_map.yaml
@@ -0,0 +1,14 @@
+# Sci-fi term simplification map. Can be updated at any time without code changes.
+energy core: "a bright red light inside the robot"
+system error: "the robot cannot work because something inside is wrong"
+malfunction: "the robot stops and will not move"
+space station: "a big house in space"
+orbit shift: "the ship goes the wrong way in space"
+radiation leak: "a bad light that can hurt people"
+shield generator: "a big machine that makes us safe"
+AI control room: "a smart room that tells the robots what to do"
+emergency evacuation: "we all have to leave this place very fast"
+life support system: "the part that gives us air and keeps us alive"
+gravity failure: "there is no pull, so we all float"
+communication signal lost: "we cannot talk to them anymore"
+explosion: "a big boom"
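The script in this commit passes the map above to the LLM as prompt text and does not apply it in code. For a deterministic pre-pass, a minimal sketch might look like the following. The function name `simplify_sci_fi` and the inlined three-entry subset are illustrative assumptions, not part of the commit.

```python
import re

# Hypothetical subset of sci_fi_map.yaml, inlined so the sketch runs on its own.
SCI_FI_MAP = {
    "energy core": "a bright red light inside the robot",
    "space station": "a big house in space",
    "explosion": "a big boom",
}


def simplify_sci_fi(text, mapping):
    """Replace each mapped term with its child-friendly phrase.

    Longer terms are tried first so that a multi-word term wins over any
    shorter term it contains. Matching is case-insensitive and whole-word.
    """
    if not mapping:
        return text
    lookup = {term.lower(): phrase for term, phrase in mapping.items()}
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(simplify_sci_fi("Look at the space station!", SCI_FI_MAP))
```

Plain substitution can leave awkward articles ("the a big house in space"), which is one reason a rewrite step by the model may still be needed afterwards.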
--- a/kids-english-script-production/assets/skill.yml
+++ b/kids-english-script-production/assets/skill.yml
@@ -0,0 +1,27 @@
+name: kids-english-script-production
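+description: Standardized English dialogue production tool for children aged 4-8. Accepts Chinese, English, or mixed input and generates leveled, compliant, natural lines
+version: 1.1.0
+author: shark
+entry: python3 scripts/gen_script.py
+parameters:
+  - name: input
+    type: string
+    description: Script text passed in directly
+    required: false
+  - name: path
+    type: string
+    description: Path to a single script file, or to a directory containing multiple scripts
+    required: false
+  - name: stage
+    type: string
+    description: Target difficulty stage S1/S2/S3/S4
+    required: true
+  - name: output
+    type: string
+    description: Output directory; results are saved here when set
+    required: false
+tags:
+  - content production
+  - English courses
+  - dialogue generation
+  - batch processing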
--- a/kids-english-script-production/assets/stage_config.yaml
+++ b/kids-english-script-production/assets/stage_config.yaml
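@@ -0,0 +1,17 @@
+# Difficulty settings for each stage. Can be adjusted at any time without code changes.
+S1:
+  age: "ages 4-5"
+  lexile: "≤200L"
+  rules: "At least 90% of vocabulary from the Starters core list; no abstract words (fix/before/finish, etc.); simple sentences only (starting with This is/It is/I/We), no conjunctions or subordinate clauses, simple present tense only; sentence length 4-7 words; no complex structures"
+S2:
+  age: "ages 5-6"
+  lexile: "200L-400L"
+  rules: "60% Starters + 40% Movers vocabulary; simple emotion words (happy/scared/tired) and simple adverbs (now/slowly/fast) allowed; conjunctions and/but/so/because allowed, at most 1 per sentence; simple past tense and time markers then/later allowed; sentence length 7-10 words"
+S3:
+  age: "ages 6-7"
+  lexile: "400L-600L"
+  rules: "Light abstract words (problem/idea/plan) and descriptive words (bright/noisy/broken) allowed; conjunctions when/before/after allowed, and two-step action chains may be expressed; simple past and present continuous may be mixed; sentence length 10-15 words"
+S4:
+  age: "ages 7-8"
+  lexile: "600L-800L"
+  rules: "Full Flyers vocabulary coverage; low-difficulty abstract words (decide/safe/dangerous/fix) allowed; conjunctions because/so/if/when/although allowed, and purpose may be expressed with to do; future tense with will allowed; sentence length 15-20 words"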
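The sentence-length ranges in the stage rules above are only stated as prompt text, so the script relies on the model to respect them. A minimal sketch of a local check is shown below. The `STAGE_WORD_RANGE` table restates the four ranges from the rules; the function name and the choice to treat both bounds as strict are assumptions.

```python
import re

# Word-count ranges restated from the sentence-length clause of each stage's rules.
STAGE_WORD_RANGE = {"S1": (4, 7), "S2": (7, 10), "S3": (10, 15), "S4": (15, 20)}


def sentences_outside_range(line, stage):
    """Return the sentences in `line` whose word count falls outside the stage range."""
    low, high = STAGE_WORD_RANGE[stage]
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s.strip() for s in re.split(r"[.!?]+", line) if s.strip()]
    return [
        s for s in sentences
        if not low <= len(re.findall(r"[A-Za-z']+", s)) <= high
    ]


print(sentences_outside_range("This is my red towel. Wow!", "S1"))
```

Short interjections such as "Wow!" fall below the S1 lower bound and get reported here. Whether the lower bound is meant to apply to interjections is not stated in the config, so a real check would probably need an exemption list.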
--- a/kids-english-script-production/assets/validation_config.yaml
+++ b/kids-english-script-production/assets/validation_config.yaml
@@ -0,0 +1,88 @@
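+# Full validation rule configuration for English lines
+# Version: 2026-04-01
+---
+## 1. Basic general validation rules (mandatory)
+basic_rules:
+  sentence_spec:
+    - Keep the standard form be going to by default; overly colloquial forms such as gonna are forbidden unless explicitly flagged (for example, a designated preschool scene)
+  vocabulary_control:
+    - Out-of-vocabulary words are forbidden in early-learner scenes; action instructions should prefer basic vocabulary (for example, look at instead of focus)
+  redundancy_check:
+    - The same line must not appear 2 or more times in a row; flag such layout errors directly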
+  confirmed_optimization:
+    - 'Always rewrite "Today, we must train!" as "Let''s start training!"'
+
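+## 2. Five core polishing principles (mandatory for A1-level content)
+core_principles:
+  simplification:
+    name: Radical simplification
+    rules:
+      - Enforce "one sentence, one idea" strictly; split nested structures into independent simple sentences; no complex subordinate clauses
+      - Only simple present, simple past, and basic future (will/be going to) are allowed; perfect tenses and the subjunctive are forbidden
+  chunking:
+    name: Chunks first
+    rules:
+      - When context or on-screen information is sufficient, redundant elements may be omitted as long as the core chunk is kept
+      - Short idiomatic imperatives that native-speaking children commonly use are allowed (for example, Game start!/Watch this!/Silly me!); invented expressions are forbidden
+  tpr_action:
+    name: Strong TPR action binding
+    rules:
+      - Lines that prompt interaction or action start with a base-form basic verb (Look/Listen/Hit/Run, etc.)
+      - Keep lines fully synchronized with on-screen actions and UI components so that visuals support listening comprehension
+  target_focus:
+    name: Seamless embedding of syllabus targets
+    rules:
+      - Core vocabulary and sentence patterns recur naturally through plot conflict (something is lost, someone is hurt, a mistake is made, etc.); forced insertion is forbidden
+      - Target sentence patterns are elicited through NPC questions; stating them outright is forbidden
+  emotional_resonance:
+    name: Exaggerated emotion
+    rules:
+      - Low-cognitive-load interjections may be used to convey emotion (Phew!/Ouch!/Oops!/Aha!/Waaaaah!, etc.)
+      - Express emotion directly with A1-level adjectives (sad/happy/angry, etc.); complex psychological description is forbidden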
+
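+## 3. AR level / vocabulary / difficulty validation rules
+ar_validation:
+  enable: true
+  S1_allow_AR2_ratio: 0.1 # at S1, AR2 content may make up at most 10%
+  S2_allow_AR3_ratio: 0.1 # at S2, AR3 content may make up at most 10%
+  S3_allow_AR4_ratio: 0.15 # at S3, AR4 content may make up at most 15%
+
+vocab_validation:
+  enable_OOV_remind: true # whether to enable out-of-vocabulary reminders
+  S1_allow_OOV_ratio: 0.05 # at S1, out-of-vocabulary words may make up at most 5%
+  S2_allow_OOV_ratio: 0.1 # at S2, out-of-vocabulary words may make up at most 10%
+  stop_words: # words ignored by the out-of-vocabulary check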
+    - hey
+    - look
+    - oh
+    - wow
+    - wait
+    - oh no
+    - yay
+    - i
+    - you
+    - he
+    - she
+    - it
+    - we
+    - they
+    - am
+    - is
+    - are
+    - was
+    - were
+    - a
+    - an
+    - the
+    - and
+    - but
+    - so
+    - because
+
+difficulty_validation:
+  enable: true
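+  allow_lexile_deviation: 50 # allowed Lexile deviation of ±50L
+
+## 4. Validation output standard
+output_standard:
+  - Every flagged issue must include a quoted context, the issue type, and a concrete fix so that it can be applied directly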
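The `S1_allow_OOV_ratio` and `S2_allow_OOV_ratio` thresholds above are defined in the config, but the script in this commit only lists out-of-vocabulary words and never compares a ratio against them. A minimal sketch of such a comparison follows. The function names, the tiny word list, and the choice to compute the ratio over non-stop-word tokens are all assumptions.

```python
import re


def oov_ratio(text, word_list, stop_words):
    """Share of counted words that are missing from the word list.

    Assumption: stop words are excluded from both numerator and denominator.
    """
    words = [w.lower().strip("'") for w in re.findall(r"[A-Za-z']+", text)]
    counted = [w for w in words if w and w not in stop_words]
    if not counted:
        return 0.0
    missing = [w for w in counted if w not in word_list]
    return len(missing) / len(counted)


def within_oov_limit(text, stage, word_list, stop_words, limits):
    """True when the stage has no configured limit or the ratio stays under it."""
    limit = limits.get(stage)
    return limit is None or oov_ratio(text, word_list, stop_words) <= limit


# Hypothetical data for illustration only.
WORD_LIST = {"this", "my", "red", "towel"}
STOP_WORDS = {"is", "the"}
LIMITS = {"S1": 0.05, "S2": 0.1}  # mirrors S1_allow_OOV_ratio / S2_allow_OOV_ratio

print(oov_ratio("This is my red spaceship.", WORD_LIST, STOP_WORDS))
print(within_oov_limit("This is my red spaceship.", "S1", WORD_LIST, STOP_WORDS, LIMITS))
```

Stages without a configured limit (S3 and S4 in this config) pass unconditionally in this sketch, matching the fact that the config defines thresholds only for S1 and S2.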
--- a/kids-english-script-production/assets/调优指南.md
+++ b/kids-english-script-production/assets/调优指南.md
@@ -0,0 +1,53 @@
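+# Tuning Guide for the English Script Production Skill
+## 📌 Core Principle
+**No tuning requires changes to code or the core prompt. Edit only the yaml config files under `assets/`; changes take effect on the next run.** The core generation strategy stays the same, which avoids rule conflicts caused by manual edits.
+
+---
+## 📁 Editable Config Files
+| File | Purpose | When to edit |
+| ---- | ---- | ---- |
+| `assets/expression_map.yaml` | Expression mappings | To adjust emotion words, colloquial synonym replacements, or split rules |
+| `assets/prompt_config.yaml` | Generation behavior | To adjust generation flexibility, naturalization switches, or script fidelity |
+| `assets/validation_config.yaml` | Validation rules | To adjust validation strictness, out-of-vocabulary thresholds, or allowed AR level ratios |
+| `assets/stage_config.yaml` | Difficulty stages | To adjust each stage's vocabulary, syntax, or sentence-length requirements |
+| `assets/sci_fi_map.yaml` | Sci-fi term map | To add or change sci-fi term simplification rules |
+| `references/l1_word_list.json` | L1 core word list | To update the L1 vocabulary whitelist |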
+
+---
+## 🔧 Common Tuning Scenarios
+### 1. Change the default rendering of "天呐" from "Oh my" to "Wow"
+Edit `emotion_map` in `assets/expression_map.yaml`:
+```yaml
+emotion_map:
+  天呐: Wow # previously Oh my
+```
+### 2. Turn off long-sentence splitting so sentences flow more smoothly
+Edit `naturalization` in `assets/prompt_config.yaml`:
+```yaml
+naturalization:
+  enable_long_sentence_split: false # change true to false
+```
+### 3. Raise the allowed out-of-vocabulary ratio at S1 to 10%
+Edit `vocab_validation` in `assets/validation_config.yaml`:
+```yaml
+vocab_validation:
+  S1_allow_OOV_ratio: 0.1 # changed from 0.05 to 0.1
+```
+### 4. Make the generated content more flexible and less rigid
+Edit `temperature` in `assets/prompt_config.yaml`:
+```yaml
+temperature:
+  S2: 0.4 # changed from 0.3 to 0.4; higher is more flexible, and values above 0.7 are not recommended
+```
+### 5. Add a simplification rule for a new sci-fi term
+Edit `assets/sci_fi_map.yaml` and append one line:
+```yaml
+new_sci_word: "a child-friendly expression"
+```
+
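+---
+## ⚠️ Notes
+1. Every yaml file must be valid YAML. Indent with 2 spaces, never tabs, or loading will fail
+2. Avoid changing the core rules (no additions to or deletions from the source script), or the output may no longer meet requirements
+3. After changing a config, test on a sample script first, then run batches once the output looks right
+4. If a config gets into a bad state, overwrite it with a backed-up copy of the defaults to restore the original settings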
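The first note above warns that tab indentation makes a config fail to load. A small pre-flight check can point to the offending lines before a batch run. This is a sketch only; the function name is an assumption, and it looks at leading whitespace without parsing the YAML.

```python
def find_tab_indented_lines(yaml_text):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


sample = "temperature:\n\tS1: 0.2\n  S2: 0.3\n"
print(find_tab_indented_lines(sample))
```

Passing this check does not prove the file is valid YAML; it only catches the one mistake the note calls out.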
--- a/kids-english-script-production/examples/sample_script.txt
+++ b/kids-english-script-production/examples/sample_script.txt
@@ -0,0 +1,4 @@
+用户: 光有水不行，得先拿上毛巾。
+Ben: 好的，我现在去院子里拿毛巾，顺便把水桶也拿过来。
+用户: 太棒了，我们快点把飞船擦干净，不然天黑就完不成了！
+Ben: 没问题，飞船的能量 core 出了点小问题，我们擦完再一起修。
--- a/kids-english-script-production/references/l1_word_list.json
+++ b/kids-english-script-production/references/l1_word_list.json
--- a/kids-english-script-production/scripts/gen_script.py
+++ b/kids-english-script-production/scripts/gen_script.py
@@ -0,0 +1,212 @@
+#!/usr/bin/env python3
+import argparse
+import sys
+import os
+import yaml
+from openai import OpenAI
+from pathlib import Path
+
+# Locate the skill root and its assets directory
+BASE_DIR = Path(__file__).parent.parent
+ASSETS_DIR = BASE_DIR / "assets"
+
+# Load external config files (every tunable parameter lives in a yaml file under assets/, so no code changes are needed)
+try:
+    # Base config
+    with open(ASSETS_DIR / "sci_fi_map.yaml", "r", encoding="utf-8") as f:
+        SCI_FI_WORD_MAP = yaml.safe_load(f)
+    with open(ASSETS_DIR / "stage_config.yaml", "r", encoding="utf-8") as f:
+        STAGE_CONFIG = yaml.safe_load(f)
+    # Tuning config
+    with open(ASSETS_DIR / "expression_map.yaml", "r", encoding="utf-8") as f:
+        EXPRESSION_MAP = yaml.safe_load(f)
+    with open(ASSETS_DIR / "prompt_config.yaml", "r", encoding="utf-8") as f:
+        PROMPT_CONFIG = yaml.safe_load(f)
+    with open(ASSETS_DIR / "validation_config.yaml", "r", encoding="utf-8") as f:
+        VALIDATION_CONFIG = yaml.safe_load(f)
+    # Word list (a JSON file, parsed here with the YAML loader)
+    with open(BASE_DIR / "references" / "l1_word_list.json", "r", encoding="utf-8") as f:
+        L1_WORD_LIST = set([word.lower() for word in yaml.safe_load(f)])
+except Exception as e:
+    print(f"❌ Failed to load config files; check that the YAML is well formed: {str(e)}")
+    sys.exit(1)
+
+# Initialize the LLM client; settings are read from environment variables
+try:
+    client = OpenAI(
+        api_key=os.getenv("OPENAI_API_KEY", "your-api-key"),
+        base_url=os.getenv("OPENAI_BASE_URL", "https://ark.cn-beijing.volces.com/api/v3")
+    )
+    MODEL = os.getenv("OPENAI_MODEL", "volcengine/doubao-seed-2-0-pro-260215")
+except Exception as e:
+    print(f"❌ Failed to initialize the LLM client: {str(e)}")
+    sys.exit(1)
+
+def load_input(input_path):
+    """Load input content from a single file, or from every txt file in a directory"""
+    input_path = Path(input_path)
+    if not input_path.exists():
+        print(f"❌ Input path does not exist: {input_path}")
+        sys.exit(1)
+    
+    if input_path.is_file():
+        with open(input_path, "r", encoding="utf-8") as f:
+            return [(input_path.name, f.read())]
+    elif input_path.is_dir():
+        # Batch-load every txt file in the directory
+        script_files = list(input_path.glob("*.txt"))
+        if not script_files:
+            print(f"❌ No txt script files found in directory: {input_path}")
+            sys.exit(1)
+        results = []
+        for f in script_files:
+            with open(f, "r", encoding="utf-8") as fp:
+                results.append((f.name, fp.read()))
+        return results
+    else:
+        print(f"❌ Unsupported input type: {input_path}")
+        sys.exit(1)
+
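+def get_prompt(input_text, stage):
+    """Build the prompt; every tunable rule is read from the config files, so no code changes are needed"""
+    sci_fi_map_str = "\n".join([f"{k} → {v}" for k, v in SCI_FI_WORD_MAP.items()])
+    # Build rule text from the config switches
+    emotion_map_rule = "Prefer the following mappings for emotion words: " + "; ".join([f"{k}→{v}" for k,v in EXPRESSION_MAP['emotion_map'].items()]) if PROMPT_CONFIG['naturalization']['enable_emotion_word'] else "Do not use the custom emotion word map"
+    synonym_replace_rule = "The following colloquial synonym replacements may be used (meaning must not change): " + "; ".join([f"{k}→{v}" for k,v in EXPRESSION_MAP['synonym_replace'].items()]) if PROMPT_CONFIG['naturalization']['enable_synonym_replace'] else "Do not use synonym replacement"
+    split_rule = "Split sentences that carry 2 or more ideas into single-idea short sentences" if PROMPT_CONFIG['naturalization']['enable_long_sentence_split'] else "Do not split long sentences"
+    repeat_rule = "Natural repetition is allowed (e.g. It is dirty. Very dirty.)" if PROMPT_CONFIG['naturalization']['allow_repeat_expression'] else "Repeated expressions are not allowed"
+    exclamation_rule = "Strongly emotional sentences may end with an exclamation mark" if PROMPT_CONFIG['naturalization']['enable_exclamation_mark'] else "End every sentence with a period"
+    fidelity_rule = "Stay fully faithful to the source script: never add information that is not in it, and never delete information that is in it" if PROMPT_CONFIG['script_fidelity']['strictly_no_add'] and PROMPT_CONFIG['script_fidelity']['strictly_no_delete'] else "Minor adjustments to details are allowed"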
+
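+    return f"""
+You are an expert in producing English dialogue for children aged 4-8. Follow every rule below strictly when generating content; violations are never allowed:
+### Script fidelity rule (highest priority, must never be violated)
+{fidelity_rule}
+
+### Step 1: Input normalization
+The current input is: {input_text}
+Whether the input is Chinese, English, or mixed, first convert it to the standard Chinese "Role: line" format. Keep all plot, action, character relationship, prop, and event trigger information intact; no core content may be lost.
+
+### Step 2: Chinese AR preprocessing
+Strictly follow the 4 preservation mechanisms (these must never change):
+1. Keep the complete chain of event verbs
+2. Keep every event trigger
+3. Keep the complete prop logic chain
+4. Keep the existing character relationships
+Split into single-idea short sentences using the 7 rules below, with 1 idea per sentence and no change to the plot:
+1. Split complex sentences into short sentences
+2. Split cause and effect; keep the facts, drop the connective
+3. Split purpose clauses without deleting the purpose
+4. Split multi-step actions into single-action sentences
+5. Split condition and behavior completely, removing the hypothetical logic
+6. Split emotion from fact without altering the emotion
+7. Remove complex reasoning and keep only visible facts
+
+### Step 3: Leveled English generation
+Target stage: {stage}
+Requirements: {STAGE_CONFIG[stage]["rules"]}
+Lexile requirement: {STAGE_CONFIG[stage]["lexile"]}
+
+Naturalization requirements (**obey the script fidelity rule strictly; adding or removing content relative to the source script is forbidden**):
+1. Emotion word mapping: {emotion_map_rule}
+2. Synonym replacement: {synonym_replace_rule}
+3. Long-sentence splitting: {split_rule}
+4. Repetition: {repeat_rule}
+5. Punctuation: {exclamation_rule}
+6. Adult-sounding connectives (actually/in fact/however, etc.) are strictly forbidden
+7. Lines must sound like a native-speaking child; translationese is never acceptable
+8. Replace sci-fi terms automatically using the following map:
+{sci_fi_map_str}
+
+### Step 4: Self-validation
+After generating, check these 4 items yourself:
+1. AR level compliance: S1 forbids AR3/AR4, S2 forbids AR4
+2. Difficulty compliance: vocabulary, syntax, sentence length, and Lexile fully match the stage requirements, with nothing out of scope
+3. Naturalness compliance: no translationese; matches how native-speaking children aged 4-8 talk
+4. Content compliance: no sensitive content, no Chinglish
+
+### Output format (follow this format exactly and output nothing else)
+【Stage {stage} English lines (for {STAGE_CONFIG[stage]["age"]})】
+Role A: line
+Role B: line
+...
+【Lexile】: [estimate]L
+【Validation】: Pass / Needs work
+【Suggestions】: None / specific suggestions
+"""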
+
+def generate_single_script(input_text, stage):
+    """Generate the lines for a single script"""
+    try:
+        prompt = get_prompt(input_text, stage)
+        response = client.chat.completions.create(
+            model=MODEL,
+            messages=[{"role": "user", "content": prompt}],
+            temperature=PROMPT_CONFIG.get("temperature", {}).get(stage, 0.3),  # per-stage value from prompt_config.yaml
+            max_tokens=2000,
+            timeout=30
+        )
+        result = response.choices[0].message.content
+        # Append the out-of-vocabulary check
+        oov_words = check_out_of_vocab(result, stage)
+        if oov_words and stage in ["S1", "S2"]:
+            result += f"\n【OOV reminder】: {', '.join(oov_words)} (please confirm whether these need replacing)"
+        return result
+    except Exception as e:
+        return f"❌ Generation failed: {str(e)}"
+
+def check_out_of_vocab(script_content, stage):
+    """Check for out-of-vocabulary words; rules are read from the config file"""
+    if not VALIDATION_CONFIG['vocab_validation']['enable_OOV_remind'] or stage not in ["S1", "S2"]:
+        return []
+    # Extract every English word
+    import re
+    words = re.findall(r"[a-zA-Z']+", script_content)
+    words = [word.lower().strip("'") for word in words]
+    # Drop the stop words defined in the config
+    stop_words = set(VALIDATION_CONFIG['vocab_validation']['stop_words'])
+    words = [word for word in words if word not in stop_words and len(word) > 1]
+    # Collect words missing from the L1 word list, sorted for stable output
+    out_of_vocab = sorted({word for word in words if word not in L1_WORD_LIST})
+    return out_of_vocab
+
+def save_result(output_dir, filename, content):
+    """Save the result to a file"""
+    output_dir = Path(output_dir)
+    output_dir.mkdir(parents=True, exist_ok=True)
+    output_file = output_dir / f"result_{filename}"
+    with open(output_file, "w", encoding="utf-8") as f:
+        f.write(content)
+    return output_file
+
+def main():
+    parser = argparse.ArgumentParser(description="Standardized English dialogue production tool for children aged 4-8")
+    group = parser.add_mutually_exclusive_group(required=True)
+    group.add_argument("--input", type=str, help="Script text passed in directly")
+    group.add_argument("--path", type=str, help="Path to a single script file, or to a directory containing multiple scripts")
+    parser.add_argument("--stage", type=str, choices=["S1", "S2", "S3", "S4"], required=True, help="Target difficulty stage S1/S2/S3/S4")
+    parser.add_argument("--output", type=str, help="Output directory; when set, results are also saved there")
+    args = parser.parse_args()
+
+    # Resolve the input
+    if args.input is not None:
+        input_list = [("direct_input", args.input)]
+    else:
+        input_list = load_input(args.path)
+
+    # Generate for each script in turn
+    results = []
+    for filename, text in input_list:
+        print(f"\n🚀 Processing: {filename}")
+        result = generate_single_script(text, args.stage)
+        results.append((filename, result))
+        print(result)
+        # Save the result
+        if args.output:
+            save_path = save_result(args.output, filename, result)
+            print(f"💾 Result saved to: {save_path}")
+
+    print(f"\n✅ All done; processed {len(results)} script(s)")
+
+if __name__ == "__main__":
+    main()
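One side effect of `check_out_of_vocab` as committed is that it scans the whole model response, so role names (such as `Ben`) and header labels are candidates for the out-of-vocabulary list. A possible refinement, sketched below under the assumption that the response follows the `Role: line` layout requested in the prompt, is to keep only the spoken text before running the check. The helper name `dialogue_only` is hypothetical.

```python
import re

# Matches "Role: spoken text" with either an ASCII or a full-width colon.
LINE_RE = re.compile(r"^\s*([^:：【】]+)[:：]\s*(.+)$")


def dialogue_only(result_text):
    """Return the spoken text of each 'Role: line' row, joined by spaces.

    Bracketed header/footer rows (starting with 【) and error rows are skipped.
    """
    spoken = []
    for raw in result_text.splitlines():
        if raw.lstrip().startswith(("【", "❌")):
            continue
        match = LINE_RE.match(raw)
        if match:
            spoken.append(match.group(2).strip())
    return " ".join(spoken)


sample = "【Stage S1】\nBen: I get the towel.\nUser: Wow!\n...\n【Lexile】: 150L"
print(dialogue_only(sample))
```

The result could then be passed to `check_out_of_vocab` in place of the raw response, so that only words the child actually hears are compared against the word list.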