auto-sync: kids-english-script-production 2026-04-01
commit
d401f344af
BIN
kids-english-script-production/.DS_Store
vendored
Normal file
Binary file not shown.
50
kids-english-script-production/SKILL.md
Normal file
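@@ -0,0 +1,50 @@
---
name: kids-english-script-production
description: "Standardized production tool for English dialogue lines for children aged 4-8. Accepts Chinese-only, English-only, or mixed Chinese-English input and generates natural, idiomatic, level-graded English lines that meet the difficulty requirements, with the full set of production rules built in. Typical uses: batch production of lines for animation or courses, difficulty adaptation of existing scripts, standardized translation of mixed Chinese-English scripts, and automated line review."
---

# Kids' English Script Production Skill

## Core features
- ✅ Input normalization: accepts Chinese-only, English-only, or mixed Chinese-English scripts and converts them to a standard format while preserving all plot information
- ✅ Automatic AR preprocessing: 7 built-in sentence-splitting rules plus 4 preservation mechanisms simplify complex content without losing plot
- ✅ Graded generation: 4 difficulty levels (S1-S4) covering children aged 4-8 at different proficiency levels
- ✅ Automatic validation: a four-layer compliance check plus an L1 core-vocabulary whitelist check, so that output conforms to the production standard
- ✅ Out-of-vocabulary alerts: at S1/S2, out-of-vocabulary words are detected and flagged automatically, which speeds up review
- ✅ Naturalization: automatically optimizes emotion words, long-sentence splitting, and colloquial synonym replacement; no translationese, matches how children speak, and stays faithful to the original script with nothing added or removed
- ✅ Sci-fi term simplification: a configurable sci-fi term map converts complex sci-fi vocabulary into child-friendly expressions
- ✅ Batch processing: handles a single file or a whole directory and saves results to a specified path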
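
## Execution flow
1. Input parsing: load the content to process according to the input type (text/file/directory)
2. Input normalization: convert everything to the standard Chinese "Role: line" format, keeping all core plot information
3. AR preprocessing: split complex sentences by rule, assign AR levels, and filter out content beyond the target cognitive level
4. Graded generation: generate idiomatic English lines that meet the target Stage's vocabulary, syntax, and sentence-length requirements
5. Automatic validation: check AR level, difficulty, naturalness, and content compliance
6. Output: print to the console or save to a specified directory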
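
Part of step 5 can be checked deterministically, outside the LLM. The sketch below counts words per line against the per-Stage sentence-length ranges stated in `assets/stage_config.yaml` (S1 4-7, S2 7-10, S3 10-15, S4 15-20 words). The `WORD_RANGES` table and the `lines_outside_range` helper are hypothetical names introduced here, not part of the skill, and treating each dialogue line as a single sentence is a simplifying assumption.

```python
import re

# Sentence-length ranges per Stage, copied from the rules in stage_config.yaml.
# This table and the helper below are hypothetical, for illustration only.
WORD_RANGES = {"S1": (4, 7), "S2": (7, 10), "S3": (10, 15), "S4": (15, 20)}


def lines_outside_range(lines, stage):
    """Return (line, word_count) pairs whose word count falls outside the Stage range."""
    low, high = WORD_RANGES[stage]
    flagged = []
    for line in lines:
        # Drop a leading "Role:" label, then count English word tokens.
        text = line.split(":", 1)[-1]
        count = len(re.findall(r"[A-Za-z']+", text))
        if not low <= count <= high:
            flagged.append((line, count))
    return flagged
```

Flagged lines are candidates for review rather than hard failures: a one-word interjection such as "Wow!" falls below the S1 range but may still be acceptable.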
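
## Parameters
| Parameter | Required | Format | Description |
| ---- | ---- | ---- | ---- |
| --input | One of `--input` / `--path` | String | Script text to process, passed directly |
| --path | One of `--input` / `--path` | File/directory path | A single script txt file to process, or a directory containing multiple txt scripts (batch processing) |
| --stage | Yes | S1/S2/S3/S4 | Target difficulty level:<br>S1 = ages 4-5, zero baseline<br>S2 = ages 5-6, beginner<br>S3 = ages 6-7, intermediate<br>S4 = ages 7-8, advanced |
| --output | No | Directory path | Output directory. When set, all results are saved to it; when omitted, results are printed to the console only |

## Error handling rules
- Input path does not exist, or contains no txt files: exit with an error and a clear message
- Config file fails to load: exit with an error and a prompt to check the config file format
- LLM call fails: a failure on one script does not affect the other batch tasks; an error message is shown for it
- Invalid arguments: print the parameter help and show the correct usage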
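
The third rule, isolating a failed LLM call to a single script, can be sketched as below. `run_batch` and its `generate` callback are hypothetical names used only for illustration; the script in this commit takes a slightly different route and catches the exception inside `generate_single_script`, returning an error string instead.

```python
def run_batch(scripts, generate):
    """Run `generate` on each (name, text) pair; one failure does not stop the rest."""
    results, failures = [], []
    for name, text in scripts:
        try:
            results.append((name, generate(text)))
        except Exception as exc:  # isolate the failure to this one script
            failures.append((name, str(exc)))
    return results, failures
```

Collecting failures in a separate list makes it possible to report them together at the end of a batch, or to retry only the scripts that failed.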

## Usage examples
```bash
# 1. Generate from inline text and print the result to the console (the sample input is a two-line Chinese script)
openclaw skill run kids-english-script-production --input "角色A: 光有水不行,得先拿上毛巾。角色B: 好的,我现在去拿" --stage S2

# 2. Process a single file and save the result to the output directory
openclaw skill run kids-english-script-production --path ./script.txt --stage S3 --output ./result

# 3. Batch-process every txt script in a directory and save the results to the output directory
openclaw skill run kids-english-script-production --path ./scripts_dir --stage S1 --output ./batch_result
```
27
kids-english-script-production/assets/expression_map.yaml
Normal file
@@ -0,0 +1,27 @@
# Customizable expression maps; curriculum staff can edit them directly without touching code
# Emotion word map: Chinese emotion expression → idiomatic English expression in children's speech
emotion_map:
  天呐: Oh my
  呜呼: Woo-hoo
  太棒了: Awesome
  哇: Wow
  哦不: Oh no
  耶: Yay
  嘿: Hey
  等等: Wait

# Synonym replacement table: standard expression → more colloquial child expression (same meaning, intent unchanged)
synonym_replace:
  it is so nice: it is beautiful
  Do you?: Wanna see?
  Let's get in: Here we go
  Let's start: Let's go
  very good: Great
  I like it: I love it
  very fast: So fast
  very slow: So slow

# Splitting rules: common long-sentence patterns that should be split into two sentences
split_rules:
  - Sentences containing two actions are split automatically
  - Short sentences containing "," are preferentially split into single-information sentences
21
kids-english-script-production/assets/prompt_config.yaml
Normal file
@@ -0,0 +1,21 @@
# Prompt generation settings; curriculum staff can tune generation without changing code
# Temperature: higher is more flexible, lower follows the rules more strictly
temperature:
  S1: 0.2
  S2: 0.3
  S3: 0.4
  S4: 0.5

# Naturalization switches
naturalization:
  enable_emotion_word: true # enable emotion word mapping
  enable_synonym_replace: true # enable colloquial synonym replacement
  enable_long_sentence_split: true # enable long-sentence splitting
  enable_exclamation_mark: true # add exclamation marks to strongly emotional sentences
  allow_repeat_expression: true # allow natural repetition (e.g. It is dirty. Very dirty.)

# Script fidelity switches (core rules; change with care)
script_fidelity:
  strictly_no_add: true # never add content that is not in the original script
  strictly_no_delete: true # never delete content that is in the original script
  allow_detail_optimization: true # allow synonym-level detail polishing (core information unchanged)
14
kids-english-script-production/assets/sci_fi_map.yaml
Normal file
@@ -0,0 +1,14 @@
# Sci-fi term simplification map; can be updated at any time without changing code
energy core: "a bright red light inside the robot"
system error: "the robot cannot work because something inside is wrong"
malfunction: "the robot stops and will not move"
space station: "a big house in space"
orbit shift: "the ship goes the wrong way in space"
radiation leak: "a bad light that can hurt people"
shield generator: "a big machine that makes us safe"
AI control room: "a smart room that tells the robots what to do"
emergency evacuation: "we all have to leave this place very fast"
life support system: "the part that gives us air and keeps us alive"
gravity failure: "there is no pull, so we all float"
communication signal lost: "we cannot talk to them anymore"
explosion: "a big boom"
27
kids-english-script-production/assets/skill.yml
Normal file
@@ -0,0 +1,27 @@
name: kids-english-script-production
description: "Standardized production tool for English lines for children aged 4-8; accepts Chinese-only, English-only, or mixed input and generates graded, compliant, idiomatic lines"
version: 1.1.0
author: shark
entry: python3 scripts/gen_script.py
parameters:
  - name: input
    type: string
    description: Script text to process, passed directly
    required: false
  - name: path
    type: string
    description: Path to a single script file or to a directory containing multiple scripts
    required: false
  - name: stage
    type: string
    description: Target difficulty level S1/S2/S3/S4
    required: true
  - name: output
    type: string
    description: Output directory; when set, results are saved there automatically
    required: false
tags:
  - content-production
  - english-courses
  - script-generation
  - batch-processing
17
kids-english-script-production/assets/stage_config.yaml
Normal file
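@@ -0,0 +1,17 @@
# Difficulty settings per Stage; can be adjusted at any time without changing code
S1:
  age: "ages 4-5"
  lexile: "≤200L"
  rules: "At least 90% of vocabulary from the Starters core list; abstract words (fix/before/finish, etc.) are forbidden; simple sentences only (starting with This is/It is/I/We), no conjunctions or clauses, simple present tense only; sentence length 4-7 words; no complex structures"
S2:
  age: "ages 5-6"
  lexile: "200L-400L"
  rules: "60% Starters + 40% Movers vocabulary; simple emotion words (happy/scared/tired) and simple adverbs (now/slowly/fast) may appear; the conjunctions and/but/so/because may be used, at most 1 per sentence; the simple past tense and the time markers then/later may be used; sentence length 7-10 words"
S3:
  age: "ages 6-7"
  lexile: "400L-600L"
  rules: "Mildly abstract words (problem/idea/plan) and descriptive words (bright/noisy/broken) may appear; the conjunctions when/before/after may be used, and two-step action chains may be expressed; the simple past and present continuous may be mixed; sentence length 10-15 words"
S4:
  age: "ages 7-8"
  lexile: "600L-800L"
  rules: "Full coverage of Flyers vocabulary; low-difficulty abstract words (decide/safe/dangerous/fix) may be added; the conjunctions because/so/if/when/although may be used, and purpose may be expressed with to do; the future tense with will may be used; sentence length 15-20 words"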
88
kids-english-script-production/assets/validation_config.yaml
Normal file
@@ -0,0 +1,88 @@
# Complete validation rule configuration for English lines
# Version: 2026-04-01
---
## 1. Basic general validation rules (mandatory)
basic_rules:
  sentence_spec:
    - Keep the standard "be going to" form by default; unless specially marked (for example, a designated preschool scene), overly colloquial forms such as gonna are forbidden
  vocabulary_control:
    - In early-age learning scenes, out-of-syllabus words are forbidden; action instructions should prefer basic vocabulary (for example, look at instead of focus)
  redundancy_check:
    - The same line must not appear two or more times in a row; flag such layout errors directly
  confirmed_optimization:
    # The whole item is single-quoted: a quoted scalar followed by bare text is not valid YAML
    - '"Today, we must train!" is always rewritten as "Let''s start training!"'

## 2. Five core fine-editing principles (mandatory for A1-level content)
core_principles:
  simplification:
    name: Minimal simplification principle
    rules:
      - Strictly enforce "one idea per sentence"; split complex nested sentences into independent simple sentences; compound clauses are forbidden
      - Only the simple present, simple past, and basic future (will/be going to) tenses are allowed; perfect tenses and the subjunctive mood are forbidden
  chunking:
    name: Chunk-first principle
    rules:
      - When context or on-screen information is sufficient, redundant elements may be omitted, keeping the core chunks
      - Short idiomatic imperative phrases commonly used by native-speaking children are allowed (for example, Game start!/Watch this!/Silly me!); invented expressions are forbidden
  tpr_action:
    name: TPR action binding principle
    rules:
      - Lines that prompt interaction or action start with a basic verb in its base form (Look/Listen/Hit/Run, etc.)
      - Keep lines fully in sync with on-screen actions and UI components, so that visuals support listening comprehension
  target_focus:
    name: Seamless target-syllabus embedding principle
    rules:
      - Core vocabulary and sentence patterns are repeated naturally through plot conflict (something lost, someone hurt, a mistake, etc.); forced insertion is forbidden
      - Target sentence patterns are elicited through NPC questions; blunt direct statements are forbidden
  emotional_resonance:
    name: Emotional exaggeration principle
    rules:
      - Low-cognitive-load interjections may be used to convey emotion (Phew!/Ouch!/Oops!/Aha!/Waaaaah!, etc.)
      - Express emotion directly with A1-level adjectives (sad/happy/angry, etc.); complex psychological description is forbidden

## 3. AR level / vocabulary / difficulty validation rules
ar_validation:
  enable: true
  S1_allow_AR2_ratio: 0.1 # S1 allows at most 10% AR2
  S2_allow_AR3_ratio: 0.1 # S2 allows at most 10% AR3
  S3_allow_AR4_ratio: 0.15 # S3 allows at most 15% AR4

vocab_validation:
  enable_OOV_remind: true # whether to enable out-of-vocabulary (OOV) reminders
  S1_allow_OOV_ratio: 0.05 # S1 allows at most 5% OOV words
  S2_allow_OOV_ratio: 0.1 # S2 allows at most 10% OOV words
  stop_words: # words ignored during the OOV check
    - hey
    - look
    - oh
    - wow
    - wait
    - oh no
    - yay
    - i
    - you
    - he
    - she
    - it
    - we
    - they
    - am
    - is
    - are
    - was
    - were
    - a
    - an
    - the
    - and
    - but
    - so
    - because

difficulty_validation:
  enable: true
  allow_lexile_deviation: 50 # allowed Lexile deviation of ±50L

## 4. Validation output standard
output_standard:
  - Every flagged issue must include a context quote, an issue-type description, and a concrete fix, so that it can be applied directly
53
kids-english-script-production/assets/调优指南.md
Normal file
@@ -0,0 +1,53 @@
# English Script Production Skill Tuning Guide
## 📌 Core principle
**No tuning requires changes to the code or the core prompt. Edit only the yaml config files under `assets/` (plus the word list under `references/`); changes take effect on the next run.** The core generation strategy stays unchanged, which avoids rule confusion caused by manual edits.

---
## 📁 Editable config files
| File | Purpose | When to modify |
| ---- | ---- | ---- |
| `assets/expression_map.yaml` | Expression mapping | Adjusting emotion words, colloquial synonym replacements, or splitting rules |
| `assets/prompt_config.yaml` | Generation behavior | Adjusting generation flexibility, naturalization switches, or script fidelity |
| `assets/validation_config.yaml` | Validation rules | Adjusting validation strictness, OOV thresholds, or allowed AR-level ratios |
| `assets/stage_config.yaml` | Difficulty levels | Adjusting each Stage's vocabulary, syntax, or sentence-length requirements |
| `assets/sci_fi_map.yaml` | Sci-fi term mapping | Adding or changing sci-fi term simplification rules |
| `references/l1_word_list.json` | L1 core word list | Updating the L1 vocabulary whitelist |

---
## 🔧 Common tuning scenarios
### 1. Change the default rendering of "天呐" from "Oh my" to "Wow"
Edit `emotion_map` in `assets/expression_map.yaml`:
```yaml
emotion_map:
  天呐: Wow # was Oh my; change it to Wow
```
### 2. Turn off long-sentence splitting so that sentences flow better
Edit `naturalization` in `assets/prompt_config.yaml`:
```yaml
naturalization:
  enable_long_sentence_split: false # change true to false
```
### 3. Raise the allowed OOV ratio for S1 to 10%
Edit `vocab_validation` in `assets/validation_config.yaml`:
```yaml
vocab_validation:
  S1_allow_OOV_ratio: 0.1 # changed from 0.05 to 0.1
```
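
To get a feel for what such a threshold means on a given script, the ratio can be estimated offline. This is a minimal sketch under stated assumptions: `oov_ratio` and `within_threshold` are hypothetical helpers, and the ratio is counted over word tokens after stop words are removed, which may differ from how the skill itself counts.

```python
import re


def oov_ratio(text, allowed_words, stop_words=()):
    """Return (ratio, oov_words) for the English word tokens in `text`."""
    tokens = [w.lower().strip("'") for w in re.findall(r"[a-zA-Z']+", text)]
    tokens = [w for w in tokens if w and w not in stop_words]
    if not tokens:
        return 0.0, []
    oov = [w for w in tokens if w not in allowed_words]
    return len(oov) / len(tokens), sorted(set(oov))


def within_threshold(text, allowed_words, threshold, stop_words=()):
    """True when the OOV ratio does not exceed `threshold` (for example 0.1 for 10%)."""
    ratio, _ = oov_ratio(text, allowed_words, stop_words)
    return ratio <= threshold
```

On very short scripts a single unknown word moves the ratio a lot, so a threshold such as 0.05 is mostly meaningful for longer inputs.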

### 4. Make the output more flexible and less rigid
Edit `temperature` in `assets/prompt_config.yaml`:
```yaml
temperature:
  S2: 0.4 # raised from 0.3; higher values are more flexible; do not go above 0.7
```
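
The 0.7 ceiling mentioned above is only advice in the config comment. If it should be enforced in code, a lookup with a cap could look like the sketch below. `pick_temperature` is a hypothetical helper, and the 0.3 fallback for a Stage missing from the config is an assumption.

```python
def pick_temperature(temperature_cfg, stage, default=0.3, cap=0.7):
    """Look up the Stage temperature, fall back to `default`, and keep it within [0, cap]."""
    value = float(temperature_cfg.get(stage, default))
    return max(0.0, min(value, cap))
```

Clamping silently keeps a batch running when someone enters a value that is too high; rejecting the config with an error would be the stricter alternative.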

### 5. Add a simplification rule for a new sci-fi term
Edit `assets/sci_fi_map.yaml` and append one line:
```yaml
new_sci_word: "a child-friendly expression"
```
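
To preview how a new entry would read in context, the map can be applied to a sample sentence by plain substitution. This is a rough sketch only: `downgrade_sci_fi` is a hypothetical helper, and the skill itself passes the map to the LLM prompt rather than substituting text in code.

```python
import re


def downgrade_sci_fi(text, sci_fi_map):
    """Replace each mapped sci-fi term in `text` with its child-friendly phrase."""
    # Longest terms first, so a longer phrase is never pre-empted by a shorter one.
    terms = sorted(sci_fi_map, key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    lookup = {t.lower(): v for t, v in sci_fi_map.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

Plain substitution does not adjust articles or grammar around the replaced term ("an explosion" would become "an a big boom"), which is one reason to leave the actual rewriting to the model.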
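
---
## ⚠️ Notes
1. Every yaml file must be valid YAML: indent with 2 spaces and never with tabs, otherwise loading fails
2. The core rules (no adding to or removing from the original script) are best left unchanged, to avoid non-compliant output
3. After changing a config, test with a sample script first, and move to batch use only once the result looks right
4. If a config gets messed up, overwrite it with the backed-up default config to restore the factory settings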
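
Note 1 can be checked before a batch run with a quick scan for tab or odd-width indentation. The helper below is a hypothetical pre-flight check, not part of the skill, and it does not parse YAML; loading the file with the YAML parser remains the definitive test.

```python
def find_indent_problems(yaml_text):
    """Return (line_number, message) pairs for tab or odd-width indentation."""
    problems = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % 2:
            problems.append((number, "indentation is not a multiple of 2 spaces"))
    return problems
```

An empty result only means the indentation style looks consistent; a file can still be invalid for other reasons, such as an unquoted value that contains a colon followed by a space.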
@@ -0,0 +1,4 @@
用户: 光有水不行,得先拿上毛巾。
Ben: 好的,我现在去院子里拿毛巾,顺便把水桶也拿过来。
用户: 太棒了,我们快点把飞船擦干净,不然天黑就完不成了!
Ben: 没问题,飞船的能量 core 出了点小问题,我们擦完再一起修。
File diff suppressed because one or more lines are too long
212
kids-english-script-production/scripts/gen_script.py
Normal file
@@ -0,0 +1,212 @@
#!/usr/bin/env python3
import argparse
import sys
import os
import yaml
from openai import OpenAI
from pathlib import Path

# Locate the skill directories
BASE_DIR = Path(__file__).parent.parent
ASSETS_DIR = BASE_DIR / "assets"

# Load the external config files (every tunable parameter lives in the yaml files under assets/, so no code changes are needed)
try:
    # Base configuration
    with open(ASSETS_DIR / "sci_fi_map.yaml", "r", encoding="utf-8") as f:
        SCI_FI_WORD_MAP = yaml.safe_load(f)
    with open(ASSETS_DIR / "stage_config.yaml", "r", encoding="utf-8") as f:
        STAGE_CONFIG = yaml.safe_load(f)
    # Tuning configuration
    with open(ASSETS_DIR / "expression_map.yaml", "r", encoding="utf-8") as f:
        EXPRESSION_MAP = yaml.safe_load(f)
    with open(ASSETS_DIR / "prompt_config.yaml", "r", encoding="utf-8") as f:
        PROMPT_CONFIG = yaml.safe_load(f)
    with open(ASSETS_DIR / "validation_config.yaml", "r", encoding="utf-8") as f:
        VALIDATION_CONFIG = yaml.safe_load(f)
    # Word list (a JSON file, read here with the YAML loader, which accepts JSON syntax)
    with open(BASE_DIR / "references" / "l1_word_list.json", "r", encoding="utf-8") as f:
        L1_WORD_LIST = {word.lower() for word in yaml.safe_load(f)}
except Exception as e:
    print(f"❌ Failed to load the config files; check that their format is valid: {e}")
    sys.exit(1)

# Initialize the LLM client; its settings are read from environment variables
try:
    client = OpenAI(
        api_key=os.getenv("OPENAI_API_KEY", "your-api-key"),
        base_url=os.getenv("OPENAI_BASE_URL", "https://ark.cn-beijing.volces.com/api/v3")
    )
    MODEL = os.getenv("OPENAI_MODEL", "volcengine/doubao-seed-2-0-pro-260215")
except Exception as e:
    print(f"❌ Failed to initialize the LLM client: {e}")
    sys.exit(1)

def load_input(input_path):
    """Load input content from a single file or, in batch, from a directory."""
    input_path = Path(input_path)
    if not input_path.exists():
        print(f"❌ Input path does not exist: {input_path}")
        sys.exit(1)

    if input_path.is_file():
        with open(input_path, "r", encoding="utf-8") as f:
            return [(input_path.name, f.read())]
    elif input_path.is_dir():
        # Batch-load every txt file in the directory (sorted for a stable order)
        script_files = sorted(input_path.glob("*.txt"))
        if not script_files:
            print(f"❌ No txt script files found in directory: {input_path}")
            sys.exit(1)
        results = []
        for f in script_files:
            with open(f, "r", encoding="utf-8") as fp:
                results.append((f.name, fp.read()))
        return results
    else:
        print(f"❌ Unsupported input type: {input_path}")
        sys.exit(1)
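
def get_prompt(input_text, stage):
    """Build the prompt; every tunable rule is read from the config files, so no code changes are needed."""
    sci_fi_map_str = "\n".join([f"{k} → {v}" for k, v in SCI_FI_WORD_MAP.items()])
    # Build the rule texts from the config switches
    emotion_map_rule = ("Prefer the following mapping for emotion words: " + ", ".join([f"{k}→{v}" for k, v in EXPRESSION_MAP['emotion_map'].items()])) if PROMPT_CONFIG['naturalization']['enable_emotion_word'] else "Do not use the custom emotion word mapping"
    synonym_replace_rule = ("The following colloquial synonym replacements may be used (meaning unchanged): " + ", ".join([f"{k}→{v}" for k, v in EXPRESSION_MAP['synonym_replace'].items()])) if PROMPT_CONFIG['naturalization']['enable_synonym_replace'] else "Do not use synonym replacement"
    split_rule = "Split sentences that carry 2 or more pieces of information into single-information short sentences" if PROMPT_CONFIG['naturalization']['enable_long_sentence_split'] else "Do not split long sentences"
    repeat_rule = "Natural repetition is allowed (for example: It is dirty. Very dirty.)" if PROMPT_CONFIG['naturalization']['allow_repeat_expression'] else "Repeated expressions are not allowed"
    exclamation_rule = "Strongly emotional sentences may use an exclamation mark" if PROMPT_CONFIG['naturalization']['enable_exclamation_mark'] else "Always use a period"
    fidelity_rule = "Stay 100% faithful to the original script: never add information that is not in the original script, and never delete information that is in it" if PROMPT_CONFIG['script_fidelity']['strictly_no_add'] and PROMPT_CONFIG['script_fidelity']['strictly_no_delete'] else "Details may be adjusted where appropriate"

    return f"""
You are an expert in producing English dialogue lines for children aged 4-8. Strictly follow every rule below when generating content; no violation is allowed:
### Script fidelity rules (highest priority, must never be violated)
{fidelity_rule}

### Step 1: Input normalization
The current input is: {input_text}
Whether the input is Chinese-only, English-only, or mixed Chinese-English, first convert it to the standard Chinese "Role: line" format, fully preserving all plot, action, character-relationship, prop, and event-trigger information. No core content may be lost.

### Step 2: Chinese AR preprocessing
Strictly observe the 4 preservation mechanisms (these must never be changed):
1. Preserve the complete chain of event verbs
2. Preserve all event triggers
3. Preserve the complete prop logic chain
4. Preserve the original character relationships
Split into single-information short sentences according to the following 7 rules; each sentence expresses exactly one piece of information, and the plot must not change:
1. Split complex sentences into short sentences
2. Split cause and effect; keep the facts, drop the connective
3. Split purpose clauses; do not delete the purpose information
4. Split multi-step actions into single-action sentences
5. Fully split condition + behavior; remove hypothetical logic
6. Split emotion from fact; do not alter the emotion
7. Remove complex reasoning; keep only visible facts

### Step 3: Graded English generation
Target Stage: {stage}
Requirements: {STAGE_CONFIG[stage]["rules"]}
Lexile requirement: {STAGE_CONFIG[stage]["lexile"]}

Naturalization requirements (**strictly follow the script fidelity rules; do not add or remove any content relative to the original script**):
1. Emotion word mapping rule: {emotion_map_rule}
2. Synonym replacement rule: {synonym_replace_rule}
3. Long-sentence splitting rule: {split_rule}
4. Repetition rule: {repeat_rule}
5. Punctuation rule: {exclamation_rule}
6. Adult-style connectives (actually/in fact/however, etc.) are strictly forbidden
7. The lines must match how native-speaking children talk, with no translationese
8. Replace sci-fi vocabulary automatically according to the following mapping:
{sci_fi_map_str}

### Step 4: Automatic validation
After generating, check the following 4 items yourself:
1. AR level compliance: S1 forbids AR3/AR4, S2 forbids AR4
2. Difficulty compliance: vocabulary, syntax, sentence length, and Lexile fully match the target Stage, with nothing out of syllabus
3. Naturalness compliance: no translationese; matches the native expression habits of children aged 4-8
4. Content compliance: no sensitive content, no Chinglish

### Output format (output strictly in this format, nothing else)
[Stage {stage} English lines (for {STAGE_CONFIG[stage]["age"]})]
Role A: line content
Role B: line content
...
[Lexile]: [estimated value]L
[Validation result]: pass / needs optimization
[Optimization suggestion]: none / specific suggestion
"""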

def generate_single_script(input_text, stage):
    """Generate the lines for a single script."""
    try:
        prompt = get_prompt(input_text, stage)
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            # Use the per-Stage temperature from prompt_config.yaml (falls back to the
            # previously hard-coded 0.3), so that tuning the config actually takes effect
            temperature=PROMPT_CONFIG.get("temperature", {}).get(stage, 0.3),
            max_tokens=2000,
            timeout=30
        )
        result = response.choices[0].message.content
        # Append the out-of-vocabulary (OOV) check
        oov_words = check_out_of_vocab(result, stage)
        if oov_words and stage in ["S1", "S2"]:
            result += f"\n[OOV reminder]: {', '.join(oov_words)} (please confirm whether these need replacing)"
        return result
    except Exception as e:
        # Return the error as the result so that one failure does not stop a batch
        return f"❌ Generation failed: {e}"

def check_out_of_vocab(script_content, stage):
    """Check for out-of-vocabulary words; the rules are read from the config file."""
    if not VALIDATION_CONFIG['vocab_validation']['enable_OOV_remind'] or stage not in ["S1", "S2"]:
        return []
    # Extract every English word
    import re
    words = re.findall(r"[a-zA-Z']+", script_content)
    words = [word.lower().strip("'") for word in words]
    # Filter out the stop words defined in the config, and single letters
    stop_words = set(VALIDATION_CONFIG['vocab_validation']['stop_words'])
    words = [word for word in words if word not in stop_words and len(word) > 1]
    # Collect the words missing from the L1 list (sorted for stable output)
    out_of_vocab = sorted(set(word for word in words if word not in L1_WORD_LIST))
    return out_of_vocab

def save_result(output_dir, filename, content):
    """Save a result to a file and return its path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    output_file = output_dir / f"result_{filename}"
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(content)
    return output_file

def main():
    parser = argparse.ArgumentParser(description="Standardized English script production tool for children aged 4-8")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--input", type=str, help="Script text to process, passed directly")
    group.add_argument("--path", type=str, help="Path to a single script file or to a directory containing multiple scripts")
    parser.add_argument("--stage", type=str, choices=["S1", "S2", "S3", "S4"], required=True, help="Target difficulty level S1/S2/S3/S4")
    parser.add_argument("--output", type=str, help="Output directory; if omitted, results are printed to the console only")
    args = parser.parse_args()

    # Resolve the input
    if args.input:
        input_list = [("direct_input", args.input)]
    else:
        input_list = load_input(args.path)

    # Batch generation
    results = []
    for filename, text in input_list:
        print(f"\n🚀 Processing: {filename}")
        result = generate_single_script(text, args.stage)
        results.append((filename, result))
        print(result)
        # Save the result
        if args.output:
            save_path = save_result(args.output, filename, result)
            print(f"💾 Result saved to: {save_path}")

    print(f"\n✅ All done; processed {len(results)} script(s)")

if __name__ == "__main__":
    main()