Alpha Agent: Having AI Generate Quant Factors for Me

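Before the main text, a few thoughts:

While building Alpha Agent, one impression has kept getting stronger: AI agents are not changing any single tool, they are changing how the work itself gets done.

Whether it was quant research, writing code, or everyday office work, the pattern used to be the same: a person breaks the task into steps and then operates the tools by hand, one step at a time. Writing a report means looking up sources, organizing data, drafting, and revising; doing an analysis means running the code yourself, plotting, and tuning parameters over and over. The bottleneck has always been the cost of execution.

What changes with AI agents is that they take over not just "answering questions" but the whole execution chain: understand the goal → decompose the task → call tools → iterate and refine.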

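This shift cuts across domains. In office work, people used to build the Excel sheets, SQL queries, and slides and write the conclusions themselves; now you state the goal and the agent produces the analysis and a first draft. In programming, the old loop was write code, debug, test; now the agent generates the code and repairs it itself. In content production, people used to write from scratch; now the person sets the structure and the agent drafts and polishes.

In Alpha Agent the shift is even more visible: factor research is essentially a "generate → backtest → revise" loop. It used to be pushed forward by a person; now an agent can run it automatically.

So the researcher's role changes too, from executor to system designer: the job is no longer "finding alpha" but "designing a system that keeps finding alpha."

At the same time, it is just as clear that today's agents still lack real structural understanding. They are mostly doing high-dimensional combinatorial search rather than building models of financial mechanisms. So an agent is more of an amplifier than a replacement.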

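1. Why build Alpha Agent

AI agents are a hot topic right now.

A traditional LLM is more like a question-answering machine: question in → answer out. The core change with an agent is:

You hand the AI a task, not a question.

The AI does not just produce a result; it also:

Decomposes the task
Proposes solutions on its own
Runs multiple rounds of iterative optimization
Keeps improving based on feedback

In essence, this is building an "automated researcher."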

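2. My goal: use AI to generate alpha factors automatically

I tried to carry this idea over to quant research:

Use an LLM to combine alpha expressions automatically from a large pool of data fields, and iterate on them based on backtest results.

The overall workflow:

Extract usable features from the data fields
The LLM generates alpha formulas
Backtest to get Sharpe / Fitness / Turnover and other metrics
Feed the results back to the LLM
The LLM optimizes for the next round

This forms a closed-loop Alpha Agent.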
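
The workflow above can be sketched as a small driver loop. This is only an illustration under stated assumptions: `run_alpha_loop`, its three callables, `max_rounds`, and `target_sharpe` are hypothetical names and values, not the post's actual code, and the lambdas at the bottom are toy stand-ins for the LLM and the backtester.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def run_alpha_loop(
    generate: Callable[[List[str]], str],
    backtest: Callable[[str], Metrics],
    optimize: Callable[[str, List[dict]], str],
    features: List[str],
    max_rounds: int = 5,          # assumed budget, not from the original post
    target_sharpe: float = 1.0,   # assumed stopping threshold
) -> List[dict]:
    """Generate one alpha, then alternate between backtesting and LLM revision."""
    history: List[dict] = []
    alpha = generate(features)                      # LLM proposes a formula
    for i in range(1, max_rounds + 1):
        metrics = backtest(alpha)                   # Sharpe / Fitness / Turnover ...
        history.append({"iteration": i, "alpha": alpha, "metrics": metrics})
        if metrics["sharpe"] >= target_sharpe:
            break                                   # good enough, stop early
        alpha = optimize(alpha, history)            # feed results back to the LLM
    return history


# Toy stand-ins so the sketch runs without an LLM or a backtester.
fake_sharpes = iter([0.2, 0.6, 1.1])
demo = run_alpha_loop(
    generate=lambda feats: f"rank({feats[0]})",
    backtest=lambda alpha: {"sharpe": next(fake_sharpes)},
    optimize=lambda alpha, hist: f"ts_mean({alpha}, 5)",
    features=["snt_social_value"],
)
print(len(demo), demo[-1]["alpha"])
```

Passing the three stages in as callables keeps the loop independent of any particular model or backtesting service, so each piece can be swapped or mocked separately.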

3. Model choice: DeepSeek API

I use the DeepSeek API here (the interface is compatible with the OpenAI SDK):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get('DEEPSEEK_API_KEY'),
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    stream=False,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}}
)

print(response.choices[0].message.content)

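4. Core module design

The core of the whole system is really just two prompt-engineering modules:

4.1 Initial alpha generation prompt (build_prompt)

Used to generate candidate alphas from the feature space.

The core constraints:

Use only the given features
Each alpha uses 2–4 variables
Each alpha must have an economic rationale
Output must be strict JSON

Example structure: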

from typing import Dict, List


def build_prompt(samples: List[Dict]) -> str:
    lines = []

    lines.append("You are an expert quantitative researcher at WorldQuant Brain.")
    lines.append("Your goal is to design robust, economically meaningful alpha signals.")
    lines.append("")

    lines.append("TASK (must follow step-by-step):")
    lines.append("STEP 1 — Generate 10 alpha signals")
    lines.append("- Use ONLY the provided features")
    lines.append("- Each alpha must use 2–4 features")
    lines.append("- Avoid overly complex formulas")
    lines.append("- Each alpha must have clear economic intuition")
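    # ...... (remaining prompt sections omitted here)
    return "\n".join(lines)  # assumed ending: join the collected lines into one prompt string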

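4.2 Alpha optimization prompt (build_alpha_optimization_prompt)

The second module is the key to the whole system:

It does not "regenerate the alpha"; it makes structural improvements on top of past failures.

Inputs:

The current alpha
Backtest results from the last N rounds
The feature list
Operator constraints

The core idea:

Force the LLM to reason like a researcher instead of changing the formula at random.

Key design points:

(1) History trajectory as input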

RECENT OPTIMIZATION HISTORY (last 5 iterations):
- Iteration 1: ...
- Iteration 2: ...

This lets the model learn which modifications failed.
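
A minimal sketch of how such a history block might be rendered from stored iterations. The function name `format_history` and the dictionary keys `iteration`, `alpha`, and `metrics` are assumptions made for illustration; only the output layout follows the prompt text shown in this post.

```python
from typing import Dict, List


def format_history(history: List[Dict], last_n: int = 5) -> str:
    """Render the most recent iterations in the layout of the prompt's history block."""
    recent = history[-last_n:]
    lines = [f"RECENT OPTIMIZATION HISTORY (last {last_n} iterations):"]
    for item in recent:
        n, alpha, metrics = item["iteration"], item["alpha"], item["metrics"]
        # The metrics dict is printed with its plain Python repr.
        lines.append(f'- Iteration {n}: alpha="{alpha}", metrics={metrics}')
    return "\n".join(lines)


example = [
    {"iteration": 1, "alpha": "rank(x)", "metrics": {"sharpe": -0.26}},
    {"iteration": 2, "alpha": "reverse(rank(x))", "metrics": {"sharpe": 0.26}},
]
print(format_history(example))
```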

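(2) Economic constraint (key)

Your goal is NOT to mechanically improve metrics.
Your goal is to infer the economic hypothesis behind the alpha.

This forces the model to answer:

What market behavior does this alpha express?
Did the recent change break that hypothesis?

(3) Hard invariance constraint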

1. Preserve the exact set of input fields
2. No new fields, no removed fields

This keeps the model from "cheating" by swapping in different fields instead of improving the formula.
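
The invariance rule can also be checked on the harness side before spending a backtest. The sketch below is an assumption-laden illustration: `fields_in` and `preserves_fields` are hypothetical helpers, and they only recognize fields that appear in the known feature list, so a completely made-up field name would need a separate check.

```python
import re
from typing import Iterable, Set


def fields_in(expr: str, known_fields: Iterable[str]) -> Set[str]:
    """Return the known data fields that appear as whole identifiers in expr."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr))
    return tokens & set(known_fields)


def preserves_fields(old: str, new: str, known_fields: Iterable[str]) -> bool:
    """True if the old and new expressions use exactly the same known fields."""
    known = set(known_fields)
    return fields_in(old, known) == fields_in(new, known)


known = ["beta_last_360_days_spy", "anl4_basicqfv4_minguidance", "snt_social_value"]
old = "multiply(beta_last_360_days_spy, vec_avg(anl4_basicqfv4_minguidance))"
same = "multiply(rank(beta_last_360_days_spy), ts_mean(vec_avg(anl4_basicqfv4_minguidance), 60))"
swapped = "multiply(beta_last_360_days_spy, snt_social_value)"
print(preserves_fields(old, same, known), preserves_fields(old, swapped, known))
```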

(4) Limit on structural changes

Prefer modifying no more than TWO operator choices per iteration

This avoids full rewrites, which would blow up the search space.
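
One rough way to measure how far a revision strays is to compare operator usage between the old and new expressions. This is a heuristic sketch, not the post's method: `operator_counts` and `operators_changed` are hypothetical names, only function-style operators (a name followed by `(`) are counted, and a pure reordering of arguments registers as zero changes.

```python
import re
from collections import Counter


def operator_counts(expr: str) -> Counter:
    """Count operator calls, i.e. identifiers immediately followed by '('."""
    return Counter(re.findall(r"([A-Za-z_]\w*)\s*\(", expr))


def operators_changed(old: str, new: str) -> int:
    """Operator occurrences removed or added, whichever count is larger."""
    a, b = operator_counts(old), operator_counts(new)
    removed = sum((a - b).values())   # in old but not in new
    added = sum((b - a).values())     # in new but not in old
    return max(removed, added)


old_expr = "multiply(b, subtract(ts_mean(vec_avg(g), 60), vec_avg(g)))"
new_expr = "multiply(rank(b), subtract(ts_decay_linear(vec_avg(g), 60), vec_avg(g)))"
print(operators_changed(old_expr, new_expr))
```

A harness could log this number next to each iteration, or reject replies where it exceeds two while metrics are still acceptable.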

(5) Output structure

{
  "action": "improve",
  "alpha": "...",
  "idea": "...",
  "category": "...",
  "modification_summary": "..."
}
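
Because the reply has a fixed shape, it can be checked before it is used. A minimal sketch, assuming the five keys shown above and the two actions, "improve" and "abandon", that the full optimization prompt later in this post allows; `parse_optimization_reply` is a hypothetical helper name.

```python
import json
from typing import Dict

REQUIRED_KEYS = {"action", "alpha", "idea", "category", "modification_summary"}
ALLOWED_ACTIONS = {"improve", "abandon"}


def parse_optimization_reply(raw: str) -> Dict[str, str]:
    """Parse the model's reply and reject anything that breaks the contract."""
    reply = json.loads(raw)              # raises json.JSONDecodeError on bad JSON
    if not isinstance(reply, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_KEYS - reply.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if reply["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {reply['action']!r}")
    return reply


sample = json.dumps({
    "action": "improve",
    "alpha": "rank(ts_mean(snt_social_value, 20))",
    "idea": "smoothed sentiment",
    "category": "momentum",
    "modification_summary": "added a 20-day mean",
})
print(parse_optimization_reply(sample)["alpha"])
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` alone and simply re-prompt the model on any malformed reply.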

5. Full prompt example (initial generation)

The system produces a prompt like this:

You are an expert quantitative researcher at WorldQuant Brain.
Your goal is to design robust, economically meaningful alpha signals.

TASK (must follow step-by-step):
STEP 1 — Generate 10 alpha signals
- Use ONLY the provided features
- Each alpha must use 2–4 features
- Avoid overly complex formulas
- Each alpha must have clear economic intuition

STEP 2 — Evaluate and rank all 10 alphas
- Rank by economic plausibility + robustness + simplicity

STEP 3 — Output only the top 6 alphas in JSON format

AVAILABLE FEATURES:
- unsystematic_risk_last_360_days | MATRIX | The portion of return variance not explained by SPY (idiosyncratic risk), calculated as 1 minus R² over the last 360 calendar days | source=model51_USA_TOP3000_d1.csv
- correlation_last_30_days_spy | MATRIX | The Pearson correlation coefficient of daily log returns between the security and SPY, calculated over the most recent 30 calendar days | source=model51_USA_TOP3000_d1.csv
- beta_last_30_days_spy | MATRIX | The rolling beta value of the security relative to SPY, calculated via regression over the last 30 calendar days, representing market sensitivity | source=model51_USA_TOP3000_d1.csv
- snt_social_value | MATRIX | Z-score of sentiment | source=socialmedia8_USA_TOP3000_d0.csv
- snt_social_volume | MATRIX | Normalized tweet volume | source=socialmedia8_USA_TOP3000_d0.csv
- scl12_sentiment | MATRIX | sentiment | source=socialmedia12_USA_TOP3000_d0.csv
- scl12_buzzvec | VECTOR | Vector representing the volume of social media sentiment/mentions related to the instrument | source=socialmedia12_USA_TOP3000_d0.csv
- scl12_buzz | MATRIX | relative sentiment volume | source=socialmedia12_USA_TOP3000_d0.csv
- anl4_basicqfv4_maxguidance | VECTOR | Max guidance value | source=analyst4_USA_TOP3000_d0.csv
- anl4_basicconqf_down | VECTOR | Number of lower estimations | source=analyst4_USA_TOP3000_d0.csv

HARD CONSTRAINTS (must not violate):
1. Use only provided features (no external variables)
2. Each alpha must use 2–4 features only
3. Do NOT reuse all features in one alpha
4. Keep formulas interpretable
5. Output must be valid JSON ONLY

VECTOR RULE (important):
- If a feature is vector-type, you MUST apply vec_* operators first
- vec_* outputs can then be used in ts_* or cross-sectional ops

Example:
  vec_avg(x) -> ts_rank(..., 20) -> final alpha
  NOT allowed: ts_rank(raw_vector, 20)

ALLOWED OPERATORS (must use EXACTLY these names, no typos):
abs(x), add(x, y, filter=false), densify(x), divide(x, y), inverse(x), log(x), max(x, y, ..), min(x, y, ..), multiply(x, y, ..., filter=false), power(x, y), reverse(x), sign(x), signed_power(x, y), sqrt(x), subtract(x, y, filter=false), and(input1, input2), if_else(input1, input2, input3), input1 < input2, input1 <= input2, input1 == input2, input1 > input2, input1 >= input2, input1 != input2, is_nan(input), not(x), or(input1, input2), days_from_last_change(x), hump(x, hump=0.01), kth_element(x, d, k, ignore="NaN"), last_diff_value(x, d), ts_arg_max(x, d), ts_arg_min(x, d), ts_av_diff(x, d), ts_backfill(x, lookback=d, k=1), ts_corr(x, y, d), ts_count_nans(x, d), ts_covariance(y, x, d), ts_decay_linear(x, d, dense=false), ts_delay(x, d), ts_delta(x, d), ts_mean(x, d), ts_product(x, d), ts_quantile(x, d, driver="gaussian"), ts_rank(x, d, constant=0), ts_regression(y, x, d, lag=0, rettype=0), ts_scale(x, d, constant=0), ts_std_dev(x, d), ts_step(1), ts_sum(x, d), ts_zscore(x, d), normalize(x, useStd=false, limit=0.0), quantile(x, driver=gaussian, sigma=1.0), rank(x, rate=2), scale(x, scale=1, longscale=1, shortscale=1), winsorize(x, std=4), zscore(x), vec_avg(x), vec_sum(x), bucket(rank(x), range="0,1,0.1", skipBoth=False, NaNGroup=False), trade_when(x, y, z), group_backfill(x, group, d, std=4.0), group_mean(x, weight, group), group_neutralize(x, group), group_rank(x, group), group_scale(x, group), group_zscore(x, group)
- EVERY function that has a `d` parameter MUST include that parameter.
- Correct examples: ts_rank(close, 20), ts_mean(volume, 10), ts_corr(high, low, 5)
- Wrong examples: ts_rank(close), ts_mean(volume), ts_corr(high, low)

FIELD USAGE RULE:
- Each alpha MUST include 'fields_used'
- Must copy exact field objects from AVAILABLE FEATURES
- No hallucinated fields allowed

OUTPUT FORMAT (strict JSON only):

{
  "ranking_analysis": "short explanation of ranking logic",
  "top_alphas": [
    {"rank": 1, "alpha": "...", "idea": "...", "category": "...", "fields_used": [...]},
    {"rank": 2, "alpha": "...", "idea": "...", "category": "...", "fields_used": [...]},
    {"rank": 3, "alpha": "...", "idea": "...", "category": "...", "fields_used": [...]},
    {"rank": 4, "alpha": "...", "idea": "...", "category": "...", "fields_used": [...]}
  ]
}

FINAL RULE: Return ONLY valid JSON. No explanation. No markdown. No extra text.
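
Some of the hard constraints in this prompt can be verified mechanically once the reply comes back. The sketch below is an illustration under assumptions: `validate_generation` is a hypothetical helper, the feature types are passed in as a plain name-to-type mapping, and only three rules are checked (valid JSON, 2–4 known fields per alpha, and vector fields wrapped directly in `vec_avg` or `vec_sum`).

```python
import json
import re
from typing import Dict, List


def validate_generation(raw: str, features: Dict[str, str]) -> List[str]:
    """Return the problems found in the model's reply (empty list if none).

    `features` maps each field name to its type, "MATRIX" or "VECTOR".
    """
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(reply, dict):
        return ["top-level JSON value is not an object"]

    problems: List[str] = []
    for item in reply.get("top_alphas", []):
        rank = item.get("rank")
        expr = re.sub(r"\s+", "", item.get("alpha", ""))   # drop whitespace
        used = set(re.findall(r"[A-Za-z_]\w*", expr)) & set(features)
        if not 2 <= len(used) <= 4:
            problems.append(f"rank {rank}: uses {len(used)} known fields, expected 2-4")
        for name in sorted(used):
            if features[name] != "VECTOR":
                continue
            # A vector field must sit directly inside vec_avg(...) or vec_sum(...).
            bare = re.search(
                rf"(?<!vec_avg\()(?<!vec_sum\()\b{re.escape(name)}\b", expr
            )
            if bare:
                problems.append(f"rank {rank}: {name} used without a vec_* operator")
    return problems


feats = {"snt_social_value": "MATRIX", "scl12_buzzvec": "VECTOR"}
good = json.dumps({"top_alphas": [
    {"rank": 1, "alpha": "multiply(rank(snt_social_value), ts_rank(vec_avg(scl12_buzzvec), 20))"}
]})
bad = json.dumps({"top_alphas": [
    {"rank": 1, "alpha": "multiply(snt_social_value, ts_rank(scl12_buzzvec, 20))"}
]})
print(validate_generation(good, feats), validate_generation(bad, feats))
```

Checks like these are cheap compared with a backtest, so running them first filters out replies that would fail on syntax alone.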

6. Optimization prompt example (the core)

The prompt for the optimization stage:

You are a senior quantitative researcher at a hedge fund.
Your task is to improve an existing WorldQuant Brain alpha using iterative optimization.
ALWAYS THINK STEP-BY-STEP:

CURRENT ALPHA:
multiply(beta_last_360_days_spy, subtract(ts_mean(vec_avg(anl4_basicqfv4_minguidance), 60), vec_avg(anl4_basicqfv4_minguidance)))

RECENT OPTIMIZATION HISTORY (last 5 iterations):
- Iteration 1: alpha="multiply(beta_last_360_days_spy, subtract(vec_avg(anl4_basicqfv4_minguidance), ts_mean(vec_avg(anl4_basicqfv4_minguidance), 60)))", metrics={'sharpe': -0.26, 'fitness': -0.12, 'turnover': 0.1662, 'sub_universe_sharpe': 0.15, 'total_score': 'badness'}
- Iteration 2: alpha="multiply(beta_last_360_days_spy, subtract(ts_mean(vec_avg(anl4_basicqfv4_minguidance), 60), vec_avg(anl4_basicqfv4_minguidance)))", metrics={'sharpe': 0.26, 'fitness': 0.12, 'turnover': 0.1662, 'sub_universe_sharpe': -0.15, 'total_score': 'badness'}

AVAILABLE FEATURES:
- beta_last_360_days_spy | MATRIX | The rolling beta value of the security relative to SPY, calculated via regression over the last 360 calendar days, representing market sensitivity | source=model51_USA_TOP3000_d1.csv
- systematic_risk_last_30_days | MATRIX | The portion of the security’s return variance attributed to systematic (market) risk, quantified as R² from a regression on SPY, over the last 30 calendar days | source=model51_USA_TOP3000_d1.csv
- unsystematic_risk_last_360_days | MATRIX | The portion of return variance not explained by SPY (idiosyncratic risk), calculated as 1 minus R² over the last 360 calendar days | source=model51_USA_TOP3000_d1.csv
- anl4_basicqfv4_minguidance | VECTOR | Min guidance value | source=analyst4_USA_TOP3000_d0.csv
- anl4_basicconqf_mean | VECTOR | Mean of estimations | source=analyst4_USA_TOP3000_d0.csv
- anl4_bac1conaf_item | VECTOR | Financial item | source=analyst4_USA_TOP3000_d0.csv
- nws18_event_similarity_days | VECTOR | Number of days since a similar event was last detected within the past 365 days, up to 5 decimals | source=news18_USA_TOP3000_d0.csv
- event_end_date_utc | VECTOR | The UTC date when the event concludes. | source=news18_USA_TOP3000_d0.csv
- event_start_time_utc | VECTOR | The UTC timestamp when the event begins. | source=news18_USA_TOP3000_d0.csv
- snt_social_value | MATRIX | Z-score of sentiment | source=socialmedia8_USA_TOP3000_d0.csv

ALPHA RESEARCH TASK:

Your goal is NOT to mechanically improve metrics.
Your goal is to infer the economic hypothesis behind the alpha.

For every optimization:
1. Infer what market behavior the alpha may capture.
2. Determine whether recent modifications strengthened or weakened that hypothesis.
3. Avoid modifications that change the economic meaning unless recent results strongly suggest the current hypothesis is invalid.
4. Prefer transformations that improve signal extraction from the same underlying hypothesis.

CRITICAL CONSTRAINTS:
1. Preserve the exact set of input fields appearing in the current alpha.
   - No new fields, no removed fields, no replaced fields
2. You may modify operators, parameters, weights, lookback windows, and nesting structure.
   - Large changes are allowed if recent metrics are poor.
3. Analyze historical iterations to identify failed modification patterns.
   - Avoid repeating unsuccessful operator families or directions.
4. Prefer modifying no more than TWO operator choices in one iteration unless metrics are very poor for multiple iterations.
5. Consider abandoning the alpha if after 5+ iterations metrics show no meaningful improvement.

AVAILABLE OPERATORS (must use exactly these names, no typos):
abs(x), add(x, y, filter=false), densify(x), divide(x, y), inverse(x), log(x), max(x, y, ..), min(x, y, ..), multiply(x, y, ..., filter=false), power(x, y), reverse(x), sign(x), signed_power(x, y), sqrt(x), subtract(x, y, filter=false), and(input1, input2), if_else(input1, input2, input3), input1 < input2, input1 <= input2, input1 == input2, input1 > input2, input1 >= input2, input1 != input2, is_nan(input), not(x), or(input1, input2), days_from_last_change(x), hump(x, hump=0.01), kth_element(x, d, k, ignore="NaN"), last_diff_value(x, d), ts_arg_max(x, d), ts_arg_min(x, d), ts_av_diff(x, d), ts_backfill(x, lookback=d, k=1), ts_corr(x, y, d), ts_count_nans(x, d), ts_covariance(y, x, d), ts_decay_linear(x, d, dense=false), ts_delay(x, d), ts_delta(x, d), ts_mean(x, d), ts_product(x, d), ts_quantile(x, d, driver="gaussian"), ts_rank(x, d, constant=0), ts_regression(y, x, d, lag=0, rettype=0), ts_scale(x, d, constant=0), ts_std_dev(x, d), ts_step(1), ts_sum(x, d), ts_zscore(x, d), normalize(x, useStd=false, limit=0.0), quantile(x, driver=gaussian, sigma=1.0), rank(x, rate=2), scale(x, scale=1, longscale=1, shortscale=1), winsorize(x, std=4), zscore(x), vec_avg(x), vec_sum(x), bucket(rank(x), range="0,1,0.1", skipBoth=False, NaNGroup=False), trade_when(x, y, z), group_backfill(x, group, d, std=4.0), group_mean(x, weight, group), group_neutralize(x, group), group_rank(x, group), group_scale(x, group), group_zscore(x, group)

OPTIMIZATION GUIDANCE:
1. Analyze recent iterations and identify patterns of failure.
2. Avoid repeating unsuccessful operator families or transformations.
3. Consider structural changes if Sharpe < 0.8 or Fitness < 0.5 for multiple iterations.
4. Ensure economic or statistical justification for every modification.
5. Prefer modifications that improve predictive power, not cosmetic tweaks.
6. If after 5 iterations no improvement, consider action='abandon'.

OUTPUT ONLY VALID JSON (no extra text, no markdown):

{
  "action": "improve" | "abandon",
  "alpha": "optimized expression",
  "idea": "updated economic intuition",
  "category": "momentum/reversal/liquidity/volatility/value/quality",
  "modification_summary": "clearly describe changes in operators, parameters, nesting, and rationale"
}

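7. Results

After actually running it:

✔ Strengths

It can reliably generate a large number of alphas
The structure conforms to WorldQuant Brain syntax
The LLM understands the "factor logic" to some degree

❌ Problems

Very few tradable alphas
- Most have Sharpe < 0
- Fitness is unstable
Very high token cost
- Every optimization round carries the full history
- The prompt grows quickly
The model lacks the ability to generalize about finance
- It is still closer to "combining language"
- than to "learning statistical structure"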

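8. Next steps

The current bottleneck is clear:

A general-purpose LLM ≠ a quant alpha model

Possible next directions:

1. A fine-tuned, specialized model

Use:

Historical alphas
Backtest results
Success/failure labels

to train a dedicated model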

2. Compress the prompt memory

Use:

Embedding summarization
Or state compression

to reduce token consumption
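
As one very simple form of state compression, the history could keep only the latest few iterations verbatim and fold everything older into a single summary line. This is a sketch of that idea only, not an embedding-based summarizer and not something the post has implemented; `compress_history` and `keep_last` are hypothetical names.

```python
from typing import Dict, List


def compress_history(history: List[Dict], keep_last: int = 3) -> List[str]:
    """Keep the last `keep_last` iterations verbatim; fold older ones into one line."""
    split = max(len(history) - keep_last, 0)
    older, recent = history[:split], history[split:]
    lines: List[str] = []
    if older:
        # Remember only the best older attempt, judged by Sharpe.
        best = max(older, key=lambda h: h["metrics"]["sharpe"])
        sharpe, alpha = best["metrics"]["sharpe"], best["alpha"]
        lines.append(
            f"- {len(older)} earlier iterations omitted; "
            f'best Sharpe among them {sharpe:.2f} with alpha="{alpha}"'
        )
    for h in recent:
        n, alpha, metrics = h["iteration"], h["alpha"], h["metrics"]
        lines.append(f'- Iteration {n}: alpha="{alpha}", metrics={metrics}')
    return lines


runs = [
    {"iteration": i + 1, "alpha": f"alpha_{i + 1}", "metrics": {"sharpe": s}}
    for i, s in enumerate([-0.3, 0.4, 0.1, 0.2, 0.0])
]
print("\n".join(compress_history(runs, keep_last=2)))
```

With this scheme the history section stays at most `keep_last + 1` lines long no matter how many rounds have run, at the cost of forgetting which older modifications failed.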

3. Reinforcement learning over the alpha space

Recast the problem as:

policy = alpha generator
reward = Sharpe / Fitness