PLUS会员

邹的作业

由bq9dhg5r创建,最终由bq9dhg5r 被浏览 11 用户

手工通过大模型挖掘因子

一、改为deepseek API


二、因子挖掘

user = """
日期为2022-01-01到2024-01-01
我想做一个能够反应量价关系和个股财务情况相关的因子
"""
{
    "Thought": "为了构建一个能够反映量价关系和个股财务情况的因子,我首先需要获取基础的量价数据和财务数据。量价数据包括收盘价、成交量等,财务数据可以选择市盈率TTM(pe_ttm)作为起点,因为它能反映公司的盈利能力和估值水平。",
    "Action": "get_data",
    "Action_input": {
        "feature_name": ["close", "volume", "pe_ttm"],
        "start_date": "2022-01-01",
        "end_date": "2024-01-01"
    }
}
user = """
已使用 get_data 抽取表格 table_1 数据, 表格包含字段 date, instrument, close, volume, pe_ttm
"""
{
    "Thought": "为了构建量价关系因子,我首先计算价格与成交量的乘积,这可以反映资金流动对价格的影响。这是一个常见的量价关系指标,能够捕捉市场参与者的活跃程度。",
    "Action": "times",
    "Action_input": {
        "data": "table_1",
        "feature_name_1": "close",
        "feature_name_2": "volume"
    }
}
user = """
已使用 divide 将表格 table_1 中的字段  close作了20日均线运算, 并保存到了表 table_2 中, 该表包含字段 date, instrument, close, volume, (m_avg(close, 20))
"""
print(model(user))
{
    "Thought": "现在我将计算市盈率TTM的截面标准化值,这可以消除市场整体估值水平的影响,更好地反映个股的相对估值水平。这在价值投资中很重要,因为我们需要找到相对于市场被低估的股票。",
    "Action": "c_z_normalize",
    "Action_input": {
        "data": "table_2",
        "feature_name": "pe_ttm"
    }
}

user = """
已使用 c_z_normalize 将表格 table_2 中的字段 (pe_ttm) 作了截面(或绝对值)运算, 并保存到了表 table_3 中, 该表包含字段 date, instrument, close, volume, (m_avg(close, 20)),c_zscore(pe_ttm)
"""
print(model(user))

```json
{
    "Thought": "为了结合量价和财务信息,我将计算价格与20日均线的偏离度,这反映了股票价格相对于其近期趋势的位置。同时,我将把这个偏离度与标准化后的市盈率相乘,创建一个综合因子。这能同时捕捉趋势强度和估值水平。",
    "Action": "divide",
    "Action_input": {
        "data": "table_3",
        "feature_name_1": "close",
        "feature_name_2": "m_avg(close, 20)"
    }
}

user = """ 已使用 divide 将表格 table_3 中的字段 (close) 和 (m_avg(close, 20))作了divide运算, 并保存到了表 table_4 中, 该表包含字段 date, instrument, close, volume, (m_avg(close, 20)),c_zscore(pe_ttm),(close / m_avg(close, 20)) """ print(model(user))

{
    "Thought": "现在我将把价格偏离度指标与标准化市盈率相乘,创建一个综合因子。这个因子能同时反映股票的趋势强度(量价关系)和相对估值水平(财务指标)。这种结合在行为金融学中很有意义,因为它同时考虑了市场情绪和基本面价值。",
    "Action": "times",
    "Action_input": {
        "data": "table_4",
        "feature_name_1": "close / m_avg(close, 20)",
        "feature_name_2": "c_zscore(pe_ttm)"
    }
}

user = """ 已使用 times 将表格 table_5 中的字段 "close / m_avg(close, 20)"和"c_zscore(pe_ttm)"作了乘法运算, 并保存到了表 table_5 中, 该表包含字段 date, instrument, close, volume, (m_avg(close, 20)),c_zscore(pe_ttm),(close / m_avg(close, 20)),((close / m_avg(close, 20)) * c_zscore(pe_ttm)) """ print(model(user))

{
    "Thought": "为了进一步优化这个综合因子,我将计算其20日移动标准差,这可以反映因子值的稳定性。波动性较小的因子可能更具预测性。这在量化投资中很重要,因为稳定的信号通常更可靠。",
    "Action": "m_std",
    "Action_input": {
        "data": "table_5",
        "feature_name": "(close / m_avg(close, 20)) * c_zscore(pe_ttm)",
        "N": 20
    }
}

user = """ 已使用 m_std 将表格 table_6 中的字段 (close / m_avg(close, 20)) * c_zscore(pe_ttm)"作了20日标准差, 并保存到了表 table_6 中, 该表包含字段 date, instrument, close, volume, (m_avg(close, 20)),c_zscore(pe_ttm),(close / m_avg(close, 20)),((close / m_avg(close, 20)) * c_zscore(pe_ttm)),m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20) """ print(model(user))

{
    "Thought": "现在我将用综合因子除以其20日移动标准差,创建一个风险调整后的因子值。这种标准化处理可以使因子值在不同股票间更具可比性,同时降低波动性过大的股票的影响。这在构建多因子模型时是常见做法。",
    "Action": "divide",
    "Action_input": {
        "data": "table_6",
        "feature_name_1": "(close / m_avg(close, 20)) * c_zscore(pe_ttm)",
        "feature_name_2": "m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)"
    }
}

user = """ 已使用 divide 将表格 table_6 中的字段 "(close / m_avg(close, 20)) * c_zscore(pe_ttm)","m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)"作了相除, 并保存到了表 table_7 中, 该表包含字段 date, instrument, close, volume, (m_avg(close, 20)),c_zscore(pe_ttm),(close / m_avg(close, 20)),((close / m_avg(close, 20)) * c_zscore(pe_ttm)),m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20),(((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) """ print(model(user))

{
    "Thought": "为了进一步增强因子的预测能力,我将计算风险调整后因子的20日移动平均值与当前值的差值。这个差值可以反映因子近期变化趋势,捕捉市场情绪的变化。这在动量策略中很有效,能识别因子加速改善的股票。",
    "Action": "sub",
    "Action_input": {
        "data": "table_7",
        "feature_name_1": "(((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20))",
        "feature_name_2": "m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20)"
    }
}

user = """ 已使用 sub 将表格 table_7 中的字段 "(((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20))","m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20)"作了运算, 并保存到了表 table_8 中, 该表包含字段 date, instrument, close, volume, (m_avg(close, 20)),c_zscore(pe_ttm),(close / m_avg(close, 20)),((close / m_avg(close, 20)) * c_zscore(pe_ttm)),m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20),((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) - m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20)) """ print(model(user))

{
    "Thought": "最后,我将对这个差值因子进行截面标准化处理,使其在不同股票间具有可比性。这种标准化处理可以消除市场整体波动的影响,使因子真正反映个股的相对强弱。这是构建多因子模型的最后关键步骤。",
    "Action": "c_z_normalize",
    "Action_input": {
        "data": "table_8",
        "feature_name": "((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) - m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20))"
    }
}

user = """ 已使用 c_z_normalize 将表格 table_8 中的字段 "((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) - m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20))"作了运算, 并保存到了表 table_8 中, 该表包含字段 date, instrument, close, volume, (m_avg(close, 20)),c_zscore(pe_ttm),(close / m_avg(close, 20)),((close / m_avg(close, 20)) * c_zscore(pe_ttm)),m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20),((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) - m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20)),c_zscore(((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) - m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20))) """ print(model(user))

{
    "Thought": "现在我将评估最终构建的复合因子效果。这个因子结合了量价关系(价格与均线偏离度)、财务指标(标准化市盈率)、风险调整(波动率标准化)和趋势变化(移动平均差值)等多个维度,是一个综合性的多因子指标。通过评估可以验证其预测能力。",
    "Action": "eval_factor",
    "Action_input": {
        "start_date": "2022-01-01",
        "end_date": "2024-01-01",
        "feature_name": "c_zscore(((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) - m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20)))"
    }
}

'因子 c_zscore(((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) - m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20))) 的 IC 为 0.0005, 因子累计收益为 -0.037, 年化收益为 -0.022, 夏普比例为 0.017, 年化波动率为 0.228, 最大回撤为 -0.215'

user = """ 已使用 eval_factor 作因子分析: 因子 c_zscore(((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)) - m_avg((((close / m_avg(close, 20)) * c_zscore(pe_ttm)) / m_stddev(((close / m_avg(close, 20)) * c_zscore(pe_ttm)), 20)), 20))) 的 IC 为 0.0005, 因子累计收益为 -0.037, 年化收益为 -0.022, 夏普比例为 0.017, 年化波动率为 0.228, 最大回撤为 -0.215' """ print(model(user))

{
    "Thought": "当前因子的IC值接近0,说明预测能力较弱。我们需要进一步优化因子结构。考虑到市盈率TTM可能受行业影响较大,我将引入市净率(PB)作为补充估值指标,构建更全面的估值因子组合。",
    "Action": "get_data",
    "Action_input": {
        "feature_name": ["pb"],
        "start_date": "2022-01-01",
        "end_date": "2024-01-01"
    }
}

的 IC 为 0.0005, 因子累计收益为 -0.037, 年化收益为 -0.022, 夏普比例为 0.017, 年化波动率为 0.228, 最大回撤为 -0.215'

完成9轮对话,形成并测试了因子,效果不理想。手工做的比较累,还是让它自动化循环挖掘比较好。应该开放有剔除已挖掘因子的功能。

标签

因子挖掘
{link}