worldquantbrain_5_vector

Posted by 杰瑞 on 2025-09-26

vector -> matrix -> stock positions
  1. plot n. conspiracy; storyline; chart
  2. undergo vt. to experience, to go through; to endure
  3. macro adj. huge, large-scale
Vector data fields are a distinct type of data field that does not have a fixed size. The number of events recorded per day per instrument varies, so the values are stored in a vector. This differs from the regular matrix data you usually work with, which has exactly one value per day per instrument. For example, a news dataset could be a vector field, because each instrument can have a different number of news events on a given day; forcing that information into a matrix field tends to lose useful information. A single vector field value may therefore report multiple news events for one instrument in one day.
Whenever you write an Alpha expression, the end result is a matrix of Alpha values, which is the position taken in the market. All other operators on the platform are made for matrix input, so first use the vec_ operators to convert the vector data field to a matrix field, and apply matrix operators only afterwards. The conversion aggregates the vector for each day and instrument into a single value, as a matrix has. The same is depicted in the figure below:
The following operators convert a vector data field into a matrix. They differ in how the vector for a particular date and instrument is aggregated into a single value:
To implement the general observation that a stock with high news intensity tends to follow momentum, while a stock with low news intensity tends to follow reversion, we require news count data.
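The observation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the use of a past return as the base signal, and the `threshold` parameter are hypothetical and do not come from the platform.

```python
import numpy as np

def news_conditioned_signal(past_return, news_count, threshold):
    """Follow momentum when news count is high, reversion when it is low.

    past_return: recent return per instrument (assumed base signal).
    news_count:  number of news events per instrument for the day.
    threshold:   hypothetical cut-off separating "high" from "low" intensity.
    """
    past_return = np.asarray(past_return, dtype=float)
    news_count = np.asarray(news_count, dtype=float)
    high_news = news_count > threshold
    # Momentum keeps the sign of the past return; reversion flips it.
    return np.where(high_news, past_return, -past_return)

signal = news_conditioned_signal([0.02, 0.02, -0.01], [10, 1, 0], threshold=3)
```

The first instrument has many news events, so its signal keeps the sign of its past return; the other two have few, so their signals are flipped.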
reversion
the plots for average value
as an input as well
we just need one value of sentiment
If there are 5 sentiment scores and we have to use just one, the mean of all those scores is generally a reasonable representative of sentiment for that entire day.
So, to convert this sentiment vector to a matrix field, we will use vec_avg(scl15_d1_sentiment).
If you think the median could be a better representative, you can use vec_median(scl15_d1_sentiment) instead.
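The aggregation that turns a vector field into a matrix field can be pictured with a small pandas sketch. It is not the platform's implementation: the `events` table, its column names, and the helper `vec_to_matrix` are made up for illustration, and the `"mean"`, `"median"` and `"count"` aggregations only mimic what mean, median and count style vec_ operators are described as doing.

```python
import pandas as pd

# Hypothetical vector data in long format: one row per news event.
events = pd.DataFrame({
    "date": ["2025-01-02"] * 5 + ["2025-01-02"] * 2 + ["2025-01-03"],
    "instrument": ["AAA"] * 5 + ["BBB"] * 2 + ["AAA"],
    "sentiment": [0.2, 0.4, -0.1, 0.9, 0.1, -0.5, 0.3, 0.6],
})

def vec_to_matrix(events, how):
    """Collapse each (date, instrument) vector to one value,
    then pivot to a date x instrument matrix."""
    agg = events.groupby(["date", "instrument"])["sentiment"].agg(how)
    return agg.unstack("instrument")

avg_matrix = vec_to_matrix(events, "mean")      # mean of the day's scores
median_matrix = vec_to_matrix(events, "median") # median of the day's scores
count_matrix = vec_to_matrix(events, "count")   # number of events that day
```

Each resulting matrix has one value per day per instrument; a day with no events for an instrument shows up as a missing value.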
Below are the average value and turnover plots for the vec_avg field. The average value hovers densely around 15,000 and turnover is around 130%. Here as well, you need to reduce turnover by using ts_rank or ts_decay in your Alpha expression.
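To see why a decay step lowers turnover, here is a minimal NumPy sketch of a linearly weighted moving average. It is an assumption-laden illustration, not the platform's operator: `linear_decay` and `mean_abs_change` are hypothetical helpers, and the mean absolute day-to-day change is used only as a rough stand-in for turnover.

```python
import numpy as np

def linear_decay(signal, n):
    """Weighted average of the last n rows (days); the newest row gets
    weight n and the oldest weight 1, normalized to sum to 1."""
    signal = np.asarray(signal, dtype=float)
    weights = np.arange(1, n + 1, dtype=float)
    weights /= weights.sum()
    out = np.full_like(signal, np.nan)  # first n-1 days have no full window
    for t in range(n - 1, signal.shape[0]):
        window = signal[t - n + 1 : t + 1]  # ordered oldest ... newest
        out[t] = weights @ window
    return out

def mean_abs_change(signal):
    """Rough turnover proxy: average absolute day-to-day change."""
    return np.nanmean(np.abs(np.diff(signal, axis=0)))

# A deliberately noisy signal that flips sign every day (days x instruments).
raw = np.array([[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]])
smoothed = linear_decay(raw, n=2)
```

In this toy case the smoothed signal changes by a third as much per day as the raw one, which is the effect you are after when turnover is too high.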
Both Delay-0 and Delay-1 Alphas[1] (referred to as D0 and D1 Alphas respectively throughout) try to capitalize by rebalancing the Alpha positions daily. D0 Alphas additionally try to benefit from using the most recent information: they use data that becomes available during the same day and usually simulate trades some period before the market close.
Because D0 Alphas normally have higher turnover than D1 Alphas, higher Sharpe and higher returns are required to compensate for the increased transaction costs. There are also other tests that you have to consider, such as the SubUniverse test and the RobustUniverse test for the CHN region. Good performance in the liquid universe means that the Alpha should have higher capacity.
The transition from user to consultant is not straightforward, since you will work on many open-ended problems in the process of making alphas. There are no right or wrong answers in the alpha-making process, because quant research is a mixture of creative and scientific thinking.
Typically, the alpha research cycle comprises three stages: coming up with an intuitive alpha idea; implementing it using the available datasets and operators; and optimizing the parameters and neutralization settings of the alpha in order to submit it in its best form.
While the alpha ideas, datasets and operators keep changing, the techniques and methodologies needed for these three stages remain largely the same. Hence, the following posts are a generic guide for your alpha research journey and will act as a solid foundation for you:
These readings answer some of the most common problems faced during the process of making alphas. Incorporating the practices they suggest should assist you in your journey with WorldQuant Brain, and may lead to significantly better results in the long run.
How to get a higher Sharpe? How to potentially increase returns of an alpha? How to reduce correlation of an alpha? How to reduce turnover? How to potentially decrease PnL fluctuations? How to gain intuition for Neutralization? How to avoid overfitting?
Neutralization is an operation in which the raw Alpha values are split into groups and then normalized within each group (the group mean is subtracted from each value). The group can be the entire market, or the groups can be formed using other classifications such as industry or sub-industry.
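Group-wise neutralization as described here can be sketched with pandas. The tickers, industry labels and the `neutralize` helper are hypothetical; the only step taken from the text is subtracting the group mean from each value.

```python
import pandas as pd

# Hypothetical raw Alpha values and industry labels for five instruments.
alpha = pd.Series([3.0, 1.0, 2.0, 10.0, 20.0], index=["A", "B", "C", "D", "E"])
industry = pd.Series(["tech", "tech", "tech", "energy", "energy"],
                     index=alpha.index)

def neutralize(alpha, groups=None):
    """Subtract the mean within each group; with no groups,
    treat the entire market as a single group."""
    if groups is None:
        return alpha - alpha.mean()
    return alpha - alpha.groupby(groups).transform("mean")

by_market = neutralize(alpha)
by_industry = neutralize(alpha, industry)
```

With industry groups, each industry sums to zero on its own, so the large raw values in "energy" no longer dominate the positions in "tech".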
This is done to focus on the relative returns of stocks within the group and to minimize risk exposure to the returns of the group as a whole. As a consequence of neutralization, the portfolio is half long and half short, which may guard it against market or industry shocks.
Below are some of the recommended neutralization settings for each dataset category. We highly recommend that you try these in your research.
Suppose we have Alpha = -ts_delta(close, 5), where Alpha is the vector of values across instruments. Setting neutralization = market makes the mean of the Alpha vector equal to zero, i.e. the Alpha vector undergoes the change: Alpha = Alpha - mean(Alpha).
This new vector is then normalized and scaled to the booksize. The portfolio thus formed has equal money invested in long and short positions, and can be used to calculate that day's PnL.
This ensures that your Alpha is long-short neutral.
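The market-neutralization steps can be checked numerically with a short NumPy sketch. Two things here are assumptions rather than facts from the text: "normalized" is taken to mean dividing by the sum of absolute values, and the booksize of 20,000,000 is an arbitrary example figure. The Alpha values are made up.

```python
import numpy as np

def to_positions(alpha, booksize):
    """Demean the Alpha vector (market neutralization), then scale it so
    that the absolute positions sum to booksize (assumed normalization)."""
    alpha = np.asarray(alpha, dtype=float)
    neutral = alpha - alpha.mean()    # Alpha = Alpha - mean(Alpha)
    gross = np.abs(neutral).sum()     # zero if all Alpha values are equal
    return neutral / gross * booksize

# Made-up values standing in for -ts_delta(close, 5) on one day.
alpha = np.array([-0.5, 1.5, 0.25, -1.0, 2.25])
positions = to_positions(alpha, booksize=20_000_000)

long_side = positions[positions > 0].sum()
short_side = -positions[positions < 0].sum()
```

Because the demeaned values sum to zero, the long and short sides each end up with half of the booksize. The sketch does not handle the degenerate case where every Alpha value is identical.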