碧沼
V1
2022/10/29阅读:35主题:默认主题
python时间序列VAR
向量自回归VAR
VAR模型简介
VAR是一种随机过程双向模型,可以捕捉时间序列之间多个变量的线性关系,变量之间存在相互影响关系,因此称为双向模型。
AR(p)的模型形式:
在VAR模型中,每个变量都是一个自回归或引入其他变量过去值的回归过程。本文展示一个最简单的VAR(1)情况。
VAR模型与单变量的时间序列模型具有同样的评价标准,AIC、BIC、HQIC、FPE等。
VAR模型的Python实现
本文使用的第三方库如下:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.api import VAR
import numpy as np
from statsmodels.tsa.stattools import adfuller
from sklearn import metrics
from timeit import default_timer as timer
import warnings
import tushare as ts
import cufflinks as cf
import plotly.offline as plyo
warnings.filterwarnings("ignore")
数据通过tushare获取贵州茅台600519的开、高、低、收日线数据。并绘制K线图展示如下:
data = ts.pro_bar('600519.SH').set_index('trade_date').sort_index()[['open','high','low','close']]
qf = cf.QuantFig(data[4500:],title='values',legend='top',name='1')
qf.add_bollinger_bands(periods=60,boll_std=2)
plyo.iplot(qf.iplot(asFigure=True))

接下来本文对变量的分布和时间序列的稳健性进行检验。
for c in data[['open', 'high', 'low', 'close']]:
plt.figure(1, figsize=(15,6))
plt.subplot(211)
plt.title(f"{str(c)} Histogram before stationary")
data[str(c)].hist()
plt.subplot(212)
data[str(c)].plot(kind='kde')
plt.title(f"{str(c)} Kernal Density Estimator before stationary")
plt.show()

这里使用本公众号前文定义的时间序列评价函数和平稳性检验函数。
def timeseries_evaluation_fun(y_true,y_pred):
def mean_absolute_percentage_error(y_true,y_pred):
y_true,y_pred = np.array(y_true),np.array(y_pred)
return np.mean(np.abs((y_true-y_pred)/y_true))*100
print('Evakuation metric results:')
print(f'MSE is :{metrics.mean_squared_error(y_true,y_pred)}')
print(f'MAE is :{metrics.mean_absolute_error(y_true,y_pred)}')
print(f'RMSE is :{np.sqrt(metrics.mean_squared_error(y_true,y_pred))}')
print(f'MAPE is :{mean_absolute_percentage_error(y_true,y_pred)}')
print(f"R2 is :{metrics.r2_score(y_true,y_pred)}",end='\n\n')
def adf_test_func(series,column_name):
print(f'Results of DF test for column: {column_name}')
dftest = adfuller(series,autolag='AIC')
dfoutput = pd.Series(dftest[0:4],index=['Test Statistic','p-value','No Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
dfoutput['Critical Values(%s)'%key] = value
print(dfoutput)
if dftest[1]<=0.05:
print('Conclusion: ----->')
print('Reject the null hypothesis')
print('Data is stationary')
else:
print('Conclusion: ----->')
print('Fail to reject the null hypothesis')
print('Data is non-stationary')
平稳性检验:
for name, column in data[['open', 'high', 'low', 'close']].iteritems():
adf_test_func(data[name],name)
print('\n')

显然数据不平稳,应进行差分,同时为了后续检验,将数据最后100天设定为测试集。
train,test = data[0:-100],data[-100:]
train_diff = train.diff().dropna()
for name, column in train_diff[['open', 'high', 'low', 'close' ]].iteritems():
adf_test_func(train_diff[name],name)
print('\n')
for c in train_diff[['open', 'high', 'low', 'close']]:
train_diff[str(c)].plot(figsize=(15, 6))
plt.xlabel("Date")
plt.ylabel(c)
plt.title(f"{str(c)} price of 600519 stocks after stationary")
plt.show()


协整检验:
from statsmodels.tsa.vector_ar.vecm import coint_johansen
def cointegration_test(df):
res = coint_johansen(df,-1,5)
d = {'0.90':0,'0.95':1,"0.99":2}
trace = res.lr1
cvts = res.cvt[:,d[str(1-0.05)]]
def adjust(val,length=6):
return str(val).ljust(length)
print('column name > test stat >c(95%) -> signif \n','--'*20)
for col,trace,cvt in zip(df.columns,trace,cvts):
print(adjust(col),'>',adjust(round(trace,2),9),'>',adjust(cvt,8),'->',trace>cvt)
cointegration_test(train_diff[['open','high','low','close']])

选择AR(p)不同阶数。
for i in range(1,20):
model = VAR(train_diff)
results = model.fit(i)
print(f"order:{i},aic={results.aic},bic={results.bic}")

综合AIC和BIC准则,本文选择13阶。
我们的模型输入是差分后序列,结果也是差分后序列,因此需要进行还原。定义还原函数如下:
def inverse_diff(actual_df,pred_df):
df_res = pred_df.copy()
columns = actual_df.columns
for col in columns:
df_res[str(col)+'_1_inv_diff'] = actual_df[col].iloc[-1]+df_res[str(col)].cumsum()
return df_res
对模型进行估计和预测:
results = model.fit(13)
results.summary()
z = results.forecast(y = train_diff[['open','high','low','close']].values,steps=100)
df_pred = pd.DataFrame(z,columns = ['open','high','low','close'])
df_pred.index = data.index[-100:]
res = inverse_diff(train[['open','high','low','close']],df_pred)
for i in ['open', 'high', 'low', 'close' ]:
print(f'Evaluation metric for {i}')
timeseries_evaluation_fun(test[str(i)],res[str(i)+'_1_inv_diff'])
import matplotlib.pyplot as plt
for i in ['open', 'high', 'low', 'close' ]:
plt.figure()
plt.plot(train[str(i)], label='Train '+str(i))
plt.plot(test[str(i)], label='Test '+str(i))
plt.plot(res[str(i)+'_1_inv_diff'], label='Predicted '+ str(i))
plt.legend(loc='best')
plt.show()
模型表现:


VARMA模型简介
VARMA模型是ARMA模型的一种拓展,包含VAR和VMA两部分,针对多个并行的ARMA过程的一种复合模型,该方法适合用于去除趋势和季节成分的多变量时间序列。
以VAR(1)和VMA(1)组成的VARMA(1,1)模型为例,模型形式如下:
实际上,与公众号前文类似,VARMA模型也可以引入外生变量,变成VARMAX模型,此时对金融股价序列的变化,跟踪误差会极大减少。
VARMA模型的python实现
VARMA模型的实现可以参考statsmodels的官方文档,VARMAX models — statsmodels
与前文VAR模型的实现过程基本一致,仅需要更改模型函数。
作者介绍
碧沼
V1