Modeling and Analysis of Bitcoin Volatility Based on ARMA-EGARCH Model (2)
2. Stationarity of the time series
If a series is non-stationary, it needs to be transformed into an approximately stationary one. The usual approach is differencing: in theory, differencing a non-stationary series enough times yields an approximately stationary series. If the sample series is covariance-stationary, the mean, variance and autocovariances of its observations do not change over time, which makes statistical inference on the series much more convenient.
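To illustrate the idea of differencing, here is a minimal sketch on simulated data (a hypothetical random-walk log price, not the Bitcoin series used in this article). It compares the mean and variance of the first and second half of each series; names such as `half_moments` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a simulated log-price path (random walk), not real market data.
log_price = np.cumsum(rng.normal(0.0, 0.01, size=4000))

# The first difference of the log price is the log return.
log_return = np.diff(log_price)

def half_moments(x):
    """Return (mean, variance) of the first half and of the second half of x."""
    first, second = np.array_split(x, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())

for name, series in [("log price", log_price), ("log return", log_return)]:
    (m1, v1), (m2, v2) = half_moments(series)
    print(f"{name}: mean {m1:.4f} -> {m2:.4f}, variance {v1:.6f} -> {v2:.6f}")
```

For the differenced series the two halves should show similar moments, while the level series usually drifts. This is only an informal check; a formal test is still needed.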
The unit root test used here is the ADF (Augmented Dickey-Fuller) test, which judges significance with a t-type statistic. In principle, if the series shows no obvious trend, only the constant term is kept in the test regression; if the series has a trend, the regression should include both a constant term and a time trend term. The lag order can be chosen with an information criterion such as AIC or BIC. In its general form the test regression is:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t$$

where the null hypothesis $\gamma = 0$ means that the series has a unit root.
In [8]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

stable_test = kline_all['log_return']
# Run the ADF test twice, choosing the lag order by AIC and by BIC
adftest = sm.tsa.stattools.adfuller(np.array(stable_test), autolag='AIC')
adftest2 = sm.tsa.stattools.adfuller(np.array(stable_test), autolag='BIC')
output = pd.DataFrame(index=['ADF Statistic Test Value', "ADF P-value", "Lags", "Number of Observations",
                             "Critical Value(1%)", "Critical Value(5%)", "Critical Value(10%)"],
                      columns=['AIC', 'BIC'])
# Use .loc instead of chained indexing so that the values are written to output itself
for col, res in [('AIC', adftest), ('BIC', adftest2)]:
    output.loc['ADF Statistic Test Value', col] = res[0]
    output.loc['ADF P-value', col] = res[1]
    output.loc['Lags', col] = res[2]
    output.loc['Number of Observations', col] = res[3]
    output.loc['Critical Value(1%)', col] = res[4]['1%']
    output.loc['Critical Value(5%)', col] = res[4]['5%']
    output.loc['Critical Value(10%)', col] = res[4]['10%']
output
Out[8]:
The null hypothesis of the ADF test is that the series has a unit root, that is, it is non-stationary; the alternative hypothesis is that the series is stationary. The test p-value is far below the 0.05 significance level, so we reject the null hypothesis: the log return is a stationary series and can be modeled with statistical time series models.
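To make the logic of the unit root test concrete, the following is a rough sketch of the simplest Dickey-Fuller regression (constant, no trend, no lagged differences) written with numpy only. It is not what `adfuller` does internally: the augmented test adds lagged difference terms and converts the statistic to a p-value. The data are simulated, the function name `df_tstat` is hypothetical, and -2.86 is the commonly tabulated asymptotic 5% critical value for the constant-only case.

```python
import numpy as np

def df_tstat(y):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    # Regressors: a constant and the lagged level y_{t-1}
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
log_price = np.cumsum(rng.normal(0.0, 0.01, size=2000))  # simulated random walk
log_return = np.diff(log_price)                          # its first difference

CRIT_5PCT = -2.86  # assumed asymptotic 5% critical value (constant, no trend)
for name, series in [("log price", log_price), ("log return", log_return)]:
    t = df_tstat(series)
    verdict = "reject unit root" if t < CRIT_5PCT else "cannot reject unit root"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

A statistic far below the critical value, as expected for the differenced series, is evidence against a unit root. The statistic must be compared with Dickey-Fuller critical values rather than the usual t-table.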
3. Model identification and order determination
To build the mean equation, we need to run an autocorrelation test on the series to make sure that the error term is not autocorrelated. First, plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF):
In [19]:
tsplot(kline_all['log_return'], kline_all['log_return'], title='Log Return', lags=100)
Out[19]:
It can be seen that the cut-off is very clear. At that moment, this plot gave me an idea: is the market really inefficient? To verify this, we run an autocorrelation analysis on the return series and determine the lag order of the model.
The commonly used autocorrelation coefficient measures the correlation of a series with itself, that is, the correlation between r(t) and r(t-l), its value l periods in the past:

$$\rho_l = \frac{\mathrm{Cov}(r_t, r_{t-l})}{\mathrm{Var}(r_t)}$$
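Next, let's do a quantitative test. The null hypothesis is that all autocorrelation coefficients are 0, that is, there is no autocorrelation in the series. The test statistic is the Ljung-Box Q statistic:

$$Q(m) = T(T+2) \sum_{l=1}^{m} \frac{\hat{\rho}_l^2}{T-l}$$

where T is the sample size and m is the number of lags tested. Under the null hypothesis, Q(m) asymptotically follows a chi-square distribution with m degrees of freedom.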
The first 15 autocorrelation coefficients are analyzed, as follows:
In [9]:
# ACF and Ljung-Box Q test for the first 15 lags (statsmodels versions before 0.12 call the adjusted argument unbiased)
acf, q, p = sm.tsa.acf(kline_all['log_return'], nlags=15, adjusted=True, qstat=True, fft=False)
output = pd.DataFrame(np.c_[range(1,16), acf[1:], q, p], columns=['lag', 'ACF', 'Q', 'P-value'])
output = output.set_index('lag')
output
Out[9]:
According to the test statistic Q and its p-values, the autocorrelation function ACF decays gradually toward 0 beyond lag 0, and the p-values of the Q statistics are small enough to reject the null hypothesis, so there is autocorrelation in the series.
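As a cross-check on how these numbers come about, here is a minimal numpy sketch of the sample autocorrelation and the Ljung-Box Q statistic. It uses the plain (unadjusted) ACF estimator, so the values may differ slightly from a run with `adjusted=True`. The function names and the simulated AR(1) series are assumptions made for this example, and 18.307 is the tabulated 5% critical value of the chi-square distribution with 10 degrees of freedom.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations rho_0 .. rho_nlags (unadjusted estimator)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = d @ d
    return np.array([1.0] + [(d[l:] @ d[:-l]) / denom for l in range(1, nlags + 1)])

def ljung_box_q(x, m):
    """Ljung-Box statistic Q(m) = T (T + 2) * sum_{l=1..m} rho_l^2 / (T - l)."""
    n = len(x)
    rho = sample_acf(x, m)[1:]
    lags = np.arange(1, m + 1)
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))

# Small worked example: for 1, 2, 3, 4, 5 the lag-1 autocorrelation is 4 / 10 = 0.4
print(sample_acf([1, 2, 3, 4, 5], 2))

# Simulated series: white noise and an AR(1) process with coefficient 0.5
rng = np.random.default_rng(1)
noise = rng.normal(size=2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]

CHI2_10_5PCT = 18.307  # 5% critical value, chi-square with 10 degrees of freedom
for name, series in [("white noise", noise), ("AR(1)", ar1)]:
    print(f"{name}: Q(10) = {ljung_box_q(series, 10):.1f} (critical value {CHI2_10_5PCT})")
```

A Q value above the critical value rejects the hypothesis of no autocorrelation, which is the same conclusion that small p-values express in the table above.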
To be continued...