
Time Series Analysis of a REIT Portfolio

  • Writer: Andrew Cole
  • Mar 8, 2020
  • 9 min read



Financial modeling is a technical application as old as finance itself. It is traditionally performed with Excel spreadsheets and various DCF models, but I wanted to use machine learning to build a predictive model for the movement of an asset’s price. Within a two-week project period, I set out to analyze and build predictive SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors) models to capture the movement of eight different Real Estate Investment Trusts (REITs).

We will first take apart the time series data to choose the correct input parameters for the model. Once optimal parameters are selected, multiple exogenous variables will be introduced to see which outside variables from the financial industry do or do not have an impact on the target variable we are predicting. Finally, the most important variables for each individual REIT will be selected for use in the final SARIMAX model. The attached GitHub repository provides further insight into the underlying code for the model builds and can be found here.

Real Estate Investment Trusts

Before we get into the data, it is essential to first understand the nature of a REIT. REITs are just like any other financial asset in that ownership represents a “share” of a company’s profits. These companies own and operate income-producing real estate and are usually focused on a particular sector of real estate (data centers, retail properties, senior/assisted living communities, healthcare, etc.). There are two main types of REITs:

  1. Equity: Investments in hard real estate assets. Income generally comes from traditional real estate sources like rents and property management fees. They also generate profits by acquiring undervalued real estate in industry-centric spaces.

  2. Mortgage (mREITs): These REITs invest only in mortgages and make up a small percentage of all REITs. Income is generated primarily from interest on mortgage loans.

REITs also boast some of the highest historical capital returns of any income-producing asset class. The graph below shows annual return rates for multiple asset classes, with REITs performing the best at a compound rate of almost 12% from 1972 to the present.

Source: Morningstar


REITs do operate a bit differently than your normal income-producing stock or mutual fund. To maintain their status under the Internal Revenue Code, REIT companies are required to distribute at least 90% of their annual taxable income to their shareholders in the form of dividends. This allows an investor to gain diverse, liquid exposure to tangible income-producing assets with a proven track record of success. So, naturally, it makes sense to try to build a predictive machine learning model that can give an advantage in REIT investing.

 

The Time Series Analysis

I began by gathering financial data for eight different REITs (ticker symbols: AMT, ELS, PLD, FR, MAA, SUI, BXMT, RHP) from 2000 to the present using an API provided by AlphaVantage. The data is returned daily, but daily data produces an extremely noisy (high-variance) time series, so we resample the indices to monthly means of each value. Resampling to a monthly index will also let us account for any seasonality present in the data later in the modeling process.

Because an asset’s high and low prices can differ drastically within a single day depending on market cycles, a monthly mid (average) price is calculated as the target variable for prediction. Using monthly averages also reduces the computational cost of performing these calculations (my MacBook Air unfortunately cannot run very deep models in the allotted time period). Finally, we drop the remaining DataFrame features except for the datetime index and the ‘mid’ endogenous variable.
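This preprocessing step can be sketched as follows (the frame and column names here are synthetic stand-ins for the AlphaVantage daily data):

```python
import pandas as pd

# Stand-in for a daily high/low frame returned by the AlphaVantage API;
# the column names and values are assumptions for illustration.
daily = pd.DataFrame(
    {"high": [10.0, 10.5, 11.0, 10.8], "low": [9.0, 9.5, 10.0, 9.8]},
    index=pd.date_range("2000-01-03", periods=4, freq="B"),
)

# Resample to monthly means to smooth out daily noise,
# then average high and low into a single 'mid' target.
monthly = daily.resample("MS").mean()
monthly["mid"] = (monthly["high"] + monthly["low"]) / 2
FR = monthly[["mid"]]  # keep only the datetime index and the endogenous 'mid'
print(FR)
```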

We now have 254 monthly average mid prices which we see below:



Choosing Model Parameters

When building the model, we will be tuning the inputs to reflect the underlying movements of the series, so performing a decomposition will allow us to take an in-depth look at what is really going on. To decompose the series, we first take its log to account for the exponential rise (and fall) in prices seen from 2000 to 2020. We then look at three individual components of the series, each of which has parameter implications when building the model:

  1. Trend: A continued increase or decrease in the series over time. We need the series to be stationary so that we can identify and assess future changes to it. What remains after the trend is removed can be seen as irregular movement, which may reveal changes for future predictions of the series. In our case, we see that FR has a slight upward trend until the recession of 2008, then a steep decrease, and then a gradual increase back up to the present day.



  2. Seasonality: Time series naturally can contain cycles that repeat over time (monthly, yearly, seasonally). This repeating pattern, when frequent enough, can provide a strong signal to the model and may influence future irregular movements. Identifying the seasonality is necessary for the parameter inputs of our SARIMAX model, and the decomposition process will show us if it is present. Below, we see that there is notable monthly seasonality in the series, which we will account for as a parameter input to our model.



  3. Residuals: The residuals are what is left after trend and seasonality have been removed from the time series. They show any abnormality in the series’ movement; in particular, we can look at whether 2008 is in fact an outlier. The plot below shows that the recession caused irregular movement for the duration of the series, evident from the drastic variance around the 0.0 mean line, which would signal outlier values. Normally, such outliers would be removed from the time series. However, analyzing all eight REITs in the portfolio, we do not consistently see 2008 behaving as an outlier, so we keep it in the time series for all eight REIT models.



Now that we have taken a deep look at the underlying effects present in the time series data, we want to statistically verify that the series is stationary. Again, we must have a stationary time series, meaning constant mean, variance, and autocorrelation through time. We take a one-period difference of the series (the difference between the current month’s value and the previous month’s value) to remove any interim effects that might violate those constancy assumptions.

Although we have accounted for most of these by taking the log and differencing, we must statistically confirm that the series is stationary before we can proceed with modeling. To do this, we conduct a Dickey-Fuller test (null hypothesis: the series is not stationary). If the returned p-value is less than 0.05, we can reject the null hypothesis. This particular series returns a p-value of 0.005, well below the 0.05 significance level, allowing us to conclude that the time series is stationary and ready for modeling.

Choosing Model Parameters: Autoregressive & Moving Average Orders

Perhaps the two main components of a SARIMAX model are the AR (Autoregressive) and MA (Moving Average) terms.

  1. Autoregressive Term: accounts for the variable’s regression against itself, using lagged values of the variable (mid price) plus a constant and white noise (a zero-mean, constant-variance error series). This input parameter allows us to account for how the series at a previous point, and all of the time in between, affects the model at the present time. In simpler terms: if today is January 31, 2020, how does everything occurring between January 31, 2019 and today impact the movement of the variable for February 2020?

  2. Moving Average Term: accounts for past errors of the regression model. This essentially captures the shocks at particular past points, for example January of previous years, with the in-between movements removed.
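In equation form, with y_t the mid price, ε_t white noise, and φ and θ the fitted coefficients, these two terms look like:

```latex
% AR(p): regression of y_t on its own p lagged values
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t

% MA(q): regression of y_t on the q most recent forecast errors
y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
```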

These parameters are determined by looking at autocorrelation (ACF) and partial autocorrelation (PACF) plots available in the Statsmodels Python library. With candidate autoregressive and moving average orders in hand, a SARIMAX model is then fit with the previously selected parameters. The metrics used for model selection are the AIC and BIC scores, which essentially show how well your model fits the available data without over-fitting it.

Endogenous Model Selection

To train the machine learning model, we must split the data into a train and a test dataset. The training dataset is the majority of the data (2000–2017). The model will use this training data to predict 2017–2020 price movements. Because we already know what the 2017–2020 data actually looks like, we can layer the predicted price movement over the actual data to see how effectively we capture the movements; this is known as validation. If we trained on all of the data from 2000 to the present, the model would be fit entirely to that history and would be overfit: it would not know how to react when new future data is introduced.
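The chronological split can be sketched like this (a synthetic FR frame stands in for the real one, and the exact cutoff date is an assumption):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 254-month FR 'mid' series.
idx = pd.date_range("2000-01-01", periods=254, freq="MS")
FR = pd.DataFrame({"mid": np.linspace(20.0, 45.0, 254)}, index=idx)

# Time series are split chronologically, never shuffled:
# train on 2000-2016, hold out 2017 onward for validation.
FR_train = FR.loc[:"2016-12-31"]
FR_test = FR.loc["2017-01-01":]
```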

The model is now constructed and computed with the following code:

import statsmodels.api as sm

model = sm.tsa.statespace.SARIMAX(FR_train, 
                                  order = (0,1,5), 
                                  seasonal_order = (0,1,5,12))
results = model.fit()
  1. FR_train: the FR mid-price training dataset (2000–2017). Normally train sets are 70–80% of the data, but because we resampled to monthly periods we reduced our data size by roughly a factor of 21 (the average number of trading days in a month), so we account for it by increasing the relative size of the train set.

  2. Order: (auto-regressive order, differencing order, moving average order)

  3. seasonal_order: the same as order, but with a final seasonal period term = 12, telling the model that the time series has monthly seasonality (which we found in our decomposition process).

We evaluate the performance of our models by their AIC and BIC scores. These scores essentially tell us how well a model fits the dataset without over-fitting; the lower the score, the better the model represents the data’s actual movement. Multiple autoregressive and moving average orders are tried in the model and their scores compared, with the lowest returned AIC determining the model we carry forward into the SARIMAX with exogenous variables.

Gathering and Selection of Exogenous Variables

The model we have just built contains only internal information. However, as we know, there is a vast number of influences on a financial vehicle’s movement. We want to see whether the REITs’ movements are tied in any way to a variety of financial variables, since adding more relevant information to a model should improve its performance. To do this, 55 exogenous variables of varying nature were scraped from the Federal Reserve Economic Database (FRED). All exogenous variables were resampled to monthly means, like the endogenous variable, and split into training and test DataFrames. The exogenous variables gathered fall into the following categories:

  1. Interest Rates (48.3%): Federal Prime Lending Rate, 30-Year Mortgage Rates, etc.

  2. Commercial Banking (22.4%): Total Assets, Total Liabilities of all domestic commercial banking establishments, etc.

  3. Financial Indicators (15.5%): CBOE Volatility Index, Consumer Price Index, etc.

  4. Exchange Rates (6.9%): US/Euro FX, US/China FX, etc.

  5. Academic Data (5.2%): Any mathematically derived indicator such as Financial Uncertainty Index, etc.

  6. Prices (1.7%): Average price indices for various sectors

Exogenous Variable Selection: LASSO Regression

Even though we now have all of the exogenous variables, it is unlikely that every variable will have much, if any, impact on the endogenous price variable of the REIT. To reduce the number of variables used in the models, we perform a LASSO regression to see which variables have the most impact on the price variable we are predicting. LASSO applies an ‘L1’ regularization penalty, which shrinks unimportant coefficients to zero and thereby identifies which variables have a significant impact on the predicted variable. This LASSO regression is run on every single REIT in the portfolio, because the REITs come from different industries and may have different impacting variables. The LASSO regression returns:



The y-axis shows the name of each variable, and the x-axis shows the value of its coefficient in the regression. We look at the absolute value when determining feature importance. In this case, the FR time series is most significantly impacted by two variables:

  1. dexsius: Singapore/US FX Rate

  2. dexusuk: UK/US FX Rate
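A minimal sketch of this selection step with scikit-learn (the exogenous matrix here is synthetic, with only the first two FRED series names taken from the result above; the real run uses the 55 FRED series):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 5 candidate exogenous series, 2 of which truly matter.
rng = np.random.default_rng(7)
names = ["dexsius", "dexusuk", "noise_a", "noise_b", "noise_c"]
X = rng.normal(size=(240, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, 240)

# The L1 penalty shrinks unimportant coefficients toward exactly zero.
lasso = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), y)
ranked = sorted(zip(names, np.abs(lasso.coef_)), key=lambda t: -t[1])
print(ranked)
```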

Final SARIMAX Model

Now that we have the exogenous variables selected for each REIT, we can run the model with the final exogenous regressors in place. The code is as follows:

import pandas as pd
import statsmodels.api as sm

model = sm.tsa.statespace.SARIMAX(FR_train,
                                  exog = exog_train,
                                  order = (0,1,5),
                                  seasonal_order = (0,1,5,12),
                                  enforce_stationarity = False,
                                  enforce_invertibility = False,
                                  trend = 't')
results = model.fit()
predictions = results.get_prediction(start = pd.to_datetime('2015-01-01'),
                                     end = pd.to_datetime('2020-01-01'),
                                     dynamic = False,
                                     exog = exog_test)
pred_conf = predictions.conf_int()

The parameters from the endogenous model remain the same; we are simply adding the exogenous variables’ training set to the model. This model uses a ‘one-step-ahead forecast’: it predicts one month (one time step) forward, then recalculates for the next step. The model, when graphed, looks like this:



The orange line is the original mid-price movement, and the blue dashed line is the model’s forecasted price. The blue shaded region around the prediction line is a 95% confidence interval: we are 95% confident that the true value, allowing for unexpected events, will fall within this region.

Conclusions

The model result is positive: we generally capture the movement of the price quite well, though not exactly. The autoregressive terms of this particular series prove less useful than the moving average terms, suggesting that the 2008 recession was a statistical anomaly that weighs heavily on the model’s performance. Furthermore, the model lacks sufficient data. We reduced the data size tremendously when we resampled to monthly time frames, so generally capturing the actual movement of the share’s mid-price is a sufficient outcome, but I would not put this model into practice without increasing the data size.

Furthermore, parameter selection would be greatly enhanced with access to more computational power. Given a GPU or two, a grid search could be employed on top of an auto-ARIMA model, iterating through all possible AR and MA terms and returning the optimal ones. On my computer, iterating through just one of these models takes about three hours, with sub-optimal results.

The model predictions for all eight REITs in the portfolio can be seen below; note that some were more effective than others at capturing the data’s movement.

AMT:



ELS:



PLD:



MAA:



BXMT:



SUI:



RHP:



 
 
 
