Python Pandas || Moving Averages and Rolling Window Statistics for Stock Prices



okay I'm going to demonstrate the pandas rolling method and the rolling method can be used to sort of automate a fairly tedious group of calculations one of which is the moving average which I'll be demonstrating and then a rolling standard deviation it can also be used to calculate other rolling window aggregations I'm going to be using pandas 0.21 and I'm gonna make the notebook I'm using Jupiter notebook I'm gonna make it available at a link in the video okay so the first thing we're gonna need to do is sort of set up our environment I'm going to import numpy pandas relies on numpy so even though I'm not gonna use it too much directly we need it to make pandas work all right I am going to also get the pandas data reader and I'm gonna plot a little bit of this so we can see what's going on so I'm using a mat plot lib and I'm using the magic function so that it shows in the notebook all right so I'm gonna run this cell and to run a cell in in the notebook right you can click the the Run button up here or it's probably easier to hold the shift key and press Enter all right and then the next thing we're gonna do is get some data and I'm gonna get the gold ETF I'm gonna download that using the pandas data reader all right and I'm going to get it from Yahoo Finance I saw all it really needs is the symbol all right but if if I don't tell it where to start then it will probably give me more data than I want so I'm gonna go back to you might as well go back to the day of the presidential election I get some data from about a year ago all right and then I just so we can see what we got I will display the first few lines of that so I will go ahead and get it and so we can see you know the typical historical data that you get from Yahoo Finance price action of the day and volume okay all right all we are really interested in for this demonstration is the the closing prices so I'm gonna go ahead and get rid of everything else and since the data frame that this gets downloaded into is a series of series and they don't really have all the properties I want I'm going to have to cast cast this as a data frame all right so with that done we're ready to go ahead and start our demonstration of the rolling method all right and so what I'm gonna do is add a column to the to the data frame I just made and and this is why I made it a data frame because I can't really do this with a series of series is a is a one a sort of a one-dimensional array so I'm going to call this one MA nine so we'll get a a nine day moving average all right and so this is a pretty convenient way to add columns all right so I'll just take the clothes variable I'll get the clothes column all right and then I'll hang the rolling method off the end of it all right so this is the most basic usage of it alright so I just give it the number of periods or rows that I want to include in my aggregation all right and I will get the average there I so I'm gonna do that twice all right so we can compare a short window to a longer one okay so we now have two new columns in here and to prove it we're gonna go ahead and graph that okay and down here I'm sort of in the interest of time I have set up the the plot parameters here so we're gonna make a fairly large plot we're gonna turn on the grid lines and we're going to plot the clothes the nine day moving average and the 21 day moving average all right and so I'll go ahead and run that cell and we can see what happened to gold you know following the presidential election and then leading up until the most recent the recent days okay all right so these are the the moving averages and I and I showed you the most basic usage all right another thing sometimes you want to do is is lag these these averages all right so I'm gonna add a parameter here and the parameter I'm gonna add is Center and I'm going to tell it to be true so this will basically it's a 21 period moving average so this will calculate the 21 day and then start it'll roll it back ten days so that you get sort of a lagged line alright so we'll be able to see that when I regrab it all right so when we graph this we can see that now the Green Line which is the longer moving average has just been shifted back and it's basically instead of being smoothed and pushed over to the right it's now sort of coincident with with the data that it's representing okay all right so that's sort of the most basic lagging activity you can do with a with a moving average now I should just sort of mention that when you do the moving average you are including the the most current observation in your average right so to show you what I mean by that I'll add a cell here okay and then I will look at the nine day moving average all right and I'll get sort of twelve rows so we can see what's going on here all right so if we look at it you can see that okay there are eight blanks and on the ninth one that's where we get the moving average so we're actually including the closing price from 1118 there all right and you may not want to do that if you want to use this as sort of a traditional forecasting tool all right then we will have to shift this forward all right and so I'm gonna hang the shift method off it and by default it's 1 but I could put any integer value in here that I wanted all right and then when I show this again we see that oh okay now we get you know nine nine days in our average all right but I've shifted it forward one day all right so it looks like the 18th was a Friday so the next day was the Monday on the 21st all right so there's there are other there are other parameters you can use in the in the rolling method okay so I showed you Center you could set any leg you want in fact if you wanted to lag it some other way other than on the center I can put a negative integer in here and then roll it all the way back to you know to be the on the first day if I want alright so one more thing I'm going to just show is how to calculate a volatility that's used in usually in pricing options so this is what we will call a historical volatility alright so the first thing I'm going to need to do is add another column okay I'm gonna set that equal to the MP log so here I'm actually gonna use numpy what I'm gonna do is take the clothes and I'm going to divide it by the previous day's close all right and then if we just want to see what that sort of looks like we can plot it and so we end up with a pretty noisy graph here and centered around zero as you might expect and we can sort of get a high-level view of what what happens in gold price are as we can see that okay there are no days where it goes up more than two percent in this in this dataset and there's only one day where it went down by more than two percent so to calculate the historical volatility which usually done is we use a rolling standard deviation to get that and and the number that's often used is the 21-day rolling window so the 21-day is used because there are 21 trading days in a month on average alright so that adds up to 252 trading days in a year so I'm gonna go ahead and calculate that as the rolling 21 all right and then the STD the standard deviation alright and again this will this will include the twenty first observation all right and so really the the volatility should be used from the next day all right otherwise it implies that we know something we don't okay so I'm gonna go ahead and shift that by one all right I'll run that and then we'll plot that all right so when I do that I get a different a sort of a cleaned up version of of what this volatility looks like over time all right so it looks like we just came out of a particularly highly volatile period in gold okay all right so in a follow-on video I will add some some more basic price analysis and that you can do but for now I hope that helps with using the rolling method in pandas

8 Replies to “Python Pandas || Moving Averages and Rolling Window Statistics for Stock Prices”

  1. Hi Matt!
    The data in which I am working on is yearly data and I dont know how to do prediction in python. Could u suggest any method.
    Please help 🙁

  2. 1. You got it the opposite in terms of lagging. The rolling mean by default is lagging. When you use "center=True", that actually takes away the lagging. To reiterate, when you specify centering, that takes away lagging. Not the other way around.
    2. Instead of shifting the rolling mean by 1 to exclude the same row, you should use "closed=Left".

  3. Hi Matt, I'm getting
    ModuleNotFoundError: No module named 'pandas_datareader'
    when I try to import pandas-datareader. Any solutions to this?
    Thanks

  4. Hey Matt, great video 🙂 Do you have an alternative to Yahoo finance now that it's unstable? Or a workaround of some sort

Leave a Reply

Your email address will not be published. Required fields are marked *