Hello world, it’s Siraj, and the topic is stock price prediction: how can we use AI to predict stock prices? This is part of the AI for business series that I’m doing, and what you’re seeing behind me is a demo of using AI to predict stock prices. This is actually a game. I’m competing against an AI to see who can make better buys and sells, and this is a simulated market. So I’m gonna buy some stock right here. Okay, so you see this dark blue one? The AI already beat me, so I can start over again. The idea in this simulation is that the AI and I will both be buying and selling stocks, and we’re gonna see who can make the better buy and sell orders in this simulated environment.

So it’s gonna start like this. I can speed it up, I can slow it down. I’m gonna buy a stock right here, and notice that once this stock is bought, it’s gonna go up and down and up and down. I can speed up the timeline, and I’m gonna say, okay, let’s wait for it to go up. Oh, time to sell. Okay, so I sold it there, and now the AI is doing the same thing. I’m gonna buy it here, and then I’m gonna try to sell it higher than I bought it, right? That’s the point. Come on, you know you could... there we go. So I made a little, well, not really, some kind of profit, but the AI is gonna do the same, and we’re gonna see who does better. So that’s the AI buying right there, and it’s gonna learn over time.

So this is the demo that we’re going to build today. It’s using a linear regression model, but in this video we’re gonna go over several different types of AI applications in finance. It’s not just about stock prediction. AI can be used immensely in the field of FinTech, and we’re going to talk about how, we’re gonna talk about who is doing this, we’re gonna talk about everything. So sit down and get ready for this.

So how do we start here? What do we talk about first? Well, we know that the finance industry in general has been an early pioneer of AI technologies. Since the ’70s, Wall Street has
been using predictive models to try to predict the prices of the market, right? What direction is the market moving in? How can these hedge funds best allocate their funds so that they are optimizing how much money they’re earning at any given point? There are so many different data points out there on the web, and these techniques are very closely guarded secrets. They don’t want us to know how they’re doing things, right? That’s their trade secret. Why would they open-source that? That’s their value, that’s how they make money. But me, I don’t care. I’m making enough money from YouTube ads and partnerships, etc., so I just want you guys to make that money. So just listen up for all the different ways I’m gonna tell you how it’s done.

So there are so many different data points out there, right? There are tweets, there are Reddit posts, there’s sentiment analysis in general, from news headlines, from people, consumers, businesses: how are they feeling about a specific stock? We can think of that as classification. At the same time, we can use all of the financial metadata: all those financial data points, dividends that are being paid out, stock prices obviously, but also P/E, all of these different little financial metrics. How can we use these numerical data points to perform a regression analysis, as in time series analysis? Given these numbers in the past, what is the number in the future? And if you think about it, it’s not a single-variable problem, it’s a multivariate problem, but we can graph that out as a two-dimensional line for us to view later on, and I’ll talk about that as well.

Basically, what I’m trying to say is that there are different ways we can frame this problem. It can be a classification problem, it could be a regression problem. There are different models we can use: we can use a neural network, we can use a support vector machine, we can use linear
regression. We’ll go into that later on.

Citibank, or rather Citigroup, estimates that the biggest banks have doubled the number of people they employ to handle compliance and regulation, and this has cost the banking industry billions of dollars, lots and lots of money. There was a very interesting survey that I found where a bunch of banks were asked, "Are you considering deploying an AI solution in the next 18 months?", and the majority said that they have it on their roadmap within the next 18 months. So there is an opportunity here to work with these bigger FinTech companies and say, "Hey, I know you had this on your roadmap. I have this great solution. Use it, try it out, and you won’t be disappointed." So for those of you who want to start an AI startup in FinTech, this is a great opportunity.

And 90% of the world’s data has been collected in the past two years. Never before has this happened. So the time is now. If you’re going to do something, now is the time, because we are in this renaissance of data. There’s so much of it, and nobody has any idea what to do with it, or about AI in general or how it works. So if you have some interest in both AI and finance, it’s your responsibility to do something good for people using that. Make companies more profitable, good companies, and give people jobs, right? Be a job creator in this way, like what Numerai is doing. They’re giving jobs to data scientists across the world. They’re putting real money into the pockets of talented developers by crowdsourcing their hedge fund. That’s one great idea we’ll get into later.

But if you’re the CFO of a big company, your job is to figure out in what ways AI can help increase the efficiency of your organization: operational accounting, financial reporting, allocations, adjustments, reconciliations, intercompany transactions. These are just a few ways that AI can be used to make your company more efficient. And then in another survey, where companies
were asked, "Where do you expect AI to be introduced in your organization in the next three years?", most said risk assessment, which is a really interesting response, and we’ll talk about that. But to really drive my point home, the market for AI in finance is expected to grow from 1.3 billion last year to 7.4 billion in 2022. That’s four years from now. That’s a lot. So the market will grow.

Okay, but there are problems with integrating AI, right? These companies have legacy technology environments that are hard to upgrade. They have a lack of skills and expertise, a lack of budget. All of these are pain points for these companies. They want to grow, they want to perform fraud detection, they want to optimize their budgeting, they want to allocate resources most efficiently to then grow the company, but they have all these problems. So remember, if you can tackle a single niche problem very, very well, then you are golden. They will come to you, and you will build a brand around this, as I built a brand around AI education. So you’ve got to really pick one niche field and just go all the way in, right? So there are a lot of problems, especially data silos.

So here’s a little map for you to see all the different ways that AI can be used in the FinTech market, from credit scoring to personal finance assistants for millennials. So there’s opportunity in the consumer space as well. Young millennials, not that they prefer it, but they’re more willing to listen to financial advice not from a human but from an AI, right? A chatbot: "Hey, I see you have this goal for your budget, to save this much this month. Here’s your bank account, I see what you’re spending. Here’s how you can best spend, here’s a good budget for you." And it can update in real time. So there are opportunities here for consumers as well. There are startups in this space, and we’re going to
talk about them as we go.

But one huge use case is increasing security. How can these companies decrease the rate of false positives, that is, transactions that are classified as fraudulent but aren’t really fraudulent? They just belong to a good person, but the human behind the process misclassified them, and an AI can help reduce that. This causes billions in losses for retailers, and this is a real problem. And remember, any money you can save these institutions, they’re gonna throw money at you, because they are saving money. That’s the whole idea here: save these companies money and you will make money.

So when it comes to improving security, MasterCard implements what’s called Decision Intelligence. This is just one example of a FinTech company that’s using AI for security, in this case fraud detection. What they’re doing is anomaly detection. Obviously this is closed source, but I’m gonna guess that they’re using some deep autoencoder, where they are trying to detect the anomaly in a transaction data set, that is, pick the needle out of the haystack, the transaction that doesn’t look like all the others. And what do I mean by "look like" in our case? It means there are hundreds of thousands of features: the time that a user spends with their mouse on one specific corner of a screen, the amount that they purchased, where they’re purchasing from. Data points we wouldn’t even think about, an AI can learn from. So remember that consumers are generating a lot of data, and when it comes to fraudulent transactions, you can use a neural network, or any type of model really, that will perform anomaly detection. An autoencoder is a great example of that. Just search YouTube for "autoencoder Siraj" for a great video on that. Sift Science is another startup that is focusing on this, but they’re collecting data from over 6,000 websites and then using
that in their fraud detection solution.

Another great example of AI in FinTech is reducing processing times. One example would be processing receipts. One startup that’s focused on this is called Parascript, and what they’re doing is using OCR, that is, optical character recognition technology, to read in receipts and create a data set from that. That’s just data input, right? It’s manual human data input that they’re replacing with automation. Great use case.

Another one is obviously algorithmic trading, which is the topic of our video today, where we have some past data set, and we learn from it so that we can predict the price of a certain company’s stock in the future. There are lots of hedge funds that are doing this right now. Obviously they’re very closed source, because that’s their secret, right? But what we can do is look at the open source hedge funds out there. A newer one is Numerai, and what Numerai has done is build an open source hedge fund for data scientists, where you can submit a model based on some data they provide, and they will award the best data scientists with some cryptocurrency. So if your model outperforms the others, you will win. That’s a great way of imagining what a good hedge fund looks like. Obviously there’s a lot of room for improvement here, and they’re just one company, and you can definitely make a competitor to Numerai. Find a pain point that they don’t really focus on, and focus on that.

Sentient Technologies is another one. What they’ve done is use AI to create a high-frequency trading bot that is running trillions of simulated trading scenarios using public data, so they can squeeze eighteen hundred days of trading into a few minutes. That’s incredible, and that’s something that only a machine, not a human, could do.

Credit lending, right? If you think about deciding whether or not to give a person some
insurance or a loan, what you are doing is making a prediction, right? Based on their past data, where they’re from, how much money they make, their marital status. These are features. This is the perfect application for machine learning, and that’s what we’re going to start to see over time. ZestFinance is one startup that does this, but there’s a lot of room for competition in the space. They approve borrowers that other lenders are missing, so check out that link as well.

Portfolio management, right? This goes back to what I was saying about helping millennials keep track of their finances. A human financial advisor is expensive, and not everybody has the money for that, so ideally they don’t need that: they can use a machine. This could be an app, this could be some kind of Chrome extension, but what it is is a robo-advisor that will spread investments across asset classes and financial instruments in order to reach the user’s goals. Like, "I want to save X amount in the coming years." "Okay, let’s see how much money you have, let’s see what fields you’re interested in, let me invest in this, this, this, and this," and then track the user’s returns over time, learn from them, and just keep investing. So it’s an optimized wealth experience. Responsive AI does this, but like I said, there’s a lot of room for competition in the space.

So I just wanted to go over a few examples of startups that are doing this right now, and now we can look at some theory. How do we do this using the tools that we have available? We have the Python programming language, we have TensorFlow, we have Keras, we have scikit-learn. These are open source machine learning libraries. We have data sets available online from Quantopian, from Google Finance, from Kaggle. There are a bunch of public transaction data sets and stock data sets online. And we have Twitter. We can
scrape Twitter, we can scrape Reddit, we can scrape CNN, a bunch of news headlines. But how do we really learn from this? That’s the real question. There are a lot of different data points that we could use, but then it comes down to: do we want this to be a regression problem, or do we want it to be a classification problem? And the answer is we could do it both ways.

One way to think of it is as a regression model where we are only using numerical data. The simplest way to think about this is as a single-variable regression problem. In this image, what you’re seeing is a line, y = mx + b, which is the equation of a straight line with slope m and intercept b. And what we can do is say, based on the prices alone, let’s try to predict the price in the future. Great. But the problem is that there are more data points than that. We could build a simple model like that, but if we wanted more data points, we would just add them in as columns in the data set: not just prices, but all these different financial data points that would be inside a financial report, for example returns, dividends, things like that. So that’s one way to think about it: given the past data, what’s the next point?

Another way to think about this is as a classification problem. Binary, yes/no, buy or sell: the price will go up or it will go down. And how do we do that? Well, we can use numerical data as well, but another way we could think about it is using textual data. So, sentiment analysis. We could compile all of these tweets, we could compile all of these Reddit posts, and then we could say good or bad, right? We can run a sentiment analysis algorithm on all of those. One great one would be a pre-trained neural network, one that’s been trained on different types of labeled text data sets, so it knows if some text is generally positive or negative. Some great
libraries that do this right out of the box: TextBlob is one, but there are others as well. I’m going to get back to you on what some of those good libraries are... there we go, NLTK is another great library. So, NLTK and TextBlob.

So we can think of it as regression, or we can think of it as classification. And then you might think, well, okay, what if I want to combine both the numerical data and the textual data? How do I do that? What you can do is say, okay, I’m gonna compile a data set, for a given company, of all the Reddit posts, all the tweets, all the blog posts, all of the comments, everything for a given date, and I’m gonna run sentiment analysis on all of that. It’s gonna be like yes, no, yes, no, good, bad, good, bad, good, good, bad. And then we could take the majority of those sentiments and say that the majority is gonna be the overall sentiment for that day. So let’s say the majority are good; then we could just say that for this day, the sentiment for this stock is generally good. Yes or no, zero or one. So we turn that into a numerical data point, zero or one. Since we’ve already run sentiment analysis, we can take that zero or one, add it as a single column in our numerical data set, and then treat it as a regression problem. So that’s how we would do that.

So let’s think about this. When it comes to linear regression, that’s the simple equation y = mx + b. I’ve got a great video on this, just search "linear regression using gradient descent Siraj". But anyway, what we can do is, if we have a single feature, the price and the date, we can build a model around that: the line of best fit. Then we can use that line, plug in a new date, and it’s going to output the price, and that’s gonna be either higher or
lower, and we can buy based on that. Now, this is very easy to do. We can do this very simply using the scikit-learn library. We’ll load up a data set. This is a data set of Boston house prices, but this is just for show. Then we’ll say we want a single feature from that data set, which would be the prices for a given date. We’ll split the data into training and testing sets, and once we do that, in a single line we can build a linear regression model. Once we have that model, we can then predict the price for the next day. That’s the easy way.

So that’s one way. Another model we could use is the support vector machine. Same exact concept: instead of linear regression, we just replace that single line of linear regression with a support vector machine. It’s the same library, scikit-learn. And remember, support vector machines can be used for either regression or classification, such as sentiment analysis. In this case we’re talking about regression.

Then it comes to neural networks, right? We can’t have a video on stock prices without talking about neural networks. Neural networks have been very popular in recent years, but it’s very simple if you think about it. It’s the same kind of equation. Right here, what we’re looking at is the equation for a simple neural network. We have some input data, we give it to this model, and what the model does is essentially take the input, multiply it by a weight matrix, and add a bias value. We apply a non-linearity, or activation function, to that, and it’s gonna give us an output: yes/no, up/down, or in the case of regression, what’s gonna be the next stock price. So it’s very simple in this case as well. However, when it comes to neural networks, the one that has
outperformed most others when it comes to stock price analysis is the LSTM network, the long short-term memory network. Neural networks can predict prices in the short term, but a long short-term memory network can make predictions about sequences far in the future. These have been applied to text and character recognition models, and they’ve been applied to numbers as well, and that’s what’s given a lot of progress in this field. What we can do is use the Keras deep learning library, which is built on TensorFlow, to do this, and here’s an example of that. We use the same scikit-learn library to load the input data, and we’re using Keras to build a simple model.

Now there’s one more type of learning I want to talk about, and that’s reinforcement learning. Reinforcement learning is all about learning from trial and error, and I think that in our case the best way to really learn from our input data is to use reinforcement learning combined with a supervised learning model. What I mean is that supervised and unsupervised learning provide forecasts that predict future events, while reinforcement learning optimizes future outcomes. There’s a difference, right? So what we can do is run some market inside a simulated environment, and outside of that we have an agent that’s learning from multiple simulations what to do and what not to do. An agent in an environment will take an action, receive a reward, update itself, and repeat, to try to optimize its actions to receive that reward. The easiest way to do this is to use the OpenAI Gym environment, which makes building reinforcement learning agents very simple with just a few lines of code. And what we can do is use the Sairen OpenAI reinforcement learning environment. This is pulling data directly from the Interactive Brokers API to create
an interface for off-the-shelf ML algorithms to train on real live financial markets. Very exciting stuff, so I definitely recommend checking that out.

Now it’s time for our demo. What we’re really doing is just using a simple linear regression model for the sake of this demo, but it’s a packaged web app. You can run it directly from your local machine, and there’s a web demo of it as well. What I really wanted to do was give you this kind of skeleton into which you could just plug and play your own model very easily. You don’t have to think about other models, right? You can just plug your model in right here and it’s gonna work.

If we look in the get stock data function, which is the really interesting part, what it’s doing is pulling stock data from Quandl, which is an API for financial data, and it’s using this API key to pull that data. It’s right here, I’ll show it to you in a second. This is just based on past stock prices for a given stock. Once it’s pulled that data, it’s gonna format it: adjusted price, open, high, low, close, etc. Once it’s formatted that, notice that all it’s doing is using scikit-learn to perform a simple linear regression. Input, output: the input is the date, the output is the price, and what’s the mapping? So this is the simplest version of a stock price prediction algorithm we could make, but the idea is the same. If we have that input data, whether it’s one feature, ten features, or a hundred features, we can always put that into a simple model. And with these libraries like Keras and scikit-learn, we don’t have to build them from scratch. We don’t have to think about all of those magic numbers, the parameters, all of these different things. We could just say in a single line, you know, LSTM
network if this is Keras, or support vector machine if this is scikit-learn. It’s always the same thing. But this is the basic idea here. If you compile it and run it, it’s gonna look like this. You can compete against an AI. The AI in this case is making decisions based on the past. These data points are all random, and you can compete with it. So this is just one way to do it. If you have any kind of conference or project or presentation coming up, definitely feel free to use this web app as a base template, and then just plug and play inside of get stock data. Yes, you could get the stock data, and you can add in more data, like sentiment analysis from the Twitter API or from Reddit posts. And remember, you just add those in as a single feature in your Excel spreadsheet, 0 or 1, so then you can run numerical analysis on what’s essentially textual data. So I hope this video was helpful to you. Please subscribe for more programming videos, and for now, I’ve got to predict the future, so thanks for watching.
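
For readers following along in text, the two ideas the video leans on most, the line of best fit y = mx + b from dates to prices and the majority-vote sentiment flag (0 or 1) added as one extra feature, can be sketched roughly as below. This is a minimal sketch, not the demo's actual code: the prices and labels are made up, the helper names `fit_line` and `daily_sentiment` are hypothetical, and the fit is written in plain Python so it runs without any installs. With scikit-learn, the same fit would be done with `LinearRegression().fit(X, y)`.

```python
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def daily_sentiment(labels):
    """Majority vote over per-post labels ('good' or 'bad').

    Returns 1 if 'good' posts outnumber 'bad' posts, otherwise 0
    (a tie counts as 0; that tie rule is an assumption of this sketch).
    """
    counts = Counter(labels)
    return 1 if counts["good"] > counts["bad"] else 0


# Made-up closing prices for five consecutive trading days (illustrative only).
days = [0, 1, 2, 3, 4]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

# Fit the line of best fit, then plug in the next day to get a predicted price.
m, b = fit_line(days, prices)
next_price = m * 5 + b

# Naive rule from the video: buy if the predicted price is higher than today's.
signal = "buy" if next_price > prices[-1] else "sell"

# One 0/1 sentiment value per day, ready to be added as an extra feature column.
sentiment_today = daily_sentiment(["good", "bad", "good"])
```

The sentiment value is kept separate here; feeding it into the model alongside the date would turn the single-variable fit into a multivariate regression, which is where a library such as scikit-learn becomes the more practical choice.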