1. Introduction and motivation for the problem
Predicting how the market moves has long been an unsolved problem. Many analysts and fund houses have tried to crack it, but it is a tough nut to crack.
Predicting the market is hard mainly for the following two reasons:
Problem 1: It is hard to predict how humans will react to news in the market.
Problem 2: Little data is available on trading activity.
We broke these two problems down further before moving on to the predictive analysis.
Breaking down Problem 1:
It is hard to predict how humans will react to news in the market
The common belief is that, although fundamentals play the major role in market movement in the long term, short-term prices are dictated by how humans react.
At a glance this idea sounds logical, but on closer inspection the stock market is not driven by individual retail investors alone; it is driven by Foreign Institutional Investors (FIIs) and Domestic Institutional Investors (DIIs).
One of the major market indices, the SENSEX, is a free-float market-capitalization-weighted index of 30 well-established and financially sound companies listed on the Bombay Stock Exchange.
The companies in the SENSEX are as follows:
When we examined the shareholding patterns of these companies, it was clear that FIIs and DIIs together held more than 50% of the non-promoter shares. A few examples (the company name appears in the top-left corner of each screenshot):
HCL Technologies
Asian Paints
ITC
Tata Steel
Power Grid
On average:
Promoters hold 50-55% of the shares
FIIs and DIIs together hold 30-35% of the shares
The public (individual retail investors) holds 10-15% of the shares
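A quick arithmetic check of the claim that institutions hold more than half of the non-promoter shares: the sketch below takes the average ranges above at face value (an assumption; individual companies differ) and computes the FII + DII share of non-promoter holdings at both extremes and at the midpoints.

```python
def institutional_share_of_non_promoter(promoter_pct: float, institutional_pct: float) -> float:
    """Fraction of non-promoter shares held by FIIs and DIIs combined."""
    non_promoter_pct = 100.0 - promoter_pct
    return institutional_pct / non_promoter_pct


# (promoter %, FII + DII %) taken from the average ranges quoted above.
scenarios = {
    "lowest ratio": (50.0, 30.0),   # most non-promoter shares, fewest institutional
    "midpoints": (52.5, 32.5),
    "highest ratio": (55.0, 35.0),  # fewest non-promoter shares, most institutional
}

for name, (promoter, institutional) in scenarios.items():
    ratio = institutional_share_of_non_promoter(promoter, institutional)
    print(f"{name}: {ratio:.1%} of non-promoter shares")
```

Even in the least favourable combination the institutions hold 30 / 50 = 60% of the non-promoter shares, which is consistent with the "more than 50%" observation.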
In large-cap companies (the SENSEX contains only large-cap companies), promoters rarely sell their shares.
The public consists mostly of employees, and the rest have invested for the long term. Most of them are not active traders. Moreover, any sale attracts short-term or long-term capital gains tax.
FIIs and DIIs, on the other hand, hold 30-35% of the shares, and these institutions are active traders. They are also not held back by short-term and long-term capital gains in the same way, which gives them great freedom to sell whenever they want.
From this data we infer that FIIs and DIIs are the major factor behind price movements.
Breaking down Problem 2:
Little data is available on trading activity
Unfortunately, trade volumes for individual retail investors are not available.
FII and DII activity, however, is published daily on the NSE/BSE websites, and the full day-wise archive for the last 10 years is available on the Moneycontrol website.
Final inference
Breaking down the two problems makes it clear that FIIs and DIIs play a key role in short-term fluctuations of the SENSEX index.
FII and DII data is available day-wise.
Each day has four data points:
Gross purchase of FII
Gross sales of FII
Gross purchase of DII
Gross sales of DII
From these four independent variables we can derive three more data points:
Net purchase/sales of FII = Gross purchase of FII - Gross sales of FII
Net purchase/sales of DII = Gross purchase of DII - Gross sales of DII
Net purchase/sales of FII and DII combined = Net purchase/sales of FII + Net purchase/sales of DII
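The three derived columns are two differences and a sum. A minimal sketch, using a hypothetical `DailyFlows` record (the field names are illustrative, not the column names in the sheet) and made-up amounts in whatever unit the source reports:

```python
from dataclasses import dataclass


@dataclass
class DailyFlows:
    """One trading day of FII/DII activity (hypothetical field names)."""
    fii_gross_purchase: float
    fii_gross_sales: float
    dii_gross_purchase: float
    dii_gross_sales: float

    @property
    def fii_net(self) -> float:
        # Net purchase/sales of FII
        return self.fii_gross_purchase - self.fii_gross_sales

    @property
    def dii_net(self) -> float:
        # Net purchase/sales of DII
        return self.dii_gross_purchase - self.dii_gross_sales

    @property
    def total_net(self) -> float:
        # Net purchase/sales of FII and DII combined
        return self.fii_net + self.dii_net


# Illustrative numbers only, not taken from the dataset.
day = DailyFlows(fii_gross_purchase=12000.0, fii_gross_sales=13500.0,
                 dii_gross_purchase=11000.0, dii_gross_sales=9000.0)
print(day.fii_net, day.dii_net, day.total_net)  # -1500.0 2000.0 500.0
```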
With all the independent variables in place, we can compare them with the dependent variable (SENSEX) and carry out the analysis.
Data Source(s)
NSE: https://www.nseindia.com/reports/fii-dii
Moneycontrol: https://www.moneycontrol.com/stocks/marketstats/fii_dii_activity/index.php
We downloaded the data into a Google Sheet: https://docs.google.com/spreadsheets/d/112H8I3jHA5xH4_HBg7Cqie2DdAsB2UqpCg9seq-jGP0/edit?gid=1179995138#gid=1179995138
SENSEX data: retrieved with the GOOGLEFINANCE formula in Google Sheets
Final Data Table
The data covers 1 January 2021 to 17 January 2025, a total of 1003 data points, which is enough to carry out regression analysis.
2. Descriptive Analytics
Since we track day-wise inflows and outflows (+/-) of FIIs and DIIs, it makes sense to compare them with the day-wise change in the SENSEX rather than its absolute value, so we add a new column, Sensex_Change.
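Sensex_Change is the day-over-day difference of the closing value. The sketch below assumes a chronologically ordered list of closes; if the sheet is sorted newest-first (as the row numbering in the three-day pattern example later on suggests), reverse it first. The closing values are made up. In pandas, `Series.diff()` computes the same thing.

```python
from __future__ import annotations


def daily_changes(closes: list[float]) -> list[float | None]:
    """Day-over-day change of a chronologically ordered series.

    The first day has no previous close, so its change is None.
    """
    changes: list[float | None] = [None]
    for previous, current in zip(closes, closes[1:]):
        changes.append(current - previous)
    return changes


closes = [100.0, 102.0, 101.5, 103.0]  # illustrative SENSEX closes, oldest first
print(daily_changes(closes))  # [None, 2.0, -0.5, 1.5]
```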
Let us find the relationship between FII Net, DII Net and Total Net and the change in the SENSEX, using a scatter plot:
The plot above is not very conclusive, so let us look at each variable individually in an enlarged view.
FII Net and Total Net appear to have a positive correlation with the SENSEX change, whereas DII Net shows an almost flat line with a slight negative slope.
Let us check the correlation:
Consistent with the scatter plots, the correlation values show that FII_Net and Total_Net have the highest correlation with Sensex_Change, while DII_Net has a negative correlation.
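The correlation used here is Pearson's r. A self-contained sketch of the computation (pandas `DataFrame.corr()`, which uses Pearson by default, returns the whole matrix at once); the daily values are invented for illustration and are not the study's figures:

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up daily values for illustration only; they are not the study's data.
sensex_change = [-250.0, 120.0, 310.0, -40.0, 90.0]
flows = {
    "FII_Net": [-1500.0, 800.0, 2200.0, -300.0, 1200.0],
    "DII_Net": [1800.0, -200.0, -900.0, 600.0, 100.0],
}
flows["Total_Net"] = [f + d for f, d in zip(flows["FII_Net"], flows["DII_Net"])]

for name, values in flows.items():
    print(f"{name} vs Sensex_Change: r = {pearson(values, sensex_change):+.2f}")
```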
To simplify the study, let us convert all positive values to 1 and negative values to 0 for FII_Net, DII_Net, Total_Net and Sensex_Change:
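A minimal sketch of the 0/1 conversion. The text does not say how a day with exactly zero change is coded; the sketch assumes it counts as 0 (not positive).

```python
def to_binary(values: list[float]) -> list[int]:
    """Map positive values to 1 and everything else to 0.

    Assumption: an exact zero is treated as 0 (not positive).
    """
    return [1 if value > 0 else 0 for value in values]


print(to_binary([-1500.0, 2000.0, 0.0, 35.2]))  # [0, 1, 0, 1]
```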
SENSEX growth/decline based on FII and DII
The box plot above also shows clearly that the SENSEX goes up when FIIs are investing.
Let us look at the probability tables:
There is a 76% chance that the SENSEX falls when FIIs do not invest (first table).
When DIIs invest, the SENSEX might go up or down: the table shows 71% for rising days and 74% for falling days. DII activity therefore does not play a major role in moving the market forward or backward.
For the net value: when there is a total cash inflow (FII Net + DII Net), the SENSEX rises 71% of the time, and when there is a total cash outflow, the market falls 65% of the time.
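Statements such as "76% chance that the SENSEX falls when FIIs do not invest" are conditional probabilities, P(SENSEX direction | flow direction), i.e. each row of the table normalised to sum to 1. A sketch with toy 0/1 series (pandas users could get the same with `pd.crosstab(x, y, normalize="index")`):

```python
from collections import Counter


def conditional_table(x_bits: list[int], y_bits: list[int]) -> dict[int, dict[int, float]]:
    """Return {x: {y: P(y | x)}} for two binary series of equal length."""
    counts = Counter(zip(x_bits, y_bits))
    table: dict[int, dict[int, float]] = {}
    for x in (0, 1):
        row_total = counts[(x, 0)] + counts[(x, 1)]
        table[x] = {
            y: counts[(x, y)] / row_total if row_total else float("nan")
            for y in (0, 1)
        }
    return table


# Toy series: 1 = FII net buying / SENSEX up, 0 = otherwise.
fii_bits = [1, 1, 0, 0, 1, 0]
sensex_bits = [1, 1, 0, 1, 0, 0]
table = conditional_table(fii_bits, sensex_bits)
print(f"P(SENSEX down | no FII buying) = {table[0][0]:.2f}")  # 0.67
print(f"P(SENSEX up | FII buying) = {table[1][1]:.2f}")  # 0.67
```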
3. Methodology with a detailed discussion on the application of business analytics methods (Multiple methods should be compared)
From the descriptive analysis above, it is evident that there is some correlation between FII Net, Total Net and the SENSEX.
If we can model this relationship, we can time the market and earn better profits.
A model would also tell us when to put bulk amounts into the market and when to withdraw them to book profits.
Intuitively, one might think the regression should compare cumulative FII, cumulative DII and cumulative Net amounts with the SENSEX, because the SENSEX is a cumulative number that tends to increase over time.
But the market is about demand and supply: the price of every stock is continuously set through bidding and selling.
So it is more appropriate to compare the daily changes in FII, DII and Net amounts with the daily change in the SENSEX.
We consider three methods for finding the model:
Dependent variable: SENSEX (daily change)
Independent variables: FII, DII and Net amounts put into the market (daily)
Method 1, logistic regression: binary values for both the dependent and independent variables
Method 2, multiple linear regression: actual incremental values for both the dependent and independent variables
Method 3, logistic regression: binary dependent variable (SENSEX) and actual incremental values for the independent variables (FII, DII and Net)
Once we establish the relationship between FII, DII and the SENSEX, we can perform a time series analysis to find patterns.
4. Results
We use all the data to train the models, since our aim here is to find relationships rather than to forecast.
Method 1: Logistic Regression for Binary Values
Development of the model:
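The text does not say which tool fitted the model. As an illustration of logistic regression itself, here is a Newton-Raphson fit in NumPy, with a tiny L2 penalty added (an assumption) so the iteration stays stable if a binary feature perfectly separates the outcome. The data are simulated with an FII effect and no DII effect built in, so the printed coefficients say nothing about the real ones. The same function also accepts continuous inputs, which is what Method 3 uses; standardising them first helps the iteration.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 25, l2: float = 1e-6) -> np.ndarray:
    """Fit logistic regression by Newton-Raphson; returns [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))           # predicted P(SENSEX up)
        grad = A.T @ (y - p) - l2 * beta              # gradient of penalised log-likelihood
        weights = p * (1.0 - p)
        hessian = A.T @ (A * weights[:, None]) + l2 * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hessian, grad)  # Newton step
    return beta


def predict(beta: np.ndarray, X: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return 1 (SENSEX up) where the fitted probability exceeds the threshold."""
    A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return (1.0 / (1.0 + np.exp(-A @ beta)) > threshold).astype(int)


# Simulated binary inputs (columns: FII_Net bit, DII_Net bit); not the study's data.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 2))
true_logit = -0.8 + 1.6 * X[:, 0] + 0.0 * X[:, 1]
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

beta = fit_logistic(X, y)
accuracy = (predict(beta, X) == y).mean()
print("coefficients [intercept, FII, DII]:", np.round(beta, 2))
print(f"in-sample accuracy: {accuracy:.2f}")
```

With the simulated data, the FII coefficient comes out clearly positive and the DII coefficient near zero, the same qualitative pattern the conclusions below describe for the real data.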
Conclusions from the logistic regression:
The strongest relationship is with the net amount put in: when the net amount (FII + DII) is positive, the market goes up.
The second strongest is with FII Net being positive, while DII is not significant at all.
So whenever FII Net and Total Net are positive, there is a high chance of the market going up.
DII activity has no significant effect on the market.
Evaluation of Model
Confusion matrix
This model predicts the right value 68% of the time
Deriving evaluation metrics from the confusion matrix
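Treating 1 (SENSEX up) as the positive class, sensitivity is TP / (TP + FN), the share of up days the model catches; specificity is TN / (TN + FP), the share of down days it catches; accuracy is (TP + TN) / total. A small sketch with toy labels:

```python
def confusion_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity and accuracy, with 1 (SENSEX up) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # up days predicted up
    tn = pairs.count((0, 0))  # down days predicted down
    fp = pairs.count((0, 1))  # down days predicted up
    fn = pairs.count((1, 0))  # up days predicted down
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


print(confusion_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0]))
# {'sensitivity': 0.75, 'specificity': 0.75, 'accuracy': 0.75}
```

Accuracy is a weighted average of sensitivity and specificity, weighted by the share of up and down days, which is why a 68% accuracy sits between a 66% sensitivity and a 70% specificity.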
Method 2: Multiple Linear Regression on Incremental Values
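Method 2 regresses the daily SENSEX change on the daily flow amounts and judges the fit by adjusted R². One caution when reproducing it: Total_Net = FII_Net + DII_Net exactly, so using all three as regressors makes them perfectly collinear and their individual coefficients cannot be separated. The NumPy sketch below therefore uses FII_Net and DII_Net only, on simulated data; it is not the original model.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Ordinary least squares with intercept; returns (coefficients, R², adjusted R²)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    r2 = 1.0 - (residuals @ residuals) / ((y - y.mean()) @ (y - y.mean()))
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)  # penalise extra regressors
    return beta, r2, adj_r2


# Simulated flows and SENSEX changes; the coefficients here are arbitrary.
rng = np.random.default_rng(1)
flows = rng.normal(0.0, 1000.0, size=(1000, 2))  # columns: FII_Net, DII_Net
sensex_change = 0.05 * flows[:, 0] + rng.normal(0.0, 100.0, size=1000)

beta, r2, adj_r2 = fit_ols(flows, sensex_change)
print("coefficients [intercept, FII_Net, DII_Net]:", np.round(beta, 3))
print(f"R² = {r2:.2f}, adjusted R² = {adj_r2:.2f}")
```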
Method 3: Logistic Regression on Incremental Values
Conclusion
Model 1: This model has a sensitivity of 66%, a specificity of 70% and an accuracy of 68%. It is good enough to establish the relationship; its only limitation is that FII, DII and Net inflows are treated as binary, which Model 3 addresses.
Model 2: Not conclusive; the adjusted R² is only 22%.
Model 3: This model has a sensitivity of 65%, a specificity of 72% and an accuracy of 69%, 1 percentage point better than Model 1.
Overall, Models 1 and 3 perform similarly. For the investing pattern (time series analysis) we can use Model 1 (binary), while Model 3 can help decide how much money to put into or withdraw from the market.
What Next?
Based on the points above, FIIs are clearly the major contributor to SENSEX gains, and the effect is stronger when DIIs also pump money in alongside FIIs.
So the next question is when FIIs will invest. There are two ways to approach this: a macro view and a micro view.
Macro View
FII investment depends on several factors, such as:
How attractive the Indian market is compared with other markets
US Treasury bill yields and US interest rates
The US dollar to Indian rupee exchange rate
Micro View
Once the macro view is established and shows a stable trend, we can drill down to the micro view and carry out a trend analysis.
In the additional class on trend analysis, we learned to combine three consecutive days (T, T+1, T+2) to find patterns.
We drew inspiration from this and looked for patterns in DII, FII and Net inflows over the previous three days, using the binary codes.
Yellow represents the pattern
There are 8 possible combinations in total (2 × 2 × 2), but only 6 occur in practice, and each was given a serial number. The digits of each code are defined as follows:
First digit - Net Amount
Second digit - FII
Third digit - DII
011 and 100 are impossible: when both FIIs and DIIs put money in, the net must also be positive, and when both pull money out, the net must be negative.
Each binary pattern is then mapped to its serial number, and the serial numbers of the previous three days are combined into one number.
Green - pattern serial number
Blue - combination of the last three days
For example, for row 21 the previous three days are in rows 22, 23 and 24; their serial numbers are placed side by side to form 544.
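A sketch of the encoding. The serial numbers the sheet assigns to the six feasible codes are not reproduced in this text, so the mapping below (feasible codes in ascending binary order, numbered 1 to 6) is hypothetical and will not necessarily reproduce the sheet's 544. It also assumes the three previous days are concatenated most recent first, mirroring rows 22, 23 and 24 in a newest-first sheet.

```python
# Digit order as in the text: Net, FII, DII (1 = inflow, 0 = outflow).
# HYPOTHETICAL numbering: the sheet's own serial numbers may differ.
FEASIBLE_CODES = ["000", "001", "010", "101", "110", "111"]
SERIAL = {code: number for number, code in enumerate(FEASIBLE_CODES, start=1)}


def serial_for_day(net_bit: int, fii_bit: int, dii_bit: int) -> int:
    """Serial number of one day's (Net, FII, DII) pattern; rejects 011 and 100."""
    code = f"{net_bit}{fii_bit}{dii_bit}"
    if code not in SERIAL:
        raise ValueError(f"pattern {code} is impossible")
    return SERIAL[code]


def three_day_key(previous: list[int]) -> int:
    """Concatenate three serial numbers, most recent day first: [5, 4, 4] -> 544."""
    first, second, third = previous
    return first * 100 + second * 10 + third


print(serial_for_day(1, 1, 0), three_day_key([5, 4, 4]))  # 5 544
```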
Using all the historical data, we built a table giving the probability of each serial number appearing next.
The table above shows how many times each three-day pattern has appeared and the probability of each of the six serial numbers appearing next.
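The lookup table can be built by counting, for each three-day key, which serial number followed it; the most frequent follower is the prediction and its share is the estimated probability. A sketch on a made-up chronological series of serial numbers, using the same most-recent-first key convention assumed above:

```python
from __future__ import annotations

from collections import Counter, defaultdict


def build_next_table(serials: list[int]) -> dict[int, Counter]:
    """Map each three-day key (most recent day first) to counts of the next day's serial."""
    table: dict[int, Counter] = defaultdict(Counter)
    for t in range(3, len(serials)):
        key = serials[t - 1] * 100 + serials[t - 2] * 10 + serials[t - 3]
        table[key][serials[t]] += 1
    return table


def predict_next(table: dict[int, Counter], key: int) -> tuple[int, float] | None:
    """Most frequent next serial for a key and its empirical probability, or None if unseen."""
    counts = table.get(key)
    if not counts:
        return None
    serial, hits = counts.most_common(1)[0]
    return serial, hits / sum(counts.values())


history = [6, 5, 4, 6, 5, 4, 6]  # toy serial numbers, oldest first
table = build_next_table(history)
print(predict_next(table, 456))  # (6, 1.0)
```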
Using this table, we ran a back-test over the 1003 data points in the study.
Orange - predicted values
Confusion matrix for series pattern prediction
This model has high sensitivity and an overall accuracy above 50%.
The accuracy is 61.5%, and we still consider this a good model.
Why this is a good model:
It is often said that stock market reactions are completely random. If they were, a random guess would be right about 50% of the time, but here we get 61.5% for the overall trend.
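How far is 61.5% from coin-flip guessing? A normal-approximation sketch, assuming about 1000 back-tested days (the 1003 data points minus the first three, which lack a full three-day history; the exact count is an assumption). One caveat: if up days outnumber down days, always predicting "up" already beats 50%, so that majority-class rate is a stricter baseline worth checking too.

```python
import math


def z_vs_baseline(accuracy: float, n: int, baseline: float = 0.5) -> float:
    """Standard errors by which an observed hit rate exceeds a baseline (normal approximation)."""
    standard_error = math.sqrt(baseline * (1.0 - baseline) / n)
    return (accuracy - baseline) / standard_error


z = z_vs_baseline(accuracy=0.615, n=1000)
print(f"z = {z:.1f}")  # z = 7.3
```

A z of about 7 makes 61.5% very unlikely under pure 50/50 guessing. If the lookup table was built from the same days it was tested on, this is still an in-sample figure.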
Sensitivity is 74%, which means the model is even better at predicting days when the SENSEX rises.
Specificity is 46%, close to 50%, so for days when the SENSEX falls the predictions are essentially random.
How to use this model: as explained earlier, we start from the macro view and then drill down to the micro view, so first determine whether the market is in a bull or bear phase.
Leverage high sensitivity: from a stock market perspective, sensitivity corresponds to the bull-run phase. In a bull run, the model correctly identifies 74% of the positive days. This could become much more accurate with a model that includes the factors influencing FIIs, such as US T-bill yields and Fed interest rates (the next stage of this study).
Close-to-50% specificity: in a bear phase (captured by specificity), outcomes are essentially random, so it is better to stay out or invest at regular intervals to avoid the “Fear of missing out” (FOMO).
Google Sheet with the data used and the trend prediction: https://docs.google.com/spreadsheets/d/112H8I3jHA5xH4_HBg7Cqie2DdAsB2UqpCg9seq-jGP0/edit?usp=sharing