Predicting Sensex trend based on FII and DII investment pattern

Hindi Impositione


1. Introduction and motivation for the problem 


Predicting how the market moves has been an unsolved problem for a long time. Several analysts and fund houses have tried to solve it, but it remains a tough nut to crack.

Predicting the market is tough mainly for the following two reasons:


Problem 1: It is tough to predict how humans will react to any news available in the market

Problem 2: Not much data is available on trading activity

We broke these two problems down further before moving on to the predictive analysis.


Breaking down Problem 1:

It is tough to predict how humans will react to any news available in the market

The common belief is that even though fundamentals play the major role in market movement in the long term, short-term prices are dictated by the way humans react.

At first glance this idea sounds logical, but a closer look shows that the stock market is not driven by individual retail investors alone; it is driven by FIIs (foreign institutional investors) and DIIs (domestic institutional investors).

One of the major market indices, the SENSEX, is a free-float market-weighted stock market index of 30 well-established and financially sound companies listed on the Bombay Stock Exchange.

Companies listed in SENSEX are as follows:-

When we started looking at the shareholding pattern of these companies, it was clear that FIIs and DIIs held more than 50% of the non-promoter shares. A few examples follow (the company name is in the top left corner of each screenshot):

HCL Technologies

Asian Paints

ITC

Tata Steel

Power Grid


If we notice on an average



Breaking down Problem 2:
Not much data is available on trading activity


Unfortunately, the trade volume of individual retail investors is not available to us.

FII and DII data, however, is available on a daily basis on the NSE/BSE websites, and the entire archive (day-wise for the last 10 years) is available on the Moneycontrol website.


Final inference 


Breaking down the two problems makes it clear that FIIs and DIIs play a key role in short-term market fluctuation (for the Sensex index).

FII and DII data is available day-wise.

Each day has 4 data points:


Gross purchase of FII

Gross sales of FII

Gross purchase of DII

Gross sales of DII

From these 4 independent variables we can derive three more data points:

Net purchase/sales of FII (Gross purchase of FII - Gross sales of FII)

Net purchase/sales of DII (Gross purchase of DII - Gross sales of DII)

Net purchase/sales of FII and DII combined (Net purchase/sales of FII + Net purchase/sales of DII)

Now that we have all the independent variables, we can compare them with the dependent variable (Sensex) and do our analysis.
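The derivation of the three net values can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet formulas actually used; the class name `DailyFlow`, its field names and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DailyFlow:
    """One day's gross FII/DII figures (names are illustrative only)."""
    fii_gross_purchase: float
    fii_gross_sales: float
    dii_gross_purchase: float
    dii_gross_sales: float

    @property
    def fii_net(self) -> float:
        # Net purchase/sales of FII = gross purchase - gross sales
        return self.fii_gross_purchase - self.fii_gross_sales

    @property
    def dii_net(self) -> float:
        # Net purchase/sales of DII = gross purchase - gross sales
        return self.dii_gross_purchase - self.dii_gross_sales

    @property
    def total_net(self) -> float:
        # Combined net = FII net + DII net
        return self.fii_net + self.dii_net


# Made-up sample day, not taken from the data set
day = DailyFlow(1000.0, 1200.0, 900.0, 600.0)
print(day.fii_net, day.dii_net, day.total_net)
```

A negative net value means the group sold more than it bought on that day.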


Data Source(s) 


NSE: https://www.nseindia.com/reports/fii-dii

Moneycontrol: https://www.moneycontrol.com/stocks/marketstats/fii_dii_activity/index.php

We have downloaded the data and put it in a Google Sheet: https://docs.google.com/spreadsheets/d/112H8I3jHA5xH4_HBg7Cqie2DdAsB2UqpCg9seq-jGP0/edit?gid=1179995138#gid=1179995138

Sensex data: taken using the GOOGLEFINANCE formula in Google Sheets

Final Data Table

Data is taken from 1st Jan 2021 to 17th Jan 2025, a total of 1003 data points, which is enough to carry out regression analysis.


2. Descriptive Analytics


Since we are tracking the day-wise inflow and outflow (+/-) of FII and DII, it makes sense to compare it with the day-wise change of the Sensex rather than its absolute value, so we add a new column, Sensex_Change.
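A minimal sketch of how such a column could be computed is shown here. It assumes the change is the simple difference between consecutive closing values in index points (the report does not say whether points or percentages were used); the function name `daily_change` and the sample closes are hypothetical.

```python
def daily_change(closes):
    """Return close[t] - close[t-1] for every day after the first.

    `closes` must be in chronological order (oldest first).
    """
    return [today - prev for prev, today in zip(closes, closes[1:])]


# Made-up closing values, not real Sensex data
closes = [100.0, 105.0, 103.0]
changes = daily_change(closes)
print(changes)
```

The first day has no previous close, so the result is one element shorter than the input.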


Let's find the relation between FII Net / DII Net / Total Net and the change in Sensex


Using scatter plot 

The above plot is not very conclusive, so let us look at each variable individually in a blown-up view.

FII net and Total net appear to have a positive correlation with the Sensex change, whereas DII net shows an almost flat line with a slightly negative slope.


Let's check the correlation


As with the scatter plots, the correlation values show that FII_Net and Total_Net have the highest correlation with Sensex_Change, and DII_Net has a negative correlation with it.
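For reference, a correlation of this kind can be computed as follows. The sketch assumes the Pearson coefficient, which the report does not state explicitly; the function name `pearson` and the toy series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy series: one moves with the flow, one against it
flow = [1.0, 2.0, 3.0, 4.0]
moves_with = [2.0, 4.0, 6.0, 8.0]
moves_against = [8.0, 6.0, 4.0, 2.0]
print(pearson(flow, moves_with), pearson(flow, moves_against))
```

A value near +1 means the two series move together, a value near -1 means they move in opposite directions, and a value near 0 means no linear relation.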


To make the study simpler, let's convert all positives to 1 and negatives to 0 for FII_Net, DII_Net, Total_Net and Sensex_Change:
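The conversion is a one-line rule; a small sketch follows. Treating an exact zero as 0 is an assumption, since the report only mentions positives and negatives, and the sample row is made up.

```python
def to_binary(value):
    """1 for a positive value, 0 otherwise (zero counted as 0 by assumption)."""
    return 1 if value > 0 else 0


# Made-up row using the column names from the report
row = {"FII_Net": -250.0, "DII_Net": 400.0, "Total_Net": 150.0, "Sensex_Change": 320.5}
binary_row = {name: to_binary(value) for name, value in row.items()}
print(binary_row)
```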


Sensex growth/degrowth based on FII and DII

From the above box plot it is also clearly visible that when FII is investing, the Sensex tends to go up.


Let's look at the probability table data

There is a 76% chance that the Sensex will fall if FII does not invest (first table).

When DII invests, the Sensex might go either up or down: per the table, the Sensex goes up 71% of the time and goes down 74% of the time when DII invests. DII does not play any major role in moving the market up or down.

For the net value, when there is a total cash inflow (FII net + DII net) the Sensex increases 71% of the time, and when there is a total cash outflow the market falls 65% of the time.
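Probability tables of this kind are conditional frequencies of the binary Sensex column given a binary flow column. A minimal sketch, with the hypothetical function name `conditional_table` and made-up data:

```python
from collections import Counter


def conditional_table(flow_up, sensex_up):
    """For each flow value (0/1), return the share of days the Sensex was 0 or 1."""
    counts = Counter(zip(flow_up, sensex_up))
    table = {}
    for flow in (0, 1):
        total = counts[(flow, 0)] + counts[(flow, 1)]
        if total == 0:
            continue  # this flow value never occurred
        table[flow] = {s: counts[(flow, s)] / total for s in (0, 1)}
    return table


# Made-up binary columns, not the real data
fii_up = [1, 1, 1, 1, 0, 0, 0, 0]
sensex_up = [1, 1, 1, 0, 0, 0, 0, 1]
table = conditional_table(fii_up, sensex_up)
print(table)
```

Here `table[0][0]` is the share of days the Sensex fell when the flow was negative, which is the kind of figure quoted for FII in the first table.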



3. Methodology with a detailed discussion on the application of business analytics methods (Multiple methods should be compared)


From the above descriptive analysis it is evident that there is some correlation between FII net, Total net and the Sensex.

If we are able to model this relation, we can time the market and get better profits.

If we can find the model, we will also know when bulk amounts can be put into and withdrawn from the market to book profits.

Logically, for the regression problem we might think we should compare the cumulative FII amount, cumulative DII amount and cumulative Net amount with the Sensex, because the Sensex is a cumulative number that keeps increasing.

But the market is all about demand and supply, through which the price of each stock is set continuously by buying and selling.

So it is better to compare the daily increments of FII, DII and Net with the daily change in the Sensex.


We can think of three methods to build the model:


  1. Logistic regression: binary values for both the dependent and the independent variables

  2. Multiple linear regression: actual incremental values for both the dependent and the independent variables

  3. Logistic regression: binary dependent variable (Sensex) and actual values for the independent variables (incremental FII, DII and Net)


Once we find the relation between FII, DII and Sensex, we can perform time series analysis to find the pattern.


4. Results

We use all the data to train the model, since we are trying to establish the relation here rather than test out-of-sample prediction.


Method 1: Logistic Regression for Binary Values 


    Development of the model:-

Conclusion from logistic regression:-

The highest correlation is with the net amount put in: when the net amount (net of DII and FII) is positive, the market goes up.

The second highest correlation is with FII net being positive; DII is not significant at all.

So whenever FII net and Total net are positive, there is a high chance of the market going up.


DII does not have any effect on the market.
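To make Method 1 concrete, here is a minimal sketch of a logistic regression on binary inputs, fitted by plain gradient descent. It is not the tool or the output used for the report: the function names, learning rate, number of epochs and the six toy rows are all hypothetical, and the toy data is constructed so that the outcome simply follows the first column.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear term
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (Sensex up) if the predicted probability reaches the threshold."""
    p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return 1 if p >= threshold else 0


# Toy rows: [FII_up, DII_up, Net_up]; the label is 1 when the Sensex rose
X = [[1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 0]
w, b = fit_logistic(X, y)
print(w, b, predict(w, b, [1, 0, 1]), predict(w, b, [0, 1, 0]))
```

The sign and size of each fitted weight play the role of the per-variable effects discussed in the conclusions above; significance testing, which the report relies on for DII, is not part of this sketch.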


Evaluation of Model




Confusion matrix 



This model predicts the right value 68% of the time
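The figures read off a confusion matrix (accuracy, sensitivity, specificity) follow from four counts. A small sketch with hypothetical function names and made-up labels, where 1 means a day on which the Sensex rose:

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of up days predicted as up
        "specificity": tn / (tn + fp),  # share of down days predicted as down
    }


# Made-up labels, not the model output from the report
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
result = metrics(*confusion_counts(actual, predicted))
print(result)
```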


Deciding parameters based on confusion matrix



Method 2: Multiple linear regression on the incremental values



Method 3: Logistic regression on the incremental values (binary Sensex)







5. Conclusion


Model 1

We have a model that has Sensitivity of 66% and specificity of 70%.


Accuracy is 68%


This is a good enough model to establish the relationship. Its only weakness is that FII, DII and Net inflow are taken as binary, which is addressed in Model 3.

Model 2

Not conclusive: the adjusted R-squared value is 22%.

Model 3 

This model has a sensitivity of 65% and a specificity of 72%,


and an accuracy of 69%, which is 1% better than Model 1.


Overall, Model 1 and Model 3 are similar. For the pattern of investing (time series analysis) we can use Model 1, i.e. the binary model, and Model 3 can help in knowing how much can be put into or withdrawn from the market.


What next?

Based on the above points, FII is the major contributor to the Sensex going up. The effect is stronger when DII also pumps in money along with FII.

So the next question is when FII will invest. There are two possible answers, based on the macro and micro views.


Macro View 

FII investment depends on several factors, such as:

How attractive the Indian market is compared to other markets

Treasury bill coupon rate and US interest rates

US vs Indian currency exchange rates


Micro View

Once we fix the macro view and see a stable trend, we can drill down to the micro view.

Here we can do trend analysis once the macro view is established.

The concept comes from the additional class on trend analysis, i.e. combining the three days T, T+1 and T+2 and finding the pattern.

We drew inspiration from this and looked for the pattern of DII, FII and Net inflow over the previous three days, based on the binary code.

Yellow represents the pattern 


In total there are 8 possibilities (2x2x2), but only 6 are practically possible, and each was given a serial number as follows:

First digit - Net Amount

Second digit - FII

Third digit - DII

011 and 100 are not possible: when both FII and DII put money in, the net must also be positive, and when both pull money out, the net has to be negative.
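The six feasible patterns can be enumerated mechanically. In the sketch below the digit order (Net, FII, DII) follows the report, but the serial numbers 1 to 6 are assigned in sorted order purely for illustration; the actual mapping used in the sheet is not stated here.

```python
from itertools import product


def is_possible(net, fii, dii):
    """Rule out the two combinations that contradict Net = FII net + DII net."""
    if fii == 1 and dii == 1 and net == 0:
        return False  # both buy, so the net cannot be negative (011)
    if fii == 0 and dii == 0 and net == 1:
        return False  # both sell, so the net cannot be positive (100)
    return True


# Digits are ordered Net, FII, DII as in the report
valid = [
    "".join(str(bit) for bit in bits)
    for bits in product((0, 1), repeat=3)
    if is_possible(*bits)
]

# Hypothetical serial numbers, assigned in sorted order
serial = {pattern: number for number, pattern in enumerate(valid, start=1)}
print(serial)
```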


Based on this, each binary pattern is mapped to a serial number, and the serial numbers of the previous three days are combined into one number.

Green - pattern serial number

Blue - combination of the last three days

E.g. take row 21: the previous three days are in rows 22, 23 and 24, and their serial numbers are placed side by side to form 544.
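Combining three serial numbers into one key is a matter of placing the digits side by side. The sketch below assumes the list is in chronological order and that the most recent day comes first in the key, which matches reading rows 22, 23, 24 top to bottom if the sheet is sorted newest first; that ordering is an assumption, and the function name `three_day_key` is hypothetical.

```python
def three_day_key(serials, i):
    """Key for day i built from the serial numbers of days i-1, i-2 and i-3.

    `serials` is in chronological order (oldest first); each value is 1..6.
    """
    if i < 3:
        raise ValueError("need three previous days")
    return serials[i - 1] * 100 + serials[i - 2] * 10 + serials[i - 3]


# Made-up serial numbers, oldest first; the last entry is the day being predicted
serials = [4, 4, 5, 2]
key = three_day_key(serials, 3)
print(key)
```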

Based on all the previous data, a table is made of the probability of each serial number appearing next.


The above table shows how many times each pattern has appeared and the probability of each serial number (6 possibilities) appearing next.
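A table of this kind can be built by counting, for every three-day history, which serial number followed it. The sketch below is an assumption about the mechanics rather than the sheet's actual formulas; the function names and the toy sequence are hypothetical, and histories are stored as tuples in oldest-to-newest order (any fixed order works as long as it is used consistently).

```python
from collections import Counter, defaultdict


def build_transition_table(serials):
    """Map each three-day history to a Counter of the serial numbers that followed it."""
    table = defaultdict(Counter)
    for i in range(3, len(serials)):
        history = tuple(serials[i - 3:i])
        table[history][serials[i]] += 1
    return table


def next_probabilities(table, history):
    """Relative frequency of each serial number after the given history."""
    counts = table.get(history, Counter())
    total = sum(counts.values())
    return {serial: count / total for serial, count in counts.items()} if total else {}


# Made-up chronological sequence of serial numbers
serials = [1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 3, 4]
table = build_transition_table(serials)
probs = next_probabilities(table, (1, 2, 3))
most_likely = max(probs, key=probs.get)
print(probs, most_likely)
```

Predicting the most frequent follower for each history, as in the last line, is one simple way to turn the table into the predicted values used in a back test.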

Based on the above table, a back test was run on the 1003 data points that we took for the study.

Orange - predicted values

Confusion matrix for series pattern prediction 

This model has a high sensitivity and a good overall accuracy, which is more than 50%.

The model accuracy is only 61.5%, but we still feel this is a good model.


Why this is a good model: 

People say stock market reactions are completely random. If that were so, we should have got 50% accuracy, but here we get 61.5% for the overall trend.

Sensitivity is 74%, which means the model is even better at predicting the days on which the Sensex grows.

Specificity is 46%, which is close to 50%; here we can say things are essentially random (days on which the Sensex falls).
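As a cross-check on the 50% yardstick, note that accuracy is a weighted average of sensitivity and specificity, weighted by the share of up days. The three reported figures therefore imply that share. The sketch below only rearranges that identity; the inputs are the rounded figures quoted above, so the result is approximate, and the function name is hypothetical.

```python
def implied_positive_share(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    return (accuracy - specificity) / (sensitivity - specificity)


# Rounded figures reported for the pattern model
share_up = implied_positive_share(0.615, 0.74, 0.46)
print(round(share_up, 3))
```

The result is roughly 55% up days in the back test. If so, a rule that always predicts "up" would already be right on about 55% of days, which may be a fairer baseline than 50% when judging the 61.5% accuracy.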


How to use this model (as explained earlier, we have to start from the macro view and then drill down to the micro view: first understand whether the market is in a bull or a bear phase)

Leverage high sensitivity: from a stock market angle, sensitivity corresponds to the bull run phase. In a bull run we can predict the positive days with an accuracy of 74%. This could become more accurate if we build a model with the factors that influence FII, such as US T-bill yields and Fed interest rates. (This will be the next stage of the study.)

Close to 50% specificity: in a bear run phase (which is what specificity describes), things are essentially random, so it is better to stay out or to invest regularly to avoid the “fear of missing out” (FOMO).


Google Sheet link to the data used and the trend prediction: https://docs.google.com/spreadsheets/d/112H8I3jHA5xH4_HBg7Cqie2DdAsB2UqpCg9seq-jGP0/edit?usp=sharing