# Easier Factor Screening with RANK BY

### Risk and Reward

Factor models are how academics explain both the returns and risks involved in stock portfolios. They are also standard fare in quantitative investing strategies. I recently added a feature to the SQL Screener which makes factor-based screening and ranking much easier, but before we dive into the mechanics, let’s take a tour of how factor models work.

### The CAPM

We start with the simplest, yet most prevalent, way to model the returns of a stock (or portfolio): the Capital Asset Pricing Model (CAPM). The premise of the model is that some stocks are more sensitive than others to market swings. The market level, after all, is a barometer for how people expect the economy to perform in the future. If the economy is expected to boom, then you should expect all the businesses in the market to have higher cash flows, so valuations rise (and vice versa). But the sensitivity of each business’s cash flows to the economy varies widely. If you run a business that sells a staple, like toilet paper, you wouldn’t expect as much of a hit from a recession as a highly discretionary business, like expensive tech gadgets, would take. The CAPM captures this sensitivity to the economy, and therefore to market prices, with a variable called “beta” (or $\beta$).

Technically beta relates the “excess” return of a stock to the market’s “excess” return. By excess return we mean the amount of return above the risk-free rate, since you don’t need to take any risk unless you expect more return than the risk-free rate. But, by taking higher risks you have the potential for higher returns above the risk-free rate. A high beta stock may soar during boom times, even though it is also more likely to drop like a stone when the market goes down. In this way beta is similar to leverage. Here’s how it’s expressed mathematically: $$r_{stock}-r_{free}=\beta(r_{market}-r_{free})+\alpha+\epsilon$$
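To get a feel for the leverage analogy, here is a minimal Python sketch that plugs numbers into the equation above. The function name and the sample figures (a beta of 1.5 and a 2% risk-free rate) are made up for illustration, and epsilon is left out because its mean is zero.

```python
def capm_expected_return(beta, market_return, risk_free, alpha=0.0):
    """Expected stock return under the CAPM.

    epsilon has zero mean, so it drops out of the expectation.
    """
    return risk_free + beta * (market_return - risk_free) + alpha


# Hypothetical inputs: a beta of 1.5 and a 2% risk-free rate.
boom = capm_expected_return(beta=1.5, market_return=0.10, risk_free=0.02)
bust = capm_expected_return(beta=1.5, market_return=-0.10, risk_free=0.02)
print(f"market +10%: stock {boom:+.1%}")
print(f"market -10%: stock {bust:+.1%}")
```

With these inputs the stock is expected to gain about 14% when the market gains 10%, and to lose about 16% when the market loses 10%. The excess return is simply scaled by beta in both directions.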

There are two other terms involved in this equation. The epsilon ($\epsilon$) is a zero-mean random variable, which is responsible for the random deviations from the model predictions. You can also think of it as the risk that is unique to that stock, or the "idiosyncratic" risk. Alpha ($\alpha$) is a constant and if you believe in efficient markets then it should be zero. The idea behind the Efficient Market Hypothesis (EMH) is that all the agents involved in bidding a stock’s price up and down might have inaccurate guesses for a stock’s value, but on average they get it right. This seems a bit naive to me, because it assumes that there is no systematic bias involved and it would be hard to argue that there weren’t systematic biases in play during the extremely high prices of the 1999 tech bubble, for example.

If EMH is correct (or at least approximately correct), you can’t get high returns without taking additional risk (i.e. increasing your portfolio’s beta). However, the model allows you to target a specific beta, because the equation is linear. You can diversify across many stocks such that the weighted average of the betas is a beta that meets your risk tolerance, while the random part gets averaged away. Meanwhile, the holy grail for hedge funds is to achieve positive alpha, which means you can get extra expected returns at the same level of risk or even hedge the beta component away. This is why you see the word “alpha” bandied about everywhere in finance.
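As a rough illustration of targeting a beta, the sketch below computes a portfolio beta as the weighted average of its holdings’ betas, and shows how the random part shrinks as holdings are added. The shrinkage formula assumes equally weighted stocks whose epsilons are independent and equally volatile, which real portfolios only approximate. All names and numbers are hypothetical.

```python
import math


def portfolio_beta(weights, betas):
    """Weighted average of the individual betas (weights are normalized)."""
    total = sum(weights)
    return sum(w * b for w, b in zip(weights, betas)) / total


def idiosyncratic_std(sigma, n):
    """Std. dev. of the averaged epsilon term for n equally weighted stocks.

    Assumes the epsilons are independent and share the same std. dev. sigma.
    """
    return sigma / math.sqrt(n)


# Hypothetical three-stock portfolio.
weights = [0.5, 0.3, 0.2]
betas = [0.6, 1.0, 1.8]
print(f"portfolio beta: {portfolio_beta(weights, betas):.2f}")

# Hypothetical 30% idiosyncratic volatility per stock.
for n in (1, 25, 100):
    print(f"{n:>3} stocks: idiosyncratic std {idiosyncratic_std(0.30, n):.1%}")
```

Under these assumptions the beta exposure stays at the weighted average no matter how many stocks you hold, while the stock-specific noise falls with the square root of the number of holdings.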

Also, if the model is correct you can solve for beta and alpha by running a linear regression of a stock’s historical returns against the market’s returns (after subtracting the risk-free rate from both). The slope of the fit is the beta and the y-intercept is the alpha. In fact, this is how most websites compute the beta they publish. But I would caution against blindly trusting regression betas, because 1) the model doesn’t perfectly reflect reality, 2) the random part can fool a regression, and 3) if a stock is huge and makes up a significant portion of the index, then its beta will tend towards 1 no matter what its risk level is. The CAPM is most useful because it allows us to compute reasonable cost-of-equity rates. If the model is correct, then the return you should expect from equity is equal to the risk-free rate plus beta times the market risk premium (the excess return expected from the market). But if you do take this approach, be careful to make sure the beta you use makes sense (e.g. a small tech company should *not* have a beta less than 1, because it’s definitely more risky than the market on average).
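The regression itself is short enough to write out by hand. This is a minimal ordinary-least-squares sketch in plain Python, not how any particular website computes its betas. The function name and the sample return series are hypothetical, and the risk-free rate is assumed to be a constant per-period figure.

```python
def estimate_beta_alpha(stock_returns, market_returns, risk_free=0.0):
    """Fit excess stock returns against excess market returns with OLS.

    Returns (beta, alpha): the slope and the y-intercept of the fit.
    """
    y = [r - risk_free for r in stock_returns]
    x = [r - risk_free for r in market_returns]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    if variance == 0:
        raise ValueError("market returns must vary to estimate a slope")
    beta = covariance / variance
    alpha = mean_y - beta * mean_x
    return beta, alpha


# Hypothetical noise-free data built with beta = 1.5 and alpha = 0.002.
risk_free = 0.001
market = [0.02, -0.01, 0.03, -0.02, 0.01]
stock = [risk_free + 1.5 * (m - risk_free) + 0.002 for m in market]
beta, alpha = estimate_beta_alpha(stock, market, risk_free)
print(f"beta = {beta:.3f}, alpha = {alpha:.4f}")
```

Because the sample data contains no epsilon term, the fit recovers the beta and alpha it was built with. With real returns the random part adds noise to both estimates, which is one reason for the caution above.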

### Fama and French

The problem with the CAPM is that it doesn’t fully explain stock returns. Investors buying small and cheap stocks seem to produce portfolios that have positive alpha over time. Being the EMH true believers that they are, Eugene Fama and Kenneth French proposed that small cap and value investing aren’t a way to generate alpha, because if everybody knew about those excess returns, they would bid up the prices until they were priced such that you couldn’t get higher return without taking higher risk. Therefore, they postulated, there have to be risks you can expose yourself to outside of standard “market risk”. They proposed two additional factors to the model: the value factor and the size factor. $$r_{stock}-r_{free}=\beta(r_{market}-r_{free})+b_1HML+b_2SMB+\alpha+\epsilon$$

In order to make these factors independent from the market factor, they defined them as hedged portfolios. If you ranked all stocks by price-to-book (high PB meaning growth stocks and low PB meaning value stocks), shorted the highest PB stocks, and bought the lowest PB stocks, then the resulting portfolio would capture the value premium alone, because the betas should roughly cancel out. This factor is called HML, or High Minus Low (the name refers to book-to-market, the inverse of PB, so it means high book-to-market value stocks minus low book-to-market growth stocks). The same procedure is used to define the SMB factor, or Small Minus Big: it is the portfolio that would be formed if you shorted large stocks and bought small stocks. Fama and French claimed this model explains over 90% of diversified portfolios’ returns, compared with the average 70% given by the CAPM.
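The hedged-portfolio idea can be sketched in a few lines of Python. This is a deliberately simplified, equal-weighted version of the procedure described above; the published Fama-French factors use a more involved construction. The function name, the tickers, and the cutoff `fraction` are all hypothetical.

```python
def long_short_return(stocks, fraction=0.3):
    """Return of a hedged portfolio: long the cheapest names, short the priciest.

    stocks: list of (ticker, price_to_book, period_return) tuples.
    fraction: share of the universe placed in each leg (an assumed cutoff).
    """
    ranked = sorted(stocks, key=lambda s: s[1])   # lowest PB (value) first
    k = max(1, int(len(ranked) * fraction))       # number of names per leg
    long_leg, short_leg = ranked[:k], ranked[-k:]

    def average_return(leg):
        return sum(s[2] for s in leg) / len(leg)

    return average_return(long_leg) - average_return(short_leg)


# Hypothetical universe: (ticker, PB, one-period return).
universe = [
    ("AAA", 0.8, 0.10),
    ("BBB", 1.2, 0.06),
    ("CCC", 4.0, 0.04),
    ("DDD", 9.0, 0.02),
]
hml_like = long_short_return(universe, fraction=0.5)
print(f"value-minus-growth return: {hml_like:.2%}")
```

Swapping the sort key from price-to-book to market capitalization would give the analogous small-minus-big portfolio.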

There is debate about whether the value factor does generate alpha or if it entails additional risks. It makes sense that small stocks would be riskier than large stocks and, similarly, value stocks might have a higher chance of failing than growth stocks. After all, the reason the PB or PE is low is that people are pessimistic about the prospects of the company. Others claim that historical data does not support the idea that value stocks are more risky and that the alpha you get from tilting towards value is a result of behavioral biases: people being too pessimistic on average when it comes to companies that aren’t doing well (and vice versa).

### Even More Factors

After Fama and French, people continued to introduce new factors to explain returns (e.g. the Carhart four-factor model). A popular one is momentum. The idea is that stocks that have had rising prices attract more investment, which leads to even higher prices. George Soros went further and thought that rising prices could positively affect fundamentals. Momentum stocks have lower costs of capital, because people trust them more, and they can attract better talent. This creates a positive reinforcement loop, which he referred to as “reflexivity”.

You can design a systematic approach to building portfolios around factors like these if you believe that factors do produce alpha, or even if you just want to be exposed to a diversified set of risks. One example is Joel Greenblatt’s Magic Formula. In his approach he wanted to buy good stocks cheaply. So, he started by filtering out very small stocks, non-US stocks (differing reporting requirements), utilities (returns are regulated) and financials (EBIT doesn’t mean the same thing for financials). The remaining universe of stocks is ranked twice. First you rank by Enterprise Value / EBIT (value) in ascending order and assign the value rank to each stock. Then you rerank by Return on Invested Capital (financial strength) in descending order and assign that rank to each stock. Finally you sum the two ranks to get a combined rank and rerank by the combined rank. You then buy the top 20 or 30 stocks by combined rank and hold them for a year before rebalancing.
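The two-rank procedure can be sketched in Python as follows. This is an illustration of the steps just described, not Greenblatt’s own code. It assumes the universe has already been filtered, and the dictionary keys and tickers are hypothetical. Ties simply keep their input order.

```python
def magic_formula(stocks, top_n=20):
    """Rank by EV/EBIT (ascending) and ROIC (descending), then by the summed ranks.

    stocks: list of dicts with 'symbol', 'ev_to_ebit' and 'roic' keys
    (hypothetical field names). Returns (symbol, value_rank, quality_rank,
    combined_rank) tuples, best combined rank first.
    """
    by_value = sorted(stocks, key=lambda s: s["ev_to_ebit"])
    by_quality = sorted(stocks, key=lambda s: s["roic"], reverse=True)
    value_rank = {s["symbol"]: i for i, s in enumerate(by_value, start=1)}
    quality_rank = {s["symbol"]: i for i, s in enumerate(by_quality, start=1)}

    rows = [
        (
            s["symbol"],
            value_rank[s["symbol"]],
            quality_rank[s["symbol"]],
            value_rank[s["symbol"]] + quality_rank[s["symbol"]],
        )
        for s in stocks
    ]
    rows.sort(key=lambda row: row[3])  # lowest combined rank is best
    return rows[:top_n]


# Hypothetical, already-filtered universe.
universe = [
    {"symbol": "AAA", "ev_to_ebit": 5.0, "roic": 0.10},
    {"symbol": "BBB", "ev_to_ebit": 8.0, "roic": 0.30},
    {"symbol": "CCC", "ev_to_ebit": 12.0, "roic": 0.25},
    {"symbol": "DDD", "ev_to_ebit": 20.0, "roic": 0.05},
]
for row in magic_formula(universe, top_n=2):
    print(row)
```

In this toy universe BBB comes out on top: it is only the second cheapest stock, but it has the best return on invested capital, so its combined rank of 3 beats the cheapest stock’s combined rank of 4.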

The reason you use the ranking approach instead of doing, say, a weighted average of the two ratios is that financial data contains some serious outliers. By ranking you normalize the values to be uniformly in the range from 1 to the number of stocks. Plus, it’s more important to identify the “most cheap” stock than actually knowing how cheap it is, at least in terms of building a portfolio. You aren’t limited to just using two ranks. There are so many other ways you can combine filters and ranks, which brings me to our new feature.
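To make the outlier point concrete, here is a small Python sketch that converts raw values into ranks. The helper name and the sample ROIC figures are hypothetical, with one absurd value standing in for the kind of outlier that shows up in real financial data.

```python
def to_ranks(values, descending=False):
    """Map each value to its rank, from 1 up to len(values)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# Hypothetical ROIC values; the third one is an extreme outlier.
roic = [0.10, 0.12, 25.0, 0.15]
ranks = to_ranks(roic, descending=True)

print("raw spread: ", max(roic) - min(roic))
print("rank spread:", max(ranks) - min(ranks))
print("ranks:      ", ranks)
```

In a weighted average of raw ratios, the outlier would swamp every other input. After ranking, it is just first place, one step ahead of the runner-up.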

### RANK BY

The Magic Formula above is actually pretty hard to do in SQL, because you need several nested queries. Each level has to use the ROWNUM() function in conjunction with ORDER BY to get the ranks, and then a final outer query sums the ranks. This gets really ugly fast. So, I created a new operator, called RANK BY, to automate this. Let’s see the Magic Formula example:

    SELECT Symbol,
           Name,
           CONCAT("$", ROUND(MarketCap, 2), "M") AS MarketCap,
           ROUND(EnterpriseValue / EBIT, 2) AS EVToEBIT,
           CONCAT(ROUND(ROIC * 100, 2), "%") AS ROICPct
    FROM stocks
    WHERE MarketCap IS NOT NULL AND
          EnterpriseValue IS NOT NULL AND
          EBIT IS NOT NULL AND
          EBIT > 0 AND
          ROIC IS NOT NULL AND
          Country = "USA" AND
          Sector != "Finance" AND
          Sector != "Public Utilities"
    RANK BY EVToEBIT ASC, ROIC DESC
    LIMIT 20;

Here we see that we want Symbol, Name, etc. as the columns of the table. From the stocks table we filter out rows with invalid MarketCaps, negative EBIT values, etc. Finally we see the RANK BY statement, which tells the system that you want to do a combined rank using EVToEBIT in ascending order and ROIC in descending order. This will sum the ranks based on those factors and rerank by the combined rank (you’ll see these additional columns in the output too). The limit means that only the top 20 are shown. To illustrate the heavy lifting that RANK BY does, here’s what the query above expands to:

    SELECT Symbol, Name, MarketCap, EVToEBIT, ROICPct,
           EVToEBITRank, ROICRank,
           EVToEBITRank + ROICRank AS CombinedRank
    FROM (
        SELECT Symbol, Name, MarketCap, EVToEBIT, ROICPct,
               EVToEBITRank, ROIC, ROWNUM() AS ROICRank
        FROM (
            SELECT Symbol, Name, MarketCap, EVToEBIT, ROICPct,
                   ROIC, ROWNUM() AS EVToEBITRank
            FROM (
                SELECT Symbol,
                       Name,
                       CONCAT("$", ROUND(MarketCap, 2), "M") AS MarketCap,
                       ROUND(EnterpriseValue / EBIT, 2) AS EVToEBIT,
                       CONCAT(ROUND(ROIC * 100, 2), "%") AS ROICPct,
                       ROIC
                FROM stocks
                WHERE MarketCap IS NOT NULL AND
                      EnterpriseValue IS NOT NULL AND
                      EBIT IS NOT NULL AND
                      EBIT > 0 AND
                      ROIC IS NOT NULL AND
                      Country = "USA" AND
                      Sector != "Finance" AND
                      Sector != "Public Utilities"
            )
            WHERE EVToEBIT IS NOT NULL
            ORDER BY EVToEBIT ASC
        )
        WHERE ROIC IS NOT NULL
        ORDER BY ROIC DESC
    )
    ORDER BY CombinedRank ASC
    LIMIT 20;

Imagine how much worse the query nesting could get if you kept adding more factors! Note that you can also add a weight to each rank (an omitted weight is implicitly 1.0). If we wanted to give the value rank a weight of 2.0, we could change the RANK BY statement to:

    RANK BY 2 * EVToEBIT ASC, ROIC DESC

This would tilt the ranking more towards the value factor than the financial strength factor.
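Here is a rough Python sketch of what the weighting does to the final order. It assumes the weight simply multiplies each factor’s rank before the ranks are summed; the screener’s actual query rewrite may differ in details such as tie handling. The `rank_by` helper and the sample rows are hypothetical.

```python
def rank_by(rows, factors, limit=None):
    """Order rows by a weighted sum of per-factor ranks (lowest sum first).

    rows: list of dicts holding raw numeric fields.
    factors: list of (field, ascending, weight) tuples.
    """
    combined = [0.0] * len(rows)
    for field, ascending, weight in factors:
        order = sorted(
            range(len(rows)), key=lambda i: rows[i][field], reverse=not ascending
        )
        for rank, index in enumerate(order, start=1):
            combined[index] += weight * rank

    ranked = sorted(zip(combined, rows), key=lambda pair: pair[0])
    result = [dict(row, CombinedRank=score) for score, row in ranked]
    return result if limit is None else result[:limit]


# Hypothetical rows with raw (unformatted) values.
ROWS = [
    {"Symbol": "AAA", "EVToEBIT": 4.0, "ROIC": 0.12},
    {"Symbol": "BBB", "EVToEBIT": 9.0, "ROIC": 0.35},
    {"Symbol": "CCC", "EVToEBIT": 6.0, "ROIC": 0.08},
    {"Symbol": "DDD", "EVToEBIT": 15.0, "ROIC": 0.25},
    {"Symbol": "EEE", "EVToEBIT": 25.0, "ROIC": 0.20},
]

equal = rank_by(ROWS, [("EVToEBIT", True, 1.0), ("ROIC", False, 1.0)])
tilted = rank_by(ROWS, [("EVToEBIT", True, 2.0), ("ROIC", False, 1.0)])
print("equal weights:", [r["Symbol"] for r in equal])
print("2x value:     ", [r["Symbol"] for r in tilted])
```

With equal weights, BBB leads on the strength of its ROIC. Doubling the value weight moves AAA, the cheapest stock, into first place.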

### Other Rules For Using RANK BY

RANK BY is, admittedly, implemented as a bit of a hack. I parse the query and rewrite it into the nested query form I discussed above. This means there are some rules to follow to make sure it works:

• Don’t put anything after the RANK BY statement except an optional LIMIT statement
• Don’t use RANK BY on formatted columns (i.e. columns that are actually strings). Notice that I ranked by ROIC, which is the underlying value, while the SELECTed column is formatted and labeled as ROICPct. If I were to rank by ROICPct it would rank based on the string representation of the ROIC, which would produce unexpected and incorrect results. Long story short: only rank on raw, numerical fields or equations that produce numerical results.
• Always label SELECTed equations (like ROUND(EnterpriseValue / EBIT, 2)) using AS.
• Always place the weight before the field name in the rank list (i.e. “3 * FieldName ASC” works, but “FieldName * 3 ASC” does not).
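The formatted-column rule is easy to see outside of SQL too. The Python sketch below sorts the same hypothetical ROIC values twice, once as numbers and once as percentage strings. It uses Python’s character-by-character string comparison as a stand-in; exactly how the screener’s SQL engine compares strings is an assumption here.

```python
# Hypothetical (symbol, raw ROIC) pairs.
ROWS = [("AAA", 0.09), ("BBB", 0.25), ("CCC", 0.105), ("DDD", 1.2)]


def format_pct(value):
    """Format a ratio as a percentage string, e.g. 0.25 -> '25.00%'."""
    return f"{value * 100:.2f}%"


def top_by_raw(rows):
    """Symbols ordered by the numeric ROIC, highest first."""
    return [symbol for symbol, _ in sorted(rows, key=lambda r: r[1], reverse=True)]


def top_by_formatted(rows):
    """Symbols ordered by the formatted string, 'highest' first."""
    return [
        symbol
        for symbol, _ in sorted(rows, key=lambda r: format_pct(r[1]), reverse=True)
    ]


print("raw:      ", top_by_raw(ROWS))
print("formatted:", top_by_formatted(ROWS))
```

Sorted as strings, "9.00%" lands ahead of "120.00%" because the comparison only looks at the leading characters, so the stock with the lowest ROIC ends up ranked first.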

### Final Word

Ultimately, I see factor investing as the General Relativity of finance. It’s a type of model that can explain the returns of large portfolios where the idiosyncratic risks of individual securities have been diversified away. Fundamental analysis of individual companies is more like Quantum Mechanics. It’s better suited for the small scale of individual securities and is largely incompatible with factor models. I believe individual companies have situations that are often unique and cannot be fully explained by a combination of generalized factors. This is one of the places you can find excess returns through exploiting behavioral biases. There are all kinds of non-fundamental reasons people dump or buy stocks in a frenzy that are specific to that stock. But, I believe you can find behavioral biases at the macroscopic level too, as mentioned before. If you do try factor investing, make sure to make your portfolio large enough to avoid idiosyncratic risks. If you prefer individual security analysis then you might use factors to screen for ideas to look at more closely.

Hope you find this new feature useful. I haven’t seen something like it available in any other free screeners, except for more complicated tools like Quantopian Notebooks. Feel free to write me at admin@finutils.com if you have any questions about the feature, find bugs, or want me to add new columns to the screener.