08/05/2017

2017_05_08: sell options and buy a put option / WTI leverage

Sell [Won]

[5%]
WTI leverage
- bought at 11,400 each ----> now 10,750

Waiting for a chance to buy more in tranches as the price falls


Call option
- bought at 309 each ----> sold at 610 each (+97%)
Put option
- bought at 130 each ----> sold at 30 each (-76%)

= +21% profit in 2 days (see the arithmetic sketch below)

Buy
[+2%]
Put option
- at 40 each

[+2% = 7%]
WTI leverage
- at 10,685 each
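
The "+21% profit in 2 days" above is just position-weighted arithmetic on the two option legs. The journal does not state how the capital was split between the call and the put, so the sketch below back-solves the implied weights rather than quoting them:

# Back out the call/put capital split implied by the combined 2-day result.
# The two leg returns come from the journal; the weights are solved for, not stated there.
call_ret, put_ret, combined = 0.97, -0.76, 0.21

# combined = w*call_ret + (1 - w)*put_ret  ->  solve for the call weight w
w_call = (combined - put_ret) / (call_ret - put_ret)
print("implied call weight: %.2f, put weight: %.2f" % (w_call, 1 - w_call))
# roughly 0.56 / 0.44: 0.56*0.97 - 0.44*0.76 is about +0.21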


04/05/2017

High volatility in oil, which slides to a 5-week low as weekly U.S. stockpile data disappoint


High Volatility In Oil

The U.S. West Texas Intermediate (WTI) crude June contract shed 17 cents, or around 0.3%, to $47.51 a barrel by 14:35 GMT, after falling to $47.34, a level not seen since March 27. Prices were around $47.98 prior to the release of the inventory data.
Elsewhere, Brent oil for July delivery on the ICE Futures Exchange in London dipped 22 cents to $50.23 a barrel, after sliding to $50.14 in the prior session, also its deepest trough since March 27.

Nevertheless, oil is likely to follow one of two paths:

1) It bottoms out around $47 and rebounds from there
2) It collapses to $40 and then rebounds



The main reasons for this drop are vague headlines about rising U.S. (WTI) output, Libyan production, shale oil and electric cars.

However, the world economy is in an upswing and demand is expected to increase accordingly, which should translate into higher crude oil prices as well.

By the end of 2017, I strongly believe WTI will trade in the $50~58 range.


==========================================================
[Korean ver]

WTI crude is currently showing the highest volatility of any market I invest in.

After accumulating in tranches at $50~47 in late March, I sold again after the rally to $53 in early April.

As of today the price has fallen back to $47.

Nevertheless,

the most likely paths for crude from here are:

1) It bottoms out around $47 and rebounds.

2) It collapses to $40 and then rebounds.

The reasons crude is falling are

vague headlines about shale oil, electric cars, and rising U.S. or Libyan production,

and the supply-and-demand impact of outside players who use those headlines to push the price down.

The reason crude should rise is that

the world economy is in an uptrend and demand is expected to grow accordingly, which should soon show up as higher crude prices.

I expect crude in the $50~58 range by the end of 2017.




03/05/2017

Individual Stock Movement Forecast - KNN / SVM / RandomForest with a momentum strategy

#KNN Machine Learning Strategy

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pandas_datareader.data as web
from sklearn import neighbors, svm
from sklearn.ensemble import RandomForestClassifier

def price(stock, start):
    # download adjusted close prices, normalise to the first value and resample to month-end
    price = web.DataReader(name=stock, data_source='yahoo', start=start)['Adj Close']
    return price.div(price.iat[0]).resample('M').last().to_frame('price')

def fractal(a, p):
    # Fractal-efficiency features for lookbacks 1..p: |p-period change| / sum of absolute 1-period moves,
    # kept only when the p-period change is positive.
    # Note: the helper columns are added to `a` in place; the iloc[:, 6:8] feature slices below
    # rely on the resulting column order.
    df = pd.DataFrame()
    for count in range(1, p+1):
        a['direction'] = np.where(a['price'].diff(count) > 0, 1, 0)
        a['abs'] = a['price'].diff(count).abs()
        a['volatility'] = a.price.diff().abs().rolling(count).sum()
        a['fractal'] = a['abs']/a['volatility']*a['direction']
        df = pd.concat([df, a['fractal']], axis=1)
    return df

a = price('NVDA','2010-01-01')
a['cash'] = [(1.03**(1/12))**x for x in range(len(a.index))]          # 3%/yr cash benchmark
a['meanfractal'] = fractal(a, 12).sum(axis=1, skipna=False)/12        # average fractal signal over 1..12 months
a['rollingstd'] = a.price.pct_change().shift(1).rolling(12).std()     # trailing 12-month volatility
a['result'] = np.where(a.price > a.price.shift(1), 1, 0)              # label: 1 if the month closed up
a = a.dropna()



# Three candidate classifiers; the walk-forward loop below uses the KNN model
# (swap clf for clf1 or clf3 to test the SVM or RandomForest variants)
clf = neighbors.KNeighborsClassifier(n_neighbors=3)
clf1 = svm.SVC()
clf3 = RandomForestClassifier(n_estimators=5)

a['predicted'] = np.nan
predictions = []
for i in range(12, len(a.index)):
    x = a.iloc[i-12:i, 6:8]          # features: meanfractal, rollingstd
    y = a['result'].iloc[i-12:i]     # labels for the trailing 12 months
    clf.fit(x, y)
    a.iloc[i, a.columns.get_loc('predicted')] = clf.predict(x)[-1]
    predictions.append(clf.predict(x)[-1])

# prediction from the most recent 12-month window: the signal for next month
x1 = a.iloc[len(a.index)-12:len(a.index), 6:8]
fit4 = clf.predict(x1)[-1]


a = a.dropna()
a.price = a.price.div(a.price.iloc[0])   # rebase price to the start of the evaluated period

accuracy = clf.score(a.iloc[:,6:8], a['result'])

# Strategy: if last month's prediction was "up", hold 70% stock / 30% cash (0.26% per month, ~3%/yr);
# otherwise stay fully in cash
a['Aggressive'] = np.where(a.predicted.shift(1)==1,
                           (a.price/a.price.shift(1))*0.7 + 1.0026*0.3,
                           1.0026).cumprod()
a[['Aggressive','price']].plot()
plt.show()
print("Predicted model accuracy: %d%%" % (accuracy*100))

period = len(a.index)/12   # years of monthly data

# drawdown from the running peak (the 500-month window is effectively "since inception" here)
md = a.price.rolling(min_periods=1, window=500).max()
pmd = a.price/md - 1.0
mdd = pmd.rolling(min_periods=1, window=500).min()

pmd.plot(subplots=True, figsize=(8,2), linestyle='dotted')
mdd.plot(subplots=True, figsize=(8,2), color='red')
plt.show()

print("\nMDD : %0.1f%%" % (mdd.min()*100))
print("CAGR : %0.1f%%" % (a.price.iloc[-1]**(1/period)*100 - 100))

print('\nFor next Month:')
print('Do Invest' if fit4==1 else 'wait for next chance')




Predicted model accuracy: 59%


MDD : -22.2%
CAGR : 44.8%

For next Month:
wait for next chance

Calculation of Hit Ratio by Logistic Regression, RandomForest and SVM for Samsung & Hyundai

from __future__ import division

import os,sys,datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import pprint
import statsmodels.tsa.stattools as ts
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.svm import LinearSVC, SVC


def load_stock_data(file_name):
    df = pd.read_pickle(file_name)
    return df

#create dataset
def make_dataset(df, time_lags=1):
    df_lag = pd.DataFrame(index=df.index)
    df_lag["Close"] = df["Close"]
    df_lag["Volume"] = df["Volume"]

    df_lag["Close_Lag%s" % str(time_lags)] = df["Close"].shift(time_lags)
    df_lag["Close_Lag%s_Change" % str(time_lags)] = df_lag["Close_Lag%s" % str(time_lags)].pct_change()*100.0

    df_lag["Volume_Lag%s" % str(time_lags)] = df["Volume"].shift(time_lags)
    df_lag["Volume_Lag%s_Change" % str(time_lags)] = df_lag["Volume_Lag%s" % str(time_lags)].pct_change()*100.0

    df_lag["Close_Direction"] = np.sign(df_lag["Close_Lag%s_Change" % str(time_lags)])
    df_lag["Volume_Direction"] = np.sign(df_lag["Volume_Lag%s_Change" % str(time_lags)])

    return df_lag.dropna(how='any')

#split dataset
def split_dataset(df,input_column_array,output_column,split_ratio):
    split_date = get_date_by_percent(df.index[0],df.index[df.shape[0]-1],split_ratio)

    input_data = df[input_column_array]
    output_data = df[output_column]

    # Create training and test sets
    X_train = input_data[input_data.index < split_date]
    X_test = input_data[input_data.index >= split_date]
    Y_train = output_data[output_data.index < split_date]
    Y_test = output_data[output_data.index >= split_date]

    return X_train,X_test,Y_train,Y_test


def get_date_by_percent(start_date,end_date,percent):
    days = (end_date - start_date).days
    target_days = np.trunc(days * percent)
    target_date = start_date + datetime.timedelta(days=target_days)
    #print days, target_days,target_date
    return target_date


#forecasting models
def do_logistic_regression(x_train,y_train):
    classifier = LogisticRegression()
    classifier.fit(x_train, y_train)
    return classifier


def do_random_forest(x_train,y_train):
    classifier = RandomForestClassifier()
    classifier.fit(x_train, y_train)
    return classifier


def do_svm(x_train,y_train):
    classifier = SVC()
    classifier.fit(x_train, y_train)
    return classifier


#operate and evaluate the models

def test_predictor(classifier,x_test,y_test):
    pred = classifier.predict(x_test)

    # Hit ratio: share of test days on which the predicted direction matched the realised one
    hit_count = 0
    total_count = len(y_test)
    for index in range(total_count):
        if pred[index] == y_test.iloc[index]:
            hit_count = hit_count + 1

    hit_ratio = hit_count/total_count
    score = classifier.score(x_test, y_test)
    #print("hit_count=%s, total=%s, hit_ratio=%s" % (hit_count,total_count,hit_ratio))
    #print("%s\n" % confusion_matrix(pred, y_test))   # confusion matrix for each model

    return hit_ratio, score



if __name__ == "__main__":
    # Evaluate the directional hit ratio for each time lag and each company
    for time_lags in range(1,6):
        print("- Time Lags=%s" % (time_lags))

        for company in ['samsung','hyundai']:
            df_company = load_stock_data('%s_2010to2017.csv'%(company))

            df_dataset = make_dataset(df_company,time_lags)
            X_train,X_test,Y_train,Y_test = split_dataset(df_dataset,["Close_Lag%s"%(time_lags),"Volume_Lag%s"%(time_lags)],"Close_Direction",0.75)
            #print X_test

            lr_classifier = do_logistic_regression(X_train,Y_train)
            lr_hit_ratio, lr_score = test_predictor(lr_classifier,X_test,Y_test)

            rf_classifier = do_random_forest(X_train,Y_train)
            rf_hit_ratio, rf_score = test_predictor(rf_classifier,X_test,Y_test)

            svm_classifier = do_svm(X_train,Y_train)
            # note: the results shown below were generated with rf_classifier passed here by mistake,
            # which is why the SVM and RandomForest columns are identical
            svm_hit_ratio, svm_score = test_predictor(svm_classifier,X_test,Y_test)

            print("%s : Hit Ratio - Logistic Regression=%0.2f, RandomForest=%0.2f, SVM=%0.2f" % (company,lr_hit_ratio,rf_hit_ratio,svm_hit_ratio))



- Time Lags=1
samsung : Hit Ratio - Logistic Regression=0.54, RandomForest=0.53, SVM=0.53
hyundai : Hit Ratio - Logistic Regression=0.47, RandomForest=0.47, SVM=0.47

- Time Lags=2
samsung : Hit Ratio - Logistic Regression=0.54, RandomForest=0.49, SVM=0.49
hyundai : Hit Ratio - Logistic Regression=0.47, RandomForest=0.47, SVM=0.47

- Time Lags=3
samsung : Hit Ratio - Logistic Regression=0.54, RandomForest=0.49, SVM=0.49
hyundai : Hit Ratio - Logistic Regression=0.46, RandomForest=0.45, SVM=0.45

- Time Lags=4
samsung : Hit Ratio - Logistic Regression=0.54, RandomForest=0.52, SVM=0.52
hyundai : Hit Ratio - Logistic Regression=0.47, RandomForest=0.45, SVM=0.45

- Time Lags=5
samsung : Hit Ratio - Logistic Regression=0.54, RandomForest=0.48, SVM=0.48
hyundai : Hit Ratio - Logistic Regression=0.47, RandomForest=0.40, SVM=0.40

Mean Reversion Indicators: Hurst Exponent and Half-Life for Samsung & Hyundai

# Mean reversion test: if the price data follow a random walk, a mean-reversion model cannot be applied.

# Two tests are commonly used to judge whether the series is time-dependent rather than random: the ADF test and the Hurst exponent.

# ADF test: hypothesise that the time series follows a random walk (has a unit root), then test that hypothesis
# to decide whether it really is a random walk or not.


# Calculate and output the ADF test on the closing prices
# Returned tuple: 1st value: test statistic / 2nd: p-value / 3rd: lags used / 4th: number of observations / 5th: critical values

def load_stock_data(file_name):
    df = pd.read_pickle(file_name)
    return df

df_samsung = load_stock_data('samsung_2010to2017.csv')
df_hyundai = load_stock_data('hyundai_2010to2017.csv')
adf_result = ts.adfuller(df_samsung["Close"])
pprint.pprint(adf_result)


# The test statistic (-0.01) is greater than all of the critical values (1%, 5%, 10%),
# so the unit-root hypothesis cannot be rejected: Samsung's price behaves like a random walk
# and is not a good candidate for a mean-reversion model.


(-0.013073453583186293,
 0.95743747408121949,
 4,
 1881,
 {'1%': -3.4338312580685653,
  '10%': -2.5675886567541726,
  '5%': -2.8630777789723392},
 42203.249409277974)
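
The tuple printed above is easier to read once it is unpacked; a small sketch (the variable names are mine, not from the post) that spells out each element and the resulting decision:

# adfuller returns (test statistic, p-value, lags used, number of observations, critical values, icbest)
adf_stat, p_value, used_lags, n_obs, critical_values, icbest = adf_result

print("ADF statistic: %.4f, p-value: %.4f" % (adf_stat, p_value))
for level in sorted(critical_values):
    print("  critical value (%s): %.4f" % (level, critical_values[level]))

# The statistic sits far above every critical value and the p-value is ~0.96,
# so the unit root cannot be rejected: the price series behaves like a random walk.
if p_value < 0.05:
    print("Reject the unit root -> the series may be mean reverting")
else:
    print("Cannot reject the unit root -> treat the series as a random walk")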

# Hurst exponent
# A stationary (mean-reverting) process spreads out more slowly than a random walk (GBM).
# Treating the variance as a diffusion speed and comparing it with GBM tells us whether the series is a random walk.

# H = 0.5 -> random walk (same as GBM)
# H -> 0  -> mean reversion
# H -> 1  -> trending (momentum)

def get_hurst_exponent(df, lags_count=100):
    # Estimate H from how the dispersion of lagged log-price differences scales with the lag
    lags = range(2, lags_count)
    log_price = np.log(np.asarray(df, dtype=float))

    tau = [np.sqrt(np.std(np.subtract(log_price[lag:], log_price[:-lag]))) for lag in lags]
    poly = np.polyfit(np.log(lags), np.log(tau), 1)

    # the fitted slope is H/2, so double it to get the Hurst exponent
    result = poly[0]*2.0

    return result


df_samsung = load_stock_data('samsung_2010to2017.csv')
df_hyundai = load_stock_data('hyundai_2010to2017.csv')

hurst_samsung = get_hurst_exponent(df_samsung['Close'])
hurst_hyundai = get_hurst_exponent(df_hyundai['Close'])
print("Hurst Exponent : Samsung=%s, Hyundai=%s" % (hurst_samsung,hurst_hyundai))

Hurst Exponent : Samsung=0.477958163648, Hyundai=0.422901700868

# Half-life of mean reversion
# On the tests above, Samsung and Hyundai look unsuitable for a mean-reversion model,
# but a series that fails both tests can still produce alpha; the half-life is a further check for that.

# The half-life is the time it takes for a deviation to decay halfway back to the mean.
# A random process with mean-reverting behaviour is called an Ornstein-Uhlenbeck process.

# The half-life is inversely proportional to lambda, the speed of mean reversion.
# A large half-life means deviations persist for a long time; a small one means the series snaps back quickly.

# The half-life is expressed in the time unit of the data, so pay attention to the sampling frequency.


def get_half_life(df):
    # Regress the one-period change on the lagged price: delta = beta * lagged_price + const
    price = pd.Series(df)
    lagged_price = price.shift(1).fillna(method="bfill")
    delta = price - lagged_price
    beta = np.polyfit(lagged_price, delta, 1)[0]
    half_life = (-1*np.log(2)/beta)   # half-life of the implied Ornstein-Uhlenbeck process

    return half_life

half_life_samsung = get_half_life(df_samsung['Close'])
half_life_hyundai = get_half_life(df_hyundai['Close'])
print("Half_life : Samsung=%s, Hyundai=%s" % (half_life_samsung,half_life_hyundai))

Half_life : Samsung=971.701411991, Hyundai=160.672993043
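
One common way to put the half-life to work (my own illustration; the post itself stops at the number) is to use it as the lookback window for a rolling mean-reversion z-score:

# Use Hyundai's estimated half-life (~161 days) as the rolling window for a z-score.
# The -2 entry threshold is an arbitrary illustration, not a tested parameter.
window = int(round(half_life_hyundai))
price = df_hyundai['Close']

zscore = (price - price.rolling(window).mean()) / price.rolling(window).std()
signal = (zscore < -2).astype(int)   # naive contrarian long signal

print(zscore.tail())
print("days flagged as stretched below the mean:", int(signal.sum()))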

Correlation in Python

#import functions and load dataset
#prerequisite : understanding of stationarity, (auto)covariance, (auto)correlation

import os,sys,datetime
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix, autocorrelation_plot

def load_stock_data(file_name):
    df = pd.read_pickle(file_name)
    return df

def get_autocorrelation_dataframe(series):
    # Sample autocorrelation r(h) = sum((x_t - mean)(x_{t+h} - mean)) / n / c0
    def r(h):
        return ((data[:n - h] - mean) * (data[h:] - mean)).sum() / float(n) / c0

    n = len(series)
    data = np.asarray(series)

    mean = np.mean(data)
    c0 = np.sum((data - mean) ** 2) / float(n)   # lag-0 autocovariance (the variance)

    x = np.arange(n) + 1
    y = [r(h) for h in x]

    df = pd.DataFrame(y, index=x)

    return df


df_samsung = load_stock_data('samsung_2010to2017.csv')
df_hyundai = load_stock_data('hyundai_2010to2017.csv')

# Samsung autocorrelation by lag

df_samsung_corr = get_autocorrelation_dataframe(df_samsung['Close'])

print(df_samsung_corr)

             0
1     0.994718
2     0.989251
3     0.984319
4     0.979771
5     0.975310
6     0.970901
7     0.966464
8     0.961763


#covariance and correlation between Samsung and Hyundai closing prices

print(df_samsung['Close'].cov(df_hyundai['Close']))

print(df_samsung['Close'].corr(df_hyundai['Close']))

512188378.7
0.0408314240626
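
As a quick sanity check, the correlation printed above is just this covariance rescaled by the two standard deviations; computing it by hand should land close to 0.04 (small differences can appear because pandas aligns the two series pairwise):

# corr(X, Y) = cov(X, Y) / (std(X) * std(Y))
close_s = df_samsung['Close']
close_h = df_hyundai['Close']

manual_corr = close_s.cov(close_h) / (close_s.std() * close_h.std())
print(manual_corr)   # roughly 0.04, matching .corr() above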

fig, axs = plt.subplots(2,1)
axs[1].xaxis.set_visible(False) 

df_samsung['Close'].plot(ax=axs[0])
df_samsung_corr[0].plot(kind='bar',ax=axs[1])

plt.show()

Descriptive statistics in Python

#Download Samsung stock data and produce descriptive statistics

#import functions
import datetime
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix


# download and load dataset
def download_stock_data(file_name,company_code,year1,month1,date1,year2,month2,date2):
    start = datetime.datetime(year1, month1, date1)
    end = datetime.datetime(year2, month2, date2)
    df = web.DataReader("%s.KS" % (company_code), "yahoo", start, end)
    df.to_pickle(file_name)
    return df

def load_stock_data(file_name):
    df = pd.read_pickle(file_name)
    return df

download_stock_data('samsung_2010to2017.csv','005930',2010,1,1,2017,4,14)
download_stock_data('hyundai_2010to2017.csv','005380',2010,1,1,2017,4,14)

data = load_stock_data('samsung_2010to2017.csv')

#draw all figures
data.plot()
plt.show()




#check data
print(data)

                 Open       High        Low      Close  Volume   Adj Close
Date                                                                      
2010-01-04   803000.0   809000.0   800000.0   809000.0  239000   751191.79
2010-01-05   826000.0   829000.0   815000.0   822000.0  558500   763262.86
2010-01-06   829000.0   841000.0   826000.0   841000.0  458900   780905.19
2010-01-07   841000.0   841000.0   813000.0   813000.0  442100   754905.97
2010-01-08   820000.0   821000.0   806000.0   821000.0  295500   762334.32
2010-01-11   821000.0   823000.0   797000.0   797000.0  397900   740049.27



#print descriptive statistic
print(data.describe())

               Open          High           Low         Close        Volume  \
count  1.886000e+03  1.886000e+03  1.886000e+03  1.886000e+03  1.886000e+03   
mean   1.235572e+06  1.247218e+06  1.223781e+06  1.235772e+06  2.786047e+05   
std    2.936847e+05  2.962787e+05  2.925842e+05  2.949172e+05  1.427531e+05   
min    6.840000e+05  6.970000e+05  6.720000e+05  6.800000e+05  0.000000e+00   
25%    9.822500e+05  9.960000e+05  9.712500e+05  9.822500e+05  1.947250e+05   
50%    1.280000e+06  1.291000e+06  1.268000e+06  1.280000e+06  2.523500e+05   
75%    1.410000e+06  1.423000e+06  1.399000e+06  1.410000e+06  3.370000e+05   
max    2.110000e+06  2.134000e+06  2.094000e+06  2.128000e+06  1.276000e+06   

          Adj Close  
count  1.886000e+03  
mean   1.181318e+06  
std    3.017163e+05  
min    6.351751e+05  
25%    9.169498e+05  
50%    1.226299e+06  
75%    1.344629e+06  
max    2.128000e+06  

#check quantile summary
print(data.quantile([.25,.5,.75,1]))
           Open       High        Low      Close     Volume    Adj Close
0.25   982250.0   996000.0   971250.0   982250.0   194725.0   916949.800
0.50  1280000.0  1291000.0  1268000.0  1280000.0   252350.0  1226299.135
0.75  1410000.0  1423000.0  1399000.0  1410000.0   337000.0  1344628.580
1.00  2110000.0  2134000.0  2094000.0  2128000.0  1276000.0  2128000.000

#check histogram
(n, bins, patches) = plt.hist(data['Open'])
data['Open'].plot(kind='kde')
plt.axvline(data['Open'].mean(),color='red')
plt.show()

for index in range(len(n)):
    print("Bin : %0.f, Frequency = %0.f" % (bins[index],n[index]))


Bin : 684000, Frequency = 243
Bin : 826600, Frequency = 219
Bin : 969200, Frequency = 98
Bin : 1111800, Frequency = 265
Bin : 1254400, Frequency = 560
Bin : 1397000, Frequency = 306
Bin : 1539600, Frequency = 95
Bin : 1682200, Frequency = 29
Bin : 1824800, Frequency = 33
Bin : 1967400, Frequency = 38

#draw scatter_matrix without considering 'volume'
scatter_matrix(data[['Open','High','Low','Close']], alpha=0.2, figsize=(6, 6), diagonal='kde')

#draw  box plot
data[['Open','High','Low','Close','Adj Close']].plot(kind='box')
plt.show()

02/05/2017

How to calculate correlation coefficients for US stock sectors



#required

library(dplyr)
library(quantmod)
library(dygraphs)


#get etf data from http://www.sectorspdr.com/sectorspdr/ 

tickers <- c("XLY", "XLP", "XLE", "XLF", "XLV", "XLI", "XLB", "XLK", "XLU", "SPY")
sectorNames <- c("Consumer Discretionary", "Consumer Staples",
                 "Energy", "Financials", "Health Care", "Industrials",
                 "Materials", "Information Technology", "Utilities", "Index")
etf_ticker_sectors <- data_frame(tickers, sectorNames)
#check data
etf_ticker_sectors

# # A tibble: 10 × 2
#    tickers            sectorNames
#      <chr>                  <chr>
# 1      XLY Consumer Discretionary
# 2      XLP       Consumer Staples
# 3      XLE                 Energy
# 4      XLF             Financials
# 5      XLV            Health Care
# 6      XLI            Industrials
# 7      XLB              Materials
# 8      XLK Information Technology
# 9      XLU              Utilities
# 10     SPY                  Index


#calculate weekly return

sector_weekly_returns <- function(tickers) {

#download data
  symbols <- getSymbols(tickers, auto.assign = TRUE, warnings = FALSE)

#get only close price
  prices <- do.call(merge, lapply(symbols, function(x) Cl(get(x))))

#calculate weekly log-based return using a function, periodReturn()
  weekly_returns <- do.call(merge, lapply(prices,
                            function(x) periodReturn(x, period = 'weekly', type = 'log')))

#Change the column names to the sector names from our dataframe above.
  colnames(weekly_returns) <- etf_ticker_sectors$sectorNames
  return(weekly_returns)

}

weekly_returns = sector_weekly_returns(tickers)

weekly_returns
# 2008-10-10 -0.2205639196
# 2008-10-17  0.0518524493
# 2008-10-24 -0.0684872067
# 2008-10-31  0.1065890897
# 2008-11-07 -0.0311525633
# 2008-11-14 -0.0802735506
# 2008-11-21 -0.0855222458
# 2008-11-28  0.1248006016



#get rolling correlation between sector etf and s&p 500 etf

sector_index_correlation <- function(x, window) {
#merge return of sector and s&p500
  merged_xts <- merge(x, weekly_returns$'Index')

#calculate rolling correlations using rollapply()
#pairwise.complete.obs automatically removes NA
  merged_xts$rolling_cor <- rollapply(merged_xts, window,
                                      function(x) cor(x[,1], x[,2], use = "pairwise.complete.obs"),
                                      by.column = FALSE)

  names(merged_xts) <- c("Sector Returns", "SPY Returns", "Sector/SPY Correlation")
  return(merged_xts)
}

#use the function we created above
IT_SPY_correlation <- sector_index_correlation(weekly_returns$'Information Technology', 26)

#draw graph using dygraphs

dygraph(IT_SPY_correlation$'Sector/SPY Correlation', main = "Correlation between SP500 and Tech ETF") %>%
  dyAxis("y", label = "Correlation") %>%
  dyRangeSelector(height = 20) %>%
  # Add shading for the recessionary period
  dyShading(from = "2007-12-01", to = "2009-06-01", color = "#FFE6E6") %>%
  # Add an event for the financial crisis.
  dyEvent(x = "2008-09-15", label = "Fin Crisis", labelLoc = "top", color = "red")






-reference-
http://blog.naver.com/htk1019/220966797230

How to calculate F-score in R



#Get NVDA financial data for recent 3 years

library(quantmod)
NVDA    = getFinancials("NVDA",auto.assign = FALSE)
NVDA.BS = viewFinancials(NVDA, type='BS', period='A')
NVDA.IS = viewFinancials(NVDA, type='IS', period='A')
NVDA.CF = viewFinancials(NVDA, type='CF', period='A')


#Get the financial data to calculate F-Score

TA = NVDA.BS[rownames(NVDA.BS)=="Total Assets",]
CA = NVDA.BS[rownames(NVDA.BS)=="Total Current Assets",]
CL = NVDA.BS[rownames(NVDA.BS)=="Total Current Liabilities",]
NCL = NVDA.BS[rownames(NVDA.BS)=="Total Long Term Debt",]
NI = NVDA.IS[rownames(NVDA.IS)=="Net Income",]
CFO = NVDA.CF[rownames(NVDA.CF)=="Cash from Operating Activities",]
SALES = NVDA.IS[rownames(NVDA.IS)=="Revenue",]
NUMSHARES = NVDA.BS[rownames(NVDA.BS)=="Total Common Shares Outstanding",]
GP = NVDA.IS[rownames(NVDA.IS)=="Gross Profit",]


#calculate financial ratio

ROA = NI/TA
TURN = SALES/TA
CR = CA/CL
LDE = NCL / TA
GM = GP/SALES

#conditions for fscore (one point for each signal)

F1 = as.integer(ROA[1]>0)                       # positive return on assets
F2 = as.integer(CFO[1]>0)                       # positive operating cash flow
F3 = as.integer((CFO-NI)[1]>0)                  # operating cash flow exceeds net income (low accruals)
F4 = as.integer(NUMSHARES[1]-NUMSHARES[2]<=0)   # no increase in shares outstanding
F5 = as.integer(TURN[1]-TURN[2]>0)              # asset turnover improved
F6 = as.integer(CR[1]-CR[2]>0)                  # current ratio improved
F7 = as.integer(LDE[1]-LDE[2]<=0)               # long-term debt ratio did not increase
F8 = as.integer(GM[1]-GM[2]>0)                  # gross margin improved
F9 = as.integer(ROA[1]-ROA[2]>0)                # ROA improved year over year

F = F1+F2+F3+F4+F5+F6+F7+F8+F9


#define function

getFScore <- function(code)

  Company    = getFinancials(code,auto.assign = FALSE)
  Company.BS = viewFinancials(Company, type='BS', period='A')
  Company.IS = viewFinancials(Company, type='IS', period='A')
  Company.CF = viewFinancials(Company, type='CF', period='A')

  TA = Company.BS[rownames(Company.BS)=="Total Assets",]
  CA = Company.BS[rownames(Company.BS)=="Total Current Assets",]
  CL = Company.BS[rownames(Company.BS)=="Total Current Liabilities",]
  NCL = Company.BS[rownames(Company.BS)=="Total Long Term Debt",]
  NI = Company.IS[rownames(Company.IS)=="Net Income",]
  CFO = Company.CF[rownames(Company.CF)=="Cash from Operating Activities",]
  SALES = Company.IS[rownames(Company.IS)=="Revenue",]
  NUMSHARES = Company.BS[rownames(Company.BS)=="Total Common Shares Outstanding",]
  GP = Company.IS[rownames(Company.IS)=="Gross Profit",]

  ROA = NI/TA
  TURN = SALES/TA
  CR = CA/CL
  LDE = NCL / TA
  GM = GP/SALES

  F1 = as.integer(ROA[1]>0)
  F2 = as.integer(CFO[1]>0)
  F3 = as.integer((CFO-NI)[1]>0)
  F4 = as.integer(NUMSHARES[1]-NUMSHARES[2]<=0)
  F5 = as.integer(TURN[1]-TURN[2]>0)
  F6 = as.integer(CR[1]-CR[2]>0)
  F7 = as.integer(LDE[1]-LDE[2]<=0)
  F8 = as.integer(GM[1]-GM[2]>0)
  F9 = as.integer(ROA[1]-ROA[2]>0)

  F_SCORE = F1+F2+F3+F4+F5+F6+F7+F8+F9

  return (F_SCORE)
}

codes <- c("GOOG", "IBM", "MSFT", "ORCL", "NVDA", "AAPL")
CompanyFscores = c()
for(code in codes)
{
  CompanyFscores=rbind(CompanyFscores,getFScore(code))
}
rownames(CompanyFscores) = codes
colnames(CompanyFscores) = "FScore"

CompanyFscores

#result
     FScore
GOOG      6
IBM       5
MSFT      5
ORCL      5
NVDA      7
AAPL      5


-reference-
http://blog.naver.com/htk1019/220955604506

Multi-Asset Momentum Strategy in R



1) Pick 8 index funds, one for each asset class.
2) Compute a long momentum (105 days) and a short momentum (20 days) and rank the assets on each.
3) Pick the asset class with the best combined rank (e.g. 1st + 5th = 6); see the sketch below.
4) Check whether its current price is above its 3-month average; invest only if it is, otherwise stay in cash.
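
A compact Python sketch of steps 2-4 (the daily-price DataFrame, the 63-trading-day stand-in for the 3-month average, and the function name are my assumptions; the post's own implementation is the R code that follows):

import pandas as pd

def momentum_pick(prices, n_short=20, n_long=105, n_sma=63):
    """prices: DataFrame of daily prices, one column per asset.
    Lookbacks are in trading days; 63 days is used here as roughly a 3-month average."""
    # Step 2: short and long momentum, ranked per date (rank 1 = strongest).
    # The long rank gets a slightly higher weight to break ties, as in the R code below.
    short_rank = prices.pct_change(n_short).rank(axis=1, ascending=False)
    long_rank = prices.pct_change(n_long).rank(axis=1, ascending=False)
    rank_sum = (1.01 * long_rank + short_rank).dropna()

    # Step 3: the asset with the lowest combined rank on each date (e.g. 1st + 5th = 6)
    best = rank_sum.idxmin(axis=1)

    # Step 4: invest only when the chosen asset trades above its ~3-month average
    above_sma = prices > prices.rolling(n_sma).mean()
    invest = pd.Series([bool(above_sma.loc[d, a]) for d, a in best.items()],
                       index=best.index)

    return best.where(invest)   # NaN means "stay in cash" for that date

Called as momentum_pick(daily_prices), it returns the asset to hold on each date, with NaN meaning cash.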


require(quantmod)
require(PerformanceAnalytics)
require(TTR)


#get adjusted prices

symbols <- c("NAESX", #small cap
             "PREMX", #emerging bond
             "VEIEX", #emerging markets
             "VFICX", #intermediate investment grade
             "VFIIX", #GNMA mortgage
             "VFINX", #S&P 500 index
             "VGSIX", #MSCI REIT
             "VGTSX", #total intl stock idx
             "VUSTX") #long term treasury (cash)

getSymbols(symbols, from="1990-01-01")


#save to prices

prices <- list()
for(i in 1:length(symbols)) {
  prices[[i]] <- Ad(get(symbols[i]))
}




#combine into a single xts object and remove NA values

prices <- do.call(cbind, prices)
prices <- na.omit(prices)
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))



#split cash apart from the ranking calculation

cashPrices <- prices[, "VUSTX"]
prices <- prices[, colnames(prices) != "VUSTX"]



#calculate momentum



nShort <- 20
nLong <- 105
nSMA <- 3

momShort <- prices/lag(prices, nShort) - 1

momLong <- prices/lag(prices, nLong) - 1

PricesQ <- prices[endpoints(prices, on = "quarters")]
PricesM <- prices[endpoints(prices, on = "months")]

momShortQ <- momShort[endpoints(prices, on = "quarters")]
momLongQ <- momLong[endpoints(prices, on = "quarters")]


#rank by momentum

srank <- t(apply(momShortQ, 1, rank))
lrank <- t(apply(momLongQ, 1, rank))

#as there is a chance that both rank sums are the same, I put slightly more weight on long momentum
totRank <- lrank*1.01 + srank
rankPos <- t(apply(totRank, 1, function(rankRow) {
  maxRank <- max(rankRow)
  return(rankRow == maxRank)
}))



# check whether current price is higher than average of past 3 months prices.

PricesSMAsM <- xts(apply(PricesM, 2, SMA, n = nSMA), order.by = index(PricesM))
PricesSMAsQ <- PricesSMAsM[index(PricesQ)]
smaFilter <- PricesQ > PricesSMAsQ


# find intersections between the two

lastPos <- na.omit(rankPos * smaFilter)

#invest in cash if there is nothing else to invest in
cash <- xts(1 - rowSums(lastPos), order.by = index(lastPos))
lastPos <- merge(lastPos, cash, join = "inner")


#calculate return

prices <- merge(prices, cashPrices, join = "inner")
returns <- na.omit(Return.calculate(prices))
stratRets <- Return.portfolio(returns, weights = lastPos)


#evaluate the model

table.AnnualizedReturns(stratRets)
maxDrawdown(stratRets)
charts.PerformanceSummary(stratRets)

                          portfolio.returns
Annualized Return                    0.1680
Annualized Std Dev                   0.1599
Annualized Sharpe (Rf=0%)            1.0507

maxDrawdown = [1] 0.2978453




-reference-
http://blog.naver.com/htk1019/220924952051