Tokyo.R#53で得たパッケージ情報とその実践

第53回のTokyo.Rで気になったパッケージの情報と実行例をいくつかあげました。スライドなどもろもろの発表はこちらの方のブログ「第53回R勉強会@東京で発表してきた」が非常に詳しく書かれています。

【目次】
・ggradarパッケージ
・proxyパッケージ
・因果推論（CBPSパッケージ）
・MXNetパッケージ
・missForestパッケージ
・RFinanceパッケージ

ggradarパッケージ

簡単にレーダーチャートを作れるパッケージです。こちらのブログを参考にしています。

install.packages("devtools")
devtools::install_github("ricardo-bion/ggradar")

1 2	install.packages("devtools") devtools::install_github("ricardo-bion/ggradar")

企業の職場環境に関してまとめられた某口コミサイトから4個ほどデータを拝借してきました。

> CompanyVoiceData
  company growth stability salary rewarding idea difficulty welfare education
1  google    5.0       5.0    4.9       5.0  4.3        5.0     5.0       4.6
2   yahoo    3.9       5.0    3.2       3.8  3.7        3.9     3.1       3.3
3 recruit    4.4       4.8    5.0       5.0  5.0        5.0     4.0       5.0
4  amazon    5.0       5.0    4.2       4.0  4.2        5.0     3.6       3.3

> CompanyVoiceData

company growth stability salary rewarding idea difficulty welfare education

1 google 5.0 5.0 4.9 5.0 4.3 5.0 5.0 4.6

2 yahoo 3.9 5.0 3.2 3.8 3.7 3.9 3.1 3.3

3 recruit 4.4 4.8 5.0 5.0 5.0 5.0 4.0 5.0

4 amazon 5.0 5.0 4.2 4.0 4.2 5.0 3.6 3.3

ggradarをそのまま使おうとすると、Circular Air Lightというフォントが必要だと怒られるので、参考のブログにある通り、OSXの場合はこちらをダブルクリックでインストールして再起動します。

先ほどのデータに対して、以下のコードを実行すれば非常に簡単にレーダーチャートが作れました。

library("ggradar")
CompanyVoiceData <- data.frame(read.csv(file ="company_voice.csv",header = TRUE))

ggradar(CompanyVoiceData, 
        grid.max = max(CompanyVoiceData[, 2:ncol(CompanyVoiceData)]),
        background.circle.colour = "#ffdd99", #背景色の指定
        background.circle.transparency = 1, #背景色の透明度を指定
        group.line.width = 2, #線の太さの指定
        group.point.size = 6, #シンボルの大きさの指定
        axis.label.size = 5, #軸ラベルサイズの指定
        gridline.min.colour = "#4b61ba", #最小円の線色の指定
        gridline.mid.colour = "#a87963", #中円の線色の指定
        gridline.max.colour = "#e1e6ea", #最大円の線色の指定
        grid.line.width = 1.5, #各円の線の太さの指定
        gridline.min.linetype = "longdash", #線種の指定
        gridline.mid.linetype = "longdash", #線種の指定
        gridline.max.linetype = "longdash") #線種の指定

library("ggradar")

CompanyVoiceData <- data.frame(read.csv(file ="company_voice.csv",header = TRUE))

ggradar(CompanyVoiceData,

grid.max = max(CompanyVoiceData[, 2:ncol(CompanyVoiceData)]),

background.circle.colour = "#ffdd99", #背景色の指定

background.circle.transparency = 1, #背景色の透明度を指定

group.line.width = 2, #線の太さの指定

group.point.size = 6, #シンボルの大きさの指定

axis.label.size = 5, #軸ラベルサイズの指定

gridline.min.colour = "#4b61ba", #最小円の線色の指定

gridline.mid.colour = "#a87963", #中円の線色の指定

gridline.max.colour = "#e1e6ea", #最大円の線色の指定

grid.line.width = 1.5, #各円の線の太さの指定

gridline.min.linetype = "longdash", #線種の指定

gridline.mid.linetype = "longdash", #線種の指定

gridline.max.linetype = "longdash") #線種の指定

proxyパッケージ

距離や類似度を計算するパッケージです。
先ほどのデータに対して類似度と距離を計算してみます。

library(proxy)
> simil(CompanyVoiceData[,-1])
          1         2         3
2 0.2286639                    
3 0.6373648 0.1339713          
4 0.6499133 0.5787506 0.4188571
> dist(CompanyVoiceData[,-1])
         1        2        3
2 3.522783                  
3 1.435270 3.401470         
4 2.269361 1.989975 2.393742

library(proxy)

> simil(CompanyVoiceData[,-1])

1 2 3

2 0.2286639

3 0.6373648 0.1339713

4 0.6499133 0.5787506 0.4188571

> dist(CompanyVoiceData[,-1])

1 2 3

2 3.522783

3 1.435270 3.401470

4 2.269361 1.989975 2.393742

こんな感じで、類似度や距離の計算ができます。

因果推論

こちらはパッケージとかそういうものではなく、既存の関数などで計算できるようです。
こちらのブログ、「調査観察データにおける因果推論(3) – Rによる傾向スコア，IPW推定量，二重にロバストな推定量の算出」に詳しく書かれています。
・glm関数での傾向スコアの算出
・傾向スコアを共変量としてlm関数で回帰分析
・コードを愚直に書いてIPW推定量の算出
・期待値の標準誤差を出すための関数を作成
・DR推定量の算出をするための関数を作成
などで、推定自体は実現できるようです。

ただし、CBPS(Covariate Balancing Propensity Score)というパッケージがあるらしく、このパッケージを用いれば因果推論の計算を行えるようです。

Package ‘CBPS’
以下のようなExampleコードが載っていたので、実行してみましたが、なかなか結果が返ってこなかったので不安になりました。計算が終わるまで10分以上はかかったと思います。

library(CBPS)
data(Blackwell)

form1<-"d.gone.neg ~ d.gone.neg.l1 + d.gone.neg.l2 + d.neg.frac.l3 + camp.length + camp.length +
deminc + base.poll + year.2002 + year.2004 + year.2006 + base.und + office"

##Fitting the models in Imai and Ratkovic (2014)
##Warning: may take a few mintues; setting time.vary to FALSE
##Results in a quicker fit but with poorer balance
fit1 <- CBMSM(formula = form1, time=Blackwell$time,id=Blackwell$demName,data=Blackwell, type="MSM",
            iterations = NULL, twostep = TRUE, msm.variance = "full", time.vary = TRUE)
fit2 <- CBMSM(formula = form1, time=Blackwell$time,id=Blackwell$demName,data=Blackwell, type="MSM",
            iterations = NULL, twostep = TRUE, msm.variance = "approx", time.vary = TRUE)

##Assessing balance
bal1 <- balance(fit1)
bal2 <- balance(fit2)

##Effect estimation: Replicating Effect Estimates in
##Table 3 of Imai and Ratkovic (2014)
lm1 <- lm(demprcnt[time==1]~fit1$treat.hist,data=Blackwell,weights=fit1$glm.weights)
lm2 <- lm(demprcnt[time==1]~fit1$treat.hist,data=Blackwell,weights=fit1$weights)
lm3 <- lm(demprcnt[time==1]~fit1$treat.hist,data=Blackwell,weights=fit2$weights)
lm4 <- lm(demprcnt[time==1]~fit1$treat.cum,data=Blackwell,weights=fit1$glm.weights)
lm5 <- lm(demprcnt[time==1]~fit1$treat.cum,data=Blackwell,weights=fit1$weights)
lm6 <- lm(demprcnt[time==1]~fit1$treat.cum,data=Blackwell,weights=fit2$weights)

library(CBPS)

data(Blackwell)

form1<-"d.gone.neg ~ d.gone.neg.l1 + d.gone.neg.l2 + d.neg.frac.l3 + camp.length + camp.length +

deminc + base.poll + year.2002 + year.2004 + year.2006 + base.und + office"

##Fitting the models in Imai and Ratkovic (2014)

##Warning: may take a few mintues; setting time.vary to FALSE

##Results in a quicker fit but with poorer balance

fit1 <- CBMSM(formula = form1, time=Blackwell$time,id=Blackwell$demName,data=Blackwell, type="MSM",

iterations = NULL, twostep = TRUE, msm.variance = "full", time.vary = TRUE)

fit2 <- CBMSM(formula = form1, time=Blackwell$time,id=Blackwell$demName,data=Blackwell, type="MSM",

iterations = NULL, twostep = TRUE, msm.variance = "approx", time.vary = TRUE)

##Assessing balance

bal1 <- balance(fit1)

bal2 <- balance(fit2)

##Effect estimation: Replicating Effect Estimates in

##Table 3 of Imai and Ratkovic (2014)

lm1 <- lm(demprcnt[time==1]~fit1$treat.hist,data=Blackwell,weights=fit1$glm.weights)

lm2 <- lm(demprcnt[time==1]~fit1$treat.hist,data=Blackwell,weights=fit1$weights)

lm3 <- lm(demprcnt[time==1]~fit1$treat.hist,data=Blackwell,weights=fit2$weights)

lm4 <- lm(demprcnt[time==1]~fit1$treat.cum,data=Blackwell,weights=fit1$glm.weights)

lm5 <- lm(demprcnt[time==1]~fit1$treat.cum,data=Blackwell,weights=fit1$weights)

lm6 <- lm(demprcnt[time==1]~fit1$treat.cum,data=Blackwell,weights=fit2$weights)

MXNet

XGBoostのパッケージを作ったチームが手がけているパッケージで、深層学習を実行できます。

インストール方法はここに書かれています。
Deep Learning for R

install.packages("drat", repos="https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("mxnet")

install.packages("drat", repos="https://cran.rstudio.com")

drat:::addRepo("dmlc")

install.packages("mxnet")

あれ、OSXではエラーが返ってきてライブラリが読み込めないですね。どうやら私のためにあるようなブログ「Installing mxnet for R on Yosemite」があったので、時間を見つけてチャレンジしてみようと思います。

ディープラーニングを用いた回帰分析については、Neural Network with MXNet in Five Minutesにコードがもろもろ載っていますので、チャレンジしてみると良いと思います。

リンク先に載っているのですが、一応コードを以下に記しておきます。

data(BostonHousing, package="mlbench")

train.ind = seq(1, 506, 3)
train.x = data.matrix(BostonHousing[train.ind, -14])
train.y = BostonHousing[train.ind, 14]
test.x = data.matrix(BostonHousing[-train.ind, -14])
test.y = BostonHousing[-train.ind, 14]

# Define the input data
data <- mx.symbol.Variable("data")
# A fully connected hidden layer
# data: input source
# num_hidden: number of neurons in this hidden layer
fc1 <- mx.symbol.FullyConnected(data, num_hidden=1)

# Use linear regression for the output layer
lro <- mx.symbol.LinearRegressionOutput(fc1)

preds = predict(model, test.x)

## Auto detect layout of input matrix, use rowmajor..
sqrt(mean((preds-test.y)^2))

demo.metric.mae <- mx.metric.custom("mae", function(label, pred) {
  res <- mean(abs(label-pred))
  return(res)
})

mx.set.seed(0)
model <- mx.model.FeedForward.create(lro, X=train.x, y=train.y,
                                     ctx=mx.cpu(), num.round=50, array.batch.size=20,
                                     learning.rate=2e-6, momentum=0.9, eval.metric=demo.metric.mae)

data(BostonHousing, package="mlbench")

train.ind = seq(1, 506, 3)

train.x = data.matrix(BostonHousing[train.ind, -14])

train.y = BostonHousing[train.ind, 14]

test.x = data.matrix(BostonHousing[-train.ind, -14])

test.y = BostonHousing[-train.ind, 14]

# Define the input data

data <- mx.symbol.Variable("data")

# A fully connected hidden layer

# data: input source

# num_hidden: number of neurons in this hidden layer

fc1 <- mx.symbol.FullyConnected(data, num_hidden=1)

# Use linear regression for the output layer

lro <- mx.symbol.LinearRegressionOutput(fc1)

preds = predict(model, test.x)

## Auto detect layout of input matrix, use rowmajor..

sqrt(mean((preds-test.y)^2))

demo.metric.mae <- mx.metric.custom("mae", function(label, pred) {

res <- mean(abs(label-pred))

return(res)

})

mx.set.seed(0)

model <- mx.model.FeedForward.create(lro, X=train.x, y=train.y,

ctx=mx.cpu(), num.round=50, array.batch.size=20,

learning.rate=2e-6, momentum=0.9, eval.metric=demo.metric.mae)

missForest

ランダムフォレストを用いて、欠損値補完を行うためのパッケージです。目的変数が欠損していても適用できるようです。
詳しくは、スライドを見ていただいた方がいいですが、以下のプログラムで実行できました。ちなみにスライドはこちら、「Imputation of Missing Values using Random Forest」

library(missForest)
library(dplyr)

#ggplot2のデータセットを読み込む
data(diamonds, package = "ggplot2")
dia.sample <- sample_n(diamonds, size=2000)
dia.sample <- as.data.frame(dia.sample)

#既存データセットに5%の欠損を与える
dia.mis <- prodNA(dia.sample, noNA=0.05)

#補完の実行
dia.imp <- missForest(dia.mis, verbose=TRUE)
dia.imp %>% str(max.level=1)

#補完精度の推定
dia.imp$OOBerror
dia.imp <- missForest(dia.mis, verbose=TRUE, variablewise=TRUE)

#補完精度の検証
mixError(ximp = dia.imp$ximp,
         xmis = dia.mis,
         xtrue = dia.sample)

library(missForest)

library(dplyr)

#ggplot2のデータセットを読み込む

data(diamonds, package = "ggplot2")

dia.sample <- sample_n(diamonds, size=2000)

dia.sample <- as.data.frame(dia.sample)

#既存データセットに5%の欠損を与える

dia.mis <- prodNA(dia.sample, noNA=0.05)

#補完の実行

dia.imp <- missForest(dia.mis, verbose=TRUE)

dia.imp %>% str(max.level=1)

#補完精度の推定

dia.imp$OOBerror

dia.imp <- missForest(dia.mis, verbose=TRUE, variablewise=TRUE)

#補完精度の検証

mixError(ximp = dia.imp$ximp,

xmis = dia.mis,

xtrue = dia.sample)

RFinanceYJ

Yohei Sato, Nobuaki Oshiro, Shinichi Takayanagiさんたちが作った、Yahoo!ファイナンスの株価データを取得できるパッケージです。だいぶ前からあったようですが、使って分析している人は初めて見ました。どうやらYahoo!ファイナンスの仕様によって書き換えていかないといけないようです。「2015-01-20 Rでチャートを書いてみる(9)」のブログに実行可能なプログラムがあります。以下、実行可能なコードを転載いたします。

library(RFinanceYJ)

#API
quoteStockTsData <- function(x, since=NULL,start.num=0,date.end=NULL,time.interval='daily')
{
  time.interval <- substr(time.interval,1,1)
  function.stock <- function(quote.table.item){
    if( xmlSize(quote.table.item) < 5) return(NULL) 
    d <- convertToDate(xmlValue(quote.table.item[[1]]),time.interval)
    o <- as.number(xmlValue(quote.table.item[[2]]))
    h <- as.number(xmlValue(quote.table.item[[3]]))
    l <- as.number(xmlValue(quote.table.item[[4]]))
    c <- as.number(xmlValue(quote.table.item[[5]]))
    v <- ifelse(xmlSize(quote.table.item) >= 6,as.number(xmlValue(quote.table.item[[6]])),0)
    a <- ifelse(xmlSize(quote.table.item) >= 7,as.number(xmlValue(quote.table.item[[7]])),0)
    return(data.frame(date=d,open=o,high=h,low=l,close=c,volume=v, adj_close=a))
  }
  return(quoteTsData(x,function.stock,since,start.num,date.end,time.interval,type="stock"))
}
quoteFundTsData <- function(x, since=NULL,start.num=0,date.end=NULL,time.interval='daily')
{
  time.interval <- substr(time.interval,1,1)
  function.fund <- function(quote.table.item){
    d <- convertToDate(xmlValue(quote.table.item[[1]]),time.interval)
    if(time.interval=='monthly'){
      d <- endOfMonth(d)
    }
    c <- as.number(xmlValue(quote.table.item[[2]]))
    v <- as.number(xmlValue(quote.table.item[[3]]))
    return(data.frame(date=d,constant.value=c,NAV=v))
  }
  return(quoteTsData(x,function.fund,since,start.num,date.end,time.interval,type="fund"))
}
quoteFXTsData <- function(x, since=NULL,start.num=0,date.end=NULL,time.interval='daily')
{
  time.interval <- substr(time.interval,1,1)
  function.fx <- function(quote.table.item){
    d <- convertToDate(xmlValue(quote.table.item[[1]]),time.interval)
    o <- as.number(xmlValue(quote.table.item[[2]]))
    h <- as.number(xmlValue(quote.table.item[[3]]))
    l <- as.number(xmlValue(quote.table.item[[4]]))
    c <- as.number(xmlValue(quote.table.item[[5]]))
    return(data.frame(date=d,open=o,high=h,low=l,close=c))
  }
  return(quoteTsData(x,function.fx,since,start.num,date.end,time.interval,type="fx"))
}
######  private functions  #####
#get time series data from Yahoo! Finance.
quoteTsData <- function(x,function.financialproduct,since,start.num,date.end,time.interval,type="stock"){
  r <- NULL
  result.num <- 51
  financial.data <- data.frame(NULL)
  #start <- (gsub("([0-9]{4,4})-([0-9]{2,2})-([0-9]{2,2})","&c=\\1&a=\\2&b=\\3",since))
  #end   <- (gsub("([0-9]{4,4})-([0-9]{2,2})-([0-9]{2,2})","&f=\\1&d=\\2&e=\\3",date.end))
  start <- (gsub("([0-9]{4,4})-([0-9]{2,2})-([0-9]{2,2})","&sy=\\1&sm=\\2&sd=\\3",since))
  end   <- (gsub("([0-9]{4,4})-([0-9]{2,2})-([0-9]{2,2})","&ey=\\1&em=\\2&ed=\\3",date.end))
  
  if(!any(time.interval==c('d','w','m'))) stop("Invalid time.interval value")
  
  extractQuoteTable <- function(r,type){
    if(type %in% c("fund","fx")){
      tbl <- r[[2]][[2]][[7]][[3]][[3]][[9]][[2]]
    }
    else{
      tbl <- r[[2]][[2]][[7]][[3]][[3]][[10]][[2]]
    }
    return(tbl)
  }
  
  #while( result.num >= 51 ){
  while(1){
    start.num <- start.num + 1
    quote.table <- NULL
    quote.url <- paste('http://info.finance.yahoo.co.jp/history/?code=',x,start,end,'&p=',start.num,'&tm=',substr(time.interval,1,1),sep="")
    #cat(quote.url)
    #try( r <- xmlRoot(htmlTreeParse(quote.url,error=xmlErrorCumulator(immediate=F))), TRUE)  # これだと取得時にエラーが出た。。
    try(r<-htmlParse(quote.url))
    if( is.null(r) ) stop(paste("Can not access :", quote.url))
    
    #try( quote.table <- r[[2]][[1]][[1]][[16]][[1]][[1]][[1]][[4]][[1]][[1]][[1]], TRUE )
    #try( quote.table <- extractQuoteTable(r,type), TRUE )
    try( quote.table <- xpathApply(r,"//table")[[2]], TRUE )
    
    quote.size<-xmlSize(quote.table)
    #cat(paste("size:",quote.size))
    if(xmlSize(quote.table)<=1){
      return (financial.data)
    }
    if( is.null(quote.table) ){
      if( is.null(financial.data) ){
        stop(paste("Can not quote :", x))
      }else{
        financial.data <- financial.data[order(financial.data$date),]
        return(financial.data)
      }
    }
    
    size <- xmlSize(quote.table)
    for(i in 2:size){
      financial.data <- rbind(financial.data,function.financialproduct(quote.table[[i]]))
    }
    
    #result.num <- xmlSize(quote.table)
    Sys.sleep(1)
  }
  financial.data <- financial.data[order(financial.data$date),]
  return(financial.data)  
}
#convert string formart date to POSIXct object
convertToDate <- function(date.string,time.interval)
{
  #data format is different between monthly and dialy or weekly
  if(any(time.interval==c('d','w'))){
    result <- gsub("^([0-9]{4})([^0-9]+)([0-9]{1,2})([^0-9]+)([0-9]{1,2})([^0-9]+)","\\1-\\3-\\5",date.string)
  }else if(time.interval=='m'){
    result <- gsub("^([0-9]{4})([^0-9]+)([0-9]{1,2})([^0-9]+)","\\1-\\3-01",date.string)
  }
  return(as.POSIXct(result))
}
#convert string to number.
as.number <- function(string)
{
  return(as.double(as.character(gsub("[^0-9.]", "",string))))
}
#return end of month date.
endOfMonth <- function(date.obj)
{
  startOfMonth     <- as.Date(format(date.obj,"%Y%m01"),"%Y%m%d")
  startOfNextMonth <- as.Date(format(startOfMonth+31,"%Y%m01"),"%Y%m%d")
  return(startOfNextMonth-1)
}

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

library(RFinanceYJ)

#API

quoteStockTsData <- function(x, since=NULL,start.num=0,date.end=NULL,time.interval='daily')

{

time.interval <- substr(time.interval,1,1)

function.stock <- function(quote.table.item){

if( xmlSize(quote.table.item) < 5) return(NULL)

d <- convertToDate(xmlValue(quote.table.item[[1]]),time.interval)

o <- as.number(xmlValue(quote.table.item[[2]]))

h <- as.number(xmlValue(quote.table.item[[3]]))

l <- as.number(xmlValue(quote.table.item[[4]]))

c <- as.number(xmlValue(quote.table.item[[5]]))

v <- ifelse(xmlSize(quote.table.item) >= 6,as.number(xmlValue(quote.table.item[[6]])),0)

a <- ifelse(xmlSize(quote.table.item) >= 7,as.number(xmlValue(quote.table.item[[7]])),0)

return(data.frame(date=d,open=o,high=h,low=l,close=c,volume=v, adj_close=a))

}

return(quoteTsData(x,function.stock,since,start.num,date.end,time.interval,type="stock"))

}

quoteFundTsData <- function(x, since=NULL,start.num=0,date.end=NULL,time.interval='daily')

{

time.interval <- substr(time.interval,1,1)

function.fund <- function(quote.table.item){

d <- convertToDate(xmlValue(quote.table.item[[1]]),time.interval)

if(time.interval=='monthly'){

d <- endOfMonth(d)

}

c <- as.number(xmlValue(quote.table.item[[2]]))

v <- as.number(xmlValue(quote.table.item[[3]]))

return(data.frame(date=d,constant.value=c,NAV=v))

}

return(quoteTsData(x,function.fund,since,start.num,date.end,time.interval,type="fund"))

}

quoteFXTsData <- function(x, since=NULL,start.num=0,date.end=NULL,time.interval='daily')

{

time.interval <- substr(time.interval,1,1)

function.fx <- function(quote.table.item){

d <- convertToDate(xmlValue(quote.table.item[[1]]),time.interval)

o <- as.number(xmlValue(quote.table.item[[2]]))

h <- as.number(xmlValue(quote.table.item[[3]]))

l <- as.number(xmlValue(quote.table.item[[4]]))

c <- as.number(xmlValue(quote.table.item[[5]]))

return(data.frame(date=d,open=o,high=h,low=l,close=c))

}

return(quoteTsData(x,function.fx,since,start.num,date.end,time.interval,type="fx"))

}

###### private functions #####

#get time series data from Yahoo! Finance.

quoteTsData <- function(x,function.financialproduct,since,start.num,date.end,time.interval,type="stock"){

r <- NULL

result.num <- 51

financial.data <- data.frame(NULL)

#start <- (gsub("([0-9]{4,4})-([0-9]{2,2})-([0-9]{2,2})","&c=\\1&a=\\2&b=\\3",since))

#end <- (gsub("([0-9]{4,4})-([0-9]{2,2})-([0-9]{2,2})","&f=\\1&d=\\2&e=\\3",date.end))

start <- (gsub("([0-9]{4,4})-([0-9]{2,2})-([0-9]{2,2})","&sy=\\1&sm=\\2&sd=\\3",since))

end <- (gsub("([0-9]{4,4})-([0-9]{2,2})-([0-9]{2,2})","&ey=\\1&em=\\2&ed=\\3",date.end))

if(!any(time.interval==c('d','w','m'))) stop("Invalid time.interval value")

extractQuoteTable <- function(r,type){

if(type %in% c("fund","fx")){

tbl <- r[[2]][[2]][[7]][[3]][[3]][[9]][[2]]

}

else{

tbl <- r[[2]][[2]][[7]][[3]][[3]][[10]][[2]]

}

return(tbl)

}

#while( result.num >= 51 ){

while(1){

start.num <- start.num + 1

quote.table <- NULL

quote.url <- paste('http://info.finance.yahoo.co.jp/history/?code=',x,start,end,'&p=',start.num,'&tm=',substr(time.interval,1,1),sep="")

#cat(quote.url)

#try( r <- xmlRoot(htmlTreeParse(quote.url,error=xmlErrorCumulator(immediate=F))), TRUE) # これだと取得時にエラーが出た。。

try(r<-htmlParse(quote.url))

if( is.null(r) ) stop(paste("Can not access :", quote.url))

#try( quote.table <- r[[2]][[1]][[1]][[16]][[1]][[1]][[1]][[4]][[1]][[1]][[1]], TRUE )

#try( quote.table <- extractQuoteTable(r,type), TRUE )

try( quote.table <- xpathApply(r,"//table")[[2]], TRUE )

quote.size<-xmlSize(quote.table)

#cat(paste("size:",quote.size))

if(xmlSize(quote.table)<=1){

return (financial.data)

}

if( is.null(quote.table) ){

if( is.null(financial.data) ){

stop(paste("Can not quote :", x))

}else{

financial.data <- financial.data[order(financial.data$date),]

return(financial.data)

}

size <- xmlSize(quote.table)

for(i in 2:size){

financial.data <- rbind(financial.data,function.financialproduct(quote.table[[i]]))

}

#result.num <- xmlSize(quote.table)

Sys.sleep(1)

}

financial.data <- financial.data[order(financial.data$date),]

return(financial.data)

}

#convert string formart date to POSIXct object

convertToDate <- function(date.string,time.interval)

{

#data format is different between monthly and dialy or weekly

if(any(time.interval==c('d','w'))){

result <- gsub("^([0-9]{4})([^0-9]+)([0-9]{1,2})([^0-9]+)([0-9]{1,2})([^0-9]+)","\\1-\\3-\\5",date.string)

}else if(time.interval=='m'){

result <- gsub("^([0-9]{4})([^0-9]+)([0-9]{1,2})([^0-9]+)","\\1-\\3-01",date.string)

}

return(as.POSIXct(result))

}

#convert string to number.

as.number <- function(string)

{

return(as.double(as.character(gsub("[^0-9.]", "",string))))

}

#return end of month date.

endOfMonth <- function(date.obj)

{

startOfMonth <- as.Date(format(date.obj,"%Y%m01"),"%Y%m%d")

startOfNextMonth <- as.Date(format(startOfMonth+31,"%Y%m01"),"%Y%m%d")

return(startOfNextMonth-1)

}

このコードでYahoo!ジャパンの株価を見てみましょう。ちなみに番号は4689です。どうやら上手く取れているようです。

> quoteStockTsData("4689.t",since="2016-01-01")
         date open high low close   volume adj_close
1  2016-05-02  476  483 475   478 18498100       478
2  2016-04-28  504  508 493   496 11966300       496
3  2016-04-27  505  511 495   497 12973800       497
4  2016-04-26  507  508 495   500  7712600       500
5  2016-04-25  513  515 506   509  7350600       509
6  2016-04-22  515  517 509   514  8908900       514
7  2016-04-21  512  517 506   514 13249900       514
8  2016-04-20  511  515 493   506 14455700       506
9  2016-04-19  516  523 511   516 13345800       516
10 2016-04-18  503  509 499   503 10275900       503
11 2016-04-15  504  519 504   513 16962900       513

> quoteStockTsData("4689.t",since="2016-01-01")

date open high low close volume adj_close

1 2016-05-02 476 483 475 478 18498100 478

2 2016-04-28 504 508 493 496 11966300 496

3 2016-04-27 505 511 495 497 12973800 497

4 2016-04-26 507 508 495 500 7712600 500

5 2016-04-25 513 515 506 509 7350600 509

6 2016-04-22 515 517 509 514 8908900 514

7 2016-04-21 512 517 506 514 13249900 514

8 2016-04-20 511 515 493 506 14455700 506

9 2016-04-19 516 523 511 516 13345800 516

10 2016-04-18 503 509 499 503 10275900 503

11 2016-04-15 504 519 504 513 16962900 513

コメントを残す コメントをキャンセル

コメントを残すコメントをキャンセル