Trying covariance structure analysis on restaurant review site data for Ramen Jiro

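Data

The data cover Ramen Jiro and were obtained by web scraping a certain restaurant review site. Each store has a real-valued score from 1 to 5 for each of the rating items "food/taste", "service", "atmosphere", "CP" (cost performance), and "alcohol/drinks". Of the 40 Ramen Jiro stores, the dataset contains the 37 stores that had no missing ratings.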

Descriptive statistics and histograms

First, let's look at the descriptive statistics and histograms.

> describe(jirou_dataset[,-1])
                 vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
taste               1 37 3.58 0.09   3.59    3.59 0.01 3.10 3.73  0.63 -3.77    17.97 0.02
service             2 37 3.51 0.11   3.55    3.53 0.04 3.10 3.59  0.49 -2.64     6.41 0.02
atmosphere          3 37 3.47 0.16   3.53    3.50 0.03 3.00 3.59  0.59 -2.11     3.06 0.03
cost_performance    4 37 3.61 0.12   3.59    3.61 0.01 3.07 3.95  0.88 -1.46     9.16 0.02
drink               5 37 3.08 0.09   3.07    3.07 0.10 2.99 3.33  0.34  1.00     0.39 0.01

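# Draw a histogram for each rating item
library(reshape2)  # melt() is assumed to come from reshape2
library(ggplot2)

hist_jirou_dataset <- melt(jirou_dataset)
g <- ggplot(data = hist_jirou_dataset,
            aes(x = value,
                y = after_stat(density))) +  # replaces the deprecated ..density..
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_density(alpha = 0)
g + facet_wrap(~variable, nrow = 5)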

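Well, it is not a place to go drinking, so a low drink score makes sense. Still, I am surprised that every item averages in the 3s, because I went there once long ago and the atmosphere is certainly not good.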

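Covariance structure analysis

The hypothesis I want to test this time is whether quality as a ramen shop leads to "Jiro love".

Rating items likely to be related to quality as a ramen shop
"food/taste"
"CP"
"alcohol/drinks"

Rating items likely to be related to Jiro love
"food/taste"
"service"
"atmosphere"

In addition, food/taste and CP seem likely to be related to each other, so I account for that in the paths as well.

The figure below sketches the hypothesis.

For the finer details of specifying paths, I recommend consulting the reference.
Let's build the model and run it right away.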

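# Specify the model in lavaan syntax:
# =~ defines a latent variable by its indicators, ~ is a regression,
# ~~ is a (residual) covariance.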
library(lavaan)

model_jirou <- 'f1 =~ cost_performance + drink + taste
                f2 =~ taste + atmosphere + service
                f2 ~ f1
                cost_performance ~~ taste'

fit <- sem(model = model_jirou,
           data = jirou_dataset[,-1],
           estimator="MLM")

summary(fit)

Here are the estimation results.

> summary(fit)
lavaan (0.5-23.1097) converged normally after 100 iterations

  Number of observations                            37

  Estimator                                         ML      Robust
  Minimum Function Test Statistic                2.057       1.965
  Degrees of freedom                                 2           2
  P-value (Chi-square)                           0.358       0.374
  Scaling correction factor                                  1.047
    for the Satorra-Bentler correction

Parameter Estimates:

  Information                                 Expected
  Standard Errors                           Robust.sem

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  f1 =~                                               
    cost_performnc    1.000                           
    drink            -0.153    0.134   -1.139    0.255
    taste             0.381    0.194    1.958    0.050
  f2 =~                                               
    taste             1.000                           
    atmosphere        4.008    1.972    2.032    0.042
    service           3.147    1.423    2.211    0.027

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  f2 ~                                                
    f1                0.095    0.130    0.731    0.465

Covariances:
                      Estimate  Std.Err  z-value  P(>|z|)
 .cost_performance ~~                                    
   .taste               -0.001    0.007   -0.115    0.908

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .cost_performnc    3.606    0.020  179.760    0.000
   .drink             3.081    0.014  214.293    0.000
   .taste             3.582    0.015  237.571    0.000
   .atmosphere        3.471    0.027  129.316    0.000
   .service           3.509    0.019  187.760    0.000
    f1                0.000                           
   .f2                0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .cost_performnc   -0.007    0.017   -0.390    0.696
   .drink             0.007    0.002    3.170    0.002
   .taste             0.002    0.003    0.712    0.477
   .atmosphere        0.005    0.004    1.273    0.203
   .service          -0.000    0.001   -0.329    0.742
    f1                0.021    0.017    1.242    0.214
   .f2                0.001    0.001    1.282    0.200

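The output says "lavaan (0.5-23.1097) converged normally after 100 iterations", so the optimizer did converge. Convergence alone does not guarantee a proper solution, though: the residual variances of cost_performance and service are estimated as negative, which is a warning sign with only 37 observations. The model leaves just 2 degrees of freedom and the sample is small, so this is a fairly borderline estimation. The number of observations does at least exceed the number of parameters to be estimated. It might be better to collect data on Jiro-inspired shops as well.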
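The 2 degrees of freedom reported in the summary can be reproduced by counting observed moments and free parameters. This is a minimal sketch in base R; the parameter counts are my own reading of the model syntax and of the summary output (which includes intercepts), not something lavaan prints in this form.

```r
# Count observed moments versus free parameters for the model above.
p <- 5                               # number of observed variables
n_moments <- p * (p + 1) / 2 + p     # 15 variances/covariances + 5 means

free_params <- c(
  loadings           = 4,  # drink, taste on f1; atmosphere, service on f2
  regression         = 1,  # f2 ~ f1
  residual_cov       = 1,  # cost_performance ~~ taste
  residual_variances = 5,  # one per observed variable
  latent_variances   = 2,  # f1 and the disturbance of f2
  intercepts         = 5   # one per observed variable
)

df_model <- n_moments - sum(free_params)
df_model  # 2, matching "Degrees of freedom" in the summary
```

With 18 free parameters against 20 moments, the model is only barely over-identified, which is why there is so little room to test it.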

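Model evaluation

To evaluate the model, I look at the fit indices and the parameter estimates.

Goodness of fit
・The CFI (Comparative Fit Index) is 1, so the fit looks good by this index.
・The TLI (Tucker-Lewis Index) is 0.998, which also suggests a good fit.
・The RMSEA, for which 0.05 or below is considered a good fit, is 0.028, so the fit looks good.
・The SRMR, for which values closer to 0 are better, is 0.024.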
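As a rough cross-check, the RMSEA can be recomputed from the chi-square statistic, degrees of freedom, and sample size shown in the summary. This sketch uses the common textbook formula with N - 1 in the denominator; software may use N instead, and with these numbers both versions round to the same value. If the fitted object is still in the session, lavaan's fitMeasures() can return the indices directly, as in the commented line (which assumes the `fit` object from above).

```r
# Values taken from the summary(fit) output (ML column).
chisq <- 2.057
df    <- 2
n     <- 37

# RMSEA = sqrt(max(chisq - df, 0) / (df * (n - 1)))
rmsea <- sqrt(max(chisq - df, 0) / (df * (n - 1)))
round(rmsea, 3)

# With the fitted model available (assumes `fit` from the article):
# lavaan::fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
```

The hand-computed value agrees with the 0.028 quoted in the list above. Keep in mind that with 37 observations and 2 degrees of freedom, all of these indices are quite imprecise.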

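Parameter estimates

f2 ~ f1 represents the assumed relationship between quality as a ramen shop and Jiro love, but its p-value is 0.47, which is nowhere near significant. These data do not support the hypothesis that quality as a ramen shop leads to Jiro love.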
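To see where that p-value comes from, the z-value and two-sided p-value for f2 ~ f1 can be recomputed from the estimate and standard error in the summary. This is a minimal sketch assuming the usual Wald test against a standard normal; because the inputs are the rounded values from the printout, the result only matches up to rounding.

```r
# Estimate and robust standard error of f2 ~ f1, as printed by summary(fit).
est <- 0.095
se  <- 0.130

z <- est / se             # Wald z-statistic
p <- 2 * pnorm(-abs(z))   # two-sided p-value under a standard normal

round(c(z = z, p = p), 3)
```

Both values agree with the 0.731 and 0.465 in the Regressions section of the output.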

As the reference suggests, I compute the standardized estimates to make the coefficients easier to interpret.

> standardizedsolution(fit)
                lhs op              rhs est.std     se      z pvalue
1                f1 =~ cost_performance   1.208  0.524  2.303  0.021
2                f1 =~            drink  -0.257  0.178 -1.448  0.148
3                f1 =~            taste   0.612  0.325  1.883  0.060
4                f2 =~            taste   0.400  0.120  3.327  0.001
5                f2 =~       atmosphere   0.901  0.066 13.723  0.000
6                f2 =~          service   1.016  0.049 20.694  0.000
7                f2  ~               f1   0.382  0.338  1.132  0.258
8  cost_performance ~~            taste  -0.201  1.651 -0.122  0.903
9  cost_performance ~~ cost_performance  -0.459  1.267 -0.362  0.717
10            drink ~~            drink   0.934  0.091 10.226  0.000
11            taste ~~            taste   0.278  0.399  0.698  0.485
12       atmosphere ~~       atmosphere   0.187  0.118  1.582  0.114
13          service ~~          service  -0.033  0.100 -0.333  0.740
14               f1 ~~               f1   1.000  0.000     NA     NA
15               f2 ~~               f2   0.854  0.258  3.309  0.001
16 cost_performance ~1                   29.960  8.666  3.457  0.001
17            drink ~1                   35.715  4.676  7.638  0.000
18            taste ~1                   39.595 15.318  2.585  0.010
19       atmosphere ~1                   21.553  4.333  4.974  0.000
20          service ~1                   31.293  7.949  3.937  0.000
21               f1 ~1                    0.000  0.000     NA     NA
22               f2 ~1                    0.000  0.000     NA     NA

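According to this, cost performance has the largest loading on f1 (quality as a ramen shop). That would mean cost performance outweighs taste, which is interesting in its own right. For f2 (Jiro love), atmosphere and service load more heavily than taste. Note, however, that the standardized loadings above 1 (1.208 for cost_performance and 1.016 for service) come with negative residual variances, so these magnitudes should be read with caution.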
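Before reading too much into the standardized solution, it is worth checking for signs of an improper solution. The sketch below simply transcribes the standardized loadings and residual variances from the output above into base R vectors (the vector names are my own) and flags loadings beyond 1 in absolute value and negative residual variances.

```r
# Standardized estimates transcribed from the standardizedsolution(fit) output.
loadings <- c(f1_cost_performance = 1.208, f1_drink = -0.257, f1_taste = 0.612,
              f2_taste = 0.400, f2_atmosphere = 0.901, f2_service = 1.016)
resid_var <- c(cost_performance = -0.459, drink = 0.934, taste = 0.278,
               atmosphere = 0.187, service = -0.033)

# Flag loadings beyond 1 in absolute value and negative residual variances.
names(loadings)[abs(loadings) > 1]
names(resid_var)[resid_var < 0]
```

Both checks point at cost_performance and service. For an indicator that loads on a single factor, a standardized loading above 1 goes hand in hand with a negative residual variance, which is not a meaningful value for a variance.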

I use the semPlot package to output a path diagram.

# Draw the path diagram
library(semPlot)

semPaths(fit, whatLabels = "stand", optimizeLatRes = TRUE)

The variable names are abbreviated to three characters. I could not quickly find a way to fix this, so I am posting the diagram as is.
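One possible way around the abbreviation, as far as I know, is the nCharNodes argument of semPaths, which controls how many characters node labels are abbreviated to (3 by default); setting it to 0 should turn abbreviation off. The wrapper name plot_jirou_paths is hypothetical, and the sketch assumes the `fit` object fitted earlier. I have not re-run it against this dataset.

```r
# Hypothetical helper: draw the path diagram with full variable names.
plot_jirou_paths <- function(fit) {
  semPlot::semPaths(fit,
                    whatLabels = "stand",  # show standardized estimates
                    nCharNodes = 0,        # 0 = do not abbreviate node labels
                    optimizeLatRes = TRUE)
}

# Usage, assuming `fit` is the lavaan object fitted earlier:
# plot_jirou_paths(fit)
```

Long names such as cost_performance may not fit inside the nodes, so the node size might need adjusting as well.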

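This is the very basics, but I now understand the overall workflow, so I would like to keep trying covariance structure analysis.

References

共分散構造分析 R編―構造方程式モデリング (Covariance Structure Analysis, R Edition: Structural Equation Modeling)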
