[Keras] Tutorial 10 - Ensemble

● What is an ensemble in machine learning?

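An ensemble method trains several models and combines their predictions. Bagging and boosting, covered below, usually train many models with the same learning algorithm.

The method starts from the idea that combining several weak learners can perform better than one reasonably good single learner. In other words, models that perform poorly on their own are gathered to build one model that performs well.

Weak Learner :
A learner is a model instantiated (trained) on specific data, and a weak learner is a learner whose results are less accurate than the final combined result.

Random forest used to be the go-to model for a solid baseline score on Kaggle (an international machine learning competition site), but these days XGBoost is reportedly used more often. Both are ensemble methods.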

There are two representative ensemble techniques.

Bagging : Bootstrap Aggregating
Boosting : for example AdaBoost (Adaptive Boosting)

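○ Bagging : Bootstrap aggregating

Parallel, fast

The original data is sampled to create n datasets, n models are trained on them, and the final result is an aggregation of their outputs (typically an average for regression or a majority vote for classification). After sampling, the n models train on their own datasets independently and at the same time, which is why bagging is described as parallel and fast.

First, random samples are drawn with replacement from the target data, and each drawn sample serves as a training set. The same kind of model is trained on each sample, and the predictions of the trained models are aggregated to form the final model.

Bagging is especially helpful against overfitting caused by high variance, because averaging many models smooths out their individual errors. It does much less for underfitting caused by high bias. I will cover this in the next post.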
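
To make the bagging steps above concrete, here is a minimal sketch in plain Python. It is not taken from any library: the helper names (fit_line, bagging_fit, bagging_predict), the toy data (y = 2x + 1 plus noise), and the choice of 25 models are all assumptions made for illustration. Each model is a least-squares line fitted to one bootstrap sample, and the ensemble prediction is the average of the individual predictions.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:  # degenerate sample: every x is identical
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def bagging_fit(points, n_models, rng):
    """Train n_models lines, each on its own bootstrap sample."""
    models = []
    for _ in range(n_models):
        # Sampling WITH replacement, same size as the original data.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_line(sample))
    return models


def bagging_predict(models, x):
    """Aggregate by averaging the predictions of all models."""
    return sum(a * x + b for a, b in models) / len(models)


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)) for x in range(50)]

models = bagging_fit(data, n_models=25, rng=rng)
prediction = bagging_predict(models, 10.0)
print(len(models), prediction)  # the prediction should land near 2*10 + 1 = 21
```

Because each model is fitted independently, the loop inside bagging_fit could be run in parallel, which is the property the "parallel, fast" label refers to.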

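○ Boosting : Adaptive Boosting (AdaBoost)

Sequential, relatively slow

The first model trains on the original dataset as is. The next model also trains on the whole dataset, but puts more weight on the samples the first model got wrong. The model after that focuses on the samples the first two models got wrong, and the process repeats.

For this reason, boosting is better suited than bagging to samples that are hard to predict. The final model is not a single "best" learner: the predictions of all the learners are combined, usually as a weighted vote or sum in which more accurate learners count more. Each model can only set its weights and start training after the previous model has finished, so boosting is described as sequential and slow.

Boosting tends to give high accuracy, but it is correspondingly vulnerable to outliers (outlier : a data point that lies far from the rest, for example a mislabeled or mismeasured value).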
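
The reweighting loop described above can be sketched with AdaBoost, one well-known boosting algorithm. This is a simplified illustration in plain Python, assuming 1-D inputs, labels of +1/-1, and threshold "stumps" as the weak learners. The function names and the toy data are made up for this example.

```python
import math


def stump_predict(x, threshold, sign):
    """Weak learner: predicts `sign` when x >= threshold, otherwise -sign."""
    return sign if x >= threshold else -sign


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost_fit(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # start with equal weight on every sample
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, sign = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, threshold, sign))
        # Misclassified samples get heavier, correct ones get lighter.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Final answer is a weighted vote of ALL stumps, not the last one."""
    score = sum(alpha * stump_predict(x, threshold, sign)
                for alpha, threshold, sign in ensemble)
    return 1 if score >= 0 else -1


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]

ensemble = adaboost_fit(xs, ys, n_rounds=3)
train_accuracy = sum(adaboost_predict(ensemble, x) == y
                     for x, y in zip(xs, ys)) / len(xs)
print(train_accuracy)
```

On this toy data no single stump gets more than 7 of the 10 points right, while the weighted vote of three stumps classifies every training point correctly. Each round needs the weights produced by the previous round, which is why the procedure cannot be parallelized the way bagging can.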

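In addition, there is a technique called Stacking, which combines different kinds of models to build a stronger one. Algorithms such as SVM, RandomForest, and KNN can be used as the base models, and combining them keeps their strengths while compensating for their weaknesses.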

○ Implementing it in Keras

Suppose there are two input datasets, x1 and x2, and two output datasets, y1 and y2. How should the model be structured?

CASE 1
: (x1,x2) ---------------- (y1,y2)
Concatenate (or append) the data into one array and train a single model on it.

# (x1,x2) ---------------- (y1,y2)
 
# 1. Data
import numpy as np
x1 = np.array([range(100), range(311,411), range(100)])
y1 = np.array([range(501,601), range(711,811), range(100)])
 
x2 = np.array([range(100,200), range(311,411), range(100,200)])
y2 = np.array([range(501,601), range(711,811), range(100)])
 
x1 = np.transpose(x1)
y1 = np.transpose(y1)
x2 = np.transpose(x2)
y2 = np.transpose(y2)
 
# x = np.concatenate((x1, x2), axis = 1) # axis=1 joins columns -> (100, 6); axis=0 would stack rows -> (200, 3)
# y = np.concatenate((y1, y2), axis = 1)
x = np.append(x1,x2,axis=1)
y = np.append(y1,y2,axis=1)
 
print('x shape : ', x.shape)
print('y shape : ', y.shape)
 
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=66, test_size=0.4)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, random_state=66, test_size=0.5)
 
# 2. Model definition
from keras.models import Sequential, Model
from keras.layers import Dense, Input
 
input1 = Input(shape=(6,)) 
dense1 = Dense(10, activation='relu')(input1) 
dense2 = Dense(5)(dense1) 
dense3 = Dense(4)(dense2)
output1 = Dense(6)(dense3)
 
model = Model(inputs=input1, outputs=output1)
model.summary()

However, if x1 and x2 are dissimilar and unrelated, as they are here, feeding them to a single model as one merged input like this is unlikely to give good results.
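
As a quick check of the shapes used in CASE 1, the sketch below rebuilds x1 and x2 with NumPy only and compares the two concatenation directions. The variable names side_by_side and stacked are mine. It shows why the model above takes Input(shape=(6,)): joining along axis=1 keeps 100 samples and doubles the number of columns.

```python
import numpy as np

# Same toy inputs as CASE 1, transposed so that each row is one sample.
x1 = np.transpose(np.array([range(100), range(311, 411), range(100)]))
x2 = np.transpose(np.array([range(100, 200), range(311, 411), range(100, 200)]))

side_by_side = np.concatenate((x1, x2), axis=1)  # join columns: more features per sample
stacked = np.concatenate((x1, x2), axis=0)       # join rows: more samples

print(side_by_side.shape)  # (100, 6)
print(stacked.shape)       # (200, 3)

# np.append with an explicit axis gives the same result as np.concatenate.
print(np.array_equal(side_by_side, np.append(x1, x2, axis=1)))  # True
```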

CASE 2
: (x1)--        --(y1)
         MERGE
  (x2)--        --(y2)
Using the ensemble technique

# 1. Data
import numpy as np
x1 = np.array([range(100), range(311,411), range(100)])
y1 = np.array([range(501,601), range(711,811), range(100)])
 
x2 = np.array([range(100,200), range(311,411), range(100,200)])
y2 = np.array([range(501,601), range(711,811), range(100)])
 
x1 = np.transpose(x1)
y1 = np.transpose(y1)
x2 = np.transpose(x2)
y2 = np.transpose(y2)
 
print('x1s shape : ', x1.shape)
print('x2s shape : ', x2.shape)
print('y1s shape : ', y1.shape)
print('y2s shape : ', y2.shape)
 
from sklearn.model_selection import train_test_split
x1_train, x1_test, y1_train, y1_test = train_test_split(x1, y1, random_state=66, test_size=0.4)
x1_test, x1_val, y1_test, y1_val = train_test_split(x1_test, y1_test, random_state=66, test_size=0.5)
x2_train, x2_test, y2_train, y2_test = train_test_split(x2, y2, random_state=66, test_size=0.4)
x2_test, x2_val, y2_test, y2_val = train_test_split(x2_test, y2_test, random_state=66, test_size=0.5)
 
# 2. Build the two sub-models
from keras.models import Sequential, Model
from keras.layers import Dense, Input
 
## model 1
input1 = Input(shape=(3,)) 
dense1 = Dense(5, activation='relu')(input1) 
dense2 = Dense(3)(dense1) 
dense3 = Dense(4)(dense2)
middle1 = Dense(3)(dense3)
 
## model 2
input2 = Input(shape=(3,)) 
dense1 = Dense(5, activation='relu')(input2) 
dense2 = Dense(3)(dense1) 
dense3 = Dense(4)(dense2)
middle2 = Dense(3)(dense3)
 
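# Merge the sub-models with concatenate
# (keras.layers.merge is not available in recent Keras releases; keras.layers exposes concatenate directly)
from keras.layers import concatenate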
merge1 = concatenate([middle1, middle2]) 
 
output1 = Dense(30)(merge1)
output1 = Dense(13)(output1)
output1 = Dense(3)(output1)
 
output2 = Dense(15)(merge1)
output2 = Dense(32)(output2)
output2 = Dense(3)(output2)
 
model = Model(inputs=[input1, input2], outputs=[output1, output2])
 
# 3. Training
model.compile(loss='mse', optimizer='adam', metrics=['mse'])
model.fit([x1_train, x2_train],[y1_train, y2_train], epochs=100, batch_size=1, validation_data=([x1_val, x2_val],[y1_val, y2_val]))
# two inputs and two outputs, each passed as a list
 
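# 4. Evaluation and prediction
mse = model.evaluate([x1_test, x2_test], [y1_test, y2_test], batch_size=1)
# With the Keras version used for this post, evaluate() returns 5 values here:
# [total loss, output1 loss, output2 loss, output1 mse metric, output2 mse metric]
# The total loss is the sum of the two output losses. Since loss and metric are both mse, the metric values repeat the losses.
print('loss(mse) : ', mse)
print('loss(mse) : ', mse[0]) # total loss
print('loss(mse) : ', mse[1]) # output1 loss
print('loss(mse) : ', mse[2]) # output2 loss
print('loss(mse) : ', mse[3]) # output1 mse metric
print('loss(mse) : ', mse[4]) # output2 mse metric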
 
y1_predict, y2_predict = model.predict([x1_test, x2_test])
print('PREDICT : ', y1_predict, y2_predict) # predictions are split per output for RMSE and R2
 
# Compute RMSE
from sklearn.metrics import mean_squared_error
def RMSE(y_test, y_predict):
    return np.sqrt(mean_squared_error(y_test, y_predict))
 
RMSE1=RMSE(y1_test, y1_predict)
RMSE2=RMSE(y2_test, y2_predict)
print('RMSE(y1_test) : ', RMSE1)
print('RMSE(y2_test) : ', RMSE2)
print('AVG(RMSE) : ', (RMSE1+RMSE2)/2)
 
# Compute R2
from sklearn.metrics import r2_score
def R2(y_test, y_predict):
    return r2_score(y_test, y_predict)
 
R2_1 = R2(y1_test, y1_predict)
R2_2 = R2(y2_test, y2_predict)
print('R2(y1_test) : ', R2_1)
print('R2(y2_test) : ', R2_2)
print('AVG(R2) : ', (R2_1+R2_2)/2)
 

## model 1
input1 = Input(shape=(3,))
dense1 = Dense(5, activation='relu')(input1)
dense2 = Dense(3)(dense1)
dense3 = Dense(4)(dense2)
middle1 = Dense(3)(dense3)

Two sub-models like the one above are created, and their last layers (here middle1 and middle2) are joined:

merge1 = concatenate([middle1, middle2])

The concatenate() function combines them into a single merge1 layer.

output1 = Dense(30)(merge1)
output1 = Dense(13)(output1)
output1 = Dense(3)(output1)

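Two output branches like the one above are then created from merge1:

model = Model(inputs=[input1, input2], outputs=[output1, output2])

The final model is built the same way as any functional-API model, with lists for the inputs and outputs.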

Here too, the thing to watch is that the input and output shapes match the data.

x1 = np.array([range(100), range(311,411), range(100)])
y1 = np.array([range(501,601), range(711,811), range(100)])

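x1 = [ [0,1,2....99], [311,312,....410], [0,1,2....99] ] As shown above, x1 is a single array made of three vectors, that is, data with three columns (features). Built this way, however, the array has 3 rows and 100 columns, so transpose() is used to turn it into 100 rows and 3 columns. When matching shapes, the thing that matters most is always the number of columns.

The input data has 3 columns, so the input is declared as input1 = Input(shape=(3,)), and likewise the final output layer must be output1 = Dense(3)(output1) for the model to run.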
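
The shape reasoning above can be checked directly with NumPy. The snippet below is only a small sketch using the same x1 definition as the post. The names raw and n_features are introduced here for illustration.

```python
import numpy as np

raw = np.array([range(100), range(311, 411), range(100)])
print(raw.shape)          # (3, 100): three vectors of 100 values each

x1 = np.transpose(raw)
print(x1.shape)           # (100, 3): 100 samples, 3 columns

n_features = x1.shape[1]  # the number that goes into Input(shape=(n_features,))
print(n_features)         # 3
print(x1[0].tolist())     # [0, 311, 0]: the first sample takes one value from each vector
```

The same check on y1 gives 3 columns as well, which is why the last layer of each output branch is Dense(3).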

Result

# RESULT
# loss(mse) :  [5.6742421872257864e-08, 1.2058405829407093e-08, 4.46840218160105e-08, 1.2058405829407093e-08, 4.46840218160105e-08]
# loss(mse) :  5.6742421872257864e-08
# loss(mse) :  1.2058405829407093e-08
# loss(mse) :  4.46840218160105e-08
# loss(mse) :  1.2058405829407093e-08
# loss(mse) :  4.46840218160105e-08
# PREDICT :  
# [[5.0499997e+02 7.1499988e+02 4.0001001e+00]
#  [5.9499994e+02 8.0500006e+02 9.4000084e+01]
#  [5.3899994e+02 7.4900000e+02 3.8000111e+01]
#  [5.0899988e+02 7.1899994e+02 8.0000963e+00]
#  [5.7299994e+02 7.8299988e+02 7.2000107e+01]
#  [5.8500006e+02 7.9500018e+02 8.4000145e+01]
#  [5.0099991e+02 7.1099994e+02 8.8636763e-05]
#  [5.1899994e+02 7.2900000e+02 1.8000116e+01]
#  [5.3399994e+02 7.4399994e+02 3.3000069e+01]
#  [5.2499994e+02 7.3499994e+02 2.4000057e+01]
#  [5.4399994e+02 7.5399994e+02 4.3000053e+01]
#  [5.7400000e+02 7.8400018e+02 7.3000107e+01]
#  [5.8099994e+02 7.9099994e+02 8.0000122e+01]
#  [5.9299988e+02 8.0300000e+02 9.2000153e+01]
#  [5.2599994e+02 7.3599994e+02 2.5000109e+01]
#  [5.8599988e+02 7.9600006e+02 8.5000191e+01]
#  [5.1499982e+02 7.2499982e+02 1.4000044e+01]
#  [5.5099988e+02 7.6100000e+02 5.0000076e+01]
#  [5.2999994e+02 7.4000000e+02 2.9000120e+01]
#  [5.4999994e+02 7.6000006e+02 4.9000084e+01]] [[5.0499969e+02 7.1499969e+02 4.0001326e+00]
#  [5.9499982e+02 8.0499988e+02 9.3999962e+01]
#  [5.3899976e+02 7.4899963e+02 3.8000027e+01]
#  [5.0899979e+02 7.1899969e+02 7.9999814e+00]
#  [5.7299988e+02 7.8299982e+02 7.1999947e+01]
#  [5.8499982e+02 7.9500000e+02 8.3999931e+01]
#  [5.0099979e+02 7.1099957e+02 7.2337687e-05]
#  [5.1899982e+02 7.2899963e+02 1.8000074e+01]
#  [5.3399976e+02 7.4399963e+02 3.3000027e+01]
#  [5.2499982e+02 7.3499969e+02 2.4000084e+01]
#  [5.4399976e+02 7.5399982e+02 4.2999954e+01]
#  [5.7399994e+02 7.8399994e+02 7.2999992e+01]
#  [5.8099976e+02 7.9099969e+02 8.0000084e+01]
#  [5.9299982e+02 8.0299976e+02 9.2000038e+01]
#  [5.2599982e+02 7.3599969e+02 2.4999977e+01]
#  [5.8599976e+02 7.9599969e+02 8.5000000e+01]
#  [5.1499976e+02 7.2499963e+02 1.3999987e+01]
#  [5.5099982e+02 7.6099963e+02 5.0000027e+01]
#  [5.2999976e+02 7.3999969e+02 2.9000111e+01]
#  [5.4999982e+02 7.5999976e+02 4.9000050e+01]]
# RMSE(y1_test) :  9.531085295207488e-05
# RMSE(y2_test) :  0.00021100277583431233
# AVG(RMSE) :  0.0001531568143931936
# R2(y1_test) :  0.9999999999902099
# R2(y2_test) :  0.9999999999520176
# AVG(R2) :  0.9999999999711138
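
The five loss(mse) values in the result can be read as one total loss, two per-output losses, and two per-output mse metrics. The sketch below checks that reading against the numbers reported above. The helper split_evaluate_output and the assumed ordering (total, then losses, then metrics) are my own interpretation of the output, not something the post states, and the exact layout may differ between Keras versions.

```python
import math


def split_evaluate_output(values, n_outputs):
    """Assumed layout: [total, loss per output..., metric per output...]."""
    total = values[0]
    losses = values[1:1 + n_outputs]
    metrics = values[1 + n_outputs:1 + 2 * n_outputs]
    return total, losses, metrics


# Values copied from the RESULT block above.
reported = [5.6742421872257864e-08, 1.2058405829407093e-08, 4.46840218160105e-08,
            1.2058405829407093e-08, 4.46840218160105e-08]

total, losses, metrics = split_evaluate_output(reported, n_outputs=2)

# The total matches the sum of the per-output losses up to float32 rounding.
print(math.isclose(total, sum(losses), rel_tol=1e-6))  # True
# Loss and metric are both mse, so the two groups repeat each other.
print(losses == metrics)  # True
```

Under the same reading, a model with three outputs would report 1 + 3 + 3 = 7 values.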

CASE 3
: (x1)--        --(y1)
         MERGE  --(y2)
  (x2)--        --(y3)
Using the ensemble technique

This time the model has 2 inputs and 3 outputs. In practice, models with 2 inputs and 1 output are said to be more common.

# 1. Data
import numpy as np
x1 = np.array([range(100), range(311,411), range(100)])
y1 = np.array([range(501,601), range(711,811), range(100)])
 
x2 = np.array([range(100,200), range(311,411), range(100,200)])
y2 = np.array([range(501,601), range(711,811), range(100)])
y3 = np.array([range(401,501), range(211,311), range(100)])
 
x1 = np.transpose(x1)
y1 = np.transpose(y1)
x2 = np.transpose(x2)
y2 = np.transpose(y2)
y3 = np.transpose(y3)
 
from sklearn.model_selection import train_test_split
x1_train, x1_test, y1_train, y1_test = train_test_split(x1, y1, random_state=66, test_size=0.4)
x1_test, x1_val, y1_test, y1_val = train_test_split(x1_test, y1_test, random_state=66, test_size=0.5)
 
x2_train, x2_test, y2_train, y2_test = train_test_split(x2, y2, random_state=66, test_size=0.4)
x2_test, x2_val, y2_test, y2_val = train_test_split(x2_test, y2_test, random_state=66, test_size=0.5)
 
y3_train, y3_test = train_test_split(y3, random_state=66, test_size=0.4)
y3_test, y3_val = train_test_split(y3_test, random_state=66, test_size=0.5)
 
 
# 2. Build the two sub-models
from keras.models import Sequential, Model
from keras.layers import Dense, Input
 
input1 = Input(shape=(3,)) 
dense1 = Dense(5, activation='relu')(input1) 
dense2 = Dense(3)(dense1) 
dense3 = Dense(4)(dense2)
middle1 = Dense(3)(dense3)
 
input2 = Input(shape=(3,)) 
dense1 = Dense(5, activation='relu')(input2) 
dense2 = Dense(3)(dense1) 
dense3 = Dense(4)(dense2)
middle2 = Dense(3)(dense3)
 
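# Merge the sub-models with concatenate
# (keras.layers.merge is not available in recent Keras releases; keras.layers exposes concatenate directly)
from keras.layers import concatenate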
merge1 = concatenate([middle1, middle2]) 
 
output1 = Dense(30)(merge1)
output1 = Dense(13)(output1)
output1 = Dense(3)(output1)
 
output2 = Dense(15)(merge1)
output2 = Dense(32)(output2)
output2 = Dense(3)(output2)
 
output3 = Dense(20)(merge1)
output3 = Dense(5)(output3)
output3 = Dense(3)(output3) # output 3
 
model = Model(inputs=[input1, input2], outputs=[output1, output2, output3])
 
# 3. Training
model.compile(loss='mse', optimizer='adam', metrics=['mse'])
model.fit([x1_train, x2_train],[y1_train, y2_train, y3_train], epochs=100,
            batch_size=1, validation_data=([x1_val, x2_val],[y1_val, y2_val, y3_val])) 
 
# 4. Evaluation and prediction
mse = model.evaluate([x1_test, x2_test], [y1_test, y2_test, y3_test], batch_size=1)
# evaluate() returns 7 values here: total loss + 3 per-output losses + 3 per-output mse metrics
print('loss(mse) : ', mse)
# print('loss(mse) : ', mse[0])
# print('loss(mse) : ', mse[1])
# print('loss(mse) : ', mse[2])
# print('loss(mse) : ', mse[3])
# print('loss(mse) : ', mse[4])
 
y1_predict, y2_predict, y3_predict = model.predict([x1_test, x2_test])
print('PREDICT : ', y1_predict, y2_predict, y3_predict) # predictions are split per output for RMSE and R2
 
# Compute RMSE
from sklearn.metrics import mean_squared_error
def RMSE(y_test, y_predict):
    return np.sqrt(mean_squared_error(y_test, y_predict))
 
RMSE1=RMSE(y1_test, y1_predict)
RMSE2=RMSE(y2_test, y2_predict)
RMSE3=RMSE(y3_test, y3_predict)
print('RMSE(y1_test) : ', RMSE1)
print('RMSE(y2_test) : ', RMSE2)
print('RMSE(y3_test) : ', RMSE3)
print('AVG(RMSE) : ', (RMSE1+RMSE2+RMSE3)/3)
 
# Compute R2
from sklearn.metrics import r2_score
def R2(y_test, y_predict):
    return r2_score(y_test, y_predict)
 
R2_1 = R2(y1_test, y1_predict)
R2_2 = R2(y2_test, y2_predict)
R2_3 = R2(y3_test, y3_predict)
print('R2(y1_test) : ', R2_1)
print('R2(y2_test) : ', R2_2)
print('R2(y3_test) : ', R2_3)
print('AVG(R2) : ', (R2_1+R2_2+R2_3)/3)

Result

# RESULT
# RMSE(y1_test) :  0.5913385228569722
# RMSE(y2_test) :  0.5735123463992801
# RMSE(y3_test) :  0.874459850466958
# AVG(RMSE) :  0.6797702399077368
# R2(y1_test) :  0.9996231426238475
# R2(y2_test) :  0.9996455212388652
# R2(y3_test) :  0.9991758914414962
# AVG(R2) :  0.9994815184347363
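
The RMSE and R2 numbers above come from scikit-learn's mean_squared_error and r2_score. As a rough sketch of what those two metrics measure, here they are written out in plain Python for a single output column. The toy values in y_true and y_pred are made up for illustration and are not taken from the model above.

```python
import math


def rmse(y_true, y_pred):
    """Root of the mean squared difference between targets and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(rmse(y_true, y_pred))  # sqrt(0.125), about 0.354
print(r2(y_true, y_pred))    # 1 - 0.5 / 20 = 0.975
```

An RMSE close to 0 and an R2 close to 1 both indicate a good fit, which is the pattern in the results above. For targets with several columns, r2_score averages the per-column scores by default.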

https://wikidocs.net/book/9214

https://keras.io/api/

License: Attribution, NonCommercial, NoDerivatives
