Regularization and Validation

배채 2025. 4. 23. 08:19

✅ Week 7: Regularization and Validation Summary

1. 🎯 Challenges in Machine Learning

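Problem	Description
Unrepresentative training data	The training data does not represent the full data distribution well
Low-quality data	The data contains many errors, outliers, and noise
Irrelevant features	Many uninformative variables degrade performance
Overfitting	The model fits the training data too closely and generalizes poorly to test data
Underfitting	The model is too simple to capture the complexity of the data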
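
The overfitting and underfitting rows above can be illustrated with a small sketch. The synthetic data (noisy samples of a sine curve) and the polynomial degrees 1, 4, and 10 are assumptions chosen for illustration and do not come from the course material. Training error can only go down as the polynomial degree grows, while test error typically rises again once the model starts fitting noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): noisy samples of sin(pi * x).
x_train = rng.uniform(-1, 1, 20)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(np.pi * x_test) + rng.normal(0, 0.2, 200)


def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse


# Degree 1 is likely too simple (underfitting); degree 10 is likely too flexible
# for 20 points (overfitting); degree 4 sits in between.
results = {degree: fit_and_score(degree) for degree in (1, 4, 10)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

A large gap between a low training error and a higher test error is the usual sign of overfitting; high error on both is the usual sign of underfitting.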

2. 🧠 Regularization Techniques

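Technique	Description
L1 regularization (Lasso)	Penalizes the sum of the absolute values of the weights, which drives some weights to exactly 0 and so acts as feature selection
L2 regularization (Ridge)	Penalizes the sum of the squared weights, which keeps all weights small without zeroing them
Dropout	Randomly switches off neurons during training to prevent overfitting
Early Stopping	Stops training early once validation performance no longer improves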
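
The difference between the L1 and L2 rows above can be seen in a minimal sketch using scikit-learn's `Lasso` and `Ridge`. The synthetic regression data (20 features, of which only the first 3 influence the target) and the `alpha` values are assumptions made for illustration, not settings from the course.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Synthetic data (assumed): only the first 3 of 20 features affect the target.
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [3.0, -2.0, 1.5]
y = X @ true_w + rng.normal(scale=0.5, size=200)

# Lasso adds an L1 penalty term alpha * sum(|w|) to the squared-error loss.
lasso = Lasso(alpha=0.1).fit(X, y)
# Ridge adds an L2 penalty term alpha * sum(w^2) to the squared-error loss.
ridge = Ridge(alpha=1.0).fit(X, y)

n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
print("Weights set exactly to 0 by Lasso:", n_zero_lasso)
print("Weights set exactly to 0 by Ridge:", n_zero_ridge)
```

Lasso is expected to zero out most of the uninformative weights, while Ridge only shrinks them toward zero.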

3. 🔍 Validation Methods

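Method	Description
Hold-out	Splits the data once into training, validation, and test sets
k-fold CV	Splits the data into k equal folds and uses each fold in turn for validation while training on the remaining k-1 folds
Leave-p-out CV	Holds out p samples for validation and repeats over every possible choice of p samples; useful when data is scarce, although the number of splits grows quickly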
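
The hold-out and leave-p-out rows above can be sketched with scikit-learn's `train_test_split` and `LeavePOut`. The 10-sample toy array and the 60/20/20 split ratio are assumptions made for illustration.

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePOut, train_test_split

# Toy data (assumed): 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold-out: split off 20% as the test set, then 25% of the remainder
# (20% of the total) as the validation set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)
print("train/val/test sizes:", len(X_train), len(X_val), len(X_test))

# Leave-p-out: every possible subset of p samples serves once as the validation set.
lpo = LeavePOut(p=2)
n_splits = lpo.get_n_splits(X)
print("leave-2-out splits:", n_splits, "expected C(10, 2) =", comb(10, 2))
```

With n samples, leave-p-out produces C(n, p) splits, so it is practical only for small datasets or small p.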

✅ Practice Assignments

Digit prediction + Dropout exercise
Digit prediction + k-fold Cross Validation exercise

📌 MLP example with Dropout (Keras)

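from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical

# Load MNIST, flatten each 28x28 image to a 784-dim vector, and scale to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
# One-hot encode the digit labels (10 classes)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Define the model: two hidden layers, each followed by Dropout
model = Sequential()
model.add(Input(shape=(784,)))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))  # drop 30% of the units during training
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))

# Compile and train; the last 20% of the training data is held out for validation
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

# Evaluate on the test set, which was not used during training
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy:", test_acc)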
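
What a `Dropout(0.3)` layer does to its inputs can be sketched in plain NumPy. This is a simplified illustration of inverted dropout, not Keras's actual implementation, and the `dropout` helper and its parameters are hypothetical names. During training a random 30% of the units are zeroed and the survivors are scaled by 1 / (1 - 0.3) so the expected activation stays the same. At inference time the layer passes its input through unchanged.

```python
import numpy as np


def dropout(activations, rate, training, rng):
    """Inverted dropout (hypothetical helper): zero a random fraction `rate` of units."""
    if not training or rate == 0.0:
        return activations  # inference: the layer acts as the identity
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True means the unit is kept
    # Scale kept units by 1 / keep_prob so the expected activation is unchanged.
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
h = np.ones((1000, 512), dtype=np.float32)  # stand-in for hidden-layer outputs

h_train = dropout(h, rate=0.3, training=True, rng=rng)
h_eval = dropout(h, rate=0.3, training=False, rng=rng)

dropped_fraction = float(np.mean(h_train == 0))
print("fraction of units dropped:", round(dropped_fraction, 3))
print("mean activation (train):", round(float(h_train.mean()), 3))
print("mean activation (eval):", float(h_eval.mean()))
```

Because a different random subset of units is removed at every step, the network cannot rely on any single unit, which is why dropout acts as a regularizer.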

📌 k-fold Cross Validation example (Scikit-learn)

from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

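# Load the 8x8 handwritten digits dataset
digits = load_digits()
X, y = digits.data, digits.target

# Build a pipeline: standardize the features, then fit logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold CV with shuffling; the scaler is refit on the training folds of each split
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')

# Print the results
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())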
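
To make explicit what `cross_val_score` does in the example above, the same 5-fold procedure can be written as a manual loop. This is a sketch of the idea rather than scikit-learn's internal code; the variable names `manual_scores` and `library_scores` are chosen here for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

manual_scores = []
for train_idx, val_idx in kfold.split(X):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    # Fit on k-1 folds; the scaler only ever sees the training portion.
    fold_model.fit(X[train_idx], y[train_idx])
    # Score (accuracy) on the single held-out fold.
    manual_scores.append(fold_model.score(X[val_idx], y[val_idx]))
manual_scores = np.array(manual_scores)

library_scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
print("manual :", manual_scores)
print("library:", library_scores)
```

Keeping the scaler inside the pipeline matters: fitting it on the full dataset before splitting would leak information from the validation fold into training.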
