[Data Science Application] Probability Review: Information Theory (Entropy Concepts and Cross Entropy)


What does it mean? Answering that is the main goal of this topic.

#1. Measuring information

Shannon's definition of the information obtained from the outcome s_i of a probabilistic experiment S:

$$I(s_i) = \log_2 \frac{1}{p(s_i)} = -\log_2 p(s_i)$$

The unit of measurement is the bit (binary digit).
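A minimal Python sketch of the definition above, assuming the probability is given as a float in (0, 1]; the function name `self_information` is illustrative, not from the source.

```python
import math

def self_information(p: float) -> float:
    """Information (in bits) gained by observing an outcome of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # log2(1/p): rarer outcomes carry more information
    return math.log2(1.0 / p)

print(self_information(0.5))   # fair coin: 1.0 bit
print(self_information(0.25))  # one of four equally likely outcomes: 2.0 bits
print(self_information(1.0))   # a certain outcome: 0.0 bits
```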

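Let's break entropy down into a simpler concept.

Suppose a machine tosses one coin; the outcomes converge to the probabilities H = 0.5 and T = 0.5.

-> The number of bits (yes/no binary answers) needed to determine the outcome (H, T) is 1.

Suppose a machine tosses two coins; the outcomes converge to the probabilities HH = 0.25, HT = 0.25, TH = 0.25, TT = 0.25.

-> The number of bits needed to determine the outcome (HH, TT, HT, TH) is 2.

Suppose a machine tosses three coins (HHH, TTT, HTH, HHT, TTH, THT, THH, HTT). There are 2^3 = 8 equally likely outcomes, so the number of bits needed to determine the outcome is log2(8) = 3.

Here, the number of bits is the minimum number of bits needed to communicate the result of the coin toss. This amount of information, measured in bits, is called "entropy".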
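A small sketch that enumerates the equally likely outcomes for 1, 2, and 3 coins and averages the information per outcome, assuming fair, independent coins; `coin_entropy` is a hypothetical helper name.

```python
import itertools
import math

def coin_entropy(num_coins: int):
    """Return all outcomes for num_coins fair coins and the average bits per outcome."""
    outcomes = ["".join(o) for o in itertools.product("HT", repeat=num_coins)]
    p = 1 / len(outcomes)  # every outcome is equally likely
    bits = sum(p * math.log2(1 / p) for _ in outcomes)
    return outcomes, bits

for n in (1, 2, 3):
    outcomes, bits = coin_entropy(n)
    print(n, len(outcomes), bits)
# prints 1 2 1.0, then 2 4 2.0, then 3 8 3.0
```

Each extra coin doubles the number of outcomes and adds exactly one bit.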

#2. Expected information as Uncertainty or Entropy

Entropy (as I understand it) is the expected information: the information of each outcome, weighted by its probability and summed over all outcomes.

For example, a fair coin toss reveals, on average, 1 bit of information.

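As another example, take a standard N = 52-card deck, where each card is drawn with probability 1/52.

So, for an event comprising M (mutually exclusive) outcomes, the probability is M/52 and the information is log2(52/M) bits.

Going deeper, the deck is divided into four suits (hearts, spades, diamonds, clubs), so each rank (e.g., the ace) appears M = 4 times.

So, learning the rank of the card gives N = 52, M = 4, info = log2(52/4) = log2(13) ≈ 3.7 bits. (Learning only the suit, M = 13, gives log2(52/13) = 2 bits.)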
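A quick check of the card arithmetic, assuming every card is equally likely; `event_information` is a hypothetical helper name.

```python
import math

N = 52  # cards in a standard deck

def event_information(m: int, n: int = N) -> float:
    """Bits gained by learning that an event made of m of n equally likely outcomes occurred."""
    return math.log2(n / m)

print(round(event_information(1), 2))   # a specific card: log2(52) ≈ 5.7 bits
print(round(event_information(4), 2))   # a specific rank (4 cards): log2(13) ≈ 3.7 bits
print(round(event_information(13), 2))  # a specific suit (13 cards): log2(4) = 2.0 bits
```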

H(S) is the expected value of the information obtained by learning the outcome:

$$H(S) = \sum_i p(s_i)\, I(s_i) = \sum_i p(s_i) \log_2 \frac{1}{p(s_i)}$$

If all the p(s_i) are equal (p(s_i) = 1/N for N outcomes), then

$$H(S) = \sum_{i=1}^{N} \frac{1}{N} \log_2 N = \log_2 N$$
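A minimal sketch of H(S), assuming the distribution is a plain list of probabilities that sums to 1; `entropy` is an illustrative name. It checks the equal-probability case against log2(N) and shows that a skewed distribution has lower entropy.

```python
import math

def entropy(probs):
    """H(S) = sum_i p_i * log2(1 / p_i), in bits; zero-probability outcomes contribute 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n
print(entropy(uniform), math.log2(n))  # 3.0 3.0: equal probabilities give log2(N)

skewed = [0.5, 0.25, 0.125, 0.125]
print(entropy(skewed))  # 1.75 bits, less than log2(4) = 2
```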

The binary entropy function (an experiment with two outcomes, one with probability p) is one of the most famous examples:

$$H(p) = -p\log_2 p - (1-p)\log_2(1-p)$$

Toward either end (p → 0 or p → 1) the uncertainty decreases, and it is highest at p = 0.5 (1 bit).
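A short sketch that tabulates the binary entropy function, defining H(0) = H(1) = 0 by the usual convention 0 · log 0 = 0; `binary_entropy` is an illustrative name.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p={p:.1f}  H={binary_entropy(p):.3f}")
```

The values are symmetric around p = 0.5, where H reaches its maximum of 1 bit.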

#2. Cross Entropy

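Cross entropy originally comes from information theory. Suppose you want to transmit information about the weather every day efficiently. If there are eight options (sunny, rainy, cloudy, etc.), you could encode each option using 3 bits, since 2^3 = 8. However, if it is sunny almost every day, it is more efficient to encode "sunny" with just one bit (0) and the other seven options with 4 bits each (starting with a 1). Cross entropy measures the average number of bits you actually send per option. If your assumption about the weather is perfect, the cross entropy equals the entropy of the weather itself (i.e., its intrinsic unpredictability). But if your assumption is wrong (e.g., if it rains often), the cross entropy will be larger by an amount called the Kullback-Leibler (KL) divergence.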

In other words, cross entropy is the sum of the intrinsic entropy and the Kullback-Leibler divergence:

$$H(P, Q) = H(P) + D_{KL}(P \,\|\, Q)$$
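A sketch of the weather example, assuming made-up "true" weather distributions (the probabilities below are illustrative, not from the source). It computes the average number of bits actually sent with the 1-bit/4-bit code, i.e. the sum of P(x) · len(x), which is the cross entropy when len(x) = -log2 Q(x) for the assumed distribution Q behind the code.

```python
import math

# Prefix code from the example: "sunny" -> "0" (1 bit),
# the other seven options -> "1" followed by 3 bits (4 bits total).
code_lengths = [1] + [4] * 7

def average_bits(true_probs, lengths):
    """Average number of bits actually sent per day under the true distribution."""
    return sum(p * n for p, n in zip(true_probs, lengths))

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Assumed true distributions (made-up numbers for illustration).
mostly_sunny = [0.79] + [0.03] * 7        # index 0 = "sunny"
often_rainy = [0.30, 0.40] + [0.05] * 6   # index 1 = "rain"

for name, probs in [("mostly sunny", mostly_sunny), ("often rainy", often_rainy)]:
    print(f"{name}: {average_bits(probs, code_lengths):.2f} bits sent "
          f"(fixed code: 3 bits, entropy: {entropy(probs):.2f} bits)")
```

When the assumption fits (mostly sunny), the variable-length code beats the fixed 3-bit code; when it rains often, it costs more than 3 bits per day.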

Source: Hands-On Machine Learning, 2nd edition, by Aurélien Géron (Korean translation by Haesun Park)

We have to consider two distributions: the true distribution P, and the predicted distribution Q for which our coding scheme is optimized.

The code lengths are based on Q, while the true distribution P supplies the weights for calculating the expected code length:

$$H(P, Q) = -\sum_x P(x) \log_2 Q(x)$$
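A minimal sketch of H(P, Q) in bits, assuming two small made-up distributions P (true) and Q (predicted) over the same three outcomes. It checks the decomposition H(P, Q) = H(P) + D_KL(P || Q) numerically.

```python
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log2 Q(x): expected code length when the code is built for Q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log2(P(x) / Q(x))."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.25, 0.25]  # assumed true distribution
Q = [0.25, 0.25, 0.5]  # assumed predicted distribution

print(entropy(P))           # 1.5
print(cross_entropy(P, Q))  # 1.75
print(kl_divergence(P, Q))  # 0.25, so 1.5 + 0.25 = 1.75
```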

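In the case of multiclass classification with n classes:

$$H(P, Q) = -\sum_{i=1}^{n} P_i \log Q_i$$

P_i comes from the data (the true label distribution).

log Q_i comes from the model (its predicted probabilities).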
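A sketch for a single example, assuming a one-hot label for P and the natural log (so the result is in nats, the usual convention in ML libraries); `categorical_cross_entropy` and the `eps` clamp are illustrative choices, not from the source. With a one-hot P, only the true class term survives.

```python
import math

def categorical_cross_entropy(p_true, q_pred, eps=1e-12):
    """-sum_i P_i * log(Q_i) for one example, using the natural log (nats)."""
    # eps guards against log(0) when the model assigns zero probability.
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_true, q_pred))

p_true = [0.0, 1.0, 0.0]     # one-hot label: the example belongs to class 1
good_pred = [0.1, 0.8, 0.1]  # confident and correct
bad_pred = [0.6, 0.2, 0.2]   # prefers the wrong class

print(categorical_cross_entropy(p_true, good_pred))  # -log(0.8) ≈ 0.223
print(categorical_cross_entropy(p_true, bad_pred))   # -log(0.2) ≈ 1.609
```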

After normalizing, we obtain the cost function,

which can then be expanded,

and rearranged into equivalent forms for calculation.
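A hedged sketch of this step, assuming "normalization" means softmax-normalizing the model's raw scores (logits) and that the cost averages the loss over m training examples with integer class labels. The second function shows one common rearrangement, -z_y + log(sum_j exp(z_j)), which avoids computing the softmax explicitly. All names and numbers are illustrative.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_cross_entropy(batch_logits, labels):
    """J = -(1/m) * sum_k log softmax(z^(k))[y^(k)] over m examples."""
    losses = [-math.log(softmax(z)[y]) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

def mean_cross_entropy_logsumexp(batch_logits, labels):
    """Same cost, rearranged per example as log(sum_j exp(z_j)) - z_y."""
    total = 0.0
    for z, y in zip(batch_logits, labels):
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(zj - m) for zj in z))
        total += log_sum_exp - z[y]
    return total / len(batch_logits)

batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]  # made-up scores for m = 2 examples
labels = [0, 1]                                     # true class index per example

print(mean_cross_entropy(batch_logits, labels))
print(mean_cross_entropy_logsumexp(batch_logits, labels))  # same value up to rounding
```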

#3. Output types depend on the type of distribution
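As a hedged sketch of one common pairing between output distribution, output unit, and loss (an assumption for illustration, not necessarily the course's table): a Bernoulli output uses a sigmoid unit with binary cross entropy, a categorical output uses softmax with categorical cross entropy, and a Gaussian output uses a linear unit, whose negative log-likelihood is squared error plus a constant. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def bernoulli_nll(y, z):
    """Binary output (Bernoulli): sigmoid unit, loss = binary cross entropy."""
    p = sigmoid(z)  # note: very large |z| would make log(0) fail in this simple sketch
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def multinoulli_nll(y, logits):
    """Multiclass output (categorical): softmax unit, loss = categorical cross entropy."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[y]

def gaussian_nll(y, y_hat, sigma=1.0):
    """Real-valued output (Gaussian): linear unit, loss = squared error plus a constant."""
    return 0.5 * ((y - y_hat) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))

print(bernoulli_nll(1, 0.0))                # log 2 ≈ 0.693, since sigmoid(0) = 0.5
print(multinoulli_nll(2, [0.0, 0.0, 0.0]))  # log 3 ≈ 1.099, uniform over 3 classes
print(gaussian_nll(1.0, 1.0))               # only the constant 0.5 * log(2π) ≈ 0.919 remains
```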

#4. Summary

* We use the cross entropy between

1) the training data (empirical distribution) and

2) the model's prediction (model distribution) as the cost function: minimizing it is the same as minimizing the negative log-likelihood, i.e., maximum likelihood estimation (MLE)

* Learning (estimation) term

* Regularization term: weight decay term (penalty term)

* Depending on the output layer, the cross entropy formula changes

From: Data Science Application, Spring semester, Prof. Simon Woo, SKKU, sub-textbook, 2020
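A minimal sketch of a cost function with both terms from the summary, assuming logistic regression (sigmoid output, so the learning term is the mean binary cross entropy) and an L2 weight decay penalty (lam / 2) * ||w||^2. The model choice, `lam`, and all data are illustrative assumptions.

```python
import math

def regularized_cost(weights, bias, xs, ys, lam):
    """Cost = mean binary cross entropy (learning term) + (lam / 2) * ||w||^2 (weight decay)."""
    data_term = 0.0
    for x, y in zip(xs, ys):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        p = 1 / (1 + math.exp(-z))
        data_term += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    data_term /= len(xs)
    penalty = 0.5 * lam * sum(w * w for w in weights)  # the bias is not penalized here
    return data_term + penalty

weights = [0.5, -0.25]          # made-up parameters
bias = 0.0
xs = [[1.0, 2.0], [-1.0, 0.5]]  # two made-up training inputs
ys = [1, 0]                     # binary labels

print(regularized_cost(weights, bias, xs, ys, lam=0.0))  # learning term only
print(regularized_cost(weights, bias, xs, ys, lam=0.1))  # plus the weight decay penalty
```

Larger `lam` pushes the optimizer toward smaller weights, trading some training fit for a simpler model.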

#5. Constraints and regularization

* Goal: reduce the test error, possibly at the expense of increased training error

1) Penalty terms and constraints: designed to encode specific kinds of prior knowledge

2) Express a generic preference for a simpler model class in order to promote generalization

3) Make underdetermined problems determined

4) Ensemble methods: combine multiple hypotheses that explain the training data

In other words, a large model that is deep enough and properly regularized can achieve lower generalization error.
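A hedged illustration of point 3), assuming least squares with an L2 penalty (ridge regression) as the regularizer; this technique is swapped in for illustration and is not necessarily the course's example. With one training example and two parameters, infinitely many weight vectors fit the data, so X^T X is singular. Adding lam * I makes the system solvable with a unique answer. `solve_2x2` and `normal_equations` are hypothetical helpers.

```python
def solve_2x2(a, b):
    """Solve the 2x2 system a @ w = b with Cramer's rule; return None if singular."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        return None
    w0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    w1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [w0, w1]

# One training example, two parameters: y = w0 * x0 + w1 * x1 with x = (1, 1), y = 2.
# Infinitely many (w0, w1) fit it exactly, so the problem is underdetermined.
x = [1.0, 1.0]
y = 2.0

def normal_equations(lam):
    """Build (X^T X + lam * I) and X^T y for the single example above."""
    xtx = [[x[i] * x[j] + (lam if i == j else 0.0) for j in range(2)] for i in range(2)]
    xty = [x[i] * y for i in range(2)]
    return xtx, xty

print(solve_2x2(*normal_equations(0.0)))  # None: X^T X is singular
print(solve_2x2(*normal_equations(0.1)))  # a unique solution with w0 == w1 = 2 / 2.1
```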

From: Data Science Application, Spring semester, Prof. Simon Woo, SKKU, sub-textbook, 2020

Reference link 1: https://homl.info/xentropy

Reference link 2: Christopher Olah, "Visual Information Theory" (colah's blog, 2015), https://colah.github.io/posts/2015-09-Visual-Information/


Reference link 3: https://towardsdatascience.com/entropy-cross-entropy-kl-divergence-binary-cross-entropy-cb8f72e72e65

Further reading: cross-entropy loss (also called log loss)

This post will be updated as I keep studying the topic.

If anything needs correction, feel free to let me know.


