ÔØÈëÖС£¡£¡£ 'S bLog
 
ÔØÈëÖС£¡£¡£
 
ÔØÈëÖС£¡£¡£
ÔØÈëÖС£¡£¡£
ÔØÈëÖС£¡£¡£
ÔØÈëÖС£¡£¡£
ÔØÈëÖС£¡£¡£
 
ÌîдÄúµÄÓʼþµØÖ·£¬¶©ÔÄÎÒÃǵľ«²ÊÄÚÈÝ£º


 
Clustering Algorithm Tutorial (4): Gaussian Mixture Clustering
[ 2012/11/8 15:48:00 | By: 梦翔儿 ]
 

Introduction to Model-Based Clustering
Another family of clustering methods is model-based: a specific model is assumed for the clusters, and we try to optimise the fit between the model and the actual data.
In practice, each cluster can be represented mathematically by a parametric distribution, such as a Gaussian (continuous) or a Poisson (discrete). The entire data set is then modelled as a mixture of these distributions; an individual distribution used to model one particular cluster is usually called a component distribution.

A mixture model with high likelihood tends to have the following traits:

  • the component distributions have high "peaks" (the data in each cluster are tight);
  • the mixture model "covers" the data well (the dominant patterns in the data are captured by the component distributions).

The main advantages of model-based clustering are:

  • well-studied statistical inference techniques are available;
  • flexibility in choosing the component distribution;
  • an estimated density is obtained for each cluster;
  • a "soft" classification is available.

 

Gaussian Mixture

Basic idea: assume that the whole data set follows a Gaussian mixture distribution, and regard the points to be clustered as samples drawn from that distribution. The parameters of the Gaussians are then estimated from these samples by a maximum-likelihood-style procedure, and once the parameters are known we immediately obtain the membership function of each data point with respect to each cluster.

Within this family, the Gaussian mixture model is by far the most widely used: each cluster can in fact be viewed as a Gaussian distribution centred on its centroid, as in the figure below, where the grey circles mark the main mass of each distribution:

The algorithm that generates the data works as follows (a small sketch of this sampling process is given after the list):

  • randomly pick a component (one of the Gaussians) with its mixing probability;
  • sample a point from that component's distribution.
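
A minimal MATLAB sketch of this generative process, assuming the mixing weights Weight, the means Mu and the covariances Sigma are already given in the same layout used by the code later in this post (randsample and mvnrnd are Statistics Toolbox functions):

R = 500;                                     % number of points to draw (arbitrary)
K = numel(Weight);                           % number of components
d = size(Mu,2);                              % dimension of the data
X = zeros(R,d);
for r = 1:R
    i = randsample(K,1,true,Weight);         % pick component i with probability Weight(i)
    X(r,:) = mvnrnd(Mu(i,:),Sigma(:,:,i));   % sample one point from that Gaussian
end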

Suppose we have N data points:

  • x1, x2, ..., xN

Each point is drawn independently from the mixture density, so we can obtain the likelihood of the sample: L = \prod_{k=1}^{N} p(x_k). What we really want to maximise is p(x_k \mid \mu_1, \ldots, \mu_K) (the probability of a datum given the centres of the Gaussians).

p(x_k \mid \mu_1, \ldots, \mu_K) = \sum_{i=1}^{K} \pi_i \, N(x_k \mid \mu_i, \Sigma_i) is the base to write the likelihood function:

L(\mu_1, \ldots, \mu_K) = \prod_{k=1}^{N} \sum_{i=1}^{K} \pi_i \, N(x_k \mid \mu_i, \Sigma_i)

Now we should maximise the likelihood function by setting \partial \log L / \partial \mu_i = 0, but the sum inside the logarithm makes this too difficult to solve directly. That is why we use an iterative algorithm called EM (Expectation-Maximization).
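
As a side note, the log-likelihood itself is easy to evaluate for fixed parameters; a small MATLAB sketch, assuming the data sit row-wise in featurespace and the current parameters in Weight, Mu and Sigma with Ncluster components (the same variable names used by the code further down):

P = zeros(size(featurespace,1),Ncluster);
for ti = 1:Ncluster
    P(:,ti) = Weight(ti)*mvnpdf(featurespace,Mu(ti,:),Sigma(:,:,ti));   % pi_i * N(x_k | mu_i, Sigma_i)
end
loglik = sum(log(sum(P,2)));   % the sum inside the log is what makes direct maximisation hard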

 

The EM Algorithm
The algorithm which is used in practice to find the mixture of Gaussians that can model the data set is called EM (Expectation-Maximization) (Dempster, Laird and Rubin, 1977). Let's see how it works with an example.

Suppose the xk are the marks obtained by the students of a class, each occurring with a probability p_k(\theta) that depends on a single unknown parameter \theta (with p_1 + p_2 + p_3 + p_4 = 1):

x1 = 30  with probability p_1(\theta)

x2 = 18  with probability p_2(\theta)

x3 = 0   with probability p_3(\theta)

x4 = 23  with probability p_4(\theta)

First case: we observe how the marks are distributed among the students:

x1 : a students
x2 : b students
x3 : c students
x4 : d students

The likelihood of this observation is

L(\theta) \propto p_1(\theta)^a \, p_2(\theta)^b \, p_3(\theta)^c \, p_4(\theta)^d

We should maximise this function by solving \partial L(\theta) / \partial \theta = 0. It is easier to take the logarithm of the function and maximise that instead:

\log L(\theta) = a \log p_1(\theta) + b \log p_2(\theta) + c \log p_3(\theta) + d \log p_4(\theta) + \text{const}

Supposing a = 14, b = 6, c = 9 and d = 10, the maximising value of \theta can be computed in closed form.

Second case: we only observe that the marks are distributed among the students as follows:

x1 + x2 : h students
x3 : c students
x4 : d students

To maximise the likelihood we now need the separate counts a and b, but to estimate them we need \theta, and to estimate \theta we need a and b. We have thus obtained a circularity, which is broken into two steps:

  • expectation: given the current \theta, split the h students in proportion to the probabilities of x1 and x2, i.e. \hat{a} = h \, p_1(\theta)/(p_1(\theta)+p_2(\theta)) and \hat{b} = h \, p_2(\theta)/(p_1(\theta)+p_2(\theta));
  • maximization: re-estimate \theta by maximising \log L(\theta) with the counts (\hat{a}, \hat{b}, c, d) treated as if they had been observed.

This circularity can be solved in an iterative way.
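
As a concrete illustration, here is a small MATLAB sketch of this iteration under a hypothetical parameterisation p1 = 1/2, p2 = theta, p3 = 2*theta, p4 = 1/2 - 3*theta, with made-up counts; neither the parameterisation nor the numbers are taken from the original post. Under this choice the M-step has the closed form theta = (b + c) / (6 (b + c + d)):

% EM iteration for the marks example (hypothetical probabilities 1/2, theta, 2*theta, 1/2-3*theta)
h = 20; c = 9; d = 10;                 % made-up observed counts: h students got x1 or x2
theta = 0.05;                          % initial guess for the unknown parameter
for it = 1:50
    b = h*theta/(1/2 + theta);         % E-step: expected number of students with mark x2
    theta = (b + c)/(6*(b + c + d));   % M-step: closed-form maximiser under the assumed p_k
end
disp(theta)                            % converged estimate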

Let's now see how the EM algorithm works for a mixture of Gaussians (parameters estimated at the p-th iteration are marked by a superscript (p)):

  1. Initialize parameters: choose starting values for the means \mu_i^{(0)}, the covariances \Sigma_i^{(0)} and the mixing weights \pi_i^{(0)} of the K components.

  2. E-step: compute the responsibility of component i for each record x_k,

     w_{ik}^{(p)} = \frac{\pi_i^{(p)} \, N(x_k \mid \mu_i^{(p)}, \Sigma_i^{(p)})}{\sum_{j=1}^{K} \pi_j^{(p)} \, N(x_k \mid \mu_j^{(p)}, \Sigma_j^{(p)})}

  3. M-step: re-estimate the parameters from the records weighted by the responsibilities,

     \pi_i^{(p+1)} = \frac{1}{R} \sum_{k=1}^{R} w_{ik}^{(p)}, \qquad
     \mu_i^{(p+1)} = \frac{\sum_{k=1}^{R} w_{ik}^{(p)} x_k}{\sum_{k=1}^{R} w_{ik}^{(p)}}, \qquad
     \Sigma_i^{(p+1)} = \frac{\sum_{k=1}^{R} w_{ik}^{(p)} (x_k - \mu_i^{(p+1)})(x_k - \mu_i^{(p+1)})^T}{\sum_{k=1}^{R} w_{ik}^{(p)}}

     where R is the number of records.

ÉÏÎĺóÃæÃ»Ì«¿´¶®£¬ÕâÀï´ÓcsdnµÄÒ»¸ö²©¿Íժ¼ÁËһЩ¸ß˹»ìºÏ¾ÛÀàËã·¨£¬½²µÄ»¹±È½ÏÇå³þ£¬

Assume there are n Gaussian distributions from which m samples are generated. The distribution that sample y_j belongs to is denoted z_j, where z_j takes values from 1 to n. For any y, the probability that it comes from the i-th Gaussian is

p(y \mid z = i, \theta) = N(y \mid \mu_i, \Sigma_i)

Our task is to estimate the unknown parameters, i.e. the mean, covariance and prior probability of each Gaussian distribution.

First we must set initial values.

For unsupervised clustering we first have to assign a category to every sample. This step can be carried out with another clustering method such as K-means, which yields the centre of each category and the category label of each sample. We then compute the covariance matrix of the samples in each category, and the fraction of samples falling into each category can be used as that category's weight (prior probability).
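
A sketch of this initialisation in MATLAB, assuming the samples are stored row-wise in featurespace and the number of clusters is Ncluster (the variable names used by the code below); kmeans, like mvnpdf, comes from the Statistics Toolbox:

% Initialise the mixture with K-means: means from the cluster centres,
% covariances from the members of each cluster, weights from the cluster sizes.
[idx, Mu] = kmeans(featurespace, Ncluster);              % idx: cluster label of each sample
Sigma  = zeros(size(featurespace,2), size(featurespace,2), Ncluster);
Weight = zeros(1, Ncluster);
for ti = 1:Ncluster
    members = featurespace(idx == ti, :);
    Sigma(:,:,ti) = cov(members);                        % covariance of the samples in cluster ti
    Weight(ti) = size(members,1)/size(featurespace,1);   % prior probability of cluster ti
end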

E-step

Using the parameters \theta obtained in the previous M-step, compute the conditional distribution of the unknown z (i.e. the category) given the observed data: under the current parameters \theta_n, compute for every sample the posterior probability of each category. In the expression below, the first factor in the numerator is the probability density of the sample under category i, and the second factor is the prior probability of category i; the denominator normalises, so that for each sample the posterior probabilities over all categories sum to 1:

E[z_{ji}] = P(z_j = i \mid y_j, \theta_n) = \frac{p(y_j \mid z_j = i, \theta_n) \, \pi_i}{\sum_{k=1}^{n} p(y_j \mid z_j = k, \theta_n) \, \pi_k}


% E-step: for each sample, the posterior probability of each cluster.
for ti = 1:Ncluster
    PyatCi = mvnpdf(featurespace,Mu(ti,:),Sigma(:,:,ti));   % p(y | z = ti): density of every sample under cluster ti
    E(:,ti) = PyatCi*Weight(ti);                            % multiply by the prior of cluster ti
end
Esum = sum(E,2);                 % normalising constant for each sample
for ti = 1:Ncluster
    E(:,ti) = E(:,ti)./Esum;     % posteriors of each sample now sum to 1
end

M-step


In the M-step we maximise the expectation of the log-likelihood of the joint distribution,

Q(\theta) = E_z[\log p(y, z \mid \theta)] = \sum_{j=1}^{m} \sum_{i=1}^{n} E[z_{ji}] \, \log p(y_j, z_j = i \mid \theta)

The joint distribution in the last term can be written as the product of a conditional distribution and a marginal distribution, giving

Q(\theta) = \sum_{j=1}^{m} \sum_{i=1}^{n} E[z_{ji}] \, \big( \log p(y_j \mid z_j = i, \theta) + \log \pi_i \big)

subject to the requirement that the prior probabilities are normalised:

\sum_{i=1}^{n} \pi_i = 1

To find the position of the maximum of Q(\theta) we introduce a Lagrange multiplier \lambda and substitute the Gaussian probability density function:

J(\theta, \lambda) = \sum_{j=1}^{m} \sum_{i=1}^{n} E[z_{ji}] \, \big( \log N(y_j \mid \mu_i, \Sigma_i) + \log \pi_i \big) - \lambda \Big( \sum_{i=1}^{n} \pi_i - 1 \Big)

To obtain the new \theta_{t+1} we look for the \theta at which this expression attains its extremum, i.e. we set its first derivative to zero.

Setting the derivative with respect to \mu_i to zero gives the mean that satisfies this condition:

\mu_i = \frac{\sum_{j=1}^{m} E[z_{ji}] \, y_j}{\sum_{j=1}^{m} E[z_{ji}]}


% M-step: new mean of each cluster = posterior-weighted average of the samples.
for ti = 1:Ncluster
    muti = E(:,ti)'*featurespace;    % sum over samples of E(j,ti) * y_j
    muti = muti/sum(E(:,ti));        % divide by the total posterior weight of cluster ti
    Mu(ti,:) = muti;
end

Setting the derivative with respect to \Sigma_i to zero produces the new covariance matrices:

\Sigma_i = \frac{\sum_{j=1}^{m} E[z_{ji}] \, (y_j - \mu_i)(y_j - \mu_i)^T}{\sum_{j=1}^{m} E[z_{ji}]}

Setting the derivative with respect to \pi_i to zero gives the new weight of each category:

\pi_i = \frac{1}{\lambda} \sum_{j=1}^{m} E[z_{ji}]

Normalising the weights, \sum_{i=1}^{n} \pi_i = 1, fixes the multiplier at \lambda = m, the number of samples. Inserting \lambda into our estimate:

\pi_i = \frac{1}{m} \sum_{j=1}^{m} E[z_{ji}]
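
Continuing in the style of the mean-update code above, a sketch of the covariance and weight updates (same assumed variables: featurespace, E, Mu, Sigma, Weight, Ncluster):

for ti = 1:Ncluster
    dev = featurespace - repmat(Mu(ti,:),size(featurespace,1),1);            % samples centred on the new mean
    Sigma(:,:,ti) = (dev.*repmat(E(:,ti),1,size(dev,2)))'*dev/sum(E(:,ti));  % posterior-weighted covariance
    Weight(ti) = sum(E(:,ti))/size(featurespace,1);                          % new prior, using lambda = m
end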

Substitute the new parameters (means, covariances and class priors) back into the E-step and recompute, iterating until the likelihood of the sample set under the mixture no longer changes appreciably.
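
Putting the pieces together, a sketch of the complete EM loop with this stopping rule (same assumed variables as in the snippets above; the tolerance and the iteration cap are arbitrary choices):

E = zeros(size(featurespace,1),Ncluster);
oldloglik = -Inf;
for iter = 1:200
    % E-step: prior times Gaussian density, then normalise per sample
    for ti = 1:Ncluster
        E(:,ti) = Weight(ti)*mvnpdf(featurespace,Mu(ti,:),Sigma(:,:,ti));
    end
    loglik = sum(log(sum(E,2)));                  % log-likelihood of the whole sample set
    E = E./repmat(sum(E,2),1,Ncluster);
    % M-step: update means, covariances and priors
    for ti = 1:Ncluster
        Mu(ti,:) = E(:,ti)'*featurespace/sum(E(:,ti));
        dev = featurespace - repmat(Mu(ti,:),size(featurespace,1),1);
        Sigma(:,:,ti) = (dev.*repmat(E(:,ti),1,size(dev,2)))'*dev/sum(E(:,ti));
        Weight(ti) = sum(E(:,ti))/size(featurespace,1);
    end
    if abs(loglik - oldloglik) < 1e-6*abs(loglik)
        break;                                    % no appreciable change in the likelihood
    end
    oldloglik = loglik;
end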

Bibliography

A. P. Dempster, N. M. Laird and D. B. Rubin (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1-38.
from:
 
 