Two or more categorical variables
Austin Meyer
Two or more categorical variables
Sep 20th, 2011 at 11:49am
 
Hello,

I have been using a single categorical variable to fit evolutionary rate versus various parameters.

I was wondering if there is a way to use, say, two categorical variables each with the same number of categories and link them to get a joint probability using some kind of bivariate distribution?

I'm essentially fitting a line and I would like to have something like three different lines with unique slopes and intercepts, rather than having three intercepts all with the same slope.

Also, is this scalable to multivariate distributions?

Thanks,

Austin
 
Sergei
Re: Two or more categorical variables
Reply #1 - Sep 21st, 2011 at 5:10pm
 
Hi Austin,

You can define up to 32 categorical variables for a likelihood function. In your example it sounds like the variables are independent, so you can define them along the lines of the example in [link visible to registered members only], in the section "Site-to-site rate heterogeneity" (see especially the twocat example starting on page 53).

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Austin Meyer
Re: Two or more categorical variables
Reply #2 - Oct 18th, 2011 at 1:59pm
 
What if the variables are dependent on each other?

Thanks,
Austin
 
Sergei
Re: Two or more categorical variables
Reply #3 - Oct 18th, 2011 at 3:11pm
 
Hi Austin,

You can do this as well, basically by defining the joint distribution P(X,Y) as P(X|Y) * P(Y) (or P(Y|X) * P(X)), i.e. one variable is explicitly defined using a set of conditional distributions. Something like this should work (for discrete valued distributions):

Code:
category c = (2, {{0.7,0.3}}, MEAN,  , {{1,2}}, 1,2);
category d = (2, {{0.5,0.5}{0.2,0.8}}, MEAN, c , {{3,4}}, 3,4);
 



The joint probability P(c,d) is then:

P(1,3) = 0.7*0.5 = 0.35;
P(1,4) = 0.7*0.5 = 0.35;
P(2,3) = 0.3*0.2 = 0.06;
P(2,4) = 0.3*0.8 = 0.24;

When you define d in the example above, the 2nd argument defines the matrix of conditional probabilities (1st row for the 1st value of c, 2nd row for the 2nd value of c ...) and using 'c' in place of the standard density argument tells HyPhy that d depends on c.
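
As a quick cross-check of the arithmetic above, here is a minimal Python sketch (not HyPhy code) that rebuilds the joint distribution from the marginal of c and the conditional matrix of d. The helper name joint_distribution and the variable names are made up for illustration; the only assumption taken from the post is that row i of the conditional matrix corresponds to the i-th value of c.

```python
# Marginal distribution of the parent variable c over its values (1, 2).
p_c = [0.7, 0.3]

# Conditional distributions of d given c.
# Row i holds P(d | c = i-th value of c), as described in the post.
p_d_given_c = [
    [0.5, 0.5],  # c = 1
    [0.2, 0.8],  # c = 2
]

c_values = [1, 2]
d_values = [3, 4]


def joint_distribution(marginal, conditional):
    """Return joint[i][j] = P(c_i) * P(d_j | c_i)."""
    return [[p_parent * p_child for p_child in row]
            for p_parent, row in zip(marginal, conditional)]


joint = joint_distribution(p_c, p_d_given_c)

# Print each joint probability in the same layout as the list above.
for i, c in enumerate(c_values):
    for j, d in enumerate(d_values):
        print(f"P({c},{d}) = {p_c[i]}*{p_d_given_c[i][j]} = {joint[i][j]:.2f}")

# A valid joint distribution sums to 1 (up to floating-point error).
total = sum(sum(row) for row in joint)
print(f"total = {total:.2f}")
```

The same function works for more categories as long as the conditional matrix has one row per value of the parent variable.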

HTH,
Sergei
 
Austin Meyer
Re: Two or more categorical variables
Reply #4 - Oct 19th, 2011 at 10:36am
 
It seems HyPhy will not allow a categorical variable to depend on another categorical variable.

Am I doing something wrong?

Here is my code, followed by the error it produces:

Code:
global wa_p1 = 1/2;
wa_p1:<1;


global wa_1 = 1;
global wa_2 = 1;

categFreqMatrix={{1,1}};
categRateMatrix={{wa_1,wa_2}};

category wa  = (2, categFreqMatrix , MEAN, ,categRateMatrix, 0, 1e25);


global wb_p1 = 1/2;
wb_p1:<1;


global wb_1 = 1;
global wb_2 = 1;

categFreqMatrix={{wb_p1,(1-wb_p1)}};
categRateMatrix={{wb_1,wb_2}};

category wb  = (2, categFreqMatrix , MEAN, wa,categRateMatrix, 0, 1e25);


Error:
1e-25*(_x_-0)Can't have a category variable depend on a category variable.

Function call stack
1 : Category variable: {wb,2,categFreqMatrix,MEAN,wa,categRateMatrix,0,1e25}
-------
2 : ExecuteCommands in string gdDefString using basepath /home/agm854/RSA_Stuffs/Using_Smaller_Set/nonzero_k_2cats_dep/catmat_pro_2/.
-------
3 : BuildCategory(Cat_Num,Cat_Name_2,Cat_Name_1) 

 
Sergei
Re: Two or more categorical variables
Reply #5 - Oct 19th, 2011 at 2:50pm
 
Hi Austin,

For "wb", HyPhy expects you to supply a 2x2 matrix for the frequencies, i.e. two conditional distributions (one for wa = wa_1 and another for wa = wa_2).

Sergei

Code:
global wa_p1 = 1/2;
wa_p1:<1;


global wa_1 = 1;
global wa_2 = 1;

categFreqMatrix={{0.5,0.5}};
categRateMatrix={{wa_1,wa_2}};

category wa  = (2, categFreqMatrix , MEAN, ,categRateMatrix, 0, 1e25);


global wb_p1 = 1/2;
wb_p1:<1;
global wb_p2 = 1/2;
wb_p2:<1;


global wb_1 = 1;
global wb_2 = 1;

categFreqMatrix={{wb_p1,(1-wb_p1)},
			     {wb_p2,(1-wb_p2)}};

categRateMatrix={{wb_1,wb_2}};

category wb  = (2, categFreqMatrix , MEAN, wa,categRateMatrix, 0, 1e25);
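
The shape requirement described in this reply can be illustrated with a small Python sketch. This is not HyPhy's own validation logic, only a hypothetical helper (check_conditional_matrix) that mirrors the stated rule: a dependent variable needs one row of conditional probabilities per category of its parent. The check that each row sums to 1 is an added assumption about what a valid conditional distribution looks like.

```python
def check_conditional_matrix(matrix, n_parent, n_child, tol=1e-9):
    """Return a list of problems; an empty list means the matrix looks valid.

    matrix   : list of rows, row i = P(child | parent = i-th category)
    n_parent : number of categories of the parent variable
    n_child  : number of categories of the dependent variable
    """
    problems = []
    if len(matrix) != n_parent:
        problems.append(
            f"expected {n_parent} rows (one per parent category), got {len(matrix)}"
        )
    for i, row in enumerate(matrix):
        if len(row) != n_child:
            problems.append(f"row {i}: expected {n_child} entries, got {len(row)}")
        elif abs(sum(row) - 1.0) > tol:
            problems.append(f"row {i}: probabilities sum to {sum(row)}, not 1")
    return problems


# Illustrative starting values, matching the 1/2 initial values in the thread.
wb_p1, wb_p2 = 0.5, 0.5

# A single row, as in the failing attempt: one distribution for two parent categories.
one_row = [[wb_p1, 1 - wb_p1]]

# Two rows, as in the corrected version: one distribution per category of wa.
two_rows = [[wb_p1, 1 - wb_p1],
            [wb_p2, 1 - wb_p2]]

print(check_conditional_matrix(one_row, n_parent=2, n_child=2))
print(check_conditional_matrix(two_rows, n_parent=2, n_child=2))
```

The single-row matrix is flagged because it supplies only one conditional distribution, while the two-row matrix passes.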

 
