Output NielsenYang.bf
Miguel
Junior Member
**
Offline


Hi Hyphy!

Posts: 53
CBMSO, CSIC (Spain)
Gender: male
Output NielsenYang.bf
Dec 11th, 2008 at 4:48am
 
Dear Sergei,

I am trying to interpret the output of NielsenYang.bf (GY94 1x4, 8 categories, cutoff = 0.95), and I have some questions that I am sure will be easy for you:
- Model Neutral: Rate[1] and Rate[2] are the omega values, and their weights are the proportions of sites with the corresponding omega. Right?
- Model Beta: Rate[1] to Rate[8] are the omega values for each category. Right?
- Model Beta & w: Rate[1] to Rate[9] are the omega values for each category. Then, under "Sites with dN/dS>=1", what does the number in parentheses beside each site mean? For example:
1 (0.518826)
2 (0.328302)
3 (0.31326)
4 (0.29262)
.    .
Is it the omega for that site or the posterior probability?

I have an alignment for which the omega value under the neutral model is 2.1, but under Model 8 (Beta & w) there are no sites listed under "Sites with dN/dS > 1". How is that possible?

Thanks a lot,

Miguel

 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Output NielsenYang.bf
Reply #1 - Dec 11th, 2008 at 8:08am
 
Dear Miguel,

1). The weights are indeed the corresponding proportions in the omega distribution.
2). Rate[i] is the value for the i-th category of omega
3). For sites with dN/dS >=1 the numbers in parentheses are posterior probabilities for omega>1
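
To make point 3 concrete, here is a minimal sketch of how a per-site posterior probability for omega>1 can be obtained from a discrete omega distribution, assuming the usual empirical Bayes calculation (category weight times the site likelihood under that category, normalized over all categories). This is not HyPhy's own code; the function name and all numbers below are hypothetical and chosen only for illustration.

```python
def posterior_omega_gt_one(rates, weights, site_likelihoods):
    """Posterior probability that omega > 1 at one site.

    rates            -- omega value of each category
    weights          -- prior proportion of sites in each category
    site_likelihoods -- likelihood of the site's data under each category
    """
    # Joint probability of (category, data) for every category.
    joint = [w * lk for w, lk in zip(weights, site_likelihoods)]
    total = sum(joint)
    # Keep only the categories whose omega exceeds 1, then normalize.
    positive = sum(j for r, j in zip(rates, joint) if r > 1.0)
    return positive / total


# Hypothetical three-category example (all values invented).
rates = [0.1, 0.8, 2.5]
weights = [0.5, 0.3, 0.2]
site_likelihoods = [1e-6, 2e-6, 8e-6]

p = posterior_omega_gt_one(rates, weights, site_likelihoods)
print(f"P(omega > 1 | site) = {p:.4f}")
```

Because the result is a ratio of a partial sum to the full sum of non-negative terms, it can never fall outside [0, 1]. A site is reported as selected only when this value reaches the chosen cutoff (0.95 in the run discussed here).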

I don't quite understand your last question: the neutral model should not permit omega>1.

Also, I generally wouldn't recommend using NielsenYang.bf for identifying sites under selection, for a variety of reasons. FEL is probably the safest 'default' choice: see [link visible to registered members only].

Sergei
Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Miguel
Re: Output NielsenYang.bf
Reply #2 - Dec 11th, 2008 at 8:36am
 
Dear Sergei,

OK, so in M8, under "Sites with dN/dS<=1", the numbers in parentheses are also posterior probabilities, for omega <= 1, right?

About the last question: I am sorry, I made a mistake. I wrote "neutral model" when I meant "single rate model". So the question is: is it possible for an alignment to have omega > 2 under the single rate model and yet have no sites with dN/dS > 1 under the M8 model? For example:
*** RUNNING SINGLE RATE MODEL ***
-4775.24771803793
dN/dS = 2.08162

*** RUNNING MODEL 8 (Beta & w) ***
-4792.3778985718
dN/dS = 1.13844 (sample variance = 0.161697)

Rate[1]=  0.99085194 (weight=0.1075459)
Rate[2]=  0.99849676 (weight=0.1075459)
Rate[3]=  0.99966269 (weight=0.1075459)
Rate[4]=  0.99993782 (weight=0.1075459)
Rate[5]=  0.99999213 (weight=0.1075459)
Rate[6]=  0.99999954 (weight=0.1075459)
Rate[7]=  1.00157413 (weight=0.1075459)
Rate[8]=  2.29678233 (weight=0.1075459)
Rate[9]=  1.00000000 (weight=0.1396329)

------------------------------------------------

Sites with dN/dS>1 (Posterior cutoff = 0.95)


------------------------------------------------

Sites with dN/dS<=1 (Posterior cutoff = 0.95)

1 (0.518826)
2 (0.328302)
3 (0.31326)
4 (0.29262)
5 (0.348639)
6 (0.69026)
7 (0.289077)
8 (0.310804)
9 (0.30969)
10 (0.852541)
.
.(the number in parentheses is never >= 1)
.
Or are there positive selection sites in this alignment? If there are, what are they for a given (any value, e.g. 0.8) significance value?
Many thanks!!

Cheers,

Miguel

 
Sergei
Re: Output NielsenYang.bf
Reply #3 - Dec 11th, 2008 at 3:30pm
 
Hi Miguel,

Miguel wrote on Dec 11th, 2008 at 8:36am:
Rate[1]=  0.99085194 (weight=0.1075459)
Rate[2]=  0.99849676 (weight=0.1075459)
Rate[3]=  0.99966269 (weight=0.1075459)
Rate[4]=  0.99993782 (weight=0.1075459)
Rate[5]=  0.99999213 (weight=0.1075459)
Rate[6]=  0.99999954 (weight=0.1075459)
Rate[7]=  1.00157413 (weight=0.1075459)
Rate[8]=  2.29678233 (weight=0.1075459)
Rate[9]=  1.00000000 (weight=0.1396329)



This is a numerical issue: HyPhy is not discretizing the beta distribution correctly (the beta-distributed rates, Rate[1] to Rate[8], must lie in [0, 1]).
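
As a quick sanity check on the quoted output, the sketch below recomputes the mean dN/dS as the weight-averaged rate and lists the beta categories that fall outside [0, 1]. The rates and weights are copied from the Model 8 output above; treating the first eight categories as the beta component and the ninth as the extra w category is an assumption based on the "Beta & w" model name.

```python
# Rates and weights copied from the Model 8 (Beta & w) output quoted above.
rates = [
    0.99085194, 0.99849676, 0.99966269, 0.99993782,
    0.99999213, 0.99999954, 1.00157413, 2.29678233,
    1.00000000,
]
weights = [0.1075459] * 8 + [0.1396329]

# Assumption: categories 1-8 discretize the beta distribution,
# category 9 is the additional w class.
N_BETA = 8

# The weights form a probability distribution, so they should sum to ~1.
weight_sum = sum(weights)

# Mean omega is the weighted average of the category rates.
mean_omega = sum(r * w for r, w in zip(rates, weights))

# A beta-distributed rate can only take values in [0, 1].
out_of_range = [
    i + 1 for i, r in enumerate(rates[:N_BETA]) if not 0.0 <= r <= 1.0
]

print(f"sum of weights = {weight_sum:.7f}")
print(f"mean dN/dS     = {mean_omega:.5f}")
print(f"beta categories outside [0, 1]: {out_of_range}")
```

The weighted mean agrees with the reported dN/dS = 1.13844, and the check flags Rate[7] and Rate[8], which is the symptom of the incorrect discretization described above.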

Quote:
.(the number in parentheses is never >= 1)


It will always be in [0-1] as it is a posterior probability!

Quote:
Or are there positive selection sites in this alignment? If there are, what are they for a given (any value, e.g. 0.8) significance value?



Just run your alignment through Datamonkey and it will all be revealed:)

Sergei
 
Miguel
Re: Output NielsenYang.bf
Reply #4 - Dec 12th, 2008 at 2:20am
 
Dear Sergei,

Thanks a lot for your fast answer, I understand everything. :)
Let me ask you one last question about this. Is there any way to get the M8 model to converge for these alignments?

Also, with the same alignment that I used for NielsenYang.bf, I ran a test in Datamonkey to identify positively selected sites (PSS) at the 0.1 significance level.
- With FEL, the PSS were very good (they make a lot of sense).
- With SLAC, the PSS were good, but the algorithm found only a few of them (there should be many more PSS).
- With REL, at the 0.5 significance level, the PSS were good but, as with SLAC, the algorithm found very few of them, even when changing the significance level.
Given these results I prefer to use FEL. Do you think I can trust it (considering the problems with the NielsenYang method)? Do you agree that FEL is the best option for detecting PSS in this case?
Thanks a lot for helping me with these problems in detecting PSS.


Cheers,

Miguel  
« Last Edit: Dec 12th, 2008 at 7:30am by Miguel »  
 
Sergei
Re: Output NielsenYang.bf
Reply #5 - Dec 12th, 2008 at 2:49pm
 
Dear Miguel,

SLAC has less power than FEL on smaller (e.g. <50 sequences) alignments, so your findings are in line with expectations. REL is finicky: it can perform very badly on pathological datasets, but it generally detects the most PSS. Can you post the datamonkey.org links for the results pages, so I can take a look and tell you what is happening?

FEL is generally the best 'default' method, unless your alignment is small (<15-20 sequences), where it will have low power, or very large (e.g. >200 sequences), where you will get the same power with SLAC, but will spend a lot less computer time on it:)
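
The rule of thumb above can be written down as a small helper. This is only an illustrative sketch: the function names are hypothetical, the cutoffs are the approximate figures from this post (the "15-20" range is represented here by its upper end, 20, as a conservative assumption), and it is not part of HyPhy or Datamonkey.

```python
# Approximate thresholds taken from the post above; treat them as rough guides.
SMALL_ALIGNMENT = 20    # below this, FEL is expected to have low power
LARGE_ALIGNMENT = 200   # above this, SLAC gives similar power for less CPU time


def suggest_method(n_sequences: int) -> str:
    """Suggest a default site-level selection method for an alignment."""
    if n_sequences > LARGE_ALIGNMENT:
        return "SLAC"
    return "FEL"


def low_power_expected(n_sequences: int) -> bool:
    """Flag alignments small enough that FEL may miss selected sites."""
    return n_sequences < SMALL_ALIGNMENT


for n in (12, 30, 500):
    note = " (low power expected)" if low_power_expected(n) else ""
    print(f"{n} sequences: {suggest_method(n)}{note}")
```

For the 30-sequence alignment discussed in this thread, the helper suggests FEL without the low-power note.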

Sergei
 
Miguel
Re: Output NielsenYang.bf
Reply #6 - Dec 13th, 2008 at 5:58am
 
Dear Sergei,

This alignment contains 30 sequences of 333 codons, and many codons are PSS.
The links to the results are:
- FEL (0.1 significance level): [link visible to registered members only]
- FEL (0.2 significance level): [link visible to registered members only]

- REL (50 significance level): [link visible to registered members only]
- REL (40 significance level): [link visible to registered members only]

- SLAC (0.1 significance level): [link visible to registered members only]
- SLAC (0.2 significance level): [link visible to registered members only]

First, SLAC found very few PSS, even at the 0.2 significance level, so I think I am going to eliminate this option.
FEL detects many PSS, and REL also detects several PSS (not as many as FEL). When I change the significance level (to 0.2 in FEL and to 40 in REL), FEL finds more sites than REL, both PSS and non-positively-selected sites, and quite correctly. The sites predicted by REL were also detected quite correctly.

So I have to choose between FEL and REL. Given the convergence problems with the M7 and M8 models of NielsenYang.bf, which method do you think would be better in this case?

Thanks for your help!

Cheers,

Miguel
 
Sergei
Re: Output NielsenYang.bf
Reply #7 - Dec 13th, 2008 at 6:55am
 
Dear Miguel,

FEL is probably the way to go; if you look at the ROC figures in our MBE paper [link visible to registered members only], you will see the results of the power/false positive rate simulations we did comparing SLAC/FEL/REL.

Cheers,
Sergei
 
Miguel
Re: Output NielsenYang.bf
Reply #8 - Dec 13th, 2008 at 12:19pm
 
Dear Sergei,

Thank you so much for your great help!


Cheers,

Miguel