YaBB - Yet another Bulletin Board
branch specific dN/dS><=1 detection,multiple genes (Read 7700 times)
abi
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 5
branch specific dN/dS><=1 detection,multiple genes
May 2nd, 2011 at 8:35am
 
Hi Sergei,

I'm a new user of HyPhy, so I'll first try to explain the biological problem I'm facing:
I prepared >5000 alignments of orthologous ORFs (~50-5000 codons each), each containing 10 mRNA sequences from 10 different mammals. Corresponding phylogenetic trees with variable branch lengths were also prepared. I would like to estimate the percentage of sites that evolved under neutral, positive (directional), and negative selection in specific branches (or groups of branches) for each alignment.

Accordingly, my questions are:

1) Could the Kosakovsky Pond and Frost (2005) GA approach fit the foregoing problem ( hyphy.org/gabranch/GABranchFiles.tgz )? I understand that it aims to find groups of branches with the same selection regime, but can it also estimate the percentage of positively selected sites? From the paper it seems so. Alternatively, which other *.bf scripts are designed for this type of problem (with minimal need for user modifications)?
2) Currently, I'm using a Linux server with 8 processors and MPI (Open MPI). MPI processing already works with MrBayes on this server, for example. How should the foregoing HyPhy scripts be set up to work with MPI? After compilation I got the HYPHYMP executable but not HYPHYMPI (if HYPHYMPI is the MPI version, how is it compiled?).
3) What are the minimal requirements to perform this type of test on multiple genes (say 100, 1000, or 10000) in a reasonable time?

Thanks in advance,
Avi
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: branch specific dN/dS><=1 detection,multiple
Reply #1 - May 2nd, 2011 at 3:21pm
 
Hi Avi,

We just had a paper accepted in MBE which presents just the method you need (attached). It is implemented in the BranchSiteREL.bf standard analysis. The best way to run this analysis is to task farm -- send a separate file to each processor using a shell script. Alternatively, you can write a simple HyPhy wrapper to use in an MPI environment [link, visible to registered members only]. To compile HYPHYMPI, try

Code:
cd /path/to/hyphy/HYPHY
sh build.sh MPI
 



The run time for the analysis is strongly dependent on the number of sequences. It takes about 10-20 mins / gene on a 10-15 sequence alignment.
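
As a rough back-of-the-envelope check on the figure above, the sketch below converts a per-gene run time into total wall-clock time for a batch of genes spread over several processors. It assumes perfect scaling (every gene runs independently, one per processor, with equal run times and no overhead). The function name is hypothetical, and the 5000-gene, 8-processor inputs are taken from the question earlier in the thread.

```python
import math

def farm_wall_clock_hours(n_genes, minutes_per_gene, n_workers):
    """Estimated wall-clock hours when genes run independently, one per worker.

    Assumes every gene takes the same time and there is no scheduling overhead.
    """
    rounds = math.ceil(n_genes / n_workers)  # batches of n_workers genes
    return rounds * minutes_per_gene / 60.0

if __name__ == "__main__":
    # 10-20 minutes per gene is the range quoted in the post above.
    for minutes in (10, 20):
        hours = farm_wall_clock_hours(5000, minutes, 8)
        print(f"{minutes} min/gene: about {hours:.0f} h ({hours / 24:.1f} days)")
```

Under these assumptions, 5000 genes on 8 processors would need roughly 4 to 9 days; real runs will vary with alignment length and sequence count.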

Sergei
Attachment (264 KB)

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
abi
Re: branch specific dN/dS><=1 detection,multiple genes
Reply #2 - May 3rd, 2011 at 3:58am
 
Hello Sergei,

Many thanks for your kind help. I'll try the approach implemented in BranchSiteREL.bf, as described in the paper. To deal with multiple genes, I'll call HyPhy from a simple Perl script.

About compiling the MPI version with "sh build.sh MPI": it fails with the following complaint:
Code:
batchlan.h:374:18: error: mpi.h: No such file or directory 




which stops compilation.
The actual mpi.h file is located in: /usr/lib/openmpi/include

The full output is:

Code:
Linux
[: 98: ==: unexpected operator
[: 98: ==: unexpected operator
Checking for curl
Curl seems to be present
[: 213: ==: unexpected operator
[: 213: ==: unexpected operator
+-----------------------------------------------------------+
|Building a single-threaded HYPHYKernelMPI for MPI          |
+-----------------------------------------------------------+
COMPILER=g++, gcc
COMPILER_FLAGS= -w -c -fsigned-char -O3 -fpermissive -I/home/home/programs/hyphy/hyphy/HYPHY/Source -I/home/home/programs/hyphy/hyphy/HYPHY/Source/SQLite -D INTPTR_TYPE=long -D __UNIX__  -D _SLKP_LFENGINE_REWRITE_  -D __HYPHYMPI__ -D _SLKP_LFENGINE_REWRITE_
Building HYNetInterface.cpp
In file included from HYNetInterface.h:13,
                 from HYNetInterface.cpp:9:
batchlan.h:374:18: error: mpi.h: No such file or directory
Error during compilation
 



Would be grateful for your help,
Avi
 
Sergei
Re: branch specific dN/dS><=1 detection,multiple genes
Reply #3 - May 4th, 2011 at 11:24am
 
Dear Avi,

Try the fixes suggested in [link, visible to registered members only].
Let me know if this works.

Sergei
 
abi
Re: branch specific dN/dS><=1 detection,multiple genes
Reply #4 - May 5th, 2011 at 8:21am
 
Thanks Sergei,

It works on the server (less success on my laptop, but let's leave that for now)!

As you explained, I changed lines ~55, ~56, and ~198 of build.sh:

Code:
COMPILER="mpic++"
COMPILERC="mpicc"
LINKER_FLAGS=$CURL_LINKER_LIBS" -lpthread -lm -ldl "
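
The change works because MPI wrapper compilers such as mpic++ supply the MPI include and library paths themselves, so mpi.h is found without editing COMPILER_FLAGS by hand. As a hedged diagnostic sketch (not part of the HyPhy build), the snippet below checks whether a wrapper is on the PATH and, for Open MPI, asks it which compile flags it adds via its `--showme:compile` option. The helper name `mpi_compile_flags` is hypothetical, and other MPI implementations use different options (MPICH uses `-show`), in which case the helper simply returns None.

```python
import shutil
import subprocess

def mpi_compile_flags(wrapper="mpic++"):
    """Return the compile flags an Open MPI wrapper adds, or None if unavailable."""
    path = shutil.which(wrapper)
    if path is None:
        return None  # wrapper not installed or not on PATH
    try:
        result = subprocess.run(
            [path, "--showme:compile"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # probably not an Open MPI wrapper
    return result.stdout.split()

if __name__ == "__main__":
    flags = mpi_compile_flags()
    if flags is None:
        print("mpic++ not found, or it does not support --showme:compile")
    else:
        print("Flags added by mpic++:", " ".join(flags))
```

On a system like the one in this thread, the reported flags would be expected to include an -I entry pointing at the directory that holds mpi.h.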


Best,
Avi
 
abi
Re: branch specific dN/dS><=1 detection,multiple
Reply #5 - May 5th, 2011 at 8:32am
 
In fact, the build complains that it cannot find curl (fixing this would require contacting the administrator), but I assume I can live without it.

Thanks again,
Avi

Code:
Linux
Checking for curl
curl_check.cpp:1:23: error: curl/curl.h: No such file or directory
Curl seems to be absent (setting up compiler options skip CURL code)
+-----------------------------------------------------------+
|Building a single-threaded HYPHYKernelMPI for MPI          |
+-----------------------------------------------------------+
COMPILER=mpic++, mpicc
COMPILER_FLAGS= -w -c -fsigned-char -O3 -fpermissive -I/home/xxx/hyphy/HYPHY/Source -I/home/xxx/hyphy/HYPHY/Source/SQLite -D INTPTR_TYPE=long -D __UNIX__  -D _SLKP_LFENGINE_REWRITE_  -D__HYPHY_NO_CURL__ -D __HYPHYMPI__ -D _SLKP_LFENGINE_REWRITE_
File baseobj.cpp is up to date
File batchlan2.cpp is up to date
File batchlan.cpp is up to date
File bayesgraph2.cpp is up to date
File bayesgraph.cpp is up to date
File bgm2.cpp is up to date
File bgm.cpp is up to date
File calcnode2.cpp is up to date
File calcnode.cpp is up to date
File category.cpp is up to date
File error.cpp is up to date
File fisher_exact.cpp is up to date
File HYNetInterface.cpp is up to date
File hyphyunixutils.cpp is up to date
File likefunc2.cpp is up to date
File likefunc.cpp is up to date
File list.cpp is up to date
File matrix.cpp is up to date
File Net.cpp is up to date
File nexus.cpp is up to date
File parser2.cpp is up to date
File parser.cpp is up to date
File polynoml.cpp is up to date
File regex.cpp is up to date
File scfg.cpp is up to date
File sequence.cpp is up to date
File site.cpp is up to date
File strings.cpp is up to date
File main-unix.cxx is up to date
SQLite File sqlite3.c is up to date
Linking HYPHYMPI
mpic++ -w -fsigned-char -D__HYPHY_NO_CURL__ -o HYPHYMPI obj_MPI/baseobj.cpp.o obj_MPI/batchlan2.cpp.o obj_MPI/batchlan.cpp.o obj_MPI/bayesgraph2.cpp.o obj_MPI/bayesgraph.cpp.o obj_MPI/bgm2.cpp.o obj_MPI/bgm.cpp.o obj_MPI/calcnode2.cpp.o obj_MPI/calcnode.cpp.o obj_MPI/category.cpp.o obj_MPI/error.cpp.o obj_MPI/fisher_exact.cpp.o obj_MPI/HYNetInterface.cpp.o obj_MPI/hyphyunixutils.cpp.o obj_MPI/likefunc2.cpp.o obj_MPI/likefunc.cpp.o obj_MPI/list.cpp.o obj_MPI/main-unix.cxx.o obj_MPI/matrix.cpp.o obj_MPI/Net.cpp.o obj_MPI/nexus.cpp.o obj_MPI/parser2.cpp.o obj_MPI/parser.cpp.o obj_MPI/polynoml.cpp.o obj_MPI/regex.cpp.o obj_MPI/scfg.cpp.o obj_MPI/sequence.cpp.o obj_MPI/site.cpp.o obj_MPI/sqlite3.c.o obj_MPI/strings.cpp.o -lpthread -lm -ldl
Finished
 

 
Sergei
Re: branch specific dN/dS><=1 detection,multiple
Reply #6 - May 5th, 2011 at 5:35pm
 
Hi Avi,

Don't worry about curl -- it's an optional library, which you won't need for this analysis.

Sergei
 
abi
Re: branch specific dN/dS><=1 detection,multiple genes
Reply #7 - May 16th, 2011 at 7:38am
 
Dear Sergei,

I executed multiple instances of a Python script which communicates with BranchSiteREL.bf. I do not use the MPI version of HyPhy because it seems BranchSiteREL.bf is not designed to work with MPI (please correct me if I'm wrong...). Let's see if I can detect something special using this approach.
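
The setup described here (several independent HyPhy runs launched from a script, one alignment per processor) is the task-farming approach suggested earlier in the thread. A minimal sketch of such a driver follows. It is not the script used by the poster: the function names are hypothetical, and the HyPhy invocation mentioned in the comment is an assumption that must be adapted to the local installation and to the prompts of the batch file.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_one(command, stdin_text=""):
    """Run one external job; return (command, exit code, captured stdout)."""
    result = subprocess.run(command, input=stdin_text, capture_output=True, text=True)
    return command, result.returncode, result.stdout

def run_farm(jobs, max_workers=8):
    """Run (command, stdin_text) jobs concurrently, at most max_workers at a time.

    Threads are enough here: each one only waits on its own child process.
    Results come back in the same order as the input jobs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, cmd, text) for cmd, text in jobs]
        return [f.result() for f in futures]

if __name__ == "__main__":
    # Stand-in jobs so the sketch runs anywhere. A real job would pair a
    # HYPHYMP command line with the menu answers for one alignment on stdin;
    # both the invocation and the answers are assumptions to verify locally.
    demo = [([sys.executable, "-c", f"print({i} * {i})"], "") for i in range(4)]
    for command, code, out in run_farm(demo, max_workers=2):
        print(code, out.strip())
```

Setting max_workers to the number of processors (8 on the server described above) keeps every core busy without oversubscribing the machine.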

The paper referring to BranchSiteREL.bf describes episodic selection in branches that separate species lineages, which is what I think I need in order to detect directional selection in a specific target branch.
In addition, is there a HyPhy method for detecting selection against fixation of alleles in a population, using the assembly structure of reads that represent mRNA segments from different individuals (in my case, a 454 transcriptome of a mammal)?

Best,
Avi
 
Sergei
Re: branch specific dN/dS><=1 detection,multiple genes
Reply #8 - May 16th, 2011 at 10:07am
 
Hi Avi,

You are correct -- BranchSiteREL.bf is NOT designed to work with MPI. If you are interested in SITES under episodic selection, you can look for them using a different version of the same model (paper in preparation).

BranchSiteREL.bf should scan ALL lineages in the tree and tell you which ones show evidence of episodic selection. There is currently no method in HyPhy (that I know of) to use population level data to study selection. Sorry.

Sergei
 
asaf
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 2
Re: branch specific dN/dS><=1 detection,multiple genes
Reply #9 - Jun 14th, 2011 at 2:52am
 
Hello Sergei

I tried BranchSiteREL.bf, and in most tested alignments (>2000) the level of negative selection was >95% in all branches, with very few cases of negative selection > 5% in specific branches. Moreover, on average, >15% of the entire coding region was excluded from the alignments (mainly alternatively spliced regions), so critical factors that influence gene evolution are neglected in any case. Therefore, it seems to me that tools for detecting the sites under episodic selection (rather than showing just the proportion of sites) would be more informative. If this is feasible, it would be great! Any recommendations on HyPhy scripts that can do this job?

Asaf
 
Sergei
Re: branch specific dN/dS><=1 detection,multiple genes
Reply #10 - Jun 15th, 2011 at 1:47pm
 
Hi Asaf,

Not sure what you mean by ">15% of the entire coding region was excluded from the alignment" -- I presume this had been done before the analysis was run?

How many sequences are in your alignment? If >10, then it may be worthwhile to look for episodic selection at individual sites (but you won't know with precision along which branches).

To do this, follow the same steps as you would to run FEL, except instead of choosing "Two rate FEL", try "MEME". This test will examine each site for evidence of episodic selection. 

Sergei


 
asaf
Re: branch specific dN/dS><=1 detection,multiple
Reply #11 - Jun 19th, 2011 at 5:33am
 
Many thanks Sergei,

You are right: the exclusion of alignment regions was done before running HyPhy. I use about 10 sequences per alignment, but can increase this number with some additional species if necessary.
Could you kindly specify how to reach the MEME option from the command line menu (is it through the QuickSelectionDetection.bf script?). Is there any documentation for this option?

Generally: I would be interested in detecting differences in selection regimes at sites (or groups of sites) between one specific species, which is expected to evolve differently in some genes, and all other species in the tree. I am not "fixated" on any particular method or approach, so any relevant suggestions for HyPhy scripts/methods can help.

Best wishes,
Asaf
 
Sergei
Re: branch specific dN/dS><=1 detection,multiple
Reply #12 - Jun 20th, 2011 at 10:57am
 
Hi Asaf,

Please take a look at section 2.6 of [link, visible to registered members only].
MEME will appear in the 'Ancestor Counting Options' dialog. The rest of the settings are analogous to FEL (also described in the link above).

For 10 sequences it may be difficult to drill down to differences at individual sites; groups of sites are probably a better target. Can you describe your problem with an example? We have done something that I think is similar in [link, visible to registered members only].

Sergei
