HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> branch specific dN/dS><=1 detection,multiple genes
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1304350512

Message started by abi on May 2nd, 2011 at 8:35am

Title: branch specific dN/dS><=1 detection,multiple genes
Post by abi on May 2nd, 2011 at 8:35am
Hi Sergei,

I'm a new user of HyPhy, so I'll first try to explain the biological problem I'm facing:
I prepared >5000 alignments of orthologous ORFs (~50-5000 codons), each with 10 mRNA sequences taken from 10 different mammals. Corresponding phylogenetic trees with variable branch lengths were also prepared. I would like to estimate the percentage of sites that evolved under neutral, positive (directional), and negative selection in specific branches (or groups of branches), for each alignment.

Accordingly, my questions are:

1) Might the Kosakovsky Pond and Frost (2005) GA approach fit the foregoing problem ( hyphy.org/gabranch/GABranchFiles.tgz )? I understand it aims to find groups of branches with the same selection regime, but can it also estimate the percentage of positively selected sites? From the paper it seems so. Alternatively, which other *.bf scripts are designed for this type of problem (with minimal need for user modifications)?
2) Currently, I'm using a Linux server with 8 processors and MPI (Open MPI). MrBayes, for example, already runs with MPI on this server. How should the foregoing HyPhy scripts be set up to work with MPI? After compilation I got the HYPHYMP executable but not HYPHYMPI (if that is the MPI version, how is it compiled?).
3) What are the minimal requirements to perform this type of test on multiple genes (say 100, 1000, 10000) in a reasonable time?

Thanks in advance,
Avi

Title: Re: branch specific dN/dS><=1 detection,multiple
Post by Sergei on May 2nd, 2011 at 3:21pm
Hi Avi,

We just had a paper accepted in MBE which presents just the method you need (attached). It is implemented in the BranchSiteREL.bf standard analysis. The best way to run this analysis is to task farm -- send a separate file to each processor using a shell script. Alternatively, you can write a simple HyPhy wrapper to use in an MPI environment (link visible to registered forum members only). To compile HYPHYMPI try

[code]
cd /path/to/hyphy/HYPHY
sh build.sh MPI
[/code]

The run time for the analysis is strongly dependent on the number of sequences. It takes about 10-20 mins / gene on a 10-15 sequence alignment.
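
As a rough back-of-the-envelope check of what these figures imply for question 3, here is a minimal sketch in Python. It takes the 10-20 minutes per gene quoted above and the 8 processors mentioned in the original question, and assumes (an assumption, not something stated in the thread) that genes are spread evenly over processors, one single-threaded job per processor. The function name `farm_hours` is made up for illustration.

```python
def farm_hours(n_genes, minutes_per_gene, n_workers):
    """Wall-clock hours if genes are spread evenly over independent workers."""
    return n_genes * minutes_per_gene / n_workers / 60.0


# 8 processors and 10-20 minutes per gene are the figures from the thread;
# even load balancing is an assumption made for this estimate only.
for n_genes in (100, 1000, 10000):
    low = farm_hours(n_genes, 10, 8)
    high = farm_hours(n_genes, 20, 8)
    print(f"{n_genes:>6} genes: {low:.1f} to {high:.1f} hours")
```

Under these assumptions, 100 genes take roughly 2 to 4 hours and 10000 genes roughly 9 to 17 days. Real run times will vary with alignment length and with the number of sequences.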

Sergei
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=ManuscriptRev2.pdf (264 KB)

Title: Re: branch specific dN/dS><=1 detection,multiple genes
Post by abi on May 3rd, 2011 at 3:58am
Hello Sergei,

Many thanks for your kind help. I'll try the approach implemented in BranchSiteREL.bf, according to the paper. To deal with multiple genes, I'll call hyphy with a simple perl script.

About compiling the MPI version of HyPhy with "sh build.sh MPI": it gives the following complaint:

[code]
batchlan.h:374:18: error: mpi.h: No such file or directory
[/code]

which stops compilation.
The actual mpi.h file is located in: /usr/lib/openmpi/include

The total output is:

[code]
Linux
[: 98: ==: unexpected operator
[: 98: ==: unexpected operator
Checking for curl
Curl seems to be present
[: 213: ==: unexpected operator
[: 213: ==: unexpected operator
+-----------------------------------------------------------+
|Building a single-threaded HYPHYKernelMPI for MPI          |
+-----------------------------------------------------------+
COMPILER=g++, gcc
COMPILER_FLAGS= -w -c -fsigned-char -O3 -fpermissive -I/home/home/programs/hyphy/hyphy/HYPHY/Source -I/home/home/programs/hyphy/hyphy/HYPHY/Source/SQLite -D INTPTR_TYPE=long -D __UNIX__  -D _SLKP_LFENGINE_REWRITE_  -D __HYPHYMPI__ -D _SLKP_LFENGINE_REWRITE_
Building HYNetInterface.cpp
In file included from HYNetInterface.h:13,
                from HYNetInterface.cpp:9:
batchlan.h:374:18: error: mpi.h: No such file or directory
Error during compilation
[/code]

Would be grateful for your help,
Avi

Title: Re: branch specific dN/dS><=1 detection,multiple genes
Post by Sergei on May 4th, 2011 at 11:24am
Dear Avi,

Try the fixes suggested in [link visible to registered forum members only].
Let me know if this works.

Sergei

Title: Re: branch specific dN/dS><=1 detection,multiple genes
Post by abi on May 5th, 2011 at 8:21am
Thanks Sergei,

It works on the server (less success on my laptop, but let's leave that for now)!

As you explained, I changed lines ~55,~56,~198:
COMPILER="mpic++"; COMPILERC="mpicc"
LINKER_FLAGS=$CURL_LINKER_LIBS" -lpthread -lm -ldl "
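
The likely reason this change works is that plain g++ does not search the directory holding mpi.h, whereas the MPI wrapper compilers add that include path themselves. With Open MPI, running `mpic++ --showme:compile` prints the flags the wrapper passes on. A minimal Python sketch of checking which candidate directory actually holds the header follows. The helper names are made up for illustration, and the candidate list is an assumption apart from /usr/lib/openmpi/include, which is the location reported earlier in this thread.

```python
import os


def find_header(header, include_dirs):
    """Return the first directory in include_dirs containing header, else None."""
    for directory in include_dirs:
        if os.path.isfile(os.path.join(directory, header)):
            return directory
    return None


def include_flag(header, include_dirs):
    """Build a -I compiler flag for the directory holding header, if any."""
    found = find_header(header, include_dirs)
    return f"-I{found}" if found else None


# /usr/lib/openmpi/include is the path reported in the thread;
# /usr/include is only an illustrative extra candidate.
candidates = ["/usr/include", "/usr/lib/openmpi/include"]
print(include_flag("mpi.h", candidates))
```

If the wrappers are unavailable, adding the printed flag to the compiler flags in build.sh may be an alternative, although that route was not tested in this thread.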

Best,
Avi

Title: Re: branch specific dN/dS><=1 detection,multiple
Post by abi on May 5th, 2011 at 8:32am
In fact, it complains it doesn't see Curl (this requires me to contact the administrator), but I assume I can live with it.

Thanks again,
Avi

[code]
Linux
Checking for curl
curl_check.cpp:1:23: error: curl/curl.h: No such file or directory
Curl seems to be absent (setting up compiler options skip CURL code)
+-----------------------------------------------------------+
|Building a single-threaded HYPHYKernelMPI for MPI          |
+-----------------------------------------------------------+
COMPILER=mpic++, mpicc
COMPILER_FLAGS= -w -c -fsigned-char -O3 -fpermissive -I/home/xxx/hyphy/HYPHY/Source -I/home/xxx/hyphy/HYPHY/Source/SQLite -D INTPTR_TYPE=long -D __UNIX__  -D _SLKP_LFENGINE_REWRITE_  -D__HYPHY_NO_CURL__ -D __HYPHYMPI__ -D _SLKP_LFENGINE_REWRITE_
File baseobj.cpp is up to date
File batchlan2.cpp is up to date
File batchlan.cpp is up to date
File bayesgraph2.cpp is up to date
File bayesgraph.cpp is up to date
File bgm2.cpp is up to date
File bgm.cpp is up to date
File calcnode2.cpp is up to date
File calcnode.cpp is up to date
File category.cpp is up to date
File error.cpp is up to date
File fisher_exact.cpp is up to date
File HYNetInterface.cpp is up to date
File hyphyunixutils.cpp is up to date
File likefunc2.cpp is up to date
File likefunc.cpp is up to date
File list.cpp is up to date
File matrix.cpp is up to date
File Net.cpp is up to date
File nexus.cpp is up to date
File parser2.cpp is up to date
File parser.cpp is up to date
File polynoml.cpp is up to date
File regex.cpp is up to date
File scfg.cpp is up to date
File sequence.cpp is up to date
File site.cpp is up to date
File strings.cpp is up to date
File main-unix.cxx is up to date
SQLite File sqlite3.c is up to date
Linking HYPHYMPI
mpic++ -w -fsigned-char -D__HYPHY_NO_CURL__ -o HYPHYMPI obj_MPI/baseobj.cpp.o obj_MPI/batchlan2.cpp.o obj_MPI/batchlan.cpp.o obj_MPI/bayesgraph2.cpp.o obj_MPI/bayesgraph.cpp.o obj_MPI/bgm2.cpp.o obj_MPI/bgm.cpp.o obj_MPI/calcnode2.cpp.o obj_MPI/calcnode.cpp.o obj_MPI/category.cpp.o obj_MPI/error.cpp.o obj_MPI/fisher_exact.cpp.o obj_MPI/HYNetInterface.cpp.o obj_MPI/hyphyunixutils.cpp.o obj_MPI/likefunc2.cpp.o obj_MPI/likefunc.cpp.o obj_MPI/list.cpp.o obj_MPI/main-unix.cxx.o obj_MPI/matrix.cpp.o obj_MPI/Net.cpp.o obj_MPI/nexus.cpp.o obj_MPI/parser2.cpp.o obj_MPI/parser.cpp.o obj_MPI/polynoml.cpp.o obj_MPI/regex.cpp.o obj_MPI/scfg.cpp.o obj_MPI/sequence.cpp.o obj_MPI/site.cpp.o obj_MPI/sqlite3.c.o obj_MPI/strings.cpp.o -lpthread -lm -ldl
Finished
[/code]

Title: Re: branch specific dN/dS><=1 detection,multiple
Post by Sergei on May 5th, 2011 at 5:35pm
Hi Avi,

Don't worry about curl -- it's an optional library, which you won't need for this analysis.

Sergei

Title: Re: branch specific dN/dS><=1 detection,multiple genes
Post by abi on May 16th, 2011 at 7:38am
Dear Sergei,

I executed multiple instances of a Python script which communicates with BranchSiteREL.bf. I do not use the MPI version of HyPhy because it seems BranchSiteREL.bf is not designed to work with MPI (please correct me if I'm wrong...). Let's see if I can detect something special using this approach.
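
For readers who want to reproduce this kind of task farm, here is a minimal Python sketch, not the script actually used in this thread. It runs one external command per gene, with at most 8 running at a time. The function `hyphy_command` is hypothetical: it assumes one wrapper batch file per gene that sets the analysis options and calls BranchSiteREL.bf, and it assumes HYPHYMP is on the PATH and accepts the batch file path as its argument.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one external command; return (command, exit code, captured stdout)."""
    done = subprocess.run(command, capture_output=True, text=True)
    return command, done.returncode, done.stdout


def run_farm(inputs, build_command, workers=8):
    """Run build_command(item) for every input, at most `workers` at a time."""
    commands = [build_command(item) for item in inputs]
    # Threads are enough here: each one only waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, commands))


def hyphy_command(wrapper_bf):
    # ASSUMPTION: wrapper_bf is a hypothetical per-gene wrapper batch file,
    # and HYPHYMP takes the batch file path as its only argument.
    return ["HYPHYMP", wrapper_bf]
```

A call such as `run_farm(wrapper_files, hyphy_command, workers=8)` returns results in input order, so failed genes can be found by checking for non-zero exit codes.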

The paper referring to BranchSiteREL.bf describes episodic selection in branches that separate species lineages, which is what I think I need in order to detect directional selection in a specific target branch.
In addition, is there a HyPhy method that can detect selection against fixation of alleles in a population, using the structure of an assembly of reads representing mRNA segments from different individuals (in my case, a 454 transcriptome of a mammal)?

Best,
Avi

Title: Re: branch specific dN/dS><=1 detection,multiple genes
Post by Sergei on May 16th, 2011 at 10:07am
Hi Avi,

You are correct -- BranchSiteREL.bf is NOT designed to work with MPI. If you are interested in SITES under episodic selection, you can do that using a different version of the same model (paper in preparation).

BranchSiteREL.bf should scan ALL lineages in the tree and tell you which ones show evidence of episodic selection. There is currently no method in HyPhy (that I know of) to use population level data to study selection. Sorry.

Sergei

Title: Re: branch specific dN/dS><=1 detection,multiple genes
Post by asaf on Jun 14th, 2011 at 2:52am
Hello Sergei

I tried BranchSiteREL.bf and in most tested alignments (>2000) the level of negative selection was >95% in all branches, with very few cases of negative selection > 5% in specific branches. Moreover, on average, >15% of the entire coding region was excluded from the alignments (mainly alternatively spliced regions), so critical factors that influence gene evolution are in any case neglected. Therefore, it seems to me that tools for detecting the sites under episodic selection (rather than showing just the proportion of sites) would be more informative. If this is feasible, it would be great! Any recommendations on HyPhy scripts that can do this job?

Asaf

Title: Re: branch specific dN/dS><=1 detection,multiple genes
Post by Sergei on Jun 15th, 2011 at 1:47pm
Hi Asaf,

Not sure what you mean by ">15% of the entire coding region was excluded from the alignment" -- I presume this had been done before the analysis was run?

How many sequences are in your alignment? If >10, then it may be worthwhile to look for episodic selection at individual sites (but you won't know with precision along which branches).

To do this, follow the same steps as you would to run FEL, except instead of choosing "Two rate FEL", try "MEME". This test will examine each site for evidence of episodic selection.  

Sergei





Title: Re: branch specific dN/dS><=1 detection,multiple
Post by asaf on Jun 19th, 2011 at 5:33am
Many thanks Sergei,

You are right: the exclusion of alignment regions was done before running hyphy. I use about 10 sequences in alignments, but can increase this number with some additional species if necessary.
Could you kindly specify how to reach the MEME option from the command line menu (is it with the QuickSelectionDetection.bf script?) Is there any documentation of this option?

Generally: I would be interested in detecting differences in selection regimes at sites (or groups of sites) between a specific species that is expected to evolve differently in some genes and all other species in the tree. I am not "fixated" on any particular method or approach, so any relevant suggestions for HyPhy scripts/methods can help.

Best wishes,
Asaf

Title: Re: branch specific dN/dS><=1 detection,multiple
Post by Sergei on Jun 20th, 2011 at 10:57am
Hi Asaf,

Please take a look at section 2.6 of [link visible to registered forum members only].
MEME will appear in the 'Ancestor Counting Options' dialog. The rest of the settings are analogous to FEL (also described in the link above).

For 10 sequences it may be difficult to drill down to differences at individual sites; probably groups of sites are best. Can you describe your problem with an example? We have done something that I think is similar in [link visible to registered forum members only].

Sergei

