Features and project structure


Input options


Reading sequences

input.sequence.file={path}                              The aligned sequence file to use. 
                                                        Will consider {n} random sites, with optional replacement.
                                                        Please note that unknown character is not supported in this 
                                                        version of ARPIP.

The following formats are currently supported:

Fasta             
                                                        The fasta format. The argument extended, default to 'no' 
                                                        allows to enable the HUPO-PSI extension of the format. The 
                                                        argument strict_names, default to 'no', specifies that only
                                                        the first word in the fasta header is used as a sequence names,
                                                        the rest of the header being considered as comments.

Reading trees

input.tree.file={path}                                  The phylogenetic tree file to use.
input.tree.format={Newick|Nexus|NHX}                    The format of the input tree file. --- #  Alphabet options ---


alphabet={DNA|RNA|Protein},type={Standard|EchinodermMitochondrial|InvertebrateMitochondrial|VertebrateMitochondrial})}
                                                        The alphabet to use when reading sequences. This version of 
                                                        ARPIP does not consider an unknown character such as 'N' or '?'.     --- #  Options ---

opt.seed={real}                                         Sets the seed value of the random number generator.
 
opt.likelihood={0|1}                                    1: The user wants to know what is the value of joint 
                                                        likelihood of tree and MAS under PIP 0: Deactive this option.
                                                        By default it is 0.
                                    
opt.pip_param_estimate={0|1}                            1: The user does not know what are the evolutionary parameter 
                                                        (i.e. lambda and mu) and wants program to compute them. 
                                                        0: otherwise.

opt.tree.scale={real}                                   Set the scale value to scale the branch lengths.

opt.tree.with_ans_node_names={0|1}                      1: The printed tree (orginal or reconstructed) will have the
                                                        internal nodes name placed in the newick file. For the case 
                                                        of 0: the user have the check the internal node's name in the
                                                        relation file or use indelviewer software. By default it is 1. 

opt.tree.re_root={rand|node_name|long}                  In the case the the tree is not rooted user can choose which
                                                        lineage would be the outgroup lineage. 'rand': With this option
                                                        one of the node would be picked randomly. 'node_name': Replace
                                                        the node name with one of the taxa name. For example 'leaf_1'
                                                        in the tree provided as test cases. 'long': By default it is
                                                        activated meaning the longest branch would be considered as
                                                        the lineage to outgroup.

opt.unknown_as_gap={0|1}                                1: This software does not support ambiguity in characters.
                                                        We are kindly ask users to remove the unknown chars, o.w. the 
                                                        software change them to gap and in the next step will remove 
                                                        all the only-gap columns from MSA file. By default this flag is 0.

opt.combine_msa_asr={0|1}                               1: The user can see the result along with their corresponding MSA.
                                                        It is recommended that user to activate this flag when using 
                                                        'unknown_as_gap'. In the case of having column full of gap the length
                                                        of input MSA and ASR are not the same. By default this flag is 0.

opt.asr.prob_profile={none|raw|normalized|naive_posterior}   
                                                        none: The user explicitly asks to not comput the pobability porofile.
                                                        raw: The user can see the raw probability of each character in each
                                                        position. normalized: The user can see the normalized probability
                                                        of each character in each position. naive_posterior: The user can
                                                        see the normalized probability of each character in each position
                                                        with respect to the background probability. By default this flag is
                                                        'raw'.

Initial tree options


init.tree={user|auto}                                   Set the method for the initial tree reconstruction to use.
                                                        The user option allows you to use an existing file passed via
                                                        input.tree.file. This file may have been built using another 
                                                        method like neighbor joining or parsimony for instance. 
                                                        The random option picks a random tree, which is handy to test
                                                        convergence.  This may however slows down significantly the 
                                                        optimization process. Please notice that this option has a 
                                                        limited proficiency as this method is not developed to 
                                                        reconstruct the tree. It is recommended to use the user option
                                                        and provide the tree as input using other software such as Phyml.

init.tree.method={wpgma|upgma|nj|bionj}                 When tree reconstruction method is required, the user can specify which algorithm 
                                                        to use. 

If the init.tree=user, then refer to the option you find in “Reading trees”.


Evolutionary model options


For more information about the substitution models available on BPP library please check their documentation at Bio++

Substitution models

model={string}                                          A description of the substitution model to use, using the keyval syntax.

The following nucleotide models are currently available as a core model:

See the ‘test_dna_sub_model’ folder in the source code for example of the correct syntax.

JC69
K80([kappa={real>0}])
F84([kappa={real>0}, {theta={real[0,1]}, theta1={real[0,1]},theta2={real[0,1]}} | "equilibrium frequencies"])
HKY85([kappa={real>0}, {theta={real[0,1]}, theta1={real[0,1]}, theta2={real[0,1]}} | "equilibrium frequencies"])
T92([kappa={real>0}, theta={real[0,1]} | "equilibrium frequencies"])
TN93([kappa1={real>0}, kappa2={real>0}, theta={real[0,1]}, theta1={real[0,1]}, theta2={real[0,1]} | "equilibrium frequencies"])
GTR([a={real>0}, b={real>0}, c={real>0}, d={real>0}, e={real>0}, {theta={real[0,1]}, theta1={real[0,1]}, theta2={real[0,1]} | "equilibrium frequencies" ])
L95([{beta={real>0}, gamma={real>0}, delta={real>0}, theta={real[0,1]}, theta1={real[0,1]}}, theta2={real[0,1]} | "equilibrium frequencies"])
SSR([beta={real>0}, gamma={real>0}, delta={real>0}, theta={real[0,1]}])
RN95([thetaR={real[0,1]}, kappaP={real[0,1]}, gammaP={real[0,1]}, alpha={real>1}, sigma={real>1}, beta={real>1}, epsilon={real>1}])

"equilibrium frequencies" are {piA={real[0,1]},piC={real[0,1]},piG={real[0,1]},piT={real[0,1]}} and the summation should be one.
For example: {piA=0.26,piC=0.25,piG=0.24,piT=0.25} 

The following protein models are currently available as a core model:

JC69
DSO78
JTT92
WAG01
LG08

The following meta models are currently available:

PIP13(model={model description}, {lambda={real>0}, mu={real>0})

If you leave the ‘lambda’ and ‘mu’ empty then the program would estimate them using Brent’s method. Please note that this algorithm is designed to work the ‘PIP13’ model.

Rate across site distribution

rate_distribution={rate distribution description}       Specify the rate across sites distribution

Only Constant rate is currently available:

Constant                                                Uses a constant rate across sites

Output options


Output tree file

output.tree.file={path}                                       The phylogenetic tree file to write to.
output.tree.format={Newick|Nexus|NHX}                         The format of the output tree file.
output.trees.file={path}                                      The file that will contain multiple trees.
output.trees.format={Newick|Nexus|NHX}                        The format of the output tree file.

Output alignment file

output.msa.file={path}                                      Alignment used in the study.

Output inferred file

output.ancestral.file={path}                              Write ancestral seuqences inferred by algorithm.
output.node_rel.file={path}                               Write the relation of nodes. It is important to idendifying the internal nodes.
output.mlindelpoints.file={path}                          Write the inferred indel points.
output.pipparams.file={path}                              Write the estimated PIP parameters if the user set opt.likelihood=1
output.prob_profile.file={path}                           Write the probability profile of each character of ancestral seqeunce.