Configuration file

While TreeKo can run on default values, it is advisable to use a configuration file to adapt it to the data. The configuration file can be used to change the rooting method, to select which information is printed, or to limit the number of pruned trees allowed during a TreeKo run. Lines starting with # are treated as comments. There are two kinds of values in the configuration file. True/False values consist of a single tag whose presence enables or disables the option. For instance, to print the pruned trees generated for each tree, a line containing only “print_pruned_trees” enables the option. When further information is needed, it should be given in a tab-delimited line. For instance, the root method should be defined as: “root_method<tab>m”. Any option omitted from the file takes its default value. An example configuration file can be found here.
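As an illustration of the format described above, the following Python sketch reads such a file into a dictionary. It is not treeKO's own parser: the function name parse_config and the DEFAULTS dictionary are assumptions made for this example. Only the two default values shown (midpoint rooting and species overlap) are taken from this page.

```python
def parse_config(text, defaults=None):
    """Parse configuration text into a dict mapping option -> value.

    Single-tag lines map the tag to True; tab-delimited lines map the
    tag to the string that follows the first tab.
    """
    options = dict(defaults or {})
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        if "\t" in line:
            tag, value = line.split("\t", 1)
            options[tag.strip()] = value.strip()
        else:
            options[line] = True
    return options


# Hypothetical defaults, used only for this example.
DEFAULTS = {"root_method": "m", "orto_mode": "s"}

example = "# my settings\nprint_pruned_trees\nroot_method\ts\nroot_species\tHsa\n"
config = parse_config(example, DEFAULTS)
```

Here config contains the flag print_pruned_trees, the overridden root_method, the new root_species entry, and the untouched default for orto_mode.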


Options

Tree information:

Command: “orto_mode <tab> s/r/m”

Options:
s: Orthology and paralogy nodes will be predicted using the species overlap algorithm
r: Reconciliation is used to predict duplication nodes. A species tree needs to be provided (see below)
m: Manual annotation of duplication nodes is provided by the user in the New Hampshire eXtended format (NHX). Duplications should be marked at internal nodes adding [&&NHX:evoltype=D] next to the duplicated node. Multifurcated leaves should be marked as MS when no duplication is expected or MD when there are duplicated nodes. An example tree can be found below.

Program: Tree comparison, Phylome support

Description: Duplication nodes can be predicted in several different ways. The default method uses the species overlap algorithm, which checks whether the same species appear on both sides of a node. The reconciliation method compares the tree to a species tree, and all inconsistencies are treated as duplication or loss events (note that in this mode the speciation distance will always be 0). Additionally, the user can define duplication nodes using the NHX format. The [&&NHX:evoltype=D] tag needs to be placed at the nodes that should be considered duplications. An example is provided below:

((((C:100,(A:100,B:100)1:100[&&NHX:evoltype=D])1:100,D:100)1:100,E:100)1:100,F:100)1:100;

In this case A and B will be considered the result of a duplication event. Some considerations need to be taken into account when using these annotations. If, after the pruned tree construction, treeKO detects pruned trees that contain duplicated species, the program will end and the user will be asked to provide further annotations or use an alternative method.
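The species overlap idea used by the default “s” mode can be sketched in a few lines of Python. This is an illustration only, not treeKO's code: it assumes a strictly binary tree written as nested tuples, leaf names whose first three characters are the species code, and hypothetical function names.

```python
def species_of(leaf_name, length=3):
    # Assumption: the species code is the first `length` characters.
    return leaf_name[:length]


def _walk(node, events):
    """Return (species set, leaf names) below `node` and record events."""
    if isinstance(node, str):  # a leaf
        return {species_of(node)}, [node]
    left, right = node  # binary trees only in this sketch
    left_species, left_leaves = _walk(left, events)
    right_species, right_leaves = _walk(right, events)
    leaves = left_leaves + right_leaves
    # A species found on both sides of the node marks a duplication (D);
    # otherwise the node is treated as a speciation (S).
    kind = "D" if left_species & right_species else "S"
    events.append((kind, tuple(leaves)))
    return left_species | right_species, leaves


def species_overlap(tree):
    """List (event, leaves) for every internal node, children first."""
    events = []
    _walk(tree, events)
    return events


tree = ((("Hsa_1", "Hsa_2"), "Mmu_1"), "Dme_1")
events = species_overlap(tree)
```

In this toy tree only the node joining Hsa_1 and Hsa_2 has the same species on both sides, so it is the only node labelled as a duplication.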


Command: “species_tree <tab> path_species_tree_file”

Program: Tree comparison, Phylome support

Description: This command is only useful when tree reconciliation is used as the orthology prediction method. The species tree should contain all the species that are present in the trees that are to be compared. Species names must be identical in the compared trees and in the species tree. The species tree should be rooted; otherwise the selected rooting method will be used to root it.


Command: “species_limit <tab> value”

Program: Tree comparison, Phylome support, Reconciliation distance

Description:

If the two trees contain species information, it should be found at the beginning of the leaf name (e.g. human_XXXX or hsaXXXX). By default, the first three letters of each leaf name are taken as the species identifier (so in the first example, “hum” would be the species name). This may lead to problems if your species name is longer, shorter or separated by a given delimiter. The species_limit command allows you to set the length of the species name or to choose a delimiter. Acceptable lengths for the species name range from 1 to 7, and the accepted delimiters are: “_”, “#”, “-”, “:”, “.”. If your trees have a different format, you can use ETE to change it.

Warning:

All leaves should have the same format.
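The effect of the two kinds of species_limit values can be illustrated with a short Python sketch. The function name species_name is hypothetical, and telling a length from a delimiter by its Python type is an assumption of this example, not a description of how treeKO reads the value.

```python
ALLOWED_DELIMITERS = {"_", "#", "-", ":", "."}


def species_name(leaf_name, species_limit=3):
    """Extract the species code from the start of a leaf name.

    `species_limit` is either a prefix length (1 to 7) or one of the
    accepted delimiters.
    """
    if isinstance(species_limit, int):
        if not 1 <= species_limit <= 7:
            raise ValueError("length must be between 1 and 7")
        return leaf_name[:species_limit]
    if species_limit not in ALLOWED_DELIMITERS:
        raise ValueError("unsupported delimiter: %r" % species_limit)
    # Everything before the first delimiter is the species name.
    return leaf_name.split(species_limit, 1)[0]
```

With the default length, species_name("human_XXXX") gives "hum", while species_name("human_XXXX", "_") gives "human".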


Options regarding tree rooting:

Command: “root_method <tab> n/m/s/f/fi/d”

Options:
n: none (assumes the provided trees are already rooted)
m: midpoint rooting (default)
s: root at a user defined species or protein
f: root at the farthest oldest species
fi: root at the farthest oldest node
d: root at a position where duplication nodes are minimized

Program: Tree comparison, Phylome support, Reconciliation distance

Description: TreeKo needs the trees to be rooted in order to predict duplication and speciation nodes. If the trees provided by the user are unrooted, treeKo can root them using one of the options above. The rooting of a tree has a direct effect on the prediction of duplications and, in consequence, on the pruned tree construction, so it should be handled carefully. Midpoint rooting places the root halfway between the two most distant species and therefore does not need any additional information. If information about the species tree is known, a list of species can be provided that reflects the order in which they diverged (see the root_data command below). The user can also provide a species at which to root the tree. This species should appear in both trees; otherwise the rooting will fail.

Warnings: A different root method can be used for each tree. If the second file contains a list of trees, the rooting method specified for tree 2 will be applied to all of them. Additional information like root_species or root_data will not be taken into account unless the proper root method has been selected.
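To make the midpoint rule concrete, the sketch below locates the midpoint of the longest leaf-to-leaf path in a small unrooted tree. It is a self-contained illustration, not the routine treeKO uses: the tree is given as a hypothetical adjacency dictionary of branch lengths, and at least two leaves are assumed.

```python
def farthest_path(adj, start):
    """Return (distance, path) to the node farthest from `start`."""
    best = (0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[0]:
            best = (dist, path)
        for neighbour, length in adj[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length, path + [neighbour]))
    return best


def midpoint(adj):
    """Return (a, b, offset): the root lies on edge a-b, `offset` away from a."""
    leaves = [node for node, nbrs in adj.items() if len(nbrs) == 1]
    # The longest leaf-to-leaf path joins the two most distant leaves.
    total, path = max((farthest_path(adj, leaf) for leaf in leaves),
                      key=lambda result: result[0])
    half = total / 2.0
    walked = 0.0
    for a, b in zip(path, path[1:]):
        length = adj[a][b]
        if walked + length >= half:
            return a, b, half - walked
        walked += length


# Leaves A, B, C, D joined through internal nodes X and Y.
adj = {
    "A": {"X": 2.0},
    "B": {"X": 1.0},
    "C": {"Y": 5.0},
    "D": {"Y": 1.0},
    "X": {"A": 2.0, "B": 1.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 5.0, "D": 1.0},
}
root_position = midpoint(adj)
```

In this example A and C are the most distant leaves (8 units apart), so the root is placed on the branch between Y and C, 1 unit from Y and 4 units from both A and C.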


Command: “root_species <tab> species”

Program: Tree comparison, Phylome support, Reconciliation distance

Description: In order to root the trees at a user-defined leaf, you will need to choose the “s” option in the root_method command and then provide the name of the species you wish to use as root in the root_species command. The species provided should be present in both trees. If a species name is provided and there are multiple proteins belonging to this species, one will be chosen at random.


Command: “root_data <tab> species_name : integer_value”

Program: Tree comparison, Phylome support, Reconciliation distance

Description: When using the farthest species root method, a list of species needs to be provided with a number identifying the order in which the species diverged over time. Species names and values should be separated with “:”. The highest values should represent the outgroups, with values decreasing down to the species of interest.

Warnings: All the species in the tree should be included in the list. Several species can be represented by the same number signifying a cluster that diverged from the main evolutionary path at the same time.
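A small Python sketch of how such a list could be read and checked is given below. The function names are hypothetical, and passing the entries as a list of “species : value” strings is an assumption of this example. It only identifies the species with the highest value in a tree; how treeKO then chooses the root among them is not covered here.

```python
def parse_root_data(entries):
    """Turn 'species : value' strings into a dict species -> int."""
    ranks = {}
    for entry in entries:
        name, value = entry.split(":", 1)
        ranks[name.strip()] = int(value)
    return ranks


def oldest_species(tree_species, ranks):
    """Return the species in the tree that carry the highest value."""
    present = set(tree_species)
    # Every species in the tree must be included in the list.
    missing = sorted(present - set(ranks))
    if missing:
        raise ValueError("species missing from root_data: " + ", ".join(missing))
    top = max(ranks[name] for name in present)
    return sorted(name for name in present if ranks[name] == top)


ranks = parse_root_data(["Hsa : 1", "Mmu : 2", "Dme : 3", "Cel : 3"])
```

Here Dme and Cel share the value 3, so both are returned for a tree that contains them, reflecting a cluster that diverged at the same time.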


Options regarding treeKO thresholds:

Command: “limit <tab> integer_value”

Program: Tree comparison, Phylome support

Description: One of the drawbacks of treeKo is that it can generate a large number of pruned trees when a tree contains numerous duplications. Especially in large-scale analyses, the user may want to omit trees with such a large number of pruned trees and deal with them separately. This threshold allows treeKo to abort the comparison when the number of pruned trees exceeds a given value. By default no limit is used.
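The following Python sketch gives a feeling for how quickly the number of pruned trees can grow. It rests on an assumption that is not stated on this page: each pruned tree keeps only one daughter branch of every duplication node and both branches of every speciation node. The tuple representation and function names are hypothetical.

```python
def count_pruned_trees(node):
    """Count pruned trees for a tree of ("D" or "S", left, right) tuples."""
    if isinstance(node, str):  # a leaf
        return 1
    event, left, right = node
    n_left = count_pruned_trees(left)
    n_right = count_pruned_trees(right)
    # Assumption: a duplication keeps one side at a time (alternatives add
    # up), while a speciation keeps both sides (alternatives multiply).
    return n_left + n_right if event == "D" else n_left * n_right


def within_limit(tree, limit=None):
    """True when no limit is set or the count does not exceed it."""
    return limit is None or count_pruned_trees(tree) <= limit


tree = ("S",
        ("D", "Hsa_1", "Hsa_2"),
        ("D", "Mmu_1", ("D", "Mmu_2", "Mmu_3")))
n_pruned = count_pruned_trees(tree)
```

Under this assumption the five-leaf tree above already yields six pruned trees, so a limit of 5 would stop the comparison while the default (no limit) would not.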


Command: “overlap”

Program: Tree comparison

Description: TreeKo is able to compare two trees that have no species/proteins in common; it will simply assign them the maximum distance of 1. To avoid the comparison altogether, the overlap command can be used to check from the outset whether the two trees share any species/proteins. By default the overlap option is not used.
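The kind of check the overlap option performs can be pictured as a set intersection. The sketch below is only an illustration with a hypothetical function name, and it assumes species codes are the first three characters of each leaf name.

```python
def shared_species(leaves_a, leaves_b, length=3):
    """Return the species codes present in both lists of leaf names."""
    species_a = {name[:length] for name in leaves_a}
    species_b = {name[:length] for name in leaves_b}
    return species_a & species_b


tree1_leaves = ["Hsa_1", "Mmu_1", "Dme_1"]
tree2_leaves = ["Hsa_7", "Cel_2"]
tree3_leaves = ["Sce_1", "Spo_1"]
```

An empty result, as for tree1_leaves and tree3_leaves, corresponds to the case where the comparison would be skipped.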


Command: “support <tab> float_value”

Program: Tree comparison

Description: TreeKo is able to collapse nodes with support below a certain threshold. Multifurcated nodes containing potential duplications will be resolved by assuming the minimum number of duplications.

Warning: This method is not yet compatible with a reconciliation orthology prediction. The option can be defined for each tree independently.
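Collapsing poorly supported nodes can be sketched as follows. This is an illustration under stated assumptions rather than treeKO's implementation: internal nodes are hypothetical (support, children) tuples, leaves are strings, the root is never collapsed, and the later step of resolving the resulting multifurcations is not shown.

```python
def collapse(node, threshold):
    """Return a copy of the tree with weak internal nodes removed."""
    if isinstance(node, str):  # a leaf
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, threshold)
        if not isinstance(child, str) and child[0] < threshold:
            # Weakly supported internal node: attach its children
            # directly to the current node, creating a multifurcation.
            new_children.extend(child[1])
        else:
            new_children.append(child)
    return (support, new_children)


tree = (1.0, [(0.4, ["A_1", "B_1"]), (0.9, ["C_1", "D_1"])])
collapsed = collapse(tree, 0.5)
```

With a threshold of 0.5 the node with support 0.4 disappears and its two leaves hang directly from the root, while the node with support 0.9 is kept.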


Command: “spec_filter <tab> species_name”

Program: Tree comparison

Description: Only tree partitions that contain the given species will be used in the final distance computation. The option can be defined for each tree independently.


Command: “prot_filter <tab> protein_name”

Program: Tree comparison

Description: Only tree partitions that contain the given protein will be used in the final distance computation. The option can be defined for each tree independently.
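The effect of spec_filter and prot_filter can be pictured with the sketch below, which keeps only the partitions containing a given species or protein. The function name and the representation of a partition as a list of leaf names are assumptions for illustration, as is the three-character species code.

```python
def filter_partitions(partitions, species=None, protein=None, length=3):
    """Keep the partitions that contain the requested species/protein."""
    kept = []
    for leaves in partitions:
        if species is not None and species not in {n[:length] for n in leaves}:
            continue
        if protein is not None and protein not in leaves:
            continue
        kept.append(leaves)
    return kept


partitions = [
    ["Hsa_1", "Mmu_1", "Dme_1"],
    ["Hsa_2", "Mmu_1", "Dme_1"],
    ["Mmu_2", "Dme_1"],
]
```

Filtering on the species Hsa keeps the first two partitions, whereas filtering on the protein Hsa_2 keeps only the second.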


Printing options:

Command: “print_all”

Program: Tree comparison, Reconciliation distance

Description: This command will print all the information provided by treeKo: configuration information, both distance measures and the list of pruned trees for each input tree. This is the default option.


Command: “print_pruned_trees”

Program: Tree comparison

Description: A list of predicted pruned trees for each one of the provided trees will be printed.


Command: “print_speciation_distance”

Program: Tree comparison

Description: The speciation distance will be printed.


Command: “print_strict_distance”

Program: Tree comparison

Description: The strict distance will be printed.


Command: “print_duplications_distance”

Program: Reconciliation distance

Description: The duplication distance will be printed.


Command: “print_all_events_distance”

Program: Reconciliation distance

Description: The distance based on the number of duplications and losses will be printed.


Command: “print_configuration”

Program: Tree comparison, Reconciliation distance

Description: The configuration used to run treeKo will be printed.


Command: “verbose”

Program: Tree comparison, Phylome support

Description: It adds short descriptions to the output. By default this option is enabled.

