jenti


Mario Falchi1 and Christian Fuchsberger2

1Twin Research & Genetic Epidemiology Unit, King's College London, London
2Institute of Genetic Medicine, European Academy, Bolzano, Italy


m.falchi@imperial.ac.uk

Jenti Tutorial

Before starting this tutorial, Jenti should be installed on your system. Reading the documentation section is also essential. This tutorial uses the genealogy sample.ped, which is not representative of the genealogy sizes encountered in a real problem but is helpful for briefly illustrating the program's functionality. Additional examples require the large genealogy large.ped.

Start the program

To start Jenti, double-click run.bat.

Load the genealogical data sample.ped from the file menu or by pressing the button in the main window.

From the main window it is possible to change the individual attributes: the SEL flag indicates that the individual will be used in the clustering process, and GEN indicates whether the individual has been genotyped. The GEN attribute is taken into account only in the pedigree reconstruction step, and only if the user asks to include genotyped parents and/or offspring in the pedigree to increase the inheritance information, for example for mapping purposes. In this context, any selected (SEL) individual is also considered genotyped, but his/her phenotypic data should be used in only a single sub-pedigree to avoid duplicated observations.

 

Identify a sample of least related individuals

Set the required parameters in the “Cliques” panel. For instance, we can ask Jenti to identify the largest sample whose kinship is less than the average kinship observed in the sample (0.022). We want to extract a sample including at least 4 individuals.

Run the search by pressing the search button. In a few seconds a window shows the best result:

 

The best result was found with seed 1234, which was the starting seed for the permutation procedure. The best cluster was therefore found in the first trial.
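The role of the seed and of the permutations can be illustrated with a minimal sketch. It is not Jenti's algorithm: it is a plain randomized greedy search, and the kinship values, the individual IDs, the function names and the "one seed per trial, incremented by one" scheme are all assumptions made for illustration.

```python
import random
from itertools import combinations

# Hypothetical pairwise kinship values; pairs not listed are treated as 0.0.
KIN = {
    frozenset(("A", "B")): 0.25,
    frozenset(("A", "C")): 0.0625,
    frozenset(("B", "C")): 0.0625,
    frozenset(("C", "D")): 0.03125,
    frozenset(("E", "F")): 0.125,
}
IDS = ["A", "B", "C", "D", "E", "F"]


def kinship(a, b):
    """Look up the (hypothetical) kinship of an unordered pair."""
    return KIN.get(frozenset((a, b)), 0.0)


def greedy_group(ids, max_kin, rng):
    """Visit individuals in a random order and keep each one whose kinship
    with every individual already kept is below max_kin."""
    order = list(ids)
    rng.shuffle(order)
    group = []
    for cand in order:
        if all(kinship(cand, member) < max_kin for member in group):
            group.append(cand)
    return sorted(group)


def best_group(ids, max_kin, seed, permutations):
    """Run one trial with the starting seed plus `permutations` further
    trials (assumed here to use seed+1, seed+2, ...); keep the largest group."""
    best_seed, best = seed, []
    for s in range(seed, seed + permutations + 1):
        group = greedy_group(ids, max_kin, random.Random(s))
        if len(group) > len(best):
            best_seed, best = s, group
    return best_seed, best


found_seed, group = best_group(IDS, max_kin=0.022, seed=1234, permutations=10)
print(found_seed, group)
```

Because each trial is driven only by its seed, calling `best_group` with the reported seed and 0 permutations reproduces the same group without repeating the whole search.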

The kinship and inbreeding statistics are reported for each group.

The identified cluster can be saved in a text file.

Partitioning the genealogy

Set the required parameters in the “Cliques” panel. For instance, we can ask Jenti to identify the best partition into sub-samples where the kinship among individuals is at least 0.0625 (as for first cousins in an outbred population). We will ask for the list of the largest sub-groups having no fewer than 4 individuals each.

Run the search by pressing the search button. In a few seconds a window shows the best result:
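The 0.0625 threshold can be checked with the standard recursive definition of the kinship coefficient. The sketch below is not Jenti code; the small pedigree, its IDs and the `make_kinship` helper are hypothetical, and individuals are assumed to be listed with parents before children.

```python
from functools import lru_cache

# Hypothetical pedigree: id -> (father, mother); founders have (None, None).
# Parents must be listed before their children.
PEDIGREE = {
    "GF": (None, None),   # grandfather
    "GM": (None, None),   # grandmother
    "P1": ("GF", "GM"),   # two siblings
    "P2": ("GF", "GM"),
    "S1": (None, None),   # unrelated spouses
    "S2": (None, None),
    "C1": ("P1", "S1"),   # first cousins
    "C2": ("S2", "P2"),
}


def make_kinship(pedigree):
    """Return a function phi(a, b) giving the kinship coefficient."""
    order = {ind: k for k, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:
            return 0.0                      # unknown parent: no shared alleles
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + phi(father, mother))
        # Recurse on the individual listed later, which cannot be an
        # ancestor of the other one.
        if order[a] < order[b]:
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (phi(father, b) + phi(mother, b))

    return phi


phi = make_kinship(PEDIGREE)
print(phi("P1", "P2"))  # full siblings
print(phi("C1", "C2"))  # first cousins
```

In this outbred example the siblings have kinship 0.25 and the first cousins 0.0625, which is the value used as the threshold above.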

The 6 cliques can be explored in the main window.

The aim of the program is to maximize the overall number of pair-wise relationships given the parameters (99 in the current example) while keeping the highest degree of relationship between the individuals belonging to the same cluster.

If the aim of the analysis is to extract the largest number of individuals grouped in clusters of similar size, the user can set an upper limit for the number of individuals that can be included in the same cluster.

For instance, to obtain clusters containing 4 to 8 individuals each:

Using this setting the program extracts more individuals (38 instead of 35). Indeed, in the previous analysis the largest sub-group comprised 10 individuals. The new settings discard sub-groups having size greater than 8, therefore 2 individuals are now available to generate new clusters with the remaining sample. This gives rise to a new group having size > 4. It should be noted that even though the number of individuals has increased, the overall number of pair-wise relationships has decreased (92 instead of 99). The decision on how the genealogy should be partitioned depends on the aim of the analyses. The tool allows exploring different configurations, and through the Residual Management functions it is also possible to mix sub-groups of individuals selected with different criteria.
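The trade-off between individuals and pair-wise relationships can be made concrete with a little arithmetic. The sketch assumes that the "number of pair-wise relationships" counts every pair of individuals within the same cluster, i.e. n(n-1)/2 for a cluster of size n; this is a reading of the text, not a documented Jenti formula. The cluster sizes are hypothetical: they were chosen only to match the totals reported for the first analysis (6 cliques, 35 individuals, largest group of 10, 99 pairs) and are not taken from the program output.

```python
def pair_count(sizes):
    """Number of within-cluster pairs, assuming every pair in a cluster counts."""
    return sum(n * (n - 1) // 2 for n in sizes)


# Hypothetical sizes consistent with the totals reported in the tutorial.
sizes = [10, 7, 6, 4, 4, 4]
print(sum(sizes), pair_count(sizes))      # 35 individuals, 99 pairs

# A single large cluster carries many more pairs than the same number of
# individuals split into smaller clusters.
print(pair_count([10]), pair_count([5, 5]))  # 45 versus 20
```

Under this assumption a cap on cluster size can add individuals overall while removing pairs, because large clusters contribute disproportionately to the pair count.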

The identified cluster can be saved in a text file.

Building the pedigrees

We will build the pedigrees connecting the individuals belonging to each clique. Go to the “Pedigrees” sub-panel and press the button.

The left side of the panel shows the list of cliques, each of which can be visualized individually in the right side of the panel. Individuals extracted in the clique are shown in red; individuals used to connect them are shown in white and grey. Grey highlights individuals selected for the clustering procedure that can belong to a different sub-group: given the interconnections among the genealogy members, they can appear in several sub-groups to allow the pedigree reconstruction. Any statistic evaluated with the generated pedigrees should consider complete information only for the selected (red) individuals.

If the sub-pedigrees will be used for mapping purposes, the phenotypic values should be attributed only to the selected (red) individuals in order to avoid bias in the statistics caused by the presence of replicated samples. In this context, the whole genotypic information can often be used (including the genotypes of replicated samples) to clarify the inheritance pattern within the family. For this purpose, when the pedigrees are saved in a text file the last column can have the following values:

0 – no information has been provided for this individual

1 – only the genotypic data can be used

2 – both the genotypic and phenotypic data can be used

3,4 – the genotypic data can be used. The individual was initially chosen for the clustering procedure but was not included in any sub-group using the given parameters (i.e. is a residual), and was included in the family by the pedigree reconstruction step. The user can decide whether to also use the phenotypic information in one of the sub-groups where he/she appears.
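When post-processing a saved pedigree file, the codes above can be turned into explicit flags. The sketch below only relies on the code being the last column; the assumptions that fields are whitespace- or tab-separated and that the first field is the individual ID are hypothetical, as are the sample lines and names.

```python
# Meaning of the last column, as listed above: (genotypes usable, phenotypes usable).
# None marks the residual codes 3 and 4, where the phenotype decision is left
# to the user (use it in at most one sub-group).
USAGE = {
    0: (False, False),
    1: (True, False),
    2: (True, True),
    3: (True, None),
    4: (True, None),
}


def data_usage(line):
    """Return (first field, (use_genotype, use_phenotype)) for one record."""
    fields = line.split()
    return fields[0], USAGE[int(fields[-1])]


# Hypothetical records: only the last column follows the tutorial.
for record in ["17\t5\t6\t1\t2", "18\t5\t6\t2\t1", "23\t0\t0\t1\t3"]:
    print(data_usage(record))
```

Keeping the residual codes as a separate "undecided" value makes it harder to assign the same phenotype to more than one sub-pedigree by accident.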

To clarify the inheritance pattern within the family we can also include in each pedigree any offspring or parent with genotypic information. Include parents and offspring and rebuild the sub-pedigrees:


Each pedigree can be visualized in 2D and 2.5D.


Using the Residuals

Any individual that was not included in the sub-groups for the chosen parameters (or that has been deleted from a sub-group) appears in the Residual list.

We can cluster these individuals using another set of parameters and include the new group in the main list. Set the required parameters in the “Residuals” sub-panel. For instance identify new clusters where the kinship among individuals is at least 0.0125 (we used 0.0625 for the main selection).

A new group is identified, and can be selected and added to the main list:

or removed and put again in the residual list:

When a new group is added to the main list it appears in the “Cliques” panel. The “Pedigrees” panel identifies the new clique and offers to build its sub-pedigrees:


Explore the different options using this simple genealogy before analysing a huge dataset!

 

Additional examples – a larger genealogy

The genealogy large.ped connects a study sample of 642 individuals in a complex 2,292-person pedigree with multiple inbreeding loops.

Load the genealogical data large.ped from the file menu or by pressing the button in the main window.

 

When genealogical data have been collected for a study sample, some of the individuals (especially from the oldest generations) might be uninformative for genetic purposes if they do not appear in any line of descent connecting the subjects under study. Jenti can identify the sub-group of individuals providing useful information for genetic purposes.
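A simplified version of this idea can be sketched as follows. It is not Jenti's procedure: it keeps the study subjects and all of their ancestors and drops everyone else, which is a looser criterion than "lines of descent connecting the subjects" and may therefore keep more individuals than Jenti does. The pedigree, IDs and function name are hypothetical.

```python
def subjects_and_ancestors(pedigree, subjects):
    """Return the study subjects together with all of their ancestors.

    pedigree maps id -> (father, mother), with None for unknown parents.
    """
    keep = set()
    stack = list(subjects)
    while stack:
        ind = stack.pop()
        if ind is None or ind in keep:
            continue
        keep.add(ind)
        stack.extend(pedigree.get(ind, (None, None)))
    return keep


# Hypothetical genealogy: X is a sibling of P1 with no link to any subject.
pedigree = {
    "GF": (None, None), "GM": (None, None),
    "P1": ("GF", "GM"), "P2": ("GF", "GM"), "X": ("GF", "GM"),
    "S1": (None, None), "S2": (None, None),
    "C1": ("P1", "S1"), "C2": ("S2", "P2"),
}
kept = subjects_and_ancestors(pedigree, subjects=["C1", "C2"])
print(sorted(set(pedigree) - kept))  # individuals dropped by this criterion
```

In this example only X is dropped, since X is neither a subject nor an ancestor of one.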

 

To remove the unnecessary individuals, it is possible either to load a pedigree using the menu option “load ConPedigree” (which automatically removes the uninformative subjects) or to save a copy of the “cleaned” genealogical data after a pedigree has been loaded (and any changes have been made) using the “ExportConPedigree” button.

 

 

Select “ExportConPedigree” and save the new dataset (it will be saved in tab-delimited format). Open the new pedigree: the pedigree size is now 2,240, since 52 individuals from the original pedigree were uninformative for genetic purposes.

 

Try to select a subset of less-related individuals, for instance a group of more than 10 individuals whose mutual kinship is up to the median kinship observed for this sample, which is 0.032752.

Set the parameters in the “Cliques” panel.

To avoid a long execution time, the “Random seed” field can be set to 1237 with 0 permutations, since this seed is known to give a suitable solution for this dataset. Indeed, particularly when using large pedigrees, once a suitable solution has been identified for the given dataset and settings, the “Random seed” value can be stored so that the same solution can be obtained at a later time with the permutations set to 0.

The selected 15 individuals can be connected in a single pedigree. Try connecting these individuals using the “Include all common ancestors” option. This option will preserve all the genealogical connections among individuals. The resulting sub-pedigree comprises 1451 individuals in 17 generations.

Let’s identify the best sub-sample where the kinship among individuals is at least 0.03125 (as for first cousins once removed in an outbred population). We will require the extracted sub-groups to have no fewer than 10 individuals each.

Again, to avoid a long execution time, the “Random seed” field can be set to 1243 with 0 permutations. The execution time would otherwise be long, since with this genealogy the number of possible clusters of individuals with kinship >= 0.03125 is extremely large.

Try connecting these individuals using the “Automatic pedigree reconstruction” option. This option will try to preserve most of the genealogical connections among individuals while keeping the obtained sub-pedigree as small as possible.

The pedigree comprises 332 individuals spanning 7 generations. Comparing the inbreeding and kinship observed using the whole dataset (Clique Inbr and Clique Kin) with the inbreeding and kinship observed in the extracted sub-pedigree (Ped Inbr and Ped Kin), we can note that some of the connections among individuals have not been included. For instance, the mean kinship observed in the sub-pedigree is one third of that observed using the whole genealogy.

The pedigree structure can be interactively explored through PedVizApi (which, for instance, allows highlighting and following the path between pairs).

Depending on the use of the extracted sub-pedigree (and on the assumptions made), we can also extract the sub-pedigree where all the connections have been preserved (“Include all common ancestors” option), or increase the number of “Max meiotic steps” (from the main panel or using the pedigree-specific setting with the button) that can pass through a common ancestor to allow his/her inclusion in the extracted sub-pedigree.

By including all the common ancestors, the pedigree size increases to 1,942 individuals in 17 generations, while using, for instance, 12 meiotic steps the pedigree size is about one thousand and the mean kinship rises to more than one half of that observed in the whole genealogy.
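The effect of the meiotic-step limit on kinship can be illustrated with the standard path-counting formula, in which a common ancestor reached through a path of m parent-offspring links contributes (1/2)^(m+1) to the kinship of a pair (multiplied by 1 + F if the ancestor has inbreeding coefficient F). Treating Jenti's “Max meiotic steps” as the length m of such a path is an assumption, and the function name is hypothetical.

```python
def path_contribution(meiotic_steps, ancestor_inbreeding=0.0):
    """Kinship contributed by one path of the given length through one
    common ancestor (path-counting formula)."""
    return 0.5 ** (meiotic_steps + 1) * (1.0 + ancestor_inbreeding)


# First cousins: two shared grandparents, each joined by a 4-step path.
print(2 * path_contribution(4))   # 0.0625
# First cousins once removed: the same two ancestors, 5-step paths.
print(2 * path_contribution(5))   # 0.03125
# A single path of 12 steps adds very little on its own.
print(path_contribution(12))      # 0.0001220703125
```

Each distant ancestor adds little by itself, but in a genealogy with many inbreeding loops a pair can be joined by a large number of such paths, which is consistent with the drop in mean kinship seen when distant common ancestors are left out of the sub-pedigree.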