Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging gvcf with GLnexus introduces non-zero heterozygous PL in hemizygous PAR #811

Open
ASLeonard opened this issue Apr 23, 2024 · 1 comment
Assignees

Comments

@ASLeonard
Copy link

This is a follow up from an email thread with @pichuan, where DeepVariant correctly produces only hemizygous genotypes in regions indicated as PAR, but through GLnexus merging and later imputation can gain "impossible" heterozygous genotypes.

The probably mainly seems to originate in GLnexus merging, where uncalled alleles (e.g., for "GT:DP:AD:GQ:PL:RNC" "./.:5:5,0:0:14,0,104:II" where I = gVCF input site is non-called) has a non-zero likelihood of being heterozygous according the PL tag, which is later imputed to a "0/1" genotype. But this also occurs for samples called as reference (0/0) but with non-zero PL tags for 0/1.

For example, the third sample gains a heterozygous genotype at site Y:6899016 after imputation, and has the gvcf block

Y       6898989 .       C       <*>     0       .       END=6899117     GT:GQ:MIN_DP:MED_DP:PL  0/0:1:0:0:0,3,29

leads to the GLnexus merged GT

Y       6899016 Y_6899016_T_C   T       C       47      .      0/0:0:0,0:1:0,3,29:..

and then the imputed

Y       6899016 Y_6899016_T_C   T       C       47      .      0/1:1:0,1,0

variant calling with DeepVariant

Running with singularity image v1.6.1.
This was done with the "WGS" model, using --channels "insert_size" --include_med_dp --gvcf <gvcf> for make_examples and --haploid_contigs "X,Y" --par_regions_bed <PAR.bed> for postprocess.

merging with GLnexus

Running with singularity image v1.4.1-0-g68e25e5.
This uses a modified DeepVariant config, where there is no genotype revision (revise_genotypes: false).

/usr/local/bin/glnexus_cli \
        --dir $TMPDIR/GLnexus.DB \
        --config <deepvariant_preset_with_revise_genotypes_false> \
        --threads 4 \
        --mem-gbytes 20 \
        *.g.vcf |\
        bcftools view - |\
        bgzip -@ 8 -c > Yhap.Unrevised.vcf.gz

imputing with beagle4

This is where the "./." genotypes with non-zero het GL turn into "0/1".

java -jar -Xss25m -Xmx50G beagle.27Jan18.7e1.jar ne=100 gl=Yhap.Unrevised.vcf.gz out=Yhap.beagle4

The unrevised (glnexus output, with no hets) and imputed (beagle output, with hets) variants are here, along with the gvcfs. I can provide the alignments, references, etc. if you need as these samples are already public.
Y_haploid_vcf.tar.gz

Best,
Alex

@pichuan pichuan self-assigned this Apr 23, 2024
@pichuan
Copy link
Collaborator

pichuan commented May 6, 2024

Hi @ASLeonard ,
A quick update: We think this is because we need to adjust our PL in our gVCF as well.
I'm working on a code change, but it might take a bit longer than I thought.
I just want to let you know that this is still in my queue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants