I already solved the mystery in this comment I posted to your Substack: usmortality.substack.co…. Or rather, the mystery was solved by ChrisDeZPhD in this Twitter thread and I just repeated what he said: twitter.com/ChrisDeZPhD….

In the second version of Wuhan-Hu-1 at GenBank, the last 20 bases are TGTGATTTTAATAGCTTCTT, which is identical to a segment of human chromosome 1. The same 20 bases are also included in the first and third versions, but not at the very end. In the first version, the 20-base segment is followed by 598 extra nucleotides that also match human chromosome 1. If you do a BLAST search for the last 618 nucleotides of the first version, the segment is 99.68% identical to a result titled "Human DNA sequence from clone RP11-173E24 on chromosome 1q31.1-31.3, complete sequence".

To reproduce the search, copy positions 29856 to 30473 from here: ncbi.nlm.nih.gov/nuccor…. Then paste the sequence here and press the BLAST button: blast.ncbi.nlm.nih.gov/….

You can align the three different versions of Wuhan-Hu-1 like this: `brew install mafft;curl 'eutils.ncbi.nlm.nih.gov… --clustalout -`.

Another paper that did de novo assembly of an early SARS-CoV-2 sample was Ren et al.: ncbi.nlm.nih.gov/pmc/ar…. They wrote: "Quality control processes included removal of low-complexity reads by bbduk (entropy = 0.7, entropy-window = 50, entropy k = 5; version: January 25, 2018),[11] adapter trimming, low-quality reads removal, short reads removal by Trimmomatic (adapter: TruSeq3-SE.fa:2:30:6, LEADING: 3, TRAILING: 3, SLIDING WINDOW: 4:10, MINLEN: 70, version: 0.36),[12] host removal by bmtagger (using human genome GRCh38 and yh-specific sequences as reference),[13] and ribosomal reads removal by SortMeRNA (version: 2.1b).[14]"

The reason the first version of Wuhan-Hu-1 included the 618-base segment of human DNA at the end may have been that the authors didn't remove human reads before they ran MEGAHIT, or at least they didn't mention removing human reads in the methods section of the Wu et al. paper.
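
If you would rather cut out the 618-base tail with a script than copy it by hand, the coordinate arithmetic can be sketched in Python. This is only a minimal illustration: `tail_segment` is a hypothetical helper, and `toy_v1` is a made-up stand-in sequence with the same layout as the one described above (29855 filler bases, the 20-base segment, then 598 filler bases), not the real GenBank record. GenBank positions are 1-based and inclusive, so they have to be shifted when used as Python slice indices.

```python
# The 20-base segment quoted in the comment above.
HUMAN_LIKE_20MER = "TGTGATTTTAATAGCTTCTT"


def tail_segment(seq: str, start: int, end: int) -> str:
    """Return seq[start..end] using 1-based, inclusive positions (GenBank style)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return seq[start - 1:end]


# Toy stand-in for the first version (NOT real sequence data): filler bases,
# then the 20-mer, then 598 more filler bases, 30473 bases in total.
toy_v1 = "A" * 29855 + HUMAN_LIKE_20MER + "C" * 598

tail = tail_segment(toy_v1, 29856, 30473)

print(len(toy_v1))                         # total length of the toy sequence
print(len(tail))                           # length of the extracted tail
print(tail.startswith(HUMAN_LIKE_20MER))   # does the tail begin with the 20-mer?
```

With the real first-version sequence loaded into a string in place of `toy_v1`, the same call would give the 618 bases to paste into BLAST.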