This is a summary, written by members of the CITF Secretariat, of:

Yoo S, Garg E, Elliott LT, Hung RJ, Halevy AR, Brooks JD, Bull SB, Gagnon F, Greenwood C, Lawless JF, Paterson AD, Sun L, Zawati MH, Lerner-Ellis J, Abraham R, Birol I, Bourque G, Garant JM, Gosselin C, Li J, Whitney J, Thiruvahindrapuram B, Herbrick JA, Lorenti M, Reuter MS, Adeoye OO, Liu S, Allen U, Bernier FP, Biggs CM, Cheung AM, Cowan J, Herridge M, Maslove DM, Modi BP, Mooser V, Morris SK, Ostrowski M, Parekh RS, Pfeffer G, Suchowersky O, Taher J, Upton J, Warren RL, Yeung R, Aziz N, Turvey SE, Knoppers BM, Lathrop M, Jones S, Scherer SW, Strug LJ. HostSeq: a Canadian whole genome sequencing and clinical data resource. BMC Genom Data. 2023 May 2;24(1):26. doi: https://doi.org/10.1186/s12863-023-01128-3.

The results and/or conclusions contained in the research do not necessarily reflect the views of all CITF members.

The HostSeq platform, established in April 2020, is a national collaboration of population-based studies investigating genetic risk factors for SARS-CoV-2 disease and the health outcomes associated with COVID-19. HostSeq aims to include 10,000 Canadians of all ages with a positive SARS-CoV-2 diagnosis; it has collected genomic and clinical information from 95% of them, and data collection is complete for 70%. The study was funded by CITF and includes several CITF-funded studies. Its results were published in BMC Genomic Data.

Data amassed by HostSeq studies are made available through two open portals or through a controlled data access request. Of the 13 studies included in HostSeq, four are funded in part by the CITF. These include genMARK, headed by Dr. Upton Allen, and the SickKids COVID-19 Biobank, led by Dr. Rae Yeung, both at The Hospital for Sick Children in Toronto; the Biobanque québécoise de la COVID-19 (BQC19), formerly led by Dr. Vincent Mooser at McGill University; and CANCOV, co-led by Drs. Angela Cheung and Margaret Herridge at the University Health Network in Toronto.

Key points:

  • The data currently included in the HostSeq databank show a median participant age of 47.9 years; 54.6% of participants are female, 41.5% are male, and sex at birth is missing for 3.9%. Approximately half of HostSeq participants required hospitalization due to their SARS-CoV-2 infection. Of the hospitalized patients, 54% were discharged home, 15% were transferred to other hospitals or healthcare settings (e.g., rehabilitation centers or long-term care facilities), 11.9% died, and outcome data are still being collected for 18.7%.
  • Each HostSeq partner study obtained consent from participants, collected blood samples for whole genome sequencing, and recorded clinical information using standardized case report forms.
  • HostSeq provides open access to some of its datasets through two data portals: the Phenotype Portal, which contains summaries of the major clinical variables collected by partner studies, and the Variant Search Portal, which enables users to search specific regions of HostSeq genetic data.
  • Access to individual-level HostSeq data, which are held in a cloud-based, access-controlled data repository, may be granted following review and approval by a Data Access Compliance Office that is independent of HostSeq. Currently, HostSeq provides 174.5 million short variants consisting of single nucleotide variants and indels (insertions and deletions, i.e., additions or deletions of one or more nucleotides in the DNA sequences of the participants).
  • HostSeq links to provincial health administrative databases, thereby providing additional, long-term health outcomes data on participants. These data may reveal post-infection conditions such as long COVID, as well as susceptibility to new diagnoses such as diabetes and cancer.
