This is a summary, written by members of the CITF Secretariat, of:

Yoo S, Garg E, Elliott LT, Hung RJ, Halevy AR, Brooks JD, Bull SB, Gagnon F, Greenwood CM, Lawless JF, Paterson AD, Sun L, Zawati MH, Lerner-Ellis J, Abraham RJS, Birol I, Bourque G, Garant J-M, Gosselin C, Li J, Whitney J, Thiruvahindrapuram B, Herbrick J-A, Lorenti M, Reuter MS, Liu S, Allen U, Bernier FP, Biggs CM, Cheung AM, Cowan J, Herridge M, Maslove DM, Modi BP, Mooser V, Morris SK, Ostrowski M, Parekh RS, Pfeffer G, Suchowersky O, Taher J, Turvey SE, Upton J, Warren RL, Yeung RSM, Aziz N, Knoppers BM, Lathrop M, Jones SJM, Scherer SW, Strug LJ. HostSeq: A Canadian Whole Genome Sequencing and Clinical Data Resource. medRxiv. 2022 May 10. doi: 10.1101/2022.05.06.22274627.

The results and/or conclusions contained in the research do not necessarily reflect the views of all CITF members.

The HostSeq platform, established in April 2020, is a national collaboration of population-based studies investigating the genetic risk factors behind SARS-CoV-2 infection and health outcomes associated with it. HostSeq is 60% of the way to its goal of collecting genomic and clinical information from 10,000 Canadians of all ages with a positive SARS-CoV-2 diagnosis.

Data amassed by HostSeq studies are made available through two open portals or through a controlled data access request. Of the 13 studies included in HostSeq, three are funded in part by the CITF: genMARK, headed by Dr. Upton Allen at The Hospital for Sick Children; the Quebec COVID-19 Biobank (BQC19), led by Dr. Vincent Mooser; and CANCOV, co-led by Drs. Angela Cheung and Margaret Herridge at the University Health Network in Toronto. This article introducing HostSeq has been released as a preprint and has therefore not yet been peer reviewed.

Key points:

  • Among participants currently included in the HostSeq databank, the median age is 49.6 years, 44.1% are male, 41.8% required hospitalization due to their SARS-CoV-2 infection, and 14.1% required admission to an intensive care unit.
  • Each HostSeq partner study obtained the consent of participants, collected blood samples for whole genome sequencing, and recorded clinical information using standardized case report forms.
  • HostSeq provides open access to some of its datasets through the two following data portals: the Phenotype Portal (contains summaries of the major clinical variables collected by partner studies) and the Variant Search Portal (enables users to search specific regions of HostSeq genetic data).
  • Access to individual-level HostSeq data – which are held in a cloud-based, access-controlled data repository – may be granted following review and approval by a HostSeq-independent Data Access Compliance Office.
  • HostSeq links to provincial health administrative databases, thereby providing additional, long-term health outcomes data on participants, which may inform research on post-infection conditions such as long COVID and on vulnerability to new diagnoses such as diabetes and cancer.

Learn more about the HostSeq initiative here.