Precision medicine requires big data. In order to improve the treatment of people with cancer or to understand rare diseases, scientists and clinicians, as well as AI technologies, need access to larger sets of health research data covering various populations and a wide range of conditions. For AI, more data means better understanding of diseases, which will lead to more accurate diagnosis and treatment. At the same time, each hospital will only see a relatively small number of people with an illness, and even across the province we only have access to a small portion of the total data available globally. To create the large-scale datasets needed to advance precision medicine, data sharing across the country and around the world is essential.
The Canadian Distributed Infrastructure for Genomics (CanDIG), featured recently in a special issue of Cell Genomics dedicated to data sharing, is Canada’s solution to enable data sharing across the country (and connect our data to datasets around the world). Led from the University Health Network in Toronto with sites at McGill University in Montreal and the BC Genome Sciences Center, CanDIG is a collaboration of computer scientists, AI specialists, clinicians and geneticists working together to enable the studies needed to address the health challenges facing Canadians.
CanDIG is a pilot project of the Global Alliance for Genomics and Health (GA4GH), an international effort establishing standards for genomics and health data aimed at improving interoperability in the global genomics landscape. The organization was the subject of this month’s special issue of Cell Genomics for its work on global genomics and health data sharing. Canada has been a leader in GA4GH, hosting its headquarters, leading several streams of project work, and implementing many of the GA4GH standards. CanDIG, as one of the pilot projects, not only implemented the GA4GH standards but also helped inform and build many of them. CanDIG is already helping scientists nationwide access large-scale genomics data that was previously siloed in individual provinces or hospitals, and is beginning to link Canada’s genomics datasets to those around the world through collaborations such as the EU/Africa/Canada CINECA Project.
The CanDIG platform was developed to meet provincial healthcare and privacy legislation in Canada by creating a federation of datasets, simplifying the challenges of sharing across provincial borders. CanDIG is also a key part of the future Digital Health and Discovery Platform (DHDP), a $200 million effort funded in part by the Canadian government, which will support the sharing of genomic data from the Terry Fox Marathon of Hope Cancer Centers Network. Making this data accessible to researchers is key to unlocking its discovery potential and enabling better cancer treatments, because the smartest researchers and the most powerful machine learning techniques can’t do anything with data they cannot find, access or use.
“At institutions like UHN, we are building increasingly sophisticated data assets containing health data from many different sources. The next step is to help researchers transform this data into new knowledge by making it findable, available and usable in a uniform, organized and secure format. CanDIG is an important step in enabling researchers across Canada to access the wealth of data collected and generated across the country.”
Dr. Michael Brudno, CanDIG Principal Investigator, UHN Chief Data Scientist and Professor of Computer Science at the University of Toronto.
“By participating in the GA4GH community and international projects like the EU/Africa/Canada CINECA project, CanDIG is beginning to link Canadian genomics efforts with those around the world. As the types of health data expand and the volumes increase, we need to ensure that our datasets are findable and useful; Canada is a world leader in this area.”
Dr. Guillaume Bourque, CanDIG Co-Lead, Professor of Molecular Genetics at McGill University and Director of the Canadian Center for Computational Genomics (C3G).
“Access to whole genome data has been key to understanding the spectrum of mutations that accumulate in cancer. CanDIG and the Terry Fox Digital Health and Discovery Platform will help the data collected by the Marathon of Hope Cancer Centers Network be studied by as many approved researchers as possible.”
Dr. Steve Jones, CanDIG Co-Lead, Head of Bioinformatics and Co-Director of the BC Michael Smith Genome Sciences Center.
University Health Network
Dursi, L. J., et al. (2021) CanDIG: Federated network across Canada for multi-omics and health data discovery and analysis. Cell Genomics. doi.org/10.1016/j.xgen.2021.100033.