The Ontario Cancer Data Linkage Project (‘cd-link’) is an initiative of the Ontario Institute for Cancer Research/Cancer Care Ontario Health Services Research Program. Operating under a collaborative agreement between the Institute for Clinical Evaluative Sciences (ICES) and Cancer Care Ontario (CCO), cd-link is a data release program in which administrative datasets relevant to cancer health services research (such as the Ontario Cancer Registry and Ontario Health Insurance Plan claims) are linked, de-identified and, under the protections of a comprehensive Data Use Agreement, provided directly to researchers.
How are the data de-identified?
An innovation of the cd-link initiative is the mechanism by which the data are de-identified. All data are made anonymous by removing personal identifiers, according to Ontario (PHIPA) and U.S. (HIPAA) privacy legislation.
To achieve this, identifiers are removed or scrambled, and all dates more specific than year are converted to the number of days relative to the index date (e.g., diagnosis date). Datasets are then evaluated with the Privacy Analytics Risk Assessment Tool, a program that measures and manages the re-identification risk of each data release. All data releases must have a re-identification risk of ≤ 0.33 (at least three similar observations in the datasets released) and preferably ≤ 0.20 (at least five similar observations in the datasets released). Consequently, variables may be modified to ensure that each data release meets these criteria. For example, age at index may be grouped into five-year bands, or geographic regions may be combined. In this way, a new type of data is created: risk-reduced de-identified data (‘R2D2’).
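As a rough illustration of these thresholds, the sketch below assumes that re-identification risk is taken as one divided by the size of the smallest group of records that share the same values on a set of identifying variables. That is a common reading of "at least three similar observations", but it is an assumption made here, not a description of how the Privacy Analytics Risk Assessment Tool works. The records, variable names and helper functions are all hypothetical.

```python
from collections import Counter

def age_band(age, width=5):
    """Return a label such as '60-64' for the band containing `age`."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def smallest_group_size(records, variables):
    """Size of the smallest group of records with identical values on `variables`."""
    groups = Counter(tuple(r[v] for v in variables) for r in records)
    return min(groups.values())

def max_risk(records, variables):
    """Assumed risk measure: 1 / size of the smallest group."""
    return 1 / smallest_group_size(records, variables)

# Hypothetical records, for illustration only.
records = [
    {"age": 61, "sex": "F", "region": "A"},
    {"age": 63, "sex": "F", "region": "A"},
    {"age": 64, "sex": "F", "region": "A"},
    {"age": 70, "sex": "M", "region": "B"},
    {"age": 72, "sex": "M", "region": "B"},
    {"age": 74, "sex": "M", "region": "B"},
]
variables = ["age", "sex", "region"]

# With exact ages every record is unique, so the assumed risk is 1.0.
raw_risk = max_risk(records, variables)

# Grouping age into five-year bands yields two groups of three records.
banded = [{**r, "age": age_band(r["age"])} for r in records]
banded_group = smallest_group_size(banded, variables)
banded_risk = max_risk(banded, variables)
```

In this toy example, banding age moves the smallest group from one record to three, which corresponds to the "at least three similar observations" requirement; reaching the preferred level would need groups of at least five.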
The de-identification process is conducted by a cd-link analyst in collaboration with the researcher to ensure that the final de-identified data still allow the researcher to achieve the objectives of his/her research.
Am I eligible and how much does it cost?
The program is currently available only to researchers at Ontario academic institutions.
Currently, there is no charge for data requests. In the future, a modest cost recovery fee will be introduced.
At this time, applications funded by for-profit interests will not be considered.
How do I apply?
The application process for cd-link mirrors the current process for approving projects through the ICES Cancer Research Program. Researchers must prepare a study proposal (approximately five pages) that includes all of the required elements listed below; omitting any of them will delay the review and approval process:
- Rationale and objectives
- Data sources and variables requested, including justification for each variable
- Planned analyses and planned use of the data
- Expected products
- Funding sources
- Data custodian resources for ensuring data security
- Study timeline
- List of research staff members, their role and contact information (include postal address, e-mail address and telephone number).
Principal Investigators must submit their proposals electronically to the Health Services Research Program Leader, Dr. Craig Earle, for preliminary review.
What datasets are available?
- CIHI Discharge Abstract Database (CIHI-DAD)
- CIHI National Ambulatory Care Reporting System (NACRS)
- Continuing Care Reporting System (CCRS)
- CytoBase (cervical screening)
- Home Care Database (HCD) / Ontario Home Care Administrative System (OHCAS)
- New Drug Funding Program (NDFP)
- National Rehabilitation Reporting System (NRS)
- Ontario Breast Screening Program (OBSP)
- Ontario Cancer Registry (OCR)
- Ontario Drug Benefit Claims (ODB)
- Ontario Health Insurance Plan Claims Database (OHIP)
- Registered Persons Database (RPDB)
Click here for available date ranges for datasets.
Typically, an incident or decedent cohort is created based on the Ontario Cancer Registry, and data from additional sources pertaining to the cohort are also included. Optionally, a 5% sample of the general population and its attendant data may also be included. Please note: While the general population sample is subject to the same age/sex exclusion criteria as the case cohort, it is not a matched set (i.e., the age-sex distribution may differ markedly between the cases and the controls).
What is the review and approval process?
If the proposal receives preliminary acceptance by the HSR Program Leader, it will be reviewed by faculty members of the ICES Cancer Program at the next monthly meeting to ensure: adherence to ICES and CCO privacy policies; feasibility of the study with the data requested; and appropriate use of the data requested. The researcher will be invited to present a proposal summary. Consultations with the Chief Privacy Officer and the Senior Director, Data, Technology and Security at ICES will take place as necessary.
Upon approval by the ICES Cancer Program, the researcher must submit the following forms/documents to the cd-link Research Assistant. Signed original copies of the Data Use Agreement, cd-link Privacy Impact Assessment and cd-link Confidentiality Agreements are required. Scanned copies should also be emailed or faxed to the Research Assistant as soon as possible to expedite the process.
- Data Use Agreement (DUA): This agreement between the Principal Investigator and ICES outlines the terms and conditions for the appropriate and secure use of the data and proposed output. Please note that we reserve the right to terminate PI and/or institutional access to cd-link data indefinitely should there be a breach of the terms and conditions of the DUA.
- Data Sharing Agreement (DSA): This agreement between an external institution and ICES is required if the PI submits a proposal requesting the importation of non-ICES data to ICES for the purpose of inclusion with the cd-link request. The data thus created would be released in an unidentifiable and unlinkable format.
- cd-link Confidentiality Agreement for Researchers (CA): This agreement must be signed (and renewed each fiscal year) by all members of the research team who will use the data for its designated purpose.
- cd-link Privacy Impact Assessment (PIA): This form is used to collect additional information regarding the proposal as required by PHIPA.
- Project Activation Worksheet (PAW): As the costs are currently being covered by the cd-link program, this form must be completed for accounting purposes.
- cd-link Dataset Creation Plan (DCP): This form is used to specify the data holdings and variables required, along with other necessary details, such as time frame, exclusion criteria and whether or not a general population sample is required.
- cd-link Request Checklist for PIs: Refer to this checklist as you complete the forms to avoid common oversights and hence unnecessary delays in processing your application.
Final approval occurs once documents have been received and the PIA has been approved and signed by the HSR Program Leader and by the ICES Cancer Program Leader, Chief Executive Officer, and Chief Privacy Officer. An assigned Analyst will begin creating the dataset according to the specification in the DCP, and may contact the researcher if any clarification is needed.
The review panel reserves the right to request additional information or requirements should the need arise.
When and how will I receive the data?
Once signed documents are received, we aim to process requests within our target period of six weeks; however, we cannot provide any guarantees as more complex requests typically take longer to process.
Each data source will be provided as a separate dataset, linkable by subject IDs that are unique to each data release.
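To show what linking by subject ID can look like in practice, here is a minimal sketch that joins two hypothetical extracts from the same release. The data are delivered in SAS format; Python is used here only for illustration, and the field names (`subject_id`, `dx_year`, `days_from_index`) and values are assumptions, not the actual cd-link variable names.

```python
from collections import defaultdict

# Hypothetical extracts from one release; names and values are illustrative only.
registry = [
    {"subject_id": "S001", "dx_year": 2009},
    {"subject_id": "S002", "dx_year": 2010},
]
claims = [
    {"subject_id": "S001", "days_from_index": 12},
    {"subject_id": "S001", "days_from_index": 40},
]

def link_by_subject(cohort, other, key="subject_id"):
    """Attach to each cohort row the rows of `other` that share its subject ID."""
    rows_by_id = defaultdict(list)
    for row in other:
        rows_by_id[row[key]].append(row)
    return {
        row[key]: {**row, "claims": rows_by_id.get(row[key], [])}
        for row in cohort
    }

linked = link_by_subject(registry, claims)
```

Because the subject IDs are unique to each data release, a join like this only works across datasets delivered in the same release.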
The data will be provided to the Principal Investigator in SAS format on an encrypted CD via a tracked delivery provider (courier). Detailed information, including minimal descriptive statistics of each variable and missing value information, will accompany each dataset. In addition, a summary report will be included in each release detailing:
- The criteria used to create the data
- The name, number of observations and time frame of each dataset
- Any de-identification manipulations performed
- Cases excluded due to invalid or unavailable health card number
What happens towards the end of my project?
Toward the project end date, the Principal Investigator will be asked to report on the status of any manuscripts/reports that are in production and submit the document to the Research Assistant at least 45 calendar days in advance of publication. The PI is to inform the cd-link Research Assistant at this time if an extension to the project is necessary. Monthly PubMed searches will take place for instances of publication without notification and approval.
Three months prior to the expiration of the Data Use Agreement, the Principal Investigator will be sent a certificate of destruction to be completed and returned to ICES as proof of destruction of all original, copied, backup or derived data.
Other commonly asked questions
What is the difference between the cd-link data release program and other ICES projects?
Cd-link is a data release program whereby researchers can request specific ICES and CCO data holdings (that are held at ICES) for their research projects. We make some modifications to the datasets so that they can be released outside of ICES. Specifically, personal identifiers are removed, other identifiers (e.g., institution) are scrambled, and all dates more specific than year are converted to number of days relative to the index date. Some variables (e.g., age, geographic regions) may also be categorized.
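The date conversion mentioned above can be sketched as follows. This is a minimal illustration with hypothetical event names and dates, assuming that events before the index date are given negative offsets; it is not the code used to prepare cd-link releases.

```python
from datetime import date

def days_from_index(event_date, index_date):
    """Number of days from the index date to the event (negative if earlier)."""
    return (event_date - index_date).days

# Hypothetical index (e.g., diagnosis) date and events, for illustration only.
index_date = date(2010, 3, 15)
events = {
    "prior_admission": date(2010, 3, 1),
    "surgery": date(2010, 4, 1),
}

# The released data would carry only the offsets, not the calendar dates.
offsets = {name: days_from_index(d, index_date) for name, d in events.items()}
```

Offsets preserve the order of events and the intervals between them for each subject while withholding the exact calendar dates.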
Do I require ethics approval prior to submitting a cd-link request?
Researchers should consult with their institution to discuss the need for ethics approval. In general, Research Ethics Board approval is strongly encouraged. A cd-link request can be approved while an ethics application is still pending.
Are all members of the research team required to sign the confidentiality agreement or only those who will access the data?
The confidentiality agreement must be completed by all members of the research team who are listed on the cd-link PIA form.
Will my research team need to sign a new confidentiality agreement if we have already done so for previous ICES projects?
Every team member must sign the ‘cd-link confidentiality agreement for researchers’ as it is different from the ICES confidentiality agreement.
Do I need to submit original documents before my request can be processed?
We understand that it may take some time to obtain signatures on all the documents, and the confidentiality agreements in particular. Researchers should fax or email us documents as they are signed and completed in order to expedite the process. This would allow us to verify that the documents have been completed correctly and to obtain the required internal ICES signatures before submission to the ICES Privacy Office. Please note that final approval can only take place after the signed hard copies of the DUA, PIA and confidentiality agreements have been received.
Would I be able to add members to the research team after a cd-link request has been approved?
Yes, new members may be added to the research team after the cd-link request has been approved. New members are required to sign a cd-link confidentiality agreement for researchers. It is the responsibility of the Principal Investigator to limit the use of the data by each research team member in accordance with the provisions of the DUA and confidentiality agreements. It is also his/her responsibility to orient all research team members to the terms of the DUA prior to signing the confidentiality agreement to ensure adherence to data use and security requirements.
Can students request cd-link data?
Trainees require a supervisor affiliated with an Ontario academic institution in order to submit a cd-link request. If the trainee is assuming the role of Principal Investigator, both his/her and the supervisor’s signature must be included on the Data Use Agreement.
Would it be possible to link non-ICES data from another institution with ICES data holdings to create a cd-link dataset?
It is possible. In addition to the datasets listed under “What datasets are available”, a cd-link data release may include a third-party dataset. These requests will be evaluated on a case-by-case basis.
For requests that are approved, we require written acknowledgement from the institution, in the form of a DSA with ICES, stating that we will link data from their institution with ICES data holdings for inclusion in a project-specific cd-link data release. Also, the research proposal must be revised to include the amended purpose and additional linkage. PIs should understand that if the dataset is very small, the variables may undergo a lot of generalization to meet the cd-link de-identification standards.
Please note that once a cd-link dataset is released, it is not possible to link it with any other data, even from a previous cd-link release.
Is the Johns Hopkins ACG system available to calculate co-morbidities?
Both the Charlson Co-Morbidity Index and Johns Hopkins ACG system are available for cd-link requests. For requests involving the use of the Johns Hopkins ACG system, Principal Investigators must complete a form stating the study objectives, funding source, intended use of the Johns Hopkins ACG system and how and where results will be presented. In compliance with the ICES licence agreement, the completed form will be forwarded to Johns Hopkins.
Please note that all non-confidential reports and/or articles produced by the project team involving ACGs must be provided to Johns Hopkins. In addition, non-confidential copies of output produced by the ACG software (including a list of non-matched ICD-9-CM and ICD-10 codes the software failed to recognize) must be given to Johns Hopkins if requested.
For more information
Craig Earle, Program Leader
Health Services Research Program
Ontario Institute for Cancer Research and Cancer Care Ontario
Katrina Chan, Research Assistant
Health Services Research Program
Ontario Institute for Cancer Research and Cancer Care Ontario