Data sharing
The Good, the Bad, and the Open

Open Science at the MNI

November 28th, 2017
Samir Das - Associate Director of Software Development
McGill Centre for Integrative Neuroscience
Montreal Neurological Institute
samir.das@mcgill.ca

Today's talk

  1. Data Sharing and Open Science
  2. Plans for Open Science at the MNI (TOSI)
  3. Why the MNI made this move
  4. Changes needed to make Open Science reality
  5. Examples of tools, datasets, and environments
  6. MNI Ecosystem - LORIS and CBRAIN
  7. Challenges and Hurdles of going Open

What is Data Sharing?

Exchange of information

Datasets

Tools

Standardization

Databases

Collaborations

Conferences, Hackathons

Facebook, Google, Twitter, etc.


Image source: http://blog.veritythink.com/post/87880448269/creative-data-sharing-and-open-humanitarianism

Best practices in Data Sharing

Committee on Best Practices in Data Analysis and Sharing (COBIDAS)

Richard Stallman
Visit to the MNI
May 2017

Open (Free) Science

Cyberinfrastructure

Centralized or Distributed?

Why did the MNI move towards Open Science?

Increased exposure

Greater collaborations

More citations

Less money spent on patents

New funding opportunities

Improved reproducibility

Enable scientific discovery

It's the future!

Aled Edwards video

What changes are necessary to make this a reality?

Improved Infrastructure

Databases, NoSQL, APIs, Version Control, Provenance Capture
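
Provenance capture means recording enough about each processing step to reproduce it later. Here is a minimal sketch. It assumes a step can be described by a tool name, a version string, its parameters and its input files. The hypothetical `provenance_record` helper hashes each input with SHA-256 and stores the digest alongside the tool details and a UTC timestamp. It is not the provenance model used by LORIS, CBRAIN or NIDM, and all field names are illustrative.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest that identifies an input exactly."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(tool: str, version: str,
                      inputs: dict[str, bytes],
                      params: dict[str, object]) -> dict[str, object]:
    """Build a minimal (hypothetical) provenance record for one processing step.

    `inputs` maps a file name to its contents; a real pipeline would hash the
    files on disk rather than in-memory bytes.
    """
    return {
        "tool": tool,
        "tool_version": version,
        "parameters": params,
        # Content hashes let anyone check later that the same inputs were used.
        "inputs": {name: sha256_of_bytes(data) for name, data in sorted(inputs.items())},
        "python": platform.python_version(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    record = provenance_record("bet", "5.0.9", {"sub-01_T1w.nii.gz": b"placeholder"}, {"f": 0.5})
    print(json.dumps(record, indent=2))
```

Hashing content rather than trusting file names is the design choice that matters here: a renamed or silently modified input produces a different digest.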

NoSQL

LORIS API

Instrument Format

NeuroImaging Data Model

Standardization and Interoperability

BIDS Format
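
BIDS standardizes where files live and how they are named, so tools can find data without per-study configuration. The sketch below builds the relative path for an anatomical image. It follows the BIDS pattern `sub-<label>/[ses-<label>/]anat/sub-<label>[_ses-<label>]_<suffix><extension>` and covers only the subject and session entities. The helper name `bids_anat_path` is hypothetical, and a full implementation would also handle the other BIDS entities and data types.

```python
from pathlib import PurePosixPath
from typing import Optional


def bids_anat_path(subject: str, suffix: str = "T1w",
                   session: Optional[str] = None,
                   extension: str = ".nii.gz") -> PurePosixPath:
    """Build a BIDS-style relative path for an anatomical image (subject/session only)."""
    # BIDS labels are alphanumeric; isalnum() is a rough check for that rule.
    if not subject.isalnum() or (session is not None and not session.isalnum()):
        raise ValueError("BIDS labels must be alphanumeric")
    entities = [f"sub-{subject}"]   # key-value pairs that go into the file name
    folders = [f"sub-{subject}"]    # directory levels above the file
    if session is not None:
        entities.append(f"ses-{session}")
        folders.append(f"ses-{session}")
    folders.append("anat")
    filename = "_".join(entities + [suffix]) + extension
    return PurePosixPath(*folders, filename)


if __name__ == "__main__":
    print(bids_anat_path("01", session="V1"))
```

For subject `01` and session `V1` this yields `sub-01/ses-V1/anat/sub-01_ses-V1_T1w.nii.gz`.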

Boutiques

Makes use of Containers
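
Boutiques describes a command-line tool in JSON. A command-line template contains value-keys such as `[INPUT_FILE]`, and each declared input says which value-key it fills, so any platform can build the command and run it inside the named container. The sketch below uses a simplified descriptor with a few Boutiques-style fields (`command-line`, `inputs`, `value-key`, `command-line-flag`, `optional`, `container-image`). The Docker image name is a placeholder. The `build_command` helper is a hypothetical illustration and not the Boutiques `bosh` tool. It also assumes each value-key is a standalone token in the template.

```python
import json
import shlex

# Simplified, Boutiques-style descriptor; the image name is a placeholder.
DESCRIPTOR = json.loads("""
{
  "name": "example-bet",
  "tool-version": "5.0.9",
  "schema-version": "0.5",
  "command-line": "bet [INPUT_FILE] [OUTPUT_PREFIX] [FRAC]",
  "container-image": {"type": "docker", "image": "example/fsl:5.0.9"},
  "inputs": [
    {"id": "input_file", "name": "Input image", "type": "File", "value-key": "[INPUT_FILE]"},
    {"id": "output_prefix", "name": "Output prefix", "type": "String", "value-key": "[OUTPUT_PREFIX]"},
    {"id": "frac", "name": "Fractional threshold", "type": "Number",
     "value-key": "[FRAC]", "command-line-flag": "-f", "optional": true}
  ]
}
""")


def build_command(descriptor: dict, invocation: dict) -> str:
    """Fill the command-line template from an invocation (input id -> value)."""
    inputs_by_key = {inp["value-key"]: inp for inp in descriptor["inputs"]}
    missing = [inp["id"] for inp in descriptor["inputs"]
               if not inp.get("optional", False) and inp["id"] not in invocation]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    parts = []
    for token in descriptor["command-line"].split():
        inp = inputs_by_key.get(token)
        if inp is None:
            parts.append(token)  # literal part of the command
        elif inp["id"] in invocation:
            if "command-line-flag" in inp:
                parts.append(inp["command-line-flag"])
            parts.append(shlex.quote(str(invocation[inp["id"]])))  # shell-safe value
        # otherwise: an optional input was not supplied, so its value-key is dropped
    return " ".join(parts)


if __name__ == "__main__":
    print(build_command(DESCRIPTOR, {"input_file": "sub-01_T1w.nii.gz",
                                     "output_prefix": "brain", "frac": 0.5}))
```

Because the descriptor, not the platform, carries the tool's interface, the same JSON can be reused by any execution environment that can pull the container image.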

API Building and Standardization

Atlas template building

Goal:

To create standardized JSON metadata to describe atlases

For either volumetric or surface atlases

Pipelines can access more atlases for anatomical standardization
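
The slides do not show the metadata schema itself, so the following is only a hypothetical sketch of what a standardized JSON atlas description could contain. Every field name here is an assumption: a name, whether the atlas is volumetric or surface-based, the template space it is aligned to, its files, and its label table. The sketch validates a description and round-trips it through JSON so that a pipeline could load any conforming atlas the same way.

```python
import json
from dataclasses import asdict, dataclass, field

ALLOWED_TYPES = {"volumetric", "surface"}  # the two atlas kinds named on the slide


@dataclass
class AtlasLabel:
    index: int  # integer value stored in the atlas image or surface label file
    name: str   # human-readable region name


@dataclass
class AtlasMetadata:
    # All field names are hypothetical, chosen for illustration only.
    name: str
    atlas_type: str       # "volumetric" or "surface"
    template_space: str   # space the atlas is registered to
    files: list[str]
    labels: list[AtlasLabel] = field(default_factory=list)

    def validate(self) -> None:
        if self.atlas_type not in ALLOWED_TYPES:
            raise ValueError(f"atlas_type must be one of {sorted(ALLOWED_TYPES)}")
        indices = [label.index for label in self.labels]
        if len(indices) != len(set(indices)):
            raise ValueError("label indices must be unique")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), indent=2)  # asdict also converts nested labels

    @classmethod
    def from_json(cls, text: str) -> "AtlasMetadata":
        data = json.loads(text)
        data["labels"] = [AtlasLabel(**label) for label in data.get("labels", [])]
        atlas = cls(**data)
        atlas.validate()
        return atlas


if __name__ == "__main__":
    atlas = AtlasMetadata("example-atlas", "volumetric", "MNI152", ["example_atlas.nii.gz"],
                          [AtlasLabel(1, "region A"), AtlasLabel(2, "region B")])
    print(atlas.to_json())
```

Validating on both write and read keeps malformed descriptions out of pipelines regardless of who produced the JSON.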

Consolidated datasets

ADNI, ICBM, NIHPD, Allen Mouse Brain, IBIS, Generation-R, ABIDE, ABIDE Preprocessed, ADHD 200, ADHD Preprocessed, Human Connectome Project, OMEGA, UK Biobank, Edinburgh Biobank, BigBrain, Talairach, 1000 Functional Connectomes, Colin 27, MNI 305, 1000 Brains, AAL, ANIMAL, MAVAN, PreventAD, PING, MNI 152, FSL...

Not enough datasets!

BigBrain

ADNI

  • Cited 343 times

  • Data-use agreements

  • Used for countless analyses

  • Restrictive

NIHPD

  • Cited 295 times
  • Control data for NDAR
  • Longitudinal study (8 years)
  • 532 subjects
  • 8000 distinct variables
  • 37000 individual assessments
  • T1, T2, PD, DTI, Spectroscopy
  • ∼3TB of imaging data
  • 2000 MRI acquisitions

IBIS

  • Autism infant brain development
  • 3000+ scans
  • 700+ subjects
  • 7000 distinct variables
  • 150,000+ individual assessments
  • T1, T2, DTI, BOLD
  • ∼5 TBs imaging data
  • Genetic/biospecimen data

CCNA

  • $35 million

  • CIHR funded

  • Pan Canadian Consortium

  • Acquisition in June 2016

  • 35 collection sites

  • LORIS powered

ABIDE

Quebec Parkinson Network

OMEGA

  • Open MEG Archive

  • First open MEG repository

  • 180 user accounts created

  • Associated structural data

  • 500+ reads on academia.edu

Tools and Environments

NeuroVault, Neurosynth, CIVET, VIP, Boutiques, git-annex, SOLID, BIDS, NIDM, DiCAT, DCMTK, Nipype, ITK, FreeSurfer, SPM, FSL, Mobile MRI, 1000 Brains, AAL, BrainCode, GitHub, Amazon Cloud, IDA, BrainVISA, DICOM Confidential, Docker Hub, Gate, CMIND...

Too many Tools!

NeuroVault Example

BrainBrowser

A set of web-based 3D visualization tools used primarily for viewing neurological data, such as MRI scans.

It allows for real-time manipulation and analysis of 3D neuroimaging data through any modern web browser.

Data Publishing

Not Data Sharing

Data Publishing Geeksquad for Persistent DOIs

Hackathons

Hackathons bring important value to data sharing initiatives

What’s involved?

Longitudinal Acquisition, Storage and Curation, Interoperability, Reproducibility, Transfer, Anonymization, Security, Privacy, Ethics, APIs, Validation, Quality Control, Protocol Checking, Preprocessing, Analysis, HPC, Provenance, Ontological Standardization, Data Harmonization, Upgrades, Maintenance, Bug Fixes, User Interface, JavaScript, Bootstrap, Tracking, Extensibility, Data Management, Summary Statistics, Workflows, Development, Tool Integration, Data Sharing, Download, Multi-Modal Linking, Querying, Image Processing, Visualization, Networking, System Administration, Partnerships, Funding, HR bureaucracies... No big deal!

Data flow

WHAT IS LORIS?

“LORIS is a modular and extensible web-based data management system that integrates all aspects of a multi-center study: from heterogeneous data acquisition (imaging, clinical, behavior, genetics) to storage, processing and ultimately dissemination.”

LORIS Dashboard

Imaging Data

Cross-modal querying
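
Cross-modal querying means selecting candidates by criteria that span several data types at once, for example imaging QC status together with a behavioral score. The sketch below shows the idea as an inner join on a candidate identifier. Each modality gets its own filter, and a candidate is returned only if every modality has a passing record. It is not how LORIS implements querying. The `candidate_id` field and the record layouts are hypothetical.

```python
from typing import Callable

Record = dict[str, object]  # one row of data for one candidate in one modality


def cross_modal_query(
    modalities: dict[str, list[Record]],
    filters: dict[str, Callable[[Record], bool]],
) -> list[tuple[str, dict[str, Record]]]:
    """Return candidates with a passing record in every modality (inner join)."""
    matched: dict[str, dict[str, Record]] = {}
    for modality, records in modalities.items():
        keep = filters.get(modality, lambda record: True)  # no filter: accept all
        hits: dict[str, Record] = {}
        for record in records:
            if keep(record):
                hits.setdefault(str(record["candidate_id"]), record)  # first match wins
        matched[modality] = hits
    if not matched:
        return []
    common = set.intersection(*(set(hits) for hits in matched.values()))
    return [(cid, {modality: hits[cid] for modality, hits in matched.items()})
            for cid in sorted(common)]


if __name__ == "__main__":
    imaging = [{"candidate_id": "C001", "qc": "pass"}, {"candidate_id": "C002", "qc": "fail"}]
    behavioral = [{"candidate_id": "C001", "score": 112}, {"candidate_id": "C002", "score": 98}]
    for cid, rows in cross_modal_query(
            {"imaging": imaging, "behavioral": behavioral},
            {"imaging": lambda r: r["qc"] == "pass", "behavioral": lambda r: r["score"] > 100}):
        print(cid, rows)
```

Filtering each modality before joining keeps the join small, which is what makes near-real-time results plausible on larger datasets.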

Real-time Query Results

Genomics Browser

CBRAIN

Several default tools:

  • CIVET
  • CivetCombiner
  • CivetQC
  • Freesurfer
  • SPM-batch
  • NIAK
  • FSL (bedpostx, bet, fast, feat, first, flirt, melodic, probtrackx)
  • Numerous converters
  • And many more...

cbrain-support.mni@mcgill.ca

CBRAIN projects

TRY ME!

NOW

611 users; 199 international
191 sites
299 countries

NIAK

Challenges and Hurdles

  • Future unknowns
  • Waste/duplication
  • Decreased exposure
  • Changing the Publishing Culture
  • Licensing and legal aspects
  • More attrition
  • Sustainability
  • Funding
  • Fear of getting scooped
  • Technical challenges
  • Privacy concerns
  • Data Harmonization
  • Interoperability
  • Reproducibility
  • Obtaining ethics approval
  • Scalability

Privacy Concerns

Adrian Thorogood BIC lecture - March 9, 2016

Buy-in from other researchers!

Open Science Beers

Beers at Else's - Wednesdays at 4:44pm

Thank you!

Acknowledgements: Alex Zijdenbos, Dario Vins, Jonathan Harlap, Matt Charlet, Andrew Corderey, Sebastian Muehlboeck, Reza Adalat, Louis Collins, Vladimir Fonov, Marc Rousseau, Mia Petkova, Rathi Gnanasekaran, David Brownlee, Tarek Sherif, Pierre Rioux, Nic Kassis, Leigh MacIntyre, Claude Lepage, Ilana Leppert, Natasha Beck, Tristan Glatard, Bert Vincent, Lindsay Lewis, Najma Mahani, Elodie Portales-Casamar, Alden Woodward, Sylvain Milot, Jean Francois Malouin, Sylvain Baillet, Daniel Kroetz, Martin Weiss, Mathieu Desrosier, Jason Karamchandani, Amit Bar-Or, Ted Fon, John Brietner, Derek Lo, Patrick Bermudez, Chris Steele, Pamela Patterson and one of my favourites: Pierre Bellec!

LORIS team on left
Special thanks to Alan Evans for making all of this possible.