In recent decades, rapid technological development has brought new opportunities for collecting information and making it available. Universities and libraries increasingly have to take responsibility both for preserving research data for the long term and for making it available where possible.
The Open Data Repository supports academic research by facilitating researchers’ access to authentic, permanent, persistent and secure documents and data.
Scientific publishing today is still kept at an artificial 20th century level. Researchers write articles on systems that are long out of date and offer poor collaboration support, submit them to journals via web interfaces that breathe the spirit of early-1990s user-interface design, go through a similarly dated review process and finally see their work published as PDFs that are optimized for paper printout. Furthermore, the resulting articles are hard to find, hard to read on screen, make data re-use difficult or almost impossible, and are likely to be hidden behind a paywall beyond the reach of many researchers anyway. 1 2
At the same time billions of euros flow into this system, of which only a small percentage actually goes into the research and the publication process in the strict sense; the vast majority pays for the exorbitant revenues of a few publishers and their paywalls. 3 4 5 6
This infrastructure is further challenged by link and reference rot, which seriously undermine the knowledge base that millions of researchers are building today. On examination, more than 70% of the URLs within the three leading legal journals and 50% of the URLs within U.S. Supreme Court opinions suffer from reference rot: they do not (and cannot) produce the information originally cited. 7 8
Now imagine a global network of data repositories in public libraries. Each one hosts all open access articles relevant to the library’s users, maybe even with the published research data from the articles. The repositories update and distribute newly published articles among themselves without manual intervention, serving both as an archive and a front-end library. 1
Articles can be stored in any format together with underlying research data. The repository’s host can decide which data to host (for example only host data for local researchers), and if a user wants to access off-site data or articles, the network will deliver them immediately from the sites that have them available. All events on the repository such as uploads or metadata changes are published on a permissioned blockchain that serves as the feed that connects all repositories and allows newly set up repositories to quickly access all requested articles and local data.
Permissioned blockchains are immutable records of a transaction history in which all transactions are performed by a predefined list of participants with known identities. Due to their distributed nature, blockchains provide a built-in means of recovery from database corruption and a mechanism for data verification.
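To make the idea concrete, here is a minimal sketch of a permissioned, hash-chained event feed. It is an illustration only, not the ODR implementation: the allow-list `AUTHORIZED`, the identities in it, and the functions `append_event` and `verify` are hypothetical names, and a real permissioned blockchain would add digital signatures and a consensus protocol between nodes.

```python
import hashlib
import json

# Hypothetical allow-list of identities permitted to write to the ledger.
AUTHORIZED = {"library-a", "library-b"}


def block_hash(body: dict) -> str:
    """Hash a block body deterministically (keys are sorted before hashing)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(chain: list, author: str, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous block."""
    if author not in AUTHORIZED:
        raise PermissionError(f"{author} may not write to this ledger")
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"index": len(chain), "author": author, "event": event, "prev": prev}
    block = dict(body, hash=block_hash(body))
    chain.append(block)
    return block


def verify(chain: list) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    prev = "0" * 64
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev"] != prev or block_hash(body) != block["hash"]:
            return False
        prev = block["hash"]
    return True
```

Because every block stores the hash of its predecessor, changing an old event invalidates all later links, so any repository holding a copy of the feed can detect the change by running `verify`.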
The actual articles and data are stored on the InterPlanetary File System (IPFS) 12, which also connects all repositories and lets them access each other’s content. Each publication is addressed by a unique immutable hash (permanent URL).
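Addressing content by its hash can be sketched in a few lines. The example below is a toy in-memory store, not IPFS itself: the class `ContentStore` is a hypothetical name, and it uses a bare SHA-256 hex digest as the address, whereas real IPFS content identifiers are derived and encoded differently.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store the bytes and return their address (SHA-256 hex digest)."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Return the bytes, re-hashing them to detect corruption."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

The same bytes always yield the same address, and any change to the bytes yields a different one, which is what makes such an address usable as a permanent reference to one exact version.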
While all repositories form a network, each one is also independent of the rest, which makes the network resilient against local breakdowns or attacks. If a library’s connection to the outside fails, all hosted articles and hosted data are still available locally. One publisher’s home repository goes down? No problem: other repositories that host its articles and data serve them as well, so the requesting user will not even notice the outage. A library in a war zone asks for help to save all their hosted publications? Repositories worldwide can be set to quickly mirror all publications from the endangered library by looking up all references published by that library on the blockchain.
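The fallback and mirroring behaviour described here could look roughly like the following sketch. It assumes, purely for illustration, that the feed is a list of events with `publisher` and `address` fields and that each repository is a dictionary from address to bytes; the function names `fetch` and `addresses_to_mirror` are hypothetical.

```python
def fetch(address: str, local: dict, peers: list) -> bytes:
    """Serve from the local repository, falling back to any peer that has it."""
    if address in local:
        return local[address]
    for peer in peers:
        if address in peer:
            local[address] = peer[address]  # keep a local copy for next time
            return local[address]
    raise KeyError(address)


def addresses_to_mirror(feed: list, publisher: str, local_addresses: set) -> list:
    """List addresses published by `publisher` that are not yet held locally.

    The result preserves feed order and contains no duplicates.
    """
    seen = set(local_addresses)
    wanted = []
    for event in feed:
        address = event["address"]
        if event["publisher"] == publisher and address not in seen:
            seen.add(address)
            wanted.append(address)
    return wanted
```

A repository asked to mirror an endangered library would call `addresses_to_mirror` on its copy of the feed and then `fetch` each returned address.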
Search can be performed locally and is decentralised, which keeps bandwidth and server usage low. Researchers who want to mine large numbers of articles are encouraged to set up their own simple client repository for this, so that they put no load on the local library repository server or the network apart from the initial data download.
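As an illustration of search that runs entirely on the local repository, the sketch below builds a small inverted index over locally held documents. The class `LocalIndex` is a hypothetical name, and the tokenisation is deliberately naive; a production repository would more likely rely on an existing search engine.

```python
import re
from collections import defaultdict


class LocalIndex:
    """Minimal inverted index mapping each term to the documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, address: str, text: str) -> None:
        """Index the text of a document under its address."""
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(address)

    def search(self, query: str) -> set:
        """Return addresses of documents that contain every query term."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results
```

Since the index is built from data the repository already holds, queries generate no network traffic at all.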
Existing open access publications can be imported into the network and made available immediately.
All protocols and software will be under open licenses. No vendor lock-in. Everyone with a sufficiently fast internet connection can participate. Advanced publishing and archiving solutions can be adjusted or developed by third parties. Censorship in part of the network will not affect repositories outside the censored areas.
The result is a truly open, global network of open access repositories that makes publicly funded research outcomes FAIR: Findable, Accessible, Interoperable and Reusable 13. It has to be seen as the digital public infrastructure of the 21st century: linked, secure and permanent.
How much will it cost? It can be done for a small percentage of the cost of the traditional approach.
A Simple Demonstration Of The System
The ODR addresses increasing demands for long-term preservation of scholarly research documents and for the availability of research data. The ODR manages research data and metadata enrichment, ensuring a format suitable for long-term storage and making the data available where possible.
The ODR is an ideal tool for building distributed public and private digital infrastructures of the 21st century. The ODR is easy to scale, easy to maintain and is driven by tested and trusted open-source technologies 15.
- Put the Document on an open distributed storage system: IPFS 12 http://kubrik.io/demos/odr/upload
- Address the data by its hash, so changes will be obvious (for example https://ipfs.io/ipfs/QmNtVSkheRfivYWaMHYMuV1or54XCT92LyfBHhoFrx5BJ5)
- All data versions will be put on IPFS and the references to them on an immutable public ledger (blockchain), so they can be traced back
- Data is signed by the authors or data publishers
- Everything is put into a search engine for easy access http://kubrik.io/demos/odr/search
Publisher / Librarian:
- Publisher selects “Upload new data” or “add metadata”
- Publisher updates metadata (Author, keywords, etc)
- Publisher signs the changes with digital signature
- Document and metadata are uploaded to IPFS 12
- IPFS reference is secured on immutable public ledger (blockchain)
- Document and metadata are indexed in the document repository for easy search and retrieval
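The publisher steps above can be sketched as a single function. Everything here is a stand-in chosen for illustration: a dictionary plays the role of IPFS, a list plays the role of the public ledger, and an HMAC with a shared secret key replaces a real public-key digital signature. The names `publish` and `signature_is_valid` are hypothetical.

```python
import hashlib
import hmac
import json


def publish(document: bytes, metadata: dict, key: bytes,
            storage: dict, ledger: list, index: dict) -> dict:
    """Sign, store, record and index one document with its metadata."""
    # Serialise metadata deterministically so the signature is reproducible.
    meta_bytes = json.dumps(metadata, sort_keys=True).encode("utf-8")
    # "Sign" document plus metadata (HMAC stand-in for a digital signature).
    signature = hmac.new(key, document + meta_bytes, hashlib.sha256).hexdigest()
    # Upload both to content-addressed storage (dict standing in for IPFS).
    doc_addr = hashlib.sha256(document).hexdigest()
    meta_addr = hashlib.sha256(meta_bytes).hexdigest()
    storage[doc_addr] = document
    storage[meta_addr] = meta_bytes
    # Record the references on the ledger (list standing in for the blockchain).
    record = {"document": doc_addr, "metadata": meta_addr, "signature": signature}
    ledger.append(record)
    # Index the keywords so the document can be found by search.
    for keyword in metadata.get("keywords", []):
        index.setdefault(keyword.lower(), set()).add(doc_addr)
    return record


def signature_is_valid(record: dict, key: bytes, storage: dict) -> bool:
    """Recompute the tag over the stored bytes and compare in constant time."""
    message = storage[record["document"]] + storage[record["metadata"]]
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])
```

Only the small ledger record has to be distributed to every repository; the document and metadata themselves stay in content-addressed storage and are fetched on demand.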
- User navigates to the web document repository search engine, which is as easy to use as Google
- User selects keywords and fills in the search field, or browses based on categories
- User selects document from list of results
- Document is presented with:
- Timeline with links to previous versions including public ledger references
- Digital signatures
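One way to assemble the version timeline shown to the user is to follow links from the latest version back to the first. The sketch below assumes, hypothetically, that each version record carries a `previous` address (or `None` for the first version), a `ledger_ref` and a `signed_by` field; none of these field names come from the ODR itself.

```python
def version_timeline(records: dict, latest: str) -> list:
    """Walk `previous` links from the latest version back to the first."""
    timeline = []
    seen = set()
    current = latest
    while current is not None:
        if current in seen:
            raise ValueError("cycle in version history")
        seen.add(current)
        record = records[current]
        timeline.append({
            "address": current,
            "ledger_ref": record["ledger_ref"],
            "signed_by": record["signed_by"],
        })
        current = record["previous"]
    return timeline
```

The returned list is ordered newest first, so a front end can render it directly as a timeline with one ledger reference and one signer per entry.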
The Open Data Repository is the public library of the 21st century.
“A Journal is a Club: A New Economic Model for Scholarly Publishing.” Potts, Jason, John Hartley, Lucy Montgomery, Cameron Neylon, and Ellie Rennie. 12 April 2016. [cited 20 April 2016] http://ssrn.com/abstract=2763975 ↩ ↩2
“Why Sci-Hub Will Win” James Heathers. 2 May 2016. [cited 5 May 2016] https://medium.com/@jamesheathers/why-sci-hub-will-win-595b53aae9fa#.ng9kcxoto ↩
“Why Haven’t We Already Canceled All Subscriptions?” Björn Brembs. 20 May 2016. [cited 27 May 2016] http://bjoern.brembs.net/2016/05/why-havent-we-already-canceled-all-subscriptions/ ↩
“Are We Paying US$3000 Per Article Just For Paywalls?” Björn Brembs. 30 July 2014. [cited 20 April 2016] http://bjoern.brembs.net/2014/07/are-we-paying-us3000-per-article-just-for-paywalls/ ↩
“What goes into making a scientific manuscript public?” Björn Brembs. 11 June 2015. [cited 20 April 2016] http://bjoern.brembs.net/2015/06/what-goes-into-making-a-scientific-manuscript-public/ ↩
“Opening the Black Box of Scholarly Communication Funding: A Public Data Infrastructure for Financial Flows in Academic Publishing” Stuart Lawson, Jonathan Gray, Michele Mauri. 11 April 2015. [cited 20 April 2016] https://olh.openlibhums.org/articles/10.16995/olh.72/ ↩
“Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations” Lawrence Lessig et al. 1 October 2013. [cited 20 April 2016] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2329161 ↩
“Library Discovery and the Open Access challenge” Aaron Tay. 18 May 2016. [cited 27 May 2016] https://medium.com/@aarontay/library-discovery-and-the-open-access-challenge-1b9acff6786b#.62nvhteo5 ↩
“How academic libraries may change when Open Access becomes the norm” Aaron Tay. 20 August 2014. [cited 27 May 2016] http://musingsaboutlibrarianship.blogspot.ie/2014/08/how-academic-libraries-may-change-when.html ↩
“Does the type of Open Access matter for future of academic libraries?” Aaron Tay. 21 May 2016. [cited 27 May 2016] https://musingsaboutlibrarianship.blogspot.ie/2016/05/does-type-of-open-access-matter-for.html#.V8mWSbVri8U ↩
European leaders call for ‘immediate’ open access to all scientific papers by 2020. 27 May 2016. [cited 27 May 2016] http://www.sciencemag.org/news/2016/05/dramatic-statement-european-leaders-call-immediate-open-access-all-scientific-papers ↩
“Introduction: What Is A Blockchain?” Ingo Keck. 12 January 2017. [cited 23 January ↩
ODR demo on Github [cited 9 March 2017] https://github.com/kubrik-engineering/opendocumentrepository ↩