
The Role of Data Repositories in Reproducible Research

Cross posted from ISPS Lux et Data Blog

Questions about who is responsible for the quality of research data, and what that responsibility entails, were on my mind as I was preparing to present a poster at the Open Repositories 2013 conference in Charlottetown, PEI earlier this month. The annual conference brings the digital repositories community together with stakeholders, such as researchers, librarians, publishers and others, to address issues pertaining to “the entire lifecycle of information.” The conference theme this year, “Use, Reuse, Reproduce,” could not have been more relevant to the ISPS Data Archive. Two plenary sessions bookended the conference, both discussing the credibility crisis in science. In the opening session, Victoria Stodden set the stage with her talk about the central role of algorithms and code in the reproducibility and credibility of science. In the closing session, Jean-Claude Guédon made a compelling case that open repositories are vital to restoring quality in science.

My poster, titled, “The Repository as Data (Re) User: Hand Curating for Replication,” illustrated the various data quality checks we undertake at the ISPS Data Archive. The ISPS Data Archive is a small archive, for a small and specialized community of researchers, containing mostly small data. We made a key decision early on to make it a "replication archive," by which we mean a repository that holds data and code for the purpose of being used to replicate and verify published results.

The poster presents ISPS Data Archive’s answer to the questions of who is responsible for the quality of data and what that means: We think that repositories do have a responsibility to examine the data and code we receive for deposit before making the files public, and that this data review involves verifying and replicating the original research outputs. In practice, this means running the code against the data to validate published results. These steps in effect expand the role of the repository and more closely integrate it into the research process, with implications for resources, expertise, and relationships, which I will explain here.
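As a concrete illustration of that final step, the sketch below compares re-computed estimates against values transcribed from a published table. Everything in it is hypothetical (the function name, the tolerance, the stand-in analysis); it is meant only to show the shape of such a check, not the Archive's actual tooling:

```python
import math

def verify_replication(computed, published, tol=1e-6):
    """Compare re-computed estimates to published values.

    computed, published: dicts mapping statistic name -> value.
    Returns a list of (name, computed, published, ok) tuples.
    """
    report = []
    for name, pub_val in published.items():
        comp_val = computed.get(name)
        # A statistic passes if it was produced at all and matches
        # the published value within the stated tolerance.
        ok = comp_val is not None and math.isclose(
            comp_val, pub_val, rel_tol=tol, abs_tol=tol
        )
        report.append((name, comp_val, pub_val, ok))
    return report

# Stand-in for re-running the deposited analysis code.
data = [2.0, 4.0, 6.0, 8.0]
computed = {"mean": sum(data) / len(data), "n": float(len(data))}

# Values transcribed from a table in the (hypothetical) article.
published = {"mean": 5.0, "n": 4.0}

report = verify_replication(computed, published)
assert all(ok for _, _, _, ok in report), "replication failed"
```

In a real deposit, `computed` would come from re-running the researchers' own analysis code against the deposited data, and `published` from the tables in the article; the point is that the comparison itself can be made explicit and repeatable.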
First, a word about what data repositories usually do, the special obligations reproducibility imposes, and who is fulfilling them now. This ties in with a discussion of data quality, data review, and the role of repositories.

Data Curation and Data Quality

A well-curated data repository is more than a place to put data. The Digital Curation Centre (DCC) explains that data curation means ensuring data are accessible to designated users for first-time use and reuse. This involves a set of curatorial practices – maintaining, preserving and adding value to digital research data throughout its lifecycle – which reduce threats to the long-term research value of the data, minimize the risk of obsolescence, and enable sharing and further research. A standard-setting example is the curation process at the Inter-university Consortium for Political and Social Research (ICPSR), which involves organizing, describing, cleaning, enhancing, and preserving data for public use and includes format conversions, reviewing the data for confidentiality issues, creating documentation and metadata records, and assigning digital object identifiers. Similar data curation activities take place at many data repositories and archives.

These activities are understood as essential for ensuring and enhancing data quality. Dryad, for example, states that its curatorial team “works to enforce quality control on existing content.” But there are many ways to assess the quality of data. One criterion is verity: whether the data reflect actual facts, responses, observations or events. This is often assessed by the existence and completeness of metadata. The UK’s Economic and Social Research Council (ESRC), for example, requests documentation of “the calibration of instruments, the collection of duplicate samples, data entry methods, data entry validation techniques, methods of transcription.” Another way to assess data quality is by its degree of openness. Shannon Bohle recently listed no fewer than eight different standards for assessing the quality of open data on this dimension. Others argue that data quality consists of a mix of technical and content criteria that all need to be taken into account. Wang & Strong’s 1996 article claims that “high-quality data should be intrinsically good, contextually appropriate for the task, clearly represented, and accessible to the data consumer.” More recently, Kevin Ashley observed that quality standards may be at odds with each other: some users may prize the completeness of the data, while others prize their timeliness. These standards can go a long way toward ensuring that data are accurate, complete, and timely and that they are delivered in a way that maximizes their use and reuse.

Yet these procedures are “rather formal and do not guarantee the validity of the content of the dataset” (Doorn et al.). Leaving aside the question of whether they are always adhered to, these quality standards are insufficient when viewed through the lens of “really reproducible research.” Reproducible science requires that data and code be made available alongside the results, to allow regeneration of the published results. For a replication archive such as the ISPS Data Archive, the reproducibility standard is imperative.

Data Review

The imperative to provide data and code, however, only creates the potential for verification of published results; it remains unclear how actual replication occurs. That’s where a comprehensive definition of the concept of “data review” can be useful: at ISPS, we understand data review to mean taking that extra step – examining the data and code received for deposit and verifying and replicating the original research outputs.

In a recent talk, Christine Borgman pointed out that most repositories and archives follow the letter, not the spirit, of the law. They take steps to share data, but they do not review the data. “Who certifies the data? Gives it some sort of imprimatur?” she asks. This theme resonated at Open Repositories. Stodden asked: “Who, if anyone, checks replication pre-publication?” Chuck Humphrey lamented the lack of an adequate data curation toolkit and best practices regarding the extent of data processing prior to ingest. And Guédon argued that repositories have a key role to play in bringing quality to the foreground in the management of science.

Stodden’s call for the provision of data and code underlying publication echoes Gary King’s 1995 definition of the “replication standard” as the provision of “sufficient information… with which to understand, evaluate, and build upon a prior work if a third party could replicate the results without any additional information from the author.” Both call on the scientific community to take up replication for the good of science as a matter of course in their scientific work. However, both are vague as to how this can be accomplished. Stodden suggested at Open Repositories that this activity is community-dependent, often done by students or by other researchers continuing a project, and that community norms can be adjusted by rewarding high-integrity, verifiable research. King, on the other hand, argues that “the replication standard does not actually require anyone to replicate the results of an article or book. It only requires sufficient information to be provided – in the article or book or in some other publicly accessible form – so that the results could in principle be replicated” (emphasis added). Yet, if we care about data quality, reproducibility, and credibility, it seems to me that this is exactly the kind of review in which we should be engaging.

A quick survey of various stakeholders in the research data lifecycle reveals that data review of this sort is not widely practiced:

  • Researchers, on the whole, do not do replication tests as part of their own work, or even as part of the peer review process. In the future, there may be incentives for researchers to do so, and post-publication crowd-sourced peer review in the mold of Wikipedia, as promoted by Edward Curry, may prove to be a successful model.
  • Academic institutions, and their libraries, are increasingly involved in the data management process, but are not involved in replication as a matter of course (note some calls for libraries to take a more active role in this regard).
  • Large or general data repositories like Dryad, FigShare, Dataverse, and ICPSR provide useful guidelines and support varying degrees of file inspection, as well as make it significantly easier to include materials alongside the data, but they do not replicate analyses for the purpose of validating published results. Efforts to encourage compliance with (some of) these standards (e.g., Data Seal of Approval) typically regard researchers as responsible for data quality, and generally leave repositories to self-regulate.
  • Innovative services, such as RunMyCode, offer a dissemination platform for the pieces required to submit the research to scrutiny by fellow scientists, allowing researchers, editors, and referees to “replicate scientific results and to demonstrate their robustness.” RunMyCode is an excellent facilitator for people who wish to have their data and code validated, but it relies on crowdsourcing and does not itself provide the validation service.
  • Some argue that scholarly journals should take an active role in data review, but this view is controversial. A document produced by the British Library recently recommended that, “publishers should provide simple and, where appropriate, discipline-specific data review (technical and scientific) checklists as basic guidance for reviewers.” In some disciplines, reviewers do check the data. The F1000 group identifies the “complexity of the relationship between the data/article peer review conducted by our journal and the varying levels of data curation conducted by different data repositories.” The group provides detailed guidelines for authors on what is expected of them to submit and ensures that everything is submitted and all checklists are completed. It is not clear, however, if they themselves review the data to make sure it replicates results. Alan Dafoe, a political scientist at Yale, calls for better replication practices in political science. He places responsibility on authors to provide quality replication files, but then also suggests that journals encourage high standards for replication files and that they conduct a “replication audit” which will “evaluate the replicability and robustness of a random subset of publications from the journal.”
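Dafoe's "replication audit" lends itself to a simple sketch: draw a random subset of a journal's publications and attempt to replicate each one. The code below is purely illustrative; `run_replication` is a hypothetical stand-in for actually re-running a publication's deposited code and comparing its output to the article:

```python
import random

def replication_audit(publications, run_replication, sample_size=5, seed=42):
    """Audit a random subset of publications for replicability.

    publications: list of publication identifiers.
    run_replication: callable returning True if the publication's
        deposited code reproduces its published results.
    Returns a dict mapping sampled publication -> pass/fail.
    """
    # A fixed seed makes the audit itself reproducible and documentable.
    rng = random.Random(seed)
    sample = rng.sample(publications, min(sample_size, len(publications)))
    return {pub: run_replication(pub) for pub in sample}

# Hypothetical stand-in: pretend even-numbered articles replicate.
pubs = [f"article-{i}" for i in range(1, 21)]
results = replication_audit(pubs, lambda p: int(p.split("-")[1]) % 2 == 0)
rate = sum(results.values()) / len(results)
```

The pass rate over the sample is then an estimate of the journal's overall replicability, which is the quantity Dafoe's proposed audit would report.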

The ISPS Data Archive and Reproducible Research

This brings us to the ISPS Data Archive. As a small, on-the-ground, specialized data repository, we are dedicated to serious data review. All data and code – as well as all accompanying files – that are made public via the Archive are closely reviewed and adhere to standards of quality that include verity, openness, and replication. In practice, this means we have developed curatorial practices that include assessing whether the files underlying a published (or soon-to-be-published) article, as provided by the researchers, actually reproduce the published results.

This requires significant investment in staffing, relationships, and resources. The ISPS Data Archive staff has data management and archival skills, as well as domain and statistical expertise. We invest in relationships with researchers and learn about their research interests and methods to facilitate communication and trust. All this requires the right combination of domain, technical and interpersonal skills as well as more time, which translates into higher costs.

How do we justify this investment? Broadly speaking, we believe that stewardship of data in the context of “really reproducible research” dictates this type of data review. More specifically, we think this approach provides better quality, better science, and better service.

  • Better quality. By reviewing all data and code files and validating the published results, the ISPS Data Archive essentially certifies that all its research outputs are held to a high standard. Users are assured that code and data underlying publications are valid, accessible, and usable.
  • Better science. Organizing data around publications advances science because it helps root out error. “Without access to the data and computer code that underlie scientific discoveries, published findings are all but impossible to verify” (Stodden et al.). Joining the publication to the data and code combats the disaggregation of information in science associated with open access to data and to publications on the Web. In effect, the data review process is a first-order data reuse case: the use of research data for a research activity or purpose other than that for which it was intended. This positions the Archive as an active partner in the scientific process, performing a sort of “internal validity” check on the data and analysis (i.e., do these data and this code actually produce these results?).

    It’s important to note that the ISPS Data Archive is not reviewing or assessing the quality of the research itself. It is not engaged in questions such as, was this the right analysis for this research question? Are there better data? Did the researchers correctly interpret the results? We consider this aspect of data review to be an “external validity” check and one which the Archive staff is not in a position to assess. This we leave to the scientific community and to peer review. Our focus is on verifying the results by replicating the analysis and on making the data and code usable and useful.

  • Better service. The ISPS Data Archive provides high level, boutique service to our researchers. We can think of a continuum of data curation that progresses from a basic level where data are accepted “as is” for the purpose of storage and discovery, to a higher level of curation which includes processing for preservation, improved usability, and compliance, to an even higher level of curation which also undertakes the verification of published results.

This model may not be applicable to other contexts. A larger lab, a greater volume of research, or simply more data will require greater resources and may prove this level of curation untenable. Further, the reproducibility imperative does not neatly apply to more generalized data, or to data that are not tied to publications. Such data would be handled somewhat differently, possibly with less labor-intensive processes. ISPS will need to consider accommodating such scenarios and the trade-offs that a more flexible approach no doubt involves.

For those of us who care about research data sharing and preservation, the recent interest in the idea of a “data review” is a very good sign. We are a long way from having all the policies, technologies, and long-term models figured out. But a conversation about reviewing the data we put in repositories is a sign of maturity in the scholarly community – a recognition that simply sharing data is necessary, but not sufficient, when held up to the standards of reproducible research.

OR2013: Open Repositories Confront Research Data

Open Repositories 2013 was hosted by the University of Prince Edward Island from July 8-12. A strong research data stream ran throughout this conference, which was attended by over 300 participants from around the globe.  To my delight, many IASSISTers were in attendance, including the current IASSIST President and four Past-Presidents!  Rarely do such sightings happen outside an IASSIST conference.

This was my first Open Repositories conference and after the cool reception that research data received at the SPARC IR meetings in Baltimore a few years ago, I was unsure how data would be treated at this conference.  I was pleasantly surprised by the enthusiastic interest of this community toward research data.  It helped that there were many IASSISTers present but the interest in research data was beyond that of just our community.  This conference truly found an appropriate intersection between the communities of social science data and open repositories. 

Thanks go to Robin Rice (IASSIST), Angus Whyte (DCC), and Kathleen Shearer (COAR) for organizing a workshop entitled, “Institutional Repositories Dealing with Data: What a difference a ‘D’ makes!”  Michael Witt, Courtney Matthews, and I joined these three organizers to address a range of issues that research data pose for those operating repositories.  The registration for this workshop was capped at 40 because of our desire to host six discussion tables of approximately seven participants each.  The workshop was fully subscribed and Kathleen counted over 50 participants prior to the coffee break.  The number clearly expresses the wider interest in research data at OR2013.

Our workshop helped set the stage for other sessions during the week.  For example, we talked about environmental drivers popularizing interest in research data, including topics around academic integrity.  Regarding this specific issue, we noted that the focus is typically directed toward specific publication-related datasets and the access needed to support the reproducibility of published research findings.  Both the opening and closing plenary speakers addressed aspects of academic integrity and the role of repositories in supporting the reproducibility of research findings.  Victoria Stodden, the opening plenary speaker, presented a compelling and articulate case for access to both the data and computer code upon which published findings are based.  She calls herself a computational scientist and defends the need to preserve computer code as well as data to facilitate the reproducibility of scientific findings.  Jean-Claude Guédon, the closing plenary speaker, bracketed this discussion on academic integrity.  He spoke about scholarly publishing and how the commercial drive toward indicators of excellence has resulted in cheating.  He likened some academics to Lance Armstrong, cheating to become number one.  He feels that quality rather than excellence is a better indicator of scientific success.

Between these two stimulating plenary speakers, there were a number of sessions during which research data were discussed.  I was particularly interested in a panel of six entitled, “Research Data and Repositories,” especially because the speakers were from the repository community instead of the data community.  They each took turns responding to questions about what their repositories do now regarding research data and what they see happening in the future.  In a nutshell, their answers tended to describe the desire to make better connections between the publications in their repositories and the data underpinning the findings in these articles.  They also spoke about the need to support more stages of the research lifecycle, which often involves aspects of the data lifecycle within research.  There were also statements that reinforced the need for our (IASSIST’s) continued interaction with the repository community.  The use of readme files in the absence of standards-based metadata, and other practices where our data community has moved the best-practice yardstick well beyond, demonstrates the need for our communities to continue in dialogue.

Chuck Humphrey

IASSIST Fellows 2013


The IASSIST Fellows Committee is pleased to announce the six recipients of the 2013 IASSIST Fellowship award. We are extremely excited to have such a diverse and interesting group with different backgrounds and experience, and we encourage IASSISTers to welcome them at our conference in Cologne, Germany.

Please find below their names, countries and brief bios:

Chifundo Kanjala (Tanzania) 

Chifundo currently works as a Data Manager and data documentalist for an HIV research group called the ALPHA network, based at the London School of Hygiene and Tropical Medicine's Department of Population Health. He spends most of his time in Mwanza, Tanzania, but travels from time to time around Southern and Eastern Africa to work with colleagues in the ALPHA network. Before joining the London School of Hygiene and Tropical Medicine, he worked as a data analyst consultant at UNICEF, Zimbabwe. He is currently working part-time on a PhD with the London School of Hygiene and Tropical Medicine. He has an MPhil in Demography from the University of Cape Town, South Africa, and a BSc Honours degree in Statistics from the University of Zimbabwe.

Judit Gárdos (Hungary) 

Judit Gárdos studied Sociology and German Language and Literature in Budapest, Vienna and Berlin. She is a PhD candidate in sociology, with a topic on the philosophy, sociology and anthropology of quantitative sociology. She is a young researcher at the Institute of Sociology of the Hungarian Academy of Sciences. Judit has been working at a digital archive and research group that is collecting qualitative, interview-based sociological research collections of the last 50 years. She is coordinating the work at the newly-funded Research Documentation Center of the Center for Social Sciences at the Hungarian Academy of Sciences.

Cristina Ribeiro (Portugal) 

Cristina Ribeiro is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. She graduated in Electrical Engineering, holds a Master's in Electrical and Computer Engineering, and a PhD in Informatics. Her teaching includes undergraduate and graduate courses in information retrieval, digital libraries, knowledge representation and markup languages. She has been involved in research projects in the areas of cultural heritage, multimedia databases and information retrieval. Currently her main research interests are information retrieval, digital preservation and the management of research data.

Aleksandra Bradić-Martinović (Serbia) 

Aleksandra Bradić-Martinović, PhD, is a Research Fellow at the Institute of Economic Sciences, Belgrade, Serbia. Her field of expertise is research on the implementation of information and communication technology in the economy, especially in banking, payment system operations and stock exchange operations. Aleksandra also teaches at the Belgrade Banking Academy in the following subjects: E-banking and Payment Systems, Stock Market Dealings, and Management Information Systems. She has been engaged on several projects in the field of education. On the FP7 SERSCIDA project she is the Serbia team coordinator.

Anis Miladi (Tunisia) 

Anis Miladi earned his Bachelor's degree in computer sciences and multimedia in 2007 and a Master's degree in Management of Information Systems and Organizations in 2008, and he is currently finalizing his Master's degree in project management (projected date summer 2013). Before joining the Social and Economic Survey Research Institute at Qatar University as a survey research technology specialist in 2009, he worked as a programmer analyst in a private IT services company in Tunisia. His areas of expertise include managing computer-assisted surveys (CAPI, CATI with the Blaise surveying system), as well as Enterprise Document Management Systems and Enterprise Portals (SharePoint).

Lejla Somun-Krupalija (Bosnia and Herzegovina) 

Lejla currently serves as the Senior Program and Research Officer at the Human Rights Centre of the University of Sarajevo. She has over 15 years of experience in research and policy development on social inclusion issues. She is the Project Coordinator of the SERSCIDA FP7 project, which aims to open data services/archives in the Western Balkan region in cooperation with CESSDA members. She was previously engaged in the NGO sector, particularly on issues of capacity building and policy development in the areas of gender equality, the rights of persons with disabilities, and issues of social inclusion and forced migration. She teaches academic writing, qualitative research, and gender and nationalism at the University of Sarajevo. 

IASSISTers and librarians are doin' it for themselves



Hey IASSISTers (gents, pardon the video pun - couldn't resist),

Are librarians at your institutions struggling to get up to speed with research data management (RDM)? If they're not, they probably should be. Library organisations are publishing reports and issuing recommendations left and right, such as the LIBER (Association of European Research Libraries) 2012 report, "Ten Recommendations for Libraries to Get Started with Research Data Management" (PDF). Just last week Nature published an article highlighting what the Great and the Good are doing in this area: Publishing Frontiers: The Library Reboot.

So the next question is, as a data professional, what are you doing to help the librarians at your institution get up to speed with RDM? Imagine (it isn't that hard for some of us) having gotten your library master's degree sometime in the last century and now being told your job includes helping researchers manage their data. Librarians are sturdy souls, but that notion could be a bitter pill for someone who went into librarianship because of their love of books, right?

So you are a local expert who can help them. No doubt there will be plenty of opportunities for them to return the favour.

If you don't consider yourself a trainer, that's okay. Tell them about the Do-It-Yourself Research Data Management Training Kit for Librarians, from EDINA and Data Library, University of Edinburgh. They can train themselves in small groups, making use of reading assignments in MANTRA, reflective writing questions, group exercises from the UK Data Archive, and plenty of discussion time, to draw on their existing rich professional experience.

And then you can step in as a local expert to give one or more of the short talks to lead off the two-hour training sessions in your choice of five RDM topics. Or if you're really keen, you can offer to be a facilitator for the training as a whole. Either way it's a great chance to build relationships across the institution, review your own knowledge, and raise your local visibility. If you're with me so far, read on for the promotional message about the training kit.

DIY Research Data Management Training Kit for Librarians

EDINA and Data Library, University of Edinburgh is pleased to announce the public release of the Do-It-Yourself Research Data Management Training Kit for Librarians, under a CC-BY licence:

 The training kit is designed to contain everything needed for librarians in small groups to get themselves up to speed on five key topics in research data management - with or without expert speakers.

 The kit is a package of materials used by the Data Library in facilitating RDM training with a small group of librarians at the University of Edinburgh over the winter of 2012-13. The aim was to reuse the MANTRA course developed by the Data Library for early career researchers in a blended learning approach for academic liaison librarians.

 The training comprises five 2-hour face-to-face sessions. These open with short talks followed by group exercises from the UK Data Archive and long discussions, in a private collegiate setting. Emphasis is placed on facilitation and individual learning rather than long lectures and passive listening. MANTRA modules are used as reading assignments and reflective writing questions are designed to help librarians 'put themselves in the shoes of the researcher'. Learning is reinforced and put into practice through an independent study assignment of completing and publishing an interview with a researcher using the Data Curation Profile framework developed by D2C2 at Purdue University Libraries.

 The kit includes:

 * Promotional slides for the RDM Training Kit

* Training schedule

* Research Data MANTRA online course by EDINA and Data Library, University of Edinburgh:

* Reflective writing questions

* Selected group exercises (with answers) from UK Data Archive, University of Essex - Managing and sharing data: Training resources. September, 2011 (PDF). Complete RDM Resources Training Pack available:

* Podcasts (narrated presentations) for short talks by the original Edinburgh speakers (including from the DCC) if running course without ‘live’ speakers.

* Presentation files - if learners decide to take turns presenting each topic.

* Evaluation forms

* Independent study assignment: Data Curation Profile, from D2C2, Purdue University Libraries. Resources available:

 As data librarians, we are aware of a great deal of curiosity and in some cases angst on the part of academic librarians regarding research data management. The training kit makes no assumptions about the role of librarians in supporting research data management, but aims to empower librarians to support each other in gaining confidence in this area of research support, whether or not they face the prospect of a new remit in their day to day job. It is aimed at practicing librarians who have much personal and professional experience to contribute to the learning experience of the group.

Become rich and famous: publish in the IQ!

These days many IASSIST members have received acceptance of their papers for the upcoming IASSIST 2013 conference in Cologne. There will be many interesting presentations at the conference. The conference presentation is your chance to present a project you are involved in, to make the case for your specialty area, and in general to add to the IASSIST knowledge bank.

Projects are typically focused on support of social science research, but IASSIST-related support now takes many forms as technologies and applications develop. Your presentation at the conference will generate discussion and help you improve your work. After the conference, you can reach a wider audience by publishing a revised paper in a forthcoming issue of the IQ. Articles for the IASSIST Quarterly are always very welcome. They can be papers from IASSIST conferences or other conferences and workshops, from local presentations, or papers written especially for the IQ.

If you are chairing a conference session you have the opportunity to become guest editor and to aggregate and integrate papers on a common subject for a special issue of the IQ.

Authors are very welcome to take a look at the instructions and article template on the IASSIST website. Authors and guest editors can also contact the editor via e-mail:

Karsten Boye Rasmussen     -    March 2013

IASSIST 2013 Fellows update

This year the IASSIST Fellows Committee received a grand total of 44 Fellows applications from a strong range of candidates from 28 countries around the globe:

  • Asia: 18
  • Africa: 13
  • Europe: 7
  • North America: 3
  • Latin America: 2
  • Australia: 1

Applications have been evaluated by the IASSIST Fellows Committee and offers have been made to a number of prospective Fellows to attend the annual conference in Cologne, Germany. We shall announce the names of those who have accepted the Fellows awards shortly.

We look forward to welcoming the new members at what will no doubt be the best IASSIST ever!

Best Wishes

Co-Chairs of the Fellows Committee

Some reflections on research data confidentiality, privacy, and curation by Limor Peer


Maintaining research subjects’ confidentiality is an essential feature of the scientific research enterprise. It also presents special challenges to the data curation process. Does the effort to open access to research data complicate these challenges?

A few reasons why I think it does: more data are discoverable and could be used to re-identify previously de-identified datasets; systems are increasingly interoperable, potentially bridging what may have been insular academic data with other data and information sources; growing pressure to open data may weaken some of the safeguards previously put in place; and some data are inherently identifiable.

But these challenges should not diminish the scientific community’s firm commitment to both principles. It is possible, and desirable, for openness and privacy to co-exist. It will not be simple to do, and here’s what we need to keep in mind:

First, let’s be clear about semantics. Open data and public data are not the same thing. As Melanie Chernoff observed, “All open data is publicly available. But not all publicly available data is open.” This distinction is important because what our community means by open (standards, format) may not be what policy-makers and the public at large mean (public access). Chernoff rightly points out that “whether data should be made publicly available is where privacy concerns come into play. Once it has been determined that government data should be made public, then it should be done so in an open format.” So, yes, we want as much data as possible to be public, but we most definitely want data to be open.

Another term that could be clarified is usefulness. In the academic context, we often think of data re-use by other scholars, in the service of advancing science. But what if the individuals from whom the data were collected are the ones who want to make use of it? It’s entirely conceivable that the people formerly known as “research subjects” begin demanding access to, and control over, their own personal data as they become more accustomed to that in other contexts. This will require some fresh ideas about regulation and some rethinking of the concept of informed consent (see, for example, the work of John Wilbanks, NIH, and the National Cancer Institute on this front). The academic community is going to have to confront this issue.

Precisely because terms are confusing and often vaguely defined, we should use them carefully. It’s tempting to pit one term against the other, e.g., usefulness vs. privacy, but it may not be productive. The tension between privacy and openness or transparency does not mean that we have to choose one over the other. As Felix Wu says, “there is nothing inherently contradictory about hiding one piece of information while revealing another, so long as the information we want to hide is different from the information we want to disclose.” The complex reality is that we have to weigh them carefully and make context-based decisions.

I think the IASSIST community is in a position to lead on this front, as it is intimately familiar with issues of disclosure risk. Just last spring, the 2012 IASSIST conference included a panel on confidentiality, privacy and security. IASSIST has a special interest group on Human Subjects Review Committees and Privacy and Confidentiality in Research. Various IASSIST members have been involved with heroic efforts to create solutions (e.g., via the DDI Alliance, UKDA and ICPSR protocols) and educate about the issue (e.g., ICPSR webinar, ICPSR summer course, and MANTRA module). A recent panel at the International Data Curation Conference in Amsterdam showcased IASSIST members’ strategies for dealing with this issue (see my reflections about the panel).

It might be the case that STEM is leading the push for open data, but these disciplines are increasingly confronted with problems of re-identification, while the private sector is increasingly being scrutinized for its practices (see this on “data hops”). The social (and, of course, medical) sciences have a well-developed regulatory framework around the issue of research ethics that many of us have been steeped in. Government agencies have their own approaches and standards (see recent report by the U.S. Government Accountability office). IASSIST can provide a bridge; we have the opportunity to help define the conversation and offer some solutions.

Now Accepting Proposals for IASSIST 2013

IASSIST 2013 will be hosted by GESIS – Leibniz Institute for the Social Sciences at Maternushaus in Cologne, Germany from May 28-31.

The Conference Website can be accessed here:

As announced previously, the theme of this year’s conference is Data Innovation: Increasing Accessibility, Visibility and Sustainability

This theme reflects recent efforts across the globe by the largest government agencies down to the smaller independent research units to make data (be it survey, administrative, geospatial, or scientific) more open, accessible and understandable for all.

With an ever-increasing availability of new technologies offering unparalleled opportunities to sustainably deliver, share, model and visualize data, we anticipate that there is much to share with and much to learn from one another.  Interdisciplinarity is a large part of where innovation comes from, and we hope to receive submissions from those in the social sciences, humanities, sciences, and computer science fields.

We welcome submissions on the theme outlined above, and encourage conference participants to propose papers and sessions that would be of interest to a diverse audience. In order to make session formation and scheduling more streamlined, we have created three distinct tracks.  If you are not sure where your submission fits, or feel that it fits into more than one track, that’s perfectly fine. Please do still make your submission, and if accepted, we will find an appropriate fit.

Online submission forms and guidelines for BOTH conference content and workshops can be found here:

NOTE: The top of the page is for sessions/papers/posters/round tables/pecha kuchas; the bottom is for workshops. Please note that the submission forms are completely separate.

All submissions are due by December 5, 2012. Notification of acceptance will be made by February 5, 2013.

Questions about session/paper submissions may be sent to
Questions about workshop submission may be sent to the Workshop Coordinator, Lynda Kellam at

Data-related blog posts coming out of Open Repositories 2012 conference

I'd been meaning to write an IASSIST blog post about OR 2012, hosted in July by the University of Edinburgh's Host Organising Committee led by Co-Chair and IASSISTer Stuart Macdonald, because it had such good DATA content.

Fortunately Simon Hodson, the UK's JISC Managing Research Data Programme Manager, has provided this introduction and has allowed me to post it here, with links to his analytic blog posts, which in turn link to OTHER blog posts talking about OR2012 and data!

There are also more relevant pointers from the OR 2012 home page here:

I think there's enough here to easily keep people going until next year's conference in Prince Edward Island in July. Oh, and Peter Burnhill, Past President IASSIST, made a good plug for IASSIST in his closing keynote, pointing it out to repository professionals as a source of expertise and community for would-be data professionals.

Enjoy! - Robin Rice, University of Edinburgh


It has been widely remarked that OR 2012 saw the arrival of research data in the repository world. Using a wordle of #or2012 tweets in his closing summary, Peter Burnhill noted that ‘Data is the big arrival. There is a sense in which data is now mainstream.’ (See Peter’s summary on the OR2012 YouTube Channel:

I have written a series of blog posts reflecting on the contributions made by *some* of those working on research data repositories, and particularly on the development of research data services.

These posts may be of interest to subscribers to this list and are listed below.

Institutional Data Repositories and the Curation Hierarchy: reflections on the DCC-ICPSR workshop at OR2012 and the Royal Society’s Science as an Open Enterprise report

‘Data is now Mainstream’: Research Data Projects at OR2012 (Part 1…)

Pulling it all Together: Research Data Projects at OR2012 (Part 2…)

Making the most of institutional data assets: Research Data Projects at OR2012 (Part 3…)

Manage locally, discover (inter-)nationally: research data management lessons from Australia at OR2012

Simon Hodson [reposted with permission]
