Social Science Datasets

IASSIST 2016 Program At-A-Glance, Part 2: Data infrastructure, data processing and research data management

Here's another list of highlights from IASSIST2016, whose theme is the data revolution. For previous highlights, see Part 1: Data sharing, new data sources and data protection.

Infrastructure

  • For those of you with an interest in technical infrastructure, the University of Applied Sciences HTW Chur will showcase an early prototype of MMRepo (1 June, 3F), which is designed to store qualitative and quantitative data together in one big data repository.
  • The UK Data Service will present the panel "The CESSDA Technical Framework - what is it and why is it needed?", which explains why the CESSDA Research Infrastructure should have modern data curation techniques, rooted in sophisticated IT capabilities, at its core in order to better serve its community.

  • If you have been wondering about the various operational components, and their associated technologies, involved in running a data science repository, then the ICPSR presentation is for you. Participants will leave with an understanding of how the Archonnex Architecture at ICPSR is strengthening the data services offered to new researchers, and much more.

Data processing

If you're interested in data processing, be sure to check out the infrastructure offerings above, as well as the half-day workshop "Text Processing with Regular Expressions" on 31 May, presented by Harrison Dekker, UC Berkeley. This example-driven workshop will teach regular expression syntax and how to use it in R, Python, and on the command line.

Data visualisation

If you are comfortable working with quantitative data, are familiar with the R environment for statistical computing and want to learn how to create a variety of visualisations, then the University of Minnesota workshop on 31 May is for you. It will introduce the logic behind ggplot2 and give participants hands-on experience creating data visualisations with this package. The session will also introduce related tools for creating interactive graphics from the same syntax.

Programming

  • If you're interested in programming, there's a full-day Intro to Python for Data Wrangling workshop on 31 May, led by Tim Dennis, UC San Diego, that will teach participants to use scientific notebooks in the cloud, write basic Python programs, integrate disparate CSV files and more.

  • The Regular Expressions workshop mentioned above, also on 31 May, will offer opportunities to work with real data and perform representative data cleaning and validation operations in multiple languages.

Research data management

  • To get a behind-the-scenes look at data management and see how an organization such as the Odum Institute manages its archiving workflows, head to "Automating Archive Policy Enforcement using Dataverse and iRODS" on 31 May, with presenters from the Odum Institute, UNC Chapel Hill. Participants will see machine-actionable rules in practice and be introduced to an environment where written policies can be expressed in ways that let an archive automate their enforcement.

  • Another good half-day workshop, targeted at people tasked with teaching good research data management practices to researchers, is "Teaching Research Data Management Skills Using Resources and Scenarios Based on Real Data" on 31 May, with presenters from ICPSR, the UK Data Archive and FORS. The organisers will showcase recent examples of how they have developed teaching resources for hands-on training and will talk about their successes and failures.

Tools

If you’re just looking to add more resources to your data revolution toolbox, whether it’s metadata, teaching, data management, open and restricted access, or documentation, here’s a quick list of highlights:

  • At Creating GeoBlacklight Metadata: Leveraging Open Source Tools to Facilitate Metadata Genesis (31 May), presenters from New York University will provide hands-on experience in creating GeoBlacklight geospatial metadata, including demos on how to capture, export, and store GeoBlacklight metadata.

  • DDI Tools Demo (1 June). The Data Documentation Initiative (DDI) is an international standard for describing statistical and social science data.

  • DDI tools: No Tools, No Standard (3 June), where participants will be introduced to the work of the DDI Developers Community and get an overview of tools available from the community.

Open-access

As mandates for better accessibility of data affect more researchers, dive into the conversation with these IASSIST offerings:

Metadata

Don't miss IASSIST 2016's offerings on metadata, the data about data that makes finding and working with data easier. There are many offerings; here is a quick list of highlights:

  • Creating GeoBlacklight Metadata: Leveraging Open Source Tools to Facilitate Metadata Genesis (Half-day workshop, 31 May), with presenters from New York University

  • At Posters and Snacks on 2 June, Building A Metadata Portfolio For Cessda, with presenters from the Finnish Social Science Data Archive; GESIS – Leibniz-Institute for the Social Sciences; and UK Data Service

Spread the word on Twitter using #IASSIST16. 


A story by Dory Knight-Ingram (ICPSR)

Interested in the "data revolution" and what it means for research? Here's why you should attend IASSIST2016

Part 1: Data sharing, new data sources and data protection

IASSIST is an international organisation of information technology and data services professionals which aims to provide support to research and teaching in the social sciences. It has over 300 members ranging from data archive staff and librarians to statistical agencies, government departments and non-profit organisations.

The theme of this year's conference, the 42nd annual IASSIST conference, is "Embracing the 'data revolution': opportunities and challenges for research". IASSIST2016 will take place in Bergen, Norway, from 31 May to 3 June, hosted by NSD - Norwegian Centre for Research Data.

Here is a first snapshot of what is on the programme and why it matters.

Data sharing

If you have ever wondered whether data sharing is to the advantage of researchers, there will be a session led by Utrecht University Library exploring the matter. The first results of a survey which explores personal beliefs, intention and behaviour regarding the sharing of data will also be presented by GESIS. The relationship between data sharing and data citation, relatively overlooked until now, will then be addressed by the Australian Data Archive.

If you are interested in how a data journal could incentivise replications in economics, you should think about attending a session by ZBW Leibniz Information Centre for Economics which will present some studies describing the outcome of replication attempts and discuss the meaning of failed replications in economics.

GESIS will then look into improving research data sharing by addressing different scholarly target groups, such as individual researchers, academic institutions and scientific journals, each of which places different demands on a data sharing tool. They will focus on the tools offered by GESIS as well as "SowiDataNet", a joint tool offered together with the Social Science Centre Berlin, the German Institute for Economic Research, and the German National Library of Economics.

The UKDA and UKDS will present a paper which seeks to explore the role that case studies of research can play in regard to effective data sharing, reuse and impact.

The Data Archive in Finland (FSD) will also be presented as a case study of an archive that is broadening its services to the health sciences and humanities, disciplines in which data sharing practices have not yet been established.

If you'd like to know more about data accessibility, which is being required by journals and mandated by government funders, join a diverse group of open data experts as IASSIST dives into an open data dialogue, including presentations on Open Data and Citizen Empowerment and 101 Cool Things to do with Open Data, as part of the "Opening up on open data" workshop. Presenters will come from archives across the globe.

New data sources

A talk entitled “Data science: The future of social science?” by UKDA will introduce its conceptual and technical work in developing a big data platform for social science and outline preliminary findings from work using energy data.

If you have been wondering about the role of social media data in the academic environment, the session by the University of California will include an overview of the social media data landscape and the Crimson Hexagon product.

The three Vs of big data (volume, variety and velocity) are being explored in the "Hybrid Data Lake" that UKDA is building using the Universal Decimal Classification platform, expanding "topics" search while using big data management. Find out more about it, as well as its possible future applications.

Data protection

If you follow data protection issues, the panel on "Data protection: legal and ethical reviews" is for you. It starts off with a presentation of the Administrative Data Research Network's (ADRN) Citizen's Panel, which looks at public concerns about research using administrative data, whose content is both personal and confidential. The ADRN was set up as part of the UK Government's Big Data initiative as a UK-wide partnership between universities, government bodies, national statistics authorities and the wider research community.

The next ADRN presentation within this session will outline their application process and the role of the Approvals Panel in relation to ethical review. The aim is “to expand the discussion towards a broader reflection on the ethical dilemmas that administrative data pose”, as well as present some steps taken to address these difficulties.

NSD will then present the new EU General Data Protection Regulation (GDPR), recently adopted at EU level, and explain how it will affect data collection, data use, data preservation and data sharing. If you have been wondering how the regulation will influence the possibilities for processing personal data for research purposes, or how personal data are defined, what conditions apply to an informed consent, or in which cases it is legal and ethical to conduct research without the consent of the data subjects, this presentation is for you.

The big picture

Wednesday 1 June will kick off with a plenary entitled "Data for decision-makers: Old practice - new challenges" by Gudmund Hernes, current president of the International Social Science Council and Norway's former Minister of Education and Research (1990-95) and Minister of Health (1995-97).

The third day of the conference (2 June) will begin with a plenary, "Embracing the 'Data Revolution': Opportunities and Challenges for Research, or What you need to know about the data landscape to keep up to date", by Matthew Woollard, Director of the UK Data Archive at the University of Essex and Director of the UK Data Service.

If you want to know more about the three European projects under the European Commission's Horizon 2020 programme that CESSDA is involved in (one on big data, Big Data Europe - Empowering Communities with Data Technologies; another on strengthening and widening the European infrastructure for social science data archives, CESSDA SaW; and a third on synergies for Europe's Research Infrastructures in the Social Sciences, SERISS), this panel is for you.

"Don't Hate the Player, Hate the Game: Strategies for Discussing and Communicating Data Services" considers how libraries might strategically reconsider the way they communicate about their data services.

Keep an eye on this blog for more news in the run-up to IASSIST2016.

Find out more on the IASSIST2016 website.

Spread the word on Twitter using #IASSIST16.

We are looking forward to seeing you in Bergen! 


A story by Eleanor Smith (CESSDA)

Finding Historical Economic Data through FRASER and ALFRED

The North Carolina Library Association's Government Resources Section had an excellent webinar yesterday on finding historical (or vintage) economic data using FRASER and ALFRED.  The recording and slides are available to everyone. Enjoy!

IASSIST Fellows 2013

The IASSIST Fellows Committee is glad to announce the six recipients of the 2013 IASSIST Fellowship award. We are extremely excited to have such a diverse and interesting group, with different backgrounds and experience, and we encourage IASSISTers to welcome them at our conference in Cologne, Germany.

Please find below their names, countries and brief bios:

Chifundo Kanjala (Tanzania) 

Chifundo currently works as a Data Manager and data documentalist for the ALPHA network, an HIV research group based at the London School of Hygiene and Tropical Medicine's Department of Population Health. Chifundo spends most of his time in Mwanza, Tanzania, but travels from time to time around Southern and Eastern Africa to work with colleagues in the ALPHA network. Before joining the London School of Hygiene and Tropical Medicine, he worked as a data analyst consultant at UNICEF, Zimbabwe. He is currently working part-time on a PhD with the London School of Hygiene and Tropical Medicine. He has an MPhil in Demography from the University of Cape Town, South Africa, and a BSc Honours degree in Statistics from the University of Zimbabwe.


Judit Gárdos (Hungary) 

Judit Gárdos studied Sociology and German Language and Literature in Budapest, Vienna and Berlin. She is a PhD candidate in sociology, with a dissertation topic on the philosophy, sociology and anthropology of quantitative sociology. She is a young researcher at the Institute of Sociology of the Hungarian Academy of Sciences. Judit has been working at the digital archive and research group "voicesofthe20century.hu", which collects qualitative, interview-based sociological research collections from the last 50 years. She coordinates the work at the newly funded Research Documentation Center of the Center for Social Sciences at the Hungarian Academy of Sciences.


Cristina Ribeiro (Portugal) 

Cristina Ribeiro is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. She graduated in Electrical Engineering and holds a Master's in Electrical and Computer Engineering and a Ph.D. in Informatics. Her teaching includes undergraduate and graduate courses in information retrieval, digital libraries, knowledge representation and markup languages. She has been involved in research projects in the areas of cultural heritage, multimedia databases and information retrieval. Currently her main research interests are information retrieval, digital preservation and the management of research data.


Aleksandra Bradić-Martinović (Serbia) 

Aleksandra Bradić-Martinović, PhD, is a Research Fellow at the Institute of Economic Sciences, Belgrade, Serbia. Her field of expertise is research on the implementation of information and communication technology in the economy, especially in banking, payment system operations and stock exchange operations. Aleksandra also teaches at the Belgrade Banking Academy in the following subjects: E-banking and Payment Systems, Stock Market Dealings and Management Information Systems. She has been engaged in several projects in the field of education. On the FP7 SERSCIDA project she is the Serbian team coordinator.


Anis Miladi (Tunisia) 

Anis Miladi earned his Bachelor's degree in computer sciences and multimedia in 2007 and a Master's degree in Management of Information Systems and Organizations in 2008, and he is currently finalizing a master's degree in project management (projected completion summer 2013). Before joining the Social and Economic Survey Research Institute at Qatar University as a Survey Research Technology Specialist in 2009, he worked as a programmer analyst in a private IT services company in Tunisia. His areas of expertise include managing computer-assisted surveys (CAPI and CATI, using the Blaise surveying system), in addition to Enterprise Document Management Systems and Enterprise Portals (SharePoint).


Lejla Somun-Krupalija (Sarajevo) 

Lejla currently serves as the Senior Program and Research Officer at the Human Rights Centre of the University of Sarajevo. She has over 15 years of experience in research and policy development on social inclusion issues. She is the Project Coordinator of the SERSCIDA FP7 project, which aims to open data services/archives in the Western Balkan region in cooperation with CESSDA members. She was previously engaged in the NGO sector, particularly on capacity building and policy development in the areas of gender equality, the rights of persons with disabilities, social inclusion and forced migration. She teaches academic writing, qualitative research, and gender and nationalism at the University of Sarajevo.

Update from COSSA: Changes to the Common Rule: The Implications for the Social and Behavioral Sciences

This is from the COSSA (Consortium of Social Science Associations) Newsletter, March 25, 2013, Volume 32, Issue 6, and concerns a workshop on proposed changes to the Common Rule. Readers of these blog entries will recall that the proposed changes would require identifiable data collected in social science research to meet HIPAA standards, potentially rendering many public datasets unusable for research purposes.

George Alter spoke on the panel on Data Security and Sharing. A link to the webcast is here: http://sites.nationalacademies.org/DBASSE/BBCSS/CurrentProjects/DBASSE_080452#Workshop

Here is a summary of the COSSA report:

On March 21 and 22, the National Academies' Board on Behavioral, Cognitive, and Sensory Sciences (BBCSS) held a workshop on the "Proposed Revisions to the Common Rule in Relation to the Behavioral and Social Sciences." In 2011, the Department of Health and Human Services proposed changes to the Common Rule, the regulations governing the protection of human subjects in research, in an Advance Notice of Proposed Rulemaking (ANPRM). (For more information, see Update, January 28, 2013, and click here for a response to the ANPRM from the social and behavioral science community.) Several COSSA member organizations helped sponsor the workshop. More information about the workshop, including presenters' slides and an archived webcast, is available here. BBCSS will publish a summary report of the workshop. According to Robert Hauser, Executive Director of the Division of Behavioral and Social Sciences and Education (DBASSE), the Academies expect to convene a panel that will produce a consensus report with conclusions and recommendations.

The workshop's opening session reviewed existing knowledge and evidence about the functioning of the Common Rule and Institutional Review Boards (IRBs). Connie Citro, Director of the Committee on National Statistics at the National Academies, gave an overview of the many National Academies' reports on human subjects protection published since 1979 and summarized the lessons learned. She pointed to four major takeaways from the existing literature. First, one-size-fits-all approaches often have unanticipated negative consequences. Second, there is no need to reinvent the wheel regarding human subjects' protection. Third, a balance needs to be struck between leaving subjects vulnerable and handicapping researchers. Finally, the social and behavioral sciences (SBS) are often not given the same consideration as the biomedical sciences when regulations are written, so SBS researchers need to remain vigilant to make sure that new rules are appropriate for an SBS context.

Noting that there is a relatively small evidence base on the efficacy of the Common Rule and IRBs, Jeffrey Rodamar, Department of Education, reviewed some of the existing data. He found that, despite popular perception, IRBs function pretty well. They are generally no more of an administrative burden than other grant-related activities; on average, review takes less than three percent of a study's time; a majority of studies are approved; expedited review takes less than a month on average and full review takes less than two months; and extreme delays are statistically uncommon. Rodamar described data showing that both SBS and biomedical researchers generally approve of the IRB system. He conceded that there are some problems with the Common Rule regulations and IRBs, but, paraphrasing Winston Churchill, suggested that perhaps "IRBs are the worst form of governing research except for all those other forms that have been tried from time to time."

The "Minimal Risk" Standard

The second session, moderated by Celia Fisher, Fordham University, focused on the types of "risks and harms" encountered in SBS research. Richard T. Campbell delved into the concept of "minimal risk," an important area for researchers dealing with human subjects. The determination of whether participation in a study represents a "minimal risk" dictates the level of IRB review that takes place. Under the Common Rule, a study represents minimal risk if "the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests." Noting that it is a "cognitively complex" concept, Campbell suggested that risk can be thought of as the relationship between the probability of harm occurring and the severity of potential harm. On this reading, the Common Rule provides some flexibility in that it does not dictate that both probability and severity must be "minimal," just that, as probability increases, the severity of possible harm must decrease (and vice versa). Given that other parts of the definition are also thorny (such as what is meant by "daily life"), Campbell suggested that the Office for Human Research Protections (OHRP) could provide guidance to facilitate more consistent application of the minimal risk benchmark.

Brian Mustanski, Northwestern University Feinberg School of Medicine, spoke about his research on risky and sensitive behavior (such as drug use, sexual behavior, and HIV) in youth, topics that often make IRBs skittish. He conducted a study that was reviewed by two IRBs. One board approved it immediately, while the other delayed the study for six months because it judged the study to be a "slight increase" over minimal risk. However, when Mustanski surveyed his subjects, a large majority felt that their participation was less uncomfortable than a routine medical exam (the minimal risk standard). Mustanski argued that such institutional reluctance to approve research into controversial or sensitive subjects as minimal risk can have a chilling effect, leading to a poor evidence base for interventions with already underserved populations, which is indeed the case regarding HIV prevention in LGBT youth.

Steve Breckler, American Psychological Association and COSSA Board Member, discussed the concept of risk in the SBS context. He reminded the audience that the broad goal of assessing risk is to calibrate the level of review to the level of risk a study poses to participants, in other words, to protect subjects and reduce unnecessary regulatory burden. He argued that the social science community should put greater focus on producing evidence to determine how well regulations are working and that having better guidance and tools for assessing risk would facilitate the work of IRBs.

Charles Plott, California Institute of Technology and a former COSSA Board member, posed the question of whether entire research fields, such as economics, political science, game theory and decision science, could be said to be wholly without risk. In a survey of economics, political science, and judgment and decision making associations, Plott found very low numbers of adverse incidents and reports of harm, all of which were low-magnitude events (such as feelings of stress or frustration). He argued that some research topics (markets, committees and voting, games, processes, and decisions) and some research methods (questionnaires, computer games, etc.) can be said to pose no potential harm to subjects and should thus be exempted from consideration under the Common Rule.

Informed Consent and Special Populations

A session on the consent process and special populations was moderated by Margaret Foster Riley, University of Virginia. Sally Powers, University of Massachusetts, Amherst, discussed how consent operates in her research on depression, which collects "rich" behavioral and biospecimen data (which can be recoded and reanalyzed in future studies). The proposed changes to the Common Rule would require that prior consent be obtained for re-analysis of biospecimens, although that consent could be given for open-ended future use of the specimens. However, the changes do not address rich behavioral data; Powers argued that the same standards should be applied to such data.

Roxane Cohen Silver explained how she conducts research on victims of disasters and traumatic experiences (like natural disasters, infant death, and mass shootings) shortly after such events occur. Silver argued that such research can be conducted ethically and sensitively if participants are given multiple opportunities to opt out and are allowed to refuse to answer questions, and if researchers and staff are well trained. Noting that this type of research is most valuable if it is commenced immediately after a traumatic event, Silver described her arrangement with her IRB, which pre-approved a generic post-disaster proposal. In the aftermath of a traumatic event, Silver provides the IRB with specific information and can get full approval within 48 hours.

Celia Fisher, Fordham University, spoke about some of the issues involved in obtaining informed consent from children. She argued that simplifying consent forms, as proposed by the ANPRM, would be useful. However, relying on standardized forms can be problematic for certain types of research and for subjects of different ages, language skills, and educational backgrounds. Fisher observed that verbal consent can be a better option in certain contexts. She also noted that emancipated minors are often not treated as full adults by IRBs, despite being adults under the law. Finally, Fisher raised the issue of re-obtaining consent from adults whose parents had consented on their behalf when they were minors.

Data Security and Sharing

David Weir, University of Michigan Survey Research Center, moderated a panel on "Data Use and Sharing and Technological Advancement." The proposals in the ANPRM would require all studies that collect identifiable or potentially identifiable data to have data security plans. George Alter, University of Michigan Interuniversity Consortium for Political and Social Research (ICPSR), which archives and protects social science data, spoke about some of the ways data can be kept secure. Informational risk can be reduced by improving study design (implementing certain sampling procedures, using multiple sites), having protection plans in place, using data repositories and archives, and training. ICPSR restricts data based on the degree of risk of disclosure and the severity of harm from that disclosure, with options ranging from publicly releasing data online to requiring researchers to work with data in physical data enclaves.

Taylor Martin, University of Utah, spoke about the data security implications of her research into math learning, which collects rich data from children playing online educational games. This type of research shows promise in terms of providing new information about how different kinds of children learn and how we can teach them better. However, concerns about data security can have a chilling effect on data sharing and reuse among researchers. Martin observed that for-profit companies are collecting data and doing the same kind of research without having to clear the same hurdles as researchers.

Susan Bouregy, Yale University Human Research Protection Program, raised concerns about the ANPRM's proposal to apply HIPAA standards for deidentification of data (requiring removal of 18 specific identifiers). Bouregy noted that such standards may make some data sets unusable while ignoring other ways individuals could be identified. She also argued that some of the mandated HIPAA security elements are not appropriate for certain types of social science research. Furthermore, the proposal ignores the fact that not all identified data are risky. Finally, Bouregy suggested that the ANPRM's requirement that all suspected data breaches be reported should be made more flexible, allowing IRBs to tailor reporting to the context of each situation.

Multi-Disciplinary and Multi-Site Studies

Robert Levine, Yale University, moderated a session focused on multi-disciplinary and multi-site studies. Pearl O'Rourke, Partners Health Care System, discussed the requirement that multi-site studies use a single IRB of record. She noted that having a central IRB does not absolve the individual institutions of a number of responsibilities in overseeing and approving research. O'Rourke was concerned that mandating a central IRB would not address the complexity of each situation. Furthermore, the requirement underestimates the costs and time involved in running a central IRB.

Laura Stark, Vanderbilt University Center for Medicine, Health, and Society, gave an ethnographic perspective on IRB decision-making. As an explanation for why IRBs reach different conclusions regarding the risk level of similar research, Stark suggested the concept of "local precedents," or allowing past decisions to govern the evaluation of subsequent research. Such precedents may lead to faster decisions and internal consistency, but they can be problematic for researchers working with multiple IRBs. Stark offered three strategies to work around local precedents: 1) study networks (having a central IRB for multiple sites), 2) collegial review (allowing departmental experts to review research), and 3) decision repositories (online archives of approved protocols from many IRBs).

Thomas Coates, University of California, Los Angeles Program in Global Health, shared his experience with multinational studies (which are not addressed by the ANPRM). Some concerns he encountered included whether requiring other countries to adhere to U.S. requirements could be considered paternalistic, how to evaluate minimal risk in different cultural and economic contexts, and how to harmonize U.S., international, and local regulations. Coates also stressed the importance of receiving approval from local bodies in addition to U.S.-based IRBs.

The Scope of Institutional Review Boards

A final session, moderated by Yonette Thomas, Howard University and a COSSA board member, focused on the "Purview and Roles of IRBs." Lois Brako, University of Michigan, discussed the ANPRM's proposed changes from the perspective of an IRB that has made strides to become more innovative and flexible. Brako praised the ANPRM's proposals to reduce the oversight burden for minimal risk studies, eliminate annual review, and harmonize federal regulations (so long as the harmonization does not take the form of a unilateral one-size-fits-all approach). However, she argued that some of the proposals are unnecessarily burdensome, including requiring all institutions that receive Common Rule funding to be subject to federal oversight, some of the information security provisions, requiring reports of all adverse events to be submitted and stored in a central database, and expanding "human subjects" to include deidentified biospecimens. Brako also suggested that in some cases, clearer guidance from OHRP would be more helpful than changed regulations.

Rena Lederman, Princeton University, observed that the Common Rule regulations were written from a biomedical perspective and are particularly unsuited to certain types of SBS research, such as anthropological fieldwork. Anthropologists establish thick relationships with their subjects, immerse themselves in other cultures, and do not test hypotheses or run controlled experiments. The ANPRM's requirements for informational security could cripple anthropological research: under the new rules, anthropologists' detailed field notes would be treated as data with informational risks, raising the question of how such notes could be deidentified. Rather than trying to adapt the Common Rule to fit SBS research, Lederman proposed that it be applied only to biomedical research. She also proposed the creation of a National Commission to develop alternative guidance and a framework for addressing SBS research risks.

Cheryl Crawford Watson, National Institute of Justice (NIJ), discussed the Department of Justice's (DOJ) approach to confidentiality and how it differs from other regulations regarding human subjects protection. Researchers funded by DOJ must submit a Privacy Certificate, which protects researchers and data from subpoena. It also prevents the researcher from violating subjects' privacy for any reason other than future criminal conduct. The DOJ privacy certificate differs from the certificate of confidentiality mandated by other agencies (like Health and Human Services) in that it prohibits researchers from reporting child abuse, reportable communicable diseases, and threatened harm to self or others. In order to be allowed to report such abuse, researchers must get the subjects to sign a separate consent-to-report form. The certificate is so strict due to concerns that few of the subjects under DOJ's purview would consent to participate in research otherwise.



Some reflections on research data confidentiality, privacy, and curation by Limor Peer

Limor Peer

Maintaining research subjects’ confidentiality is an essential feature of the scientific research enterprise. It also presents special challenges to the data curation process. Does the effort to open access to research data complicate these challenges?

A few reasons why I think it does: more data are discoverable and could be used to re-identify previously de-identified datasets; systems are increasingly interoperable, potentially bridging what may have been insular academic data with other data and information sources; growing pressure to open data may weaken some of the safeguards previously put in place; and some data are inherently identifiable.

But these challenges should not diminish the scientific community’s firm commitment to both principles. It is possible, and desirable, for openness and privacy to co-exist. It will not be simple to do, and here’s what we need to keep in mind:

First, let’s be clear about semantics. Open data and public data are not the same thing. As Melanie Chernoff observed, “All open data is publicly available. But not all publicly available data is open.” This distinction is important because what our community means by open (standards, format) may not be what policy-makers and the public at large mean (public access). Chernoff rightly points out that “whether data should be made publicly available is where privacy concerns come into play. Once it has been determined that government data should be made public, then it should be done so in an open format.” So, yes, we want as much data as possible to be public, but we most definitely want data to be open.

Another term that could be clarified is usefulness. In the academic context, we often think of data re-use by other scholars, in the service of advancing science. But what if the individuals from whom the data were collected are the ones who want to make use of it? It’s entirely conceivable that the people formerly known as “research subjects” begin demanding access to, and control over, their own personal data as they become more accustomed to that in other contexts. This will require some fresh ideas about regulation and some rethinking of the concept of informed consent (see, for example, the work of John Wilbanks, NIH, and the National Cancer Institute on this front). The academic community is going to have to confront this issue.

Precisely because terms are confusing and often vaguely defined, we should use them carefully. It’s tempting to pit one term against the other, e.g., usefulness vs. privacy, but it may not be productive. The tension between privacy and openness or transparency does not mean that we have to choose one over the other. As Felix Wu says, “there is nothing inherently contradictory about hiding one piece of information while revealing another, so long as the information we want to hide is different from the information we want to disclose.” The complex reality is that we have to weigh them carefully and make context-based decisions.

I think the IASSIST community is in a position to lead on this front, as it is intimately familiar with issues of disclosure risk. Just last spring, the 2012 IASSIST conference included a panel on confidentiality, privacy and security. IASSIST has a special interest group on Human Subjects Review Committees and Privacy and Confidentiality in Research. Various IASSIST members have been involved in heroic efforts to create solutions (e.g., via the DDI Alliance, UKDA and ICPSR protocols) and to educate about the issue (e.g., ICPSR webinar, ICPSR summer course, and MANTRA module). A recent panel at the International Data Curation Conference in Amsterdam showcased IASSIST members’ strategies for dealing with this issue (see my reflections about the panel).

It might be the case that STEM is leading the push for open data, but these disciplines are increasingly confronted with problems of re-identification, while the private sector is increasingly being scrutinized for its practices (see this on “data hops”). The social (and, of course, medical) sciences have a well-developed regulatory framework around research ethics that many of us have been steeped in. Government agencies have their own approaches and standards (see the recent report by the U.S. Government Accountability Office). IASSIST can provide a bridge; we have the opportunity to help define the conversation and offer some solutions.

ANES Announcement: Deadlines for the ANES 2010-2012 EGSS Online Commons Proposals

The American National Election Studies are continuing to accept proposals for the ANES 2010-2012 Evaluations of Government and Society Study. The deadline to submit proposals for EGSS 4 is 3:00 p.m. EDT, August 30, 2011. The deadline for members of the Online Commons community to comment on proposals is September 8, 2011. The deadline for revisions to proposals is 3:00 p.m. EDT on September 14, 2011. For additional information about how to submit a proposal, please visit: http://www.electionstudies.org/

Proposals may be submitted through the ANES Online Commons. The following describes the goals of this study and proposal process.

About The 2010-2012 Evaluations of Government and Society Study

The overarching theme of the surveys is citizen attitudes about government and society. These Internet surveys represent the most cost-effective way for the ANES user community to gauge political perceptions during one of the most momentous periods in American history. Aside from the historic nature of the current administration and the almost unprecedented economic crisis facing the country, we believe it is imperative that researchers assess attitudes about politics and society in the period leading up to the 2012 national elections. Potential topics include: attitudes about the performance of the Obama administration on the major issues of the day, evaluations of Congress and the Supreme Court, identification with and attitudes about the major political parties, and levels of interest in and engagement with national politics. These topics matter primarily because such perceptions are unmistakably correlated with both presidential vote choice and levels of political participation. We intend to measure each of these topics at multiple points throughout the two-year period preceding the 2012 elections. In addition to these subjects, we envision that each of these surveys would explore a particular aspect of these political perceptions.

This Study includes five rolling cross-section surveys that will allow us to pilot new items for possible inclusion on the 2012 time series. Proposals for the first three surveys of the study were accepted earlier this year. The first survey of the study was conducted in October 2010; the second survey was conducted in the spring of 2011. The third survey will be in the field later this year. We are currently accepting proposals for the final two surveys of the study. The fourth survey will be conducted in early 2012 and the final survey will be in the field in the middle of 2012. For the timelines and deadlines for the remaining surveys, please see http://electionstudies.org/studypages/2010_2012EGSS/2010_2012EGSScalendar.htm

By offering multiple opportunities for the user community to place their items on one or more surveys, we are providing the capacity to survey on a diverse set of topics that are relevant to a wide set of research communities. Lastly, the flexibility of these surveys as to both content and timing will allow the ANES to respond promptly to emerging political issues in this volatile period in our country's history.

About the Online Commons

The design of the questionnaires for The 2010-2012 Evaluations of Government and Society Study will evolve from proposals and comments submitted to the Online Commons (OC). The OC is an online system designed to promote communication among scholars and to yield innovative proposals about the most effective ways to measure electorally-relevant concepts and relationships. The goal of the OC is to improve the quality and scientific value of ANES data collections, to encourage the submission of new ideas, and to make such experiences more beneficial to and enjoyable for investigators. In the last study cycle, more than 700 scholars sent over 200 proposals through the Online Commons.

Proposals for the inclusion of questions must include clear theoretical and empirical rationales. All proposals must also clearly state how the questions will increase the value of the respective studies. In particular, proposed questions must have the potential to help scholars understand the causes and/or consequences of turnout or candidate choice.

For more information about the criteria that will be used to evaluate proposals, please see http://www.electionstudies.org/studypages/2010_2012EGSS/2010_2012EGSScriteria.htm

For additional information on how to submit a proposal, please see http://www.electionstudies.org/onlinecommons/proposalsubmit.htm

ANES Announcement: The ANES 2012 Time Series Study

On June 30, 2011, the American National Election Studies (ANES) began accepting proposals for questions to include on the ANES 2012 Time Series Study.  Proposals may be submitted through the ANES Online Commons. The following describes the goals of this study and the opportunity to include questions on it.

About The ANES 2012 Time Series Study

The ANES’s core mission is to promote cutting-edge and broadly-collaborative research on American national elections. The heart of the ANES is its presidential year time series surveys. The time series legacy is well known, serving as a model for election studies around the world and having generated thousands of publications. Every four years, a large representative sample of American adults has been interviewed on two occasions, first between Labor Day and Election Day, and again between Election Day and the onset of the winter holidays. The two face-to-face interviews will last approximately one hour each in 2012. Pre-election interviews focus on candidate preferences and anticipated vote choice; an array of possible predictors of candidate preferences, turnout, and citizen engagement; and an array of indicators of cognitive and behavioral engagement in the information flow of the campaign. Post-election interviews measure a variety of behavioral experiences people might have had throughout the campaign (e.g., turnout, mobilization efforts), plus additional posited predictors of candidate preferences, turnout, and citizen engagement.

Some of the questions asked during these interviews are categorized as standard (also known as core) items, meaning that they have been asked regularly over the years.  These questions are scheduled to appear on subsequent editions of the ANES Time Series in order to permit comparisons across elections.  The purpose of categorizing items as standard is to assure scholars who conduct longitudinal analyses that they can continue to depend on ANES to include variables that have been shown to perform well in the past.

Although recognizing the importance of continuity, ANES has also sought to develop the time series in innovative ways. The non-standard component of each questionnaire has routinely focused on matters of interest to the current election cycle. These items are often selected from an "ANES Question Inventory," which includes the standard questions and questions that have been asked in past ANES surveys but are not part of the standard battery of questions.  Researchers can access the question inventory at:

ftp://ftp.electionstudies.org/ftp/anes/OC/CoreUtility/ALT2010core.htm

The non-standard content of questionnaires has varied over the years. For example, candidate positions on issues of government policy are recognized as predictors of candidate preferences, but two one-hour interviews do not permit measuring positions on all of the many issues enjoying government attention at any one time in history. So from year to year, different choices have been made about which issues to include in the questionnaire.

As in the past, ANES will continue to emphasize best practices in sample design, respondent recruitment, and interviewing.  As always, we aim to provide top-quality service in many respects, including: (1) the careful and extensive planning that must be done before the field work begins, (2) the hard work that will be done by interviewers, supervisors, and study managers during data collection to monitor productivity and make adjustments in strategy to maximize the quality of the final product, and (3) the extensive data processing efforts (including integration of an extensive contextual data file) that will be required to assemble and document the final data set.


About the Online Commons

Content for the ANES 2012 Time Series Study will primarily evolve from two sources:  previous ANES Time Series questionnaires and new proposals received via the ANES Online Commons (OC).  The OC is an Internet-based system designed to promote communication among scholars and to yield innovative proposals about the most effective ways to measure electorally-relevant concepts and relationships. The goal of the OC is to improve the quality and scientific value of ANES data collections, to encourage the submission of new ideas, and to make such experiences more beneficial to and enjoyable for investigators. In the last study cycle, more than 700 scholars sent over 200 proposals through the OC.

Proposals for the inclusion of questions must include clear theoretical and empirical rationales. All proposals must also clearly state how the questions will increase the value of the respective studies. In particular, proposed questions must have the potential to help scholars understand the causes and/or consequences of turnout or candidate choice.

The ANES Online Commons will accept proposals until 3:00pm Eastern Time on August 30, 2011. The deadline for members of the Online Commons community to comment on proposals is September 8, 2011. The deadline for revisions to proposals is at 3:00pm Eastern Time on September 14, 2011.

For additional information about how to submit a proposal, please visit:

http://www.electionstudies.org/


Proposal Evaluation Criteria

The following criteria will guide the PIs and the ANES Board in evaluating proposals made through the Online Commons. We strongly encourage anyone who is considering making a proposal to read the following carefully.

1. Problem-Relevant. Are the theoretical motivations, proposed concepts and survey items relevant to ongoing controversies among researchers? How will the data that the proposers expect to observe advance the debate?

What specific analyses of the data will be performed? What might these analyses reveal? How would these findings be relevant to specific questions or controversies?

2. Suitability to ANES. The primary mission of the ANES is to advance our understanding of voter choice and electoral participation. Ceteris paribus, concepts and instrumentation that are relevant to our understanding of these phenomena will be considered more favorably than items tapping other facets of politics, public opinion, American culture or society.

3. Building on Solid Theoretical Footing. Does the proposed instrumentation follow from a plausible theory of political behavior?

4. Demonstrated Validity and Reliability of Proposed Items. Proposed items should be accompanied by evidence demonstrating their validity and reliability. Validity has various facets: e.g., construct validity, concurrent validity, discriminant validity and predictive validity. Any assessment of predictive validity should keep in mind criterion 2, above.

Reliability can be demonstrated in various ways; one example is test-retest reliability. We understand that proposals for novel concepts and/or instrumentation will almost always lack empirical evidence demonstrating validity and/or reliability. Proposals for truly "novel" instrumentation might be best suited for the series of smaller, cross-sectional studies ANES will field in the period 2010 through the summer of 2012; as a general matter, we are highly unlikely to field untested instrumentation on the Fall 2012 pre-election and post-election surveys.

5. Breadth of Relevance and Generalizability. Will the research that results from the proposed instrumentation be useful to many scholars?

Given the broad usage of ANES data, we may be unable to accommodate requests to include items that are relevant for only one, or a few, hypothesis tests. Ceteris paribus, items that are potentially relevant for a wide range of analyses will be considered more favorably than items that would seem to have less applicability.

When the 2012 questionnaires are designed, the status of the standard questions will be a central consideration. Standard questions do not have an infinite shelf life: science advances, and new insights can reveal more effective ways of asking important questions or can show that some questions do not in fact meet the requirements for remaining standard questions. However, proposed changes to standard questions will be scrutinized with recognition of the value of continuity over time. While we welcome proposals to change standard questions, the burden of proof required for making such changes will be high. We will take most seriously arguments that are backed by concrete evidence and strong theory.

All proposals that include a change to a particular question (standard or non-standard) should name the specific question that would be altered and provide a full explanation as to why the ANES user community will benefit by such a change.

Tools To Assist Your Proposal Development

As previously mentioned, researchers can access the ANES Question Inventory at:

ftp://ftp.electionstudies.org/ftp/anes/OC/CoreUtility/ALT2010core.htm

This Inventory provides the list of standard and non-standard questions that have been part of the Time Series, and includes frequencies for the most recent studies.

We have also created a second resource to review questions that have been asked previously.  The ANES Time Series Codebook Search utility searches existing codebooks from studies in the ANES Time Series.   You can access the utility at http://ftp.nes.isr.umich.edu/backup/searchhelp.htm  

(Please note that the utility has some limitations, which are documented on the search help page; the link to that page is at the top of the utility page.)

We hope that you will find these tools useful as you prepare your proposals.

The opportunity to submit proposals is open to anyone who wants to make a constructive contribution to the development of the ANES 2012 Time Series Study. Feel free to pass this invitation along to anyone (e.g., your colleagues and students) who you think might be interested. We hope to hear from you.

For additional resources and information on how to submit a proposal, please visit http://www.electionstudies.org/onlinecommons/


Darrell Donakowski

Director of Studies

American National Election Studies (ANES)

IASSIST Quarterly (IQ) volume 34-2 now on the web

The new issue of the IASSIST Quarterly, volume 34, number 2 (2010), is now available on the web.

 http://iassistdata.org/iq/issue/34/2

The layout has changed. We hope you’ll enjoy the new style. It seems to be a more modern format and better suited to PDF presentation on the web. Walter Piovesan – our publication officer – had a biking accident. To show that nothing is so bad that it is not good for something, Walter used his recovery time to redesign the IQ. Walter is also the person in charge of the upcoming 2011 IASSIST conference, so he is a busy guy. I’m happy to say that Walter should be fit for the conference.

This issue of the IQ features the following papers:

Rein Murakas and Andu Rämmer from the Estonian Social Science Data Archive (ESSDA) at the University of Tartu describe in their paper "Social Science Data Archiving and Needs of the Public Sector: the Case of Estonia" how the archive's historical background lies in empirical research conducted in the Soviet era.

From the historical background we move to Web 2.0 in a paper by Angela Hariche, Estelle Loiseau and Philippa Lysaght, "Wikiprogress and Wikigender: a way forward for online collaboration". The authors work at the OECD, and the paper argues that "collaborative platforms such as wikis along with advances in data visualisation are a way forward for the collection, analysis and dissemination of data across countries and societies".

The third paper addresses an issue of central importance for most data archives. The question concerns balancing data confidentiality and the legitimate requirements of data users. This is a key problem of the Secure Data Service (SDS) at the UK Data Archive, University of Essex. The paper "Secure Data Service: an improved access to disclosive data" by Reza Afkhami, Melanie Wright, and Mus Ahmet shows how the SDS will allow researchers remote access to secure servers at the UK Data Archive.

The last article has the title "A user-driven and flexible procedure for data linking". The authors are Cees van der Eijk and Eliyahu V. Sapir from the Methods and Data Institute at the University of Nottingham. The data linking relates to research combining several different datasets. The implementation is developed for the PIREDEU project in comparative electoral research. The authors are combining traditional survey data with data from party manifestos and state-level data.
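The article describes its own linking procedure, which is not reproduced here. As a rough illustration of what "data linking" means in this setting, the sketch below attaches party-level manifesto scores and state-level indicators to individual survey records through shared key variables. Every field name, code and value is hypothetical, and the left-join behaviour (keeping respondents whose context is missing) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    resp_id: int
    state: str  # hypothetical state code used as a linking key
    party: str  # hypothetical code for the party the respondent voted for


# Hypothetical contextual tables, each keyed by a linking variable.
MANIFESTO_LEFT_RIGHT = {"A": -2.5, "B": 1.0}
STATE_UNEMPLOYMENT = {"S1": 6.1, "S2": 8.4}


def link(respondents, party_data, state_data):
    """Attach party- and state-level variables to each respondent record.

    Works like a left join: a respondent with no matching context row is
    kept, and the missing contextual value is recorded as None.
    """
    linked = []
    for r in respondents:
        linked.append({
            "resp_id": r.resp_id,
            "party_left_right": party_data.get(r.party),
            "state_unemployment": state_data.get(r.state),
        })
    return linked


rows = link(
    [Respondent(1, "S1", "A"), Respondent(2, "S2", "C")],
    MANIFESTO_LEFT_RIGHT,
    STATE_UNEMPLOYMENT,
)
for row in rows:
    print(row)
```

Keeping unmatched respondents rather than dropping them preserves the survey sample, so the analyst can decide later how to treat cases without contextual data.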

Articles for the IQ are always very welcome. They can be papers from IASSIST or other conferences, from local presentations, or papers written directly for the IQ.

Note that chairing a conference session with the aim of aggregating and integrating its papers into a special issue of the IQ is much appreciated, as the information then reaches many more people than the session participants and will be readily available on the IASSIST website.

Authors are very welcome to take a look at the description for layout and sending papers to the IQ:

http://iassistdata.org/iq/instructions-authors

Authors can also contact me via e-mail: kbr @ sam.sdu.dk. Should you be interested in compiling a special issue for the IQ as guest editor or editors I will also be delighted to hear from you.

Karsten Boye Rasmussen, editor
