Conference Presentations 2010

  • IASSIST 2010-Social Data and Social Networking: Connecting Social Science Communities across the Globe, Ithaca, NY
    Host Institution: Cornell Institute for Social and Economic Research (CISER) and Cornell University Library (CUL)

D3: Virtual Research Environments: Tools for Presenting and Storing Data (Thu, 2010-06-03)
Chair: Ron Nakao, Stanford University

  • Data Warehouses and Business Intelligence for Social Sciences. Aims and Possibilities of a Data Warehouse within the National Educational Panel Study (NEPS) in Germany
    David Schiller (University of Bamberg, National Educational Panel Study (NEPS))


    The National Educational Panel Study (NEPS), which is supported by the German Federal Ministry of Education and Research, was established in 2008. Based on a multicohort sequence design, 60,000 target persons within six starting cohorts will be followed through their life course. The main aims of the NEPS are to increase knowledge about the development of competencies, the impact of learning environments, social inequality and educational decisions, educational acquisition among persons with a migration background, and the returns to education. The NEPS data will be stored in a data warehouse. A data warehouse and the tools of Business Intelligence provide a higher level of flexibility than ordinary file servers or conventional transactional databases. Among the additional opportunities: access to the data is easier; data can be offered in different dimensions for various statistical packages (such as Stata); data from different sources can be matched; users can build their own private use files (e.g., by selecting variables according to their needs via a shopping basket); results and newly computed variables can easily be stored within the data warehouse for further use by other users; the statistical power of the data warehouse can be used without additional statistical software; and even new disclosure control and anonymization techniques can be integrated.

  • A User-Driven and Flexible Procedure for Data Linking
    Cees van der Eijk (University of Nottingham)
    Eliyahu V. Sapir (University of Nottingham)


    The PIREDEU program develops a large-scale pilot infrastructure for research into electoral democracy in the European Union. For each of the 27 EU member states, the program comprises five data components: voter studies, candidate studies, content analyses of mass media, content analyses of party manifestos, and contextual data. The infrastructure builds user-friendly tools for integrating these very different kinds of data into datasets tailored to individual researchers’ needs. After a brief description of the PIREDEU program, this paper discusses:

    • preconditions for successful data integration (including ex-ante harmonization and pre-linking)
    • a brief overview of the multi-dimensional conceptual space defining different forms of linking the various data components
    • the strategy designed for linking data without restricting users to a limited number of pre-defined possibilities, allowing analysts to specify the kind of linking that optimally suits their research needs, and the on-line implementation of this strategy
    • the effects of the on-line linking tools on the cohesion of networks of researchers, effective use of data, quality of data and need for new data collection
    • conditions under which the approach to data linking developed for PIREDEU can be extended to other data holdings (without ex-ante harmonization and pre-linking).

  • Virtual Center for Collaborative Research (ViCtoR)
    Pascal Heus (Metadata Technology)


    The purpose of the Virtual Center for Collaborative Research (ViCtoR) project is to support the development of a DDI metadata-driven, web-based platform that provides researchers with a flexible and dynamic environment for discovering and analyzing microdata, fostering collaboration, and facilitating the preservation of research knowledge. The platform will consist of a web-based environment that offers users tools to manage virtual research projects (derive customized data files for a particular purpose; share files and knowledge with team members; package research outputs as enhanced publications for preservation and dissemination) and to facilitate social networking. The facility will also include tools to manage research outputs and knowledge (data, document or script libraries; researcher directories; wikis and other collaborative tools such as chat rooms, discussion forums, news, event calendars, etc.). The project is in its pilot phase, and we will present the initial version of the environment, focusing on metadata exploration, dataset customization and retrieval, research project management, and knowledge capture. This work is being supported by the NORC Data Enclave and implemented by Metadata Technology North America.

  • InFuse: Data Feeds for the UK 2001 and 2011 Censuses and Beyond
    Justin Hayes (The University of Manchester)


    The InFuse Project builds on the successes of the recent CAIRD Project, which demonstrated the feasibility and some of the potential benefits of applying the emerging open-standard SDMX metadata/transfer schema, in combination with a web service, to the large and complex aggregate outputs from the UK 2001 Census in order to create a data feed providing comprehensive and flexible access to data and metadata. The primary objective of the project is to develop a dissemination application based on a complete UK 2001 Census data feed that will provide an operational service to the Census Dissemination Unit’s user base across UK academia by September 2010. The service will then be developed to incorporate information from the UK 1971, 1981, 1991 and 2011 censuses, as well as other, non-census datasets. Further research will create new metadata on geographic and definitional comparability to enable integrated use of these datasets. This paper will describe the methods used in the InFuse Project, the outputs to date, some potential benefits and impacts, and the parallel partnership work between the Census Dissemination Unit and the UK Office for National Statistics in developing data feed dissemination from source for the aggregate outputs from the UK 2011 Census.

D4: Restricted Data Access: Principles and Standards (Thu, 2010-06-03)
Chair: Oliver Watteler, GESIS

  • Survey on Access to African Government Microdata for Social Science Research
    Lynn Woolfrey (University of Cape Town)


    Governments mandate their National Statistics Offices to collect empirical data to determine appropriate policies. Re-use of this data for research can provide input regarding the effectiveness of government action. In Western Europe and North America policies and institutions support the efficient collection and sharing of official data for research purposes. In Africa the sharing of government microdata is constrained by several obstacles. African National Statistics Offices have limited resources to curate microdata and ensure its long-term availability. Consequently many African data producers do not follow international best practice with regard to survey data management or share the microdata from the surveys they conduct. This was confirmed by a survey conducted in order to investigate the availability of survey microdata from African National Statistics Offices for research. A further obstacle to access to government microdata in Africa is inadequate producer-user communication channels. Concerns around the confidentiality of respondent information also present a barrier to data usage for research, as does the bureaucratic nature of government institutions involved in data production in African countries. Access to official microdata for research requires sound data usage policies driven by African decision-makers who appreciate the role of information utilisation in national development.

  • Developing a Statistical Disclosure Standard for Europe
    Tanvi Desai (London School of Economics)


    The European Union has long faced the problem of how to overcome the challenge of sharing data across borders for effective cross-national research. One of the key issues is the harmonisation of standards and protocols. The ESSNet project, funded by Eurostat, has developed a protocol for statistical disclosure that can be implemented by all European member states. This presentation will outline some of the challenges faced and the standard developed, and will look at how the standard might be used to change the European data infrastructure in the future.

  • Settings, Practices and Data Access: Results of a Survey of UK Social Scientists
    Jo Wathan (University of Manchester)


    Where access to data is controlled by a data service or depositor, use of the data may be restricted to individuals who are able to adhere to certain conditions. These conditions may require the applicant to store, use or limit sharing of the data in particular ways. In the UK such conditions have become more commonplace with the growth of special licences and secure settings for government microdata. In order to assess the potential impact of these conditions on the usability of data, a survey of a representative sample of UK social scientists in ten disciplines was conducted during the autumn of 2009 to better understand their working environments and practices. A 61% response rate was achieved, resulting in over six hundred completed questionnaires. The survey covered a range of topics, including access to computing facilities (including data and printout storage), home working, data transportation, awareness of institutional policies on personal data, and attitudes to a range of access conditions.

  • International Access to Restricted Data - A Principles-Based Standards Approach
    Felix Ritchie (UK Office for National Statistics)


    Access to restricted microdata for research is increasingly part of the data dissemination strategy within countries, made possible by improvements in technology and changes in the risk-benefit perceptions of NSIs. For international data sharing, relatively little progress has been made. Recent developments in Germany, the Netherlands and the US are notable as exceptions. This paper argues that the situation is made more complex by the lack of a general coherent risk-assessment framework. Discussions about whether something should be done become sidetracked into discussions about how procedural issues would constrain implementation. International data sharing negotiations quickly become bilateral, often dataset-specific, and of limited general value. One way forward is to decouple implementation from principles. A principles-based risk-assessment framework could be designed to address the multiple-component data security models which are increasingly seen as best practice. Such a framework allows decisions about access to focus on legal-procedural issues; similarly, secure facilities could be developed to standards independent of dataset-specific negotiations. In an international context, proposals for classification systems are easier to agree than specific multilateral implementations. The paper concludes with examples from the UK and cross-European projects to show how such principles-based standards could work in practice.


E1: Engaging New Users (Thu, 2010-06-03)
Chair: Lynda Kellam, University of North Carolina at Greensboro

  • Telling Stories About and With Data
    Jackie Carter (University of Manchester)


    This session focuses on a project with a remit to produce evidence of how teachers use data resources, and of the impact on student learning. There is concern about the levels of data and statistical literacy skills of UK social science students, even though the cutting edge of social science is reliant on the use of real-world datasets, and there is a great desire to improve research-led teaching in the area. The project collates the experience of attempts to upskill students in data and its discipline-related usage, and provides an illustration of educational practice at both discipline and national level. The case studies showcase attempts to make learning and teaching about and with data a less passive experience. The UK national data centres provide access to a wealth of social science data, provided by national census agencies and intergovernmental organisations including the OECD, IMF, UN and World Bank, for undergraduate and postgraduate study. Students often avoid handling and discussing data in their study unless forced to confront it. The challenge for educators lies in promoting students' use of data, but doing so benefits both the academic performance and the job prospects of students.

  • Outreach to New Communities: The Census 2010 Project
    Lisa Neidert (University of Michigan)


    Data Support at the Population Studies Center receives money from the university provost to provide support for census data on the UM campus. With the advent of the 2010 Census, we are undertaking a many-pronged outreach effort. The purpose is to reach communities that could or should be using census data to inform their stories, lives, and research. We also want to underscore the importance of participating in the 2010 Census. The first effort was a Census 2010 Boot Camp for Journalists, which provided 25 journalists with training to understand the importance of census data and how to analyze the data for local and national stories. In the second effort we are sponsoring an ad contest (YouTube-like), in which students create short videos encouraging participation in the 2010 Census. The ‘hard-to-count’ areas in Ann Arbor are the student-dominated university neighborhoods. Finally, we will be teaching a one-credit course in Winter 2010 on “The United States Census.” Portions of the course will be presented in other classes throughout the academic term. The presentation will discuss the challenges and rewards of outreach to new communities.

