Conference Presentations 2011

  • IASSIST 2011 - Data Science Professionals: A Global Community of Sharing, Vancouver, BC
    Host Institution: Simon Fraser University and the University of British Columbia

A3: Building Capacity to Link, Visualize, Identify, and Discover (Wed, 2011-06-01)
Chair: Steven McEachern, Australian Social Science Data Archive

  • Creating a Linchpin for Financial Data. The need for a Legal Entity Identifier (LEI)
    Linda Powell (Federal Reserve Board)


    The need for reference metadata has been recognized as crucial within the financial industry. Unique and standardized identification of legal entities has become a top priority for the financial industry. It is now widely recognized that such an identifier is critical to efficient financial transaction processing and to the clear and unambiguous identification of parties and counterparties involved in all financial activities. Although the need for unique entity identification across government agencies, data vendors, and financial market participants has been discussed for years, there is currently no entity identification schema used across the financial industry. The events of the last several years and the recent financial reform highlight the need for legal entity identifiers that cross organizations. The paper is the collaborative result of several financial regulators and:
    • Explores the current state of entity identification throughout the financial industry,
    • Presents the business cases for why unique and standardized identification is necessary,
    • Summarizes the industry best practices and requirements for entity identification, and
    • Identifies alternative approaches to implementing industry-wide identifiers.

  • Longitudinal and Time-series Documentation Protocols at the ADA
    Melanie Spallek (The Australian Data Archive)
    Steve McEachern (The Australian Data Archive)


    This paper will provide an overview of the longitudinal data processing protocols being implemented at the Australian Data Archive as part of the development of the ADA Longitudinal Archive. As part of the migration to a new storage foundation, ADA has been reviewing its longitudinal data holdings to streamline longitudinal data processing, focussing particularly on improvements to the metadata associated with these studies to take advantage of emerging discovery and visualisation tools. This paper will examine ADA's experience with this review process, with the development of guidelines for data dictionaries and variable mapping systems, and with the implementation of the procedures for three sample longitudinal studies within the ADA holdings. The paper will then conclude with a review of ADA's plans for documentation and support of future longitudinal and cross-sectional time-series data holdings, including new longitudinal panel studies, public opinion poll data and major cross-sectional social surveys.

  • Geospatial Analysis and Visualisation at ADA
    Rhys Hawkins (Australian National University)
    Steve McEachern (The Australian Data Archive)


    Expanding computational resources, web interfaces and spatially enabled data have provided an increasing capability for spatial visualisation in social science data analysis. This has prompted the Australian Data Archive to develop visualisation tools for spatial information that complement and extend the traditional analysis methods used in the social sciences. These improved methods, however, have major implications for the processing workflow for archiving survey data: designing surveys to record geospatial identifiers accurately, maintaining the confidentiality of geo-located respondents' information to prevent identification by unauthorised users, and allowing researchers access to the data in new and powerful ways. This paper will present the recent work of the ADA and the ANU Supercomputing Facility in this area, providing an overview of progress in developing the ADA GIS data framework as well as a demonstration of new online visualisation tools for exploring spatial social science data.


A4: Extending Data Support Services (Wed, 2011-06-01)
Chair: Samantha Guss, New York University

  • Collaborating with Subject Librarians to Provide Undergraduates with Appropriate International Statistical Resources
    Joe Hurley (Georgia State University)


    Subject librarians often seek the assistance of data services librarians when faced with reference questions concerning statistical information. As more university courses emphasize international awareness, subject librarians frequently receive questions from undergraduate students seeking statistical information on developing and non-western nations. Often unfamiliar with and sometimes intimidated by international statistical information, some subject librarians are unsure where to begin. In addition, many undergraduate students are unskilled in interpreting a statistical chart properly. More often, undergraduate students need statistics that include contextual information. The many United Nations agencies and divisions produce an abundance of publications that contain highly sought-after international statistics and also provide the reader with background information that explains what the numbers mean, how they were collected and the shortcomings of the statistical information. This presentation will focus on the importance of increasing the awareness of both subject librarians and students of these United Nations publications and will also provide advice on how to access and search for these publications.

  • A Multidisciplinary Analysis of Data Reuse Activities
    Nicholas Weber (University of Illinois at Urbana-Champaign)
    Tiffany Chao (University of Illinois at Urbana-Champaign)


    The reuse and secondary analysis of digital data in the environmental and social sciences is aided greatly by well-established data repositories and a research culture that fosters trusted data sharing, respectively. However, relatively few studies in either of these disciplines have considered the component activities of reusing data beyond an initial phase of discovery; more often, studies have identified barriers to access or focused on the need to properly attribute datasets. We present a comparative analysis of the activities and practices surrounding secondary use of publicly available data in the environmental and social sciences. The activities identified in these two disciplines are grounded in the current literature and include: selection criteria for reuse, methodological approaches as they vary by discipline, transfer protocols, citation practices, and explicit barriers to access and use of secondary data. This comparison of data practices provides a formalization of the implicit activities surrounding reuse that will prove highly valuable to data librarians and data scientists in their interactions with an increasingly interdisciplinary and collaborative research community. An analysis of data reuse also offers much-needed insight into the development and maintenance of deployed cyberinfrastructures, particularly as these large-scale systems are geared toward data-centric research.

  • Research Data and Open Access
    Nanna Clausen (Danish Data Archive)


    For many years the data archives have concentrated on the preservation, documentation and dissemination of social science research data for secondary analysis. The research libraries hold the researchers’ publications based on these data. In Europe the research libraries have begun to take an interest in either holding the research data or providing open access to both the research data and the publications. This paper will first present the ongoing projects and efforts of the research libraries that aim to investigate the whole area of data discovery, data preservation and the linking of publications and data. The paper will then discuss the role of the data archives and the research libraries: what is the role of the data archives in the process? How can the data archives’ experience with long-term preservation contribute? Will or shall we co-operate more closely with the research libraries? Can DDI be used by both communities?


B1: Describing Qualitative Data Formally - Where Are We with DDI and Its Relatives? (Wed, 2011-06-01)
Chair: Louise Corti, UK Data Archive

  • Report back from the DDI Qualitative Working Group
    Louise Corti (UK Data Archive)
    Arofan Gregory (Metadata Technology USA)

  • Lost in Translation? Experiences in documenting qualitative data at the ADA
    Steven McEachern (Australian Data Archive)
    Lynda Cheshire (Australian Data Archive)
    Melanie Spallek (Australian Data Archive)


    The expansion of data holdings to incorporate qualitative content has been a major emphasis of the Australian Data Archive since 2007, focussed on the establishment of ADA Qualitative (formerly AQuA). While there have been significant challenges during this time in encouraging qualitative researchers to deposit content with the archive, the deposit of these new data forms has also created new challenges for the archive in ingest, processing and dissemination. These challenges have been threefold:
    • methodological: what changes do researchers need to make in their methods to support archival practice?
    • technical: how does ADA adapt its existing metadata schema and data management software (DDI2 and Nesstar) to support qualitative content?
    • practical: how are processing procedures for archivists changed when documenting qualitative content?
    This paper explores each of these challenges in turn, focussing particularly on the adoption of the QuDEx schema developed by the UK Data Archive to support qualitative data archiving. The paper will discuss ADA's experience with the use of the QuDEx schema to address these three challenges, and provide suggestions for future developments of the schema and qualitative archiving more generally.

  • The next generation Timescapes Archive - supporting the complex structures and relationships of qualitative longitudinal data
    Ben Ryan (University of Leeds)


    Timescapes, an ESRC-funded study, has developed an archive to hold the data and documentation outputs of seven empirical longitudinal qualitative projects. One limitation that has become clear is that the platform used to archive the data (DigiTool) does not support the rich diversity, structure and inter-relationships that the data sets require to be of maximum use to the research community. The archive platform treats all files as "digital objects" and does not allow the modelling of complex structures of information and their inter-relationships. It is not possible to clearly display the connections between artefacts produced from a number of interviews and cohort activities over a number of phases. A consequence of this lack of relationships is the inability to present the artefacts using time as a dimension, other than through simplistic date-based searching, which severely constrains the usability of longitudinal data. Adopting the Fedora Commons platform will allow concepts and the relationships between them, such as collections, waves, and longitudinal case studies, to be directly represented in the archive. The new architecture will support the representation of standards, e.g. DDI, and emerging initiatives, e.g. QuDEx (an XML standard for qualitative data exchange), directly within the archive.


B2: The IASSIST SIGDC Presents: Perspectives on Data Citation (Wed, 2011-06-01)
Chair: Mary Vardigan, Inter-university Consortium for Political and Social Research

  • Building Data Citations for Discovery
    Hailey Mooney (Michigan State University)
    Mark Newton (Purdue University)


    Authors who choose to cite the research data behind their published reports have a variety of options to entertain: domain style guides, publisher requirements, and data provider citation recommendations. Instructions from these sources may differ in terms of the range of required citation elements and guidance to authors on when and how research data merit citation. This presentation will compare the elements of recommended data citations with actual citations in published articles drawn from targeted disciplinary bodies of literature. By creating a window into the practice of data citation, this presentation seeks to understand what guidance is offered to authors who want to cite data and how authors actually compose these citations.

