Conference Presentations 2008

  • IASSIST 2008 - Technology of Data: Collection, Communication, Access and Preservation, Stanford, CA
    Host Institution: Stanford University Libraries and Academic Information Resources

A4: Web Tech: Presentation and Design (Wed, 2008-05-28)
Chair: San Cannon, Federal Reserve Board

  • Using the Web to Communicate Survey Metadata: Design, Development and Maintenance of the ESRC Question Bank
    Julie Gibbs (University of Surrey)


    This session is intended as a follow-on to a session at the IASSIST conference in 2006 on web development and websites that were good examples for students wishing to find data. This paper will provide members with an update on recent developments. The ESRC Question Bank was redesigned in 2006 with a new interface developed around current web standards and with user feedback in mind. This has been a great success, with user statistics showing a 70% increase in visits to the site. In this presentation I will discuss how the web interface was developed and why we have kept the rather flat structure of the Qb as opposed to developing a database for the survey questionnaires that we hold. I will also discuss the maintenance required for this type of resource, and how it can be used to get students to think about using or recycling survey questions for their own research work. The session will therefore aim to update the audience on current web designs for metadata dissemination, with particular reference to the Qb, whilst pondering whether we are making web dissemination too complex for students to use.

  • Welding a Website that (Might) Work: An Analysis of a Data Website Redesign at a Small Liberal Arts College
    Rachael Barlow (Raether Library, Trinity College, Hartford)


    Data websites at small schools aim to serve a variety of constituencies: undergraduates and graduate students from a range of majors with a range of data skills, faculty of varying technical ability, and librarians without training in data who nonetheless must answer data questions at the reference desk. I report on what happened when a small school (Trinity College) redesigned its website to cater to these varied constituencies by being more inclusive of the varying formats in which library resources now come: not just journal databases, but audio/video resources, images, and, of course, data. With data collected from the new website on what resources get the most “hits” and how users exploit multiple entry points to access those resources, I highlight which resources are the most surprising in terms of the number of hits they now receive. Finally, I pose some theoretical questions concerning how we should think about data websites in the more general context of library websites.


B1: Cultures of Data Sharing (Wed, 2008-05-28)
Chair: Mari Kleemola, Finnish Social Science Data Archive

  • Life Cycle of and Open Access to Research Data in Finland
    Tuomas J. Alatera (Finnish Social Science Data Archive)


    The OECD recently published its guidelines for access to research data from public funding. Motivated by the OECD guidelines, the Finnish Social Science Data Archive conducted an Internet survey on the preservation and reuse of research data, targeting the humanities, social sciences and behavioural sciences. The aim of this survey was to chart how the universities in Finland have organised the depositing of digital research data and to what extent the data are reused by the scientific community after the original research has been completed. Views were also investigated on whether confidentiality or research ethics issues, or problems related to copyright or information technology, formed barriers to data reuse.

  • Barriers to Data Archiving and Sharing in Health Research – Lessons from a User Study
    Lone Bredahl (Danish Data Archive)


    In some research domains data archiving constitutes an integrated part of the research process. Here, data are deposited not only for personal reasons, and data are willingly shared with fellow researchers and other interested parties. In Danish health research, central archiving of research data broke new ground in 2005 with the establishment of a separate entity for health data, DDA Health, at the Danish Data Archive (DDA). To obtain a systematic overview of expectations and experiences of central archiving, as well as of behaviours and attitudes with regard to data sharing, an empirical study was carried out in summer 2007 among depositors and potential users of data services in DDA Health. Data were collected by a combination of qualitative and quantitative methods. Initially, a focus group and five personal interviews were carried out. Following these, a web-based survey was conducted based on an extract of email addresses from the administrative database at DDA. Results point to low perceived necessity and an overall lack of consideration of the issue of data preservation as major barriers to data archiving. Data sharing, on the other hand, is clearly (negatively) linked to perceptions of data ownership. At the same time there is no tradition of data sharing through more formal channels, such as a central data archive.

  • Asian Social Science Data Accessibility
    Daniel C. Tsang (UC Irvine)


    This paper gives an overview of Asian social science data availability and accessibility in Asia and, to a certain extent, abroad. It covers not only what is available in existing social science data archives, but also what is available from government agencies, survey research organizations and individual researchers. It looks at different cultures of data sharing across the region and what contributes to them. It addresses whether emerging standards (such as DDI) are applied in Asia and what needs to be done to close the gap between social science data archiving in the West and in Asia.

B2: Tools for Data Visualization and Manipulation (Wed, 2008-05-28)
Chair: Jane Weintrop, Columbia University

  • Web 2.0 Data Visualisation Tools
    Stuart McDonald (University of Edinburgh)


    As Web 2.0 continues to evolve and transform into what is being referred to as Web 3.0, we are seeing the boundaries between websites and web services blurring as more and more web content becomes remixable. Many of the resultant visualizations and applications can be achieved with no more than a basic understanding of the underlying technologies. This presentation will discuss the range of collaborative web utilities that use Web 2.0 technologies and venture into the numeric and spatial data visualisation arenas. There is a whole range of map (or spatial) mash-ups which utilize Web 2.0 technologies and interactive mapping products such as Google Earth and MS Virtual Earth. Such mapping utilities have paved the way for research organisations to explore and expose their findings in new and innovative ways. There has, however, been less publicity regarding the visualisation of data, once thought to be the remit of domain experts. This presentation will also look at and compare a number of utilities, such as Swivel and Many Eyes, that to varying degrees visualise data and allow data users the opportunity to interact with and share data in an open environment.

  • What about Swivel?
    Amy West (University of Minnesota)


    An unintended consequence of using social sciences data, especially from government sources, is that users end up devoting a significant amount of time to the mechanics of data cleanup, time that could (in theory) otherwise be spent thinking about the content and meaning of the data. Often that time is further devoted to learning a great deal about a single piece of proprietary software that may or may not continue to be available over time. Swivel appears to be a tool that resolves this problem by simplifying data visualization. It uses non-proprietary data formats as its input and automatically turns out a range of graphical displays for any table uploaded. But does this really help with the issue of misplaced effort by users? Even if it does, how might it be integrated into a data services program in an academic institution? I will be discussing my efforts to develop a more user-friendly version of the Consumer Expenditure Survey data via Swivel and where/how I think Swivel and similar tools might fit into a data services program.

  • Python in the Data Lab
    Harrison Dekker (UC Berkeley)


    The open source programming language Python is often recommended as a first language for those new to programming. Some have even argued that for those who program infrequently, it may be the only language they need to learn. For data services the Python language has a number of compelling features. It is easy to learn, has clean syntax, and features an extensive collection of modules to help address the sorts of “data munging” and administrative tasks that users often find themselves engaged in. Examples include automation of repetitive data manipulation processes, gluing together multiple applications in order to accomplish a complex task, extraction of data from websites and web services, and scripting for websites and servers. This presentation will give an overview of the features of the language of particular interest to data users. Comparisons will be made to other popular scripting languages. Modules relevant to data manipulation will be discussed. Finally, attention will be given to the recent integration of Python into SPSS and ArcGIS and how this might be relevant to data users.
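
    As a rough illustration of the "automation of repetitive data manipulation processes" mentioned above, the following minimal sketch recodes one column of a CSV extract using only Python's standard csv module. It is not taken from the presentation: the function name recode_column, the sample data, and the code labels are all hypothetical.

```python
import csv
import io


def recode_column(csv_text, column, mapping):
    """Return CSV text with values in `column` replaced according to `mapping`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Values that do not appear in the mapping are left unchanged.
        row[column] = mapping.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical survey extract and code labels, invented for illustration.
SAMPLE = "id,sex\n1,1\n2,2\n3,9\n"
SEX_LABELS = {"1": "male", "2": "female"}

if __name__ == "__main__":
    print(recode_column(SAMPLE, "sex", SEX_LABELS))
```

    The same function could be applied in a loop over many files, which is the kind of repetitive task the abstract has in mind; unmapped codes (such as 9 in the sample) pass through untouched so that nothing is silently lost.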

B3: What Is Old Is New Again (Wed, 2008-05-28)
Chair: Gretchen Gano, New York University

  • Canada Year Book Historical Collection
    Bernie Gloyn (Statistics Canada)


    First published in 1867 as the Year Book of British North America, the Canada Year Book became the premier reference resource on the social and economic life of Canada and its citizens, a function it still performs today. With funding from Heritage Canada, Statistics Canada has digitized the first hundred years of this important historical resource and is making it freely available on the Internet. An important step for the agency, the project has allowed us to develop our expertise in making key historical resources more readily accessible, searchable and usable. The digitized year books are supplemented with tables, graphs and maps and are linked to a series of lesson plans to spark their use by teachers and students. Taking more than three years to complete and exceeding a terabyte of data, the project has encountered and overcome a number of issues of interest to the IASSIST community. This presentation will give a brief history of the project, demonstrate the website, and highlight the issues overcome and possible future directions.

  • Moving an Archive from Tape to Disk: A Case-Study at ICPSR
    Bryan Beecher (ICPSR)


    In early 2007 ICPSR's digital archives consisted of 700 magnetic tapes stored in two separate locations in Ann Arbor, Michigan. The best-case scenario for retrieving content consisted of a data librarian finding the correct tape, mounting that tape, restoring the desired content, and then copying that content to a well-known location for the data manager. A worst-case scenario could include trips to off-site storage locations, composing content from a combination of Master and Backup tapes, and multiple attempts at finding just the right content. By late 2007 the archives had been copied to spinning disk and replicated across storage grids. In true belt-and-suspenders fashion, an additional copy also resides on tape, and this copy of the entire archive fits on a mere six high-density tapes. The online content may be searched and browsed, and with sufficient access rights, an ICPSR data manager may fetch any file through a convenient web interface. This presentation describes the starting point of the migration, challenges faced and lessons learned during the process, and the state of the archives post-migration. We reference technologies that we found useful during this process, but do not probe too deeply into their intricacies.

