Conference Presentations 2013

  • IASSIST 2013 - Data Innovation: Increasing Accessibility, Visibility, and Sustainability, Cologne, Germany
    Host Institution: GESIS – Leibniz Institute for the Social Sciences

Posters (Thu, 2013-05-30)

  • UK Data Archive Keyword Indexing with a SKOS Version of HASSET Thesaurus
    Mahmoud El-Haj (UK Data Archive)


    We present the evaluation results, tools and techniques used to automatically index data collections, and we examine the efficiency and accuracy of keyword automation. We tested the capacity and quality of automatic indexing using a controlled vocabulary called HASSET. We began by applying SKOS to HASSET. The automatic indexing, using the SKOS version of HASSET, provided a ranked list of candidate keywords to the human expert for final decision-making. The accuracy or effectiveness of the automatic indexing was measured by the degree of overlap between the automated indexing decisions and those originally made by the human indexer. We investigated text mining techniques to automatically index the data collection. These included applying the tf.idf model and the Keyphrase Extraction Algorithm (KEA) in a Java development environment. We used Machine Learning and Natural Language Processing tools to build a classifier model from training documents with known keywords, and then used the model to find keywords in new documents. Extensive manual and automatic evaluation was performed to calculate recall and precision scores. This poster explains how and why we applied the chosen technical solutions, and how we intend to take forward any lessons learned from this work in the future.
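
    The overlap-based evaluation described in this abstract can be illustrated with a minimal sketch. It is not the project's Java/KEA implementation: the function names `tfidf_rank` and `precision_recall`, the toy corpus, the toy vocabulary, and the smoothed idf formula are all assumptions made for illustration. It ranks controlled-vocabulary terms in one document by tf.idf, then scores the candidates against keywords chosen by a human indexer.

```python
import math
from collections import Counter


def tfidf_rank(doc_tokens, corpus, vocabulary):
    """Rank controlled-vocabulary terms found in doc_tokens by tf.idf.

    Assumption: a smoothed idf, log((1 + N) / (1 + df)) + 1, so that terms
    missing from the corpus do not cause a division by zero.
    """
    n_docs = len(corpus)
    counts = Counter(t for t in doc_tokens if t in vocabulary)
    scores = {}
    for term, count in counts.items():
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest-scoring candidate keywords first.
    return sorted(scores, key=scores.get, reverse=True)


def precision_recall(candidates, human_keywords):
    """Overlap between automatic candidates and the human indexer's keywords."""
    auto, gold = set(candidates), set(human_keywords)
    overlap = len(auto & gold)
    precision = overlap / len(auto) if auto else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: tokenized documents and a tiny vocabulary.
corpus = [
    ["unemployment", "survey", "unemployment", "housing"],
    ["health", "survey", "housing"],
    ["health", "survey", "income"],
]
vocabulary = {"unemployment", "housing", "health", "income"}

ranked = tfidf_rank(corpus[0], corpus, vocabulary)
precision, recall = precision_recall(ranked, {"unemployment", "poverty"})
```

    Restricting candidates to vocabulary terms mirrors the idea of indexing against a controlled thesaurus; a trained keyphrase model would replace the plain tf.idf score in practice.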

  • DataForge
    Pascal Heus (Metadata Technology)


    Statistical data exist in many different shapes and forms such as proprietary software files (SAS, Stata, SPSS), ASCII text (fixed, CSV, delimited), databases (Microsoft, Oracle, MySQL), or spreadsheets (Excel). Such a wide variety of formats presents producers, archivists, analysts, and other users with significant challenges in terms of data usability, preservation, or dissemination. These files also commonly contain essential information, like the data dictionary, that can be extracted and leveraged for documentation purposes, task automation, or further processing. In mid-2013, Metadata Technology will be launching a new software utility suite, "DataForge", for facilitating reading/writing data across packages, producing various flavors of DDI metadata, and performing other useful operations around statistical datasets, to support data management, dissemination, or analysis activities. DataForge will initially be made available as desktop-based products under both freeware and commercial licenses, with a web-based version to follow later on. IASSIST 2013 will mark the initial launch of the product. This presentation will provide an overview of DataForge capabilities and describe how to get access to the software.

  • Interdisciplinarity: Ways to Improve Data and Statistical Literacy
    Flavio Bonifacio (METIS Ricerche Srl)


    Working on Numbers (IQ, Fall 2009) and modeling multishaped reality through data, I discovered an unexpected and widespread desire to recognize the charming appeal of numbers. This paper describes how the curiosity for a better knowledge of numbers arises and what we can do to transform this curiosity into a desire to learn. Curiosity is often the first step toward knowledge and it is evenly distributed among disciplines: the poster will illustrate this natural and parallel interest in numbers to show genuine interdisciplinary ways to improve both data use and statistical literacy. I will present a collection of samples extracted from literature (Paulos and others) and the media world (TV, newspapers, magazines, social networks) to show why the knowledge of numbers and statistical data is needed to understand the real world. Furthermore, I will use two examples from my teaching experience to show how it is possible to teach the "feeling" for statistical numbers: the first is the Numbers Meaning course held by METIS Ricerche; and the second is the Master in Data Analysis and Business Intelligence designed and conducted by METIS Ricerche in cooperation with the University of Turin.

F1: Integrated Efforts: Discovery, Distribution and Preservation (Fri, 2013-05-31)
Chair: Jennifer Doty

  • Innovation in thesaurus management
    Lucy Bell (UK Data Archive)


    This paper gives an overview of recent, high-profile, future-focused initiatives undertaken at the UK Data Archive to further the usefulness and usability of its digitally delivered thesauri. The Archive has recently received funding from two separate sources (Jisc and ESRC) to enhance its thesaurus products. The paper starts by describing the work of the Jisc-funded SKOS-HASSET project (June 2012 - March 2013). This R&D project had three aims: to apply SKOS to HASSET; to improve its online presence; and to test SKOS-HASSET's automated indexing capabilities. The paper outlines in more detail the project's aims, objectives and activities, and the uses to which its deliverables have been put post-project. Building on this, a second, five-year project is also underway at the Archive, with wider and more ambitious deliverables. In 2012, the ESRC awarded the Archive funds to improve the content and delivery of ELSST, under the CESSDA ELSST project. The deliverables expected from this work are a new and improved thesaurus management interface, an established annual release process, a review of the thesaurus structures and hierarchies and the creation of a system for remote access. The project will also build on the SKOS-HASSET work in extending our community of thesaurus users.

  • A Nordic collaboration on data archiving and preservation of data on medicine and health
    Elisabeth Strandhagen (Swedish National Data Service (SND))
    Bodil Stenvig (Danish Data Archive (DDA))


    The archives in the Nordic countries, the Danish Data Archive (DDA), the Swedish National Data Service (SND), the Finnish Social Science Data Archive (FSD), and the Norwegian Social Science Data Services (NSD), have started a collaboration on data archiving and preservation of data on medicine and health. The first meeting was held in Odense in November 2012 with 2-3 representatives from each country. One area that could benefit from cooperation on preserving data on health is to prepare and create common keywords and Track for health sciences, and to prepare and describe the teaching content for a data management program. The group will also focus on support for secondary use of data on health science as an important resource for medical scientists. A common goal is to develop a platform for collaboration within the framework of dissemination of research data presented by the Council of European Social Science Data Archives (CESSDA)/CESSDA-ERIC. The Nordic data archives want to be represented at NordicEpi 2013 and will also report to the NordForsk project "NORIA-net on Registries". The data services in the Nordic countries will qualify the Nordic research infrastructures for health science in cooperation with Nordic epidemiological research.

F2: (SERSCIDA) Making New Connections: Developing Data Services in Bosnia and Herzegovina, Croatia, and Serbia (Fri, 2013-05-31)
Chair: Lejla Somun-Krupalija

  • Role of CESSDA in developing new data services in Western Balkans
    Irena Vipavc Brvar (University of Ljubljana)
    Mattias Persson (University of Gothenburg)
    Hans Jorgen Marker (University of Gothenburg)
  • Country assessment reports: Researchers' interest in data services
    Aleksandra Bradic-Martinovic (Institute of Economic Sciences, Belgrade)
  • Existing infrastructures for data services in Western Balkans
    Marijana Glavica (University of Zagreb)

F4: Expanding Scholarship: Research Journals and Data Linkages (Fri, 2013-05-31)
Chair: Jenny Muilenburg

  • Research data management in economics journals: Data policies and data description as prerequisites of reproducible research
    Sven Vlaeminck (Leibniz - Information Centre for Economics - ZBW)
    Ralf Toepfer (Leibniz - Information Centre for Economics - ZBW)


    Replication of research results is essential for empirical science. But in disciplines like economics, replication is a vision rather than a reality. One reason for this is that research data are not available due to the lack of mandatory data policies and archives. Even if data are available, descriptions with sufficient metadata are often missing. Also, the e-infrastructure for providing datasets and other materials is still underdeveloped and offers no features. Our talk focuses on academic journals in economics. We present some results of a study of more than 140 journals regarding their research data management and suggest good practices for data availability policies. Subsequently, we propose concepts for improvements regarding the journals' e-infrastructures. In particular, we address the problem of metadata creation. Often, the creation of metadata is not accepted by researchers because it is too time-consuming. On the other hand, the metadata must be comprehensive enough for reproducibility purposes. To resolve this tension, we define different levels of metadata schemata depending on the purposes they should serve, from ensuring the citation of research data to meeting the requirements for replication of data and results.

  • Perspectives on the role of trustworthy repository standards in data journal publication
    Angus Whyte (Digital Curation Centre)
    Sarah Callaghan (Digital Curation Centre)
    Jonathan Tedds (Digital Curation Centre)
    Matthew Mayernik (Digital Curation Centre)


    Data journals are a focus for innovation in data sharing and publication, across a growing range of disciplines. They offer a number of significant opportunities to researchers, data centers/repositories, institutions and publishers. We report on progress in the PREPARDE project, which is addressing key issues including the common ground between 'trustworthy' data repository standards and effective peer review of datasets. The project has an initial focus on earth science disciplines and on the Geoscience Data Journal, a partnership between the UK Royal Meteorological Society and Wiley-Blackwell, and involves major geoscience data centers in the UK and US. We discuss findings of an international interdisciplinary workshop, and its contribution to our aim of producing guidelines on a) dataset review criteria and the associated cross-repository workflows; and b) the roles of trusted repository standards, e.g. the Data Seal of Approval and ISO 16363, in supporting the peer review of data. These guidelines focus on how the responsibilities for both technical and scientific review of data can be met effectively through collaboration between the various stakeholders. The stakeholders include research institutions, many of which are developing infrastructure for research data management to fulfill their policy obligations towards sharing publicly funded research data as a public good.



