IASSIST/CSS 1998 Conference Abstracts


Plenary
Wednesday, 9:15 AM

Towards Multiple-Media Survey and Census Data

Stephen E. Fienberg, Carnegie Mellon University
Maurice Falk University Professor of Statistics and Social Science

Sampling and survey methodology have made great strides in the past 25 years and many of the recent advances have been linked to model-based approaches to survey analysis. But the physical and political environments in which data are gathered are in enormous ferment and, at the same time, new forms of data are emerging as alternatives to the traditional numerical responses that survey methodologists have dutifully encoded for use in statistical analyses. Survey data sets of the future might well consist, either through direct collection or forms of record linkage, of combinations of traditional numbers, text, images, sound, and even symbolic summaries. New statistical methods will be needed to deal with such mixed media, and the new data and methods will raise new issues regarding the design and collection of survey data as well as their dissemination, including concerns of confidentiality and disclosure limitation. The time to begin thinking about such issues is now.



Panel 1: Confidentiality, Security, and Access to Data.
Wednesday, 10:30 AM

Statistical Disclosure Limitation Methodology: An Introduction to Current Thinking

Stephen E. Fienberg, Carnegie Mellon University
Maurice Falk University Professor of Statistics and Social Science

For many years, pledges of confidentiality to respondents in censuses and surveys were interpreted in an absolute fashion and agency statisticians created conservative rules for disclosure avoidance which they believed would prevent disclosure of confidential information. During the past twenty-five years, the field of disclosure protection has undergone a "statistical transformation" and begun to utilize the advances that have occurred within the field of statistics itself. This talk provides an overview of the statistical issues that are related to the evolving area of statistical disclosure limitation methodology.

Confidentiality and Data Access - The Rationale for and Implementation of Policies for Restricted Access

Vigdis Kvalheim, Assistant Director
Norwegian Social Science Data Services

A basic requirement for high quality empirical research is access to high quality data. The potential scope of data available from academic and official administrative sources is enormous. Nevertheless, effective use of existing data is constrained by lack of access to these resources, particularly disaggregated data.

There are various reasons for this; the most important in this context are the legal requirements restricting data dissemination.

Statistical agencies as well as academic data archives have two main options for protecting the confidentiality of released data: providing restricted data or providing restricted access. The first option entails restricting the content of the data sets or files to be released. The second entails imposing conditions on who may have access, for what purpose, and so forth.
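To make the first option ("restricted data") concrete, the sketch below shows the kind of content restriction an archive might apply to a microdata record before public release: dropping direct identifiers, top-coding a sensitive value, and coarsening exact age into bands. The field names, the ten-year bands, and the income threshold are hypothetical choices for illustration only; they are not taken from the paper or from any agency's actual release rules.

```python
from typing import Any

# Hypothetical settings, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}
INCOME_TOP_CODE = 150_000


def restrict_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a restricted copy of one microdata record for public release."""
    # Drop direct identifiers entirely (builds a new dict; the input is untouched).
    released = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Top-code extreme incomes so unusual values cannot single out a respondent.
    if "income" in released and released["income"] > INCOME_TOP_CODE:
        released["income"] = INCOME_TOP_CODE
    # Coarsen exact age into ten-year bands, e.g. 47 -> "40-49".
    if "age" in released:
        low = (released["age"] // 10) * 10
        released["age"] = f"{low}-{low + 9}"
    return released


print(restrict_record({"name": "A. Smith", "age": 47, "income": 250_000, "region": "West"}))
```

The second option ("restricted access") leaves the record intact and instead controls who may see it, which is why the two approaches are often combined.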

This paper argues for treating research as a special case, and thus for developing models for restricted access as instruments to remove some of the barriers to data access. The emphasis is on national legislation as a key barrier to data access, and on infrastructures for data sharing within the academic as well as the public sector. Keywords: self-regulation, partnership and co-operation as long-term strategies for the research community in the future 'battle' between privacy and access.

Protecting Confidentiality in Archival Data Resources

Dr. Christopher Dunn, Assistant Archival Director
Institute for Social Research

Responsibility for maintaining the confidentiality of research subjects is a shared venture, involving both the social scientists who collect data and the archival repositories that preserve and disseminate research data collections. This paper will discuss ways in which confidentiality can be violated, as well as practices at the ICPSR archive to prevent such occurrences (including changes made to data files before their release to the public). Means of providing special access to confidential information will also be described. Finally, we will explore other alternatives for protecting confidentiality, and discuss research trends that pose problems for maintaining respondent confidentiality.



Panel 2: Issues and Problems Confronting Decentralized University Computing Units and their Relationship with Central Computing Organizations
Wednesday, 10:30 AM

A Cooperative Mode of Organization for Social Sciences Computing: A Case Study

Nancy McDermott and Tom Flory
University of Wisconsin, Madison

As a target, the most workable scale of a computing organization for the social sciences is hard to hit. Institutional policy on research computing support may be prescriptive or laissez-faire; it may favor a centralized utility or every department for itself. At the same time, technical innovations have from time to time suddenly changed the terms of the efficiency comparison between distributed and centralized services.

If social scientists beyond the boundaries of departments perceive a community of need for a common computing environment -- hardware, software, networking services, user support -- what kind of organizational structures will permit a proper degree of attention to their special needs, achieve a critical funding mass, and even be flexible in adapting to changing scale requirements? This paper looks at a solution adopted by the Social Science Computing Cooperative at the University of Wisconsin-Madison.

Decentralization and the Onslaught of Technology

Tom Phelan
University of California, Los Angeles

Over the last decade, the role of central computing organizations on campus has come under scrutiny. Due to the rapid increase of desktop computing capabilities, and in response to a perception that local computing, data access, and help desk needs were not being adequately met by central computing units, smaller local computing groups have proliferated at many American universities. At the beginning of this evolution, many central groups ignored the development of these decentralized units, and were content to see the workload generated by PCs, mini-computers, and data management offloaded to other groups. As in-demand technology itself becomes increasingly decentralized, however, the role of centralized units, and the division of computing resources within the campus, is being closely examined at many universities. The relationship between central and non-central units is still evolving and has become increasingly complex.

A Computer Center Decenter(ral)ized: A Comparative Perspective on Social Science Computing at Washington University and the University of North Texas

Jonathan Rapkin, Washington University
Karl Ho, University of North Texas

This paper offers insight into the way social scientists' computing needs are met by comparing two different models. In our experience, many of social scientists' specialized needs revolve around methods for providing statistical support to students and faculty, as well as the dissemination of data. The two models contrasted represent centralized and decentralized ways of providing such resources. In both instances, these models were established as byproducts of the evolution of computing in general at our respective universities and

At the University of North Texas, such services are provided through a centralized office responsible for providing them campus-wide. At Washington University, on the other hand, such services are provided by several small facilities: each college is responsible for its own computing needs, hence the emphasis on decentralized facilities. The key advantages of the centralized model include improved compatibility due to standardization; improved access in terms of hours and who can use the facility; improved manageability in terms of budgeting, policy, technical administration, etc.; the ability to reach critical mass in providing a high level of service to all departments; and the leverage to obtain volume discounts on software. The key advantages of the decentralized model include a sense of ownership among the department(s), the ability to meet discipline-specific needs, and a convenient gathering place that furthers interaction among graduate students and faculty.

At this point in time we do not advocate either model. Each has clear advantages and disadvantages which must be taken into account when planning the best means of meeting the discipline-specific needs of social scientists within the administrative framework of each university. We outline the pros and cons of the two models and explore how such models might be adapted in the future to better serve the evolving needs of social scientists.

Adapting to Scale: A Framework for Decision About Quality University Computing Services

George Yates
Northwestern University

In my experience, preferential use of small computers instead of mainframes at large academic and research universities has not, so far, ushered in an era of distributed university computing; instead, it has ushered in an era of small university computing. In this setting, an explosion of personal productivity in many areas is accompanied by an explosion of human, budgetary, and technical management issues that resist effective resolution. This talk observes that managing capital equipment, which is a characteristically difficult area for many universities, is a minor problem compared with managing expert computing staff and quality software. Managing the latter is a matter of managing content rather than infrastructure; hence its management issues suffer the vagaries and complexities of intellectual diversity, much like the problems of managing scholarship itself.

How far apart scholarly and computing technology initiatives are is a key question for almost all university citizens. This talk offers a framework for resolving many issues without getting lost in the details of intellectual content or method. In providing quality services, the specializations of each computing service provider are inherently defined and limited by the scale of its service mission. Understanding the properties of scale and the scaling characteristics of many network services and then applying that understanding to define clear service expectations can capture some economies of scale, inform and control discontent, and serve scholarly diversity at traditional levels, while providing a challenging environment to key technical staff at all levels.



Panel 3: Social Science Software and Services: Research & Development Issues for the Entrepreneur
Wednesday, 10:30 AM

Panel Chairperson: Eric Lang (Sociometrics Corp)
Presenters: Eric Lang, Al Anderson, Jim Gilden, Edward Brent

Social science researchers, educators, and business persons are increasingly collaborating on the development of commercial software and services for social scientists, teachers, students, health professionals, business analysts, and statistical consumers of all levels of expertise. This panel will discuss a number of issues related to the development of social science software and services, such as: [1] Small Business Innovation Research (SBIR) grants, [2] Small Business Technology Transfer (STTR) grants, [3] the relationship between social science entrepreneurs and consumers, and [4] the trials and tribulations of social-science-based businesses.

Dr. Albert F. Anderson retired from the University of Michigan in 1996 after 25+ years as co-head of the computing support group at the Population Studies Center. He now works for Public Data Queries, Inc., a family-owned company, and continues to focus his efforts on making data accessible and meaningful to a broad range of users.

Eric L. Lang, Ph.D. is a Principal Research Scientist and Director of the Research Support Group at Sociometrics Corporation in Los Altos, California. He directed the development of Socionet, Sociometrics' commercial social science WWW server, and the Research Archive on Disability in the U.S. (RADIUS). His Ph.D. is in Social Psychology (Univ. of Michigan) and his interests include data archive development/services, the Internet, and social science methodology.

Mr. James Gilden is the Electronic Products Administrator for Scolari (Sage Publications Software), which markets and distributes several qualitative and quantitative social science computer programs, such as the Methodologist's Toolchest and QSR NUD*IST.

Dr. Edward Brent is Professor of Sociology, Adjunct Professor of Computer Science, and a member of the board of the Laboratory for Applied Expert Systems Research (LAESR) at the University of Missouri - Columbia. He received his Ph.D. in sociology from the University of Minnesota in 1976 and has taught at the University of Missouri - Columbia since January of 1976. He is also president of The Idea Works, Inc., an information technology company specializing in the development and publication of expert systems for business, industry, research, and human services.



Concurrent Session 1A: Designing For Intelligent Data Access
Wednesday, 1:30 PM

A Proposal for a Virtual Digital Data Library

Cavan Capps
US Bureau of the Census

This talk is a proposal for a Virtual Data Library based on current low cost Internet technology. Such a data library would provide data from sources physically distributed across Federal, State and Local governments. Common access tools for searching, documenting, tabulating, graphing, and mapping would be provided to any data source participating in the system.

Cavan Capps has worked with databases and networks for over 17 years. Throughout his career he has worked on data-related issues as an economist and as a software engineer. Currently he is the project manager of the Data FERRET component of the Census DADS effort. Data FERRET provides intelligent Internet access to many survey record databases, including the Current Population Survey, the Survey of Income and Program Participation, and the Health and Nutrition Examination Survey.

The role of the Web in the provision of national data and information services: The MIDAS experience

Julia Chruszcz
University of Manchester Computing

MIDAS is a JISC-designated national data centre for the UK higher education community, providing on-line access and support for a range of large and complex datasets, such as censuses, surveys and time series databanks. In this context, MIDAS is part of the developing JISC-funded National Distributed Electronic Resource, which seeks to promote and extend access to electronic information and services to the entire UK higher education community. The expectations of users have changed considerably, and we have had to rethink how we deliver data and information to the researcher's desktop. It is not enough to promote awareness of the data resources and their potential applications in teaching and research. We also have to convince users that their time is being used efficiently, and that they can easily identify the data they want, extract it, and put it into a suitable format for secondary analysis. For us this means creating appropriate interfaces for the data: simple enough for a once-off selection and versatile enough for more sophisticated use. This paper addresses the influence of the Web and the expectations of its users on the services provided by MIDAS. We shall describe some of the new interfaces to data and information which will be of particular interest to social scientists, in both research and teaching.

Julia Chruszcz is Head of National Services at Manchester Computing and is responsible for the strategic management of the MIDAS national datasets service and other national computing services provided by Manchester Computing for the UK higher education community.

Gesine: Integrated retrieval on heterogeneous social science databases via the World Wide Web

Peter Mutschke, Marcus Schommler, Siegfried Schomisch, Udo Riege, Juergen Krause
Informationszentrum Sozialwissenschaften

In the age of the World Wide Web, the integration of distributed heterogeneous databases is still an unsolved problem. Moreover, especially in the field of social science, the global information market leads to an increasing need for high-value, complex information on social science research and on the structure of particular research fields.

The aim of the GESINE project at the social science information center (Bonn) is to develop a retrieval system for the World Wide Web allowing integrated access to several social science databases of the Gesellschaft Sozialwissenschaftlicher Infrastruktureinrichtungen (GESIS), which offers German-language social science information services to the scientific community.

The specific goal of this project is finding suitable retrieval methods and presentation styles for very heterogeneous material, e.g. bibliographical records on literature and research projects, survey data and texts. Particularly concerning social science information, there is still no concept of integration of text and data during information retrieval. Therefore, one of the major tasks of the project is the evaluation of modern indexing and ranking methods and, finally, the implementation of an adequate domain-related retrieval model maintaining large social science fact databases and text corpora.

The technical basis of the prototype implemented so far is a relational database (Oracle) which, through its text retrieval facilities (Context option), allows a combined search in structured and unstructured records. By means of the Oracle WebServer we are able to provide direct access to the Oracle database through the World Wide Web. The WebServer technology enables dynamic generation of HTML documents in response to a given database request. The prototype presented offers a differentiated search for social science literature and project documents via the Internet.
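The idea of a "combined search in structured and unstructured records" can be illustrated with a single query that filters on structured fields (document type, year) while also matching a term in free text. The sketch below is a minimal stand-in using Python's built-in sqlite3 module and a plain LIKE substring match; it is not Oracle's Context text retrieval, which provides real indexing and ranking. The table name, columns, and sample rows are hypothetical.

```python
import sqlite3


def combined_search(conn: sqlite3.Connection, doc_type: str, min_year: int, term: str) -> list[str]:
    """Filter on structured fields and match a term in unstructured text in one query."""
    sql = (
        "SELECT title FROM records "
        "WHERE doc_type = ? AND year >= ? AND abstract LIKE ? "
        "ORDER BY year DESC"
    )
    # Parameters are bound, not interpolated, so user input cannot alter the SQL.
    return [row[0] for row in conn.execute(sql, (doc_type, min_year, f"%{term}%"))]


# Hypothetical sample data: mixed literature and project records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT, doc_type TEXT, year INTEGER, abstract TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        ("Youth unemployment in Germany", "literature", 1996, "A study of labour market entry."),
        ("Labour market panel", "project", 1997, "Ongoing panel on labour market mobility."),
        ("Voting behaviour 1994", "literature", 1995, "Survey analysis of federal elections."),
    ],
)
print(combined_search(conn, "literature", 1995, "labour"))
```

A real system would replace the LIKE clause with a proper full-text index so that results can be ranked by relevance rather than merely filtered.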

The Virtual Data Center Development Project

Micah Altman, Assistant Director
Harvard-MIT Data Center

We propose to develop an instrument to manage and share numerical data easily throughout the university and beyond. This project will refine and extend the prototype data server developed by the Harvard-MIT Data Center and turn it into a free, portable software product that will seamlessly integrate with other data centers and library databases (intra- and inter-University) by supporting a variety of communication and interoperation protocols. By providing a portable software product that makes the process of data sharing automatic, the proposed "Virtual Data Center" (VDC) will solve problems for individual researchers -- when they try to locate, subset, convert, analyze, and otherwise use quantitative data; for university data centers and large data archives -- when trying to store, distribute, and permanently archive data, and by automatically creating a widely accessible "union catalog" of information (metadata) about the holdings of all Virtual Data Centers worldwide; and for the wider scholarly community -- by making it easier to make data publicly available and potentially capturing on the public record a much larger fraction of the data produced by all scholarly activities. The VDC will make data easier to find, share and preserve throughout the field, and will assist the development of original research and the replication and extension of previous research results. It will reduce the costs and increase the benefits of data sharing for users and producers of data.

The VDC will be a free, open, and portable data server capable of running on any Unix or Windows system. It will seamlessly communicate with other databases by supporting a number of communication and query protocols, including SQL over HTTP, Z39.50, and the Stanford Digital Libraries Project's "Digital Library Interoperability" protocol, and by being able to import and export metadata in a number of standard formats, including MARC records. It will have modules for emerging standards for generalized markup languages for codebooks. Our project is designed to be extended through snap-in modules for many activities, such as web-based GIS applications, extended password authentication and encryption, and electronic commerce for proprietary systems.

A prototype of the VDC is now in use by the Harvard-MIT data center (http://data.fas.harvard.edu/), and has greatly accelerated data-based research and teaching within Harvard University and MIT.



Concurrent Session 1B: Impact of Technology on Privacy and Community
Wednesday, 1:30 PM

On Cultural Lags and Communications Technology

Dana Fisher
University of Wisconsin-Madison

The Internet continues to grow at an exponential rate. With this growth, organizations are discussing the positive and negative implications of Internet usage both on a personal level and in the workplace. After the defeat of the Communications Decency Act, the issue of regulation of the newest form of communication technology continues to threaten the diffusion of the technology. This paper contextualizes some of the main issues of discussion regarding the Internet. By applying an adjusted version of William Ogburn's theory of cultural lags (1964), I analyze the diffusion, regulation, and eventual socialization of communication technologies. This lag explains some of the most seriously debated potential implications of Internet diffusion: privacy, community and democratization. The paper will look at the diffusion of the telephone, television and finally the Internet in order to frame it as the newest in a series of communication technologies.

Dana R. Fisher is a sociologist who specializes in both the implications of communications technology and sustainable development. Presently, she is editing International Organizations and the Internet: the United Nations in the Next Century for United Nations University Press. She has designed and implemented international networks focusing on the environment and security in Asia while serving as Researcher/Program Coordinator at the Nautilus Institute for Security and Sustainable Development. During her tenure at the Institute, she wrote and presented her research on the Information Age and the growing Global Information Infrastructure (GII) around the world. Her current research looks at both the concept of sustainability and the implications of new communication technologies. Prior to her work at Nautilus, she served as an energy lobbyist and computer system administrator for environmental NGOs in Washington, DC. She has done extensive research on the Japanese environmental movement in Japan and the United States. She is currently in the PhD program in Sociology/Rural Sociology at the University of Wisconsin-Madison.

Data Protection and Privacy in the United States and Europe

Juri Stratford and Jean Stratford
Government Documents Librarian, Shields Library and Director of Research Services, Institute of Governmental Affairs

The rapid expansion in electronic communications and commerce over the past several years has raised concerns in the United States over personal privacy in an online environment. These concerns have captured the attention of the public, the media, and policy-makers, and there is new interest in the United States in explicit policies protecting the privacy of electronic transactions and personal information. In the fall of 1997, the Clinton administration announced plans to pursue legislation protecting the privacy of personal medical records. This proposal continues a pattern of multiple policies, directed at subject-specific information or at various levels of government (e.g. both federal and state legislation). Our paper provides an overview of U.S. legislation and current initiatives and contrasts this with the European approach. In Europe, there is legislation at the national and regional (supra-national) level which recognizes privacy as a basic human right and provides a framework for protecting and providing access to personal data of all types. Special attention will be given to the policies as they relate to access to personal data for research purposes.

Data Protection in the United States

Thomas Brown
U. S. National Archives and Records Administration

The paper outlines the current state of data protection laws in the United States, i.e. those laws which impose requirements and restrictions on the collection, maintenance, and use of personally identifiable data by non-governmental organizations and businesses. The United States has traditionally taken a laissez faire approach to data protection, imposing restrictions in only a limited number of circumstances. Advancing technology, however, may demand that the United States expand the scope of data protection. In this regard, several initiatives, especially in the area of protecting personal medical information, are under consideration. Such initiatives are two-edged swords in that they may pose potential dangers for secondary data use while not addressing the dangers to personal privacy.

Web Access to the Data Warehouse: Is It Worth It?

Pat Hildebrand
University of Pennsylvania

Data warehouses are currently very popular. They also pose many challenges. Two of the major challenges to using them are who has access to what, i.e., security, and how to make that access available.

Using the Web seems to be one of the most popular means of accessing a warehouse. Because the Web is generally accessible, this is both a plus and a minus. It lets users reach the warehouse without special database software, which can make the warehouse accessible to essentially everyone. Yet a warehouse may hold some information that only a very limited number of people should see alongside data with a much wider user base, so there must be multiple levels of access.

Dynamic web pages which only allow for warehouse access based on the result of a security procedure are one way in which the security issue might be handled. This is how the School of Arts and Sciences at the University of Pennsylvania is handling the question of who has what access to the student records warehouse.
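One way to picture "multiple levels of access" behind a dynamic page is to map each authenticated role to the set of fields it may see and to filter query results accordingly before the page is generated. The sketch below assumes that a separate security procedure has already established the user's role; the role names, field names, and sample row are hypothetical and do not describe Penn's actual rules.

```python
# Hypothetical access levels: each role sees only the listed fields.
ACCESS_LEVELS = {
    "public": {"course", "section", "enrollment"},
    "faculty": {"course", "section", "enrollment", "student_id", "student_name"},
    "registrar": {"course", "section", "enrollment", "student_id", "student_name", "gpa"},
}


def visible_rows(rows: list[dict], role: str) -> list[dict]:
    """Return copies of warehouse rows containing only the fields the role may see."""
    allowed = ACCESS_LEVELS.get(role)
    if allowed is None:
        # Deny by default: an unrecognized role gets nothing.
        raise PermissionError(f"unknown role: {role!r}")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


rows = [{"course": "SOCI 001", "section": "301", "enrollment": 42,
         "student_id": "123", "student_name": "Doe", "gpa": 3.5}]
print(visible_rows(rows, "public"))
```

Denying unknown roles by default, rather than falling back to the public view, is a design choice that keeps a misconfigured login from silently exposing data.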

However, this is far from the complete answer. Users have different setups, so what one user sees on a single screen another may have to scroll to see. Users also act differently by instinct and read instructions differently.

The security requirements impose some restrictions on the browser, but access is still much easier than a special application would be for the average faculty member checking class enrollment after preregistration, whether to issue permits or to prepare for the coming semester. Java has also increased the options available via a web page.

The database engine used for this warehouse is Oracle. Because the security requirements called for a different web server than the one available under version 2 of the Oracle WebServer, development has been done under version 1. Even so, what can be done with the Oracle WebServer indicates that using the Web to access databases in general, and warehouses in particular, is pushing the Web well beyond its original implementation.

Pat Hildebrand is a member of the team responsible for Web access to the University of Pennsylvania's student records data warehouse, originally for the School of Arts and Sciences but since expanded to include Engineering. Her more traditional data responsibilities, from an IASSIST point of view, include serving as Penn's official representative to ICPSR.



Concurrent Session 1C: Establishing and Managing Data Resource Centers.
Wednesday, 1:30 PM

A Digital Archive for New Jersey Environmental Data

Ronald C. Jantz and Linda Langschied
Rutgers, The State University

Traditionally, librarians have organized and provided access to print information sources and provided the necessary user training to effectively use information tools. As we enter the age of digital libraries, this mission and service orientation offers new opportunities and challenges to provide access to information that has been relatively inaccessible.

This paper will describe a project undertaken by the authors at Alexander Library, Rutgers University, with grant funds provided by the New Jersey Department of Environmental Protection, and in collaboration with Rutgers University's Ecopolicy Center. Our challenge was to provide a single source of access to the vast amounts of environmental information, both digital and non-digital, that are created by government agencies, consultants, and non-profit organizations. Examples of this information include "fugitive" or "gray" literature, elusive master's and doctoral theses, and digital maps with associated data layers that have been created with GIS tools. The dispersed environmental documents of a state can offer a wealth of information to its citizens, policy-makers and institutions, but only if the information can be made more readily accessible.

This project and the prototype database demonstrate not only the new roles that librarians are undertaking, but also the new tools that can be used in digital libraries. Our challenges in this project were to:

Use of standards, advanced computer tools, off-the-shelf software, and marketing techniques will be discussed as new areas of opportunity for librarians. Also, the authors will discuss the importance of establishing partnerships to implement digital library initiatives, to attract funding, and to assure successful project outcomes through the collaborative efforts of interested parties across the state.

Ron Jantz is the Data Librarian at Alexander Library, Rutgers University. Linda Langschied is Information Services Librarian at Alexander Library.

Project EconData: Dutch Data Service for Economic Data

Albert Bots
Netherlands Institute for Scientific Information Services

In July 1996 NIWI's Steinmetz Archive started the EconData project to establish a Dutch data service for economic data. This service will be integrated with the current activities of the archive. EconData builds on previous feasibility studies conducted by the Economic and Social Institute (ESI) in Amsterdam and the Economic Institute Tilburg (EIT). Both of these studies were funded by the Netherlands Organization for Scientific Research (NWO). For EconData the Steinmetz Archive receives additional funding from NWO. This grant follows a recommendation by the Social Science Council (SWR) of the Royal Netherlands Academy of Arts and Sciences (KNAW). EconData aims to broaden the scope of the Steinmetz Archive. New services will be established to support economic research, including macro-economics, business economics and economic modeling. In addition to the more traditional functions of a data archive, EconData puts strong emphasis on data brokerage. The data service will act as an intermediary between suppliers of economic data and data users. This will include suppliers of international data and users of Dutch data abroad. For this purpose the project plan includes the establishment of an online register of available data sets, irrespective of whether these data sets are available from the Steinmetz Archive or from other sources. EconData will be evaluated in the summer of 1998. The paper first describes the background of the project, including the main results of the preliminary studies. Next, the project plan and its corresponding activities will be described and the results so far presented. Further remarks will address specific topics such as the attitude of data owners towards the registration of their data files and the use of existing data sets for education and training.

Albert Bots is the project manager of EconData. He is also a lecturer in the Department of Economics at the Free University, Amsterdam, where he teaches business modeling and information systems. Albert Bots is an econometrician and one of the authors of the aforementioned EIT feasibility study.

It's Science, Jim ...But Not As We Know It!

Paul Rouse
Economic and Social Research Council, UK

The role of national funding bodies such as the Economic and Social Research Council in the United Kingdom is clearly to support the highest quality social science research. To do this, however, we must make good provision for our researchers to have access to the highest quality electronic and other resources. Under the conference theme Expanding Roles of Data Libraries, I would like to explore the role and contribution of funding agencies and the relationships between the funding agencies and the resource centres' host institutions, normally universities. Universities, of course, have a vested interest in promoting and supporting the highest quality research to secure and enhance their budgets. What, however, is their attitude toward supporting data facilities, often at least partially with their own funds? How should the ESRC and other funding bodies try to help universities, data users and beneficiaries develop and support high quality data resources?



Concurrent Session 2A: Data Access Systems and Metadata Structures.
Wednesday, 3:30 PM

Data Dissemination in an Electronic World: The Essential Role of Metadata

Ernie S. Boyko
Library and Information Centre, Statistics Canada

Statistical offices have long recognized the importance of metadata as a way of identifying which data and information are available and as a means of informing users about the methods, sources and concepts underpinning them. In a time when most information was disseminated on paper, metadata systems consisted of catalogues, indexes, technical appendices, and user guides.

The increasing use and popularity of electronic media and systems has led to a growing demand for more detailed data in a form that can be manipulated. Low-cost storage and delivery tools have led to large volumes of data being presented to users. In addition, technological advances make it possible to disseminate more complex data, such as public use microdata files, to a broader audience. Finally, innovative programs such as Canada's Data Liberation Initiative have brought data traditionally available only to "expert" users into mainstream research and teaching.

All of these changes, taken together, make enhanced metadata an invaluable component of the data dissemination activity. This evolving environment has prompted Statistics Canada to initiate a major metadata project. This paper will outline the aim of the project, its scope, and the approaches being evaluated. In particular, it will concentrate on innovative approaches to collecting and integrating metadata within Statistics Canada in order to facilitate the use of data not only by the traditional expert user communities but also as a mainstream tool for teaching, research and public access.

The impacts of the electronic explosion have been felt well beyond the work of statistical agencies. This has prompted metadata projects and approaches in other domains. This paper will identify some relationships between the metadata work of statistical agencies and initiatives in other areas, especially libraries and social science data archives. The ultimate question in this regard is whether or not there are emerging global metadata standards for finding, evaluating, using, and managing information. And, if there are, how does the work of statistical agencies relate to these emerging standards?

Implementing a Statistical Metadata Repository at the U.S. Census Bureau

Daniel Gillman, Samuel N. Highsmith, Jr., Martin V. Appel
US Bureau of the Census

This paper describes the results of continuing research at the U.S. Bureau of the Census (BOC) into the content, design, population, query, maintenance, and implementation of a statistical metadata repository and the tools to use it. The goals of the research are many, but the ultimate goal is to create a production statistical metadata repository and the associated tools for the agency.

In support of this goal a multi-dimensional effort has been launched. Its major parts include the development of detailed models describing the content and organization of a statistical metadata repository; building an agency standard for statistical metadata; development of tools for the collection, registration, and query of metadata; and the integration of a repository into other statistical information systems. This paper will briefly describe the models and the BOC statistical metadata standard.

Collecting the metadata to populate a repository is not easy. Survey designers and analysts often create metadata only as an afterthought. When asked about the importance of metadata, designers and analysts always say that it is important; then they say they don't have the time or resources to enter it into a repository. Effective tools will allow them to enter metadata without appreciable extra effort. Success is achieved when the users of the repository perceive it as an indispensable part of their work.

Metadata repository tools are divided into several types: population (or collection), registration, crosswalk, maintenance, and query. This paper will focus on population, registration, and crosswalk tools. Population or collection tools allow the user to enter metadata into the repository; they can be batch loading tools for entering many records at once, or they can be interactive. Each type of tool can gather the information common to all objects in the repository, a process called registration. Registration allows users to view the repository as a card catalog, and special rules need to be in place for registration to work properly. Crosswalk tools allow users to view or capture metadata in several different formats, especially the formats specified in metadata standards such as GILS, FGDC, and DDI.
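To make the registration and crosswalk ideas more concrete, here is a minimal Python sketch. It assumes a hypothetical registration record with four common attributes (`name`, `definition`, `steward`, `registered`) and a hypothetical mapping onto Dublin Core element names. Neither the field list nor the mapping comes from the BOC standard, GILS, FGDC or DDI, and the function names `register` and `crosswalk` are invented for illustration.

```python
# Hypothetical registration attributes common to every repository object.
# These field names are illustrative assumptions, not the BOC metadata standard.
REGISTRATION_FIELDS = ("name", "definition", "steward", "registered")

# Hypothetical crosswalk table: internal field -> Dublin Core element name.
# Only the Dublin Core element names are real; the mapping itself is assumed.
TO_DUBLIN_CORE = {
    "name": "title",
    "definition": "description",
    "steward": "publisher",
    "registered": "date",
}


def register(record: dict) -> dict:
    """Accept a record only if it carries every common attribute; keep just those."""
    missing = [f for f in REGISTRATION_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"registration incomplete, missing: {', '.join(missing)}")
    return {f: record[f] for f in REGISTRATION_FIELDS}


def crosswalk(record: dict, mapping: dict) -> dict:
    """Re-express a registered record using another format's element names."""
    return {target: record[source] for source, target in mapping.items() if source in record}


entry = register({
    "name": "Example survey, 1997",
    "definition": "Illustrative dataset description",
    "steward": "Example agency",
    "registered": "1998-05-01",
})
print(crosswalk(entry, TO_DUBLIN_CORE))
```

Keeping the crosswalk as a plain lookup table means that supporting a further format would, under this sketch's assumptions, only require another mapping rather than new code.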

The tools mentioned above are described in detail in the paper. The software used to tie the tools, the repository, and other systems together is also described. The complete package is still under development, but there are plans to hand it over to production staff soon.

Next Generation Tools for Data Dissemination - The Example of NESSTAR

Jostein Ryssevik
The Norwegian Social Science Data Services

NESSTAR (Networked Social Science Tools and Resources) is a joint project between the Norwegian Social Science Data Services (NSD), UK Data Archive and the Danish Data Archive (DDA). The aim of the project is to develop a common gateway on the Internet to the data holdings of several social science data archives in Europe. By means of NESSTAR, users all over the world will be able to:

The system will include advanced user authentication procedures to prevent unauthorised use of data.

The NESSTAR system builds upon the emerging documentation standard from the Data Documentation Initiative and will support the XML version of this standard. Tools to convert metadata from existing standards to the new one will be an integrated part of NESSTAR. The system is designed as a three-tier client-server application developed mainly in Java.

The paper describes the technical and organisational sides of NESSTAR and discusses some of the "political" consequences of such a system for the archive world.



Concurrent Session 2B: Technology in the Classroom
Wednesday, 3:30 PM

A Dialogue Between Technical Support Staff and Social Science Faculty: Implementing Effective Use of Technologies at Carleton College

Paula Lackie, Carleton College
Academic Computing Coordinator for the Social Sciences

This paper examines the various layers involved in integrating computer technologies into existing and newly developed social science classes. Much of the presentation will rely on case studies at Carleton College from 1994 to 1998. In the social sciences, for example, I have worked with faculty in Economics, Political Science, and Sociology & Anthropology, as well as in cross-disciplinary programs such as Educational Studies and American Studies. Their pedagogical needs range widely, from presentations on database management to web page construction, paperless classes, and simple software use on a variety of platforms (Mac, Win 3.x & 95, VMS).

The paper will discuss both the institution's response and my role as Academic Computing Coordinator in meeting the curricular needs of the faculty. As evidence I will use records of our technical development since the inception of our department (Academic Computing & Networking Services, FY 1993), as well as interviews with faculty in my division. Further, I will interweave my own developmental process with a discussion of how best to present new technologies and make use of old ones, including how I came to better understand each faculty member's goals for their classes and to distinguish the logical limits from the pedagogical limits of a proposed teaching strategy.

Paula Lackie has been the Academic Computing Coordinator for the Social Sciences at Carleton College since 1993 and has been a social scientist engaged in technical support since 1987.

Teaching Strategy and Assignment Design: Assessing Quality and Validity of Information Via the Web

Jean Shackelford and Dot S. Thompson
Bucknell University

In his review of Internet guru Paul Gilster's Digital Literacy, journalist John Moran observes that "unlike previous media, the Internet imposes new demands on readers to become their own editors and critics. Most information now available on the web comes devoid of clues as to whether it is true and unbiased. Those who master this new form of literacy will reap huge benefits from the news and background available on the Internet. Those who do not will remain awash in half-truths, outright deception, and fraud" (John Moran 1997). While Moran may be overstating the case, it is clear that, as educators, we need to help students more critically assess the sites they visit and the information and news they find on the web. Careful assignment design may help resolve the problem of how to help students assess and think critically about the material they find at a particular web site.

Critics such as philosopher David Rothenberg have pointed out that the web has reduced the "quality of the writing and the originality of the research papers." Rothenberg reported that his class had "fallen victim to the latest easy way of writing a paper: doing their research on the World-Wide Web" (David Rothenberg 1997). Although the web poses particular problems because of the ease with which students find appropriate information, the reality is that students need prompting to evaluate all information sources. To help students learn to evaluate and think more critically about the material they find, a structured three-part assignment was developed for a first-semester seminar. A librarian's perspective contributed to planning the assignment and to the formation of research skills. The first component required identifying on-line, web, or gopher sources: students had to find at least ten sources, describe the strengths and weaknesses of each site, and report the kind and amount of web information available on the topic. The second component was similar, except that all sources had to be traditional library sources. A concluding summary comparing the two approaches helped students recognize the importance of quality and the differences in the information available.

The results from this structured assignment, and potentially others, indicate that students feel comfortable with the task and learn how to distinguish reliable from unreliable information. Their interest in the topic stays high, and the level of research was good for first-semester students. We will discuss integrating instruction in web-based and traditional library resources and offer a model of assignment design.

Jean Shackelford is a professor of economics and associate editor of Feminist Economics. The fifth edition of her co-authored book, Economics: A Tool for Critically Understanding Society, was recently published by Addison Wesley Longman.

Dot S. Thompson is a reference librarian specializing in economics and management. She is also Bucknell University's designated representative for ICPSR.

Teaching with Technology: Lessons Learned from the First 2,000 Students

Edward Brent
University of Missouri/Idea Works, Inc.

This paper describes some of the problems and opportunities encountered in using technology in the classroom over a period of six years. During this period the author has continued to develop and refine a comprehensive program for teaching introductory sociology. This program uses a computer for classroom presentations and provides students with a CD version they can use on their own computers or in campus computing laboratories as a sociological laboratory and study guide. The program has evolved from a disk-based DOS program on IBM PCs to a multi-platform CD and a web course. Versions of the program to accompany nine different sociology textbooks have been published by three different publishers. The presentation will focus on common problems and strengths of using technology that are likely to apply to other disciplines as well.



Concurrent Session 2C: Digital Libraries and User Support
Wednesday, 3:30 PM

The History Data Service - Using Technology to Enhance Access

Sheila Anderson, Cressida Chappell and Oscar Struijve
The Data Archive, University of Essex

The Data Archive employs new technologies in an innovative and user-sensitive way to enhance and improve access to its resources. Within the Archive, the History Data Service is developing a user-centered, needs-driven programme to enhance and increase access to its collection of historical data materials through innovative use of the potential of the World Wide Web. The core aim of this work is to develop a multi-level system which provides web access to as wide a range of material as is possible within limited resources. We aim to:

We are confident that this strategy will encourage and enhance use of and experimentation with the collection. This paper will describe the work the HDS is undertaking in this area, including a discussion of the selection of the materials for inclusion in this system.

On-Line Technology for Enhanced Secondary Analysis of Public Opinion Survey Data

Rich Clark
Roper Center for Public Opinion Research

Online services are becoming increasingly important not only for access to survey research data but also for their secondary analysis. I will review current online public opinion sources and discuss the impact that online technologies are having on the way secondary analysis of survey research is currently performed. Next, I will present a model for where online services can go in the future given the technology that is available today. I believe that the Internet is currently under-exploited for its capacity to aid secondary analysis. On that note, I will examine the potential of making survey data more easily available online to all potential users. This entails varying the format and depth of data so that users find sources suitable to their needs. It also entails the use of desktop technology to store and analyze survey research data, and making that technology, or the applications developed with it, available to other users via computer networks, primarily the Internet.

Global Access to Data Resources: Where's the Metadata?

Mark A. Carrozza & Steven R. Howe
University of Cincinnati

The University of Cincinnati has recently purchased an HP 330FX Optical Storage Jukebox to store its social science data collection. The Jukebox, with 330GB of direct online storage, is connected to a Windows NT file server that provides both Novell NetWare and Microsoft Networking access to university researchers through the campus' wide area network. The same data sets are available through FTP clients and WWW browsers on the Internet.

While the UC system compares very favorably with almost any other resource for access to secondary data, it offers a dramatic example of how software resources to facilitate access to secondary data continue to lag behind hardware innovations. As storage devices and direct access methods such as these become more commonplace, data archivists must contend both with increased variation in access methods and with variation in the bibliographic reference material (metadata) available. This paper addresses these concerns.

The combination of increased storage capacity, the low cost of both hardware and software, and dramatically improved global access via the World Wide Web has made the desktop computer the environment of choice for all but a few social science researchers. Data archives have had to respond by making data more accessible for the desktop computer users.

Ten years ago, data users sat at terminals or desktop computers connected to mainframes by slow modems and submitted batch jobs that involved mounting tapes. Five years ago, the same users purchased PCs with CD-ROM drives and began to access data on CD-ROMs. Too often, however, these CD resources were merely copies of the original mainframe tapes.

Application software resources for data management and analysis have also improved. While SPSS and SAS are still widely used, there is a wide variety of additional application software, ranging from spreadsheets with a far greater range of capabilities than just a few years ago, to vastly simpler programming languages (e.g., Visual Basic), to well-designed, easy-to-use relational database management systems, to utilities such as DBMSCopy.

Throughout the decade-long movement from mainframe tape to local CD-ROM to massive online storage available via global networks, the development of bibliographic reference material (or metadata) for archived studies has consistently lagged. Foremost in archivists' minds is the use of metadata both for creating comprehensive catalogs of data holdings and for creating user-friendly interfaces to data.

Five years ago, the 'state of the art' was represented by examples such as the U.S. Census Bureau's 'GO' and EXTRACT software, NHIS's SETS, and custom extract software for studies such as the NLS. Since then there have been improvements in both cataloguing procedures and the machine-readable documentation available to social science researchers, but little has changed in the availability of programs for generating and packaging metadata that meet the needs of the social science research community.

Mark A. Carrozza, M.A. is the Data Manager and Network Administrator for the Institute for Policy Research at the University of Cincinnati and Director of the UC Southwest Ohio Regional Data Center. Data management responsibilities at UC include data acquisition and archiving, and training in the use of secondary data for research and instruction.



Concurrent Session 3A: Locating and Linking Diverse Information Resources
Thursday, 9:00 AM

Metadata for datasets as Digital Information Objects of Desire: Identifiers as the Linchpin in the Chain.

Peter Burnhill, Edinburgh University
Director, EDINA & Head, Data Library

The main purpose of this paper is to argue the case for ISSN-based identifiers for social science datasets. As may be implied, this case has been built on experience working in a project to achieve 'co-operative action on serials and articles' (CASA: a European Union Telematics for Libraries project). The argument will be presented within a schema for metadata standards for inter-operability: descriptors, identifiers, classifiers, locators, formats and transport protocols. It will be argued that there are four important 'demand verbs' in the information economy: discover, locate, request, access. For cost-effective transition along this chain there must be agreement on the identity of the object being sought, invoking the verb 'verify'. The system of identifiers used for journals and other periodicals will be examined, and a proposal made based on it and on two other schemes: the S.I.C.I. (Z39.56) and the D.O.I. Both of these schemes are promoted by commercial publishers, and part of the motivation for this presentation is to ask what is required to ensure that the digital library being built for our knowledge industry can protect itself from the unwanted consequences of the global information economy.
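The 'verify' step depends on identifiers that can be checked mechanically. As one illustration, the ISSN carries a modulus-11 check character: the first seven digits are weighted from 8 down to 2, and a check value of 10 is written as `X`. The Python sketch below shows only that check; the helper names are hypothetical, the sample ISSNs are used merely to exercise the arithmetic, and nothing here describes how an ISSN-, SICI- or DOI-based dataset identifier would actually be structured.

```python
def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character for the first seven digits."""
    if len(first_seven) != 7 or not first_seven.isdigit():
        raise ValueError("expected exactly seven digits")
    # Weights run from 8 down to 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def verify_issn(issn: str) -> bool:
    """Return True if an ISSN written as NNNN-NNNC has a consistent check character."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8 or not compact[:7].isdigit():
        return False
    return issn_check_digit(compact[:7]) == compact[7]


print(verify_issn("0378-5955"))  # True
print(verify_issn("0378-5954"))  # False: last character altered
print(verify_issn("2434-561X"))  # True: check value 10 is written as X
```

A check character catches most single-digit transcription errors before a request is ever sent, which is one inexpensive way to support agreement on the identity of the object being sought.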

Metadata both matters and depends upon metaphor. Throughout our history as IASSIST, our members have sought, on behalf of users of research data, to realise dreams of finding aids on 'what exists' and union catalogues of 'who holds what'. During the first ten years of that history, in attempting to reconcile the testing demands of ill-published products from the research process and almost continuous changes in computer dependency, we drew on the metaphor and language of 'bibliographic control', inherited from the library profession and variously mixed with insights from the archival profession and from social science. During the most recent ten years, often through the participation of IASSIST members in activities outside social science datasets, we have been grappling with the meaning of metadata and the demands for inter-operability within the wider context of the growth of the Internet and the global information economy. If we regard the serial as a complex information object, in which the real information object of desire is the serial article, or even the information object contained therein, then this may provide the metaphor we require to identify datasets. This could provide part of the information infrastructure we require for our 'virtual' union catalogues and for our finding aids.

Interoperability - Just More Jargon or a Whole New World for the Arts and Humanities?

Sheila Anderson
University of Essex, Data Archive

The establishment of the Arts and Humanities Data Service in the UK in 1996 heralded the introduction of a range of services long taken for granted in the social science community. Modeled as a distributed service, the AHDS Executive, along with its five service providers, provides a full range of data library/data archive services in the areas of history, performing arts, visual arts, literary and linguistic texts, and archaeology. Among the services to be offered is an integrated catalogue describing the astonishingly wide range of electronic resources available to the arts and humanities communities. The problem facing the AHDS was how to establish a catalogue that could take into account the diversity of materials and approaches inherent in the disciplines it serves and still produce a system that enables end-users to locate materials of interest easily and simply. This paper will describe the steps the AHDS has taken to ensure that its catalogue has the necessary interoperability without losing sight of the richness and diversity of the resources it seeks to describe, through an innovative and practical application of the Dublin Core for the content of the catalogue records and of the Z39.50 protocol for the technical side.

Can Library and Data Archive Meet in Active Support of Research in the Social Sciences? The Case of ILSES

Dr. R. E. de Vries, Researcher, Electronic Services,
Netherlands Institute for Scientific Information Services (NIWI)

Please see
http://www.niwi.knaw.nl

Data collected for empirical research have traditionally been stored on computer and distributed electronically by data archives and data libraries, whereas publications from the same research have been kept, referenced and made accessible by libraries. As content providers, data archives could not extend their services with relevant book and journal collections, cross-referencing and lending of printed material. Libraries, for their part, could not give access to data related to published research, nor did they have the means to extend bibliographic references to point to data as a machine-readable outcome of the research process. When data and books are referenced separately without consistent cross-linking, must be searched for in separate catalogues, and are made accessible by different authorities with different facilities, anyone embarking upon new research, or simply needing social scientific information, is affected. It is not possible to start with a general literature search in a library and easily trace publications back to the empirical research and collected data at their heart. Nor can data archive catalogues (even when expanded with bibliographies) help with book and article searches that start from particular data collection efforts. Properly linking data and publications would require metadata standards that take such relationships into account, together with coordinated efforts by authors (proper citation of data sources, or writing such metadata directly themselves), the library world (referencing with cross-links in new metadata formats) and the data archives (likewise referencing with cross-links). Part of those efforts would also have to be a common catalogue search facility, or some form of easy access from one catalogue to the information in the other.
Both World Wide Web techniques for linking electronic resources on the Internet and new metadata initiatives that explicitly hold linking information to related (electronic) resources have the potential to finally bring data and books together again for searching and retrieval.
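The two-way linking argued for here can be pictured as a forward index from publications to the datasets they cite, plus the inverted index from datasets back to the publications based on them. The Python sketch below assumes hypothetical catalogue identifiers and a simple in-memory mapping; it is not the ILSES design or any particular metadata standard.

```python
from collections import defaultdict

# Hypothetical catalogue records: each publication lists the datasets it cites.
# The identifiers are invented placeholders, not real catalogue numbers.
publication_cites = {
    "pub:article-1": ["data:survey-A"],
    "pub:article-2": ["data:survey-A", "data:survey-B"],
    "pub:book-1": ["data:survey-B"],
}


def invert_links(forward: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build the reverse index so each dataset points back to the works citing it."""
    reverse: dict[str, list[str]] = defaultdict(list)
    for pub, datasets in forward.items():
        for ds in datasets:
            reverse[ds].append(pub)
    return dict(reverse)


dataset_cited_by = invert_links(publication_cites)

# Library-side search: start from a publication and trace back to its data.
print(publication_cites["pub:article-2"])  # ['data:survey-A', 'data:survey-B']
# Archive-side search: start from a dataset and find publications based on it.
print(dataset_cited_by["data:survey-B"])   # ['pub:article-2', 'pub:book-1']
```

Under this sketch's assumptions, recording the citation once (for example by the author) is enough to derive both directions, which is why consistent data citation matters as much as the catalogue software.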

A few Internet-related projects that already demonstrate first attempts in this direction will be mentioned. ILSES (Integrated Library and Survey-Data Extraction Service), a system of tools and (Internet) facilities, will be discussed in more detail as a current project funded within the Library Programme of the European Commission that addresses the same goal of integrating publications and data. ILSES accommodates both content providers (libraries and data archives) and end-users. Finally, the paper addresses the relative strengths and weaknesses of ILSES and of the other models mentioned for achieving some form of integration, and attempts to look ahead at (near-)future scenarios, especially in the light of recent metadata developments.



Concurrent Session 3B: Impact of Technology on Libraries.
Thursday, 9:00 AM

Meeting the Needs of Academic Librarians in the Distribution of Electronic Social Science Data

Molly Petrick and Kathryn Murrell
Sociometrics Corp.

Providing access to electronic social science data can be a challenging task. For the past seven years Sociometrics has been offering electronic collections of data archives on CD-ROM and now on the web. The focus of our efforts has been to make electronic social science data as user-friendly as possible. This focus has produced promising results for the distribution of individual data sets and archives. However, when we assessed the distribution of our entire data collection, we realized we were missing an important audience: academic libraries. Distributing electronic social science data to academic libraries would, we believed, overcome the high cost of distributing to individual users and provide greater access to quality social science data. Although we had been marketing our entire data collection to academic libraries, we had received little response. We concluded that perhaps our end-user focus was not a selling point for academic librarians and academic collection development specialists. To test this hypothesis, we developed a "three-pronged approach" to survey academic librarians and collection developers in the San Francisco Bay Area. Using a combination of questionnaires, web surveys, and in-person interviews, we were able to determine many of the concerns of this population. The primary concerns of academic librarians included:

The results of our research have led Sociometrics to develop a Librarian Toolkit and specialized packaging for the distribution of our Electronic Data Library. The Librarian Toolkit is a multimedia overview of the features of the Sociometrics Electronic Data Library, including details about the studies contained in each archive (including the selection process and data preparation), descriptions of how specific groups use the data library, an instructional overview, and full citations of the studies for cataloging purposes. We also changed the packaging of the data library by replacing all paper documentation with electronic (pdf) versions, including quick reference sheets for distribution to library patrons; providing a cross-archive searchable index of all study abstracts for quick reference and selection; and developing a special campus-wide multi-user licensing agreement.

The California Digital Library: Implications for Data Files Collections

Daniel Tsang, University of California, Irvine

The recent creation of the California Digital Library at the University of California promises to radically change the delivery of electronic resources to library users in California. It aims at the "creation of a single statewide digital collection" to serve the University's information needs. While the initial focus has been on science and technology, the library expects to focus on social science resources in the future. This paper explores the implications of a system-wide electronic library for data file collections across the various UC campuses. One of the CDL's goals, however, is to reach out to the community, in collaborative efforts, to make electronic resources available to the public. How will this work in practice? This paper explores that and other questions and looks at collaborative models elsewhere that could guide the growth of the CDL as it eventually tackles social science data collections.

Daniel Tsang has been data files librarian and a social science bibliographer at University of California, Irvine, since 1986.

Data Liberation, Bridges to Cross

Richard Boily
University of Quebec at Rimouski

In Canada, the use of statistical data (microdata files and major databases) for teaching and research is an important and growing phenomenon that shows no sign of losing strength in the near future. This situation is certainly a major consequence of the Data Liberation Initiative (DLI), a partnership between Statistics Canada, several federal departments and Canada's academic community established in 1996. The idea of providing affordable access to Canadian information resulted from a cooperative effort among the Humanities and Social Science Federation of Canada (HSSFC), the Canadian Association of Research Libraries (CARL), the Canadian Association of Public Data Users (CAPDU), and the Canadian Association of Small University Libraries (CASUL). Less than two years after its launch, more than 50 universities have joined the consortium, a clear indication of a real willingness to make data more available.

This situation illustrates the fact that high costs were an obstacle to data availability, especially in small universities, where the small number of students raises the cost/benefit ratio of buying data. However, many obstacles to a true liberation of the use of numerical data remain. While some Canadian universities have a long history of data services (Carleton University's Data Centre celebrated its 30th anniversary in 1996), such a tradition does not exist everywhere, especially in small universities.

To maximise the use of data files, significant educational efforts must still be made, both for reference staff and for users, including the professors themselves. Using data requires a good knowledge of extraction and analysis tools. How can these tools be made accessible to users who are not able to manipulate data files but who have a definite need for them? How can we satisfy the different needs of different types of users? How can data be included in the academic curriculum? How can data librarians play their educational role, and how can this role be balanced with the responsibility of the professors? Fortunately, interesting answers are emerging.

Richard Boily is a librarian at the Universite du Quebec a Rimouski. His main responsibilities are information access for official publications, and data information service. For the latter, he is the Data Liberation Initiative (DLI) official representative for the university. He is also a member of the Working Group on Data of the Conference of Rectors and Principals of Quebec Universities. He has an undergraduate degree in Biology from Laval University, a Master's degree in Public Policy Analysis, also from Laval, and a Master's degree in Library and Information Science from the University of Montreal.



Concurrent Session 3C: Searching the Web: General Strategies and Special Topics
Thursday, 9 AM

Relational Processes in a Hierarchical World: The WWW as an Impediment to Information Acquisition.

Gregory Haley, Columbia University
Director, Electronic Data Service

An increasing volume of material is created every year in digital form only. These documents range from personal web pages to resource or user guides. Similarly, the quality of these "web-published" sources runs the gamut from diatribes written by single-issue fanatics to the latest cutting-edge research by eminent scholars. All of these digital publications are arranged in various trees, often with complex hierarchical structures. All too often, one's ability to find a document depends on knowing the exact route from the top of the tree down to the document needed. Librarians and other researchers are well aware of the numerous "finding guides" and random URLs that are passed around. For a user who does not know the correct path, the process can be frustrating as well as futile.

This structure of the World Wide Web, a dense forest of hierarchically arranged branches and documents, is in direct contrast to the relational way in which most people acquire knowledge. Libraries are organized so as to facilitate the associative accumulation of materials. Books are cataloged using a classification system that places books on a particular topic in the same area as other books and journals carrying the same classification. Card catalogs and OPACs allow subject and keyword searches that gather citations to topically related materials that may fall under other classification schemes.

The various classification schemes, card catalogs, and OPACs are essentially metadatabases: they provide data that lead readers to other data (books and periodicals). These metadata are created within highly controlled systems for cataloging and for sharing cataloging information. This cataloging process, however, cannot keep up with the burgeoning number of web documents. The challenge for digital collections, including data libraries, is how to provide metadata that will lead users more quickly to the resources they need.

This presentation will review some of the recent developments in metadata practice, or perhaps pre-practice is the better term. It will then discuss at greater length the emerging Dublin Core standards and their modifications through the Warwick Framework. Finally, it will offer a model for using Dublin Core to build a searchable index of documents.
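The abstract does not describe the proposed model in detail, so the following is only a minimal sketch of the general idea under stated assumptions: a few hypothetical web documents are described with Dublin Core element names (title, subject, description), an inverted index is built over those elements, and keyword queries are answered by intersecting the sets of matching records. The record ids and contents are invented, and the Warwick Framework's packaging of metadata is not modelled.

```python
import re
from collections import defaultdict

# Hypothetical web documents, each described by a few Dublin Core elements.
RECORDS = {
    "doc1": {"title": "Guide to Census Microdata",
             "subject": "census; microdata",
             "description": "How to extract census samples."},
    "doc2": {"title": "Trade Statistics on CD-ROM",
             "subject": "imports; exports",
             "description": "Commodity-level trade data."},
    "doc3": {"title": "Census Boundary Files",
             "subject": "census; geography",
             "description": "Digital maps for census areas."},
}

# The Dublin Core elements searched in this sketch (an arbitrary choice).
INDEXED_ELEMENTS = ("title", "subject", "description")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Inverted index: each word maps to the ids of records that contain it."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for element in INDEXED_ELEMENTS:
            for word in tokenize(fields.get(element, "")):
                index[word].add(rec_id)
    return index


def search(index, query):
    """Return the sorted ids of records containing every query word (AND)."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


index = build_index(RECORDS)
print(search(index, "census"))       # ['doc1', 'doc3']
print(search(index, "census maps"))  # ['doc3']
```

Because each record is found through its descriptive elements rather than its position in a site's hierarchy, the user does not need to know the path to the document; separate per-element indexes (for example, subject only) would be a natural refinement.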

Global Access and Local Support to the Processes of European Integration in Central and Eastern Europe through Global Networking

Dusan Soltes, Faculty of Management
Comenius University, Slovakia

The proposed paper will deal with these and various other related issues of interconnection between the global networking and processes of European integration.

Dusan Soltes is a Senior Lecturer in European Integration and in MIS at the Faculty of Management of Comenius University in Bratislava. In addition, he has been a long-term UN expert with numerous assignments in developing countries of Asia and Africa, an external advisor to the Deputy Prime Minister for European Affairs and International Relations, and the founder and first director (1995-6) of the Department of European Integration at the Office of Government of the Slovak Republic.

Searching Commodity Classification Trade Data with Ordinary Language

Frederic Gey, UC Data Archive and Technical Assistance
UC Berkeley

An important social science resource is the U.S. Census Bureau's U.S. Imports and Exports numeric data on CD-ROM. The following table shows the quantity and customs value of U.S. automobile imports for 1991 through 1994:

PASS MTR VEH, SPARK IGN ENG, NOT OV 1,000 CC

        General Imports               Imports for Consumption
Year    Quantity   Customs Value      Quantity   Customs Value
1991    173,597    $783,208,626       173,097    $779,772,191
1992    166,951    $736,087,145       171,134    $738,847,548
1993    200,043    $904,605,255       204,215    $907,734,708
1994    178,562    $753,516,749       178,562    $753,516,749

Yet if one does a commodity search using the word "automobiles" on the commonly used WWW database (http://govinfo.kerr.orst.edu/impexp.html), one finds no results. Moreover, a search using the word "cars" returns the misleading result "Railway or Tramway Stock, etc." A searcher using this database must know that the general classification heading for this commodity group is "Tractors, Vehicles for Pass, Goods, Special Purposes" and that the particular classification for cars is "Passenger Motor Vehicles, Spark Ignition Engine", as above. Other examples of obscure classification terms are "Bovine Animals" instead of "Cows" and "Equine" instead of "Horses".

This paper describes a project to map ordinary-language queries onto specialized classification schemes such as the International Harmonized Commodity Classification or the U.S. Standard Industrial Classification. The project's aim is to develop "Entry Vocabulary Modules" for searching unfamiliar metadata.
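The abstract does not say how an Entry Vocabulary Module is built. As one hedged illustration, the sketch below learns word-to-heading associations from a tiny, hypothetical training set of plain-language descriptions already assigned to classification headings, then ranks headings for a new query by summed co-occurrence counts. The training pairs and stopword list are invented, and the simple counting score stands in for whatever statistical association measure a real module would use.

```python
import re
from collections import Counter, defaultdict

# Hypothetical training pairs: everyday wording -> official heading.
PMV = "Passenger Motor Vehicles, Spark Ignition Engine"
TRAINING = [
    ("imported cars and automobiles", PMV),
    ("new automobiles from japan", PMV),
    ("live cows and cattle", "Bovine Animals"),
    ("dairy cows for breeding", "Bovine Animals"),
    ("race horses", "Equine"),
    ("horses and ponies", "Equine"),
    ("railway freight cars", "Railway or Tramway Stock"),
]
STOPWORDS = {"and", "for", "from", "of", "the"}


def words(text):
    """Lower-case alphabetic words, minus a few stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def train(pairs):
    """Count, for each word, how many training texts link it to each heading."""
    assoc = defaultdict(Counter)
    for text, heading in pairs:
        for w in set(words(text)):
            assoc[w][heading] += 1
    return assoc


def suggest(assoc, query, k=3):
    """Rank headings by the summed association counts of the query's words."""
    scores = Counter()
    for w in words(query):
        scores.update(assoc.get(w, Counter()))
    return [heading for heading, _ in scores.most_common(k)]


assoc = train(TRAINING)
print(suggest(assoc, "automobiles"))  # ['Passenger Motor Vehicles, Spark Ignition Engine']
print(suggest(assoc, "cows"))         # ['Bovine Animals']
```

A query of "cars" alone scores the passenger-vehicle and railway headings equally, mirroring the ambiguity described above; adding a context word such as "automobiles" separates them.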

Sustainable Development Indicators Databank

Tom Parris, Harvard University
Environmental Resources Librarian, Harvard College Library

Students of sustainable development often require access to comparable time-series environment and development indicators. The interdisciplinary nature of the topic requires that researchers gather indicators from multiple statistical compendia published by a variety of governmental, inter-governmental, and non-governmental organizations. Unfortunately, the mechanics of this common research task require an inordinate amount of time and energy.

The first challenge is to identify which compendia contain which indicators. Library catalogs do not describe the contents of compendia on a variable-by-variable basis. While good reference works exist for this purpose, students must make a significant effort to seek them out and learn how to use them effectively. A student seeking a handful of indicators will likely find them in three or four separate compendia.

The second challenge is learning to use the extraction software that accompanies each compendium. Each compendium comes with its own, often idiosyncratic, software, which takes time to master.

The third, and often most difficult, challenge is reformatting the data extracted from multiple compendia into a common format for analysis. One compendium may produce a file with a column for each year, a second a file with a column for each country, and a third a file with a column for each indicator. All three compendia will likely use different coding schemes for country names. Converting these different formats into a common integrated dataset requires significant programming skill and time.

The combined result of these three challenges is that many students simply switch to less interdisciplinary topics for which all of the required data reside in a single compendium.

To encourage this type of interdisciplinary research, the Harvard Environmental Information Center is constructing a Sustainable Development Indicators Databank that will provide World Wide Web access to data from multiple compendia and deliver data files using a common format and coding structure. Public domain data will be accessible worldwide. Access to proprietary data will be restricted to Harvard affiliates.

The initial focus of this effort will be on comparative, national-scale, annual time-series indicators. A proof-of-concept edition of the Databank is now available that provides access to the World Bank's World Development Indicators 1997 on CD-ROM. Additional datasets will be incorporated this coming summer.
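The reformatting challenge described above can be sketched as follows, assuming three hypothetical extracts laid out in the three ways mentioned (one column per year, per country, and per indicator), each naming countries differently. Every layout is converted to a single long format of (country code, year, indicator, value) rows through a small hand-made crosswalk. The indicator names, crosswalk entries and values are dummy placeholders, not real statistics, and this is not the Databank's actual code.

```python
# Illustrative crosswalk from each source's country names to one common code.
# An unmapped name raises KeyError, so gaps in the crosswalk surface early.
CROSSWALK = {
    "United States": "USA", "U.S.A.": "USA", "US": "USA",
    "Canada": "CAN", "CANADA": "CAN", "CA": "CAN",
}

# Three hypothetical extracts with dummy values.
# Layout A: country, indicator, then one column per year.
SOURCE_A = {"header": ["country", "indicator", "1990", "1991"],
            "rows": [["United States", "ind_a", 10, 11],
                     ["Canada", "ind_a", 20, 21]]}
# Layout B: year, indicator, then one column per country.
SOURCE_B = {"header": ["year", "indicator", "U.S.A.", "CANADA"],
            "rows": [[1990, "ind_b", 30, 40]]}
# Layout C: country, year, then one column per indicator.
SOURCE_C = {"header": ["country", "year", "ind_c"],
            "rows": [["US", 1990, 50], ["CA", 1990, 60]]}


def from_year_columns(src):
    """Yield (code, year, indicator, value) rows from layout A."""
    years = src["header"][2:]
    for country, indicator, *values in src["rows"]:
        for year, value in zip(years, values):
            yield CROSSWALK[country], int(year), indicator, value


def from_country_columns(src):
    """Yield (code, year, indicator, value) rows from layout B."""
    countries = src["header"][2:]
    for year, indicator, *values in src["rows"]:
        for country, value in zip(countries, values):
            yield CROSSWALK[country], int(year), indicator, value


def from_indicator_columns(src):
    """Yield (code, year, indicator, value) rows from layout C."""
    indicators = src["header"][2:]
    for country, year, *values in src["rows"]:
        for indicator, value in zip(indicators, values):
            yield CROSSWALK[country], int(year), indicator, value


# One integrated long-format dataset, sorted by country, year and indicator.
combined = sorted([*from_year_columns(SOURCE_A),
                   *from_country_columns(SOURCE_B),
                   *from_indicator_columns(SOURCE_C)])
for row in combined:
    print(row)
```

A single long layout like this can be pivoted into whatever wide layout a user's statistical package prefers, so a service built this way would only need to store and document one structure and one country coding.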



Concurrent Session 4A: Applying Metadata Standards
Thursday, 10:30 AM

Using HTML to Document a Panel Survey

Elaine Prentice-Lane, ESRC Research Centre on Micro-Social Change
University of Essex

Data are only as good as their documentation, and this is never more true than for a panel survey, in which constant, time-dependent changes to the data contents and structure must be described and explained if the resource is to be used efficiently.

The British Household Panel Survey (BHPS) is an annual panel survey of some 10,000 individuals in 5,500 households, collected by the ESRC Research Centre on Micro-Social Change. Six waves of data have now been released; up to wave five, they were accompanied solely by paper documentation and/or its word-processed source. All documentation is now also produced in HTML format and can be accessed on the WWW (http://www.irc.essex.ac.uk/bhps/doc), and plans are in place to adapt it for a file-based medium as well.

Although the HTML version of the documentation follows the basic structure of the printed version, HTML's hypertext facilities have been used to the full to document the survey's complexities and to permit rapid navigation through them.

The presentation will describe the BHPS, discuss the general and specific problems attendant upon its use, and describe the design, limitations and systems in place to automate production of the HTML documentation itself, as well as the plans for its future.
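The abstract does not describe how the BHPS HTML documentation is generated. A minimal sketch of the general approach, assuming variable metadata held in a simple Python structure, renders one HTML page per variable with links to the same question in other waves and escapes all text with the standard library's `html.escape`. The variable names, wave letters and label below are hypothetical.

```python
import html

# Hypothetical metadata for one question asked in three waves.
VARIABLES = {
    "a_status": {"wave": "A", "label": "Economic activity & job status",
                 "same_as": ["b_status", "c_status"]},
    "b_status": {"wave": "B", "label": "Economic activity & job status",
                 "same_as": ["a_status", "c_status"]},
    "c_status": {"wave": "C", "label": "Economic activity & job status",
                 "same_as": ["a_status", "b_status"]},
}


def variable_page(name, meta):
    """Render one variable's HTML page, linking to its versions in other waves."""
    esc = html.escape
    links = "\n".join(f'<li><a href="{esc(other)}.html">{esc(other)}</a></li>'
                      for other in meta["same_as"])
    return (f"<html><head><title>{esc(name)}</title></head><body>\n"
            f"<h1>{esc(name)}</h1>\n"
            f"<p>Wave {esc(meta['wave'])}: {esc(meta['label'])}</p>\n"
            f"<h2>Same question in other waves</h2>\n<ul>\n{links}\n</ul>\n"
            f"</body></html>\n")


def build_site(variables):
    """Return {file name: HTML text} for every variable in the metadata."""
    return {f"{name}.html": variable_page(name, meta)
            for name, meta in variables.items()}


pages = build_site(VARIABLES)
print(pages["a_status.html"])
```

Regenerating every page from the metadata after each data release, rather than editing HTML by hand, is what keeps the cross-wave links consistent as new waves are added.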

The Heart Health in Canada CD-ROM; Data as Program - Using Standardized Metadata to Link Research, Policy and Action

William Bradley
Health and Welfare Canada

The presentation discusses the Heart Health in Canada CD-ROM, which has been created as a research, promotion and policy vehicle for the Canadian Heart Health Initiative. The CD-ROM uses metadata standards to pull together data from ten independent provincial surveys, and to link the data and codebooks with research reports and policy documents in a population health framework. The data gathering and dissemination activities are seen as integral components of the associated health promotion programs, which are themselves situated within a broader health determinants and population health context. The latter is achieved by providing a metabase that facilitates data access and comparative analyses for 150 key data sets in the population health, social and economic domains in Canada. The CD-ROM enables students and researchers to drill down to the underlying data from fact sheets and research reports; to browse, search and select questions and variables of interest; to obtain extracts automatically in SPSS, SAS, NSDStat+ or TPL format from data sets that are licensed locally; to print customized inventories and codebooks; and to build custom libraries of questions and data extracts relevant to local research interests and mandates.
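The CD-ROM's extract mechanism is not described in detail. As a hedged sketch of how standardized metadata can drive automatic extracts, the code below turns hypothetical codebook entries for a fixed-column data file into SPSS `DATA LIST` and `VARIABLE LABELS` commands. The file name, variable names, column positions and labels are invented, and SAS, NSDStat+ or TPL output would each need its own generator.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """One codebook entry for a fixed-column data file."""
    name: str   # SPSS variable name
    start: int  # first column, counting from 1
    end: int    # last column, inclusive
    label: str  # variable label taken from the codebook


def spss_quote(text):
    """Quote a string for SPSS syntax by doubling embedded apostrophes."""
    return "'" + text.replace("'", "''") + "'"


def column_spec(v):
    """SPSS column notation: '5' for one column, '2-3' for a range."""
    return str(v.start) if v.start == v.end else f"{v.start}-{v.end}"


def spss_syntax(data_file, variables):
    """Return DATA LIST and VARIABLE LABELS commands for the chosen variables."""
    specs = " ".join(f"{v.name} {column_spec(v)}" for v in variables)
    labels = "\n  /".join(f"{v.name} {spss_quote(v.label)}" for v in variables)
    return (f"DATA LIST FILE={spss_quote(data_file)} FIXED RECORDS=1\n"
            f"  /{specs}.\n"
            f"VARIABLE LABELS\n  {labels}.\n")


# Hypothetical variables selected by a user from the metadata browser.
SELECTED = [
    Variable("smoker", 1, 1, "Current smoker"),
    Variable("age", 2, 3, "Age in years"),
    Variable("sbp", 4, 6, "Systolic blood pressure (mm Hg)"),
]
print(spss_syntax("survey.dat", SELECTED))
```

Keeping column positions and labels in one metadata record means the same entry can feed the printed codebook, the search interface and every generated extract.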

The Dutch Data Documentation Initiative

Marion Wittenberg
Netherlands Institute for Scientific Information Services (NIWI)

Since September 1997, the Netherlands Historical Data Archive (NHDA) and the Dutch social science data archive, the Steinmetz Archive (STAR), have been merged into the Data Archives department of a new institute, the Netherlands Institute for Scientific Information Services (NIWI). The first joint project to be carried out is the Dutch Data Documentation Initiative. The central aim of this project is to update and integrate the data archiving procedures and documentation standards of NHDA and STAR in the following sense:

The intention is to reach a situation in which all data documentation activities (data registration, data acquisition, data description) are, in principle, carried out by both data archives using the same system. This does not mean that the documentation standards and procedures of NHDA and STAR will be 100% identical. Preserving electronic information for future use entails preserving the context, structure and contents of the data. Both archives share methodological and technological problems, but their solutions also vary, because the context, contents and structure of the electronic files processed by the two types of data archive differ. The paper presents the first results of the project, including the way in which the study description scheme has been modified relative to the DDI.

Marion Wittenberg is a sociologist and works at NIWI, the Netherlands Institute for Scientific Information Services. She is responsible for data acquisition and for the documentation standards of the Steinmetz Archive, and she participates in the Dutch Data Documentation Initiative project.



Concurrent Session 4B: Designing and Delivering Data on the Web, Part One
Thursday, 10:30 AM

A New System for Web-based Documentation, Analysis, and Distribution of Survey Data

Thomas Piazza
CSM Program, University of California, Berkeley

For the past two years, the Computer-assisted Survey Methods Program at the University of California in Berkeley has been developing software for the documentation, analysis, and distribution of survey data on the World Wide Web. The currently available procedures include the following:

Since the documentation for the data files can be made very accessible, and since the data analysis procedures are very simple to use, this type of data archive is ideal for many applications. It provides a means of introducing students to data analysis without them having to spend a great deal of time on technical details. It is also a way of improving public access to policy and public opinion data, and it provides statistics on demand for users of data libraries. To facilitate user access in various countries, the interface can be set up in any language readily displayed by a Web browser. Some current applications can be viewed at the following URL: http://csa.berkeley.edu:7502

Thomas Piazza is the Manager of Statistical Services at the Survey Research Center and the Computer-assisted Survey Methods Program at the University of California in Berkeley. He has been involved with the design, analysis, and documentation of surveys for more than 20 years.

Social Science Data Analysis over the Internet: Design and Development Issues

Eric Lang
Sociometrics Corp.

Design considerations, remote analysis issues, and progress on Sociometrics' "Multivariate Interactive Data Analysis System" (MIDAS) will be presented. The goal of the service is to provide broad access, via the Internet, to interactive analysis of over 150 health and social science databases containing over 150,000 variables from seven national data archives. The service will include search and retrieval programming and variable-level and study-level links to over 31,000 pages of supporting documentation, such as original instruments and User's Guides in Portable Document Format. A custom interface will allow users to interact easily with the system through any popular HTML browser. Online data analytic procedures will include weighted and unweighted frequencies, percentiles, and measures of dispersion and central tendency, as well as two-way tables with measures of association. Users will be able to define case subsets and filters for analysis. Output can then be downloaded or printed. Users will be able to download entire datasets and documentation in SAS- and SPSS-compatible formats, and to define variable subsets or case subsets for customized dataset downloads.
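To make the listed procedures concrete, the sketch below computes a weighted frequency table and a weighted two-way table with Pearson's chi-square and Cramér's V, applying an optional case filter first. It is a generic illustration on invented cases, not MIDAS code; in particular, treating weighted counts as ordinary counts in the chi-square is a simplification that survey software would normally handle with a design-based correction.

```python
import math
from collections import defaultdict

# Invented cases; "wt" is a survey weight.
CASES = [
    {"sex": "F", "smokes": "yes", "wt": 1.5},
    {"sex": "F", "smokes": "yes", "wt": 1.5},
    {"sex": "F", "smokes": "no", "wt": 1.0},
    {"sex": "M", "smokes": "yes", "wt": 0.5},
    {"sex": "M", "smokes": "yes", "wt": 0.5},
    {"sex": "M", "smokes": "no", "wt": 3.0},
]


def keep_all(case):
    """Default case filter: analyse every case."""
    return True


def weighted_freq(cases, var, weight="wt", keep=keep_all):
    """Sum of weights for each value of var among cases passing the filter."""
    totals = defaultdict(float)
    for c in cases:
        if keep(c):
            totals[c[var]] += c[weight]
    return dict(totals)


def crosstab(cases, row, col, weight="wt", keep=keep_all):
    """Weighted two-way table with Pearson chi-square and Cramer's V.

    Weighted counts are treated as ordinary counts here (a simplification)."""
    cells = defaultdict(float)
    for c in cases:
        if keep(c):
            cells[(c[row], c[col])] += c[weight]
    rows = sorted({r for r, _ in cells})
    cols = sorted({k for _, k in cells})
    if min(len(rows), len(cols)) < 2:
        raise ValueError("need at least two categories in each variable")
    row_tot = {r: sum(cells[(r, k)] for k in cols) for r in rows}
    col_tot = {k: sum(cells[(r, k)] for r in rows) for k in cols}
    n = sum(row_tot.values())
    chi2 = sum((cells[(r, k)] - row_tot[r] * col_tot[k] / n) ** 2
               / (row_tot[r] * col_tot[k] / n)
               for r in rows for k in cols)
    cramers_v = math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))
    return dict(cells), chi2, cramers_v


print(weighted_freq(CASES, "smokes"))                                 # {'yes': 4.0, 'no': 4.0}
print(weighted_freq(CASES, "smokes", keep=lambda c: c["sex"] == "F"))  # {'yes': 3.0, 'no': 1.0}
table, chi2, v = crosstab(CASES, "sex", "smokes")
print(chi2, v)  # 2.0 0.5
```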

Eric L. Lang, Ph.D. is a Principal Research Scientist and Director of the Research Support Group at Sociometrics Corporation in Los Altos, California. He directed the development of "Socionet", Sociometrics' commercial social science WWW server. His Ph.D. is in Social Psychology (Univ. of Michigan) and his interests include data archive development and services, the Internet, and social science methodology.

Delivering Data to Undergraduate Classes

Patrick M. Yott, University of Virginia
Director, Geospatial and Statistical Data Center

Introductory undergraduate courses in statistical methods and survey analysis often fail to instruct students in the variety of data available and the skills required to locate, obtain and incorporate existing data into research projects. Typically such classes use a packaged set of data, such as a single-year cross-section from the General Social Survey (GSS). Working with faculty and teaching assistants, the Geospatial and Statistical Data Center has developed a suite of Web-based data extraction and analysis tools that provide access to a variety of public, commercial and locally produced data sets. This talk will explore several of these tools, focusing on the instructional and programming requirements of such services.

Establishing a Data Resource Centre

Bo Wandschneider
University of Guelph

The following paper will outline the process of establishing a Data Resource Centre (DRC) at the University of Guelph. Issues such as targeted audience, teaching needs, research needs, levels of service, staffing, hardware, software, security and delivery tools will be discussed.

Prior to the fall of 1996, Guelph was in a situation similar to that of many other research/teaching institutions. There were no formal procedures in place for acquiring, distributing and analyzing data in electronic format. It was the responsibility of individual faculty, researchers and students to develop the necessary skills. Statistical support was limited, some data resources were acquired more than once, and efforts in using electronic information were duplicated. In the fall of 1996, a pilot project was started at the University of Guelph to consolidate the delivery of electronic data resources. The project was a joint venture between Computing and Communication Services and the Library. Staff were seconded from both service providers and centralized facilities were established. After a very successful pilot, the DRC became a full-service facility in the spring of 1997. Discussions are well under way to develop this service into a seamless, shared resource between the University of Guelph, the University of Waterloo and Wilfrid Laurier University.

The paper looks at the motivation behind the DRC, how information about demand was gathered, and who the target audience is. Certain goals and objectives were set, and the paper examines how these goals are being achieved and some of the obstacles encountered. A large portion of the DRC's effort is centred on the development of a web retrieval system. A Perl script has been developed to interface with SAS, which allows an enormous variety of data to be easily mounted, distributed and analysed on-line. In the 14 months since the first iteration of the script, over 200 surveys have been mounted. The paper will also look at how the service is being integrated into the library and how staff are being trained to use the interface and to prepare data for mounting on the web.
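The Guelph interface is a Perl script driving SAS, and its code is not shown here. Keeping to a single language for the examples, the Python sketch below illustrates the core idea under stated assumptions: the survey and variables requested through a web form are checked against a catalogue drawn from the documentation, and only then written into a SAS program that runs `PROC FREQ`. The catalogue, library and variable names are hypothetical, and the step that submits the program to SAS and returns the output is omitted.

```python
# Hypothetical catalogue of mounted surveys and their documented variables.
CATALOGUE = {
    "survey_a": {"dataset": "drc.survey_a",
                 "variables": {"age_group", "sex", "province", "income_group"}},
}


def freq_program(survey, row_var, col_var=None):
    """Build SAS code for a one- or two-way PROC FREQ requested from a web form.

    Only catalogued survey and variable names are accepted, so text typed
    into the form is never pasted into the SAS program unchecked."""
    meta = CATALOGUE.get(survey)
    if meta is None:
        raise ValueError(f"unknown survey: {survey!r}")
    requested = [v for v in (row_var, col_var) if v is not None]
    for v in requested:
        if v not in meta["variables"]:
            raise ValueError(f"variable not available: {v!r}")
    return (f"proc freq data={meta['dataset']};\n"
            f"  tables {'*'.join(requested)};\n"
            f"run;\n")


print(freq_program("survey_a", "sex", "province"))
```

Driving the program from a catalogue is also what lets new surveys be mounted by adding a catalogue entry rather than by writing new code.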

For more information please refer to http://drc.uoguelph.ca.

Bo Wandschneider has been responsible for the DRC project at the University of Guelph since its implementation in December 1996. Prior to that he was Computing Coordinator in the Department of Economics at the University of Guelph (1986 - 1996). His educational and research background is in applied economics. For more information see: http://www.uoguelph.ca/~bo



Concurrent Session 4C: Getting Quality in Qualitative Data Analysis.
Thursday, 10:30 AM

From Scissors to Pentiums: Where We have been and Where We are Going in Qualitative Data Analysis

Wendy Wright
Department of Social Sciences Computing, UCLA

Wendy Wright will give a brief orientation to the field of qualitative data analysis discussing historical developments relevant to the field, types of qualitative data analysis software available, benefits and pitfalls of using qualitative data analysis software, and why it is important for Social Sciences Computing departments at universities to recognize the growing field of qualitative data analysis.

Wendy Wright is Manager of Planning and Development for Social Sciences Computing at UCLA. In addition, she is completing a doctorate in Medical Anthropology at UCLA. Her dissertation is on cervical cancer in Mexican-American women. (wright@ssc.ucla.edu)

Qualitative Data Analysis Software as a Tool Rather than Dictator of Process

Raymond Maietta
Indiana University

This paper will focus on the importance of an analyst's personal research style, which should dictate when and why a software package's features contribute to an examination of qualitative data. Many qualitative data analysis programs are inherently flexible, but not intuitively flexible. Novice users often allow program features, rather than their own research goals, to guide their analyses. As a consultant, I often see how persistent misunderstandings of the logic of NUD*IST software lead to misguided use of the program. For example, NUD*IST's omnipresent node explorer and power search functions tempt users toward data reduction. As an alternative, I suggest strategies for NUD*IST instruction that emphasize flexible, fluid interaction with qualitative data.

Ray Maietta received his Ph.D. from the State University of New York at Stony Brook in 1996. His dissertation, "Lost in the Shuffle: In Search of Wayward Friendship," was a qualitative analysis of friendship in a southwest US suburb. Currently, he is an NIMH postdoctoral fellow in the department of sociology at Indiana University. His research interests are in interpersonal relationships and the sociology of culture, and he is a trainer and consultant in the use of NUD*IST.

The Use of Computer-Assisted Software Programs in Analyzing Qualitative Data: Methodological Implications

Sharlene Hesse-Biber
Boston College

Sharlene Hesse-Biber will discuss the methodological controversies surrounding the use of computer software programs to analyze qualitative data. She examines several issues: I: art versus technology; II: blurring the line between quantitative and qualitative data; III: validity and reliability. She also discusses recent cutting-edge software that analyzes multimedia data, including images and video and audio discs and tapes, and addresses the methodological issues involved in analyzing multimedia data.

Sharlene Hesse-Biber is professor of sociology at Boston College. She has published widely in the field of computers and qualitative data analysis. Her most recent co-authored articles, "Users' Experiences with Qualitative Data Analysis Software" and "New Developments in Video Ethnography and Visual Sociology--Analyzing Multimedia Data Qualitatively," appeared in Social Science Computer Review. Dr. Hesse-Biber is co-developer of the computer software program HyperRESEARCH, which analyzes qualitative text and multimedia qualitative data. (http://www.researchware.com)

Issues in Principled Choice of Qualitative Data Analysis Software

Eben Weitzman
University of Massachusetts, Boston

This paper will address two issues: a principled approach to choosing a software package, and some comments on the state and direction of the field. These issues are intertwined. Choice should be based not on an abstract notion of which is the "best" or "most powerful" program, but on a careful matching of program abilities, requirements, and constraints to your individual dataset, analytic needs, and personal aptitudes and style. A number of observations will be offered on the current, emerging, and hoped-for state of the field with respect to its support for the variety of such needs among researchers.

Eben A. Weitzman received his Ph.D. in social and organizational psychology from Columbia University and is currently Assistant Professor, Graduate Programs in Dispute Resolution, University of Massachusetts Boston, and Research Associate at the International Center for Cooperation and Conflict Resolution. His interests are in organizational development, cross-cultural conflict, conflict resolution, intergroup relations, and qualitative research methods, and he is the senior author of Computer Programs for Qualitative Data Analysis (Sage, 1995), with the late Matthew B. Miles. (weitzmane@umbsky.cc.umb.edu)



Lunch Speaker: People are People -- Even online where no one can see them

Thursday, 12:00 Noon

Stacy Horn, Founder of ECHO, the virtual salon of New York
Author of "Cyberville: Clicks, Culture, and the Creation of an Online Town"

Stacy Horn founded ECHO in 1989 as a virtual salon of New York City, similar in many respects to THE WELL in California but quite distinct in its own organizational culture. The genesis of ECHO and the ensuing problems of running a virtual salon and managing conflicts between its members are described in Stacy Horn's recently published book 'Cyberville'. Taking issue with a common assertion, Stacy Horn maintains that people's true characters and personalities take precedence over attempts at role-playing and at assuming a fake identity. Consequently, cyberspace will not dramatically alter the essence of human interaction; rather, it adds another channel of communication, simply increasing the frequency of human interaction as we have always known it.



Plenary Panel: Archiving Data from Government Supported Research: Policies, Practices, and Possibilities

Friday, 9:00 AM

Jordan Leiter, National Institute of Justice
William Bainbridge, National Science Foundation
Carl Schmitt, National Center for Education Statistics
Chair: Joel Garner, Joint Centers for Justice Studies, Inc.
Discussant: Bridget Winstanley, ESRC Data Archive, University of Essex

This panel addresses current Federal policies and practices for the public archiving and use of data generated by extramural research programs. The approaches of several agencies vary in: 1) the extent to which policies are established practice or still in development, 2) whether archiving is an expected product of research awards, 3) the scope of research data included, 4) the expected schedule of archiving, 5) procedures for supporting data archives, 6) programs encouraging secondary data analysis, and 7) the mechanisms for reviewing and evaluating agency policies.



Concurrent Session 5A: Designing and Delivering Data on the Web, Part Two.
Friday, 10:30 AM

CASWEB: A Web-based interface to UK Census area statistics

James Harris
University of Manchester

This paper will address the digital dissemination of census data in the UK. A number of important weaknesses in the 1991 model of data access are identified and solutions explored in the context of the CASWEB experimental Web-based interface to Census area statistics. This project, which is funded under the Economic and Social Research Council 2001 Census Programme, is being carried out in close consultation with the UK Census Offices and the academic census user community. The system comprises a large relational database accessed via an intuitive Web interface consisting of both text based menus and desktop mapping functionality embedded within the user’s web browser. The interface allows the user to dynamically select, subset, crosstabulate and interrogate census counts and associated metadata. The map-based front-end places spatially-referenced census data within its geographical context and incorporates a number of spatial data resources including the census boundaries and digital map data.

The system has been implemented across a range of development environments and various platform/software combinations have been evaluated during the course of the project. The Web interface uses a combination of HTML forms, server-side scripting and proprietary server software to pass SQL queries to the database via CGI and the Web server API. The implications and advantages of employing advanced methods for user interaction and data retrieval will be discussed and the presentation will include a live on-line demonstration of the interface.
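The abstract describes form input being passed to a relational database as SQL queries. The sketch below illustrates that step with Python's built-in `sqlite3` module standing in for the project's database: the census table chosen in the form is checked against a whitelist (SQL identifiers cannot be bound as parameters), and the selected area codes are passed as bound parameters. The schema, table name, area codes and counts are hypothetical placeholders, not CASWEB's actual design.

```python
import sqlite3

# Census tables that users may query; identifiers cannot be bound as SQL
# parameters, so the table name is checked against this whitelist instead.
ALLOWED_TABLES = {"economic_position"}


def make_demo_db():
    """In-memory stand-in for the census database, with placeholder counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE economic_position "
                 "(area TEXT, category TEXT, sex TEXT, persons INTEGER)")
    conn.executemany(
        "INSERT INTO economic_position VALUES (?, ?, ?, ?)",
        [("A01", "employed", "M", 120), ("A01", "employed", "F", 100),
         ("A01", "unemployed", "M", 15), ("A01", "unemployed", "F", 10),
         ("A02", "employed", "M", 80), ("A02", "unemployed", "F", 5)])
    return conn


def crosstab(conn, table, areas):
    """Cross-tabulate category by sex, summed over the selected areas."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    if not areas:
        raise ValueError("select at least one area")
    placeholders = ", ".join("?" for _ in areas)
    sql = (f"SELECT category, sex, SUM(persons) FROM {table} "
           f"WHERE area IN ({placeholders}) "
           "GROUP BY category, sex ORDER BY category, sex")
    return conn.execute(sql, list(areas)).fetchall()


conn = make_demo_db()
for row in crosstab(conn, "economic_position", ["A01", "A02"]):
    print(row)
```

In a map-based front end, the list of area codes would come from the user's selection on the map rather than from a text menu; the query step stays the same.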

Taking Web-Based Data Services to the Classroom and Beyond: The ISLAND Model

Brian Kroeker, University of British Columbia

Some data extraction programs have so far suffered from two important problems:

The data user is not a single type of person with a single level of skill, nor are all users equal in terms of training and experience, nor in what they wish to do with datasets.

The same is true of the people who maintain a data extraction system for others. In most libraries there is never enough staff time, so a data extraction system must be designed with this in mind: uncomplicated yet powerful, and flexible enough that staff without highly technical computer skills can quickly update it and add features.

This paper explores the ISLAND data extraction system at UBC, and identifies actions taken in the creation of this system to overcome some of these difficulties.

Brian Kroeker has been a Programmer/Analyst at the University of British Columbia for nine years. His major goal over the past few years has been adapting data files for access on the World Wide Web with a focus towards effective interface design.

Global Access to the Integrated Public Use Microdata Series: An Update

Steve Ruggles & Matt Sobek
University of Minnesota, Department of History

This paper describes our system to distribute the data and documentation of the Integrated Public Use Microdata Series (IPUMS). The IPUMS combines all existing national U.S. census samples from 1850 to 1990 into one compatible database. We presented a paper on the project at the IASSIST/CSS conference two years ago, describing our future plans for a web-based data extraction system with hypertext documentation. Since then we have received funding to proceed with the project and have recently completed our comprehensive revision of the database--IPUMS-98, which is now available to the academic and public policy community. The data extraction system has advanced beyond its preliminary form, and most of the hypertext documentation is available on-line and can also be downloaded locally onto PCs. Attendees at a recent planning conference had numerous suggestions about our plans for a transportable database extraction system that would be mirrored at their institutions. Since this is a four-year project, we welcome the chance to discuss these matters with the wider data-user community as we continue to develop our dissemination system.



Concurrent Session 5B: Building and Preserving Digital Archives
Friday, 10:30 AM

Preservation of Electronic Records: The Roper Center Experience, 1946-1998

Marc Maynard
Roper Center for Public Opinion Research, University of Connecticut

During the past several decades, data archives have faced a multitude of obstacles in carving out their role as integral parts of the research community. These challenges include balancing the needs and expectations of users against the realities of collection condition and the variety of data formats; developing criteria for data acquisition; keeping up with new media formats; and training and retaining staff with the unique amalgam of skills and talents needed by data librarians. Data libraries throughout the world have overcome challenges of this nature with varying levels of success.

The advent of the Internet and the current focus of new technological developments on networking and data access have added new challenges to the operation and, potentially, the existence of data archives. The argument for the continued existence of physical data archives is undergoing scrutiny in these days of "virtual archives." When any scholar, research institute, commercial firm or interest group can host a "virtual archive" on the World Wide Web, what are the incentives and motivations for maintaining social science data archive facilities? What are the incentives for data producers to archive their materials at established libraries? What are the incentives for researchers to utilize archive services, when (at least some) data can be found elsewhere?

The Roper Center has faced, and continues to face, the challenges mentioned above. Although these issues are new in the context of the Internet, this paper re-examines them by looking at the early years of the Roper Center and at the development and growth of both its data collections (in substantive and technical terms) and its mission. The commercial nature and age of its collections make the Center a unique enterprise, and one from which much can be learned.

Marc Maynard is Assistant Director for Technical Services at The Roper Center for Public Opinion Research, University of Connecticut.

The Archives' Choice between Museum and Data Library: The Danish National Archives' Adaptation to Information Technology Progress in Denmark

Lars Kristian Larsen and Lise Qwist Nielsen
Danish National Archives

The rapid spread of information technology in the Danish central administration has forced the Danish National Archives to adapt to this progress and to choose between becoming a museum of paper archives and developing into a modern archive capable of handling any type of records that Danish authorities use now or will use in the future. The Danish National Archives chose the Information Technology Strategy and, through this choice, became an active participant in Danish information technology progress. The Danish government has provided a solid platform for meeting the challenge of IT modernization. First, there is "Information Society 2000", the policy paper that was to be implemented by a new Ministry of Science. It has become a guide for many central and municipal authorities that use modern information technology as a tool for modernizing their administration, and in this process many authorities are preparing for and implementing electronic communication and archiving. Second, a 1996/97 revision of the archival legislation gave the Danish National Archives the authority and responsibility to prepare for the future handling of electronic archives. In this paper we aim to present, from an archival perspective, the Danish National Archives' strategy for adapting to information technology by:

Lise Qwist Nielsen (Master of Library and Information Science) is an archivist at the Danish National Archives, working on the appraisal and selection of electronic archives from the public sector. Lars Kristian Larsen (M.A. in Political Science) is an archivist in the IT department of the Danish National Archives, working on the development of standards and methods.

Email as Record: Challenges to Traditional Archival and Records Management of Electronic Records

Mark Conrad, Center for Electronic Records
US National Archives & Records Administration

The National Archives has preserved and made available selected electronic records of the U.S. Federal Government for over two decades. Most of the records that have been accessioned have been statistical data sets or files from simple database applications. Today, however, agencies in the U.S. Government are using computers to produce a greater variety of increasingly complex electronic records. These new records pose serious challenges to traditional archival and records management practices. In this presentation I want to look at records created by e-mail systems to illustrate some of these challenges.

The National Archives has initiated multiple projects to select, preserve, and provide access to several million messages from e-mail systems in Federal agencies. This presentation will report on the challenges we have encountered in carrying out these projects. Among the challenges these records pose are: the sheer volume of records to be processed; the difficulty of sorting the wheat from the chaff; the complexity of redacting restricted information; the difficulty of identifying and migrating all the component parts of e-mail messages as systems become obsolete; and the problem of helping researchers find relevant records in a corpus of several million messages.
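To make "identifying all the component parts" concrete, the sketch below inventories a single message: the envelope headers plus every leaf part (body text, attachments) with its MIME type and filename. It assumes the messages have already been exported to standard Internet message format (RFC 5322 with MIME); many agency e-mail systems of this period used proprietary formats, so that export step is itself an assumption, and this is an illustration rather than the National Archives' procedure. The function name `inventory_parts` is hypothetical; the parsing calls come from Python's standard `email` package.

```python
from email import policy
from email.parser import BytesParser
from typing import Dict, List, Optional, Tuple


def inventory_parts(raw: bytes) -> Tuple[Dict[str, str], List[Tuple[str, Optional[str]]]]:
    """Return (envelope headers, [(content type, filename), ...]) for one message.

    Hypothetical helper: assumes `raw` is an RFC 5322 / MIME message.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Keep only the envelope headers that are actually present.
    envelope = {
        name: str(msg[name])
        for name in ("From", "To", "Date", "Subject")
        if msg[name] is not None
    }
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers only wrap other parts
        parts.append((part.get_content_type(), part.get_filename()))
    return envelope, parts
```

An archivist could run this over each exported message (for example `inventory_parts(Path("message.eml").read_bytes())`, with a hypothetical filename) and flag any content type that has no planned migration path before the originating system is retired.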

Mark Conrad is an archivist currently working in the Life Cycle Management Division of the National Archives and Records Administration (U.S.). He has been working with archival electronic records for the past seven years.



Concurrent Session 5C: Analysis of Large Data Sets
Friday, 10:30 AM

Applying Parallel Processing to Social Science Data: Pushing the Limits

Albert F. Anderson & Paul H. Anderson
Public Data Queries, Inc.

Computing and information system technologies have had a dramatic impact on the management and analysis of social science data. Massive census and survey data sets that just a few years ago could be handled only in large, centralized mainframe environments are now routinely analyzed on desktop workstations and PCs. One consequence of increased computing power and storage capacity has been the ability to handle ever larger data sets. However, as the data sets available to researchers and policy makers have grown from kilobytes through megabytes to gigabytes and, soon, terabytes, the demand for processing power has stayed ahead of what the technology can deliver.

Many of the data management and analysis tasks that face social scientists are inherently parallel. Typically, the same processing steps are applied to each of perhaps millions of data records. As a consequence, the handling of large data sets has traditionally been constrained as much by the input capabilities of the available computing systems as by their processing power. Parallelizing the input/output (I/O) data stream along with the data processing tasks has been shown to decrease dramatically the processing time required to handle data sets of up to hundreds of millions of records. Thus, parallel processing is emerging as the next technology to be exploited to the advantage of social scientists.
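The "same steps applied to every record" pattern can be sketched as split, apply, merge: divide the records into chunks, tabulate each chunk independently, then add the partial tables together. Because counts are additive, the merged result is identical to a serial pass. This is a minimal illustration, not PDQ-Explore's design; the function names and the `state`/`sex` fields are hypothetical, and a thread pool stands in for real parallel hardware (in CPython, CPU-bound work would normally use a process pool instead, with the usual `__main__` guard).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Sequence


def tabulate_chunk(chunk: Sequence[dict], key: Callable[[dict], Hashable]) -> Counter:
    """Apply the same step to every record in one chunk: count records per cell."""
    return Counter(key(record) for record in chunk)


def parallel_tabulate(records: Sequence[dict], key: Callable[[dict], Hashable],
                      n_workers: int = 4) -> Counter:
    """Split records into contiguous chunks, tabulate each in parallel, then merge.

    Counts are additive, so summing the partial tables reproduces the serial result.
    """
    size = max(1, -(-len(records) // n_workers))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    total: Counter = Counter()
    # Threads keep the sketch portable; swap in ProcessPoolExecutor for CPU-bound work.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(tabulate_chunk, chunks, [key] * len(chunks)):
            total.update(partial)  # Counter.update adds counts
    return total
```

For example, `parallel_tabulate(records, key=lambda r: (r["state"], r["sex"]), n_workers=8)` would produce a state-by-sex table. In a real system each worker would also read its own slice of the file, which is the parallel I/O the paragraph above emphasizes.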

The authors have more than five years of experience in applying parallel computing systems to the management and analysis of social science data. Over that time, processing speeds have increased from a few megahertz (MHz) to hundreds of MHz, hard disk storage from megabytes (MB) to gigabytes (GB), disk access from kilobytes per second (KBPS) to megabytes per second (MBPS), and random access memories from kilobytes to megabytes and now gigabytes. More power is available on the desktop today than was available in the largest mainframe systems of a few years ago. As a result, tasks that once took days can now be accomplished in seconds.

The authors are developing a commercial system, PDQ-Explore, capable of providing interactive access to data sets as large as full national censuses. The effort has involved the design and implementation of a system highly optimized for handling social science data and analytic tasks. It has required balancing performance across the various subsystems of the parallel systems (storage, I/O, processing, memory, and inter-processor communications). This paper presents a summary review of past progress, outlines the strategies used to achieve maximum performance for specific tasks, and focuses on the current challenges in applying parallel processing power to social science applications. These challenges range from conceptually simple but computationally demanding procedures, such as determining median household income by state, race, and family structure from national census microdata, to more complex resampling techniques and the iterative fitting of models to large data sets.
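A grouped median is "conceptually simple but demanding" because, unlike a count or a sum, medians computed on separate chunks cannot be combined into the overall median. One exact workaround, sketched below under our own assumptions (it is not described as PDQ-Explore's method), is to have each chunk produce a frequency table of income values per group; those tables are additive, and the exact median can be read off the merged table. The record layout, a `(group key, income)` pair where the key might be `(state, race, family_type)`, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Record = Tuple[Hashable, int]  # (group key, household income): hypothetical layout


def partial_counts(chunk: Iterable[Record]) -> Dict[Hashable, Counter]:
    """One pass over a chunk: frequency of each income value within each group."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for group, income in chunk:
        counts[group][income] += 1
    return counts


def merge_counts(partials: Iterable[Dict[Hashable, Counter]]) -> Dict[Hashable, Counter]:
    """Frequency tables are additive, so chunks can be merged in any order."""
    merged: Dict[Hashable, Counter] = defaultdict(Counter)
    for part in partials:
        for group, counter in part.items():
            merged[group].update(counter)
    return merged


def median_from_counts(counter: Counter) -> float:
    """Exact median from a value -> frequency table (mean of the two middle values if n is even)."""
    n = sum(counter.values())
    if n == 0:
        raise ValueError("empty group")
    lo_pos, hi_pos = (n - 1) // 2, n // 2  # 0-based ranks of the middle value(s)
    lo_val = None
    seen = 0
    for value in sorted(counter):
        seen += counter[value]
        if lo_val is None and seen > lo_pos:
            lo_val = value
        if seen > hi_pos:  # lo_pos <= hi_pos, so lo_val is already set here
            return (lo_val + value) / 2
    raise RuntimeError("unreachable: seen reaches n > hi_pos")


def grouped_medians(chunks: List[List[Record]]) -> Dict[Hashable, float]:
    """Per-chunk tables (parallelizable), then a merge, then one median per group."""
    merged = merge_counts(partial_counts(c) for c in chunks)
    return {group: median_from_counts(c) for group, c in merged.items()}
```

The trade-off is memory: with many distinct income values the merged tables can grow large, and binning incomes would shrink them at the cost of an approximate median.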

Support for this work comes in major part from Small Business Innovation Research (SBIR) and Technology Transfer Research (STTR) grants from the National Institute of Child Health and Human Development (NICHD) and the National Institute on Aging. For information about PDQ, Inc. and PDQ-Explore, see: http://www.pdq.com .

Albert F. Anderson is currently the Director of Research for Public Data Queries, Inc. (PDQ), a family-owned company in Ann Arbor, Michigan. He has a Ph.D. in sociology from Iowa State University. He retired from the University of Michigan in 1996 following 25 years as the co-head of the data processing section at the Population Studies Center. Paul H. Anderson is currently a Vice President and Director of Technological Development at PDQ. He has a master's degree in computer engineering from the University of Michigan and has ten years of experience designing and implementing academic, research, and commercial computing applications on single and multiple processor platforms.

Dynamic Exploratory Data Analysis: The Users' Requirements

Derek Bond
University of Ulster at Coleraine

The growth of computing power has generally not led to the development of statistical techniques for the analysis of large and complex datasets, but rather to a downsizing of the computing power needed. For example, a quick scan of most applied socio-economic journals shows that the quantitative techniques they use require little more computing power than those used thirty years ago.

In the current situation there is much opportunity for new non-parametric techniques to be developed which will allow for the graphical and spatial analysis of complex and large datasets through interactive tools.

Such analysis, however, places new requirements on data librarians: providing access to datasets not necessarily held locally, offering local support in understanding the limitations of those datasets, and providing software that allows such analysis.

This paper considers, through practical examples drawn from experience of trying to develop such techniques for UK and RoI data, the challenges this approach to data analysis raises for IASSIST/SSCA members.

Derek Bond is a Senior Lecturer in the Ulster Business School and Director of the Northern Ireland Regional Research Laboratory.

Data FERRET: A Tool to Tabulate and Download Demographic Data

Cavan Capps
US Bureau of the Census

Data FERRET is an interactive Internet system designed to provide access to demographic surveys collected by the Bureau of the Census for the Census Bureau and other Federal agencies. The system provides interactive data tabulation, downloading, and, soon, graphing for large microdata sources. The system is designed to be distributed and can support data hosted in remote locations with the same interface and tool set.
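Tabulating survey microdata differs from simply counting records: each record carries a weight giving the number of people or households it represents, so a cell's estimate is the sum of the weights that fall in it. The sketch below shows that core step for a two-way table with row totals. It is a hypothetical illustration, not Data FERRET's interface; the field names (`region`, `tenure`, `weight`) are invented, and it produces point estimates only (standard errors would require the survey's design information).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Tuple

Cell = Tuple[Hashable, Hashable]


def weighted_crosstab(records: Iterable[Mapping], row: str, col: str,
                      weight: str = "weight") -> Dict[Cell, float]:
    """Estimated population total in each (row, col) cell: the sum of record weights."""
    table: Dict[Cell, float] = defaultdict(float)
    for rec in records:
        table[(rec[row], rec[col])] += rec[weight]
    return dict(table)


def row_totals(table: Dict[Cell, float]) -> Dict[Hashable, float]:
    """Marginal totals over the column variable."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for (r, _c), value in table.items():
        totals[r] += value
    return dict(totals)
```

Because the cell sums are additive, the same function could run against data held at a remote site, with only the small result table sent back, which fits the distributed design described above.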

Cavan Capps has worked with databases and networks for over 17 years. Throughout his career he has worked on data-related issues, either as an economist or as a software engineer. He is currently the project manager of the Data FERRET component of the Census DADS effort. Data FERRET provides intelligent Internet access to many survey record databases, including the Current Population Survey, the Survey of Income and Program Participation, and the Health and Nutrition Examination Survey.