|
P1: Data Futures: Perspectives from Statistical Agencies
Chair: Stephen Fienberg, Carnegie Mellon University |
Speakers:
As we approach the new millennium, many government statistical agencies are reconsidering the role of their activities in the collection of census, survey, and other statistical data, and in their dissemination. New forms of data loom large on the horizon, but so do issues of confidentiality and access. This panel features distinguished statistical agency leaders from three North American countries and from the United Nations. The panelists will share their perspectives on what data the future may hold.
|
P2: Global Directions for Large National and International Survey Research Projects
Chair: Scott Bennett, Carleton University |
Tom Smith, Director, General Social Survey, National Opinion Research Center
A. The Globalization of Survey and Social Science Research
- International Social Survey Program
- Other Studies
B. The Diffusion of Data via the Internet
- Current products
- GSSDIRS
- Others
- New NSF Proposal
C. Intersection of Global Surveys and the Internet
- The Old with the New
- The New with the New
|
P3: The Impact of the Net on Data and Information Professionals
|
Barbara O'Keefe,
Director, Media Union, University of Michigan
[Abstract not yet available.]
| A:1 Models of Disseminating Data by Statistical Agencies
Wednesday, May 19 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Ernie Boyko, Statistics Canada |
The Australian Data Dissemination Initiative: access to confidentialised unit record files for University teaching and research
Dr Len Smith,
National Centre for Epidemiology and Population Health, Australian National University
Under a newly negotiated agreement between the Australian Bureau of Statistics (ABS) and the Australian Vice Chancellors' Committee (AVCC), ABS has agreed to make confidentialised unit record files from its household surveys and the census sample available to university teachers, researchers and students at no cost to the user. AVCC will pay an annual subscription which will replace the previous system of charging individual users.
It is expected that this agreement will later be extended to cover births, deaths, marriages, divorces and immigration data which ABS compiles from information provided by other agencies.
This is a major breakthrough for expert data users. However, the information will not be easily accessible by casual users or students, and the reasons for this remaining challenge will be outlined, and possible solutions discussed.
The Scientific Statistical Agency as an intermediate data facility in the Netherlands; Fifth wheel on the wagon or oil in the machine?
Dr. Ron Dekker,
Managing Director, Scientific Statistical Agency
Social scientists want to re-use data that have been collected for other purposes. Reasons include the quality of the data, time series, number of cases, international comparability and efficiency.
Data-owners are reluctant to provide data for secondary use, for either financial or legislative reasons. Consequently, individual researchers (or institutes) have to negotiate with data-owners about the costs and contents of the data. In practice this turns out to be a very time-consuming activity for both parties. Moreover, the costs can be considerable and far too high for a single researcher, or even a faculty or institute.
In the Netherlands a major step with respect to data access was reached when the Dutch Research Council and Statistics Netherlands (CBS) came to an agreement on lump-sum financing of the costs of using micro data on persons and households. For each survey there is one standard file. This file is protected against spontaneous recognition of respondents, which is a far less restrictive mode of data protection than that applied to public-use data. Together with a secrecy statement and some other regulations, this guarantees the privacy of respondents. Moreover, constructing one user-file per survey speeds up delivery of the data to researchers, because the output can be standardised. As a result of this contract a suite of over 15 repeated cross-sectional data sets, two panel data sets and one register has become available quickly and cheaply to researchers at universities, planning agencies, ministries and (certified) research organisations.
At the same time the Scientific Statistical Agency was established by the Research Council. Its main tasks are to improve the availability and accessibility of micro data for scientific research. The Agency is a small bureau that has relations to the parties that are involved in data infrastructure: data owners, providers and users.
Some activities are:
- front-office for secondary data, especially the micro data of Statistics Netherlands
- publish a Catalogue with information on the data
- negotiate with data-owners
- organise user-meetings
- take care of the availability and quality of documentation
- co-operate with NIWI on archiving and distributing data and set up projects on metadata, guidelines for documenting data, etc.
The Agency has turned out to be rather successful. Since its start in 1994 it has delivered about sixty data sets a year to faculties, research institutes, planning bureaux, provinces and ministries. The contract between the Research Council and Statistics Netherlands and the founding of the Dutch Scientific Statistical Agency not only opened up the micro data, but also greatly improved the relation between Statistics Netherlands and the research community in the Netherlands.
In this presentation we will present the setting of the Agency, the lessons that can be learned from running this Agency and some future plans.
Canadian Initiative on Social Statistics
Doug Norris,
Director, Family, Housing and Social Statistics Division, Statistics Canada
In January 1998 the Canadian Social Science and Humanities Research Council and Statistics Canada set up a joint National Task Force. The mandate of the Task Force was to study a number of broad issues revolving around the use of large-scale quantitative databases particularly to support public policy research. The Task Force identified three main barriers to the optimal use of Canadian social statistics:
- a lack of trained researchers in the field of quantitative analysis;
- a need to provide researchers with better access to detailed micro-data collected by Statistics Canada; and
- a need to better link researchers and public policy analysts.
This presentation will describe the final report of the task force and in particular the action plan that has been developed to strengthen Canadian social science research using large-scale datasets.
| A:2 GIS and Social Science Data Access Over the Net
Wednesday, May 19 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: James Harris, University of Manchester |
Geographic Information Systems and Social Sciences Data: Over the Net
Ian Bowles,
Pulaski Area Geographic Information System, Little Rock, Arkansas
Social science data is becoming increasingly varied, expansive and complex, and this poses problems for retrieval and analysis. Geography is one method of organizing such data which helps with these concerns. By using spatial location, many types of what would otherwise be disparate data subjects can be related. Over the last three decades, Geographic Information Systems have developed increasingly efficient and accurate methods of bringing together large volumes of data.
Unfortunately, the GIS programs that allow sufficient power and complexity for detailed analysis are also the ones that are the most difficult to use, needing significant investments in hardware, software and operator training. They are often less than user-friendly. As a result, many potential researchers and data users have been frustrated by the enormous gap they see between what they know to be possible through GIS, and what they see as their own capacities with the programs. One of the methods of increasing exposure to GIS, and thus to its analytic power with respect to Social Sciences data, is to provide GIS services over the Internet.
This paper examines some of the work being done with GIS, social sciences data, and the web. Some web-based products have been developed by GIS vendors, and these have been used on some sites to allow data exploration. However, these products have often been found to be limited when trying to do simple data modification, analysis or even downloading. Many data providers have instead returned to the older, more robust GIS products and developed their own interfaces for the Internet, using Perl and JavaScript to build user-interactive sites that essentially translate a user's needs into a language used by the GIS.
Specifically, the author has developed an award-winning site that performs GIS services and data mapping for Pennsylvania, over the web (www.maproom.psu.edu/mapper). The paper explores some of the issues that must be addressed in this endeavour, including: users' understanding of data quality, metadata and geographic accuracy; methods of performing analysis; issues relating to the update, expansion and maintenance of data; providing for an individual user's needs; and cartographic issues. Because the author has both developed the site and worked to respond to users' critiques, the paper covers a variety of perspectives.
Methods of relating social science data by geography over the Internet are still in their infancy, but they are developing quickly. Sites can easily be found that cater to everyone from those who are almost geographically illiterate to those who work daily with GIS. Those who are developing these sites must be familiar with a variety of technological issues, including those surrounding GIS, the Internet and the data being delivered. It is important to keep these issues balanced, so that users fully understand what they are working with.
Overcoming Barriers to the Use of the Census through Interactive Visualization
Jackie Carter and Nina Bullen,
MIDAS, Manchester Computing, University of Manchester, UK
The increasing use of the Internet has created opportunities for improving access to and dissemination of Census data. In the UK, MIDAS (Manchester Information Datasets and Associated Services) provides a national support and dissemination service to the academic community for Census statistics and related datasets and information. The decennial Census of Population area counts are a highly valuable resource to researchers of socio-demographic change. However, until recently there have been significant barriers to using these data. These include registration for data and digitised boundaries, considerable knowledge of the structure of the Census, expertise in a specialist and now outmoded data retrieval package (SASPAC) and the skills required for combining Census data with map data in a GIS package. This paper will describe an approach that has been developed to overcome the latter two of these barriers, by providing software and associated Census datasets. These have been used to facilitate teaching in the Social Sciences through a visualization approach to the exploration of area based data. The software is called the Cartographic Data Visualizer (cdv) and it enables students to dynamically interact with a spatial dataset at the data exploration stage, before more formal statistical or spatial analysis is carried out. New developments currently underway will allow interactive exploration of data to be performed through a Web browser using a Java application called Descartes.
Changing Boundaries: Gazetteers, Information Retrieval and Data Browsing
Cressida Chappell,
History Data Service, UK Data Archive, University of Essex
The History Data Service is planning to collaborate with others to develop a UK gazetteer, which can cope with changed and changing geographical boundaries and which will hold information about geographic names, units and hierarchies and incorporate both modern and historical perspectives of geography. This paper will describe how a gazetteer of this type could be used, firstly in online catalogues to make it easier to retrieve catalogue records for all data collections which cover a specified place at a sufficient level of detail, and secondly in online data delivery services where it could be used to produce geographic data subsets and provide integrated access to related attribute and boundary data.
The paper will argue that gazetteers of this type are crucial for effective information retrieval and data browsing both in the context of an historical service provider like the History Data Service, and in a wider social sciences and humanities context. The need for a historical gazetteer is most obvious within history-related disciplines because historical data are often associated either with geographic names or units which no longer exist or with units whose boundaries have changed; it is of course true that the disparity between modern and historical geographic names and units increases the further back in time one goes. Outside the history-related disciplines there is still a need for a historical gazetteer because there is a wide range of data which is associated with non-contemporary geographic units.
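As an illustration of the idea described above, a gazetteer that copes with changing boundaries could store each geographic unit as a series of dated versions and answer lookups for a given year. The following is a minimal sketch only, not the History Data Service design; the record layout, the function name and the sample entries are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UnitVersion:
    """One version of a geographic unit, valid for a span of years (hypothetical layout)."""
    name: str                  # name used during this period
    unit_type: str             # e.g. "parish" or "district"
    parent: Optional[str]      # name of the containing unit, if any
    valid_from: int            # first year this version applies (inclusive)
    valid_to: Optional[int]    # last year it applies (inclusive); None means still current


def units_at(gazetteer: List[UnitVersion], name: str, year: int) -> List[UnitVersion]:
    """Return every version of `name` that was in force in `year`."""
    return [
        u for u in gazetteer
        if u.name == name
        and u.valid_from <= year
        and (u.valid_to is None or year <= u.valid_to)
    ]


# Invented entries for illustration only; not real boundary history.
GAZETTEER = [
    UnitVersion("Northfield", "parish", "Old Hundred", 1850, 1893),
    UnitVersion("Northfield", "parish", "Westshire District", 1894, None),
]

# The same place name resolves to a different hierarchy depending on the year.
print(units_at(GAZETTEER, "Northfield", 1880)[0].parent)
print(units_at(GAZETTEER, "Northfield", 1950)[0].parent)
```

A catalogue search for a place and date could use such a lookup to find the unit, and hence the data collections, that applied at that time.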
Towards an Electronic Historical Atlas of Britain
Humphrey Southall,
Reader in Geography, Queen Mary and Westfield College, University of London; and Programme Director, Great Britain GIS Programme
Since 1994, a team has been working, with complex funding and fluctuating fortunes, to create a new national historical atlas for Britain. To what extent we will succeed remains uncertain, although some very large scale resources are being created. In particular, we are within sight of completing the largest time-variant GIS ever built, recording the CHANGING boundaries of the administrative units of England and Wales, from the 15,000 Civil Parishes upwards, from the mid-19th century to the present; and we are assembling a very large statistical database from both our own OCR-based digitisation work and material scavenged from many other researchers. The presentation will also cover our more tentative steps towards a web-based interactive atlas. The presentation will emphasize the distinctive characteristics of the project, based on a "grass roots" initiative by relatively junior academics. For details of the project, see: http://www.geog.qmw.ac.uk/hgis and for our prototype atlas see: http://www.geog.qmw.ac.uk/aib.
| A:3 Data in the Classroom: Instructional Uses of Data
Wednesday, May 19 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Tess Trost, Texas Tech University |
Computer-Assisted Personal Interviewing: a method of capturing sensitive information
Emma Forster,
Research Fellow, Department of Psychology and Sociology, Napier University, Scotland, UK
This paper will discuss how Computer-Assisted Personal Interviewing (CAPI) coped with collecting sensitive data in a difficult interview situation. A recent research project, funded by Scottish Homes (a Scottish Government Housing Agency), used the CAPI technique to collect information on home ownership at the margins of affordability. This project uses an innovative joint approach between the academic sector and a leading UK survey consultancy. It could be argued that a more sensitive method of collecting this sort of information would have been in-depth interviews, which could then have been analysed using qualitative research methods. The paper will discuss the outcomes of using CAPI and quantitative research methods in such a sensitive project. It is suggested that the use of CAPI has achieved a better response rate on sensitive questions than other techniques would have. The use of CAPI has a number of well-known advantages, such as improvements in data quality and turnaround times. This project will assess whether CAPI can deliver in a number of interview conditions or whether its potential benefits will be realised only under certain conditions. This paper will critically assess how the quantitative method worked in this situation. Due to the constraints of research work in a research contract environment, the future of qualitative research is questionable. Increasingly today, the nature of contract research demands quantitative data which can be analysed statistically and held in data archives for future comparability.
Combining Archive Data and Student-Created Variables in Data Analysis on the Web
Tom Piazza, Manager of Statistical Services, Survey Research Center, University of California, Berkeley
Students and researchers frequently need to create their own recodes and computed variables in order to carry out their analysis of a dataset. Usually this requires that they download at least a subset of the archived data and analyze the data on their own computer. At Berkeley we are currently extending the Web-based SDA system to allow users with accounts on the server computer to create and store personal variables, and to combine personal and archive variables in a single crosstabulation or other analysis procedure right on the Web. The presentation will explain how this works.
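The general idea of combining a user-created recode with archive variables in one crosstabulation can be sketched as follows. This is a minimal illustration and not the SDA implementation; the function names, the variable names and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, object]


def add_recode(rows: List[Row], new_name: str, source: str,
               rule: Callable[[object], object]) -> List[Row]:
    """Return copies of the rows with a personal variable derived from an archive variable."""
    return [{**row, new_name: rule(row[source])} for row in rows]


def crosstab(rows: List[Row], row_var: str, col_var: str) -> Counter:
    """Count cases in each (row_var, col_var) cell."""
    return Counter((row[row_var], row[col_var]) for row in rows)


# Invented archive rows for illustration only.
ARCHIVE = [
    {"age": 23, "vote": "yes"},
    {"age": 67, "vote": "no"},
    {"age": 45, "vote": "yes"},
    {"age": 71, "vote": "yes"},
]


def age_group(age: int) -> str:
    """A user-defined recode: collapse age in years into two groups."""
    return "under 65" if age < 65 else "65+"


# The archive data stay unchanged; the recode lives in the combined copy.
combined = add_recode(ARCHIVE, "age_group", "age", age_group)
table = crosstab(combined, "age_group", "vote")
for cell, count in sorted(table.items()):
    print(cell, count)
```

Keeping the recode separate from the archive rows mirrors the distinction drawn above between personal and archive variables.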
The Social and Organisational Life Data Archive
Ken Reed, Bowater School of Management & Marketing, Deakin University, Australia
This paper describes a current project designed to improve the capacity and confidence of undergraduate and postgraduate students in using empirical data. Recent decades have witnessed rapid growth in the volume of data generated through the collection of official statistics and through social surveys. However, students are generally exposed only to summary results rather than the data themselves. We argue that these collections offer an untapped resource in fostering inquiry and independent learning, if adequate support, guidance and access are developed.
The project establishes an on-line collection of survey and other statistical data relevant to research in the fields of management, organisational studies, industrial relations, marketing and other related social science fields ("The Social and Organisational Life Data Archive" - SOLDA). SOLDA uses CD-ROM technology and the World Wide Web to deliver:
- data for teaching research methods courses;
- support material for substantive subjects - in the form of Powerpoint slides, Excel tables and graphs - suitable for lectures, tutorials and assignments;
- customised data-sets (and appropriate support material) to enable advanced undergraduates and postgraduates to undertake basic empirical research.
A secondary benefit of the project is that it provides a collection of publicly available national and international survey data appropriate to the needs of researchers and postgraduate students.
The system is designed to be independent of a specific computing platform and adaptable to a wide range of software combinations and configurations. It is integrated with the standard statistical packages and with MS Word and Excel. It uses writeable CD-ROMs as a basis for a customisable and flexible current collection; a mass storage system to hold the complete collection and for production of teaching and research materials; and a university intranet. The Web provides access to the datasets, and includes a hypertext information system that describes the datasets and provides guidance in their use, reference lists of published work based on the data, relevant documentation, computer programs and support material.
| B:0 IASSIST at 25: Bridging the Past with the Future
Wednesday, May 19 1400-1530 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Laine Ruus, University of Toronto |
Many have heard the tale of the founding of IASSIST in a hotel bar in Toronto in 1974. With the goal of supplementing this tale, Carolyn L. Geda, IASSIST's first president, will offer her personal interpretation of the efforts of the IASSIST founders, Margaret Adams will use the IASSIST archives and other materials to document IASSIST's origins and evolution, and Ekkehard Mochmann, also an IASSIST founder, will reflect on the development of IASSIST, CESSDA, and IFDO in the context of building a united Europe.
Early IASSIST as recalled by its first president
Carolyn Geda
The establishment and formative years of IASSIST were a challenge to those involved with the crusade, as well as those against the crusade. This presentation walks through some of the major issues and concerns as experienced and interpreted by one of the founders. Paramount was the need to develop an organizational structure which satisfied the criteria for an international organization for individuals working with electronic data rather than an international organization for organizations, specifically data archives.
IASSIST: Origins and Evolution as revealed in its archives and other materials
Margaret O. Adams
In the tradition of understanding where we have been in order to have perspective on what we have accomplished and where we may wish to head, this presentation will document the origins and evolution of IASSIST. We will discover its roots in a variety of activities within the international social science research community and will trace its evolution from the perspective of the data and technology issues that IASSIST has addressed since its founding, noting some of the themes that have persisted throughout its history. The IASSIST archives and published materials offer rich documentation of the IASSIST heritage and are the primary sources for this analysis.
Social Research Infrastructure from a European Perspective
Ekkehard Mochmann
"To 'build' Europe does not mean building a new, abstract, unitary diagrammatic and technocratic entity. It means federating, developing institutions from the bottom up, following the rising order of federalism, in which unity is founded upon diversity". This observation reflects the dynamics and diversity of societies on their way to a united Europe. It is also reminiscent, to a large extent, of what many of us have seen in the development of the European data movement over the past forty years. It has been a long way from uncoordinated individual surveys to creating a social science infrastructure which supports the integration and development of the European data base for comparative research, responds to the needs for training in data analysis and provides technical support in electronic networks.
The development of IASSIST, the Council of European Social Science Data Archives (CESSDA) and the International Federation of Data Organisations for the Social Sciences (IFDO) will be reflected upon in this context.
| C:1 Qualitative Data Archives: Enriching Research Possibilities
Wednesday, May 19 1600-1730 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Sheila Anderson, UK Data Archive |
Text, sound and video-tape: the future of electronic qualitative data in the global network
Louise Corti,
Centre Manager, Qualidata Archival Resource Centre, Department of Sociology, University of Essex
The Qualitative Data Archival Resource Centre based at the University of Essex now plays a significant role in preserving and sharing data from qualitative social science research in Britain. We locate and document qualitative data, arrange for their deposit in suitable public archives, disseminate information about these data and encourage their re-use.
Many of the major sponsors of social science in Britain, including government departments, have adopted or are beginning to adopt archival policies for qualitative data. The Centre acts as the focal point for providing advice to grant applicants and award holders on archival strategies for their qualitative data. This can have both methodological and cost implications, but the potential added value of preserving and sharing the data is considered to outweigh many of these possibly detrimental elements. I will discuss these issues.
Unlike the majority of Data Archives across the world, the Centre does not itself act as a data repository, although we do process new data collections to prepare them for archiving in our network of repositories.
We run an information and training service about the availability and re-use potential of qualitative research material from a wide range of social science disciplines. Whilst there is not a well-established tradition of re-using qualitative data as there is for survey data, we are beginning to see a new culture developing. I would like to consider in what format users might wish to acquire data and what they might then do with them.
We are currently advising on archiving projects in three European countries that have recently set up qualitative data resource banks. The word is spreading; now we need to consider seriously and take forward issues such as standards for collecting and preserving qualitative data. How far will other Data Archives consider acquiring qualitative data? What about sound and video data? Is it really in their interests? This year's meeting provides an excellent forum for airing this debate.
Qualitative and quantitative research strategies: towards a possible convergence?
Søren Hviid Pedersen,
Danish Data Archives
The paper will focus on two different but interrelated dimensions of social science data. First, it will be argued that research strategies which combine both qualitative and quantitative dimensions will possess more validity than alternative strategies. Social science today is experiencing a growing concern for exploring new ground in the development of different and divergent methodologies. Social science research has to accommodate the fact that reality is composed of different dimensions, with correspondingly different modes of experience apprehending this reality. The qualitative and quantitative strategies are different modes of experiencing the same reality from different perspectives. The future of social science will be characterised by a growing awareness that the hegemony of the quantitative research strategy has dissolved and that new research strategies have emerged and gained growing importance.
The second main dimension elaborated on in this paper will be that such new developments have consequences for data archives, inasmuch as the data sets which will eventually be archived will change in structure and content and will therefore affect the functions and workings of the data archive. A tentative guess would be that the data sets would combine elements from both qualitative and quantitative research strategies in the same study. The paper will set forth arguments that there are good reasons for expecting a larger share of social science data to combine both qualitative and quantitative strategies. The paper will give some description of how data archives can cope with this changing nature of data sets.
How qualitative data might be archived: from the perspective of the researcher
Dr Judy Paisley,
School of Nutrition, Ryerson Polytechnic University
Qualitative data can take a myriad of forms -- including images, sounds, words, gestures, and observations. Researchers use varied and systematic methods of analysis to capture the meanings and lived experiences of study participants. The choice of method of analysis depends on the complexity of the data, the purpose of the research, and the epistemological perspective from which the researcher approaches her/his work. Numerous approaches to the analysis of qualitative data, ranging from conversion to quantitative data to phenomenology, have emerged from varied epistemological perspectives. As is the case with quantitative data, the nature of archival qualitative data may determine their utility for secondary analysis.
Depending on the purpose of the research, qualitative data sets can vary greatly in complexity. For example, surveys frequently include open-ended questions to which respondents can offer brief responses. Simple data for each question can readily be aggregated and reported in a manner that suits the purpose of the study. However, a much more complex data set would be obtained through a series of in-depth interviews examining the experience of people living with HIV/AIDS. Such data would require the use of an advanced method of qualitative analysis, such as the grounded theory approach.
This presentation will include a demonstration of one advanced method of qualitative data analysis. Issues and benefits concerning the archiving of various types of qualitative data will also be discussed.
| C:2 Collaborative Access to Data: New Approaches to Sharing Resources
Wednesday, May 19 1600-1730 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Anastassia Khouri, McGill University |
SHERLOCK - A Web Magnifying Glass for Microdata Files
Gaetan Drolet,
Laval University, Québec
The Data Liberation Initiative (DLI) agreement between Canadian universities and Statistics Canada approved in 1996 has removed a significant obstacle to obtaining data, but without really improving the methods of consulting them. In order to make microdata files more usable, Québec university libraries have pooled their resources and expertise for the development of a common infrastructure to facilitate access and use of data.
This paper will examine the background and context of the development of Sherlock as a regional bridge to data, and its implementation.
Funded by eleven Québec university libraries, Sherlock is a web retrieval system with a bilingual query interface (French / English). The paper will describe the capabilities of Sherlock: survey inventory, description, documentation (codebooks), data transfer, extraction and analysis.
The data extraction module facilitates subsetting at the variables level. The results are delivered to users in SPSS, SAS or spreadsheet format. The availability of the results is announced to the user by an e-mail in which a clickable URL establishes a direct link to the files (subset and personalized record layout).
The analysis module gives access to basic statistical operations (mean, mode, frequency, crosstab, linear regression). The result is sent to the user by e-mail.
Sherlock is a distributed system composed of server institutions. Each server institution has the responsibility for managing specific surveys within Sherlock. The computing architecture of the system will also be presented. In conclusion, some areas for improvement will be addressed.
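The variable-level subsetting and basic statistics described above can be illustrated with a short sketch. It is not Sherlock's code; the function name, the variable names and the sample records are invented, and only the Python standard library is used.

```python
from statistics import mean, mode
from typing import Dict, List, Sequence

Record = Dict[str, object]


def subset_variables(records: List[Record], variables: Sequence[str]) -> List[Record]:
    """Keep only the requested variables from every record (variable-level subsetting)."""
    return [{v: rec[v] for v in variables} for rec in records]


# Invented microdata for illustration only.
SURVEY = [
    {"id": 1, "region": "east", "income": 30, "hours": 40},
    {"id": 2, "region": "west", "income": 50, "hours": 35},
    {"id": 3, "region": "east", "income": 40, "hours": 40},
]

# A user asks for two variables; the extract carries nothing else.
extract = subset_variables(SURVEY, ["region", "income"])

# Two of the basic operations named in the abstract: mean and mode.
mean_income = mean([rec["income"] for rec in extract])
modal_region = mode([rec["region"] for rec in extract])
print(mean_income, modal_region)
```

In a real service the extract would then be written out in the user's chosen format; that step is omitted here.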
The Social Science Dream Machine: resource discovery, analysis and delivery on the Web
Jostein Ryssevik, Assistant Director, NSD
Simon Musgrave, UK Data Archive
The paper will discuss new ways of bringing unlimited amounts of digital data resources to end users over the Web. Topics that will be covered are: methods for resource discovery across archives and data providers, integration between data and metadata, methods for on-line data analysis and on-line data delivery, and methods for linking data to electronic documents (hyperlinking to on-line data resources from on-line documents). The paper will discuss various technical solutions to these challenges with a particular focus on the NESSTAR system, which is a multi-archive resource discovery, analysis and delivery system based on the DDI-DTD, developed jointly by NSD, UKDA and DDA.
Development of a health data archive for Bangladesh - an example of a cost effective and sustainable approach to information sharing in a developing country.
Deana Leadbeter,
International Health Information Specialist, South East Institute of Public Health, UK
Gaining access to information on the health sector in Bangladesh, and in many other developing countries, can sometimes be very hard. Although a considerable amount of data is collected by government departments, NGOs and other agencies, it is not always easy to find out what information has been collected or to gain access to this information. These difficulties can reduce the potential value of the information, slow the decision-making and planning process or cause it to be based on less reliable information. With the current trend in developing countries towards involving all stakeholders in a health-sector-wide approach to policy-making, planning and programme implementation, the need for co-ordination in information gathering and access is greater than ever.
The Health Economics Unit of the Ministry of Health and Family Welfare has initiated the development of a Health Economics Data Archive for Bangladesh, which aims to address the problems of access to information for policy-makers, planners, researchers and others involved in the health sector. Amongst the aims of the project are: providing a tool for dissemination of research results, a standardised approach from which to improve methods of data collection, the development of a health data dictionary for Bangladesh, encouraging data security and fostering a culture of information sharing. Use of the Archive can also prevent duplication of research activities and encourage improved or standardised methodologies.
The needs and suggestions of the potential users and holders of an Archive were obtained through a process of workshops, seminars and consultation. The Archive itself was then started as a small entity holding the databases, and supporting documentation, for Health Economics Unit studies. The model used by the UK Economic and Social Research Council Data Archive for documentation of databases was adapted for the Bangladeshi context and used for this purpose. A user-friendly front-end screen was designed in Access 97 software, enabling searches by subject area, key word, geographical area and free text to identify databases held on the Archive. At present, it is possible to hold and use the Archive on a standard PC using Microsoft Office 97 software, thus requiring no extra capital investment in the initial development period.
The creation of an operational Archive in a short space of time and at minimal cost has allowed potential users to see the immense benefits of such a tool. The flexibility of the Archive design will allow it to expand to meet the demands of more databases and users with few technical problems. The project has taken a stepped approach and started small. The next steps will see wider dissemination, so that more databases related to the health sector will be entered on the Archive and users will expand from the Health Economics Unit to a wider audience in the GOB, donors, NGOs and research institutions. The process of institutionalisation is now beginning. Mechanisms for cost-recovery need to be addressed, protocols for information release will be developed and permanent staff will be allocated to maintain the Archive and ensure its sustainability.
| C:3 Preserving Electronic Resources: Issues and User Patterns Shaping the Future
Wednesday, May 19 1600-1730 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Ann S. Gray |
The Authenticity and Integrity of the Electronic Records
Sue Bryant,
Assistant Director of Operations, Interdepartmental PKI Task Force, Treasury Board Secretariat
As the Canadian federal government moves to deliver more and more services electronically, there is a requirement to ensure the integrity and authenticity of the records. In addition, the proposed Bill C-54 (Electronic Documents Act) will permit electronic filing of documents. To ensure the authenticity and integrity of electronic records, the government has developed a GOC Public Key Infrastructure (GOC/PKI). This presentation will describe GOC/PKI and discuss its impact on electronic records.
Statistics Supermarket: Where Do Canadian Social Scientists Shop?
Dr. Kirsti Nilsen,
Lecturer in the Graduate Program in Library and Information Science, Faculty of Information and Media Studies, University of Western Ontario
This paper describes findings from a research project which was designed to assess the effects of Canadian government information policy (e.g. cost-recovery and fiscal restraint) on data access and use. It examines the sources of statistics and data used by Canadian social scientists in five disciplines (economics, education, geography, political science, sociology) over the period 1982 through 1993. The extent of use of data from all Canadian and other provincial, national, and international statistical agencies, as well as from nongovernmental sources, is described, with particularly close examination of the use of Statistics Canada's output. Bibliometric methods were used to provide objective evidence of use of sources by those publishing in Canadian social science journals. More subjective impressions were obtained through a 1995 survey of researchers.
Through both research methods, data were gathered on all uses of statistics, and on effects of Statistics Canada price increases and format changes. It was found that there were significant disciplinary differences in data sources used. As of 1995, the move to electronic data sources had not changed use patterns. Researchers in all disciplines were still using paper sources to a greater extent than electronic ones. Although survey respondents expressed unhappiness with Statistics Canada price increases, the bibliometric findings showed that the increasing prices of data had no significant effect on extent of use over the time period covered. Unwillingness to alter habitual research patterns might cause researchers to reallocate funding in order to access the desired data.
While the expansion of data access over the Internet might have changed more recent data access patterns, it is likely that social scientists will continue to use the statistical sources that they have habitually used. This research provided baseline data on the use of the statistics supermarket; future research will examine the effects of Statistics Canada's Data Liberation Initiative and other Internet data sources.
E-journals - What's the difference between a Scientist and a Social Scientist?
Ross MacIntyre,
Senior Project Manager, Manchester Computing, University of Manchester, and Ken Eason, Professor of Cognitive Ergonomics, Department of Human Sciences, Loughborough University, U.K.
The SuperJournal project is a 3-year research project in the UK's Electronic Libraries Programme (eLib) researching the factors that will make electronic journals successful and of real value to the academic community. The objectives are to determine what readers (and authors) really want from electronic journals, and to explore the implications with publishers and libraries. It is a collaborative research project involving 16 publishers, 13 university test sites, the University of Manchester (technical development) and Loughborough University (evaluation studies). The project officially ended in December 1998, by which time the number of registered users was approximately 3,000; final data analysis and report writing activities are concluding.
The approach the project has used is briefly as follows. Firstly, baseline studies were conducted (questionnaires and focus groups) to find out how academic researchers use printed journals and their expectations for electronic journals. Then electronic versions of 49 peer-reviewed journals were made available to users in four clusters in different subject areas: Communication & Cultural Studies (CCS), Molecular Genetics & Proteins (MGP), Political Science (PS) and Materials Chemistry (MC). Usage was monitored at each site and follow-up studies (interviews, questionnaires, focus groups) were conducted with users to explain user behaviour and get their informed views on future electronic journals services.
This presentation will focus on the observed differences between the Social Scientists, using the CCS and PS clusters, and the Scientists, using the MGP and MC clusters. As an indication, some of the main findings about the differences are below; all of which will be supplemented with more data and explanation.
Users of all clusters were positive about the potential value of electronic journals, although initially the science cluster users had more experience of using electronic services and electronic journals than the users of the social science clusters. The patterns of use associated with the two social science clusters were different from those associated with the two science clusters. There was considerable similarity in the patterns of use within the two social science clusters and also within the two science clusters.
The results cover the following areas:
- Repeat Usage - the conversion rate was lower for the two social science clusters;
- Breadth - users of the social science clusters used a wider range of journals;
- Depth - users of the social science clusters got to the full text level of an article in a greater proportion of sessions than users of the science clusters;
- Search Technique - users of the social science clusters made more use of the search engines;
- User Types - the users of the social science clusters were present in greater proportions in the user types who made more frequent, more wide-ranging and deeper use of the cluster;
- Non-discriminatory measures - the clusters displayed no differences on a number of measures, e.g. frequency of use and the proportional use of current versus back issues, and there was general interest in multimedia but no pressure to include it.
A cluster analysis based on frequency, breadth and depth of use revealed 9 types of user. There was a very small group who predominantly used search; 3 kinds of non-repeat users and 5 kinds of repeat users who were predominantly browsers. The types of repeat users correlated quite closely with the two kinds of discipline. The typology relating to these observed patterns of behaviour will also be described.
RSS Working group on archiving data: achieving standards for documenting data for preservation and secondary analysis
Hilary Beedham,
Data Archive, University of Essex
In October 1998, the Royal Statistical Society established a working group to create standards for the collection and preparation of data in readiness for preservation.
The goals of the group are as follows:
- To define the extent to which materials, including questionnaires, data coding dictionaries, instructions for computations, working drafts and definitions of terms should be archived for future use.
- To establish a code of best practice for doing this.
- To suggest how data creators, custodians and users can co-operate to ensure that best practice is observed.
| D:1 Reflections on Thirty Years of Canadian National Election Surveys
Thursday, May 20 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Bob MacDermid, York University |
Speakers:
[Abstracts not yet available.]
| D:2 The Changing Nature of Metadata: Exploring Approaches
Thursday, May 20 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Ann Green, Yale University |
Metadata and Metainformation: old concepts and new challenges
Dusan Soltes, Senior Lecturer, Faculty of Management, Comenius University, Bratislava, Slovakia
Since the very beginning of computerized data processing, there has been a tendency for ever growing amounts of data to be processed and stored by computers. Probably not by accident, modern computerized data processing also used to be referred to as mass-data processing. Especially in the environment of so-called large-scale information systems, e.g., statistical ones, there was a parallel necessity to find the ways and means of managing this rapidly expanding amount of (statistical) data. Technological advancement and users' needs finally led not only to the introduction of very large databases and their distribution to database networks but also to the necessity to invent and introduce particular tools for handling the content of the data and information.
Not surprisingly, since the 1970s, it was in the area of statistical systems where such new concepts as metadata, metainformation and metainformation systems have been introduced and become objects of systematic research and development at both the national and international levels. This occurred first as a part of a cooperative network program of European statistical offices and later, in 1981-84, as an inter-country group of experts of national statistical offices under the Statistical Computing Project of the United Nations Economic Commission for Europe. The main results of this international joint work led to defining the following basic concepts: metadata as a physical representation of metainformation, and metainformation as the semantic content of metadata. Metadata is a description of (statistical) data and metainformation informs about (statistical) information. Metadata and metainformation together make up the content of a metainformation system (METIS) in the form of a metadata base.
The metadata base itself is organized into a system of mutually related metadata catalogues, dictionaries, directories and registers. Their content is created by formalized descriptions of particular objects such as, in the case of statistical data, indicators, surveys, classifications, code-lists, publications, statistical units, etc. On the basis of its metadata base, METIS is then able to fulfill several important functions regarding the object data and information: information, identification, interpretation, localization, retrieval, etc.
If we compare these concepts, elements, and functions of METIS with the challenges of the contemporary World Wide Web, and in general with information sources on contemporary "information highways", we may see that they are almost inevitable, relevant and directly applicable, especially where users worldwide have access to practically unlimited sources of various data. Under such conditions, it is sometimes almost impossible to secure any kind of proper identification, interpretation, comparability, consistency, etc. between data from very different methodological environments if there is no accompanying metadata and/or metainformation. Hence, analogous to large (statistical) information systems in the past, one can expect in the current WWW environment a major trend towards normalization, standardization, unification and finally legal requirements regarding accompanying metadata and metainformation. Sooner or later we can expect that, in addition to the existing information highway, there necessarily will be a parallel metainformation (sub)highway and/or accompanying metadata sector which will contain basic (meta-) information describing all particular data/information on the World Wide Web.
Maximizing the Search Potential of Social Science Codebooks Through the Application of the Codebook DTD
Wendy Treadwell,
Coordinator of the Machine Readable Data Center, University of Minnesota Libraries
A number of approaches to providing access to social science data have been taken in recent years. The focus of these approaches varies depending upon the scope of the collection and the primary goal of the system. For some, the ultimate goal has been a universal search engine that allows a single flexible approach to the full range of information for all types of users. Others have focused on the specialized needs of social science data users to extract and manipulate data. All are dependent upon the quality and consistency of document preparation within the collection.
The completion of the DTD for social science codebooks and the expansion of the General Record Schema (GRS2) within the Z39.50 protocol provide opportunities for improving existing search systems and developing new ones which allow for the identification of appropriate data for secondary research as well as providing access to discrete data elements.
Consistent and complete application of the DTD structure to social science codebooks, combined with the ability to map search fields from one system to another through GRS2 should be seen as a means of expanding our search options by increasing our ability to search across collections and provide access to variable level information without losing the structure of the information upon presentation to the end-user. This paper will focus on the need for consistently structured documents as a basis for the development of effective search systems. A number of selected search approaches will be examined including:
- integrated search systems (across platforms and material types)
- use of thesaurus and specialized classification controls
- integration of metadata access and database extraction or manipulation tools
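As a rough illustration of why consistently structured codebooks matter for searching, the following Python sketch looks up variable-level information in a small XML codebook. The sample codebook is invented, and the element names (`codeBook`, `stdyDscr`, `titl`, `dataDscr`, `var`, `labl`) are only loosely modelled on a codebook DTD; they should be read as illustrative assumptions, not as the DTD's actual definition. The function name `search_variables` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample codebook. Element names are illustrative assumptions,
# not a faithful copy of the social science codebook DTD.
CODEBOOK_XML = """
<codeBook>
  <stdyDscr><titl>Example Household Survey</titl></stdyDscr>
  <dataDscr>
    <var name="V1"><labl>Age of respondent</labl></var>
    <var name="V2"><labl>Household income</labl></var>
    <var name="V3"><labl>Respondent education level</labl></var>
  </dataDscr>
</codeBook>
"""


def search_variables(xml_text, term):
    """Return (study title, variable name, label) for labels containing term."""
    root = ET.fromstring(xml_text.strip())
    title = root.findtext("stdyDscr/titl", default="")
    needle = term.lower()
    hits = []
    # Because every variable sits in the same element structure, one loop
    # can search any codebook marked up in the same way.
    for var in root.iter("var"):
        label = var.findtext("labl", default="")
        if needle in label.lower():
            hits.append((title, var.get("name"), label))
    return hits


if __name__ == "__main__":
    for hit in search_variables(CODEBOOK_XML, "respondent"):
        print(hit)
```

Each hit keeps its study title alongside the variable, so the structure of the information survives presentation to the end-user; the same function could be run over many codebooks only if they share the same markup.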
Dutch Data Documentation Initiative: integrating documentation standards of a historical and social science archive
A. van Nispen,
Data Archivist, NIWI / Netherlands Historical Data Archive
The Netherlands Historical Data Archive (NHDA) and The Steinmetz Archive (STAR) are now both part of NIWI (Netherlands Institute for Scientific Information Services). The two data archives have launched a joint project, the DDDI (Dutch Data Documentation Initiative). The central aim of this project is to update and integrate the data archiving procedures and data documentation standards of the NHDA and STAR in the following sense:
- procedures and standards must be compatible with each other and with the renewed standards being developed in other historical and social science data archives.
- procedures and standards will be better suited for the documentation of a variety of data structures, including texts, images and multimedia.
- procedures and standards must be better integrated and geared towards the publication of documentation on the WWW.
- procedures and standards must take into account standards (being) developed in electronic text archives and public records offices.
Technical progress in the field of computing in the Humanities and Social Sciences is rapid. Whereas in the past the typical social science data set consisted of one or more rectangular files with coded, predominantly numerical information to be processed with a statistical software package, the variation of data structures (and hardware and software platforms) has become enormous. The data documentation strategies of the data archives were and are founded on the rectangular file. International documentation standards such as the Study Description Scheme (SDS, employed by STAR) and the Historical Dataset Description Scheme (HDDS, elaborated from the SDS, employed by NHDA) have shortcomings for the documentation of text corpora, image data banks and multimedia. The new standards and procedures must streamline the production of data documentation towards publishing on the World Wide Web.
The presentation will report on the current state of the ongoing DDDI project.
If Variables Could Talk, What Should They Be Able to Say? An Intelligent Agent Approach to Metadata
Edward Brent and G. Alan Thompson, Idea Works, Inc.
Albert F. Anderson and Lisa Neidert, Public Data Inquiries, Inc.
In an era of smart appliances, why can't we envision smart data? This paper proposes that we move away from old metaphors for social science data description to the metaphor of an active agent capable of taking the initiative to assist the user in selecting appropriate data sets and variables as well as framing problems so that they can be answered with the data. Why shouldn't we be able to "talk" to our data? Why can't we ask a variable to tell us about itself and more importantly, tell other programs such as statistical routines that use the variable, its important characteristics? The agent metaphor permits the user to issue broad queries delegating the details to the agent. Case-based reasoning permits the program to guide the user with specific appropriate examples. Machine learning permits successful queries to be added to the program's expanding knowledge base for help with future queries. This framework is discussed in relation to the PDQ-Explore system for providing rapid intelligent access to a "superfile" of 1990 U.S. census long form data.
| D:3 Thematic Archives
Thursday, May 20 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Ann Janda, Northwestern University |
Thematic Archiving: A Cross-national Collaborative Approach
Jonathan Gershuny, University of Essex; Andrew Harvey, St Mary's University, Nova Scotia
Duncan Ironmonger, University of Melbourne
This paper reports on the establishment of the Multinational Time Use Study and the Multinational Household Expenditures Study -- two collaborative research studies established by the Institute for Social and Economic Research, University of Essex; the Households Research Unit, Department of Economics, University of Melbourne; and the Time Use Research Program, Department of Economics, St Mary's University. The objective of the studies is to develop a household time and money expenditure databank available to the research community. The studies build on the Multinational Longitudinal Time Budget Archive (MLTBA) established in the mid-1980s, in collaboration with time-budget researchers in several countries, by Jonathan Gershuny, with support from the European Foundation on Living and Working Conditions, based in Ireland.
Most specialists agree welfare is an outcome of the operation of three major social institutions: the market, the state and the household. Until recently, most research has concentrated on measures of the market (income) and the state (transfers). Often this is because much of the welfare that is created at home is produced by unpaid work, which leaves no obvious cash trail. These unpaid activities do leave a trace in terms of time spent and resources consumed. By assembling information about time use and household expenditure it is possible to shed light on the processes of household production, labour supply, saving, consumption, and welfare. Clearly, research on these topics will have considerable importance in achieving both theoretical and policy objectives.
Up to the present, most research into household behaviour has been limited to analysis of a single country. Very often the most interesting research questions concern the effect of differences in institutions which only vary between countries. For example, different national labour markets have different characteristics. Most importantly, policy within a single country may change little through time, whilst there may be significant variation between the policies of different countries. Cross-national data offer significant opportunities for the analysis of the processes and policies that affect household production, household consumption, and the distribution of resources between households and household welfare. As a result, two parallel research studies on time use and household expenditures have been established collaboratively by Duncan Ironmonger (University of Melbourne), Jonathan Gershuny (University of Essex) and Andrew Harvey (St Mary's University, Nova Scotia).
This paper will trace the development of the undertaking from the early days of the Multinational Time Budget Data Archive (MTBDA) to the present. It will describe the structure of the Multinational Time Use Study (MTUS) and the Multinational Household Expenditures Study (MHES), access to micro data files in the MTUS and MHES, and the conditions of use of micro data files in the MTUS and MHES collections.
Public Opinion Archives at Queen's: Making Content Accessible
Bob Burge,
Centre for the Study of Democracy, Queen's University, Kingston, Ontario
A central component in the development of the Centre for the Study of Democracy, Queen's University, has been the establishment of a select archive of public opinion survey data. The archive is intended to serve the interests of contemporary researchers and policy analysts by providing them a means to track long-term trends in the attitudes of citizens and by making information available to help them develop new and better techniques of analysis in their ongoing research.
The Canadian content of the CSD archive currently includes: the Decima Quarterly (1980-1995); Environics Focus Canada (1978-1997); Environics Focus Ontario (1986-1997); Environics Environmental Monitor (1987-1997); CROP Adhoc Political Surveys (1977-1996); and CROP 3SC Socio-Cultural Surveys (1983-1996). These commercially fielded surveys are made available for scholarly research purposes.
Under the CIDA-funded Partners in Civil Society Program (1996-1998), in cooperation with Ukrainian partners, the CSD has mounted an archive of Ukrainian Public Opinion (1993-1997). The development of an open, accessible archive of Ukrainian public opinion is to encourage debate and dialogue on issues of social, political and economic relevance in Ukrainian society.
Ease of access to the data holdings is of paramount importance. Via the web, researchers may search the questionnaires in the database. On-line frequency distributions are also available. In collaboration with the Social Science Data Centre at Queen's, the CSD has implemented a web interface to permit researchers to perform the statistical analysis they require for teaching and research purposes. The paper and presentation will detail the CSD efforts in developing the archive and making it accessible.
A Didactic Session on the American Religion Data Archive
Roger Finke,
American Religion Data Archive, Purdue University
The American Religion Data Archive (ARDA) is an Internet-based data archive that stores and distributes quantitative data sets from the leading studies on American religion. Supported by the Lilly Endowment and housed at Purdue University, ARDA strives to preserve data files for future use, prepare the data files for immediate public use, and make the data files easily accessible to all. This didactic session will review the data files collected by ARDA, explain how they are prepared for public use, demonstrate how the site can be used for research and educational instruction, and seek feedback from session participants for future ARDA developments.
| E:1 Data and the Digital Library Movement: Where to go from Here?
Friday, May 21 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: JoAnn Dionne, University of Michigan |
The Digital Library Federation and Numeric Data
Donald J. Waters,
Director, Digital Library Federation
One of the early and primary concerns of the Digital Library Federation (DLF) has been to integrate social science data archives into the broader development of digital libraries. This presentation will briefly review the overall program of the DLF, present the results of a DLF workshop on Social Science Data Archives held in January 1999, and invite discussion about how best to conduct the work called for in the workshop.
Data and Digital Libraries: Developments in Canada
Michael Ridley,
Chief Librarian, University of Guelph
In recent years Canada has seen significant advances in digital library initiatives on an institutional, provincial and national basis. Such developments as the Canadian Initiative on Digital Libraries (CIDL) and the Canadian National Site Licensing (CNSL) project, proposed to the Canada Foundation for Innovation (CFI), illustrate national objectives. In Ontario, the Ontario Universities Digital Library Transformation Project reveals a provincial strategy based on a collaborative vision of all 17 universities. More locally, the TriUniversity Group of Libraries (University of Guelph, University of Waterloo and Wilfrid Laurier University) is an example of how specific institutions are responding to the challenges.
While in all these cases provision of access to data forms an important element of the project, the unique challenges and opportunities of data and the digital library are not always well documented or well understood. In reviewing these projects it is possible to see themes emerging that indicate how digital libraries and data will evolve in Canada. There are clearly national, provincial and local issues that must be addressed as we move forward in conjunction with more global directions.
| E:2 Data Futures: Perspectives from the Researcher and Instructor
Friday, May 21 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Richard Boily, Université du Québec à Rimouski |
Social Data in Canada, After Liberation
Michael Ornstein,
Associate Director and Senior Research Methodologist, Institute for Social Research, York University
Coupled with advances in the technologies of data dissemination, the Data Liberation Initiative provides the basis for dramatically increased use of quantitative social research in Canada. These advances, however, contend with a decentralized, differentiated and unequal distribution of interests and resources in local institutions. Only in a limited sense do social survey data speak for themselves. What is required is more systematic instruction in the practical use, conceptual understanding and limits of surveys. Further increasing the friendliness of data access will also help somewhat. Ongoing, systematic evaluation of the use, not just the distribution, of social surveys should guide these efforts.
Data Archives and Data Dissemination in Canadian Social Science: Instructor and Researcher Perspectives
Doug Baer,
Department of Sociology, University of Western Ontario
This paper discusses the strengths and limitations of data dissemination facilities and initiatives for Canadian social surveys, including those surveys fielded by Statistics Canada, surveys fielded by grant-funded academic research groups, and surveys fielded by other private (or, in some instances, government) agencies. Issues facing instructors are somewhat different from those facing researchers or graduate students, but in all instances, data librarian resources can be critical, and institutional structures under which universities support (or fail to support) a person in such a role vary widely from one institution to the next. Local resources are, however, insufficient if wider-scale structures do not exist. The Data Liberation Initiative demonstrates how publicly-funded survey research can be made available on a wide scale, subject to the limitations of data censorship associated with some key variables in most datasets. But for other types of survey research that have been undertaken in Canada, convenient access is not always possible. The implications from the standpoint of both instruction and research are discussed.
The Learning Society and the Informed Citizen - turning dreams into reality or a nightmare?
Derek Bond, Senior Lecturer, Faculty of Business and Management, University of Ulster
Moira Cullen, University of Ulster
In recent years there has been considerable hype about the growth of both the 'learning society' and the 'informed citizen'. Indeed, the new 'Fifth Framework Programme' of the European Union has these concepts as a key theme of the proposed new research programme. The implications of such a theme are many for members of IASSIST, and this paper, through reference to practical experiences, looks at some of the problems involved in trying to turn these high-minded strategies into realities.
The paper focuses on issues which have arisen in research projects and graduate and in-service teaching aimed at developing this more informed, learning society in Northern Ireland. Amongst the issues discussed will be those of:
- creating the right environment for research and learning;
- finding information - the role of meta-data;
- accessing official data and information;
- integrating data with analysis software; and
- future needs/wishes.
The aim of the paper is to try and identify some of the key factors that will help turn the dream of an informed citizen and learning society into reality rather than a nightmare and the pivotal role that data archivists and librarians have in this task.
The Availability of Victimization Data: The Impact on Canadian Research
Catherine Kaukinen,
Doctoral Candidate, Department of Sociology, University of Toronto
This paper examines the impact of the quality and availability of victimization data in Canada on the type of research conducted by Canadian researchers. Next, I outline some of the limitations and advantages for my own research agenda given the data available on Canadian victimization. This research has included examination of: the reporting of violent crime to the police and other help sources; the factors predicting self-protective weapon ownership and public perceptions of the treatment of victims by the criminal courts. Finally, I point to the benefits of large scale victimization surveys in the United States and how these have helped to address a number of questions regarding violent crime.
| E:3 Future Directions of Financial and Economic Databases
Friday, May 21 1100-1230 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Walter Piovesan, Simon Fraser University |
Data as a New Commodity
Sean Townsend,
London School of Economics
Empirical economists all use data. Today their information is sourced mostly electronically, and is more complex, comprehensive, and frequently updated than ever before. The City of London is now essentially run through a series of global economic and financial databases, some begun by technically-literate entrepreneurs in the 1980s who have seen their inventions become indispensable whilst making them millions in the process. This trade in information, the "weightless economy", has grown with a vengeance, not unlike the move to mechanisation in the nineteenth century.
This paper will explore the impact that this new "information economy" has had, and will continue to have, on academic scholars. It will focus primarily on social scientists, with attention to economists, financial researchers and micro-social analysts. The paper will suggest that future researchers may find themselves more and more reliant on value-added services, that commercialisation might hamper choice, and that there are increasing anxieties over data quality, security, ownership, and the skills needed to survive in this brave new world.
Global access to European statistical data. Re-intermediation in the context of a collaborative partnership
Michael Blakemore,
Executive Director, Economic and Social Research Council (ESRC) Resource Centre rcade (Resource Centre for Access to Data on Europe) and Professor of Geography, University of Durham, UK
The Resource Centre for Access to Data on Europe (rcade) provides online access to European statistical data from UNESCO, the International Labour Office, and the Statistical Office of the European Communities (Eurostat). The Centre was established in 1995 by the Economic and Social Research Council (ESRC) to negotiate centrally, and to work with data owners in areas of data documentation and online delivery. This relationship started as a conventional intermediary role between producer and market. As the relationships developed, in particular with Eurostat, which provides the 'official' European Union statistical data, we have worked closely with owners in assessing data quality, documenting data and preparing metadata; developing strategies for managing uncertain data publication schedules and data sparsity; informing the user community pro-actively about changes to data themes; and negotiating flat-rate data license fees for academic use. The end result for rcade is a World Wide Web site where the documentation and metadata services are placed in the public domain, and a separate Web interface to the online data which integrates the metadata with simple search and extraction facilities.
However, many data owners themselves are developing dissemination mechanisms on the Web, and there will be a proliferation of 'one-stop' shops. Some see the Web as a way of connecting directly with users, others as an income generating opportunity. Some will use it as a control mechanism in an environment where the ease with which data can flow around the globe can lead to misuse, misunderstanding, and potential loss of reputation for the owners. So the challenge for academic data resource centres is to maintain intermediation roles in the context of partnerships with data owners where intermediaries work closely on data quality, and generate user documentation in the context of a partnership rather than a supply chain.
This presentation will show how rcade's online strategy has developed in the context of changing relationships with data owners, changing data owners' priorities, and the opportunities presented by new technologies.
Measuring the New Economy
Fred Gault,
Director, Science, Innovation and Electronic Information Division, Statistics Canada
What is the ‘new economy’, how do we measure it and why? This talk will begin with discussion of the Information and Communication Technology (ICT) sector which provides the infrastructure to move electronic data and information around, of the content which is moved around electronically, and of the use of both ICT and content products by households and businesses. This will lead to discussion of definitions of the information economy and the information society, and of plans by the OECD to collect data and to develop indicators for the information society. A discussion of how such indicators will be used will follow and, finally, the information society will be related to the ‘new economy’.
| F:1 The Role of Data Librarians and Data Archivists: Is There a Future?
Friday, May 21 1400-1530 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Elizabeth Hamilton, University of New Brunswick |
Breaking Paper Barriers: The Future of Traditional Archives in the Global Network
Thomas E. Brown,
Manager, Archival Services, Center for Electronic Records, Electronic and Special Media Records Services Division, U.S. National Archives
The digital future poses many challenges for a wide range of institutions, including archival repositories filled with aging paper records. These institutions store, preserve, and provide access to that exceedingly small percentage of records of an organization which have historical or permanent value. Since most records in archives today are paper, some have somewhat disparagingly called those establishments paper museums. But that moniker will become inappropriate as archival repositories will no longer have the luxury of avoiding the computer revolution and the resulting electronic records. In confronting that revolution, the repositories will change the way they do business. The new methodologies and procedures will not necessarily be either better or worse; they will be different. Within the sea of computer-induced changes to swamp the archives, the presentation will focus on four currents. First, space will no longer be a concern, but volume will remain a problem as the sheer number of individual records increases exponentially. Secondly, appraisal decisions will have to expand to determine not just whether to keep a body of records but also in what format. In other words, appraisals will need to address whether preserving the content is sufficient or whether it is also necessary to preserve the structure. Thirdly, archives will experience a paradigm shift in preservation philosophy, as the first principle, to do no harm, will not be enough. Finally, reference services will need to confront and curtail the impending revolution of rising expectations to provide any service which can be provided.
Data Librarians: Evolution or Extinction?
Jocelyn Tipton,
Data and Electronic Services Librarian, Yale University
Library users can access data more easily than ever before: vendors are providing user-friendly interfaces to data collections, new technologies have changed the way people use data and new distribution media have improved access. Statistics have become a part of everyday life and are being included in more social science research by users at varying skill levels. Data needs are being incorporated into general reference and have become a regular part of the electronic resources provided by all library staff.
As a result of these changes, data are no longer a separate entity requiring a specially trained professional to work as an intermediary between the user and the data in all cases. The lines between the job of a data librarian and other subject specialists are now blurred. Social science subject librarians are increasingly responsible for the selection and provision of data. Training librarians in how to select, budget for and use data and their related technologies has become the new responsibility of data professionals. Thus the role of the data librarian has shifted from intermediary to coordinator. This paper will give reasons for incorporating data into the work of all of a library's social science librarians, show how data librarians can meet this challenge, and describe how it changes the current responsibilities of data professionals.
It is clear that the incorporation of data skills into every subject specialist's toolkit and the evolution of the data librarian toward a coordinating role should be encouraged; finding a smooth path to that goal is more problematic. This paper opens the discussion of how to make this a successful transition.
The Open Source Revolution and the Future of Data Libraries
Gregory Haley,
Head, Electronic Data Service, Columbia University
1998 will be remembered as the year the open source software revolution exploded. Open source is built on a model of free agents contributing effort and programming expertise without any centralized decision-making structure. The result has been the development of the software components that are the very foundations on which the Internet has been built. While open source software has been developed over the past decade, it gained widespread acceptance with the publication of Eric Raymond's seminal "The Cathedral and the Bazaar" in 1997, which provided the intellectual apologia on which Netscape based its decision to release the source code for its release 4.5. The open source philosophy is directly antithetical to the typical bureaucratic decision-making model of the library world, which all too often drives efforts in data libraries. What lessons can we as data librarians learn from the open source philosophy? The service-oriented culture on which IASSIST is built can benefit from adopting the open source model of development and exchange of ideas. This paper will contrast the open market of the free exchange of ideas with the bureaucratic decision-making model. The goal will be to start a discussion on how to incorporate more of the open source philosophy into our professional exchanges of ideas.
Information Literacy Instruction with Electronic Data Files
Mark Anderson,
Government Documents Librarian, University of Northern Colorado
Recently, federal agencies have been relying on electronic file formats for storage and distribution of the data they collect. One ramification of this is that more front-end data users are discovering how versatile electronic data files can be and how a wide variety of computer applications can be used for manipulation, analysis and presentation of data. An increasing number of patrons are requesting assistance from depository library staff in accessing data files for use in spreadsheet or database programs. Also, feedback we get from employers suggests that graduates with experience in some of these applications are viewed more positively, during interviews, than those without any such exposure. Consequently, many higher education faculty have been including this kind of experience as part of the curriculum.
During the spring and fall semesters of 1998, depository library staff were closely involved with a class called Sociology of Minorities (SOC 237) taught by Professor Dan O'Connor. One of the goals of the class is to expose the students to the vast amounts of demographic, economic, and other types of data about minority populations that have been collected by federal agencies. Students in SOC 237 are expected to be able to access and analyze some of these data on their own and to use them to support the theses of their class projects. As the final project for the semester, students are assigned to create and present to the class a speech illustrated with PowerPoint slides that employs information from their readings combined with demographic data they have found.
The contributions of library staff and resources to SOC 237 were:
- Depository staff members created a web-based teaching aid that combines links to several Internet sources of demographic information with bibliographies of printed Census reports. It also included some one-page instructional aids for using the Census Lookup feature on the Census Bureau's web site and the major functions of Microsoft Excel.
- Several weeks into the semester, library staff conducted a workshop to demonstrate how to access and use the CIS Statistical Universe and the Statistical Abstract of the United States, how to find a data file, save it and load it into Microsoft Excel, and how to perform some basic spreadsheet functions.
- Several weeks after that, when the students had begun their projects, a class period was set aside to function as a lab, in which each student worked on his or her project with library staff on hand to provide assistance.
For Spring Semester 1999, the web page has been revised, and the one-page instruction sheets are being expanded into a workbook that will lead students, step by step, through the processes of finding a file, making a data table or chart, and converting those into PowerPoint slides.
| F:2 Managing Access through Web Tools
Friday, May 21 1400-1530 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Wendy Treadwell, University of Minnesota |
Tools, Templates and Training: Using the Web for Teaching in the Social Sciences at Harvard
Paul Bergen,
Manager, Instructional Computing, FAS Computer Services, Harvard University
At the end of fall term, 1995, there were about ten course Web sites at Harvard's Faculty of Arts and Sciences. This year, for the first time, a Web site was created for every course in the curriculum. This presentation examines efforts of the Instructional Computing Group (ICG) of Harvard Arts and Sciences Computer Services to integrate the World Wide Web into teaching at Harvard, with particular emphasis on how the Web is being used in the Social Sciences. ICG's strategy involves balancing support for a few innovative, resource-intensive projects, such as customized interactive interfaces to datasets, with the development of tools for generic services for a high volume of courses. For more information, see http://icg.fas.harvard.edu/.
Data on the Web: Choices and Solutions
Paul H. Bern,
Data Services Consultant, Princeton University
This article will explore and explain the many choices one has in making data available via the web. The areas discussed will include the following. What hardware - UNIX vs. PC server - should one use to store and process data? What software can be used to run the web page, the cgi, and do the data processing? Are products such as SAS's Internet and other web tools a replacement for PERL and/or HTML? What data should or can one make available? The choice may be simple given that some data are public (e.g., census) and others are proprietary (e.g., CRSP). It is possible, however, to restrict access to certain data, and there are several ways of accomplishing this. Should one make an entire dataset available or just parts of it? Which parts? There are also several options for delivering the requested data files. Which one is best? You may need different delivery methods depending on whether the data are public or proprietary. Last, but certainly not least, I will also discuss page design from both an aesthetic and a programming point of view.
The XML Files: Developing Data UFOs
Ken Miller with Pasqualino "Titto" Assini,
UK Data Archive
XML is paving the way for the next generation of Web services. However, XML is not only an advanced substitute for HTML. It is a markup and specification language which provides an optimal mix between human and machine readability. The paper will present some ideas on how XML can be used to improve the services of data archives and libraries, drawing on experience from the use of XML in the NESSTAR project. In this project XML DTDs are used to describe datasets (the DDI-DTD), dataset dependent access conditions, user profiles and even data libraries. XML is also used for communication between various software components and as a way to configure independent software components. The presentation will pay particular attention to questions related to interoperability and the use of open standards.
Safeguarding On-line Digital Data Archives
Jostein Ryssevik, Assistant Director, Development Department, NSD
Melanie Wright, Acting Director, End User Services, UK Data Archive
Data providers, archives and libraries are increasingly making their data available for on-line analysis and delivery on the Web. Seen from the user's point of view, increased access is positive. This development, however, also raises several serious questions related to the protection of intellectual property rights, confidentiality, unauthorised use and the protection of the integrity of the data. The paper gives an overview of these challenges and discusses various solutions that can be used to provide an optimal mix between access control and openness. Among the topics that will be covered are: requirements for an access control system for digital data libraries, methods for user authentication, description of dataset dependent access conditions (by means of XML) and available access control technology.
| F:3 Emerging National Social Science Data Services
Friday, May 21 1400-1530 Place: Rogers Communication Building, Ryerson Polytechnic University Chair: Jeffrey Moon, Queen's University |
L'accès aux données en France / Accessing Data in France
Irene Fournier,
Research Engineer, CNRS (Centre National de la Recherche Scientifique), LASMAS-IdL (Laboratoire d'Analyse Secondaire et de Méthodes Appliquées à la Sociologie, Institut du Longitudinal)
A little history
Unlike in Germany, Great Britain or the United States, for example, neither the CNRS nor the universities carry out large quantitative surveys in the social sciences. In France, Insee occupies a central position in the production of large-scale statistics; it has also developed high-quality socio-economic research in-house. Alongside it, other major institutes and the statistical services of the ministries play a significant role. Apart from one or two major exceptions, change on this front has amounted only to the beginnings of a few co-productions (the Panel Lorrain, the Production domestique survey). The question of access to data is therefore vital for the CNRS and the universities.
For a long time, access to data for research was managed as best it could be through interpersonal agreements. The creation of Lasmas in 1986 (CNRS), followed by the signing of agreements with producers, in particular with Insee, made it possible to provide survey files to researchers free of charge. This provision is, however, subject to certain conditions:
- The research must be publicly funded, and the results of the work must not be subject to any reservation likely to restrict the right to publish.
- Users undertake, individually and in writing, not to allow third parties outside the CNRS legal entity to consult the contents, whether for payment or not, and to protect access to the data.
- Users undertake to provide Lasmas-IdL with an offprint of any publication based on the survey data supplied, together with a short report describing the use made of the file and any observations concerning the source used.
As part of its mission, Lasmas does the work needed to enable researchers to process suitably prepared files productively themselves. It provides researchers with individual-level data that allow them to carry out the analyses they wish, to create new indicators, and to do work that is more precise and closer to their own conception. It makes the methodological documentation available to the teams concerned, that is, all information concerning: the conditions under which the survey was carried out, the classifications used, the questionnaires, the regularly updated code dictionaries, and the limits of validity of the various items of information supplied.
The aim is also to encourage feedback from research to data producers on the problems, uses and difficulties encountered, so that future surveys take research findings into account better and more quickly. Collaboration between researchers and producers to improve surveys is growing.
Lasmas-IdL has become a resource centre on surveys for social science research and continues to develop its activities in this direction. It is now investing particularly in longitudinal data and their use. It has gradually built up a collection, moving from one-off purchases made in response to user requests to the assembly of complete series. To enable optimal use of the surveys, a training programme has been developed.
The number of users is growing steadily; economists and geographers are the most numerous, but demand from sociologists is also growing. The user laboratories or units, whether CNRS units proper or units associated with universities, are spread across the whole country: Paris, Lille, Nancy, Aix-en-Provence, Toulouse, Montpellier, Marseille, Lyon.
Creation of the Web site
Lasmas-IdL had set itself the goal of providing users with computerised documentation accessible free of charge from a Web server, since the development of the Internet could allow wider dissemination of the surveys acquired. The first information is now accessible on the IRESCO server: http://www.iresco.fr/labos/lasmas/accueil_f.htm
In the long term we wish to provide not only complete, easily accessible and easy-to-handle documentation on each of the surveys we distribute but also, in the case of regular series or repeated surveys, a historical dimension covering both methodological questions and those related to changes in questionnaires or classifications.
The first stage comprises:
- The list of accessible surveys and the conditions of access to the data set out in the agreements.
- The definition of certain major statistical categories (reference person, unemployed person, ordinary household, labour force...) or methodological notions (area sampling, Kish-selected individual...).
- A standardised description of the surveys comprising: the name of the producer, the themes of the survey, the date(s), the objectives, the scope, the population concerned, the location, the sample size, the sample design, the sampling rate, the data collection conditions, and the organisation of the questionnaire, with an indication of the nature and dates of modifications.
- Remarks, comments and notes on changes in the surveys and classifications, on methodological aspects or on any erroneous variables may be added, in an attempt to pass on the memory that comes from accumulated experience and detailed knowledge of the surveys.
- Not all the surveys distributed have been described yet.
- Information on the (paper) documentation available: questionnaires, collection instructions, coding instructions, variable dictionary, bibliography of primary analyses.
- For the FQP survey only, the variable dictionary, organised both according to the structure of the questionnaire and in alphabetical order.
In later stages, access to the various descriptions will be by keyword, so that all surveys likely to provide information on the topic sought can be identified. Questionnaires, variable dictionaries and code frequencies will be provided.
Social Network Paradigm and Electronic Data Access for Social Sciences and Social Services
Poliana Stefanescu, Ph.D.,
Department of Sociology, University of Bucharest, Romania
Inter-human communication and the transmission of information within a community are determining factors for the future evolution and organization of society. Knowledge and education rely on communication and information exchange, as well as on the quality of both sides: transmitter and recipient. The information offered on the Internet is a huge resource for social science researchers. This paper will present recent initiatives regarding social data access through educational and research networks, in connection with the central and local administration network and the civil society network.
Russian Inter-university consortium on economic/social/political/human research
Tatyana Yudina,
Russian Inter-University Social Sciences Information and Research Center, Moscow State University, Russia
The university community in Russia is working to establish the Russian Inter-university Consortium on Economic/Social/Political/Human Research. Several organizations are involved, each responsible for a particular area of activity.
The Research Computer Center of Moscow State University is creating a resource base: an electronic information center for the economic/social/political/human domain, intended for collective use. The information center will manage the Internet-accessible Information System RUSSIA, which stores, regularly updates and processes a wide range of data and documents on Russia.
New information technologies stimulate the human and social sciences, but only if an appropriate information base is available. Currently the research community in Russia is suffering from an undeveloped information infrastructure, an almost complete lack of public domain electronic resources available on a permanent basis, and poor funding for obtaining scientific literature and other publications. This situation seriously diminishes the research potential and the level of education in the human and social sciences in Russian universities.
Data management issues in the third world
Kristin Fox,
Derek Gordon Data Bank, Institute of Social and Economic Research, University of the West Indies
This paper will continue to explore issues related to data collection and access to quantitative data for secondary analysis, as well as data management and data preservation in the Caribbean/Jamaica. Special emphasis will be placed on data service issues in an environment where computer technology and computer expertise are not ubiquitous. Limitations on the use of data imposed by lack of equipment, especially in academia, will be considered, as will the special user needs these limitations engender. The speaker will focus on academic research, primarily with health sciences data in Jamaica, in the context of the Derek Gordon Data Bank, a Canada/UWI sponsored project.