LHorton 2's blog

IASSIST Call for Event Sponsorship Proposals 2017 Round 2: “Mini Grants”

The IASSIST Liaison and Organizational Sponsorship Task Force is seeking proposals for sponsorships of regional or local events during calendar year 2017. In this second round of sponsorships, we will be awarding up to four grants of $500 USD each, but requests for any amount up to $500 USD will be considered.

The goal of these sponsorships is to support local networks of data professionals and data-related activities across the globe, in order to support IASSISTers' activities throughout the year and increase awareness of the value of IASSIST membership.

Events should be a gathering of data professionals from multiple institutions and may vary in size and scope (workshops, symposia, conferences, etc.). These may be established events or new endeavors. We are particularly looking to sponsor regional or local events that will attract data professionals who would benefit from IASSIST membership but may not always be able to travel to IASSIST conferences. Preference will be given to events from geographic areas outside traditional IASSIST conference locations (North America and Western Europe), and from underrepresented membership areas such as Latin/South America, Africa, Asia/Pacific, and Eastern Europe.

Requests for sponsorships may be monetary, and may also include a request for mentorship assistance by matching the event planning committee with an experienced IASSIST member with relevant expertise (e.g., conference planning, subject/content, geographic familiarity).

Accepted events will be required to designate an active IASSIST member as the liaison. Generally, this would be an IASSIST member who will be attending the event and who, although it is not required, may be on the planning committee or otherwise contributing to the event. The liaison will be responsible for helping to coordinate logistics related to the sponsorship, ensuring that the sponsorship is recognized at the event, and contributing a post about the event to the IASSIST iBlog.

Proposals should include:

  • Name of the event and event details (date, location, any other pertinent information)
  • Organizing or hosting institution
  • Description of event and how it relates to IASSIST goals and communities
  • Specific request for sponsorship: amount of money and/or mentorship assistance
  • Description of how the sponsorship will be used
  • Name and contact information of person submitting proposal and designated event liaison to IASSIST (if different)

Proposals are due on Friday, June 30, 2017 via the Application Form. Notification of sponsorship awards will be by July 21, 2017. The number and monetary extent of awarded sponsorships will depend on the number and quality of applications received, within a total budgeted limit. Individual sponsorship requests may range from $0 USD (request for mentorship only) to $500 USD.

Please direct questions to Jen Doty, IASSIST Membership Chair (jennifer.doty@emory.edu).

#IDCC17: Notes from the International Digital Curation Conference 2017

For the third time, IASSIST sponsored the International Digital Curation Conference, this time enabling three students, one each from Switzerland, Korea, and Canada, to attend the conference, which titled itself "Upstream, Downstream: embedding digital curation workflows for data science, scholarship and society".

Data science was a strong theme of the three keynote presentations, in particular how curation and data management are active, integrated, ongoing parts of analysis rather than a passive epilogue to research.

Maria Wolters talked about how missing data can provide research insights when we analyse patterns of absence and, counter-intuitively, how the concept of managed forgetting can improve the quality of datasets: by asking whether data are important to preserve and relevant at the moment, we can better manage and find data. Alice Daish showed her work as a data scientist at the British Museum, with the goal of enabling data-informed decision-making. This involved identifying data "silos" and "wrangling" data into exportable formats, along with zealous use and promotion of R, but also thinking about the way data is communicated to management. Chris Williams demonstrated how the Alan Turing Institute handles data mining. He reported that about 80 percent of work on data mining involves understanding and preparing data. This ranges from understanding formats and running descriptive statistics to look for outliers and anomalies, to cleaning untidy and inconsistent metadata and coding. The aim is to automate as much of this as possible through the Automatic Statistician project.

In a session on data policies, University of Toronto's Dylanne Dearborn and Leanne Trimble showed how libraries can use creative thinking to match publication patterns against journal data policies when providing support. Fieke Schoots outlined the approach at Leiden, which includes a requirement for PhD candidates to state the location of their research data before their defence can take place, and twenty-year retention for Data Management Plans. Switching to journals, Ian Hrynaszkiewicz talked about the work Springer Nature has done to standardise journal data policies into one of four types, allied with support for authors and editors on policy identification and implementation.

Ruth Geraghty dealt with ethical challenges in retro-fitting a data set for sharing. She introduced the Children’s Research Network for Ireland and Northern Ireland. The work involved attempting to obtain consent from participants for sharing, but also anonymising the data to enable sharing. Although a problematic and resource-intensive endeavour, the result is not only a reusable data set but informed guidance for other projects on archiving and sharing. Niamh Moore has long experience of archiving her research and focused on another legacy archive: the Clayoquot Lives oral history project. Niamh is using Omeka as a sharing platform because it gives the researcher control over how the data can be presented for reuse. For example, Omeka has the capacity to create exhibits that showcase themes.

Community is important in both curation and management. Marta Teperek and Rosie Higman introduced work at Cambridge on collaborative communities and data champions. Finding that a top-down compliance approach was not working, Cambridge moved to a bottom-up engagement style, bringing researchers into decision-making on policies and support. Data champions are a new approach to seeding advocates and trainers around the university as local contact points, based on a community of practice model. The rewards of this approach are potentially rich, but the costs of setting up and managing it are high and the behaviour of the community is not always controllable. Two presentations on community/citizen science from Andrea Copeland and Peter Darch also touched on the theme of controlling groups in curating data. The Galaxy Zoo project found there were lessons to learn about the behaviour of volunteers, particularly the negative impact of a "league table" credit system on retaining contributors, and how volunteers who were expected only to contribute classifications were in some cases doing data science work by noticing unusual objects.

A topic of relevance to social-science-focused curation is sensitive data. Debra Hiom introduced the University of Bristol's method of providing safe access to sensitive data. Once again, it is resource intensive, requiring committee classification of data into levels of access and process reviews to ensure applications are genuine. However, the result is that data that cannot be open can be shared responsibly. Sebastian Karcher from the Qualitative Data Archive spoke about managing sensitive data in the cloud, a task further complicated by the lack of a federal data protection law in the United States. Elizabeth Hull (Dryad) presented on developing an ethical framework for curating social media data. A common perception is that social media posts are fair use if made public. However, from an ethical perspective, posters may not understand that their "data" is being collected for research purposes, and users need to know that using @ or # on Twitter means they are inviting involvement and sharing in wider discussions. Hull offered a "STEP" approach as a way to deal with social media data, balancing the benefit of preservation and sharing against the risk of harm and reasonable consent from research subjects.

Notes from the second Jisc Research Data Network event

Jisc held their second Research Data Network event in Cambridge. I went along to take notes.

Danny Kingsley gave an overview of why data sharing is important, which was useful as an introduction for those new to this, and as a refresher of first principles for the more experienced.

The day then moved into parallel sessions on aspects of the network's activity.

The Research Data Shared Service is an initiative to help institutions with RDM infrastructure. Jisc research suggests the priority for universities is addressing the digital preservation gap. Consequently, Jisc are looking at providing data repository and long-term preservation services, as well as considering how a service could integrate with existing CRIS systems and repositories. This will take place in a "University of Jisc" that provides a testing environment using research data.

Jisc are developing templates and guidance for publishers on creating a research data policy, which publishers can then adapt to their journals. They are working with Springer Nature, who are trying to fit their 3,000 journals into one of four types of data policy, ranging from encouraged to mandatory sharing and availability criteria.

Cambridge's Research Data support service provided insight into engaging researchers in research data management. Their initial compliance message was not working, so they switched to a positive benefits message. This is underpinned by "adequate provisions": online information, consultancies, reviewing data management plans, and training sessions. They also invest resources in advocacy and outreach, including a "democratic" approach involving researchers in shaping the service and policies.

Jisc are developing a "core" metadata profile for research data. The profile is based on focus group testing and integration with existing standards. The aim is to encourage better-quality metadata submissions from researchers, with "gold, silver, and bronze" thresholds.

The final session introduced Jisc's template business case for RDM support. This is intended to allow institutions to adapt a structured case for supporting RDM services that can be presented to university management. The case covers the economic benefits of data sharing and preservation, along with institutional and researcher benefits, with a focus on numbers. My particular favourite: UK universities hold an estimated 450 petabytes of research data. The case will be available this autumn.

Should you have further interest in their activities, Jisc have a Research Data Network website and presentations from the day are also available.

Feel the Berg! IASSIST 2016

The conference began with a reception hosted by the Mayor of Bergen, a beautifully performed Norwegian folk song, and dissent over the conference hashtag (it was #iassist16).

The next morning, the data talk began with Gudmund Hernes. His plenary theme was that data availability, the latest “revolution”, is, as revolutions always have, causing a shift in power. The role of IASSISTers and data archives should be to “keep the record straight”.

UK Data Service Director Matthew Woollard’s plenary offered a similar theme of adjustment to a changed data world. In sum, a data revolution is only mature when much of the data created as part of it is reusable. Therefore, we need to enhance trust between creators and participants, and advocate data quality rather than quantity. Look for a future IASSIST Quarterly article based on his plenary.

The theme of quality and reproducibility was captured in presentations by Christian (Odum) on data verification, which found reproducibility to be a resource-intensive activity, with 92 percent of manuscripts submitted to Odum requiring resubmission. Arguillas (Cornell) demonstrated R2 at CISER, which runs replications. Their job is not to find errors on behalf of researchers but to check replication values; so if a replicated value is off by even a fraction of a decimal, the study is not considered replicated. Again, it is a time-intensive process, so Arguillas advised researchers to “curate as you code and code with reuse in mind”. Brown (Cornell) talked about the CED2AR metadata repository, which works primarily with those accessing or wishing to access restricted data files. Peer introduced Yale’s new curation tool. Curation for quality and reproducibility, she argued, will become routine when research data policies and culture mature to recognise curation and sharing, and when tools that capture the entire workflow become embedded in the research process.

Highlights in other concurrent sessions I attended included Strategies for Discussing and Communicating Data Services, where Herndon (Duke) and O’Reilly (Emory) emphasised how expectations of transparency and sharing have changed. Meanwhile, Terrence Bennett (The College of New Jersey) killed off his co-author in the name of data sharing to show how negative messages have more impact.

“Teaching data” themed sessions included a systematic “scaffolding” approach from Sapp Nelson (Purdue) on helping learners move across data management domains over time. Hofelich Mohr (Minnesota) and Motes (Surrey) demonstrated teaching activities at the University of Minnesota, which included targeting courses with a research methods component. The results are positive, but the resource costs are high and the need to be flexible is critical. Abbaspour (Lewis & Clark) presented lessons learnt from teaching undergraduate students about data, including the lesson that, when it comes to licences, undergraduate students have limited tolerance for a world that deals in shades of grey rather than simple black and white.

Simpson and Wiltshire (UK Data Service) gave presentations on supporting students and researchers either in using data for dissertations or in using specific datasets. Scott’s (UK Data Service) audience was a little different: researchers applying to use sensitive data. One positive thing to see here is how sensitive data holding organisations in the UK are collaborating on this training.

A sizable cohort from the UK Administrative Data Research Network presented at IASSIST on how this service is tackling access to data and its responsible reuse. Presentations from Knight and Greci provide examples. Continuing the theme of responsible reuse, Segadal (NSD) outlined the incoming European Union regulation on general data protection and how it will affect researchers.

The international in IASSIST was demonstrated in a session on Research Data Management Services, with speakers from Denmark, Canada, and India. Fink and Olesen (DDA) presented the role the Danish National Archive will play in supporting Data Management Planning. Mowers (Ottawa) presented a range of research on data management and sharing practices that will inform support. Gunjal (NIT Rourkela) presented on RDM challenges in India, using his institution as a case study, and on initiatives to build Indian data infrastructure.

A couple of librarian-orientated sessions provided insights. Solis (NYU) offered findings from her research into economics graduate students' data-seeking behaviour, finding a lot of intuitive, independent data gathering activity and, worryingly, a “liberal” attitude to sharing licensed data and a perturbing belief that if something is online it will always be online. Hogenboom (University of Illinois at Urbana-Champaign) talked about building small dataset collections, basing decisions on licence terms, quality, money available, potential future use of the data, and who is requesting it. Blake (Michigan) talked about a data grants programme they ran, which saw researchers competitively apply for data resources. Nobel (ICPSR) presented on their content curation activity, deciding which studies submitted to Open ICPSR to curate on the basis of methodological rigour, reputation, high-priority data, and data and documentation quality.

The other data librarian session was built around a new book edited by Kellam (UNCG) and Thompson (Windsor) and featured a panel of IASSISTers. No spoilers. Go and buy the book. But it was interesting to hear how people found themselves in data librarian positions, the different aspects of the role, and, critically, the wide-ranging and (unrealistic?) expectations under which data librarian positions are advertised.

A closing mention goes to this year’s conference paper winners: Lafferty Hess and Christian (Odum) for their paper "More Data, Less Process: The Applicability of MPLP to Research Data", in which they ask what the “golden minimum” is for archiving digital data.

Finally, IASSIST recognised Libbie Stephenson and Ann Green with achievement awards for too many accomplishments to cover in this blog post.

The conference closed with a little less polished singing than the reception featured, hashtag wars resolved, and the IASSIST banner packed and headed for #iassist17 in Lawrence, Kansas.

#iassist16 tweets are Storified (including #iassist2016 tweets).

IASSIST 2015: Blog Post from a Data Librarian in Minneapolis

“Hey Charlie I'm pregnant and living on 9th Street”. Wait. I don’t know anyone called Charlie. I’m not pregnant and this isn’t 9th Street. I’m living in a dorm room at University of Minnesota contemplating how I managed to end up back in dorm living before succumbing to assisted living. The reason? IASSIST 2015.

What follows is my take on this Aquarian Explosion: 3 Days of Data & Music.

By the time we got to Minnesota we were a couple of hundred strong. Stardust, golden, and superbly organised by the Minnesota Population Center (MPC), who managed to book a little-remembered British R&B combo called the Rolling Stones to perform during the conference.

MPC can be faulted only for their failure to prevent a thunderstorm on Wednesday afternoon.

Lynda Kellam and Sam Spencer did a great job managing the conference programme, as did the workshop, poster, and Pecha Kucha coordinators, giving IASSIST 2015 a legitimate claim to be the best ever.

Plenary sessions

The conference, entitled “Bridging the data divide”, was orientated around three challenging plenary sessions, which covered the destruction or construction of metaphorical bridges between data creators and users.

Steven Ruggles (MPC) outlined the downfall of the United States Census from the world’s leading innovator in data gathering, analysis, and dissemination to one hampered by policies of contracting out government services.

Curtiss Cobb from Facebook presented a view we rarely get at academic conferences: a commercial company that needs and uses data and wasn’t actually trying to sell its creation at the conference (no need really, as the person in front of me spent an hour using Mr Cobb’s product regardless). Whatever your view of that company, or speculations on the motives behind its stated aims, its needs embrace IASSIST’s organisational goals of supporting high-quality, meaningful data.

Andrew Johnson, Minneapolis city councillor and assuredly not the 17th President of the United States, recounted his campaign platform of using open data in government: another bridge built, and one I hope connects governments to electorates and, ultimately, to better governance. Cllr. Johnson’s session also revealed a set of cultural challenges familiar to anyone who’s interviewed researchers on data sharing: “[It] will be used to make us look bad”, “people could do anything with it”, and one I haven’t yet seen in data sharing excuses bingo: “Geeks will have an unfair advantage”.

Concurrent sessions

My first session produced three good presentations on RDM services.

Jungwon Yan’s research at the University of Michigan indicated that knowledge of RDM may vary across disciplines and that a stakeholder analysis may be helpful in understanding the kind of RDM service needed.

Mayu Ishida (University of Manitoba) and Sarah Williams (University of Illinois at Urbana-Champaign) claimed libraries are responding to funding agencies' data mandates and developing research data services to cover different types of data, domains, and needs.

Two Amies, Neeser and West (University of Minnesota), ended on a positive note for those of us struggling to deliver RDM support: it takes a long time, no one else is better/faster/more, and there is no “done”.

Kelly Chatain (ICPSR) began the session on “Integrating Principles, Practices, and Programs to support Research Data Management” by mentioning outreach to build goodwill.

Lizzy Rolando (Georgia Tech) highlighted the distinctions between data services and archives, which have implications for service provision.

Bethany Anderson (University of Illinois at Urbana-Champaign) emphasised the importance of documentation for reuse, reproducibility, and replicability, urging us to take a whole-lifecycle view and to think of preserving scientific memory, since without context data has no historical value.

Session C3 on “Data Sharing Behaviour and Policy” featured your friend and humble narrator going on about UK Higher Education Institution Research Data Policies.

After the audience had recovered, Amy Pienta (ICPSR) presented on the differences in data sharing attitudes between disciplines, even where the data offer no apparent explanation for those differences.

Alexia Katsinidou (GESIS – Leibniz Institute for the Social Sciences) offered preliminary survey analysis on non-compliance in data sharing that suggests surprising, counter-intuitive reasons for not sharing.

Session D1 featured “Data Professionals”.

IASSIST 2015 fellow Adetoun Oyelude (University of Ibadan) talked about her interviews with data specialists in Nigeria and the considerable financial and working culture challenges they face in doing their jobs.

A. Michelle Edwards (Cornell) mapped the data lifecycle we all know and love into an approach for starting a new job.

The session then ended with Line Pouchard (Purdue) outlining differences between the regulatory environments in the United States and the United Kingdom for video feeds in the CAM2 project, stating that existing regulations were written before the arrival of “Big Data” and consequently make sharing difficult.

The session “Restricted-Use Data Support in Academic Libraries” found a “catalogue” (suggestions for a better collective noun are welcomed) of US-based librarians speaking about attempts to facilitate sensitive data access at their institutions.

It seems this is often on the basis of the librarian having prior knowledge and experience in these areas.

The reasons a secure data room was requested are essentially that a) graduate students do not have their own space in which to work with sensitive data, and b) suppliers request that data only be provided where a commensurate level of security is in place.

Researchers need help with restricted data: facilities to ensure data security; a professional to mediate applying for, receiving, and handling data; and advice on complying with restricted data controls.

The final session I attended featured librarians working in the Data Management Plans as A Research Tool (DART) project.

This project uses NSF and NIH DMPs as a means to develop research data services at academic libraries through a standardised review process.

The findings are that DMPs are getting better over time, but there is a need for better, clearer “boilerplate” language to manage researcher expectations and halt their misinterpretation of what data services can offer.

Pecha Kucha

Doing the Pecha Kucha session justice in this blog post is impossible for a writer of my ability, and for someone conscious of an already lengthy word count. You had to be there for the experience, as IASSISTers unleashed their comedic and creative talents in six-minute, 40-second takes on a range of data (and wine) related topics.

Thanks to this year’s session, attendees are now aware of what it takes to draw an owl.

Poster session

The poster session was also full of good presentations. A few I singled out for relevance to me included the University of Toronto on RDM training, the UK’s new Administrative Data Research Network, and the simple but effective idea of collecting RDM stories.

Workshops

I’m sure they were great. I just didn’t go to one.

And finally…

Amy West did the data viz job in capturing #iassist15 tweets, while Kristin Briney’s session notes became a work of art*.

Slowly, surely, presentations will start to appear on the conference or IASSIST website. And of course in the end there was a song.

What’s next?

Next year we move the show to Bergen, Norway. Oil, fish, Black Metal, and data. Join us!

* Briney, Kristin (2015): IASSIST 2015 - Whole Notebook of Sketchnotes. figshare. http://dx.doi.org/10.6084/m9.figshare.1439792 Retrieved 10:55, Jun 11, 2015 (GMT)

"Before anything else, preparation is the key to success." Notes from RDMF13: Preparing Data for Deposit

The Digital Curation Centre’s most recent Research Data Management Forum took place last week in London.

UK Data Service’s Louise Corti began the day with an overview of their acquisitions process. The Service (under various names) is almost 50 years old, which gives it experience and perspective many institutions do not have. Lessons from those years include the importance of a collections development policy that is allowed to evolve. The Archive evaluates potential acquisitions on the basis of their value for teaching and for re-use in validation and replication. They have learnt from past mistakes and now keep access licences to three options: open, safeguarded (requiring registration), and controlled (locked-down access). Common problems persist, however: poor file names, weak description of methods and contextual documentation, limited metadata, and unexplained missing data files. The UK Data Service plays a number of roles as a data service, from hand-holder and evangelical preacher to the Economic and Social Research Council’s police officer for non-compliance on data sharing.

Suzanne Embury made a valuable point in her presentation. Of course, the one thing we know is that we don’t know how other people will re-use data in the future. But we can reasonably guess that what they will want to do is discover, integrate, and aggregate it. To this end, simple things can help: check spellings, aim for standardised vocabularies, avoid acronyms. Finally, apply a domain expert test to see whether people in the discipline can independently understand the data. With that, echoes of Gary King’s replication standard came to mind.

A presentation on meeting the RDM challenge focused on Loughborough University, which has adopted a data preservation and sharing solution based on figshare and Arkivum support. Loughborough aims to make depositing data as easy as possible for researchers by taking care of as much of the back-end work as possible. But at what cost, in both finances and quality? At the last IASSIST we learnt RDM takes a village, but Loughborough acknowledged the contribution of 61 people in setting up their service, so maybe it really takes a small metropolitan statistical area.

IASSIST’s own web editor Robin Rice directed us through data deposit at the University of Edinburgh, guided by former IASSIST president Peter Burnhill’s refrain of "helping researchers to do the right thing". Edinburgh provides support throughout the data lifecycle with strong training resources (Research Data MANTRA), plus face-to-face sessions on managing data, creating DMPs, good practice, handling data in SPSS, and working with personal and sensitive research data. Like the UK Data Service, they recognise the value of keeping things simple and offering good incentives. Licence options are one example: their repository only accepts open data (CC-BY 4.0), and depositing requires only five metadata fields. In return, depositors get their data available quickly, with open download stats for every item.

The afternoon sessions split into three discussion groups. Emerging from them were thoughts on keeping metadata requirements as simple as possible, while recognising the need to concentrate on different aspects depending on the discipline; some disciplines require precision while others do not require so much. There was an acknowledgement that data discovery is often undertaken through Google. Also, while there is inevitably a range of people providing a service, there needs to be a person connecting existing resources in a university. Finally, raising awareness is a problem, as demand is related to institutional awareness.

Presentations from the event are available from the DCC, and tweets can be found under the hashtag #rdmf13. The DCC will be blogging about the discussion group sessions.