Shedding our skins? Reflections on a liaison librarian's attempt to learn how to scrape data from the web using Python

Presenter 1
Jeremy Darrington
Princeton University

In recent years, there has been a sharp increase in the amount of data being created and exposed on the web. Many useful data sources, such as campaign finance and lobbying records, speeches, court rulings, legislative bills, news, and geocoded event data, are of interest to social scientists and the librarians who assist them. These sources come in a variety of formats and often require considerable work to extract, organize, and clean up for analysis. As more of my clientele have expressed interest in using these kinds of sources, I decided to embark on a personal training effort to learn how to scrape and process text from the web using the Python programming language. This paper will present reflections on my experience, the utility of this kind of training for liaison librarians, and issues surrounding professional development and the acquisition of new skills by librarians.
