This post is part of a series on web archiving at the UBC Library. For all posts about web archiving, please see https://digitize.library.ubc.ca/tag/web-archiving/
Every term, the Digital Initiatives unit at UBC Library offers students the opportunity to complete a Professional Experience project in web archiving. During the summer of 2018, I had the chance to work with the unit on its web archiving initiatives.
The Professional Experience was great in many ways. I had the opportunity to learn about web archiving, from understanding its importance to performing quality assurance on crawled web pages. During the term, I focused on creating a web archiving collection of sites related to marijuana legalization in Canada, and more specifically in British Columbia. The collection will give people access to web content created around legalization, such as awareness campaigns, and let them see different perspectives on the topic.
Why web archiving?
Developing a working knowledge of web archiving seemed like a great opportunity. New data and content are posted to the internet every day, and a large portion of it is information with a short life cycle, such as social media posts and news.
Along with the massive production of information, there is also information loss. Websites, web pages, and links simply stop working, because someone decided that the information was no longer useful or because the content was moved from one site to another. Web archiving is one way to prevent this loss. Archiving websites and web pages is a process that involves information curation, copyright considerations, crawling the web, quality assurance, and a lot of troubleshooting.
Web archiving initiatives are growing. Not only are academic and public libraries investing in web archiving, but companies and cities are as well. Libraries have created collections that serve different purposes, such as education and institutional memory, and that preserve information on specific topics like elections, natural disasters, landmark laws, politics, and more.
Companies that use web archiving services tend to do so for two main reasons: competitiveness and litigation. For example, a company may want to avoid or initiate legal proceedings based on what is published on the internet, or to save statements and information released about a competitor.
As a future librarian, I see several opportunities in web archiving because of the profile of our profession. In general, librarians are experts in monitoring information, content curation, metadata, users’ needs, copyright, and technology. These are some of the skills and knowledge needed to work with web archiving in the contexts mentioned above.
In turn, web archives are a useful tool for librarians as they can help in many ways, for example:
- Reducing the amount of work needed to update broken links on Research Guides
- Making it easier to find replacements for resources that are no longer available
- Ensuring access to great resources on the web, without worrying about whether they will remain available
- Recording information that is easily lost, like social media posts and news
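As a rough illustration of the first point, a broken link can be checked against a public web archive before it is replaced by hand. The sketch below assumes the Internet Archive's Wayback Machine availability endpoint and the JSON shape it returns. The function names are hypothetical and chosen only for this example, and a collection hosted elsewhere (for example on Archive-It) may need a different lookup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the archived URL from a decoded response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timestamp=None):
    """Ask the archive for the snapshot closest to the requested date."""
    with urlopen(build_query(url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))


# Example call (requires network access):
# find_archived_copy("example.com", "20180801")
```

If a snapshot exists, the returned address could stand in for the dead link on a Research Guide. If nothing comes back, the page was probably never captured, and a new resource has to be found the usual way.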
While web archiving is full of opportunities, it is also full of challenges. The main ones are:
- Performing quality assurance (QA): web crawlers sometimes have trouble collecting information from websites, for example those with interactive content. Figuring out how to scope a crawl and define its rules so that archived pages display properly can be challenging.
- Finding a balance between archiving content and data loads: choosing the right scope and rules helps, but it is not everything. Deciding how much data to save (meaning how much will be invested) against how much of a website or web page should be archived is another challenge.
A Professional Experience in web archiving is an excellent opportunity to learn about the topic, gain hands-on experience, work with professionals in the field, and strengthen your resume. The position will enable you not only to learn about web archiving, but also to exercise and improve your skills in time and project management, reporting, and working autonomously.
If you are interested in learning more about web archiving, check out these resources:
- UBC’s web archiving collections
- UBC’s web archiving policies and project proposal
- Internet Archive: Wayback Machine
- Interview with Sylvie Rollason-Cass, web archivist at Archive-It
Written by Paula Arasaki, MLIS student at UBC