Web Curator Tool



One of our innovative tools is the Web Curator Tool (WCT). The WCT is used for acquiring web material such as websites, web pages, and other documents on the internet.

Web Harvesting and WCT

The National Library of New Zealand runs a selective web harvesting programme using the Web Curator Tool. The tool enables a user to enter descriptive and administrative metadata for a website, schedule and run a web crawl on that site, and review the archived content. Harvested websites are deposited into the Library’s digital archive (Rosetta), where they are stored and preserved, and are then available to researchers via our main delivery channels.

[Figure: Web harvest into preservation system]

The Web Curator Tool was developed in 2006 as a collaborative effort by the National Library of New Zealand and the British Library, initiated by the International Internet Preservation Consortium (IIPC). The WCT is written in Java and designed to run in Apache Tomcat. It has a flexible architecture that allows its components to be distributed over multiple servers. The WCT was released as open-source software under the terms of the Apache License and can be downloaded from GitHub: http://dia-nz.github.io/webcurator/

Version 2.0 of the Web Curator Tool (WCT) was released in December 2018. This release is the product of a collaborative development effort, started in late 2017, between the National Library of New Zealand (NLNZ) and the National Library of the Netherlands (KB-NL).

The Web Curator Tool is designed for use in libraries and other collecting organisations. It supports collection by non-technical users while still allowing complete control of the web harvesting process. The tool supports:

  • Harvest authorisation - obtaining permission to harvest web material and make it accessible;
  • Selection, scoping and scheduling - deciding what to harvest, how, and when;
  • Basic description - adding unqualified Dublin Core metadata and web-specific notes;
  • Harvesting - downloading the selected material from the internet;
  • Quality review - ensuring the harvested material is ready to archive; and
  • Archiving - submitting harvest results to a digital archive.
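
As an illustration of the "basic description" step, the sketch below builds a small unqualified Dublin Core record in Python. It is not taken from the WCT itself: the `build_dc_record` helper, the `record` wrapper element and the sample values are assumptions made for this example, and the way the WCT stores its metadata internally may differ. "Unqualified" here means that only the fifteen core Dublin Core elements are used, without refinements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen unqualified Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_dc_record(fields: dict) -> ET.Element:
    """Build a record element (hypothetical wrapper) holding one
    Dublin Core child element per entry in `fields`."""
    record = ET.Element("record")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Sample values are placeholders, not real harvest metadata.
record = build_dc_record({
    "title": "Example Site",
    "identifier": "https://example.org/",
    "description": "Home page of an example organisation.",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Rejecting names outside the fifteen core elements keeps the record unqualified; a curator's web-specific notes would be held separately from these fields.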

Archiving to Rosetta

When archiving a harvest to Rosetta, the following steps take place:

  1. The WCT packages the web harvest in a SIP (Submission Information Package) structure, along with a METS XML file.
  2. It authenticates with Rosetta via the PDS login to obtain a PDS handle.
  3. It transfers the files to a secure location via FTP.
  4. It makes a deposit web service call to Rosetta to submit the SIP, passing the FTP folder.
  5. On success, Rosetta returns the SIP ID.
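
The order of these steps can be sketched in a few lines of Python. This is only an outline under stated assumptions, not the WCT's actual implementation (the WCT is written in Java): the `Sip` class, the `archive_harvest` function and the three stand-in functions are hypothetical names, and the handle, folder and SIP ID values are placeholders. The login, transfer and deposit operations are passed in as functions so the sketch runs without contacting Rosetta or an FTP server.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sip:
    """Hypothetical container for a packaged harvest (step 1)."""
    harvest_files: List[str]  # archived web content files
    mets_xml: str             # METS XML describing the package


def archive_harvest(
    sip: Sip,
    login: Callable[[], str],
    transfer: Callable[[Sip], str],
    deposit: Callable[[str, str], str],
) -> str:
    """Run steps 2 to 5 in order and return the SIP ID."""
    pds_handle = login()                      # step 2: obtain a PDS handle
    ftp_folder = transfer(sip)                # step 3: copy files via FTP
    sip_id = deposit(pds_handle, ftp_folder)  # steps 4 and 5: deposit, get ID
    return sip_id


# Stand-in implementations (assumptions) that record the call order.
calls: List[str] = []


def fake_login() -> str:
    calls.append("login")
    return "pds-handle-123"


def fake_transfer(sip: Sip) -> str:
    calls.append("transfer")
    return "/deposits/harvest-001"


def fake_deposit(pds_handle: str, ftp_folder: str) -> str:
    calls.append("deposit")
    return "SIP-42"


sip = Sip(harvest_files=["example.warc"], mets_xml="<mets/>")
result = archive_harvest(sip, fake_login, fake_transfer, fake_deposit)
print(result)
```

Each step depends on the output of the one before it: the deposit call needs both the PDS handle from the login and the folder the files were transferred to, which is why the sequence cannot be reordered.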



WCT user manuals

Before using the tool, we recommend you read the manual which you can download from:


