World’s Cleverest Website Archiving Tool Was Supported by Technologies and Brainpower from Redwerk

PageFreezer (Vancouver, Canada) is an industrial-strength, web-based service for managing, archiving, retaining, and replaying dynamic web content and social media.

Product Development

This is one of those projects where the Redwerk team implemented modules and special features from the ground up. As a full-cycle development agency, we ensure quality at every stage of product development and deliver applications that are fully ready for launch.

Data Mining

We can code automatic processing of websites and social network APIs, large-scale scraping of their data, and rendering of archived websites back to users.

PageFreezer is the name of a technology start-up and also a web service which archives websites in a convenient and easy-to-use way, according to flexible schedules defined by the user. Any website, blog, or even Facebook and Twitter profiles, can be preserved for “future generations” in an interactive way, going much further than common screenshots.

This is a useful service for regulatory compliance, litigation protection, or marketing purposes. PageFreezer is an enterprise-class SaaS solution that supports even the most complex websites and is convenient for individuals, small firms, and large corporations alike.

PageFreezer makes archiving the web easy, and enables you to re-live archived websites of the past as if they were hot off the press!

Redwerk was tasked with supporting the underlying technology, the IT “intelligence” behind this innovative web service. The goal was to build a SaaS application that would enable clients to permanently preserve their website and social media content in evidentiary quality, then access those archives and replay them as if they were still live. It was fundamental that this solution support even the most complex websites, blogs, and Twitter or Facebook profiles, all on the same integrated platform. The application had to use web crawling technologies to capture websites automatically, as often as users wanted and at the times they chose. The crawled content also had to be made searchable.

The main features included:

  • Automatic archiving
  • Public records compliance
  • Live replay/browsing of archives
  • Search for contents
  • Digital signatures
  • Data export
  • Data access through API


Website Crawling

For PageFreezer, we created a proprietary, highly advanced web crawler that accounts for the peculiarities of the web servers and browsers it has to deal with. It is a Java library that integrates well with other projects and provides interfaces for overriding various behaviors.

To make monitoring the crawling processes as convenient as possible, we created an informative admin interface. We made it possible to crawl and capture images and Flash animations as well as text, even when they are hosted on different domains. An extra URL list was created for this purpose.

Include, exclude and advanced website settings were introduced, making it more convenient for users who wish to crawl only certain URLs depending on keywords. Flexible user agent selection for crawling was also added. The mechanism was designed to crawl web pages at times when they are not under high load. Clients can also use the crawl speed option to set the number of crawl workers for each individual task, reducing the load on the website.
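
As a rough illustration of how include and exclude settings of this kind might be evaluated, here is a minimal Python sketch. It is not PageFreezer’s code (the actual crawler is a Java library); the CrawlRules class, its field names, the rule that exclude patterns win over include patterns, and the example values are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CrawlRules:
    """Hypothetical per-task crawl settings; every name here is illustrative."""
    include: list[str] = field(default_factory=list)  # regex patterns; empty means "allow all"
    exclude: list[str] = field(default_factory=list)  # regex patterns; checked first
    user_agent: str = "ExampleArchiver/1.0"           # assumed user agent string
    workers: int = 2                                  # fewer workers means less load on the site

    def allows(self, url: str) -> bool:
        # An exclude match always rejects the URL.
        if any(re.search(pattern, url) for pattern in self.exclude):
            return False
        # With no include patterns, everything not excluded is crawled.
        if not self.include:
            return True
        return any(re.search(pattern, url) for pattern in self.include)


# Crawl only blog URLs, skip PDFs, and use a single worker to keep load low.
rules = CrawlRules(include=[r"/blog/"], exclude=[r"\.pdf$"], workers=1)
to_crawl = [u for u in ["https://example.com/blog/post-1",
                        "https://example.com/blog/paper.pdf",
                        "https://example.com/about"] if rules.allows(u)]
```

Checking exclusions first keeps the behavior predictable when a URL matches both lists.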

Redwerk also implemented a standard sitemap XML crawling feature to reduce the time it takes to crawl large websites, because only modified pages and their contents are crawled and archived.

A number of outstanding, technologically advanced crawling options were also made available:

  • parsing links out of XML files using XSLT templates
  • generic authentication mechanism allowing crawlers to authenticate on almost any website

All of these features make PageFreezer a much more technologically advanced solution compared to the competition.

Website Playback

One of the main goals and most impressive usage scenarios was that users had to be able to browse copies of websites as if they were still live. This was perhaps the key challenge and involved a lot of complex thinking and innovative approaches in terms of enterprise app development. Our extensive experience in providing web development services helped us create a solution based on hyperlink resolution and on-the-fly substitution, JavaScript and redirect interception, and much more.
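
To give a sense of what hyperlink resolution and on-the-fly substitution can look like, here is a deliberately simplified Python sketch. The replay URL scheme (/replay/<snapshot>/<original URL>), the function name, and the regex-based approach are assumptions made for illustration; a production replay engine would also have to handle JavaScript-generated links and redirects, which this example does not attempt.

```python
import re
from urllib.parse import urljoin

# Matches href="..." and src="..." attributes with either quote style.
ATTR = re.compile(r'''(?P<attr>\b(?:href|src))=(?P<q>["'])(?P<url>.*?)(?P=q)''',
                  re.IGNORECASE)


def rewrite_links(html: str, page_url: str, snapshot: str,
                  replay_base: str = "/replay") -> str:
    """Point every link in an archived page at the replay service.

    page_url is the original address of the page, used to resolve relative links.
    snapshot identifies the capture (a hypothetical date string here).
    """
    def substitute(match: re.Match) -> str:
        raw = match.group("url")
        # Leave in-page anchors and non-HTTP schemes untouched.
        if raw.startswith(("#", "data:", "javascript:", "mailto:")):
            return match.group(0)
        absolute = urljoin(page_url, raw)  # hyperlink resolution
        target = f"{replay_base}/{snapshot}/{absolute}"
        quote = match.group("q")
        return f"{match.group('attr')}={quote}{target}{quote}"

    return ATTR.sub(substitute, html)


page = '<a href="/about">About</a> <img src="logo.png"> <a href="#top">Top</a>'
replayed = rewrite_links(page, "https://example.com/blog/", "20240110")
```

Because every rewritten link stays inside the replay service, a user who clicks through the archived page keeps browsing the same snapshot rather than the live site.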

To help users get to the desired point in time, we created a convenient calendar that highlights the dates on which snapshots were taken. To show the site structure, we created a simple navigation tree that reflects the URL hierarchy. All the tree nodes are clickable and open the corresponding site page.
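
A navigation tree of this kind can be derived directly from the list of archived URLs. The Python sketch below shows one possible way to do it; the nested-dictionary representation and both function names are our own assumptions for illustration, not the structure used in PageFreezer.

```python
from urllib.parse import urlsplit


def build_url_tree(urls: list[str]) -> dict:
    """Nest archived URLs by host and path segment: {segment: {child: {...}}}."""
    tree: dict = {}
    for url in urls:
        parts = urlsplit(url)
        node = tree.setdefault(parts.netloc, {})
        # Skip empty segments produced by leading or trailing slashes.
        for segment in filter(None, parts.path.split("/")):
            node = node.setdefault(segment, {})
    return tree


def render(tree: dict, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines, sorted alphabetically per level."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


tree = build_url_tree([
    "https://example.com/blog/2024/post",
    "https://example.com/blog/",
    "https://example.com/about",
])
outline = render(tree)
```

Each node corresponds to one URL prefix, so a user interface can turn every node into a link that opens the archived page at that path.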

Social Media

Crawling social media profiles was a much harder challenge, as different rules apply to them than to conventional websites. PageFreezer’s link extraction was initially built on regular expressions and content parsers, but pages on Twitter, Facebook and most other social networks are built dynamically with JavaScript. As the networks were all different, building the framework and extending it to additional social networks was laborious. The solution was unreliable at this stage, and every future change to these social networks would have had to be mirrored in the system, too. In the end, it was decided to develop a social network adapter based on third-party social network client libraries in Java. Spring Social was identified as meeting our requirements.

Data Storage

One of the most difficult tasks in this project was selecting the best storage option, which had to be highly scalable. The project started with approximately 500 sites, but had to be prepared for many more. We toyed with the idea of using S3 or Google for some time, but those proved too slow to access and too expensive. So Redwerk had to come up with a more flexible, custom-tailored idea, and after some benchmarking we built a simple yet scalable custom storage cloud from scratch, based on a database and an NFS file system.

Data Integrity

It was essential, as always, to ensure that no information was lost if any part of the system failed. We implemented logic that makes crawlers stop and wait whenever the database or the file system is unavailable. When these components come back, no information gathered by the crawlers is lost, and the use of checksums helps maintain the integrity of all stored data.
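
The stop-and-wait behavior and the checksum check can be sketched in a few lines of Python. This is an illustration under assumptions, not the production logic: the function names, the use of SHA-256, the fixed retry delay, and the choice of OSError as the signal that storage is unavailable are all ours.

```python
import hashlib
import time
from typing import Callable


def checksum(content: bytes) -> str:
    """SHA-256 digest of the captured content (algorithm choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def store_with_wait(content: bytes,
                    write: Callable[[bytes, str], None],
                    retry_delay: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Keep retrying the write until storage is back, so nothing captured is dropped."""
    digest = checksum(content)
    while True:
        try:
            write(content, digest)
            return digest
        except OSError:
            # Storage is unavailable: pause the crawler, then try again.
            sleep(retry_delay)


def verify(content: bytes, expected: str) -> bool:
    """Recompute the checksum on read and compare it with the stored value."""
    return checksum(content) == expected
```

Here write stands for whatever persists the content and its digest (a hypothetical callable), and the sleep parameter exists only so that the waiting can be replaced in tests.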

Digital Signatures

A digital signature is a cryptographic mechanism for validating the authenticity and integrity of digital documents or messages. Digital signatures are used in almost all sectors of the economy to detect forgery or tampering, making them a fundamental security tool.

The PageFreezer service is no exception. Here, Redwerk opted for a TSA (time stamping authority), which PageFreezer uses to digitally sign all crawled content. Hash data of crawled content, verified certificates, user keys and timestamps are all used when signing through the TSA. A valid TSA signature therefore gives PageFreezer clients evidence that the original web page was crawled at a particular moment in time. Thanks to this implementation, PageFreezer data can even be used as evidence in court.

Once the system is enabled, all snapshots available to the user will be signed through TSA, and the signature can be verified on the browsing page at any time.


To protect data from destructive forces and the unwanted actions of unauthorized users, we use a rock-solid combination of firewalls, fail2ban, backups and slave database servers. Generally speaking, the system was created to be as modular and scalable as possible. The components do not affect each other’s performance. Crawlers are separate processes, and different modules were designed for logged-in users and guests.

This was the kind of challenging software outsourcing that Redwerk is renowned for. The solution was successfully prototyped and built, and it has undergone a couple of redesigns over the last couple of years to make sure it stays state-of-the-art.

Redwerk has been adding new functionality to meet new demands from PageFreezer’s customers. Our software developers handle all maintenance of the system, including administrative tasks such as upgrades and backups of the database and the archived content. Today, PageFreezer is the leading solution for flexible online content archiving needs, and we are proud to say Redwerk’s technology and know-how have contributed to its success!


Website and social media archiving solution PageFreezer - dashboard

In Press

If you’re looking for an archiving software, HearsaySocial, Socialware, PageFreezer, or Smarsh are all great places to start.

Leader growth platform that delivers topics from marketing, to sales, to customer service.

PageFreezer is a leading SaaS application that gives the users the power to manage, edit, and fine-tune archives without the need to contact customer service.

Popular media about tech sector and economics.


Red Herring Global 100 Finalist

I have been working with Redwerk almost continuously since 2006 on several complex software development projects (C++, Java, JSP, Spring, Django, iPhone). This company offers excellent software application development services at excellent prices. They are very flexible, customer-focused, responsive, and communicative. I sincerely recommend that other companies hire them for their software development projects.
Michael Riedijk, CEO
