On April 23, 2005 the first video ever on YouTube was uploaded by Jawed Karim, one of the founders of the website. During that year, ‘Jawed’, as his username was, would upload some sixty more video clips. But only this first one, titled “Me at the Zoo”, is still available on YouTube. This is typical of much of YouTube’s history: once uploaded, videos might still be there, but they might also be gone, unavailable to those who would like to revisit the early years of YouTube. This loss matters, especially for media historians who aim to study the era when YouTube started and rapidly became a rather unruly community in which people experimented with new media forms and formats. At the time, new genres emerged, including remixes, prank videos, tutorials and intimate personal vlogs. New generations of users who entered the platform were drawn by what they took to be total freedom of expression or by the ability to reach a potentially worldwide audience. The early YouTube videos indicate a new type of media use that disrupted and ultimately transformed social, aesthetic and cultural conventions on how, what and when to use the camera and share those recorded moments on a public website.

In just a decade, the platform transformed from a modest video repository into a global popular archive of professional and non-professional audio-visual materials. As a huge database of millions and millions of mediated human traces, of happy and unhappy moments, YouTube is a fascinating platform and a ‘heterogeneous, but for the most part accidental and disordered public archive.’1 In addition, the combination of audio-visual content and the responses of viewers, including comments expressing love, hate or support for the content, makes YouTube a valuable resource for future historical research on media culture, as it brings together the links in the chain of media production, distribution and reception. If the platform is archived properly, media historians might be able to understand the process of meaning making during the emergence of a digital audio-visual media culture.

This article addresses the challenges that historians face when they aim to reconstruct the early history of YouTube. Based on some preliminary findings that came out of the research project ‘Intimate Histories. A Web-Archaeological Analysis of YouTube’s Early History’2, I learned that through a combination of digital search tools and computational methods like computer vision, it is possible to create an interesting collection of sources that are still available on the platform. However, I also realized that, because of the dynamics of this platform, there is a great level of uncertainty as to what is still there, what is lost or what is no longer traceable. Therefore, it is crucial to address the question of whether and how YouTube can be understood as an archive and, in addition, how YouTube-related content can be used as historical sources.

Although many people, including media scholars, treat YouTube like an online open access archive, there are reasons to be cautious. For one, it is impossible to have precise knowledge about the exact parameters of this collection, as the video repository is extremely dynamic, with almost five hundred hours of content uploaded every minute, while at the same time videos are removed either by the platform or by users themselves. We don’t even know what is there now, let alone what was there in the early years. However, as web historian Ian Milligan recently stated: ‘We need to understand gaps in how web archives find content on the Web and what gets preserved and what does not.’3 The article will take up exactly this issue of gaps in web archives when it comes to exploring the early history of YouTube via formal and more informal popular web archival practices. After a more general discussion, I will present a case study that shows that the lack of stable archival collections can be a challenge and an opportunity at the same time.

Amateur Media History

This article situates the early YouTube culture (starting in April 2005 until the end of 2006, when Google bought the platform) within recent academic debates about the cultural dynamics of amateur media practices. It builds on previous historical research that explored the changing means and meanings of ‘amateur’ film, taking a long-term historical perspective across more than a hundred years.4 Amateur media history shows how – time and again – new media technologies were explored by diverse groups of users, each attracted by different modes of amateur media usage.

As Tom Slootweg demonstrated in his research on the introduction of amateur video, there are basically three amateur modes representing the rich diversity in practices and functions among historical film and video amateurs: the home mode (mainly used to describe a memory practice within a more private and domestic context), the community mode (mainly used for delineating serious hobbyists who are well organized in film or video clubs) and the counter mode (referring to a more ideological perspective in which film and video makers embrace amateur equipment as a tool to resist institutionalized media and/or politics).5 Although these modes offer a heuristic to think about amateurism beyond a more normative approach, they leave open the question of how flexible these functional modalities are in the present era of digital media, where a complex combination of technological, social, political, economic and cultural changes is taking place. For instance, can we still speak about the home mode when we analyse family vlogs uploaded on a public platform like YouTube? And how relevant are those categories when basically everyone makes use of recording technologies to document his or her life?

In his book on the cultural dynamics of home movies, Tim van der Heijden took a long-term historical perspective. One of his conclusions about the current digital era is that amateur film practices have become much more inclusive, meaning that almost everyone – old and young, male and female, poor and rich – uses media technologies to produce images.6 This increased process of democratization is, according to Van der Heijden, closely related to a process of multimediatization, referring to the availability of multiple media technologies enabling everyday users to record and share self-made moving images. These developments transformed home movie making from a rather exclusive practice into a ubiquitous phenomenon. As a consequence, as Van der Heijden suggests, these transformations raise questions about what constitutes amateur film making today.

The emergence of Web 2.0 challenged notions of amateurship in entirely new ways, since uploading self-made videos on an open platform moved the practice beyond small social circles of distribution and reception. YouTube’s initial call to ‘broadcast yourself’ brought new opportunities to present oneself in front of a potentially worldwide audience.7 The arrival of social media platforms like Facebook or YouTube created social spaces that made amateurs visible in a way they had never been before.8 Still, despite the changes there is continuity as well, as Henry Jenkins suggested earlier. According to him, the emergence of an online participatory culture was well prepared by decades of activities by DIY communities that appropriated media technologies: ‘YouTube may represent the epicentre of today’s participatory culture but it doesn’t represent its origin point for any of the cultural practices people associate with it.’9

To look more closely into this matter and become aware of both the continuities and discontinuities, Lisa Gitelman’s approach might be of help. She suggested looking at how amateurs’ ‘doings and do-abilities are situated within the broader cultural economy.’10 Such an approach might give concrete pointers to re-evaluate the traditional boundaries and hierarchies between professional and amateur, public and private, and commercial and non-commercial in the digital era. To come back to the early history of YouTube: who were those early users that entered the platform and became actively involved as producers of content such as videos, comments, profiles, likes, etcetera? What type of content was produced in terms of aesthetics, genre and topics? How did people appropriate media devices such as webcams or editing software, and how did they learn to navigate the platform? And finally, what was the self-perception of those early content creators?

YouTube histories

There is a growing amount of literature about YouTube. As early as 2009, four years after YouTube started, an extensive volume titled The YouTube Reader, edited by Pelle Snickars and Patrick Vonderau, was published. Signalling the growing importance of this platform, the book addressed issues of mediality, types of usage, storage and archival questions, changing forms and formats and the political economy of the platform. That same year, Jean Burgess and Joshua Green published their book YouTube. Online Video and Participatory Culture, in which they combined theoretical reflections with empirical research about amateur producers. Burgess and Green described the growing pains of a platform that became a success almost overnight. They discussed the uncertainty among the users and the platform itself about what its goal should or could be, especially after Google bought the website in October 2006. Among many users, the conflicting desires and actions of the corporation and the original community provoked fierce debates. In 2010, Michael Strangelove published Watching YouTube: Extraordinary Videos by Ordinary People, emphasizing the everyday and the ordinary aspect of new forms of cultural production and communication.11 Since then, the number of titles has grown tremendously. In a recent Convergence issue from 2018, titled “Researching YouTube”, the platform’s changing character and significance after its first ten years of development were explored.12

In her book The Culture of Connectivity, José van Dijck takes a historical perspective. Looking back at the history of the main social media platforms, she identified a rather quick transformation of YouTube from an amateur-driven community platform into a commercially driven large corporation that exploits, rather than enables, users’ connectedness.13 For our case, it is especially interesting to note how Van Dijck emphasised the fact that YouTube became so successful because it was able to develop a strong sense of community, whether its users were content creators or just viewers leaving their responses in the comment section. According to Van Dijck, dedicated YouTubers volunteered to monitor the site and took pride in a kind of community-based philosophy. After Google bought the platform, the company initially respected that identity and remained hesitant to implement drastic modifications. Eventually, many major changes were introduced, most of them aimed at standardization of production and broadcasting of video content and at commercialisation favouring professional content over user-generated content. Frequently, these changes incited resistance by YouTubers; with every new mutation, active communities of users debated what it would mean for them and their everyday routines on the platform.14

Elsewhere, I discussed how these critical reflections by users about a loss of agency gradually evolved into a more nostalgic sense of loss of what once was.15 For instance, on YouTube there are multiple examples to be found of videos that display old layout versions of the website or video compilations that contain remixes of old YouTube clips. The main aim of those types of videos is to give the viewer some sense of past experiences of the platform and stimulate viewers to share comments about their time spent on the platform.16 These performances of nostalgia initiate narratives of a shared history of the medium. Indeed, these practices can be understood as a form of memory work which creates, as historian Joseph Wachelder suggested, a process of ‘ex-post, explicit (re) construction of a collective generational identity.’17 In other words, when the first generations of YouTube users grew older and started reflecting on their past, they developed and acquired a collective biography of this – in their view – once ideal community.

In amateur media history there is a growing interest in how a medium as a technology ages and how this process of change and loss fuels a kind of ‘technostalgia’ – as ‘a bitter sweet longing for past technology’ – reflected in practices that aim to re-experience and revive old amateur media technologies.18 Whereas these media histories tend to refer to media objects like 8mm film cameras or Kodachrome colour film, the no less material practices related to interactions with past web features, video resolution qualities or the platform’s layout are likewise experienced as ‘old’ technologies. The commitment of users to revive those experiences through video compilations or through writing comments can be valuable in helping to understand these specific forms of social media nostalgia but they can also be instrumental in relocating sources that are otherwise lost or difficult to find.

Web history

In order to study the early YouTubers and their online practices, it is important to have a critical understanding of the videos uploaded, the comments and likes shared and the tagging and linking as born-digital sources. Similarly, in order to be able to reflect critically on the possibilities of collecting those sources, we need to have a better understanding of the state of the art of institutional and informal web archives. However, web history and web archiving are still emerging fields of study. As the leading web historians Niels Brügger and Ian Milligan stated, historians and archivists have yet to become ‘familiar with both the technical and material aspects of web archives, as well as the related theoretical and epistemological concerns that arise when dealing with these digital artefacts.’19 According to Milligan, web archives hold documents that differ from what historians are used to working with, both in terms of scale (abundance of data) and scope (different kinds of sources).20 For instance, with regard to scale: instead of collecting and analysing a hundred home movies about birthday parties found in a film archive, YouTube returns millions of results about the topic.

This means that historians need to be aware of the peculiarities of web archiving and accept that these are closely related to the peculiarities of the web itself. As Brügger has extensively explained in his work, it is impossible to archive all dimensions of the web and make the archived object look like the original ‘live’ online version. Especially the dynamic of constant updates poses archival problems. Niels Brügger and Ole Finnemann characterise the web and its materials by its ‘ongoing changes, modifications, disappearances of content, and reconfigurations of relations.’21 The web creates unstable objects, and to archive these is impossible without stabilizing them first, which – in turn – results in a transformation of the object that erases its authenticity.22 Because the archival process of the web is so different compared to analogue archives, historians need to be aware of the fact that there are multiple factors to take into account when studying web materials. Brügger categorized the complexities of web archival documents as follows: 1. There is an absence of a stable original with which to compare the archived version. 2. The archive is most probably incomplete and full of errors. 3. A web archival document is by definition a ‘re-born’ version or, in other words, the archival record is a unique version and not a copy. 4. The document will most probably be fragmented, being temporally and spatially inconsistent.23 Working with web materials from YouTube’s early era demonstrates exactly these issues.

In addition, there is the issue of availability. While there is a promise of ‘unprecedented abundance of primary sources for diachronically tracing, examining and understanding major events’ there is paradoxically also the real threat that a majority of web-based sources vanish.24 Already in 2003, digital historian Roy Rosenzweig acknowledged the risk that in the near future historians would simultaneously experience both a digital Dark Age and overload of information.25 Constructing a collection about YouTube’s early history is a good example of this paradox. There is still a lot of material available from that early period, but much is lost as well. Therefore, urgent questions are: do we know what is there and do we know what is lost?

Why YouTube is (not) an archive

To retrace those early moments of active participation and interaction, including instances of enthusiasm and resistance, not only the videos but a far wider variety of historical sources are useful, including comments, interfaces and layout, tags, usernames, guidelines, and profiles. But what are the chances that those documents and elements are still available and traceable? The fact is that the long-term sustainability of YouTube is questionable as there are several reasons to be cautious about YouTube as a media archive for historical research.

For one, it is impossible to have precise knowledge about the exact parameters of this collection, as the video repository is extremely dynamic, with almost five hundred hours of content uploaded every minute. There have been several attempts by scholars to measure the scale of YouTube, with estimates ranging from eighty million to more than three billion videos in 2015.26 In a statistical analysis of ten years of YouTube channels and videos, Matthias Bärtl found that there is still a tremendous growth of user-generated videos uploaded to channels tagged as ‘People & Blogs’: the majority of newly created channels were in this category, coming close to 75% in 2016.27 However, because there is a declining interest by viewers in these types of materials, this strong growth ‘creates an increasing mismatch between demand and supply and will naturally leave many channels with very few views,’ according to Bärtl. Looking at all the videos uploaded in one particular year (2016), Bärtl found that some 50% received no more than 89 views.28 This points to a second reason for caution: such low numbers make these videos hard to find, since any crawling or querying of YouTube’s API will be biased towards popular videos and thus overlook the large number of videos that receive little attention. The search algorithms favour recent and popular uploads, rather than assisting users to find historical or unique material that has rarely been viewed before. If users then abandon their channel, footage might become ‘orphaned’ and eventually untraceable. In addition, there is no ‘YouTube-archivist’ nor ‘catalogue’ that can help retrace sources. YouTube lacks the standard archival practice of organisation based on principles of provenance and order.29

A third reason to be hesitant to consider YouTube as an archive is the fact that the platform has implemented a continuous process of deletion, and it is unclear how much is deleted and by whom. Users with an account can delete the content they have uploaded. Once a user deletes a video, it is removed permanently from YouTube, without any possibility of recovery. Much more content is disappearing because of changing guidelines. Ever since the platform started, YouTube has been criticised for distributing unwanted content, whether that involved copyright infringements, images of violence and extremism, sexual abuse, threats to child safety, spam and scams, cyberbullying or other forms of bad taste. As a response, YouTube adapted its guidelines multiple times and implemented methods for flagging content.30 In its “Transparency Reports”, Google publishes extensive overviews of what type of materials were removed for what kind of reasons. A report published in early 2019 gives an overview of the last quarter of 2018: between October and December 2018, YouTube terminated 2,398,961 channels, which included in total 76.9 million videos.31 In addition, the site removed another 8,765,773 videos because of violation of the terms of service. These videos were partly identified through automated flagging systems without human intervention. In addition, the platform removed over 261 million comments for violating its Community Guidelines. According to the report, the total number of comment removals represents only a fraction of the billions of comments posted on YouTube.

These platform policies clearly show that the platform’s main interest is not in taking responsibility for the sustainability of its content but in ensuring its own endurance as a commercial enterprise. A more dangerous prospect is that Google as a company can decide to close the website, and indeed it does have a rich history of ‘killing’ services: the ‘Google Graveyard’ already contains multiple apps and services such as Google Hangouts and Google+.32 In addition, the last decade demonstrated that it is more than likely that popular sites will eventually disappear or discontinue their service. It happened to Geocities in 2009, to the Dutch platform Hyves in 2013, and to the very popular app service Vine in 2017. Most of the web materials are gone, although Twitter, which owned Vine, stored the Vine videos in a static archive.33

Archiving YouTube

Increasingly, web archives and media archives feel the urgency of saving the culture represented by YouTube material. According to Helen Hockx-Yu, there is a general frustration about the content gaps in the web archives. In her report on web archiving, she noticed that many national libraries have clear ambitions to collect portions of YouTube and other social media.34 At the same time, as archivist Luke McKernan, curator for news and moving images at the British Library, put it, there are some challenges that need to be addressed first:

What then is an audio-visual archive? Is it the archive gathered by traditional means, in which the best-quality material is selected through curatorial guidelines, to ensure a representative collection of optimum preservation quality? Or is it the random vastness of the web archive, in which videos of low image quality, minimal metadata and frequently spurious significance, are contained within a larger archive of web texts? Should we sacrifice quality of image for quantity of content, or should we maintain principles of selectivity, so that the best content is preserved in its optimum form?35

The British Legal Deposit of 2013 makes it possible to archive web materials, but video and sound are excluded. This means that so far, no systematic collection of YouTube material has taken place in the UK.36 In 2015, the Netherlands Institute for Sound and Vision did start a web video project aiming to archive Dutch YouTube videos that represent the current online media landscape. Based on certain criteria, they started collecting videos from popular content creators, videos on typical Dutch themes and videos addressing important societal issues. To avoid legal issues and to ensure high quality content, the institute makes license agreements with YouTubers and opt-out agreements with incidental users crawled by the archive.37 The advantage of this approach is that this official media archive can ensure a policy of sustainability and that a collection will contain a formalized set of metadata. The downside of this approach is that the focus is on video clips as the main source and not so much on the rich context of YouTube culture. In addition, material of unknown users tends not to be archived as extensively, which might create a gap in understanding the more vernacular side of the platform.

So far, the most important location for YouTube material is the USA-based non-profit Internet Archive, which has been crawling the platform at regular intervals. Librarian Brewster Kahle founded the Internet Archive, by now the oldest and the largest web archive, in 1996 in San Francisco.38 Since then, the archive has collected millions of books, movies, software, music, and websites. Between April 28, 2005 and May 1, 2019 some 547,712 YouTube snapshots were archived, each snapshot being a representation of a tiny fraction of the platform at a particular time. The snapshots are available through the Wayback Machine, a service that enables users to visit archived web pages by their original URL.

The volume of snapshots may sound impressive, but there are reasons to be critical as to how useful this archive is. It does have serious drawbacks, as it is unable to reproduce Flash-based videos and therefore older snapshots show nothing but a front page with an empty screen. Another problem is the depth of the crawl: many snapshots do not allow the researcher to follow a link. It might be that the machine will bring the researcher to another snapshot closest in time, or even to the live Web. This sort of leakage of live content into archived sources has been described as ‘zombies’, which makes those items vulnerable as historical sources.39 According to Richard Rogers, ‘the Wayback Machine sacrifices temporal matching for smooth navigation, and as such embeds a period in Web history, in an experience that could be described as more living museum of a surfer’s space than historian’s meticulous archive.’40

There is also a more fundamental issue that makes the Internet Archive different from traditional archives: the researcher has no exact knowledge of how a snapshot was produced. As Anat Ben-David and Adam Amran showed in their analyses of the socio-technical processes behind the construction of snapshots, a website might have been archived by a crawler or by a human.41 As they explain:

The circumstances could be endless: from an accidental archiving of a crawler that fetched a deep link or an external link from another website, through a website owner submitting their website for archiving, to people who use the IAWM’s ‘save this page now’ feature for various reasons. Lacking sufficient circumstantial context, we would argue, makes it difficult to ground an archived snapshot as historical fact. For although two snapshots can be identical in terms of their content (albeit archived at different timestamps), the knowledge production process behind their archiving might tell a very different story and might be initiated either by human or non-human actors.42

In order to give the user of the Wayback Machine more insight into the provenance of snapshots, the archive introduced a new feature in 2016, called ‘about this capture’. This button gives the researcher access to information about, for instance, the collection in which the snapshot is stored or the organizer behind the crawl (which can be the Internet Archive itself, but also third parties). Individual users or institutions have been able to actively upload content to the Internet Archive since 2006, when the archive introduced Archive-It, a service that helps organizations to harvest, build, and preserve collections of digital content. More recently, the new tool ‘Save-Page-Now’ (SPN) allows individual users to add webpages to the archive. According to the archive’s founder Brewster Kahle, Save-Page-Now is used heavily, up to a hundred times per second.43

The intentions of users participating in building collections within the domain of the Internet Archive might vary from personal, to historical, to political or to more legal reasons. An interesting case is the ISIS videos of beheadings that were removed by YouTube. In a blog post on web archives, it is noted how Al-Qaeda and ISIS sympathizers upload clips documenting executions to the Internet Archive to circumvent the removal policies.44 For instance, on February 4th of 2015, ISIS uploaded a video to the Internet Archive showing the recordings of the execution of a Jordanian pilot. While this video is no longer available on YouTube, it is still possible to watch it via the Internet Archive. The paradox is that extremists want this material archived for propaganda reasons while others, who want to make sure that these atrocities will be prosecuted, also want to keep the data safe as it might be used as evidence in court. The same holds for media scholars who would like to keep the material available for reference or for future research.

Informal archival practices

An interesting alternative to these more or less institutional attempts to archive YouTube is to be found in the growing movement of Do-It-Yourself archivists – active users, amateur media collectors, socially engaged or political activists, volunteers and artists – who collect, share and redistribute YouTube videos. An example would be those channels on YouTube, like ChannelArchive45 or videos like “YouTube Layout Evolution (2006-2016)”46, through which users curate and recirculate videos that are deemed lost or forgotten. For instance, the video “YouTube Layout Evolution (2006-2016)”, uploaded by user MDTech in 2015, shows a collage of the design of the platform.47 As such, the video itself does not offer new or lost information, as the images were taken from the Wayback Machine. But what makes the video interesting is that it sparked multiple responses from different generations of users commenting on the many changes YouTube had gone through. In addition, it inspired commenters to relate the age of the medium to their own age, suddenly realizing the historicity of the platform (see Figure 1). These comments can be understood as rich sources for understanding complex discursive constructions of a platform’s past.

Other responses demonstrate how some users take a critical stance. By periodising the history of YouTube these users construct critical historical narratives of a platform as a real community. According to some of the responses, the more popular and massive the platform became, the more YouTube lost its soul by introducing commercial incentives in the system of cultural production (see Figure 2).

Similar activities can also be found outside YouTube, on websites like Reddit, where there are multiple communities, using names like DataHoarder, YouTube Archivist or DeepYouTube. These groups are active in sharing tips and tricks on how to crawl data, download large amounts of videos and store YouTube material, including metadata. Other groups are interested in curating marginal YouTube clips that attracted no or just a few views and thus are ignored by the YouTube filters. These practices of communities of lovers of YouTube content keep the material alive that would otherwise be lost or untraceable and as such they can be instrumental in circulation of rare and elusive material.

Figure 1 

Screenshot of comments (produced on 3 March 2019) of video “YouTube Layout Evolution (2006-2016),” YouTube, uploaded on 24 November 2015, https://www.youtube.com/watch?v=aVIy54CtWXM.

Figure 2 

Screenshot of comments (produced on 3 March 2019) of video “YouTube Layout Evolution (2006-2016),” YouTube, uploaded on 24 November 2015, https://www.youtube.com/watch?v=aVIy54CtWXM.

The same goes for the activities of the so-called ‘Archive Team’ that aims to rescue endangered internet culture. In their own words, they are ‘a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage.’48 Usually they work with volunteers who help to save relevant data. On their wiki page about YouTube, they keep a close watch on the features that YouTube has discontinued, such as video responses and annotations, while they also introduce and explain archiving tools that help to automate downloading large collections of relevant material.49

These user-generated or amateur-based archives, embedded in communal and collaborative efforts, have become part of internet culture at large. Some scholars have argued that in a networked culture, this type of circulation could be interpreted as a new kind of storage model.50 Every document available online will at least be shared with someone, enhancing its chances of survival. It is a model that clearly differs from that of institutional archives, which collect unique documents rather than residual copies. According to Annet Dekker, the new digital archives are basically living archives in which content is constantly recontextualised.51 As a living and flexible system it is open, collaborative and creative, ‘allowing many new voices that act and assert agency.’52 It also means that user archivists might ignore the idea that experts should be responsible for what should or should not be selected. These bottom-up or grassroots initiatives therefore represent an alternative model of online archiving, one that entails collecting data in an idiosyncratic manner. Consequently, these popular archives create their own canons. As a result, a multitude of personal cultural canons – ‘canons-of-one’ – exist, as Abigail de Kosnik observed in her research on popular archives.53 She characterised these types of user-generated collections as ‘rogue archives’.54 In sum, these amateur archivists demonstrate a sense of shared responsibility towards our cultural heritage, which is often personal and collective at the same time. According to Luke Tredinnick, this development means that we should accept that digital archives are no longer only based on formal mechanisms for legal deposits and preservation. On the contrary, this situation of ‘distributed creation, dissemination and storage of information and knowledge’ is more stable than dependence on official archives.55 One effect is quite fundamental, as Tredinnick analyses: by sharing information, we become part of a kind of ‘living archive’.56

Finding Traces of YouTube

The challenges of dealing with both web archives like the Internet Archive and the more informal archives represented by YouTubers as curators became quite clear in a test case. In this case study, the aim was to build a historical corpus of videos (focussing on vlogs), comments, design and navigation, and community guidelines from YouTube during the first twelve months of its existence (April 2005 - April 2006). In what follows, I will present some findings related to the Internet Archive’s Wayback Machine and informal archival practices on YouTube itself.

Wayback Machine

I started collecting information (videos, comments, design and navigation, community guidelines) extracted from snapshots crawled by the Wayback Machine from June 2005 onwards. The snapshots were compiled per month and used as a source of reference to check what was there. Basically, the Wayback Machine was used for two reasons: to check what is still there, but also to find the traces of what is no longer available. In addition, I also used the service to analyse the usability of the snapshot as a historical source.
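The monthly compilation of snapshots described above could be approached programmatically through the Wayback Machine's public CDX index. What follows is only a minimal sketch under stated assumptions, not the procedure used in this study: the helper names `build_cdx_query` and `parse_cdx_json` are hypothetical, and the chosen fields and date range are illustrative. The sketch builds a query URL and turns a JSON response (whose first row lists the field names) into records; fetching the URL is left to the reader.

```python
import json
from urllib.parse import urlencode

# Public CDX index endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, start, end):
    """Build a CDX query URL for captures of `url` between two dates.

    `start` and `end` are timestamp prefixes such as "200506" (June 2005).
    The selection of fields below is an illustrative assumption.
    """
    params = {
        "url": url,
        "from": start,
        "to": end,
        "output": "json",
        "fl": "timestamp,original,statuscode",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(text):
    """Turn a CDX JSON response into a list of dicts, one per capture."""
    rows = json.loads(text)
    if not rows:
        return []
    header, data = rows[0], rows[1:]  # the first row holds the field names
    return [dict(zip(header, row)) for row in data]
```

A query for the study period might be built with `build_cdx_query("youtube.com", "200506", "200604")`; the resulting records can then be grouped by month before the snapshots themselves are inspected by hand.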

For instance, if we want to know what the page of Jawed Karim looked like when he uploaded this first video on YouTube, there are thousands of snapshots to choose from. Between December 2005 and July 2019, this particular page was captured almost six thousand times. A recent snapshot shows the video plus the caption saying: ‘The first video on YouTube. Maybe it’s time to get back to the zoo? The name of the music playing in the background is Darude – Sandstorm’ (see Figure 3). The fact that the caption invites the visitor to go back after such a long time indicates that it might have been added long after the original text was placed.

Going back in time, we can trace other appearances of the same source, for instance on June 30, 2014, when the caption is quite different, saying: ‘The first video on YouTube, uploaded at 8:27 P.M. on Saturday April 23rd, 2005. The video was shot by Yakov Lapitsky at the San Diego Zoo’ (see Figure 4). At the time, the caption mentioned the maker of the clip, the precise location and the exact upload time.

Figure 3 

Screengrab taken from Internet Archive Wayback Machine, 21 June 2019.

The first snapshot taken of this upload is from December 8, 2005, and at that time the caption simply stated: ‘The first video on YouTube.’ It is the earliest trace there is of this particular case and probably the one that comes closest to what might have been the first post (Figure 5).

The history of this single snapshot makes us aware of the fact that, even if the video is still there, the context changes continuously, making the post a living document. It reflects the dynamics of YouTube as a platform, as the sequence of snapshots of the same video documents the many changes. Not only did the captions change; the sequence of snapshots also shows alterations in the interface, the options to respond (rating system), the number of views growing to over seventy million, and the number of comments rising to three million. Over the years, the clip has acquired different meanings: from a modest first attempt to test YouTube as a repository for quickly sharing streaming video content to a special video that is recognized by the commenters as historical. As such, the snapshots of the clip – each a unique archival record in a long sequence of comparable copies, combined with the clip’s present appearance on the live web – have become an example of what William Uricchio identified as a ‘networked collaborative form of cultural production that is multiply voiced, collective, and ongoing.’57 And indeed, the historian needs to be critical while also appreciating the challenge of working with sources that are dynamic, fluid and evolving, as they represent the web itself.58

Figure 4 

Screengrab taken from Internet Archive Wayback Machine, June 21, 2014.

Figure 5 

Screengrab taken from Internet Archive Wayback Machine, December 8, 2005.

The second step in testing the Wayback Machine concerned the frequency of the crawls and the usefulness of the hyperlinks present in the snapshots. In June 2005, the machine crawled the platform only thirteen times. This is a very modest number; by the end of the year the tool captured YouTube daily, sometimes several times a day. In 2019, the system is far more advanced, crawling the platform around 24 times a day, including focused crawls, which are collections of frequently updated data, often geared towards a subdomain of the platform. In addition, some distinct crawls run for a day, a month or even longer. In June 2005, however, that was not yet the case and the amount of data was therefore limited. Moreover, the snapshots showed only limited information: ‘featured video’, usernames, upload dates, and numbers of views and comments. That last feature only became available on YouTube after June 14. If the videos mentioned are still available on YouTube, the comments might still be there as well, but if a video has been removed, only fragments of the original comments can be reconstructed.
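Crawl frequency of the kind discussed above can be tallied directly from capture timestamps. The following is a minimal sketch, assuming the timestamps are available as 14-digit strings in the Wayback Machine's YYYYMMDDhhmmss format; the function names are hypothetical and the example timestamps in the usage note are invented for illustration, not taken from the archive.

```python
from collections import Counter


def captures_per_month(timestamps):
    """Count captures per month, keyed by the YYYYMM prefix of each timestamp."""
    return Counter(ts[:6] for ts in timestamps)


def captures_per_day(timestamps):
    """Count captures per day, keyed by the YYYYMMDD prefix of each timestamp."""
    return Counter(ts[:8] for ts in timestamps)
```

Given an invented list such as `["20050614234128", "20050620010203", "20050701000000"]`, `captures_per_month` groups the first two captures under `"200506"` and the third under `"200507"`. Comparing such tallies across months makes the shift from sporadic to daily crawling visible at a glance.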

The analysis of the hyperlinks captured in the snapshots produced mixed results. In some cases the videos were no longer available, while the comment and response sections were partly visible, including some contextual information. Analysis of links can be instrumental in following a user’s presence online through time. Some early users remained active for a longer period of time, while other accounts disappeared. The exact reasons for the termination of these channels were not always clear, as it could be either the choice of the user or a decision by YouTube. Warnings about termination appear only temporarily in the YouTube system.59 After a while the sign disappears and there is no longer any trace of that video. So, a snapshot displaying a warning sign that a video has been disabled or removed transforms, from our perspective, into a historical document, namely a trace or proof that a certain video with a particular ID once existed (see Figure 6).
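Since a snapshot address itself records when a page was captured and which video ID it referred to, these two pieces of evidence can be read off the URL. The sketch below is an illustration under assumptions: the function name `parse_snapshot_url` is hypothetical, and the pattern assumes the usual layout of a Wayback Machine address (a 14-digit timestamp, optionally followed by a short modifier, then the original URL).

```python
import re
from datetime import datetime
from urllib.parse import urlparse, parse_qs

# Assumed layout: https://web.archive.org/web/<14-digit timestamp>[modifier]/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")


def parse_snapshot_url(snapshot_url):
    """Return (capture time, original URL, video ID or None) for a snapshot URL."""
    match = SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        raise ValueError("not a recognised Wayback Machine snapshot URL")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    original = match.group(2)
    # The video ID is the value of the `v` query parameter, if present.
    video_ids = parse_qs(urlparse(original).query).get("v", [])
    return captured, original, (video_ids[0] if video_ids else None)
```

Applied to the address cited for Figure 6, `https://web.archive.org/web/20110811172754/http://www.youtube.com/watch?v=kY2jc3PIoZ0`, the function yields a capture time of 11 August 2011, 17:27:54, and the video ID `kY2jc3PIoZ0`.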

Another interesting result from this attempt to test the Wayback Machine as a tool for historical research was the fact that some of the early users had changed their username after a while and thus became difficult to trace. This might have been the effect of a change by Google introducing the social networking system Google+ in December 2012. This new service aimed at ending anonymity for those users who commented on the site. In order to be able to comment, users were required to log in with a Google+ account based on real names. Such changes in policy can make it difficult to trace content creators, as these users appear under different names. In contrast, another interesting find was that two videos that were captured by the Wayback Machine, but for which the links were dead, turned out to be still available as re-uploads at another URL. These re-uploads were made by other users without any clear explanation as to why or how.

Figure 6 

Screengrab taken from Internet Archive Wayback Machine, August 11, 2011, https://web.archive.org/web/20110811172754/http://www.youtube.com/watch?v=kY2jc3PIoZ0.

All in all, the Wayback Machine gives access to valuable information, although that information is patchy and fragmented. This can be the result of the structure of the snapshot itself, the period in which the snapshot was produced, or the fact that it links to the live web, and thus the historical search becomes entangled with the dynamics of the website.

Alternative traces on YouTube

Since the live web is so important in locating sources, an alternative strategy that can be instrumental in finding traces is to locate YouTube communities that have taken on the role of curators, actively collecting and displaying historical records. As discussed earlier, these alternative or grassroots modes of archiving are governed by their own unwritten rules and regulations of what should be preserved and remembered. These popular DIY archival practices are more than fan-based collections. They are reflections of a participatory culture that is not only engaged in the common practice of uploading, commenting and sharing content, but also in collecting, selecting, and sharing historical content. These types of curatorial practices show a clear passion for detail. To illustrate this we can look at how users discuss the quest for the second video ever uploaded on YouTube. The Wayback Machine fails to give any clue, since it started crawling the platform only two months after the start of YouTube. By that time, chances were that videos had not been crawled or that the user had removed the video. On the platform itself users discuss the issue regularly: ‘The second oldest video on YouTube, right?’ someone asks about a five-second-long clip called “tribute”, uploaded on April 24, 2005, just one day after “Me at the Zoo”, showing a young man jumping in a hallway while screaming (see Figure 7).60

Figure 7 

Screengrab “tribute,” uploaded by GP on YouTube, April 24, 2005, https://www.youtube.com/watch?v=aBfUFr9SBY0.

The question mark indicates uncertainty about the assertion. And indeed, other users started questioning this claim. ‘The 8th video uploaded to YouTube,’ someone else stated in the comments. ‘Yes,’ another user agreed, although this opinion was not taken for granted by the next commentator: ‘No, it’s second,’ followed by a user introducing a different view: ‘Actually it’s the third,’ which then led another user to add the nuance: ‘no its 8th, but third available today.’ The desire to list early material as second, third or even eighth may seem futile, as one visitor also recognizes, sarcastically claiming “tribute” to be the ‘first video that doesn’t contain elephants’, a reference to the “Me at the Zoo” video. However, the discussions in the comment sections identify possible gaps in the results of the Wayback Machine. In addition, the comments reveal the YouTube community’s great interest in discussing its early history.

An interesting example of curating early YouTube materials is an attempt to collect the names and user profiles of the hundred oldest channels on YouTube, uploaded in 2011 (see Figure 8).61 As the uploader, named ‘Channelarchive’, explains, the channels were collected by a person named Nick, who claims to have produced a very accurate list thanks to the help of many others:

Many accounts and videos were found and discovered within a hacking community of thousands back in 2010, it was an era where any accounts inactive could be obtained through many methods. There was a guy far more obsessed within the community day after day for a couple of years who put together the video you see showing the first 100 channels. I had my copy of it and when he deleted it from his channel I reuploaded it to make sure history wouldn’t be lost.

The interesting thing about this list is that, by showing profiles of the early YouTube users, it offers contextual information about the original users which is otherwise not available. It allows researchers to develop some knowledge about those people who were so quick to join the new platform. For example, the user profile of ‘hurlex’ shares the discourse typical of many of these first users, representing that moment in time when the fun lay in joining this platform and making use of its affordances to distribute the ‘magic’ of the ‘absolutely randomness’ of the videos posted (see Figure 9).

Figure 8 

Screengrab from Channelarchive, “The 100 oldest channels on YouTube,” YouTube, https://www.youtube.com/watch?v=1fdoDtkrmlY.

Figure 9 

Screengrab from Channelarchive, “The 100 oldest channels on YouTube,” YouTube, https://www.youtube.com/watch?v=1fdoDtkrmlY.

The same passion for retracing the oldest or the first can be seen in the quest for the first comment. Was it the comment posted, so many years ago, under that first video by Jawed Karim, “Me at the Zoo”? For a while the ‘honour’ of being first went to YouTuber COBALTGRUV, who responded to the “Me at the Zoo” upload with the word ‘Interesting’. He even wrote a booklet about this ‘historical’ fact.62 It is possible to go straight to the video and scrape the comments in order to analyse the responses, although the number of comments posted on this video exceeds 2,359,876 (as of May 16, 2019). Most of these comments are spam. A second difficulty is that YouTube allows users to reply to old comments, creating an asynchronicity that complicates the construction of chronology and temporal unity, while at the same time representing the aliveness of YouTube as a living archive. According to discussions in different corners of the YouTube community, such as ‘Wikitubia’, threads on Reddit or videos posted by the user YouTube Cache and Archives, ‘Interesting’ is not the first comment ever.63 According to current knowledge, the first user to comment on YouTube was Marco Cassé, writing ‘LOL!!!!!!!’ on June 14, 2005.64 Cobaltgruv’s comment is indeed the first comment made on the first YouTube video, but it was posted only after June 14, 2005.
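The chronological problem described above, in which replies arrive years after the comments they answer, can be made explicit once scraped comments carry a publication time. The sketch below assumes a hypothetical record layout (`id`, `parent_id`, `published` as an ISO 8601 string); this layout is an assumption for illustration and not the format of any particular scraping tool. It orders all comments by time regardless of their position in a thread and flags replies posted long after their parent comment.

```python
from datetime import datetime


def chronological(comments):
    """Return all comments, top-level and replies alike, oldest first."""
    return sorted(comments, key=lambda c: datetime.fromisoformat(c["published"]))


def late_replies(comments, min_gap_days=365):
    """Find replies posted at least `min_gap_days` after the comment they answer.

    Returns (reply id, parent id, gap in days) tuples.
    """
    by_id = {c["id"]: c for c in comments}
    found = []
    for c in comments:
        parent = by_id.get(c.get("parent_id"))
        if parent is None:
            continue  # top-level comment, or parent missing from the scrape
        gap = (datetime.fromisoformat(c["published"])
               - datetime.fromisoformat(parent["published"]))
        if gap.days >= min_gap_days:
            found.append((c["id"], parent["id"], gap.days))
    return found
```

Sorting by publication time rather than by thread position separates the order in which comments appear on the page from the order in which they were written, which is the distinction a chronological reconstruction needs.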


We started this article by acknowledging the challenge of archiving web materials, which is a prerequisite for investigating the history of YouTube. This is the reason why digital historian Jane Winters pleads for a close collaboration between web historians and web archivists.65 But the appreciation of archiving the web or, in our case, YouTube-related sources starts only after an appreciation of the fact that its history is indeed worth reconstructing. However, historians usually come somewhat later in the game, while in the meantime fans have already started creating curated repositories, canons of what should be kept. Even though, as we saw, their archival reflexes are most of the time desires to list the oldest, the first, the weirdest or the most popular, as discursive practices they do represent the culture of YouTube as a participatory ecosystem. In addition, these informal archival products can also be helpful in finding and distributing material that would otherwise be lost or unknown.

For this reason, I would extend Winters’s plea and ask for some acknowledgement of these popular informal archival practices. In a vernacular web culture that tends to be a vulnerable environment in terms of sustainability, the help from more informal self-proclaimed archivists is crucial. According to Rick Prelinger, users who curate or archive should be seen as ‘new archivists who are synthesists, remixers, masher-ups.’66 Their efforts could ‘fill the gaps unaddressed by libraries, archives and universities.’67 But they also produce a kind of ‘pseudo-archives’ of residual material that seems to be unable to distinguish archival and contemporary material.68 As such, the web’s potential to act as a ‘living archive’ can be both a promise and a challenge. Developing methods that critically acknowledge user-generated archival practices and use the insights of the early generation of users’ reconstructions of the platform’s history, even if these might seem anecdotal evidence, can lead to deeper knowledge of that early community. By actively participating in reconstructing YouTube’s recent past these users shape nostalgic narratives, while they also collect and share useful information that guides historians in finding new traces.