Consumer Social Media Free at the Point of Use

   Critically Endangered

Social media platforms free at the point of use, with a business model based on reselling user data for consumer behavior and/or advertising analysis, mainly for profit-driven corporations. This entry broadly includes digital content created, shared, and hosted on social media platforms, as well as the current interfaces of those platforms.

Digital Species: Social Media

Consensus Decision

Added to List: 2017

Previously: Critically Endangered

Trend in 2023: Towards even greater risk

Trend in 2024: Towards even greater risk

Imminence of Action

Immediate action necessary. Where detected, should be stabilized and reported as a matter of urgency.

Significance of Loss

The loss of tools or services within this group would have a global impact.

Effort to Preserve | Inevitability

Loss seems likely. By the time tools or techniques have been developed, the material will likely have been lost.

Examples

Instagram, Facebook, Twitter/X, Pinterest, Yahoo Groups, Truth Social, Reddit, Mumsnet, Sina Weibo, Flickr, Bebo, and legacy BBS.

‘Practically Extinct’ in the Presence of Aggravating Conditions

Lack of preservation capacity in provider; Lack of preservation commitment or incentive from provider; Proprietary products or formats, including user interface; Poor data protection; Inaccessibility to web archiving; Political or commercial interference; Lack of offline equivalent; Super-abundance; Uncertainty over IPR or the presence of orphaned works; Lossy compression in upload scripts.

‘Endangered’ in the Presence of Good Practice

Offline backup and documentation of media assets; Migration plan; Early warning from vendors; Roadmap from vendors; Accessible to web harvest; Suitable export functionality; Licensing enables preservation; Preservation commitment from vendor; Preservation capability in vendor; Resilient to hacking; Selection criteria.

2023 Review

This entry was added by the 2019 Jury as a subset of a broader social media entry first introduced in 2017. It was created as a standalone entry to draw attention to the different threats faced by online services that are paid for versus those ‘free at the point of use’ (both depend on the business model of the vendor and the terms they impose). The 2021 Jury raised the risk classification from Endangered to Critically Endangered based on concerns about trends towards harmful and malicious hate speech, misinformation, and deliberate deletion. The 2022 Taskforce agreed on a trend towards even greater risk based on the continued, significant trend towards hate speech, misinformation and disinformation, and deliberate deletion in light of ongoing global conflicts and crises, including (but not limited to) social and economic inequalities and climate change. In particular, they noted that the sale of Twitter prompted a moment of instability in consumer social media: the scale of Twitter, the evident acrimony between the parties before the sale, and the hostile news coverage afterward all elevated the risks associated with social media. They also drew attention to platforms that enable extreme views not permitted on mainstream services, which have emerged and proliferated noticeably and which, from a preservation standpoint, can be argued to be both at very high risk and historically significant.

Based on the assessment of the rescoped entry, the 2023 Council agreed on the Critically Endangered classification and noted an increase in both the imminence of action required and the effort to preserve. The need for major efforts to prevent or reduce losses continues, but it is now much more likely that loss of material has already occurred and will continue to occur by the time tools or techniques have been developed. There is a greater urgency to prioritize the assessment of these materials and to develop tools or techniques to prevent or reduce further losses in this group.

The 2023 Council recommended further rescoping and adjusting of this and other social media entries in light of how web-based and cloud-based business products and services have developed in recent years. This included:

  • Clarifying the scope. This entry broadly refers to the preservation of content and interfaces of social media platforms, which are designed to facilitate the creation and sharing of media through interactive social networks. These services, particularly those provided by largely unregulated (or underregulated) platforms, pose critical risks not only for capturing and preserving the content hosted on the platforms but also for preserving the interfaces of the platforms themselves.

  • Similarly, the entry specifically refers to risks for digital materials created, shared, and hosted via social media services offered ‘free at the point of use,’ in which the business model and sustainability can only be guessed, and contracts tend to be asymmetrical in favor of the supplier. Moreover, because these services have a low barrier to entry, they may be favored by agencies or individuals least able to respond to closure or loss.

  • As part of this rescoping, relevant information concerning cloud-based aspects was incorporated into the ‘Cloud-based Services and Communications Platforms’ entry to more clearly differentiate the risks associated with cloud hosting and computing technologies, allowing this entry on consumer social media free at the point of use to focus on its own challenges, notably those relating to harvesting and managing the content and interfaces of web-based social networking platforms.

The 2023 Bit List Council additionally recommended that the next major review for the Bit List include:

  • A restructure and splitting of the entry to create separate entries for ‘digital content hosted on social media platforms’ and for ‘interfaces of social media platforms’, where each can be teased out to provide greater clarity about specific risks, aggravating factors, and recommended actions. This should include expanding on API access to data, providing examples of legacy content already lost, and pointing to examples where risk is especially high (e.g., content that is still online but alarmingly fragile).

  • A consideration of merging the ‘Data Posted to Defunct or Little-used Social Media Platforms’ entry with this entry, to incorporate examples of loss in the presence of aggravating conditions.

  • A consideration of merging the ‘Born Digital Photos and Video Shared on Social Media’ entry with this entry, to provide examples of particular types of digital content hosted on social media platforms that are lost or at risk. This is mostly because so many ‘regular’ social media platforms have added ways to mimic or copy TikTok-style video, and the distinction will become harder to make in the future as platforms converge on similar functionality for creating photo and video content.

  • A consideration of merging the ‘Legacy Interfaces and Services’ entry with this entry, to provide examples of particular interfaces of social media platforms that are lost or at risk.

2024 Interim Review

The 2024 Council identified a trend towards even greater risk due to a number of factors, summarized below.

Creators and archivists relying on consumer social media free at the point of use occupy a precarious position. Free services may be favored by agencies or individual creators who are least able to respond to closure or loss, because of the low barrier to entry associated with ‘free at the point of use’ services. Proprietary interfaces and services pose risks, as companies prevent third-party attempts to preserve the hosted content and/or the end-user experience of the environment. An inability to preserve social media interfaces diminishes future potential for emulation and may inhibit researchers' ability to glean important context, as described in the Bit List 2023 review.

Additional barriers to preservation via web capture are also present in the terms of service for user accounts, which explicitly prohibit crawling. For example, the X Terms of Service state: “You may not access the Services in any way other than through the currently available, published interfaces that we provide. For example, this means that you cannot scrape the Services, try to work around any technical limitations we impose, or otherwise attempt to disrupt the operation of the Services” and “crawling or scraping the Services in any form, for any purpose without our prior written consent is expressly prohibited” (X, 2023). Another example, from the Facebook Terms of Service, states: “You may not access or collect data from our Products using automated means” (Facebook, 2022).

An additional recommendation for the 2025 review is to assess whether ‘proprietary formats’ (e.g., the platform interfaces) adequately captures the scope of this entry and addresses the first bullet point of the 2023 Council recommendations. The 2023 recommendations for rescoping and combining entries will also be assessed in more detail in 2025.

2024 Council members also raised concerns regarding Artificial Intelligence and Machine Learning, noting that for this entry and, more broadly, anything related to social media, fear of content being used for AI training is an emerging risk. This manifests in two ways:

  • Social media users deleting their content entirely in an attempt to prevent it from being used or sold for training a Large Language Model (Edwards, 2024). As noted by Council members, it is difficult to gauge how widespread this is or how much it affects content that people want to preserve, but it is nonetheless worth noting.

  • Website operators blocking scrapers for similar reasons (Voorhees, 2024). This blocking primarily targets AI crawlers but may also affect web crawlers used for preservation. It is similarly difficult to gauge how widespread this is, but anti-crawler provisions seem likely to become more restrictive rather than less (see the sketch following this list).
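
To illustrate how such anti-crawler provisions can spill over onto preservation, the following is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt rules and user-agent strings are hypothetical but typical of directives now widely published to deter AI training crawlers; a blanket rule of the same kind also excludes archiving crawlers that honour robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind many sites now publish to deter
# AI training crawlers. Some operators instead use a blanket
# "User-agent: *" / "Disallow: /" rule, which would also shut out
# preservation crawlers that honour robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check whether different crawler user-agents may fetch a public post.
for agent in ("GPTBot", "CCBot", "example-archive-crawler/1.0"):
    allowed = parser.can_fetch(agent, "https://example.org/public-post/123")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```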

Additional Comments

Social media free at the point of use remains at critical risk, due in large part to the policies of unregulated (or underregulated) corporate platforms such as Facebook, X (previously Twitter), and their parent companies. The content shared on these platforms, and the history of the development of platform infrastructure and policy itself, provide a critical source of information for policy-makers and researchers. The complete lack of preservation provision, and the deliberate obstruction of public-interest archiving attempts, put this valuable content at high risk of loss and draw attention to the critical risk posed by these platforms.

Content hosted on social media platforms (which users might not have stored elsewhere) is at risk, and users may lose the opportunity to keep their own data for personal archiving or to donate it to an organization. Collecting organizations may lose the opportunity to archive hosted content within their collecting remit using web or API harvesting tools. In both instances, data remains at high risk because it is hosted by companies that could change policies or access on a whim. Even free content often cannot be archived unless the archivist has a login (for example, when crawling with Browsertrix), and some social media companies require payment to access data for preservation.

The interfaces of social media platforms, which researchers may want to study (typically through web harvesting) to trace the evolution of the platforms over time, are also at risk. Preservation is affected when researcher API access is shut down, halting preservation of entire platforms. There are also differences between the themes and collecting policies of institutions and those of researchers who scrape their own data and deposit it in repositories.

Preserving this material en masse is still incredibly difficult, but many of these platforms allow users to download their own personal content or archives. However, these exports lose all the context of social media and therefore, whilst they do preserve the data, they do not preserve the essence of the material (see the sketch below). Platforms like X (previously Twitter) have alternately opened and closed their APIs in recent years, others like Yahoo have closed altogether, and Facebook, as well as X, remains almost hostile towards archiving and preservation attempts.
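
As a rough illustration of this loss of context, the sketch below reads a hypothetical personal export in JSON Lines form (the layout, field names, and content are invented for illustration; real exports vary by platform). The raw posts survive, but the surrounding thread, reactions, ranking, and rendered interface do not.

```python
import json

# Hypothetical personal data export: one JSON object per line, with
# invented field names; real platform exports vary in structure.
SAMPLE_EXPORT = """\
{"id": "1", "created_at": "2023-05-01T12:00:00Z", "text": "First post"}
{"id": "2", "created_at": "2023-05-02T09:30:00Z", "text": "Reply in a thread"}
"""

for line in SAMPLE_EXPORT.splitlines():
    post = json.loads(line)
    # The export preserves the raw data of each post...
    print(post["created_at"], post["text"])
    # ...but not its context: the thread it replied to, reactions and
    # replies from other users, how the platform ranked or displayed it,
    # or the interface through which it was originally read.
```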

With digital materials from premium or institutional social media services, the business model and sustainability are more obvious, and contracts may be more readily enforceable. Moreover, because these services have a slightly higher barrier to entry, they may be favored by agencies that are better able to respond to closure or loss. Traditional web archiving can be employed where the user pays for a service but the content is ultimately publicly available (such as Flickr). Much remains unclear, however, about how to preserve internal social media and closed networks that web archiving cannot reach or that existing tools do not cover.

Social media capture via web harvesting has become increasingly difficult. Social media platforms have done nothing to address the barriers to automated capture that prevent the preservation of even so-called public content. For example, campaign websites or other election-related content may be published only on Facebook or X (previously Twitter) because these services are ‘free’; this content is of particular concern as it appears on no other website. Web archivists are constantly shifting strategies and approaches and trying out new (but limited) tools to best capture this content. If we cannot successfully preserve these platforms, we are missing out on documenting organizations, campaigns, and elections around the globe. Much of this data exists as datasets based on aggregated use rather than individual files.
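
For public pages that remain reachable without a login, page-level capture can be sketched with the open-source warcio library, as below. The URL is a placeholder, and this approach does not overcome the login walls, dynamic interfaces, or terms-of-service restrictions described above.

```python
# A minimal sketch of page-level web capture into a WARC file using the
# open-source warcio library (pip install warcio requests). warcio asks
# that capture_http be imported before requests so that the HTTP
# traffic can be recorded.
from warcio.capture_http import capture_http
import requests  # imported after capture_http, as warcio requires

# Placeholder URL for a publicly visible post; content behind a login,
# or disallowed by a platform's terms of service, cannot be captured
# this way.
POST_URL = "https://example.org/public-post/123"

with capture_http("social-media-capture.warc.gz"):
    response = requests.get(POST_URL, timeout=30)
    response.raise_for_status()

print("Wrote", POST_URL, "to social-media-capture.warc.gz")
```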

Often these are external proprietary platforms bound by intellectual property law, and potentially by privacy law, which impedes timely action. What recourse do archives or digital repositories have to deal with this and capture the materials?

Case Studies or Examples:

  • Mentioned examples of additional barriers to preservation via web capture present in terms of service for user accounts that explicitly prohibit crawling included X (2023) ‘Terms of Service’, Effective: September 29, 2023. Available at: https://web.archive.org/web/20240611040225/https://x.com/en/tos [accessed 06 September 2024]; and Facebook (2022) ‘Terms of Service’, Date of Last Revision: July 26, 2022. Available at: https://web.archive.org/web/20240610150804/https://www.facebook.com/terms/ [accessed 06 September 2024].

  • Mentioned examples relating to AI and ML concerns included Edwards, B. (2024) ‘Stack Overflow users sabotage their posts after OpenAI deal’, Ars Technica. Available at: https://arstechnica.com/information-technology/2024/05/stack-overflow-users-sabotage-their-posts-after-openai-deal/ [accessed 06 September 2024]; and Voorhees, J. (2024) ‘How We’re Trying to Protect MacStories from AI Bots and Web Crawlers – And How You Can, Too’. Available at: https://www.macstories.net/stories/ways-you-can-protect-your-website-from-ai-web-crawlers/ [accessed 06 September 2024].

  • A range of use cases are presented in Thomson, S. (2016) ‘Preserving Social Media’, DPC Technology Watch Report (16-02). Available at: http://doi.org/10.7207/twr16-02.

  • The National Library of Scotland's ‘Archive of Tomorrow: Health Information and Misinformation in the UK Web Archive’ project, which recorded the proliferation of misinformation about coronavirus. See Archive of Tomorrow (2022-2023), National Library of Scotland. Available at: https://www.nls.uk/about-us/working-with-others/archive-of-tomorrow/ [accessed 24 October 2023].

  • The archiving of the ‘In Her Shoes’ collection, part of the Archiving Reproductive Health (ARH) project. Working with key stakeholders, including activist organisations like Abortion Rights Campaign, Together for Yes, Terminations for Medical Reasons, Coalition to Repeal the Eighth, and many others, ARH gathered and preserved a selection of digital objects and research data, including social media, that tells part of the story of this historic campaign. ARH published collections of design and publicity material from activist groups, as well as a sequence of stories from the popular Facebook page ‘In Her Shoes’, a page where people anonymously shared stories of their experiences of being unable to access abortion in Ireland. This initiative received a 2022 Digital Preservation Award for Safeguarding the Digital Legacy. See Archiving Reproductive Health Project (2022), ‘Archiving Reproductive Health’, Digital Preservation Awards 2022. Available at: https://www.dpconline.org/events/digital-preservation-awards/dpa2022-archiving-reproductive-health [accessed 24 October 2023].

  • An example of a tool available to help libraries and archives with capture is ArchiveSocial. See CIVICPLUS (n.d.) ‘ArchiveSocial’. Available at: https://archivesocial.com/ [accessed 24 October 2023].

See also:

  • In the 2023 NDSA Web Archiving Survey Report, one of the major takeaways related to respondents’ concerns about their ability to collect social media, in particular Twitter, Instagram, Facebook, and Reddit. Content housed within social networks has always been difficult to capture for a myriad of reasons, and recent changes to numerous social platforms have made this task harder. See: National Digital Stewardship Alliance (NDSA) (2023) Web Archiving Survey Results: An NDSA Report. October 2023. Available at: https://doi.org/10.17605/OSF.IO/N5MYR [accessed 11 September 2024].

  • Willison, S. (2024) ‘Slop is the new name for unwanted AI-generated content’, 8 May 2024, Simon Willison’s Weblog. Available at: https://simonwillison.net/2024/May/8/slop/ [accessed 12 September 2024].

  • Cannelli, B. (2022) ‘Mapping social media archiving initiatives: state of the art, trends, and future perspectives’, IIPC Blog. Available at: https://netpreserveblog.wordpress.com/2022/11/30/mapping-social-media-archiving-initiatives-state-of-the-art-trends-and-future-perspectives/ [accessed 24 October 2023].

  • A 2022 report on a nationwide questionnaire survey that gathered people's responses to hypothetical scenarios of social media archiving by the National Diet Library in Japan, noting legal and ethical concerns as well as respondent views on the preservation of publicly available private data on social media. See Shiozaki, R. (2022) ‘People’s perceptions on social media archiving by the National Library of Japan’. Journal of Information Science. Available at: https://doi.org/10.1177/01655515221108692.

