• Telorand@reddthat.com
    1 day ago

    It’s the interoperability of unique instances that makes the Fediverse resistant to scraping. The posts are all public, but crawling and categorizing all of them is probably like untangling a cotton ball.

    • unalivejoy@lemm.ee
      1 day ago

      Or you can host your own instance and let the other servers send you all their data (though instances can still defederate from you).

    • General_Effort@lemmy.world
      23 hours ago

      Don’t really see the problem. If you pick up the content while web crawling, you will end up with a lot of duplicates, but that’s normal. If you wanted to scrape the Fediverse in particular, you’d know the structure of the data.
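
To make the duplicates point concrete, here is a minimal sketch of collapsing copies of the same post picked up from several instances. It assumes each crawled record carries the original object's `id` URI, as ActivityPub objects do; the record layout, the `seen_on` field, and the example hostnames are hypothetical, made up for illustration.

```python
def dedupe_by_object_id(records):
    """Keep the first copy of each object, keyed by its `id` URI."""
    seen = set()
    unique = []
    for record in records:
        object_id = record["id"]  # assumed: canonical URI of the original post
        if object_id in seen:
            continue  # same post, fetched again via another instance
        seen.add(object_id)
        unique.append(record)
    return unique


# Hypothetical crawl results: one post was picked up on two instances.
crawled = [
    {"id": "https://a.example/post/1", "seen_on": "a.example", "content": "hello"},
    {"id": "https://a.example/post/1", "seen_on": "b.example", "content": "hello"},
    {"id": "https://b.example/post/7", "seen_on": "b.example", "content": "reply"},
]

unique_posts = dedupe_by_object_id(crawled)
```

A real crawler would probably also have to handle edits and deletions, which this sketch ignores.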