Distributed Social Media Archiver

Python, RabbitMQ, Azure, Distributed Systems, Docker

High-throughput system that archived 500,000+ videos at 8+ videos/second using RabbitMQ and Azure.

Independently mobilized an urgent digital preservation effort against a platform ban deadline. Architected a distributed collection system that safeguarded culturally significant user-generated content.

System Architecture:
  • High Throughput: Achieved sustained 8+ videos/second via RabbitMQ message queuing and concurrent Python workers.
  • Smart Proxies: Orchestrated Azure VMs with dual-NIC Squid proxies that rotated IPs dynamically to bypass anti-scraping countermeasures.
  • Zero-Cost Ops: Automated infrastructure via cloud-init, leveraging free-tier resources to eliminate operational expenses entirely.
  • Analytics: Integrated Pandas scripts for trend analysis on collected metadata.
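
The worker pattern behind the throughput and proxy bullets can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it swaps RabbitMQ for Python's in-process `queue.Queue` so it runs without a broker, and it uses simple round-robin proxy selection in place of the Azure-level IP rotation. The proxy addresses, `archive`, and `run_workers` are all hypothetical names introduced for this sketch.

```python
import itertools
import queue
import threading

# Hypothetical proxy endpoints; the real Squid addresses are not part of this page.
PROXIES = ["http://10.0.0.4:3128", "http://10.0.0.5:3128"]


def archive(video_id, proxy):
    """Placeholder for the real download-and-store step (assumed, not shown here)."""
    return f"{video_id} via {proxy}"


def run_workers(video_ids, num_workers=4):
    """Fan video IDs out to concurrent workers, assigning proxies round-robin."""
    jobs = queue.Queue()  # in-process stand-in for a RabbitMQ queue
    results = []
    lock = threading.Lock()  # guards the shared proxy iterator and results list
    proxy_cycle = itertools.cycle(PROXIES)

    def worker():
        while True:
            video_id = jobs.get()
            if video_id is None:  # sentinel: no more work for this worker
                jobs.task_done()
                break
            with lock:
                proxy = next(proxy_cycle)
            outcome = archive(video_id, proxy)
            with lock:
                results.append(outcome)
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for video_id in video_ids:
        jobs.put(video_id)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for line in run_workers([f"video-{i}" for i in range(6)], num_workers=3):
        print(line)
```

In a deployment like the one described, the queue would presumably live in RabbitMQ so that workers on separate VMs can share it, and a failed download could be returned to the queue instead of being lost.
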
© 2025 gabe