Distributed Social Media Archiver
Python, RabbitMQ, Azure, Distributed Systems, Docker
High-throughput system that archived 500,000+ videos at 8+ videos/second using RabbitMQ and Azure.
Independently mobilized an urgent digital preservation effort against a platform ban deadline. Architected a distributed collection system that safeguarded culturally significant user-generated content.
System Architecture:
- High Throughput: Achieved sustained 8+ videos/second via RabbitMQ message queuing and concurrent Python workers.
- Smart Proxies: Orchestrated Azure VMs with dual-NIC Squid proxies that rotated IPs dynamically to bypass anti-scraping countermeasures.
- Zero-Cost Ops: Automated infrastructure via cloud-init, leveraging free-tier resources to eliminate operational expenses entirely.
- Analytics: Integrated Pandas scripts for trend analysis on collected metadata.
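
The queue-plus-workers pattern with rotating proxies can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: an in-process `queue.Queue` stands in for RabbitMQ so the example runs without a broker, and the proxy addresses, `ProxyRotator`, `archive_video`, and `run` are all hypothetical names introduced here.

```python
import itertools
import queue
import threading

# Hypothetical proxy endpoints; the real addresses are not given in the write-up.
PROXIES = ["http://10.0.0.4:3128", "http://10.0.0.5:3128"]


class ProxyRotator:
    """Hands out proxies round-robin; safe to share across worker threads."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            return next(self._cycle)


def archive_video(url, proxy):
    """Placeholder for the download step: returns a record instead of fetching."""
    return {"url": url, "proxy": proxy}


def worker(jobs, rotator, results):
    """Pull URLs off the queue until a None sentinel arrives."""
    while True:
        url = jobs.get()
        if url is None:
            break
        results.append(archive_video(url, rotator.next()))


def run(urls, num_workers=4):
    """Fan the URLs out to num_workers threads and collect their records."""
    jobs = queue.Queue()  # stand-in for a RabbitMQ work queue
    rotator = ProxyRotator(PROXIES)
    results = []  # list.append is atomic in CPython, so no extra lock is used
    threads = [
        threading.Thread(target=worker, args=(jobs, rotator, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for url in urls:
        jobs.put(url)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

In a setup like the one described, the in-process queue would presumably be replaced by a RabbitMQ consumer with message acknowledgements, and each proxy entry would point at a Squid instance on an Azure VM.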