Whistlerlib: a distributed computing library for exploratory data analysis on large social network datasets

Each minute, at least 350k posts are published on X, 510k comments are posted on Facebook, and 66k pictures and videos are shared on Instagram. Datasets of this size require substantial processing power, even when only a fraction is collected for analysis and research. To meet this challenge, data scientists can now use computer clusters deployed on various IaaS and PaaS cloud services. However, they must still master the design of distributed algorithms and be familiar with distributed computing frameworks. It is thus essential to develop tools whose analysis methods leverage computer clusters to process large amounts of social network text. This paper presents Whistlerlib, a new Python library for conducting exploratory analysis of large text datasets from social networks. Whistlerlib implements distributed versions of various social media, sentiment, and social network analysis methods that can run atop computer clusters. We experimentally demonstrate the scalability of the Whistlerlib distributed methods when deployed on a public cloud platform. We also present a practical example, an analysis of posts on X about the Mexico City subway, that showcases the features of Whistlerlib in scenarios where social network analysis tools are needed to address issues with a social dimension.

Additional Information

Field: Value
Source: https://scholar.google.com/citations?view_op=view_citation&hl=es&user=b81TvMMAAAAJ&pagesize=100&sortby=pubdate&citation_for_view=b81TvMMAAAAJ:hqOjcs7Dif8C
Author: A Garcia-Robledo, A Espejel-Trujillo
Last updated: October 21, 2025, 08:59 (UTC)
Created: October 21, 2025, 08:59 (UTC)
Year: 2024
DOI: https://doi.org/10.1007/s11042-024-19827-z
Google Scholar URL: https://scholar.google.com/citations?view_op=view_citation&hl=es&user=b81TvMMAAAAJ&pagesize=100&sortby=pubdate&citation_for_view=b81TvMMAAAAJ:hqOjcs7Dif8C
Hash identifier: 1ee805b50082
Publication venue: Multimedia Tools and Applications 83 (39), 87071-87104, 2024
Type: Publication
Publication type: Journal
Direct URL: https://link.springer.com/article/10.1007/s11042-024-19827-z