
Whistlerlib: a distributed computing library for exploratory data analysis on large social network datasets

At least 350k posts are published on X, 510k comments are posted on Facebook, and 66k pictures and videos are shared on Instagram each minute. These large datasets require substantial processing power, even if only a percentage is collected for analysis and research. To face this challenge, data scientists can now use computer clusters deployed on various IaaS and PaaS services in the cloud. However, scientists still have to master the design of distributed algorithms and be familiar with using distributed computing programming frameworks. It is thus essential to generate tools that provide analysis methods to leverage the advantages of computer clusters for processing large amounts of social network text. This paper presents Whistlerlib, a new Python library for conducting exploratory analysis on large text datasets on social networks. Whistlerlib implements distributed versions of various social media, sentiment, and social network analysis methods that can run atop computer clusters. We experimentally demonstrate the scalability of the various Whistlerlib distributed methods when deployed on a public cloud platform. We also present a practical example of the analysis of posts on the social network X about the Mexico City subway to showcase the features of Whistlerlib in scenarios where social network analysis tools are needed to address issues with a social dimension.

Data and Resources

This dataset has no data

Additional Information

Field	Value
Source	https://doi.org/10.1007/s11042-024-19827-z
Author	A Garcia-Robledo, A Espejel-Trujillo
Last updated	October 10, 2025, 07:19 (UTC)
Created	October 10, 2025, 07:19 (UTC)
Publication	Journal
Type	Publication