A synthetic data generator for online social network graphs

11/30/2023

Two datasets are included which together represent a graph: they contain approx. 1K user records (nodes) and 50K link records (edges), respectively. We have followed a two-step process: (1) generate a topology using R-MAT, apply Louvain to identify some communities, then apply Louvain recursively to selected communities to obtain some smaller ones, giving a total of 10 communities; (2) populate the graph structure with data by choosing seeds in each community and propagating from them. A new, more sophisticated version of this method will be made available soon (datasets and code). Please reference the paper when using this data and publishing results in your work. Please give me your feedback on your analysis/use of this data and suggestions for improvement.

Nettleton, DF (2015) Generating synthetic online social network graph data and topologies, 3rd Workshop on Graph-based Technologies and Applications (Graph-TA), UPC, Barcelona, Spain, March 18th 2015.

Two of the difficulties for data analysts of online social networks are (1) the public availability of data and (2) respecting the privacy of the users. One possible solution to both of these problems is to use synthetically generated data. However, this presents a series of challenges related to generating a realistic dataset in terms of topologies, attribute values, communities, data distributions, correlations and so on. In the following work, we present and validate an approach for populating a graph topology with synthetic data which approximates an online social network. The empirical tests confirm that our approach generates a dataset which is both diverse and a good fit to the target requirements, with a realistic modeling of noise and fitting to communities. A good match is obtained between the generated data and the target profiles and distributions, which is competitive with other state-of-the-art methods. The data generator is also highly configurable, with a sophisticated control parameter set for different “similarity/diversity” levels.

In recent years, online social networks have become a part of everyday life for millions of individuals. Also, data analysts have found a fertile field for analyzing user behavior at individual and collective levels, for academic and commercial reasons. On the other hand, there are many risks for user privacy, as information a user may wish to remain private becomes evident upon analysis. However, when data is anonymized to make it safe for publication in the public domain, information is inevitably lost with respect to the original version, a significant aspect of social networks being the local neighborhood of a user and its associated data. Current anonymization techniques are good at identifying risks and minimizing them, but not so good at maintaining the local contextual data which relate users in a social network. Thus, improving this aspect will have a high impact on the data utility of anonymized social networks. Also, there is a lack of systems which facilitate the work of a data analyst in anonymizing this type of data structure and performing empirical experiments in a controlled manner on different datasets. Hence, in the present work we address these issues by designing and implementing a sophisticated synthetic data generator together with an anonymization processor which has strict privacy guarantees and which takes into account the local neighborhood when anonymizing.
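
The two-step process described earlier (R-MAT topology, Louvain communities, then seed-based propagation) can be sketched roughly as follows. This is a minimal illustration and not the generator used to build the datasets: the function names `rmat_edges` and `propagate_profiles`, the quadrant probabilities, the choice of the highest-degree node as seed and the simple noise model are all assumptions made for the example. Louvain is not reimplemented here; the sketch simply takes a node-to-community mapping as input.

```python
import random
from collections import defaultdict, deque


def rmat_edges(scale, n_edges, a=0.57, b=0.19, c=0.19, seed=0):
    """Sample distinct undirected edges over 2**scale nodes via R-MAT.

    a, b, c (and d = 1 - a - b - c) are the quadrant probabilities.
    The defaults are commonly used values, not ones taken from the post.
    """
    rng = random.Random(seed)
    edges = set()
    attempts = 0
    # Cap the attempts so the loop ends even if n_edges is unreachable.
    while len(edges) < n_edges and attempts < 50 * n_edges:
        attempts += 1
        u = v = 0
        for _ in range(scale):
            r = rng.random()
            # Quadrants: a = (0,0), b = (0,1), c = (1,0), d = (1,1).
            row = 1 if r >= a + b else 0
            col = 1 if (a <= r < a + b) or r >= a + b + c else 0
            u, v = (u << 1) | row, (v << 1) | col
        if u != v:  # drop self-loops
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def propagate_profiles(edges, community_of, profiles, noise=0.0, seed=0):
    """Assign each node a profile by BFS from one seed per community.

    community_of: node -> community id (e.g. from a Louvain run, not shown).
    profiles: community id -> attribute dict given to that community's seed.
    A node copies the profile of the node that reached it; with probability
    `noise` it instead receives a randomly chosen community profile.
    """
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    members = defaultdict(list)
    for node, comm in community_of.items():
        members[comm].append(node)

    assigned = {}
    for comm, nodes in members.items():
        # Assumption: the seed is the highest-degree node of the community
        # (ties broken by the smallest node id).
        seed_node = max(nodes, key=lambda n: (len(adj[n]), -n))
        assigned[seed_node] = dict(profiles[comm])
        queue = deque([seed_node])
        while queue:
            cur = queue.popleft()
            for nb in sorted(adj[cur]):
                if nb in assigned or community_of.get(nb) != comm:
                    continue
                if rng.random() < noise:
                    source = profiles[rng.choice(list(profiles))]
                else:
                    source = assigned[cur]
                assigned[nb] = dict(source)
                queue.append(nb)
        # Nodes not reachable from the seed inside the community
        # fall back to the seed profile.
        for n in nodes:
            assigned.setdefault(n, dict(profiles[comm]))
    return assigned


# Step 1: a small R-MAT topology (64 nodes, up to 200 edges).
topology = rmat_edges(scale=6, n_edges=200)
print(len(topology), "R-MAT edges over", 2 ** 6, "nodes")

# Step 2 on a toy graph: two triangles joined by one edge,
# with hand-assigned communities standing in for Louvain output.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
toy_comm = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
toy_profiles = {0: {"age": "18-25"}, 1: {"age": "46-55"}}
print(propagate_profiles(toy_edges, toy_comm, toy_profiles))
```

With `noise=0.0` every node ends up with the profile of its own community's seed; raising `noise` mixes in profiles from other communities, which is one simple way to mimic the "similarity/diversity" control mentioned in the abstract.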


