Open Source Knowledge Networks: An Example Of StackOverflow Networks For Python Questions

Abstract:

One of the reasons for individuals to form organizations is to achieve common goals
(Coleman, 1994). Throughout the 20th century, organizations have mainly been seen as
hierarchical structures (Simon, 1947; March & Simon, 1958; Cyert & March, 1963). Under
the influence of information technology and increasingly volatile environments,
organizational structures are becoming flatter, offering more opportunities for collaboration within companies (e.g. Zammuto et al., 2007). Research connects such non-hierarchical structures to innovation and improved performance (Wang et al., 2014). This development draws attention to the processes used by non-hierarchical organizations. Understanding such non-hierarchical communities can shed light on how their practices can best be adapted to virtual teams in hierarchical organizations. I study teams emerging in the StackOverflow network and investigate the knowledge structures within these teams.
Organizational options for teams range from hierarchical through partially hierarchical
(e.g. shared leadership) to fully non-hierarchical (Mehra et al., 2006). In this paper, I
investigate teams at the far end of this spectrum: non-hierarchical, self-organized teams. Such teams are highly relevant to open source project
management research (Crowston & Howison, 2006), because in problem-solving
projects members are typically not bound to the project and are free to leave at any moment. I ask how teams leverage the community knowledge network to
answer Python-related questions. I examine knowledge homophily and heterophily, as well as knowledge triads (e.g., forbidden triads), and their effects on the success of the provided solutions.
I use StackOverflow data on Python questions available on Kaggle (StackOverflow,
2019) and create the knowledge network from the tags associated with each question.
StackOverflow data is a well-known data source for open source knowledge research (see Ye et al., 2016 for an overview). The dataset includes a total of 149,177 users (nodes) involved in solving 607,282 questions between 2008 and 2016. An edge is created between two users if both have answered questions carrying the same tag, i.e. they are connected by common knowledge. To evaluate individual positions in the knowledge network, I use the state of the network before a question was asked. From this state, I derive the individual positions and the knowledge network of the team answering the question.
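The edge rule and the "state before a question was asked" can be illustrated with a minimal sketch. It is not the pipeline used in this study: the record layout `(user, tag, time)`, the toy values, and the function name `build_knowledge_network` are all assumptions made for illustration, since the text does not describe the dataset's columns.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (user_id, tag, answer_time).
# The real dataset's layout is not given in the text; this is an assumption.
ANSWERS = [
    ("u1", "pandas", 1),
    ("u2", "pandas", 2),
    ("u3", "numpy", 3),
    ("u1", "numpy", 4),
    ("u4", "django", 5),
]


def build_knowledge_network(answers, before):
    """Return the undirected edge set linking users who answered questions
    with a common tag, using only answers given strictly before `before`."""
    users_by_tag = defaultdict(set)
    for user, tag, time in answers:
        if time < before:
            users_by_tag[tag].add(user)

    edges = set()
    for users in users_by_tag.values():
        # Every pair of users sharing a tag is connected by common knowledge.
        for a, b in combinations(sorted(users), 2):
            edges.add((a, b))
    return edges


# Network state just before a question asked at time 4, and at time 6.
edges_at_4 = build_knowledge_network(ANSWERS, before=4)
edges_at_6 = build_knowledge_network(ANSWERS, before=6)
```

In this toy data, `u1` and `u3` become connected only once `u1` has answered a `numpy` question, so the same team can sit in a different network depending on when the question is asked. Pairing every user within a tag grows quadratically with tag size, so a dataset of the size reported here would likely need a sparser representation.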
Preliminary analysis (as shown in Figure 1) indicates that the overall network is highly centralized and reveals four types of knowledge networks among teams answering questions. I am
currently analyzing knowledge homophily, heterophily, and team network
structures. I measure the effect of these structures on the success of the question (its score) and on the distribution of individual answer scores. Furthermore, I plan to extend the analysis to the temporal development of the network.
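One possible way to operationalize the structures named above is sketched here. The text does not define its measures, so both choices are assumptions: knowledge homophily is approximated by the Jaccard similarity of two users' tag sets, and a forbidden triad is approximated by an open triad in an unweighted network (two ties present, the third absent). The names `jaccard`, `open_triads`, `USER_TAGS`, and `EDGES` are hypothetical.

```python
from itertools import combinations


def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag sets: 1.0 for identical knowledge,
    0.0 for no shared tags (a simple homophily proxy)."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def open_triads(edges):
    """Return (center, end1, end2) triples where the center is tied to both
    ends but the ends are not tied to each other."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    triads = []
    for center, nbrs in neighbors.items():
        for x, y in combinations(sorted(nbrs), 2):
            if y not in neighbors[x]:
                triads.append((center, x, y))
    return triads


# Hypothetical toy team: u1 knows both tags, u2 and u3 know one each.
USER_TAGS = {"u1": {"pandas", "numpy"}, "u2": {"pandas"}, "u3": {"numpy"}}
EDGES = {("u1", "u2"), ("u1", "u3")}

sim_u1_u2 = jaccard(USER_TAGS["u1"], USER_TAGS["u2"])  # partial overlap
sim_u2_u3 = jaccard(USER_TAGS["u2"], USER_TAGS["u3"])  # no overlap
triads = open_triads(EDGES)
```

Here `u1` bridges two users with no shared knowledge, which is the kind of configuration whose effect on question and answer scores could then be tested. A faithful forbidden-triad measure in Granovetter's sense would also require tie strength, which this unweighted sketch ignores.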