In this article, boyd and Crawford question the overrated use of the term ‘big data’, with the aim of shedding light on the limitations, consequences, and implications one ought to consider when using big data in the research process. Accountability and the need to ceaselessly question big data are the two overarching themes of the text.
The article is divided into six provocations, all of which are given equal weight. The first provocation highlights how new technologies and big data have changed one’s understanding of knowledge during the research process, whilst informing the reader of the need to question and understand big data in light of this new meaning of knowledge and its limitations. The second provocation assesses the philosophical tension and debate between qualitative and quantitative researchers, who seek to abide by the traditionally established ways of working with data in their respective methodologies.
On the one hand, there are those who believe in the objectivity of the research process and methodology. On the other hand, there are those who believe in its subjectivity. Nevertheless, boyd and Crawford hold that any decision taken to filter the data, however minor, introduces subjectivity into the method. Acknowledging this subjectivity and questioning the data helps keep the process from being flawed or misleading. The third provocation lays emphasis on the prevailing assumption that the bigger the data set and the more of it one uses, the better the results will be. This assumption, as boyd and Crawford clarify, is wrong, as both large and small data sets can yield insightful research results. Quantity does not necessarily signify quality and, as boyd and Crawford argue, in many instances smaller data sets yield better results. The fourth provocation suggests that not all data is equal, hence the need to consider data in relation to the context and the platform from which it was gathered.
Data ought to be contextualized, and researchers ought to avoid making assumptions about the data or the patterns that emerge from it, as doing so leads to flawed or skewed results. boyd and Crawford’s fifth provocation relates to the level of publicness of a user’s data and the need to consider and question the ethics of the data-gathering process when undertaking research. The provocation emphasizes the need for researchers to acknowledge the consequences of gathering and using data without questioning the ethical framework. For boyd and Crawford, showing accountability is crucial not only to users but also to the field of research to which the researchers belong. Lastly, the sixth provocation highlights the digital divide created by large institutions and corporations, which hold most of the power and control over the means of accessing large data sets. Access to data, as the authors suggest, has become a privilege: only a few individuals, universities, and corporations are able to grow and be recognized as a result of the data, information, and knowledge they have access to. Hence, this divide increasingly entrenches a system of the data and information rich and the data and information poor.
To conclude, taking accountability and being aware of who handles the data, with what aim, and how and by whom it is accessed are all important matters one ought to consider for the good of everyone’s future in a world of big data. boyd and Crawford’s article presents an outline of the ethical and epistemological provocations which one ought to take into account when undertaking research. It will certainly act as an ethical reminder of the need to question and contextualize the data gathered during the research process of the dissertation. Perhaps one of its limitations is that it leans towards a rather pessimistic view of big data rather than providing a balanced reading of both its benefits and its dangers. In spite of this limitation, boyd and Crawford’s article has proved more useful than Cukier and Schoenberger’s account, which, through its authors’ enthusiasm, fails to provide depth on the dangers of the big data phenomenon. boyd and Crawford’s article notably counters the overall approach taken by Cukier and Schoenberger. However, both sources share a common view on the consequences and implications of big data.