While our dataset consists of original messages only, the Sina API also provides information on the number of times any given message was rebroadcast and commented upon (even deleted messages).

False discovery rate control is widespread in bioinformatics, since it gracefully handles tens of thousands of simultaneous hypothesis tests (unlike the Bonferroni correction).
But four percent of our terms have deletion rates in this range, indicating that deletions are substantially non-random conditional on textual content.

Charles Chao, CEO of Sina Weibo, reports that the company employs at least 100 censors, though that figure is thought to be a low estimate (Epstein, 2011). Villeneuve (2008b) examines the search filtering practices of Google, Yahoo, Microsoft and Baidu in China, noting extreme variation between search engines in the content they censor, echoing earlier results by Human Rights Watch (2006).

The government censors content mainly for political reasons, but also to maintain its control over the populace. China has strict censorship rules that filter much of the web content available on the Internet.
By comparing social media messages on Twitter with those on domestic Chinese social media sites and assessing statistically anomalous deletion rates, we are identifying keywords that are currently highly salient in real public discourse. By examining the deletion rates of specific messages by real people, we can see censorship in action.

We are conducting tens of thousands of simultaneous hypothesis tests, so we must apply a multiple hypothesis testing correction.
With Twitter and Facebook blocked in China, the stream of information from Chinese domestic social media provides a case study of social media behavior under the influence of active censorship.
We qualitatively analyzed the most highly deleted terms that passed the p w.

Note that we are not looking at censorship as an abstraction (e.g., detecting keywords that are blocked by the GFW, regardless of whether or not anyone uses them).

Smith is the Finmeccanica Associate Professor in the Language Technologies Institute and Machine Learning Department, School of Computer Science, Carnegie Mellon University.
Taken together, these information sources lead to three conclusions.

Prior work has shown that rebroadcasting accounts for a large part of user activity on Sina Weibo, especially in its function for developing trends (Yu, et al., 2011). We might suspect that if a politically sensitive message is being heavily rebroadcast, it may be more likely to be deleted.

In the absence of external corroborating evidence (such as reports of the Chinese government actively suppressing salt rumors, as above), these results can only be suggestive, since we can never be certain that a deletion is due to the act of a censor rather than other reasons.

One area where we can see a sharp distinction between the two datasets, however, is where they originate geographically.

Using Wikipedia substantially increases the number of named entities represented. We restrict attention to words appearing in at least 50 messages in our 1.3 million message sample.
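The restriction to words appearing in at least 50 messages can be illustrated with a small sketch. It counts, for each term, how many messages contain it and how many of those were deleted, then keeps only terms that reach the minimum message count. The function name, the input shape (tokenized messages paired with a deletion flag), and the choice to count a term once per message are assumptions made for illustration; the text does not specify them.

```python
from collections import Counter


def term_deletion_rates(messages, min_messages=50):
    """Compute per-term deletion rates, keeping sufficiently frequent terms.

    `messages` is an iterable of (terms, was_deleted) pairs, where `terms`
    is a list of tokens and `was_deleted` is a bool. This input shape is a
    hypothetical one chosen for the example.
    Returns {term: (deleted_count, message_count, deletion_rate)}.
    """
    total = Counter()
    deleted = Counter()
    for terms, was_deleted in messages:
        # Assumption: a term counts once per message, however often it repeats.
        for term in set(terms):
            total[term] += 1
            if was_deleted:
                deleted[term] += 1
    return {
        term: (deleted[term], count, deleted[term] / count)
        for term, count in total.items()
        if count >= min_messages
    }
```

Counting messages rather than raw token occurrences keeps the deletion rate interpretable as "the fraction of messages containing this term that were deleted".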
While this evaluation can only confirm terms that are governed by hard censorship (not the soft censorship we are interested in), it does provide confirmation that such terms are indeed sensitive.
Over the three-month period, this led to a total collection of 56,951,585 messages (approximately 600,000 messages per day).

Table 2 lists the results of this analysis. Seventeen terms known to be politically sensitive are deleted at rates significantly higher than the baseline, chosen such that the upper bound on the false discovery rate is 2.5 percent (we expect fewer than one in 40 to have resulted from chance).
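To make the 2.5 percent false discovery rate bound concrete, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values and contrasts it with the Bonferroni correction. The text does not name the specific FDR procedure used, so Benjamini-Hochberg is an assumption here, as are the function names and the toy p-values.

```python
def benjamini_hochberg(p_values, fdr=0.025):
    """Return sorted indices of hypotheses rejected by Benjamini-Hochberg.

    Finds the largest rank k (1-based, in ascending p-value order) such that
    p_(k) <= k * fdr / m, and rejects the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def bonferroni(p_values, alpha=0.025):
    """Return indices rejected under the (more conservative) Bonferroni rule."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Toy p-values, invented for illustration only.
example = [0.001, 0.008, 0.2, 0.012, 0.9]
rejected_bh = benjamini_hochberg(example)
rejected_bonf = bonferroni(example)
```

On these toy values the step-up rule rejects more hypotheses than Bonferroni at the same nominal level, which is the property that makes FDR control practical when tens of thousands of terms are tested at once.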
By revealing the variation that occurs in censorship both in response to current events and in different geographical areas, this work has the potential to actively monitor the state of social media censorship in China as it dynamically changes over time.

In order to build a dataset, we queried the public timeline at fixed intervals to retrieve a sample of messages.
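Querying a public timeline at fixed intervals might look roughly like the loop below. This is a sketch under stated assumptions: `fetch` is a hypothetical caller-supplied function standing in for whatever API call returns the current timeline (no real Sina API signature is implied), messages are assumed to be dictionaries carrying an `"id"` key, and the interval and number of rounds are placeholders.

```python
import time


def sample_public_timeline(fetch, interval_seconds, rounds, sleep=time.sleep):
    """Poll `fetch` a fixed number of times, pausing between calls.

    `fetch` is a hypothetical zero-argument callable returning a list of
    message dicts, each with an "id" key. Messages seen in more than one
    poll are kept only once (first occurrence wins).
    """
    seen = {}
    for round_index in range(rounds):
        for message in fetch():
            seen.setdefault(message["id"], message)
        # No need to wait after the final poll.
        if round_index < rounds - 1:
            sleep(interval_seconds)
    return list(seen.values())
```

Passing `sleep` as a parameter is a design choice for testability: a no-op can be substituted so the loop runs instantly.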
Given this range of deletion reasons, we turn to incorporating other lexical signals to focus on politically sensitive keywords.
Comparing Twitter with Sina provides a method for identifying suppressed political terms that are currently salient in global public discourse.

We are also learning that Internet censorship and regulation in China have serious economic implications for many U.S. companies, such as Go Daddy.

When terms are more frequent, their observed deletion rates should naturally be closer to the base rate. Since we expect that fewer than 1 out of 40 of these terms could have been generated at random, they are reasonable candidates for further analysis. If the null hypothesis were true, only one in 1,000 terms would have deletion rates above the top orange line.
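The relationship between term frequency and expected deviation from the base rate can be sketched with a one-sided binomial test. The null hypothesis assumed here is that each message containing a term is deleted independently with probability equal to the overall base rate; the text does not spell out the exact test, so this model, the function name, and the base rate of 0.16 used in the example are illustrative assumptions.

```python
import math


def binomial_upper_tail(deleted, total, base_rate):
    """P(X >= deleted) for X ~ Binomial(total, base_rate), with 0 < base_rate < 1.

    Terms are summed in log space so that large message counts do not
    overflow or underflow intermediate values.
    """
    if deleted <= 0:
        return 1.0
    log_p = math.log(base_rate)
    log_q = math.log1p(-base_rate)
    log_terms = [
        math.lgamma(total + 1) - math.lgamma(k + 1) - math.lgamma(total - k + 1)
        + k * log_p + (total - k) * log_q
        for k in range(deleted, total + 1)
    ]
    peak = max(log_terms)
    tail = math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)
    return min(1.0, tail)


# Same observed deletion rate (30 percent), different term frequencies.
# The base rate of 0.16 is an illustrative value, not one taken from the text.
p_rare = binomial_upper_tail(15, 50, 0.16)
p_frequent = binomial_upper_tail(150, 500, 0.16)
```

Under this model, the same observed deletion rate yields a much smaller p-value for the more frequent term, which is why a fixed significance level traces out bands that narrow toward the base rate as frequency grows.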