Age of the Algorithm

The all-powerful Google search has given rise to sites like eHow.com, which critics dismiss as online sweatshops.

Content farms depend heavily on search engines for traffic. Photograph by Marc Rimmer.

SOUTH BEND, Ind.—Tony Bucciferro put the Michigan State Spartans on his back Sunday and spurred them to a 3–0 win over the Notre Dame Fighting Irish (7–11) at Frank Eck Stadium.

Bucciferro kept the Fighting Irish off the board during his nine innings of work for Michigan State (12–4). He struck out five and allowed one walk and three hits.

This generic piece of sports reporting is remarkable for only one reason: a computer algorithm wrote it. StatsMonkey, the algorithm responsible, is programmed to scan statistical data about a game and look for crucial findings. The algorithm then does something unique: it structures these results into a coherent written narrative, capturing the game’s critical moments and players, just as a sports reporter would. It can identify Tony Bucciferro’s performance as the key ingredient in the Spartans’ victory, displaying insight beyond how many hits and walks the pitcher allowed. StatsMonkey, developed by the Intelligent Information Laboratory at Northwestern University in Illinois, will never be an elegant wordsmith. But it could one day seriously threaten the jobs of sports journalists—not to mention the rest of the news team.
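To make the idea concrete, here is a minimal sketch of how a program might turn a box score into a sentence or two of copy. It is not StatsMonkey's actual method, which the article does not describe in detail. The data layout, the function names and the rule for picking the key player are all assumptions made for illustration, and the opposing pitcher's line is an invented placeholder.

```python
def plural(count, noun):
    """Return '1 walk' or '3 hits' as appropriate."""
    return f"{count} {noun}" if count == 1 else f"{count} {noun}s"


def key_pitcher(box_score):
    """Assumed rule: fewest runs allowed, ties broken by most innings pitched."""
    return min(box_score["pitchers"],
               key=lambda p: (p["runs_allowed"], -p["innings"]))


def write_recap(box_score):
    """Fill a fixed sentence template with the game's standout numbers."""
    star = key_pitcher(box_score)
    winner, loser = box_score["winner"], box_score["loser"]
    lead = (f"{star['name']} led {winner['name']} to a "
            f"{winner['runs']}-{loser['runs']} win over {loser['name']}.")
    detail = (f"He struck out {star['strikeouts']} and allowed "
              f"{plural(star['walks'], 'walk')} and {plural(star['hits'], 'hit')} "
              f"in {plural(star['innings'], 'inning')}.")
    return f"{lead} {detail}"


game = {
    "winner": {"name": "Michigan State", "runs": 3},
    "loser": {"name": "Notre Dame", "runs": 0},
    "pitchers": [
        # Figures taken from the recap quoted above.
        {"name": "Tony Bucciferro", "innings": 9, "runs_allowed": 0,
         "strikeouts": 5, "walks": 1, "hits": 3},
        # Hypothetical placeholder; the article gives no line for this pitcher.
        {"name": "Opposing Starter", "innings": 7, "runs_allowed": 3,
         "strikeouts": 4, "walks": 2, "hits": 6},
    ],
}

recap = write_recap(game)
print(recap)
```

A real system would need far more than one template and one selection rule, but the shape is the same: find the number that mattered, then wrap a sentence around it.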

Simply put, an algorithm is a series of instructions that, if properly followed, can lead to the solution of a problem. An apple pie recipe can be considered an algorithm, if the instructions are written out in enough detail that anyone could bake a pie by following them. Most algorithms are expressed numerically, and today most are written for computers—in the form of programs—rather than for people. One big difference lies in the level of detail; a “pinch of salt” might be an adequate instruction for a person, but a machine requires more precision.

As computers have become faster and more powerful—and as the costs of storage and bandwidth have plummeted—there is virtually no limit to the specificity, size and complexity of computer algorithms. They are insinuating themselves into more and more areas of our lives: in the office, on trading floors and financial exchanges, even on movie screens. And the most ubiquitous and influential algorithm of the digital age is the one you encounter every time you type a few words into that rectangular bar on your computer screen: search.

The algorithm that changed the world was developed in 1998 by two Stanford doctoral students named Larry Page and Sergey Brin. The web had been around for about a decade, but it was in danger of collapsing under its own weight. Although there were already about a hundred million pages ready to be searched, none of the available search engines had figured out how to do it very effectively. Sites like Yahoo or AltaVista yielded too many useless results and were vulnerable to manipulation by spammers.

Page and Brin solved these problems by developing an algorithm that paid more attention to how many other pages were linking to a given site—and the authority those pages brought to the subject—rather than just the text on the page. This made life more difficult for spammers, who could no longer get highly ranked pages simply by flooding them with keywords, and it provided users with high-quality, relevant results in a fraction of a second.
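The link-counting idea can be illustrated with a toy calculation. The sketch below follows the general shape of the published PageRank method, in which a page's score is repeatedly shared among the pages it links to. It is not Google's production algorithm. The four-page web, the damping value of 0.85 and the function name are assumptions chosen for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Toy link-based ranking: a page scores well when well-ranked pages link to it."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score along, split evenly among its links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical miniature web. Three pages link to "recipes";
# "spam" links out, but nothing links to it.
toy_web = {
    "blog": ["recipes"],
    "news": ["recipes", "blog"],
    "recipes": ["news"],
    "spam": ["recipes"],
}

scores = rank_pages(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy web the page that attracts the most links rises to the top, while the page nobody links to sinks to the bottom no matter how many keywords it contains.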

The world beat a path to their door. Google, the company that Brin and Page started, now controls more than 80 percent of the worldwide search market, took in $29 billion in revenue last year and is valued at roughly $200 billion. Their algorithm, the “secret sauce” at the heart of the Google empire, is probably the most valuable piece of intellectual property in the world.

The success of that algorithm in delivering quality searches has placed search at the centre of our internet lives, and its impact was first felt in marketing. In 2009, Canadians bought more than $15 billion worth of goods and services online; many of those purchases began with someone typing a few words into a Google search bar. A search query reveals two critical pieces of information: where a potential customer lives and what that customer is looking for. This ability to zero in on niche markets has revolutionized the world of advertising, with roughly half of the $25 billion a year that US advertisers spend online now directed toward search marketing.

The key to successful search marketing is placing your site at or near the top of the Google search rank. That’s where the overwhelming majority of users click—an estimated 70 percent of all Google search traffic goes to the top four results—and where advertisers desperately want to be. Getting there requires an understanding of what Google’s secret search algorithm is really looking for—no easy task. The algorithm has evolved significantly since 1998, and although we know that it weighs about two hundred factors when ranking results, we don’t know what they all are or which are most important. Google also tweaks its algorithm every week, mostly to stay ahead of spammers, resulting in a never-ending, high-stakes battle of the nerds.

Spammers aren’t the only ones who want to crack Google’s code. Almost anyone with something to sell online must pay close attention to what is called search engine optimization, or SEO: how to properly design your site, write text, and employ keywords and links in a way that will please the algorithm. Getting the search engine’s attention is as important for news outlets as it is for marketers, as more readers now find news online through Google than any other source. But if SEO becomes a key journalistic principle, algorithms, not editors, will choose how stories are told—and which aren’t told at all.

In December 2009, Michael Arrington, founder of the influential American technology blog TechCrunch, wrote a post titled “The End of Hand Crafted Content.” Arrington is a savvy Silicon Valley insider, a critic of old-school media and no friend to Luddites, so this message seemed out of character. Arrington the innovator was defending some very traditional ideas about producing content for the web, and his primary target was a group of companies that styled themselves as the cutting edge of the content revolution. The irony wasn’t lost on him. “Just as old media is complaining about us,” Arrington wrote, “look for us to start complaining about the new jerks.”

The “new jerks” included familiar names like Yahoo, which had recently purchased a site called Associated Content for $100 million, and a little-known company from southern California called Demand Media. Deemed “content farms” by their many critics, sites like Associated Content push massive amounts of search-engine-optimized, advertiser-supported content on to the web. Most of it is of dubious quality and value, and, not coincidentally, they pay the people who provide this content almost nothing.

“What really scares me,” Arrington wrote, “is the rise of cheap, disposable content on a mass scale, force fed to us by the portals and search engines.” The result would be “a race to the bottom situation, where anyone who spends time and effort on their content is pushed out of business.” He concluded, “Hand crafted content is dead. Long live fast food content, it’s here to stay.”

Arrington is not alone in his fears. Last spring, some of the leading thinkers and doers in the worlds of technology and social media gathered in Toronto for Mesh, an annual web conference. (One session, titled “The Battle Between Crafted and Machine-Driven Content,” directly addressed the issue of content farms.) A Mesh participant named Jason Fry, a New York–based blogger and former Wall Street Journal online columnist, expressed concern about the economic impact of content farms: they could deliver yet another blow to traditional media’s shaky business model and deplete the already meagre incomes of freelance writers. “This is the journalist as Chinese factory worker,” Fry proclaimed. “If you want to know how our profession ends, look at Demand Media.”

You may never have heard of Demand Media, but you have almost certainly visited one of its many websites. There are over a million Demand stories floating around on the web right now and, before today is over, the company’s army of roughly thirteen thousand freelancers will churn out another five or six thousand articles and videos. You can find them on more than sixty Demand-run sites, including eHow.com, which specializes in answering questions that begin with the words “how to”; LIVESTRONG.com, a health and fitness site run in partnership with cyclist (and Demand Media investor) Lance Armstrong; or Answerbag.com, which proclaims that “every question deserves a great answer.” Demand Media says its freelance videographers produce more videos on YouTube than any other single source.

Demand Media was founded in 2006, and its business plan revolves around several fairly simple principles. The first is that it will only deliver content that people want, which it discerns through (what else?) search. In the traditional media model, it was editors who knew (or thought they knew) what readers were looking for. The best editors won the trust of readers by offering stories that engaged them. That trust built loyalty, which attracted the attention of advertisers, which made the newspaper or magazine economically viable. But editors can be wrong. They can misjudge their readers’ appetites and interests. And if their mistakes lose readers, they’ll lose advertisers and, ultimately, money.

Demand Media seeks to eliminate the guesswork. In its business model, everything it publishes—all those millions of articles and videos—must be monetized through advertising. If a story can’t pay its own way, it won’t go online. Nothing is left to chance, and thanks to the power of search, nothing has to be. Before the web, information was largely driven by supply; readers read what they were given to read. Now, search allows readers to demand the information they want, and companies like the aptly named Demand Media have positioned themselves as suppliers. “We’re content creators making things that people want,” Demand CEO Richard Rosenblatt recently told a Wall Street Journal panel. “There’s no piece of content made that we think is good, because we only make content that people tell us they think is good.”

Demand has three sophisticated algorithms that determine what people “think is good.” The first scans hundreds of millions of search queries, looking for popular keywords, while another determines what advertisers would be willing to pay to appear alongside those topics. Writers earn what their article is worth to those advertisers, and no more. For a typical three-hundred- to five-hundred-word article, that means $15—about three cents a word. Finally, a third algorithm tallies how many articles on that topic are already online. (Demand Media also used to employ human assignment editors, until it realized it could pull in much more ad revenue using exclusively algorithms.)
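The three signals described above can be combined in a few lines. Demand Media has not published its formulas, so the scoring rule below is an invented assumption, as are the class name, the function names, the threshold and every number in the example. The sketch only shows how search volume, advertiser value and existing competition might be weighed against one another.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    title: str
    monthly_searches: int      # signal 1: how often people search for it
    ad_value_per_visit: float  # signal 2: what advertisers would pay, in dollars
    existing_articles: int     # signal 3: how many competing articles are online


def expected_value(topic: Topic) -> float:
    """Hypothetical score: monthly ad revenue if traffic is split with competitors."""
    share_of_traffic = 1.0 / (1 + topic.existing_articles)
    return topic.monthly_searches * topic.ad_value_per_visit * share_of_traffic


def pick_assignments(topics, minimum_value):
    """Keep only the topics that would pay their own way, most valuable first."""
    worthwhile = [t for t in topics if expected_value(t) >= minimum_value]
    return sorted(worthwhile, key=expected_value, reverse=True)


# Invented figures, for illustration only.
candidates = [
    Topic("How Do I Make a Gazebo Using PVC Pipes", 900, 0.05, 2),
    Topic("War in Afghanistan", 500_000, 0.0, 40_000),
    Topic("How to Lose Weight", 300_000, 0.10, 59_999),
]

assignments = pick_assignments(candidates, minimum_value=10.0)
for topic in assignments:
    print(topic.title)
```

Under these made-up numbers, the obscure gazebo question is the only topic that clears the bar: the war story attracts no advertisers, and the weight-loss story faces too much competition.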

After the algorithm issues its stamp of approval, the title is reviewed by a freelance editor, who tweaks it to ensure it will be search-engine friendly. Editors get paid four cents for every title. It is then offered up on Demand Media’s website to be claimed by a freelance writer. Judging by the testimonials on the site, many of Demand’s freelancers are unemployed writers, stay-at-home moms and people looking to pick up extra income in their spare time. The number of topics currently available exceeds three hundred thousand and is growing rapidly. The algorithm spits out so many topics, it’s almost impossible for the humans to catch up.

None of these stories, it should be noted, would be found anywhere near the front page of the newspaper. Neither the war in Afghanistan nor upheaval in the Middle East holds any interest for the Demand Media algorithm. These are topics that change quickly and that advertisers generally steer clear of. Demand Media is looking for “evergreen” content with a long shelf life, primarily service journalism: stories that explain how to get things done, what products to buy or which places to visit on vacation.

Demand is also looking for stories on the internet’s “long tail,” a phrase popularized by Wired’s Chris Anderson. These are the out-of-the-way, highly specific niches (some trivial, many not) that the algorithm recognizes as neglected. Topics like “Kentucky Requirements for Laser Hair Removal” or “How to Build a French Drain in Clay Soil in Denver Colorado” or “How Do I Make a Gazebo Using PVC Pipes” clearly sit at the end of a very long tail. And if you were in the business of selling PVC pipes, or equipment for laser hair removal or French drain building, chances are you would be happy to attach your ad to those stories. The lack of competition on the long tail means your story and its corresponding advertisements stand a good chance of getting near the top when a user types those queries into a search engine.

To become a Demand Media freelancer, you have to submit your resume and a writing sample of at least three hundred words—preferably a published article. What you don’t need to do is pitch any story ideas. The algorithm has already taken care of that. Until last spring, Demand Media only accepted writers living in the United States. But when the doors finally opened for residents of Canada and the UK, I decided to try out life on the farm.

After my application was accepted, I eagerly scrolled through pages of possible stories. I quickly realized I was probably not the most suitable candidate for this job. Many of the how-to topics involved technology, construction, and home and appliance repair—not my strong suits. I bypassed “How to Build a Portable Cattle Stanchion,” “How to Build a Motorized Bicycle Weed Eater” and “How to Fix the Pressure Valve on a Two Gallon Compression Sprayer.” All of these would require a lot more work than my $15 paycheque could justify. I thought “What Causes Pimples in your Nose?” was best left to someone with a background in dermatology. I foolishly passed on “What Are the Advantages of Horizontal Fly Men’s Underwear?” because I didn’t know what a horizontal fly was. Moments later it occurred to me that I could probably speculate on that topic as well as the next guy, but by the time I returned to the index, the topic had already been claimed by another enthusiastic freelancer.

Finally, I came across “How Do I Register a Canadian Business Name?” Bingo! I had registered a business name earlier that year. There was nothing complicated about it. You just go to your provincial government website, fill out a form online, pay some money and you’re all set. As an added bonus, I could suggest you visit the Canada Revenue Agency website to pick up a tax number while you were at it. The topic was listed under a category called “Short Answer,” which meant the editors were looking for only seventy-five to two hundred words. I thought starting small was probably a good strategy. The bad news was that my story, once approved by the Demand Media editorial team, would pay me just $5.

I consulted my Demand Media Short Answer Style Guide. Demand publishes detailed style guides for every category of story; the full document that outlines all its editorial guidelines is fourteen pages long. It emphasizes searchability, instructing writers to include “at least three unique ‘key concepts’ which concisely summarize what the article is about.” According to my Short Answer guidelines, I would need to “clearly outline the actions necessary to complete the stated objective in the title,” and each of my three sections needed to start with an actionable verb. The keywords and links to other sites I included would help grab the attention of the Google algorithm.

“Registering a business name in Canada falls under provincial jurisdiction,” I boldly declared in my opening sentence. “But getting a tax number for your business must be done through the Canada Revenue Agency.” I thought both “registering” and “getting” were nice actionable verbs for my introduction. I went on to explain that, when starting a business, you have to decide if you want it to be a sole proprietorship, a partnership or a corporation. I sent my work off to the Demand Media copy editors—who make $3.50 per article—for review.

Two days later, my editor responded. She informed me that my actionable verbs were not actionable enough. She recommended I rewrite my article using verbs like “obtain,” “visit,” “provide” and “pay.” I was also told to avoid verbs like “decide” that require “mental processes.” Instead, I should convert these to commands like “select.” Finally, I was instructed to provide references to exact pages used as information for this article so they could be fact-checked.

I began my rewrite. I did more research and found more websites that I could refer readers to. I used the recommended action verbs. I properly formatted my references. The second draft was one hundred words, forty-four words fewer than the first and, admittedly, better. On one level, I appreciated, and was frankly surprised by, the editorial rigour that Demand Media displayed. Fact-checking and line-editing are becoming increasingly rare in mainstream journalism. And if your Demand story is published (not all are accepted), the money is deposited into your bank account within a week. I recently had to wait a year and a half to get full payment from a major Canadian magazine.

Still, I had spent nearly two hours on an article that earned me $5. I hadn’t worked so long to make that kind of money since I was a teenager. Although I’ve gotten faster over time, I would still have to churn out about eight or ten of them a day, every day of the week, to approximate the weekly wages of a supermarket cashier.
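The arithmetic behind that comparison is easy to check. The short calculation below uses only the figures given above; rounding "nearly two hours" up to exactly two is an assumption, and the variable and function names are illustrative.

```python
FEE_PER_ARTICLE = 5.00          # dollars paid for one Short Answer
HOURS_ON_FIRST_ARTICLE = 2.0    # "nearly two hours", rounded up (assumption)

# Effective hourly rate on the first assignment.
hourly_rate = FEE_PER_ARTICLE / HOURS_ON_FIRST_ARTICLE


def weekly_income(articles_per_day, days_per_week=7):
    """Gross weekly earnings at a steady daily output."""
    return articles_per_day * days_per_week * FEE_PER_ARTICLE


low, high = weekly_income(8), weekly_income(10)
print(f"${hourly_rate:.2f} an hour; ${low:.0f} to ${high:.0f} a week")
```

At eight to ten articles a day, seven days a week, the total comes to between $280 and $350 a week before taxes.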

This is machine-driven content, the industrial model brought to journalism; it’s often been compared to Henry Ford’s assembly lines. And it’s working. Demand Media’s initial public offering, issued this January, valued the company at $1.5 billion, making it worth more than the New York Times Company and marking the biggest IPO for any internet company since Google. Although critics question its long-term profitability, Demand Media ranks among the twenty largest web properties in the US as measured by unique visitors.

Although content farms tout their commitment to “high-quality content,” their definition of quality actually has more to do with relevance. What matters most is whether the article answers the original search query, and whether this optimizes its search rank and attracts advertisers. Michael Arrington was probably right that the days of “hand crafted content” are numbered—not for all websites, but certainly when it comes to service journalism and the infamous long tail. I have no doubt that a sophisticated sports-writing algorithm like StatsMonkey could spit out prose that conforms to the Demand Media style guide, and it would never ask for pay.

SEO already affects how news is written online. Editors at mainstream sites are now trained to write keyword-friendly headlines that search engines will recognize. In January 2010, when former US presidential candidate John Edwards finally acknowledged that he fathered his mistress’ baby, the New York Post headline was “I’m the Pop, Says the Weasel.” It was a classic, clever tabloid headline, but it just wouldn’t work online. Search engine–friendly titles must be direct, descriptive and contain as many popular keywords as possible. When Conan O’Brien decided to quit NBC rather than accept a later timeslot, the headline in the print edition of the Washington Post was “Better Never than Late.” Online, the headline became a veritable orgy of keywords: “Conan O’Brien Won’t Give Up Tonight Show Time Slot to Make Room for Jay Leno.”

If stylish writing is less valuable in the age of machine-driven content, accuracy may be another victim. All content farms pay lip service to accuracy, but their business models belie that commitment. If it’s true that you get what you pay for, you don’t get much from Demand Media; how much research can a writer realistically be expected to do for under $20? For one of my Demand stories—“How Do I Become a Real Estate Appraiser in Ontario?”—my search yielded a few relevant websites, but I found the material rather confusing. A call to someone at the Ontario branch of the Appraisal Institute of Canada probably would have cleared up my confusion, but that might have added another half hour to a $5 job I had already invested more than an hour in. So I did the best I could, and I have no idea whether I got it right.

Not surprisingly, the most vociferous critics of content farms are people currently working in mainstream media. They mock the poor quality of what content farms produce and decry the appallingly low pay. But big news outlets could learn a lot from Demand Media. For too long, newsrooms have been run as closed shops, with companies relying on polls, surveys and focus groups to find out what audiences want. This disconnect has undoubtedly contributed to mainstream media’s declining fortunes. Search engines are more precise, and likely more reliable, than focus groups, and companies like Demand Media are unapologetic about their focus on what readers ask for.

Journalists employed by big media conglomerates no longer have a monopoly over what is newsworthy. While that is a positive, democratizing step, there is one important caveat: in the world of content farms, it is ultimately the level of advertiser interest—not reader demand—that determines if a story gets written.

These priorities are nothing new. Canadian press baron Lord Thomson of Fleet once dismissed editorial content as “the stuff you separate the ads with.” Indeed, until recently, the business model for news had changed little since the nineteenth century. Most of the money for newsgathering came from advertisers. Readers paid a nominal amount for the privilege of accessing content, but without advertising, there would be no foreign bureaus, no investigative reporting, no journalists travelling the world in pursuit of stories.

Some parts of the newspaper—travel, cars and homes—have always been more advertiser-driven than others. Like Demand Media, they provide the kind of practical information that readers are looking for, and the highly targeted audiences that advertisers treasure. Demand Media content is already beginning to find its way into traditional media; you can find stories written by Demand freelancers in the travel sections of papers like USA Today. Such outsourcing saves news outlets hundreds of dollars per story, although readers are probably unaware that the article was written by someone being paid $15, and edited by someone making $3.50.

But service journalism is not all the media does. Advertiser-supported sections subsidize essential political and international coverage. If low-cost, search-optimized online content becomes the primary source for information about travel and home repairs, how will media outlets fund reporters in Afghanistan? Demand’s Rosenblatt once declared that his company’s goal was to “publish the world’s commercial content,” meaning anything that can turn a profit. So who will publish the stories that can’t pay their own way?

Demand Media capitalizes on two inescapable truths about content creation today. First, there are a lot of people prepared to write for little or no money. According to the American Society of News Editors, more than 13,500 newspaper jobs were lost in the US between 2007 and 2010, providing a deep pool of writing talent for content farms to draw on. (Canadian outlets like CBC and Sun Media have also collectively laid off thousands of employees over the last few years.) The second truth is that there is now an open market for content creation. Before readers could get their news fix online, only a small number of giant media outlets had the infrastructure to reach mass audiences. Advertisers were prepared to pay a healthy premium to be associated with that content, and those who provided it were reasonably well paid.

But in the internet era, the marginal value of content decreases because anyone can create or distribute it. A Demand Media article is clearly worth something to the advertisers it targets—just not a lot. Content farms have found a way to make money by determining the true market value of their articles. Writers might not like to hear that the fruits of their labour are worth only three cents a word, but that may well be the ugly economic reality.

Still, the future of content farms might not be so rosy. Roughly a quarter of Demand Media’s annual revenue currently comes from Google ads, and about 60 percent of the traffic to its largest site, eHow, is delivered by Google searches. If Google revises its algorithm so that low-quality, long-tail content doesn’t get rewarded in its rankings, content farms will suffer, and the search engine has hinted it might want to do just that. A January post on the official Google blog stated that “we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content.” (Rosenblatt has claimed that this was not aimed at Demand Media “in any way.”)

Despite such hints, looking to Google for salvation may not be the answer. The material generated by content farms is supported by advertisers, and Google makes money every time someone clicks on those ads. YouTube, which is owned by Google, loses money every year because the vast majority of its content is of no interest to advertisers. According to Demand Media, its ad-supported YouTube videos are streamed about 2.5 million times a day. Would Google really want to kill that golden goose?

If content farms are really about giving readers what they want, they might be in trouble when those desires change, as they invariably will. Right now, readers will tolerate mediocre content, especially on the web’s long tail. But if they begin to demand quality, content farms will be ill equipped to provide it.

Last summer, Yahoo announced its own attempt to tackle the quality issue. The company launched the Upshot, a news blog that uses search queries to help guide stories on politics, national affairs and the media. And unlike content farms, the Upshot provides steady employment to actual journalists. “The process of journalism hasn’t changed,” insisted Yahoo News vice-president Mark Walker in an interview with the blog WebNewser. “What has changed is the nature of information, the nature of newsgathering, and the nature of inputs.” But using algorithms to determine reader demand does more than just change the “nature of inputs.” It irrevocably changes the “process of journalism” itself, and not entirely for the better.

“This is a different conversation than we’ve been having for the past ten years,” said Christopher Anderson, professor of media studies at the City University of New York, at last spring’s Mesh conference. “We’re not talking about whether bloggers are journalists, or how Twitter can help journalism. This is a fundamentally new conversation, because now we’re talking about algorithms.”

It’s not just journalism insiders who are using algorithms like StatsMonkey in novel, unsettling ways. A British company called Epagogix has developed an algorithm that treats screenplays as mathematical propositions: it looks at hundreds of factors and can predict with impressive accuracy how much money the film will gross at the box office. Other companies are developing similar programs for producing hit songs. This might be good news for entertainment moguls, but bad news for fans of artistic expression. Meanwhile, at IBM, programmers have put together mathematical models of fifty thousand of the company’s tech consultants. They crunched massive amounts of data on the employees—how many emails they sent, who got them, who read the documents they wrote—and used this information to assess their effectiveness and deploy their skills in the most cost-efficient way. The employees have no idea how the algorithm weighs the various factors it measures.

One of the defining features of the age of the algorithm is that we are being asked to trust decisions made through uncontrollable processes. The keepers of these secrets can assume enormous power for good or mischief; the inner workings of their creations are legally protected as intellectual property, far from the glare of public scrutiny. Sophisticated algorithms that can make decisions in a fraction of a second now trigger about 70 percent of all trading in American equities and 30 percent of Canadian stock trades. They make so many trades so quickly that they can exploit the tiniest inefficiencies in the markets—which also means that even a minor coding error can touch off a financial meltdown. The creators of these algorithms are known as “quants” (short for quantitative analysts), and despite the financial collapse of 2008, they remain the masters of the Wall Street universe. There is ample evidence that quants don’t fully understand what their creations are capable of. Last year, the New York Stock Exchange fined Credit Suisse for “failing to adequately supervise the development, deployment, and operation of a proprietary algorithm.”

The fact that even analysts can’t keep track of what their algorithms are doing should be seen as a problem. Instead, it’s considered a measure of success. We have put so much faith in the authority of algorithms that we assume they must be smarter than we are. The more unknowable they are, the more power they exercise over us. As the computer scientist Jaron Lanier put it, algorithm enthusiasts seem to believe that “the computer is evolving into a life-form that can understand people better than people can understand themselves.”

“Algorithms are not right or wrong in and of themselves,” Anderson told me at the end of the Mesh conference, “but they are wrong if we give them godlike status.” He believes demand is too complex a phenomenon to be defined algorithmically. “I think that the real issue is that what people want is incredibly complicated. We all want many things in our lives, in our worlds, in our families; now, five minutes from now. I think it’s dangerous when you boil down what people want to a simple mathematical formula.”

In the age of the algorithm, you can get just about anything you think you want, learn everything you think you need to know, by clicking on a link or typing a few words into a search bar. But we adapt to serve the tools we create. Algorithms don’t just answer our questions—they shape what we ask. And the most important questions are often those that algorithms can’t answer, because we don’t yet know to ask them.

See the rest of Issue 39 (Spring 2011).
