<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>data analysis &#8211; Zócalo Public Square</title>
	<atom:link href="https://legacy.zocalopublicsquare.org/tag/data-analysis/feed/" rel="self" type="application/rss+xml" />
	<link>https://legacy.zocalopublicsquare.org</link>
	<description>Ideas Journalism With a Head and a Heart</description>
	<lastBuildDate>Mon, 21 Oct 2024 07:01:54 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
		<item>
		<title>Why the Census Must Frame the Right Questions on Race and National Origin</title>
		<link>https://legacy.zocalopublicsquare.org/2017/04/25/census-must-frame-right-questions-race-national-origin/ideas/nexus/</link>
		<comments>https://legacy.zocalopublicsquare.org/2017/04/25/census-must-frame-right-questions-race-national-origin/ideas/nexus/#respond</comments>
		<pubDate>Tue, 25 Apr 2017 07:01:33 +0000</pubDate>
		<dc:creator>Jennifer Lee</dc:creator>
				<category><![CDATA[Essay]]></category>
		<category><![CDATA[Nexus]]></category>
		<category><![CDATA[census]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[national origin]]></category>
		<category><![CDATA[nexus]]></category>
		<category><![CDATA[Population]]></category>
		<category><![CDATA[race]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">https://legacy.zocalopublicsquare.org/?p=85015</guid>
		<description><![CDATA[<p>Like most Americans, I spent most of my life not appreciating the herculean effort the U.S. Census Bureau undertakes every 10 years.  </p>
<p>Since its inception in 1790, the U.S. Census has aimed to count every living person in the country, and the stakes are high. The results of the census determine the allocation of hundreds of billions of federal dollars, which affect every slice of American life.</p>
<p>In order to do so, the Census must ask Americans the right questions—and give them the right options for their answers. It seems relatively simple, but—as I learned in 2013, when I became a member of the Committee on Population Statistics of the Population Association of America—the undertaking is so enormous that the planning for the 2020 Census began even before the completion of the 2010 Census. In 2010, the Census Bureau launched the Alternative Questionnaire Experiment (AQE) to compare different Census questionnaire design strategies&#8230; </p>
<p>The post <a rel="nofollow" href="https://legacy.zocalopublicsquare.org/2017/04/25/census-must-frame-right-questions-race-national-origin/ideas/nexus/">Why the Census Must Frame the Right Questions on Race and National Origin</a> appeared first on <a rel="nofollow" href="https://legacy.zocalopublicsquare.org">Zócalo Public Square</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>Like most Americans, I spent most of my life not appreciating the herculean effort the U.S. Census Bureau undertakes every 10 years.  </p>
<p>Since its inception in 1790, the U.S. Census has aimed to count every living person in the country, and the stakes are high. The results of the census determine the allocation of hundreds of billions of federal dollars, which affect every slice of American life.</p>
<p>In order to do so, the Census must ask Americans the right questions—and give them the right options for their answers. It seems relatively simple, but—as I learned in 2013, when I became a member of the Committee on Population Statistics of the Population Association of America—the undertaking is so enormous that the planning for the 2020 Census began even before the completion of the 2010 Census. In 2010, the Census Bureau launched the <a href="https://www.census.gov/newsroom/releases/archives/2010_census/cb12-146.html">Alternative Questionnaire Experiment (AQE)</a> to compare different Census questionnaire design strategies. Five years later came the <a href="https://www.census.gov/programs-surveys/decennial-census/2020-census/research-testing/testing-activities/2015-census-tests/national-content-test.html">National Content Test (NCT)</a>, in which different questionnaires were sent to a statistically representative sample of approximately 1.2 million households in the United States and Puerto Rico.</p>
<p>I had the opportunity to review the results of both tests and assess which questionnaire design results in the most accurate count of the U.S. population. That meant taking three interrelated components into consideration. The first is increased reporting: Which questions were people most likely to answer? The second is decreased non-reporting: Which questions were more likely to get groups who are susceptible to non-reporting (including poor families who get evicted, immigrants who do not read or understand English, and undocumented migrants who may fear government officials) to respond? The third is increased, detailed reporting: Which questions yield more information about the respondents?</p>
<p>The design of a question itself affects how people answer it. Take the race and ethnicity question. People who identify as Asian or Hispanic answer it differently depending on how it is presented on the Census form.</p>
<p>In the 2010 Census, Hispanic origin and race were listed as two separate questions. In both the AQE and the NCT, the Census Bureau tested the option of combining race and Hispanic origin into one question, which it refers to as the “combined format.” In addition, it tested which combined format would elicit the most detailed reporting on origin.</p>
<p>One option was to list the racial categories only, with a space for respondents to write in their detailed origin. A second option was to list the racial categories and also provide check boxes denoting examples of detailed origin, along with the option to write in one’s origin.</p>
<p><img decoding="async" src="https://legacy.zocalopublicsquare.org/wp-content/uploads/2017/04/Census-Race-Question-Different-Formats-600x616.png" alt="census-race-question-different-formats" width="511" height="525" class="aligncenter size-large wp-image-85022" /></p>
<p>More than 70 percent of self-identified Hispanics said they were Hispanic when they were offered Hispanic as a race option (the combined option). When they were not presented with this option, as in the 2010 Census, self-identified Hispanics were more likely to check “some other race” or to mark two or more races. In short, the combined option—in which Hispanic is listed as a race category—more easily allows Hispanics to accurately report their Hispanic identity, and makes them significantly less likely to mark “some other race” or two or more races. Both results indicate more accurate reporting on the part of Hispanics.</p>
<p>Asians, similarly, were most likely to mark their race, including their detailed race, when they were provided with a check box to mark their national origin (for example, Chinese, Filipino, Asian Indian, Vietnamese, Korean, Japanese). When these check boxes were removed, however, and Asians were presented with only a space to write in their national origin, they were less likely to report it. The difference is significant: the check-box format yielded a 97.4 percent response rate among Asian Americans, which plummeted to 92.6 percent when only a write-in option was provided.</p>
<p>Detailed reporting among Asians is critical because it allows researchers to disaggregate data, which is essential to identifying health, educational, and economic disparities among Asian ethnic groups.</p>
<p>Such disaggregation may sound technical and mathematical, but it can have profound human impacts. For example, having data specific to different sub-groups on disease rates, health insurance coverage rates, and birth and death rates can allow policy makers and community organizations to make more informed decisions about how to best serve these populations. </p>
<p>Some Asian ethnic groups are more susceptible to certain health risks: Men and women of Vietnamese origin experience the highest rates of lung cancer among all Asian American subgroups, while men and women of Korean origin have some of the highest colorectal cancer rates. Such data can guide outreach on health insurance coverage; while 13 percent of Asian Americans lack health insurance, the rate is as high as 20 percent among Koreans. </p>
<p>In the state of California, there’s been broad recognition of the importance of breaking out such data. Last fall, Governor Jerry Brown signed legislation directing the Department of Public Health to disaggregate data for the Asian American, Native Hawaiian, and Pacific Islander populations on or after July 1, 2022. Following suit, the University of California and California State University have agreed to begin releasing disaggregated data on admissions, enrollment, and graduation rates—data that will help to unveil the wide disparity in educational attainment among Asian Americans. </p>
<p>Data disaggregation is a powerful weapon to dismantle the dominant narrative of Asian Americans as the model minority, which has resulted in their exclusion from policy debates on poverty, health care, and education. While Asian Americans may be touted as academic high achievers, one-third of Cambodians, Laotians, and Hmong do not graduate from high school. Data disaggregation exposes these gaping differences among Asian ethnic groups, and points to the dire need for federal resources to help boost the educational outcomes of these groups, which are essential to immigrant and second-generation integration. </p>
<p>If the 2020 Census provides only a write-in option to list one’s origin, we will lose a lot of disaggregated data, and be unable to identify the stark differences among U.S. Asians. We will also miss a great deal of information on the country’s growing and increasingly diverse Hispanic population.</p>
<p>You don’t have to be on a committee like I am to weigh in on the Census. Ahead of potential revisions for the 2020 Census, the White House Office of Management and Budget has invited public comments for the second time. While the ultimate decision about potential changes rests in the hands of Congress, your opinion counts. <a href="https://www.federalregister.gov/documents/2017/03/01/2017-03973/proposals-from-the-federal-interagency-working-group-for-revision-of-the-standards-for-maintaining#open-comment">April 30 is the last day to weigh in</a>. </p>
<p>The post <a rel="nofollow" href="https://legacy.zocalopublicsquare.org/2017/04/25/census-must-frame-right-questions-race-national-origin/ideas/nexus/">Why the Census Must Frame the Right Questions on Race and National Origin</a> appeared first on <a rel="nofollow" href="https://legacy.zocalopublicsquare.org">Zócalo Public Square</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://legacy.zocalopublicsquare.org/2017/04/25/census-must-frame-right-questions-race-national-origin/ideas/nexus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Artificial Intelligence Won’t Replace CEOs</title>
		<link>https://legacy.zocalopublicsquare.org/2016/11/02/artificial-intelligence-wont-replace-ceos/ideas/nexus/</link>
		<comments>https://legacy.zocalopublicsquare.org/2016/11/02/artificial-intelligence-wont-replace-ceos/ideas/nexus/#respond</comments>
		<pubDate>Wed, 02 Nov 2016 07:01:51 +0000</pubDate>
		<dc:creator>Judy D. Olian</dc:creator>
				<category><![CDATA[Essay]]></category>
		<category><![CDATA[Nexus]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[computers]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[digital technology]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[information science]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[UCLA]]></category>
		<category><![CDATA[UCLA Anderson]]></category>

		<guid isPermaLink="false">https://legacy.zocalopublicsquare.org/?p=80798</guid>
		<description><![CDATA[<p>Peter Drucker was prescient about most things, but the computer wasn’t one of them. &#8220;The computer &#8230; is a moron,” the management guru asserted in a McKinsey Quarterly article in 1967, calling the devices that now power our economy and our daily lives “the dumbest tool we have ever had.” </p>
<p>Drucker was hardly alone in underestimating the unfathomable pace of change in digital technologies and artificial intelligence (AI). AI builds on the computational power of vast neural networks sifting through massive digital data sets or “big data” to achieve outcomes analogous, often superior, to those produced by human learning and decision-making. Careers as varied as advertising, financial services, medicine, journalism, agriculture, national defense, environmental sciences, and the creative arts are being transformed by AI. </p>
<p>Computer algorithms gather and analyze thousands of data points, synthesize the information, identify previously undetected patterns, and create meaningful outputs—whether a disease treatment, a face match in a city of millions&#8230; </p>
<p>The post <a rel="nofollow" href="https://legacy.zocalopublicsquare.org/2016/11/02/artificial-intelligence-wont-replace-ceos/ideas/nexus/">Why Artificial Intelligence Won’t Replace CEOs</a> appeared first on <a rel="nofollow" href="https://legacy.zocalopublicsquare.org">Zócalo Public Square</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>Peter Drucker was prescient about most things, but the computer wasn’t one of them. &#8220;The computer &#8230; is a moron,” the management guru asserted in a <a href="http://www.mckinsey.com/business-functions/organization/our-insights/the-manager-and-the-moron">McKinsey Quarterly article</a> in 1967, calling the devices that now power our economy and our daily lives “the dumbest tool we have ever had.” </p>
<p>Drucker was hardly alone in underestimating the unfathomable pace of change in digital technologies and artificial intelligence (AI). AI builds on the computational power of vast neural networks sifting through massive digital data sets or “big data” to achieve outcomes analogous, often superior, to those produced by human learning and decision-making. Careers as varied as advertising, financial services, medicine, journalism, agriculture, national defense, environmental sciences, and the creative arts are being transformed by AI. </p>
<p>Computer algorithms gather and analyze thousands of data points, synthesize the information, identify previously undetected patterns, and create meaningful outputs—whether a disease treatment, a face match in a city of millions, a marketing campaign, new transportation routes, a crop harvesting program, a machine-generated news story, a poem, painting, or musical stanza—faster than a human can pour a cup of coffee.</p>
<p><a href="http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/where-machines-could-replace-humans-and-where-they-cant-yet">A recent McKinsey study</a> suggests that 45 percent of all on-the-job activities can be automated by deploying AI. That includes file clerks, whose jobs can become 80 percent automated, and CEOs, whose jobs can be 20 percent automated because AI systems radically simplify and target CEOs’ reading of reports, risk detection, or pattern recognition. </p>
<p>AI has been one of those long-hyped technologies that hasn’t transformed our whole world yet, but will. Now that AI appears ready for prime time, there is consternation, even among technologists, about the unbridled power that machines may have over human decision-making. Elon Musk has called AI &#8220;our biggest existential threat,” echoing Bill Joy’s 2000 warning in <i>Wired</i> magazine that “the future doesn’t need us.” On the other side, of course, are enthusiasts eager for smart machines to improve our lives and the health of the planet.</p>
<p>I’m on the side of <a href="http://www.slate.com/articles/technology/future_tense/2016/06/microsoft_ceo_satya_nadella_humans_and_a_i_can_work_together_to_solve_society.html">Microsoft CEO Satya Nadella, who says</a> we should be preparing for the promise of ever smarter machines as partners to human decision-making, focusing on the proper role, and limitations, of AI tools. For business school educators like me who believe the future will indeed need us, the expanding power of AI or deep learning poses a challenge and opportunity: How do we prepare students for the coming decades so that they embrace the power of AI, and understand its advantages for management and leadership in the future? </p>
<p>It would be a mistake to force every MBA graduate to become a data scientist. The challenge for business schools is to update our broadly focused curricula while giving our MBAs a greater familiarity and comfort level with data analytics. Tomorrow’s CEOs will need a better sense of what increasingly abundant and complex data sets within organizations can, and cannot, answer. </p>
<p>The sophistication and volume of data may be increasing, but history affords models of a decision maker’s proper relationship to data analytics. </p>
<p>Take D-Day. General Dwight D. Eisenhower sought as much data as possible to inform his decision on when to land hundreds of thousands of Allied forces on the beaches of Normandy in that fateful late spring of 1944. As Antony Beevor’s book on the battle and other accounts make clear, Eisenhower especially craved reliable meteorological data, back when weather forecasting was in its infancy. The general cultivated Dr. James Stagg, his chief meteorologist, and became adept not just at analyzing Stagg’s reports, but also at reading Stagg’s own level of confidence in any report.  </p>
<p>For months before the fateful decision to “embark upon the Great Crusade,” Eisenhower developed a keen appreciation for what meteorological forecasts could and could not deliver. In the end, as history knows, Stagg convinced him to postpone the invasion to June 6 from June 5, when the predicted storm raged over the English Channel and when many others questioned Stagg’s call that it would soon clear.</p>
<div class="pullquote"> How do we prepare students for the coming decades so that they embrace the power of AI, and understand its advantages for management and leadership in the future?</div>
<p>No one would argue that Eisenhower should have become an expert meteorologist himself. His job was to oversee and coordinate all aspects of the campaign by collecting pertinent information, and assessing the quality and utility of that information to increase the invasion’s probability of success. Today, big data and the advent of AI expand the information available to corporate decision-makers. However, the role of a CEO in relation to data echoes the absorptive and judgmental function exercised by General Eisenhower in reading probabilities into his meteorologist’s weather reports.</p>
<p>It’s noteworthy that today, amidst all the talk of technological complexity and specialization across so much of corporate America, a Deloitte report prepared for our school found that employers looking to hire MBA graduates value prospective employees’ “soft skills” more than any others. They want to hire people with cultural competence and stronger communication skills, who can work collaboratively in diverse teams, and be flexible in adapting continuously to new opportunities and circumstances in the workplace and market.</p>
<p>This isn’t just about intolerance for jerks in the office. It’s about a leader’s need to be able to synthesize, negotiate, and arbitrate between competing and conflicting environments, experts, and data. If there was once a time when corporate leaders were paid to make “gut check” calls even when essential information was lacking, today’s CEOs will increasingly have to make tough, interpretive judgment calls (a different type of “gut check”) in the face of excessive, often conflicting, information. </p>
<p>Those in the driver’s seat of institutions have access to an expanding universe of empirically derived insights about widely varying phenomena, such as optimal models for unloading ships in the world’s busiest ports in various weather conditions, parameters of loyalty programs that generate the ‘stickiest’ customer response, or talent selection models that yield both the most successful, and diverse, employment pools. </p>
<p>Corporate leaders will need to be discerning in their use of AI tools. They must judge the source of the data streams before them, ascertain their validity and reliability, detect less-than-obvious patterns in the data, probe the remaining “what ifs” they present, and ultimately make inferences and judgment calls that are more informed, nuanced around context, valid, and useful <i>because they are improved</i> by intelligent machines. Flawed judgments built on flawed or misinterpreted data could be even more harmful than uninformed flawed judgments because of the illusion of quasi-scientific authority resulting from the aura of data.</p>
<p>As a project management tool, AI might prescribe optimal work routines for different types of employees, but it won’t have the sensitivity to translate these needs into nuanced choices of one organizational outcome (e.g., equity in employee assignments) over another (family values). AI might pinpoint the best location for a new restaurant or power plant, but it will be limited in mapping the political and social networks that need to be engaged to bring the new venture to life.  </p>
<p>Machines also lack whimsy. Adtech programs have replaced human ad buyers, but the ability to create puns or design campaigns that pull at our heartstrings will remain innately human, at least for the foreseeable future. </p>
<p>A new level of questioning and integrative thinking is required among MBA graduates. As educators we must foster learning approaches that develop these skills—by teaching keen data management and inferential skills, developing advanced data simulations, and practicing how to probe and question the yet unknown. </p>
<p>In parallel to the ascendancy of machine power, the importance of emotional intelligence, or EQ, looms larger than ever to preserve the human connectivity of organizations and communities. While machines are expected to advance to the point of reading and interpreting emotions, they won’t have the capacity to inspire followers, the wisdom to make ethical judgments, or the savvy to make connections.</p>
<p>That’s still all on us. </p>
<p>The post <a rel="nofollow" href="https://legacy.zocalopublicsquare.org/2016/11/02/artificial-intelligence-wont-replace-ceos/ideas/nexus/">Why Artificial Intelligence Won’t Replace CEOs</a> appeared first on <a rel="nofollow" href="https://legacy.zocalopublicsquare.org">Zócalo Public Square</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://legacy.zocalopublicsquare.org/2016/11/02/artificial-intelligence-wont-replace-ceos/ideas/nexus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Big Data Isn’t So Bad</title>
		<link>https://legacy.zocalopublicsquare.org/2015/06/04/why-big-data-isnt-so-bad/ideas/nexus/</link>
		<comments>https://legacy.zocalopublicsquare.org/2015/06/04/why-big-data-isnt-so-bad/ideas/nexus/#respond</comments>
		<pubDate>Thu, 04 Jun 2015 07:01:43 +0000</pubDate>
		<dc:creator>Phillip Leslie</dc:creator>
				<category><![CDATA[Essay]]></category>
		<category><![CDATA[Nexus]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[UCLA]]></category>

		<guid isPermaLink="false">https://legacy.zocalopublicsquare.org/?p=60780</guid>
		<description><![CDATA[<p>Big data gets a bad rap. </p>
<p>While stories show up practically every day about the novel and sometimes surprising ways Internet companies can use the massive amounts of data they collect from us (recently, for instance, Belgium criticized the way Facebook shares users’ likes with its advertisers), the ability to collect and analyze large amounts of social science data has the capacity to do much social good.</p>
<p>I know this because I’m an economist who has been working with extremely large datasets from a time before we even called this “big data.” What first convinced me of how valuable big data could be came in the late 1990s—when we just called it “empirical research.”</p>
<p>I was working on a study of what happened after Los Angeles County’s Health Department began placing letter-grade signs in the windows of restaurants to rate hygienic food preparation conditions. We had data for some five years on close to 30,000 restaurants&#8230; </p>
<p>The post <a rel="nofollow" href="https://legacy.zocalopublicsquare.org/2015/06/04/why-big-data-isnt-so-bad/ideas/nexus/">Why Big Data Isn’t So Bad</a> appeared first on <a rel="nofollow" href="https://legacy.zocalopublicsquare.org">Zócalo Public Square</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>Big data gets a bad rap. </p>
<p>While stories show up practically every day about the novel and sometimes surprising ways Internet companies can use the massive amounts of data they collect from us (recently, for instance, <a href="http://www.wsj.com/articles/belgian-watchdog-slams-facebooks-privacy-controls-1431685985">Belgium criticized</a> the way Facebook shares users’ likes with its advertisers), the ability to collect and analyze large amounts of social science data has the capacity to do much social good.</p>
<p>I know this because I’m an economist who has been working with extremely large datasets from a time before we even called this “big data.” What first convinced me of how valuable big data could be came in the late 1990s—when we just called it “empirical research.”</p>
<p>I was working on a <a href="http://qje.oxfordjournals.org/content/118/2/409.short">study of what happened after Los Angeles County’s Health Department began placing letter-grade signs</a> in the windows of restaurants to rate hygienic food preparation conditions. We had data for some five years on close to 30,000 restaurants, which were all inspected about two to three times per year. That was hundreds of thousands of data points, which felt like a lot of data to be working with back then. Less than 20 years ago, it was common to see published papers where the number of observations was under 1,000, or even fewer than a couple hundred. That would seem quaint, if not scandalously incomplete, by today’s standards.</p>
<p>As economists know, one of the main challenges in working with data has less to do with the size of the data than with the kinds of variation in it—the stock market may seem to jump with news of one war, for example, and may seem to drop with the news of another, if we don&#8217;t pay attention to what else is going on in the background. Real life has so many varied factors at play that it can be hard to distinguish whether two events just happened at the same time by coincidence, or whether there’s a true cause-effect relationship between them. Economists wish we could run randomized trials like drug companies, testing only one variable at a time, but instead we try to uncover natural experiments in the available data.</p>
<p>The reason I was drawn to studying the restaurant grade cards was that there was a clear separation between the time before the cards and the time afterward. Only a few months passed between a hidden-camera newscast exposé of disgusting conditions in some restaurants’ kitchens and the county’s launch of the grade-card program in January 1998. This meant the grade cards were unanticipated, allowing us to say that changes in restaurant and customer behaviors were caused by the grade cards. </p>
<p>We knew we were trying to measure subtle, small effects. And anyone who works with data will tell you that data tends to be noisy, which means you try to follow a lot of leads that go nowhere. We first looked at how revenues changed according to whether a restaurant got an A, B, or C, and then we kept pushing the data more and more. The biggest surprise for us in this research was the ability to connect the grade cards to health outcomes. We realized we could look at the number of people hospitalized with food-borne illnesses in L.A. County hospitals before and after the grade cards; and the signal turned out to be clear and strong: a 20 percent decrease in hospitalizations for illnesses associated with poor food safety, such as staphylococcal food poisoning, after the grade cards appeared. It was astonishing that we could make the connection. </p>
<p>Of course, studies like this raise many questions. Life isn’t always easily packed into a standardized grid. What if a restaurant got a particularly strict inspector rather than a more lenient one? What if the restaurant was having a particularly bad day when the inspector showed up? Should it have to display that grade card for the next few months even if it’s not representative? </p>
<p>These questions went beyond the scope of our study, but they’re worth asking. We have to recognize that studies like this show the trade-offs we are making. And to me, they’re OK because we can clearly see that there was a reduction in illness. In the years after this study, many other jurisdictions have adopted similar grade cards.</p>
<p>Big data can also help with important health problems whose influences are hard to track. Obesity is one of those. In 2008 and 2009, I worked with colleagues on <a href="http://www.ssc.wisc.edu/~sorensen/papers/calories_aej.pdf">a project studying the effect of a new New York City law</a> that mandated the display of calorie counts on menus.</p>
<p>Starbucks, home of decadent caramel lattes and “skinny” lattes, agreed to share transaction-level data on purchase behavior with us. We didn’t have any names attached to the data—and I’m not sure if Starbucks itself knew. We compared the data we got from New York City to Philadelphia and Boston, where there was no calorie posting.</p>
<p>We found that consumers at Starbucks reduced calories by 6 percent when the calories were posted. While this doesn’t sound like a huge amount—typically choosing a drink that was 232 calories rather than one that was 247 calories—we could tell the reduction was real and not some fluke variation. </p>
<p>But the study also revealed to us that New Yorkers weren’t all identical automatons. When we indexed our results by zip code, we found that the zip codes with wealthier, more educated, and less obese people tended to cut their calories to a larger degree than those from the poorer, less educated, more overweight zip codes. We were disappointed, of course, that those who could benefit most from losing weight weren’t very responsive to the campaign. </p>
<p>Of course, just because there are millions of data points doesn’t mean the interpretation of these data points isn’t slippery. Take the different analyses of the 10-year-long “<a href="http://portal.hud.gov/hudportal/HUD?src=/programdescription/mto">Moving to Opportunity</a>” study, which observed what happened when 4,600 families from low-income parts of Baltimore, Boston, Chicago, Los Angeles, and New York City were given the chance to move into more well-to-do neighborhoods. </p>
<p>Early analysis of the data from the late 1990s suggested that moving to a new location didn’t matter all that much in terms of how likely a kid was to graduate from college or how much he or she earned as an adult. But recently, economist Raj Chetty at Harvard <a href="http://www.nytimes.com/2015/05/04/upshot/an-atlas-of-upward-mobility-shows-paths-out-of-poverty.html?_r=0&#038;abt=0002&#038;abg=1">came up with a new way to slice the data</a>: if you looked at how young the children were when they moved to better neighborhoods, you could see that every extra year of childhood spent in a better neighborhood mattered. </p>
<p>Still, no matter a study’s merits, there’s always the privacy issue to consider. I recognize that people are very sensitive to companies using data they consider personal. It was a little disconcerting when Target started sending ads to young women that their data-crunching algorithms found were <a href="http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?_r=0">likely to be pregnant</a>. </p>
<p>Do companies and governments screw that up? Of course. But it’s a mistake to presume that the intent inside these organizations is to misuse, or abuse, that information. In my experience working with businesses, they are very sensitive about how they can keep their customers’ trust. And there are valuable uses of the data companies collect—especially when tracking customer behavioral shifts in response to outside factors like new laws or regulations, such as in the case of New Yorkers suddenly confronting posted calorie counts. </p>
<p>I, for one, am quite happy that an online retailer may know I’m a cycling enthusiast and lets me know when helmets or other gear are on sale. I am also willingly sharing data through a smartphone app with a website for cyclists called Strava. The website tells you what rides you have been on and compares your rides to others’. And the benefits to me are clear: I’m connecting with friends through the site, discovering new places to ride, and comparing my performance with others’. And it helps me stay in shape: I get on the bike a lot more than I would have otherwise.</p>
<p>The post <a rel="nofollow" href="https://legacy.zocalopublicsquare.org/2015/06/04/why-big-data-isnt-so-bad/ideas/nexus/">Why Big Data Isn’t So Bad</a> appeared first on <a rel="nofollow" href="https://legacy.zocalopublicsquare.org">Zócalo Public Square</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://legacy.zocalopublicsquare.org/2015/06/04/why-big-data-isnt-so-bad/ideas/nexus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
