Chapter 2: AI Reflects Human Choices and Perspectives
AI Reflects Human Choices and Perspectives 🪞
Discovering who “teaches” AI and how human choices shape technology.
Imagine you are creating a brand-new custom character for your favorite video game. You get to choose their outfit, their skills, their strengths, and their weaknesses. You decide if they are going to be a stealthy ninja or a heavy-armor knight. If you make them super fast but terrible at jumping, that is exactly how they will behave in the game. The character does not make those choices; you do. They simply act out the rules you programmed for them.
Artificial Intelligence is remarkably similar. Because AI systems run on computers, it is incredibly easy to assume they are perfectly logical, neutral, and objective. We are used to computers being literal. If you type 2 + 2 into a calculator, it will always say 4. Because of this, we tend to think of all machines as being free from human flaws, emotions, and prejudices.
But AI is not a simple calculator, and it is not truly independent. Behind every single AI system is a massive team of humans making thousands of invisible choices. Humans decide what the AI will be used for, what rules it will follow, what problems it is trying to solve, and most importantly, what information it will use to learn.
If AI is a mirror reflecting the world, humans are the ones holding the mirror and choosing which exact part of the world it points at. If you point a mirror at a messy room, the mirror shows a mess. To truly understand AI, we have to look behind the screen and realize that AI does not invent its own knowledge out of thin air. It reflects our human choices, our unique perspectives, and sometimes, our deeply rooted human flaws.
2.1 — The Data Diet: What Does AI “Eat”? 🍽️
In Chapter 1, we learned that AI systems are powered by algorithms that look for patterns in massive amounts of data. But where exactly does all that data come from?
Think of data as an AI’s diet. In computer science, there is a famous saying: “Garbage in, garbage out.” This means that if you feed a computer bad information, it will give you bad answers: the quality of an AI’s results depends heavily on the quality of the data it learns from. If an AI is going to learn how to write essays, translate languages, or generate cool images of astronauts riding skateboards, it has to consume a massive “diet” of information first. This is called training data: the large collection of information (text, images, audio, and more) used to teach an AI system how to recognize patterns and generate outputs. The quality and diversity of this data directly shapes how the AI behaves.
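To see “garbage in, garbage out” in action, here is a minimal sketch in Python. It is a toy, not how real AI systems are built: the `train` and `predict` functions and the tiny food datasets are made up for illustration. The “model” simply remembers which label it saw most often for each word.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learn the most common label for each word (a toy 'model')."""
    counts = defaultdict(Counter)
    for word, label in examples:
        counts[word][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def predict(model, word):
    """Return the learned label, or 'unknown' for words never seen in training."""
    return model.get(word, "unknown")

# Made-up training data. The second set contains mostly wrong labels.
good_data = [("banana", "fruit"), ("banana", "fruit"), ("carrot", "vegetable")]
bad_data = [("banana", "vegetable"), ("banana", "vegetable"), ("banana", "fruit")]

good_model = train(good_data)
bad_model = train(bad_data)

print(predict(good_model, "banana"))  # fruit
print(predict(bad_model, "banana"))   # vegetable
```

The code is identical in both cases. Only the diet changed, and the answer changed with it.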
For most modern AI tools, the primary source of training data is the open internet. Developers use automated programs to scrape the web, which means using software to collect large amounts of data from websites and other online sources. They feed the AI billions of web pages, digital books, Wikipedia articles, news stories, online forums, and public social media posts. To put that in perspective, it would take a human being thousands of lifetimes to read the amount of text an AI consumes in just a few weeks. If you have ever posted a public comment on a YouTube video, uploaded a picture to a public profile, or written a review for a video game, there is a real chance your words and images have been scooped up to become part of an AI’s training data.
But it is not just old internet posts. AI systems also learn from real-world, physical data collected by digital sensors. For example, GPS apps track how fast thousands of cell phones are moving on a highway to learn where traffic jams are. Smartwatches track heart rates and sleep patterns.
Furthermore, AI systems are constantly gathering new data from you in real time to adjust their behavior. Every single time you click “like” on a funny TikTok, replay a specific part of a song on Spotify, or skip past an advertisement before it finishes, the AI algorithm is feeding on that exact data point. It uses your everyday digital choices to adjust its mathematical predictions and serve you more of what it thinks you want to keep your attention on the screen.
2.2 — The Invisible Human Workforce 👷
Because computers process data so quickly, we often talk about AI “learning” to recognize a dog or “learning” to block a spam email as if it happens by magic. But AI does not learn these things entirely on its own. It requires a massive, often invisible, amount of human labor.
Before an AI algorithm can even begin to recognize patterns, humans have to organize, categorize, and label the training data. Labeling means a human manually tagging or identifying elements in data, such as drawing boxes around objects in photos or marking text as positive or negative, so the AI can learn to recognize those patterns on its own. Imagine showing a toddler a picture book and pointing out, “This is a fire truck, this is a banana, this is a dog.” AI developers have to do this exact same thing, but on a massive, global scale. To teach a self-driving car algorithm to recognize a stop sign, humans must look at thousands and thousands of photos of streets and manually draw digital boxes around the stop signs. They have to label edge cases, too, like what a stop sign looks like when it is covered in snow or blocked by a tree branch.
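What does one of those human-drawn boxes look like to a computer? Here is a rough sketch in Python. The `BoxLabel` class, its field names, and the pixel numbers are all invented for this example; real labeling tools use their own formats.

```python
from dataclasses import dataclass

@dataclass
class BoxLabel:
    name: str       # what the human labeler says is inside the box
    x: int          # left edge of the box, in pixels
    y: int          # top edge of the box, in pixels
    width: int      # box width, in pixels
    height: int     # box height, in pixels
    note: str = ""  # optional detail, such as an edge case

# Made-up labels that a person might draw on a single street photo.
labels_for_photo = [
    BoxLabel("stop sign", x=412, y=90, width=60, height=60),
    BoxLabel("stop sign", x=35, y=120, width=48, height=48, note="covered in snow"),
    BoxLabel("tree", x=0, y=0, width=200, height=480),
]

def count_labels(labels, name):
    """Count how many boxes carry a given label."""
    return sum(1 for label in labels if label.name == name)

# Edge cases are the boxes where the labeler left a note.
edge_cases = [label for label in labels_for_photo if label.note]

print(count_labels(labels_for_photo, "stop sign"))  # 2
print(len(edge_cases))                              # 1
```

Every one of those rows had to be typed or drawn by a person. Multiply that by thousands of photos and you get a sense of how much human work hides behind one “smart” feature.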
You’ve Helped Train AI! 🤖
You have probably even helped label data for AI yourself! Have you ever tried to log into a website and had to prove you were not a robot by clicking all the pictures containing a crosswalk, a bicycle, or a traffic light? That is called a CAPTCHA: a test websites use to tell humans and computers apart. While it does check that you are a human, it can also be a clever way for tech companies to use your brainpower to label images for free. When you click the crosswalks, your answers can be used to help teach AI systems, such as the ones in self-driving cars, how to identify them in the real world.
Ghost Workers 👻
Beyond labeling images, humans also work as content moderators, reviewing AI outputs and rating them as helpful, harmful, accurate, or biased. Sometimes called “ghost workers,” these are thousands of people around the world whose full-time job is to chat with AI systems and grade their answers. They review thousands of AI outputs to tell the system, “This is a helpful, safe answer,” or “No, this answer is toxic, rude, and harmful.” Their feedback directly shapes how the AI learns to respond in the future, so the AI depends heavily on the personal judgment, beliefs, and cultural backgrounds of the specific humans doing the grading. If the human moderators make a mistake, or if they only view the world from one specific cultural perspective, the AI will learn those mistakes and limits as if they were absolute facts.
2.3 — The Problem with the Mirror: Algorithmic Bias ⚠️
Because AI learns from human-created data and human-labeled categories, it tends to pick up human prejudices as well. This is known as algorithmic bias. Bias occurs when an AI system produces unfair, inaccurate, or discriminatory outcomes for certain groups of people because of flaws in its training data, the way it was designed, or the perspectives of the humans who built it.
Facial Recognition Failures
Let’s look at a real-world example that middle and high schoolers experience all the time: facial recognition and camera filters. Have you ever used a filter on Snapchat or Instagram that puts a funny hat on your head, gives you dog ears, or changes the shape of your face? To do this, the AI has to accurately track exactly where your eyes, nose, and mouth are in the video.
However, a few years ago, computer scientists discovered a massive problem: many popular facial recognition AIs were terrible at recognizing the faces of people with darker skin tones. Some systems wouldn’t even register that a person was in the photo at all. Why did this happen? It wasn’t because the computer was intentionally trying to be racist or mean. It happened because of the AI’s data diet. The human developers who built the system trained it using millions of photos, but the vast majority of those photos were of people with lighter skin. The AI became an expert at recognizing lighter faces and failed miserably at recognizing darker ones, simply because it was not given enough examples to learn from.
This might seem like a small issue for a funny photo filter, but it becomes a massive problem when police departments use flawed facial recognition to identify suspects, or when a student with dark skin gets locked out of their own school tablet because the camera doesn’t recognize them.
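One simple check developers can run is to count who is actually in the data diet. The sketch below is only an illustration: the photo tags, the 80/20 split, and the 40% cutoff are made-up numbers, not figures from any real system.

```python
from collections import Counter

def representation(groups):
    """Return each group's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def underrepresented(shares, minimum_share):
    """List the groups whose share falls below the chosen minimum."""
    return sorted(group for group, share in shares.items() if share < minimum_share)

# Made-up dataset: one tag per training photo.
training_photos = ["lighter skin"] * 80 + ["darker skin"] * 20

shares = representation(training_photos)
print(shares)                         # {'lighter skin': 0.8, 'darker skin': 0.2}
print(underrepresented(shares, 0.4))  # ['darker skin']
```

A count like this does not prove an AI is unfair, but it is an early warning sign that the system has far fewer examples to learn from for some people than for others.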
Bias Is Everywhere
Bias can show up anywhere, not just in pictures:
| Where Bias Shows Up | What Goes Wrong |
|---|---|
| Voice assistants (Siri, Alexa) | Struggle to understand certain accents, fast speakers, or kids with speech impediments because training data heavily features standard adult American or British accents |
| Language translation | If trained on 1950s data where most doctors were men and nurses were women, it may automatically translate “doctor” as “he” and “nurse” as “she” |
| Essay grading AI | If trained only on essays from one region or school type, it might unfairly penalize students who use different cultural slang or regional phrases |
2.4 — Fixing the Mirror 🔧
Bias inherently exists in AI because bias inherently exists in human society. If the internet is full of stereotypes, rumors, and mean comments, an AI trained on the internet will act out those stereotypes. The mirror is just reflecting us back at ourselves.
However, because humans create these systems, humans also have the power to fix them. AI developers have a serious ethical responsibility to act as “fairness detectives.” Before releasing an AI tool into the real world, they must test it extensively to see exactly who it benefits and who it might accidentally harm. They do this through a process called red teaming, where developers intentionally try to break the AI or trick it into producing biased, harmful, or incorrect outputs, so they can find its weak spots and patch them before the tool is released to the public. They can also fix algorithmic bias by intentionally feeding the AI more diverse training data, making sure photos, voices, languages, and perspectives from all different types of people, ages, and cultures around the world are included in the data diet.
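Part of being a “fairness detective” is measuring how well the AI works for each group separately, instead of looking at one overall score. Here is a small sketch of that idea. The `accuracy_by_group` function and the test results are invented for illustration.

```python
from collections import defaultdict

def accuracy_by_group(results):
    """results: list of (group, was_correct) pairs from testing the AI."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, was_correct in results:
        total[group] += 1
        if was_correct:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

# Made-up test results: 100 test photos per group.
test_results = (
    [("lighter skin", True)] * 95 + [("lighter skin", False)] * 5
    + [("darker skin", True)] * 70 + [("darker skin", False)] * 30
)

scores = accuracy_by_group(test_results)
gap = max(scores.values()) - min(scores.values())

print(scores)         # {'lighter skin': 0.95, 'darker skin': 0.7}
print(round(gap, 2))  # 0.25
```

In this pretend test, the overall accuracy is 82.5%, which sounds fine. The per-group view shows the real story: one group gets a much worse experience than the other.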
As a user of AI, you also have an incredibly important role to play. Whenever you use a chatbot to help brainstorm for a project, or when an algorithm recommends a news story or a video to you, you must remember that the AI is not a perfect, objective genius. It is simply a reflection of the data humans chose to give it.
2.5 — You Are What You Click: The Engagement Loop 📲
Every time you open TikTok, Instagram, or YouTube, an AI is watching you very carefully. Not in a creepy-spy way — but in a mathematical, data-collection way. Every single thing you do sends a signal:
- ❤️ You liked a video → strong positive signal
- ⏩ You scrolled past a video in less than 2 seconds → strong negative signal
- 🔁 You rewatched a clip three times → extremely strong positive signal
- 💬 You left an angry comment → positive signal (engagement is engagement!)
- 📤 You shared a video with your group chat → the strongest signal of all
These signals feed into what is called the engagement loop: a cycle in which a user’s digital actions (likes, shares, pauses, rewatches) train an algorithm to show them more targeted content, which triggers stronger reactions, which generates more data, which makes targeting even more precise. The loop works like this:
You take an action → The algorithm collects data → The pattern model updates → You see more targeted content → You react more strongly → More data is collected → Repeat
This is broadly how TikTok’s “For You” page works. When you first create an account, TikTok shows you a mix of videos. As soon as you start interacting, the algorithm begins building a model of you: your interests, your humor, your emotions, the time of day you watch certain types of content. Within about 30 minutes of use, the algorithm can seem to know your preferences better than people who have known you for years. Within a few days, it can often predict with unsettling accuracy which video will keep you watching.
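The engagement loop can be sketched in a few lines of Python. This is a simplified guess at the general idea, not TikTok’s actual system: the signal weights, topic names, and function names are all assumptions made up for this example, since real apps do not publish theirs.

```python
# Hypothetical weights, chosen only to match the signal strengths described above.
SIGNAL_WEIGHTS = {
    "like": 2.0,           # strong positive signal
    "quick_scroll": -2.0,  # strong negative signal
    "rewatch": 3.0,        # extremely strong positive signal
    "comment": 1.5,        # positive, even if the comment is angry
    "share": 4.0,          # the strongest signal of all
}

def update_interests(interests, topic, action):
    """Return a new interest table after one user action."""
    updated = dict(interests)
    updated[topic] = updated.get(topic, 0.0) + SIGNAL_WEIGHTS[action]
    return updated

def next_topic(interests):
    """Pick the topic the model currently scores highest."""
    return max(interests, key=interests.get)

interests = {"cooking": 0.0, "gaming": 0.0, "news": 0.0}
actions = [
    ("gaming", "like"),
    ("news", "quick_scroll"),
    ("gaming", "rewatch"),
    ("cooking", "like"),
    ("gaming", "share"),
]

# Each action updates the model, which changes what gets shown next.
for topic, action in actions:
    interests = update_interests(interests, topic, action)

print(interests)              # {'cooking': 2.0, 'gaming': 9.0, 'news': -2.0}
print(next_topic(interests))  # gaming
```

After only five actions, this toy model already “knows” to serve more gaming videos. Notice that nothing in the code asks whether gaming videos are good for the viewer. It only tracks what gets a reaction.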
The Dark Side: Engagement Bait 🪤
Here is the uncomfortable truth: the engagement loop does not optimize for what is good for you. It optimizes for what keeps you on the app the longest. This is why you sometimes fall into rabbit holes of content that makes you feel anxious, angry, or upset — but you can’t stop watching.
Content creators and marketers have figured this out and deliberately create engagement bait — content specifically designed to trigger strong emotional reactions like outrage, shock, fear, or morbid curiosity. A video titled “You won’t BELIEVE what this school did to this student 😡” is not designed to inform you. It is designed to get your heart rate up so you click, share, and comment — all of which train the algorithm to push it to millions more people.
2.6 — The Cost of Convenience: The Privacy Trade-Off 🔓
Here is a question nobody asks when they download a free app: “If this app is free, how does the company make money?”
The answer, almost always, is your data. The implicit bargain of the internet age is this: you get a free service (maps, email, social media, music), and in exchange, the company gets to collect detailed information about your behavior, preferences, location, and habits. They then sell access to this data (or directly sell ads targeted at you) to generate billions of dollars in revenue. You are not the customer — you are the product.
The Cambridge Analytica Scandal
In 2018, the world discovered just how far this data economy could go. A political consulting firm called Cambridge Analytica had obtained detailed personal data on up to 87 million Facebook users without those users’ knowledge or meaningful consent. The data came through a seemingly innocent third-party quiz app that collected not just the profiles of the people who took the quiz, but also data from the profiles of their Facebook friends.
Cambridge Analytica then used this data to build detailed psychological profiles of voters and send them highly personalized political advertisements: different ads targeted at different people based on their fears, values, and political leanings. The goal was to influence the outcome of elections. The firm worked on the 2016 US presidential election, and it was also accused of involvement in the Brexit campaign in the UK.
Why does this matter? In a democracy, we believe that citizens should form their own political opinions based on honest debate and shared facts. When an AI can secretly profile 87 million people and push targeted propaganda to each one individually, it becomes nearly impossible to have that honest debate. Different voters were seeing completely different — and sometimes completely false — versions of political reality, engineered specifically to manipulate their vote.
What You Share vs. What You Think You Share
| You Think You’re Sharing | What the App Actually Collects |
|---|---|
| Just your username and profile photo | Your precise GPS location, down to the building you’re in |
| Your posts and photos | Every other app on your phone and how often you use each one |
| Your messages to friends | Metadata: who you message, when, and for how long |
| Nothing (the app is just in the background) | Microphone access patterns, contact lists, browsing history |
| Your age and birthday | Inferred income, health status, political views, and relationship status based on behavior |
2.7 — Around the World: Different Rules for Data 🌍
Not everyone agrees on how much power companies should have over your data. Different countries have taken very different approaches to protecting (or not protecting) their citizens.
Europe: The GDPR Approach 🇪🇺
The European Union’s GDPR (General Data Protection Regulation), which took effect in 2018, is one of the strongest data privacy laws in history. Data privacy is the right of individuals to control how their personal information is collected, used, shared, and stored by companies and governments, and the GDPR gives people in the EU strong rights over their personal data. Key features include:
- Consent first: In many situations, companies cannot collect your data without clearly asking for your permission first. No more buried checkboxes.
- Right to explanation: If an AI makes a major decision about you (like rejecting your job application), you have the legal right to ask how and why.
- Right to be forgotten: You can ask a company that holds your data to permanently delete it, as if you never existed in their system. For example, you can ask an app to erase your account completely, not just deactivate it.
- Massive fines: Companies that violate GDPR can be fined up to 4% of their global annual revenue (or €20 million, if that is higher), which for a company like Google could be billions of dollars.
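To get a feel for how big those fines can be, here is a quick calculation. For the most serious violations, the GDPR caps fines at whichever is higher: €20 million or 4% of a company’s worldwide annual revenue. The function name and the two company revenues below are made up for illustration.

```python
FIXED_CAP_EUR = 20_000_000  # flat cap for the most serious violations
PERCENT_CAP = 4             # percent of worldwide annual revenue

def max_gdpr_fine(global_annual_revenue_eur):
    """Return the largest possible fine, in euros, for the top tier of violations."""
    percent_based = global_annual_revenue_eur * PERCENT_CAP // 100
    return max(FIXED_CAP_EUR, percent_based)

# Hypothetical tech giant with 100 billion euros in yearly revenue.
print(max_gdpr_fine(100_000_000_000))  # 4000000000

# Hypothetical small company with 10 million euros in yearly revenue.
print(max_gdpr_fine(10_000_000))       # 20000000
```

For the giant, the 4% rule produces a maximum of €4 billion. For the small company, 4% would only be €400,000, so the €20 million cap applies instead. These are maximums, and actual fines are usually lower.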
The United States: A Different Approach 🇺🇸
The US takes a very different approach. Instead of one comprehensive federal law, the US has a patchwork of industry-specific rules. The general philosophy is opt-out rather than opt-in: companies can collect your data by default, and it is your responsibility to find the settings buried deep in menus and turn off data collection if you don’t want it. Most people never do.
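The difference between opt-in and opt-out comes down to what happens when a user does nothing. The sketch below shows that idea with a made-up group of ten users. The `collects_data` function and the numbers are assumptions for illustration, not real statistics.

```python
def collects_data(rule, user_choice):
    """Decide whether an app collects data from one user.

    rule: "opt-in" or "opt-out" (the default behavior)
    user_choice: True (said yes), False (said no), or None (never touched the setting)
    """
    if user_choice is not None:
        return user_choice    # an explicit choice always wins
    return rule == "opt-out"  # otherwise the default decides

# Made-up group of 10 users: 8 never open the settings menu,
# 1 says yes to data collection, and 1 says no.
users = [None] * 8 + [True] + [False]

opt_in_total = sum(collects_data("opt-in", choice) for choice in users)
opt_out_total = sum(collects_data("opt-out", choice) for choice in users)

print(opt_in_total)   # 1
print(opt_out_total)  # 9
```

The users and their choices are exactly the same in both cases. Changing only the default moves the result from 1 person tracked to 9.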
Some states have stronger protections, like California with its CCPA (California Consumer Privacy Act). But at the national level, US tech companies largely operate under a mix of narrow rules, voluntary guidelines, and self-regulation.
Why It Matters to You
The rules your country has about data directly affect your daily life. In Europe, when you visit a website, you often see a cookie banner with clear “Accept All” and “Reject All” options because the law requires genuine choice. In the US, you might just see a banner that says “By using this site, you agree to our terms,” which is no real choice at all.
As an AI-literate citizen, understanding these differences helps you advocate for the protections you deserve — no matter where you live.
Chapter Activity: Fairness Detectives 🔍
Let’s put your detective skills to the test! Imagine your school district is building a brand-new AI system to automatically select which student gets the “Student of the Month” award. The developers plan to train the AI using data from the past 10 years of school records to figure out what a “good student” looks like.
Part 1: Investigate the Data Diet
What specific kind of data do you think the AI will look at? Think about:
- 📊 Grades and test scores
- 📋 Attendance records
- 🏈 Sports statistics
- 🎭 Club memberships
- ⚠️ Detention records
- 📝 Teacher notes and recommendations
Step 2: Spot the Bias
How could using 10 years of historical data be unfair to current students?
- What if a new club (like an Esports team or a Robotics club) was only created this year? Would the AI know to value it as much as the 50-year-old football team?
- What if certain teachers in the past were stricter graders than others?
- What if the definition of a “good student” has changed over the last decade?
Step 3: Fix the System
If you were the developer in charge of this project, what new rules or data would you add to make sure the AI is fair to every type of student, not just the ones who get straight A’s or play traditional sports?
Part 2: My Data Diet Audit 📱
Time to investigate your own digital life! For this activity, list 5 apps you use regularly and think carefully about what data each one might be collecting from you. Then honestly rate how comfortable you are with that collection.
For each app, fill in the table below:
| App Name | Likely Data Collected | Comfort Rating (1-5) |
|---|---|---|
| Example: TikTok | Location, watch time, device info, contacts, clipboard | 2 — I didn’t realize it tracked my location |
| Your App 1 | | |
| Your App 2 | | |
| Your App 3 | | |
| Your App 4 | | |
| Your App 5 | | |
Rating Scale: 1 = Very uncomfortable, 3 = Neutral, 5 = Completely fine with it
Discussion Questions:
- Were there any apps where you were surprised by what they collect?
- Did any app collect data you weren’t aware of before this audit?
- Are there any apps you might delete or limit based on what you found?
- What is the difference between data collection that helps you (like a maps app knowing your location) vs. data collection that profits from you?
Key Concepts Checklist
- I understand what “training data” is and where it comes from
- I can explain the phrase “garbage in, garbage out”
- I know what web scraping is and how my online activity becomes AI data
- I understand the role of human labelers and content moderators in AI training
- I can define algorithmic bias and give real-world examples
- I know what red teaming is and how developers can fix bias
- I understand my own role in questioning AI outputs and protecting my data
- I can explain what the engagement loop is and how it shapes what I see online
- I understand what “engagement bait” is and how to recognize it
- I can describe the data privacy trade-off behind free apps
- I know what the Cambridge Analytica scandal was and why it matters for democracy
- I can explain what GDPR is and how it differs from the US approach to data privacy
- I understand what the “right to be forgotten” means
- I can explain why the COMPAS algorithm case is an example of high-stakes algorithmic bias