Privacy and Ethics in Technology — Data Collection, AI Bias, and Regulation

Privacy and Ethics in Tech: Who Owns Your Data?

The Technology in This Course Is Not Neutral

Facebook knows your political leaning, relationship status, financial situation, and health conditions -- even if you have never posted about any of them. It infers them from your behavior: what you click, how long you pause on a post, who you message, what you search, what you buy through tracked links. Clearview AI scraped 30 billion photos from the public internet -- Instagram, Facebook, LinkedIn, news sites -- to build a facial recognition database that law enforcement agencies across the world use without most people's knowledge or consent. China's social credit system monitors citizens' purchases, social media activity, traffic violations, and social connections to generate a score that determines access to travel, loans, schools, and jobs. Your phone tracks your location 24 hours a day and shares it with an average of 40 apps.

None of this is accidental. Every piece of technology in this course was built by people with specific incentives, and understanding those incentives is as important as understanding the code. The algorithms that recommend your next video were optimized for engagement, not your wellbeing. The data collection that powers "free" services was designed to maximize advertising revenue, not to protect your privacy. The AI systems making decisions about loan approvals, hiring, and criminal sentencing were trained on historical data that reflects decades of human bias. This article is about the gap between what technology can do and what it should do -- and why that gap is your problem whether you build technology or simply live in a world shaped by it.

$250B+ -- Annual revenue of the data broker industry, which sells your personal information in bulk
30B -- Photos scraped by Clearview AI to build a facial recognition database without anyone's consent
$1.3B -- Fine imposed on Meta for illegal data transfers under GDPR, the largest GDPR fine to date
40 -- Average number of apps on your phone with access to your location data

The Attention Economy: You Are the Product

If you are not paying for the product, you are the product. That line has become a cliche, but its mechanics are worth examining precisely. Social media platforms are advertising businesses. Facebook (Meta), Instagram, TikTok, YouTube, and Twitter/X do not sell social networking. They sell your attention to advertisers. Their revenue is directly proportional to how much time you spend on the platform. Every design decision flows from that single incentive.

Infinite scroll removes natural stopping points. Without it, you would hit the bottom of a page and decide whether to continue. With infinite scroll, the content never ends, and the decision to stop becomes an active effort rather than a passive default. Notification triggers are engineered to pull you back: "Someone liked your post!" is not informing you of something urgent. It is exploiting a dopamine response to re-engage you. Algorithmic amplification prioritizes content that generates reactions -- and research consistently shows that outrage, fear, and controversy generate more engagement than calm, nuanced content. The algorithm does not have a political agenda. It has an engagement agenda, and emotional extremes serve that agenda.
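The incentive described above can be sketched in a few lines. The posts and scores below are invented for illustration, and no real platform's ranking model is this simple -- but the core move is the same: sort everything by a single predicted-engagement number, and emotionally extreme content rises on its own.

```python
# Toy feed ranker (hypothetical data, not any platform's real algorithm).
# Ranking by one engagement metric surfaces outrage without any editorial
# intent -- the bias is a property of single-metric optimization.

posts = [
    {"id": 1, "tone": "calm analysis",    "predicted_engagement": 0.12},
    {"id": 2, "tone": "outrage bait",     "predicted_engagement": 0.71},
    {"id": 3, "tone": "nuanced debate",   "predicted_engagement": 0.18},
    {"id": 4, "tone": "fear-driven news", "predicted_engagement": 0.55},
]

def rank_feed(posts):
    """Sort by the one metric the business model rewards."""
    return sorted(posts, key=lambda p: p["predicted_engagement"], reverse=True)

for post in rank_feed(posts):
    print(post["tone"], post["predicted_engagement"])
```

The "algorithm with an engagement agenda" is just the `reverse=True` sort: the calm, nuanced posts end up at the bottom of the feed every time.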

In 2021, former Facebook employee Frances Haugen leaked internal documents (the "Facebook Files") showing that the company's own research found Instagram made body image issues worse for one in three teen girls. The research also showed that the platform's algorithm steered users toward increasingly extreme content because engagement metrics rewarded it. The company knew. The company continued. The incentive structure made it rational to continue -- user wellbeing and advertising revenue pointed in different directions, and revenue won.

Key Insight

The attention economy is not a conspiracy. It is an incentive structure. Nobody at Facebook decided to harm teenagers. But the system was designed to maximize engagement, engagement correlates with emotional intensity, and emotional intensity correlates with anxiety and comparison. The harm is a side effect of optimization. Understanding this matters because it means the solution is not "better people at Facebook" -- it is changing what the system is optimized for. As long as the business model is attention-based advertising, the incentive to maximize screen time will override every internal research report about user harm.

Data Collection: The Infrastructure of Surveillance

The scale of data collection in modern technology is difficult to comprehend until you see it mapped out. Every interaction with a digital device generates data, and that data is collected, stored, analyzed, and often sold.

[Diagram: Your Data Footprint -- What's Collected and By Whom. Your phone reports location data (GPS coordinates 24/7, nearby Wi-Fi networks, cell tower triangulation, accelerometer/gyroscope). Your browser reveals every URL visited, search queries, time on each page, and a browser fingerprint. Social media records likes, shares, comments, who you follow and message, dwell time per post, and inferred political views. Financial data covers every transaction, spending patterns, income estimates, and credit behavior. IoT and smart home devices capture voice commands (Alexa, Siri) and thermostat/energy use. All of it flows to data brokers, a $250B+ industry.]
Every digital interaction generates data that flows to companies and data brokers. Your phone reports your physical location. Your browser reveals your interests and behaviors. Social media maps your relationships and beliefs. Purchase data reveals your financial profile. IoT devices record your home behavior. Data brokers aggregate all of it into profiles containing thousands of data points per person -- sold to advertisers, insurers, employers, and political campaigns for fractions of a cent per profile.

The data broker industry is the infrastructure most people never see. Companies like Acxiom, Oracle Data Cloud, and LexisNexis aggregate data from public records, purchase histories, app trackers, loyalty programs, and dozens of other sources to build profiles containing thousands of data points per person. Your profile might include: estimated income, health conditions (inferred from purchases), political affiliation (inferred from browsing), religious affiliation, whether you own or rent, your credit score range, whether you are pregnant (Target famously figured this out before a woman's family did, based on purchase patterns). These profiles are sold to advertisers, insurance companies, potential employers, political campaigns, and anyone willing to pay -- often for fractions of a cent per person.

Government surveillance adds another layer. The Snowden revelations in 2013 exposed that the NSA was collecting metadata (who called whom, when, for how long) on virtually every phone call in the United States through its Section 215 bulk collection program, while the separate PRISM program gave it access to internet communications held by major tech companies. Metadata does not reveal what you said, but it reveals who you talk to, how often, at what times, and from where -- which is often more revealing than content. A call to an oncologist at 2 AM tells a story without any words.

AI Bias: When Algorithms Inherit Human Prejudice

Machine learning systems learn from historical data. If that data reflects human bias -- and it almost always does -- the system will reproduce and amplify that bias at scale. This is not a theoretical concern. It is happening now, in systems making consequential decisions about real people's lives.

Amazon's hiring AI (2018): Amazon built an AI resume screening tool trained on ten years of hiring data. Because the company had historically hired predominantly men (especially in technical roles), the system learned that resumes mentioning "women's" -- as in "women's chess club captain" or "women's college" -- were negative signals. It systematically penalized female candidates. Amazon scrapped the tool after discovering the bias.

Facial recognition accuracy gap: The landmark 2018 "Gender Shades" study by MIT's Joy Buolamwini and Timnit Gebru found that commercial facial analysis systems from IBM, Microsoft, and Face++ misclassified darker-skinned women at error rates of up to 34.7%, compared to at most 0.8% for lighter-skinned men. The cause: training datasets overwhelmingly featured lighter-skinned faces. When a system encountered faces that looked different from its training data, it failed. This is not an abstract accuracy problem. When facial recognition is used by law enforcement, an error rate of this magnitude for darker-skinned women means innocent people are identified as suspects.
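Detecting this kind of gap is straightforward once accuracy is broken out by group. A minimal audit sketch, using invented toy records (not the study's data):

```python
# Minimal per-group error audit (toy, invented records -- not the study's data).
# A single aggregate accuracy number can hide exactly the gap described above.
from collections import defaultdict

# Each record: (demographic group, was the prediction correct?)
predictions = [
    ("lighter-skinned men", True), ("lighter-skinned men", True),
    ("lighter-skinned men", True), ("lighter-skinned men", True),
    ("darker-skinned women", True), ("darker-skinned women", False),
    ("darker-skinned women", False), ("darker-skinned women", True),
]

def error_rate_by_group(records):
    totals, errors = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        errors[group] += 0 if correct else 1
    return {group: errors[group] / totals[group] for group in totals}

rates = error_rate_by_group(predictions)
overall = sum(not correct for _, correct in predictions) / len(predictions)
# Overall error looks modest (25%), but the per-group split is 0% vs 50%.
```

This is why auditing by demographic group, rather than reporting one headline accuracy figure, is a baseline requirement for any system making decisions about people.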

Predictive policing: Systems like PredPol analyze historical crime data to predict where crimes are likely to occur. But historical crime data reflects where police have been deployed, not where crime actually occurs. Neighborhoods that were over-policed in the past generate more arrest data, which makes the algorithm predict more crime in those neighborhoods, which leads to more police deployment, which generates more arrests -- a self-reinforcing feedback loop that mathematically encodes discriminatory policing patterns.
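The feedback loop can be made concrete with a toy simulation (stylized, invented numbers -- not PredPol's actual model). Two neighborhoods have identical true crime rates, but neighborhood A starts with more recorded arrests because it was over-policed. Deploying patrols in proportion to past arrest data widens the arrest gap every iteration:

```python
# Stylized predictive-policing feedback loop (invented parameters).
# True crime is identical in A and B; only the historical record differs.

def simulate(iterations=5):
    arrests = {"A": 60.0, "B": 40.0}   # biased historical record
    true_crime = 100                   # identical in both neighborhoods
    detection = 0.1                    # arrests per unit of patrol presence
    for _ in range(iterations):
        total = arrests["A"] + arrests["B"]
        # Deploy patrols where the data "says" crime is -- i.e., where
        # arrests were recorded before.
        shares = {hood: arrests[hood] / total for hood in arrests}
        for hood in arrests:
            arrests[hood] += true_crime * detection * shares[hood]
    return arrests

final = simulate()
# The arrest gap between A and B grows every iteration, even though the
# underlying crime rate never differed.
```

Starting from a 60/40 split, the gap widens from 20 recorded arrests to 30 after five iterations -- the model never gets a chance to learn that the two neighborhoods are identical, because its own deployments generate its future training data.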

[Diagram: AI Bias Feedback Loop -- How Systems Amplify Inequality. Biased historical data (reflecting past discrimination) trains the ML model; model training learns the biased patterns and produces biased decisions in hiring, policing, and lending; those decisions create new biased data, and the cycle reinforces itself. Examples: in predictive policing, over-policing creates more arrest data; in hiring algorithms, past male hires penalize female applicants; in loan approvals, historical denial rates bias new decisions.]

AI bias is not a one-time error -- it is a feedback loop. Biased historical data trains a biased model. The model makes biased decisions. Those decisions generate new data that reinforces the original bias. Without active intervention (auditing, balanced training data, outcome monitoring), the system becomes more biased over time, not less. The algorithm does not "decide" to discriminate -- it optimizes for patterns in the data, and the data contains the history of human discrimination.

Key Insight

The fundamental problem with AI bias is not that algorithms are racist or sexist. It is that they are optimizers, and they optimize for patterns in the data they are given. If the data contains the consequences of decades of discriminatory practices -- in hiring, lending, policing, healthcare -- the algorithm will learn those patterns and reproduce them. "Bias in, bias out" is the concise version. The expanded version: bias in, amplified bias out, at a scale no human hiring manager or loan officer could match. One biased human makes biased decisions about hundreds of people. One biased algorithm makes biased decisions about millions.

Privacy Regulation: The Global Patchwork

Governments have begun responding to the data collection crisis, but their approaches vary dramatically. Three regulatory frameworks illustrate the spectrum.

GDPR (EU, 2018)

The strongest privacy regulation in the world. Gives EU residents the right to access (see all data a company holds about you), right to deletion (demand your data be erased), right to data portability (export your data in a usable format), and requires explicit consent before data collection. Companies must have a legal basis for processing personal data. Violations carry fines up to 4% of global annual revenue. Meta was fined $1.3 billion in 2023 for transferring EU user data to the US without adequate protections. GDPR applies to any company that processes EU residents' data, regardless of where the company is located.

CCPA (California, 2020)

Gives California residents the right to know what data is collected, the right to delete, and the right to opt out of data sales. Weaker than GDPR: it allows data collection by default (opt-out vs. GDPR's opt-in) and applies only to businesses meeting certain size thresholds. Fines are modest by comparison ($7,500 per intentional violation). Its real significance is as the strongest US privacy law in a country with no comprehensive federal privacy legislation. Many companies apply CCPA standards nationwide because managing different data practices per state is impractical.

No Comprehensive Regulation (US Federal)

The United States has no federal privacy law comparable to GDPR. Privacy protections are sector-specific: HIPAA covers health data, FERPA covers education records, COPPA covers children's data online. But there is no general law governing what a tech company can collect, how long it can retain it, or who it can sell it to. The result is a patchwork where the same company can be subject to GDPR in Europe, CCPA in California, LGPD in Brazil, and nothing at all in most US states. Industry lobbyists have consistently blocked federal privacy legislation, arguing it would stifle innovation. Consumer advocates counter that the absence of regulation has enabled a surveillance economy that profits from the systematic erosion of privacy.

Heavy State Control (China)

China's Personal Information Protection Law (PIPL, 2021) grants individuals privacy rights similar to GDPR -- against private companies. But the Chinese government retains extensive surveillance capabilities, including mandatory real-name registration for internet use, the "Great Firewall" censorship system, and social credit monitoring. The law protects citizens from corporate data abuse while preserving the state's ability to monitor comprehensively. This model -- privacy from companies, transparency to the state -- represents a fundamentally different philosophy than Western approaches that attempt to protect privacy from both corporate and government intrusion.

Algorithmic Accountability: Who Is Responsible When AI Causes Harm?

When a human loan officer denies your application, you can ask why. You can appeal. There is a person responsible for the decision. When an algorithm denies your application, the situation becomes murky. The algorithm's decision may be based on hundreds of variables weighted in ways that no human fully understands (the "black box" problem). Who is responsible? The company that deployed the algorithm? The engineers who built it? The data scientists who selected the training data? The original data collectors?

This accountability gap plays out across consequential domains:

Autonomous vehicles: In 2018, an Uber self-driving car killed a pedestrian in Tempe, Arizona. The system detected the person but classified them as an unknown object, then a vehicle, then a bicycle -- never triggering emergency braking. Who bears responsibility? The safety driver who was watching a video on her phone? Uber's engineering team? The algorithm that misclassified? The answer, in this case, was the safety driver (charged with negligent homicide) -- but the systemic question remains unresolved.

Content moderation: Algorithmic content recommendation has been linked to radicalization pathways. A 2019 study found that YouTube's recommendation algorithm could lead a user from mainstream political content to extremist content within a sequence of recommended videos. When a teenager is radicalized through algorithmically recommended content, is the platform responsible? Current Section 230 protections in the US largely shield platforms from liability for user-generated content -- but the algorithm that actively recommends content is not user-generated. It is the platform's product.

Healthcare decisions: An algorithm used by US health systems to allocate healthcare resources was found to systematically deprioritize Black patients. The system used healthcare spending as a proxy for health needs. But due to systemic inequities, Black patients had historically spent less on healthcare (not because they were healthier, but because they had less access). The algorithm interpreted lower spending as lower need, directing resources away from the people who needed them most. The algorithm was not designed to discriminate. It optimized faithfully for the metric it was given -- and the metric was wrong.
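The proxy failure is easy to reproduce in miniature. With invented numbers (this is not the actual algorithm), ranking patients by past spending instead of true need pushes a high-need patient with poor access to care below a healthier patient who simply spent more:

```python
# Toy illustration of a proxy metric failing (invented data).
# "past_spending" stands in for health need -- faithfully optimized, wrong metric.

patients = [
    {"name": "patient_1", "true_need": 9, "past_spending": 9000},  # good access to care
    {"name": "patient_2", "true_need": 9, "past_spending": 4000},  # poor access to care
    {"name": "patient_3", "true_need": 3, "past_spending": 7000},  # healthier, spent more
]

# What the algorithm actually ranks on: the spending proxy
by_proxy = sorted(patients, key=lambda p: p["past_spending"], reverse=True)

# What the ranking should have been: true need
by_need = sorted(patients, key=lambda p: p["true_need"], reverse=True)

# patient_2 has the same need as patient_1 but lands last under the proxy,
# behind a much healthier patient -- resources flow away from highest need.
```

The algorithm in this sketch makes no errors on its own terms: it ranks spending perfectly. The harm enters entirely through the choice of metric, which is exactly the failure the audit of the real system uncovered.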

What You Can Do: Practical Privacy Protection

You cannot eliminate your digital footprint, but you can reduce it significantly. These are not theoretical recommendations -- they are specific actions ranked by impact.

1. Audit app permissions. Go to your phone's settings and review which apps have access to your location, camera, microphone, contacts, and photos. Revoke permissions that are not essential to the app's function. A flashlight app does not need your location. A game does not need your contacts. Most apps request maximum permissions because data is valuable, not because they need it to function.

2. Use a privacy-focused browser and search engine. Firefox with uBlock Origin blocks trackers by default. Brave blocks ads and trackers natively. DuckDuckGo does not track search queries or build a profile. Switching from Chrome + Google to Firefox + DuckDuckGo eliminates one of the largest data collection pipelines in your daily life.

3. Install an ad blocker. uBlock Origin is free, open-source, and blocks not just ads but tracking scripts, fingerprinting attempts, and malicious domains. This is not about avoiding annoyance -- it is about cutting the data collection pipeline. Every ad that loads on a page also loads tracking scripts from multiple advertising networks, each building a profile.

4. Use encrypted messaging. Signal is end-to-end encrypted, open-source, collects virtually no metadata, and is free. Use it for any conversation you would not want a data breach to expose. WhatsApp uses the same encryption protocol but collects metadata (who you message, when, how often, your contacts) and shares it with Meta.

5. Read privacy policies (or use tools that read them for you). Terms of Service; Didn't Read (tosdr.org) rates privacy policies in plain language. It is not realistic to read every privacy policy in full (they average 4,000+ words), but knowing the worst offenders helps you make informed choices about which services you use.

6. Opt out of data broker lists. Services like DeleteMe and Privacy Duck submit opt-out requests to major data brokers on your behalf. You can also do this manually (each broker has a removal process, though they make it deliberately cumbersome). This does not eliminate your data from the internet, but it reduces the number of places where it is aggregated and sold.

Answers to Questions People Actually Ask

Is privacy dead? Does it even matter if I have nothing to hide?

The "nothing to hide" argument fails on multiple levels. Privacy is not about hiding wrongdoing -- it is about power. When a company or government knows everything about you, they can predict your behavior, manipulate your choices, and act on information you did not consent to share. Insurance companies using health data to raise premiums. Employers using social media profiles to screen candidates. Advertisers exploiting psychological vulnerabilities to drive spending. You might have nothing to hide today, but you do not know what future governments, employers, or algorithms will consider disqualifying. Privacy is not the opposite of transparency. It is the right to control what you reveal, to whom, and when. Every civil rights movement in history depended on the ability of organizers to communicate privately. Surveillance chills dissent whether or not there is anything to hide.

Can AI be made fair?

AI can be made fairer, but "fair" is not a single thing -- there are multiple mathematical definitions of fairness that can contradict each other. A system can be calibrated (equal accuracy across groups) or outcome-balanced (equal approval rates across groups), but achieving both simultaneously is often mathematically impossible. The practical approach is: audit training data for representational bias, test model outputs across demographic groups, require human review for high-stakes decisions, and continuously monitor deployed systems for disparate impact. The EU's AI Act (2024) requires exactly this for "high-risk" AI systems (hiring, lending, law enforcement). Perfect fairness is unattainable because it requires a precise definition that society has not agreed on. But significant, measurable improvement over the status quo is achievable -- and required.
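The tension between these definitions can be shown with invented numbers. If two groups have different base rates of repayment in the historical data, a lending model can match accuracy across groups or match approval rates across groups -- but not both at once:

```python
# Toy demonstration (invented numbers) of conflicting fairness definitions
# when base rates differ between groups.

# Fraction of applicants in each group who would actually repay
base_rate = {"group_x": 0.6, "group_y": 0.4}
applicants = {"group_x": 1000, "group_y": 1000}

# Option 1: a perfectly accurate model approves exactly the repayers.
# Accuracy is equal across groups, but approval rates are 60% vs 40%.
approval_rate = {g: base_rate[g] for g in base_rate}

# Option 2: force equal approval rates (say 50% each) on the same population.
forced = 0.5
wrongly_denied_x = applicants["group_x"] * (base_rate["group_x"] - forced)
wrongly_approved_y = applicants["group_y"] * (forced - base_rate["group_y"])
# Approval rates are now equal, but 100 creditworthy people in group_x are
# denied and 100 non-repayers in group_y are approved -- error rates now
# differ across groups. One fairness property is bought with the other.
```

This is the arithmetic behind the impossibility result mentioned above: as long as base rates differ, "equal accuracy" and "equal approval rates" pull in opposite directions, and the choice between them is a policy decision, not a technical one.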

Does deleting my social media actually help?

Deleting accounts reduces ongoing data collection but does not erase historical data. Under GDPR, you can request full data deletion from companies operating in or serving the EU. In the US, your options are more limited. What deletion does accomplish: it stops the continuous feed of behavioral data (likes, scrolls, clicks, dwell time) that feeds the advertising profile. Shadow profiles -- data collected about you from other people's contacts, tagged photos, and cross-site tracking -- may persist even after deletion. The most impactful single action is not deleting accounts but changing your relationship with them: use privacy-focused browsers, avoid clicking personalized ads, revoke unnecessary permissions, and treat social media as a broadcast medium rather than a diary.

What about privacy-focused alternatives to mainstream tech?

Practical alternatives exist for most common services. Search: DuckDuckGo (no tracking), Brave Search. Email: ProtonMail (end-to-end encrypted, Swiss jurisdiction). Messaging: Signal (gold standard for privacy). Browser: Firefox + uBlock Origin, or Brave. Cloud storage: Proton Drive, Tresorit (client-side encryption -- the provider cannot read your files). VPN: Mullvad (anonymous accounts, accepts cash payment) or ProtonVPN. Maps: OpenStreetMap-based apps (OsmAnd). The tradeoff is usually convenience: Google Maps is better than OsmAnd. Gmail integrates more smoothly than ProtonMail. The question is whether the convenience difference justifies the data collection difference. For most people, switching browser and search engine is the highest-impact, lowest-friction change.

Where Privacy and Ethics Take You Next

Privacy and ethics in technology are not peripheral concerns that you consider after the engineering is done. They are design constraints that shape what you build and how you build it. The most consequential decisions in technology are not about which algorithm to use or which cloud provider to choose -- they are about what data to collect, who has access, what happens when systems make mistakes, and who bears the cost of those mistakes.

The regulatory landscape is tightening. The EU's AI Act, passed in 2024, will require transparency and human oversight for high-risk AI systems. GDPR enforcement is intensifying, with fines reaching billions. Companies that treat privacy as an afterthought are accumulating legal and reputational risk that will compound over time.

But regulation is reactive. It addresses harms after they occur. The deeper shift is in professional norms -- the growing expectation that engineers, product managers, and designers consider the ethical implications of their work before deployment, not after a crisis. The most valuable skill you can develop is the habit of asking: "Who is affected by this system? What happens when it fails? Whose interests does it serve? And who is not in the room when these decisions are made?"

The takeaway: Technology is built by people with incentives, and understanding those incentives is as important as understanding the technology itself. The attention economy optimizes for engagement, not wellbeing. Data collection powers a $250 billion surveillance industry. AI systems reproduce the biases embedded in their training data. Privacy regulation is catching up but remains fragmented. The tools to protect yourself exist -- privacy-focused browsers, encrypted messaging, ad blockers, permission audits -- and using them is a practical first step. The larger step is recognizing that privacy and ethics are not someone else's problem. Every person who builds, uses, or is affected by technology has a stake in how these questions are answered.