Module 03 of 05

Files & Folders

Knowing where your files live, how data is encoded, what different file types mean, and how to prepare clean data for AI tools: these are the skills that separate productive AI use from frustrating trial and error.

Files, Folders, and File Systems

Every piece of data on a computer is stored as a file: a named container holding data (text, images, code, video). Files are organized into folders (also called directories), which can contain other folders, creating a tree-like hierarchy. This entire structure is managed by the file system (for example, NTFS on Windows or APFS on Mac).

Understanding this hierarchy means you always know where to find your work and can communicate clearly about file locations, which is critical when sharing documents with colleagues or troubleshooting with IT.

Analogy

Think of a filing cabinet. The cabinet itself is your hard drive. Drawers are top-level folders. Within each drawer are hanging folders (sub-folders). Inside those are individual documents (files). A file path is like the address: "Cabinet A → Drawer: Finance → Folder: Q3 2024 → Document: Budget.xlsx"

🤖 Why This Matters for AI

Almost every AI interaction involves a file. Uploading a PDF for Claude to analyze, pointing a model at a CSV of sales data, saving AI-generated code to the right folder: all of this requires knowing your file paths, extensions, and data types. But knowing where your files are is only half the story. The other half is making sure the data inside those files is clean, correctly typed, and in a format the AI can actually parse. This module covers both: the file system fundamentals and the data preparation skills that make AI tools work reliably.

Reading a File Path

A file path is the complete address of a file on your computer. Each segment has a specific meaning.

WINDOWS PATH:

C:\Users\JohnSmith\Documents\Acme_Corp\Q3_Report.xlsx

C: = the drive letter (your main hard drive) · Users\JohnSmith = your user profile folder · Documents\Acme_Corp = subfolder path · Q3_Report.xlsx = the file (an Excel spreadsheet)

MAC / LINUX PATH:

/Users/jsmith/Desktop/Presentation.pptx

/ = the root (top of the file system) · Users/jsmith = user profile folder · Desktop = the Desktop folder · Presentation.pptx = the file (a PowerPoint presentation)
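As a rough illustration of the segments described above, Python's standard pathlib module can split both example paths into their parts without touching the disk. This is only a sketch; the variable names win and mac are chosen here for illustration.

```python
from pathlib import PureWindowsPath, PurePosixPath

# The two example paths from this section, each parsed with its own OS rules.
win = PureWindowsPath(r"C:\Users\JohnSmith\Documents\Acme_Corp\Q3_Report.xlsx")
mac = PurePosixPath("/Users/jsmith/Desktop/Presentation.pptx")

print(win.drive)    # C:
print(win.parent)   # C:\Users\JohnSmith\Documents\Acme_Corp
print(win.name)     # Q3_Report.xlsx
print(win.suffix)   # .xlsx

print(mac.parts)    # ('/', 'Users', 'jsmith', 'Desktop', 'Presentation.pptx')
print(mac.name)     # Presentation.pptx
```

The "Pure" path classes only manipulate the text of a path, so the example runs the same way on any operating system.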

⚠ Did you catch the difference?

The slashes go in opposite directions depending on the operating system. This is one of the smallest details in computing and one of the most consequential. Mix them up and a file path silently breaks.

\ Backslash: Windows

vs.

/ Forward slash: Mac / Linux / URLs

Think of the slash as part of the address system. Just as different countries write addresses in different orders (street first in the US, country first in Japan), different operating systems use different separators. The "map" (the OS) determines which slash means "go into this folder." Use the wrong one and the OS may not recognize the address. Web URLs always use forward slashes, regardless of your OS, which is why https://example.com/page works everywhere.
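A small sketch of the separator difference, again using Python's standard pathlib module. The folder names are taken from the examples in this section; everything else is illustrative.

```python
from pathlib import PureWindowsPath, PurePosixPath

# Join the same segments and let pathlib pick the separator for each OS.
segments = ["Users", "Ana", "Documents", "report.pdf"]

windows_path = PureWindowsPath("C:/").joinpath(*segments)
mac_path = PurePosixPath("/").joinpath(*segments)

print(windows_path)  # C:\Users\Ana\Documents\report.pdf
print(mac_path)      # /Users/Ana/Documents/report.pdf

# A Windows-style path is NOT split into folders under Mac/Linux rules.
misread = PurePosixPath(r"C:\Users\Ana\Documents\report.pdf")
print(len(misread.parts))  # 1: the whole string is treated as a single name
```

Building paths from segments, rather than typing the slashes by hand, is one way to avoid mixing up the two conventions.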

Quick check: test yourself

A colleague sends you this file path and says they're on a Mac:
C:\Users\Ana\Documents\report.pdf

What's wrong with this path?

What does the file extension tell you?

The letters after the final dot in a filename tell the OS what type of data the file contains and which app should open it.

📄 .docx: Word Document
📊 .xlsx: Spreadsheet
📑 .pptx: Presentation
📕 .pdf: PDF Document
🖼️ .jpg .png: Image
📋 .csv: Data Table
📝 .txt: Plain Text
🗜️ .zip: Compressed
🌐 .html: Web Page
🎬 .mp4: Video
🎵 .mp3: Audio
⚙️ .exe .msi: Executable or installer (run with caution)

Important: Windows hides file extensions by default. To show them, open File Explorer → View → Show → File name extensions. This helps you verify file types, which is crucial for security: a virus might be named invoice.pdf.exe, and with extensions hidden it would appear as invoice.pdf.
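The double-extension trick can be checked programmatically. The sketch below is a minimal illustration, not a security tool; the EXECUTABLE_EXTENSIONS set and the looks_disguised helper are hypothetical names introduced here, and the set only contains the two executable extensions listed above.

```python
from pathlib import PurePath

# Assumption for this sketch: only the two executable extensions named in this module.
EXECUTABLE_EXTENSIONS = {".exe", ".msi"}

def looks_disguised(filename: str) -> bool:
    """Return True when an executable hides behind a second, harmless-looking extension."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] in EXECUTABLE_EXTENSIONS

print(PurePath("Q3_Report.xlsx").suffix)   # .xlsx
print(looks_disguised("invoice.pdf.exe"))  # True
print(looks_disguised("invoice.pdf"))      # False
```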

How computers encode data

Every file, whether it's a spreadsheet, a photo, or an email, is stored as a sequence of binary digits (bits): ones and zeros. A single bit is a 1 or 0. Eight bits make a byte. Your 256 GB hard drive holds roughly 256 billion bytes. But raw bits are meaningless without knowing how to interpret them, and that's where data types and encoding come in.

Analogy

Think of a sequence of dots and dashes. Without knowing it's Morse code, it's just a pattern. The dots and dashes are the bits; Morse code is the encoding. The word you decode is the data type. Computers work the same way: the same bits can represent a number, a letter, or a pixel color depending on how the software is told to read them.
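To make the "same bits, different readings" idea concrete, here is a short sketch in Python. The two byte values are arbitrary and chosen only for illustration.

```python
raw = bytes([72, 105])  # two bytes of raw data

# Same bits, three different readings.
as_bits = " ".join(f"{byte:08b}" for byte in raw)
as_text = raw.decode("ascii")            # read as characters
as_number = int.from_bytes(raw, "big")   # read as one whole number

print(as_bits)    # 01001000 01101001
print(as_text)    # Hi
print(as_number)  # 18537
```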

Data types: what computers see

When you look at a spreadsheet, you see names, dates, prices, and yes/no fields. A computer sees each of these as a specific data type, and the type determines what operations are possible. Getting the type wrong is one of the most common causes of errors when working with data tools and AI.

| Data Type | What It Stores | Examples | Watch Out For |
|---|---|---|---|
| Integer | Whole numbers (no decimals) | 42, -7, 0, 1000000 | A zip code like 02134 may lose its leading zero if stored as an integer |
| Float / Decimal | Numbers with decimal points | 3.14, -0.5, 99.99 | Floating-point math can produce tiny rounding errors (0.1 + 0.2 = 0.30000000000000004) |
| String (Text) | Any sequence of characters | "Hello", "97301", "N/A" | The number "42" stored as text can't be added to another number without conversion |
| Boolean | True or False (yes/no, on/off) | TRUE, FALSE | Some systems encode True as 1 and False as 0; others use "Yes"/"No"; mixing the two causes errors |
| Date / DateTime | Calendar dates and times | 2024-03-15, 2024-03-15T14:30:00 | Dates are notoriously messy: "03/04/2024" means March 4 in the US but April 3 in Europe |
| Categorical | One value from a fixed set | "Red", "Blue", "Green" or "Small", "Medium", "Large" | Typos create phantom categories: "Smal" and "Small" look like two different values |

🤖 Why This Matters for AI

When you upload a CSV to an AI tool or a data pipeline, the system has to infer what type each column is. If your "Revenue" column contains the text entry "$1,200" instead of the number 1200, the system may treat the entire column as text, and every calculation breaks silently. AI tools are powerful but not magic: they work with what you give them. Understanding data types helps you catch these problems before the AI does something confidently wrong with bad input.
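Several of the pitfalls from the table above can be reproduced in a few lines of Python. This is a sketch for illustration; the variable names are invented here, and other tools may show the same problems in slightly different ways.

```python
# Integer: a zip code stored as a number loses its leading zero.
print(int("02134"))            # 2134

# Float: binary floating point cannot represent 0.1 exactly.
print(0.1 + 0.2)               # 0.30000000000000004
print(0.1 + 0.2 == 0.3)        # False

# String: "+" joins text instead of adding numbers.
print("42" + "8")              # 428
print(int("42") + int("8"))    # 50

# A currency-formatted value stays text until the symbols are stripped.
revenue_text = "$1,200"
revenue = float(revenue_text.replace("$", "").replace(",", ""))
print(revenue)                 # 1200.0
```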

Text encoding: why characters sometimes break

Text is stored using an encoding system that maps each character to a number. The modern standard is UTF-8, which handles virtually every character in every language (plus emoji). Older systems used ASCII (English only, 128 characters) or regional encodings like Latin-1 (Western European). When you open a file and see garbled characters like Ã© instead of é, it means the file was saved in one encoding but opened in another. When working with AI tools, stick to UTF-8; it's what almost everything expects.
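The garbling described above can be reproduced deliberately. The sketch below saves a word as UTF-8 and then reads the same bytes back as Latin-1; the sample word is chosen only for illustration.

```python
original = "café"

# Save as UTF-8, then (wrongly) read the same bytes back as Latin-1.
stored = original.encode("utf-8")
garbled = stored.decode("latin-1")
print(garbled)                   # cafÃ©

# Reading with the encoding that was used to save restores the text.
print(stored.decode("utf-8"))    # café

# ASCII has no code for "é" at all, so encoding fails outright.
try:
    original.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot store this text:", err.reason)
```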

Local, Network, and Cloud Storage

💻 Local Storage (Your Computer)

Files saved directly to your hard drive or SSD. Fast access, works without internet. Located in C:\Users\YourName\ (Windows) or /Users/yourname/ (Mac). Risk: if your computer fails without backup, files can be lost.

🏢 Network Drive (Company Server)

Files on a central company server. Appears as a mapped drive letter like Z: or as a network path like \\server\share. Multiple people can access and edit. Requires the company network or a VPN. IT manages backups.

☁️ Cloud Storage

Files on remote servers accessed via the internet: OneDrive, Google Drive, Dropbox, SharePoint. Files sync across all devices. Accessible anywhere. A folder on your computer syncs automatically to the cloud.

Structured vs. unstructured data

Not all data is the same shape. Understanding the difference between structured and unstructured data is essential for knowing what AI tools can do with your files, and which file format to use.

Structured Data

Data organized into rows and columns with consistent types, like a spreadsheet or database table. Each column has a name and a data type; each row is a record.

Formats: .csv, .xlsx, .tsv, .json (tabular), database tables
Examples: sales records, employee rosters, survey responses, financial reports
AI use: data analysis, trend detection, forecasting, dashboards

Unstructured Data

Data without a predefined row/column format: text documents, emails, images, audio, video. Most of the world's data is unstructured.

Formats: .pdf, .docx, .txt, .jpg, .mp4, .html, email threads
Examples: customer emails, meeting transcripts, contracts, photos
AI use: summarization, extraction, classification, image analysis

Why the format matters for AI

Different AI tools expect different inputs. When you paste a table of data into Claude as plain text, it works. But when you upload the same data as a well-formatted .csv file, the AI can parse the columns, understand the types, and give you better analysis. Knowing which format to use is the difference between a vague answer and a precise one.

.csv (Comma-Separated Values): The universal format for tabular data. Each line is a row; values are separated by commas. Nearly every tool can read it: Excel, Google Sheets, Python, R, AI platforms. When in doubt about how to share data with an AI tool, CSV is almost always the right answer.

Name,Department,Salary,Start_Date
Sarah Chen,Marketing,72000,2022-06-15
James Park,Engineering,95000,2021-01-10
Maria Lopez,Sales,68000,2023-03-22

.json (JavaScript Object Notation): Structured data in key-value pairs. Used heavily by APIs and web services. More flexible than CSV: it can represent nested data (a customer who has multiple orders, each with multiple items). AI APIs (like OpenAI's) send and receive JSON.

{"name": "Sarah Chen",
 "department": "Marketing",
 "salary": 72000,
 "start_date": "2022-06-15"}

.txt (Plain Text): No formatting, no structure, just characters. Useful for unstructured data like meeting notes, email bodies, or raw text you want an AI to summarize or classify. The simplest format, readable by everything.

.pdf (Portable Document Format): Preserves visual layout, but data inside is often hard for AI to parse. A PDF of a table looks clean to humans but may be a jumbled mess to an AI that has to extract the text. When possible, give the AI the underlying data (CSV, XLSX) rather than the PDF report generated from it.
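To see how the CSV and JSON samples above behave once a program reads them, here is a minimal sketch using Python's standard csv and json modules. The sample records are the ones shown in this section; the nested customer at the end is hypothetical and invented purely to illustrate nesting.

```python
import csv
import io
import json

# The sample records from this section, held in memory rather than in files.
csv_text = """Name,Department,Salary,Start_Date
Sarah Chen,Marketing,72000,2022-06-15
James Park,Engineering,95000,2021-01-10
Maria Lopez,Sales,68000,2023-03-22
"""
json_text = '{"name": "Sarah Chen", "department": "Marketing", "salary": 72000, "start_date": "2022-06-15"}'

rows = list(csv.DictReader(io.StringIO(csv_text)))
record = json.loads(json_text)

# CSV carries no type information: every value arrives as text.
print(type(rows[0]["Salary"]).__name__)   # str
# JSON distinguishes numbers from strings.
print(type(record["salary"]).__name__)    # int

# Convert CSV text explicitly before doing arithmetic.
salaries = [int(row["Salary"]) for row in rows]
print(sum(salaries))                      # 235000

# JSON can also nest data. This customer is hypothetical.
customer = {"name": "Example Customer",
            "orders": [{"id": 1, "items": ["laptop", "mouse"]},
                       {"id": 2, "items": ["monitor"]}]}
print(json.dumps(customer, indent=2))
```

The practical takeaway is that a CSV reader has to guess or be told each column's type, while JSON records part of that information in the file itself.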

Garbage in, garbage out: cleaning data for AI

AI tools are powerful, but they are not tolerant of messy input. If you feed a model a spreadsheet full of inconsistent dates, misspelled categories, and columns where numbers are stored as text, the output will reflect that mess, confidently and without warning. Data sanitization is the practice of cleaning and standardizing your data before giving it to any tool.

Analogy

You wouldn't hand a colleague a report with pages in random order, some paragraphs in French, and half the numbers written in Roman numerals, then ask for a summary. But that's what feeding messy data to an AI tool is. The model will try its best, but the result will be unreliable in ways you can't easily detect.

The most common data problems (and how to fix them)

| Problem | Example | Fix |
|---|---|---|
| Inconsistent text | "Sales", "sales", "SALES", "Slaes" | Standardize case; use spell-check; search for near-duplicates |
| Mixed date formats | "3/4/2024", "2024-03-04", "March 4, 2024" | Pick one format (ISO 8601, YYYY-MM-DD, is best) and apply it to the whole column |
| Numbers as text | "$1,200" or "1 200" instead of 1200 | Remove currency symbols, commas, and spaces; ensure the column is formatted as a number |
| Missing values | Empty cells, "N/A", "n/a", "-", "unknown" | Decide on one representation (blank or "NA"); be consistent; document what missing means |
| Leading/trailing spaces | " Marketing" vs "Marketing" | Use TRIM() in Excel/Sheets; these invisible differences cause failed lookups |
| Merged cells | A header spanning three columns in Excel | Unmerge all cells before exporting; merged cells break CSV export and AI parsing |
| Multiple data points in one cell | "Smith, Jones, Lee" in a single "Assignees" cell | Split into separate rows or separate columns; one value per cell is the rule |

A pre-flight checklist before uploading data to any AI tool

One clear header row: column names in row 1, data starting in row 2, no blank rows above

Consistent data types per column: don't mix numbers and text in the same column

One value per cell: no merged cells, no comma-separated lists inside a single cell

Missing values handled consistently: pick one representation and use it everywhere

Dates in ISO format: YYYY-MM-DD (2024-03-15) sorts correctly and is universally readable

No formatting artifacts: remove currency symbols, percentage signs, and commas from numbers

Saved as .csv (UTF-8): the safest, most portable format for AI tools
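Part of this checklist can be automated. The sketch below checks a CSV for a few of the items (blank header names, rows with the wrong number of values, stray spaces, currency or percent symbols, and non-ISO dates). The preflight function and its date_columns parameter are hypothetical names introduced for this example, and the checks are deliberately simple.

```python
import csv
import io
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def preflight(csv_text: str, date_columns=()) -> list[str]:
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    if any(name.strip() == "" for name in header):
        problems.append("header row has a blank column name")
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(f"row {line_no}: expected {len(header)} values, found {len(row)}")
            continue
        for name, value in zip(header, row):
            if value != value.strip():
                problems.append(f"row {line_no}: stray spaces in {name!r}")
            if re.search(r"[$%]", value):
                problems.append(f"row {line_no}: formatting symbol in {name!r}")
            if name in date_columns and value and not ISO_DATE.match(value):
                problems.append(f"row {line_no}: {name!r} is not YYYY-MM-DD")
    return problems

sample = "Name,Salary,Start_Date\nSarah Chen,72000,2022-06-15\nJames Park,$95000,1/10/2021\n"
for problem in preflight(sample, date_columns=("Start_Date",)):
    print(problem)
```

A number typed with a thousands separator (1,200) and no quotes shows up here as a row with too many values, since the comma is read as a column break.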

🤖 Why This Matters for AI

When you ask an AI to "find the average salary by department" and your data has "Marketing" in some rows and "marketing " (with a trailing space) in others, the AI will treat them as two separate departments and give you two separate averages, both wrong. When your date column has three different formats, the model may misparse March 4 as April 3. AI tools don't complain about messy data. They just give you confidently wrong answers. Cleaning your data before uploading is the single most impactful thing you can do to get useful output from any AI tool.
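The department example can be reproduced with a few rows of made-up data. The salaries below are hypothetical and invented for illustration; the average_by_department helper is likewise a name introduced for this sketch.

```python
from collections import defaultdict

# Hypothetical rows, invented to illustrate the trailing-space problem.
rows = [
    ("Marketing", 72000),
    ("marketing ", 60000),
    ("Marketing", 68000),
]

def average_by_department(records, normalize=False):
    """Group salaries by department label and average each group."""
    groups = defaultdict(list)
    for department, salary in records:
        key = department.strip().title() if normalize else department
        groups[key].append(salary)
    return {dept: sum(vals) / len(vals) for dept, vals in groups.items()}

raw = average_by_department(rows)
cleaned = average_by_department(rows, normalize=True)

print(raw)       # {'Marketing': 70000.0, 'marketing ': 60000.0}
print(cleaned)   # a single 'Marketing' group covering all three rows
```

Neither run raises an error. The only sign of trouble in the first result is the extra group, which is easy to miss in a long report.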

Finding lost files & good practices

Windows: Search box in File Explorer (top right). Press Win+S to search the entire computer. Filter by date modified, file type, or size. Check Recent items in most apps (File → Recent).

Mac: Use Spotlight (Cmd+Space) and type the filename. In Finder, press Cmd+F to search within a specific folder.

Accidentally deleted? Check the Recycle Bin (Windows) or Trash (Mac); deleted files go there first. Right-click → Restore (Windows) or Put Back (Mac).

Use dates in filenames for version tracking: 2024-03-15_Budget_Final.xlsx (YYYY-MM-DD sorts chronologically). Avoid spaces; use underscores or hyphens instead. Avoid special characters (/ \ : * ? " < > |). Be specific: Smith_Contract_Signed.pdf not Document1.pdf.
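These naming rules are mechanical enough to sketch in code. The safe_filename helper below is a hypothetical function written for this example; it applies the date prefix, replaces spaces with underscores, and drops the special characters listed above.

```python
import re
from datetime import date

def safe_filename(label: str, extension: str, day: date) -> str:
    """Build a name like 2024-03-15_Budget_Final.xlsx from a free-text label."""
    cleaned = re.sub(r'[/\\:*?"<>|]', "", label)      # drop the special characters
    cleaned = re.sub(r"\s+", "_", cleaned.strip())    # spaces become underscores
    return f"{day.isoformat()}_{cleaned}.{extension.lstrip('.')}"

print(safe_filename("Budget Final", "xlsx", date(2024, 3, 15)))
# 2024-03-15_Budget_Final.xlsx

# ISO dates sort chronologically even when sorted as plain text.
names = ["2024-03-15_Budget.xlsx", "2023-12-01_Budget.xlsx", "2024-01-20_Budget.xlsx"]
print(sorted(names))
```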

Naming-based versioning (Report_v2_FINAL_JSmithEdits.docx) gets unwieldy fast. Better: use cloud tools like Google Docs or SharePoint that have built-in version history, so you can see every change and restore any previous version. Check: File → Version History.

The 3-2-1 backup rule: keep 3 copies of important data, on 2 different types of storage, with 1 copy offsite (or in cloud). For most workers: files on your computer (1), synced to cloud storage like OneDrive (2), and optionally on a USB drive (3). Verify with IT what's actually backed up.

File and folder situations at work

You saved a report yesterday but can't find it today.

Systematic approach: (1) Check the app's Recent Files (File → Recent). (2) Search by filename in File Explorer or Spotlight. (3) Search by date, since you know you saved it yesterday. (4) Check cloud folders (OneDrive, Google Drive). (5) Check the Downloads folder; sometimes files default there.

A client asks for a PDF, but you only have the .docx Word file.

Convert to PDF: In Word, File → Save As (or Export) → choose PDF. Or File → Print → "Microsoft Print to PDF." On Mac: File → Print → PDF → Save as PDF. PDFs preserve formatting and are much harder for the recipient to edit by accident.

Your manager says "put the final version in the shared drive, in the Projects folder."

On Windows, look in File Explorer's left sidebar for a mapped drive letter (Z:, S:) or a network location. On Mac, check Finder's sidebar under "Network." If you can't find it, ask for the exact path or SharePoint link. Never assume you've found the right location; confirm before replacing files.

You need to email someone 30 files. Your email rejects the attachment โ€” too large.

Options: (1) Compress the folder: right-click → "Compress" (Mac) or "Send to → Compressed (zipped) folder" (Windows). (2) Share via cloud link: upload to OneDrive or Google Drive and share the link. This is better for large files and keeps them editable. (3) Use a file-transfer service such as WeTransfer for very large files.

🧪 Try It Yourself
  1. Navigate to your Downloads folder and note the full file path displayed in the address bar. Write it down.
  2. Find a file you saved recently using only the search function (Win+S on Windows, Cmd+Space on Mac).
  3. Turn on file extensions if they're hidden (Windows: File Explorer → View → Show → File name extensions). Look at a few files and identify their extensions.
  4. Inspect a spreadsheet for data type issues: Open any .xlsx or .csv file you have. Pick one column and check: are all the values the same type? Are there any numbers stored as text (look for left-aligned numbers or leading apostrophes)? Are dates in a consistent format?
  5. Export a clean CSV: Take a small spreadsheet (even one you create with 5 rows), make sure there are no merged cells, one header row, and consistent types, then choose File → Save As → CSV UTF-8. Open the resulting .csv in a text editor (Notepad or TextEdit) and look at the raw commas-and-text structure.
  6. Test garbage in, garbage out: If you have access to an AI tool, try uploading a messy dataset (inconsistent dates, mixed text/numbers) and ask it to compute an average. Then clean the same data and try again. Compare the results.

Module 3 Quiz

Answer all questions to complete this module.

1. In the path C:\Users\Ana\Documents\report.pdf, what is "report.pdf"?

2. Why should you turn on file extensions in Windows?

3. What's the best way to share a 500MB video file with a client?

4. A spreadsheet column called "Revenue" contains values like "$1,200" and "$950". You want to upload this to an AI tool for analysis. What's the problem?

5. You have customer feedback emails (unstructured) and a sales spreadsheet (structured). Which format should you use to upload the sales data to an AI for trend analysis?

6. Your "Department" column has these values: "Marketing", "marketing", "Marketing ", "Mktg". You ask an AI for average salary by department. What will happen?