Word Documents

Lab 2.1: Word Documents

Extract and convert DOCX to clean markdown

Process Word documents

/CC.m2.lb1

Identify the key structural elements that Claude Code preserves when extracting content from Word documents (headings, lists, tables) versus elements that are lost (fonts, colours, complex formatting)
Apply the docx skill to convert messy Word documents into clean, structured markdown whilst preserving semantic meaning and logical hierarchy
Create properly tagged markdown files with YAML frontmatter that integrate processed documents into the knowledge vault for future searchability
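
As a rough illustration of the third objective, the sketch below prepends a YAML frontmatter block to an already-converted markdown body. The function name `with_frontmatter` and the `source` and `processed` fields are assumptions made for this example; the lab itself only specifies a title, tags, and other searchable properties, so adapt the field names to whatever your vault expects.

```python
import json
from datetime import date


def with_frontmatter(body: str, title: str, tags: list[str],
                     source: str, processed: date) -> str:
    """Prepend a YAML frontmatter block to a markdown body.

    Field names other than title and tags are hypothetical.
    """
    # A JSON string is also a valid YAML double-quoted scalar, so json.dumps
    # gives safe quoting for titles that contain colons or quotes.
    def quote(value: str) -> str:
        return json.dumps(value, ensure_ascii=False)

    lines = [
        "---",
        f"title: {quote(title)}",
        f"source: {quote(source)}",
        f"processed: {processed.isoformat()}",
        "tags:",
    ]
    lines += [f"  - {quote(tag)}" for tag in tags]
    lines.append("---")
    # Blank line between the frontmatter and the document body.
    return "\n".join(lines) + "\n\n" + body.lstrip()
```

Calling `with_frontmatter("# Q3 Report\n", "Q3 Report", ["finance", "report"], "q3.docx", date(2024, 1, 15))` returns the body with a frontmatter block on top, fenced by two `---` lines.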

Concept	Description
Content Extraction	The process of pulling meaningful information from documents whilst discarding visual formatting noise
XML Structure	The underlying document structure that Claude Code parses, containing headings, paragraphs, lists, and tables
YAML Frontmatter	Metadata block at the start of markdown files containing title, tags, and other searchable properties
Semantic Meaning	The actual meaning and logical relationships within content, independent of how it appears visually
Metadata Tags	Descriptive labels added to processed documents that enable searching and filtering in the knowledge vault
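
To make the XML Structure and Content Extraction rows concrete: a .docx file is a ZIP archive whose main text lives in `word/document.xml`. The sketch below is not the docx skill's actual implementation. It is a minimal standard-library example that reads paragraph text and heading levels whilst ignoring run-level formatting such as fonts and colours. It assumes heading paragraphs use the style IDs `Heading1` to `Heading6` (the default in English Word templates), and the function name `docx_bytes_to_markdown` is hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_bytes_to_markdown(data: bytes) -> str:
    """Convert the headings and paragraphs of a .docx file to markdown."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    blocks = []
    # Note: this also visits paragraphs inside table cells and emits them as
    # plain paragraphs; a fuller converter would rebuild the table instead.
    for para in root.iter(f"{W}p"):
        # Join the text of every run; run properties (fonts, colours) are skipped.
        text = "".join(t.text or "" for t in para.iter(f"{W}t")).strip()
        if not text:
            continue
        style = para.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(f"{W}val", "") if style is not None else ""
        # Assumption: heading style IDs look like "Heading1" .. "Heading6".
        heading = re.fullmatch(r"Heading([1-6])", style_id)
        if heading:
            blocks.append("#" * int(heading.group(1)) + " " + text)
        else:
            blocks.append(text)
    return "\n\n".join(blocks) + "\n"
```

A caller would typically pass `pathlib.Path("report.docx").read_bytes()`. Lists and tables need extra handling (numbering properties and `w:tbl` elements respectively), which is omitted here to keep the sketch short.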

“It extracted all that mess into clean sections in under a minute - I’ve spent entire afternoons doing this manually!”
