OKF Explained
◆ Open Knowledge Format · v0.1

Teaching machines what your data actually means.

OKF is a tiny, open standard for writing down the knowledge that normally lives in people's heads, scattered wikis, and proprietary catalogs — in a format that both humans and AI agents can read. No SDK. No database. Just folders of Markdown files.

In one sentence
It's a shared language for describing your tables, metrics, and runbooks, so the context an AI needs to answer "How do we compute weekly active users?" is written down once and works everywhere.

The 30-second version

What is OKF, really?

Strip away the jargon and OKF is three plain ideas stacked together.

📁

Folders of Markdown

Each "thing you know" — a table, a dataset, a metric, a playbook — is one Markdown file. Group them in folders. That folder is a bundle.

🏷️

A little structure on top

Each file starts with a small YAML frontmatter header (type, title, description…). Machines read the header; humans read the rest.

🔗

Links between concepts

Files reference each other with ordinary Markdown links, quietly forming a knowledge graph an agent can walk.

💡 Why this is clever: the same file is readable by a person in any text editor and parseable by a program — so there's no translation layer, no vendor lock-in, and the knowledge travels with your code in version control.
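To make the "knowledge graph an agent can walk" idea concrete, here is a minimal sketch of how a consumer might turn ordinary Markdown links into graph edges. It is an illustration, not part of OKF: the names `link_targets` and `build_graph` are hypothetical, the regex only covers simple inline links, and treating a leading slash as "relative to the bundle root" is an assumption based on the `/tables/customers.md` link in the sample concept file.

```python
import posixpath
import re

# Matches simple inline Markdown links such as [customers](/tables/customers.md).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def link_targets(concept_path: str, text: str) -> set[str]:
    """Return bundle-relative paths of the .md files that `text` links to."""
    targets = set()
    for href in LINK_RE.findall(text):
        href = href.split("#", 1)[0]  # drop any #fragment
        if "://" in href or not href.endswith(".md"):
            continue  # external URL or a link to something that is not a concept file
        if href.startswith("/"):
            # Assumption: a leading slash means "relative to the bundle root".
            resolved = href.lstrip("/")
        else:
            # Otherwise resolve relative to the linking file's folder.
            resolved = posixpath.join(posixpath.dirname(concept_path), href)
        targets.add(posixpath.normpath(resolved))
    return targets


def build_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file path to the set of file paths it links to."""
    return {path: link_targets(path, text) for path, text in files.items()}


bundle = {
    "tables/orders.md": "FK to [customers](/tables/customers.md).",
    "tables/customers.md": "Referenced by [orders](orders.md).",
}
graph = build_graph(bundle)
```

Here `bundle` stands in for files already read from disk. An agent could follow `graph` outward from one concept to collect related context.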

Why it exists

The problem: knowledge is scattered

When an AI agent tries to answer a real data question, the answer is smeared across a dozen incompatible places. Every team rebuilds the same plumbing from scratch.

● Before — context locked in silos

Proprietary catalog APIs · Wikis & shared drives · Code comments · Docstrings · Slack threads · An engineer's memory 🧠

"Every agent builder is solving the same context-assembly problem from scratch, every catalog vendor is reinventing the same data models, and the knowledge itself is locked behind whichever surface created it."

● After — one portable format

📄 orders.md 📄 customers.md 📄 weekly_active_users.md 📄 refund_runbook.md

Knowledge is written once in a neutral format. Humans and agents read the same files. It ships as a tarball, lives in a git repo, and survives moving between tools, teams, and companies.

The inspiration · Andrej Karpathy's "LLM Wiki"

The big idea: let the AI keep the notes

OKF formalizes a pattern Karpathy described: instead of an AI re-reading raw documents every time you ask something, it maintains a living, cross-linked wiki — a compounding artifact that's compiled once and kept current.

1

Ingest

New sources get read, summarized, and folded into existing pages — updating entities, revising summaries, flagging contradictions.

2

Query

Questions are answered by searching the relevant wiki pages. Good answers can be filed back as brand-new pages.

3

Lint

Health checks hunt for contradictions, stale claims, orphans, and missing cross-references — keeping the knowledge honest.

"LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The bookkeeping that causes humans to abandon personal wikis is exactly what LLMs are good at."

— Andrej Karpathy, the LLM-Wiki gist that inspired OKF

The punchline: the boring part of a knowledge base isn't reading or thinking — it's bookkeeping. That's the chore humans quit and machines never tire of. OKF is the file format that makes that machine-maintained wiki portable.
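The Lint step above can be partly mechanized. The sketch below covers only the two structural checks, broken links and orphans. Contradictions and stale claims need judgment that a few lines of code cannot supply. The function name `lint_links` and the input shape (a mapping from file path to the set of paths it links to) are assumptions for illustration, as is the choice to exempt `index.md` and `log.md` from the orphan check.

```python
import posixpath

# Reserved filenames are listings and history, so they are not expected to have inbound links.
RESERVED = {"index.md", "log.md"}


def lint_links(graph: dict[str, set[str]]) -> dict[str, list]:
    """Report links to missing files and concept files that nothing links to."""
    known = set(graph)
    broken = sorted(
        (src, dst)
        for src, targets in graph.items()
        for dst in targets
        if dst not in known
    )
    linked_to = {dst for targets in graph.values() for dst in targets}
    orphans = sorted(
        path
        for path in known
        if path not in linked_to and posixpath.basename(path) not in RESERVED
    )
    return {"broken_links": broken, "orphans": orphans}


report = lint_links({
    "index.md": {"tables/orders.md"},
    "tables/orders.md": {"tables/customers.md"},  # customers.md does not exist yet
    "metrics/wau.md": set(),                      # nothing links here
})
```

A maintaining agent could run a check like this after every ingest and fix or flag whatever it reports.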

Under the hood

How a bundle is laid out

A bundle is just a directory tree. Each concept is one file; folders group related concepts; an optional index.md describes what's inside.

  sales-bundle/
sales/
├── index.md            # optional table of contents
├── log.md              # optional update history
├── datasets/
│   ├── index.md
│   └── orders_db.md
├── tables/
│   ├── index.md
│   ├── orders.md       # one concept = one file
│   └── customers.md
└── metrics/
    ├── index.md
    └── weekly_active_users.md

🆔 The Concept ID

A file's identity is just its path minus .md — so tables/orders.md is the concept tables/orders. Other files link to it by that path.

📑 Reserved filenames

index.md = a friendly listing for progressive disclosure. log.md = a dated history of changes. Everything else is a concept.

📦 Ships as nothing fancy

Pack it as a tarball, push it to GitHub (it even renders!), or drop it next to your code. No server, no runtime, no required SDK.
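The Concept ID and reserved-filename rules above fit in a few lines. This is a sketch under stated assumptions: `concept_id` is a hypothetical helper name, and treating `index.md` and `log.md` as reserved at every folder depth is inferred from the example tree rather than quoted from the spec.

```python
from pathlib import PurePosixPath
from typing import Optional

RESERVED = {"index.md", "log.md"}


def concept_id(path: str) -> Optional[str]:
    """Return the concept ID for a bundle-relative path, or None if it is not a concept."""
    p = PurePosixPath(path)
    if p.suffix != ".md" or p.name in RESERVED:
        return None  # not Markdown, or a reserved listing/history file
    return str(p.with_suffix(""))  # the path minus ".md"


paths = ["index.md", "log.md", "tables/index.md", "tables/orders.md", "metrics/weekly_active_users.md"]
ids = [cid for cid in map(concept_id, paths) if cid is not None]
# ids == ["tables/orders", "metrics/weekly_active_users"]
```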

Anatomy of one concept

What each part does

Every concept file has two parts: the YAML frontmatter (the structured header machines query) and the Markdown body (the rich docs humans read).

  tables/orders.md
---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?...
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---
# Schema
| Column      | Type   | Description                |
|-------------|--------|----------------------------|
| order_id    | STRING | Globally unique order id.  |
| customer_id | STRING | FK to [customers](/tables/customers.md). |

The lines between the --- markers are frontmatter: structured fields a program can query. Everything after is normal Markdown that a human reads.

Tip: only one field is actually required — type.
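As a rough illustration of the header/body split, the sketch below separates frontmatter from body using only the standard library. It is deliberately simplified: it reads flat `key: value` lines and keeps every value as a string, so it is not a YAML parser. A real consumer would more likely hand the header text to a proper YAML library such as PyYAML's `yaml.safe_load`. The function name `split_frontmatter` is hypothetical.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a concept file into (frontmatter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no header at all
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # header never closes: treat as no frontmatter
    fields = {}
    for line in lines[1:end]:
        # Split on the first colon only, so values such as timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return fields, body


doc = """---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
timestamp: 2026-05-28T14:30:00Z
---
# Schema
"""
fields, body = split_frontmatter(doc)
# fields["type"] == "BigQuery Table"; body == "# Schema"
```

Returning an empty dict for a missing or unterminated header, instead of raising, keeps the reader tolerant of malformed files.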

Design philosophy

Three rules that keep it simple

OKF is deliberately tiny. These principles are why it can be a shared standard instead of yet another platform.

01

Minimally opinionated

Only type is required. OKF won't force a taxonomy on you — producers define their own content models and even add custom fields.

02

Producer / consumer independence

The format is the contract; tooling is swappable. A human can write a bundle an agent consumes, or an agent can write one a human reads. Neither needs the other's tools.

03

A format, not a platform

OKF never requires a proprietary account, runtime, or SDK. It is vendor-neutral by design: the goal is a lingua franca for exchanging knowledge, like JSON or Markdown.

🧩 What OKF deliberately does NOT do: it doesn't replace domain schemas like Avro, Protobuf, or OpenAPI; it doesn't dictate where you store files; and consumers are expected to tolerate broken links and unknown types gracefully rather than crash. Forgiving by design.
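One way a consumer might honor the "forgiving by design" expectation is to fall back to a generic handler for unknown types and to report unresolved links rather than raise. The sketch below is only an illustration. The handler registry, the function names, and the summary strings are all hypothetical and not defined by OKF.

```python
from typing import Callable


def summarize_table(fields: dict[str, str]) -> str:
    return f"Table {fields.get('title', '(untitled)')}"


def summarize_generic(fields: dict[str, str]) -> str:
    # Fallback for any type this consumer has no special handling for.
    return f"{fields.get('type', 'Unknown')}: {fields.get('title', '(untitled)')}"


# Hypothetical registry: this consumer only knows one type specially.
HANDLERS: dict[str, Callable[[dict[str, str]], str]] = {
    "BigQuery Table": summarize_table,
}


def summarize(fields: dict[str, str]) -> str:
    """Pick a type-specific handler, falling back instead of failing."""
    handler = HANDLERS.get(fields.get("type", ""), summarize_generic)
    return handler(fields)


def resolve_links(targets: list[str], known_ids: set[str]) -> tuple[list[str], list[str]]:
    """Split link targets into (resolved, missing) without raising on broken links."""
    resolved = [t for t in targets if t in known_ids]
    missing = [t for t in targets if t not in known_ids]
    return resolved, missing
```

For example, `summarize({"type": "Playbook", "title": "Refunds"})` returns `"Playbook: Refunds"` even though no handler is registered for `Playbook`.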

Where it fits

OKF vs. the alternatives

OKF sits between dumping raw documents at an AI (classic RAG) and burying knowledge in a proprietary catalog.

| | Classic RAG | Proprietary catalog | OKF |
|---|---|---|---|
| Human-readable | Sometimes | Behind a UI | Always (plain Markdown) |
| Machine-readable | Yes | Via vendor API | Yes (YAML frontmatter) |
| Portable / vendor-neutral | Depends | Locked in | Fully portable |
| Lives in version control | Rarely | No | Yes (diff & review like code) |
| Persistent & compounding | Re-derived each query | Yes | Yes (a living wiki) |
| Setup cost | Vector DB + pipeline | Onboard a platform | Make a folder |

From spec to reality

What Google actually shipped

OKF isn't just a document — it launched with open-source tooling and example bundles to prove the idea.

🤖

An enrichment agent

Walks your BigQuery datasets, drafts OKF documents automatically, and enriches them with schemas and documentation — bootstrapping a bundle for you.

🕸️

A static HTML visualizer

Turns any OKF bundle into an interactive knowledge graph you can browse — and it needs no backend at all.

📚

Three sample bundles

GA4 e-commerce, Stack Overflow, and Bitcoin datasets — ready-made proof-of-concept knowledge bundles.

🌐

An open spec + catalog support

Published as open source on GitHub and wired into Google Cloud's Knowledge Catalog, with an explicit invitation for community implementations.

Cheat sheet

Key terms in plain English

The handful of words you need to sound fluent in OKF.

Bundle

A directory of Markdown files that together describe a body of knowledge — e.g. everything about your "sales" data. The unit you ship around.

Concept

A single thing worth knowing — a table, dataset, metric, API endpoint, or runbook — captured as one Markdown file inside a bundle.

Frontmatter

The small YAML block at the top of a file (between --- lines) holding structured, queryable fields like type, title, and tags. The machine-readable part.

Concept ID

A concept's address: its file path without the .md, so tables/orders.md becomes tables/orders. Other files link to it using this.

type (the only required field)

A free-text label categorizing the concept — "BigQuery Table", "Playbook", "API Endpoint". The one field every concept must have.

index.md / log.md (reserved files)

index.md is an optional human-friendly listing of a folder's contents; log.md is an optional dated history of updates. They're not concepts themselves.

Conformant bundle

A bundle that follows the rules: every non-reserved .md file has parseable frontmatter with a non-empty type, and reserved files follow their format. That's basically it.
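The conformance rule above is small enough to sketch as a check. This version covers only the first half of the rule (every non-reserved .md file has a closed frontmatter header with a non-empty `type`). It does not validate the format of reserved files, which this page does not spell out. The header scan is a simplification and not a full YAML parse, and the function names are hypothetical.

```python
import posixpath
from typing import Optional

RESERVED = {"index.md", "log.md"}


def frontmatter_type(text: str) -> Optional[str]:
    """Return the non-empty `type` value from a closed frontmatter header, else None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no header
    found = None
    for line in lines[1:]:
        if line.strip() == "---":
            return found  # header closed: report what was found
        key, sep, value = line.partition(":")
        if sep and key.strip() == "type":
            found = value.strip().strip("\"'") or None
    return None  # header never closed, so it is not parseable


def nonconformant(files: dict[str, str]) -> list[str]:
    """List the concept files that break the 'non-empty type' rule."""
    bad = []
    for path, text in sorted(files.items()):
        if not path.endswith(".md") or posixpath.basename(path) in RESERVED:
            continue  # only non-reserved Markdown files are concepts
        if frontmatter_type(text) is None:
            bad.append(path)
    return bad


problems = nonconformant({
    "tables/orders.md": "---\ntype: BigQuery Table\n---\n# Schema\n",
    "tables/index.md": "# Tables\n",            # reserved, so skipped
    "metrics/wau.md": "---\ntitle: WAU\n---\n",  # header present but no type
})
# problems == ["metrics/wau.md"]
```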

So… why should you care?

Because AI agents are only as good as the context you give them — and right now that context is trapped in silos. OKF makes that context writable once, readable by everyone (human or machine), and portable forever. It's not a database or a product. It's a humble, shared format — and that humility is exactly what lets it become a standard.