WebMEM™

The Protocol for Structuring, Delivering, and Conditioning Trust-Scored AI Memory on the Open Web

RFC-003 — Provenance Mapping Specification (v0.1)

A Standard for Declaring Source Dataset Metadata, Trust Layers, and Retrieval Scope

Metadata

  • rfc_id: RFC-003
  • title: Provenance Mapping Specification
  • status: Draft
  • version: 0.1
  • authors:
    • David W. Bynon (@TrustPublishing)
    • WebMEM Working Group
  • date_created: 2025-07-15
  • license: CC BY-SA 4.0
  • domain_scope: General (Dataset-Centric)
  • depends_on: RFC-001, RFC-002

Purpose

This specification defines the standard structure for declaring dataset-level provenance in a WebMEM Digest or any AI-ingestible, fragment-level memory object.

It supports:

  • Trust scoring
  • Dataset versioning
  • Citable metadata
  • Cross-format retrieval
  • PROV-O compatibility

Core Object: ProvenanceBlock

dataset_id: CMS_PBP_2025
dataset_title: CMS Plan Benefit Package 2025
dataset_type: source              # [source, derived, aggregate, inferred]
source_agency: CMS
license: Public domain
published: 2025-07-01
retrieved: 2025-07-06
dataset_home: https://cms.gov/pbp
dataset_archive: https://cms.gov/pbp/2025/files.zip
trust_layer: primary              # [primary, secondary, inferred]
trust_scope: semantic-digest      # e.g. fragment, plan, claim, model-training
confidence: 1.0
data_format: tsv
schema_format: internal-cms       # or 'semantic-digest-v0.1'
fields_covered:
  - in_premium
  - moop
  - in_specialist
  - in_primary
  - in_mc_dent_preventive
notes: >
  This dataset defines cost and benefit data for all MA plans for 2025.

Field Descriptions

Field            Description
dataset_id       Globally unique dataset token (e.g. CMS_PBP_2025)
dataset_title    Human-readable name of the dataset
dataset_type     Dataset classification: source, derived, aggregate, inferred
source_agency    Publishing agency or data originator
license          Usage license (e.g., Public Domain, CC BY)
published        Date dataset was officially published
retrieved        Date dataset was accessed for publishing
dataset_home     Landing page or documentation URL
dataset_archive  Direct archive or raw file download URL
trust_layer      Declared trust tier (primary, secondary, inferred)
trust_scope      Retrieval-level scope this dataset substantiates
confidence       Optional numeric confidence score (0.0–1.0)
data_format      Original format (e.g., csv, json, tsv)
schema_format    Schema structure used (internal or RFC-based)
fields_covered   Array of data_id tokens supported by this dataset
notes            Additional context, qualifiers, or limitations
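
The constraints listed in the field descriptions (enumerated values for dataset_type and trust_layer, a 0.0–1.0 range for confidence) can be checked mechanically. The following is a minimal sketch, not part of the specification: the function name validate_provenance_block is hypothetical, and it assumes the block has already been parsed into a Python dict, that dataset_id is the only required field, and that dates are written as ISO 8601 calendar dates as in the example block.

```python
from datetime import date

# Allowed values copied from the field descriptions in this RFC.
DATASET_TYPES = {"source", "derived", "aggregate", "inferred"}
TRUST_LAYERS = {"primary", "secondary", "inferred"}


def validate_provenance_block(block: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []

    # Assumption: dataset_id is the only field this sketch treats as required.
    if not block.get("dataset_id"):
        problems.append("dataset_id is missing")

    # Enumerated fields are checked only when they are present.
    for key, allowed in (("dataset_type", DATASET_TYPES), ("trust_layer", TRUST_LAYERS)):
        if key in block and block[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")

    # confidence is optional, but must fall in the documented range when given.
    confidence = block.get("confidence")
    if confidence is not None:
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            problems.append("confidence must be a number in the range 0.0-1.0")

    # Assumption: dates are ISO 8601 calendar dates, as in the example block.
    for key in ("published", "retrieved"):
        value = block.get(key)
        if value is not None and not isinstance(value, date):
            try:
                date.fromisoformat(str(value))
            except ValueError:
                problems.append(f"{key} is not an ISO 8601 date: {value!r}")

    return problems
```

Returning a list of problems rather than raising on the first one lets a publisher report every issue in a descriptor in a single pass.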

Why This Matters

A single ProvenanceBlock enables:

  • Trust scoring by AI agents
  • Source traceability for derived claims
  • Explainable citation behavior in agentic systems
  • W3C PROV-O compatibility for structured trust lineage

Format Use Cases

Use Case                 Format
Digest metadata          YAML
Semantic web ingestion   Turtle (.ttl)
Trust trace graphs       PROV-O
Retrieval conditioning   JSON-LD
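
As an illustration of the JSON-LD and PROV-O rows, the sketch below renders a parsed ProvenanceBlock as a JSON-LD document typed as a prov:Entity. This RFC does not define a term mapping, so every property choice here (dcterms:title, dcterms:issued, dcterms:license, prov:wasAttributedTo) is an assumption, as are the function name to_jsonld and the example.org base IRI used to mint identifiers.

```python
import json

# Hypothetical base IRI for dataset identifiers; the RFC does not define one.
BASE_IRI = "https://example.org/datasets/"

CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "dcterms": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def to_jsonld(block: dict) -> dict:
    """Map a ProvenanceBlock dict to a JSON-LD document (assumed mapping)."""
    doc = {
        "@context": CONTEXT,
        "@id": BASE_IRI + block["dataset_id"],
        "@type": "prov:Entity",
        "dcterms:identifier": block["dataset_id"],
    }
    if "dataset_title" in block:
        doc["dcterms:title"] = block["dataset_title"]
    if "license" in block:
        doc["dcterms:license"] = block["license"]
    if "published" in block:
        doc["dcterms:issued"] = {"@value": str(block["published"]), "@type": "xsd:date"}
    if "source_agency" in block:
        # The publishing agency is modelled as the agent the entity is attributed to.
        doc["prov:wasAttributedTo"] = {
            "@type": "prov:Organization",
            "rdfs:label": block["source_agency"],
        }
    return doc


if __name__ == "__main__":
    example = {
        "dataset_id": "CMS_PBP_2025",
        "dataset_title": "CMS Plan Benefit Package 2025",
        "source_agency": "CMS",
        "license": "Public domain",
        "published": "2025-07-01",
    }
    print(json.dumps(to_jsonld(example), indent=2))
```

WebMEM-specific fields such as trust_layer and trust_scope are left out because they have no obvious PROV-O counterpart; a real mapping would need a vocabulary defined by the working group.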

Example: Digest Inclusion

provenance:
  - dataset_id: CMS_Landscape_2025
    dataset_title: CMS MA Landscape 2025
    ...
  - dataset_id: CMS_PBP_2025
    trust_layer: primary
    fields_covered:
      - in_premium
      - moop
      - in_specialist
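
Once a digest carries a provenance list like the one above, a consumer can ask which declared datasets substantiate a given field. The sketch below assumes the digest has been parsed into a dict with a top-level provenance key; the helper name datasets_covering is hypothetical. The CMS_Landscape_2025 entry is reproduced only with the fields shown above, since the rest of it is elided in the example.

```python
def datasets_covering(digest: dict, field_id: str) -> list[str]:
    """Return the dataset_id of every provenance entry that lists field_id."""
    return [
        entry["dataset_id"]
        for entry in digest.get("provenance", [])
        # Entries without fields_covered are treated as covering nothing.
        if field_id in entry.get("fields_covered", [])
    ]


if __name__ == "__main__":
    digest = {
        "provenance": [
            {"dataset_id": "CMS_Landscape_2025", "dataset_title": "CMS MA Landscape 2025"},
            {
                "dataset_id": "CMS_PBP_2025",
                "trust_layer": "primary",
                "fields_covered": ["in_premium", "moop", "in_specialist"],
            },
        ]
    }
    print(datasets_covering(digest, "moop"))  # prints ['CMS_PBP_2025']
```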

Suggested Dataset Registry

For large-scale verticals (e.g., healthcare, energy, education), publishers are encouraged to maintain a public dataset registry. Each entry should include a descriptor file in .yaml or .json format that conforms to RFC-003.

Recommended structure:

/datasets/
  ├── CMS_PBP_2025.yaml
  ├── HHS_Eligibility_2024.yaml
  └── MedicareEnrollmentRates_2023.json
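
A registry laid out this way can be indexed with a directory scan. The sketch below is one possible approach, not a defined API: the function names are hypothetical, and it assumes that each descriptor's file name stem equals its dataset_id, as the file names above suggest. Only JSON descriptors are parsed, because the Python standard library has no YAML parser; YAML descriptors are indexed but would need a third-party parser to load.

```python
import json
from pathlib import Path

DESCRIPTOR_SUFFIXES = {".yaml", ".json"}


def index_registry(root: str) -> dict[str, Path]:
    """Map each dataset_id (taken from the file name stem) to its descriptor path."""
    return {
        path.stem: path
        for path in sorted(Path(root).iterdir())
        if path.is_file() and path.suffix in DESCRIPTOR_SUFFIXES
    }


def load_json_descriptor(path: Path) -> dict:
    """Load a .json descriptor and check it against its file name."""
    descriptor = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: the file name stem must match the declared dataset_id.
    if descriptor.get("dataset_id") != path.stem:
        raise ValueError(f"{path.name}: dataset_id does not match file name")
    return descriptor
```

Calling index_registry("datasets") on the layout above would yield three keys, one per descriptor file.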

Canonical Reference

RFC-003 is maintained at webmem.com/rfc/rfc-003/ and versioned in the WebMEM RFC Registry.

Copyright © 2025 · David Bynon