Pilot Design Architecture

Project: UN/CEFACT GRID Pilot

Version: 5.1 (Refactored Ingestion)

Status: Operational / Ready for Scale

1. Executive Summary

The Pilot Design is intended as a representative implementation of the GRID "logical model". It explores key elements of that model, such as a discovery service supported by a decentralised and loosely coupled approach that retains sovereignty and resilience.

We chose to constrain the pilot to the resources already available from the environment of the UNICC Gitlab instance and then to consider how this might work with Registrars and their potential use of gitlab.com and/or github.com repositories.

The intent is that this design is low (to zero) cost and simple enough to enable quick set up and use, but useful enough to help drive real insights into the requirements for the GRID operational design, and for the GRID participants.

At the time of writing, the operational design and implementation of the GRID is expected to be subject to an appropriate UN request and procurement process. The pilot design presented here is intended solely to explore how different approaches might work, not to dictate the chosen design.

The logical model for GRID is shown below.

Figure: Logical Model (context model of the Global Registrar Information Directory)

The GRID Pilot is a decentralized system where "Sovereign Nodes" (countries) manage their own data in their own repositories. A UNICC-hosted build process "harvests" the data shared by country registrars, but only includes entries that are cryptographically signed by the country's registered Identity Key. The pilot uses the built-in capability of Git to version content, sign commits, and verify those signatures with cryptographic keys. GPG keys are used to enable more sophisticated key management processes.

Key objectives of the pilot are to explore and demonstrate how the design enables the GRID to stay "lightweight" by only harvesting metadata, while the Authoritative Registrars maintain control over their data and the lifecycle of the DIAs they issue.

2. Key Design Decisions

2.1 The Resilient Harvest Model

The pilot uses a "harvest" approach which simulates a global directory that does not "own" data but "points" to it. This supports the objective that even if the UNICC GRID service fails, the Sovereign Nodes retain their service capability as the primary records of truth.

Instead of a centralized database where Registrars log in to edit rows, we use a Pull ("harvest") Model.

  • Decision: We do not host the data source; we mirror it to enable metadata discovery. Registrars remain responsible for the integrity and hosting of their data.

  • Benefit: Sovereignty. If UNICC GRID goes offline, the Sovereign Node's data remains accessible and verifiable in their own repository.

Importantly, this approach is consistent with the issuer holding linked data model proposed by the UNTP Specification.

2.2 Dual-Stack Verification (SSH & GPG)

For the pilot design, we support two verification methods to accommodate different infrastructure maturity levels.

  • Internal Bootstrap Nodes: Verified via SSH Signatures against a local allowed_signers file (Native Git Verify). This is so that we can rapidly make changes to our dummy/trial entries within the GTR Gitlab environment.

  • External Sovereign Nodes: Verified via GPG Signatures against a Keyring (Harvester Verify). These will be the nation-state pilot participants (e.g. Spain, Canada, India, Netherlands etc.)

Explanation: SSH keys are primarily used to authenticate access to remote systems, although Git can also use them to sign commits. GPG (GNU Privacy Guard) keys are designed for signing and encrypting data (Git commits and tags, emails, files) and carry their own identity, expiration, and revocation metadata, optionally within a "web of trust". SSH signatures are simpler and often reuse existing keys, but the keys themselves carry no expiration or revocation information. This makes GPG the better fit for exercising key management in the pilot, and more versatile for supply chain security.

Design objectives:

  1. The pilot should demonstrate that updating a company's status (Revocation) is a Local Registrar Action that requires zero interaction with the central GRID Hub. Only a change in the Registrar's own "Identity Anchor" data triggers a GRID update.

  2. The design should show a separation between registrar keys used for the GRID interactions and those used for DIA issuance and other registrar operations.

2.3 Internal design consequence - "Sidecar" Verification (Chain of Custody)

Having decided to use a Git-based model and a pull (harvest) mechanism, we faced a challenge: Git signatures are attached to commits and stored in the .git folder, not in the files themselves. When files are moved to the build stage without their repository, the signature is lost.

  • Solution: We verify at source (during a clone operation) and generate a .sig "sidecar" file.

  • Mechanism: companies.md is paired with companies.sig.

  • Result: The build script trusts the sidecar, maintaining the chain of custody across pipeline stages.
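The sidecar step can be sketched as follows. This is a minimal illustration, assuming the harvester records the signature status (`%G?`) and signing key (`%GK`) reported by `git log` in a one-line text file. The `status|keyId` layout and the function names are hypothetical and are not taken from the pilot's scripts.

```javascript
const { execFileSync } = require('node:child_process');
const fs = require('node:fs');

// Parse a sidecar line back into { status, keyId }.
// An empty or malformed sidecar is treated as "N" (no signature).
function parseSidecar(text) {
  const [status = '', keyId = ''] = text.trim().split('|');
  return { status: status || 'N', keyId };
}

// Serialise a verification result into the (assumed) one-line sidecar format.
function formatSidecar({ status, keyId }) {
  return `${status}|${keyId}\n`;
}

// Ask git for the signature status and signing key of the last commit
// that touched `file` inside the freshly cloned repository `repoDir`.
function readCommitSignature(repoDir, file) {
  const out = execFileSync(
    'git',
    ['log', '-1', '--format=%G?|%GK', '--', file],
    { cwd: repoDir, encoding: 'utf8' }
  );
  return parseSidecar(out);
}

// Write companies.sig next to companies.md (hypothetical helper).
function writeSidecar(mdPath, result) {
  const sigPath = mdPath.replace(/\.md$/, '.sig');
  fs.writeFileSync(sigPath, formatSidecar(result));
  return sigPath;
}
```

Because the verification happens while the .git data is still present, the later build stage only needs `parseSidecar` and never touches the source repository.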

2.4 Delegated Trust (One Key, Many Registrars)

We use a simple Root of Trust model, where one DID is used per country and a country may have many registrars. This provides simplicity for the Pilot and mirrors an effect that we would like to see in operation, where Registers are verifiably under the control of the nation-state authority.

  • Decision: The Sovereign DID (e.g., did:web:...:genovia) and its associated GPG Key represent the Country - in this case the fictitious land of Genovia.

  • Mechanism: That single key signs multiple files (e.g., companies.md, land.md).

  • Outcome: The GRID table shows multiple registrars for the same country, all verified by the same Sovereign Anchor.

2.5 Design Consequence - Subfolder Ingestion Strategy

Given how we are replicating and presenting files on the website (using Docusaurus), to prevent filename collisions (e.g., both UK and France submitting companies.md), we handle namespacing at the ingestion level.

  • Method: The Harvester sorts files into per-country folders (e.g., external/GEN/companies.md) and prefixes each filename with the country code if it is not already present.

  • Benefit: The Generator script treats Internal and External files identically, removing logic duplication and "double-prefix" bugs.
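A minimal sketch of the prefixing rule, assuming the country code has already been read from the file's frontmatter. The function name `prefixedName` is illustrative only.

```javascript
const path = require('node:path');

// Return the published filename for a harvested file, adding the country
// code prefix only when it is not already present (avoids "GEN_GEN_...").
function prefixedName(filePath, countryCode) {
  const base = path.basename(filePath);
  const prefix = `${countryCode}_`;
  return base.startsWith(prefix) ? base : prefix + base;
}
```

Applying the function twice gives the same result as applying it once, which is what removes the "double-prefix" class of bug regardless of whether a file arrives from the internal or external folder.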


3. Technical Implementation

3.1 The Harvester (.gitlab-ci.yml)

The pipeline runs on a schedule to ingest external data.

  1. Inject Credentials: Uses git-credential-store to clone private sovereign repos.

  2. Import Key & Trust: Imports the Sovereign's Public GPG Key and applies signature checks against the pulled content.

  3. File naming and placing:

    • Reads country_code from the harvested file frontmatter (e.g., "GEN").

    • Creates a directory registrars/external/GEN/.

    • Copies the verified file and its .sig sidecar into that directory.
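The file naming and placing step could look like the sketch below. It assumes the country code is two or three uppercase letters (which also stops a malicious value such as `../` from escaping the target folder) and uses a deliberately simple line-based reader rather than a full YAML parser. All function names are hypothetical.

```javascript
const path = require('node:path');

// Extract the YAML frontmatter block between the leading '---' lines.
function frontmatterBlock(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  return match ? match[1] : null;
}

// Read a single top-level scalar such as country_code: "GEN".
// A production script would use a YAML library instead of this shortcut.
function readScalar(block, key) {
  for (const line of block.split(/\r?\n/)) {
    const match = /^([A-Za-z_]+):\s*(.*)$/.exec(line);
    if (match && match[1] === key) {
      return match[2].replace(/\s+#.*$/, '').trim().replace(/^"|"$/g, '');
    }
  }
  return null;
}

// Work out where a harvested file belongs, e.g. registrars/external/GEN/.
function destinationFor(markdown, fileName, root = 'registrars/external') {
  const block = frontmatterBlock(markdown);
  const code = block && readScalar(block, 'country_code');
  if (!code || !/^[A-Z]{2,3}$/.test(code)) {
    throw new Error(`Missing or invalid country_code in ${fileName}`);
  }
  return path.posix.join(root, code, fileName);
}
```

The same destination directory would also receive the .sig sidecar, so the two files stay paired.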

3.2 The Generator (generate-grid.js)

The script acts as the verification and publishing engine.

Future Improvements

This approach can and will be improved. A future iteration will pull data only from registers with updates, and will update only the files that have changed. The current release (5.1) cleans and rebuilds everything on every run. This is inefficient, and it means that a Registrar that is temporarily unavailable during the harvest and build disappears from the GRID for that build.

The operational system (and preferably the pilot) needs a better mechanism to manage the list of Registrars to pull data from.

The current release of the script performs the following steps:

  1. Hygiene: Cleans the docs/registrars folder on every run to remove "ghost files" from previous runs.

  2. Recursive Scan: Recursively finds markdown files in both internal/ and external/ subfolders.

  3. Strict Mode Verification:

    • Internal: Checks git log for SSH signature status G.

    • External: Checks .sig sidecar for GPG status G.

    • Rejection: Any status other than G (Good) is marked as "Unverified" or "Bad".

  4. Unified Renaming:

    • Applies the Country Code prefix to all files at the final stage.

    • Input: internal/atlantis/companies.md -> Output: ATL_companies.md

    • Input: external/GEN/companies.md -> Output: GEN_companies.md
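The recursive scan and strict-mode check can be sketched as below. The status letters are those printed by git's `%G?` format placeholder. The badge names and the split between "Bad" and "Unverified" are assumptions, since the rule above only states that anything other than G is rejected.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Recursively collect every Markdown file under `dir`
// (covers both the internal/ and external/ subfolders).
function findMarkdown(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findMarkdown(full));
    } else if (entry.isFile() && entry.name.endsWith('.md')) {
      found.push(full);
    }
  }
  return found.sort();
}

// Map a signature status letter to the badge shown in the GRID table.
// Assumption: "B" (bad signature) becomes "Bad"; every other non-G status
// (for example U, X, Y, R, E, N) becomes "Unverified".
function trustBadge(status) {
  if (status === 'G') return 'Verified';
  if (status === 'B') return 'Bad';
  return 'Unverified';
}
```

Keeping the classification in one function means internal (SSH) and external (GPG sidecar) entries are judged by the same rule once their status letter is known.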

3.3 The Frontend (Docusaurus)

  • Build Time: The formatted Markdown files are copied into the Docusaurus docs/ folder.

  • Display: The grid-directory.md page is built by the script and renders a sortable table with the Trust Badges displayed.

  • Linkage: Clicking a registrar name opens the detailed page (e.g., docs/registrars/GEN_companies.md), whose content was created by the Country Registrar and copied from its repository during the clone operation.


4. Operational Guide for Ops Team

Adding a New Pilot Country

To onboard a new participant (e.g., Australia), the UNICC team:

  1. Receive Secrets: Gets the Repo URL, Deploy Token (User/Pass), and Public Key (.asc) from the country lead.

  2. Update Variables: Adds AUS_USER and AUS_PASS to the GitLab CI/CD Variables.

  3. Update Pipeline: Edits .gitlab-ci.yml:

    • Add a git credential approve block for the new user.

    • Add a git clone block for the new repo.

    • Add the keys/australia.asc file to the repo.

    • Add the import logic to the script.

  4. Commit: The next pipeline run will automatically harvest, verify, and display the Australian nodes.
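The naming conventions used in these steps (AUS_USER, AUS_PASS, keys/australia.asc) could be derived from a single participant record rather than copied by hand. The sketch below is only an illustration of that idea. The record shape, the function name, and the example repository URL are hypothetical and do not describe the current pipeline.

```javascript
// Derive the per-participant pipeline settings from one record.
// `code` is the country code used for CI/CD variable names,
// `name` is the country name used for the public key filename.
function participantSettings({ code, name, repoUrl }) {
  const slug = name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
  return {
    userVariable: `${code}_USER`, // CI/CD variable holding the deploy token user
    passVariable: `${code}_PASS`, // CI/CD variable holding the deploy token secret
    keyFile: `keys/${slug}.asc`,  // public GPG key committed to the hub repo
    repoUrl,
  };
}

// Example record (the URL is a placeholder, not a real pilot repository).
const australia = participantSettings({
  code: 'AUS',
  name: 'Australia',
  repoUrl: 'https://gitlab.com/example/aus-registrar.git',
});
```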

Future Improvements

This is another area for design improvement. Adding a new country is a very manual process (several cut-and-paste steps), and a production environment would clearly improve on this. However, this is a pilot, and the aim is to surface these requirements so that they can be specified for the operational system.

5. Detailed Design

This architecture enables a pilot of the UN/CEFACT GTR Project "Global Registrar Information Directory" (GRID). It simulates a trust ecosystem using standard Git cryptography and CI/CD automation.

The system uses a hybrid data model of dummy countries and active participants:

  1. Bootstrapped Data: Fictitious countries (Atlantis, Gondal, Ruritania) hosted internally for testing and demonstration.

  2. Sovereign Data: Real pilot participants (e.g., Spain, Canada, Netherlands, UK, Australia) hosting data in their own controlled repositories, which are "harvested" by the central system.

5.1 Overview

The system functions as a Static Site Generator (Docusaurus) and build scripts that aggregate data from multiple sources into a single verifiable directory.

5.2 The Components

  • The GRID Hub (UNICC): The central Docusaurus project. It hosts the website, the fictitious DIDs, and the aggregation pipeline.

  • The Sovereign Nodes (GitLab.com, GitHub should also work): Private projects owned by Pilot Registrars. They contain only their specific registrar data.

  • The Identity Layer (GPG): Participants sign their data commits with GPG keys, creating a verifiable "Digital Identity Anchor" within the Git log.

Note that we are choosing to use GPG keys where possible since this will let us test revocation, rotation and other key management functions.

5.3 Bootstrapping Strategy (Fictitious Countries)

To test the system immediately, we establish three internal fictitious entities (Atlantis, Gondal and Ruritania) and one fictitious external entry (Genovia).

5.4 DID Hosting (The static Folder)

For the pilot we are using the did:web path-based standard. Docusaurus publishes files in the /static directory verbatim, allowing us to host raw JSON DID documents.

Directory Structure (UNICC Repo):

/static/pilots/
├── atlantis/
│   └── did.json
├── gondal/
│   └── did.json
└── ruritania/
    └── did.json

The resolution URL for the Atlantis DID becomes:

https://un.opensource.unicc.org/unece/uncefact/gtr/pilots/atlantis/did.json

And the DID string is:

did:web:un.opensource.unicc.org:unece:uncefact:gtr:pilots:atlantis
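The mapping between the DID string and its resolution URL follows the did:web method rules: colons after the domain become path separators, a DID with no path resolves under /.well-known/, and a percent-encoded colon in the domain denotes a port. A minimal sketch (the function name is illustrative):

```javascript
// Convert a did:web identifier into the HTTPS URL of its DID document.
function didWebToUrl(did) {
  const prefix = 'did:web:';
  if (!did.startsWith(prefix)) {
    throw new Error(`Not a did:web identifier: ${did}`);
  }
  const [domain, ...segments] = did.slice(prefix.length).split(':');
  const host = decodeURIComponent(domain); // "%3A" in the domain marks a port
  const pathPart = segments.length > 0 ? segments.join('/') : '.well-known';
  return `https://${host}/${pathPart}/did.json`;
}
```

Passing the Atlantis DID string above to this function returns the resolution URL shown for Atlantis.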

5.5 Data Storage

The Markdown data for these countries is stored directly in the central repository to simplify management.

Location: /registrars/internal/

Files:

  • atlantis.md

  • gondal.md

  • ruritania.md

5.5 The Sovereign Node Strategy (Real Pilots and the Genovia example)

Real participants should have data sovereignty. This will require unique/distinct git repositories that they manage.

5.5.1 External Repository Setup

  • Host: GitLab.com (Public Cloud). (GitHub should work too)

  • Visibility: Private (initially) or Public.

  • Authorization: The project owner creates a Group Deploy Token (read-only) and provides the credentials to the UNICC Pilot Lead. This is to enable the "Harvest" operation.

5.5.2 Data Storage

The participant manages a single file in their own repo: registrar-info.md.

5.6 Data Specification (Markdown + Frontmatter)

This format bridges the gap between machine-readable data (YAML) and human-readable context (Markdown).

Example markdown

---
# GTR Pilot Metadata
country_code: "GEN"
registrar_name: "Royal Ministry of Genovia Companies"
did: "did:web:un.opensource.unicc.org:pilots:genovia"

# THE SOVEREIGN ANCHOR
authoritative_identifier_scheme: "GEN-REG-101"
signing_key_id: "2EBA5339DBEE1079D295A3DE7782530389BB46E3"

# TRUST ATTRIBUTION POLICY (Pilot Demonstration)
recognized_schemes:
  - scheme: "GS1_GLN"
    trust_level: "Asserted" # The Registrar allows companies to list these but doesn't vet them.
  - scheme: "ISO_17442_LEI"
    trust_level: "Verified" # The Registrar cross-checks these against the GLEIF database.
---

## Registrar Verification Policy
This section describes how the Genovian Registrar protects itself from liability while supporting
global supply chain standards.
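A sketch of how the metadata in this example might be checked, assuming the frontmatter has already been parsed into a plain object (for example by a YAML library). Which fields are mandatory, and the restriction of trust_level to the two values shown in the example, are assumptions made for illustration.

```javascript
// Assumed mandatory fields and allowed trust levels (taken from the example).
const REQUIRED_FIELDS = ['country_code', 'registrar_name', 'did', 'signing_key_id'];
const TRUST_LEVELS = ['Asserted', 'Verified'];

// Return a list of human-readable problems; an empty list means "valid".
function validateMetadata(meta) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof meta[field] !== 'string' || meta[field].trim() === '') {
      errors.push(`missing field: ${field}`);
    }
  }
  if (typeof meta.did === 'string' && !meta.did.startsWith('did:web:')) {
    errors.push('did must use the did:web method');
  }
  // The example signing_key_id is a 40-character hexadecimal fingerprint.
  if (typeof meta.signing_key_id === 'string' &&
      !/^[0-9A-Fa-f]{40}$/.test(meta.signing_key_id)) {
    errors.push('signing_key_id must be a 40-character hex fingerprint');
  }
  for (const entry of meta.recognized_schemes ?? []) {
    if (!TRUST_LEVELS.includes(entry.trust_level)) {
      errors.push(`unknown trust_level for ${entry.scheme}: ${entry.trust_level}`);
    }
  }
  return errors;
}
```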

5.7 The Aggregation Logic (The Scripts)

The logic must handle two distinct data sources (local/internal and external).

5.7.1 The Directory Structure

The Central Project (gtr) organizes input data as follows:

/
├── static/                  # Hosted DID Docs
├── registrars/
│   ├── internal/            # Fictitious Markdown files (Committed here)
│   └── external/            # Real Pilot files (Cloned here by CI)
└── scripts/
    └── generate-grid.js     # The Logic Engine

5.7.2 The Generator Script (scripts/generate-grid.js)

This script scans both folders, validates YAML, checks GPG signatures, and builds the table.

See the GTR GitLab repo for the current build version of this script: generate-grid.js
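As an illustration of the final "builds the table" step only, the sketch below turns a list of already verified entries into a Markdown table. The column layout, the entry field names, and the function names are assumptions; the repository script is the authoritative version.

```javascript
// Escape pipe characters so a registrar name cannot break the table layout.
function escapeCell(text) {
  return text.replace(/\|/g, '\\|');
}

// Build the Markdown table for the directory page from verified entries.
// Each entry: { countryCode, registrarName, link, did, badge }.
function buildGridTable(entries) {
  const header = [
    '| Country | Registrar | DID | Trust |',
    '| --- | --- | --- | --- |',
  ];
  const rows = [...entries]
    .sort((a, b) =>
      a.countryCode.localeCompare(b.countryCode) ||
      a.registrarName.localeCompare(b.registrarName))
    .map((e) =>
      `| ${e.countryCode} | [${escapeCell(e.registrarName)}](${e.link}) | ${e.did} | ${e.badge} |`);
  return [...header, ...rows].join('\n') + '\n';
}
```

Sorting by country code and then registrar name keeps multiple registrars for the same country on adjacent rows.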

5.7.3 The Pipeline (.gitlab-ci.yml)

Configured to clone each instance of external Pilot participant data.

See the UNICC GTR GitLab repo for the current version of this file: [.gitlab-ci.yml](https://opensource.unicc.org/un/unece/uncefact/gtr/-/blob/main/.gitlab-ci.yml)

Note: This is another obvious area for improvement. Adding a new Pilot participant requires copying, pasting, and updating a chunk of the .gitlab-ci.yml file. The operational solution would require a more elegant approach to managing participants.

6. Future Improvements

While the pilot design is necessarily limited, there are a number of future improvements that may be considered:

  • Automated Membership Management: Transitioning from manual .yml updates to an Automated Onboarding Portal where Registrars submit their own Public Keys and metadata. Note that this requires a logical separation of the eligibility processing of applications to join the GRID, and the onboarding of successful applications.

  • Cost-Recovery Simulation: Tracking the "Compute/Storage Credits" used by each node to model the future annual membership fee (the ICAO-style fee calculation).

  • Resilience Testing: A "Chaos Engineering" test where the central Hub is intentionally taken offline to prove that DIAs held by supply chain participants remain verifiable locally.