IOTA-1 semantic interlingua

The Iota Protocol

IOTA-1 studies whether mutable language expressions can converge on reviewable concept evidence while staying grounded in public ISO/IEC 10646 rendering. The target is bounded semantic isomorphism: measured preservation of task-relevant relations, not exact universal translation.

Pipeline

Layered conversion

Source text is normalized to NFC, annotated with locale/script/direction metadata, segmented phrase-first, embedded, neutralized, resolved against the concept registry, and then rendered as public Unicode candidates.
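
The first two stages can be sketched with the Python standard library. This is only an illustration under assumptions: the names PreparedText, guess_direction, and prepare are hypothetical, the direction heuristic (first strong bidi character) is a simplification, and the later stages (segmentation, embedding, neutralization, registry resolution, rendering) are not shown.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class PreparedText:
    text: str       # NFC-normalized source text
    locale: str     # locale tag supplied by the caller, stored as-is
    direction: str  # "rtl" or "ltr", guessed from the first strong character


def guess_direction(text: str) -> str:
    """Return "rtl" if the first strongly directional character is R or AL."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"  # no strong character found; default to left-to-right


def prepare(text: str, locale: str) -> PreparedText:
    """Normalize to NFC and attach locale/direction metadata."""
    normalized = unicodedata.normalize("NFC", text)
    return PreparedText(normalized, locale, guess_direction(normalized))
```

With this sketch, a decomposed "e" plus combining acute accent becomes the single precomposed character after prepare, so later stages see one spelling of the same text.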

Converter

Multilingual workbench

The language converter now exposes source language, locale, top-K ranking, evidence, vector preview, quantized payload opt-in, public Unicode enforcement, and the selected concept authority in one place.

Concept registry

Opaque IDs first

Concepts use stable IDs such as C0001842. Human labels, aliases, and glyphs are evidence and display material, not the system of record.
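
A minimal sketch of the "opaque IDs first" idea, assuming a simple in-memory registry. ConceptRecord, ConceptRegistry, and their fields are hypothetical names for illustration, and the greeting label attached to C0001842 in the usage note is an assumption, not the project's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptRecord:
    concept_id: str                              # opaque, stable key: the system of record
    labels: dict = field(default_factory=dict)   # language -> display label (mutable)
    aliases: set = field(default_factory=set)    # surface forms kept as evidence


class ConceptRegistry:
    def __init__(self):
        self._records = {}

    def add(self, record: ConceptRecord) -> None:
        # IDs are never reused or overwritten.
        if record.concept_id in self._records:
            raise ValueError(f"duplicate concept ID: {record.concept_id}")
        self._records[record.concept_id] = record

    def get(self, concept_id: str) -> ConceptRecord:
        return self._records[concept_id]

    def find_by_alias(self, alias: str) -> list:
        # Aliases only suggest candidates; the ID stays the identity.
        return sorted(cid for cid, rec in self._records.items() if alias in rec.aliases)
```

For example, a record could be added as ConceptRecord("C0001842", {"en": "greeting"}, {"Hello", "Hola"}). Renaming the label later changes only display material; lookups by ID keep working.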

Vectors

Raw differs, canonical holds

Raw expression vector hashes may differ for Hello, Hola, Bonjour, こんにちは, 你好, and مرحبا. After concept resolution they may share the same selected concept ID and canonical vector hash when the profile accepts that broad greeting equivalence.
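
The distinction can be illustrated with toy data. Everything in this sketch is an assumption for demonstration: the three-dimensional vectors are invented, the hashing scheme (SHA-256 over packed float32 values) is one possible choice rather than the project's, and the RESOLVED table stands in for real concept resolution.

```python
import hashlib
import struct


def vector_hash(vector) -> str:
    """Hash a vector by packing it as little-endian float32 values."""
    packed = struct.pack(f"<{len(vector)}f", *vector)
    return hashlib.sha256(packed).hexdigest()[:16]


# Invented raw expression vectors: close, but not identical.
RAW = {
    "Hello": [0.91, 0.10, 0.02],
    "Hola": [0.88, 0.15, 0.05],
}

# Stand-in for resolution: both expressions select the same concept.
RESOLVED = {"Hello": "C0001842", "Hola": "C0001842"}

# One canonical vector per concept ID (also invented).
CANONICAL = {"C0001842": [0.90, 0.12, 0.03]}


def canonical_identity(expression: str):
    """Return (concept ID, canonical vector hash) for a resolved expression."""
    concept_id = RESOLVED[expression]
    return concept_id, vector_hash(CANONICAL[concept_id])
```

Here the raw hashes for "Hello" and "Hola" differ, while canonical_identity returns the same pair for both because the hash is taken over the concept's canonical vector rather than the expression's raw vector.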

Neutralization

Language signal reduction

The first profile, identity-v1, applies when no trained centering or projection profile is configured. Even then, the response records whether neutralization ran and which profile version was used.
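
A minimal sketch of that fallback, under stated assumptions: CenteringProfile and neutralize are hypothetical names, "centering" is modeled as subtracting a per-language mean vector, and reporting ran=False for identity-v1 is a guess at how the response flag behaves.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CenteringProfile:
    version: str                     # e.g. a trained profile's version label
    language_mean: Sequence[float]   # mean vector to subtract for this language


def neutralize(vector, profile: Optional[CenteringProfile] = None):
    """Return (vector, record); the record always names the profile version."""
    if profile is None:
        # identity-v1: pass the vector through unchanged (assumed to report ran=False).
        return list(vector), {"ran": False, "version": "identity-v1"}
    centered = [v - m for v, m in zip(vector, profile.language_mean)]
    return centered, {"ran": True, "version": profile.version}
```

The point of the design is that the record is produced on both paths, so a reviewer can always tell which profile touched a vector.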

Quantization

Lossy and versioned

Quantized payloads are optional, profiled, and lossy. They can help compact transport, but they do not replace concept IDs, provenance, or evidence.
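
To make "lossy and versioned" concrete, here is a small sketch of symmetric 8-bit quantization. The scheme and the profile name "int8-demo" are assumptions chosen for illustration; the source does not state which quantization method IOTA-1 uses.

```python
def quantize(vector, levels: int = 127):
    """Map floats to integers in [-levels, levels], scaled by the largest magnitude."""
    peak = max(abs(v) for v in vector) or 1.0  # avoid dividing by zero for all-zero input
    codes = [round(v / peak * levels) for v in vector]
    # The profile name travels with the payload so decoders know how to read it.
    return {"profile": "int8-demo", "peak": peak, "codes": codes}


def dequantize(payload, levels: int = 127):
    """Approximate the original vector; rounding error is not recoverable."""
    return [code / levels * payload["peak"] for code in payload["codes"]]
```

Round-tripping a vector such as [0.5, -0.2, 0.123] returns values close to, but not equal to, the input, which is why the payload cannot stand in for the concept ID, provenance, or evidence.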

Unicode safety

Public mode rejects PUA

Normal public-symbol mode rejects private-use characters, unsafe controls, hidden bidi controls, and zero-width risk patterns. Private-use characters are not a public IOTA meaning channel.
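
A rough sketch of such a check using Python's unicodedata module. The specific code point sets and the function name unicode_safety_issues are assumptions: the source does not list which controls or zero-width patterns are rejected, and a production check would need script-aware handling (for example, zero-width joiners are legitimate in some scripts and emoji sequences).

```python
import unicodedata

# Explicit bidi embedding, override, isolate, and mark characters.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069\u200e\u200f\u061c")
# Zero-width characters that can hide content or split tokens.
ZERO_WIDTH = set("\u200b\u200c\u200d\u2060\ufeff")
# Control characters tolerated in ordinary text.
ALLOWED_CONTROLS = {"\n", "\t"}


def unicode_safety_issues(text: str):
    """Return a list of (index, reason) for characters public mode would flag."""
    issues = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category == "Co":  # private-use code points
            issues.append((index, "private-use"))
        elif ch in BIDI_CONTROLS:
            issues.append((index, "bidi-control"))
        elif ch in ZERO_WIDTH:
            issues.append((index, "zero-width"))
        elif category == "Cc" and ch not in ALLOWED_CONTROLS:
            issues.append((index, "unsafe-control"))
    return issues
```

An empty list means the text passes this simplified check; any entry gives the position and reason, which keeps a rejection reviewable rather than silent.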

Validation

Drift stays visible

Round-trip output is treated as a diagnostic gist. The protocol tracks retained, lost, added, and ambiguous concepts instead of pretending conversion is lossless.
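
The four buckets can be sketched as set operations over concept IDs. This assumes that ambiguity is decided upstream and passed in as a set; drift_report is a hypothetical helper, and any IDs other than C0001842 used with it are placeholders.

```python
def drift_report(source_concepts, round_trip_concepts, ambiguous=()):
    """Classify concept IDs after a round trip into retained/lost/added/ambiguous."""
    source = set(source_concepts)
    output = set(round_trip_concepts)
    unclear = set(ambiguous)
    return {
        "retained": sorted((source & output) - unclear),
        "lost": sorted(source - output - unclear),
        "added": sorted(output - source - unclear),
        "ambiguous": sorted(unclear),
    }
```

A report with non-empty "lost" or "added" lists is not a failure by itself; it is the visible evidence that the round trip changed the gist.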

Semantic isomorphism

Policy-relative, not absolute

IOTA-1 uses semantic isomorphism as a testable engineering property. A result is stronger when the declared normalization, locale, registry, vector model, and validation policy together preserve the intended relation across changes of expression.

Mutable languages

Translations are evidence

Translations and paraphrases can carry different idioms, politeness, domain assumptions, and cultural context. They help only when the sense, task, model, and provenance are explicit.

Translation centroids

Prototype, not proof

A normalized average across sense-aligned translations can be useful as a concept prototype. Its spread, outliers, model ID, language list, and near-miss failures must remain visible.
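
One way to compute such a prototype while keeping its spread visible is sketched below. The choice of cosine distance as the spread measure and the function names are assumptions; the input vectors must be non-zero and of equal length.

```python
import math


def unit(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def centroid_with_spread(vectors_by_language):
    """Return (centroid, mean cosine distance, per-language cosine distance)."""
    units = {lang: unit(vec) for lang, vec in vectors_by_language.items()}
    dims = len(next(iter(units.values())))
    mean = [sum(u[i] for u in units.values()) / len(units) for i in range(dims)]
    centroid = unit(mean)  # normalized average of the unit vectors
    distances = {
        lang: 1.0 - sum(a * b for a, b in zip(u, centroid))
        for lang, u in units.items()
    }
    spread = sum(distances.values()) / len(distances)
    return centroid, spread, distances
```

Returning the per-language distances alongside the centroid lets a reviewer see which translation is the outlier instead of trusting the average alone.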

Shared vector space

Operational universals

The experiment seeks practical invariants: retrieval intent, relation structure, validator behavior, and candidate ranking that remain stable enough across languages for reviewable AI handoff.

ISO/IEC 10646

Substrate, not semantic ontology

Assigned public characters, names, normalization, grapheme behavior, and public metadata make output inspectable. They do not make a code point or glyph carry IOTA meaning by itself.

Experiment metrics

Measure the pursuit

Track top-K concept hit rate, centroid spread, unknown rate, Unicode-safety warnings, drift, retrieval rank, human review outcomes, and model/version deltas.
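
Two of these metrics are simple enough to sketch directly. The definitions here are assumptions: top-K hit rate is taken as the share of test cases whose expected concept appears in the first K ranked candidates, and unknown rate as the share of segments with no selected concept (represented as None).

```python
def top_k_hit_rate(cases, k: int) -> float:
    """cases: iterable of (expected_concept_id, ranked_candidate_ids)."""
    cases = list(cases)
    hits = sum(1 for expected, ranked in cases if expected in ranked[:k])
    return hits / len(cases)


def unknown_rate(selected_concepts) -> float:
    """selected_concepts: one entry per segment, None when nothing was selected."""
    selected = list(selected_concepts)
    return sum(1 for concept in selected if concept is None) / len(selected)
```

Both functions expect at least one case or segment; tracking them per model and registry version is what makes the deltas mentioned above comparable.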

Architecture stance

What IOTA-1 promises

IOTA-1 promises a public, inspectable approximation pipeline. It does not promise identical raw embeddings across languages, secret-codebook compression, private-use Unicode meaning, or exact translation.

The long-term direction is concept canonicalization: different source languages may produce different raw vectors, but successful matches should converge on a stable concept ID, canonical vector hash, provenance trail, public rendering candidates, and explicit drift evidence.

API surface

Evidence-rich responses

The converter accepts source_language, locale, topK, includeEvidence, includeVectorPreview, includeQuantizedCode, publicUnicodeOnly, and mode values including DatabaseOnly, Hybrid, Semantic Hybrid, and Semantic Interlingua.

Responses include normalization, source language, segments, concept candidates, selected concept, glyph candidates, confidence, unknown_rate, drift, vector hashes/previews, provenance, ranking lanes, and Unicode safety checks.
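
A request body using the parameters listed above might be assembled as follows. This is a sketch under assumptions: the "text" field name, the default values, and the helper build_resolve_request are hypothetical, the mode list may not be exhaustive, and no endpoint URL is shown because the source does not give one.

```python
import json

# Modes named in the text; the real API may accept others.
KNOWN_MODES = {"DatabaseOnly", "Hybrid", "Semantic Hybrid", "Semantic Interlingua"}


def build_resolve_request(text, source_language, locale,
                          mode="Semantic Interlingua", top_k=5):
    """Build a JSON-serializable request body for the converter."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unrecognized mode: {mode!r}")
    return {
        "text": text,                      # assumed field name for the source text
        "source_language": source_language,
        "locale": locale,
        "topK": top_k,
        "includeEvidence": True,
        "includeVectorPreview": False,
        "includeQuantizedCode": False,     # quantized payloads are opt-in
        "publicUnicodeOnly": True,
        "mode": mode,
    }


body = build_resolve_request("Hola", "es", "es-ES")
payload = json.dumps(body, ensure_ascii=False)
```

Keeping includeQuantizedCode off and publicUnicodeOnly on by default mirrors the stance described earlier: quantization is opt-in, and public mode is the normal case.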

Endpoints: POST semantic resolve, POST semantic embed, GET concept C0001842.