Workstation Model Services

Overview

Three independent model services run as user-systemd units on the workstation. Each is reachable from anywhere at the workstation domain: nginx on the VPS routes requests over a reverse SSH tunnel back to the workstation. The two RAGs (SignalMind, ComponentMind) use trained MiniLM embeddings with BM25 hybrid retrieval; the SI-Extract LLM is a Gemma-4-E4B-it model, LoRA-fine-tuned on a corpus generated from the SI-Extract pipeline that the two RAGs also feed.

SignalMind

SI net classification evidence engine. It ranks SI categories for a target net using net name, topology, related nets, current candidate state, exclusions, boosts, and group context.

Use for: ambiguous net category help inside SI-Extract.
Output: category rankings, scoring, source agreement, decision flags, retrieval hits.
Authority: advisory scoring input to the SI-Toolkit pipeline.

ComponentMind

Component evidence engine. It retrieves cited evidence for part numbers, aliases, pin-role evidence, Windchill metadata, and reviewed ComponentLibrary/ComponentMind audit decisions.

Use for: component lookup, alias repair, cited pin/electrical evidence.
Output: ranked cited component evidence with allowed-use and exact-match flags.
Authority: advisory only. It does not replace SignalMind or final SI classification.

SI-Extract LLM

Gemma-4-E4B-it fine-tuned with LoRA SFT on a 24,035-row multi-task corpus generated from the SI-Extract pipeline. Served locally as a Q5_K_M GGUF through llama-cpp-python with an OpenAI-compatible REST shim.

Use for: advisory verdicts, SignalMind arbitration, ComponentMind grounding, regex-rule proposals, and abstain decisions.
Output: one JSON object per query, schema gemma4-si-llm-output/v1.
Authority: advisory only. The smoke eval was rated MARGINAL, with 0 of 10 generations parseable as strict JSON; treat outputs as suggestions, not decisions.

Usage Telemetry

Live metadata-only traffic summary for SignalMind and ComponentMind. The services do not store full request payloads, prompts, retrieved evidence, or model outputs.

The summary counters are populated live by the telemetry API:

- Model calls in the last 30 days.
- Total requested items. Batch calls count each item.
- Batch transactions, with a separate health-check count.
- Reported users. Uses SI-Toolkit headers when provided.
- Reported boards. Falls back to payload metadata.
- Failed requests. HTTP errors and runtime exceptions.

Traffic is also broken down into the following views:

View	Columns
By Service	Service, Calls, Items, Batches, Errors, Last Used
By Request Type	Service, Type, Calls, Items, Errors
Top Users	User, Calls, Items, Sessions, Last Used
Top Boards	Board, Calls, Items, Users, Last Used
Recent Requests	Time, Service, User, Board, Type, Items, Status, Latency

19,461 SignalMind approved retrieval chunks

2,080 ComponentMind approved evidence documents

10,000 Max batch size configured on both services

SignalMind

SignalMind is the SI net category retrieval engine. It helps the pipeline decide whether a net looks like I2C, PCIe, reset/pgood, clock, DDR, MISC, or another SI category, while preserving audit evidence.

Item	Value	Notes
Base URL	`https://workstation.kowalski-technologies.com/signalmind/1669873e786a472792568705a3261506`	Tokenized nginx route, no basic auth on this path.
Runtime	`/home/elkowalski/regex-augmenter/dist/signalmind_context_engine_runtime`	User-systemd service `signalmind.service`.
Model	`sentence-transformers/all-MiniLM-L6-v2`, fine-tuned as `minilm_group_triplet_v1`	Used through a Chroma index and BM25 fusion.
Retrieval sources	`bm25_single`, `bm25_group_context`, `trained_minilm_group_context`	Weights are 1.0, 0.5, and 0.5 with reciprocal-rank fusion.
Live health	OK: 19,461 chunks, max batch 10,000	Checked through the public HTTPS route.
Validation summary	Hybrid category recall at 5: `0.954315`; dev recall at 5: `0.995757`; holdout recall at 5: `0.848921`	From the bundled SignalMind runtime manifest.
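
The weighted reciprocal-rank fusion named in the Retrieval sources row can be sketched as follows. This is a minimal illustration, not the runtime's code: the source names and weights come from the table, while the smoothing constant `RRF_K = 60`, the `weighted_rrf` function, and the chunk IDs are assumptions made for the example.

```python
from collections import defaultdict

# Source weights as listed in the Retrieval sources row.
SOURCE_WEIGHTS = {
    "bm25_single": 1.0,
    "bm25_group_context": 0.5,
    "trained_minilm_group_context": 0.5,
}

# Assumption: the conventional RRF smoothing constant; the runtime's value is not documented here.
RRF_K = 60


def weighted_rrf(rankings, weights=SOURCE_WEIGHTS, k=RRF_K):
    """Fuse per-source ranked lists of chunk IDs into one ranking.

    rankings maps a source name to a list of chunk IDs, best first.
    Each appearance contributes weight / (k + rank), with rank starting at 1.
    """
    scores = defaultdict(float)
    for source, ranked_ids in rankings.items():
        weight = weights.get(source, 0.0)
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties broken by chunk ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs, for illustration only.
example = {
    "bm25_single": ["chunk_a", "chunk_b", "chunk_c"],
    "bm25_group_context": ["chunk_b", "chunk_a"],
    "trained_minilm_group_context": ["chunk_b", "chunk_d"],
}
fused = weighted_rrf(example)
print([chunk_id for chunk_id, _ in fused])
```

In this example `chunk_b` outranks `chunk_a` even though `bm25_single` prefers `chunk_a`, because the two group-context sources agree on `chunk_b` and their combined weight matches the single-net source.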

ComponentMind

ComponentMind is a new component evidence RAG, deployed the same way as SignalMind. It answers component-data questions and returns evidence with citations and authority flags.

Item	Value	Notes
Base URL	`https://workstation.kowalski-technologies.com/componentmind/1669873e786a472792568705a3261506`	Tokenized nginx route, no basic auth on this path.
Runtime	`/home/elkowalski/componentmind-finetune/dist/componentmind_context_engine_runtime`	User-systemd service `componentmind.service`.
SI-Toolkit client	`BoardVerification/ComponentMind/ComponentMindClient.cs`	HTTP wrapper matching the SignalMind client pattern; pipeline integration should start in audit-only mode.
Model	`sentence-transformers/all-MiniLM-L6-v2`, fine-tuned as `component_mind_minilm_group_triplet_v1`	Same base family as SignalMind, trained for component evidence retrieval.
Corpus	2,080 documents from ComponentLibrary, CircuitCAD/J0, component knowledge, Windchill cache, and ComponentMind review decisions	Includes alias, key-pin, and Windchill BasicName decisions with allowed-use flags.
Training	348 train rows, 1,360 triplets, 43 validation rows, 45 test rows	Trained MiniLM retriever plus BM25/RRF hybrid.
Stress eval	436 rows, top-1 `1.0`, top-5 `1.0`, citation correctness `1.0`, unsafe authority violations `0`	This means the current eval set passed. It does not mean the data is universally complete.
Live health	OK: 2,080 chunks, max batch 10,000	Checked through the public HTTPS route.

Important: ComponentMind marks exactComponentMatch on each hit and returns warnings: ["no_exact_component_match_in_top_results"] when a requested part has only similar-family evidence. Downstream code should not promote non-exact hits as direct evidence for the requested component.
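
A minimal sketch of that guard on the caller side, assuming the response is already parsed into a dict. The `hits`, `exactComponentMatch`, and `warnings` fields are the ones described here; the `direct_evidence` helper and the sample response are hypothetical.

```python
NO_EXACT_WARNING = "no_exact_component_match_in_top_results"


def direct_evidence(response):
    """Return only the hits that may be used as direct evidence for the requested part."""
    warnings = response.get("warnings") or []
    if NO_EXACT_WARNING in warnings:
        # Only similar-family evidence came back: promote nothing.
        return []
    hits = response.get("hits") or []
    # Require an explicit True; a missing or falsy flag is treated as non-exact.
    return [hit for hit in hits if hit.get("exactComponentMatch") is True]


# Hypothetical response shape, for illustration only.
sample = {
    "status": "ok",
    "warnings": [],
    "hits": [
        {"id": "doc-1", "exactComponentMatch": True},
        {"id": "doc-2", "exactComponentMatch": False},
        {"id": "doc-3"},
    ],
}
print([hit["id"] for hit in direct_evidence(sample)])
```

Non-exact hits can still be logged for review; the point of the filter is only that they never reach the code path that applies component-specific evidence.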

SI-Extract LLM

A Gemma-4-E4B-it model LoRA-fine-tuned on a 24,035-row multi-task SI-Extract corpus, served locally as a quantized GGUF through an OpenAI-compatible HTTP shim. The same shim already serves the Carolina and Alex personas; the SI-Extract weights plug in via the runtime's <gguf_dir>/<name>-q5km.gguf naming convention with no code changes.

Model Provenance

Item	Value	Notes
Base model	`unsloth/gemma-4-E4B-it`	Gemma 4 E4B Instruct (text-only path). Audio and vision towers are stripped before quantization to keep the GGUF text-only.
Persona	`GemmaJudge`	Single system prompt: "advisory SI-Extract reasoning model" analyzing SI-Toolkit phase evidence, SignalMind hits, ComponentMind hits, rules context, and audit decisions.
Fine-tuning	LoRA SFT, 1 epoch on the full corpus	PEFT `peft_type=LORA`, `r=16`, `lora_alpha=32`, `lora_dropout=0.05`, no bias, no quantization config (bf16 weights during training).
Training hyperparameters	lr 2e-4, per-device batch 4, grad-accum 2, max-seq 4096, seed 20260511, bf16	3,005 optimizer steps over 24,035 train rows + 200 val. Final train loss `0.0151`. Total wall time ≈ 6.4 h on a RunPod L4 / RTX 4090-class GPU.
Training corpus	24,035 SFT rows + 200 val (run name `gemma4-e4b-fullsft-20260511`)	Generated by the 11-phase SI-Extract dataset pipeline in `~/gemma4_26b_a4b_si_llm_work/`. Input schema `gemma4-si-llm-input/v1`, output `gemma4-si-llm-output/v1` (single JSON object per query).
Task mix	judge_verdict 10,784 · signalmind_arbitration 10,784 · abstain_safety 2,065 · componentmind_grounding 396 · regex_rule_proposal 6	All five tasks share one I/O schema; the `TASK:` header in the user message selects behavior.
Tokenizer	`GemmaTokenizer`, vocab 262,144, pad `<pad>`	Chat template from `chat_template_snapshot.jinja` bundled with the run.
Serving artifact	`~/trained-models/model-assets/gguf/siextract-q5km.gguf` · 5.76 GB	Adapter merged into base, multimodal towers removed, then quantized to Q5_K_M. `head_count=8`, `kv_head_count=2` (GQA), `n_ctx_train=131,072`.
Runtime context	`n_ctx=16384`, `n_gpu_layers=-1`, `max_tokens=8192`	Full GPU offload to the workstation P4000 (~5.6 GB VRAM resident). `RB_GGUF_DIR` / `RB_LLM_N_CTX` / `RB_LLM_N_GPU_LAYERS` overrides set in the systemd unit.
Smoke evaluation	Verdict MARGINAL: 0 / 10 generations parseable as strict JSON	Final train loss is low (`0.0151`), but the smoke eval suite flagged JSON-format drift under greedy decoding. Treat outputs as advisory and wrap them in a tolerant parser; a follow-up DPO pass on format violations is the planned remediation.
Live health	Not captured (live probe)	Probes `GET /v1/models` through the auth-bypass token route.
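
The step count and task mix in the table can be cross-checked with a few lines of arithmetic. The sketch assumes a single training GPU, so the effective batch is per-device batch times gradient accumulation; that assumption is not stated in the table.

```python
import math

train_rows = 24_035
per_device_batch = 4
grad_accum = 2

# Assumption: one GPU, so effective batch = per-device batch * grad-accum.
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_rows / effective_batch)
print(steps_per_epoch)  # 3005, matching the reported optimizer steps for 1 epoch

# Task mix row counts from the table; they should add up to the train rows.
task_mix = {
    "judge_verdict": 10_784,
    "signalmind_arbitration": 10_784,
    "abstain_safety": 2_065,
    "componentmind_grounding": 396,
    "regex_rule_proposal": 6,
}
print(sum(task_mix.values()))  # 24035
```

Both figures agree with the table. The mix is heavily skewed: `regex_rule_proposal` contributes 6 of 24,035 rows.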

Important: This is an advisory model. Its outputs should never gate a deterministic SI-Toolkit decision on their own. The smoke eval explicitly failed JSON-parseability at strict thresholds; downstream callers must validate against schema/si_llm_output.schema.json and degrade gracefully on parse error.
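
One way a caller might degrade gracefully is sketched below, using only the standard library. It pulls the first JSON object out of a generation that may carry extra text around it and returns `None` when nothing parses. The helper name and the sample `verdict` key are hypothetical; full validation against schema/si_llm_output.schema.json would be a further step that is not shown.

```python
import json


def parse_advisory_output(text):
    """Return the first JSON object embedded in text, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not a valid object here; try the next opening brace.
        start = text.find("{", start + 1)
    return None


# Hypothetical generations, for illustration only.
wrapped = 'Here is the result: {"verdict": "abstain"} Hope that helps.'
broken = '{"verdict": "abstain"'

print(parse_advisory_output(wrapped))
print(parse_advisory_output(broken))
```

A `None` result should be handled as "no advisory available", so the deterministic SI-Toolkit path continues unchanged.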

Endpoints

Endpoint	Path	Notes
Base URL (auth-bypass)	`https://workstation.kowalski-technologies.com/siextract/1669873e786a472792568705a3261506`	Tokenized nginx route, OpenAI-compatible. No basic auth on this path.
Base URL (auth-gated)	`https://workstation.kowalski-technologies.com/siextract`	Same shim, htpasswd-workstation challenge.
Chat completions	`POST /v1/chat/completions`	Model name `siextract`. Default temperature 0.0, max_tokens 8192.
Model list	`GET /v1/models`	Returns the GGUF currently loaded.
Health	`GET /health`	Liveness probe for the llama-cpp-python process.

Input and Output

The JSON schemas are intentionally tolerant so SI-Toolkit and external callers can evolve without breaking every request. Fields not needed by a request can be omitted.

SignalMind Request Fields

board: board or extraction name.
net or target_net: target net name.
related_nets: array of {net, relationship}.
bus_id, xnet_group, name_family: group context.
current_category, current_confidence, current_source: current pipeline state.
observations, pull_info, topology: physical/topology hints.
physics_eliminated_categories, hard_excluded_categories, soft_excluded_categories, boost_categories, allowed_categories: controls.
top_k: number of categories/evidence rows to return.
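
A sketch of building such a request from Python with only the standard library. The field names are the ones listed above; `build_signalmind_request` and `post_json` are hypothetical helpers, and `post_json` is defined but not called here because it needs network access.

```python
import json
import urllib.request


def build_signalmind_request(net, board=None, related_nets=None, top_k=None, **extra):
    """Assemble a SignalMind payload, omitting any field that was not supplied."""
    payload = {"net": net}
    if board is not None:
        payload["board"] = board
    if related_nets:
        # related_nets is given as (net, relationship) pairs.
        payload["related_nets"] = [
            {"net": name, "relationship": relationship}
            for name, relationship in related_nets
        ]
    if top_k is not None:
        payload["top_k"] = top_k
    # Optional fields such as bus_id or boost_categories pass through unchanged.
    payload.update({key: value for key, value in extra.items() if value is not None})
    return payload


def post_json(url, payload, timeout=30.0):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_signalmind_request(
    "BMC_I2C_SCL",
    board="my_board",
    related_nets=[("BMC_I2C_SDA", "bus_peer")],
    top_k=5,
)
print(json.dumps(payload, indent=2))
```

Leaving unset fields out of the payload, rather than sending nulls, follows the stated rule that fields not needed by a request can be omitted.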

SignalMind Response Fields

top_category, top_categories, raw_top_categories: ranked SI categories.
score_details: top scores, margin, dominance, category scores.
source_agreement: source-level agreement and conflict markers.
decision: state, confidence, reasons, flags.
category_evidence: per-category retrieval source support.
hits: retrieved chunk IDs, categories, source types, and source ranks.
controls_applied: exclusions, boosts, and allowed-category effects.

ComponentMind Request Fields

query: direct natural-language query. If present, this is used as-is.
board, refdes: board context.
part_number or partNumber: target component part number.
device, component_class or class: extracted component metadata.
pin_name or pinName, pin_number or pinNumber: pin evidence lookup.
net: optional net name tied to the component lookup.
source_path, observations, context: extra audit context.
top_k, search_k: returned rows and internal retrieval depth.

ComponentMind Response Fields

status: ok or no-hit.
classification_authority: always advisory_only.
top_results and hits: ranked cited evidence rows.
exactComponentMatch: true only when the hit matches the requested component key.
allowedUse: permitted use, such as rag_retrieval, electrical_review, or metadata_only.
deterministicClassificationAllowed, deterministicPinRoleAllowed: trust flags for downstream policy.
citation: source path, type, key, and sometimes source hash.
warnings: guardrail warnings such as no exact component match.

Call From Anywhere

Use HTTPS with JSON. The tokenized base paths are routed through nginx on the VPS to reverse SSH tunnels back to the workstation.

SignalMind

curl -sS -X POST \
  "https://workstation.kowalski-technologies.com/signalmind/1669873e786a472792568705a3261506/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "board": "my_board",
    "net": "BMC_I2C_SCL",
    "related_nets": [
      {"net": "BMC_I2C_SDA", "relationship": "bus_peer"}
    ],
    "observations": ["candidate bus context from SI-Extract"],
    "top_k": 5
  }'

ComponentMind

curl -sS -X POST \
  "https://workstation.kowalski-technologies.com/componentmind/1669873e786a472792568705a3261506/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "board": "my_board",
    "refdes": "D1",
    "part_number": "1N4001",
    "query": "1N4001 rectifier diode component library evidence",
    "top_k": 5
  }'

SI-Extract LLM

curl -sS -X POST \
  "https://workstation.kowalski-technologies.com/siextract/1669873e786a472792568705a3261506/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "siextract",
    "messages": [
      {"role": "user", "content": "TASK: judge_verdict\nPACKET_SCHEMA: gemma4-si-llm-input/v1\nOUTPUT_SCHEMA: gemma4-si-llm-output/v1\n\nINPUT_PACKET_JSON: { ... }"}
    ],
    "max_tokens": 4096,
    "temperature": 0.0
  }'
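
The same request body can be assembled in Python. This is a sketch under assumptions: the header layout is copied from the curl example, the task names are taken from the task-mix table and assumed to be the exact `TASK:` values, and the packet passed in is a placeholder rather than a real `gemma4-si-llm-input/v1` packet.

```python
import json

INPUT_SCHEMA = "gemma4-si-llm-input/v1"
OUTPUT_SCHEMA = "gemma4-si-llm-output/v1"

# Assumption: the task-mix names are the accepted TASK header values.
TASKS = {
    "judge_verdict",
    "signalmind_arbitration",
    "abstain_safety",
    "componentmind_grounding",
    "regex_rule_proposal",
}


def build_chat_request(task, packet, max_tokens=4096):
    """Build an OpenAI-style chat completions body for the siextract model."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    content = (
        f"TASK: {task}\n"
        f"PACKET_SCHEMA: {INPUT_SCHEMA}\n"
        f"OUTPUT_SCHEMA: {OUTPUT_SCHEMA}\n\n"
        f"INPUT_PACKET_JSON: {json.dumps(packet, sort_keys=True)}"
    )
    return {
        "model": "siextract",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


# Placeholder packet, for illustration only.
body = build_chat_request("judge_verdict", {"net": "BMC_I2C_SCL"})
print(body["messages"][0]["content"])
```

Serializing the packet with `json.dumps` avoids hand-escaping quotes and newlines inside the message string, which is easy to get wrong in a shell one-liner.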

Health

curl -sS \
  "https://workstation.kowalski-technologies.com/signalmind/1669873e786a472792568705a3261506/health"

curl -sS \
  "https://workstation.kowalski-technologies.com/componentmind/1669873e786a472792568705a3261506/health"

curl -sS \
  "https://workstation.kowalski-technologies.com/siextract/1669873e786a472792568705a3261506/v1/models"
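
The three probes can also be run from a short script. In this sketch the probe paths mirror the curl commands above, the token is passed in rather than hard-coded, and `probe_urls`, `http_ok`, and `check_all` are hypothetical helper names. Treating any network or HTTP error as "down" is a simplification.

```python
import urllib.request

BASE = "https://workstation.kowalski-technologies.com"

# Probe path per service, matching the curl health commands.
PROBES = {
    "signalmind": "health",
    "componentmind": "health",
    "siextract": "v1/models",
}


def probe_urls(token):
    """Build the tokenized probe URL for each service."""
    return {name: f"{BASE}/{name}/{token}/{path}" for name, path in PROBES.items()}


def http_ok(url, timeout=10.0):
    """Return True when the URL answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False


def check_all(token, fetch=http_ok):
    """Probe every service; fetch is injectable so the logic can be tested offline."""
    return {name: fetch(url) for name, url in probe_urls(token).items()}


# Offline demonstration with a stand-in fetch function instead of real HTTP.
print(check_all("TOKEN", fetch=lambda url: url.endswith("/health")))
```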

Batch Shape

curl -sS -X POST \
  "https://workstation.kowalski-technologies.com/componentmind/1669873e786a472792568705a3261506/classify_batch" \
  -H "Content-Type: application/json" \
  -d '{
    "board": "my_board",
    "top_k": 3,
    "requests": [
      {"refdes": "D1", "part_number": "1N4001"},
      {"refdes": "U10", "part_number": "SN74AHC1G86"}
    ]
  }'
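
Both RAG services are configured with a max batch size of 10,000, so a caller holding more items than that needs to split them. A minimal sketch of that split, using the body shape from the curl example; `chunk_requests` is a hypothetical helper and the part numbers are generated placeholders.

```python
MAX_BATCH = 10_000  # max batch size configured on both services


def chunk_requests(requests, board, top_k=3, max_batch=MAX_BATCH):
    """Yield classify_batch bodies, each holding at most max_batch requests."""
    for start in range(0, len(requests), max_batch):
        yield {
            "board": board,
            "top_k": top_k,
            "requests": requests[start:start + max_batch],
        }


# Placeholder items, for illustration only.
items = [{"refdes": f"R{i}", "part_number": f"PN-{i}"} for i in range(25_000)]
bodies = list(chunk_requests(items, board="my_board"))
print([len(body["requests"]) for body in bodies])  # [10000, 10000, 5000]
```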

Deployment

All three services share the same shape: a workstation user-systemd unit, a paired user-systemd autossh tunnel pinning a VPS loopback port back to the workstation, and a token-prefixed nginx location block on the VPS.

Caller: SI-Toolkit, curl, Python, C#, browser tools

VPS nginx: workstation.kowalski-technologies.com

Token path: /signalmind/.../, /componentmind/.../, or /siextract/.../

Reverse tunnel: VPS 127.0.0.1:8088/8089/8090 to workstation

Runtime: FastAPI RAGs (8088/8089) and llama-cpp-python OpenAI shim (8090)

Service	Systemd Units	Runtime Assets	Proxy
SignalMind	`signalmind.service`, `signalmind-tunnel.service`	`/home/elkowalski/regex-augmenter/dist/signalmind_context_engine_runtime`	`/etc/nginx/sites-available/workstation.kowalski-technologies.com` to port `8088`
ComponentMind	`componentmind.service`, `componentmind-tunnel.service`	`/home/elkowalski/componentmind-finetune/dist/componentmind_context_engine_runtime`	`/etc/nginx/sites-available/workstation.kowalski-technologies.com` to port `8089`
SI-Extract LLM	`siextract.service`, `siextract-tunnel.service`	`/home/elkowalski/trained-models/runtime/llm_server.py` + `model-assets/gguf/siextract-q5km.gguf`	`/etc/nginx/sites-available/workstation.kowalski-technologies.com` to port `8090`

Wiki

Operational commands and update process. Run the commands on the workstation unless the section is marked VPS.

Restart and Logs

systemctl --user status signalmind.service signalmind-tunnel.service
systemctl --user status componentmind.service componentmind-tunnel.service
systemctl --user status siextract.service siextract-tunnel.service

systemctl --user restart signalmind.service signalmind-tunnel.service
systemctl --user restart componentmind.service componentmind-tunnel.service
systemctl --user restart siextract.service siextract-tunnel.service

journalctl --user -u signalmind.service -n 120 --no-pager
journalctl --user -u componentmind.service -n 120 --no-pager
journalctl --user -u siextract.service -n 120 --no-pager

Update Model Assets

# SignalMind runtime: run a one-off query directly against the bundled runtime
cd /home/elkowalski/regex-augmenter/dist/signalmind_context_engine_runtime
./.venv/bin/python tools/signalmind_runtime.py --json '{"board":"smoke","net":"BMC_I2C_SCL"}'

# ComponentMind runtime: run a one-off query directly against the bundled runtime
cd /home/elkowalski/componentmind-finetune/dist/componentmind_context_engine_runtime
./.venv/bin/python tools/componentmind_runtime.py --json '{"part_number":"1N4001"}'

# SI-Extract LLM: swap the GGUF, confirm the file is in place, then restart
ls -lh ~/trained-models/model-assets/gguf/siextract-q5km.gguf
systemctl --user restart siextract.service

VPS Proxy Check

sudo nginx -t
sudo systemctl reload nginx
ss -ltnp | grep -E ':(8088|8089|8090)\b'

curl -sS http://127.0.0.1:8088/health
curl -sS http://127.0.0.1:8089/health
curl -sS http://127.0.0.1:8090/v1/models

Data Refresh Rules

Do not hand-edit generated RAG output as a fix. Repair the source data or review-decision generator, then rebuild.
Keep Windchill metadata as metadata unless reviewed evidence grants stronger authority.
Preserve citation fields and allowed-use flags in every new evidence row.
Run stress evals before replacing the deployed model or index.

Next Steps

Current recommended work after the deployment.

Near-Term Integration

Wire the SI-Toolkit ComponentMind client into the component context path with the existing audit-only switch first.
Use exactComponentMatch as a hard guard before applying component-specific evidence.
Persist ComponentMind request/response snippets into generated reasoning files for review.
Compare ComponentMind evidence against existing ComponentKnowledgeLoader behavior.

Dataset and Training

Expand real-board eval coverage beyond the current stress set.
Add Windchill refresh jobs that keep metadata separate from reviewed electrical evidence.
Add negative tests for similar part families, especially logic gates, buffers, connectors, and power parts.
Re-run the ComponentMind data-readiness audit after each ComponentLibrary or Windchill import.