Workstation Model Services

This page documents the three model services that run as user-systemd units on the P520c workstation: SignalMind and ComponentMind (MiniLM + BM25 hybrid retrieval RAGs) and the SI-Extract LLM (Gemma-4-E4B-it LoRA SFT, served as a Q5_K_M GGUF). Each is reachable through this domain via a reverse SSH tunnel to the VPS.

SignalMind: live on port 8088 · ComponentMind: live on port 8089 · SI-Extract LLM: live on port 8090 · Updated May 12, 2026

Overview

Three independent model services run as user-systemd units on the workstation. Each is reachable from anywhere through the workstation domain over a reverse SSH tunnel to the VPS. The two RAGs (SignalMind, ComponentMind) use trained MiniLM embeddings with BM25 hybrid retrieval; the SI-Extract LLM is a Gemma-4-E4B-it model LoRA-fine-tuned on the same SI-Extract corpus the two RAGs feed.

SignalMind

SI net classification evidence engine. It ranks SI categories for a target net using net name, topology, related nets, current candidate state, exclusions, boosts, and group context.

  • Use for: ambiguous net category help inside SI-Extract.
  • Output: category rankings, scoring, source agreement, decision flags, retrieval hits.
  • Authority: advisory scoring input to the SI-Toolkit pipeline.

ComponentMind

Component evidence engine. It retrieves cited evidence for part numbers, aliases, pin-role evidence, Windchill metadata, and reviewed ComponentLibrary/ComponentMind audit decisions.

  • Use for: component lookup, alias repair, cited pin/electrical evidence.
  • Output: ranked cited component evidence with allowed-use and exact-match flags.
  • Authority: advisory only. It does not replace SignalMind or final SI classification.

SI-Extract LLM

Gemma-4-E4B-it fine-tuned with LoRA SFT on a 24,035-row multi-task corpus generated from the SI-Extract pipeline. Served locally as a Q5_K_M GGUF through llama-cpp-python with an OpenAI-compatible REST shim.

  • Use for: advisory verdicts, SignalMind arbitration, ComponentMind grounding, regex-rule proposals, and abstain decisions.
  • Output: one JSON object per query, schema gemma4-si-llm-output/v1.
  • Authority: advisory only. The smoke eval returned a MARGINAL verdict, with 0 of 10 generations parseable as strict JSON; treat outputs as suggestions, not decisions.

Usage Telemetry

Live metadata-only traffic summary for SignalMind and ComponentMind. The services do not store full request payloads, prompts, retrieved evidence, or model outputs.

Summary counters (populated from the telemetry API when the page loads):

  • Model calls in the last 30 days.
  • Total requested items. Batch calls count each item.
  • Batch transactions. Health checks are reported as a separate count.
  • Reported users. Uses SI-Toolkit headers when provided.
  • Reported boards. Falls back to payload metadata.
  • Failed requests. HTTP errors and runtime exceptions.

By Service

Columns: Service, Calls, Items, Batches, Errors, Last Used.

By Request Type

Columns: Service, Type, Calls, Items, Errors.

Top Users

Columns: User, Calls, Items, Sessions, Last Used.

Top Boards

Columns: Board, Calls, Items, Users, Last Used.

Recent Requests

Columns: Time, Service, User, Board, Type, Items, Status, Latency.

Corpus and batch limits:

  • 19,461 SignalMind approved retrieval chunks
  • 2,080 ComponentMind approved evidence documents
  • 10,000 max batch size configured on both services

SignalMind

SignalMind is the SI net category retrieval engine. It helps the pipeline decide whether a net looks like I2C, PCIe, reset/pgood, clock, DDR, MISC, and other SI categories, while preserving audit evidence.

Item | Value | Notes
Base URL | https://workstation.kowalski-technologies.com/signalmind/1669873e786a472792568705a3261506 | Tokenized nginx route, no basic auth on this path.
Runtime | /home/elkowalski/regex-augmenter/dist/signalmind_context_engine_runtime | User-systemd service signalmind.service.
Model | sentence-transformers/all-MiniLM-L6-v2, fine-tuned as minilm_group_triplet_v1 | Used through a Chroma index and BM25 fusion.
Retrieval sources | bm25_single, bm25_group_context, trained_minilm_group_context | Weights are 1.0, 0.5, and 0.5 with reciprocal-rank fusion.
Live health | OK: 19,461 chunks, max batch 10,000 | Checked through the public HTTPS route.
Validation summary | Hybrid category recall at 5: 0.954315; dev recall at 5: 0.995757; holdout recall at 5: 0.848921 | From the bundled SignalMind runtime manifest.
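
The weighted reciprocal-rank fusion named in the Retrieval sources row can be sketched as follows. This is a minimal illustration, not the runtime's code: the function name, the chunk IDs, and the smoothing constant k=60 (a commonly used default) are assumptions; only the three source names and their weights come from the table.

```python
from collections import defaultdict

# Source names and weights as listed in the Retrieval sources row.
SOURCE_WEIGHTS = {
    "bm25_single": 1.0,
    "bm25_group_context": 0.5,
    "trained_minilm_group_context": 0.5,
}

RRF_K = 60  # assumed smoothing constant; the deployed value is not documented here


def weighted_rrf(rankings, weights=SOURCE_WEIGHTS, k=RRF_K):
    """Fuse per-source ranked ID lists into one ranking.

    rankings maps a source name to a list of chunk IDs, best first.
    Each appearance contributes weight / (k + rank), with rank starting at 1.
    """
    scores = defaultdict(float)
    for source, ranked_ids in rankings.items():
        weight = weights.get(source, 0.0)
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties break on chunk ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs, for illustration only.
example = {
    "bm25_single": ["i2c_014", "clk_002", "pcie_101"],
    "bm25_group_context": ["clk_002", "i2c_014"],
    "trained_minilm_group_context": ["i2c_014", "ddr_007"],
}
fused = weighted_rrf(example)
print([chunk_id for chunk_id, _ in fused])
```

A chunk that appears in several sources accumulates score from each, so agreement between BM25 and the trained retriever pushes a chunk above one that only a single source ranks highly.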

ComponentMind

ComponentMind is a new component evidence RAG, deployed the same way as SignalMind. It answers component-data questions and returns evidence with citations and authority flags.

Item | Value | Notes
Base URL | https://workstation.kowalski-technologies.com/componentmind/1669873e786a472792568705a3261506 | Tokenized nginx route, no basic auth on this path.
Runtime | /home/elkowalski/componentmind-finetune/dist/componentmind_context_engine_runtime | User-systemd service componentmind.service.
SI-Toolkit client | BoardVerification/ComponentMind/ComponentMindClient.cs | HTTP wrapper matching the SignalMind client pattern; pipeline integration should start in audit-only mode.
Model | sentence-transformers/all-MiniLM-L6-v2, fine-tuned as component_mind_minilm_group_triplet_v1 | Same base family as SignalMind, trained for component evidence retrieval.
Corpus | 2,080 documents from ComponentLibrary, CircuitCAD/J0, component knowledge, Windchill cache, and ComponentMind review decisions | Includes alias, key-pin, and Windchill BasicName decisions with allowed-use flags.
Training | 348 train rows, 1,360 triplets, 43 validation rows, 45 test rows | Trained MiniLM retriever plus BM25/RRF hybrid.
Stress eval | 436 rows, top-1 1.0, top-5 1.0, citation correctness 1.0, unsafe authority violations 0 | This means the current eval set passed. It does not mean the data is universally complete.
Live health | OK: 2,080 chunks, max batch 10,000 | Checked through the public HTTPS route.
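
For readers unfamiliar with the top-1 and top-5 figures in the stress eval, the sketch below shows one common way such a metric is computed. The row layout (expected_id, retrieved_ids) is a hypothetical stand-in, not the actual eval harness format.

```python
def top_k_accuracy(rows, k):
    """Fraction of eval rows whose expected ID is among the first k retrieved IDs."""
    if not rows:
        raise ValueError("eval set is empty")
    hits = sum(1 for row in rows if row["expected_id"] in row["retrieved_ids"][:k])
    return hits / len(rows)


# Hypothetical eval rows, for illustration only.
rows = [
    {"expected_id": "a", "retrieved_ids": ["a", "b", "c"]},
    {"expected_id": "b", "retrieved_ids": ["c", "b", "a"]},
    {"expected_id": "z", "retrieved_ids": ["a", "b", "c"]},
    {"expected_id": "c", "retrieved_ids": ["c"]},
]
print(top_k_accuracy(rows, 1))  # 0.5
print(top_k_accuracy(rows, 5))  # 0.75
```

A score of 1.0 only says that every row in the eval set had its expected document retrieved; it says nothing about parts that the eval set does not cover.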

Important: ComponentMind marks exactComponentMatch on each hit and returns warnings: ["no_exact_component_match_in_top_results"] when a requested part has only similar-family evidence. Downstream code should not promote non-exact hits as direct evidence for the requested component.
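
A downstream guard for this rule might look like the sketch below. It assumes the response is a dict carrying top_results (or hits) and warnings as described on this page; the function name and the sample response are hypothetical.

```python
NO_EXACT_WARNING = "no_exact_component_match_in_top_results"


def direct_evidence(response):
    """Return only the hits that may be treated as evidence for the requested part."""
    hits = response.get("top_results") or response.get("hits") or []
    # The service-level warning means nothing in the top results is an exact match.
    if NO_EXACT_WARNING in response.get("warnings", []):
        return []
    # Require an explicit True; a missing flag is treated as not exact.
    return [hit for hit in hits if hit.get("exactComponentMatch") is True]


# Hypothetical response shapes, for illustration only.
mixed = {
    "top_results": [
        {"key": "SN74AHC1G86", "exactComponentMatch": True},
        {"key": "SN74AHC1G08", "exactComponentMatch": False},
    ],
    "warnings": [],
}
family_only = {
    "top_results": [{"key": "SN74AHC1G08", "exactComponentMatch": False}],
    "warnings": [NO_EXACT_WARNING],
}
print(direct_evidence(mixed))
print(direct_evidence(family_only))
```

Similar-family hits can still be logged for review; the guard only keeps them from being promoted to direct evidence.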

SI-Extract LLM

A Gemma-4-E4B-it model LoRA-fine-tuned on a 24,035-row multi-task SI-Extract corpus, served locally as a quantized GGUF through an OpenAI-compatible HTTP shim. The same shim already serves the Carolina and Alex personas; the SI-Extract weights plug in via the runtime's <gguf_dir>/<name>-q5km.gguf naming convention with no code changes.

Model Provenance

Item | Value | Notes
Base model | unsloth/gemma-4-E4B-it | Gemma 4 E4B Instruct (text-only path). Audio and vision towers are stripped before quantization to keep the GGUF text-only.
Persona | GemmaJudge | Single system prompt: "advisory SI-Extract reasoning model" analyzing SI-Toolkit phase evidence, SignalMind hits, ComponentMind hits, rules context, and audit decisions.
Fine-tuning | LoRA SFT, 1 epoch on the full corpus | PEFT peft_type=LORA, r=16, lora_alpha=32, lora_dropout=0.05, no bias, no quantization config (bf16 weights during training).
Training hyperparameters | lr 2e-4, per-device batch 4, grad-accum 2, max-seq 4096, seed 20260511, bf16 | 3,005 optimizer steps over 24,035 train rows + 200 val. Final train loss 0.0151. Total wall time ≈ 6.4 h on a RunPod L4 / RTX 4090-class GPU.
Training corpus | 24,035 SFT rows + 200 val (run name gemma4-e4b-fullsft-20260511) | Generated by the 11-phase SI-Extract dataset pipeline in ~/gemma4_26b_a4b_si_llm_work/. Input schema gemma4-si-llm-input/v1, output gemma4-si-llm-output/v1 (single JSON object per query).
Task mix | judge_verdict 10,784 · signalmind_arbitration 10,784 · abstain_safety 2,065 · componentmind_grounding 396 · regex_rule_proposal 6 | All five tasks share one I/O schema; the TASK: header in the user message selects behavior.
Tokenizer | GemmaTokenizer, vocab 262,144, pad <pad> | Chat template from chat_template_snapshot.jinja bundled with the run.
Serving artifact | ~/trained-models/model-assets/gguf/siextract-q5km.gguf · 5.76 GB | Adapter merged into base, multimodal towers removed, then quantized to Q5_K_M. head_count=8, kv_head_count=2 (GQA), n_ctx_train=131,072.
Runtime context | n_ctx=16384, n_gpu_layers=-1, max_tokens=8192 | Full GPU offload to the workstation P4000 (~5.6 GB VRAM resident). RB_GGUF_DIR / RB_LLM_N_CTX / RB_LLM_N_GPU_LAYERS overrides set in the systemd unit.
Smoke evaluation | Verdict MARGINAL: 0 / 10 generations parseable as strict JSON | Final loss is excellent but the smoke eval suite flagged JSON-format drift on greedy decoding. Outputs should be treated as advisory and wrapped in a tolerant parser; a follow-up DPO pass on format violations is the planned remediation.
Live health | Live probe (result not captured here) | Probes GET /v1/models through the auth-bypass token route.
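
The step count and task mix in the table can be cross-checked with a few lines of arithmetic. The sketch assumes a single GPU, one epoch, and that the final partial batch still counts as a step; those assumptions are not stated in the table.

```python
import math

train_rows = 24_035
per_device_batch = 4
grad_accum = 2

# Rows consumed per optimizer step (single-GPU assumption).
effective_batch = per_device_batch * grad_accum
steps = math.ceil(train_rows / effective_batch)
print(effective_batch, steps)  # 8 3005

# Task mix as listed in the table; it should add up to the training row count.
task_mix = {
    "judge_verdict": 10_784,
    "signalmind_arbitration": 10_784,
    "abstain_safety": 2_065,
    "componentmind_grounding": 396,
    "regex_rule_proposal": 6,
}
total_rows = sum(task_mix.values())
print(total_rows)  # 24035
```

Both figures agree with the table: 3,005 optimizer steps and 24,035 rows across the five tasks.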

Important: This is an advisory model. Its outputs should never gate a deterministic SI-Toolkit decision on their own. The smoke eval explicitly failed JSON-parseability at strict thresholds; downstream callers must validate against schema/si_llm_output.schema.json and degrade gracefully on parse error.
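
One way to build the tolerant parser described above is to scan the generation for the first decodable JSON object and return None when nothing parses. This is a sketch using only the Python standard library; the function name and the sample key verdict are hypothetical, and validation against schema/si_llm_output.schema.json is a separate step that is not shown.

```python
import json

_DECODER = json.JSONDecoder()


def extract_json_object(text):
    """Return the first JSON object embedded in text, or None if none decodes."""
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever follows it.
            value, _end = _DECODER.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None


# Hypothetical generations, for illustration only.
fenced = '```json\n{"verdict": "abstain"}\n```'
noisy = 'Reasoning: {not json} final answer {"verdict": "keep"} done.'
print(extract_json_object(fenced))
print(extract_json_object(noisy))
print(extract_json_object("no object here"))
```

Callers can treat a None result as a parse failure and fall back to the deterministic path, which matches the advisory-only authority of the model.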

Endpoints

Endpoint | Path | Notes
Base URL (auth-bypass) | https://workstation.kowalski-technologies.com/siextract/1669873e786a472792568705a3261506 | Tokenized nginx route, OpenAI-compatible. No basic auth on this path.
Base URL (auth-gated) | https://workstation.kowalski-technologies.com/siextract | Same shim, htpasswd-workstation challenge.
Chat completions | POST /v1/chat/completions | Model name siextract. Default temperature 0.0, max_tokens 8192.
Model list | GET /v1/models | Returns the GGUF currently loaded.
Health | GET /health | Liveness probe for the llama-cpp-python process.

Input and Output

The JSON schemas are intentionally tolerant so SI-Toolkit and external callers can evolve without breaking every request. Fields not needed by a request can be omitted.

SignalMind Request Fields

  • board: board or extraction name.
  • net or target_net: target net name.
  • related_nets: array of {net, relationship}.
  • bus_id, xnet_group, name_family: group context.
  • current_category, current_confidence, current_source: current pipeline state.
  • observations, pull_info, topology: physical/topology hints.
  • physics_eliminated_categories, hard_excluded_categories, soft_excluded_categories, boost_categories, allowed_categories: controls.
  • top_k: number of categories/evidence rows to return.
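
Because unused fields can be omitted, a client can assemble the request from whatever context it has and drop the rest. The helper below is a hypothetical sketch; only the field names come from the list above.

```python
def build_signalmind_request(net, **fields):
    """Build a SignalMind request payload, dropping any field left as None."""
    payload = {"net": net}
    for name, value in fields.items():
        if value is not None:
            payload[name] = value
    return payload


request = build_signalmind_request(
    "BMC_I2C_SCL",
    board="my_board",
    related_nets=[{"net": "BMC_I2C_SDA", "relationship": "bus_peer"}],
    current_category=None,  # unknown, so it is left out of the payload
    top_k=5,
)
print(sorted(request))  # ['board', 'net', 'related_nets', 'top_k']
```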

SignalMind Response Fields

  • top_category, top_categories, raw_top_categories: ranked SI categories.
  • score_details: top scores, margin, dominance, category scores.
  • source_agreement: source-level agreement and conflict markers.
  • decision: state, confidence, reasons, flags.
  • category_evidence: per-category retrieval source support.
  • hits: retrieved chunk IDs, categories, source types, and source ranks.
  • controls_applied: exclusions, boosts, and allowed-category effects.

ComponentMind Request Fields

  • query: direct natural-language query. If present, this is used as-is.
  • board, refdes: board context.
  • part_number or partNumber: target component part number.
  • device, component_class or class: extracted component metadata.
  • pin_name or pinName, pin_number or pinNumber: pin evidence lookup.
  • net: optional net name tied to the component lookup.
  • source_path, observations, context: extra audit context.
  • top_k, search_k: returned rows and internal retrieval depth.

ComponentMind Response Fields

  • status: ok or no-hit.
  • classification_authority: always advisory_only.
  • top_results and hits: ranked cited evidence rows.
  • exactComponentMatch: true only when the hit matches the requested component key.
  • allowedUse: permitted use, such as rag_retrieval, electrical_review, or metadata_only.
  • deterministicClassificationAllowed, deterministicPinRoleAllowed: trust flags for downstream policy.
  • citation: source path, type, key, and sometimes source hash.
  • warnings: guardrail warnings such as no exact component match.
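
The trust flags are meant to drive downstream policy. The sketch below shows one possible policy check for pin-role evidence. It assumes the flags appear on each hit and that allowedUse may be either a single string or a list; both points are assumptions, since this page does not specify them.

```python
def allowed_uses(hit):
    """Normalize allowedUse to a set, accepting a string, a list, or nothing."""
    value = hit.get("allowedUse")
    if value is None:
        return set()
    if isinstance(value, str):
        return {value}
    return set(value)


def pin_role_is_deterministic(hit):
    """Hypothetical policy: exact match, explicit flag, and not metadata-only."""
    return (
        hit.get("exactComponentMatch") is True
        and hit.get("deterministicPinRoleAllowed") is True
        and allowed_uses(hit) != {"metadata_only"}
    )


# Hypothetical hits, for illustration only.
reviewed = {
    "exactComponentMatch": True,
    "deterministicPinRoleAllowed": True,
    "allowedUse": ["rag_retrieval", "electrical_review"],
}
windchill = {
    "exactComponentMatch": True,
    "deterministicPinRoleAllowed": False,
    "allowedUse": "metadata_only",
}
print(pin_role_is_deterministic(reviewed))   # True
print(pin_role_is_deterministic(windchill))  # False
```

Even when the check passes, classification_authority remains advisory_only, so the result is an input to SI-Toolkit policy rather than a final decision.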

Call From Anywhere

Use HTTPS with JSON. The tokenized base paths are routed through nginx on the VPS to reverse SSH tunnels back to the workstation.
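
For Python callers, the same requests can be made with the standard library alone. The sketch below separates building the request from sending it; the helper names are hypothetical, and the token argument stands for the tokenized path segment used in the examples that follow.

```python
import json
import urllib.request

BASE = "https://workstation.kowalski-technologies.com"


def build_post(service, token, path, payload):
    """Build a JSON POST request for one of the tokenized service routes."""
    url = f"{BASE}/{service}/{token}/{path.lstrip('/')}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; send() does.
req = build_post("signalmind", "<token>", "classify", {"net": "BMC_I2C_SCL"})
print(req.get_method(), req.full_url)
```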

SignalMind

curl -sS -X POST \
  "https://workstation.kowalski-technologies.com/signalmind/1669873e786a472792568705a3261506/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "board": "my_board",
    "net": "BMC_I2C_SCL",
    "related_nets": [
      {"net": "BMC_I2C_SDA", "relationship": "bus_peer"}
    ],
    "observations": ["candidate bus context from SI-Extract"],
    "top_k": 5
  }'

ComponentMind

curl -sS -X POST \
  "https://workstation.kowalski-technologies.com/componentmind/1669873e786a472792568705a3261506/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "board": "my_board",
    "refdes": "D1",
    "part_number": "1N4001",
    "query": "1N4001 rectifier diode component library evidence",
    "top_k": 5
  }'

SI-Extract LLM

curl -sS -X POST \
  "https://workstation.kowalski-technologies.com/siextract/1669873e786a472792568705a3261506/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "siextract",
    "messages": [
      {"role": "user", "content": "TASK: judge_verdict\nPACKET_SCHEMA: gemma4-si-llm-input/v1\nOUTPUT_SCHEMA: gemma4-si-llm-output/v1\n\nINPUT_PACKET_JSON: { ... }"}
    ],
    "max_tokens": 4096,
    "temperature": 0.0
  }'
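
The user message in the request above follows a fixed header layout, so it can be generated rather than hand-written. The sketch below assembles it in Python; the helper names are hypothetical and the packet contents are a placeholder, because the real input packet is defined by the gemma4-si-llm-input/v1 schema and is not reproduced on this page.

```python
import json

TASKS = {
    "judge_verdict",
    "signalmind_arbitration",
    "abstain_safety",
    "componentmind_grounding",
    "regex_rule_proposal",
}


def build_user_message(task, packet):
    """Format the TASK header block followed by the input packet as JSON."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    return (
        f"TASK: {task}\n"
        "PACKET_SCHEMA: gemma4-si-llm-input/v1\n"
        "OUTPUT_SCHEMA: gemma4-si-llm-output/v1\n\n"
        f"INPUT_PACKET_JSON: {json.dumps(packet, sort_keys=True)}"
    )


def build_chat_request(task, packet, max_tokens=4096):
    """Wrap the user message in an OpenAI-style chat completion body."""
    return {
        "model": "siextract",
        "messages": [{"role": "user", "content": build_user_message(task, packet)}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


# Placeholder packet; real packets follow the input schema.
body = build_chat_request("judge_verdict", {"net": "BMC_I2C_SCL"})
print(body["messages"][0]["content"])
```

Rejecting unknown task names on the client side avoids sending a header the model was never trained on.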

Health

curl -sS \
  "https://workstation.kowalski-technologies.com/signalmind/1669873e786a472792568705a3261506/health"

curl -sS \
  "https://workstation.kowalski-technologies.com/componentmind/1669873e786a472792568705a3261506/health"

curl -sS \
  "https://workstation.kowalski-technologies.com/siextract/1669873e786a472792568705a3261506/v1/models"

Batch Shape

curl -sS -X POST \
  "https://workstation.kowalski-technologies.com/componentmind/1669873e786a472792568705a3261506/classify_batch" \
  -H "Content-Type: application/json" \
  -d '{
    "board": "my_board",
    "top_k": 3,
    "requests": [
      {"refdes": "D1", "part_number": "1N4001"},
      {"refdes": "U10", "part_number": "SN74AHC1G86"}
    ]
  }'
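
Both RAG services are configured with a maximum batch size of 10,000, so a large job has to be split on the client side. The generator below is a sketch of that split using the batch shape shown above; the function name is hypothetical, and how the service responds to an oversized batch is not documented here.

```python
MAX_BATCH = 10_000  # max batch size configured on both services


def batch_payloads(requests, board=None, top_k=None, max_batch=MAX_BATCH):
    """Yield classify_batch payloads with at most max_batch requests each."""
    for start in range(0, len(requests), max_batch):
        payload = {"requests": requests[start:start + max_batch]}
        if board is not None:
            payload["board"] = board
        if top_k is not None:
            payload["top_k"] = top_k
        yield payload


# Hypothetical job of 25,000 lookups, for illustration only.
job = [{"refdes": f"R{i}", "part_number": "1N4001"} for i in range(25_000)]
sizes = [len(p["requests"]) for p in batch_payloads(job, board="my_board", top_k=3)]
print(sizes)  # [10000, 10000, 5000]
```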

Deployment

All three services share the same shape: a workstation user-systemd unit, a paired user-systemd autossh tunnel pinning a VPS loopback port back to the workstation, and a token-prefixed nginx location block on the VPS.

Caller: SI-Toolkit, curl, Python, C#, browser tools
VPS nginx: workstation.kowalski-technologies.com
Token path: /signalmind/.../, /componentmind/.../, or /siextract/.../
Reverse tunnel: VPS 127.0.0.1:8088/8089/8090 to workstation
Runtime: FastAPI RAGs (8088/8089) and llama-cpp-python OpenAI shim (8090)

Service | Systemd Units | Runtime Assets | Proxy
SignalMind | signalmind.service, signalmind-tunnel.service | /home/elkowalski/regex-augmenter/dist/signalmind_context_engine_runtime | /etc/nginx/sites-available/workstation.kowalski-technologies.com to port 8088
ComponentMind | componentmind.service, componentmind-tunnel.service | /home/elkowalski/componentmind-finetune/dist/componentmind_context_engine_runtime | /etc/nginx/sites-available/workstation.kowalski-technologies.com to port 8089
SI-Extract LLM | siextract.service, siextract-tunnel.service | /home/elkowalski/trained-models/runtime/llm_server.py + model-assets/gguf/siextract-q5km.gguf | /etc/nginx/sites-available/workstation.kowalski-technologies.com to port 8090

Wiki

Operational commands and update process. Run systemd commands on the workstation unless the command says VPS.

Restart and Logs

systemctl --user status signalmind.service signalmind-tunnel.service
systemctl --user status componentmind.service componentmind-tunnel.service
systemctl --user status siextract.service siextract-tunnel.service

systemctl --user restart signalmind.service signalmind-tunnel.service
systemctl --user restart componentmind.service componentmind-tunnel.service
systemctl --user restart siextract.service siextract-tunnel.service

journalctl --user -u signalmind.service -n 120 --no-pager
journalctl --user -u componentmind.service -n 120 --no-pager
journalctl --user -u siextract.service -n 120 --no-pager

Update Model Assets

# SignalMind runtime
cd /home/elkowalski/regex-augmenter/dist/signalmind_context_engine_runtime
./.venv/bin/python tools/signalmind_runtime.py --json '{"board":"smoke","net":"BMC_I2C_SCL"}'

# ComponentMind runtime
cd /home/elkowalski/componentmind-finetune/dist/componentmind_context_engine_runtime
./.venv/bin/python tools/componentmind_runtime.py --json '{"part_number":"1N4001"}'

# SI-Extract LLM (swap GGUF, then restart)
ls -lh ~/trained-models/model-assets/gguf/siextract-q5km.gguf
systemctl --user restart siextract.service

VPS Proxy Check

sudo nginx -t
sudo systemctl reload nginx
ss -ltnp | grep -E ':(8088|8089|8090)\b'

curl -sS http://127.0.0.1:8088/health
curl -sS http://127.0.0.1:8089/health
curl -sS http://127.0.0.1:8090/v1/models

Data Refresh Rules

  • Do not hand-edit generated RAG output as a fix. Repair the source data or review-decision generator, then rebuild.
  • Keep Windchill metadata as metadata unless reviewed evidence grants stronger authority.
  • Preserve citation fields and allowed-use flags in every new evidence row.
  • Run stress evals before replacing the deployed model or index.
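
The rule about preserving citation fields and allowed-use flags can be enforced with a pre-rebuild check. The sketch below assumes evidence rows are dicts that carry citation and allowedUse keys, mirroring the response field names; the actual corpus row format is not documented on this page, so treat the key names as placeholders.

```python
REQUIRED_FIELDS = ("citation", "allowedUse")  # assumed row keys


def rows_missing_fields(rows, required=REQUIRED_FIELDS):
    """Return (row index, missing field names) for every incomplete row."""
    problems = []
    for index, row in enumerate(rows):
        # A field counts as missing when absent or empty.
        missing = [name for name in required if not row.get(name)]
        if missing:
            problems.append((index, missing))
    return problems


# Hypothetical evidence rows, for illustration only.
new_rows = [
    {"key": "1N4001", "citation": {"source_path": "lib/diodes"}, "allowedUse": "rag_retrieval"},
    {"key": "SN74AHC1G86", "citation": {}, "allowedUse": "metadata_only"},
    {"key": "LM317"},
]
print(rows_missing_fields(new_rows))
# [(1, ['citation']), (2, ['citation', 'allowedUse'])]
```

A non-empty result is a reason to repair the source data or the review-decision generator before rebuilding, in line with the first rule above.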

Next Steps

Current recommended work after the deployment.

Near-Term Integration

  • Wire the SI-Toolkit ComponentMind client into the component context path with the existing audit-only switch first.
  • Use exactComponentMatch as a hard guard before applying component-specific evidence.
  • Persist ComponentMind request/response snippets into generated reasoning files for review.
  • Compare ComponentMind evidence against existing ComponentKnowledgeLoader behavior.

Dataset and Training

  • Expand real-board eval coverage beyond the current stress set.
  • Add Windchill refresh jobs that keep metadata separate from reviewed electrical evidence.
  • Add negative tests for similar part families, especially logic gates, buffers, connectors, and power parts.
  • Re-run the ComponentMind data-readiness audit after each ComponentLibrary or Windchill import.