Product Roadmap

This page turns the current backlog into an implementation roadmap grounded in the codebase that exists today. It assumes the current chat stack in backend/internal/chat, tool registry in backend/internal/capability, artifact system in backend/internal/artifacts, auth stack in backend/internal/auth, realtime stack split across backend/internal/ws and chat SSE, and the Vite frontend in frontend/src remain the foundation.

Planning Principles

  • Prefer additive extensions to the current architecture over large rewrites.
  • Keep zero-knowledge guarantees intact. "Less code" for encryption should mean less duplication, not weaker security.
  • Favor structured outputs over raw free-form HTML or JSON whenever the model must edit something repeatedly.
  • Separate provider adapters from product features so search, browser, wallet, and reasoning vendors can change without touching chat orchestration.
  • Land observability and rollout controls with each feature instead of treating them as follow-up work.

The roadmap falls into six phases, ordered by dependency:

  • Phase 1 (Platform foundation): Reduce LoC for Encryption, Improve Streaming System, Better Charts. These are shared foundations for almost every other item.
  • Phase 2 (External research tools): Web Search Prototype, AI Use-Browser. Search is the simplest way to add fresh external knowledge; browser automation should build on that.
  • Phase 3 (Artifact platform): Better Artifact System, Reusable Artifact UI. Browser use will produce richer outputs; the artifact layer should be cleaned up before pushing further into design-generation features.
  • Phase 4 (Visual creation): Semi-Figma MCP / Design Studio. This should build on a stronger artifact model and a more coherent artifact workspace.
  • Phase 5 (Personalization and reasoning): Memory Presets, Reasoning Mode / Chain of Thought. Both features need a cleaner prompt assembly pipeline and clearer run metadata.
  • Phase 6 (Access and identity): Login with Solana. Wallet auth is isolated enough to ship later once core product workflows are stronger.

Shared Refactors Before Feature Work

Two cross-cutting refactors should happen before the larger roadmap items:

  • Extract prompt construction out of backend/internal/chat/service.go into a new package such as backend/internal/prompt. That package should compose base instructions, enabled capability guidance, artifact context, file context, memory presets, and reasoning settings in a deterministic order.
  • Introduce a run-oriented abstraction for chat responses. The current system stores an empty assistant row, streams plaintext over SSE, and patches the final message at the end. A chat_runs abstraction makes streaming, reasoning, browser steps, citations, and retries much easier to manage.
  • Upgrade the existing planning/TODO behavior into a universal run-level checkpoint system. Do not build a browser-only TODO feature. It should work across browser use, search, file tools, artifacts, spreadsheet generation, design tools, and any future tool call chain.

Universal TODO / Checkpoint System

  • Treat TODOs as a first-class run primitive, not a special tool for one feature area.
  • The TODO system should be able to:
  • create a checklist at the beginning of a run
  • associate one or more TODO items with any tool call
  • record evidence, notes, and completion status
  • expose unresolved items before the run finalizes
  • allow follow-up runs to continue an incomplete checklist when appropriate
  • If an existing TODO mechanism already exists in the product, this roadmap should upgrade and generalize it rather than introduce a second competing system.
  • The current user-facing gap is not just backend logic. The chat UI still does not expose a visible TODO checklist; the current streaming flow primarily surfaces status text and tool call cards. The roadmap should treat checklist rendering as part of the core implementation, not as an optional enhancement.
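To make the checklist contract concrete, here is a minimal Go sketch of run-scoped TODO primitives. Every type, field, and status name is illustrative, not existing code in the repository; the roadmap only fixes the semantics (run-scoped items, evidence references, unresolved items surfaced before finalization).

```go
package main

// Status values a checklist item can move through; names are assumptions.
type Status string

const (
	StatusOpen    Status = "open"
	StatusDone    Status = "done"
	StatusBlocked Status = "blocked"
)

// Item is one checkpoint attached to a chat run, not to a single tool,
// so the same checklist can chain across search, browser, and artifact steps.
type Item struct {
	ID       string
	RunID    string
	Title    string
	Status   Status
	Note     string
	Evidence []string // e.g. "url#heading" references recorded by tool steps
	ParentID string   // optional parent for sub-checklists
	Order    int
}

// Checklist groups the items for one run.
type Checklist struct {
	RunID string
	Items []Item
}

// Unresolved returns the items that must be surfaced before the run
// finalizes: anything not yet completed, including blocked items.
func (c *Checklist) Unresolved() []Item {
	var out []Item
	for _, it := range c.Items {
		if it.Status != StatusDone {
			out = append(out, it)
		}
	}
	return out
}
```

A follow-up run that continues an incomplete checklist would load the same `Checklist` by run lineage and start from `Unresolved()`.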

1. Web Search Prototype

Goal

Give the assistant a low-risk, citation-friendly way to answer questions that require fresh or external information.

Current Touchpoints

  • backend/internal/capability/registry.go
  • backend/internal/inference/executor.go
  • backend/internal/chat/service.go
  • frontend/src/routes/_authenticated/chats/$chatId.tsx

Approach

  • Start with a single tool: web_search.
  • Return normalized search hits only. Do not fetch full page content in the first pass.
  • Cap the tool at 3-5 results, each carrying title, URL, snippet, source domain, and an optional published timestamp.
  • Add org-level or platform-level search provider configuration in admin.

Backend Implementation

  • Add a new package such as backend/internal/websearch.
  • Define a provider interface with a single Search(ctx, query, options) method and a normalized result type.
  • Implement one provider first. Tavily, Brave Search API, SerpAPI, or Exa are all reasonable. The code should not assume a specific vendor outside the provider adapter.
  • Add a capability file such as backend/internal/capability/web_search.go.
  • Register web_search in backend/internal/capability/registry.go.
  • Update prompt construction so the model is told to use web_search for fresh information, public facts, and citation-worthy answers.
  • Add caching for identical queries with a short TTL to reduce cost and rate-limit pressure.
  • Add allowlist and denylist support for domains at the provider config layer.
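A sketch of what the provider seam could look like. The interface shape and field names are assumptions for illustration; the real `backend/internal/websearch` package would define its own. The domain filter is shown as a post-call helper so allowlist/denylist guardrails never depend on vendor features:

```go
package main

import (
	"context"
	"time"
)

// Result is the normalized hit every provider adapter returns.
type Result struct {
	Title       string
	URL         string
	Snippet     string
	Domain      string
	PublishedAt *time.Time // optional; many providers omit it
}

// Options keeps per-call knobs out of the interface signature.
type Options struct {
	MaxResults int      // the roadmap caps this at 3-5
	Allowlist  []string // domains permitted by provider config
	Denylist   []string // domains rejected by provider config
}

// Provider is the single seam between web_search and any vendor
// (Tavily, Brave, SerpAPI, Exa). Nothing outside the adapter should
// know which vendor is behind it.
type Provider interface {
	Search(ctx context.Context, query string, opts Options) ([]Result, error)
}

// filterDomains applies the allow/deny rules after the vendor call.
func filterDomains(hits []Result, opts Options) []Result {
	denied := map[string]bool{}
	for _, d := range opts.Denylist {
		denied[d] = true
	}
	allowed := map[string]bool{}
	for _, d := range opts.Allowlist {
		allowed[d] = true
	}
	var out []Result
	for _, h := range hits {
		if denied[h.Domain] {
			continue
		}
		if len(allowed) > 0 && !allowed[h.Domain] {
			continue
		}
		out = append(out, h)
	}
	return out
}
```

Caching identical queries with a short TTL would wrap `Provider.Search` in a decorator, keeping the cache invisible to the capability layer.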

Data and API

  • Add a new config table for search provider settings, or extend platform settings if search will always be global.
  • Expose admin endpoints to read and update search config, for example:
  • GET /api/admin/search-config
  • PUT /api/admin/search-config
  • POST /api/admin/search-config/test
  • Do not store raw search result pages in the database for the prototype. Persist only config and optional short-lived cache data.

Frontend Implementation

  • Add an admin page section for search provider selection, API key, default result count, and domain guardrails.
  • Extend the chat UI so web_search tool calls render source cards cleanly instead of dumping raw JSON.
  • Add explicit citation chips or a "Sources" block in the final assistant message UI when a run included search results.

Risks and Guardrails

  • Search snippets can be stale or misleading. The assistant should be instructed to treat snippets as evidence, not as guaranteed truth.
  • The tool must not become a generic web crawler. Keep the first version intentionally narrow.
  • Budget controls matter. Add rate limiting and optional daily caps per organization.

Done When

  • Admins can configure one search provider.
  • The assistant can answer freshness-sensitive prompts with links and source names.
  • Search tool usage appears clearly in the chat UI.
  • Search failures degrade gracefully without breaking the whole run.

2. AI Use-Browser

Goal

Let the assistant inspect page contents after search, not just cite search snippets.

Current Touchpoints

  • backend/internal/ws
  • backend/internal/chat/service.go
  • frontend/src/hooks/useStreamMessage.ts
  • frontend/src/routes/_authenticated/chats/$chatId.tsx

Approach

  • Phase 1 should be a fetch-and-extract browser, not a full remote-controlled browser.
  • Only support http and https URLs.
  • Start with three tools:
  • browser_open(url)
  • browser_extract(session_id, selector_or_mode)
  • browser_find(session_id, pattern)
  • Defer click automation, forms, screenshots, and authenticated browsing until the fetch-based version is stable.
  • Require browser runs to use the universal TODO/checkpoint system for multi-step tasks so the model can plan, track, and verify progress before answering.

Execution Model

  • Treat browser use as a guided investigation, not as a loose sequence of fetch calls.
  • A browser-enabled run should follow this shape:
  • create a short TODO list from the user goal
  • perform search or open-page actions against the active TODO
  • extract evidence from pages
  • mark TODOs done, blocked, or still open
  • check whether the answer criteria are satisfied before producing the final response
  • If the model cannot satisfy all required TODOs, it should explicitly report which checkpoints remain unresolved.

TODO Checkpoints

  • Browser use should consume the universal run-scoped TODO support, not define its own isolated task model.
  • The TODO system can be exposed as small tools, internal orchestration primitives, or both. A practical first version would include:
  • todo_write(items)
  • todo_list()
  • todo_complete(id, note)
  • todo_check(id, evidence)
  • These TODO primitives should be available to any agentic run, not just browser-enabled ones.
  • Tool steps should be able to reference active TODO IDs so one checklist can chain across search, browser, file-read, artifact creation, and later design-editing steps.
  • The assistant prompt should encourage TODO usage whenever:
  • the user asks for comparison across multiple pages
  • the answer needs verification from more than one source
  • the browsing task has multiple subquestions
  • the answer requires more than one tool family
  • The frontend should render these TODOs as a compact checklist within the run UI so users can see what the browser agent is trying to prove.

Session and State Model

  • Add a browser session model tied to a chat run.
  • Store session state either in a short-lived database table or a TTL cache with fields such as:
  • session_id
  • chat_run_id
  • current_url
  • page_title
  • status
  • content_hash
  • created_at
  • expires_at
  • Store browser step logs separately from final message content so retries and audits remain possible.
  • Add run-scoped TODO records such as chat_run_todos with status, notes, evidence references, display order, and optional parent-child relationships.
  • Keep TODOs attached to the run, not to the browser session, so the same checklist can continue across other tool calls in the same task.

Backend Implementation

  • Add a package such as backend/internal/browser.
  • Model the browser layer as an ephemeral session service that stores sanitized page state in memory or a short-lived cache.
  • browser_open should fetch the page, enforce URL validation, strip scripts, run readability extraction, and keep both raw HTML and extracted text in a capped session object.
  • browser_extract should return a compact text slice, not the entire page.
  • browser_find should search within the extracted text and return matching sections with offsets or headings.
  • Add a browser coordinator that associates the active browser step with the active TODO so every page fetch has a reason.
  • Save extracted evidence references, such as URL plus heading plus text range, so TODO completion can point to concrete proof.
  • Normalize all page text as untrusted input. The prompt layer should explicitly tell the model not to obey instructions embedded in fetched content.
  • Add SSRF protections:
  • reject private IP ranges
  • reject loopback and link-local targets
  • follow limited redirects only
  • cap body size and request time
  • Add provider-agnostic content extraction stages:
  • raw fetch
  • readability extraction
  • section chunking
  • search indexing within the session
  • evidence reference generation
  • Defer JavaScript execution to a second implementation stage with a separate headless worker if the fetch-based model proves useful.
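The SSRF rules above can be sketched as a pre-flight check. This is a minimal version, assuming Go's standard `net` and `net/url` packages; a production check must also pin the resolved IP for the actual dial to prevent DNS rebinding, and the redirect, body-size, and timeout caps live on the HTTP client:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// validateTarget rejects unsafe URLs before any outbound request:
// http/https only, and no loopback, private, or link-local targets.
func validateTarget(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("only http and https are allowed")
	}
	ips, err := net.LookupIP(u.Hostname())
	if err != nil {
		return fmt.Errorf("dns lookup failed: %w", err)
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
			ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
			return fmt.Errorf("blocked address: %s", ip)
		}
	}
	return nil
}
```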

Transport and Streaming

  • Browser actions will create longer, more structured runs than simple file search.
  • Emit dedicated events for browser steps: browser_start, browser_result, browser_error.
  • Emit TODO events too, such as todo_added, todo_completed, todo_blocked, and todo_checked.
  • Keep transport on SSE for now, but move the event schema toward a shared run event model so browser steps and tool steps use the same reducer on the frontend.
  • Tool events should be able to include TODO linkage, for example active TODO IDs or checkpoint references, so the UI can show why a step happened.

Frontend Implementation

  • Render browser actions as timeline cards in chat, similar to tool cards but with URL, page title, and extracted section labels.
  • Render the active TODO checklist near those browser cards so users can see the plan, current step, and finished checkpoints.
  • Show a clear distinction between search results and opened-page content.
  • Add a compact preview for visited URLs and a way to expand extracted content without flooding the main message bubble.
  • Add a "verified from" affordance so completed TODOs can link back to evidence snippets from the browser session.
  • Make the checklist component reusable outside browser use so later agentic flows can render the same TODO UI.

Risks and Guardrails

  • Prompt injection from arbitrary websites is the main risk. Treat fetched page text as hostile.
  • Some pages will require JavaScript. Do not block launch on that. Mark them unsupported in v1 and plan a headless worker later if adoption justifies it.
  • Network access must be observable. Log URL, domain, status, size, and duration.
  • TODO misuse is another risk. The system should avoid giant planning lists by capping TODO count and encouraging focused checkpoints.

Done When

  • The assistant can open a search result and quote or summarize page content with source attribution.
  • Unsafe URLs are blocked before any outbound request is made.
  • Browser failures surface as tool errors, not broken chat streams.
  • Multi-step browser runs create and complete TODO checkpoints before finalizing the answer.

3. Better Artifact System / Reusable Artifact UI

Goal

Make the artifact system feel coherent, reusable, and useful across many output types instead of feeling like a collection of disconnected special cases.

Why This Pass Sits After Browser-Use

Browser use will immediately create pressure for better saved outputs:

  • research packs
  • cited summaries
  • browser evidence captures
  • generated dashboards
  • spreadsheets
  • design scenes

If the artifact layer remains inconsistent, each of those use cases will keep adding one-off UI and storage behavior.

Current Problems

  • The current artifact layer already supports text, visual HTML, and spreadsheet artifacts, but the UI is split between chat-specific sidebar behavior and a separate workspace view.
  • Artifact interactions are shaped around storage type more than user intention.
  • There is no unified artifact card model for "preview, inspect, version history, related chat, export, and continue editing."
  • Browser-use output will need a way to save structured research evidence without forcing everything into raw chat text.

Current Touchpoints

  • backend/internal/artifacts
  • backend/internal/capability/artifacts.go
  • frontend/src/components/artifacts/ArtifactWorkspace.tsx
  • frontend/src/components/artifacts/ChatArtifactSidebar.tsx
  • frontend/src/components/artifacts/ArtifactVisualFrame.tsx

Approach

  • Keep the existing artifact storage and versioning primitives, but introduce a clearer artifact product model:
  • artifact kind
  • artifact summary
  • artifact preview data
  • artifact actions
  • artifact relations
  • Standardize the artifact UI around three reusable surfaces:
  • inline artifact card in chat
  • sidebar inspector
  • full workspace
  • Make those surfaces kind-aware but structurally consistent.
  • Add a browser-capture artifact kind for saved research outputs after browser-use lands.

Backend Implementation

  • Add a richer artifact view model so the API can return normalized metadata across text, visual, spreadsheet, browser-capture, and future scene artifacts.
  • Add artifact relation support where useful, for example:
  • artifact created from chat run
  • artifact derived from browser session
  • artifact based on another artifact version
  • Add summary-generation hooks per artifact kind so the UI does not need to infer everything from filename and MIME type.
  • Ensure artifact version records can carry structured change summaries and optional evidence references.
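A rough sketch of the normalized view model the API could return. All field and kind names here are hypothetical placeholders, not the current `backend/internal/artifacts` schema:

```go
package main

// ArtifactView is the normalized metadata returned for every artifact
// kind, so the frontend stops inferring behavior from filename and MIME type.
type ArtifactView struct {
	ID            string
	Kind          string // e.g. "text", "visual", "spreadsheet", "browser_capture"
	Title         string
	Summary       string // produced by a per-kind summary-generation hook
	LatestVersion int
	Actions       []string // e.g. "open", "export", "continue"
	Relations     []ArtifactRelation
}

// ArtifactRelation records provenance: where the artifact came from.
type ArtifactRelation struct {
	Type     string // "chat_run", "browser_session", "artifact_version"
	TargetID string
}

// PrimaryAction gives every card a consistent default affordance.
func (v ArtifactView) PrimaryAction() string {
	if len(v.Actions) == 0 {
		return "open"
	}
	return v.Actions[0]
}
```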

Frontend Implementation

  • Redesign artifact cards so every artifact consistently shows title, kind, latest version, source context, primary action, and available secondary actions.
  • Use the same mental model everywhere:
  • open
  • preview
  • inspect versions
  • export
  • continue working
  • Add kind-specific tabs only after the shared structure is stable.
  • Unify the current chat sidebar and full workspace interactions so learning one artifact surface teaches the user the others.
  • Add a clearer empty state and navigation model for "no artifact selected," "artifact loading," and "artifact type not previewable."

Reusable Use Cases

This pass should support many future outputs without another UI rewrite:

  • browser research pack
  • markdown report
  • generated app or HTML experience
  • spreadsheet workbook
  • chart collection
  • design scene
  • internal notes or checklists

Integration with Browser-Use

  • Browser runs should be able to save a "research artifact" that contains source list, evidence snippets, TODO completion state, and optional summary text.
  • Artifact cards for browser-derived outputs should link back to the run or evidence set that created them.
  • This creates a clean bridge between "the AI investigated something" and "the user can keep working with the result."

Integration with Universal TODOs

  • Artifact-producing runs should be able to embed or reference the checklist that led to the output.
  • A report artifact, browser research artifact, or design artifact should be able to show which TODOs were completed, which evidence was used, and which items remain open.
  • This keeps TODOs from being transient debug state only; they become part of the reusable workflow when that helps the user.

Done When

  • Artifacts feel like one system instead of several separate UIs.
  • The same artifact interaction model works across text, browser, spreadsheet, visual, and future design outputs.
  • Browser-use can save reusable outputs without inventing a new side-channel UI.

4. Better Charts / Better Chart Design

Goal

Move from basic bar and pie output to a chart system that looks intentional and supports more analytical use cases.

Current Touchpoints

  • backend/internal/capability/chart_visualization.go
  • frontend/src/components/charts/ToolChart.tsx
  • frontend/src/routes/_authenticated/chats/$chatId.tsx

Approach

  • Keep show_bar_chart and show_pie_chart working for compatibility.
  • Add a new generic chart tool, for example render_chart, with a typed schema.
  • Support at least:
  • bar
  • stacked bar
  • line
  • area
  • pie / donut
  • scatter
  • table fallback

Schema Direction

The long-term chart schema should look more like this:

```json
{
  "type": "line",
  "title": "Revenue by Month",
  "subtitle": "Q1 to Q4",
  "x_field": "month",
  "series": [
    { "name": "Actual", "color": "#0f766e", "data": [{ "x": "Jan", "y": 120 }] },
    { "name": "Target", "color": "#2563eb", "data": [{ "x": "Jan", "y": 100 }] }
  ],
  "y_axis": { "label": "USD", "format": "currency" },
  "annotations": [{ "label": "Launch", "x": "Apr" }]
}
```

Backend Implementation

  • Add a new chart capability file instead of continuing to overload the existing bar and pie schema.
  • Centralize chart argument validation so bad tool payloads fail early and cleanly.
  • Encourage the model to choose chart type based on analytical intent rather than hard-coded chart names.
  • Optionally add a helper that converts small chart payloads into downloadable artifact snapshots later.
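Centralized validation could look like the sketch below. The payload shape mirrors the schema above; the type-name strings in `supportedTypes` are assumptions, since the roadmap only lists the chart families:

```go
package main

import (
	"errors"
	"fmt"
)

// ChartPayload is the minimal shape of a render_chart tool call.
type ChartPayload struct {
	Type   string
	Title  string
	Series []Series
}

type Series struct {
	Name string
	Data []Point
}

type Point struct {
	X string
	Y float64
}

var supportedTypes = map[string]bool{
	"bar": true, "stacked_bar": true, "line": true, "area": true,
	"pie": true, "donut": true, "scatter": true, "table": true,
}

// Validate rejects bad tool payloads early, so the frontend can fall
// back to a readable table instead of rendering a blank box.
func (c ChartPayload) Validate() error {
	if !supportedTypes[c.Type] {
		return fmt.Errorf("unsupported chart type %q", c.Type)
	}
	if len(c.Series) == 0 {
		return errors.New("chart needs at least one series")
	}
	for _, s := range c.Series {
		if len(s.Data) == 0 {
			return fmt.Errorf("series %q has no data points", s.Name)
		}
	}
	return nil
}
```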

Frontend Implementation

  • Replace the current one-off rendering logic in ToolChart.tsx with a renderer registry keyed by chart type.
  • Introduce shared chart theme tokens for spacing, typography, gridline color, tooltip layout, legend behavior, and mobile sizing.
  • Add better empty states, long-label handling, numeric formatting, and responsiveness.
  • Add download actions for PNG and CSV where possible.
  • Add table fallback when a chart cannot render or the data shape is invalid.

Design Direction

  • The visual language should match the rest of BoxedAI rather than generic chart-library defaults.
  • Use deliberate palettes, restrained gridlines, readable legends, and chart-specific typography.
  • Optimize for a clean analytical look, not just "more colors."

Done When

  • The assistant can produce charts for comparisons, trends, and distributions.
  • Tool cards render charts consistently on desktop and mobile.
  • Invalid chart payloads degrade to a readable table instead of a blank box.

5. Semi-Figma MCP / Design Studio

Goal

Turn the current visualization artifact flow into a more reliable design system for dashboards, mockups, presentations, and richer generated interfaces.

Product Direction

The right mental model is not "let the model write better raw HTML." The right model is "let the model manipulate a structured scene format that the app can render, edit, and version."

This should be inspired by tools like Bricks in the sense of reusable templates, block-based composition, and "refresh data without rebuilding the layout," but it should be implemented in a way that fits BoxedAI's artifact architecture.

Current Touchpoints

  • backend/internal/capability/artifacts.go
  • backend/internal/artifacts
  • frontend/src/components/artifacts/ArtifactWorkspace.tsx
  • frontend/src/components/artifacts/ArtifactVisualFrame.tsx

Approach

  • Keep create_visualization for backwards compatibility.
  • Add a new artifact format such as boxedai.scene.v1.
  • Introduce scene-aware tools instead of relying only on raw HTML:
  • create_design_scene
  • edit_design_scene
  • read_design_scene
  • Support a constrained block set first:
  • frame
  • text
  • card
  • chart
  • table
  • image
  • metric
  • filter control

Data Model Direction

Use a structured envelope similar to:

```json
{
  "format": "boxedai.scene.v1",
  "frame": { "width": 1440, "height": 900, "background": "theme.surface" },
  "theme": { "name": "boxed-analytics" },
  "blocks": [
    {
      "id": "hero_1",
      "type": "metric",
      "x": 48,
      "y": 48,
      "w": 280,
      "h": 160,
      "props": { "label": "Revenue", "value": "$1.2M" }
    }
  ],
  "bindings": []
}
```

Backend Implementation

  • Add a new artifact kind and content adapter for scene artifacts in backend/internal/artifacts.
  • Build a scene compiler that turns the structured scene into safe HTML for preview.
  • Keep the source of truth as the scene JSON, not the compiled HTML.
  • Extend artifact versioning so scene edits create clean change summaries.
  • Reuse existing artifact_updated events so the chat UI and artifact workspace stay synchronized.
  • Once the scene model is stable, expose it through an MCP-compatible surface if external agents need to edit the same primitives. Do not start with MCP as the storage format.

Frontend Implementation

  • Add a scene renderer and editor mode inside ArtifactWorkspace.
  • Add block selection, drag, resize, duplicate, reorder, snap-to-grid, and alignment helpers.
  • Add a right-side inspector for theme, spacing, typography, and data bindings.
  • Add a starter template gallery for common use cases:
  • KPI dashboard
  • report cover
  • product mockup
  • investor slide
  • data summary page

Integration with Other Roadmap Items

  • Chart improvements should feed into scene blocks so charts inside scenes use the same renderer and theme tokens.
  • Web search and browser tools can optionally capture cited content into a scene or report artifact.
  • Memory presets can influence brand rules, tone, and layout constraints for generated designs.

Risks and Guardrails

  • Raw HTML generation is too brittle for iterative editing. The scene format exists to avoid that trap.
  • Do not let arbitrary JavaScript become the primary extensibility path for generated designs.
  • Version diffs matter. Store enough metadata to explain what changed between scene versions.

Done When

  • The assistant can create and revise structured visual layouts instead of only free-form HTML artifacts.
  • Users can manually adjust generated layouts in the UI without losing the AI-generated structure.
  • One template can be refreshed with new data while preserving layout.

6. Backend: Reduce LoC for Encryption

Goal

Reduce duplicated encryption logic without changing the zero-knowledge model.

Current Problems

  • AES-GCM helpers exist in multiple places.
  • Artifact encryption, file encryption, message sealing, and search-index decryption each carry their own branching logic.
  • WebSocket crypto request flow is used indirectly from multiple packages with repeated glue code.
  • Encryption policy checks re-query organization state in multiple places.

Current Touchpoints

  • backend/internal/files/crypto.go
  • backend/internal/files/service.go
  • backend/internal/artifacts/tool_command_service.go
  • backend/internal/capability/file_read.go
  • backend/internal/capability/file_search.go
  • backend/internal/ws/hub.go

Approach

  • Add a shared package such as backend/internal/e2ee.
  • Move all AES-GCM helpers into that package.
  • Add shared helpers for:
  • org encryption policy lookup
  • browser crypto bridge access
  • encrypted key resolution
  • encrypt-if-enabled and decrypt-if-needed flows
  • Replace one-off decrypt helpers in file read and file search with shared functions.
  • Add a small context-bound service interface so artifacts, files, and capabilities do not need to know the details of ws.Hub directly.

Code Shape

The package should expose a small surface area, for example:

  • PolicyProvider
  • CryptoBridge
  • EncryptTextIfEnabled
  • DecryptTextIfNeeded
  • ResolveObjectKey
  • ResolveIndexKey
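A minimal sketch of the EncryptTextIfEnabled / DecryptTextIfNeeded pair using AES-GCM from the standard library. The nonce-prefixed ciphertext layout is an assumption for the sketch; the real package must match the existing ciphertext format so stored data stays readable:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// EncryptTextIfEnabled centralizes the encrypt-if-enabled branch so
// feature packages stop carrying their own copies of it.
func EncryptTextIfEnabled(enabled bool, key, plaintext []byte) ([]byte, error) {
	if !enabled {
		return plaintext, nil
	}
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Layout assumption: nonce || sealed data.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// DecryptTextIfNeeded is the mirror helper.
func DecryptTextIfNeeded(enabled bool, key, data []byte) ([]byte, error) {
	if !enabled {
		return data, nil
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, sealed := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, sealed, nil)
}
```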

Follow-On Cleanup

  • Move encryption decision branching out of handlers and into service-layer helpers.
  • Standardize error messages so frontend code can react consistently to "key not available" vs "ciphertext invalid" vs "ws unavailable".
  • Add package-level tests that cover ciphertext format once, rather than repeating similar tests in multiple feature packages.

Done When

  • Encryption-related logic is centralized and easier to audit.
  • New encrypted features no longer need custom helper code.
  • Total encryption-related LoC drops because duplicate utilities are removed, not because behavior is deleted.

7. Backend: Improve Streaming System

Goal

Replace the current fragile stream lifecycle with a run-based system that supports tool steps, browser steps, retries, cancellation, and better observability.

Current Problems

  • The current flow inserts an empty assistant message before generation and patches it later.
  • Tool-call deltas are handled with a "clear streamed garbage" workaround.
  • SSE payloads are message-centric, not run-centric.
  • Cancellation exists client-side but not as a first-class server run state.
  • Browser tooling and reasoning summaries will make the current event model harder to maintain.

Current Touchpoints

  • backend/internal/chat/service.go
  • backend/internal/chat/events.go
  • backend/internal/chat/handler.go
  • frontend/src/hooks/useStreamMessage.ts
  • frontend/src/services/api/chats.ts

Approach

  • Add a chat_runs table with fields like:
  • id
  • chat_id
  • user_id
  • status
  • started_at
  • completed_at
  • error_code
  • model
  • reasoning_effort
  • Optionally add a chat_run_todos table for run-scoped checkpoints and progress notes.
  • Optionally add a chat_run_steps table for tool invocations, browser actions, and citations.
  • Introduce a RunManager package to own lifecycle, persistence, and event emission.
  • Keep the final assistant message as a result of a completed run, not as the thing that defines the run.
  • Ensure chat_run_steps can reference related TODO IDs so any tool call can be chained to the checklist.

Event Model

Move toward typed run events such as:

  • run_started
  • run_status
  • todo_added
  • todo_completed
  • todo_blocked
  • todo_checked
  • text_delta
  • tool_started
  • tool_completed
  • browser_started
  • browser_completed
  • citation_added
  • run_failed
  • run_completed

Transport Recommendation

  • Keep SSE as the first transport to minimize disruption.
  • Add run IDs and event sequence IDs so the frontend can resume or deduplicate events.
  • Revisit transport unification with WebSocket only after the event schema is stable.

Frontend Implementation

  • Replace the current ad hoc state updates in useStreamMessage.ts with a reducer keyed by run_id.
  • Render status, TODO checkpoints, tool calls, browser steps, and citations from the same event stream.
  • Add a persistent checklist panel or inline checklist block in the chat run UI. This is required work, because the current UI does not visibly expose a TODO checklist.
  • Add explicit resume and retry flows for interrupted runs.
  • Add latency metrics in the UI such as time to first token and total run duration for debugging.

Operational Requirements

  • Add heartbeat events to keep long responses alive through proxies.
  • Add cancel support that marks the run canceled server-side and stops downstream provider work.
  • Add structured logs and metrics for every run state transition.

Done When

  • A run has a stable lifecycle independent of the final message row.
  • Tool and browser events no longer rely on stream-clearing hacks.
  • Browser and other agentic runs can expose TODO checkpoints and progress cleanly.
  • Users can cancel runs cleanly and recover from transient disconnects.

8. Backend: Reasoning Mode / Chain of Thought

Goal

Improve deep reasoning quality while avoiding raw chain-of-thought storage or exposure.

Important Product Decision

This feature should not be implemented as "show the model's full hidden chain of thought to the user." The safer and more maintainable version is:

  • allow higher reasoning effort where providers support it
  • optionally store a concise reasoning summary or answer plan
  • never persist hidden raw reasoning traces unless there is a very strong internal need and a separate security review

Current Touchpoints

  • backend/internal/chat/service.go
  • backend/internal/inference/provider.go
  • frontend/src/routes/_authenticated/chats/$chatId.tsx

Approach

  • Add a per-run reasoning setting: low, medium, high.
  • Map that setting to provider-specific controls where available.
  • Add an optional visible reasoning_summary field that contains a short explanation of the approach, not the full internal trace.

Backend Implementation

  • Extend inference provider abstractions so a run can request different reasoning effort without provider-specific logic leaking into chat orchestration.
  • Add reasoning settings into the new prompt builder rather than hard-coding more prompt branches into chat/service.go.
  • Persist reasoning metadata on the run and optionally on the final assistant message.
  • Add guardrails so reasoning summaries are generated after the answer, not as a replacement for the answer.
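The mapping from the product-level setting to provider controls can live entirely in the adapter. A sketch, where `ProviderOptions` and its fields are hypothetical stand-ins for whatever the real inference abstraction sends upstream:

```go
package main

import "fmt"

// Effort is the per-run reasoning setting from the roadmap.
type Effort string

const (
	EffortLow    Effort = "low"
	EffortMedium Effort = "medium"
	EffortHigh   Effort = "high"
)

// ProviderOptions stands in for the adapter's upstream request options.
type ProviderOptions struct {
	ReasoningEffort string
}

// applyEffort maps the product setting to provider options so chat
// orchestration never sees vendor-specific knobs. When the provider
// does not support effort levels, the request is passed through and
// the run metadata records that effort was requested but unsupported.
func applyEffort(opts ProviderOptions, e Effort, supported bool) (ProviderOptions, error) {
	if !supported {
		return opts, nil
	}
	switch e {
	case EffortLow, EffortMedium, EffortHigh:
		opts.ReasoningEffort = string(e)
		return opts, nil
	default:
		return opts, fmt.Errorf("unknown reasoning effort %q", e)
	}
}
```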

Frontend Implementation

  • Add a composer toggle or per-chat setting for reasoning mode.
  • Display a small "Reasoned" badge or effort indicator on completed runs.
  • If a reasoning summary exists, show it behind an expandable panel labeled clearly as a summary of approach.

Risks and Guardrails

  • Higher reasoning effort increases cost and latency. Make it explicit in the UI.
  • The product should not imply that hidden reasoning is complete, always correct, or user-auditable.
  • Treat reasoning summary as user-visible content subject to the same quality bar as any answer.

Done When

  • Users can request deeper reasoning on selected runs.
  • Providers that support reasoning settings can use them.
  • The product exposes only concise reasoning summaries, not raw hidden chain-of-thought logs.

9. Memory Presets

Goal

Let users and organizations store reusable guidance such as "When I ask for investor updates, use this format" or "Always answer compliance questions with these rules."

Current Touchpoints

  • backend/internal/chat/service.go
  • frontend/src/routes/_authenticated/chats/$chatId.tsx
  • frontend/src/hooks/useAuth.ts

Product Direction
  • Support three scopes:
      • user preset
      • organization preset
      • chat-attached preset
  • Support three trigger modes in v1:
      • always on
      • keyword contains
      • manually selected in the composer
  • Defer regex and semantic matching until later.
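The three v1 trigger modes can be decided by one small pure function. This is a minimal sketch under assumed names (the mode strings and `Preset` fields are illustrative, not an existing schema):

```go
package main

import (
	"fmt"
	"strings"
)

// Preset carries only the fields needed for trigger evaluation; names are
// illustrative, not the real data model.
type Preset struct {
	Name         string
	TriggerMode  string // "always", "keyword", or "manual" (assumed values)
	TriggerValue string // keyword for "keyword" mode
	Enabled      bool
}

// presetApplies decides whether a preset should be injected for a message.
// Manual presets only apply when the user selected them in the composer.
func presetApplies(p Preset, message string, manuallySelected bool) bool {
	if !p.Enabled {
		return false
	}
	switch p.TriggerMode {
	case "always":
		return true
	case "keyword":
		// Case-insensitive substring match; regex/semantic modes are deferred.
		return strings.Contains(strings.ToLower(message), strings.ToLower(p.TriggerValue))
	case "manual":
		return manuallySelected
	default:
		return false // unknown (future) modes are ignored in v1
	}
}

func main() {
	p := Preset{Name: "investor-update", TriggerMode: "keyword", TriggerValue: "investor update", Enabled: true}
	fmt.Println(presetApplies(p, "Draft the Q3 Investor Update", false))
}
```

Keeping this a pure function also makes the "preview which presets will apply" UI cheap: the frontend can call the same logic via an endpoint before sending.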

Data Model

Add tables such as:

  • memory_presets
      • id
      • organization_id
      • user_id (nullable)
      • scope
      • name
      • instruction
      • trigger_mode
      • trigger_value
      • priority
      • enabled
      • created_at
      • updated_at

Optionally add a join table if chats can have multiple manually attached presets.
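In Go terms, the proposed table could map to a struct along these lines. This is a sketch of the column list above, not an existing schema; field types are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// MemoryPreset mirrors the proposed memory_presets table.
type MemoryPreset struct {
	ID             string
	OrganizationID string
	UserID         *string // nullable: nil for organization-scoped presets
	Scope          string  // "user", "organization", or "chat" (assumed values)
	Name           string
	Instruction    string
	TriggerMode    string // "always", "keyword", or "manual"
	TriggerValue   string
	Priority       int
	Enabled        bool
	CreatedAt      time.Time
	UpdatedAt      time.Time
}

func main() {
	p := MemoryPreset{Scope: "organization", Name: "compliance-rules", Enabled: true}
	fmt.Println(p.Name)
}
```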

Backend Implementation

  • Add a preset service that loads relevant presets for a user and chat, ranks them, and returns the instructions to the prompt builder.
  • Apply presets as structured system instructions, not string concatenation scattered through handlers.
  • Add collision rules:
      • platform safety instructions win
      • organization presets come before user presets
      • user presets come before chat-local context
  • Record which presets were applied on a run for debugging and auditability.
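The collision rules above amount to a deterministic sort: precedence by scope first, then by the preset's own priority. A minimal sketch, assuming hypothetical scope labels and that lower priority numbers apply earlier:

```go
package main

import (
	"fmt"
	"sort"
)

// scopeRank encodes the precedence from the roadmap: platform safety first,
// then organization presets, then user presets, then chat-local context.
var scopeRank = map[string]int{"platform": 0, "organization": 1, "user": 2, "chat": 3}

type AppliedPreset struct {
	Name     string
	Scope    string
	Priority int // assumption: lower number = applied earlier within a scope
}

// orderPresets returns a deterministic application order so the same inputs
// always produce the same system-instruction layout.
func orderPresets(presets []AppliedPreset) []AppliedPreset {
	out := append([]AppliedPreset(nil), presets...)
	sort.SliceStable(out, func(i, j int) bool {
		if scopeRank[out[i].Scope] != scopeRank[out[j].Scope] {
			return scopeRank[out[i].Scope] < scopeRank[out[j].Scope]
		}
		return out[i].Priority < out[j].Priority
	})
	return out
}

func main() {
	ps := []AppliedPreset{
		{Name: "my-tone", Scope: "user", Priority: 1},
		{Name: "org-compliance", Scope: "organization", Priority: 5},
	}
	for _, p := range orderPresets(ps) {
		fmt.Println(p.Name)
	}
}
```

Using a stable sort means ties keep their load order, which keeps the "which presets were applied" audit record reproducible.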

Frontend Implementation

  • Add a settings UI for creating and editing presets.
  • Add lightweight preset selection in the chat composer for manual presets.
  • Add a preview area that shows which presets will apply before sending.
  • Show applied presets somewhere in the run details so behavior is explainable.

Risks and Guardrails

  • Too many presets can create prompt bloat. Add priority caps and content length limits.
  • Presets can conflict. The UI should surface precedence rather than pretending all rules are equal.
  • Avoid hidden behavior. Users should be able to see what preset logic was active.

Done When

  • Users can save and reuse prompt guidance without retyping it.
  • The backend applies presets deterministically.
  • Applied presets are visible and debuggable.

10. Login with Solana

Goal

Add wallet-based authentication for users who prefer Solana identity over email/password.

Product Direction

Do not tie login to on-chain state. This is a signed-message authentication flow, not a blockchain transaction flow.

Current Touchpoints

  • backend/internal/auth/service.go
  • backend/internal/auth/handler.go
  • frontend/src/routes/login.tsx
  • frontend/src/services/api/auth.ts

Phased Rollout
  • Phase 1: link a Solana wallet to an existing BoxedAI account.
  • Phase 2: allow wallet-first sign-in for linked accounts.
  • Phase 3: optionally support invited wallet-first account creation if the org model needs it.

Backend Implementation

  • Add a table such as wallet_identities with:
      • id
      • user_id
      • chain
      • public_key
      • label
      • is_primary
  • Add a short-lived challenge table such as auth_challenges with:
      • id
      • purpose
      • public_key
      • nonce
      • expires_at
      • used_at
  • Add endpoints such as:
      • POST /api/auth/solana/challenge
      • POST /api/auth/solana/verify
      • POST /api/auth/solana/link
  • Reuse the existing session creation flow after signature verification succeeds.
  • Bind the signed statement to the domain, nonce, issued-at time, expiration, and intended action.

Frontend Implementation

  • Add a Solana wallet button to frontend/src/routes/login.tsx.
  • Use Solana wallet adapter libraries for connection and signature prompts.
  • Keep the UX explicit: connect wallet, sign challenge, receive a normal BoxedAI session token.
  • Add wallet management under account settings so users can link, unlink, and rotate wallets.

Security Requirements

  • Nonces must be single-use and short-lived.
  • The statement must include the BoxedAI domain and intended action to reduce phishing and replay risk.
  • Signature verification must happen server-side.
  • Wallet login should be feature-flagged so organizations can disable it if they do not want wallet auth.

Done When

  • Existing users can link a Solana wallet and sign in with it.
  • The same session and authorization model continues to work after wallet verification.
  • Wallet auth can be disabled without affecting email/password login.

Suggested Delivery Order in Practice

If this roadmap is executed by a small team, the most pragmatic order is:

  1. Extract prompt builder, run model, and universal TODO/checkpoint system.
  2. Refactor encryption internals.
  3. Upgrade chart system.
  4. Ship web search.
  5. Ship fetch-based browser tools.
  6. Do the artifact system and artifact UI cleanup pass.
  7. Build scene-based design artifacts.
  8. Add memory presets.
  9. Add reasoning mode.
  10. Add Solana wallet linking and wallet login.

That sequence minimizes rework because the later features all benefit from better prompt composition, cleaner streaming, explicit TODO checkpoints for agentic work, and stronger artifact primitives.