Product Roadmap

This page turns the current backlog into an implementation roadmap grounded in the codebase that exists today. It assumes the current chat stack in backend/internal/chat, tool registry in backend/internal/capability, artifact system in backend/internal/artifacts, auth stack in backend/internal/auth, realtime stack split across backend/internal/ws and chat SSE, and the Vite frontend in frontend/src remain the foundation.

Planning Principles

  • Prefer additive extensions to the current architecture over large rewrites.
  • Keep zero-knowledge guarantees intact. "Less code" for encryption should mean less duplication, not weaker security.
  • Favor structured outputs over raw free-form HTML or JSON whenever the model must edit something repeatedly.
  • Separate provider adapters from product features so search, browser, wallet, and reasoning vendors can change without touching chat orchestration.
  • Land observability and rollout controls with each feature instead of treating them as follow-up work.

The roadmap falls into six phases, ordered by dependency:

  • Phase 1 (Platform foundation): Reduce LoC for Encryption, Improve Streaming System, Better Charts. These are shared foundations for almost every other item.
  • Phase 2 (External research tools): Web Search Prototype, AI Use-Browser. Search is the simplest way to add fresh external knowledge; browser automation should build on that.
  • Phase 3 (Artifact platform): Better Artifact System, Reusable Artifact UI. Browser use will produce richer outputs; the artifact layer should be cleaned up before pushing further into design-generation features.
  • Phase 4 (Visual creation): Semi-Figma MCP / Design Studio. This should build on a stronger artifact model and a more coherent artifact workspace.
  • Phase 5 (Personalization and reasoning): Memory Presets, Reasoning Mode / Chain of Thought. Both features need a cleaner prompt assembly pipeline and clearer run metadata.
  • Phase 6 (Access and identity): Login with Solana. Wallet auth is isolated enough to ship later once core product workflows are stronger.

Shared Refactors Before Feature Work

Two cross-cutting refactors should happen before the larger roadmap items:

  • Extract prompt construction out of backend/internal/chat/service.go into a new package such as backend/internal/prompt. That package should compose base instructions, enabled capability guidance, artifact context, file context, memory presets, and reasoning settings in a deterministic order.
  • Introduce a run-oriented abstraction for chat responses. The current system stores an empty assistant row, streams plaintext over SSE, and patches the final message at the end. A chat_runs abstraction makes streaming, reasoning, browser steps, citations, and retries much easier to manage.
  • Upgrade the existing planning/TODO behavior into a universal run-level checkpoint system. Do not build a browser-only TODO feature. It should work across browser use, search, file tools, artifacts, spreadsheet generation, design tools, and any future tool call chain.

Universal TODO / Checkpoint System

  • Treat TODOs as a first-class run primitive, not a special tool for one feature area.
  • The TODO system should be able to:
  • create a checklist at the beginning of a run
  • associate one or more TODO items with any tool call
  • record evidence, notes, and completion status
  • expose unresolved items before the run finalizes
  • allow follow-up runs to continue an incomplete checklist when appropriate
  • If an existing TODO mechanism already exists in the product, this roadmap should upgrade and generalize it rather than introduce a second competing system.
  • The current user-facing gap is not just backend logic. The chat UI still does not expose a visible TODO checklist; the current streaming flow primarily surfaces status text and tool call cards. The roadmap should treat checklist rendering as part of the core implementation, not as an optional enhancement.
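To make the checklist contract concrete, here is a minimal Go sketch of run-scoped TODO primitives. Every type, field, and status name is illustrative, not existing code in the repository; the roadmap only fixes the semantics (run-scoped items, evidence references, unresolved items surfaced before finalization).

```go
package main

// Status values a checklist item can move through; names are assumptions.
type Status string

const (
	StatusOpen    Status = "open"
	StatusDone    Status = "done"
	StatusBlocked Status = "blocked"
)

// Item is one checkpoint attached to a chat run, not to a single tool,
// so the same checklist can chain across search, browser, and artifact steps.
type Item struct {
	ID       string
	RunID    string
	Title    string
	Status   Status
	Note     string
	Evidence []string // e.g. "url#heading" references recorded by tool steps
	ParentID string   // optional parent for sub-checklists
	Order    int
}

// Checklist groups the items for one run.
type Checklist struct {
	RunID string
	Items []Item
}

// Unresolved returns the items that must be surfaced before the run
// finalizes: anything not yet completed, including blocked items.
func (c *Checklist) Unresolved() []Item {
	var out []Item
	for _, it := range c.Items {
		if it.Status != StatusDone {
			out = append(out, it)
		}
	}
	return out
}
```

A follow-up run that continues an incomplete checklist would load the same `Checklist` by run lineage and start from `Unresolved()`.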

1. Web Search Prototype

Goal

Give the assistant a low-risk, citation-friendly way to answer questions that require fresh or external information.

Current Touchpoints

  • backend/internal/capability/registry.go
  • backend/internal/inference/executor.go
  • backend/internal/chat/service.go
  • frontend/src/routes/_authenticated/chats/$chatId.tsx

Approach

  • Start with a single tool: web_search.
  • Return normalized search hits only. Do not fetch full page content in the first pass.
  • Cap the tool at 3-5 results, each carrying title, URL, snippet, source domain, and an optional published timestamp.
  • Add org-level or platform-level search provider configuration in admin.

Backend Implementation

  • Add a new package such as backend/internal/websearch.
  • Define a provider interface with a single Search(ctx, query, options) method and a normalized result type.
  • Implement one provider first. Tavily, Brave Search API, SerpAPI, or Exa are all reasonable. The code should not assume a specific vendor outside the provider adapter.
  • Add a capability file such as backend/internal/capability/web_search.go.
  • Register web_search in backend/internal/capability/registry.go.
  • Update prompt construction so the model is told to use web_search for fresh information, public facts, and citation-worthy answers.
  • Add caching for identical queries with a short TTL to reduce cost and rate-limit pressure.
  • Add allowlist and denylist support for domains at the provider config layer.
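A sketch of what the provider seam could look like. The interface shape and field names are assumptions for illustration; the real `backend/internal/websearch` package would define its own. The domain filter is shown as a post-call helper so allowlist/denylist guardrails never depend on vendor features:

```go
package main

import (
	"context"
	"time"
)

// Result is the normalized hit every provider adapter returns.
type Result struct {
	Title       string
	URL         string
	Snippet     string
	Domain      string
	PublishedAt *time.Time // optional; many providers omit it
}

// Options keeps per-call knobs out of the interface signature.
type Options struct {
	MaxResults int      // the roadmap caps this at 3-5
	Allowlist  []string // domains permitted by provider config
	Denylist   []string // domains rejected by provider config
}

// Provider is the single seam between web_search and any vendor
// (Tavily, Brave, SerpAPI, Exa). Nothing outside the adapter should
// know which vendor is behind it.
type Provider interface {
	Search(ctx context.Context, query string, opts Options) ([]Result, error)
}

// filterDomains applies the allow/deny rules after the vendor call.
func filterDomains(hits []Result, opts Options) []Result {
	denied := map[string]bool{}
	for _, d := range opts.Denylist {
		denied[d] = true
	}
	allowed := map[string]bool{}
	for _, d := range opts.Allowlist {
		allowed[d] = true
	}
	var out []Result
	for _, h := range hits {
		if denied[h.Domain] {
			continue
		}
		if len(allowed) > 0 && !allowed[h.Domain] {
			continue
		}
		out = append(out, h)
	}
	return out
}
```

Caching identical queries with a short TTL would wrap `Provider.Search` in a decorator, keeping the cache invisible to the capability layer.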

Data and API

  • Add a new config table for search provider settings, or extend platform settings if search will always be global.
  • Expose admin endpoints to read and update search config, for example:
  • GET /api/admin/search-config
  • PUT /api/admin/search-config
  • POST /api/admin/search-config/test
  • Do not store raw search result pages in the database for the prototype. Persist only config and optional short-lived cache data.

Frontend Implementation

  • Add an admin page section for search provider selection, API key, default result count, and domain guardrails.
  • Extend the chat UI so web_search tool calls render source cards cleanly instead of dumping raw JSON.
  • Add explicit citation chips or a "Sources" block in the final assistant message UI when a run included search results.

Risks and Guardrails

  • Search snippets can be stale or misleading. The assistant should be instructed to treat snippets as evidence, not as guaranteed truth.
  • The tool must not become a generic web crawler. Keep the first version intentionally narrow.
  • Budget controls matter. Add rate limiting and optional daily caps per organization.

Done When

  • Admins can configure one search provider.
  • The assistant can answer freshness-sensitive prompts with links and source names.
  • Search tool usage appears clearly in the chat UI.
  • Search failures degrade gracefully without breaking the whole run.

2. AI Use-Browser

Goal

Let the assistant inspect page contents after search, not just cite search snippets.

Current Touchpoints

  • backend/internal/ws
  • backend/internal/chat/service.go
  • frontend/src/hooks/useStreamMessage.ts
  • frontend/src/routes/_authenticated/chats/$chatId.tsx

Approach

  • Phase 1 should be a fetch-and-extract browser, not a full remote-controlled browser.
  • Only support http and https URLs.
  • Start with three tools:
  • browser_open(url)
  • browser_extract(session_id, selector_or_mode)
  • browser_find(session_id, pattern)
  • Defer click automation, forms, screenshots, and authenticated browsing until the fetch-based version is stable.
  • Require browser runs to use the universal TODO/checkpoint system for multi-step tasks so the model can plan, track, and verify progress before answering.

Execution Model

  • Treat browser use as a guided investigation, not as a loose sequence of fetch calls.
  • A browser-enabled run should follow this shape:
  • create a short TODO list from the user goal
  • perform search or open-page actions against the active TODO
  • extract evidence from pages
  • mark TODOs done, blocked, or still open
  • check whether the answer criteria are satisfied before producing the final response
  • If the model cannot satisfy all required TODOs, it should explicitly report which checkpoints remain unresolved.

TODO Checkpoints

  • Browser use should consume the universal run-scoped TODO support, not define its own isolated task model.
  • The TODO system can be exposed as small tools, internal orchestration primitives, or both. A practical first version would include:
  • todo_write(items)
  • todo_list()
  • todo_complete(id, note)
  • todo_check(id, evidence)
  • These TODO primitives should be available to any agentic run, not just browser-enabled ones.
  • Tool steps should be able to reference active TODO IDs so one checklist can chain across search, browser, file-read, artifact creation, and later design-editing steps.
  • The assistant prompt should encourage TODO usage whenever:
  • the user asks for comparison across multiple pages
  • the answer needs verification from more than one source
  • the browsing task has multiple subquestions
  • the answer requires more than one tool family
  • The frontend should render these TODOs as a compact checklist within the run UI so users can see what the browser agent is trying to prove.

Session and State Model

  • Add a browser session model tied to a chat run.
  • Store session state either in a short-lived database table or a TTL cache with fields such as:
  • session_id
  • chat_run_id
  • current_url
  • page_title
  • status
  • content_hash
  • created_at
  • expires_at
  • Store browser step logs separately from final message content so retries and audits remain possible.
  • Add run-scoped TODO records such as chat_run_todos with status, notes, evidence references, display order, and optional parent-child relationships.
  • Keep TODOs attached to the run, not to the browser session, so the same checklist can continue across other tool calls in the same task.

Backend Implementation

  • Add a package such as backend/internal/browser.
  • Model the browser layer as an ephemeral session service that stores sanitized page state in memory or a short-lived cache.
  • browser_open should fetch the page, enforce URL validation, strip scripts, run readability extraction, and keep both raw HTML and extracted text in a capped session object.
  • browser_extract should return a compact text slice, not the entire page.
  • browser_find should search within the extracted text and return matching sections with offsets or headings.
  • Add a browser coordinator that associates the active browser step with the active TODO so every page fetch has a reason.
  • Save extracted evidence references, such as URL plus heading plus text range, so TODO completion can point to concrete proof.
  • Normalize all page text as untrusted input. The prompt layer should explicitly tell the model not to obey instructions embedded in fetched content.
  • Add SSRF protections:
  • reject private IP ranges
  • reject loopback and link-local targets
  • follow limited redirects only
  • cap body size and request time
  • Add provider-agnostic content extraction stages:
  • raw fetch
  • readability extraction
  • section chunking
  • search indexing within the session
  • evidence reference generation
  • Defer JavaScript execution to a second implementation stage with a separate headless worker if the fetch-based model proves useful.
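The SSRF rules above can be sketched as a pre-flight check. This is a minimal version, assuming Go's standard `net` and `net/url` packages; a production check must also pin the resolved IP for the actual dial to prevent DNS rebinding, and the redirect, body-size, and timeout caps live on the HTTP client:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// validateTarget rejects unsafe URLs before any outbound request:
// http/https only, and no loopback, private, or link-local targets.
func validateTarget(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("only http and https are allowed")
	}
	ips, err := net.LookupIP(u.Hostname())
	if err != nil {
		return fmt.Errorf("dns lookup failed: %w", err)
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
			ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
			return fmt.Errorf("blocked address: %s", ip)
		}
	}
	return nil
}
```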

Transport and Streaming

  • Browser actions will create longer, more structured runs than simple file search.
  • Emit dedicated events for browser steps: browser_start, browser_result, browser_error.
  • Emit TODO events too, such as todo_added, todo_completed, todo_blocked, and todo_checked.
  • Keep transport on SSE for now, but move the event schema toward a shared run event model so browser steps and tool steps use the same reducer on the frontend.
  • Tool events should be able to include TODO linkage, for example active TODO IDs or checkpoint references, so the UI can show why a step happened.

Frontend Implementation

  • Render browser actions as timeline cards in chat, similar to tool cards but with URL, page title, and extracted section labels.
  • Render the active TODO checklist near those browser cards so users can see the plan, current step, and finished checkpoints.
  • Show a clear distinction between search results and opened-page content.
  • Add a compact preview for visited URLs and a way to expand extracted content without flooding the main message bubble.
  • Add a "verified from" affordance so completed TODOs can link back to evidence snippets from the browser session.
  • Make the checklist component reusable outside browser use so later agentic flows can render the same TODO UI.

Risks and Guardrails

  • Prompt injection from arbitrary websites is the main risk. Treat fetched page text as hostile.
  • Some pages will require JavaScript. Do not block launch on that. Mark them unsupported in v1 and plan a headless worker later if adoption justifies it.
  • Network access must be observable. Log URL, domain, status, size, and duration.
  • TODO misuse is another risk. The system should avoid giant planning lists by capping TODO count and encouraging focused checkpoints.

Done When

  • The assistant can open a search result and quote or summarize page content with source attribution.
  • Unsafe URLs are blocked before any outbound request is made.
  • Browser failures surface as tool errors, not broken chat streams.
  • Multi-step browser runs create and complete TODO checkpoints before finalizing the answer.

3. Better Artifact System / Reusable Artifact UI

Goal

Make the artifact system feel coherent, reusable, and useful across many output types instead of feeling like a collection of disconnected special cases.

Why This Pass Sits After Browser-Use

Browser use will immediately create pressure for better saved outputs:

  • research packs
  • cited summaries
  • browser evidence captures
  • generated dashboards
  • spreadsheets
  • design scenes

If the artifact layer remains inconsistent, each of those use cases will keep adding one-off UI and storage behavior.

Current Problems

  • The current artifact layer already supports text, visual HTML, and spreadsheet artifacts, but the UI is split between chat-specific sidebar behavior and a separate workspace view.
  • Artifact interactions are shaped around storage type more than user intention.
  • There is no unified artifact card model for "preview, inspect, version history, related chat, export, and continue editing."
  • Browser-use output will need a way to save structured research evidence without forcing everything into raw chat text.

Current Touchpoints

  • backend/internal/artifacts
  • backend/internal/capability/artifacts.go
  • frontend/src/components/artifacts/ArtifactWorkspace.tsx
  • frontend/src/components/artifacts/ChatArtifactSidebar.tsx
  • frontend/src/components/artifacts/ArtifactVisualFrame.tsx

Approach

  • Keep the existing artifact storage and versioning primitives, but introduce a clearer artifact product model:
  • artifact kind
  • artifact summary
  • artifact preview data
  • artifact actions
  • artifact relations
  • Standardize the artifact UI around three reusable surfaces:
  • inline artifact card in chat
  • sidebar inspector
  • full workspace
  • Make those surfaces kind-aware but structurally consistent.
  • Add a browser-capture artifact kind for saved research outputs after browser-use lands.

Backend Implementation

  • Add a richer artifact view model so the API can return normalized metadata across text, visual, spreadsheet, browser-capture, and future scene artifacts.
  • Add artifact relation support where useful, for example:
  • artifact created from chat run
  • artifact derived from browser session
  • artifact based on another artifact version
  • Add summary-generation hooks per artifact kind so the UI does not need to infer everything from filename and MIME type.
  • Ensure artifact version records can carry structured change summaries and optional evidence references.
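A rough sketch of the normalized view model the API could return. All field and kind names here are hypothetical placeholders, not the current `backend/internal/artifacts` schema:

```go
package main

// ArtifactView is the normalized metadata returned for every artifact
// kind, so the frontend stops inferring behavior from filename and MIME type.
type ArtifactView struct {
	ID            string
	Kind          string // e.g. "text", "visual", "spreadsheet", "browser_capture"
	Title         string
	Summary       string // produced by a per-kind summary-generation hook
	LatestVersion int
	Actions       []string // e.g. "open", "export", "continue"
	Relations     []ArtifactRelation
}

// ArtifactRelation records provenance: where the artifact came from.
type ArtifactRelation struct {
	Type     string // "chat_run", "browser_session", "artifact_version"
	TargetID string
}

// PrimaryAction gives every card a consistent default affordance.
func (v ArtifactView) PrimaryAction() string {
	if len(v.Actions) == 0 {
		return "open"
	}
	return v.Actions[0]
}
```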

Frontend Implementation

  • Redesign artifact cards so every artifact consistently shows title, kind, latest version, source context, primary action, and available secondary actions.
  • Use the same mental model everywhere:
  • open
  • preview
  • inspect versions
  • export
  • continue working
  • Add kind-specific tabs only after the shared structure is stable.
  • Unify the current chat sidebar and full workspace interactions so learning one artifact surface teaches the user the others.
  • Add a clearer empty state and navigation model for "no artifact selected," "artifact loading," and "artifact type not previewable."

Reusable Use Cases

This pass should support many future outputs without another UI rewrite:

  • browser research pack
  • markdown report
  • generated app or HTML experience
  • spreadsheet workbook
  • chart collection
  • design scene
  • internal notes or checklists

Integration with Browser-Use

  • Browser runs should be able to save a "research artifact" that contains source list, evidence snippets, TODO completion state, and optional summary text.
  • Artifact cards for browser-derived outputs should link back to the run or evidence set that created them.
  • This creates a clean bridge between "the AI investigated something" and "the user can keep working with the result."

Integration with Universal TODOs

  • Artifact-producing runs should be able to embed or reference the checklist that led to the output.
  • A report artifact, browser research artifact, or design artifact should be able to show which TODOs were completed, which evidence was used, and which items remain open.
  • This keeps TODOs from being transient debug state only; they become part of the reusable workflow when that helps the user.

Done When

  • Artifacts feel like one system instead of several separate UIs.
  • The same artifact interaction model works across text, browser, spreadsheet, visual, and future design outputs.
  • Browser-use can save reusable outputs without inventing a new side-channel UI.

4. Better Charts / Better Chart Design

Goal

Move from basic bar and pie output to a chart system that looks intentional and supports more analytical use cases.

Current Touchpoints

  • backend/internal/capability/chart_visualization.go
  • frontend/src/components/charts/ToolChart.tsx
  • frontend/src/routes/_authenticated/chats/$chatId.tsx

Approach

  • Keep show_bar_chart and show_pie_chart working for compatibility.
  • Add a new generic chart tool, for example render_chart, with a typed schema.
  • Support at least:
  • bar
  • stacked bar
  • line
  • area
  • pie / donut
  • scatter
  • table fallback

Schema Direction

The long-term chart schema should look more like this:

```json
{
  "type": "line",
  "title": "Revenue by Month",
  "subtitle": "Q1 to Q4",
  "x_field": "month",
  "series": [
    { "name": "Actual", "color": "#0f766e", "data": [{ "x": "Jan", "y": 120 }] },
    { "name": "Target", "color": "#2563eb", "data": [{ "x": "Jan", "y": 100 }] }
  ],
  "y_axis": { "label": "USD", "format": "currency" },
  "annotations": [{ "label": "Launch", "x": "Apr" }]
}
```

Backend Implementation

  • Add a new chart capability file instead of continuing to overload the existing bar and pie schema.
  • Centralize chart argument validation so bad tool payloads fail early and cleanly.
  • Encourage the model to choose chart type based on analytical intent rather than hard-coded chart names.
  • Optionally add a helper that converts small chart payloads into downloadable artifact snapshots later.
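Centralized validation could look like the sketch below. The payload shape mirrors the schema above; the type-name strings in `supportedTypes` are assumptions, since the roadmap only lists the chart families:

```go
package main

import (
	"errors"
	"fmt"
)

// ChartPayload is the minimal shape of a render_chart tool call.
type ChartPayload struct {
	Type   string
	Title  string
	Series []Series
}

type Series struct {
	Name string
	Data []Point
}

type Point struct {
	X string
	Y float64
}

var supportedTypes = map[string]bool{
	"bar": true, "stacked_bar": true, "line": true, "area": true,
	"pie": true, "donut": true, "scatter": true, "table": true,
}

// Validate rejects bad tool payloads early, so the frontend can fall
// back to a readable table instead of rendering a blank box.
func (c ChartPayload) Validate() error {
	if !supportedTypes[c.Type] {
		return fmt.Errorf("unsupported chart type %q", c.Type)
	}
	if len(c.Series) == 0 {
		return errors.New("chart needs at least one series")
	}
	for _, s := range c.Series {
		if len(s.Data) == 0 {
			return fmt.Errorf("series %q has no data points", s.Name)
		}
	}
	return nil
}
```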

Frontend Implementation

  • Replace the current one-off rendering logic in ToolChart.tsx with a renderer registry keyed by chart type.
  • Introduce shared chart theme tokens for spacing, typography, gridline color, tooltip layout, legend behavior, and mobile sizing.
  • Add better empty states, long-label handling, numeric formatting, and responsiveness.
  • Add download actions for PNG and CSV where possible.
  • Add table fallback when a chart cannot render or the data shape is invalid.

Design Direction

  • The visual language should match the rest of BoxedAI rather than generic chart-library defaults.
  • Use deliberate palettes, restrained gridlines, readable legends, and chart-specific typography.
  • Optimize for a clean analytical look, not just "more colors."

Done When

  • The assistant can produce charts for comparisons, trends, and distributions.
  • Tool cards render charts consistently on desktop and mobile.
  • Invalid chart payloads degrade to a readable table instead of a blank box.

5. Semi-Figma MCP / Design Studio

Goal

Turn the current visualization artifact flow into a more reliable design system for dashboards, mockups, presentations, and richer generated interfaces.

Product Direction

The right mental model is not "let the model write better raw HTML." The right model is "let the model manipulate a structured scene format that the app can render, edit, and version."

This should be inspired by tools like Bricks in the sense of reusable templates, block-based composition, and "refresh data without rebuilding the layout," but it should be implemented in a way that fits BoxedAI's artifact architecture.

Current Touchpoints

  • backend/internal/capability/artifacts.go
  • backend/internal/artifacts
  • frontend/src/components/artifacts/ArtifactWorkspace.tsx
  • frontend/src/components/artifacts/ArtifactVisualFrame.tsx

Approach

  • Keep create_visualization for backwards compatibility.
  • Add a new artifact format such as boxedai.scene.v1.
  • Introduce scene-aware tools instead of relying only on raw HTML:
  • create_design_scene
  • edit_design_scene
  • read_design_scene
  • Support a constrained block set first:
  • frame
  • text
  • card
  • chart
  • table
  • image
  • metric
  • filter control

Data Model Direction

Use a structured envelope similar to:

```json
{
  "format": "boxedai.scene.v1",
  "frame": { "width": 1440, "height": 900, "background": "theme.surface" },
  "theme": { "name": "boxed-analytics" },
  "blocks": [
    {
      "id": "hero_1",
      "type": "metric",
      "x": 48,
      "y": 48,
      "w": 280,
      "h": 160,
      "props": { "label": "Revenue", "value": "$1.2M" }
    }
  ],
  "bindings": []
}
```

Backend Implementation

  • Add a new artifact kind and content adapter for scene artifacts in backend/internal/artifacts.
  • Build a scene compiler that turns the structured scene into safe HTML for preview.
  • Keep the source of truth as the scene JSON, not the compiled HTML.
  • Extend artifact versioning so scene edits create clean change summaries.
  • Reuse existing artifact_updated events so the chat UI and artifact workspace stay synchronized.
  • Once the scene model is stable, expose it through an MCP-compatible surface if external agents need to edit the same primitives. Do not start with MCP as the storage format.

Frontend Implementation

  • Add a scene renderer and editor mode inside ArtifactWorkspace.
  • Add block selection, drag, resize, duplicate, reorder, snap-to-grid, and alignment helpers.
  • Add a right-side inspector for theme, spacing, typography, and data bindings.
  • Add a starter template gallery for common use cases:
  • KPI dashboard
  • report cover
  • product mockup
  • investor slide
  • data summary page

Integration with Other Roadmap Items

  • Chart improvements should feed into scene blocks so charts inside scenes use the same renderer and theme tokens.
  • Web search and browser tools can optionally capture cited content into a scene or report artifact.
  • Memory presets can influence brand rules, tone, and layout constraints for generated designs.

Risks and Guardrails

  • Raw HTML generation is too brittle for iterative editing. The scene format exists to avoid that trap.
  • Do not let arbitrary JavaScript become the primary extensibility path for generated designs.
  • Version diffs matter. Store enough metadata to explain what changed between scene versions.

Done When

  • The assistant can create and revise structured visual layouts instead of only free-form HTML artifacts.
  • Users can manually adjust generated layouts in the UI without losing the AI-generated structure.
  • One template can be refreshed with new data while preserving layout.

6. Backend: Reduce LoC for Encryption

Goal

Reduce duplicated encryption logic without changing the zero-knowledge model.

Current Problems

  • AES-GCM helpers exist in multiple places.
  • Artifact encryption, file encryption, message sealing, and search-index decryption each carry their own branching logic.
  • WebSocket crypto request flow is used indirectly from multiple packages with repeated glue code.
  • Encryption policy checks re-query organization state in multiple places.

Current Touchpoints

  • backend/internal/files/crypto.go
  • backend/internal/files/service.go
  • backend/internal/artifacts/tool_command_service.go
  • backend/internal/capability/file_read.go
  • backend/internal/capability/file_search.go
  • backend/internal/ws/hub.go

Approach

  • Add a shared package such as backend/internal/e2ee.
  • Move all AES-GCM helpers into that package.
  • Add shared helpers for:
  • org encryption policy lookup
  • browser crypto bridge access
  • encrypted key resolution
  • encrypt-if-enabled and decrypt-if-needed flows
  • Replace one-off decrypt helpers in file read and file search with shared functions.
  • Add a small context-bound service interface so artifacts, files, and capabilities do not need to know the details of ws.Hub directly.

Code Shape

The package should expose a small surface area, for example:

  • PolicyProvider
  • CryptoBridge
  • EncryptTextIfEnabled
  • DecryptTextIfNeeded
  • ResolveObjectKey
  • ResolveIndexKey
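A minimal sketch of the EncryptTextIfEnabled / DecryptTextIfNeeded pair using AES-GCM from the standard library. The nonce-prefixed ciphertext layout is an assumption for the sketch; the real package must match the existing ciphertext format so stored data stays readable:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// EncryptTextIfEnabled centralizes the encrypt-if-enabled branch so
// feature packages stop carrying their own copies of it.
func EncryptTextIfEnabled(enabled bool, key, plaintext []byte) ([]byte, error) {
	if !enabled {
		return plaintext, nil
	}
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Layout assumption: nonce || sealed data.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// DecryptTextIfNeeded is the mirror helper.
func DecryptTextIfNeeded(enabled bool, key, data []byte) ([]byte, error) {
	if !enabled {
		return data, nil
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, sealed := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, sealed, nil)
}
```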

Follow-On Cleanup

  • Move encryption decision branching out of handlers and into service-layer helpers.
  • Standardize error messages so frontend code can react consistently to "key not available" vs "ciphertext invalid" vs "ws unavailable".
  • Add package-level tests that cover ciphertext format once, rather than repeating similar tests in multiple feature packages.

Done When

  • Encryption-related logic is centralized and easier to audit.
  • New encrypted features no longer need custom helper code.
  • Total encryption-related LoC drops because duplicate utilities are removed, not because behavior is deleted.

7. Backend: Improve Streaming System

Goal

Replace the current fragile stream lifecycle with a run-based system that supports tool steps, browser steps, retries, cancellation, and better observability.

Current Problems

  • The current flow inserts an empty assistant message before generation and patches it later.
  • Tool-call deltas are handled with a "clear streamed garbage" workaround.
  • SSE payloads are message-centric, not run-centric.
  • Cancellation exists client-side but not as a first-class server run state.
  • Browser tooling and reasoning summaries will make the current event model harder to maintain.

Current Touchpoints

  • backend/internal/chat/service.go
  • backend/internal/chat/events.go
  • backend/internal/chat/handler.go
  • frontend/src/hooks/useStreamMessage.ts
  • frontend/src/services/api/chats.ts

Approach

  • Add a chat_runs table with fields like:
  • id
  • chat_id
  • user_id
  • status
  • started_at
  • completed_at
  • error_code
  • model
  • reasoning_effort
  • Optionally add a chat_run_todos table for run-scoped checkpoints and progress notes.
  • Optionally add a chat_run_steps table for tool invocations, browser actions, and citations.
  • Introduce a RunManager package to own lifecycle, persistence, and event emission.
  • Keep the final assistant message as a result of a completed run, not as the thing that defines the run.
  • Ensure chat_run_steps can reference related TODO IDs so any tool call can be chained to the checklist.

Event Model

Move toward typed run events such as:

  • run_started
  • run_status
  • todo_added
  • todo_completed
  • todo_blocked
  • todo_checked
  • text_delta
  • tool_started
  • tool_completed
  • browser_started
  • browser_completed
  • citation_added
  • run_failed
  • run_completed

Transport Recommendation

  • Keep SSE as the first transport to minimize disruption.
  • Add run IDs and event sequence IDs so the frontend can resume or deduplicate events.
  • Revisit transport unification with WebSocket only after the event schema is stable.

Frontend Implementation

  • Replace the current ad hoc state updates in useStreamMessage.ts with a reducer keyed by run_id.
  • Render status, TODO checkpoints, tool calls, browser steps, and citations from the same event stream.
  • Add a persistent checklist panel or inline checklist block in the chat run UI. This is required work, because the current UI does not visibly expose a TODO checklist.
  • Add explicit resume and retry flows for interrupted runs.
  • Add latency metrics in the UI such as time to first token and total run duration for debugging.

Operational Requirements

  • Add heartbeat events to keep long responses alive through proxies.
  • Add cancel support that marks the run canceled server-side and stops downstream provider work.
  • Add structured logs and metrics for every run state transition.

Done When

  • A run has a stable lifecycle independent of the final message row.
  • Tool and browser events no longer rely on stream-clearing hacks.
  • Browser and other agentic runs can expose TODO checkpoints and progress cleanly.
  • Users can cancel runs cleanly and recover from transient disconnects.

8. Backend: Reasoning Mode / Chain of Thought

Goal

Improve deep reasoning quality while avoiding raw chain-of-thought storage or exposure.

Important Product Decision

This feature should not be implemented as "show the model's full hidden chain of thought to the user." The safer and more maintainable version is:

  • allow higher reasoning effort where providers support it
  • optionally store a concise reasoning summary or answer plan
  • never persist hidden raw reasoning traces unless there is a very strong internal need and a separate security review

Current Touchpoints

  • backend/internal/chat/service.go
  • backend/internal/inference/provider.go
  • frontend/src/routes/_authenticated/chats/$chatId.tsx

Approach

  • Add a per-run reasoning setting: low, medium, high.
  • Map that setting to provider-specific controls where available.
  • Add an optional visible reasoning_summary field that contains a short explanation of the approach, not the full internal trace.

Backend Implementation

  • Extend inference provider abstractions so a run can request different reasoning effort without provider-specific logic leaking into chat orchestration.
  • Add reasoning settings into the new prompt builder rather than hard-coding more prompt branches into chat/service.go.
  • Persist reasoning metadata on the run and optionally on the final assistant message.
  • Add guardrails so reasoning summaries are generated after the answer, not as a replacement for the answer.
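The mapping from the product-level setting to provider controls can live entirely in the adapter. A sketch, where `ProviderOptions` and its fields are hypothetical stand-ins for whatever the real inference abstraction sends upstream:

```go
package main

import "fmt"

// Effort is the per-run reasoning setting from the roadmap.
type Effort string

const (
	EffortLow    Effort = "low"
	EffortMedium Effort = "medium"
	EffortHigh   Effort = "high"
)

// ProviderOptions stands in for the adapter's upstream request options.
type ProviderOptions struct {
	ReasoningEffort string
}

// applyEffort maps the product setting to provider options so chat
// orchestration never sees vendor-specific knobs. When the provider
// does not support effort levels, the request is passed through and
// the run metadata records that effort was requested but unsupported.
func applyEffort(opts ProviderOptions, e Effort, supported bool) (ProviderOptions, error) {
	if !supported {
		return opts, nil
	}
	switch e {
	case EffortLow, EffortMedium, EffortHigh:
		opts.ReasoningEffort = string(e)
		return opts, nil
	default:
		return opts, fmt.Errorf("unknown reasoning effort %q", e)
	}
}
```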

Frontend Implementation

  • Add a composer toggle or per-chat setting for reasoning mode.
  • Display a small "Reasoned" badge or effort indicator on completed runs.
  • If a reasoning summary exists, show it behind an expandable panel labeled clearly as a summary of approach.

Risks and Guardrails

  • Higher reasoning effort increases cost and latency. Make it explicit in the UI.
  • The product should not imply that hidden reasoning is complete, always correct, or user-auditable.
  • Treat reasoning summary as user-visible content subject to the same quality bar as any answer.

Done When

  • Users can request deeper reasoning on selected runs.
  • Providers that support reasoning settings can use them.
  • The product exposes only concise reasoning summaries, not raw hidden chain-of-thought logs.

9. Memory Presets

Goal

Let users and organizations store reusable guidance such as "When I ask for investor updates, use this format" or "Always answer compliance questions with these rules."

Current Touchpoints

  • backend/internal/chat/service.go
  • frontend/src/routes/_authenticated/chats/$chatId.tsx
  • frontend/src/hooks/useAuth.ts

Product Direction
  • Support three scopes:
      • user preset
      • organization preset
      • chat-attached preset
  • Support three trigger modes in v1:
      • always on
      • keyword contains
      • manually selected in the composer
  • Defer regex and semantic matching until later.
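The three v1 trigger modes can be decided by one small pure function. This is a minimal sketch under assumed names (the mode strings and `Preset` fields are illustrative, not an existing schema):

```go
package main

import (
	"fmt"
	"strings"
)

// Preset carries only the fields needed for trigger evaluation; names are
// illustrative, not the real data model.
type Preset struct {
	Name         string
	TriggerMode  string // "always", "keyword", or "manual" (assumed values)
	TriggerValue string // keyword for "keyword" mode
	Enabled      bool
}

// presetApplies decides whether a preset should be injected for a message.
// Manual presets only apply when the user selected them in the composer.
func presetApplies(p Preset, message string, manuallySelected bool) bool {
	if !p.Enabled {
		return false
	}
	switch p.TriggerMode {
	case "always":
		return true
	case "keyword":
		// Case-insensitive substring match; regex/semantic modes are deferred.
		return strings.Contains(strings.ToLower(message), strings.ToLower(p.TriggerValue))
	case "manual":
		return manuallySelected
	default:
		return false // unknown (future) modes are ignored in v1
	}
}

func main() {
	p := Preset{Name: "investor-update", TriggerMode: "keyword", TriggerValue: "investor update", Enabled: true}
	fmt.Println(presetApplies(p, "Draft the Q3 Investor Update", false))
}
```

Keeping this a pure function also makes the "preview which presets will apply" UI cheap: the frontend can call the same logic via an endpoint before sending.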

Data Model

Add tables such as:

  • memory_presets
      • id
      • organization_id
      • user_id (nullable)
      • scope
      • name
      • instruction
      • trigger_mode
      • trigger_value
      • priority
      • enabled
      • created_at
      • updated_at

Optionally add a join table if chats can have multiple manually attached presets.
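In Go terms, the proposed table could map to a struct along these lines. This is a sketch of the column list above, not an existing schema; field types are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// MemoryPreset mirrors the proposed memory_presets table.
type MemoryPreset struct {
	ID             string
	OrganizationID string
	UserID         *string // nullable: nil for organization-scoped presets
	Scope          string  // "user", "organization", or "chat" (assumed values)
	Name           string
	Instruction    string
	TriggerMode    string // "always", "keyword", or "manual"
	TriggerValue   string
	Priority       int
	Enabled        bool
	CreatedAt      time.Time
	UpdatedAt      time.Time
}

func main() {
	p := MemoryPreset{Scope: "organization", Name: "compliance-rules", Enabled: true}
	fmt.Println(p.Name)
}
```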

Backend Implementation

  • Add a preset service that loads relevant presets for a user and chat, ranks them, and returns the instructions to the prompt builder.
  • Apply presets as structured system instructions, not string concatenation scattered through handlers.
  • Add collision rules:
      • platform safety instructions win
      • organization presets come before user presets
      • user presets come before chat-local context
  • Record which presets were applied on a run for debugging and auditability.
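The collision rules above amount to a deterministic sort: precedence by scope first, then by the preset's own priority. A minimal sketch, assuming hypothetical scope labels and that lower priority numbers apply earlier:

```go
package main

import (
	"fmt"
	"sort"
)

// scopeRank encodes the precedence from the roadmap: platform safety first,
// then organization presets, then user presets, then chat-local context.
var scopeRank = map[string]int{"platform": 0, "organization": 1, "user": 2, "chat": 3}

type AppliedPreset struct {
	Name     string
	Scope    string
	Priority int // assumption: lower number = applied earlier within a scope
}

// orderPresets returns a deterministic application order so the same inputs
// always produce the same system-instruction layout.
func orderPresets(presets []AppliedPreset) []AppliedPreset {
	out := append([]AppliedPreset(nil), presets...)
	sort.SliceStable(out, func(i, j int) bool {
		if scopeRank[out[i].Scope] != scopeRank[out[j].Scope] {
			return scopeRank[out[i].Scope] < scopeRank[out[j].Scope]
		}
		return out[i].Priority < out[j].Priority
	})
	return out
}

func main() {
	ps := []AppliedPreset{
		{Name: "my-tone", Scope: "user", Priority: 1},
		{Name: "org-compliance", Scope: "organization", Priority: 5},
	}
	for _, p := range orderPresets(ps) {
		fmt.Println(p.Name)
	}
}
```

Using a stable sort means ties keep their load order, which keeps the "which presets were applied" audit record reproducible.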

Frontend Implementation

  • Add a settings UI for creating and editing presets.
  • Add lightweight preset selection in the chat composer for manual presets.
  • Add a preview area that shows which presets will apply before sending.
  • Show applied presets somewhere in the run details so behavior is explainable.

Risks and Guardrails

  • Too many presets can create prompt bloat. Add priority caps and content length limits.
  • Presets can conflict. The UI should surface precedence rather than pretending all rules are equal.
  • Avoid hidden behavior. Users should be able to see what preset logic was active.

Done When

  • Users can save and reuse prompt guidance without retyping it.
  • The backend applies presets deterministically.
  • Applied presets are visible and debuggable.

10. Login with Solana

Goal

Add wallet-based authentication for users who prefer Solana identity over email/password.

Product Direction

Do not tie login to on-chain state. This is a signed-message authentication flow, not a blockchain transaction flow.

Current Touchpoints

  • backend/internal/auth/service.go
  • backend/internal/auth/handler.go
  • frontend/src/routes/login.tsx
  • frontend/src/services/api/auth.ts

Phased Rollout
  • Phase 1: link a Solana wallet to an existing BoxedAI account.
  • Phase 2: allow wallet-first sign-in for linked accounts.
  • Phase 3: optionally support invited wallet-first account creation if the org model needs it.

Backend Implementation

  • Add a table such as wallet_identities with:
      • id
      • user_id
      • chain
      • public_key
      • label
      • is_primary
  • Add a short-lived challenge table such as auth_challenges with:
      • id
      • purpose
      • public_key
      • nonce
      • expires_at
      • used_at
  • Add endpoints such as:
      • POST /api/auth/solana/challenge
      • POST /api/auth/solana/verify
      • POST /api/auth/solana/link
  • Reuse the existing session creation flow after signature verification succeeds.
  • Bind the signed statement to the domain, nonce, issued-at time, expiration, and intended action.

Frontend Implementation

  • Add a Solana wallet button to frontend/src/routes/login.tsx.
  • Use Solana wallet adapter libraries for connection and signature prompts.
  • Keep the UX explicit: connect wallet, sign challenge, receive a normal BoxedAI session token.
  • Add wallet management under account settings so users can link, unlink, and rotate wallets.

Security Requirements

  • Nonces must be single-use and short-lived.
  • The statement must include the BoxedAI domain and intended action to reduce phishing and replay risk.
  • Signature verification must happen server-side.
  • Wallet login should be feature-flagged so organizations can disable it if they do not want wallet auth.

Done When

  • Existing users can link a Solana wallet and sign in with it.
  • The same session and authorization model continues to work after wallet verification.
  • Wallet auth can be disabled without affecting email/password login.

Suggested Delivery Order in Practice

If this roadmap is executed by a small team, the most pragmatic order is:

  1. Extract prompt builder, run model, and universal TODO/checkpoint system.
  2. Refactor encryption internals.
  3. Upgrade chart system.
  4. Ship web search.
  5. Ship fetch-based browser tools.
  6. Do the artifact system and artifact UI cleanup pass.
  7. Build scene-based design artifacts.
  8. Add memory presets.
  9. Add reasoning mode.
  10. Add Solana wallet linking and wallet login.

That sequence minimizes rework because the later features all benefit from better prompt composition, cleaner streaming, explicit TODO checkpoints for agentic work, and stronger artifact primitives.