Introduction
Generating a fully-formed website on-demand with an AI model is an appealing idea, especially for those who self-host or want to leverage local LLMs for creative or rapid prototyping work. The journey to build "Aether Architect", a Python webserver that prompts a local or remote LLM to generate and serve single-file HTML websites live, was more challenging than it first appeared. This article documents that journey: from prompt engineering to backend flexibility, and through a series of bugs, model quirks, and real-world edge cases.
Why Build a Live AI Website Generator?
The initial spark came from the question: could a local AI model produce a complete, interactive website in real time, directly in the browser? Existing live-generation demos (think Flash Lite 2.5) hinted at the possibilities, but none showed a webserver that could reliably turn user prompts into a coherent, styled, multi-section site using only local compute. This project aimed to fill that gap, focusing on developer control, reproducibility, and the realities of serving content to real browsers.
Phase 1: Prompt Engineering as Foundation
Defining the System Prompt: Brand, Design, and Structural Constraints
Early experiments with prompting generic LLMs to "make a website" produced the expected chaos: inconsistent structure, missing styles, and hallucinated assets. To counteract this, the first system prompt was engineered to be extremely specific. The AI was told it was "Aether Architect," an expert web designer for a fictional company, Nexus Dynamics. Key details were locked in:
- Brand Identity: Hardcoded company name, tagline, and business focus.
- Design System: Fixed dark-theme palette (e.g., #0A192F, #64FFDA) and specified font pairings (Poppins for headings, Lato for body).
- Structure: Navbar with exactly five named links, strict HTML structure, and all CSS/JS embedded inline.
- Technical Constraints: No external assets, no separate files, everything rendered as a single HTML file.
This setup formed the baseline for all subsequent development.
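For a sense of how locked-down this had to be, here is a condensed sketch of such a prompt as a Python constant. The wording, constant name, and exact constraint list are illustrative, not the project's verbatim prompt:

```python
# Illustrative sketch of a strict system prompt; the project's actual wording differs.
SYSTEM_PROMPT = """
You are Aether Architect, an expert web designer for Nexus Dynamics.

Brand identity:
  - Company: Nexus Dynamics (fixed tagline and business focus go here).
Design system:
  - Dark theme only: background #0A192F, accent #64FFDA.
  - Headings use Poppins, body text uses Lato.
Structure:
  - A navbar with exactly five named links.
  - A strict, predictable section layout.
Technical constraints:
  - Output ONE complete HTML document and nothing else.
  - All CSS and JavaScript must be embedded inline; no external assets or separate files.
"""
```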
Lessons Learned: The Necessity of Strict Prompts
The most important realization: a detailed, restrictive prompt is non-negotiable if you want consistent, usable output from an LLM. Ambiguity in the prompt led directly to broken layouts or unpredictable results. The prompt wasn’t just an instruction; it was the program.
Phase 2: Exploring AI Creativity and Its Limits
The 'Muse Engine': Unleashing and Containing Creativity
After establishing a baseline, the next logical experiment was to see how creative the AI could get. The "Muse Engine" stripped out brand, color, and font requirements, keeping only the navbar structure. As expected, the model's outputs became wildly diverse: surprising layouts, odd color choices, but also frequent incoherence (e.g., unreadable contrast, mismatched sections, or conflicting visual styles). Unshackled creativity often meant a loss of site cohesion and usability.
The 'World-Builder Engine': Balancing Lore and Consistency
Instead of total freedom, the next experiment asked the AI to invent a brand and backstory for each site ("World-Builder Engine"). The model was prompted to generate a founder, mission, and vision, and then build the site around that lore. This did produce more internally consistent results, but still lacked the predictability required for a general-use tool. The lesson: for practical applications, a fixed "brand bible" outperforms even the most elaborate creative prompts. The project returned to a strict prompt, this time with a new fictional company, "Terranexa," to further clarify and harden the output style.
Phase 3: From Prototype to Production-Ready Server
Diagnosing and Fixing the BrokenPipeError
With prompt design stabilized, the next set of issues emerged at the server level. The first major bug: browsers would crash the Python server with a BrokenPipeError when the generated HTML referenced assets that didn’t exist (for example, the browser automatically requesting /favicon.ico or an AI-invented image). The server naively treated every request as a prompt; when the browser didn’t receive the expected file it closed the connection, and the server’s subsequent write to that closed socket raised the error.
The fix was straightforward but essential: check the request path for common asset extensions (.jpg, .ico, .css) and immediately return a 404 Not Found for those, bypassing the AI prompt logic entirely.
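A minimal sketch of that first guard, assuming an http.server-style handler (the class, extension list, and helper names are illustrative, not the project's actual code):

```python
from http.server import BaseHTTPRequestHandler

# File extensions that browsers request automatically; the exact list is illustrative.
ASSET_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".ico", ".css", ".js", ".svg")

class AetherHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # First pass: if the request looks like an asset fetch, answer 404 immediately
        # instead of handing it to the LLM. Note that self.path still includes the
        # query string here, which is what caused the follow-up bug described below.
        if any(ext in self.path.lower() for ext in ASSET_EXTENSIONS):
            self.send_error(404, "Not Found")
            return
        self.handle_prompt_request()  # hypothetical: everything else goes to the model

    def handle_prompt_request(self):
        ...  # build the prompt, call the backend, write the generated HTML
```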
Improving Asset Request Filtering and URL Handling
The first implementation of this filter was too aggressive. It checked the entire URL, including query strings. If a prompt included something like .js or .css in its text, the server would return a 404 even for legitimate dynamic requests. The next iteration split the URL and applied the asset filter only to the path component using Python’s urlparse, ensuring the logic matched real browser asset requests and not user input. This change restored correct navigation while keeping the server robust against asset-related crashes.
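A sketch of the refined check, reusing the same illustrative extension tuple and applying it only to the parsed path:

```python
from urllib.parse import urlparse

ASSET_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".ico", ".css", ".js", ".svg")

def is_asset_request(raw_path: str) -> bool:
    """True only when the path component itself ends in an asset extension, so
    query-string text (e.g. a prompt that mentions '.css') no longer triggers a 404."""
    path = urlparse(raw_path).path.lower()
    return path.endswith(ASSET_EXTENSIONS)

# Inside do_GET:
#     if is_asset_request(self.path):
#         self.send_error(404, "Not Found")
#         return
```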
Phase 4: Expanding Support for Local LLMs
Adding OpenAI-Compatible and Ollama Backends
Originally, the script used the Gemini API for generation. As local LLMs like LM Studio and Ollama gained traction, the need for broader backend support became clear. The codebase was refactored to use the standard openai Python library, instantly unlocking support for any local server implementing the OpenAI API schema (including LM Studio). Model, API key, and endpoint became command-line arguments.
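A sketch of what that refactor enables, using the openai v1 client pointed at a local OpenAI-compatible server such as LM Studio. The flag names, default port, and model name here are assumptions, not the project's actual CLI:

```python
import argparse
from openai import OpenAI

parser = argparse.ArgumentParser()
parser.add_argument("--api-base", default="http://localhost:1234/v1")  # e.g. LM Studio's endpoint
parser.add_argument("--api-key", default="not-needed")  # local servers usually ignore the key
parser.add_argument("--model", default="local-model")   # whatever name the local server exposes
args = parser.parse_args()

client = OpenAI(base_url=args.api_base, api_key=args.api_key)

def generate_site(system_prompt: str, user_prompt: str) -> str:
    response = client.chat.completions.create(
        model=args.model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content
```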
Recognizing Ollama’s unique API and popularity, native support was added using the official ollama Python package. Backend selection was made a runtime option (--backend ollama), keeping the core logic backend-agnostic.
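The Ollama path looks roughly like the following, kept behind the --backend switch mentioned above; the host and model defaults are assumptions:

```python
import ollama

def generate_site_ollama(system_prompt: str, user_prompt: str,
                         model: str = "llama3",
                         host: str = "http://localhost:11434") -> str:
    client = ollama.Client(host=host)
    response = client.chat(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response["message"]["content"]

# Selected at runtime, e.g.:
#     generate = generate_site_ollama if args.backend == "ollama" else generate_site
```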
Configurability and Community Flexibility
Making these parameters configurable was non-negotiable for a tool intended for the r/LocalLLM crowd. Users could now point the script at their preferred server, model, or endpoint without code changes. Supporting both OpenAI-compatible and Ollama-native backends meant the same script could run everywhere from a laptop to a dedicated AI box, regardless of vendor lock-in.
Phase 5: Taming Model and Environment Quirks
Handling Inconsistent API Responses and Model Output Oddities
After adding multi-backend support, it became clear that not all LLM APIs return output in the same format. Some returned a raw string, others a dictionary (e.g., {'String': '...'}), leading to errors like AttributeError: 'dict' object has no attribute 'strip'. The code was updated to check for dictionaries, extract the appropriate value, and fall back to a string conversion for unknown cases.
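A hedged sketch of that normalisation step; the 'String' key is one shape that was observed, the other keys and fallback behaviour are illustrative:

```python
def normalise_output(raw) -> str:
    """Coerce whatever a backend returned into plain text before calling .strip()."""
    if isinstance(raw, str):
        return raw
    if isinstance(raw, dict):
        # Some backends wrap the text, e.g. {'String': '...'}; try known keys first.
        for key in ("String", "content", "text"):
            value = raw.get(key)
            if isinstance(value, str):
                return value
    # Unknown shape: fall back to a string conversion rather than crashing.
    return str(raw)

# Usage: html = normalise_output(model_output).strip()
```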
A subtler issue: some models began including `<think>` tags in their output, exposing their internal reasoning or thought process. These blocks polluted the final HTML. The solution was to add a regular expression to strip out these tags and their contents before serving the content.
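A minimal version of that clean-up, assuming the reasoning is delimited by `<think>`/`</think>` tags:

```python
import re

# Drop <think>...</think> reasoning blocks (and surrounding whitespace) before serving.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL | re.IGNORECASE)

def strip_reasoning(html: str) -> str:
    return THINK_RE.sub("", html).strip()
```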
Strengthening Prompt Constraints Against Hallucinations
Despite explicit instructions, some models still generated <link rel="stylesheet" href="styles.css">, falling back to common web dev patterns. The fix was to make the system prompt even more explicit, with capitalized negative constraints: “MUST NOT use the <link> tag,” “NO external files,” etc. This helped eliminate ambiguity and reduced the frequency of these hallucinations.
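The added constraints looked roughly like the following excerpt; the exact wording in the project may differ:

```python
# Illustrative excerpt of the hardened negative constraints appended to the system prompt.
NEGATIVE_CONSTRAINTS = """
- You MUST NOT use the <link> tag or reference any external stylesheet such as styles.css.
- NO external files, NO external images, NO CDN URLs of any kind.
- ALL CSS must live in one <style> block and ALL JavaScript in one <script> block.
"""
```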
Navigating Python Dependency Issues
Not all bugs are code bugs. The introduction of the ollama Python library led to an environment-level issue: an incompatibility between Ollama’s requirement for Pydantic v2 and existing environments with Pydantic v1. This surfaced as a TypeError when importing ollama. The fix wasn’t in code, but in documentation: explicitly instructing users to reinstall with pip install --upgrade --force-reinstall ollama to force the correct dependency versions.
Conclusion
Key Takeaways: Prompt Design, Defensive Coding, and Iterative Development
Building Aether Architect highlighted several enduring truths about working with LLMs and integrating them into production systems:
- The Prompt is the Program: Output quality is determined more by the prompt than by the model itself. Precision and explicit constraints are essential.
- Guardrails Foster Creativity: Rigid structural and brand constraints free the model to focus creativity where it’s actually valuable, in design and layout rather than in inventing arbitrary new structures.
- Code Must Be Defensive: LLMs are non-deterministic. Assume that every API call can return unexpected formats or hallucinated output, and handle them gracefully.
- Iteration is Unavoidable: Edge cases only emerge in real-world use. Expect a cycle of test, break, and fix, both in code and in prompt design.
- The Local Ecosystem is Diverse: Supporting local LLMs means accommodating different APIs, quirks, and even user environment issues. Flexibility is a feature, not an afterthought.
For anyone building real tools on top of LLMs, these lessons are foundational. The landscape is changing fast, but the need for robust prompts, defensive code, and relentless iteration isn’t going away.
