Posts tagged "Ai"

🔗 cursor-large-projects

Most of the Twitter conversation about Cursor has been about how it’s a great tool for small projects. I think that’s a shame, because it’s also a great tool for large projects.

Cursor is a powerful tool for maintaining large coding projects, helping developers code 5-30 times faster. The guide emphasizes the importance of an effective edit and test loop to improve code quality. By setting up proper documentation and workflows, engineers can leverage AI for refactoring, documentation, and project planning.

I love the project-specific ai folder. I also appreciate that you can have Cursor run tests, which isn’t something I’ve done much of yet, but I look forward to trying it.

🔗 o1-not-a-chat-model

There has been a lot of chat about the new reasoning models, and how they are not chat models. I completely agree, and want to add my voice to the chorus.

The focus of the post is that an o1 prompt should look a lot different from your typical chat prompt:

O1 Prompt Structure

Understanding this prompt anatomy is crucial because it fundamentally changes how organizations need to approach AI implementation. While the structure might make sense to technical users, rolling this out across an enterprise presents unique challenges. The shift from quick chat-style interactions to detailed, structured briefs impacts everything from user training to workflow design.

The enterprise challenge with o1 isn’t just about adopting new tech - it’s about fundamentally changing how people work with AI. While chat models let users dive in with quick questions and iterate, o1 demands what I’d call “front-loaded effort” - you need to dump ALL the context upfront and carefully frame what you want. This creates an interesting tension for enterprise adoption.

On the upside, o1’s report-style outputs actually align really well with enterprise needs. You get structured, thorough analysis that reads like a proper business document rather than a casual chat. Perfect for decision-making and documentation. But here’s the catch - teaching busy enterprise users to write these detailed briefs is tough. They’re used to the quick back-and-forth of chat models or traditional tools. Now we’re asking them to:

  • Front-load all context (which means gathering it first)
  • Clearly define outputs (no vague requests)
  • Wait longer for responses (potentially 5+ minutes)

For enterprise rollouts, I think this means:

  1. Training needs to shift from “how to chat with AI” to “how to brief AI”
  2. Expectations around response times need resetting (not usually a big deal)
  3. Best practices around context gathering need development

The real kicker? Just when enterprises were getting comfortable with chat-based AI, this paradigm shift forces another round of change management. It’s like teaching someone a new language right after they got comfortable with the first one.

To make this work, enterprises might need dedicated “AI prompt engineers” - people who can bridge the gap between users and these more demanding but powerful models. Think of them as technical writers for the AI age. If not dedicated people, then companies could consider dedicated projects and engagements focused on bringing reasoning prompts to business users.

Additionally, it’d be helpful to start sharing the art of the possible for business users with reasoning models like o1. Let me share four practical examples where business users could leverage reasoning models effectively:

Quarterly Report Analysis: Instead of asking quick questions about numbers, dump the entire quarterly spreadsheet, previous reports, and industry context into the model and ask for a comprehensive analysis. The model can identify trends, flag concerns, and create executive summaries - all in one thorough shot. Much better than piecemeal analysis through chat.
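
To make that kind of front-loaded brief concrete, here’s a minimal sketch of how it could be sent from the terminal with Simon Willison’s llm CLI (the same tool used later in this post). The file names and the o1 model alias are my assumptions, not anything prescribed by the linked article:

# Hypothetical front-loaded brief for a reasoning model via the llm CLI.
# File names and the "o1" model alias are assumptions; adjust for your setup.
cat q3_financials.csv q2_report.md industry_notes.md | \
  llm -m o1 "You are preparing a quarterly analysis for the executive team.
Using ALL of the attached context, produce a structured report with:
1) key revenue and cost trends, 2) risks worth flagging, 3) a one-page executive summary.
Do not ask follow-up questions; state any assumptions you make."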

Meeting Summary & Action Plans: Take a raw meeting transcript, add the project background, team structure, and goals, then ask the model to create a structured output with: key decisions, action items, risks identified, and next steps. The model’s ability to process all this context at once means better synthesis than parsing piece by piece.

Policy Compliance Review: Perfect for legal or HR teams. Feed in your company policies, industry regulations, and a proposed new process or policy. The model can do a thorough gap analysis, identifying compliance risks and suggesting specific updates. Much more reliable than trying to check compliance point-by-point through chat. Plus, the model’s formal report style matches the serious nature of compliance work.

RFP Response Analysis: For procurement or sales teams, dump in the entire RFP document, your company’s past proposals, competitor intel, and pricing strategy. Ask for a detailed analysis of what sections need focus, suggested win themes, pricing recommendations, and potential red flags. The model’s ability to process all this context at once helps create a cohesive strategy instead of answering one requirement at a time.

The key theme here? These aren’t quick Q&A tasks - they’re meaty problems where the user invests time upfront to get comprehensive, actionable insights in return. Think “weekly deep-dive” rather than “quick daily check.”

Bottom line: Treat o1 like the powerful reasoning engine it is - with proper training and support - and it’ll transform your business. Treat it like ChatGPT, and you will likely struggle with user frustration and poor results.

🔗 Agents

I came across Chip Huyen’s blog via Twitter today. Her blog is great and has a lot of content I’m going to read. I’ll start with this post on Agents.

Chip defines Agents as:

An agent is anything that can perceive its environment and act upon that environment. Artificial Intelligence: A Modern Approach (1995) defines an agent as anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.

She then explains that the interaction between an agent and the environment is the key to the agent’s success. If you build a web scraping agent, the internet is the environment. If you build an agent that can play Minecraft, the Minecraft world is the environment. And so on.

Planning is the heart of an agent, and in the blog, Chip provides a lot of detail on it. Planning is decoupled from execution. The plan requires that the model understands the user’s intent and breaks the task down into smaller subtasks. You can do a lot to improve the likelihood of success: asking for human feedback, writing great system prompts, giving better descriptions of the tools available to the agent, using a stronger model, or even fine-tuning a model.

To me it seems that, with the current state of LLMs, the most effective way to improve the planning aspect of an agent is to ask for human feedback. There may be verticals, like coding or math, where the plan is easier for an LLM to generate, but for long tasks, I think asking for human feedback is the best way to improve the agent. I wonder if we’ll ever see the reasoning models (o1, o3, etc.) allow users to adjust the plan on the fly. I know that Google’s Gemini 1.5 Deep Research already has a “refine” feature that lets you adjust the plan.
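
As a rough sketch of that human-in-the-loop idea (my own illustration, not from Chip’s post; the model name, task, and file path are placeholders), an agent could generate a plan, pause for a person to edit it, and only then move on:

# Minimal human-in-the-loop planning sketch; execution is left as a placeholder.
task="Summarize every PDF in ~/reports and email the highlights to the team"
plan=$(llm -m gpt-4o "Break this task into numbered subtasks. Task: $task")

# Let a human review and adjust the plan before anything is executed.
echo "$plan" > /tmp/plan.txt
${EDITOR:-vi} /tmp/plan.txt
approved_plan=$(cat /tmp/plan.txt)

echo "Executing approved plan:"
echo "$approved_plan"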

The post is great, and the blog looks promising. I’ll be back to Chip’s writing!

Running Ollama Vision Models Locally

When I was checking out the new Phi model with Ollama, I came across the llm-ollama plugin. It reminded me that Ollama now supports vision models. One of the use cases I’ve been thinking about is a script that can run against all images in my Downloads folder and generate a description of each image. Many of them have obscure names, and I’d like to be able to search them by description.

Here is my attempt to do this. The script is as simple as:

# zsh: skip extensions with no matches instead of aborting with "no matches found"
setopt null_glob

for file in ~/Downloads/*.{jpg,jpeg,png,gif,webp}; do
    echo "Processing $file"
    llm -m llama3.2-vision:11b "Describe this image." -a "$file"
done

To run this, you’ll need to install LLM and Ollama. Then you’ll need to install the llama3.2-vision model:

ollama pull llama3.2-vision:11b

and install the plugin:

llm install --upgrade llm-ollama

This is remarkably simple and produces really helpful descriptions.

Ollama Vision Local

The image shows a man running across the finish line of a triathlon, with a red archway and a crowd of people watching. The purpose of the image is to capture the moment of triumph for the athlete as he completes his race.

* A man:
        + Wearing a black triathlon suit
        + Running towards the camera
        + Has a number on his bib
        + Appears to be exhausted but determined
* A finish line:
        + Red archway with white text that reads "FINISH"
        + Surrounded by spectators and officials
        + Located at the end of a road or track
* A crowd of people:
        + Standing behind barriers, watching the athlete cross the finish line
        + Cheering and taking photos
        + Dressed in casual clothing, with some wearing team jerseys

The image conveys a sense of excitement and accomplishment, as the athlete reaches the end of his grueling triathlon. The crowd's enthusiasm adds to the celebratory atmosphere, making it clear that this is a momentous occasion for all involved.

It can obviously be modified to run against any file or folder you want. I could imagine doing some intelligent document processing on top of this to help group similar files together and clean up my Downloads folder.
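
As a quick sketch of that idea (the index file name and tab-separated format are my own assumptions, not part of the original script), the loop could write each description into a searchable index:

# Hypothetical variation: build a searchable index of image descriptions.
setopt null_glob
index=~/Downloads/image-descriptions.tsv

for file in ~/Downloads/*.{jpg,jpeg,png,gif,webp}; do
    echo "Processing $file"
    desc=$(llm -m llama3.2-vision:11b "Describe this image in one sentence." -a "$file")
    printf '%s\t%s\n' "$file" "$desc" >> "$index"
done

# Later, find images by what's in them:
grep -i "finish line" "$index"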

Lastly, if you prefer a web interface for more ad hoc usage, you can use Open WebUI and uvx to run it locally:

uvx --python 3.11 open-webui serve

This will automatically find the Ollama models you have, including your image model.

IT Department to HR Department

The IT department of every company is going to be the HR department of AI agents in the future.

Jensen Huang, CEO of NVIDIA @ CES 2025

I love this quote from Jensen. It aligns with my view that the bar and the output for software engineering are rapidly rising, and that the role of every software engineer will become more managerial over time.

AI Git Message

Ever find yourself staring at your terminal, struggling to write the perfect git commit message? I recently added a game-changing alias to my zshrc that leverages the power of LLMs to generate commit messages automatically. Using Simon Willison’s llm CLI tool – a Swiss Army knife for interacting with language models directly from your terminal – this alias pipes your git diff through an AI model to generate concise, descriptive commit messages. It’s like having a pair programmer who’s really good at documentation sitting right next to you. Implementation is as simple as adding a single line to your zshrc, and just like that, you’ve automated one of development’s most tedious tasks. While it might feel a bit like cheating, I’ve found the generated messages are often more consistent and informative than my manual ones, especially during those late-night coding sessions.

alias gitmessage='git diff --staged | llm "Below is a diff of all staged changes, coming from the command:
\`\`\`
git diff --staged
\`\`\`
Please generate a concise, one-line commit message for these changes."'
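
Usage is simple: reload your shell, stage your changes, and run the alias (the commit message below is just an illustration of pasting in whatever the model suggests):

source ~/.zshrc     # reload so the alias is available
git add -p          # stage the changes you want to commit
gitmessage          # prints a suggested one-line commit message
git commit -m "paste the suggested message here"
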
🔗 Practical Text to SQL from LinkedIn

A great article on how to use LLMs to generate SQL queries from natural language. Albert Chen’s deep dive into LinkedIn’s SQL Bot offers a fascinating glimpse into the marriage of generative AI with enterprise-scale data analytics. This multi-agent system, integrated into LinkedIn’s DARWIN platform, exemplifies how cutting-edge AI can democratize access to data insights while enhancing efficiency across teams.

Key Takeaways:

Empowering Data Democratization: SQL Bot addresses a classic bottleneck: dependency on data teams for insights. By enabling non-technical users to autonomously query databases using natural language, LinkedIn has transformed a time-intensive process into a streamlined, scalable solution.

Data Cleaning and Annotation:

we initiated a dataset certification effort to collect comprehensive descriptions for hundreds of important tables. Domain experts identified key tables within their areas and provided mandatory table descriptions and optional field descriptions. These descriptions were augmented with AI-generated annotations based on existing documentation and Slack discussions, further enhancing our ability to retrieve the right tables and use them properly in queries

  • They infer the tables that a user cares about based on their org chart.
  • Metadata and Knowledge Graphs as Pillars: Comprehensive dataset certification, enriched with AI-generated annotations, ensures accurate table retrieval despite LinkedIn’s vast table inventory (in the millions!). By combining domain knowledge, query logs, and example queries into a knowledge graph, SQL Bot builds a robust contextual foundation for query generation.
  • LLM-Driven Iterative Query Refinement: Leveraging LangChain and LangGraph, the system iteratively plans and constructs SQL queries. Validators and self-correction agents ensure outputs are precise, efficient, and error-free, highlighting the sophistication of LinkedIn’s text-to-SQL pipeline.

Personalized and Guided User Experiences: With features like quick replies, rich display elements, and a guided query-writing process, SQL Bot prioritizes user understanding and engagement. Its integration with DARWIN, complete with saved chat history and custom instructions, amplifies its accessibility and adoption.

Benchmarking and Continuous Improvement: They place a strong emphasis on benchmarking, using human evaluation alongside LLM-as-a-judge methods to build a scalable approach to query assessment and model improvement.

Reflections: Where many text-to-SQL solutions stumble when handling many tables, LinkedIn’s SQL Bot thrives by leveraging metadata, personalized retrieval, and user-friendly design. It’s also impressive how the system respects permissions, ensuring data security without sacrificing convenience.

Moreover, the survey results—95% user satisfaction with query accuracy—highlight the system’s impact. This balance of technical innovation and user-centric design offers a blueprint for organizations looking to replicate LinkedIn’s success.

Why It Matters: I think there are enough details in this article to either add new features to your existing text-to-SQL bot or to build your own. LinkedIn’s work on SQL Bot is a testament to the power of AI in reshaping how we interact with complex data systems. It’s an inspiring read for engineers, data scientists, and AI enthusiasts aiming to make SQL data more accessible.
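
If you want to experiment with the core grounding idea before reading the full article, here’s a deliberately tiny sketch (the table description file, question, and model are all made up, and LinkedIn’s actual pipeline is a multi-agent system, not a single prompt):

# Toy sketch of grounding text-to-SQL in a certified table description (all names hypothetical).
question="How many new members signed up last week, by country?"
schema=$(cat certified_tables/member_signups.md)   # hypothetical description written by a domain expert

llm -m gpt-4o "You write SQL for analysts. Use ONLY the tables described below.
If the question cannot be answered from these tables, say so instead of guessing.

Tables:
$schema

Question: $question"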

🔗 Building effective agents

“Agent” is an overloaded term in AI. I found this article particularly helpful in understanding the current definitions of agentic systems, agents, and workflows. Plus, it includes a survey of some popular workflow architectures. To me, it all starts to look a lot like the microservices architectures that Amazon helped popularize over the past decade.

One principle to follow when building agentic systems (and software in general)…

When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense.

Their definitions of agents…

  • Agentic systems - parent term for an entire app or system
  • Workflows - systems where multiple LLMs are orchestrated together with code
  • Agents - LLMs dynamically direct their own processes and tool usage

Their walkthrough of common workflow patterns was interesting as well. They walk through prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer. I think as reasoning models like o1 become more powerful, we’ll start to see these workflows become less important. Let’s look at evaluator-optimizer:

Evaluator Optimizer

This is just one LLM call producing content or a response while another LLM call provides feedback or an evaluation. It’s useful when we have clear evaluation criteria.
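
As a rough sketch of the pattern (my own illustration, not code from Anthropic’s post; the model name, prompts, and fixed three iterations are assumptions), the loop can be wired up with the llm CLI used elsewhere on this blog:

# Evaluator-optimizer sketch: one call drafts, another critiques, then the draft is revised.
draft=$(llm -m gpt-4o "Write a day-by-day training plan for a reactive dog that lunges and barks at strangers.")

for i in 1 2 3; do
    feedback=$(llm -m gpt-4o "Evaluate the plan below against clarity, feasibility, progressiveness, effectiveness, and empathy. List concrete improvements.

$draft")
    draft=$(llm -m gpt-4o "Revise the plan below to address the feedback. Return only the revised plan.

Plan:
$draft

Feedback:
$feedback")
done

echo "$draft"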

As reasoning models improve, more applications will move from workflows to agents. For models like o1 that do multi-step reasoning, one of the first paths I expect them to take is the iterative loop above. Let’s try it with Google Gemini’s new 2.0 Flash Thinking Experimental:

Prompt 💭

Generate a day-by-day training plan for a reactive dog exhibiting specific behaviors: lunging, barking at strangers. The plan should incorporate positive reinforcement techniques and gradually desensitize the dog to triggers over 4 weeks. Each day’s plan must include:
•	A clearly defined goal for that day.
•	Detailed instructions for the owner, including step-by-step exercises.
•	A checklist to measure the dog’s progress.

Evaluation Criteria:

1. Clarity: Are the daily goals and instructions written in a way that is easy to understand for a non-expert dog owner?
2. Feasibility: Are the exercises practical for an average dog owner to implement with minimal equipment or prior training experience?
3. Progressiveness: Does the plan show a clear progression from simple to complex tasks aligned with the dog’s desensitization process?
4. Effectiveness: Based on canine training principles, is the plan likely to improve the dog’s behavior when executed consistently?
5. Empathy: Does the tone and approach demonstrate understanding of the challenges faced by both the dog and the owner?

Thinking 🧠

This is a stripped down version of Gemini’s thought process.

1. Deconstruct the Request
2. Establish Foundational Principles: Positive Reinforcement, Classical Conditioning, Operant Conditioning, Gradual Exposure, Consistency is Key, Manage the Environment, Listen to Your Dog, Patience and Understanding
3. Structure the 4 Weeks: Week 1: Foundation Building, Week 2: Controlled Exposure (Distance), Week 3: Controlled Exposure (Closer), Week 4: Generalization and Maintenance
4. Design Daily Goals
5. Develop Detailed Instructions
6. Create Progress Checklists
7. Incorporate Evaluation Criteria Throughout
8. Refine and Review
9. Add Important Considerations
10. Self-Correction Example during the Process
    - Initial Thought, Correction, Revised Approach

Check it out! This thinking process is actually a perfect example of how our brains bounce between coming up with ideas and checking if they’re any good. Like, there’s this awesome moment where they think “hey, maybe let’s throw the dog right into meeting strangers” but then the evaluation part of their brain kicks in with “whoa, hold up, that’s way too intense for a scared pup.” The whole thing is basically a dance between brainstorming cool training ideas and then reality-checking them against what actually works for reactive dogs and their humans - it’s like having an excited idea machine and a practical bouncer working together in your head to build something that actually makes sense.

The parallel between AI workflow architectures and the evolution of microservices hints at a deeper pattern in software engineering - we repeatedly solve the problem of complexity by breaking it down into smaller, specialized components that communicate in structured ways. Additionally, as models like OpenAI’s o1 (or even o3) become more sophisticated at multi-step reasoning, we may see a shift back toward unified systems. This raises an interesting question: Is the current trend toward complex workflow architectures a permanent evolution in AI system design, or just a temporary solution until single models become capable enough to handle these tasks autonomously? The answer could fundamentally reshape how we build AI systems in the coming years.