Use Case: AI Agent Development

When an AI Agent needs to control a browser for tasks like "search competitor prices", "fill out and submit forms", or "extract structured data from web pages", the traditional approach has the Agent generate Puppeteer/Playwright code directly, which is brittle and hard to debug. Browser Forest offers two more direct integration paths: the AI Browser SDK (a high-level act/extract/observe API) and an MCP Server (which lets LLM tools like Claude / Cursor invoke browser operations directly).

Approach A: AI Browser SDK

The SDK encapsulates the full logic of "operating a browser using natural language": the LLM translates natural language instructions into CDP operations, executes them, and returns results. You only need a Browser Forest API Key and an LLM Provider (OpenAI, Anthropic, etc.).

Installation

npm install @browser-forest/ai-sdk

Initialization

import { AIBrowser, OpenAIProvider } from '@browser-forest/ai-sdk';
import OpenAI from 'openai';

const llm = new OpenAIProvider({
  client: new OpenAI({ apiKey: process.env.OPENAI_API_KEY }),
  model: 'gpt-4o',  // or 'gpt-4o-mini' for lower cost
});

const browser = new AIBrowser({
  apiKey: process.env.BROWSER_FOREST_API_KEY,  // bf_live_xxx
  llm,
});
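
Both keys above are read from environment variables, and a missing variable may only surface later as an authentication error. The following is a minimal sketch of failing fast before the client is constructed. requireEnv is a hypothetical helper and is not part of the SDK.

```typescript
// Hypothetical helper (not part of the SDK): read a required variable from an
// environment-like map and fail with a clear message if it is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in a Node.js program: pass process.env as the map.
// const apiKey = requireEnv(process.env, 'BROWSER_FOREST_API_KEY');
// const openaiKey = requireEnv(process.env, 'OPENAI_API_KEY');
```

The helper takes the environment as a parameter rather than reading process.env itself, which keeps it easy to unit test.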

act — Execute Natural Language Operations

act() has the LLM analyze the current page DOM, then generate and execute the corresponding CDP click, type, and navigation actions. It suits interaction tasks such as "click a button", "fill out a form", or "navigate to a page".

const session = await browser.session();

// Navigate
await session.act('navigate to https://news.ycombinator.com');

// Click
await session.act('click on the first story link');

// Fill a form and submit
await session.act('type "browser automation" in the search box and press Enter');

await session.close();
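
Instructions passed to act() are plain strings, so any user-supplied text embedded in them (a search term, a form value) becomes part of the prompt. Below is a minimal sketch of one way to embed such values more predictably. quoteForInstruction is a hypothetical helper, and the escaping convention is an assumption, not something the SDK is known to require.

```typescript
// Hypothetical helper: wrap a user-supplied value in double quotes for use
// inside a natural-language instruction. Whitespace runs (including newlines)
// collapse to single spaces, and embedded backslashes and double quotes are
// backslash-escaped so the value cannot end the quoted span early.
function quoteForInstruction(value: string): string {
  const singleLine = value.replace(/\s+/g, ' ').trim();
  const escaped = singleLine.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `"${escaped}"`;
}

const term = 'browser\nautomation "tools"';
const instruction = `type ${quoteForInstruction(term)} in the search box and press Enter`;
console.log(instruction);
// type "browser automation \"tools\"" in the search box and press Enter
```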

extract — Extract Structured Data

extract() has the LLM pull data from the current page DOM according to the structure you describe and returns a typed object. Pairing it with a Zod schema constrains the format of the returned data (field names and types).

import { z } from 'zod';

const session = await browser.session();
await session.act('navigate to https://news.ycombinator.com');

// Extract structured data
const stories = await session.extract({
  instruction: 'extract the top 10 stories: title, URL, points, and comment count',
  schema: z.array(z.object({
    title: z.string(),
    url: z.string().url(),
    points: z.number(),
    comments: z.number(),
  })),
});

// stories is typed as Array<{ title: string; url: string; points: number; comments: number }>
console.log(stories);

await session.close();
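
A schema constrains the shape of the result, not whether the values are correct, so some light post-processing can be worthwhile before the data is used. The sketch below assumes the stories shape from the example above. cleanStories is a hypothetical helper, not an SDK function.

```typescript
interface Story {
  title: string;
  url: string;
  points: number;
  comments: number;
}

// Drop entries with implausible numbers, remove duplicate URLs (keeping the
// first occurrence), and sort the rest by points, highest first.
function cleanStories(stories: Story[]): Story[] {
  const seen = new Set<string>();
  const kept: Story[] = [];
  for (const story of stories) {
    const numbersOk =
      Number.isFinite(story.points) && story.points >= 0 &&
      Number.isFinite(story.comments) && story.comments >= 0;
    if (!numbersOk || seen.has(story.url)) continue;
    seen.add(story.url);
    kept.push(story);
  }
  return kept.sort((a, b) => b.points - a.points);
}
```

Calling cleanStories(stories) on the extracted array returns a new, filtered array and leaves the original untouched.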

observe — Observe Page State

observe() lets the LLM describe the current page content and list available interactive actions. Great for Agents that need to "look at what's on the page" before deciding the next step.

const session = await browser.session();
await session.act('navigate to https://example.com/checkout');

const state = await session.observe('what forms and buttons are on this page?');

console.log(state.description);
// "The page shows a checkout form with fields for shipping address,
//  payment method, and a 'Place Order' button."

console.log(state.actions);
// [
//   { action: 'type', element: '#shipping-name', description: 'Enter recipient name' },
//   { action: 'click', element: '#place-order-btn', description: 'Submit the order' },
// ]

// Decide next step based on observation
if (state.actions.some(a => a.element === '#place-order-btn')) {
  await session.act('fill in the shipping address and place the order');
}

await session.close();
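
The check above matches on a hard-coded selector. If the selectors reported by observe() vary between runs (an assumption, since they are produced by the LLM), matching on the action type and description may be more robust. This sketch uses the shape shown in the example output. The real SDK type may differ, and findAction is a hypothetical helper.

```typescript
// Shape taken from the example output above; the real SDK type may differ.
interface ObservedAction {
  action: string;
  element: string;
  description: string;
}

// Return the first observed action of the given kind whose description
// contains the keyword (case-insensitive), or undefined if none matches.
function findAction(
  actions: ObservedAction[],
  kind: string,
  keyword: string,
): ObservedAction | undefined {
  const needle = keyword.toLowerCase();
  return actions.find(
    (a) => a.action === kind && a.description.toLowerCase().includes(needle),
  );
}

const sample: ObservedAction[] = [
  { action: 'type', element: '#shipping-name', description: 'Enter recipient name' },
  { action: 'click', element: '#place-order-btn', description: 'Submit the order' },
];
console.log(findAction(sample, 'click', 'order')?.element); // #place-order-btn
```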

Full Agent Example: Competitor Price Monitoring

import { AIBrowser, OpenAIProvider } from '@browser-forest/ai-sdk';
import { z } from 'zod';
import OpenAI from 'openai';

const browser = new AIBrowser({
  apiKey: process.env.BROWSER_FOREST_API_KEY!,
  llm: new OpenAIProvider({
    client: new OpenAI({ apiKey: process.env.OPENAI_API_KEY! }),
    model: 'gpt-4o-mini',
  }),
});

async function monitorPrice(productName: string) {
  const session = await browser.session({ os: 'windows' });

  try {
    // Search for product
    await session.act(`navigate to https://www.amazon.com`);
    await session.act(`search for "${productName}" and press Enter`);
    await session.act('wait for search results to load');

    // Extract pricing info
    const results = await session.extract({
      instruction: 'extract the first 5 products: name, price, rating, review count',
      schema: z.array(z.object({
        name: z.string(),
        price: z.string(),
        rating: z.number().optional(),
        reviewCount: z.number().optional(),
      })),
    });

    return results;
  } finally {
    await session.close();
  }
}

const prices = await monitorPrice('wireless earbuds');
console.table(prices);
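
The schema above keeps price as a string, so comparing prices over time needs a numeric value. Here is a minimal sketch of that conversion, assuming US-style formatting. parsePrice is a hypothetical helper; a price range such as "$24.99 - $39.99" yields its first number.

```typescript
// Parse a display price such as "$1,299.99" into a number. Assumes US-style
// formatting (comma as thousands separator, dot as decimal point) and returns
// null when the text contains no number.
function parsePrice(text: string): number | null {
  const match = text.replace(/,/g, '').match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

console.log(parsePrice('$1,299.99')); // 1299.99
console.log(parsePrice('Currently unavailable')); // null
```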

Approach B: MCP Server

Browser Forest includes a built-in MCP (Model Context Protocol) Server. MCP-compatible LLM tools (Claude Desktop, Cursor, Continue, etc.) can invoke browser operations directly, with no code required. The LLM itself decides when to create a Session, where to navigate, and what data to extract.

Configure in Claude Desktop

Open the Claude Desktop config file (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json) and add:

{
  "mcpServers": {
    "browser-forest": {
      "url": "https://bf.mktindex.com/api/mcp"
    }
  }
}
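
If the config file already lists other servers, the new entry has to be merged in rather than pasted over them. The sketch below shows that merge on a parsed config object. addMcpServer is a hypothetical helper, and the existing "theme" and "other" entries are made up for illustration.

```typescript
type JsonObject = Record<string, unknown>;

// Return a copy of the config with one more entry under "mcpServers".
// Existing servers and unrelated top-level keys are preserved, and the
// input object is not modified.
function addMcpServer(config: JsonObject, name: string, url: string): JsonObject {
  const existing = config['mcpServers'];
  const servers: JsonObject =
    typeof existing === 'object' && existing !== null && !Array.isArray(existing)
      ? (existing as JsonObject)
      : {};
  return { ...config, mcpServers: { ...servers, [name]: { url } } };
}

// Illustrative existing config; these keys are not from the Browser Forest docs.
const current = JSON.parse(
  '{"theme":"dark","mcpServers":{"other":{"command":"node"}}}',
) as JsonObject;
const updated = addMcpServer(current, 'browser-forest', 'https://bf.mktindex.com/api/mcp');
console.log(JSON.stringify(updated, null, 2));
```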

After restarting Claude Desktop, you can simply say in conversation:

"Open Hacker News and extract the top 5 article titles and links"

"Go to Amazon and search for AirPods Pro, tell me the current price"

"Open https://example.com/login, log in with [email protected] / pass123,
 then go to Dashboard and take a screenshot"

Configure in Cursor

Open Cursor Settings → MCP → Add Server, and enter:

{
  "name": "browser-forest",
  "url": "https://bf.mktindex.com/api/mcp"
}

Available Tools

Tool Name	Description	Key Parameters
mirage_create_session	Create a browser Session	apiKey, contextId, timeout
mirage_close_session	Close a Session	sessionId
mirage_navigate	Navigate to a URL	sessionId, url, waitUntil
mirage_screenshot	Take a screenshot (returns base64)	sessionId
mirage_click	Click an element	sessionId, selector
mirage_type	Type text into an element	sessionId, selector, text
mirage_extract_text	Extract page text content	sessionId, selector?
mirage_extract_structured	Extract structured data via natural language	sessionId, instruction
mirage_scroll	Scroll the page	sessionId, x, y
mirage_wait	Wait for specified milliseconds	sessionId, ms
mirage_scrape	One-shot URL scrape (no Session needed)	url, format
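
MCP clients normally build tool invocations for you, but it can help to see what goes over the wire. In the Model Context Protocol, a tool is invoked with a JSON-RPC 2.0 request whose method is tools/call. Here is a sketch of that message for mirage_navigate. buildToolCall is a hypothetical helper, and the sessionId value is a placeholder.

```typescript
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the JSON-RPC message an MCP client sends to invoke one tool.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

const request = buildToolCall(1, 'mirage_navigate', {
  sessionId: 'SESSION_ID', // placeholder: use the id returned by mirage_create_session
  url: 'https://news.ycombinator.com',
});
console.log(JSON.stringify(request));
```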

Integrate in LangChain (REST API)

In Python / LangChain Agents, call Browser Forest via its REST API by wrapping the scrape and session endpoints as LangChain Tools:

import requests
from langchain.tools import tool

API_KEY = "bf_live_xxxxxxxx"
BASE = "https://bf.mktindex.com/api/v1"
HEADERS = {"X-API-Key": API_KEY, "Content-Type": "application/json"}


@tool
def scrape_url(url: str) -> str:
    """Scrape a URL and return the page content as markdown.
    Use this when you need to read content from a web page."""
    res = requests.post(
        f"{BASE}/scrape",
        headers=HEADERS,
        json={"url": url, "format": "markdown", "waitFor": "networkidle"},
        timeout=30,
    )
    # Surface HTTP errors (bad key, rate limit) instead of returning ""
    res.raise_for_status()
    data = res.json()
    return data.get("content", "")


@tool
def scrape_and_extract(url: str, instruction: str) -> str:
    """Navigate to a URL and extract specific information.
    url: the page to visit
    instruction: what data to extract in plain English"""
    # Step 1: Create session
    res = requests.post(
        f"{BASE}/sessions",
        headers=HEADERS,
        json={"os": "windows", "timeout": 120},
        timeout=30,
    )
    res.raise_for_status()
    session_id = res.json()["id"]

    try:
        # Step 2: Get page content via scrape API
        # (a one-shot call; it does not use the session created above)
        res = requests.post(
            f"{BASE}/scrape",
            headers=HEADERS,
            json={"url": url, "format": "markdown"},
            timeout=30,
        )
        res.raise_for_status()
        content = res.json().get("content", "")

        # Return the instruction with the content so the calling LLM can apply it
        return f"Instruction: {instruction}\n\nPage content from {url}:\n\n{content}"
    finally:
        # Step 3: Always clean up
        requests.delete(f"{BASE}/sessions/{session_id}", headers=HEADERS, timeout=30)


# Use in a LangChain Agent
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
tools = [scrape_url, scrape_and_extract]
# ... create agent and run
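
The same REST call can be made from TypeScript. The sketch below relies only on the endpoint, header, and body fields shown in the Python example above. buildScrapeRequest and HttpRequestSpec are hypothetical names, and the response is assumed to carry a content field as it does above.

```typescript
const BASE = 'https://bf.mktindex.com/api/v1';

interface HttpRequestSpec {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Build the request for the one-shot scrape endpoint without sending it,
// which keeps the function easy to test.
function buildScrapeRequest(apiKey: string, pageUrl: string): HttpRequestSpec {
  return {
    url: `${BASE}/scrape`,
    method: 'POST',
    headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: pageUrl, format: 'markdown' }),
  };
}

// Sending it with the global fetch (Node.js 18+), inside an async function:
// const { url, ...init } = buildScrapeRequest(apiKey, 'https://example.com');
// const res = await fetch(url, init);
// if (!res.ok) throw new Error(`Scrape failed: HTTP ${res.status}`);
// const data = await res.json();
```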

Note: The MCP Server operates in stateless mode: each tool invocation is an independent HTTP request. To keep the same browser Session across multiple tool calls, first call mirage_create_session to get a sessionId, pass it to each subsequent call, and finally call mirage_close_session to clean up.
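
When driving the tools from your own code rather than from a chat client, the create / use / close sequence can be wrapped so that cleanup always runs. This is a sketch under stated assumptions: callTool stands in for whatever your MCP client uses to invoke a tool, and mirage_create_session is assumed to resolve with a string sessionId field. Neither the CallTool type nor withSession is part of Browser Forest.

```typescript
// Assumed shape of a function that invokes one MCP tool and resolves with its
// JSON result; how it is implemented depends on your MCP client.
type CallTool = (
  toolName: string,
  args: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

// Create a session, run the callback with its id, and always close the
// session afterwards, even if the callback throws.
async function withSession<T>(
  callTool: CallTool,
  apiKey: string,
  fn: (sessionId: string) => Promise<T>,
): Promise<T> {
  const created = await callTool('mirage_create_session', { apiKey });
  const sessionId = created['sessionId']; // assumed response field
  if (typeof sessionId !== 'string') {
    throw new Error('mirage_create_session did not return a sessionId');
  }
  try {
    return await fn(sessionId);
  } finally {
    await callTool('mirage_close_session', { sessionId });
  }
}

// Usage, inside an async function:
// const text = await withSession(callTool, apiKey, async (sessionId) => {
//   await callTool('mirage_navigate', { sessionId, url: 'https://example.com' });
//   return callTool('mirage_extract_text', { sessionId });
// });
```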