Use Case: Web Scraping

When a target site uses anti-bot protection (Cloudflare, PerimeterX, DataDome, etc.), stock headless Chrome is usually detected and blocked. Browser Forest's anti-detection engine enables all of its patches by default, so that browser instances look like real users in fingerprint, behavior, and CDP characteristics.

Approach A: Scrape API (Recommended, Simplest)

No Session lifecycle management is needed. This approach is ideal for single-shot scraping: provide a URL and get back the rendered page as HTML, Markdown, plain text, or a screenshot. The platform creates a browser, waits for the page to load, extracts the content, and destroys the browser automatically; all of this is transparent to the caller.

Typical Scenario: Scraping E-commerce Product Detail Pages

The target page has JavaScript-rendered price and inventory data, requiring JS execution to complete before correct data can be retrieved.

curl -X POST https://bf.mktindex.com/api/v1/scrape \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B09G9FPHY6",
    "format": "markdown",
    "waitFor": "networkidle"
  }'

Example response:

{
  "url": "https://www.amazon.com/dp/B09G9FPHY6",
  "content": "# Apple AirPods Pro (2nd Generation)\n\n**Price**: $189.99\n\n**In Stock**: Yes\n...",
  "metadata": {
    "title": "Amazon.com: Apple AirPods Pro",
    "description": "Active Noise Cancellation..."
  },
  "durationMs": 4821
}

Parameter	Type	Default	Description
url	string	Required	Target URL to scrape
format	string	html	html / markdown / text / screenshot
waitFor	string	load	load / domcontentloaded / networkidle
selector	string	None	Extract only the region matching a CSS selector

Note: networkidle waits for network requests to be quiet for 500ms — good for SPA/React pages; load is faster and better for server-rendered static pages.
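
The parameter table can be mirrored in a small typed helper. The following is a minimal sketch, not part of any official client: the parameter names and allowed values come from the table above, while the type names and the functions buildScrapeRequest and scrape are hypothetical, and the example selector is only an illustration. It assumes Node.js 18+ for the global fetch.

```typescript
// Field names and allowed values follow the parameter table above.
// ScrapeRequest, buildScrapeRequest, and scrape are hypothetical names.
type ScrapeFormat = "html" | "markdown" | "text" | "screenshot";
type WaitFor = "load" | "domcontentloaded" | "networkidle";

interface ScrapeRequest {
  url: string;
  format?: ScrapeFormat;
  waitFor?: WaitFor;
  selector?: string;
}

// Validate the URL and omit unset fields so the server-side defaults apply.
function buildScrapeRequest(req: ScrapeRequest): ScrapeRequest {
  const parsed = new URL(req.url); // throws a TypeError on a malformed URL
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  const body: ScrapeRequest = { url: req.url };
  if (req.format !== undefined) body.format = req.format;
  if (req.waitFor !== undefined) body.waitFor = req.waitFor;
  if (req.selector !== undefined) body.selector = req.selector;
  return body;
}

// POST the body to the scrape endpoint shown in the curl example.
async function scrape(apiKey: string, req: ScrapeRequest): Promise<unknown> {
  const res = await fetch("https://bf.mktindex.com/api/v1/scrape", {
    method: "POST",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(buildScrapeRequest(req)),
  });
  if (!res.ok) throw new Error(`Scrape failed with HTTP ${res.status}`);
  return res.json();
}
```

For example, scrape(apiKey, { url, format: "text", selector: ".price" }) would request only the text of the region matching .price, assuming the page has such an element.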

Approach B: Session + Puppeteer (Complex Interactions)

When scraping requires multi-step operations like login, pagination, clicks, and form filling, first create a Session, then connect and control the browser via CDP WebSocket using Puppeteer or Playwright.

Typical Scenario: Data Behind a Login Wall

Step 1: Create a Session

curl -X POST https://bf.mktindex.com/api/v1/sessions \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "os": "windows",
    "timeout": 300,
    "idleTimeout": 60
  }'

The response includes a cdpUrl of the form wss://bf.mktindex.com/ws/session/ses_xxxxxxxx. The browser is ready as soon as the call returns.
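
In a script, Step 1 might look like the sketch below. It assumes the response JSON carries string fields id and cdpUrl (the names used by the SDK example later on this page); the full response schema is not shown here, so treat that shape, and the helper names parseSessionResponse and createSession, as assumptions. It relies on the global fetch in Node.js 18+.

```typescript
interface SessionInfo {
  id: string;
  cdpUrl: string;
}

// Check the fields we rely on before handing cdpUrl to a CDP client.
// Assumption: the response has string fields `id` and `cdpUrl`.
function parseSessionResponse(json: unknown): SessionInfo {
  if (typeof json !== "object" || json === null) {
    throw new Error("Session response is not a JSON object");
  }
  const { id, cdpUrl } = json as Record<string, unknown>;
  if (typeof id !== "string" || typeof cdpUrl !== "string") {
    throw new Error("Session response is missing id or cdpUrl");
  }
  if (!cdpUrl.startsWith("wss://")) {
    throw new Error(`Unexpected cdpUrl scheme: ${cdpUrl}`);
  }
  return { id, cdpUrl };
}

// Create a Session with the same endpoint and headers as the curl example.
async function createSession(
  apiKey: string,
  options: Record<string, unknown>,
): Promise<SessionInfo> {
  const res = await fetch("https://bf.mktindex.com/api/v1/sessions", {
    method: "POST",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(options),
  });
  if (!res.ok) throw new Error(`Session creation failed with HTTP ${res.status}`);
  return parseSessionResponse(await res.json());
}
```

The returned cdpUrl can then be passed as browserWSEndpoint in Step 2, and id to the DELETE call in Step 3.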

Step 2: Connect and Operate with Puppeteer

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://bf.mktindex.com/ws/session/ses_xxxxxxxx',
});
const [page] = await browser.pages();

// Login
await page.goto('https://target-site.com/login', { waitUntil: 'networkidle2' });
await page.type('#email', 'user@example.com');
await page.type('#password', 'secret123');
// Start waiting for the navigation before clicking, so a fast redirect is not missed
await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle2' }),
  page.click('[type="submit"]'),
]);

// Paginate and scrape
const allItems = [];
for (let p = 1; p <= 5; p++) {
  await page.goto(`https://target-site.com/products?page=${p}`, { waitUntil: 'networkidle2' });

  const items = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.product-card')).map(el => ({
      name: el.querySelector('.title')?.textContent?.trim(),
      price: el.querySelector('.price')?.textContent?.trim(),
      sku: el.dataset.sku,
    }))
  );
  allItems.push(...items);
}

console.log(`Scraped ${allItems.length} items`);

// Disconnect (do NOT call browser.close() — it would close the remote browser process)
await browser.disconnect();

Step 3: Delete the Session

curl -X DELETE https://bf.mktindex.com/api/v1/sessions/ses_xxxxxxxx \
  -H "X-API-Key: bf_live_xxxxxxxx"

Persistent Login State (Context)

After you log in once, the login state (cookies and localStorage) is saved to a Context. Later Sessions created with the same contextId start already logged in. This is especially useful when you rotate IPs frequently (a new Session for each proxy change) but need to stay logged in to the same account.

First Time: Login and Save State

# 1. Create Context (once)
curl -X POST https://bf.mktindex.com/api/v1/contexts \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"name": "amazon-account"}'
# Returns: { "id": "ctx_xxxxxxxx", ... }

# 2. Create Session bound to the Context
curl -X POST https://bf.mktindex.com/api/v1/sessions \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"contextId": "ctx_xxxxxxxx"}'

# 3. Connect with Puppeteer and manually log in

# 4. Delete Session (auto-uploads cookies snapshot to S3)
curl -X DELETE https://bf.mktindex.com/api/v1/sessions/ses_xxxxxxxx \
  -H "X-API-Key: bf_live_xxxxxxxx"

Subsequent Times: Restore Login State Directly

# Create a new Session with the same contextId — login state auto-restored
curl -X POST https://bf.mktindex.com/api/v1/sessions \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"contextId": "ctx_xxxxxxxx"}'

Note: Context snapshots include Cookies, LocalStorage, IndexedDB, and SessionStorage. Only one Session should use a given Context at a time; otherwise the snapshots will overwrite each other.
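
One way to respect the one-Session-per-Context rule inside a single Node.js process is to queue work per contextId. The sketch below is illustrative only: withContextLock is a hypothetical helper, it does not coordinate across processes or machines, and it says nothing about what the platform itself enforces.

```typescript
// Tail of the pending work for each contextId (in-process only).
const contextQueues = new Map<string, Promise<void>>();

// Run `task` only after earlier tasks for the same contextId have settled.
function withContextLock<T>(contextId: string, task: () => Promise<T>): Promise<T> {
  const previous = contextQueues.get(contextId) ?? Promise.resolve();
  const result = previous.then(task);
  // The stored tail never rejects, so one failed task does not block later ones.
  const tail = result.then(
    () => undefined,
    () => undefined,
  );
  contextQueues.set(contextId, tail);
  // Drop the entry once the queue for this contextId has drained.
  void tail.then(() => {
    if (contextQueues.get(contextId) === tail) contextQueues.delete(contextId);
  });
  return result;
}
```

A caller would wrap the whole create, connect, scrape, and delete sequence for a given Context in one withContextLock call, so the next Session for that Context starts only after the previous snapshot has been written.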

Approach C: Cookie REST API (No CDP Client Needed)

If you already have login cookies (exported from DevTools / EditThisCookie), you can directly inject them into an active Session via API, or write them to a Context for automatic restoration on subsequent Sessions. Ideal for Python/curl scripts and agent toolchains.

Inject into Session, Then Visit Target Site

# 1. Create Session
curl -X POST https://bf.mktindex.com/api/v1/sessions \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"timeout": 300}'

# 2. Inject cookies (CDP format JSON array)
curl -X PUT https://bf.mktindex.com/api/v1/sessions/ses_xxxxxxxx/cookies \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"cookies": [{"name":"session","value":"...","domain":".target.com","path":"/","secure":true}]}'

# 3. Connect with Puppeteer / Playwright via cdpUrl and navigate

# 4. Export current cookies for backup
curl "https://bf.mktindex.com/api/v1/sessions/ses_xxxxxxxx/cookies?domain=.target.com" \
  -H "X-API-Key: bf_live_xxxxxxxx"
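
Cookies exported from a browser extension usually need light conversion before they match the CDP cookie format used in step 2. The sketch below assumes an export shaped like the Chrome cookies API objects that EditThisCookie produces (expirationDate, session, and lowercase sameSite values such as no_restriction); verify this against your actual export. ExportedCookie, CdpCookie, and toCdpCookies are hypothetical names.

```typescript
// Assumed shape of an extension export (Chrome cookies API style).
interface ExportedCookie {
  name: string;
  value: string;
  domain: string;
  path?: string;
  secure?: boolean;
  httpOnly?: boolean;
  sameSite?: string; // e.g. "no_restriction" | "lax" | "strict" | "unspecified"
  expirationDate?: number; // seconds since the Unix epoch
  session?: boolean;
}

// Subset of the CDP cookie parameters used in the curl example above.
interface CdpCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  secure?: boolean;
  httpOnly?: boolean;
  sameSite?: "Strict" | "Lax" | "None";
  expires?: number;
}

const SAME_SITE_MAP: Record<string, CdpCookie["sameSite"]> = {
  strict: "Strict",
  lax: "Lax",
  no_restriction: "None",
};

// Build the request body for PUT .../cookies from an exported cookie array.
function toCdpCookies(exported: ExportedCookie[]): { cookies: CdpCookie[] } {
  const cookies = exported.map((c) => {
    const out: CdpCookie = {
      name: c.name,
      value: c.value,
      domain: c.domain,
      path: c.path || "/",
    };
    if (c.secure !== undefined) out.secure = c.secure;
    if (c.httpOnly !== undefined) out.httpOnly = c.httpOnly;
    // "unspecified" and unknown values are left out so the browser default applies.
    const sameSite = c.sameSite ? SAME_SITE_MAP[c.sameSite.toLowerCase()] : undefined;
    if (sameSite) out.sameSite = sameSite;
    // Session cookies carry no expiry.
    if (!c.session && c.expirationDate !== undefined) out.expires = c.expirationDate;
    return out;
  });
  return { cookies };
}
```

JSON.stringify(toCdpCookies(exported)) can then be sent as the body of the PUT request in step 2.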

Write to Context for Persistence

curl -X PUT https://bf.mktindex.com/api/v1/contexts/ctx_xxxxxxxx/cookies \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"cookies": [ ... ]}'

Repo examples: test/pm-agent-login.py (supports test / prod environment switching), test/cookie-api-test.py (Cookie API smoke test). See test/.env.example for config.

Proxy Configuration (Bypass IP Blocking)

Specify a proxy when creating a Session — all browser traffic will go through it. You can integrate with residential proxy pools to use a different IP per Session.

curl -X POST https://bf.mktindex.com/api/v1/sessions \
  -H "X-API-Key: bf_live_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "os": "windows",
    "proxy": {
      "type": "http",
      "host": "residential-proxy.provider.com",
      "port": 8080,
      "username": "user123",
      "password": "pass456"
    }
  }'
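
To use a different IP per Session, the proxy object can be drawn from a pool when building each request body. This is a minimal round-robin sketch: the proxy field names follow the curl example above, while createProxyRotator, buildSessionBody, and the pool entries are hypothetical, and a real residential pool may need a different rotation policy.

```typescript
// Field names follow the "proxy" object in the curl example above.
interface ProxyConfig {
  type: string;
  host: string;
  port: number;
  username?: string;
  password?: string;
}

type OsName = "windows" | "macos" | "linux";

// Return a function that hands out proxies from the pool in round-robin order.
function createProxyRotator(pool: ProxyConfig[]): () => ProxyConfig {
  if (pool.length === 0) throw new Error("Proxy pool is empty");
  let next = 0;
  return () => {
    const proxy = pool[next % pool.length]!;
    next += 1;
    return proxy;
  };
}

// Build the JSON body for POST /api/v1/sessions with the chosen proxy.
function buildSessionBody(os: OsName, proxy: ProxyConfig): { os: OsName; proxy: ProxyConfig } {
  return { os, proxy };
}
```

Each new Session would then call the rotator once, for example buildSessionBody("windows", nextProxy()), and send the result as the request body.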

OS Fingerprint Simulation

The os parameter makes the browser present a complete fingerprint for the chosen platform (User-Agent, Platform, WebGL renderer, font list, etc.), so that it matches real Chrome running on that OS.

os Value	User-Agent Example	Recommended Scenario
windows	Windows NT 10.0; Win64; x64	Most e-commerce and financial sites
macos	Macintosh; Intel Mac OS X 10_15_7	Apple services, design platforms
linux	X11; Linux x86_64	Dev tools, GitHub, API documentation sites

Node.js SDK Version

In a Node.js project, we recommend managing the Session lifecycle through the SDK:

import { BrowserForestClient } from '@browser-forest/sdk';
import puppeteer from 'puppeteer-core';

const client = new BrowserForestClient({ apiKey: 'bf_live_xxxxxxxx' });

async function scrapeWithLogin(url: string, contextId?: string) {
  // Pass contextId to restore login state from a saved Context
  const session = await client.sessions.create({
    os: 'windows',
    timeout: 180,
    ...(contextId ? { contextId } : {}),
  });

  const browser = await puppeteer.connect({
    browserWSEndpoint: session.cdpUrl!,
  });

  try {
    const [page] = await browser.pages();
    await page.goto(url, { waitUntil: 'networkidle2' });

    const data = await page.evaluate(() => ({
      title: document.title,
      price: document.querySelector('.price')?.textContent,
    }));

    return data;
  } finally {
    await browser.disconnect();
    await client.sessions.delete(session.id);
  }
}