pipelineJobs.js:
- cancelJob(jobId): marks job as cancelled=true, status=cancelling
- isJobCancelled(jobId): checked by the pipeline between stages
runSourcePipeline.js:
- PipelineCancelledError class
- checkCancelled() called before each of the 6 pipeline stages
- Accepts options.isCancelled() callback from the job runner
runKytPipelineJob.js:
- Passes isCancelled: () => isJobCancelled(job.id) into pipeline
- Catches PipelineCancelledError separately, sets status=cancelled
routes/pipeline.js:
- POST /pipeline/cancel/:jobId: marks the job for cancellation
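A minimal sketch of the cooperative cancellation flow described above. cancelJob, isJobCancelled, PipelineCancelledError, checkCancelled and the isCancelled option come from this commit. Everything else is a hypothetical stand-in: the in-memory Map, createJob, the injected stageFns, runPipelineJob (standing in for runKytPipelineJob), and the "completed"/"failed" statuses.

```javascript
// Hypothetical in-memory job store standing in for pipelineJobs.js state.
const jobs = new Map();

function createJob(id) {
  const job = { id, status: "running", cancelled: false };
  jobs.set(id, job);
  return job;
}

// Flags the job; the pipeline only notices at the next stage boundary.
function cancelJob(jobId) {
  const job = jobs.get(jobId);
  if (!job) return false;
  job.cancelled = true;
  job.status = "cancelling";
  return true;
}

function isJobCancelled(jobId) {
  return jobs.get(jobId)?.cancelled === true;
}

class PipelineCancelledError extends Error {
  constructor(stage) {
    super(`Pipeline cancelled before stage "${stage}"`);
    this.name = "PipelineCancelledError";
    this.stage = stage;
  }
}

const STAGES = ["fetch", "download", "watermark", "upload", "convert", "upsert"];

// stageFns maps each stage name to an async function (assumed shape).
async function runSourcePipeline(stageFns, options = {}) {
  const isCancelled = options.isCancelled ?? (() => false);
  const checkCancelled = (stage) => {
    if (isCancelled()) throw new PipelineCancelledError(stage);
  };
  const completed = [];
  for (const stage of STAGES) {
    checkCancelled(stage); // checked between stages, never mid-stage
    await stageFns[stage]();
    completed.push(stage);
  }
  return completed;
}

// Hypothetical job runner: cancellation is caught separately from real failures.
async function runPipelineJob(job, stageFns) {
  try {
    await runSourcePipeline(stageFns, { isCancelled: () => isJobCancelled(job.id) });
    job.status = "completed";
  } catch (err) {
    if (err instanceof PipelineCancelledError) {
      job.status = "cancelled";
    } else {
      job.status = "failed";
      job.error = err.message;
    }
  }
  return job;
}
```

Because the check runs only between stages, a cancel request lets the current stage finish before the job moves to status=cancelled.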
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both motousher and dirtstreet converters were reading product fields
(title, sku, price, images) directly from the aggregated record, but
those fields live inside record.scraped after fetchWebsiteData wraps
them. As a result, products came through as "Untitled Product" with
missing images and SKU=variant-1.
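A minimal sketch of the fix: read fields from record.scraped rather than from the top-level record. convertRecord is a hypothetical name. The field names on scraped are assumed to match the ones listed above, and the converter is assumed to keep its "Untitled Product" fallback.

```javascript
// Hypothetical converter: product fields live under record.scraped,
// not on the aggregated record itself.
function convertRecord(record) {
  const scraped = record.scraped ?? {};
  return {
    title: scraped.title || "Untitled Product", // fallback only when truly missing
    sku: scraped.sku ?? null,
    price: scraped.price ?? null,
    images: Array.isArray(scraped.images) ? scraped.images : [],
  };
}
```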
Also fixed brand priority: the per-product brand (e.g. Evans Coolant,
SC Project) now takes precedence over the global SHOPIFY_BRAND env
var (KYT), which was incorrectly overriding the brand on every product
from the new sources.
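The corrected precedence fits in one small function. This is a sketch: resolveBrand is a hypothetical helper, and the per-product brand is assumed to sit at scraped.brand.

```javascript
// Per-product brand wins; SHOPIFY_BRAND is only a fallback (assumed field: scraped.brand).
function resolveBrand(scraped, env = process.env) {
  const productBrand = typeof scraped?.brand === "string" ? scraped.brand.trim() : "";
  return productBrand || env.SHOPIFY_BRAND || "";
}
```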
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both sources are now registered in sources/index.js and fully wired
into the 6-stage pipeline (fetch → download → watermark → upload →
convert → upsert). The frontend shows them as tabs automatically via
GET /pipeline/sources, with no frontend changes needed.
motousher/ (Shopify JSON API, 12 brands, ~2,446 products):
- scraper.js: fetches /collections/{slug}/products.json + /products/{handle}.json
- converter.js: maps scraped products to standard pipeline format
- index.js: fetchWebsiteData() loops over all brands and normalises the
output to the productSummary.img format expected by the shared
download/upload utilities
- Supports a MOTOUSHER_BRANDS env var to filter brands per run
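A minimal sketch of the collection listing loop and the brand filter. It assumes products.json pages through ?limit=N&page=N and that a short or empty page marks the end. fetchJson is injected so the loop can run without the network. MOTOUSHER_BRANDS is assumed to be a comma-separated list of brand slugs. The names fetchCollectionProducts and filterBrands are hypothetical.

```javascript
// Pages through /collections/{slug}/products.json until a short or empty page.
async function fetchCollectionProducts(baseUrl, slug, fetchJson, { limit = 250, maxPages = 100 } = {}) {
  const products = [];
  for (let page = 1; page <= maxPages; page++) {
    const url = `${baseUrl}/collections/${encodeURIComponent(slug)}/products.json?limit=${limit}&page=${page}`;
    const body = await fetchJson(url);
    const batch = Array.isArray(body?.products) ? body.products : [];
    if (batch.length === 0) break;
    products.push(...batch);
    if (batch.length < limit) break; // short page: assumed to be the last one
  }
  return products;
}

// Assumed format: MOTOUSHER_BRANDS="ebc-brakes,jt-sprockets" (case-insensitive slugs).
function filterBrands(allBrands, envValue) {
  if (!envValue) return allBrands;
  const wanted = new Set(
    envValue.split(",").map((s) => s.trim().toLowerCase()).filter(Boolean)
  );
  return allBrands.filter((b) => wanted.has(b.slug.toLowerCase()));
}
```

With Node 18+, a real fetcher could be `async (url) => (await fetch(url)).json()`. Each listed handle would then be fetched from /products/{handle}.json.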
dirtstreet/ (WooCommerce HTML + JSON-LD, 5 brands, ~1,087 products):
- scraper.js: pure fetch, paginates /brand/{slug}/page/N/,
extracts price from offers.priceSpecification[0].price,
stock from JSON-LD availability field
- converter.js: maps scraped products to standard pipeline format,
builds descriptionHtml from body + short desc + attributes table
- index.js: fetchWebsiteData() loops over all brands and normalises the
output to the productSummary.img format
- Supports a DIRTSTREET_BRANDS env var to filter brands per run
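A minimal sketch of the JSON-LD extraction described above. It reads price from offers.priceSpecification[0].price and stock from availability. Matching script tags with a regex is a simplification, and an HTML parser would be more robust. It also assumes the Product node may be nested in an @graph array. parseProductPage and its helpers are hypothetical names.

```javascript
// Collects every parseable <script type="application/ld+json"> payload.
function extractJsonLdBlocks(html) {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // skip malformed JSON-LD blocks
    }
  }
  return blocks;
}

// Depth-first search for a schema.org Product node, following @graph arrays.
function findProduct(node) {
  if (Array.isArray(node)) {
    for (const item of node) {
      const found = findProduct(item);
      if (found) return found;
    }
    return null;
  }
  if (node && typeof node === "object") {
    const type = node["@type"];
    if (type === "Product" || (Array.isArray(type) && type.includes("Product"))) return node;
    if (node["@graph"]) return findProduct(node["@graph"]);
  }
  return null;
}

function parseProductPage(html) {
  const product = findProduct(extractJsonLdBlocks(html));
  if (!product) return null;
  const offer = Array.isArray(product.offers) ? product.offers[0] : product.offers;
  const rawPrice = offer?.priceSpecification?.[0]?.price;
  const availability = String(offer?.availability ?? "");
  return {
    title: product.name ?? null,
    sku: product.sku ?? null,
    price: rawPrice !== undefined ? Number(rawPrice) : null,
    // e.g. "http://schema.org/InStock" or "https://schema.org/InStock"
    inStock: /(^|\/)InStock$/.test(availability),
  };
}
```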
sources/index.js: registered all 4 sources (kyt, brocks-performance,
motousher, dirtstreet). GET /pipeline/sources now returns all 4.
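The reason no frontend change is needed can be sketched as a registry keyed by source id, with the route serialising whatever is registered. The source ids come from this commit. The object shape (label, fetchWebsiteData, convert), the labels, and the helpers makeSource, getSource and listSources are hypothetical.

```javascript
// Hypothetical source object shape; real sources wire in their own scraper/converter.
function makeSource(id, label) {
  return {
    id,
    label,
    fetchWebsiteData: async () => [],
    convert: (record) => record,
  };
}

const sources = {
  kyt: makeSource("kyt", "KYT"),
  "brocks-performance": makeSource("brocks-performance", "Brocks Performance"),
  motousher: makeSource("motousher", "Motousher"),
  dirtstreet: makeSource("dirtstreet", "Dirtstreet"),
};

function getSource(id) {
  const source = sources[id];
  if (!source) throw new Error(`Unknown source: ${id}`);
  return source;
}

// What GET /pipeline/sources could return: one entry (one tab) per registered source.
function listSources() {
  return Object.values(sources).map(({ id, label }) => ({ id, label }));
}
```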
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds two new experimental product scrapers under test_source/, isolated
from the active pipeline until verified and ready to promote.
motousher/ (Shopify store, via the Shopify JSON API):
- Scrapes 12 brands: All Balls Racing, DID Chains, EBC Brakes, Esjot
Sprockets, Evans Coolant, Grip Puppies, HiFlo Filters, JT Sprockets,
Maxima Racing Oils, Putoline, Ram Mount, Wunderlich
- 2,446 products total scraped and verified
- Uses /collections/{slug}/products.json + /products/{handle}.json
- Parallel fetch (concurrency 3), paginated collection listing
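"Parallel fetch (concurrency 3)" can be sketched as a small worker pool. It runs at most `limit` async tasks at once and keeps results in input order. mapWithConcurrency is a hypothetical helper, and the scraper's actual implementation may differ.

```javascript
// Runs fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    // Safe without locks: JS runs this synchronously between awaits.
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

For example, `mapWithConcurrency(handles, 3, (h) => fetchJson(`${base}/products/${h}.json`))` would fetch product detail pages three at a time. Here fetchJson and base are assumed.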
dirtstreet/ (WooCommerce store, via HTML + JSON-LD):
- Scrapes 5 brands: SC Project, Evotech Performance, DNA Air Filters,
WRS, Zero Gravity Racing
- 1,087 products total scraped and verified
- Pure fetch with JSON-LD schema.org extraction (no browser)
- Handles paginated /brand/{slug}/page/N/ archives
- Price extracted from offers.priceSpecification[0].price
- Stock status derived from JSON-LD availability field
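A sketch of walking the paginated /brand/{slug}/page/N/ archives. It assumes page 1 is served at /brand/{slug}/, and that a 404 or a page with no product links marks the end. fetchPage and extractLinks are injected, and both they and the helper names are hypothetical.

```javascript
// Assumed WooCommerce permalink layout: page 1 has no /page/1/ suffix.
function brandPageUrl(baseUrl, slug, page) {
  const root = `${baseUrl}/brand/${encodeURIComponent(slug)}/`;
  return page === 1 ? root : `${root}page/${page}/`;
}

// fetchPage(url) -> { status, html }; extractLinks(html) -> product URLs.
async function collectBrandProductUrls(baseUrl, slug, fetchPage, extractLinks, maxPages = 200) {
  const urls = new Set(); // dedupes products that appear on more than one page
  for (let page = 1; page <= maxPages; page++) {
    const res = await fetchPage(brandPageUrl(baseUrl, slug, page));
    if (res.status === 404) break; // past the last archive page
    const links = extractLinks(res.html);
    if (links.length === 0) break;
    links.forEach((u) => urls.add(u));
  }
  return [...urls];
}
```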
Both scrapers run standalone (node index.js), support --brand and
--limit flags, and save per-brand JSON files plus a combined.json.
Scraped data lives in data/sources/test_source/ (gitignored).
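A sketch of the standalone CLI surface using Node's built-in util.parseArgs, which needs Node 18.3+. The flag semantics are assumptions: --brand selects one brand slug and --limit caps products per brand. The output naming ({brand}.json) and the helpers parseCliArgs and buildOutputFiles are also hypothetical. The file map is built in memory rather than written to disk.

```javascript
const { parseArgs } = require("node:util");

// Assumed semantics: --brand <slug> narrows the run, --limit <n> caps products per brand.
function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      brand: { type: "string" },
      limit: { type: "string" },
    },
    allowPositionals: false,
  });
  const limit = values.limit === undefined ? Infinity : Number.parseInt(values.limit, 10);
  if (Number.isNaN(limit) || limit < 1) {
    throw new Error(`--limit must be a positive integer, got "${values.limit}"`);
  }
  return { brand: values.brand ?? null, limit };
}

// One JSON payload per brand plus combined.json (hypothetical file naming).
function buildOutputFiles(resultsByBrand, limit) {
  const files = {};
  const combined = [];
  for (const [brand, products] of Object.entries(resultsByBrand)) {
    const kept = products.slice(0, limit);
    files[`${brand}.json`] = kept;
    combined.push(...kept);
  }
  files["combined.json"] = combined;
  return files;
}
```

Invoked as `node index.js --brand wrs --limit 5`, the options would come from `parseCliArgs(process.argv.slice(2))`.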
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>