Race-Nation-Shopify-Backend

Author	SHA1	Message	Date
MOHAN	08f21d9bc9	Add job cancellation: backend pipeline + cancel API route. pipelineJobs.js: cancelJob(jobId) marks the job as cancelled=true, status=cancelling; isJobCancelled(jobId) is checked by the pipeline between stages. runSourcePipeline.js: adds the PipelineCancelledError class; checkCancelled() is called before each of the 6 pipeline stages; accepts an options.isCancelled() callback from the job runner. runKytPipelineJob.js: passes isCancelled: () => isJobCancelled(job.id) into the pipeline; catches PipelineCancelledError separately and sets status=cancelled. routes/pipeline.js: POST /pipeline/cancel/:jobId marks the job for cancellation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-04 16:57:35 +05:30
MOHAN	4e536f08b3	Add 24h cache to motousher and dirtstreet fetchWebsiteData. Subsequent pipeline runs within 24 hours reuse the existing 01_products_aggregated.json instead of re-scraping all brands, eliminating redundant HTTP requests and 429 rate-limit retries. Cache lifetime is controlled per source (default: 24h): MOTOUSHER_CACHE_HOURS=0 → always re-scrape; DIRTSTREET_CACHE_HOURS=0 → always re-scrape. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-04 16:44:38 +05:30
MOHAN	6eac0b92ed	Fix converter reading nested scraped data and brand priority. Both motousher and dirtstreet converters were reading product fields (title, sku, price, images) directly from the aggregated record, but those fields live inside record.scraped after fetchWebsiteData wraps them. The results were: Untitled Product, missing images, SKU=variant-1. Also fixed brand priority: per-product brand (e.g. Evans Coolant, SC Project) now takes precedence over the global SHOPIFY_BRAND env var (KYT), which was incorrectly overriding all products from the new sources. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-04 16:39:54 +05:30
MOHAN	c0132ab0aa	Add motousher and dirtstreet as active import pipeline sources. Both sources are now registered in sources/index.js and fully wired into the 6-stage pipeline (fetch → download → watermark → upload → convert → upsert). The frontend will automatically show them as tabs via GET /pipeline/sources without any frontend changes needed. motousher/ (Shopify JSON API; 12 brands, ~2,446 products): scraper.js fetches /collections/{slug}/products.json + /products/{handle}.json; converter.js maps scraped products to standard pipeline format; index.js: fetchWebsiteData() loops all brands, normalises to productSummary.img format for shared download/upload utilities; supports MOTOUSHER_BRANDS env var to filter brands on a run. dirtstreet/ (WooCommerce HTML + JSON-LD; 5 brands, ~1,087 products): scraper.js is pure fetch, paginates /brand/{slug}/page/N/, extracts price from offers.priceSpecification[0].price and stock from the JSON-LD availability field; converter.js maps scraped products to standard pipeline format and builds descriptionHtml from body + short desc + attributes table; index.js: fetchWebsiteData() loops all brands, normalises to productSummary.img format; supports DIRTSTREET_BRANDS env var to filter brands on a run. sources/index.js: registered all 4 sources (kyt, brocks-performance, motousher, dirtstreet); GET /pipeline/sources now returns all 4. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-04 12:25:31 +05:30
MOHAN	b8d9478afa	Add test_source scrapers for motousher.com and dirtstreet.in. Adds two new experimental product scrapers under test_source/, isolated from the active pipeline until verified and ready to promote. motousher/ (Shopify store, Shopify JSON API): scrapes 12 brands (All Balls Racing, DID Chains, EBC Brakes, Esjot Sprockets, Evans Coolant, Grip Puppies, HiFlo Filters, JT Sprockets, Maxima Racing Oils, Putoline, Ram Mount, Wunderlich); 2,446 products total scraped and verified; uses /collections/{slug}/products.json + /products/{handle}.json; parallel fetch (concurrency 3), paginated collection listing. dirtstreet/ (WooCommerce store, HTML + JSON-LD): scrapes 5 brands (SC Project, Evotech Performance, DNA Air Filters, WRS, Zero Gravity Racing); 1,087 products total scraped and verified; pure fetch with JSON-LD schema.org extraction (no browser); handles paginated /brand/{slug}/page/N/ archives; price extracted from offers.priceSpecification[0].price; stock status derived from the JSON-LD availability field. Both scrapers are standalone (node index.js), support --brand and --limit flags, and save per-brand JSON files and a combined.json. Scraped data lives in data/sources/test_source/ (gitignored). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-04 12:17:23 +05:30
MOHAN	1d254a9009	Install Playwright browser for backend	2026-05-15 00:25:40 +05:30
MOHAN	68949f124e	Add multi-source import pipeline	2026-05-14 23:57:27 +05:30
MOHAN	bef07eff10	Refactor code structure for improved readability and maintainability	2026-05-14 23:56:13 +05:30
MOHAN	2320d1d5c3	Fix typo in health check service name	2026-04-14 14:23:00 +05:30
MOHAN	33ad269821	Fix health check service name in response	2026-04-14 14:22:25 +05:30
MOHAN	9480832478	Add concurrency handling and logging enhancements to KYT pipeline	2026-04-14 13:26:56 +05:30
MOHAN	e87bd907ea	first commit	2026-04-13 17:31:26 +05:30
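
Note on commit 08f21d9bc9: the cooperative cancellation it describes (a PipelineCancelledError class, a checkCancelled() call before each of the 6 stages, and an options.isCancelled() callback supplied by the job runner) could look roughly like the sketch below. This is not the repository's code. The stage handlers, the runJob helper, and the job object shape are assumptions made for illustration; only the error class name, the check-before-each-stage idea, the callback name, and the stage names come from the log.

```javascript
// Hypothetical sketch of cooperative cancellation between pipeline stages.
class PipelineCancelledError extends Error {
  constructor(stageName) {
    super(`Pipeline cancelled before stage "${stageName}"`);
    this.name = "PipelineCancelledError";
    this.stage = stageName;
  }
}

// Stage names as listed in commit c0132ab0aa.
const STAGES = ["fetch", "download", "watermark", "upload", "convert", "upsert"];

// stageHandlers: assumed map of stage name -> async function (placeholder work).
async function runSourcePipeline(stageHandlers, options = {}) {
  const isCancelled = options.isCancelled ?? (() => false);
  const completed = [];

  // Throws if the job runner has flagged the job for cancellation.
  const checkCancelled = (stageName) => {
    if (isCancelled()) throw new PipelineCancelledError(stageName);
  };

  for (const stageName of STAGES) {
    checkCancelled(stageName); // checked before every stage, never mid-stage
    await stageHandlers[stageName]();
    completed.push(stageName);
  }
  return completed;
}

// Job-runner side (assumed shape): cancellation is reported separately
// from ordinary failures.
async function runJob(job, stageHandlers) {
  try {
    await runSourcePipeline(stageHandlers, {
      isCancelled: () => job.cancelled === true,
    });
    job.status = "completed";
  } catch (err) {
    if (err instanceof PipelineCancelledError) {
      job.status = "cancelled";
    } else {
      job.status = "failed";
      job.error = err.message;
    }
  }
  return job;
}
```

Because the flag is only polled between stages, a stage that is already running finishes before the job stops. That trade-off matches the "checked by the pipeline between stages" wording in the commit message.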

12 Commits
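
Note on commit 4e536f08b3: the per-source cache rule it describes (reuse 01_products_aggregated.json when it is younger than the configured number of hours, with 0 meaning "always re-scrape" and 24h as the default) can be expressed as two small pure functions. This is a minimal sketch, not the repository's code. The helper names resolveCacheHours and isCacheFresh are hypothetical, and falling back to the default on invalid or negative values is an assumption that the log does not confirm.

```javascript
// Default taken from the commit message; everything else is illustrative.
const DEFAULT_CACHE_HOURS = 24;

// Reads a setting such as MOTOUSHER_CACHE_HOURS from an env-like object.
function resolveCacheHours(env, key) {
  const raw = env[key];
  if (raw === undefined || raw === "") return DEFAULT_CACHE_HOURS;
  const hours = Number(raw);
  // Assumption: non-numeric or negative values fall back to the default.
  return Number.isFinite(hours) && hours >= 0 ? hours : DEFAULT_CACHE_HOURS;
}

// True when a file last modified at fileMtimeMs is still inside the cache window.
function isCacheFresh(fileMtimeMs, nowMs, cacheHours) {
  if (cacheHours === 0) return false; // 0 → always re-scrape
  const ageMs = nowMs - fileMtimeMs;
  return ageMs >= 0 && ageMs < cacheHours * 60 * 60 * 1000;
}
```

In a Node.js caller, fileMtimeMs would typically come from fs.statSync(path).mtimeMs and nowMs from Date.now(). Keeping the decision itself free of file-system access makes it easy to test without touching disk.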