9 Commits

Author SHA1 Message Date
c0132ab0aa Add motousher and dirtstreet as active import pipeline sources
Both sources are now registered in sources/index.js and fully wired
into the 6-stage pipeline (fetch → download → watermark → upload →
convert → upsert). The frontend picks them up as tabs automatically
via GET /pipeline/sources, so no frontend changes are needed.
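The six stages listed above run in a fixed order, with each stage consuming the previous stage's output. Below is a minimal sketch of such a runner. It assumes each registered source exposes one async function per stage. The `runPipeline` name and that source shape are hypothetical and do not come from the repository.

```javascript
// Hypothetical sketch: stage names and order come from the commit message;
// the per-source stage functions are assumed, not the project's real API.
const STAGES = ["fetch", "download", "watermark", "upload", "convert", "upsert"];

// Run each stage of `source` in order, feeding each stage's output into the next.
async function runPipeline(source, input) {
  // Validate up front so a half-wired source fails before any side effects.
  const missing = STAGES.filter((stage) => typeof source[stage] !== "function");
  if (missing.length > 0) {
    throw new Error(`source "${source.name}" is missing stage(s): ${missing.join(", ")}`);
  }
  let data = input;
  for (const stage of STAGES) {
    data = await source[stage](data); // output of one stage is input to the next
  }
  return data;
}
```

Checking every stage before running any of them means a partly registered source is rejected before it downloads or uploads anything.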

motousher/ (Shopify JSON API — 12 brands, ~2,446 products):
- scraper.js: fetches /collections/{slug}/products.json + /products/{handle}.json
- converter.js: maps scraped products to standard pipeline format
- index.js: fetchWebsiteData() loops all brands, normalises to
  productSummary.img format for shared download/upload utilities
- Supports a MOTOUSHER_BRANDS env var to limit a run to selected brands
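Here is a minimal sketch of the brand filter and the normalisation step. It assumes MOTOUSHER_BRANDS holds a comma-separated list of slugs or brand names, and that `productSummary.img` is an array of image URLs. The brand slugs, `selectBrands`, and `normaliseShopifyProduct` are hypothetical. The Shopify fields used (`handle`, `title`, `variants[].price`, `variants[].sku`, `images[].src`) are standard in Shopify's product JSON.

```javascript
// Hypothetical brand table: names come from the commit message, slugs are
// assumed Shopify collection handles.
const ALL_BRANDS = [
  { name: "DID Chains", slug: "did-chains" },
  { name: "EBC Brakes", slug: "ebc-brakes" },
  { name: "JT Sprockets", slug: "jt-sprockets" },
];

// Keep only brands listed in a comma-separated env value (matched by slug or
// case-insensitive name); an unset or blank value keeps every brand.
function selectBrands(brands, envValue) {
  if (!envValue || !envValue.trim()) return brands;
  const wanted = new Set(
    envValue.split(",").map((s) => s.trim().toLowerCase()).filter(Boolean)
  );
  return brands.filter((b) => wanted.has(b.slug) || wanted.has(b.name.toLowerCase()));
}

// Map one Shopify product to a flat record. Assumed shape: image URLs live in
// productSummary.img so shared download/upload helpers can consume them.
function normaliseShopifyProduct(product, brand) {
  const firstVariant = (product.variants || [])[0] || {};
  return {
    brand: brand.name,
    handle: product.handle,
    title: product.title,
    price: firstVariant.price != null ? Number(firstVariant.price) : null,
    sku: firstVariant.sku || null,
    productSummary: { img: (product.images || []).map((img) => img.src) },
  };
}
```

In a run, this would be called as `selectBrands(ALL_BRANDS, process.env.MOTOUSHER_BRANDS)`.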

dirtstreet/ (WooCommerce HTML + JSON-LD — 5 brands, ~1,087 products):
- scraper.js: pure fetch (no browser), paginates /brand/{slug}/page/N/,
  extracts price from offers.priceSpecification[0].price,
  stock from JSON-LD availability field
- converter.js: maps scraped products to standard pipeline format,
  builds descriptionHtml from body + short desc + attributes table
- index.js: fetchWebsiteData() loops all brands, normalises to
  productSummary.img format
- Supports a DIRTSTREET_BRANDS env var to limit a run to selected brands
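The dirtstreet converter's descriptionHtml assembly could look roughly like the sketch below. Several details are assumptions: the input field names (`body`, `shortDescription`, `attributes`), the output record, and the escaping policy. The sketch also assumes `body` and `shortDescription` are already trusted HTML from the product page, so only the attribute table is escaped.

```javascript
// Escape text that goes into the generated attributes table.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical: concatenate body, short description and an attributes table.
function buildDescriptionHtml({ body = "", shortDescription = "", attributes = {} }) {
  const parts = [];
  if (body) parts.push(body); // assumed to be HTML already
  if (shortDescription) parts.push(shortDescription);
  const rows = Object.entries(attributes).map(
    ([key, value]) => `<tr><th>${escapeHtml(key)}</th><td>${escapeHtml(value)}</td></tr>`
  );
  if (rows.length > 0) parts.push(`<table>${rows.join("")}</table>`);
  return parts.join("\n");
}

// Hypothetical mapping from a scraped record to the standard pipeline format.
function convertDirtstreetProduct(scraped) {
  return {
    title: scraped.name,
    brand: scraped.brand,
    price: scraped.price,
    inStock: scraped.inStock,
    descriptionHtml: buildDescriptionHtml(scraped),
    images: scraped.images || [],
  };
}
```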

sources/index.js: registered all 4 sources (kyt, brocks-performance,
motousher, dirtstreet). GET /pipeline/sources now returns all 4.
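Below is a sketch of how a registry like sources/index.js might back GET /pipeline/sources, so that new sources appear as tabs without frontend changes. The four ids come from the commit message. The `label` values, `getSource`, and `listSources` are hypothetical.

```javascript
// Hypothetical registry keyed by source id; labels are illustrative only.
const sources = {
  kyt: { id: "kyt", label: "KYT" },
  "brocks-performance": { id: "brocks-performance", label: "Brocks Performance" },
  motousher: { id: "motousher", label: "MotoUsher" },
  dirtstreet: { id: "dirtstreet", label: "DirtStreet" },
};

// Look up one source for a pipeline run; unknown ids fail loudly.
function getSource(id) {
  const source = sources[id];
  if (!source) throw new Error(`unknown source: ${id}`);
  return source;
}

// Body a GET /pipeline/sources handler could return: one entry per tab,
// in registration order.
function listSources() {
  return Object.values(sources).map(({ id, label }) => ({ id, label }));
}
```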

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 12:25:31 +05:30
b8d9478afa Add test_source scrapers for motousher.com and dirtstreet.in
Adds two new experimental product scrapers under test_source/, isolated
from the active pipeline until verified and ready to promote.

motousher/ (Shopify store — Shopify JSON API):
- Scrapes 12 brands: All Balls Racing, DID Chains, EBC Brakes, Esjot
  Sprockets, Evans Coolant, Grip Puppies, HiFlo Filters, JT Sprockets,
  Maxima Racing Oils, Putoline, Ram Mount, Wunderlich
- 2,446 products total scraped and verified
- Uses /collections/{slug}/products.json + /products/{handle}.json
- Parallel fetch (concurrency 3), paginated collection listing
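The listing-then-detail flow with a concurrency cap of 3 can be sketched as follows. The sketch makes three assumptions:

- The store honours Shopify's `?limit=&page=` query parameters on products.json, with a limit of up to 250.
- The store returns an empty `products` array past the last page.
- `fetchJson` is an injected async helper, for example a thin wrapper around `fetch`.

All function names are hypothetical.

```javascript
// Assumed Shopify pagination parameters on the collection products.json.
function collectionPageUrl(base, slug, page, limit = 250) {
  return `${base}/collections/${slug}/products.json?limit=${limit}&page=${page}`;
}

// Walk collection pages until one comes back empty; collect product handles.
async function listCollectionHandles(base, slug, fetchJson) {
  const handles = [];
  for (let page = 1; ; page++) {
    const { products = [] } = await fetchJson(collectionPageUrl(base, slug, page));
    if (products.length === 0) break; // past the last page
    handles.push(...products.map((p) => p.handle));
  }
  return handles;
}

// Run `worker` over `items` with at most `limit` calls in flight,
// keeping results in input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two lanes share an index
      results[i] = await worker(items[i], i);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// List a collection, then fetch each /products/{handle}.json, 3 at a time.
async function fetchProducts(base, slug, fetchJson, concurrency = 3) {
  const handles = await listCollectionHandles(base, slug, fetchJson);
  return mapWithConcurrency(handles, concurrency, async (handle) => {
    const { product } = await fetchJson(`${base}/products/${handle}.json`);
    return product;
  });
}
```

Passing `fetchJson` in keeps the paging and concurrency logic testable without network access. The fixed pool of lanes caps load on the store, and results still come back in input order.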

dirtstreet/ (WooCommerce store — HTML + JSON-LD):
- Scrapes 5 brands: SC Project, Evotech Performance, DNA Air Filters,
  WRS, Zero Gravity Racing
- 1,087 products total scraped and verified
- Pure fetch with JSON-LD schema.org extraction (no browser)
- Handles paginated /brand/{slug}/page/N/ archives
- Price extracted from offers.priceSpecification[0].price
- Stock status derived from JSON-LD availability field
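A browser-free extraction of the JSON-LD fields named above might look like the sketch below. It assumes two things. First, the Product node is embedded in a `<script type="application/ld+json">` tag, possibly inside an `@graph` array. Second, page 1 of a brand archive lives at /brand/{slug}/ rather than /brand/{slug}/page/1/. That second layout is standard WordPress pagination but unverified for this store. The regex-based script extraction and all function names are illustrative, not the scraper's actual code.

```javascript
// Assumed WordPress-style archive URLs: page 1 has no /page/N/ suffix.
function brandPageUrl(base, slug, page) {
  return page <= 1 ? `${base}/brand/${slug}/` : `${base}/brand/${slug}/page/${page}/`;
}

// Parse every JSON-LD script block; malformed blocks are skipped.
function extractJsonLd(html) {
  const blocks = [];
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Ignore invalid JSON rather than failing the whole page.
    }
  }
  return blocks;
}

// Find the first node typed Product, looking inside @graph arrays too.
function findProduct(blocks) {
  const nodes = blocks
    .filter((b) => b && typeof b === "object")
    .flatMap((b) => (Array.isArray(b) ? b : b["@graph"] || [b]));
  const isProduct = (n) => {
    const type = n && n["@type"];
    return type === "Product" || (Array.isArray(type) && type.includes("Product"));
  };
  return nodes.find(isProduct) || null;
}

// Price from offers.priceSpecification[0].price; stock from availability.
function priceAndStock(product) {
  const offer = Array.isArray(product.offers) ? product.offers[0] : product.offers;
  const spec = offer && offer.priceSpecification;
  const first = Array.isArray(spec) ? spec[0] : spec;
  const price = first && first.price != null ? Number(first.price) : null;
  const inStock = /InStock$/.test(String((offer && offer.availability) || ""));
  return { price, inStock };
}
```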

Both scrapers are standalone (node index.js), support --brand and
--limit flags, save per-brand JSON files and a combined.json.
Scraped data lives in data/sources/test_source/ (gitignored).
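The CLI surface described above (--brand, --limit, per-brand files plus combined.json) can be sketched with Node's built-in `util.parseArgs`, which requires Node 18.3 or later. Several details are assumptions: the flag semantics (a single brand slug and a positive per-brand product cap), the `{slug}.json` file naming, and the helper names.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const { parseArgs } = require("node:util");

// Hypothetical flag handling: --brand <slug>, --limit <positive integer>.
function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      brand: { type: "string" },
      limit: { type: "string" },
    },
  });
  const limit = values.limit === undefined ? Infinity : Number.parseInt(values.limit, 10);
  if (Number.isNaN(limit) || limit < 1) throw new Error(`invalid --limit: ${values.limit}`);
  return { brand: values.brand ?? null, limit };
}

// Map of file name -> JSON text: one file per brand plus combined.json.
function buildOutputFiles(resultsByBrand) {
  const files = {};
  const combined = [];
  for (const [slug, products] of Object.entries(resultsByBrand)) {
    files[`${slug}.json`] = JSON.stringify(products, null, 2);
    combined.push(...products);
  }
  files["combined.json"] = JSON.stringify(combined, null, 2);
  return files;
}

// Write the planned files into `dir` (e.g. data/sources/test_source/...).
function writeOutputFiles(dir, files) {
  fs.mkdirSync(dir, { recursive: true });
  for (const [name, text] of Object.entries(files)) {
    fs.writeFileSync(path.join(dir, name), text);
  }
}
```

Building the file map separately from writing it keeps the output layout easy to check without touching disk.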

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 12:17:23 +05:30
1d254a9009 Install Playwright browser for backend 2026-05-15 00:25:40 +05:30
68949f124e Add multi-source import pipeline 2026-05-14 23:57:27 +05:30
bef07eff10 Refactor code structure for improved readability and maintainability 2026-05-14 23:56:13 +05:30
2320d1d5c3 Fix typo in health check service name 2026-04-14 14:23:00 +05:30
33ad269821 Fix health check service name in response 2026-04-14 14:22:25 +05:30
9480832478 Add concurrency handling and logging enhancements to KYT pipeline 2026-04-14 13:26:56 +05:30
e87bd907ea first commit 2026-04-13 17:31:26 +05:30