8 Commits

SHA1 Message Date
b8d9478afa Add test_source scrapers for motousher.com and dirtstreet.in
Adds two new experimental product scrapers under test_source/, kept isolated
from the active pipeline until they are verified and ready to be promoted.

motousher/ (Shopify store, scraped via the Shopify JSON API):
- Scrapes 12 brands: All Balls Racing, DID Chains, EBC Brakes, Esjot
  Sprockets, Evans Coolant, Grip Puppies, HiFlo Filters, JT Sprockets,
  Maxima Racing Oils, Putoline, Ram Mount, Wunderlich
- 2,446 products scraped and verified in total
- Uses /collections/{slug}/products.json + /products/{handle}.json
- Parallel fetch (concurrency 3), paginated collection listing
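The Shopify flow above (paginated collection listing, then per-product detail fetches with concurrency 3) can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: the helper names are hypothetical, `fetchJson` is injected so the logic can run without network access, and it assumes the store honors the `limit` and `page` query parameters on products.json, as Shopify storefronts commonly do.

```javascript
// Minimal sketch with hypothetical helper names. `fetchJson(url)` must
// resolve to the parsed JSON body; it is injected so it can be stubbed.

// Walk /collections/{slug}/products.json page by page.
// Assumption: the store honors `limit` and `page` query parameters.
async function listCollection(baseUrl, slug, fetchJson, pageSize = 250) {
  const products = [];
  for (let page = 1; ; page++) {
    const url = `${baseUrl}/collections/${slug}/products.json?limit=${pageSize}&page=${page}`;
    const body = await fetchJson(url);
    const batch = body.products ?? [];
    products.push(...batch);
    if (batch.length < pageSize) break; // a short (or empty) page is the last page
  }
  return products;
}

// Run `worker` over `items` with at most `concurrency` calls in flight,
// preserving input order in the results array.
async function mapWithConcurrency(items, concurrency, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each runner claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const runners = Array.from({ length: Math.min(concurrency, items.length) }, runner);
  await Promise.all(runners);
  return results;
}

// List a collection, then fetch /products/{handle}.json for each entry.
async function scrapeCollection(baseUrl, slug, fetchJson, concurrency = 3) {
  const listing = await listCollection(baseUrl, slug, fetchJson);
  return mapWithConcurrency(listing, concurrency, async (p) => {
    const body = await fetchJson(`${baseUrl}/products/${p.handle}.json`);
    return body.product;
  });
}
```

In a live run, `fetchJson` could be something like `async (url) => (await fetch(url)).json()` on Node 18+, where `fetch` is global. A fixed pool of runners pulling from a shared index caps in-flight requests at three without a third-party limiter.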

dirtstreet/ (WooCommerce store, scraped via HTML + JSON-LD):
- Scrapes 5 brands: SC Project, Evotech Performance, DNA Air Filters,
  WRS, Zero Gravity Racing
- 1,087 products scraped and verified in total
- Plain fetch with schema.org JSON-LD extraction (no browser)
- Handles paginated /brand/{slug}/page/N/ archives
- Price extracted from offers.priceSpecification[0].price
- Stock status derived from the JSON-LD availability field
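A browser-free JSON-LD extraction along the lines described above might look like this sketch. It assumes the product data sits in a `<script type="application/ld+json">` tag, possibly wrapped in an `@graph` array or a top-level list; the function names and returned fields are hypothetical, and the regex-based script extraction is a simplification rather than a full HTML parser. The price path follows the commit notes (`offers.priceSpecification[0].price`), and `brandPageUrl` is an illustrative helper for the `/brand/{slug}/page/N/` archive pattern.

```javascript
// Minimal sketch with hypothetical helper names.

// Collect every parseable <script type="application/ld+json"> block.
function extractJsonLd(html) {
  const blocks = [];
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const m of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(m[1]));
    } catch {
      // Skip malformed blocks rather than failing the whole page.
    }
  }
  return blocks;
}

// JSON-LD may be a single object, an array, or an object with an @graph array.
function flattenNodes(data) {
  if (Array.isArray(data)) return data.flatMap(flattenNodes);
  if (data && Array.isArray(data["@graph"])) return data["@graph"].flatMap(flattenNodes);
  return data ? [data] : [];
}

function isProduct(node) {
  const t = node["@type"];
  return t === "Product" || (Array.isArray(t) && t.includes("Product"));
}

// Returns null when the page has no schema.org Product node.
function parseProduct(html) {
  const product = extractJsonLd(html).flatMap(flattenNodes).find(isProduct);
  if (!product) return null;
  // `offers` and `priceSpecification` may be a single object or a list; take the first.
  const offer = [].concat(product.offers ?? [])[0] ?? {};
  const spec = [].concat(offer.priceSpecification ?? [])[0] ?? {};
  const price = spec.price != null ? Number(spec.price) : null;
  // schema.org availability is a URL such as https://schema.org/InStock.
  const availability = String(offer.availability ?? "");
  return {
    name: product.name ?? null,
    sku: product.sku ?? null,
    price,
    currency: spec.priceCurrency ?? null,
    inStock: /InStock$/.test(availability),
  };
}

// Hypothetical: page 1 is the bare archive URL, later pages use /page/N/.
function brandPageUrl(baseUrl, slug, page) {
  return page <= 1 ? `${baseUrl}/brand/${slug}/` : `${baseUrl}/brand/${slug}/page/${page}/`;
}
```

Because `parseProduct` returns `null` instead of throwing when no Product node is present, a caller can skip or log such pages and keep walking the archive.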

Both scrapers run standalone (node index.js), support --brand and
--limit flags, and save per-brand JSON files plus a combined.json.
Scraped data lives in data/sources/test_source/ (gitignored).
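The shared CLI surface could be wired up roughly as below. The commit notes only name the `--brand` and `--limit` flags and the output files; the exact flag semantics (a single brand slug, a positive integer cap), the helper names, and the output layout are assumptions. The sketch uses Node's built-in `util.parseArgs` (available in Node 18.3 and later), so no argument-parsing dependency is needed.

```javascript
const { parseArgs } = require("node:util");
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical: --brand <slug> selects one brand, --limit <n> caps products
// per brand. Both are optional; a missing --limit means "no cap".
function parseCli(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      brand: { type: "string" },
      limit: { type: "string" },
    },
  });
  const limit = values.limit !== undefined ? Number.parseInt(values.limit, 10) : Infinity;
  if (Number.isNaN(limit) || limit < 1) throw new Error(`invalid --limit: ${values.limit}`);
  return { brand: values.brand ?? null, limit };
}

// Write one {brand}.json per brand plus combined.json into outDir.
// Returns the total number of products written to combined.json.
function saveResults(outDir, resultsByBrand) {
  fs.mkdirSync(outDir, { recursive: true });
  const combined = [];
  for (const [brand, products] of Object.entries(resultsByBrand)) {
    fs.writeFileSync(path.join(outDir, `${brand}.json`), JSON.stringify(products, null, 2));
    combined.push(...products);
  }
  fs.writeFileSync(path.join(outDir, "combined.json"), JSON.stringify(combined, null, 2));
  return combined.length;
}
```

For example, `parseCli(["--brand", "wrs", "--limit", "10"])` yields `{ brand: "wrs", limit: 10 }`; in a real entry point the arguments would come from `process.argv.slice(2)`, and `products.slice(0, limit)` applies the cap (slicing to `Infinity` keeps the full list).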

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 12:17:23 +05:30
1d254a9009 Install Playwright browser for backend 2026-05-15 00:25:40 +05:30
68949f124e Add multi-source import pipeline 2026-05-14 23:57:27 +05:30
bef07eff10 Refactor code structure for improved readability and maintainability 2026-05-14 23:56:13 +05:30
2320d1d5c3 Fix typo in health check service name 2026-04-14 14:23:00 +05:30
33ad269821 Fix health check service name in response 2026-04-14 14:22:25 +05:30
9480832478 Add concurrency handling and logging enhancements to KYT pipeline 2026-04-14 13:26:56 +05:30
e87bd907ea first commit 2026-04-13 17:31:26 +05:30