Methodology

How the board is built

Every listing here passes through one pipeline: fetch from nine sources, keep only tech roles, judge whether a resident of Europe could realistically apply, resolve a country, dedupe, enrich, and store. The figures below are the board’s real numbers right now, not illustrations.

  1. Fetch every source in parallel

    Nine sources are queried at once. Each runs independently: a source that errors or times out logs a warning and yields nothing, so one bad feed never sinks an ingest. Most are JSON APIs; WeWorkRemotely is an RSS feed that we parse; Hacker News is scraped from the latest “Who’s Hiring?” thread.

    Last ingest: 206 roles fetched across nine sources.
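
The failure isolation in this step can be sketched as follows. This is a minimal illustration, not the board's actual code: it assumes each source is an async callable returning a list of job dicts, and the names `fetch_safely`, `fetch_all`, and the default timeout are hypothetical.

```python
import asyncio
import logging

log = logging.getLogger("ingest")


async def fetch_safely(name, fetch, timeout):
    """Run one source; on error or timeout, log a warning and yield nothing."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout)
    except Exception as exc:  # also covers the timeout raised by wait_for
        log.warning("source %s failed: %r", name, exc)
        return []


async def fetch_all(sources, timeout=30.0):
    """Query every source concurrently and flatten the per-source batches."""
    batches = await asyncio.gather(
        *(fetch_safely(name, fetch, timeout) for name, fetch in sources.items())
    )
    return [job for batch in batches for job in batch]


# Three stand-in sources (hypothetical): one healthy, one broken, one too slow.
async def ok_source():
    return [{"title": "Backend Engineer"}]


async def broken_source():
    raise RuntimeError("feed unavailable")


async def slow_source():
    await asyncio.sleep(10)
    return [{"title": "never returned"}]
```

Because every source is wrapped before it reaches `asyncio.gather`, a raised exception or a timeout turns into an empty batch instead of cancelling the other fetches.
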
  2. Keep only tech roles

    Every listing is classified by a rules-only keyword pass over its title (plus a source’s own category when it ships one). It’s positive-match only: if we can’t place a role in a tech bucket, it doesn’t make the board. Here’s the live mix of what passed.

    Live mix of tech categories on the board, top five plus the rest:
    • DevOps: 45
    • Design: 32
    • Software / Other: 29
    • Data / ML: 17
    • Full-stack: 11
    • Others: 27
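
A positive-match keyword pass of this kind might look like the sketch below. The bucket names come from the mix above, but the keyword lists and the `classify` function are illustrative assumptions; the board's real rule set is not reproduced here.

```python
import re

# Hypothetical keyword lists; only the bucket names are taken from the page.
TECH_KEYWORDS = {
    "DevOps": ("devops", "sre", "site reliability", "platform engineer"),
    "Design": ("designer", "ux", "ui"),
    "Data / ML": ("data engineer", "data scientist", "machine learning", "ml"),
    "Full-stack": ("full-stack", "full stack", "fullstack"),
}


def classify(title, source_category=None):
    """Return the first tech bucket whose keyword appears, else None."""
    text = f"{title} {source_category or ''}".lower()
    for bucket, keywords in TECH_KEYWORDS.items():
        for keyword in keywords:
            # Whole-word match so "ui" does not fire inside "recruiter".
            if re.search(rf"\b{re.escape(keyword)}\b", text):
                return bucket
    return None  # no positive match: the role does not make the board
```

Returning `None` rather than a catch-all bucket is what makes the pass positive-match only: a role is dropped unless some rule explicitly places it.
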
  3. Judge Europe eligibility

    Each role is tagged from its location, using rules only. A concrete European place or a clear regional signal (“Remote — EU”, “EMEA”) is eligible; a broad worldwide signal is likely; anything that names no location stays unknown rather than being faked into confidence. Roles locked to a non-European region are excluded and never reach the page. This split is the whole point of the board.

    Live eligibility split: eligible, likely, and unknown.
    • 56 eligible
    • 61 likely
    • 44 unknown
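
The verdict rules above can be sketched as an ordered set of checks. The four verdict names follow the text; the term lists, the check order, and the `judge_eligibility` function are assumptions made for illustration, and a location the rules do not recognise is treated as unknown here.

```python
import re

# Hypothetical term lists; the real rules cover far more places and phrasings.
EUROPEAN_PLACES = ("germany", "berlin", "france", "paris", "portugal", "lisbon")
REGIONAL_SIGNALS = ("eu", "emea", "europe")
NON_EUROPEAN_LOCKS = ("us only", "usa only", "north america", "latam", "apac")
WORLDWIDE_SIGNALS = ("worldwide", "anywhere", "global")


def _mentions(text, terms):
    """True if any term appears in text as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)


def judge_eligibility(location):
    """Map a raw location string to eligible, likely, unknown, or excluded."""
    text = (location or "").strip().lower()
    if not text:
        return "unknown"  # no location named: never faked into confidence
    if _mentions(text, EUROPEAN_PLACES) or _mentions(text, REGIONAL_SIGNALS):
        return "eligible"
    if _mentions(text, NON_EUROPEAN_LOCKS):
        return "excluded"  # dropped before it reaches the page
    if _mentions(text, WORLDWIDE_SIGNALS):
        return "likely"
    return "unknown"
```
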
  4. Resolve a country

    Where a location names a concrete country or major city, we map it to an ISO country code. That powers the country filter and the per-country counts. Region-only signals (“Europe”, “worldwide”) resolve to no country on purpose, so they don’t mis-shade the data.
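
As a rough sketch, country resolution can be a lookup from known place names to codes. The table below is a tiny hypothetical sample, and the use of two-letter ISO 3166-1 alpha-2 codes is an assumption; the text only says "ISO country code".

```python
import re

# Hypothetical sample of the place-to-country table (ISO 3166-1 alpha-2).
COUNTRY_CODES = {
    "germany": "DE",
    "berlin": "DE",
    "france": "FR",
    "paris": "FR",
    "netherlands": "NL",
    "amsterdam": "NL",
}


def resolve_country(location):
    """Return a country code for a concrete place, or None for region-only text."""
    text = (location or "").lower()
    for place, code in COUNTRY_CODES.items():
        if re.search(rf"\b{re.escape(place)}\b", text):
            return code
    return None  # "Europe", "worldwide", or empty: deliberately no country
```

Keeping region words out of the table is what stops a location like "Europe" from being counted under any single country.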

  5. Deduplicate

    The same company often posts the same role to several boards. Each job gets a stable id from sha1(company + title); duplicates collapse to one, keeping the variant with the strongest eligibility verdict.

    Last ingest: 203 unique roles, after merging 3 duplicates.
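
The id and merge rule can be sketched like this. The `sha1(company + title)` id and the "strongest verdict wins" rule come from the text; the whitespace and case normalisation, the numeric ranking, and the dict field names are assumptions for illustration.

```python
import hashlib

# Assumed ordering of verdict strength, strongest first.
VERDICT_RANK = {"eligible": 3, "likely": 2, "unknown": 1}


def job_id(company, title):
    """Stable id from sha1(company + title); normalisation is an assumption."""
    key = (company.strip() + title.strip()).casefold()
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def dedupe(jobs):
    """Collapse jobs sharing an id, keeping the strongest eligibility verdict."""
    best = {}
    for job in jobs:
        jid = job_id(job["company"], job["title"])
        current = best.get(jid)
        if (
            current is None
            or VERDICT_RANK[job["verdict"]] > VERDICT_RANK[current["verdict"]]
        ):
            best[jid] = {**job, "id": jid}
    return list(best.values())
```

Because the id depends only on company and title, the same role keeps the same id across sources and across ingests, which is what lets the later store step update rows in place.
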
  6. Enrich salary, experience & perks

    Sources that ship these as structured fields (Free-Work, Himalayas, Jobicy, RemoteOK) are used as-is. For the rest, and to fill gaps, we ask Claude Haiku to read the description and return a compact salary, a seniority string, and up to five perk chips, constrained to a JSON schema. The model reports only what the text states and never invents pay or perks, so coverage is honest, not total.

    Share of live roles carrying a salary, experience, and perks:
    • Salary: 34 (21%)
    • Experience: 130 (81%)
    • Perks: 149 (93%)
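
The model call itself is not shown here, but the handling around it can be sketched: validate the JSON the model returns, cap perks at five, and let structured source fields take priority. The field names (`salary`, `experience`, `perks`) and both function names are hypothetical.

```python
import json

MAX_PERKS = 5  # "up to five perk chips"


def _text_or_none(value):
    """Keep a non-empty string; anything else counts as not stated."""
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def parse_enrichment(raw):
    """Validate the model's JSON reply into the compact shape we store."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("enrichment reply must be a JSON object")
    perks = data.get("perks")
    if not isinstance(perks, list):
        perks = []
    perks = [p.strip() for p in perks if isinstance(p, str) and p.strip()]
    return {
        "salary": _text_or_none(data.get("salary")),
        "experience": _text_or_none(data.get("experience")),
        "perks": perks[:MAX_PERKS],
    }


def fill_gaps(structured, extracted):
    """Structured source fields are used as-is; extraction only fills gaps."""
    return {
        key: structured.get(key) or extracted[key]
        for key in ("salary", "experience", "perks")
    }
```

A missing field stays `None` or an empty list rather than being guessed, which is why coverage in the figures above is partial.
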
  7. Store, then read live

    Clean rows are upserted into Postgres by their id: re-running refreshes existing roles and inserts new ones, and prior enrichment is preserved if a later pass doesn’t reproduce it. The page reads the database live on every request, so it always reflects the most recent ingest. A scheduler re-runs the whole pipeline on a cron (every six hours by default).
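
The upsert behaviour described in this step can be sketched as below. To keep the example runnable without a server it uses SQLite (version 3.24 or later) as a stand-in for Postgres; the `ON CONFLICT ... DO UPDATE` clause is written the same way in both, though a Postgres driver would use a different parameter style. The table layout and column names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    company TEXT NOT NULL,
    salary  TEXT
)
"""

# COALESCE keeps the stored salary when the new pass did not reproduce one.
UPSERT = """
INSERT INTO jobs (id, title, company, salary)
VALUES (:id, :title, :company, :salary)
ON CONFLICT (id) DO UPDATE SET
    title   = excluded.title,
    company = excluded.company,
    salary  = COALESCE(excluded.salary, jobs.salary)
"""


def upsert_jobs(conn, rows):
    """Insert new roles and refresh existing ones in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)
```

Re-running an ingest with the same ids is therefore safe: existing rows are refreshed, new rows are inserted, and a `NULL` from a later enrichment pass does not erase a value captured earlier.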

The nine sources

Fetched in parallel; failures are isolated. The bars are each source’s real current contribution to the board.

What we don’t do

We don’t guess. Eligibility and tech classification are rules over the fields a source actually gives us, so unknown is a first-class verdict, not a gap we paper over. The AI step only extracts facts already present in a description; it never infers a salary, a seniority, or a perk that isn’t written down. And we never rewrite a role’s title or company; the apply link goes straight to the original posting.