Quantitative layer
Job-advertising corpus.
A total of 6,206 job sources (public portals, commercial boards and employer ATS feeds) were scraped in fifteen languages. The window runs from 1 January 2022 to 31 December 2025.
AI keyword lexicon.
A 220-term multilingual list captured AI references in the title, duties and skills fields; fastText similarity handled synonyms.
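The tagging step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: the four-term `LEXICON`, the `tag_posting` helper and the 0.8 threshold are all hypothetical, and cosine similarity over character trigrams stands in for fastText embeddings so that the example runs without external dependencies.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon; the real list has 220 terms across languages.
LEXICON = [
    "machine learning",
    "deep learning",
    "apprentissage automatique",
    "künstliche intelligenz",
]


def char_ngrams(text, n=3):
    """Count character n-grams of a padded, lower-cased string."""
    padded = f"<{text.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def tag_posting(fields, threshold=0.8):
    """Return lexicon terms found exactly or approximately in any field.

    `fields` maps a field name (title, duties, skills) to its text.
    The threshold is an assumed value, not one taken from the study.
    """
    hits = set()
    for text in fields.values():
        lowered = text.lower()
        tokens = re.findall(r"\w+", lowered)
        for term in LEXICON:
            if term in lowered:  # exact substring match
                hits.add(term)
                continue
            width = len(term.split())
            term_vec = char_ngrams(term)
            # Slide a window of the same word count over the field text.
            for i in range(len(tokens) - width + 1):
                window = " ".join(tokens[i:i + width])
                if cosine(term_vec, char_ngrams(window)) >= threshold:
                    hits.add(term)
                    break
    return hits
```

A posting counts as AI-related if `tag_posting` returns a non-empty set. The fuzzy pass is what lets hyphenated or misspelled variants such as "Machine-Learning" still match the lexicon entry.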
Vacancy universe.
After de-duplication and language harmonisation, 31.2 million unique European postings remained. Of these, 7.47 million (about 24%) contained at least one AI reference and formed the core dataset.
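As a rough sketch of this step, the snippet below removes duplicates by hashing a normalised key and then reproduces the share arithmetic from the reported totals. The choice of key fields (employer, title, location) is an assumption made for illustration; the text does not state which fields were used to detect duplicates.

```python
import hashlib
import re


def posting_key(employer, title, location):
    """Hash a normalised (employer, title, location) triple.

    The key fields are hypothetical; they are not taken from the study.
    """
    parts = (
        re.sub(r"\s+", " ", p.strip().lower())
        for p in (employer, title, location)
    )
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(postings):
    """Keep the first posting seen for each normalised key."""
    seen = {}
    for p in postings:
        key = posting_key(p["employer"], p["title"], p["location"])
        seen.setdefault(key, p)
    return list(seen.values())


def ai_share(ai_count, total_count):
    """Percentage of postings that contain at least one AI reference."""
    return 100.0 * ai_count / total_count


# Share implied by the reported totals: 7.47 million of 31.2 million.
share = round(ai_share(7.47e6, 31.2e6), 1)
```

With the reported totals, `share` comes to 23.9, which rounds to the 24% quoted in the text.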
Wage data.
Advertised salaries were matched to Eurostat’s Structure of Earnings Survey (wage-index adjusted) and approximately 195,000 Glassdoor/Indeed self-reports.
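One plausible reading of "wage-index adjusted" is that survey-year earnings are scaled to the posting year with a wage index before advertised salaries are compared against them. The sketch below illustrates that reading only: the function names and every figure in it are hypothetical, and the text does not specify the index or the base year actually used.

```python
def uprate(base_wage, index_base_year, index_target_year):
    """Scale a survey-year wage to the target year using a wage index."""
    return base_wage * index_target_year / index_base_year


def premium_pct(advertised, benchmark):
    """Percentage gap between an advertised salary and a benchmark."""
    return 100.0 * (advertised - benchmark) / benchmark


# Illustrative numbers only (not taken from the study).
benchmark = uprate(40_000, index_base_year=100.0, index_target_year=110.0)
gap = premium_pct(55_000, benchmark)
```

Here a survey median of 40,000 is uprated by 10% to 44,000, so an advertised salary of 55,000 sits 25% above the adjusted benchmark.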
Task & automation data.
The OECD Skills-for-Jobs database (v 2024-2) provides task-exposure scores that inform role deep-dives.

Pipelines: scraping, cleaning, AI-tagging, analytics.
01 Methodology
Qualitative layer
Limitations & mitigations
Language nuance. Multi-word competence phrases, e.g. « analyse des données » ("data analysis"), risk being under-counted. Similarity matching with fastText and manual checks correct the largest misses.
Unadvertised hiring. Internal promotions and referral pipelines are invisible to scraping. Interview probes partly offset this gap with qualitative evidence.
Self-reported wages. Glassdoor data skew toward English-speaking countries; Eurostat medians anchor national differentials.
Rapid tool churn. New AI product names appear faster than any static lexicon can track; a monthly refresh loop keeps keyword capture current.
