{"id":177,"date":"2026-01-20T00:00:00","date_gmt":"2026-01-19T23:00:00","guid":{"rendered":"https:\/\/helloblog.io\/fr\/wp-bench-benchmark-ia-wordpress\/"},"modified":"2026-01-20T00:00:00","modified_gmt":"2026-01-19T23:00:00","slug":"wp-bench-benchmark-ia-wordpress","status":"publish","type":"post","link":"https:\/\/helloblog.io\/fr\/wp-bench-benchmark-ia-wordpress\/","title":{"rendered":"WP-Bench: an official benchmark for measuring AI models on WordPress (not on generic code)"},"content":{"rendered":"\n<p>We have all been there: a model produces a PHP function that is \u201calmost\u201d correct, but it ignores a WordPress convention, forgets a <code>nonce<\/code>, mixes up APIs, or invents a hook. On general-purpose benchmarks, these gaps often go unnoticed. Closing that blind spot is precisely the idea behind <strong>WP-Bench<\/strong>, the official WordPress-focused AI benchmark, available as open source: <a href=\"https:\/\/github.com\/WordPress\/wp-bench\">https:\/\/github.com\/WordPress\/wp-bench<\/a>.<\/p>\n\n\n\n<p>The goal is simple: <strong>evaluate language models on real WordPress tasks<\/strong>, from understanding concepts (APIs, coding standards, plugin architecture, security patterns) to generating code that <strong>actually runs<\/strong> in a WordPress environment.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why a WordPress benchmark is a game changer<\/h2>\n\n\n\n<p>WordPress powers a huge share of the web, yet model evaluation still focuses mostly on generic programming exercises. 
As a result, a model can look excellent on \u201ctoy code\u201d and prove fragile as soon as it meets WordPress realities (hooks, <code>WP_Query<\/code>, permissions\/capabilities, internationalization, escaping, etc.).<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Choose your tools better<\/strong>: if you build \u201cAI-powered\u201d plugins or plug a coding assistant into your workflow, comparing models on WordPress cases lets you make more rational decisions.<\/li>\n\n\n<li><strong>Put WordPress on the AI labs\u2019 map<\/strong>: the stated ambition is to make WP-Bench a reference that providers (OpenAI, Anthropic, Google, etc.) run before release, so that WordPress performance is not an afterthought.<\/li>\n\n\n<li><strong>Move toward an open source leaderboard<\/strong>: the team is working toward a public ranking of results to help the community compare models on WordPress tasks transparently.<\/li>\n\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Two evaluation axes: \u201cKnowledge\u201d and \u201cExecution\u201d<\/h2>\n\n\n\n<p>WP-Bench structures the evaluation around two complementary dimensions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Knowledge: checking WordPress understanding<\/h3>\n\n\n\n<p>The <strong>Knowledge<\/strong> track uses multiple-choice questions to test understanding of WordPress concepts: APIs, hooks, security patterns, coding standards, and so on, with an emphasis on more recent additions mentioned in the announcement, such as the <strong>Abilities API<\/strong> and the <strong>Interactivity API<\/strong> (modern APIs that models may be less 
comfortable with).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) Execution: judging code that actually runs<\/h3>\n\n\n\n<p>The <strong>Execution<\/strong> track is the more interesting one for developers: the model is asked to generate code, and that code is then <strong>graded by a WordPress runtime<\/strong>. In other words, the score does not rest on \u201cform\u201d alone: the code is executed and checked, and errors and assertion results are reported back.<\/p>\n\n\n\n<div class=\"wp-block-group callout callout-info is-style-info is-layout-flow wp-block-group-is-layout-flow\" style=\"border-width:1px;border-radius:8px;padding-top:1rem;padding-right:1.5rem;padding-bottom:1rem;padding-left:1.5rem\">\n\n<h4 class=\"wp-block-heading callout-title\">Why it matters<\/h4>\n\n\n<p>Many evaluations stop at a textual comparison. Here, WP-Bench puts a real WordPress instance in the grading loop: static analysis + sandboxed execution + assertions. 
That brings the benchmark closer to a real development context.<\/p>\n\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">How WP-Bench grades a model (the grading flow)<\/h2>\n\n\n\n<p>The mechanism described is a <strong>harness<\/strong> (an orchestrator) that drives the model and then delegates grading to WordPress itself.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n\n<li>The harness sends a prompt to the model to obtain WordPress code.<\/li>\n\n\n<li>The generated code is passed to the WordPress runtime via <strong>WP-CLI<\/strong> (the command-line tool for managing WordPress).<\/li>\n\n\n<li>The runtime runs static analysis: syntax, coding standards, security signals.<\/li>\n\n\n<li>The code is executed in an isolated environment (sandbox) and checked against test assertions.<\/li>\n\n\n<li>The system returns a JSON result: a score plus detailed feedback.<\/li>\n\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Installation and execution: quick start<\/h2>\n\n\n\n<p>WP-Bench consists of a Python tool (the harness) and a WordPress runtime (built on a <code>wp-env<\/code> project, on the Node side). 
The workflow has four steps: Python environment, API keys, WordPress runtime, then running the benchmark.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Install the harness (Python)<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>python3 -m venv .venv &amp;&amp; source .venv\/bin\/activate\npip install -e .\/python\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" 
stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#B392F0\">python3<\/span><span style=\"color:#79B8FF\"> -m<\/span><span style=\"color:#9ECBFF\"> venv<\/span><span style=\"color:#9ECBFF\"> .venv<\/span><span style=\"color:#E1E4E8\"> &#x26;&#x26; <\/span><span style=\"color:#79B8FF\">source<\/span><span style=\"color:#9ECBFF\"> .venv\/bin\/activate<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">pip<\/span><span style=\"color:#9ECBFF\"> install<\/span><span style=\"color:#79B8FF\"> -e<\/span><span style=\"color:#9ECBFF\"> .\/python<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2) Set the provider keys in a .env file<\/h3>\n\n\n\n<p>WP-Bench can call several model providers. 
The API keys are configured in a <code>.env<\/code> file.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>OPENAI_API_KEY=sk-...\nANTHROPIC_API_KEY=sk-ant-...\nGOOGLE_API_KEY=...\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 
2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E1E4E8\">OPENAI_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">sk-...<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">ANTHROPIC_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">sk-ant-...<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">GOOGLE_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">...<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">3) Start the WordPress runtime (the \u201cgrader\u201d)<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cd runtime\nnpm 
install\nnpm start\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">cd<\/span><span style=\"color:#9ECBFF\"> runtime<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">npm<\/span><span style=\"color:#9ECBFF\"> install<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">npm<\/span><span style=\"color:#9ECBFF\"> start<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">4) Run the benchmark<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" 
stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cd ..\nwp-bench run --config wp-bench.example.yaml\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">cd<\/span><span style=\"color:#9ECBFF\"> ..<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.example.yaml<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Results are written to <code>output\/results.json<\/code>, with per-test logs in <code>output\/results.jsonl<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Comparing several models in a single run<\/h2>\n\n\n\n<p>A practical point: you can launch a multi-model campaign by listing several entries in the config. 
The harness runs them sequentially and produces a comparison table.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>models:\n  - name: gpt-4o\n  - name: gpt-4o-mini\n  - name: claude-sonnet-4-20250514\n  - name: claude-opus-4-5-20251101\n  - name: gemini\/gemini-2.5-pro\n  - name: gemini\/gemini-2.5-flash\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" 
d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">models<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o-mini<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">claude-sonnet-4-20250514<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">claude-opus-4-5-20251101<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gemini\/gemini-2.5-pro<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gemini\/gemini-2.5-flash<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Model names follow LiteLLM conventions: <a href=\"https:\/\/docs.litellm.ai\/docs\/providers\">https:\/\/docs.litellm.ai\/docs\/providers<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Configuring WP-Bench: dataset, grader, and run parameters<\/h2>\n\n\n\n<p>Configuration lives in a 
YAML file (starting from <code>wp-bench.example.yaml<\/code>). In it you define, among other things, the test suite, the grader type, and options such as <code>limit<\/code> or <code>concurrency<\/code>.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>dataset:\n  source: local              # 'local' or 'huggingface'\n  name: wp-core-v1           # suite name\n\nmodels:\n  - name: gpt-4o\n\ngrader:\n  kind: docker\n  wp_env_dir: .\/runtime      # path to wp-env project\n\nrun:\n  suite: wp-core-v1\n  limit: 10                  # limit tests (null = all)\n  concurrency: 4\n\noutput:\n  path: output\/results.json\n  jsonl_path: output\/results.jsonl\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" 
stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">dataset<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  source<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">local<\/span><span style=\"color:#6A737D\">              # 'local' or 'huggingface'<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">wp-core-v1<\/span><span style=\"color:#6A737D\">           # suite name<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">models<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">grader<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  kind<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">docker<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  wp_env_dir<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">.\/runtime<\/span><span 
style=\"color:#6A737D\">      # path to wp-env project<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">run<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  suite<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">wp-core-v1<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  limit<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#79B8FF\">10<\/span><span style=\"color:#6A737D\">                  # limit tests (null = all)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  concurrency<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#79B8FF\">4<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">output<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  path<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">output\/results.json<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  jsonl_path<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">output\/results.jsonl<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Useful CLI options<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" 
fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>wp-bench run --config wp-bench.yaml          # run with config file\nwp-bench run --model-name gpt-4o --limit 5   # quick single-model test\nwp-bench dry-run --config wp-bench.yaml      # validate config without calling models\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.yaml<\/span><span style=\"color:#6A737D\">          # run with config file<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> 
--model-name<\/span><span style=\"color:#9ECBFF\"> gpt-4o<\/span><span style=\"color:#79B8FF\"> --limit<\/span><span style=\"color:#79B8FF\"> 5<\/span><span style=\"color:#6A737D\">   # quick single-model test<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> dry-run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.yaml<\/span><span style=\"color:#6A737D\">      # validate config without calling models<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Repository structure: who does what?<\/h2>\n\n\n\n<p>The repository is laid out fairly clearly: a Python part for orchestration, a runtime part for grading, versioned datasets, and notebooks for analyzing results.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" 
aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>.\n\u251c\u2500\u2500 python\/          # Benchmark harness (pip installable)\n\u251c\u2500\u2500 runtime\/         # WordPress grader plugin + wp-env config\n\u251c\u2500\u2500 datasets\/        # Test suites (local JSON + Hugging Face builder)\n\u251c\u2500\u2500 notebooks\/       # Results visualization and reporting\n\u2514\u2500\u2500 output\/          # Benchmark results (gitignored)\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">.<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> python\/<\/span><span style=\"color:#6A737D\">          # Benchmark harness (pip installable)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> runtime\/<\/span><span style=\"color:#6A737D\">         # WordPress grader plugin + wp-env config<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> datasets\/<\/span><span style=\"color:#6A737D\">        # Test suites (local JSON + Hugging Face 
builder)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> notebooks\/<\/span><span style=\"color:#6A737D\">       # Results visualization and reporting<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u2514\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> output\/<\/span><span style=\"color:#6A737D\">          # Benchmark results (gitignored)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Test suites: Knowledge vs Execution, and wp-core-v1<\/h2>\n\n\n\n<p>Test suites live in <code>datasets\/suites\/&lt;suite-name&gt;\/<\/code> and are split into two directories: <code>execution\/<\/code> for code-generation tasks with assertions, and <code>knowledge\/<\/code> for multiple-choice questions.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><code>execution\/<\/code>: code tasks, assertions, and categories (one JSON file per category).<\/li>\n\n\n<li><code>knowledge\/<\/code>: multiple-choice questions (one JSON file per category).<\/li>\n\n<\/ul>\n\n\n\n<p>The default suite, <strong><code>wp-core-v1<\/code><\/strong>, covers core APIs, hooks, database operations, and security patterns, among other areas.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Loading a dataset from Hugging Face<\/h3>\n\n\n\n<p>WP-Bench can also load a suite from Hugging Face through its config.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg 
xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>dataset:\n  source: huggingface\n  name: WordPress\/wp-bench-v1\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">dataset<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  source<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">huggingface<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  name<\/span><span style=\"color:#E1E4E8\">: <\/span><span 
style=\"color:#9ECBFF\">WordPress\/wp-bench-v1<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Current state and known limitations (worth keeping in mind)<\/h2>\n\n\n\n<p>WP-Bench is announced as a <strong>first version<\/strong>, and it comes with acknowledged limitations. That matters if you want to interpret the scores correctly.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Dataset size<\/strong>: the current suite is still small; it will need more cases covering more WordPress APIs and patterns before it can claim to be an \u201cexhaustive\u201d measure.<\/li>\n\n\n<li><strong>Version coverage<\/strong>: the suite leans toward features associated with WordPress 6.9 (Abilities API, Interactivity API). This is partly deliberate (models stumble more on them), but it can skew results because these features postdate the training of many models.<\/li>\n\n\n<li><strong>Benchmark saturation<\/strong>: early tests show very high scores on older WordPress concepts, which reduces the \u201csignal\u201d. 
The challenge is to build tests that genuinely discriminate between models, not just questions that are new.<\/li>\n\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">In practice: what does this do for a WordPress dev?<\/h2>\n\n\n\n<p>WP-Bench targets a very concrete use: when you connect a model to your IDE, to an agent, or to a plugin that generates code, you want to know whether it handles the WordPress fundamentals correctly (permissions, security, conventions), and also whether it can produce executable code without \u201challucinating\u201d APIs.<\/p>\n\n\n\n<p>Having a grader built on a WordPress runtime, with static analysis and assertions, makes the results more actionable than a plain text-based score: you can better understand <em>where<\/em> things break (standards, security, execution), not only <em>that<\/em> they break.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Resources<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li>WP-Bench GitHub repository: <a href=\"https:\/\/github.com\/WordPress\/wp-bench\">https:\/\/github.com\/WordPress\/wp-bench<\/a><\/li>\n\n\n<li>AI Building Blocks for WordPress: <a href=\"https:\/\/make.wordpress.org\/ai\/2025\/07\/17\/ai-building-blocks\/\">https:\/\/make.wordpress.org\/ai\/2025\/07\/17\/ai-building-blocks\/<\/a><\/li>\n\n\n<li>#core-ai Slack channel: <a href=\"https:\/\/wordpress.slack.com\/archives\/C08TJ8BPULS\">https:\/\/wordpress.slack.com\/archives\/C08TJ8BPULS<\/a><\/li>\n\n<\/ul>\n\n\n<div class=\"references-section\">\n                <h2>References \/ Sources<\/h2>\n                <ul class=\"references-list\"><li><a href=\"https:\/\/make.wordpress.org\/ai\/2026\/01\/14\/introducing-wp-bench-a-wordpress-ai-benchmark\/\" target=\"_blank\" rel=\"noopener noreferrer\">Introducing WP-Bench: A WordPress AI Benchmark<\/a><\/li><li><a 
href=\"https:\/\/github.com\/WordPress\/wp-bench\" target=\"_blank\" rel=\"noopener noreferrer\">WP-Bench GitHub README<\/a><\/li><li><a href=\"https:\/\/make.wordpress.org\/ai\/2025\/07\/17\/ai-building-blocks\/\" target=\"_blank\" rel=\"noopener noreferrer\">AI Building Blocks for WordPress<\/a><\/li><li><a href=\"https:\/\/wordpress.slack.com\/archives\/C08TJ8BPULS\" target=\"_blank\" rel=\"noopener noreferrer\">#core-ai Slack channel<\/a><\/li><li><a href=\"https:\/\/docs.litellm.ai\/docs\/providers\" target=\"_blank\" rel=\"noopener noreferrer\">Providers | LiteLLM<\/a><\/li><\/ul>\n            <\/div>","protected":false},"excerpt":{"rendered":"<p>Coding assistants are everywhere, but do they really know how to work with WordPress APIs, hooks, and standards? WP-Bench finally offers a concrete, reproducible, \u201cruntime\u201d-oriented measure of what models can do on WordPress.<\/p>\n","protected":false},"author":12,"featured_media":176,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[63],"tags":[75,44,11,10,76],"class_list":["post-177","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ecosysteme-wordpress","tag-benchmark","tag-ia","tag-securite","tag-wordpress","tag-wp-cli"],"_links":{"self":[{"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/posts\/177","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/comments?post=177"}],"version-history":[{"count":0,"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/posts\/177\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/he
lloblog.io\/fr\/wp-json\/wp\/v2\/media\/176"}],"wp:attachment":[{"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/media?parent=177"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/categories?post=177"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/helloblog.io\/fr\/wp-json\/wp\/v2\/tags?post=177"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}