{"id":163,"date":"2026-01-20T00:00:00","date_gmt":"2026-01-19T23:00:00","guid":{"rendered":"https:\/\/helloblog.io\/de\/wp-bench-wordpress-ki-benchmark\/"},"modified":"2026-01-20T00:00:00","modified_gmt":"2026-01-19T23:00:00","slug":"wp-bench-wordpress-ki-benchmark","status":"publish","type":"post","link":"https:\/\/helloblog.io\/de\/wp-bench-wordpress-ki-benchmark\/","title":{"rendered":"WP-Bench: So misst WordPress jetzt offiziell, wie gut KI-Modelle WordPress wirklich k\u00f6nnen"},"content":{"rendered":"\n<p>Wer regelm\u00e4\u00dfig mit LLMs (Large Language Models, also Sprachmodellen wie ChatGPT\/Claude\/Gemini) in WordPress-Projekten arbeitet, kennt das Problem: Allgemeine Programmier-Benchmarks sagen wenig dar\u00fcber aus, ob ein Modell <em>WordPress-spezifisch<\/em> sauber arbeitet. WP-Bench setzt genau dort an und bringt einen offiziellen WordPress AI Benchmark an den Start: <a href=\"https:\/\/github.com\/WordPress\/wp-bench\">WP-Bench auf GitHub<\/a>.<\/p>\n\n\n\n<p>Die Idee dahinter ist pragmatisch: Nicht nur \u201ekann das Modell PHP generieren?\u201c, sondern \u201eversteht es WordPress-APIs, Coding Standards, Plugin-Architektur und Security-Best-Practices \u2013 und produziert es Code, der im echten WordPress-Runtime-Kontext funktioniert?\u201c.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Warum ein WordPress-spezifischer KI-Benchmark \u00fcberhaupt n\u00f6tig ist<\/h2>\n\n\n\n<p>WordPress treibt einen gro\u00dfen Teil des Webs an, aber KI-Modelle werden in der Praxis h\u00e4ufig mit generischen Coding-Aufgaben bewertet. Das f\u00fchrt dazu, dass ein Modell in klassischen Aufgaben gl\u00e4nzt, bei WordPress aber an typischen Details scheitert: korrekte Nutzung von Hooks, Nonces, Capability-Checks, Datenbankzugriffe \u00fcber <code>$wpdb<\/code>, oder das Einhalten von WordPress Coding Standards.<\/p>\n\n\n\n<p>WP-Bench will diese L\u00fccke schlie\u00dfen \u2013 mit Messwerten, die f\u00fcr WordPress-Entwicklung relevant sind. Das hilft in zwei Richtungen: (1) Entwickelnde k\u00f6nnen Tools und Modelle besser vergleichen, (2) Modellanbieter bekommen WordPress-Leistung als eigenes Qualit\u00e4tskriterium auf den Radar \u2013 statt als Randnotiz.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Was WP-Bench konkret misst: Knowledge vs. Execution<\/h2>\n\n\n\n<p>WP-Bench teilt die Bewertung in zwei Dimensionen auf, die im Alltag oft auseinanderlaufen: Ein Modell kann WordPress-Begriffe erkl\u00e4ren (Knowledge), aber trotzdem Code ausgeben, der nicht l\u00e4uft oder Sicherheitsl\u00fccken baut (Execution).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Knowledge: WordPress-Verst\u00e4ndnis als Multiple-Choice<\/h3>\n\n\n\n<p>Der \u201eKnowledge\u201c-Teil besteht aus Multiple-Choice-Fragen zu WordPress-Konzepten und APIs: Hooks, Security-Patterns, Coding Standards und Core-Konzepte. Laut Ank\u00fcndigung liegt ein Fokus auch auf moderneren Zug\u00e4ngen wie der Abilities API und der Interactivity API \u2013 also Bereichen, in denen Modelle typischerweise eher schw\u00e4cheln, weil diese Themen neuer sind.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) Execution: Code-Generierung, die im echten Runtime-Kontext gepr\u00fcft wird<\/h3>\n\n\n\n<p>Der \u201eExecution\u201c-Teil ist f\u00fcr WordPress-Entwicklung der spannendere: Hier muss das Modell Code erzeugen, der anschlie\u00dfend in einer WordPress-Runtime gepr\u00fcft wird. Es geht also nicht nur um \u201esieht plausibel aus\u201c, sondern um lauff\u00e4higen, \u00fcberpr\u00fcfbaren Code mit statischer Analyse und Assertions zur Laufzeit.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">So funktioniert das Grading (WordPress als Pr\u00fcfer)<\/h2>\n\n\n\n<p>Ein Kernpunkt von WP-Bench ist, dass WordPress selbst als \u201eGrader\u201c eingesetzt wird. Der Ablauf ist dabei klar strukturiert:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n\n<li>Der Harness (das Benchmark-Framework) sendet einen Prompt an das Modell und fordert WordPress-Code an.<\/li>\n\n\n<li>Der generierte Code wird \u00fcber WP-CLI an die WordPress-Runtime \u00fcbergeben.<\/li>\n\n\n<li>Die Runtime pr\u00fcft statisch (Syntax, Coding Standards, Security-Aspekte).<\/li>\n\n\n<li>Dann wird der Code in einer sandboxed Umgebung ausgef\u00fchrt; Assertions pr\u00fcfen erwartetes Verhalten.<\/li>\n\n\n<li>Am Ende kommen Ergebnisse als JSON zur\u00fcck \u2013 inklusive Score und detailliertem Feedback.<\/li>\n\n<\/ol>\n\n\n\n<p>Das ist ein wichtiger Unterschied zu vielen \u201enur Text\u201c-Evaluierungen: WP-Bench misst nicht nur Wissen, sondern auch die F\u00e4higkeit, korrekt in einer WordPress-Umgebung zu arbeiten.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Quickstart: WP-Bench lokal ausprobieren<\/h2>\n\n\n\n<p>Wenn du WP-Bench selbst laufen lassen willst, ist der Einstieg relativ geradlinig: Python-Harness installieren, API-Keys setzen, WordPress-Runtime starten und dann den Run ausf\u00fchren.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Installation (Python venv + editable install)<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>python3 -m venv .venv &amp;&amp; source .venv\/bin\/activate\npip install -e .\/python\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#B392F0\">python3<\/span><span style=\"color:#79B8FF\"> -m<\/span><span style=\"color:#9ECBFF\"> venv<\/span><span style=\"color:#9ECBFF\"> .venv<\/span><span style=\"color:#E1E4E8\"> &#x26;&#x26; <\/span><span style=\"color:#79B8FF\">source<\/span><span style=\"color:#9ECBFF\"> .venv\/bin\/activate<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">pip<\/span><span style=\"color:#9ECBFF\"> install<\/span><span style=\"color:#79B8FF\"> -e<\/span><span style=\"color:#9ECBFF\"> .\/python<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2) API-Keys per .env konfigurieren<\/h3>\n\n\n\n<p>WP-Bench kann verschiedene Provider ansprechen; die Keys werden per <code>.env<\/code> hinterlegt:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>OPENAI_API_KEY=sk-...\nANTHROPIC_API_KEY=sk-ant-...\nGOOGLE_API_KEY=...\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E1E4E8\">OPENAI_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">sk-...<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">ANTHROPIC_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">sk-ant-...<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">GOOGLE_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">...<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">3) WordPress-Runtime (Grader) starten<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cd runtime\nnpm install\nnpm start\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">cd<\/span><span style=\"color:#9ECBFF\"> runtime<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">npm<\/span><span style=\"color:#9ECBFF\"> install<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">npm<\/span><span style=\"color:#9ECBFF\"> start<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">4) Benchmark ausf\u00fchren<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cd ..\nwp-bench run --config wp-bench.example.yaml\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">cd<\/span><span style=\"color:#9ECBFF\"> ..<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.example.yaml<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Die Ergebnisse landen in <code>output\/results.json<\/code>, plus per-Test Logs in <code>output\/results.jsonl<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Mehrere Modelle in einem Lauf vergleichen (Multi-Model Benchmarking)<\/h2>\n\n\n\n<p>Praktisch f\u00fcr Teams ist die M\u00f6glichkeit, mehrere Modelle in einem Run gegeneinander laufen zu lassen. In der Config listest du einfach mehrere Model-Namen auf; der Harness arbeitet sie nacheinander ab und gibt eine Vergleichstabelle aus. Die Modellnamen orientieren sich an den Konventionen von LiteLLM (einheitliche Provider-Namensgebung).<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>models:\n  - name: gpt-4o\n  - name: gpt-4o-mini\n  - name: claude-sonnet-4-20250514\n  - name: claude-opus-4-5-20251101\n  - name: gemini\/gemini-2.5-pro\n  - name: gemini\/gemini-2.5-flash\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">models<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o-mini<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">claude-sonnet-4-20250514<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">claude-opus-4-5-20251101<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gemini\/gemini-2.5-pro<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gemini\/gemini-2.5-flash<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Konfiguration: Dataset, Grader, Run-Parameter, Output<\/h2>\n\n\n\n<p>Die Beispiel-Config (<code>wp-bench.example.yaml<\/code>) ist ein guter Ausgangspunkt. Typisch sind vier Bereiche: Dataset-Quelle, Modelle, Grader (Docker + wp-env Projekt) und Run-\/Output-Settings.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>dataset:\n  source: local              # 'local' or 'huggingface'\n  name: wp-core-v1           # suite name\n\nmodels:\n  - name: gpt-4o\n\ngrader:\n  kind: docker\n  wp_env_dir: .\/runtime      # path to wp-env project\n\nrun:\n  suite: wp-core-v1\n  limit: 10                  # limit tests (null = all)\n  concurrency: 4\n\noutput:\n  path: output\/results.json\n  jsonl_path: output\/results.jsonl\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">dataset<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  source<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">local<\/span><span style=\"color:#6A737D\">              # 'local' or 'huggingface'<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">wp-core-v1<\/span><span style=\"color:#6A737D\">           # suite name<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">models<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">grader<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  kind<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">docker<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  wp_env_dir<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">.\/runtime<\/span><span style=\"color:#6A737D\">      # path to wp-env project<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">run<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  suite<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">wp-core-v1<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  limit<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#79B8FF\">10<\/span><span style=\"color:#6A737D\">                  # limit tests (null = all)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  concurrency<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#79B8FF\">4<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">output<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  path<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">output\/results.json<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  jsonl_path<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">output\/results.jsonl<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">N\u00fctzliche CLI-Optionen<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>wp-bench run --config wp-bench.yaml          # run with config file\nwp-bench run --model-name gpt-4o --limit 5   # quick single-model test\nwp-bench dry-run --config wp-bench.yaml      # validate config without calling models\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.yaml<\/span><span style=\"color:#6A737D\">          # run with config file<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --model-name<\/span><span style=\"color:#9ECBFF\"> gpt-4o<\/span><span style=\"color:#79B8FF\"> --limit<\/span><span style=\"color:#79B8FF\"> 5<\/span><span style=\"color:#6A737D\">   # quick single-model test<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> dry-run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.yaml<\/span><span style=\"color:#6A737D\">      # validate config without calling models<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Aufbau des Repos: wo was liegt<\/h2>\n\n\n\n<p>Wenn du tiefer einsteigen oder selbst Tests erg\u00e4nzen willst, hilft die Repo-Struktur als Orientierung:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>.\n\u251c\u2500\u2500 python\/          # Benchmark harness (pip installable)\n\u251c\u2500\u2500 runtime\/         # WordPress grader plugin + wp-env config\n\u251c\u2500\u2500 datasets\/        # Test suites (local JSON + Hugging Face builder)\n\u251c\u2500\u2500 notebooks\/       # Results visualization and reporting\n\u2514\u2500\u2500 output\/          # Benchmark results (gitignored)\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">.<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> python\/<\/span><span style=\"color:#6A737D\">          # Benchmark harness (pip installable)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> runtime\/<\/span><span style=\"color:#6A737D\">         # WordPress grader plugin + wp-env config<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> datasets\/<\/span><span style=\"color:#6A737D\">        # Test suites (local JSON + Hugging Face builder)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> notebooks\/<\/span><span style=\"color:#6A737D\">       # Results visualization and reporting<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u2514\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> output\/<\/span><span style=\"color:#6A737D\">          # Benchmark results (gitignored)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Test-Suites: Knowledge- und Execution-Datens\u00e4tze<\/h2>\n\n\n\n<p>Test-Suites liegen unter <code>datasets\/suites\/&lt;suite-name&gt;\/<\/code> und sind in zwei Verzeichnisse aufgeteilt:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><code>execution\/<\/code>: Code-Generierungsaufgaben inklusive Assertions (je Kategorie eine JSON-Datei)<\/li>\n\n\n<li><code>knowledge\/<\/code>: Multiple-Choice-Fragen (je Kategorie eine JSON-Datei)<\/li>\n\n<\/ul>\n\n\n\n<p>Die Standard-Suite <code>wp-core-v1<\/code> deckt laut Beschreibung u. a. Core-APIs, Hooks, Datenbankoperationen und Security-Patterns ab.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Datasets von Hugging Face laden<\/h3>\n\n\n\n<p>Alternativ zu lokalen Datens\u00e4tzen kann WP-Bench Suites auch von Hugging Face beziehen, konfiguriert \u00fcber <code>dataset.source<\/code>:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>dataset:\n  source: huggingface\n  name: WordPress\/wp-bench-v1\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">dataset<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  source<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">huggingface<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">WordPress\/wp-bench-v1<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Aktueller Stand und bekannte Grenzen<\/h2>\n\n\n\n<p>WP-Bench ist als Early Release gestartet \u2013 entsprechend sind Einschr\u00e4nkungen offen benannt:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Dataset-Gr\u00f6\u00dfe:<\/strong> Die aktuelle Suite ist noch relativ klein; f\u00fcr einen belastbaren Benchmark braucht es mehr F\u00e4lle \u00fcber mehr WordPress-APIs und typische Patterns hinweg.<\/li>\n\n\n<li><strong>Versions-Bias:<\/strong> Der Fokus liegt teilweise auf WordPress-6.9-nahen Themen (u. a. Abilities API, Interactivity API). Das ist n\u00fctzlich, weil genau dort viele Modelle Schwierigkeiten haben \u2013 kann aber gleichzeitig Modelle benachteiligen, deren Trainingsdaten diese Features noch kaum enthalten.<\/li>\n\n\n<li><strong>Benchmark-S\u00e4ttigung:<\/strong> Erste Tests zeigten wohl sehr hohe Scores bei \u00e4lteren WordPress-Konzepten; diese Fragen liefern dann wenig Signal. Schwierige, aber realistische Aufgaben zu finden ist die eigentliche Benchmark-Arbeit.<\/li>\n\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Warum das f\u00fcr Plugin- und Agentur-Alltag relevant ist<\/h2>\n\n\n\n<p>In der Praxis werden KI-Assistenten in WordPress-Projekten meist f\u00fcr drei Dinge genutzt: Boilerplate erstellen, unbekannte APIs nachschlagen\/erkl\u00e4ren und Code-Review\/Refactoring. WP-Bench zielt genau auf die Frage, welche Modelle dabei <em>zuverl\u00e4ssig<\/em> sind \u2013 nicht nur eloquent. Gerade bei Security (Nonces, Sanitization\/Escaping, Capability-Checks) oder bei korrekt angebundenen Hooks entscheidet \u201efunktioniert wirklich\u201c \u00fcber den Nutzen.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Open Source Ausrichtung: Leaderboard und Community-Beitr\u00e4ge<\/h2>\n\n\n\n<p>WP-Bench ist als Open-Source-Projekt angelegt. Neben dem Tool selbst arbeitet das Team laut Ank\u00fcndigung auf ein \u00f6ffentliches Leaderboard hin, das Modell-Performance auf WordPress-Aufgaben transparent macht. Zus\u00e4tzlich ist explizit Community-Mitarbeit vorgesehen \u2013 denn die Qualit\u00e4t eines Benchmarks steht und f\u00e4llt mit der Qualit\u00e4t der Testf\u00e4lle und der Strenge der Auswertung.<\/p>\n\n\n\n<p>Offizielle Einstiegspunkte sind das GitHub-Repository und die WordPress-AI-Ressourcen: <a href=\"https:\/\/github.com\/WordPress\/wp-bench\">WP-Bench GitHub Repository<\/a>, <a href=\"https:\/\/make.wordpress.org\/ai\/2025\/07\/17\/ai-building-blocks\/\">AI Building Blocks for WordPress<\/a> sowie der Slack-Channel <a href=\"https:\/\/wordpress.slack.com\/archives\/C08TJ8BPULS\">#core-ai<\/a>.<\/p>\n\n\n<div class=\"references-section\">\n                <h2>Referenzen \/ Quellen<\/h2>\n                <ul class=\"references-list\"><li><a href=\"https:\/\/make.wordpress.org\/ai\/2026\/01\/14\/introducing-wp-bench-a-wordpress-ai-benchmark\/\" target=\"_blank\" rel=\"noopener noreferrer\">Introducing WP-Bench: A WordPress AI Benchmark<\/a><\/li><li><a href=\"https:\/\/github.com\/WordPress\/wp-bench\" target=\"_blank\" rel=\"noopener noreferrer\">WP-Bench GitHub README<\/a><\/li><\/ul>\n            <\/div>","protected":false},"excerpt":{"rendered":"<p>Code-Assistenten f\u00fchlen sich bei WordPress oft erstaunlich selbstsicher an \u2013 bis es um echte Hooks, Security-Patterns oder moderne Core-APIs geht. WP-Bench soll genau diese L\u00fccke schlie\u00dfen: ein offizieller Benchmark, der WordPress-Know-how und lauff\u00e4higen Code messbar macht.<\/p>\n","protected":false},"author":10,"featured_media":162,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[29,75,76,10,7],"class_list":["post-163","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-wordpress-okosystem","tag-ai","tag-benchmark","tag-open-source","tag-wordpress","tag-wp-cli"],"_links":{"self":[{"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/posts\/163","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/comments?post=163"}],"version-history":[{"count":0,"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/posts\/163\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/media\/162"}],"wp:attachment":[{"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/media?parent=163"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/categories?post=163"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/helloblog.io\/de\/wp-json\/wp\/v2\/tags?post=163"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}