{"id":152,"date":"2026-01-20T00:00:00","date_gmt":"2026-01-19T23:00:00","guid":{"rendered":"https:\/\/helloblog.io\/hr\/wp-bench-wordpress-ai-benchmark\/"},"modified":"2026-01-20T00:00:00","modified_gmt":"2026-01-19T23:00:00","slug":"wp-bench-wordpress-ai-benchmark","status":"publish","type":"post","link":"https:\/\/helloblog.io\/hr\/wp-bench-wordpress-ai-benchmark\/","title":{"rendered":"WP-Bench: slu\u017ebeni WordPress AI benchmark koji kona\u010dno mjeri \u201czna\u0161 li WordPress?\u201d"},"content":{"rendered":"\n<p>AI modeli sve bolje pi\u0161u kod, ali WordPress je specifi\u010dna zvijer: hookovi, capabilites, core API-jevi, coding standards, kompatibilnost unatrag i tisu\u0107u na\u010dina da slu\u010dajno napravi\u0161 sigurnosnu rupu. Upravo zato je zanimljiva vijest da WordPress projekt gura <a href=\"https:\/\/github.com\/WordPress\/wp-bench\">WP-Bench<\/a> kao slu\u017ebeni AI benchmark fokusiran ba\u0161 na WordPress development.<\/p>\n\n\n\n<p>Ideja je jednostavna: umjesto generi\u010dkih zadataka tipa \u201cnapi\u0161i REST endpoint\u201d, WP-Bench testira koliko model razumije WordPress koncepte i mo\u017ee li isporu\u010diti kod koji stvarno radi u WordPress runtimeu, uz provjere standarda i sigurnosti.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">\u0160to je WP-Bench i kome je namijenjen<\/h2>\n\n\n\n<p>WP-Bench je benchmark (standardizirani set testova) za evaluaciju LLM-ova (large language models) u kontekstu WordPressa. Pokriva dvije velike zone: teorijsko razumijevanje (znanje) i prakti\u010dnu isporuku koda (izvr\u0161avanje).<\/p>\n\n\n\n<p>Ako radi\u0161 na AI-powered pluginovima, internim alatima ili samo bira\u0161 coding assistant za WordPress projekte, ovakav benchmark je koristan jer ti daje usporedivu metriku za WordPress-specifi\u010dne zadatke, a ne op\u0107i dojam \u201cdobro pi\u0161e PHP\u201d.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Za\u0161to WordPressu treba poseban AI benchmark<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>WordPress je dominantna platforma<\/strong>, ali ve\u0107ina evaluacija AI modela je op\u0107enita i rijetko ulazi u specifi\u010dnosti WordPress ekosustava.<\/li>\n\n\n<li><strong>WordPress development ima jasna pravila igre<\/strong>: coding standards, sanitization\/escaping, nonces, permission checks, pravilna uporaba hookova i API-ja.<\/li>\n\n\n<li><strong>Novi API-ji se te\u0161ko \u201chvataju\u201d<\/strong>: WP-Bench namjerno nagla\u0161ava moderne stvari poput Abilities API-ja i Interactivity API-ja, gdje se modeli \u010desto spoti\u010du jer te zna\u010dajke dolaze nakon ve\u0107ine njihovih trening setova (prema samoj objavi).<\/li>\n\n\n<li><strong>Transparentnost i usporedivost<\/strong>: cilj je i\u0107i prema javnom leaderboardu, kako bi rezultati bili provjerljivi i korisni za zajednicu.<\/li>\n\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Kako WP-Bench mjeri sposobnosti: Knowledge vs Execution<\/h2>\n\n\n\n<p>WP-Bench razdvaja evaluaciju u dvije dimenzije:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Knowledge<\/strong>: pitanja na vi\u0161e ponu\u0111enih odgovora koja ciljaju WordPress koncepte, core API-je, hookove, sigurnosne obrasce i coding standards (s naglaskom na modernije dodatke poput Abilities API-ja i Interactivity API-ja).<\/li>\n\n\n<li><strong>Execution<\/strong>: zadaci generiranja koda koji se ocjenjuju tako da se kod zaista pokrene u WordPress runtimeu, uz stati\u010dku analizu i runtime asercije.<\/li>\n\n<\/ul>\n\n\n\n<p>Prakti\u010dno, to zna\u010di da \u201cmodel zna objasniti \u0161to je <code>wp_nonce_field()<\/code>\u201d nije dovoljno. Benchmark \u017eeli vidjeti mo\u017ee li model proizvesti kod koji pro\u0111e provjere i odradi zadatak u sandboxu bez lomljenja standarda i bez o\u010ditih sigurnosnih problema.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Kako izgleda ocjenjivanje (grading) u praksi<\/h2>\n\n\n\n<p>Jedan detalj koji WP-Bench \u010dini zanimljivim je to \u0161to WordPress sam glumi ocjenjiva\u010da: generirani kod se \u0161alje u WordPress runtime i tamo prolazi kroz analize i testne asercije.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n\n<li>Harness \u0161alje prompt modelu i tra\u017ei WordPress kod za odre\u0111eni zadatak.<\/li>\n\n\n<li>Generirani kod se proslje\u0111uje WordPress runtimeu kroz WP-CLI (CLI alat za upravljanje WordPressom).<\/li>\n\n\n<li>Runtime radi stati\u010dke provjere: sintaksu, coding standards i sigurnosne aspekte.<\/li>\n\n\n<li>Kod se izvr\u0161ava u sandboxiranom okru\u017eenju, uz test asercije koje provjeravaju o\u010dekivano pona\u0161anje.<\/li>\n\n\n<li>Rezultati se vra\u0107aju kao JSON sa scoreovima i detaljnim feedbackom.<\/li>\n\n<\/ol>\n\n\n\n<div class=\"wp-block-group callout callout-info is-style-info is-layout-flow wp-block-group-is-layout-flow\" style=\"border-width:1px;border-radius:8px;padding-top:1rem;padding-right:1.5rem;padding-bottom:1rem;padding-left:1.5rem\">\n\n<h4 class=\"wp-block-heading callout-title\">Za\u0161to je ovo bitno<\/h4>\n\n\n<p>Kod \u201cAI evaluacija\u201d \u010desto je problem \u0161to se ocjenjuje samo tekstualni output. Ovdje je naglasak na tome da se WordPress kod stvarno pokrene i provjeri, \u0161to je bli\u017ee realnom developmentu.<\/p>\n\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Brzi start: pokretanje WP-Bench-a lokalno<\/h2>\n\n\n\n<p>WP-Bench repozitorij je organiziran tako da ima Python dio (harness) i zaseban runtime dio (WordPress grader + konfiguracija). Minimalni put do prvog rezultata izgleda ovako.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Instalacija Python harnessa<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>python3 -m venv .venv &amp;&amp; source .venv\/bin\/activate\npip install -e .\/python\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#B392F0\">python3<\/span><span style=\"color:#79B8FF\"> -m<\/span><span style=\"color:#9ECBFF\"> venv<\/span><span style=\"color:#9ECBFF\"> .venv<\/span><span style=\"color:#E1E4E8\"> &#x26;&#x26; <\/span><span style=\"color:#79B8FF\">source<\/span><span style=\"color:#9ECBFF\"> .venv\/bin\/activate<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">pip<\/span><span style=\"color:#9ECBFF\"> install<\/span><span style=\"color:#79B8FF\"> -e<\/span><span style=\"color:#9ECBFF\"> .\/python<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2) API klju\u010devi (model provideri)<\/h3>\n\n\n\n<p>WP-Bench se oslanja na API-je providera. Kreira\u0161 <code>.env<\/code> datoteku i upi\u0161e\u0161 klju\u010deve za one koje koristi\u0161.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>OPENAI_API_KEY=sk-...\nANTHROPIC_API_KEY=sk-ant-...\nGOOGLE_API_KEY=...\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E1E4E8\">OPENAI_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">sk-...<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">ANTHROPIC_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">sk-ant-...<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">GOOGLE_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">...<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">3) Pokretanje WordPress runtimea za grading<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cd runtime\nnpm install\nnpm start\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">cd<\/span><span style=\"color:#9ECBFF\"> runtime<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">npm<\/span><span style=\"color:#9ECBFF\"> install<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">npm<\/span><span style=\"color:#9ECBFF\"> start<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">4) Pokretanje benchmarka<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cd ..\nwp-bench run --config wp-bench.example.yaml\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">cd<\/span><span style=\"color:#9ECBFF\"> ..<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.example.yaml<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Rezultati se upisuju u <code>output\/results.json<\/code>, a per-test logovi idu u <code>output\/results.jsonl<\/code>. JSONL format je prakti\u010dan ako kasnije \u017eeli\u0161 filtrirati i analizirati rezultate po testovima.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Usporedba vi\u0161e modela u jednom runu<\/h2>\n\n\n\n<p>Ako radi\u0161 evaluaciju za tim ili \u017eeli\u0161 standardizirano usporediti modele koje ve\u0107 koristi\u0161, WP-Bench podr\u017eava multi-model konfiguraciju u jednom pokretanju.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>models:\n  - name: gpt-4o\n  - name: gpt-4o-mini\n  - name: claude-sonnet-4-20250514\n  - name: claude-opus-4-5-20251101\n  - name: gemini\/gemini-2.5-pro\n  - name: gemini\/gemini-2.5-flash\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">models<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o-mini<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">claude-sonnet-4-20250514<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">claude-opus-4-5-20251101<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gemini\/gemini-2.5-pro<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gemini\/gemini-2.5-flash<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Nazivi modela prate LiteLLM konvencije (prema dokumentaciji na <a href=\"https:\/\/docs.litellm.ai\/docs\/providers\">LiteLLM<\/a>), \u0161to olak\u0161ava rad s razli\u010ditim providerima bez ru\u010dnog \u201cspajanja\u201d svakog SDK-a.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Konfiguracija koja se stvarno koristi (dataset, grader, concurrency)<\/h2>\n\n\n\n<p>Osnovna konfiguracija se radi preko YAML-a. Tipi\u010dno kopira\u0161 <code>wp-bench.example.yaml<\/code> i prilagodi\u0161 dataset, modele i na\u010din ocjenjivanja.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>dataset:\n  source: local              # 'local' or 'huggingface'\n  name: wp-core-v1           # suite name\n\nmodels:\n  - name: gpt-4o\n\ngrader:\n  kind: docker\n  wp_env_dir: .\/runtime      # path to wp-env project\n\nrun:\n  suite: wp-core-v1\n  limit: 10                  # limit tests (null = all)\n  concurrency: 4\n\noutput:\n  path: output\/results.json\n  jsonl_path: output\/results.jsonl\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">dataset<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  source<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">local<\/span><span style=\"color:#6A737D\">              # 'local' or 'huggingface'<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">wp-core-v1<\/span><span style=\"color:#6A737D\">           # suite name<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">models<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">grader<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  kind<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">docker<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  wp_env_dir<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">.\/runtime<\/span><span style=\"color:#6A737D\">      # path to wp-env project<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">run<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  suite<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">wp-core-v1<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  limit<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#79B8FF\">10<\/span><span style=\"color:#6A737D\">                  # limit tests (null = all)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  concurrency<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#79B8FF\">4<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">output<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  path<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">output\/results.json<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  jsonl_path<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">output\/results.jsonl<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Dva prakti\u010dna parametra su <code>limit<\/code> (koristan za brze probe) i <code>concurrency<\/code> (koliko paralelno ide). Imaj na umu da paralelizacija zna\u010di i vi\u0161e istovremenih API poziva prema providerima, pa to mo\u017ee utjecati na tro\u0161ak i rate limit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Korisne CLI opcije<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>wp-bench run --config wp-bench.yaml          # run s config datotekom\nwp-bench run --model-name gpt-4o --limit 5   # brzi single-model test\nwp-bench dry-run --config wp-bench.yaml      # validacija configa bez pozivanja modela\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.yaml<\/span><span style=\"color:#6A737D\">          # run s config datotekom<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --model-name<\/span><span style=\"color:#9ECBFF\"> gpt-4o<\/span><span style=\"color:#79B8FF\"> --limit<\/span><span style=\"color:#79B8FF\"> 5<\/span><span style=\"color:#6A737D\">   # brzi single-model test<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> dry-run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.yaml<\/span><span style=\"color:#6A737D\">      # validacija configa bez pozivanja modela<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Struktura repozitorija: gdje su datasetovi i \u0161to se izvr\u0161ava<\/h2>\n\n\n\n<p>Repozitorij je poslo\u017een tako da jasno razdvaja \u201corchestrator\u201d (Python) i WordPress grading runtime.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>.\n\u251c\u2500\u2500 python\/          # Benchmark harness (pip installable)\n\u251c\u2500\u2500 runtime\/         # WordPress grader plugin + wp-env config\n\u251c\u2500\u2500 datasets\/        # Test suites (local JSON + Hugging Face builder)\n\u251c\u2500\u2500 notebooks\/       # Results visualization and reporting\n\u2514\u2500\u2500 output\/          # Benchmark results (gitignored)\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">.<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> python\/<\/span><span style=\"color:#6A737D\">          # Benchmark harness (pip installable)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> runtime\/<\/span><span style=\"color:#6A737D\">         # WordPress grader plugin + wp-env config<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> datasets\/<\/span><span style=\"color:#6A737D\">        # Test suites (local JSON + Hugging Face builder)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> notebooks\/<\/span><span style=\"color:#6A737D\">       # Results visualization and reporting<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u2514\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> output\/<\/span><span style=\"color:#6A737D\">          # Benchmark results (gitignored)<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Test suiteovi: \u0161to se zapravo testira<\/h2>\n\n\n\n<p>Test suiteovi su u <code>datasets\/suites\/&lt;suite-name&gt;\/<\/code> i tipi\u010dno imaju dvije mape:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><code>execution\/<\/code> \u2014 zadaci generiranja koda s asercijama (obi\u010dno jedan JSON po kategoriji).<\/li>\n\n\n<li><code>knowledge\/<\/code> \u2014 multiple-choice pitanja (tako\u0111er jedan JSON po kategoriji).<\/li>\n\n<\/ul>\n\n\n\n<p>Default suite <code>wp-core-v1<\/code> pokriva core API-je, hookove, rad s bazom, te sigurnosne obrasce. Ako \u017eeli\u0161 dataset povla\u010diti izvana, podr\u017ean je i Hugging Face kao izvor.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>dataset:\n  source: huggingface\n  name: WordPress\/wp-bench-v1\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">dataset<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  source<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">huggingface<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">WordPress\/wp-bench-v1<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Trenutno stanje i ograni\u010denja (realno, ovo je \u201cearly release\u201d)<\/h2>\n\n\n\n<p>U najavi je jasno nagla\u0161eno da je WP-Bench rani release i da ima nekoliko o\u010dekivanih ograni\u010denja:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Veli\u010dina dataseta<\/strong>: trenuta\u010dni set testova je relativno mali; treba vi\u0161e slu\u010dajeva preko \u0161ireg spektra WordPress API-ja i obrazaca.<\/li>\n\n\n<li><strong>Pokrivenost verzija<\/strong>: fokus naginje prema novijim zna\u010dajkama (spominje se WordPress 6.9 i API-ji poput Abilities i Interactivity), \u0161to je djelomi\u010dno namjerno, ali uvodi bias jer ti dijelovi \u010desto nisu bili u trening podacima modela.<\/li>\n\n\n<li><strong>Saturacija benchmarka<\/strong>: modeli su u ranim testovima postizali visoke rezultate na starijim konceptima, pa takva pitanja manje razlikuju modele. Izazov je slo\u017eiti zadatke koji su stvarno te\u0161ki, a ne samo \u201cnovi\u201d.<\/li>\n\n<\/ul>\n\n\n\n<div class=\"wp-block-group callout callout-warning is-style-warning is-layout-flow wp-block-group-is-layout-flow\" style=\"border-width:1px;border-radius:8px;padding-top:1rem;padding-right:1.5rem;padding-bottom:1rem;padding-left:1.5rem\">\n\n<h4 class=\"wp-block-heading callout-title\">Kako \u010ditati rezultate<\/h4>\n\n\n<p>Ako model pada na novijim API-jima, to nije nu\u017eno \u201clo\u0161 model\u201d nego signal o svje\u017eini znanja i o tome kako se snalazi kad nema memorirani primjer. Za realne projekte to je \u010desto klju\u010dna razlika.<\/p>\n\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">\u0160to WP-Bench potencijalno mijenja za WordPress developere<\/h2>\n\n\n\n<p>Ako WP-Bench postane standard (\u0161to je jedan od deklariranih ciljeva), AI provideri bi mogli imati WordPress performanse kao dio internog pre-release testiranja, a ne kao sporednu metriku. Za nas koji radimo u WordPress ekosustavu to zna\u010di dvije stvari: bolji modeli za specifi\u010dne WP zadatke i manje vremena potro\u0161enog na provjeravanje generiranog koda koji \u201cizgleda ok\u201d, ali ne prolazi u realnom runtimeu.<\/p>\n\n\n\n<p>Uz to, planirani open-source leaderboard bi mogao postati prakti\u010dan alat kad treba odlu\u010diti koji model koristiti za odre\u0111enu vrstu posla: npr. refactor legacy plugina vs. pisanje modernog interaktivnog bloka.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Resursi<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li>WP-Bench GitHub: https:\/\/github.com\/WordPress\/wp-bench<\/li>\n\n\n<li>AI Building Blocks for WordPress: https:\/\/make.wordpress.org\/ai\/2025\/07\/17\/ai-building-blocks\/<\/li>\n\n\n<li>#core-ai Slack channel: https:\/\/wordpress.slack.com\/archives\/C08TJ8BPULS<\/li>\n\n<\/ul>\n\n\n<div class=\"references-section\">\n                <h2>Reference \/ Izvori<\/h2>\n                <ul class=\"references-list\"><li><a href=\"https:\/\/make.wordpress.org\/ai\/2026\/01\/14\/introducing-wp-bench-a-wordpress-ai-benchmark\/\" target=\"_blank\" rel=\"noopener noreferrer\">Introducing WP-Bench: A WordPress AI Benchmark<\/a><\/li><li><a href=\"https:\/\/github.com\/WordPress\/wp-bench\" target=\"_blank\" rel=\"noopener noreferrer\">WP-Bench GitHub README<\/a><\/li><li><a href=\"https:\/\/make.wordpress.org\/ai\/2025\/07\/17\/ai-building-blocks\/\" target=\"_blank\" rel=\"noopener noreferrer\">AI Building Blocks for WordPress<\/a><\/li><li><a href=\"https:\/\/wordpress.slack.com\/archives\/C08TJ8BPULS\" target=\"_blank\" rel=\"noopener noreferrer\">#core-ai Slack channel<\/a><\/li><li><a href=\"https:\/\/docs.litellm.ai\/docs\/providers\" target=\"_blank\" rel=\"noopener noreferrer\">LiteLLM providers documentation<\/a><\/li><\/ul>\n            <\/div>","protected":false},"excerpt":{"rendered":"<p>Ako koristi\u0161 AI asistente za izradu plugina ili odr\u017eavanje custom rje\u0161enja, treba ti konkretan signal: koji model stvarno razumije WordPress API-je, hookove i sigurnosne obrasce. WP-Bench je poku\u0161aj WordPress zajednice da to izmjeri na standardiziran, ponovljiv na\u010din.<\/p>\n","protected":false},"author":43,"featured_media":151,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[31],"tags":[26,75,33,10,8],"class_list":["post-152","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-wordpress-ekosustav","tag-ai","tag-benchmark","tag-sigurnost","tag-wordpress","tag-wp-cli"],"_links":{"self":[{"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/posts\/152","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/users\/43"}],"replies":[{"embeddable":true,"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/comments?post=152"}],"version-history":[{"count":0,"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/posts\/152\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/media\/151"}],"wp:attachment":[{"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/media?parent=152"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/categories?post=152"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/helloblog.io\/hr\/wp-json\/wp\/v2\/tags?post=152"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}