{"id":152,"date":"2026-01-20T00:00:00","date_gmt":"2026-01-19T23:00:00","guid":{"rendered":"https:\/\/helloblog.io\/el\/wp-bench-episimo-benchmark-wordpress-gia-llms\/"},"modified":"2026-01-20T00:00:00","modified_gmt":"2026-01-19T23:00:00","slug":"wp-bench-episimo-benchmark-wordpress-gia-llms","status":"publish","type":"post","link":"https:\/\/helloblog.io\/el\/wp-bench-episimo-benchmark-wordpress-gia-llms\/","title":{"rendered":"WP-Bench: the official WordPress benchmark for actually measuring LLMs"},"content":{"rendered":"\n<p>Language models (LLMs) have become part of our daily work: from code completion to entire snippets for plugins. 
The problem is that most benchmarks evaluate them on general programming tasks, not on the specifics of WordPress: hooks, coding standards, nonces, capabilities, WP_Query, database access patterns, plugin architecture.<\/p>\n\n\n\n<p>That is why the WordPress project introduced <strong><a href=\"https:\/\/github.com\/WordPress\/wp-bench\">WP-Bench<\/a><\/strong>, an official WordPress AI benchmark that aims to measure more meaningfully how well a model <em>understands<\/em> WordPress development and how well it can <em>produce code that actually runs<\/em> correctly inside a WordPress runtime.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What exactly 
is WP-Bench (and what does it measure)<\/h2>\n\n\n\n<p>WP-Bench evaluates models along two axes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Knowledge<\/strong>: multiple-choice questions on WordPress concepts, APIs, hooks, security patterns, and coding standards. There is also an emphasis on more recent additions such as the <strong>Abilities API<\/strong> and the <strong>Interactivity API<\/strong> (that is, APIs that appeared recently and are often missing from the training data of many models).<\/li>\n\n\n<li><strong>Execution<\/strong>: code-generation tasks that are graded by a real WordPress runtime. 
It is not enough for the model to write good-looking PHP; the code goes through static analysis and is then executed in a sandbox with assertions.<\/li>\n\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Why a WordPress-specific benchmark matters<\/h2>\n\n\n\n<p>In the WordPress ecosystem, \u201ccorrect\u201d does not just mean that the code works. It means that it works <em>in a standards-compliant<\/em> and <em>secure<\/em> way.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Better tool choices<\/strong>: if you build AI-powered features (e.g. 
inside a plugin) or simply use coding assistants, it is worth knowing which models perform best on WordPress-specific work.<\/li>\n\n\n<li><strong>Pressure on providers<\/strong>: the goal is for WP-Bench to become a standard evaluation that labs\/providers run before releases. 
That way, performance on WordPress would stop being an afterthought and become a measurable KPI.<\/li>\n\n\n<li><strong>Transparency through a leaderboard<\/strong>: the team is working toward an open source, public leaderboard that makes results visible and allows practical per-model comparison.<\/li>\n\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">How grading works: WordPress as the \u201cgrader\u201d<\/h2>\n\n\n\n<p>The most interesting aspect of WP-Bench is that it does not grade theoretically or with heuristics. 
It uses WordPress itself to judge the execution.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The evaluation flow (high level)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n\n<li>The harness sends a prompt to the model asking for WordPress code for a specific task.<\/li>\n\n\n<li>The generated code is passed to the WordPress runtime via <strong>WP-CLI<\/strong> (the command-line interface for WordPress tasks).<\/li>\n\n\n<li>The runtime performs static checks (syntax, coding standards, security).<\/li>\n\n\n<li>The code is executed in a sandbox environment with test assertions.<\/li>\n\n\n<li>The results are returned as JSON with a score and detailed feedback.<\/li>\n\n<\/ol>\n\n\n\n<div class=\"wp-block-group callout callout-info is-style-info is-layout-flow wp-block-group-is-layout-flow\" 
style=\"border-width:1px;border-radius:8px;padding-top:1rem;padding-right:1.5rem;padding-bottom:1rem;padding-left:1.5rem\">\n\n<h4 class=\"wp-block-heading callout-title\">Why this grading model is valuable<\/h4>\n\n\n<p>In practice, many AI-generated snippets that look correct fail on the details: the wrong hook, wrong escaping, an incorrect nonce check, use of an API that does not exist, or code that does not pass the coding standards. 
WP-Bench brings all of these into the measurement, because it runs in a real WordPress environment.<\/p>\n\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Quick start: run WP-Bench locally<\/h2>\n\n\n\n<p>If you want to quickly check how the models you use stack up, the repo offers a fairly straightforward process: a Python harness plus a WordPress runtime (the grader).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Install the harness (Python)<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" 
fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>python3 -m venv .venv &amp;&amp; source .venv\/bin\/activate\npip install -e .\/python\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#B392F0\">python3<\/span><span style=\"color:#79B8FF\"> -m<\/span><span style=\"color:#9ECBFF\"> venv<\/span><span style=\"color:#9ECBFF\"> .venv<\/span><span style=\"color:#E1E4E8\"> &#x26;&#x26; <\/span><span style=\"color:#79B8FF\">source<\/span><span style=\"color:#9ECBFF\"> .venv\/bin\/activate<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">pip<\/span><span style=\"color:#9ECBFF\"> install<\/span><span style=\"color:#79B8FF\"> -e<\/span><span style=\"color:#9ECBFF\"> 
.\/python<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2) Configure API keys (providers)<\/h3>\n\n\n\n<p>Create a <code>.env<\/code> file with the keys for the providers you will use:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>OPENAI_API_KEY=sk-...\nANTHROPIC_API_KEY=sk-ant-...\nGOOGLE_API_KEY=...\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" 
stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#E1E4E8\">OPENAI_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">sk-...<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">ANTHROPIC_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">sk-ant-...<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">GOOGLE_API_KEY<\/span><span style=\"color:#F97583\">=<\/span><span style=\"color:#9ECBFF\">...<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">3) Start the WordPress runtime (grader)<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" 
stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cd runtime\nnpm install\nnpm start\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">cd<\/span><span style=\"color:#9ECBFF\"> runtime<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">npm<\/span><span style=\"color:#9ECBFF\"> install<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">npm<\/span><span style=\"color:#9ECBFF\"> start<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">4) Run the benchmark<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" 
style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cd ..\nwp-bench run --config wp-bench.example.yaml\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">cd<\/span><span style=\"color:#9ECBFF\"> 
..<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.example.yaml<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<p>The results are written to <code>output\/results.json<\/code>, while the per-test logs are in <code>output\/results.jsonl<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Comparing multiple models in one run<\/h2>\n\n\n\n<p>You can run several models sequentially in the same run (to get a comparison table) by declaring them in the config. 
Model names follow the LiteLLM conventions.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>models:\n  - name: gpt-4o\n  - name: gpt-4o-mini\n  - name: claude-sonnet-4-20250514\n  - name: claude-opus-4-5-20251101\n  - name: gemini\/gemini-2.5-pro\n  - name: gemini\/gemini-2.5-flash\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 
012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">models<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o-mini<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">claude-sonnet-4-20250514<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">claude-opus-4-5-20251101<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gemini\/gemini-2.5-pro<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gemini\/gemini-2.5-flash<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Documentation for the provider\/model naming conventions: <a 
href=\"https:\/\/docs.litellm.ai\/docs\/providers\">LiteLLM conventions<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A sample configuration worth keeping as a baseline<\/h2>\n\n\n\n<p>The <code>wp-bench.example.yaml<\/code> file is a good starting point. Broadly, you configure the dataset source, models, grader, and run parameters (limit\/concurrency), as well as the output paths.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre 
class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>dataset:\n  source: local              # 'local' or 'huggingface'\n  name: wp-core-v1           # suite name\n\nmodels:\n  - name: gpt-4o\n\ngrader:\n  kind: docker\n  wp_env_dir: .\/runtime      # path to wp-env project\n\nrun:\n  suite: wp-core-v1\n  limit: 10                  # limit tests (null = all)\n  concurrency: 4\n\noutput:\n  path: output\/results.json\n  jsonl_path: output\/results.jsonl\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">dataset<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  source<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">local<\/span><span style=\"color:#6A737D\">              # 'local' or 'huggingface'<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">wp-core-v1<\/span><span style=\"color:#6A737D\">           # suite name<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">models<\/span><span 
style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#E1E4E8\">  - <\/span><span style=\"color:#85E89D\">name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">gpt-4o<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">grader<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  kind<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">docker<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  wp_env_dir<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">.\/runtime<\/span><span style=\"color:#6A737D\">      # path to wp-env project<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">run<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  suite<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">wp-core-v1<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  limit<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#79B8FF\">10<\/span><span style=\"color:#6A737D\">                  # limit tests (null = all)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  concurrency<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#79B8FF\">4<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">output<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  path<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">output\/results.json<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  jsonl_path<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">output\/results.jsonl<\/span><\/span>\n<span 
class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Useful CLI options<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>wp-bench run --config wp-bench.yaml          # run with config file\nwp-bench run --model-name gpt-4o --limit 5   # quick single-model test\nwp-bench dry-run --config wp-bench.yaml      # validate config without calling models\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 
002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.yaml<\/span><span style=\"color:#6A737D\">          # run with config file<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> run<\/span><span style=\"color:#79B8FF\"> --model-name<\/span><span style=\"color:#9ECBFF\"> gpt-4o<\/span><span style=\"color:#79B8FF\"> --limit<\/span><span style=\"color:#79B8FF\"> 5<\/span><span style=\"color:#6A737D\">   # quick single-model test<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">wp-bench<\/span><span style=\"color:#9ECBFF\"> dry-run<\/span><span style=\"color:#79B8FF\"> --config<\/span><span style=\"color:#9ECBFF\"> wp-bench.yaml<\/span><span style=\"color:#6A737D\">      # validate config without calling models<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Repository structure and where the test suites live<\/h2>\n\n\n\n<p>The repo is organized with a clean separation between the harness, the runtime (grader), and the 
datasets:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>.\n\u251c\u2500\u2500 python\/          # Benchmark harness (pip installable)\n\u251c\u2500\u2500 runtime\/         # WordPress grader plugin + wp-env config\n\u251c\u2500\u2500 datasets\/        # Test suites (local JSON + Hugging Face builder)\n\u251c\u2500\u2500 notebooks\/       # Results visualization and reporting\n\u2514\u2500\u2500 output\/          # Benchmark results (gitignored)\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 
012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#79B8FF\">.<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> python\/<\/span><span style=\"color:#6A737D\">          # Benchmark harness (pip installable)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> runtime\/<\/span><span style=\"color:#6A737D\">         # WordPress grader plugin + wp-env config<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> datasets\/<\/span><span style=\"color:#6A737D\">        # Test suites (local JSON + Hugging Face builder)<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u251c\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> notebooks\/<\/span><span style=\"color:#6A737D\">       # Results visualization and reporting<\/span><\/span>\n<span class=\"line\"><span style=\"color:#B392F0\">\u2514\u2500\u2500<\/span><span style=\"color:#9ECBFF\"> output\/<\/span><span style=\"color:#6A737D\">          # Benchmark results (gitignored)<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<p>The test suites live under <code>datasets\/suites\/&lt;suite-name&gt;\/<\/code> and are split into:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><code>execution\/<\/code>: code generation tasks with assertions 
(one JSON file per category).<\/li>\n\n\n<li><code>knowledge\/<\/code>: multiple-choice questions (one JSON file per category).<\/li>\n\n<\/ul>\n\n\n\n<p>The default suite, <strong><code>wp-core-v1<\/code><\/strong>, covers WordPress core APIs, hooks, database operations, and security patterns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Datasets from Hugging Face<\/h3>\n\n\n\n<p>Instead of a local suite, you can load a dataset from Hugging Face as follows:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#24292e\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#e1e4e8;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" 
aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>dataset:\n  source: huggingface\n  name: WordPress\/wp-bench-v1\n\n<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki github-dark\" style=\"background-color:#24292e;color:#e1e4e8\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color:#85E89D\">dataset<\/span><span style=\"color:#E1E4E8\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  source<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">huggingface<\/span><\/span>\n<span class=\"line\"><span style=\"color:#85E89D\">  name<\/span><span style=\"color:#E1E4E8\">: <\/span><span style=\"color:#9ECBFF\">WordPress\/wp-bench-v1<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Current status and known limitations<\/h2>\n\n\n\n<p>WP-Bench is an early release, and its authors state this plainly. 
A few points affect how you should read the scores:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Small dataset (for now)<\/strong>: the existing suite is not yet \u201chuge\u201d, so it needs to be expanded to give a more complete signal across more APIs and patterns.<\/li>\n\n\n<li><strong>Bias toward newer features<\/strong>: the suite leans toward functionality associated with WordPress 6.9 (e.g. Abilities API, Interactivity API). 
This is partly intentional (that is where models struggle), but it can be unfair to models that have not seen these APIs in their training data.<\/li>\n\n\n<li><strong>Benchmark saturation on older concepts<\/strong>: in early runs some models scored very high on older WordPress concepts, so those tests no longer discriminate well between models. 
The hard part is finding problems that are genuinely demanding, not merely \u201cnew\u201d.<\/li>\n\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Practical take: where it helps a WordPress dev today<\/h2>\n\n\n\n<p>If you work professionally with WordPress (plugins, custom blocks, integrations), WP-Bench can serve as a common language for comparing assistants and models in a measurable way. 
The <strong>Execution<\/strong> axis in particular is what many \u201cAI evals\u201d lack: demonstrating that the output is <em>executable, safe, and compatible<\/em> inside a real WordPress environment.<\/p>\n\n\n\n<p>In addition, if the goal of a public leaderboard moves forward, there will be a clearer picture of how models perform on the tasks that matter to us every day (hooks, nonces, escaping\/sanitization, correct use of WP APIs), without relying on impressions.<\/p>\n\n\n<div class=\"references-section\">\n                <h2>References \/ 
Sources<\/h2>\n                <ul class=\"references-list\"><li><a href=\"https:\/\/make.wordpress.org\/ai\/2026\/01\/14\/introducing-wp-bench-a-wordpress-ai-benchmark\/\" target=\"_blank\" rel=\"noopener noreferrer\">Introducing WP-Bench: A WordPress AI Benchmark<\/a><\/li><li><a href=\"https:\/\/github.com\/WordPress\/wp-bench\" target=\"_blank\" rel=\"noopener noreferrer\">WP-Bench GitHub README<\/a><\/li><li><a href=\"https:\/\/make.wordpress.org\/ai\/2025\/07\/17\/ai-building-blocks\/\" target=\"_blank\" rel=\"noopener noreferrer\">AI Building Blocks for WordPress<\/a><\/li><li><a href=\"https:\/\/wordpress.slack.com\/archives\/C08TJ8BPULS\" target=\"_blank\" rel=\"noopener noreferrer\">#core-ai Slack channel<\/a><\/li><\/ul>\n            <\/div>","protected":false},"excerpt":{"rendered":"<p>We all use AI assistants in WordPress, but how well do they \u201cknow\u201d WordPress in practice? WP-Bench arrives as an official benchmark to measure that in a way that is repeatable and strictly 
executable.<\/p>\n","protected":false},"author":67,"featured_media":151,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43],"tags":[27,82,10,7,35],"class_list":["post-152","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oikosystima-wordpress","tag-ai","tag-benchmark","tag-wordpress","tag-wp-cli","tag-35"],"_links":{"self":[{"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/posts\/152","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/users\/67"}],"replies":[{"embeddable":true,"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/comments?post=152"}],"version-history":[{"count":0,"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/posts\/152\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/media\/151"}],"wp:attachment":[{"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/media?parent=152"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/categories?post=152"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/helloblog.io\/el\/wp-json\/wp\/v2\/tags?post=152"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}