AgentUnit is a pytest-inspired evaluation harness for autonomous agents and retrieval-augmented generation (RAG) workflows. It helps you describe repeatable scenarios, connect them to your agent stack, and score results with both heuristic and LLM-backed metrics.

framework pypi-package pytest python
40 open issues need help. Last updated: Jan 23, 2026


AI Summary: Mypy is reporting missing type annotations for the `cases` list in both `LlamaDatasetGenerator` and `OpenAIDatasetGenerator` within `src/agentunit/generators/llm_generator.py`. The task is to add the correct type hints to these lists to resolve the errors.

Complexity: 1/5
good first issue
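
The kind of fix this summary describes can be sketched as follows. This is a minimal illustration, not the project's actual code: `ExampleCase` and `parse_cases` are hypothetical stand-ins, and the real element type of `cases` in `llm_generator.py` may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleCase:
    """Hypothetical stand-in for whatever the generators actually collect."""

    query: str
    expected_output: Optional[str] = None


def parse_cases(raw_items: list[dict[str, str]]) -> list[ExampleCase]:
    # Before the fix, a bare `cases = []` gives mypy nothing to infer from,
    # so it reports: Need type annotation for "cases".
    cases: list[ExampleCase] = []
    for item in raw_items:
        cases.append(
            ExampleCase(
                query=item["query"],
                expected_output=item.get("expected_output"),
            )
        )
    return cases
```

Annotating the empty list at the point of assignment is usually enough; the element type should match whatever the generator's return annotation already declares.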

enhancement good first issue

enhancement help wanted

help wanted good first issue tests

AI Summary: The `SuiteResult` class currently supports JSON, Markdown, JUnit, and HTML exports and needs a new `to_csv(self, path)` method. The method should export test results to a CSV file, flattening nested metrics into separate columns so the results are easier to analyze in a spreadsheet.

Complexity: 3/5
enhancement good first issue
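
One way to approach the flattening step is sketched below. It assumes each result can be represented as a plain dictionary with a nested `metrics` mapping; that shape, the dotted column names, and the helper names `flatten_record` and `write_results_csv` are illustrative assumptions, not the actual `SuiteResult` internals.

```python
from __future__ import annotations

import csv
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested dicts into one level, joining keys with dots."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, column))
        else:
            flat[column] = value
    return flat


def write_results_csv(rows: list[dict[str, Any]], path: str) -> None:
    """Write flattened rows to `path`, one column per distinct flattened key."""
    flat_rows = [flatten_record(row) for row in rows]

    # Collect column names in first-seen order so the header is stable.
    fieldnames: list[str] = []
    for row in flat_rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)

    with open(path, "w", newline="", encoding="utf-8") as handle:
        # restval fills cells for metrics that a given row does not have.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(flat_rows)
```

A `to_csv(self, path)` method could delegate to a helper like this after converting its results to dictionaries. Taking the union of keys across all rows keeps the header consistent when scenarios report different metrics.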

help wanted refactor architecture

enhancement good first issue data-handling

enhancement help wanted cli

enhancement help wanted frontend

enhancement help wanted

enhancement good first issue

AI Summary: This issue proposes refactoring all usages of the deprecated `datetime.utcnow()` to its timezone-aware equivalent, `datetime.now(timezone.utc)`. The goal is to eliminate deprecation warnings and ensure accurate timestamps, while maintaining existing string formatting and adding a small unit test to verify the change.

Complexity: 1/5
help wanted good first issue chore
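
The replacement this summary describes can be sketched as follows. The helper name `utc_timestamp` and the format string are assumptions for illustration; the project's real call sites and formats may differ.

```python
from datetime import datetime, timedelta, timezone


def utc_timestamp(fmt: str = "%Y-%m-%dT%H:%M:%SZ") -> str:
    # Before: datetime.utcnow().strftime(fmt)
    # utcnow() returns a naive datetime and is deprecated since Python 3.12.
    # After: an aware datetime in UTC; the strftime output is unchanged
    # as long as the format string contains no %z or %Z directive.
    return datetime.now(timezone.utc).strftime(fmt)


def test_timestamp_is_timezone_aware_and_format_is_preserved() -> None:
    now = datetime.now(timezone.utc)
    # The new call carries explicit UTC tzinfo, unlike utcnow().
    assert now.tzinfo is timezone.utc
    assert now.utcoffset() == timedelta(0)

    # The existing string format still parses round-trip.
    stamp = utc_timestamp()
    datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
```

One thing to watch for during the refactor: code that compares or subtracts these values against other naive datetimes will raise `TypeError`, so those call sites need the same change.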

enhancement help wanted

Fix typos in docstrings about 1 month ago
documentation help wanted good first issue

help wanted good first issue tests

enhancement help wanted performance

enhancement help wanted good first issue

help wanted good first issue tests

enhancement help wanted

help wanted tests

documentation help wanted good first issue

enhancement help wanted

enhancement help wanted

enhancement help wanted

enhancement help wanted

enhancement help wanted

enhancement help wanted

enhancement help wanted

enhancement help wanted

enhancement help wanted

enhancement help wanted

enhancement help wanted

documentation help wanted good first issue

enhancement help wanted good first issue

enhancement help wanted good first issue

help wanted good first issue tests

documentation help wanted good first issue

help wanted good first issue chore

documentation help wanted good first issue
