Abstracts

07.11.2024
12:05 - 12:50

Track
Test & AI

Taras Holoyad
Federal Network Agency

Testing CV capabilities in light of the EU AI Act

As with conventional software, test criteria for AI systems can depend heavily on the capabilities to be tested. The AI Act, however, expects closer scrutiny of capabilities with regard to market access and regulation. Because a clear description of "capabilities" is currently lacking in the context of the EU AI Act, this conference contribution aims to provide clarification. According to the current state of science, AI-specific capabilities span a wide range within the supergroups of perception, processing, action and communication. Formulating individual test criteria for specific capabilities is therefore essential for comprehensive testing against quality measures such as correctness.
The contribution proposes a test description for machine learning (ML) capabilities that is aligned with the requirements of the European Artificial Intelligence (AI) Act. The approach uses a structured test description to test ML system capabilities for Correctness, Robustness, Avoidance of unwanted bias and Security against adversarial attacks. To demonstrate its practical suitability, the test description was applied to the computer vision model DEtection TRansformer (DETR) for vehicle classification in road traffic.
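As a rough illustration of what a structured test description could look like in code, the following minimal Python sketch groups test criteria by the four quality measures named above. It is not the scheme presented in the talk: the class names (QualityMeasure, Criterion, CapabilityTestDescription), the metric names, the thresholds and the observed values are all hypothetical and chosen only to show the structure.

```python
from dataclasses import dataclass, field
from enum import Enum


class QualityMeasure(Enum):
    """The four quality measures named in the abstract."""
    CORRECTNESS = "correctness"
    ROBUSTNESS = "robustness"
    UNWANTED_BIAS = "avoidance of unwanted bias"
    ADVERSARIAL_SECURITY = "security against adversarial attacks"


@dataclass
class Criterion:
    """One test criterion: a metric with a pass/fail threshold (hypothetical structure)."""
    measure: QualityMeasure
    metric: str
    threshold: float
    higher_is_better: bool = True

    def passed(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold


@dataclass
class CapabilityTestDescription:
    """Test description for one capability of one system under test."""
    capability: str
    system_under_test: str
    criteria: list = field(default_factory=list)

    def evaluate(self, observations: dict) -> dict:
        """Map each metric to True/False, or None if it was not measured."""
        report = {}
        for criterion in self.criteria:
            if criterion.metric in observations:
                report[criterion.metric] = criterion.passed(observations[criterion.metric])
            else:
                report[criterion.metric] = None
        return report


# Illustrative thresholds only; they are assumptions, not values from the talk.
description = CapabilityTestDescription(
    capability="vehicle classification in road traffic",
    system_under_test="DETR",
    criteria=[
        Criterion(QualityMeasure.CORRECTNESS, "accuracy_clean", 0.90),
        Criterion(QualityMeasure.ROBUSTNESS, "accuracy_drop_under_noise", 0.05,
                  higher_is_better=False),
        Criterion(QualityMeasure.UNWANTED_BIAS, "accuracy_gap_between_classes", 0.10,
                  higher_is_better=False),
        Criterion(QualityMeasure.ADVERSARIAL_SECURITY, "accuracy_under_attack", 0.50),
    ],
)

# Made-up observations, used only to exercise the structure.
observations = {"accuracy_clean": 0.93, "accuracy_drop_under_noise": 0.08}
report = description.evaluate(observations)
print(report)
```

Keeping the capability and the criteria in one record reflects the point that the criteria have to be adapted to the capability under test, while the surrounding structure can stay the same across systems.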

Lessons Learned:
For testing AI systems, the central lesson learned is that even a standardised test description requires the test criteria to be adapted to the capability being tested.
For meeting AI Act-related expectations, capability testing supports the fulfilment of requirements, e.g. with regard to:

Transparency and Traceability: Insights into how AI systems perceive and process information, aiding in transparent development and deployment;
Risk Management: Identifying weaknesses and strengths of AI systems under various conditions helps in managing risks associated with potential failures or biases;
Human Oversight and Interpretability: Highlighting performance limitations and strengths supports human decision-making on system deployment and use;
Prevention of Misuse: Understanding AI system performance in specific scenarios aids in preventing potential misuse or unintended consequences;
Compliant Cooperation: Effective communication of AI capabilities and limitations to downstream users, fostering compliant cooperation.

Applicability to Other Project Situations:
The suggested test description is applicable to a broad range of AI projects, especially those subject to regulatory scrutiny or generally requiring robust quality assurance. The approach can be adapted and applied under specific conditions:

Further ML Systems: The scheme is suitable for testing various types of ML systems beyond CV, as long as they involve inference tasks;
Regulatory Compliance: The scheme is particularly useful for operators and market surveillance authorities that need to understand a system's deficiencies.

However, the applicability may be limited under certain conditions:

Resource Constraints: Comprehensive testing may require significant computational resources and expertise.
Scope of Testing: The scheme's effectiveness depends on the AI system's transparency as well as the specificity of defined quality criteria and test scenarios.
System Complexity: Highly complex or novel AI functionalities beyond the current state of the art may be difficult to test, which limits the applicability of a standardised test description.

Benefit for my employer (Federal Network Agency):
In the context of regulating artificial intelligence, a standardised test description makes the evaluation of AI systems during market surveillance easier to comprehend, especially when a product's withdrawal from the market has to be justified.

Taras Holoyad, Federal Network Agency

After graduating from the Technical University of Braunschweig (M.Sc. in Electrical Engineering), I gained experience in the automotive industry (calculation of electrical machines) and then switched to the standardisation of AI.
At present, my focus is on the standardisation of Artificial Intelligence at the Federal Network Agency (Bundesnetzagentur), both at the national level (DIN/DKE) and worldwide (ETSI, ISO/IEC and ITU).
My most impactful projects are:

Project leader of ISO/IEC 42102 "Taxonomy of AI system methods and capabilities";
Evaluation of ML-based traffic classification at the city planning authority of Wiesbaden;
Co-author of the scheme "AI=MC2" and of the corresponding book:
Schmid, T., et al. (2023). Künstliche Intelligenz managen und verstehen: Der Praxis-Wegweiser für Entscheidungsträger, Entwickler und Regulierer. Deutschland: DIN Media GmbH;
Development of the online AI glossary "AI-Glossary.org".