Yahya M. Mirza

Passionate QA Engineer specializing in manual, automated, and AI testing. Experienced with Playwright, Cypress, Selenium, Jest, CI/CD pipelines, and LLM evaluation frameworks for ensuring software quality, reliability, and response accuracy.

What I do

Expertise

Manual Testing

Designing and executing detailed test cases and suites to maximize functional coverage.

Automation Testing

Building end-to-end automation scripts with Playwright, Cypress, and Selenium to improve efficiency and accuracy.

API Testing

Testing REST and GraphQL APIs using Postman, Newman, and JMeter to validate functionality, performance, and reliability.

DB Testing

Writing SQL queries to verify data integrity, validate schemas, and confirm back-end consistency across environments.
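
As a rough illustration of a data-integrity check, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the helper names are hypothetical, chosen only for the example; a real suite would run against the project's own schema and database engine.

```python
import sqlite3


def find_orphaned_orders(conn: sqlite3.Connection) -> list[int]:
    """Return ids of orders whose customer_id matches no customer row."""
    query = """
        SELECT o.id
        FROM orders AS o
        LEFT JOIN customers AS c ON c.id = o.customer_id
        WHERE c.id IS NULL
        ORDER BY o.id
    """
    return [row[0] for row in conn.execute(query)]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one deliberately broken reference."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
        INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
        -- Order 12 points at customer 99, which does not exist.
        INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 2), (12, 99);
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    orphans = find_orphaned_orders(conn)
    print("Orphaned order ids:", orphans)
```

The `LEFT JOIN ... WHERE c.id IS NULL` pattern keeps the check in a single query, so the same SQL can be reused across environments and the results compared.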

Performance Testing

Conducting load and stress testing with JMeter to detect bottlenecks and optimize system scalability.

LLM Evaluation

Building AI testing frameworks with Promptfoo, DeepEval, FAISS, and LLM-as-Judge techniques to benchmark chatbot quality and uncover failure patterns before deployment.

QA & Test Automation Projects

End-to-end testing, automation frameworks, performance analysis, and CI/CD pipelines.

CI/CD Kubernetes Test Pipeline

End-to-end CI/CD pipeline where GitHub Actions builds Dockerized Playwright tests, Jenkins orchestrates execution, and Kubernetes Jobs run automated tests inside containerized pods.

LLM Automation Testing

Developed a lightweight framework to audit LLM behavior using Groq LLaMA3: validating keywords, measuring latency, grading responses, detecting hallucinations, checking safety, and testing paraphrase robustness.
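
Two of the checks listed above, keyword validation and latency measurement, can be sketched roughly as follows. This is not the framework's actual code: `AuditResult`, `audit_prompt`, and `fake_model` are hypothetical names, and the model is a local stub standing in for a real Groq call so that the example runs without network access.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class AuditResult:
    prompt: str
    response: str
    latency_s: float
    missing_keywords: list[str]

    @property
    def passed(self) -> bool:
        # The check passes only when every expected keyword was found.
        return not self.missing_keywords


def audit_prompt(
    model: Callable[[str], str],
    prompt: str,
    expected_keywords: Sequence[str],
) -> AuditResult:
    """Call the model once, time the call, and look for expected keywords."""
    start = time.perf_counter()
    response = model(prompt)
    latency_s = time.perf_counter() - start

    # Case-insensitive substring match; a real audit might use stricter rules.
    lowered = response.lower()
    missing = [kw for kw in expected_keywords if kw.lower() not in lowered]
    return AuditResult(prompt, response, latency_s, missing)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM client call."""
    return "Paris is the capital of France."


if __name__ == "__main__":
    result = audit_prompt(
        fake_model, "What is the capital of France?", ["Paris", "France"]
    )
    print("passed:", result.passed)
    print("missing keywords:", result.missing_keywords)
```

Passing the model in as a plain callable keeps the audit logic independent of any one provider, so the stub can be swapped for a real client without touching the checks.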

AI-Driven Self-Healing Automation

Implemented AI-based locator healing using Playwright and Healenium, enabling tests to auto-recover from DOM changes and reduce maintenance time.

Dockerized Playwright Testing

Containerized Playwright tests using Docker Compose to automate E2E testing in isolated environments with consistent CI execution.

AI LLM Testing Suite

Built a hybrid LLM evaluation framework using Promptfoo, DeepEval, RAG (FAISS), and Groq to measure answer relevancy, faithfulness, and toxicity while identifying prompt weaknesses through real-world test datasets.

Feel free to reach out.

I'll try to answer as soon as possible.