Data + AI systems for real workflows

Open to Data & AI Engineering roles

I build data and AI systems that survive real workflows.

I am Weijian (Tim) Zhang, a data engineer and AI-driven pipeline architect. I work across the full chain: ingestion, modeling, retrieval, evaluation, service design, and cloud deployment, with delivery experience spanning Hong Kong GovTech, insurance fraud detection, clinical gait AI, multinational retail enterprise data platforms, and commercial bank data middle-platform programs.

Featured work includes an award-winning SRR Agentic Case Processing System, GaitGPT clinical gait analysis, MCC FWA insurance fraud graph intelligence, and enterprise warehouse modernization. The resume points here; the portfolio shows the systems.

Best Fit

Data Engineer, AI Engineer, applied ML/LLM engineer, or hybrid data + AI product roles.

Strongest Areas

Agentic workflows, clinical/financial AI, retrieval pipelines, data quality automation, and FastAPI services.

What Teams Get

Someone who can connect warehouse discipline, model behavior, and production pragmatism.

About

I build systems that stay useful after the demo.

My background is anchored in data engineering, but the through-line has always been operational reliability. I started with enterprise data warehouse design and large-scale analytics, then moved into real-time monitoring, AI-assisted workflows, and cloud-native service delivery.

That combination matters because many AI projects break at the seams: weak ingestion, brittle retrieval, unclear evaluation, or deployment paths that are hard to maintain. I enjoy building across those seams so the system behaves as one coherent product rather than a chain of disconnected tools.

I hold an MSc in Artificial Intelligence and Business Analytics from Lingnan University and bring hands-on delivery experience from Hong Kong government innovation work, clinical gait AI, insurance fraud detection, multinational retail enterprise data platforms, and commercial bank data middle-platform programs.

Education

MSc in Artificial Intelligence and Business Analytics

Lingnan University, Hong Kong · 2025–2026

AIBA coursework: Foundation of AI, Business Data Management, Data Analytics & Programming, Healthcare Analytics, Data Visualization, Programming with Generative AI, AI-Based Optimization.

Bachelor of Management · Shenzhen University · 2013–2017

Domain Context

Government services, banking, and global retail

Environments where data quality, business rules, and stakeholder trust are not optional.

Working Style

Systems thinking with delivery discipline

I like architectures that are measurable, explainable, and production-ready from day one.

Publications, Patents & Awards

External recognition, IP disclosures, and research work in progress.

This section summarizes outcomes beyond delivery: competition recognition, patent disclosure materials, and a manuscript that is still in preparation.

Award: Fourth Sun Yat-sen University Lingnan Cup, Second Prize

Hong Kong Lingnan University MScAIBA team case on SRR civic-service automation; 163 teams entered and 8 advanced to the final.

Manuscript: From Symptoms to Signals

A Bidirectional LLM Framework for Literature-Grounded Clinical Gait Analysis; manuscript in preparation.

Patent 01: File-Driven Intelligent Case Processing System and Method for Public-Service Complaint Workflows

Patent disclosure in preparation for file-driven intake, retrieval, quality control, rollback, and response drafting in public-service complaint workflows.

Patent 02: Evidence-bound clinical gait translation

Invention disclosure material for symptom-to-metric translation with literature evidence, measurability status, and biomechanics validation.

Patent 03: Vision-Link Discovery

Patent disclosure in preparation for multi-model fusion that discovers cross-modal visual relationships inside file knowledge graphs.

Capabilities

The work I do best sits where data foundations and AI behavior need to cooperate.

I am most useful when the problem needs both infrastructure rigor and application-layer intelligence: reliable ingestion, clean modeling, retrieval quality, evaluation, and deployment that can survive real-world use.

01

Data foundations

  • SQL, dimensional modeling, and enterprise warehouse design
  • Python pipelines for ingestion, transformation, and validation
  • GraphDB modeling for claims fraud paths, entity links, and adjudication evidence
  • PostgreSQL, MySQL, SQL Server, StarRocks, Hive, Kafka
  • Cross-source reconciliation and downstream reporting reliability

02

LLM and retrieval systems

  • Agentic workflow design with hybrid retrieval and reranking
  • LangChain, LangGraph, OpenAI API, Gemini, and pgvector-based workflows
  • Prompt and evaluation design, including LLM-as-Judge and ablation patterns
  • LLM-assisted medical-necessity review, policy checks, and rejection-code reasoning
  • Document parsing, literature retrieval, semantic search, and response generation pipelines

03

Production delivery

  • FastAPI services, Docker packaging, Linux deployment, and CI/CD
  • Async batch workflows, result tracking, and review-facing API design
  • Google Cloud Run and Cloud SQL for managed deployment paths
  • Monitoring, alerting, and failure-aware workflow design
  • Cross-functional execution with technical and business stakeholders

Engineering

Python, SQL, FastAPI, Docker, Linux, GitHub, Airflow, DolphinScheduler

AI & Retrieval

LangChain, LangGraph, OpenAI API, Gemini, pgvector, Neo4j, PubMed, Semantic Scholar, LLM-as-Judge

Analytics & Cloud

Power BI, Tableau, GCP, StarRocks, SQL Server, PostgreSQL, Google Spanner Graph-ready design

Selected Projects

A few systems that show how I think, build, and measure impact.

These projects are arranged around the kind of work I am targeting: hybrid data and AI engineering roles where product usefulness depends on both infrastructure discipline and intelligent workflow design.

Award-Winning GovTech Case

SRR — Agentic Case Processing System

Technical Lead · 99.4% contribution · Lingnan Cup 2nd Prize · Guangzhou TV segment · Patent filing in preparation · Public reference implementation

Three-stage repo evolution

  1. Original baseline: SRR-Case-Processing-System
  2. Agentic system: SRR-Agentic-Case-Processing-System
  3. Public showcase: srr-agentic-case-processing

80,430 lines delivered across two internal iterations before the public version.

3 repository stages from original to public showcase
17 pluggable atomic capabilities
12/12 requirements delivered
TV-Featured Project Evidence

SRR was shown in a Guangzhou TV segment covering the fourth Lingnan Cup. The project is presented here as both a working AI system and a public-facing competition case, connecting technical delivery with external recognition.

SO_PRD25 product video: background -> pain points -> solution -> future roadmap
CIC AI Award preparation video: construction-sector pitch material built from the SRR workflow story

Hong Kong public-service SRR handling relied on multi-channel materials, manual routing, historical lookup, and repeated field interpretation.

Turn the process into a controllable case-processing system that could parse ICC 1823, TMO, and RCC inputs, extract A-Q fields, and draft traceable replies.

Designed a seven-layer agentic architecture with 17 atomic capabilities, pgvector + RRF retrieval, three-tier evaluation, Best-of-N repair, and rollback logic.
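
To make the "pgvector + RRF retrieval" step concrete, here is a minimal sketch of Reciprocal Rank Fusion (RRF), the standard technique for merging ranked lists from different retrievers. It is illustrative only and not taken from the SRR codebase: the case ids, the two hit lists, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: one list from a vector search (for example pgvector),
# one from a keyword search over the same case archive.
vector_hits = ["case_17", "case_04", "case_09"]
keyword_hits = ["case_04", "case_22", "case_17"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Cases that rank well in both lists rise to the top, which is why RRF is a common choice when vector and keyword scores are not directly comparable.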

Delivered 12/12 requirements and 80,430 lines across internal iterations, then converted the work into a public reference repo, TV-covered case, award proof, and patent-prep narrative.

Code: Public SRR reference implementation (GitHub repository)
Media: WeChat feature coverage (official article and team photos)
Competition: Lingnan Cup case page (case topic and finalist listing)
Next Stage: CIC AI Award preparation (construction industry competition track)
Live Lingnan Cup final moment explaining workflow automation, case routing, and public-service value.
Seven-layer agentic architecture with parsing, routing, retrieval, quality gates, and fallback logic.
Interaction view showing extracted fields, similar-case handling, and draft-reply workflow.
Lingnan Cup final: Second Prize for the Hong Kong Lingnan University team.
Team celebration outside Lingnan Hall after the Second Prize result, with project title and Hong Kong Lingnan University team sign visible.
Second Prize trophy close-up: physical award proof from the fourth Lingnan Cup case analysis final.
Second Prize certificate naming the SRR case: from paperless handling to automated civic-service workflow.
Award reflections highlight hands-on AI implementation, real business data, and civic-service problem solving.
Official website feature: cropped to the webpage itself, showing the SRR team in the Lingnan Cup photo gallery and recap.
Official topic listing: Hong Kong SRR civic-service processing case among finalist reports.

InsurTech Graph Intelligence

MCC FWA — Insurance Claims Fraud Graph Intelligence

Core Developer · GraphDB foundation · Claims adjudication workflow · Neo4j validation · Spanner Graph-ready design

Pipeline shape

  1. Claim ingestion: 38,659 medical claim JSON files
  2. Property graph: Claim, Event, Doctor, Hospital, Diagnosis, Receipt, BillingItem, BreakdownItem, AmountFeature
  3. Review workflow: Risk paths, medical necessity, policy checks, and case-level summaries

Graph first because FWA risk often appears in relationships, not a single field.

38,659 medical claim JSON files converted
1.61M graph nodes generated for entity-level investigation
2.57M edges connecting claim paths and evidence trails

Claim JSON -> Property Graph -> Risk Paths -> Claims Review

The system turns raw claim records into portable nodes and edges, validates paths in Neo4j, and prepares the model for future Google Spanner Graph deployment.
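
As a rough illustration of the "Claim JSON -> Property Graph" step, the sketch below flattens one claim record into node and edge rows and writes them as CSV text that a graph database could load. It is not the project's converter: the JSON field names (`claim_id`, `doctor`, `hospital`, `diagnoses`), the relation names, and the CSV headers are all assumptions made for the example.

```python
import csv
import io


def claim_to_graph(claim):
    """Flatten one claim record into node and edge rows."""
    claim_id = f"claim:{claim['claim_id']}"
    nodes = {claim_id: "Claim"}  # node id -> label; dict deduplicates entities
    edges = []                   # (source id, relation, target id)

    def link(label, key, relation):
        node_id = f"{label.lower()}:{key}"
        nodes[node_id] = label
        edges.append((claim_id, relation, node_id))

    link("Doctor", claim["doctor"], "TREATED_BY")
    link("Hospital", claim["hospital"], "BILLED_AT")
    for code in claim["diagnoses"]:
        link("Diagnosis", code, "HAS_DIAGNOSIS")
    return nodes, edges


def to_csv(rows, header):
    """Serialize rows to CSV text so the graph stays database-independent."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical claim record, not real data.
sample = {"claim_id": "C001", "doctor": "D42", "hospital": "H7",
          "diagnoses": ["J06.9", "R05"]}

nodes, edges = claim_to_graph(sample)
nodes_csv = to_csv(sorted(nodes.items()), ["id", "label"])
edges_csv = to_csv(edges, ["source", "relation", "target"])
print(nodes_csv)
print(edges_csv)
```

Keeping the output as plain node and edge tables is what lets the same extraction feed a local Neo4j check now and a different graph engine later.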

Insurance FWA review faces high claim volume, sparse confirmed fraud labels, and risk signals that often hide across patients, providers, diagnoses, receipts, and amount patterns.

Create a data foundation that lets reviewers inspect risk paths instead of reading isolated claim fields or opaque model scores.

Converted 38,659 claim JSON files into a 1.61M-node / 2.57M-edge property graph, validated paths in Neo4j, and supported FastAPI async review with medical-necessity, policy, and rejection-code reasoning.

The output shifts the review surface from approve/reject labels to explainable evidence trails: risk score, reason code, related entities, and claim-level summary that an assessor can audit.

Graph Schema: 9 node families

Claim, event, provider (doctor and hospital), diagnosis, receipt, billing, breakdown, and amount-feature entities.

Validation: Neo4j local POC

Single-claim path inspection helps explain why entities are linked in a review trail.

Deployment Path: Spanner Graph-ready

Portable CSV extraction keeps the graph foundation independent from one graph database.

Public Landing Page: MediConCen FWA project site
Redacted screenshot: Neo4j graph path preview
Redacted screenshot: Claims processing workflow UI

Clinical AI Research System

GaitGPT — Clinical Gait Analysis Agent

Architecture Lead · Manuscript in preparation · iCAN 2026 application · LU invention disclosure preparation

0.9223 overall benchmark accuracy
10.07s median response time
20-question benchmark suite

Paper: Manuscript in preparation

Target venue options under review: Sensors, AMIA / MICCAI workshop tracks, or BMC Medical Informatics.

Competition: iCAN 2026 application package

Prepared for Medicine & Health Care and IT & Data Technology categories, with a finals video asset ready.

Patent: LU disclosure preparation

Invention-disclosure materials focus on evidence-bound clinical gait translation while keeping claim-level details private.

Gait systems produce dense sensor indicators, while clinicians often reason in symptom language such as limping, shuffling gait, or asymmetry.

Build a privacy-first assistant that can answer structured gait-data questions and map clinical descriptions back to measurable gait indicators with evidence boundaries.

Combined NONSD-Gait data, rule-template-first query handling, PubMed / Semantic Scholar retrieval, reverse translation, evidence scoring, and physics-constrained validation.
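
One way to read "rule-template-first query handling" is that structured questions are answered by deterministic templates, and only unmatched questions go on to retrieval and generation. The sketch below shows that routing idea under stated assumptions: the regex templates, the metric names, the sample values, and the `retrieval_stub` fallback are hypothetical and do not come from GaitGPT.

```python
import re
import statistics

# Hypothetical templates: each maps a question pattern to an aggregate function.
TEMPLATES = [
    (re.compile(r"\b(?:mean|average) (?P<metric>[a-z ]+)"), statistics.mean),
    (re.compile(r"\b(?:max|maximum) (?P<metric>[a-z ]+)"), max),
]


def answer(question, data, fallback):
    """Try deterministic templates first; fall back only when none apply."""
    text = question.lower()
    for pattern, aggregate in TEMPLATES:
        match = pattern.search(text)
        if match:
            metric = match.group("metric").strip().replace(" ", "_")
            if metric in data:
                return {"route": "template", "value": aggregate(data[metric])}
    # No template matched a known metric: hand off to the slower path.
    return {"route": "fallback", "value": fallback(question)}


# Made-up sample values for illustration, not NONSD-Gait data.
gait_data = {"stride_length": [1.0, 1.5, 2.0], "cadence": [98, 104, 110]}


def retrieval_stub(question):
    """Stand-in for a literature-retrieval and LLM step."""
    return "needs literature retrieval"


templated = answer("What is the average stride length?", gait_data, retrieval_stub)
routed = answer("Why might a patient shuffle?", gait_data, retrieval_stub)
print(templated, routed)
```

Answering numeric questions from templates keeps those responses reproducible and fast, and reserves the literature-backed path for questions that actually need evidence.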

Produced a 20-question benchmark with 0.9223 overall accuracy and 10.07s median response time, forming the basis for manuscript, iCAN, and LU invention-disclosure preparation.

67-second product flash showing the clinical gait analysis concept, interface direction, and research-to-clinic positioning.

Enterprise Data Backbone

Multinational Retail B2B Data Platform

Data Engineer · B2B finance and operations reporting · Cloud warehouse modernization · SOX-aware delivery

Business data scope

  1. B2B transaction foundation: Orders, contracts, items, stores, customers, fulfillment, inventory, payment, margin, and approval fields
  2. Reporting and governance: Dashboard-ready ADS tables, data dictionary, SOX-aware release evidence, and production job monitoring
  3. Cloud warehouse performance: Five-layer ODS / DIM / DWD / DWS / ADS architecture with StarRocks acceleration

Enterprise value came from making finance and operations data explainable, reusable, and fast enough for daily decisions.

10TB+ warehouse workload migrated and restructured
3-10 min → 0.8s core analytical queries accelerated with StarRocks
87.5% ETL runtime reduction after warehouse optimization

B2B finance and operations reporting depended on fragmented order, contract, billing, inventory, gift-card, fulfillment, and enterprise-account sources.

Make daily decision data reusable and explainable across finance, operations, and dashboard teams while respecting SOX-style release evidence and access boundaries.

Migrated 10TB+ workloads, rebuilt ODS / DIM / DWD / DWS / ADS layers, connected CRM / OMS / B2B / gift-card / warehouse / finance sources, and tuned Hive, SparkSQL, StarRocks, Seatunnel, and scheduling paths.

Cut ETL runtime by 87.5%, reduced disk usage by 38%, improved query speed by 55%, and moved core StarRocks analytical queries from 3-10 minutes to an average of 0.8s.

Cross-border retail B2B delivery evidence: appreciation material kept as business-context proof.
Dataflow evidence across source systems, data processing, and reporting outputs.
Technical architecture for B2B reporting and financial data operations.
Enterprise big-data capability overview with sensitive brand text masked by mosaic.

Commercial Banking Data

Commercial Banking Data Warehouse

Data Engineer · Model-layer maintenance · SQL delivery · Banking stakeholder communication

Banking data discipline

  1. Business object understanding: Corporate credit, limits, loan contracts, loan disbursement, repayment, and settlement accounts
  2. Platform architecture: Channel, front-office, product-service, integration, warehouse, and management-decision layers
  3. Warehouse flow: Source systems, data bus, SDATA / SHDATA, domain layers, marts, and downstream applications

Banking value means definitions are stable, lineage is traceable, and downstream reporting can be trusted.

100+ transaction systems integrated in enterprise data work
5-layer warehouse modeling discipline carried into later delivery
50% cross-team model reuse improvement in warehouse architecture work

Commercial-bank reporting depends on strict definitions for credit, limits, contracts, disbursement, repayment, settlement accounts, and downstream product indicators.

Support financial, regulatory, and management reporting by keeping mart logic aligned with bank definitions, source lineage, and stakeholder confirmation.

Maintained model-layer SQL, mapped source systems through data-bus, SDATA / SHDATA, domain, mart, and application layers, and handled requirement clarification directly with banking stakeholders.

Built bank-grade delivery habits: clear semantics first, controlled change second, traceable issue handling, and stable reporting outputs before presentation polish.

Loan lifecycle business-object map: from application and approval to disbursement, repayment, post-loan management, and banking data foundation.
Banking channel and system architecture: channel portal, front-office integration, product service layer, data integration, and decision layers.
Banking warehouse flow: source systems, data bus, SDATA / SHDATA, business domains, marts, and downstream applications.

Additional Work

Enterprise delivery that reinforces the same pattern.

Banking & Retail Data Warehouse Delivery

Delivered data modeling, cloud migration, and warehouse optimization across banking and retail contexts, including commercial bank data middle-platform and multinational retail enterprise data platform environments.

Resume

The full experience, condensed into one document.

If you want the complete timeline, project detail, and technical background, the resume includes the full version. It is designed to point back to this portfolio, where SRR, GaitGPT, and MCC FWA show the system proof, architecture decisions, and project narrative.

Snapshot

2019–2025: Enterprise data engineering across banking, retail, and cloud data platforms.

2025–2026: SRR, GaitGPT, and MCC FWA: applied AI systems with evidence, evaluation, and deployment stories.

Core promise: Strong foundations, measurable outcomes, and systems that can be maintained.

Contact

If you need someone who can move between pipelines, platforms, and AI behavior, let's talk.

I am especially interested in roles where data reliability and intelligent application design need to work together, not compete with each other.

Best fit

Data Engineer, AI Engineer, Platform Engineer, or hybrid data + AI product roles.