Data + AI systems for real workflows

Open to Data & AI Engineering roles

I build data and AI systems that survive real workflows.

I am Weijian (Tim) Zhang, a data engineer and AI-driven pipeline architect. I work across the full chain: ingestion, modeling, retrieval, evaluation, service design, and cloud deployment, with delivery experience spanning Hong Kong GovTech, insurance fraud detection, clinical gait AI, multinational retail enterprise data platforms, and commercial bank data middle-platform programs.

Featured work includes an award-winning SRR Agentic Case Processing System, GaitGPT clinical gait analysis, MCC FWA insurance fraud graph intelligence, and enterprise warehouse modernization. The resume points here; the portfolio shows the systems.

View Selected Work · Start a Conversation

Best Fit

Data Engineer, AI Engineer, Applied ML/LLM Engineer, or hybrid data + AI product roles.

Strongest Areas

Agentic workflows, clinical/financial AI, retrieval pipelines, data quality automation, and FastAPI services.

What Teams Get

Someone who can connect warehouse discipline, model behavior, and production pragmatism.

Weijian (Tim) Zhang · Data + AI Systems Builder

Featured proof

From dimensional models to award-winning agentic systems.

My recent work covers SRR case-processing automation, GaitGPT clinical gait analysis, MCC FWA graph intelligence, and cross-source financial monitoring for retail operations.

80,430

lines delivered for the SRR Agentic Case Processing System

1.61M

nodes modeled for MCC FWA insurance-claims graph intelligence

8+ years

enterprise delivery across banking, retail, GovTech, and AI systems

About

I build systems that stay useful after the demo.

My background is anchored in data engineering, but the through-line has always been operational reliability. I started with enterprise data warehouse design and large-scale analytics, then moved into real-time monitoring, AI-assisted workflows, and cloud-native service delivery.

That combination matters because many AI projects break at the seams: weak ingestion, brittle retrieval, unclear evaluation, or deployment paths that are hard to maintain. I enjoy building across those seams so the system behaves as one coherent product rather than a chain of disconnected tools.

I hold an MSc in Artificial Intelligence and Business Analytics from Lingnan University and bring hands-on delivery experience from Hong Kong government innovation work, clinical gait AI, insurance fraud detection, multinational retail enterprise data platforms, and commercial bank data middle-platform programs.

Education

MSc in Artificial Intelligence and Business Analytics

Lingnan University, Hong Kong · 2025–2026

AIBA coursework: Foundation of AI, Business Data Management, Data Analytics & Programming, Healthcare Analytics, Data Visualization, Programming with Generative AI, AI-Based Optimization.

Bachelor of Management · Shenzhen University · 2013–2017

Domain Context

Government services, banking, and global retail

Environments where data quality, business rules, and stakeholder trust are not optional.

Working Style

Systems thinking with delivery discipline

I like architectures that are measurable, explainable, and production-ready from day one.

Publications, Patents & Awards

External recognition, IP disclosures, and research work in progress.

This section summarizes outcomes beyond delivery: competition recognition, patent disclosure materials, and manuscript work that is still in preparation.

Award Fourth Sun Yat-sen University Lingnan Cup, Second Prize

Lingnan University (Hong Kong) MSc AIBA team case on SRR civic-service automation; 163 teams entered and 8 advanced to the final.

Manuscript From Symptoms to Signals

A Bidirectional LLM Framework for Literature-Grounded Clinical Gait Analysis; manuscript in preparation.

Patent 01 File-Driven Intelligent Case Processing System and Method for Public-Service Complaint Workflows

Patent disclosure in preparation for file-driven intake, retrieval, quality control, rollback, and response drafting in public-service complaint workflows.

Patent 02 Evidence-bound clinical gait translation

Invention disclosure material for symptom-to-metric translation with literature evidence, measurability status, and biomechanics validation.

Patent 03 Vision-Link Discovery

Patent disclosure in preparation for multi-model fusion that discovers cross-modal visual relationships inside file knowledge graphs.

Capabilities

The work I do best sits where data foundations and AI behavior need to cooperate.

I am most useful when the problem needs both infrastructure rigor and application-layer intelligence: reliable ingestion, clean modeling, retrieval quality, evaluation, and deployment that can survive real-world use.

01

Data foundations

SQL, dimensional modeling, and enterprise warehouse design
Python pipelines for ingestion, transformation, and validation
GraphDB modeling for claims fraud paths, entity links, and adjudication evidence
PostgreSQL, MySQL, SQL Server, StarRocks, Hive, Kafka
Cross-source reconciliation and downstream reporting reliability

02

LLM and retrieval systems

Agentic workflow design with hybrid retrieval and reranking
LangChain, LangGraph, OpenAI API, Gemini, and pgvector-based workflows
Prompt and evaluation design, including LLM-as-Judge and ablation patterns
LLM-assisted medical-necessity review, policy checks, and rejection-code reasoning
Document parsing, literature retrieval, semantic search, and response generation pipelines

03

Production delivery

FastAPI services, Docker packaging, Linux deployment, and CI/CD
Async batch workflows, result tracking, and review-facing API design
Google Cloud Run and Cloud SQL for managed deployment paths
Monitoring, alerting, and failure-aware workflow design
Cross-functional execution with technical and business stakeholders

Engineering

Python, SQL, FastAPI, Docker, Linux, GitHub, Airflow, DolphinScheduler

AI & Retrieval

LangChain, LangGraph, OpenAI API, Gemini, pgvector, Neo4j, PubMed, Semantic Scholar, LLM-as-Judge

Analytics & Cloud

Power BI, Tableau, GCP, StarRocks, SQL Server, PostgreSQL, Google Spanner Graph-ready design

Selected Projects

A few systems that show how I think, build, and measure impact.

These projects are arranged around the kind of work I am targeting: hybrid data and AI engineering roles where product usefulness depends on both infrastructure discipline and intelligent workflow design.

Award-Winning GovTech Case

SRR — Agentic Case Processing System

Hong Kong · 2025–2026

Technical Lead · 99.4% contribution · Lingnan Cup 2nd Prize · Guangzhou TV segment · Patent filing in preparation · Public reference implementation

Three-stage repo evolution

01 Original baseline SRR-Case-Processing-System
02 Agentic system SRR-Agentic-Case-Processing-System
03 Public showcase srr-agentic-case-processing

80,430 lines delivered across two internal iterations before the public version.

View SRR-Project-Team (global link)

3 repository stages from original to public showcase

17 pluggable atomic capabilities

12/12 requirements delivered

TV-Featured Project Evidence

SRR was shown in a Guangzhou TV segment covering the fourth Lingnan Cup. The project is presented here as both a working AI system and a public-facing competition case, connecting technical delivery with external recognition.

SO_PRD25 product video

Background -> pain points -> solution -> future roadmap. The thumbnail shows the input parsing and task routing architecture.

CIC AI Award preparation video

Construction-sector pitch material built from the SRR workflow story. The thumbnail shows field maintenance work.

Situation

Hong Kong public-service SRR handling relied on multi-channel materials, manual routing, historical lookup, and repeated field interpretation.

Task

Turn the process into a controllable case-processing system that could parse ICC 1823, TMO, and RCC inputs, extract A-Q fields, and draft traceable replies.

Action

Designed a seven-layer agentic architecture with 17 atomic capabilities, pgvector + RRF retrieval, three-tier evaluation, Best-of-N repair, and rollback logic.
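As an illustration of the RRF step named above, here is a minimal sketch of Reciprocal Rank Fusion, the standard technique for merging several ranked result lists (for example a pgvector similarity search and a keyword search). The function name, the case IDs, and the constant `k=60` are assumptions for illustration; they are not taken from the SRR codebase.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a value
    taken from the SRR system.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical case IDs: one list from vector search, one from keyword search.
vector_hits = ["case-17", "case-03", "case-42"]
keyword_hits = ["case-03", "case-99", "case-17"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

With these sample lists, `fused` is `["case-03", "case-17", "case-99", "case-42"]`: cases that rank well in both lists rise above cases that appear in only one.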

Result

Delivered 12/12 requirements and 80,430 lines across internal iterations, then converted the work into a public reference repo, TV-covered case, award proof, and patent-prep narrative.

Code Public SRR reference implementation

Open GitHub repository (global link)

Media WeChat feature coverage

Official article and team photos

Competition Lingnan Cup case page

Case topic and finalist listing

Next Stage CIC AI Award preparation

Construction industry competition track

SRR team presenting during the final defense session — Live Lingnan Cup final moment explaining workflow automation, case routing, and public-service value.

SRR seven-layer agentic system architecture diagram — Seven-layer agentic architecture with parsing, routing, retrieval, quality gates, and fallback logic.

SRR assistant extracting case fields and drafting replies — Interaction view showing extracted fields, similar-case handling, and draft-reply workflow.

Lingnan Cup second prize award photo — Lingnan Cup final: Second Prize for the Hong Kong Lingnan University team.

SRR team celebrating the Lingnan Cup second prize outside Lingnan Hall — Team celebration outside Lingnan Hall after the Second Prize result, with project title and Hong Kong Lingnan University team sign visible.

Close-up of the Lingnan Cup second prize trophy and certificate folder — Second Prize trophy close-up: physical award proof from the fourth Lingnan Cup case analysis final.

Lingnan Cup second prize certificate for the SRR case project — Second Prize certificate naming the SRR case: from paperless handling to automated civic-service workflow.

Award reflection from Lingnan Cup coverage — Award reflections highlight hands-on AI implementation, real business data, and civic-service problem solving.

Official Lingnan College webpage showing competition photo gallery without browser interface — Official website feature: cropped to the webpage itself, showing the SRR team in the Lingnan Cup photo gallery and recap.

Official competition topic listing naming the SRR project — Official topic listing: Hong Kong SRR civic-service processing case among finalist reports.

Python 3.11+, FastAPI, PostgreSQL 15, pgvector, OpenAI GPT-4o, Cloud Run, Cloud SQL, Docker

Agentic AI · RAG · Evaluation · GovTech

Open SRR GitHub (global link) · Official competition coverage

InsurTech Graph Intelligence

MCC FWA — Insurance Claims Fraud Graph Intelligence

Enterprise collaboration · 2026

Core Developer · GraphDB foundation · Claims adjudication workflow · Neo4j validation · Spanner Graph-ready design

Pipeline shape

01 Claim ingestion 38,659 medical claim JSON files
02 Property graph Claim, Event, Doctor, Hospital, Diagnosis, Receipt, BillingItem, BreakdownItem, AmountFeature
03 Review workflow Risk paths, medical necessity, policy checks, and case-level summaries

Graph first because FWA risk often appears in relationships, not a single field.

38,659 medical claim JSON files converted

1.61M graph nodes generated for entity-level investigation

2.57M edges connecting claim paths and evidence trails

Graph Pipeline

Claim JSON -> Property Graph -> Risk Paths -> Claims Review

The system turns raw claim records into portable nodes and edges, validates paths in Neo4j, and prepares the model for future Google Spanner Graph deployment.
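A minimal sketch of the claim-to-graph idea, assuming a much simpler claim record than the real data. The node labels (`Claim`, `Doctor`, `Hospital`, `Diagnosis`) come from the schema listed on this page; the JSON field names, relationship types, and helper functions are hypothetical and exist only for illustration.

```python
import csv
import io


def claim_to_graph(claim):
    """Flatten one claim dict into (nodes, edges) for a property graph."""
    claim_id = f"Claim:{claim['claim_id']}"
    nodes = [{"id": claim_id, "label": "Claim"}]
    edges = []
    # Each linked entity becomes its own node so that shared doctors or
    # diagnoses can connect otherwise separate claims.
    # Field names and relationship types below are hypothetical.
    for label, rel, key in [
        ("Doctor", "TREATED_BY", "doctor_id"),
        ("Hospital", "BILLED_BY", "hospital_id"),
        ("Diagnosis", "HAS_DIAGNOSIS", "diagnosis_code"),
    ]:
        value = claim.get(key)
        if value is None:
            continue
        node_id = f"{label}:{value}"
        nodes.append({"id": node_id, "label": label})
        edges.append({"source": claim_id, "target": node_id, "type": rel})
    return nodes, edges


def to_csv(rows, fieldnames):
    """Serialize rows to plain CSV text, independent of any graph database."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = {"claim_id": "C001", "doctor_id": "D9", "diagnosis_code": "J06.9"}
nodes, edges = claim_to_graph(sample)
edges_csv = to_csv(edges, ["source", "target", "type"])
```

Across many claims, nodes would also need de-duplicating by `id` before export, so that one doctor or diagnosis appears once and links every claim that references it.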

Situation

Insurance FWA review faces high claim volume, sparse confirmed fraud labels, and risk signals that often hide across patients, providers, diagnoses, receipts, and amount patterns.

Task

Create a data foundation that lets reviewers inspect risk paths instead of reading isolated claim fields or opaque model scores.

Action

Converted 38,659 claim JSON files into a 1.61M-node / 2.57M-edge property graph, validated paths in Neo4j, and supported FastAPI async review with medical-necessity, policy, and rejection-code reasoning.

Result

The output shifts the review surface from approve/reject labels to explainable evidence trails: risk score, reason code, related entities, and claim-level summary that an assessor can audit.

Graph Schema 9 node families

Claim, event, provider (doctor and hospital), diagnosis, receipt, billing-item, breakdown-item, and amount-feature entities.

Validation Neo4j local POC

Single-claim path inspection helps explain why entities are linked in a review trail.

Deployment Path Spanner Graph-ready

Portable CSV extraction keeps the graph foundation independent from one graph database.

Public Landing Page MediConCen FWA project site

Open live landing page

Redacted screenshot: Neo4j graph path preview for the MCC FWA claim graph

Redacted screenshot: claims processing workflow UI for the MCC FWA dashboard

Python, FastAPI, JSON/CSV pipelines, Neo4j, Google Spanner Graph-ready design, LangChain/LangGraph, OpenRouter LLMs

Insurance FWA · GraphDB · Claims Adjudication · Explainable Review · Healthcare AI

Enterprise collaboration; private claim data and client artifacts are intentionally omitted.

Clinical AI Research System

GaitGPT — Clinical Gait Analysis Agent

Lingnan University · 2026

Architecture Lead · Manuscript in preparation · iCAN 2026 application · LU invention disclosure preparation

0.9223 overall benchmark accuracy

10.07s median response time

20 question benchmark suite

Paper Manuscript in preparation

Target venue options under review: Sensors, AMIA / MICCAI workshop tracks, or BMC Medical Informatics.

Competition iCAN 2026 application package

Prepared for Medicine & Health Care and IT & Data Technology categories, with a finals video asset ready.

Patent LU disclosure preparation

Invention-disclosure materials focus on evidence-bound clinical gait translation while keeping claim-level details private.

Situation

Gait systems produce dense sensor indicators, while clinicians often reason in symptom language such as limping, shuffling gait, or asymmetry.

Task

Build a privacy-first assistant that can answer structured gait-data questions and map clinical descriptions back to measurable gait indicators with evidence boundaries.

Action

Combined NONSD-Gait data, rule-template-first query handling, PubMed / Semantic Scholar retrieval, reverse translation, evidence scoring, and physics-constrained validation.
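One way to read "rule-template-first" is that a deterministic template answers a question whenever it matches structured data, and anything else is handed to a slower retrieval or LLM path. The sketch below shows that routing pattern only. The question template, the record fields (`subject`, `cadence`), and the `fallback` callable are all hypothetical, not GaitGPT's actual interface.

```python
import re
from statistics import mean

# Hypothetical template for questions like
# "What is the average cadence for subject S01?"
AVERAGE_QUESTION = re.compile(
    r"average (?P<metric>[a-z_ ]+?) for subject (?P<subject>\w+)",
    re.IGNORECASE,
)


def answer(question, records, fallback):
    """Try a deterministic template first; defer to `fallback` otherwise."""
    match = AVERAGE_QUESTION.search(question)
    if match:
        metric = match["metric"].strip().lower().replace(" ", "_")
        values = [
            record[metric]
            for record in records
            if record["subject"] == match["subject"] and metric in record
        ]
        if values:
            return {"source": "template", "value": mean(values)}
    # No template matched, or no data found: hand off to the fallback path.
    return {"source": "fallback", "value": fallback(question)}


# Hypothetical records and a stand-in for the retrieval/LLM path.
records = [
    {"subject": "S01", "cadence": 100.0},
    {"subject": "S01", "cadence": 110.0},
]
templated = answer("What is the average cadence for subject S01?", records, str.upper)
deferred = answer("Why might a patient limp?", records, lambda q: "needs retrieval")
```

The design choice illustrated here is that answers computed directly from data carry a `source` tag, so downstream evaluation can separate deterministic answers from generated ones.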

Result

Produced a 20-question benchmark with 0.9223 overall accuracy and 10.07s median response time, forming the basis for manuscript, iCAN, and LU invention-disclosure preparation.

Demo Video

67-second product flash showing the clinical gait analysis concept, interface direction, and research-to-clinic positioning.

Python, FastAPI, Next.js/React, SurrealDB, LangGraph, PubMed, Semantic Scholar, local/open AI providers

Digital Health · Clinical AI · Literature Retrieval · Benchmarking · Paper in Preparation · iCAN 2026 · Patent Prep

Open GaitGPT GitHub (global link)

Enterprise Data Backbone

Multinational Retail B2B Data Platform

Multinational retail enterprise · 2023–2025

Data Engineer · B2B finance and operations reporting · Cloud warehouse modernization · SOX-aware delivery

Business data scope

01 B2B transaction foundation Orders, contracts, items, stores, customers, fulfillment, inventory, payment, margin, and approval fields
02 Reporting and governance Dashboard-ready ADS tables, data dictionary, SOX-aware release evidence, and production job monitoring
03 Cloud warehouse performance Five-layer ODS / DIM / DWD / DWS / ADS architecture with StarRocks acceleration

Enterprise value came from making finance and operations data explainable, reusable, and fast enough for daily decisions.

10TB+ warehouse workload migrated and restructured

3-10 min → 0.8s core analytical queries accelerated with StarRocks

87.5% ETL runtime reduction after warehouse optimization

Situation

B2B finance and operations reporting depended on fragmented order, contract, billing, inventory, gift-card, fulfillment, and enterprise-account sources.

Task

Make daily decision data reusable and explainable across finance, operations, and dashboard teams while respecting SOX-style release evidence and access boundaries.

Action

Migrated 10TB+ workloads, rebuilt ODS / DIM / DWD / DWS / ADS layers, connected CRM / OMS / B2B / gift-card / warehouse / finance sources, and tuned Hive, SparkSQL, StarRocks, Seatunnel, and scheduling paths.

Result

Cut ETL runtime by 87.5%, reduced disk usage by 38%, improved query speed by 55%, and moved core StarRocks analytical queries from 3-10 minutes to an average 0.8s.
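For readers who prefer speedup ratios to percentages, the short calculation below converts two of the figures above. It uses only the numbers stated in this section; the helper name is illustrative, and the 55% query-speed figure is a separate measurement that is not derived here.

```python
def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' measurement is."""
    return before_ms / after_ms


# An 87.5% runtime reduction leaves 12.5% of the original runtime.
remaining_fraction = 1 - 0.875
etl_speedup = 1 / remaining_fraction

# Core StarRocks queries: 3-10 minutes before, 0.8 s on average after.
query_speedup_low = speedup(3 * 60 * 1000, 800)
query_speedup_high = speedup(10 * 60 * 1000, 800)
```

Under these figures the ETL jobs run 8x faster, and the core analytical queries run roughly 225x to 750x faster, depending on where in the 3-10 minute range the original query fell.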

Redacted appreciation note for enterprise retail B2B delivery — Cross-border retail B2B delivery evidence: appreciation material kept as business-context proof.

Retail B2B dataflow diagram across source systems and reporting layers — Dataflow evidence across source systems, data processing, and reporting outputs.

Retail B2B technical architecture diagram — Technical architecture for B2B reporting and financial data operations.

Enterprise big-data capability overview with sensitive brand text masked by mosaic.

SQL, SparkSQL, Hive, StarRocks, Seatunnel, DolphinScheduler, Tencent Cloud, data lake, ADS reporting tables

Retail B2B · Financial Data · SOX-aware Workflows · Cloud Warehouse · StarRocks

Commercial Banking Data

Commercial Banking Data Warehouse

Commercial bank data middle-platform · 2019–2021

Data Engineer · Model-layer maintenance · SQL delivery · Banking stakeholder communication

Banking data discipline

01 Business object understanding Corporate credit, limits, loan contracts, loan disbursement, repayment, and settlement accounts
02 Platform architecture Channel, front-office, product-service, integration, warehouse, and management-decision layers
03 Warehouse flow Source systems, data bus, SDATA / SHDATA, domain layers, marts, and downstream applications

Banking value means definitions are stable, lineage is traceable, and downstream reporting can be trusted.

100+ transaction systems integrated in enterprise data work

5 layers warehouse modeling discipline carried into later delivery

50% cross-team model reuse improvement in warehouse architecture work

Situation

Commercial-bank reporting depends on strict definitions for credit, limits, contracts, disbursement, repayment, settlement accounts, and downstream product indicators.

Task

Support financial, regulatory, and management reporting by keeping mart logic aligned with bank definitions, source lineage, and stakeholder confirmation.

Action

Maintained model-layer SQL, mapped source systems through data-bus, SDATA / SHDATA, domain, mart, and application layers, and handled requirement clarification directly with banking stakeholders.

Result

Built bank-grade delivery habits: clear semantics first, controlled change second, traceable issue handling, and stable reporting outputs before presentation polish.

Commercial banking loan lifecycle business object map — Loan lifecycle business-object map: from application and approval to disbursement, repayment, post-loan management, and banking data foundation.

Commercial banking channel and system architecture diagram — Banking channel and system architecture: channel portal, front-office integration, product service layer, data integration, and decision layers.

Commercial banking warehouse data flow from source systems to marts and applications — Banking warehouse flow: source systems, data bus, SDATA / SHDATA, business domains, marts, and downstream applications.

SQL, Impala SQL, Hive warehouse layers, Informatica PowerCenter, banking model maintenance, downstream reporting support

Commercial Banking · Data Warehouse · Model Layers · Lineage · Reporting

Additional Work

Enterprise delivery that reinforces the same pattern.

Banking & Retail Data Warehouse Delivery

Delivered data modeling, cloud migration, and warehouse optimization across banking and retail contexts, including commercial bank data middle-platform and multinational retail enterprise data platform environments.

Resume

The full experience, condensed into one document.

If you want the complete timeline, project detail, and technical background, the resume includes the full version. It is designed to point back to this portfolio, where SRR, GaitGPT, and MCC FWA show the system proof, architecture decisions, and project narrative.

Download Resume (EN PDF) · Download Resume (CN PDF)

Snapshot

2019–2025 Enterprise data engineering across banking, retail, and cloud data platforms.

2025–2026 SRR, GaitGPT, and MCC FWA: applied AI systems with evidence, evaluation, and deployment stories.

Core promise Strong foundations, measurable outcomes, and systems that can be maintained.

Contact

If you need someone who can move between pipelines, platforms, and AI behavior, let's talk.

I am especially interested in roles where data reliability and intelligent application design need to work together, not compete with each other.

Email

weijianlucky@gmail.com

Phone

HK: +852 84965467
Mainland: +86 131 6809 0613

GitHub

github.com/February13 (global link)

Best fit

Data Engineer, AI Engineer, Platform Engineer, or hybrid data + AI product roles.