Lesson 76 · Quant and ML research lab · Advanced · 205 min

Machine-learning validation: leakage, non-stationarity, and walk-forward controls

Build a validation framework that attacks ML trading models before they reach a demo EA: leakage checks, walk-forward windows, regime testing, and cost stress.

Lesson outcomes

  • Identify common data-leakage mistakes in trading models.
  • Design walk-forward validation for non-stationary markets.
  • Reject models that cannot survive costs, regime changes, and feature ablation.

Workshop lab

Complete the demo, notebook, platform, or code task before treating the lesson as finished.

Evidence pack

Keep screenshots, exports, logs, calculations, or code versions in a dated learning folder.

Pass standard

You should be able to explain the failure modes, show your work, and name the stop rule.

Free education, not signals. This lesson is part of EarnSouthAfrica's free forex course. It does not tell you what to buy or sell, it does not promise income, and it should be practised on a demo account before any real-money decision.

Markets are non-stationary: relationships change, volatility regimes shift, spreads widen, and a pattern that existed in one period may disappear. Machine learning does not remove that. It can hide it behind impressive metrics if validation is weak.

This lesson teaches learners to attack their own model before anyone else can sell them one. The paid-course quality standard is not a high accuracy number; it is a validation dossier that explains why the result might be wrong.

Leakage audit

  • Check that every feature was available at the decision time.
  • Check that labels do not leak future highs, lows, closes, or trade outcomes into inputs.
  • Fit scaling and normalization on the training window only, then apply the fitted parameters unchanged to the validation and test windows.
  • Freeze feature engineering before evaluating the reserved test period.
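
The audit items above can be sketched in a few lines of Python. This is a minimal illustration, not a complete pipeline: the function names, the toy closing prices, and the next-bar label are all assumptions made for the example. It shows a feature built only from bars that have already closed at decision time, a label kept strictly on the target side, and a scaler fitted on the training rows and then reused unchanged on later rows.

```python
from statistics import mean, pstdev

def fit_scaler(train_values):
    """Estimate scaling parameters from the training window only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        sigma = 1.0  # avoid division by zero on a constant feature
    return mu, sigma

def apply_scaler(values, mu, sigma):
    """Apply already-fitted parameters; never refit on later windows."""
    return [(v - mu) / sigma for v in values]

def lagged_return(closes, t):
    """Input for a decision at the close of bar t: uses bars t-1 and t only."""
    return closes[t] / closes[t - 1] - 1.0

def next_bar_label(closes, t):
    """Target for the same decision: uses bar t+1, so it must never be an input."""
    return 1 if closes[t + 1] > closes[t] else 0

# Toy closing prices (illustrative values, not market data).
closes = [1.10, 1.11, 1.09, 1.12, 1.13, 1.10, 1.14, 1.15]
decision_bars = range(1, len(closes) - 1)  # one bar back and one bar ahead must exist
features = [lagged_return(closes, t) for t in decision_bars]
labels = [next_bar_label(closes, t) for t in decision_bars]

split = 4  # chronological split: earlier rows train, later rows test
mu, sigma = fit_scaler(features[:split])
train_scaled = apply_scaler(features[:split], mu, sigma)
test_scaled = apply_scaler(features[split:], mu, sigma)  # reuses training mu and sigma
```

If the scaler were fitted on all rows at once, the test rows would influence the mean and standard deviation used in training, which is one of the quieter leakage paths.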

Walk-forward design

Window | Purpose
Training | Fit model and preprocessing without touching future data.
Validation | Tune model choices and reject unstable ideas.
Walk-forward test | Evaluate the locked process on later data.
Forward demo | Observe predictions in platform conditions without live risk.
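
One way to turn the window table into code is a generator of chronological index ranges. The sketch below is an assumption about how you might lay the windows out, not a prescribed method: the function name, the window sizes, and the optional `gap` parameter (bars skipped between windows so that labels spanning several bars cannot straddle a boundary) are all hypothetical choices for illustration.

```python
def walk_forward_windows(n_bars, train_size, valid_size, test_size, gap=0, step=None):
    """Return a list of (train, valid, test) index ranges in time order.

    Each later window starts after the previous one ends, so the model
    never sees data from its own validation or test period.
    """
    if step is None:
        step = test_size  # roll forward by one test window at a time
    span = train_size + gap + valid_size + gap + test_size
    windows = []
    start = 0
    while start + span <= n_bars:
        train = range(start, start + train_size)
        valid = range(train.stop + gap, train.stop + gap + valid_size)
        test = range(valid.stop + gap, valid.stop + gap + test_size)
        windows.append((train, valid, test))
        start += step
    return windows

# Illustrative sizes only; choose yours before looking at any results.
windows = walk_forward_windows(n_bars=1000, train_size=500, valid_size=100, test_size=100)
for train, valid, test in windows:
    print(f"train {train.start}-{train.stop - 1}  "
          f"valid {valid.start}-{valid.stop - 1}  "
          f"test {test.start}-{test.stop - 1}")
```

Writing the window sizes down before the first run matters more than the code itself: if the sizes are adjusted after seeing results, the test window has quietly become another tuning set.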

Model rejection tests

Stress the model with wider spreads, missing bars, different symbols, session splits, removal of the best trades, shifted labels, and feature ablation. If a single feature or one short period explains most of the performance, the model belongs in the rejected folder.
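
Two of these stress tests, wider costs and removal of the best trades, are easy to sketch. The example below is a minimal illustration under stated assumptions: trade results are per-trade pips before costs, cost is a flat per-trade figure, and the trade list, the cost multiplier, and the rejection rule are hypothetical values chosen for the demonstration.

```python
def net_total(trade_pips, cost_pips):
    """Total result after charging a flat cost on every trade."""
    return sum(p - cost_pips for p in trade_pips)

def without_best(trade_pips, k):
    """Drop the k largest winners to test dependence on a few outliers."""
    if k <= 0:
        return list(trade_pips)
    return sorted(trade_pips)[:-k]

def rejection_report(trade_pips, base_cost, stress_multiplier=2.0, drop_best=2):
    """Compare the base result with a cost-stressed and an outlier-free result."""
    return {
        "base_net": net_total(trade_pips, base_cost),
        "stressed_net": net_total(trade_pips, base_cost * stress_multiplier),
        "net_without_best": net_total(without_best(trade_pips, drop_best), base_cost),
    }

# Hypothetical per-trade results in pips, before costs.
trades = [12.0, -8.0, 3.0, -5.0, 40.0, 2.0, -6.0, 4.0, -7.0, 1.0]
report = rejection_report(trades, base_cost=1.0)

# Example rejection rule (an assumption): reject if either stress turns the result negative.
rejected = report["stressed_net"] <= 0 or report["net_without_best"] <= 0
print(report, "rejected:", rejected)
```

In this toy list the result survives doubled costs but turns negative once the two largest winners are removed, so the example rule rejects it. The thresholds are yours to set, but they should be written down before the test is run.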

Academy-grade study plan

Machine learning in trading is not a shortcut around market uncertainty. The paid-course standard is to prove data integrity, prevent leakage, respect non-stationarity, validate outside the fitting window, and keep ML outputs behind strict demo-only risk boundaries.

Course element | What you must produce
Primary artifact | Model research dossier
Lesson focus | Machine-learning validation: leakage, non-stationarity, and walk-forward controls
Working environment | Demo account, notebook, exported platform data, or local code sandbox. Never live funds for first practice.
Completion standard | You can explain the concept, reproduce the exercise, identify failure modes, and show evidence without relying on a seller's claims.

Instructor workflow

Use this workflow as if an instructor were marking the lesson. The important question is not whether the topic sounds familiar. The question is whether your notes, screenshots, calculations, logs, or code prove that you can apply leakage checks, non-stationarity tests, and walk-forward controls under controlled conditions.

  • Define the prediction target, horizon, features, labels, costs, and no-trade conditions before any model run.
  • Build a data lineage record from MT5 rates, ticks, custom symbols, calendar data, feature engineering, training window, validation window, and deployment boundary.
  • Use walk-forward validation, cost stress, feature-ablation tests, and regime checks before trusting any metric.
  • Treat ONNX integration as an engineering interface for research models, not evidence that the model has an edge.

Worked case study: A model looks brilliant because it leaked the future

A learner trains a model that predicts the next candle with impressive accuracy. The audit discovers that features used values only known after the prediction point, spreads were ignored, and the validation data overlapped with the tuning process. The professional response is to discard the result, rebuild the dataset, and require walk-forward evidence before any demo automation.

After reading the scenario, write the decision you would make before checking the suggested workflow above. Then compare your decision with the operating model. The gap between those two answers is the part of the lesson that deserves another demo repetition.
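
The leak in this case study can be reproduced on toy data. The sketch below is one possible reading of a label-shift check, with every name and number an assumption for illustration: a "leaky" feature is set equal to the very move the label describes, a simple sign rule is scored against the label, and the labels are then shifted one bar forward so the same feature only sees the previous completed move.

```python
def sign_rule_accuracy(features, labels):
    """Share of rows where a positive feature coincides with an 'up' label."""
    hits = sum((f > 0) == (y == 1) for f, y in zip(features, labels))
    return hits / len(labels)

# Toy closing prices (illustrative values, not market data).
closes = [100, 102, 101, 103, 104, 102, 103, 101, 100, 103, 104, 102]
# changes[i] is the move from bar i to bar i+1.
changes = [b - a for a, b in zip(closes, closes[1:])]

decision_bars = range(1, len(changes))
# Label for a decision at bar t: does the NEXT move go up?
labels = [1 if changes[t] > 0 else 0 for t in decision_bars]
# Leaky feature: the next move itself, which is unknown at decision time.
leaky_feature = [changes[t] for t in decision_bars]

leaky_acc = sign_rule_accuracy(leaky_feature, labels)

# Shift labels one bar forward: the feature at t is now paired with the label at t+1,
# so it only contains information from a move that has already completed.
shifted_acc = sign_rule_accuracy(leaky_feature[:-1], labels[1:])

print(f"accuracy with leak: {leaky_acc:.2f}, after label shift: {shifted_acc:.2f}")
```

A perfect score that collapses as soon as the alignment moves by one bar is strong evidence that the feature was reading the answer rather than predicting it. On real data the collapse is usually less dramatic, which is why the check belongs in the dossier rather than in your memory.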

Professional template

Complete this template in your own notebook. A paid course would normally hide this kind of operating document behind worksheets; here it is part of the free lesson.

Field | Standard
Dataset | Symbol, timeframe, broker, date range, tick/rate source, timezone, missing data, and cleaning checks.
Feature set | Inputs available at decision time, transformations, lookback windows, and leakage controls.
Validation | Train/validation/test split, walk-forward windows, costs, slippage, feature ablation, and rejection threshold.
Deployment boundary | Observe-only, demo-only, max risk, kill switch, logging, and retraining/version rule.

Failure-mode lab

Paid courses often sell confidence. A serious course teaches you how the idea breaks. Before continuing, test the failure modes below on demo, paper, or code review. If you cannot describe the failure, you are not ready to trust the concept.

  • Using future candle values, final high/low, revised labels, or post-trade information as model inputs.
  • Optimizing features until one historical period looks good while out-of-sample behaviour collapses.
  • Ignoring spread, commission, slippage, swap, rollover, and execution delay.
  • Moving a model from notebook to EA before the data, validation, and risk boundaries are documented.
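
A short calculation shows why the cost item in the list above is a failure mode and not a detail. For a strategy with average win W and average loss L (both in pips, before costs) and a round-trip cost c per trade, expectancy per trade is p(W - c) - (1 - p)(L + c), which is zero when p = (L + c) / (W + L). The sketch below evaluates that formula; the function names and all figures are illustrative assumptions, not broker quotes.

```python
def breakeven_win_rate(avg_win, avg_loss, cost):
    """Win rate at which expectancy after costs is exactly zero."""
    if avg_win + avg_loss <= 0:
        raise ValueError("avg_win + avg_loss must be positive")
    return (avg_loss + cost) / (avg_win + avg_loss)

def expectancy(win_rate, avg_win, avg_loss, cost):
    """Expected pips per trade after a flat round-trip cost."""
    return win_rate * (avg_win - cost) - (1.0 - win_rate) * (avg_loss + cost)

# Illustrative figures: 10-pip average win, 10-pip average loss.
no_cost = breakeven_win_rate(10.0, 10.0, 0.0)    # cost ignored
with_cost = breakeven_win_rate(10.0, 10.0, 1.5)  # 1.5 pips spread plus slippage (assumed)

# A model with a 55% hit rate looks profitable until costs are charged.
edge_at_55 = expectancy(0.55, 10.0, 10.0, 1.5)
print(f"breakeven without costs: {no_cost:.3f}")
print(f"breakeven with costs:    {with_cost:.3f}")
print(f"expectancy at 55% hit rate after costs: {edge_at_55:.2f} pips")
```

The same function can be rerun with a stressed cost figure to cover wider spreads or slower execution; swap and commission would need to be converted to pips per trade first.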

Evidence pack and pass standard

Do not mark this lesson complete because you read it. Mark it complete only when you can show the evidence below. Keep the files in a dated folder so your learning history survives platform updates, memory gaps, and sales pressure.

  • A one-page note explaining machine-learning validation (leakage, non-stationarity, and walk-forward controls) without sales language or copied definitions.
  • A screenshot, export, calculation, log, or code file that proves the practical work was completed on demo.
  • A written stop rule that says when this topic must not be used with real money.
  • A research notebook or report with data lineage, leakage checks, walk-forward results, and rejected models.
  • An observe-only model log showing predictions, feature values, decision boundaries, and no-trade reasons.
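
The observe-only log in the last item could take the form of one JSON line per decision. The sketch below is only a suggestion: the field names, the threshold, and the example values are hypothetical and do not correspond to any platform schema. The point is that each record carries the inputs, the model output, the decision boundary, and the reasons for not trading, and that it states explicitly that no order was sent.

```python
import json
from datetime import datetime, timezone

def observe_only_record(symbol, features, probability, threshold,
                        no_trade_reasons, timestamp=None):
    """Build one JSON log line for a prediction that is observed, not traded."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    # A signal is only recorded when the threshold is met and nothing blocks it.
    signal = "long" if probability >= threshold and not no_trade_reasons else "none"
    record = {
        "time_utc": timestamp.isoformat(),
        "symbol": symbol,
        "features": features,
        "probability_up": probability,
        "decision_threshold": threshold,
        "signal": signal,
        "no_trade_reasons": no_trade_reasons,
        "order_sent": False,  # observe-only: this logger never places orders
    }
    return json.dumps(record, sort_keys=True)

# Example values are made up for illustration.
line = observe_only_record(
    symbol="EURUSD",
    features={"lagged_return": 0.0012, "spread_pips": 2.4},
    probability=0.61,
    threshold=0.55,
    no_trade_reasons=["spread above limit"],
    timestamp=datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc),
)
print(line)
```

Appending each line to a dated file gives you a record that can later be compared with what the market actually did, without any order ever having been placed.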

Assessment rubric

Level | What it looks like
Not ready | You can repeat the vocabulary but cannot complete the demo task, calculate the risk, explain the failure mode, or show evidence.
Course pass | You can complete the practical task on demo, explain the decision rules, show evidence, and name the conditions where the idea must not be used.
Strong pass | You can teach the concept to someone else, find edge cases, document a rejected example, and improve the template without weakening risk controls.

Advanced homework

  • Break a promising model by shifting labels one bar forward and proving why the original result was invalid.
  • Run feature ablation to see whether the model depends on one unstable input.
  • Build an ONNX smoke test that logs model output on demo without sending any orders.
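
For the feature-ablation item, a minimal sketch is shown below. It uses a toy nearest-centroid classifier purely as a stand-in for whatever model you are testing; the classifier, the feature names `momentum` and `hour_flag`, and the data are all assumptions made for illustration. The pattern is what matters: refit the model with one feature removed at a time and compare out-of-sample accuracy with the full model.

```python
def fit_centroids(rows, labels):
    """Toy model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

def accuracy(centroids, rows, labels):
    return sum(predict(centroids, r) == y for r, y in zip(rows, labels)) / len(rows)

def drop_column(rows, j):
    return [row[:j] + row[j + 1:] for row in rows]

def ablation_report(train_x, train_y, test_x, test_y, names):
    """Refit with each feature removed and score on the later (test) rows."""
    report = {"all_features": accuracy(fit_centroids(train_x, train_y), test_x, test_y)}
    for j, name in enumerate(names):
        model = fit_centroids(drop_column(train_x, j), train_y)
        report["without_" + name] = accuracy(model, drop_column(test_x, j), test_y)
    return report

# Made-up rows: [momentum, hour_flag]. Earlier rows train, later rows test.
train_x = [[1.0, 0.2], [0.9, 0.8], [-1.0, 0.3], [-0.8, 0.7], [1.1, 0.5], [-1.2, 0.4]]
train_y = [1, 1, 0, 0, 1, 0]
test_x = [[0.7, 0.3], [-0.9, 0.1], [1.3, 0.6], [-0.6, 0.8]]
test_y = [1, 0, 1, 0]

report = ablation_report(train_x, train_y, test_x, test_y, ["momentum", "hour_flag"])
print(report)
```

In this toy data, removing `momentum` drops accuracy to chance while removing `hour_flag` changes nothing, so the whole result rests on one input. Whether that input is stable across regimes is the next question for the dossier.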

Practical drill

Do this lesson as a controlled exercise, not as a reason to trade live. Open a demo account or notebook, write the lesson title, and record what you changed, clicked, calculated, or checked. If the lesson includes code, compile it only in a demo environment and keep the original version unchanged so you can compare edits safely.

  • Write a one-paragraph explanation of machine-learning validation (leakage, non-stationarity, and walk-forward controls) in your own words.
  • Take one screenshot or note that proves you completed the platform, maths, research, or code task.
  • Record one risk rule that would stop you from using this idea with real money.
  • If anything feels unclear, repeat the lesson before moving to the next module.

How scammers misuse this topic

Scammers often take real concepts and wrap them in urgency. They may use platform jargon, bot screenshots, copied profit charts, or official-sounding language to make a paid offer feel safe. A real concept is not the same as a safe offer. Before paying anyone, ask whether you can verify the provider, reproduce the calculation, test the claim on demo, understand the risk, and walk away without pressure.

Checkpoint before continuing

  • You can name at least five leakage paths in trading data.
  • Your validation windows are defined before reading the results.
  • Your model report includes rejection tests, not only positive metrics.

Official references

These lessons are written as free education. When platform features or rules matter, verify against the official source before using real money.

Risk note: leveraged forex and contracts for difference can lose money quickly. EarnSouthAfrica is an educational publisher, not a broker, adviser, signal provider, or money manager.
