Great question about why we do not scale the test data independently and separately. The purpose of training a model is to deploy it to score future, unknown data in production. If the training data are scaled, the future data must be scaled too. In production we do not know the mean and standard deviation of the incoming data; they are supposed to behave the same as the training data. So the only statistics we can use are the mean and standard deviation of the training data, and we apply those to the test data as well. Let me use an extreme case to make the point. Suppose the future data are very different from the training data, so their mean and standard deviation differ from those of the training data. This data drift is serious and should trigger a warning to the model users. If we rescaled the future data with their own mean and standard deviation, they would look centered and normal again, and the drift would be hidden from the model. I hope this helps.
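
To make the extreme case concrete, here is a minimal sketch in Python with NumPy. The helper names `fit_scaler` and `transform` and the synthetic data are my own assumptions for illustration, not any particular library's API. It scales a drifted "future" sample in two ways: with the training statistics, and with its own statistics.

```python
import numpy as np

def fit_scaler(train):
    """Learn per-feature mean and standard deviation from the given data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    return mean, std

def transform(data, mean, std):
    """Standardize data with previously learned statistics."""
    return (data - mean) / std

# Synthetic data (an assumption for illustration): the future data have drifted.
rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=2.0, size=(1000, 1))
future = rng.normal(loc=25.0, scale=2.0, size=(1000, 1))

# Fit on the training data only, then reuse those statistics everywhere.
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
future_with_train_stats = transform(future, mean, std)

# For contrast: scaling the future data with their own statistics.
future_with_own_stats = transform(future, *fit_scaler(future))

print("train, scaled:             mean =", train_scaled.mean())
print("future, training stats:    mean =", future_with_train_stats.mean())
print("future, its own stats:     mean =", future_with_own_stats.mean())
```

With the training statistics, the scaled future data have a mean far from zero, so the drift is visible and can raise a warning. With their own statistics, the mean comes out at essentially zero and the drift disappears from view.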