How to customize distributed training when using the TensorFlow Estimator API
https://towardsdatascience.com/how-to-configure-the-train-and-evaluate-loop-of-the-tensorflow-estimator-api-45c470f6f8d
TensorFlow’s Estimator API provides an easy, high-level API to train machine learning models. You can use the train(), evaluate() or predict() methods on an Estimator. However, most often, training is carried out in a loop, in a distributed way, with evaluation done periodically during the training process. To do this, you will use the train_and_evaluate loop:
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
In this article, I will talk about what kinds of things you might want to specify in the train_spec and the eval_spec. Taken together, these options allow train_and_evaluate to provide a powerful, customizable way to do distributed training.
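To make the idea of "train in a loop, evaluate periodically" concrete, here is a toy, pure-Python sketch of that pattern. It is not TensorFlow code and not how train_and_evaluate is implemented: ToyTrainSpec, ToyEvalSpec, toy_train_and_evaluate and the one-weight "model" are all hypothetical names made up for illustration, and the fixed step interval is a simplifying assumption (the real function ties evaluation to checkpoints and timing settings rather than a plain step count).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyTrainSpec:
    """Hypothetical stand-in for training options (not tf.estimator.TrainSpec)."""
    max_steps: int  # stop training after this many steps


@dataclass
class ToyEvalSpec:
    """Hypothetical stand-in for evaluation options (not tf.estimator.EvalSpec)."""
    eval_every: int  # assumption: evaluate every N training steps


def toy_train_and_evaluate(
    train_step: Callable[[int], None],
    evaluate: Callable[[], float],
    train_spec: ToyTrainSpec,
    eval_spec: ToyEvalSpec,
) -> List[Tuple[int, float]]:
    """Run training steps, evaluating periodically and once more at the end."""
    history = []
    for step in range(1, train_spec.max_steps + 1):
        train_step(step)
        is_last_step = step == train_spec.max_steps
        if step % eval_spec.eval_every == 0 or is_last_step:
            history.append((step, evaluate()))
    return history


# Toy "model": a single weight w that gradient descent pulls toward 3.0.
state = {"w": 0.0}


def train_step(step: int) -> None:
    grad = 2.0 * (state["w"] - 3.0)  # derivative of (w - 3)^2
    state["w"] -= 0.1 * grad


def evaluate() -> float:
    return (state["w"] - 3.0) ** 2  # squared error of the current weight


history = toy_train_and_evaluate(
    train_step,
    evaluate,
    ToyTrainSpec(max_steps=10),
    ToyEvalSpec(eval_every=4),
)
for step, loss in history:
    print(f"step {step}: eval loss {loss:.4f}")
```

The point of the sketch is the split of responsibilities: one spec object says how long to train, the other says when to evaluate, and the loop itself stays generic. The real train_spec and eval_spec play the same two roles, with many more options.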