https://blog.infostrux.com/llm-101-reinforcement-learning-from-human-feedback-rlhf-with-large-language-models-part-3-f7d158eda28f