Machine learning (ML) models thrive on data. Use good data, and they’ll produce great results; use bad data, and well, you know what comes next. This primacy of data means that you need to pay extra close attention to the start of your ML lifecycle. Long before you start building your algorithms, you need to make sure that the data you’ll use to train and test them is good enough, and plentiful enough, to give you the best results. The question, then, is whether the data you already have (your internally collected dataset) is actually the data you need, and whether there’s enough of it to give you the insights you’re looking for.