Overview

Last week, we covered common machine learning problems, such as regression and classification, and the issues that arise when solving them: overfitting, underfitting, and managing high-dimensional data. We also discussed the significance of good data, the training process, and model searching.

This week, we delve into machine learning implementations, focusing on when to use and when not to use machine learning in various scenarios. We discuss the importance of having sufficient and meaningful data, as well as the need for a well-defined task or hypothesis. We also revisit the essential ingredients for machine learning, such as the task, experience, model, loss function, and optimizer. Furthermore, we explore the role of various libraries in the implementation process, and how they contribute to different stages of a machine learning project.

When & When not to use Machine Learning

When to use Machine Learning

Whether to use machine learning depends largely on the nature of the task or hypothesis at hand. Machine learning is best suited to scenarios that require exploring or modeling relationships between inputs and outputs, or among the inputs themselves. It is also a good fit when predictions are needed based on historical or current data. In both cases, sufficient, high-quality data is crucial for accurate results. When these criteria are met, you have a good chance of finding a suitable model for your problem.

When not to use Machine Learning

It is equally important to know when not to use machine learning. Situations where data is scarce or limited, as with young companies that have collected little data, might not be suitable for machine learning. Likewise, if the data is not meaningful or lacks predictive value, like an address book containing only names and email addresses, machine learning might not be the best approach. Finally, when the solution can easily be hard-coded as a deterministic system, machine learning may be unnecessary and inefficient.

Carefully evaluating these factors will help ensure that machine learning is only applied when it is the most effective and appropriate solution.

Key Takeaways:

When to use:

  • Task or hypothesis requires exploration or modeling relationships
  • Predictions needed from historical or current data
  • Sufficient and high-quality data is available
  • Successful application leads to a suitable model

When not to use:

  • Data is scarce or limited
  • Data is not meaningful or lacks predictive value
  • Deterministic system can easily hard-code a solution
  • Machine learning is unnecessary or inefficient in certain situations
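To illustrate the point about deterministic systems, here is a minimal sketch of a problem whose rule is fully known in advance. The leap-year check and the function name is_leap_year are chosen purely for illustration; they are not from the lesson. Nothing needs to be learned from data here, so a model would only add cost and uncertainty.

```python
def is_leap_year(year: int) -> bool:
    """Return True if `year` is a leap year in the Gregorian calendar."""
    # The rule is fully specified in advance, so it can simply be hard-coded.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


print(is_leap_year(2024))
print(is_leap_year(1900))
```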

It all comes down to data:

From the previous lesson, recall that a machine learning project needs the following ingredients:

  • The task/hypothesis (e.g. emails are spam or not spam)
  • The experience (e.g. supervised or unsupervised)
  • The model (e.g. a neural network or an SVM)
    o And related to the model, its hyperparameters (for example, you select how many layers your neural net has)
  • The loss function (e.g. mean squared error for regression, cross-entropy loss for classification)
  • The optimizer (e.g. gradient descent, AdaGrad, etc.)
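The ingredients above can be sketched in a few lines of NumPy. This is only an illustrative example under stated assumptions: the data is synthetic, the model is a simple logistic regression rather than a neural network or an SVM, and the learning rate and number of steps are arbitrary hand-picked hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Task: binary classification (a stand-in for "spam or not spam").
# Experience: supervised, so every example comes with a label.
n_samples = 200
X = rng.normal(size=(n_samples, 2))        # synthetic features (assumption)
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic labels (assumption)

# Model: logistic regression with weights w and bias b.
w = np.zeros(2)
b = 0.0


def predict_proba(X, w, b):
    """Return the predicted probability of class 1 for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def cross_entropy(y_true, p, eps=1e-12):
    """Loss function: mean binary cross-entropy."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))


# Hyperparameters: chosen by hand, not learned from the data.
learning_rate = 0.5
n_steps = 200

initial_loss = cross_entropy(y, predict_proba(X, w, b))

# Optimizer: plain full-batch gradient descent.
for _ in range(n_steps):
    p = predict_proba(X, w, b)
    grad_w = X.T @ (p - y) / n_samples  # gradient of the loss w.r.t. w
    grad_b = np.mean(p - y)             # gradient of the loss w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = cross_entropy(y, predict_proba(X, w, b))
accuracy = np.mean((predict_proba(X, w, b) > 0.5) == (y == 1.0))
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}, training accuracy: {accuracy:.2f}")
```

Swapping any one ingredient (a different model, loss, or optimizer) changes only the corresponding part of the code, which is why it helps to keep them separate in your head.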

Writing ML Code

When it comes to writing machine learning code, Python is the go-to language for many professionals. Its simplicity, clean syntax, and readability make it well suited to rapid prototyping. Python also offers a rich ecosystem of libraries and packages, many of which support hardware acceleration such as GPUs. Although the language itself is not the fastest, the performance-critical parts of these libraries are typically written in faster languages like C, so the heavy computation still runs efficiently. In machine learning, prototyping is key, and Python’s fast development cycle and flexibility make it a strong choice for developers in this rapidly evolving field.

Key Takeaways:

  • The go-to language for ML is Python
  • Simple, clean, very easy to read & write
  • Very rich in libraries/packages (these are statistical and mathematical in nature, so support for hardware acceleration such as GPUs matters most)
  • The language itself is not the fastest, but the libraries are written in faster languages like C
  • ML is all about prototyping – for this, you need a language that is quick and easy to write
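The point about libraries written in faster languages can be seen in a small sketch. Both computations below produce the same sum of squares; the first runs element by element in the Python interpreter, while the second hands the whole array to NumPy's compiled code. The array size is an arbitrary choice for illustration, and the actual speed difference depends on your machine, so no timings are claimed here.

```python
import numpy as np

n = 1_000_000

# Pure Python: the interpreter executes the loop one element at a time.
total_loop = 0
for v in range(n):
    total_loop += v * v

# NumPy: the same sum of squares is computed inside compiled code.
arr = np.arange(n, dtype=np.int64)
total_vectorized = int(np.sum(arr * arr))

print(total_loop == total_vectorized)
```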

Machine Learning Libraries

Machine learning libraries play a vital role in streamlining the various stages of the ML pipeline. The Python ecosystem, in particular, offers a number of libraries designed to assist with each step of the process. The ML pipeline can be loosely described as a sequence of stages: data acquisition and preprocessing, model selection, training/validation/testing, and ultimately deployment to production. For each of these stages, there is at least one dedicated Python library, giving developers the tools they need to execute their machine learning projects efficiently.

Libraries and their Purposes

  • Data cleaning and preparation – Pandas and NumPy
  • Preparing visualizations (charts, plots, etc.) – Matplotlib
  • Splitting data into train/validation/test – Scikit-Learn
  • Model definition – PyTorch, Keras, TensorFlow are the most popular
  • Optimizer – Scikit-Learn (its estimators run a built-in solver); PyTorch, Keras, and TensorFlow ship their own optimizers
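As a rough sketch of how these libraries fit together, the example below cleans a small table with Pandas, splits it with Scikit-Learn, and fits a Scikit-Learn model. The column names, the synthetic data, the roughly 60/20/20 split, and the choice of LogisticRegression are all assumptions made for illustration; a real project would load its own dataset and might define the model in PyTorch, Keras, or TensorFlow instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=300),
    "feature_b": rng.normal(size=300),
})
df["label"] = (df["feature_a"] + df["feature_b"] > 0).astype(int)
df.loc[df.index[::50], "feature_a"] = np.nan  # simulate a few missing values

# Data cleaning and preparation with Pandas and NumPy.
df = df.dropna().reset_index(drop=True)
X = df[["feature_a", "feature_b"]].to_numpy()
y = df["label"].to_numpy()

# Splitting with Scikit-Learn: roughly 60% train, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Model definition and fitting; the estimator runs its solver internally.
model = LogisticRegression()
model.fit(X_train, y_train)

val_accuracy = model.score(X_val, y_val)
print("rows after cleaning:", len(df))
print("validation accuracy:", val_accuracy)
```

The test split is left untouched here on purpose: it is meant to be used once, after model and hyperparameter choices have been made on the validation split.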

Note:

Some libraries, such as PyTorch, now combine these steps and cover almost the whole pipeline (data handling is still best done in Pandas* and NumPy).

*In addition, Pandas includes rich functionality for working with databases: it can read the results of SQL queries into DataFrames, write DataFrames to database tables, and more.
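A minimal sketch of the Pandas database functionality mentioned in the footnote follows. It uses an in-memory SQLite database from Python's standard library so that it runs without any setup; the table name, column names, and values are hypothetical.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real database (assumption).
conn = sqlite3.connect(":memory:")

# Write a DataFrame to a SQL table (hypothetical "users" table).
users = pd.DataFrame({"name": ["Ada", "Grace", "Alan"], "age": [36, 45, 41]})
users.to_sql("users", conn, index=False)

# Read the result of a SQL query back into a DataFrame.
over_40 = pd.read_sql_query(
    "SELECT name, age FROM users WHERE age > 40 ORDER BY age", conn
)
print(over_40)

conn.close()
```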

Conclusion

This week’s lesson provided valuable insights into the implementation of machine learning, emphasizing the importance of knowing when to use and when not to use machine learning. We explored the significance of having a well-defined task, sufficient and meaningful data, and the essential ingredients for a successful machine learning project. Additionally, we discussed the key role of Python and its rich ecosystem of libraries in the various stages of the ML pipeline, streamlining the process and making it more accessible for developers.

As we move forward to next week’s topic, we will delve deeper into the practical aspects of machine learning in real-world scenarios. We will explore ML pipelines, which are crucial for managing the end-to-end process of developing, deploying, and maintaining machine learning models. By understanding how machine learning is applied in real-world situations, you will be better equipped to tackle complex problems and create solutions that deliver meaningful impact. So, join us next week as we continue our journey through the fascinating world of machine learning and its practical applications!