How to prepare for the GCP Professional Machine Learning Engineer exam

Course reviews, study tips, and how I did it

Gabriel Cassimiro
Towards Data Science
8 min read · Jan 10, 2022


So I decided to take the GCP Professional Machine Learning Engineer (PMLE) exam, but I had only 2 months to do it, because my company needed enough certifications to become a GCP partner. I knew this was going to be a hard challenge, but I jumped at it anyway.

In this post, I will share what helped me study and prepare for the test, and also the stuff you should not waste your time on. Reading feedback like this helps you get different perspectives on the exam, so I will start with this great repository that collects posts about a lot of GCP and other certifications. I recommend reading it before beginning your studies.

As is usual in this kind of post, this is my certification:

Test Feedback

The Professional Machine Learning Engineer exam is considered one of the hardest GCP certifications for two main reasons: the content is very extensive, and most questions have more than one correct answer but only one best answer.

The test covers how to solve real-world business problems using Machine Learning techniques and how to use the best available solutions (offered by GCP obviously) in the correct context.

Knowing what the test covers is the most important part of your preparation, because with this information you can focus on what matters when watching the courses. So the first thing you should do is carefully read the official GCP certification site. There you’ll find information on what is covered on the exam, the rules, where to take your exam, and other important stuff.

Another great starting point is to try the sample questions provided by Google before studying anything, to see how you would perform on the test. From there you can focus your studying on what you don't know.

Previous experience recommendation

The official exam guide doesn't require any prerequisites; however, it recommends:

3+ years of industry experience including 1 or more years designing and managing solutions using Google Cloud.

That is far from my case. By the time I took the exam, I had almost one year of cloud experience (AWS) and less than one month of experience with GCP. So I will give my opinion here about that recommendation:

Years don't dictate how much you know about something, but meaningful experience does. In my opinion, if you have some experience with any cloud and understand the basic concepts and products, you're good to go.

Being a machine learning engineer requires you to solve problems using ML models, serve data to those models, and create the means to generate value with that solution consistently.

In terms of machine learning, you will have to study a lot less if you have experience building models. If you can tell apart problems that need classification, recommendation, or regression models, and know when you need a DNN versus a basic linear regression, you will be able to focus your studies on the GCP side: serving data to your model and serving predictions to users with GCP solutions.

Wrapping up the previous experience part:

  • You don't need 3+ years of experience, but having some experience with any cloud provider will save you time studying.
  • You do need experience with machine learning, but only enough to be able to design ML solutions to business problems.
  • You can get hands-on experience with GCP through some of the courses provided by Google, and that is enough for you to take the test.

How to Study

The main source of knowledge for this exam is a group of courses designed by Google and available on Coursera. However, not all courses have the same relevance regarding the exam content. That is why I will rank them and comment on each one below.

First, there are some techniques that I used for my preparation that are worth mentioning before starting on the courses. If you only care about the courses feel free to skip ahead, but this helped me a lot to absorb more of the relevant stuff.

The main thing you have to keep in mind while doing the courses is:

How to use GCP solutions and ML models to solve real business problems

You need to know all of GCP's ML and data solutions: what they do, their strengths and weaknesses, and the use cases for each one.

Remember: a lot of problems can be solved in different ways with good results; however, the test will always ask you for the best solution.

So I have two methods that helped me learn these characteristics while watching the courses:

Flashcards

I used flashcards to remember what each solution does, its characteristics, and its use cases. Then I studied them a couple of times until I could explain them all without looking at the answer.

This is a very rich technique because you write a brief explanation on the flashcard, exercising your ability to summarise. Then you review the cards at intervals of a few days, exercising your long-term memory. Lastly, you try to explain each concept to someone to really see if you learned it.

I used and recommend using Anki, a free flashcard app.
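To make the "intervals of days" idea concrete, here is a minimal sketch of a Leitner-style review schedule. It is not Anki's actual scheduling algorithm; the three boxes, the interval values, and the names `Flashcard`, `review`, and `due_cards` are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals in days for each box (assumed values).
INTERVALS = {1: 1, 2: 3, 3: 7}


@dataclass
class Flashcard:
    front: str            # prompt, e.g. a product name
    back: str             # brief explanation in your own words
    box: int = 1          # 1 = least known, 3 = best known
    due: date = date.min  # new cards are due immediately


def review(card: Flashcard, correct: bool, today: date) -> Flashcard:
    """Promote the card after a correct answer, otherwise send it back to box 1."""
    card.box = min(card.box + 1, max(INTERVALS)) if correct else 1
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards: list[Flashcard], today: date) -> list[Flashcard]:
    """Return the cards scheduled for review on or before the given day."""
    return [card for card in cards if card.due <= today]


# Usage: a correct answer moves the card from box 1 to box 2,
# so it comes back after the (assumed) 3-day interval.
card = Flashcard("BigQuery ML", "Train ML models with SQL inside BigQuery")
review(card, correct=True, today=date(2022, 1, 10))
```

Cards you keep getting right come back less often, while a miss puts the card back in daily rotation, which is the behaviour the technique relies on.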

Mindmaps

Another great method to organize the main concepts is creating mindmaps. This way you can easily link products and solutions with business problems and advantages.

Personally, I used MindMeister, but there are a lot of great free alternatives.

Courses

Finally, we’ll take a look at the courses offered by Google and their content.

Preparing for Google Cloud Machine Learning Engineer Professional Certificate

This is the main set of courses for preparing, and it is of the utmost importance to watch them with your full attention.

It starts with some cloud basics in Google Cloud Big Data and Machine Learning Fundamentals, which you can skip if you have already worked with data solutions in GCP. Otherwise, you should do it, because it gives a first view of the GCP data solutions. This is also one of the only courses of the bunch that shows data engineering solutions, so if you do not know them, just do it.

The second and third courses show some ML solutions and APIs offered by GCP. It is very important to remember what they do and their use cases.

The fifth, sixth, and seventh courses will dive deeper into ML solutions, Feature Engineering, and modeling products.

The last three courses will cover how to deploy and create effective ML pipelines with all the best practices. In my opinion, these are the most important courses (Production Machine Learning Systems, MLOps Fundamentals, and ML Pipelines on Google Cloud).

All of these courses offer Labs to implement the solutions in a real GCP environment. They are a great way of learning how things work and how to set them up.

Some Labs will have big Jupyter Notebooks with tons of code. In these situations, my tip is to focus on what the code is doing and not worry about understanding and learning how to code it yourself. If in the future you need to implement the code yourself, just go to the open GitHub repository provided by Google to refresh your memory of the syntax.

Wrapping up courses:

  • You should use Flashcards, Mindmaps, or other techniques to remember a lot of details about solutions.
  • The main focus of the test is MLOps and ML pipelines; however, do not neglect data engineering knowledge and Machine Learning model-specific questions.
  • Do not focus on the code syntax; focus on what the code does and its benefits.

Mock Tests

Finally, you HAVE to do mock tests. This is crucial to check your knowledge and to learn how to read the questions.
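When timing your mock tests, it helps to practice at the pace the real exam demands. Assuming the 60 questions and 120 minutes cited later in this post, the sketch below turns that budget into checkpoints. The function name `pacing_checkpoints` and the optional review buffer are my own assumptions, not part of any official guidance.

```python
def pacing_checkpoints(total_questions: int = 60,
                       total_minutes: int = 120,
                       review_minutes: int = 0,
                       every: int = 15) -> list[tuple[int, float]]:
    """Return (question_number, elapsed_minutes) checkpoints for a steady pace.

    review_minutes is a hypothetical buffer kept at the end for flagged questions.
    """
    if not 0 <= review_minutes < total_minutes:
        raise ValueError("review_minutes must be in [0, total_minutes)")
    if every <= 0:
        raise ValueError("every must be positive")
    available = total_minutes - review_minutes
    return [
        (q, round(q * available / total_questions, 1))
        for q in range(every, total_questions + 1, every)
    ]


# With the defaults (2 minutes per question):
# [(15, 30.0), (30, 60.0), (45, 90.0), (60, 120.0)]
print(pacing_checkpoints())

# Keeping an assumed 12-minute buffer leaves 1.8 minutes per question:
# [(15, 27.0), (30, 54.0), (45, 81.0), (60, 108.0)]
print(pacing_checkpoints(review_minutes=12))
```

If you are behind a checkpoint during a mock test, that is a sign to pick an answer, flag the question, and move on.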

Answering questions

This last part is what determines whether you'll pass or not. The test is huge, with 60 questions and 120 minutes to do them, meaning you have 2 minutes per question on average. You have to read the questions looking for characteristics of the problem that will help you find the right solution. Here is an example:

You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions are served directly to users in an app in real-time. Because different seasons and population increases impact the data relevance, you will retrain the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the predictive model?

  • A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.
  • B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.
  • C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler.
  • D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.

When reading a question like this, you have to pay attention to details like batch or real-time, retraining, deploying, and the architecture. The last one is one of the most important, because often they'll ask for no-code solutions, serverless, or even complete control over the infrastructure. This will define which offering is the best fit for that specific requirement.

In this case, Kubeflow Pipelines is the only option that covers the workflow end to end, including deployment and retraining. So the answer is A.

Another good tip: after finding the most relevant information in the question, eliminate the answers that are clearly wrong, so you have fewer options to compare.
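The elimination step can be pictured as a simple set comparison. The sketch below is only a toy illustration: the tags attached to each option are hand-assigned assumptions that mirror the reasoning given in this post for the example question (each option claims training and deployment on a schedule, but only A is treated as a full end-to-end pipeline). They are not an authoritative capability matrix for the products involved, and `missing_requirements` is a made-up helper name.

```python
def missing_requirements(options: dict[str, set[str]],
                         required: set[str]) -> dict[str, list[str]]:
    """Map each option label to the required characteristics it does not cover."""
    return {label: sorted(required - tags) for label, tags in options.items()}


# Characteristics pulled from the question text (hypothetical tag names).
required = {"scheduled_retraining", "deployment", "end_to_end_pipeline"}

# Hand-assigned, simplified tags for each answer (assumptions for illustration).
options = {
    "A": {"scheduled_retraining", "deployment", "end_to_end_pipeline"},
    "B": {"scheduled_retraining", "deployment"},
    "C": {"scheduled_retraining", "deployment"},
    "D": {"scheduled_retraining", "deployment"},
}

missing = missing_requirements(options, required)

# Options with nothing missing survive the elimination round.
surviving = [label for label, gaps in missing.items() if not gaps]
print(surviving)
```

In the real exam you do this in your head, of course; the point is to name the requirements first and only then compare the options against them.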

Mock test links

I did a couple of mock tests, but they are not perfect. There are a lot of wrong answers in all of them, but here are the links and my comments on each one:

Exam Topics: This was the best mock test I did. The website does not give the correct answers; however, every question has a discussion where people present arguments for each possibility. This was a great source of new knowledge and helped me a lot.

Google Sample questions: Once you have finished studying, you should revisit the sample questions that you did at the beginning of your studies.

There are other paid practice exams, but I cannot review them because I only did the free ones. These sites usually offer a couple of free sample questions, but the ones I did had weird answers that I did not agree with. So do them at your own risk.

If I had to do it again, I would pay for the rest of the Exam Topics questions and focus only on the discussion when checking the correct answer.

Thanks for reading and good luck on your journey to become a GCP Certified Professional Machine Learning Engineer!
