Prepare for your PyTorch interview with this comprehensive guide featuring 30 essential questions and answers. Whether you’re a fresher, have 1-3 years of experience, or are a seasoned professional with 3-6 years, these questions progress from basic concepts to advanced scenarios, helping you build confidence in PyTorch fundamentals, practical implementation, and real-world applications at companies like Atlassian, Adobe, and Zoho.
Basic PyTorch Interview Questions (Freshers & 1-3 Years Experience)
1. What are Tensors in PyTorch?
Tensors in PyTorch are multi-dimensional arrays similar to NumPy arrays but with GPU acceleration support and autograd functionality for gradient computation.
2. How do you create a tensor in PyTorch?
You can create a tensor using torch.tensor() from data, or factory functions like torch.zeros(), torch.ones(), or torch.rand().
import torch
tensor = torch.tensor([[1, 2], [3, 4]])
zeros = torch.zeros(2, 3)
3. How do you check if a GPU is available and move a tensor to GPU in PyTorch?
Use torch.cuda.is_available() to check GPU availability, then move tensors with tensor.to('cuda').
if torch.cuda.is_available():
    device = torch.device('cuda')
    tensor = torch.rand(2, 2).to(device)
4. What is the purpose of torch.no_grad() in PyTorch?
torch.no_grad() disables gradient computation, useful during inference to save memory and speed up computation.
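A minimal illustration (the small Linear model here is just a stand-in):

```python
import torch

model = torch.nn.Linear(4, 2)   # tiny stand-in model for illustration
x = torch.randn(3, 4)

# Operations inside no_grad() are not recorded, so no autograd graph is built
with torch.no_grad():
    out = model(x)

print(out.requires_grad)   # False: the output is detached from the graph
```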
5. Explain the role of the forward() method in a PyTorch nn.Module.
The forward() method defines how input data flows through the network layers during prediction.
6. How do you access model parameters in PyTorch?
Use model.parameters() for all parameters or model.named_parameters() for named access.
for name, param in model.named_parameters():
    print(name, param.shape)
7. What does optimizer.zero_grad() do in PyTorch?
It clears old gradients from previous iterations to prevent accumulation before backpropagation.
8. How do you define a simple neural network class in PyTorch?
Inherit from nn.Module, define layers in __init__, and computation in forward().
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)
9. What is a DataLoader in PyTorch?
DataLoader handles batching, shuffling, and parallel loading of data for efficient training.
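For example, wrapping random tensors in a TensorDataset (the sizes here are arbitrary):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 samples with 4 features each, plus binary labels (arbitrary example data)
dataset = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_sizes = [inputs.shape[0] for inputs, labels in loader]
print(batch_sizes)   # [32, 32, 32, 4] -- the last batch holds the remainder
```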
10. How do you save and load a PyTorch model checkpoint?
Use torch.save() for checkpoints and torch.load() to restore state dicts.
torch.save({'model': model.state_dict()}, 'checkpoint.pth')
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model'])
Intermediate PyTorch Interview Questions (1-3 Years Experience)
11. What is the typical structure of a PyTorch training loop?
It includes zero_grad, forward pass, loss computation, backward pass, and optimizer step.
for inputs, labels in loader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
12. How do you implement early stopping in PyTorch training?
Track validation loss with a patience counter; stop if there is no improvement after a specified number of epochs.
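A sketch of the patience logic, using a hard-coded list in place of real per-epoch validation losses:

```python
best_loss = float('inf')
patience, wait = 3, 0

# Hard-coded losses stand in for per-epoch validation results
for epoch, val_loss in enumerate([0.9, 0.8, 0.81, 0.82, 0.83, 0.7]):
    if val_loss < best_loss:
        best_loss, wait = val_loss, 0   # improvement: reset the counter
    else:
        wait += 1
        if wait >= patience:
            break   # no improvement for `patience` epochs: stop early

print(best_loss)   # 0.8 -- training stops before the later 0.7 is ever seen
```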
13. Explain requires_grad in PyTorch tensors.
When True, the tensor tracks operations for gradient computation during backpropagation.
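A tiny autograd example:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = (x ** 2).sum()   # every op on x is recorded in the autograd graph
y.backward()         # computes dy/dx = 2x

print(x.grad)   # tensor([4.])
```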
14. How do you freeze layers in a pre-trained PyTorch model?
Set requires_grad = False for parameters of specific layers.
for param in model.layer.parameters():
    param.requires_grad = False
15. What are transforms in PyTorch’s torchvision?
Transforms are preprocessing functions like resizing or normalization applied to datasets.
16. How do you move an entire model to GPU in PyTorch?
Call model.to(device) after defining the device as 'cuda' or 'cpu'.
17. What is model.eval() used for in PyTorch?
It sets the model to evaluation mode, disabling dropout and batch normalization updates.
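The effect is easy to see with a single Dropout layer:

```python
import torch

dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(2, 8)

dropout.eval()     # evaluation mode: dropout becomes an identity op
out = dropout(x)
print(torch.equal(out, x))   # True: the input passes through unchanged
```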
18. How do you implement a custom loss function in PyTorch?
Define a class inheriting from nn.Module with a forward() method returning the loss.
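For instance, a hypothetical RMSE loss (not a PyTorch built-in):

```python
import torch
import torch.nn as nn

class RMSELoss(nn.Module):
    """Hypothetical custom loss: root mean squared error."""
    def forward(self, pred, target):
        return torch.sqrt(torch.mean((pred - target) ** 2))

criterion = RMSELoss()
loss = criterion(torch.tensor([2.0, 4.0]), torch.tensor([0.0, 0.0]))
print(loss)   # sqrt((4 + 16) / 2) = sqrt(10), roughly 3.1623
```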
19. What is the difference between torch.save(model.state_dict()) and torch.save(model)?
state_dict() saves only the learnable parameters and buffers; saving the full model pickles the architecture too, but ties the file to the exact class code and is less portable.
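A round-trip through an in-memory buffer illustrates the state_dict approach (the buffer just avoids writing a file):

```python
import io
import torch

model = torch.nn.Linear(3, 1)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)   # parameters only, no architecture

buffer.seek(0)
clone = torch.nn.Linear(3, 1)            # the class must be re-created in code
clone.load_state_dict(torch.load(buffer))

assert torch.equal(clone.weight, model.weight)
```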
20. How do you use learning rate schedulers in PyTorch?
Initialize like lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1) and call scheduler.step() once per epoch.
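A minimal sketch of the epoch loop (the tiny model and loop length are arbitrary):

```python
import torch

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(20):
    # ... one epoch of training would run here ...
    optimizer.step()     # PyTorch expects optimizer.step() before scheduler.step()
    scheduler.step()     # lr is multiplied by gamma every 10 epochs

print(optimizer.param_groups[0]['lr'])   # decayed twice: ~0.001
```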
Advanced PyTorch Interview Questions (3-6 Years Experience)
21. Explain TorchScript and its tracing vs scripting methods.
TorchScript optimizes models for production: tracing records the operations executed on a dummy input (so data-dependent control flow is lost); scripting compiles the model's Python source and preserves control flow.
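Both methods in miniature (the Linear layer stands in for a real model):

```python
import torch

model = torch.nn.Linear(4, 2)        # stand-in for a real model
example = torch.randn(1, 4)

traced = torch.jit.trace(model, example)   # records the ops run on `example`
scripted = torch.jit.script(model)         # compiles the module's Python code

# Both produce the same results as the original module
assert torch.allclose(traced(example), model(example))
assert torch.allclose(scripted(example), model(example))
```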
22. How do you handle padded sequences in RNNs using PyTorch?
Use pack_padded_sequence() before RNN and pad_packed_sequence() after.
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

packed = pack_padded_sequence(seq, lengths, batch_first=True)
output, _ = rnn(packed)
unpacked, lengths = pad_packed_sequence(output, batch_first=True)
23. What is Data Parallelism in PyTorch?
nn.DataParallel splits batches across multiple GPUs, gathering gradients for updates; DistributedDataParallel is now the recommended approach for multi-GPU training.
24. How would you optimize GPU memory usage during training at Atlassian?
Use mixed precision with torch.cuda.amp, gradient accumulation, and smaller batches to handle large models efficiently.
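A CPU-safe sketch of the mixed-precision part (the scaler and autocast are simply disabled when no GPU is present, so the same code runs anywhere):

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
use_amp = device == 'cuda'

model = torch.nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(4, 8, device=device)
y = torch.randn(4, 1, device=device)

with torch.autocast(device_type=device, enabled=use_amp):
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()   # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)          # unscales gradients, then runs optimizer.step()
scaler.update()
```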
25. Describe deploying a PyTorch model for inference at Adobe.
Convert to TorchScript, save with torch.jit.trace(), and serve via an optimized runtime for low-latency predictions.
26. How do you implement gradient clipping in PyTorch?
Use torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) after the backward pass and before optimizer.step().
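For example (the model and inputs are arbitrary stand-ins):

```python
import torch

model = torch.nn.Linear(10, 1)
loss = model(torch.randn(32, 10)).pow(2).mean()
loss.backward()

# Rescales all gradients in place so their combined norm is at most max_norm;
# returns the norm measured before clipping
pre_clip_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

grads = torch.cat([p.grad.flatten() for p in model.parameters()])
assert grads.norm() <= 1.0 + 1e-6   # clipped (or already below the threshold)
```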
27. Explain custom layers in PyTorch with an example for Zoho’s recommendation system.
Define nn.Parameter for trainable weights in a subclass of nn.Module.
class CustomLayer(nn.Module):
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f))

    def forward(self, x):
        return x @ self.weight.t()
28. What strategies address overfitting in large PyTorch models?
Apply dropout, weight decay, data augmentation, and early stopping based on validation metrics.
29. How do you profile PyTorch code for performance bottlenecks?
Use torch.profiler to record CPU/GPU usage and memory, and identify slow operations.
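A small CPU-only profiling run (the model and input sizes are arbitrary):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 128)
x = torch.randn(64, 128)

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    for _ in range(10):
        model(x)

# Per-operator summary, sorted by total CPU time
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```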
30. In a Swiggy scenario, how would you fine-tune a vision model for food classification?
Load a pre-trained model, replace the classifier head, freeze early layers, and train with task-specific data using differential learning rates.
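A sketch of the freezing and differential-learning-rate setup; the tiny Sequential backbone and the 10-class head are placeholders (in practice you would load, e.g., a torchvision ResNet):

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained backbone (in practice, e.g. a torchvision ResNet)
backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
head = nn.Linear(256, 10)   # new classifier head; 10 food classes is an assumption

for param in backbone.parameters():
    param.requires_grad = False   # freeze the early layers

# Differential learning rates: tiny lr for the backbone, larger for the new head
optimizer = torch.optim.Adam([
    {'params': backbone.parameters(), 'lr': 1e-5},
    {'params': head.parameters(), 'lr': 1e-3},
])
```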