Top 30 PyTorch Interview Questions and Answers for All Levels

Prepare for Your PyTorch Interview: Basic to Advanced Questions

This comprehensive guide covers 30 essential PyTorch interview questions with detailed answers, organized from basic to advanced levels. Ideal for freshers, candidates with 1-3 years of experience, and professionals with 3-6 years in deep learning. Master tensors, models, training loops, and deployment techniques used at companies like Amazon, Salesforce, and Atlassian.

Basic PyTorch Interview Questions (1-10)

1. What are Tensors in PyTorch?

Tensors in PyTorch are multi-dimensional arrays similar to NumPy arrays but with GPU acceleration support and autograd functionality for gradient computation. They are the fundamental data structure for all PyTorch operations[1][2].

2. How do you create a tensor in PyTorch?

You can create tensors using torch.tensor(), torch.zeros(), torch.ones(), or torch.rand(). Here’s an example:

import torch
tensor = torch.tensor([[1, 2], [3, 4]])
zeros = torch.zeros(2, 3)
rand_tensor = torch.rand(2, 2)

This creates a 2×2 tensor from a list, a 2×3 zero tensor, and a 2×2 random tensor[2].

3. How do you check if a GPU is available and move a tensor to GPU in PyTorch?

Use torch.cuda.is_available() to check GPU availability, then move tensors using the to() method:

import torch
if torch.cuda.is_available():
    device = torch.device("cuda")
    tensor = torch.rand(2, 2).to(device)
    print(tensor)

This code moves the tensor to the GPU when one is available; otherwise it remains on the CPU. You can also create a tensor directly on a device with torch.rand(2, 2, device=device)[1][2].

4. What is the difference between torch.tensor() and torch.Tensor()?

torch.tensor() creates a tensor from input data and infers the dtype from that data (e.g. int64 for Python ints), while torch.Tensor() is the legacy class constructor that always returns a float32 tensor regardless of the input type[2].

5. How do you perform basic tensor operations in PyTorch?

PyTorch supports element-wise addition, multiplication, matrix multiplication, and more:

import torch
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
sum_ab = a + b  # tensor([5, 7, 9])
matmul = torch.matmul(a, b)  # dot product for 1-D tensors: tensor(32)

These operations are GPU-accelerated when tensors are on the same device[2].

6. What is Autograd in PyTorch?

Autograd is PyTorch’s automatic differentiation engine that tracks operations on tensors with requires_grad=True and computes gradients during backpropagation[1][2].

7. How do you enable gradient tracking for a tensor?

Set requires_grad=True when creating the tensor or use tensor.requires_grad_():

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x**2).sum()  # backward() needs a scalar output
y.backward()
print(x.grad)  # tensor([2., 4.])

[2]

8. What is nn.Module in PyTorch?

nn.Module is the base class for all neural network modules. Custom models inherit from it and implement the forward() method to define computation[1][2].

9. How do you create a simple neural network in PyTorch?

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 2)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

This creates a two-layer network with ReLU activation[2].

10. What are common activation functions in PyTorch?

PyTorch provides ReLU (F.relu()), Sigmoid (torch.sigmoid()), Tanh (torch.tanh()), and LeakyReLU (nn.LeakyReLU()) for introducing non-linearity[5].
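A quick sketch applying these activations to a small tensor:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 1.0])
relu_out = F.relu(x)                              # negatives clamped to 0
sigmoid_out = torch.sigmoid(x)                    # squashed into (0, 1)
tanh_out = torch.tanh(x)                          # squashed into (-1, 1)
leaky_out = F.leaky_relu(x, negative_slope=0.01)  # small slope for negatives
```

Each also exists as a module (nn.ReLU(), nn.Sigmoid(), ...) for use inside nn.Sequential.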

Intermediate PyTorch Interview Questions (11-20)

11. How do you implement a training loop in PyTorch?

A standard training loop includes zero_grad, forward pass, loss computation, backward pass, and parameter update:

for inputs, targets in data_loader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

[1][2]

12. What is the purpose of optimizer.zero_grad()?

optimizer.zero_grad() clears old gradients from previous iterations to prevent gradient accumulation across batches[2].

13. How do you move an entire model to GPU?

Use the to(device) method on the model after defining the device:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleNet().to(device)

[1][2]

14. What is a DataLoader in PyTorch?

DataLoader provides efficient data loading with batching, shuffling, and parallel processing using multiple workers[3].
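A minimal sketch wrapping in-memory tensors in a Dataset and iterating batches (the toy data shapes are assumptions):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.rand(100, 4)               # 100 samples, 4 features each
labels = torch.randint(0, 2, (100,))        # binary labels
dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)

for batch_x, batch_y in loader:             # yields batches of up to 32 samples
    pass
```

Setting num_workers > 0 loads batches in background processes.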

15. How do you implement early stopping in PyTorch?

Track validation loss with a patience counter and stop training if no improvement occurs for a specified number of epochs:

patience = 5
best_loss = float('inf')
counter = 0

for epoch in range(num_epochs):
    val_loss = evaluate(model, val_loader)  # your validation routine
    if val_loss < best_loss:
        best_loss = val_loss
        counter = 0
    else:
        counter += 1
    if counter >= patience:
        break  # stop training

[1]

16. How do you save and load a PyTorch model checkpoint?

# Save checkpoint
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}, 'checkpoint.pth')

# Load checkpoint
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

[5]

17. What is a custom layer in PyTorch?

A custom layer inherits from nn.Module, defines parameters in __init__, and computation logic in forward()[2].
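A minimal sketch using a hypothetical ScaleShift layer that learns a per-feature scale and offset:

```python
import torch
import torch.nn as nn

# Hypothetical example layer: y = x * w + b, with w and b learnable.
class ScaleShift(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        # nn.Parameter registers the tensors so optimizers find them
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        return x * self.weight + self.bias

layer = ScaleShift(4)
out = layer(torch.rand(2, 4))
```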

18. How do you freeze layers in a pre-trained model?

import torchvision
import torch.nn as nn

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")  # pretrained=True in older versions
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new head stays trainable

This freezes all layers except the final classification layer[5].

19. What are transforms in PyTorch?

Transforms are preprocessing operations like resizing, normalization, and augmentation applied to image datasets using torchvision.transforms[4].

20. How do you implement a custom loss function in PyTorch?

class CustomLoss(nn.Module):
    def __init__(self):
        super().__init__()
    
    def forward(self, predictions, targets):
        return torch.mean((predictions - targets)**2 + 0.1 * torch.abs(predictions - targets))

This combines a squared-error (MSE) term with a small absolute-error (MAE) penalty for added robustness[7].

Advanced PyTorch Interview Questions (21-30)

21. What is TorchScript and how is it used for deployment?

TorchScript converts PyTorch models into serializable, optimizable representations using tracing or scripting for production deployment without Python runtime[3][7].
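A minimal tracing sketch (the tiny model and file name are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2), nn.ReLU())
model.eval()

# Tracing records the ops executed on an example input
example = torch.rand(1, 4)
traced = torch.jit.trace(model, example)
traced.save("model_traced.pt")           # serializes weights and graph together
loaded = torch.jit.load("model_traced.pt")  # no Python class definition needed
out = loaded(example)
```

torch.jit.script() is the alternative when the model has data-dependent control flow that tracing would miss.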

22. Explain Data Parallelism in PyTorch.

Data Parallelism (nn.DataParallel) replicates the model on multiple GPUs, splits each batch across them, computes gradients in parallel, and synchronizes them on the primary GPU. For new code, DistributedDataParallel is generally recommended because it scales better[6].
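A minimal sketch that falls back to a single device when multiple GPUs are absent:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
if torch.cuda.device_count() > 1:
    # Replicates the model on each GPU and splits the batch along dim 0
    model = nn.DataParallel(model)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

out = model(torch.rand(8, 10).to(device))  # batch is scattered, outputs gathered
```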

23. How do you implement GAN training in PyTorch?

Create generator and discriminator networks with separate optimizers. Alternate training: train discriminator on real and fake data, then train generator to fool discriminator[5].
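The alternating scheme can be sketched roughly as follows (the toy shapes, network sizes, and learning rates are assumptions):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))   # generator
D = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, 4)      # one batch of "real" data
noise = torch.randn(32, 8)

# 1) Train discriminator: real -> 1, fake -> 0 (fake detached so G is untouched)
fake = G(noise)
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) Train generator: push D to label fakes as real
g_loss = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```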

24. What are the challenges of large-scale PyTorch training at companies like Zoho?

Challenges include handling large datasets, GPU memory optimization, overfitting prevention, and efficient data loading. Solutions involve DataLoader, gradient accumulation, and regularization[3][7].

25. How do you optimize GPU memory usage during training?

Use mixed precision training (torch.cuda.amp), gradient checkpointing, smaller batch sizes, and torch.no_grad() during validation[3].
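A mixed-precision sketch using torch.cuda.amp; the enabled flag lets the same code fall back to full precision on CPU (the tiny model and batch are placeholders):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
use_amp = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.rand(16, 10, device=device)
targets = torch.randint(0, 2, (16,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):   # ops run in float16 where safe
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
scaler.scale(loss).backward()                    # scale loss to avoid fp16 underflow
scaler.step(optimizer)                           # unscales grads, then steps
scaler.update()
```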

26. Explain model evaluation mode in PyTorch.

model.eval() switches dropout layers off and makes batch normalization use its running statistics instead of per-batch statistics, giving consistent inference behavior. Pair it with torch.no_grad() to disable gradient computation[2].
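For example:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
model.eval()                  # dropout becomes a no-op
with torch.no_grad():         # no autograd graph is built
    x = torch.rand(1, 4)
    out1 = model(x)
    out2 = model(x)           # identical to out1: eval mode is deterministic here
```

Call model.train() to restore training behavior before the next epoch.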

27. How do you implement learning rate scheduling?

from torch.optim.lr_scheduler import StepLR
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
scheduler.step()  # for StepLR, call once per epoch after the optimizer updates

This multiplies the learning rate by 0.1 every 10 epochs[2].

28. What is gradient clipping and how do you implement it?

Gradient clipping prevents exploding gradients by clamping them to a maximum norm:

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

[2]

29. How do you deploy a PyTorch model for production at companies like Paytm?

Convert to TorchScript using torch.jit.trace() or torch.jit.script(), save with torch.jit.save(), and serve using Flask/FastAPI or cloud inference services[3].

30. How do you handle class imbalance in PyTorch training?

Use weighted loss functions (weight parameter in nn.CrossEntropyLoss), oversampling minority classes in DataLoader, or focal loss implementations[3].
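A sketch of inverse-frequency class weights (the class counts are made up):

```python
import torch
import torch.nn as nn

# Assumed class counts: class 0 is 9x more frequent than class 1
counts = torch.tensor([900.0, 100.0])
weights = counts.sum() / (len(counts) * counts)   # rarer class gets a larger weight
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 2)                        # stand-in model outputs
targets = torch.randint(0, 2, (8,))
loss = criterion(logits, targets)
```

For the oversampling route, pass a torch.utils.data.WeightedRandomSampler as the sampler argument of DataLoader.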

Master PyTorch for Your Next Interview

Practice these 30 PyTorch interview questions covering tensors, models, training, optimization, and deployment. Understanding these concepts demonstrates readiness for roles at product companies like Flipkart, SAP, and Adobe.
