Detecting cactus with kekas
Author: https://www.kaggle.com/artgor
From: https://www.kaggle.com/artgor/detecting-cactus-with-kekas
License: Apache 2.0
General information
Researchers in Mexico have created the VIGIA project, aiming to build a system for autonomous surveillance of protected areas. One of the first steps is being able to recognize the vegetation in the area. In this competition we are trying to identify whether there is a cactus in the image.
In this kernel I use kekas (https://github.com/belskikh/kekas) as a wrapper for Pytorch.
Most of the code is taken from my other kernel: https://www.kaggle.com/artgor/cancer-detection-with-kekas
CodeIn [1]:
# libraries
import numpy as np
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import torch
from torch.utils.data import TensorDataset, DataLoader,Dataset
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
from torch.optim import lr_scheduler
import time
from PIL import Image
train_on_gpu = True
from torch.utils.data.sampler import SubsetRandomSampler
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau, CosineAnnealingLR
from sklearn.metrics import accuracy_score
import cv2
Some of good libraries for DL aren't available in Docker with GPU by default, so it is necessary to install them. (don't forget to turn on internet connection in kernels).
In [2]:
!pip install albumentations > /dev/null 2>&1
!pip install pretrainedmodels > /dev/null 2>&1
!pip install kekas > /dev/null 2>&1
!pip install adabound > /dev/null 2>&1
CodeIn [3]:
# more imports
import albumentations
from albumentations import torch as AT
import pretrainedmodels
import adabound
from kekas import Keker, DataOwner, DataKek
from kekas.transformations import Transformer, to_torch, normalize
from kekas.metrics import accuracy
from kekas.modules import Flatten, AdaptiveConcatPool2d
from kekas.callbacks import Callback, Callbacks, DebuggerCallback
from kekas.utils import DotDict
/opt/conda/lib/python3.6/site-packages/kekas/keker.py:9: UserWarning: Error 'No module named 'apex''' during importing apex library. To use mixed precison you should install it from https://github.com/NVIDIA/apex
warnings.warn(f"Error '{e}'' during importing apex library. To use mixed precison"
Data overview
In [4]:
labels = pd.read_csv('../input/train.csv')
fig = plt.figure(figsize=(25, 8))
train_imgs = os.listdir("../input/train/train")
for idx, img in enumerate(np.random.choice(train_imgs, 20)):
ax = fig.add_subplot(4, 20//4, idx+1, xticks=[], yticks=[])
im = Image.open("../input/train/train/" + img)
plt.imshow(im)
lab = labels.loc[labels['id'] == img, 'has_cactus'].values[0]
ax.set_title(f'Label: {lab}')
Images were resized, so I can see almost nothing in them...
Kekas accepts pandas DataFrame as an input and iterates over it to get image names and labels
In [5]:
test_img = os.listdir('../input/test/test')
test_df = pd.DataFrame(test_img, columns=['id'])
test_df['has_cactus'] = -1
test_df['data_type'] = 'test'
labels['has_cactus'] = labels['has_cactus'].astype(int)
labels['data_type'] = 'train'
labels.head()
Out[5]:
id | has_cactus | data_type | |
---|---|---|---|
0 | 0004be2cfeaba1c0361d39e2b000257b.jpg | 1 | train |
--- | --- | --- | --- |
1 | 000c8a36845c0208e833c79c1bffedd1.jpg | 1 | train |
--- | --- | --- | --- |
2 | 000d1e9a533f62e55c289303b072733d.jpg | 1 | train |
--- | --- | --- | --- |
3 | 0011485b40695e9138e92d0b3fb55128.jpg | 1 | train |
--- | --- | --- | --- |
4 | 0014d7a11e90b62848904c1418fc8cf2.jpg | 1 | train |
--- | --- | --- | --- |
In [6]:
labels.loc[labels['data_type'] == 'train', 'has_cactus'].value_counts()
Out[6]:
1 13136
0 4364
Name: has_cactus, dtype: int64
We have some disbalance in the data, but it isn't too big.
In [7]:
# splitting data into train and validation
train, valid = train_test_split(labels, stratify=labels.has_cactus, test_size=0.2)
Reader function
At first it is necessary to create a reader function, which will open images. It accepts i and row as input (like from pandas iterrows). The function should return a dictionary with image and label. [:,:,::-1] - is a neat trick which converts BGR images to RGB, it works faster that converting to RGB by usual means.
In [8]:
def reader_fn(i, row):
image = cv2.imread(f"../input/{row['data_type']}/{row['data_type']}/{row['id']}")[:,:,::-1] # BGR -> RGB
label = torch.Tensor([row["has_cactus"]])
return {"image": image, "label": label}
Data transformation
Next step is defining data transformations and augmentations. This differs from standard PyTorch way. We define resizing, augmentations and normalizing separately, this allows to easily create separate transformers for train and valid/test data.
At first we define augmentations. We create a function with a list of augmentations (I prefer albumentation library: https://github.com/albu/albumentations)
In [9]:
def augs(p=0.5):
return albumentations.Compose([
albumentations.HorizontalFlip(),
albumentations.VerticalFlip(),
albumentations.RandomBrightness(),
], p=p)
Now we create a transforming function. It heavily uses Transformer from kekas.
- The first step is defining resizing. You can change arguments of function if you want images to have different height and width, otherwis you can leave it as it is.
- Next step is defining augmentations. Here we provide the key of image which is defined in reader_fn;
- The third step is defining final transformation to tensor and normalizing;
- After this we can compose separate transformations for train and valid/test data;
In [10]:
def get_transforms(dataset_key, size, p):
PRE_TFMS = Transformer(dataset_key, lambda x: cv2.resize(x, (size, size)))
AUGS = Transformer(dataset_key, lambda x: augs()(image=x)["image"])
NRM_TFMS = transforms.Compose([
Transformer(dataset_key, to_torch()),
Transformer(dataset_key, normalize())
])
train_tfms = transforms.Compose([PRE_TFMS, AUGS, NRM_TFMS])
val_tfms = transforms.Compose([PRE_TFMS, NRM_TFMS])
return train_tfms, val_tfms
In [11]:
train_tfms, val_tfms = get_transforms("image", 32, 0.5)
Now we can create a DataKek, which is similar to creating dataset in Pytorch. We define the data, reader function and transformation.Then we can define standard PyTorch DataLoader.
In [12]:
train_dk = DataKek(df=train, reader_fn=reader_fn, transforms=train_tfms)
val_dk = DataKek(df=valid, reader_fn=reader_fn, transforms=val_tfms)
batch_size = 64
workers = 0
train_dl = DataLoader(train_dk, batch_size=batch_size, num_workers=workers, shuffle=True, drop_last=True)
val_dl = DataLoader(val_dk, batch_size=batch_size, num_workers=workers, shuffle=False)
In [13]:
test_dk = DataKek(df=test_df, reader_fn=reader_fn, transforms=val_tfms)
test_dl = DataLoader(test_dk, batch_size=batch_size, num_workers=workers, shuffle=False)
Building a neural net
Here we define the architecture of the neural net.
- Pre-trained backbone is taken from pretrainedmodels: https://github.com/Cadene/pretrained-models.pytorch Here I take densenet169
- We also define changes to the architecture. For example, we take off the last layer and add a custom head with nn.Sequential. AdaptiveConcatPool2d is a layer in kekas, which concats AdaptiveMaxPooling and AdaptiveAveragePooling
In [14]:
class Net(nn.Module):
def __init__(
self,
num_classes: int,
p: float = 0.2,
pooling_size: int = 2,
last_conv_size: int = 1664,
arch: str = "densenet169",
pretrained: str = "imagenet") -> None:
"""A simple model to finetune.
Args:
num_classes: the number of target classes, the size of the last layer's output
p: dropout probability
pooling_size: the size of the result feature map after adaptive pooling layer
last_conv_size: size of the flatten last backbone conv layer
arch: the name of the architecture form pretrainedmodels
pretrained: the mode for pretrained model from pretrainedmodels
"""
super().__init__()
net = pretrainedmodels.__dict__[arch](pretrained=pretrained)
modules = list(net.children())[:-1] # delete last layer
# add custom head
modules += [nn.Sequential(
# AdaptiveConcatPool2d is a concat of AdaptiveMaxPooling and AdaptiveAveragePooling
# AdaptiveConcatPool2d(size=pooling_size),
Flatten(),
nn.BatchNorm1d(1664),
nn.Dropout(p),
nn.Linear(1664, num_classes)
)]
self.net = nn.Sequential(*modules)
def forward(self, x):
logits = self.net(x)
return logits
The data for training needs to be transformed one more time - we define DataOwner, which contains all the data. For now let's define it for train and valid. Next we define model and loss. As I choose BCEWithLogitsLoss, we can set the number of classes for output to 1.
In [15]:
dataowner = DataOwner(train_dl, val_dl, None)
model = Net(num_classes=1)
criterion = nn.BCEWithLogitsLoss()
Downloading: "http://data.lip6.fr/cadene/pretrainedmodels/densenet169-f470b90a4.pth" to /tmp/.torch/models/densenet169-f470b90a4.pth
57372314it [00:03, 14671003.61it/s]
And now we define what will the model do with the data. For example we could slice the output and take only a part of it. For now we will simply return the output of the model.
In [16]:
def step_fn(model: torch.nn.Module,
batch: torch.Tensor) -> torch.Tensor:
"""Determine what your model will do with your data.
Args:
model: the pytorch module to pass input in
batch: the batch of data from the DataLoader
Returns:
The models forward pass results
"""
inp = batch["image"]
return model(inp)
Defining custom metrics
In [17]:
def bce_accuracy(target: torch.Tensor,
preds: torch.Tensor,
thresh: bool = 0.5) -> float:
target = target.cpu().detach().numpy()
preds = (torch.sigmoid(preds).cpu().detach().numpy() > thresh).astype(int)
return accuracy_score(target, preds)
def roc_auc(target: torch.Tensor,
preds: torch.Tensor) -> float:
target = target.cpu().detach().numpy()
preds = torch.sigmoid(preds).cpu().detach().numpy()
return roc_auc_score(target, preds)
Keker
Now we can define the Keker - the core Kekas class for training the model.
Here we define everything which is necessary for training:
- the model which was defined earlier;
- dataowner containing the data for training and validation;
- criterion;
- step function;
- the key of labels, which was defined in the reader function;
- the dictionary with metrics (there can be several of them);
- The optimizer and its parameters;
In [18]:
keker = Keker(model=model,
dataowner=dataowner,
criterion=criterion,
step_fn=step_fn,
target_key="label",
metrics={"acc": bce_accuracy, 'auc': roc_auc},
opt=torch.optim.SGD,
opt_params={"momentum": 0.99})
In [19]:
keker.unfreeze(model_attr="net")
layer_num = -1
keker.freeze_to(layer_num, model_attr="net")
In [20]:
keker.kek_one_cycle(max_lr=1e-2, # the maximum learning rate
cycle_len=5, # number of epochs, actually, but not exactly
momentum_range=(0.95, 0.85), # range of momentum changes
div_factor=25, # max_lr / min_lr
increase_fraction=0.3, # the part of cycle when learning rate increases
logdir='train_logs')
keker.plot_kek('train_logs')
Epoch 1/5: 100% 218/218 [00:52<00:00, 4.94it/s, loss=0.0502, val_loss=0.0337, acc=0.9896, auc=0.9995]
Epoch 2/5: 100% 218/218 [00:42<00:00, 6.22it/s, loss=0.0131, val_loss=0.0152, acc=0.9955, auc=0.9999]
Epoch 3/5: 100% 218/218 [00:41<00:00, 6.37it/s, loss=0.0146, val_loss=0.0112, acc=0.9955, auc=1.0000]
Epoch 4/5: 100% 218/218 [00:43<00:00, 5.73it/s, loss=0.0097, val_loss=0.0109, acc=0.9955, auc=1.0000]
Epoch 5/5: 100% 218/218 [00:41<00:00, 6.24it/s, loss=0.0107, val_loss=0.0092, acc=0.9957, auc=1.0000]
keker.kek_one_cycle(max_lr=1e-3, # the maximum learning rate
cycle_len=5, # number of epochs, actually, but not exactly
momentum_range=(0.95, 0.85), # range of momentum changes
div_factor=25, # max_lr / min_lr
increase_fraction=0.2, # the part of cycle when learning rate increases
logdir='train_logs1')
keker.plot_kek('train_logs1')
Epoch 1/5: 100% 218/218 [00:41<00:00, 6.29it/s, loss=0.0149, val_loss=0.0089, acc=0.9963, auc=1.0000]
Epoch 2/5: 100% 218/218 [00:41<00:00, 6.34it/s, loss=0.0221, val_loss=0.0098, acc=0.9966, auc=1.0000]
Epoch 3/5: 100% 218/218 [00:42<00:00, 6.39it/s, loss=0.0122, val_loss=0.0094, acc=0.9960, auc=1.0000]
Epoch 4/5: 100% 218/218 [00:41<00:00, 5.93it/s, loss=0.0119, val_loss=0.0087, acc=0.9963, auc=1.0000]
Epoch 5/5: 100% 218/218 [00:41<00:00, 6.48it/s, loss=0.0106, val_loss=0.0086, acc=0.9974, auc=1.0000]
Predicting and TTA
Simply predicting on test data is okay, but it is better to use TTA - test time augmentation. Let's see how it can be done with Kekas.
- define augmentations;
- define augmentation function;
- create objects with these augmentations;
- put these objects into a single dictionary;
In [22]:
preds = keker.predict_loader(loader=test_dl)
Predict: 100% 63/63 [00:11<00:00, 6.02it/s]
In [23]:
# flip_ = albumentations.HorizontalFlip(always_apply=True)
# transpose_ = albumentations.Transpose(always_apply=True)
# def insert_aug(aug, dataset_key="image", size=224):
# PRE_TFMS = Transformer(dataset_key, lambda x: cv2.resize(x, (size, size)))
# AUGS = Transformer(dataset_key, lambda x: aug(image=x)["image"])
# NRM_TFMS = transforms.Compose([
# Transformer(dataset_key, to_torch()),
# Transformer(dataset_key, normalize())
# ])
# tfm = transforms.Compose([PRE_TFMS, AUGS, NRM_TFMS])
# return tfm
# flip = insert_aug(flip_)
# transpose = insert_aug(transpose_)
# tta_tfms = {"flip": flip, "transpose": transpose}
# # third, run TTA
# keker.TTA(loader=test_dl, # loader to predict on
# tfms=tta_tfms, # list or dict of always applying transforms
# savedir="tta_preds1", # savedir
# prefix="preds") # (optional) name prefix. default is 'preds'
In [24]:
# prediction = np.zeros((test_df.shape[0], 1))
# for i in os.listdir('tta_preds1'):
# pr = np.load('tta_preds1/' + i)
# prediction += pr
# prediction = prediction / len(os.listdir('tta_preds1'))
In [25]:
test_preds = pd.DataFrame({'imgs': test_df.id.values, 'preds': preds.reshape(-1,)})
test_preds.columns = ['id', 'has_cactus']
test_preds.to_csv('sub.csv', index=False)
test_preds.head()
Out[25]:
id | has_cactus | |
---|---|---|
0 | 79ac4cc3b082e0a1defe1be601806efd.jpg | 4.901506 |
--- | --- | --- |
1 | e880364d6521c6f3a27748ec62b0e335.jpg | 9.305355 |
--- | --- | --- |
2 | 74912492b6cdf28c4bfb9c8e1d35af3e.jpg | 8.171706 |
--- | --- | --- |
3 | 078cfa961183b30693ea2f13f5ff6d17.jpg | 8.186723 |
--- | --- | --- |
4 | 7fd729184ef182899ce3e7a174fb9bc0.jpg | 9.269026 |
--- | --- | --- |