Background

Counting objects has been the focus of much research in recent years across a wide range of domains, including counting people, vehicles, animals, fruits, and cells in microscopy images. Crowd counting aims to estimate the number of people in a crowded scene. This post focuses on crowd counting using density estimation, which maps an image to its corresponding density map. The density map indicates the number of people per pixel in the image. Fortunately, the techniques used for crowd counting can be extended to other domains.

The problem of crowd counting is of great importance for obtaining a high-level understanding of crowd scenarios such as parades, concerts and stadiums. In these cases, crowd counting plays a crucial role in scene understanding, crowd monitoring and management, and social safety.

We will use the Mall dataset to demonstrate this work.

The Mall dataset was collected from a publicly accessible webcam for crowd counting and profiling research. Over 60,000 pedestrians were labeled in 2,000 video frames. The data was annotated exhaustively by labeling the head position of every pedestrian in all frames.

# imports 
from scipy.io import loadmat
from scipy.ndimage import gaussian_filter
from fastai.vision.all import *

The dataset

The dataset consists of video frames and their corresponding labels. All frames in the dataset are of size 480 by 640 pixels. Labels are stored in .mat format and can be read using scipy.io. In this section, we will see how to load and extract those labels.

labels = loadmat('/mall_dataset/mall_gt.mat')['frame'][0]
# extract ground truth for the first frame
label = labels[0] 
ground_truth = (label[0][0][0]) 
ground_truth.shape

(29, 2)

The ground_truth array holds the positions of people in the image (the center of the head). We have 29 rows corresponding to 29 people in the first frame.

ground_truth[:3]

array([[126.77986348,  60.70477816],
       [116.95051195,  47.59897611],
       [175.10750853,  44.3225256 ]])

We need to construct the label as an array that matches the size of the frame (1-channel image). A value of 1 indicates the position of the center of the head for each person in the image.

# helper function to generate the label from the ground truth array
def generate_label(labels, image_shape=(480,640)):
    "Generate an array based on objects positions"
    label = np.zeros(image_shape, dtype=np.float32)
    for x, y in labels:
        label[int(y)][int(x)] = 1
    return label

l = generate_label(ground_truth)
l.shape

(480, 640)

In order to get the count of people, we sum all the elements of the array. In the above example, there are 29 people in the image.

l.sum()

29.0
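
One caveat: generate_label assigns 1 instead of accumulating, so two head annotations that truncate to the same pixel would be counted only once. The sketch below is a hypothetical variant (the name generate_label_accumulate and the clipping of out-of-frame coordinates are assumptions, not part of the original code) that accumulates with np.add.at so the sum always equals the number of annotations.

```python
import numpy as np

def generate_label_accumulate(points, image_shape=(480, 640)):
    """Build a point map where each (x, y) annotation adds 1 to its pixel."""
    label = np.zeros(image_shape, dtype=np.float32)
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    # x is the column index and y is the row index.
    # Assumption: coordinates slightly outside the frame are clipped to the border.
    cols = np.clip(pts[:, 0].astype(int), 0, image_shape[1] - 1)
    rows = np.clip(pts[:, 1].astype(int), 0, image_shape[0] - 1)
    # np.add.at accumulates correctly even when indices repeat
    np.add.at(label, (rows, cols), 1.0)
    return label

# Two of these three made-up annotations fall on the same pixel
points = np.array([[10.2, 20.7], [10.9, 20.1], [300.0, 100.0]])
label = generate_label_accumulate(points)
print(label.sum(), label[20, 10])
```

When no two heads share a pixel, this variant and generate_label should produce the same array.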

Density Maps

Density mapping is simply a way to show where points may be concentrated in a given area.

# simple example of a label
gt = np.zeros((11,11))
gt[5,5]=1
gt

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

Below is an image representation of the ground truth that corresponds to the above numpy array. The title of the image shows the sum of values within the array; this is the same as the number of people in the frame. In this case, it depicts 1 person at the center of the image.

plt.imshow(gt, cmap='jet')
plt.title(str(gt.sum()))
plt.axis('off');

To obtain a density map that corresponds to the above ground truth, we use a 2D Gaussian filter. The filter spreads each ground truth point over its neighborhood according to the standard deviation values sigma, while preserving the sum of the values. Below is an example of the resulting density map.

Notice that the sum does not change, as the title of the density map image indicates.

dmap = gaussian_filter(gt, sigma=(1, 1), order=0)
dmap.round(2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.01, 0.02, 0.01, 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.01, 0.06, 0.1 , 0.06, 0.01, 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.02, 0.1 , 0.16, 0.1 , 0.02, 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.01, 0.06, 0.1 , 0.06, 0.01, 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.01, 0.02, 0.01, 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]])

# simple example of a density map
plt.imshow(dmap, cmap='jet')
plt.title(str(dmap.sum()))
plt.axis('off');
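
To see why the sum is preserved, it can help to build the filter by hand. The following is a minimal numpy sketch, not the scipy implementation: it assumes a kernel truncated at 4 sigma (which, to my knowledge, matches the gaussian_filter default) and normalizes the weights so they sum to 1. The helper name gaussian_kernel_2d is hypothetical.

```python
import numpy as np

def gaussian_kernel_2d(sigma=1.0, truncate=4.0):
    """Return a normalized 2D Gaussian kernel truncated at truncate * sigma."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k1d = np.exp(-0.5 * (x / sigma) ** 2)
    k1d /= k1d.sum()             # the 1D weights sum to 1
    return np.outer(k1d, k1d)    # separable kernel, so the 2D weights also sum to 1

kernel = gaussian_kernel_2d(sigma=1.0)
r = kernel.shape[0] // 2

gt = np.zeros((11, 11))
gt[5, 5] = 1

# Stamp one copy of the kernel around each annotated point.
# This simple loop only handles points whose kernel fits inside the array.
dmap = np.zeros_like(gt)
for row, col in zip(*np.nonzero(gt)):
    dmap[row - r:row + r + 1, col - r:col + r + 1] += gt[row, col] * kernel

print(dmap.sum(), dmap[5, 5].round(2))
```

Because every point contributes a kernel whose weights sum to 1, the total of the density map equals the number of points.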

Fastai data types

The fastai library has many useful predefined data types that handle most common tasks. Unfortunately, our problem does not fit any of them. However, the library allows us to create custom types for specific tasks. If defined properly, those custom types can make use of many handy features that are available in the library. In this section, we learn how to build and use such custom types.

We will start with one of the predefined data types, PILImage, which allows us to read an image from its path. Then we use fastai's show_image function to display the image.

fn = '/mall_dataset/frames/seq_000001.jpg'

img = PILImage.create(fn)
show_image(img,figsize=(18,8));

PILImage is a class that allows us to construct images from various sources such as a path, an array, or a tensor. We will follow the same structure for building our custom type. Our custom type, which we will call DMap, should allow us to construct a density map from a path to an image or directly from a ground truth label array.

We define a helper function get_lbl to generate a label starting from a path to an image. This function extracts the index of the frame from the path using a regular expression and uses generate_label to build the label.

def get_lbl(fn):
    # extract the zero-based frame index from a name like seq_000001.jpg
    indx = int(re.findall(r'.+/seq_(\d+)\.jpg', str(fn))[0]) - 1
    lbl = labels[indx]
    return generate_label(lbl[0][0][0], (480,640))
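
The index extraction can be tried on its own. This is a small standalone sketch using only the standard library; the helper name frame_index is hypothetical, and the second file name is an illustrative assumption about how the last frame is named.

```python
import re
from pathlib import Path

def frame_index(fn):
    """Return the zero-based frame index encoded in a seq_XXXXXX.jpg file name."""
    match = re.search(r'seq_(\d+)\.jpg$', Path(fn).name)
    if match is None:
        raise ValueError(f'Unexpected frame file name: {fn}')
    # File names start at 1, while the labels array is indexed from 0
    return int(match.group(1)) - 1

idx_first = frame_index('/mall_dataset/frames/seq_000001.jpg')
idx_other = frame_index(Path('/mall_dataset/frames/seq_002000.jpg'))
print(idx_first, idx_other)
```

Matching on the file name rather than the full path avoids depending on the path separator, and raising an error makes an unexpected file name easier to spot than an IndexError.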

DMap: density map type

Next, we define a density map type DMap. DMap can be created from a path to an image or directly from an array. It also knows how to show itself using the show method.

The create method utilizes the gaussian_filter function to produce the final density map. The show method uses the _show_args to set the color map and transparency. It also shows the total number of people as the title when displaying the density map. Transparency alpha is set so that we can overlay the density map on top of the original image.

# Density Map type
class DMap(PILBase): 
    _open_args,_show_args = {'mode':'L'},{'alpha':0.4, 'cmap':'jet'}
    
    @classmethod
    def create(cls, fn):
        if isinstance(fn,ndarray): 
            l = fn
        else:
            l = get_lbl(fn)
        l = gaussian_filter(l, sigma=(4, 4), order=0)
        return cls(Image.fromarray(l))
    
    def show(self, ctx=None, **kwargs):
        # merge caller-supplied display options with the defaults
        return show_image(self, title=str(array(self).sum()), ctx=ctx, **{**self._show_args, **kwargs})

We can now create and display our density map from the label l above.

dm = DMap.create(l)
dm.show();

We can also create the density map starting from the path of the image fn directly.

dm = DMap.create(fn)
dm.show();

type(dm)

__main__.DMap

Additionally, we can overlay the density map on top of the image by using the show method.

ax = show_image(img,figsize=(18,8))
dm.show(ctx=ax);

Data augmentations

Common data augmentations can be used for this application. We demonstrate using fastai's random crop of an image and the corresponding density map. Fastai transforms work on tuples, so we create a tuple of (image, density-map) and test the RandomCrop transformation.

tup = (img, dm)
rtup = RandomCrop(size=224)(tup)
x, y = rtup
x.shape, y.shape

((224, 224), (224, 224))

We can confirm that the transform worked by looking at the output shapes and visualizing the result.

ax = x.show()
y.show(ctx=ax);
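
The important property here is that the image and the density map are cropped with the same window. The numpy sketch below illustrates that idea only; it is not how fastai's RandomCrop is implemented, and paired_random_crop is a hypothetical helper operating on random stand-in arrays.

```python
import numpy as np

def paired_random_crop(image, dmap, size, rng):
    """Crop the same randomly placed size x size window from both arrays."""
    h, w = dmap.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    window = (slice(top, top + size), slice(left, left + size))
    return image[window], dmap[window], (top, left)

rng = np.random.default_rng(0)
image = rng.random((480, 640, 3))   # stand-in for an RGB frame
dmap = rng.random((480, 640))       # stand-in for a density map

x, y, (top, left) = paired_random_crop(image, dmap, 224, rng)
print(x.shape, y.shape)
```

Sampling the offset once and reusing it for both arrays keeps every pixel of the density map aligned with the image pixel it describes.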

Notice that after applying the Gaussian filter and cropping the image, the count can be a fractional number, which represents a head that is partially cropped out of the image.
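
A small experiment makes the fractional counts concrete. This sketch uses a made-up 100 by 100 label with a single head and the same sigma as DMap, then splits the density map along a boundary that passes right next to the head.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

label = np.zeros((100, 100), dtype=np.float32)
label[50, 50] = 1                    # one head at row 50, column 50
dmap = gaussian_filter(label, sigma=(4, 4), order=0)

left = dmap[:, :50]                  # crop that ends just before the head's column
right = dmap[:, 50:]                 # crop that contains the head's column
left_count, right_count = float(left.sum()), float(right.sum())
print(left_count, right_count, left_count + right_count)
```

Each crop holds only part of the person's mass, and the two parts add up to roughly one person.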

TensorDMap: density map tensor

We need to convert our density map DMap to a tensor that can be used for training the model. For this, we use the ToTensor transform from fastai. However, ToTensor only works with the types it already knows about and does not support our custom DMap out of the box. Below, we define a density map tensor TensorDMap for this purpose. TensorDMap also knows how to show itself using the show method, just like DMap.

The line DMap._tensor_cls = TensorDMap associates TensorDMap with our custom type DMap. TensorDMap behaves like a normal tensor, but its type signals a particular use case.

# Density Map Tensor
class TensorDMap(TensorImageBase): 
    _show_args = {'alpha':0.4, 'cmap':'jet'}
    def show(self, ctx=None, **kwargs):
        return show_image(self, ctx=ctx, title=str(array(self).sum()),**{**self._show_args, **kwargs})
    
DMap._tensor_cls = TensorDMap

To get the ToTensor transform to work with our custom type, we need to define an encodes method whose type annotation targets our DMap type. Fastai uses type annotations to dispatch the right transformation for each type.

@ToTensor
def encodes(self, o:DMap): return o._tensor_cls(image2tensor(o))
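
For readers who have not seen dispatch on type annotations before, the standard library offers a rough analogy. The sketch below uses functools.singledispatch with hypothetical stand-in classes; fastai relies on its own dispatch machinery, so this only illustrates the idea and not the library's implementation.

```python
from functools import singledispatch

class Photo:            # hypothetical stand-in for an image type
    pass

class DensityMap:       # hypothetical stand-in for DMap
    pass

@singledispatch
def tensor_type_for(o):
    """Pick a tensor type name based on the type of the argument."""
    raise TypeError(f'No conversion registered for {type(o).__name__}')

@tensor_type_for.register
def _(o: Photo):
    return 'TensorImage'

@tensor_type_for.register
def _(o: DensityMap):
    # Registering a new implementation extends the function to the custom type
    return 'TensorDMap'

print(tensor_type_for(Photo()), tensor_type_for(DensityMap()))
```

As with the encodes method above, supporting a new type means registering one more annotated implementation rather than editing the existing ones.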

Preparing data for training

We build the DataBlock utilizing the types and functions that were defined earlier.

db = DataBlock(blocks=(ImageBlock, ImageBlock(cls=DMap)),
               get_items=get_image_files,
               get_y=get_lbl,
               item_tfms=[RandomCrop(224), FlipItem(p=0.5), ToTensor],
               batch_tfms=[Normalize.from_stats(*imagenet_stats)]
              )

Once we have our DataBlock ready, creating the dataloaders with fastai is straightforward. The transforms were defined in the DataBlock above: each item is randomly cropped, randomly flipped, and then converted to a tensor, and each batch is normalized with the ImageNet statistics.

dls = db.dataloaders('/mall_dataset/frames/', bs=6)

Let's grab a batch and confirm that the shapes are correct. Here, we defined a batch size of 6.

b = dls.one_batch()
b[0].shape, b[1].shape

((6, 3, 224, 224), (6, 1, 224, 224))

Since our custom type has a show method, we can easily visualize a batch of data with fastai's built-in functionality.

dls.show_batch(figsize=(18,8))

From this point, we continue as usual by defining the model of choice along with the optimizer and the loss function to create a learner object.

m = resnet34(pretrained=True)
m = nn.Sequential(*list(m.children())[:-2])
unet = DynamicUnet(m, 1, (224, 224))

learn = Learner(dls, unet, loss_func=MSELossFlat(), metrics=mae)
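
As far as I understand, the mae metric above compares density maps element by element. For crowd counting, the error in the total count is often of interest as well. The following is a minimal numpy sketch of such a count-level error on made-up arrays shaped like a batch of density maps; count_mae is a hypothetical helper, not a fastai metric, and it would need to be adapted to tensors before being passed to a Learner.

```python
import numpy as np

def count_mae(pred, target):
    """Mean absolute error between predicted and true counts per image."""
    pred_counts = pred.reshape(pred.shape[0], -1).sum(axis=1)
    true_counts = target.reshape(target.shape[0], -1).sum(axis=1)
    return float(np.abs(pred_counts - true_counts).mean())

# Two made-up density maps with shape (batch, channel, height, width)
target = np.zeros((2, 1, 4, 4))
target[0, 0, 1, 1] = 1.0     # image 0 contains one person
target[1, 0, 2, 2] = 2.0     # image 1 contains two people

pred = np.zeros_like(target)
pred[0, 0, 1, 2] = 1.0       # right count, slightly wrong location
pred[1, 0, 2, 2] = 1.5       # right location, count too low by 0.5

error = count_mae(pred, target)
print(error)
```

The first prediction places the person one pixel away but still yields the correct count, so only the second image contributes to the count error.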

I hope this post inspires you to use and customize fastai for your particular use case.

Feel free to reach out to me at @Feras_Oughali if you have any questions or suggestions for improving this post.