Crowd Counting and Density Estimation with Fastai
Creating new types for density maps
Background
Counting objects has been the focus of many research works in recent years across a wide range of domains, including counting people, vehicles, animals, fruits, and microscopy cells. Crowd counting aims to identify the number of people in a crowded scene. This post focuses on crowd counting using density estimation, which maps an image to its corresponding density map; the density map indicates the number of people per pixel in the image. Fortunately, the techniques for crowd counting can be extended to other domains.
The problem of crowd counting is of great importance for obtaining a high-level understanding of crowd scenarios such as parades, concerts and stadiums. In these cases, crowd counting plays a crucial role in scene understanding, crowd monitoring and management, and social safety.
We will use the Mall dataset to demonstrate this work.
The Mall dataset was collected from a publicly accessible webcam for crowd counting and profiling research. Its authors labeled over 60,000 pedestrians in 2,000 video frames, exhaustively annotating the head position of every pedestrian in every frame.
# imports
from scipy.io import loadmat
from scipy.ndimage import gaussian_filter
from fastai.vision.all import *
labels = loadmat('/mall_dataset/mall_gt.mat')['frame'][0]
# extract ground truth for the first frame
label = labels[0]
ground_truth = (label[0][0][0])
ground_truth.shape
The ground_truth array holds the positions of people in the image (the center of the head). We have 29 rows corresponding to 29 people in the first frame.
ground_truth[:3]
We need to construct the label as an array that matches the size of the frame (1-channel image). A value of 1 indicates the position of the center of the head for each person in the image.
# helper function to generate the label from the ground truth array
def generate_label(labels, image_shape=(480,640)):
    "Generate an array based on objects positions"
    label = np.zeros(image_shape, dtype=np.float32)
    # ground truth rows are (x, y); arrays are indexed as [row, column] = [y, x]
    for x, y in labels:
        label[int(y), int(x)] = 1
    return label
l = generate_label(ground_truth)
l.shape
In order to get the count of people, we sum all the elements of the array. In the above example, there are 29 people in the image.
l.sum()
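As a variation on generate_label, the sketch below builds the same kind of label without a Python loop. The name points_to_label, the clipping of out-of-range coordinates, and the choice to accumulate (rather than overwrite) when two heads land on the same pixel are assumptions made for illustration; they are not part of the helper above.

```python
import numpy as np

def points_to_label(points, image_shape=(480, 640)):
    "Hypothetical vectorized variant: one unit of mass per annotated head."
    label = np.zeros(image_shape, dtype=np.float32)
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    # Truncate to integer pixels and clamp to the frame (assumption: points
    # slightly outside the frame are moved to the nearest border pixel).
    xs = np.clip(pts[:, 0].astype(int), 0, image_shape[1] - 1)
    ys = np.clip(pts[:, 1].astype(int), 0, image_shape[0] - 1)
    # np.add.at accumulates repeated indices, so two heads on the same
    # pixel contribute 2 instead of being merged into a single 1.
    np.add.at(label, (ys, xs), 1.0)
    return label

pts = [(10.7, 20.2), (10.2, 20.9), (639.9, 479.5)]
label = points_to_label(pts)
count = float(label.sum())
```

With accumulation, the sum of the label equals the number of annotated points even when annotations collide, which keeps the count consistent with the ground truth.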
# simple example of a label
gt = np.zeros((11,11))
gt[5,5]=1
gt
Below is an image representation of the ground truth that corresponds to the above NumPy array. The title of the image shows the sum of values within the array; this is the same as the number of people in the frame. In this case, it depicts 1 person at the center of the image.
plt.imshow(gt, cmap='jet')
plt.title(str(gt.sum()))
plt.axis('off');
In order to obtain a density map that corresponds to the above ground truth, we use a 2D Gaussian filter. The filter spreads each ground truth point over its neighborhood according to the standard deviation values sigma, while preserving the sum of those values. Below is an example of the resulting density map.
Notice that the sum did not change, as indicated by the title of the density map image.
dmap = gaussian_filter(gt, sigma=(1, 1), order=0)
dmap.round(2)
# simple example of a density map
plt.imshow(dmap, cmap='jet')
plt.title(str(dmap.sum()))
plt.axis('off');
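To see why the sum is preserved, it can help to build the density map by hand. The sketch below is a NumPy-only illustration of the idea, not scipy's implementation: it places a normalized Gaussian kernel at each head position. The names gaussian_kernel and stamp_density are hypothetical, and the kernel radius of 4 standard deviations is an assumption. Unlike the filter used above, this version simply clips kernels that run past the border, so mass is lost for heads near the edge of the frame.

```python
import numpy as np

def gaussian_kernel(sigma=1.0, truncate=4.0):
    "Normalized 2D Gaussian kernel; its values sum to 1."
    radius = int(truncate * sigma + 0.5)
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def stamp_density(points, shape, sigma=1.0):
    "Add one normalized kernel per (x, y) point, clipped at the borders."
    dmap = np.zeros(shape, dtype=np.float64)
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    h, w = shape
    for x, y in points:
        x, y = int(x), int(y)
        # Region of the map covered by the kernel, limited to the frame
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        # Matching region inside the kernel
        ky0, kx0 = y0 - (y - r), x0 - (x - r)
        dmap[y0:y1, x0:x1] += k[ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]
    return dmap

dmap_manual = stamp_density([(5, 5)], (11, 11), sigma=1.0)
total = dmap_manual.sum()
```

Because each kernel sums to 1, every person contributes exactly one unit of mass to the map, however widely it is spread.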
The fastai library has many useful predefined data types that can handle most common tasks. Unfortunately, our problem does not fit any of them. However, the library allows us to create custom types for specific tasks. If defined properly, those custom types can make use of many handy features that are available in the library. In this section, we learn how to build and use such custom types.
We will start with one of the predefined data types, PILImage, which allows us to read an image from its path. Then we use fastai's show_image function to display the image.
fn = '/mall_dataset/frames/seq_000001.jpg'
img = PILImage.create(fn)
show_image(img,figsize=(18,8));
PILImage is a class that allows us to construct images from various sources such as a path, an array, or a tensor. We will follow the same structure for building our custom type. Our custom type, which we will call DMap, should allow us to construct a density map from a path to an image or directly from a ground truth label array.
We define a helper function get_lbl to generate a label starting from a path to an image. This function simply extracts the index of the frame from the path using a regular expression and uses generate_label to build the label.
def get_lbl(fn):
    # extract the zero-based frame index from fn (seq_000001.jpg -> 0)
    indx = int(re.findall(r'.+/seq_(\d+)\.jpg', str(fn))[0])-1
    lbl = labels[indx]
    return generate_label(lbl[0][0][0], (480,640))
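The index extraction can be checked in isolation. The sketch below is a standalone variant that works on the file name only; the helper name frame_index and the explicit error for unexpected names are assumptions added for illustration.

```python
import re
from pathlib import Path

def frame_index(fn):
    "Hypothetical helper: map '.../seq_000001.jpg' to the zero-based index 0."
    m = re.search(r'seq_(\d+)\.jpg$', Path(fn).name)
    if m is None:
        raise ValueError(f"unexpected frame file name: {fn}")
    # Frames are numbered from 1, while the labels array is indexed from 0
    return int(m.group(1)) - 1

first = frame_index('/mall_dataset/frames/seq_000001.jpg')
last = frame_index(Path('/mall_dataset/frames/seq_002000.jpg'))
```

Matching on the file name rather than the full path avoids depending on the path separator.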
DMap: density map type
Next, we define a density map type DMap. DMap can be created from a path to an image or directly from an array. It also knows how to show itself using the show method.
The create method uses the gaussian_filter function to produce the final density map. The show method uses _show_args to set the color map and transparency. It also shows the total number of people as the title when displaying the density map. The transparency alpha is set so that we can overlay the density map on top of the original image.
# Density Map type
class DMap(PILBase):
    _open_args,_show_args = {'mode':'L'},{'alpha':0.4, 'cmap':'jet'}
    @classmethod
    def create(cls, fn):
        # accept either a ground truth label array or a path to a frame
        if isinstance(fn,ndarray):
            l = fn
        else:
            l = get_lbl(fn)
        # spread each head position into a Gaussian blob
        l = gaussian_filter(l, sigma=(4, 4), order=0)
        return cls(Image.fromarray(l))
    def show(self, ctx=None, **kwargs):
        # the title is the sum of the map, i.e. the number of people
        ax = show_image(self, title=str(array(self).sum()), ctx=ctx, **self._show_args)
        return ax
We can now create and display our density map from the label l above.
dm = DMap.create(l)
dm.show();
We can also create the density map directly from the image path fn.
dm = DMap.create(fn)
dm.show();
type(dm)
Additionally, we can overlay the density map on top of the image by using the show method.
ax = show_image(img,figsize=(18,8))
dm.show(ctx=ax);
# apply the same random crop to the image and its density map
tup = (img, dm)
rtup = RandomCrop(size=224)(tup)
x, y = rtup
x.shape, y.shape
We can confirm that the transform worked by looking at the output shapes and visualizing the result.
ax = x.show()
y.show(ctx=ax);
Notice that after applying the Gaussian filter and cropping the image, it is possible to obtain fractional counts: a fractional value represents a head that was partially cropped out of the image.
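A small NumPy-only sketch makes the fractional count concrete. It places a single Gaussian blob (one person) in the middle of a toy 21x21 map and crops away everything from the center column onward. The sizes and sigma are arbitrary choices for illustration.

```python
import numpy as np

sigma = 2.0
radius = int(4 * sigma + 0.5)              # 8 pixels on each side of the center
ax = np.arange(-radius, radius + 1)
g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
g /= g.sum()                               # 1D weights that sum to 1

dmap = np.zeros((21, 21))
dmap[2:19, 2:19] = np.outer(g, g)          # one head centered at row 10, column 10

full = dmap.sum()                          # whole map: one person
left_crop = dmap[:, :10]                   # keep only the columns left of the center
partial = left_crop.sum()                  # strictly between 0 and 1
```

The cropped map keeps a bit less than half of the blob, so its sum is a fraction of a person rather than 0 or 1.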
TensorDMap: density map tensor
We need to convert our density map DMap to a tensor that can be used for training the model. For this, we use the ToTensor transform from fastai. However, ToTensor only works with the types it is familiar with and doesn't support our custom DMap out of the box. Below, we define a density map tensor TensorDMap for this purpose. TensorDMap also knows how to show itself using the show method, just like DMap.
The line DMap._tensor_cls = TensorDMap associates TensorDMap with our custom type DMap. TensorDMap is just like a normal tensor, but it signifies a particular use case.
# Density Map Tensor
class TensorDMap(TensorImageBase):
    _show_args = {'alpha':0.4, 'cmap':'jet'}
    def show(self, ctx=None, **kwargs):
        return show_image(self, ctx=ctx, title=str(array(self).sum()), **{**self._show_args, **kwargs})

DMap._tensor_cls = TensorDMap
To get the ToTensor transform to work with our custom type, we need to define an encodes method annotated to handle our DMap type. Fastai uses type annotations to dispatch the right transformation for each type.
@ToTensor
def encodes(self, o:DMap): return o._tensor_cls(image2tensor(o))
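For readers new to dispatch by type annotation, Python's standard library offers a loosely similar mechanism in functools.singledispatch. The sketch below is only an analogy and is not how fastai implements its transforms; the classes and the encode function are stand-ins invented for illustration.

```python
from functools import singledispatch

class FakeImage:
    "Stand-in for a generic image type."

class FakeDMap(FakeImage):
    "Stand-in for a custom density map type."

@singledispatch
def encode(o):
    # Fallback when no implementation is registered for the type of o
    raise NotImplementedError(f"no encoder for {type(o).__name__}")

@encode.register
def _(o: FakeImage):
    return "image tensor"

@encode.register
def _(o: FakeDMap):
    # The most specific registered type wins, so this overrides FakeImage
    return "density map tensor"

image_result = encode(FakeImage())
dmap_result = encode(FakeDMap())
```

The annotation on the first argument selects the implementation, which is the same idea that lets a single transform treat images and density maps differently.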
db = DataBlock(blocks=(ImageBlock, ImageBlock(cls=DMap)),
               get_items=get_image_files,
               get_y=get_lbl,
               item_tfms=[RandomCrop(224), FlipItem(p=0.5), ToTensor],
               batch_tfms=[Normalize.from_stats(*imagenet_stats)]
               )
Once we have our datablock ready, it is trivially easy to create the dataloaders with fastai. In this step, we defined the transforms that need to be applied to our data. Each item is randomly cropped, flipped, and then converted to a tensor.
dls = db.dataloaders('/mall_dataset/frames/', bs=6)
Let's grab a batch and confirm that the shapes are correct. Here, we defined a batch size of 6.
b = dls.one_batch()
b[0].shape, b[1].shape
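The item transforms must be applied identically to the frame and to its density map, otherwise the map would no longer line up with the people in the image. Fastai takes care of this for us; the NumPy sketch below only illustrates the idea and is not fastai's implementation. The function paired_crop_flip and its parameters are hypothetical.

```python
import numpy as np

def paired_crop_flip(image, dmap, size=224, p_flip=0.5, rng=None):
    "Apply one random crop and one random horizontal flip to both arrays."
    rng = np.random.default_rng() if rng is None else rng
    h, w = dmap.shape
    # Draw the crop offsets once and reuse them for both arrays
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    img_c = image[top:top + size, left:left + size]
    dm_c = dmap[top:top + size, left:left + size]
    # Draw the flip decision once as well; flipping does not change the sum
    if rng.random() < p_flip:
        img_c, dm_c = img_c[:, ::-1], dm_c[:, ::-1]
    return img_c, dm_c

image = np.zeros((480, 640, 3), dtype=np.uint8)
dmap = np.zeros((480, 640), dtype=np.float32)
dmap[240, 320] = 1.0                      # a single person in the frame
x, y = paired_crop_flip(image, dmap, rng=np.random.default_rng(0))
```

The crop may or may not contain the person, so the count of the cropped map can differ from the count of the full frame, while the flip leaves it unchanged.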
Since our custom type has a show method, we can easily visualize a batch of data with fastai's convenient functionality.
dls.show_batch(figsize=(18,8))
From this point, we continue as usual by defining the model of choice along with the optimizer and the loss function to create a learner object.
m = resnet34(pretrained=True)
m = nn.Sequential(*list(m.children())[:-2])
unet = DynamicUnet(m, 1, (224, 224))
learn = Learner(dls, unet, loss_func=MSELossFlat(), metrics=mae)
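The mae metric above averages the absolute error over all elements of the density maps. Since the quantity we ultimately care about is the number of people, it may also be useful to compare the summed maps. The NumPy sketch below shows one way to compute such a count error; count_mae is a hypothetical helper, not a fastai metric, and it assumes batches shaped (batch, 1, height, width).

```python
import numpy as np

def count_mae(pred_maps, target_maps):
    "Mean absolute difference between predicted and true counts per image."
    # Sum each map over all of its pixels to get one count per image
    pred_counts = pred_maps.reshape(pred_maps.shape[0], -1).sum(axis=1)
    true_counts = target_maps.reshape(target_maps.shape[0], -1).sum(axis=1)
    return float(np.abs(pred_counts - true_counts).mean())

# Toy batch of two 4x4 maps: predicted counts 8 and 4, true counts 10 and 4
preds = np.stack([np.full((1, 4, 4), 0.5), np.full((1, 4, 4), 0.25)])
targs = np.stack([np.full((1, 4, 4), 0.625), np.full((1, 4, 4), 0.25)])
error = count_mae(preds, targs)
```

In this toy batch the first image is off by two people and the second is exact, so the count error is one person per image on average.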
I hope this post inspires you to use and customize fastai for your particular use case.
Feel free to reach out to me at @Feras_Oughali if you have any questions or suggestions for improving this post.