Crowd counting is the act of counting or estimating the number of people present in a given area; the people gathered in that area are referred to as a crowd. The most direct method is to count each person individually. For example, turnstiles are often used to count precisely the number of people entering an event.[1]

The Million Man March in Washington, D.C., in October 1995 was the focus of a major crowd-counting dispute.

Modern understanding

Since the early 2000s, the understanding of the phrase “crowd counting” has shifted. Counting methods have moved from simply counting individuals to approaches based on clusters and density maps, bringing several improvements. In this sense, crowd counting can also be defined as estimating the number of people present in a single image.[2]

Methods of counting crowds

Owing to rapid progress in technology and the growth of convolutional neural networks (CNNs) over the last decade, the use of CNNs in crowd counting has risen sharply. Crowd counting methods, including CNN-based ones, can largely be grouped into the following models:[3]

Jacobs' method

The most common technique for counting crowds at protests and rallies is Jacobs' method, named after its inventor, Herbert Jacobs. Jacobs' method involves dividing the area occupied by a crowd into sections, determining the average number of people in a section, and multiplying that average by the number of occupied sections. According to a report by Life's Little Mysteries, technologies sometimes used to assist such estimations include "lasers, satellites, aerial photography, 3-D grid systems, recorded video footage and surveillance balloons, usually tethered several blocks around an event's location and flying 400 to 800 feet (120 to 240 meters) overhead."[2]
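The arithmetic behind Jacobs' method can be sketched in a few lines of Python. This is a minimal illustration rather than a standard implementation: the function name `jacobs_estimate` and the sample counts are hypothetical, and the sketch assumes the sampled sections are representative of all occupied sections.

```python
def jacobs_estimate(section_counts, sections_occupied):
    """Estimate crowd size with Jacobs' method.

    section_counts: head counts taken in a sample of grid sections (hypothetical survey)
    sections_occupied: total number of sections the crowd covers
    """
    if not section_counts:
        raise ValueError("need at least one sampled section")
    # Average number of people per sampled section.
    average_per_section = sum(section_counts) / len(section_counts)
    # Scale the average up to every occupied section.
    return average_per_section * sections_occupied


# Hypothetical survey: four sampled sections, 250 occupied sections in total.
samples = [38, 42, 45, 35]
print(round(jacobs_estimate(samples, 250)))  # 10000
```

The estimate is only as good as the sampled averages: sections near a stage are usually denser than those at the edges, so sampling only one kind of section biases the result.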

Direct regression-based counting

This method applies regression to global image features computed over the whole image. Global image features describe properties of the image as a whole rather than of individual regions; examples include “contour representations, shape descriptions, texture features.”[4]

Because the spatial distribution of objects is not taken into account, regression cannot be used to localise objects.[5] In addition, because the model estimates crowd density from descriptions of crowd patterns, it does not track individuals.[2] This makes regression-based models efficient for very crowded images: when the density of people per pixel is very high, regression models are best suited.

Earlier crowd counting methods employed classical regression models.[6]
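A minimal sketch of direct regression, assuming a plain linear least-squares model (one of the simplest classical regressors, chosen here only for illustration): each training image is reduced to a small vector of global features and mapped straight to a count. The feature names, the helpers `fit_count_regressor` and `predict_count`, and all numbers are hypothetical; the synthetic labels are exactly linear in the features so the fit is easy to check.

```python
import numpy as np


def add_bias(features):
    """Append a column of ones so the model can learn an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_count_regressor(features, counts):
    """Least-squares fit of counts as a linear function of global features."""
    weights, *_ = np.linalg.lstsq(add_bias(features), counts, rcond=None)
    return weights


def predict_count(weights, features):
    """Predict one count per image; no location information is produced."""
    return add_bias(features) @ weights


# Hypothetical global features per image: [foreground fraction, edge fraction].
train_features = np.array([[0.10, 0.05],
                           [0.20, 0.12],
                           [0.35, 0.20],
                           [0.50, 0.31],
                           [0.65, 0.40]])
# Synthetic labels generated as 400*f1 + 200*f2 + 5 so the fit can be checked.
train_counts = np.array([55.0, 109.0, 185.0, 267.0, 345.0])

w = fit_count_regressor(train_features, train_counts)
print(w)  # recovers weights close to 400, 200 and an intercept of 5
print(predict_count(w, np.array([[0.42, 0.26]])))  # close to 225
```

The output is a single number per image, which illustrates why direct regression cannot localise people.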

Density-based counting

An object density map assigns a density value to each location in an image; the number of objects in a particular area is obtained by integrating (summing) the density over that area.[5] Because density values are estimated from low-level image features, density-based counting retains the advantages of regression-based models while also preserving localisation information, that is, information about where the objects are.[5]
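The idea that a count is the integral of a density map can be illustrated with NumPy. As an assumption for this sketch (a common convention, not something the cited source prescribes), each annotated head is replaced by a Gaussian kernel of fixed width `sigma`, normalised so that it contributes exactly one person; summing the map over any region then gives the count in that region. The function `density_map` and the head coordinates are hypothetical.

```python
import numpy as np


def density_map(head_points, shape, sigma=2.0):
    """Build a density map by placing a normalised Gaussian at each head (y, x)."""
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros(shape, dtype=float)
    for y, x in head_points:
        kernel = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2))
        density += kernel / kernel.sum()  # each person contributes exactly 1
    return density


# Hypothetical annotations: two heads in the top half, two in the bottom half.
heads = [(10, 12), (11, 30), (40, 45), (42, 47)]
D = density_map(heads, (64, 64))
print(round(float(D.sum()), 3))         # 4.0  (whole image)
print(round(float(D[:32, :].sum()), 3))  # 2.0  (top half only)
```

The map still says where people are, which a single regressed count cannot do.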

Strengthening crowd counting

Training the models above effectively requires a large amount of data, but often only a limited number of original images is available. To compensate, data augmentation techniques such as random cropping are used. Random cropping randomly selects sub-images from an original image.

After several iterations of random cropping, the sub-images are fed into the machine learning algorithm to help it generalize better.
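Random cropping can be sketched as follows. As an assumption beyond the text, the sketch crops the ground-truth density map at the same position as the image, so each sub-image keeps a label whose sum equals the number of people inside it; the helper name `random_crops`, the crop size and the number of crops are hypothetical choices.

```python
import numpy as np


def random_crops(image, density, crop_size, num_crops, rng):
    """Return a list of (image_crop, density_crop) pairs taken at random positions."""
    height, width = image.shape[:2]
    ch, cw = crop_size
    if ch > height or cw > width:
        raise ValueError("crop larger than image")
    crops = []
    for _ in range(num_crops):
        # Upper bound is exclusive, so every valid top-left corner can be drawn.
        top = rng.integers(0, height - ch + 1)
        left = rng.integers(0, width - cw + 1)
        crops.append((image[top:top + ch, left:left + cw],
                      density[top:top + ch, left:left + cw]))
    return crops


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # stand-in for a real RGB image
density = np.zeros((64, 64))
density[20, 20] = 1.0            # one annotated head
patches = random_crops(image, density, (32, 32), num_crops=9, rng=rng)
print(len(patches), patches[0][0].shape)  # 9 (32, 32, 3)
```

Cropping image and label together keeps the training pairs consistent: a crop that misses the annotated head has a density sum of 0, one that contains it has a sum of 1.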

To tackle the problems of counting in very dense areas, image pyramids, which represent the input image at several scales, can be used together with density-based counting. Image pyramids are generally employed for counting crowds at religious gatherings and rituals, because people appear at very different scales in different parts of such images.
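An image pyramid stores the same image at progressively coarser scales, so that a counting model applied at each level can handle both large, nearby people and small, distant ones. The sketch below builds one with NumPy using simple 2×2 average pooling; this is a simplifying assumption (many pyramid implementations blur with a Gaussian before subsampling), and the name `image_pyramid` is hypothetical.

```python
import numpy as np


def image_pyramid(image, levels):
    """Build a pyramid by repeated 2x2 average pooling (halving each side)."""
    pyramid = [image]
    for _ in range(levels - 1):
        current = pyramid[-1]
        # Drop an odd final row/column so the image splits into 2x2 blocks.
        h, w = current.shape[0] // 2 * 2, current.shape[1] // 2 * 2
        current = current[:h, :w]
        # Group pixels into 2x2 blocks (works for grayscale or colour) and average.
        pooled = current.reshape(h // 2, 2, w // 2, 2, *current.shape[2:]).mean(axis=(1, 3))
        pyramid.append(pooled)
    return pyramid


image = np.random.default_rng(0).random((128, 96))
for level in image_pyramid(image, 4):
    print(level.shape)  # prints (128, 96), then (64, 48), (32, 24) and (16, 12)
```

Each extra level means another pass of the counting model, which is the source of the computational cost discussed next.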

However, the algorithms required for image pyramids are computationally expensive, so relying on them can be impractical. Deep fusion models can be used instead.[7]

Deep fusion models employ “neural network(s) to promote the density map regression accuracy.”[8] These models first mark the location of each person in the image and then determine the density map of the area using the “pedestrian’s location, shape, and perspective distortion.”[8] Because the models scan the image over many iterations and the bodies of people frequently overlap one another, each person is counted by their head.
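The cited work does not spell out how perspective enters the density map, so the following is only an illustration using a different, well-known technique: geometry-adaptive Gaussian kernels, popularised by the multi-column CNN (MCNN) crowd-counting work, in which the kernel placed on each head has a width proportional to the mean distance to its k nearest neighbouring heads. The values k = 3 and beta = 0.3, the fallback width, and the helper `adaptive_density_map` are assumptions for this sketch, not part of the deep fusion models described above.

```python
import numpy as np


def adaptive_density_map(head_points, shape, k=3, beta=0.3, default_sigma=4.0):
    """Density map with one Gaussian per head, widened where neighbours are far apart.

    The kernel width is beta times the mean distance to the k nearest other
    heads, a rough proxy for perspective: distant people are small and tightly
    packed, nearby people are large and spread out.
    """
    points = np.asarray(head_points, dtype=float)
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros(shape, dtype=float)
    sigmas = []
    for i, (y, x) in enumerate(points):
        others = np.delete(points, i, axis=0)
        if len(others) > 0:
            dists = np.sort(np.hypot(others[:, 0] - y, others[:, 1] - x))[:k]
            sigma = max(beta * dists.mean(), 1.0)  # floor avoids near-zero widths
        else:
            sigma = default_sigma  # a lone head has no neighbours to measure
        kernel = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2))
        density += kernel / kernel.sum()  # each person still contributes exactly 1
        sigmas.append(sigma)
    return density, sigmas


# Hypothetical annotations: a packed row of distant heads near the top of the
# frame and a few spread-out, nearby heads near the bottom.
heads = [(5, 10), (5, 14), (6, 18), (7, 22), (50, 10), (52, 40), (55, 70)]
D, sigmas = adaptive_density_map(heads, (64, 96))
print(round(float(D.sum()), 3))  # 7.0
print([round(float(s), 1) for s in sigmas])
```

The total still equals the number of annotated heads; only the spread of each person's contribution changes with apparent scale.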

Importance

Crowd counting plays an important role in “public safety, assembly language, and video surveillance”[9] among other applications. Poor planning and inadequate crowd control can lead to serious accidents; one of the most notable is the Hillsborough disaster, which took place on April 15, 1989, in Sheffield, England. Crowd counts can also be contentious: Louis Farrakhan threatened to sue the Washington, D.C. Park Police for announcing that only 400,000 people attended the 1995 Million Man March he organized.

At events held in streets or parks rather than enclosed venues, crowd counting is more difficult and less precise. For many events, especially political rallies and protests, the size of a crowd carries political significance, and counts are often controversial. For example, many of the global protests against the Iraq War drew widely differing counts from organizers on one side and police on the other.

References

  1. ^ "What are Turnstiles? (with pictures)". EasyTechJunkie. Retrieved 2022-10-11.
  2. ^ a b c Loy, Chen Change; Chen, Ke; Gong, Shaogang; Xiang, Tao (2021). "Fine-Grained Crowd Counting". IEEE Transactions on Image Processing. 30: 2114–2126. arXiv:2007.06146. Bibcode:2021ITIP...30.2114W. doi:10.1109/TIP.2021.3049938. PMID 33439838. S2CID 220496399.
  3. ^ Chu, Huanpeng; Tang, Jilin; Hu, Haoji (2021-10-01). "Attention guided feature pyramid network for crowd counting". Journal of Visual Communication and Image Representation. 80: 103319. doi:10.1016/j.jvcir.2021.103319. ISSN 1047-3203. S2CID 241591128.
  4. ^ Lisin, Dimitri A.; Mattar, Marwan A.; Blaschko, Matthew B.; Benfield, Mark C.; Learned-Miller, Erik G. "Combining Local and Global Image Features for Object Class Recognition" (PDF).
  5. ^ a b c Kang, D.; Ma, Z.; Chan, A. B. (May 2019). "Beyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks—Counting, Detection, and Tracking". IEEE Transactions on Circuits and Systems for Video Technology. 29 (5): 1408–1422. arXiv:1705.10118. doi:10.1109/TCSVT.2018.2837153. S2CID 19706288.
  6. ^ Delussu, Rita; Putzu, Lorenzo; Fumera, Giorgio (2022). "Scene-specific crowd counting using synthetic training images". Pattern Recognition. 124: 108484. Bibcode:2022PatRe.12408484D. doi:10.1016/j.patcog.2021.108484. hdl:11584/341493. S2CID 245109866.
  7. ^ Khan, Sultan Daud; Salih, Yasir; Zafar, Basim; Noorwali, Abdulfattah (2021-09-28). "A Deep-Fusion Network for Crowd Counting in High-Density Crowded Scenes". International Journal of Computational Intelligence Systems. 14 (1): 168. doi:10.1007/s44196-021-00016-x. ISSN 1875-6883.
  8. ^ a b Tang, Siqi; Pan, Zhisong; Zhou, Xingyu (2017-01-01). "Low-Rank and Sparse Based Deep-Fusion Convolutional Neural Network for Crowd Counting". Mathematical Problems in Engineering. 2017: 1–11. doi:10.1155/2017/5046727.
  9. ^ Chu, Huanpeng; Tang, Jilin; Hu, Haoji (2021-10-01). "Attention guided feature pyramid network for crowd counting". Journal of Visual Communication and Image Representation. 80: 103319. doi:10.1016/j.jvcir.2021.103319. ISSN 1047-3203. S2CID 241591128.