How I Built a Real-Time Crowd Counting & Analytics System with YOLOv8 at Infosys Springboard
By Sathwik K | Computer Vision & Analytics Intern, Infosys Springboard Virtual Internship 6.0

Why Crowd Monitoring Matters
Picture a packed railway station during peak hours. Security personnel are trying to manually estimate crowd density, spot overcrowding, and prevent safety incidents, all at the same time. It's nearly impossible to do accurately at scale.
Crowd monitoring is a critical challenge across smart cities, public events, malls, hospitals, and transportation hubs. The ability to detect and count people in real time can prevent stampedes, improve resource allocation, and enhance public safety, all without relying on manual observation.
During my internship at Infosys Springboard (Oct 2025 – Dec 2025), I got the opportunity to design and build exactly this: a real-time video analytics application powered by YOLOv8, with zone-based counting, secure authentication, and an interactive monitoring dashboard.
Here's a complete walkthrough of how I built it, the challenges I faced, and what I learned.
The Problem Statement
The challenge was straightforward on paper but complex in execution:
"Build a system that can analyze uploaded video footage, let users define custom zones of interest, count people inside each zone in real time, and surface crowd density metrics through an interactive dashboard."
The key difficulties were:
Real-time performance: Video produces a large volume of data every second, so processing each frame fast enough requires careful model selection
Accuracy under density: Detecting individuals when people overlap or cluster together is a known challenge for object detection models
Custom zone flexibility: Different deployments (malls, stations, offices) need different monitoring zones, so hardcoding zones wasn't an option
Multi-user access: The system needed role-based access so admins could oversee the system while regular users ran their own detections
Data persistence: Zone configurations and login activity needed to be saved across sessions
Tech Stack & Why I Chose Each Tool
COMPONENT | TECHNOLOGY | REASON |
Object Detection | YOLOv8s | Best balance of speed and accuracy for real-time person detection |
Web Framework | Streamlit | Rapid prototyping of interactive dashboards without frontend overhead |
Zone Drawing | streamlit-drawable-canvas | Lets users draw rectangles directly on video frames interactively |
Database | MongoDB | Flexible document storage for users, zones and login history |
Security | Bcrypt | Industry-standard password hashing and salting |
Video Processing | OpenCV | Frame extraction, resizing and bounding box rendering |
Data Visualization | Pandas + Streamlit charts | Real-time bar charts of zone-wise crowd counts |
System Architecture
The application is structured around three layers:
1. Authentication Layer. Users register and log in securely. Passwords are hashed using Bcrypt before being stored in MongoDB; plain-text passwords are never saved. The system supports two roles, user and superadmin, each with different access levels.
2. Detection Layer. Once logged in, users upload a video file. The system reads the first frame and presents it on an interactive canvas. Users draw rectangular zones of interest directly on the frame, for example an entrance corridor, a waiting area, or a ticket counter. These zones are saved to MongoDB with custom names.
When detection starts, YOLOv8s processes each frame, identifies all persons, and checks whether each person's center point falls inside any of the defined zones. The count per zone updates in real time.
3. Dashboard Layer. Three live UI elements update as the video plays:
A JSON view showing current count per zone.
A video feed with bounding boxes and zone overlays rendered on each frame.
A bar chart visualizing crowd density across zones in real time.
Building It: Step by Step
Step 1: Setting Up YOLOv8
I used Ultralytics' YOLOv8s, the small variant, which hits the sweet spot between accuracy and inference speed. The model is loaded once using Streamlit's @st.cache_resource decorator, which prevents it from reloading on every user interaction.
python
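import streamlit as st
from ultralytics import YOLO

@st.cache_resource  # keep one model instance alive across Streamlit reruns
def load_model():
    return YOLO("yolov8s.pt")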
For each frame, I filtered detections to only the person class, ignoring cars, bags, and other objects YOLO detects by default.
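One way to express that filtering step is sketched below. It assumes the raw detections arrive as rows of [x1, y1, x2, y2, conf, cls], which is the layout Ultralytics exposes through result.boxes.data; the function names and the 0.4 confidence threshold are illustrative and not taken from the project.

```python
PERSON_CLASS_ID = 0  # index of "person" in the COCO labels used by the pretrained weights


def keep_persons(rows, min_conf=0.4):
    """Keep only confident person detections.

    rows: iterable of [x1, y1, x2, y2, conf, cls]
    min_conf: illustrative threshold, not the value used in the project
    """
    return [r for r in rows if int(r[5]) == PERSON_CLASS_ID and r[4] >= min_conf]


def detect_persons(model, frame, min_conf=0.4):
    """Run an Ultralytics YOLO model on one frame and return person rows only."""
    result = model(frame, verbose=False)[0]  # one Results object per input image
    return keep_persons(result.boxes.data.tolist(), min_conf)


# Demonstration with hand-written rows instead of real model output
sample = [
    [10, 20, 50, 120, 0.91, 0],  # person, confident
    [60, 30, 90, 110, 0.25, 0],  # person, below the threshold
    [5, 5, 200, 100, 0.88, 2],   # car
]
persons = keep_persons(sample)
```

Ultralytics also accepts a classes=[0] argument at prediction time, which applies the same class filter inside the call.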
Step 2: Interactive Zone Drawing
This was one of the most user-friendly features. Instead of hardcoding coordinates, I used streamlit-drawable-canvas to let users draw rectangles directly on the first video frame. Each rectangle becomes a named zone stored in MongoDB.
python
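from streamlit_drawable_canvas import st_canvas

canvas_result = st_canvas(
    background_image=pil_img,  # first video frame as a PIL image
    drawing_mode="rect",       # users can only draw rectangles
    height=360,
    width=640,
)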
Users can name each zone (e.g., "Entrance", "Queue Area") and save them for future sessions.
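Turning the drawn rectangles into zone coordinates might look like the following sketch. It assumes canvas_result.json_data holds an "objects" list whose rectangles carry left, top, width, and height (plus optional scaleX and scaleY), as fabric.js serializes them; rects_to_zones is a hypothetical helper name.

```python
def rects_to_zones(json_data):
    """Convert drawn canvas rectangles into (x1, y1, x2, y2) tuples."""
    zones = []
    # json_data is None until the user has drawn something
    for obj in (json_data or {}).get("objects", []):
        if obj.get("type") != "rect":
            continue
        x1, y1 = obj["left"], obj["top"]
        # width/height are stored unscaled; scaleX/scaleY change if a shape is resized
        x2 = x1 + obj["width"] * obj.get("scaleX", 1)
        y2 = y1 + obj["height"] * obj.get("scaleY", 1)
        zones.append((int(x1), int(y1), int(x2), int(y2)))
    return zones


# Made-up example of the data after one rectangle has been drawn
example = {"objects": [{"type": "rect", "left": 40, "top": 60, "width": 200, "height": 120}]}
zones = rects_to_zones(example)
```

Because the canvas and the resized frames are both 640 by 360, coordinates produced this way can be compared with detection boxes directly, without rescaling.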
Step 3: Zone-Based Counting Logic
For every detected person, I calculated their bounding box center point and checked which zone they fell into:
python
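# Center of the person's bounding box
cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
for j, (zx1, zy1, zx2, zy2) in enumerate(zones):
    # Count the person in the zone that contains the center point
    if zx1 <= cx <= zx2 and zy1 <= cy <= zy2:
        zone_count[j] += 1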
This approach is simple and effective: because each person is reduced to a single center point, someone whose bounding box straddles the boundary between two adjacent zones is counted only once.
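To see the center-point rule on concrete numbers, here is the same logic wrapped in a function and run on made-up boxes and zones. The function name count_per_zone and the sample coordinates are illustrative.

```python
def count_per_zone(boxes, zones):
    """Count people per zone using the center of each bounding box.

    boxes: iterable of (x1, y1, x2, y2) person boxes
    zones: list of (zx1, zy1, zx2, zy2) rectangles
    """
    zone_count = [0] * len(zones)
    for x1, y1, x2, y2 in boxes:
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        for j, (zx1, zy1, zx2, zy2) in enumerate(zones):
            if zx1 <= cx <= zx2 and zy1 <= cy <= zy2:
                zone_count[j] += 1
    return zone_count


zones = [(0, 0, 100, 100), (101, 0, 200, 100)]  # two side-by-side zones
boxes = [
    (10, 10, 30, 60),      # center (20, 35): first zone
    (90, 20, 130, 80),     # box spans both zones, center (110, 50): second zone only
    (300, 300, 320, 360),  # center lies outside every zone
]
counts = count_per_zone(boxes, zones)
```

The comparisons are inclusive, so if two zones overlap or share an edge, a center point lying in both would be counted in both. Keeping drawn zones disjoint avoids that.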
Step 4: Role-Based Access Control (RBAC)
I implemented two roles:
User - can upload videos, draw zones, save zones, and run detection
Super Admin - can manage all users, view all zones, delete accounts, and review login history
The super admin panel gives full visibility into the system: who logged in, when, and what zones they configured. This is directly relevant to real-world deployments where accountability matters.
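A role check of this kind can be kept very small. The sketch below maps each role to the actions listed above; the action names and the can helper are hypothetical, and the superadmin set contains only what this article lists for that role.

```python
ROLE_PERMISSIONS = {
    "user": {"upload_video", "draw_zones", "save_zones", "run_detection"},
    "superadmin": {"manage_users", "view_all_zones", "delete_accounts", "view_login_history"},
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    # Unknown roles fall back to an empty permission set, so they are denied everything
    return action in ROLE_PERMISSIONS.get(role, set())


user_may_detect = can("user", "run_detection")
user_may_delete = can("user", "delete_accounts")
admin_may_audit = can("superadmin", "view_login_history")
```

Denying by default means a typo in a role name or a missing role in the session results in no access rather than accidental access.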
Step 5: Live Dashboard
As detection runs, three components update simultaneously: the video feed with overlays, a JSON count display, and a live bar chart. This gives operators an at-a-glance view of crowd distribution across all zones without reading raw numbers.
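The JSON view and the bar chart can both be driven from one small mapping of zone name to count. The sketch below shows that mapping; zone_snapshot is a hypothetical helper, and the Streamlit calls in the comments illustrate one possible wiring rather than the project's actual code.

```python
def zone_snapshot(zone_names, zone_count):
    """Pair zone names with their current counts."""
    return {name: int(n) for name, n in zip(zone_names, zone_count)}


snapshot = zone_snapshot(["Entrance", "Queue Area"], [4, 9])

# Inside the Streamlit app the same dict could feed both widgets, for example:
#   json_slot, chart_slot = st.empty(), st.empty()   # created once, before the frame loop
#   json_slot.json(snapshot)                          # replaced on every processed frame
#   chart_slot.bar_chart(pd.DataFrame({"count": snapshot}))
```

Writing into placeholders created before the loop replaces the previous output on each frame instead of appending a new widget every time.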
Challenges & How I Solved Them
Challenge 1: Model reloading on every interaction. Streamlit reruns the entire script on every user action. Initially, YOLOv8 was reloading on every button click, causing 10+ second delays. I fixed this with @st.cache_resource, which persists the model across reruns.
Challenge 2: Overlapping detections inflating counts. When people stood close together, bounding boxes overlapped and the center point logic occasionally miscounted. I mitigated this by tuning the confidence threshold and by keeping the center point method rather than checking whether any part of a bounding box overlapped a zone.
Challenge 3: Session state management in Streamlit. Because Streamlit reruns the script on every interaction, ordinary variables are reset each time. I used st.session_state extensively to preserve authentication status, user role, and zone data across interactions.
Challenge 4: Video processing speed. Processing every frame at full resolution was too slow. I resized all frames to 640x360 before passing them to the model, which significantly improved throughput without meaningfully impacting detection accuracy.
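The session-state handling from Challenge 3 usually comes down to setting defaults once and never overwriting them on later reruns. Here is a minimal sketch, assuming a state object that behaves like a dictionary (as st.session_state does); the key names and the init_state helper are illustrative.

```python
DEFAULTS = {"authenticated": False, "username": None, "role": None, "zones": []}


def init_state(state, defaults=DEFAULTS):
    """Fill in missing keys without touching values set during earlier reruns."""
    for key, value in defaults.items():
        if key not in state:
            # Copy lists so separate sessions never share one mutable default
            state[key] = list(value) if isinstance(value, list) else value
    return state


# In the app this would be init_state(st.session_state) at the top of the script.
state = {}
init_state(state)                                      # first run: defaults applied
state["authenticated"], state["role"] = True, "user"   # simulated login
init_state(state)                                      # simulated rerun: login survives
```

Calling the helper at the top of the script makes every rerun start from a known set of keys, so later code can read the login status without checking whether the key exists.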
Results
Successfully detected and counted people across multiple custom zones in real time.
Zone-based bar charts updated live, giving operators instant density visibility.
Role-based system correctly restricted user access while giving admins full oversight.
Secure authentication with Bcrypt ensured no plain text credentials were stored.
MongoDB persistence meant zones and login history survived across sessions.
What I Learned
Technical:
How to build a full-stack computer vision application end-to-end, from model inference to database persistence to live UI.
How YOLO's detection pipeline works internally and how to filter detections by class.
How to manage Streamlit session state for multi-user applications.
How to design role-based access control from scratch.
Non-Technical:
I took responsibility for the team, coordinated task distribution, and made architectural decisions under time pressure. Balancing technical execution with team coordination was a skill in itself.
Writing clear README documentation forced me to explain complex systems in simple language, a habit I've carried forward.
Try It Yourself
The project is open source under the MIT License.
GitHub: Crowd-Count-using-Video-Analysis
bash
git clone https://github.com/Sathwik464/Crowd-Count-using-Video-Analysis
cd Crowd-Count-using-Video-Analysis
pip install streamlit ultralytics streamlit-drawable-canvas opencv-python-headless pandas Pillow pymongo bcrypt
streamlit run dashboard.py
You'll need MongoDB running locally and a video file to upload.
Final Thoughts
Building CrowdCount taught me that production-ready systems aren't just about the core algorithm; they're about the reliability, security, and usability that surround it. YOLOv8 handles detection, but the value comes from making that detection actionable through zones, dashboards, and access control.
If you're a student looking to build something meaningful with computer vision, I'd highly recommend starting with a real-world problem like crowd monitoring. The combination of CV, backend, and UI work will stretch you in all the right directions.
Feel free to fork the repo, open issues, or reach out on LinkedIn if you have questions or ideas!


