University of Maryland at College Park
- B.S. Computer Engineering 2024
- Minor in Robotics and Autonomous Systems
- Cybersecurity Honors Program
View My LinkedIn Profile
View My GitHub Profile
Contact Me:
E: sulkunte@gmail.com
C: (301) 605-0719
Project description: I worked on the creation of the VAST (Video Analytics with Speech and Text) system. Over the course of the internship, I worked on several tasks, including a custom OCR (optical character recognition) program built with open-source technologies, automated metadata extraction and location mapping, and object recognition.
The biggest challenge with the OCR program was that it involved text detection in natural scenes, where the text is often unclear. To combat this, I manipulated images using OpenCV and augmented the training dataset to improve results on text that appears on objects like waving flags and angled signs. I used the EAST text detector to isolate regions of interest in video frames. Using OpenCV functions, I processed these regions (thresholding, binarization, and blurring) to better separate the text from the noisy background. I also measured the skew angle of the text and deskewed each region to improve recognition. The processed regions of interest were then passed to the open-source Tesseract OCR engine, and the text output was fed to our Elasticsearch database.
Object detection used a YOLOv3 model, and the headcounter used a ResNet50 model. Both programs ran on frames extracted from the videos of interest. The headcounter used density maps to estimate the number of people in each frame. The resulting data was fed to our Elasticsearch database.
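To show how a density map turns into a head count, here is a small NumPy sketch. It assumes the usual convention that each person contributes a blob of total mass 1, so summing the map gives the estimated count. The synthetic map stands in for a model's output, and the names `gaussian_blob` and `estimate_count` are hypothetical, not taken from the project.

```python
import numpy as np


def gaussian_blob(shape, center, sigma=4.0):
    """Return a 2D Gaussian centered at `center`, normalized to sum to 1."""
    rows, cols = np.indices(shape)
    squared_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    blob = np.exp(-squared_dist / (2.0 * sigma ** 2))
    return blob / blob.sum()


def estimate_count(density_map):
    """Estimated number of people is the integral (sum) of the density map."""
    return float(density_map.sum())


# Synthetic stand-in for a model's output: three heads in a 64x64 frame.
frame_shape = (64, 64)
head_positions = [(10, 12), (30, 40), (50, 20)]
density = sum(gaussian_blob(frame_shape, pos) for pos in head_positions)

print(round(estimate_count(density)))
```

Because the count is a sum over the whole map, it does not depend on separating individual people, which is why this approach suits crowded frames.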
I also aided in the development of other video processing models, such as our sentiment analysis model and our audio analysis model. I made the following visuals to illustrate the overall workflow and tech stack.