Friday, 21 August 2020

So I heard you can do Computer Vision at 30FPS; I can do 1000.

- Akash James



And there was a man, in a cave, held captive and hooked up to an electromagnet plunged deep in his chest. Hammering his way through, quite literally, Stark built his initial Arc Reactor and Mark 1 Iron Man suit, using nothing but a bucket of scrap and modern, tactical, self-guiding, explosive payload-carrying arrows, ergo missiles. Over-did it, didn’t I? Mesmerizing to most, the primitive propulsion system for unguided flight and the rudimentary weapons were not striking to engineers like us.

Stark kept going on, adding new capabilities to his armour, reaching peak performance with the Model Prime and finally calling it a day with the Mark 85. (More like Captain Marvel blasted him in Civil War 2 or the Gauntlet irradiated him, based on the cinematic or comic universe you prefer).

Just like arguably the best science-fiction-based inventor, I never stop with my creations and continue over-hauling for higher performance, ’cause I know that there will always be a higher ascension level to reach.

Computer Vision is a field of rapid progress, with new techniques and higher accuracy coming from developers across the planet. Machines now have human-like perception capabilities, thanks to Deep Learning, with the ability not only to understand and derive information from digital images but also to create images from scratch with nothing but 0’s and 1’s.

How did it begin?

Time and again, the higher tech-deities bring me to a point in this space-time continuum where I am faced with a conundrum. My team and I, back in our final year of college, were building a smart wearable for people with impaired vision, an AI-enabled extension of sorts to help the user with recognizing objects, recognizing people, and performing Optical Character Recognition; we called it Oculus. In all honesty, we did not rip it off from Facebook’s Oculus Rift VR headset; it was purely coincidental. The AI Engine comprised a multitude of classifiers, object detectors and image captioning neural networks running with TensorFlow and Python. With my simpleton knowledge of writing optimized code, everything was stacked sequentially, which kept us from getting results in real time, an absolute necessity for our wearable. Merely by running the entire stack on the GPU and using concurrent processes, I was able to achieve 30 fps and derive real-time results.
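To make the sequential-versus-concurrent point concrete, here is a minimal sketch, not the original Oculus code. The three "models" are hypothetical stand-ins that only sleep to simulate inference latency, and a thread pool is used for brevity where the original used concurrent processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real networks: each "model" just sleeps
# to simulate inference latency and returns a labelled result.
def make_model(name, latency_s):
    def run(frame):
        time.sleep(latency_s)
        return f"{name}:{frame}"
    return run

MODELS = [
    make_model("objects", 0.03),
    make_model("faces", 0.03),
    make_model("ocr", 0.03),
]

def run_sequential(frame):
    # Total latency is the sum of all model latencies.
    return [model(frame) for model in MODELS]

def run_concurrent(frame, pool):
    # Total latency approaches that of the slowest single model.
    futures = [pool.submit(model, frame) for model in MODELS]
    return [f.result() for f in futures]

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
    seq_result, seq_time = timed(run_sequential, "frame0")
    con_result, con_time = timed(run_concurrent, "frame0", pool)

print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

Both paths return the same results in the same order; only the wall-clock time per frame changes, which is what decides whether a stack of models keeps up with a live camera.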

Thus began my journey of being fast, real fast.

Ratcheting my way through

Fast forward two years to the present, I currently work as an AI Architect at Integration Wizards. My work predominantly revolves around creating a digital manifestation of the architecture I come up with for our flagship product, IRIS.

Wondering what exactly IRIS does? (being Deadpool and breaking the 4th wall) To give you a gist, IRIS is a Computer Vision platform which provides our customers with the ability to quickly deploy solutions that monitor and detect violations. People counting and tracking with demographics, adherence to safety gear usage, person utilization, detection of fire, automatic number plate recognition and document text extraction are some of the features that come out-of-the-box. 

Typically, IRIS plugs into existing CCTV networks, turning previously non-smart recording networks into real-time analytical entities. IRIS uses Deep Learning for its AI Engine, but the architecture of the pipeline and the neural networks has seen many changes. My first notable architecture involved web technologies, like Flask and Gunicorn, to create APIs that my worker threads could call. This ensured that the GPU was utilized in a better manner. However, this turned out to be moot when a large number of streams had to be processed.

The two primary hindrances were the API-based architecture becoming a bottleneck under higher loads and the object detection neural networks being heavy. I needed something better: a better queue and processing architecture along with faster neural nets. After Googling and surfing Reddit for a couple of days, I came across Apache Kafka, a publish-subscribe messaging system built for high data traffic. We retrofitted the architecture to push several thousand images per second from the CCTVs to the neural networks to derive our analytics. We also devised another object detection model that was anchor-less and ran faster while retaining detection accuracy. Of course, the benchmark was against the famous COCO dataset.
This increased our processing capability to close to 200 fps on a single GPU.
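The producer-consumer idea behind that change can be sketched in a few lines. This is an in-process stand-in built on Python's standard `queue` module, not Kafka and not the IRIS code; the camera count, batch size and queue bound are all assumed values chosen for illustration.

```python
import queue
import threading

BATCH_SIZE = 8      # hypothetical batch size for the detector
SENTINEL = None     # marks the end of the stream

def camera(cam_id, n_frames, topic):
    # Producer: publishes (camera id, frame index) messages to the topic.
    for i in range(n_frames):
        topic.put((cam_id, i))

def detector(topic, results):
    # Consumer: drains the topic in batches, as a GPU model would.
    batch = []
    while True:
        msg = topic.get()
        if msg is SENTINEL:
            break
        batch.append(msg)
        if len(batch) == BATCH_SIZE:
            results.append(list(batch))
            batch.clear()
    if batch:
        results.append(list(batch))  # flush the final partial batch

# Bounded queue: when the consumer falls behind, producers block on put().
topic = queue.Queue(maxsize=64)
results = []
consumer = threading.Thread(target=detector, args=(topic, results))
consumer.start()

producers = [threading.Thread(target=camera, args=(c, 10, topic)) for c in range(3)]
for p in producers:
    p.start()
for p in producers:
    p.join()
topic.put(SENTINEL)
consumer.join()

total = sum(len(b) for b in results)
print(f"{total} frames processed in {len(results)} batches")
```

Decoupling the cameras from the detector this way means no camera waits on an API round trip, and the consumer can always hand the GPU a full batch when traffic is high.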

The Turning point

Yes, you guessed it, I didn’t stop there. I knew that there was much more fire-power I could get; accessible but hidden in the trenches of Tensor cores and C++ (such a spoiler). The deities were calling me and my urge to find something better kept me burning the midnight oil. And then, the pandemic happened.


WHO declared COVID-19 a global emergency; it ravaged multiple countries and fear was being pushed down people’s throats. Most offices transitioned into an indefinite work-from-home status and India imposed the world’s largest lockdown. Wearing masks and social distancing were the new norm and everybody feared another Spanish flu of 1918.

As an organization, we work with AI to be an extension of man, helping the human race to be better. Usage of face masks and social distancing needed enforcement and what better way to do it than with AI? Our stars aligned, the goals matched and we knew what we needed to build. The solution had to be light-weight and fast enough to run on low-end hardware or run on large HPC machines to analyze hundreds of CCTV cameras at once. For this, we needed an efficient pipeline and highly optimized models.

Hitting 1000 with Mask Detection and Social Distancing Enforcement

By now, I had a few tricks up my sleeve. IRIS’ pipeline now harnesses elements of GStreamer, an open-source, highly optimized framework for processing image and video media. We used TensorRT to speed up our neural networks on NVIDIA’s GPUs and squeeze out every ounce of performance. The entire pipeline is written in C++ with CUDA-enabled code to parallelize operations. Finally, the models are light-weight: the person detector uses a smaller ResNet-like backbone and our Face Detector is just 999 kilobytes in size with a 95% result on the WiderFace dataset. Both detectors are INT8 and FP16 quantized, making them much faster. With quantization and the entire processing pipeline running on the GPU, IRIS’ new and shiny COVID-19 Enforcer ran at 1000 fps at peak performance for Social Distancing and 800 fps for Social Distancing and Mask Detection combined.
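For readers new to quantization, the basic idea can be shown in plain Python. This is only a sketch of symmetric INT8 rounding and an FP16 round trip on made-up weights; it is not how TensorRT is invoked, and TensorRT's own INT8 calibration is more involved than this.

```python
import struct

def quantize_int8(values):
    # Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127].
    # Falls back to a scale of 1.0 if every value is zero.
    scale = (max(abs(v) for v in values) / 127.0) or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate float values from the 8-bit integers.
    return [x * scale for x in q]

def to_fp16(value):
    # Round-trip through IEEE 754 half precision using the struct "e" format.
    return struct.unpack("<e", struct.pack("<e", value))[0]

weights = [0.82, -1.27, 0.003, 0.5, -0.41]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_int8_error = max(abs(a - b) for a, b in zip(weights, restored))
fp16_weights = [to_fp16(w) for w in weights]

print("int8 values:", q)
print("largest int8 round-trip error:", max_int8_error)
print("fp16 values:", fp16_weights)
```

Each weight shrinks from 32 bits to 8 or 16, and the rounding error stays within half a quantization step, which is why a well-calibrated model can run much faster with little loss in accuracy.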

This allows us to deploy IRIS on smaller embedded devices to provide a cost-effective solution for retail chains and stand-alone stores, while letting us utilize multi-GPU setups to cover warehouses, shopping malls and city-wide CCTV networks, making it easier to comply with safety norms and curb the spread of infection.
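A rough way to read those throughput numbers is as a camera budget. The sketch below divides the peak figures quoted above by a per-camera sampling rate; the sampling rates are assumptions for illustration, and sustained throughput in a real deployment would sit below the peak.

```python
def cameras_supported(pipeline_fps, per_camera_fps):
    # Whole cameras that fit into the pipeline's frame budget.
    return pipeline_fps // per_camera_fps

# Peak figures quoted in the post.
PEAK_FPS = {"social_distancing": 1000, "distancing_plus_masks": 800}

# Assumed per-camera analysis rates in frames per second (hypothetical).
SAMPLING_RATES = [5, 10, 25]

budget = {
    name: {rate: cameras_supported(fps, rate) for rate in SAMPLING_RATES}
    for name, fps in PEAK_FPS.items()
}

for name, per_rate in budget.items():
    for rate, cams in per_rate.items():
        print(f"{name}: {cams} cameras at {rate} fps each")
```

Under these assumptions, analyzing each feed at 5 fps fits 200 cameras into the social distancing pipeline, which is where "hundreds of CCTV cameras at once" becomes plausible.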

So what’s next?

I am not done. Achieving one milestone allows me to mark a bigger and better goal. Artificial Intelligence is in its infancy, and being at the forefront of making it commercially viable and available in all markets, especially India, has been my vision and my organization’s. The endgame is to have AI for all, where people, be it developers or business owners, have the ability to quickly design and deploy their own pipelines.

IRIS aims to be precisely the platform that empowers individuals to do that, with the intention of democratizing Artificial Intelligence, making it not a luxury for the few but a commodity for all.

Chiselling AI agents to be the best tool that man has ever known will be our goal, paving the future with a legion of Intelligent agents, not making the world cold, but making us a smarter race. Ain’t nobody creating Ultron!

Thursday, 6 August 2020

What can Enterprises do to adapt to the new normal?

- Apoorva Verma



March 2020 drastically changed businesses, the global economy, and every aspect of our daily lives. As businesses pivoted to home offices, a lot of things changed. Now, as economies begin to reopen, “new normal” has become the buzzword that everyone is talking about. Yet industries such as warehousing, manufacturing and construction have to bring a large number of workers together on their premises every day, as those workers cannot all work from home given the manual nature of their jobs.

The need for risk assessment

This introduces new risks, and risk assessment needs to be carefully re-evaluated. As most such companies try to identify and manage risks within their premises, the advent of COVID-19 has put the health and safety of the workers under even more scrutiny.

Risk assessment can help enterprises put controls in place that can prevent the spread of contagious diseases such as COVID-19 as well as other accidents and injuries. Since all organisations and industries are different, they take different approaches to carry out a risk assessment.

However, you could carry out the process in these five broad steps:

  1. Identify potential hazards in the premises
  2. Identify who could be at risk from those hazards
  3. Implement control measures by managing the risks
  4. Record the findings of your assessment
  5. Review the risk assessment on a regular basis

Also, it is better to involve ground-level workers in this process to ensure that you implement controls that are effective and safe.

Can technology help?

As most premises are fitted with CCTV cameras, it is best to turn these passive tools into active analytical tools. With a surge in the adoption of AI, technology such as computer vision can be deployed to implement and ensure workplace safety.

Computer vision is not only a contactless solution, but it is also free from human errors and prejudices. In addition to marking attendance without contact, eliminating biometric machines, it can detect face mask compliance, measure a social distancing index, check PPE compliance, and more. In case of a breach, real-time alerts can be sent to the right authorities for immediate action. Such a system can improve operations in warehouses as much as it can ensure compliance in a manufacturing setting.
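As an illustration of what a social distancing index could look like, here is a minimal sketch. The definition used (the share of detected people who keep the minimum distance from everyone else), the 2-metre threshold, and the assumption that positions are already projected onto the ground plane in metres are all hypothetical, not the product's actual method.

```python
import math
from itertools import combinations

def distancing_violations(positions, min_distance_m=2.0):
    # Indices of people standing closer than the threshold to someone else.
    violators = set()
    for (i, a), (j, b) in combinations(enumerate(positions), 2):
        if math.dist(a, b) < min_distance_m:
            violators.update((i, j))
    return violators

def distancing_index(positions, min_distance_m=2.0):
    # 1.0 means everyone keeps their distance; 0.0 means nobody does.
    if not positions:
        return 1.0
    violators = distancing_violations(positions, min_distance_m)
    return 1.0 - len(violators) / len(positions)

# Made-up ground-plane positions (x, y) in metres for four detected people.
people = [(0.0, 0.0), (1.0, 0.5), (5.0, 5.0), (9.0, 1.0)]
violators = distancing_violations(people)
index = distancing_index(people)

print("violators:", sorted(violators))
print("distancing index:", index)
```

A real-time alert would then be a simple threshold on this index, or on the violator set for a given camera zone.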

Since the system has constant access to data from live feeds, it can also generate hidden insights on machine utilization and on the efficiency and productivity of staff and machines. Win-win, isn’t it?

Wednesday, 5 August 2020

Biometric is passé, Go Contactless

- Apoorva Verma


Consider a scenario where a large steel plant with thousands of workers has restarted after the pandemic. The biometric machine for marking their attendance now becomes a high risk. Given the highly contagious nature of the novel coronavirus, the use of biometric machines has become, more or less, a threat to safety.

In addition to workplaces, the Indian government has encouraged the use of face recognition technology and it is being explored in airports, railway stations, schools, universities, and other places frequented in groups. For example, Rajiv Gandhi International Airport at Hyderabad recently became the first airport in India to initiate facial recognition while Bengaluru, Manmad and Bhusawal railway stations are in testing stages for implementing the face recognition technology.

In a scenario where touch is avoided as far as possible, contactless has become the need of the hour for most enterprises. In fact, a spike is predicted in the adoption of facial recognition technology.

With the pandemic seeing ubiquitous adoption of masks and other protective gear that may partially cover faces, a solution that can recognise faces even with masks will meet the current demands of the world.

LogMyFace is an app owned by Integration Wizards Solutions, developed to help enterprises manage employee attendance while being safe and convenient for employees. Unlike a biometric machine, which requires touch, this app marks attendance from phones and tablets.

Using a phone for facial recognition, the employees can log their attendance from any of the predefined locations - be it client location, office, or home.

Also, if synced with the company’s CCTV cameras at entry/exit, it can recognize faces from a distance of two metres or less. It can track and log faces and detect liveness, gender, age and emotions. It offers 98% accuracy and provides reports and dashboards for detailed audits.

Connect with our team today and unlock the possibilities to explore 'contactless attendance' in your organisation.