Real-time face detection and tracking systems often suffer from low accuracy and slow processing speed, which lead to poor robustness. This problem is critical in real-time settings such as human-robot interaction (HRI) and video analysis systems. This paper presents a margin-based region of interest (MROI) approach to reduce processing time. A hybrid approach is also presented that combines Multi-task Cascaded Convolutional Networks (MTCNN) with template matching to improve face detection accuracy. The MROI approach is presented in two variants, one with a fixed margin and one with a dynamic margin. The dataset used in this work comprises twenty RGB video files. Each video is fifteen seconds long and was created from real-life footage of a person delivering a lecture; in each video the person moves and turns their head, and the camera also moves. The highest face detection and tracking accuracy achieved is 99.31%, with a processing time of 14.93 milliseconds per frame. The proposed hybrid algorithm significantly improves the detection and tracking of faces in sideways orientations in addition to frontal faces. The proposed algorithm can process more than 65 frames per second (FPS). Compared with conventional MTCNN full-frame scanning, the presented system increases FPS by more than 400% and also improves accuracy.
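
To make the MROI idea concrete, the following is a minimal Python sketch, not the implementation used in this work. It assumes that the region of interest for the next frame is the previously detected face box padded by a margin and clipped to the frame. The function names, the 40-pixel fixed margin, and the rule that the dynamic margin is half of the larger face dimension are all hypothetical choices made for illustration; the abstract does not specify how either margin is computed. The last helper converts the reported per-frame time into frames per second.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def expand_roi(face: Box, frame_size: Tuple[int, int], margin: int) -> Box:
    """Pad a face box by `margin` pixels on every side, clipped to the frame."""
    x, y, w, h = face
    frame_w, frame_h = frame_size
    x0 = max(0, x - margin)
    y0 = max(0, y - margin)
    x1 = min(frame_w, x + w + margin)
    y1 = min(frame_h, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)


def fixed_margin_roi(face: Box, frame_size: Tuple[int, int],
                     margin_px: int = 40) -> Box:
    """Fixed-margin variant: a constant padding (assumed value, in pixels)."""
    return expand_roi(face, frame_size, margin_px)


def dynamic_margin_roi(face: Box, frame_size: Tuple[int, int],
                       scale: float = 0.5) -> Box:
    """Dynamic-margin variant: padding proportional to the face size.

    The proportional rule and the default scale are assumptions made
    for this sketch, not values taken from the paper.
    """
    _, _, w, h = face
    return expand_roi(face, frame_size, int(scale * max(w, h)))


def fps_from_ms(ms_per_frame: float) -> float:
    """Convert a per-frame processing time in milliseconds to FPS."""
    return 1000.0 / ms_per_frame


if __name__ == "__main__":
    last_face = (100, 80, 60, 60)   # hypothetical detection from the previous frame
    frame = (640, 480)              # hypothetical frame size (width, height)
    print(fixed_margin_roi(last_face, frame))
    print(dynamic_margin_roi(last_face, frame))
    print(round(fps_from_ms(14.93), 2))
```

In a pipeline of this kind, the detector would scan only the returned region in the next frame rather than the full frame, which is where the time saving would come from. The conversion helper also shows that 14.93 milliseconds per frame corresponds to roughly 66.98 FPS, consistent with the reported figure of more than 65 FPS.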