Bird species classification has received increasing attention in computer vision because of its promising applications in biology and environmental studies. Recognizing bird species is difficult because it requires both localizing discriminative regions and learning fine-grained features. In this paper, we introduce a transfer-learning-based method with multistage training. We use a pre-trained Mask R-CNN together with an ensemble of Inception networks (InceptionV3 and InceptionResNetV2) to both localize the bird in an image and identify its species. We tested our model on an Indian bird dataset consisting of variable-size, high-resolution camera images taken in various environments (e.g., day, noon, and evening) with different perspectives and occlusions. Our final model achieves an F1 score of 0.5567 (55.67%) on that dataset.
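
As a rough illustration of the two quantities mentioned above, the following minimal sketch shows one way to combine the class probabilities of two classifiers and to compute an F1 score. The abstract does not say how the two Inception networks are combined or which F1 averaging is reported, so the plain averaging of probabilities and the macro-averaged F1 used here are assumptions, and the function names `average_ensemble` and `macro_f1` are hypothetical.

```python
from typing import List, Sequence


def average_ensemble(prob_a: Sequence[float], prob_b: Sequence[float]) -> int:
    """Average two per-class probability vectors and return the winning class index.

    Assumption: the two networks are combined by simple averaging.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    avg = [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]
    return max(range(len(avg)), key=avg.__getitem__)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In this sketch, each probability vector would come from one of the two fine-tuned classifiers applied to the region localized by the detector, and the predicted labels collected over the test set would then be passed to `macro_f1`.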