Predicting COVID-19 Cases Using Machine Learning

Sahit Tallapragada
7 min read · Aug 3, 2020

Experience Working at Verzeo

I joined Verzeo as an intern. Because of COVID-19, the entire country was under lockdown, so the internship ran online, with training sessions on the web portal and mentoring over video call. Before getting into this field, I had very mixed thoughts about what it would be like to be a developer in this industry. The internship gave me complete freedom to learn, explore new skills and build my knowledge, and I am glad to have had the opportunity to gain this experience.

So far in my internship at Verzeo I have had a lot of hands-on experience, which has been one of my favorite parts. It was a great learning experience, and I came to the conclusion that I could see myself building things like this with machine learning algorithms in the future. The project was well designed, and it also taught us more about the current pandemic. Our mentor trained us to handle each task the way it is done in the company and shared the company's core values with us.

I worked on this project in a team with Mr. Satya Kunda. He and our mentor gave me immense support and motivation in making the project a success. I believe that contributing equally to the work is what let us achieve what we set out to do. We are very thankful to Verzeo for giving us this opportunity, and I thank Mr. Satya Kunda for his huge support in completing the project.

About Company

Verzeo is an AI-based online learning platform that gives students a far-reaching learning experience to help prepare them for industry. With access to industry experts, online courses and blended learning, it lets students Learn Here and Lead Anywhere.

Verzeo has partnered with select big names to build a distinctive platform. With AI-based software at its core, it offers a connected ecosystem that is accessible from anywhere and by anyone.

Verzeo acts as a guide for students, creating channels to unlock their learning potential. It provides access to a wide range of training programs, hackathons, and projects. These activities are hands-on and practice oriented, and they give access to peers and experts. With Verzeo, you can secure internships and placement opportunities smoothly.

Learning through Verzeo is fun, practical and supportive, encouraging students to Lead Anywhere, Anyplace and Everywhere.

Project Overview

The machine learning project we were given is “Prediction of COVID-19 Cases”: we have to predict the increase or decrease in cases at a location on a day-to-day basis.

According to the problem statement given by the company, we were told to take India as the location. The dataset was given to us by the company. We applied machine learning algorithms using Python on the Google Colab platform, and ran Linear Regression and a Random Forest Regressor.
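As a rough sketch of that first step, loading the data and keeping only the rows for India might look like the code below. The real file is not shown here, so the in-memory CSV is a made-up stand-in, and every column name except "total_cases" (including "location" and "date") is an assumption.

```python
import io

import pandas as pd

# Made-up stand-in for the company's CSV. The values are illustrative only,
# and the column names other than "total_cases" are assumptions.
SAMPLE_CSV = """location,date,total_cases,new_cases
India,2020-07-01,100,10
India,2020-07-02,120,20
Elsewhere,2020-07-01,300,30
India,2020-07-03,150,30
"""


def load_location(csv_source, location="India"):
    """Read the CSV and keep only the rows for one location."""
    df = pd.read_csv(csv_source, parse_dates=["date"])
    return df[df["location"] == location].reset_index(drop=True)


india = load_location(io.StringIO(SAMPLE_CSV))
print(india.shape)                 # rows and columns left after filtering
print(india["location"].unique())  # unique labels in the location column
```

In Colab, `csv_source` would be the path of the uploaded file instead of the `io.StringIO` object.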

To do this, we trained on the dataset in Google Colab and, by taking the best accuracy score, were able to predict how the case counts change. The company gave us logistic regression and random forest regressor algorithms to train on our dataset, and we achieved a 78% accuracy score with linear regression.

Solution

Using machine learning techniques, the computer learns to use patterns, or “training samples”, in processed data to predict or make intelligent decisions without being explicitly programmed. Time series are data sequences collected over time, which can be used as inputs to machine learning algorithms. This kind of data reflects the changes that a phenomenon has gone through over time.

We imported the pandas library in Python, then pre-processed and cleaned the data using the mode, mean and median. More precisely, for numerical columns we used the mean to replace null values, and for categorical columns we used the mode. When more than 50% of a column's values were null, the column was dropped from the training dataset. The date column was converted to an ordinal (numerical) column so it could be used for training. All the categorical columns were dropped. The “total_cases” column was selected as the target variable, and the other columns, including the date column, were selected as features.
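The cleaning steps above can be sketched roughly as follows. This is a minimal version under stated assumptions, not the exact notebook code: the sample frame and its column names (other than "total_cases") are made up, and the mode imputation for categorical columns is skipped because those columns are dropped anyway.

```python
import numpy as np
import pandas as pd


def preprocess(df, target="total_cases", date_col="date", null_threshold=0.5):
    """Clean the frame and return (features, target)."""
    # Drop columns where more than half of the values are missing.
    null_share = df.isna().mean()
    df = df.loc[:, null_share <= null_threshold].copy()
    # Turn the date into an ordinal number so a regressor can use it.
    df[date_col] = pd.to_datetime(df[date_col]).apply(lambda d: d.toordinal())
    # Drop the remaining categorical (non-numeric) columns.
    df = df.select_dtypes(include="number")
    # Replace remaining nulls in numeric columns with the column mean.
    df = df.fillna(df.mean())
    return df.drop(columns=[target]), df[target]


# Made-up sample; "total_tests" is 75% null, so it should be dropped.
raw = pd.DataFrame({
    "location": ["India"] * 4,
    "date": ["2020-07-01", "2020-07-02", "2020-07-03", "2020-07-04"],
    "total_cases": [100.0, 120.0, 150.0, 190.0],
    "new_cases": [10.0, np.nan, 30.0, 40.0],
    "total_tests": [np.nan, np.nan, np.nan, 500.0],
})

X, y = preprocess(raw)
print(X)
print(y)
```

The 0.5 threshold mirrors the 50% rule described above; it is exposed as a parameter only so that the sketch is easy to experiment with.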

Using the sklearn library, we split the dataset into 70% for training and 30% for testing. The Linear Regression algorithm was used to train the model, and it achieved an accuracy score of 78%. The Random Forest Regressor was also tried, and its accuracy was 22%, which was low. Based on that, we chose Linear Regression to predict the total cases for new dates.
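A minimal sketch of the split-and-compare step is shown below. It runs on synthetic data, so the scores it prints say nothing about the 78% and 22% reported above. One detail worth knowing: for scikit-learn regressors, `score` returns the R² coefficient of determination rather than a classification accuracy. That the scores in this post were computed this way is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 consecutive ordinal days and a noisy,
# roughly linear case count. None of these numbers come from the real dataset.
rng = np.random.default_rng(0)
days = np.arange(737600, 737700)
X = pd.DataFrame({"date": days})
y = pd.Series(50.0 * (days - days[0]) + rng.normal(0, 25, size=days.size),
              name="total_cases")

# 70% of the rows for training, 30% for testing (shuffled by default).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, score() is R^2 on the held-out rows.
    scores[name] = model.score(X_test, y_test)

best = max(scores, key=scores.get)
print(scores)
print("best model:", best)
```

Fixing `random_state` keeps the split and the forest reproducible between runs, which makes the two scores easier to compare.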

1. Load the COVID-19 dataset.

2. The COVID-19 dataset has 33 columns.

3. The company instructed us to focus on India, so we took India as the location.

4. Take a glimpse of the subset with India selected as the location.

5. Find the unique labels.

6. Drop all categorical columns.

7. Pre-process the data by handling missing values.

8. Drop the test units column, because more than 50% of its values are null.

9. Convert the date column to ordinal.

10. Select the “total_cases” column as the target variable (y).

11. Split the data for training.

- Using the sklearn library, the dataset is split in a 70–30 ratio.

12. Fit the Linear Regression model.

13. Fit the Random Forest Regression model.

Linear Regression has the highest accuracy, 78%, so this model is best suited to predict new cases.
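To illustrate how a fitted linear model can be used for new dates, here is a small sketch. It assumes the date is the only feature, which is a simplification of the setup described above, and the history values are made up to be perfectly linear so the behaviour is easy to follow.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up history: five days with a perfectly linear case count.
history = pd.DataFrame({
    "date": pd.date_range("2020-07-01", periods=5, freq="D"),
    "total_cases": [100.0, 110.0, 120.0, 130.0, 140.0],
})


def to_ordinal(dates):
    """Convert dates to the ordinal feature the model was trained on."""
    return pd.DataFrame({"date": [pd.Timestamp(d).toordinal() for d in dates]})


model = LinearRegression().fit(to_ordinal(history["date"]), history["total_cases"])

# New dates must go through the same ordinal conversion before predicting.
new_dates = ["2020-07-06", "2020-07-07"]
forecast = pd.Series(
    model.predict(to_ordinal(new_dates)),
    index=new_dates,
    name="predicted_total_cases",
)
print(forecast)
```

Because the made-up history grows by 10 cases a day, the fitted line simply continues that trend for the two new dates.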

Conclusion

We can conclude that one of the models, linear regression, was able to predict the case counts with an accuracy of 78%. Today's world is not under our control because of the coronavirus, also known as COVID-19. What makes it more dangerous is that this is the first time the virus has been seen in humans, and more threatening still, there is so far no vaccine to keep us protected. Knowing the situation nearby is therefore very important right now, so that people can avoid visiting places where cases have been found.

Our models make it easier for people to know the situation near them. Everyone has their own fear of going out and leading a normal life as before, because it is not safe. This essay set out to use machine learning models for pandemic analysis on a dataset from Verzeo. We trained the models in Google Colab and kept the one with the best accuracy score, which let us predict how the cases vary day by day. With the trained model, people can get to know whether there are cases near them. That helps them live with a little more peace if there are none, and take extra care if there are. It also helps them understand whether an area is safe.

The company also provided logistic regression and random forest regression algorithms to train on our dataset, and we achieved the best accuracy score, 78%, with linear regression. In India, predictions like this can help bring down the rise in COVID-19 cases by reducing how many susceptible people come into contact with infected people. This new normal is achievable by limiting social contact and following the lockdown rules with discipline.
