

The Complete Collection Of Data Repositories - Part 1

 


Check out the collection of the best data repositories on agriculture, audio, biology, climate, computer vision, economics, education, energy, finance, and government.



Image by Author

 

Editor's note: For the full scope of repositories included in this two-part series, please see The Complete Collection Of Data Repositories – Part 2.

 

Finding data that works for your business can take a lot of time. Several data-sharing platforms offer a wide variety of datasets, but they rarely cover any specific field of study in depth. That's why I have created a list of data repositories to help you find the dataset you need without endless searching on the internet. Each data repository contains multiple datasets for a particular field of study.

The collection of data repositories is divided into two parts, which together cover 20 categories based on various fields of science. Most of the data sources listed below are free, but some are not. It took me more than two days to collect these repositories, choosing ones that are high quality and easy to download. I used duckduckgo.com for much of the searching, but the majority of the repositories come from Awesome Public Datasets and KDnuggets.

 
In the first part, we will cover:

  1. Agriculture
  2. Audio
  3. Biology
  4. Climate
  5. Computer Vision
  6. Economics
  7. Education
  8. Energy
  9. Finance
  10. Government

 

Agriculture

 
In this category, the datasets mostly relate to crop monitoring, remote sensing indices, grain size, geochemistry, and soil and sediment analysis. The data is mostly tabular, but you can also find visual data for monitoring crops and detecting weeds in crop fields.

 

Audio

 
The audio repositories are rich and can be used for automatic speech recognition, text-to-speech, song classification, emotion detection, translation, and hate speech detection. They are a gold mine for beginners and mid-size companies looking to develop state-of-the-art solutions.

 

Biology

 
The biology category mostly consists of images of cells (including cancer cells), genome types, genes, and protein structures. You can use them to study virus strains or to support research into life-saving drugs. Most of the datasets are intended for research and can be downloaded directly.

 

Climate

 
The climate repositories contain satellite imagery, time-series data on wind and temperature, global weather data, and spatial climate data. You can use them to forecast weather, monitor the effects of global warming, and detect natural disasters.

 

Image by Freepik

 

Computer Vision

 
Computer vision is in high demand. Companies are developing all kinds of solutions to improve current processes or create new services, such as warehouse management, self-driving cars, face detection, generative art, and robotics.

 

Economics

 
The economics repositories contain world trade statistics, Human Development Index data, geospatial data on food supplies, and macroeconomic data. You can use them to analyze current trade deficits and forecast countries' development.

 

Education

 
In the education category, you can find data on student assessments, report cards, college performance, graduation rates, and surveys completed by individual students, school principals, and parents.

 

Energy

 
The energy category covers global power consumption, smart-meter data from various buildings, and power stations' energy production rates. You can use it to plan renewable-energy deployments, cut electricity costs, and meet the growing global demand for energy.

 

Image by rawpixel.com

 

Finance

 
In this section, you can find data on debt, banking statistics, GDP, exchange rates, consumer prices, and much more. Finance is the backbone of the modern economy, and this data can help keep it stable: you can use it to anticipate the next financial crisis, detect financial crime, and forecast stock prices.

 

Government

 
You can find government data for almost any country, state, or even county. Many governments promote fairness and inclusiveness by sharing their data with the public. The most prominent datasets come from the US, India, Canada, New Zealand, and the UN, and they cover everything from crime to food security.

 

Conclusion

 
In this blog post, we have covered 10 categories of data repositories, along with the types of datasets they hold and their use cases. These datasets are a goldmine, and many of them are hard to find on Kaggle or other general-purpose sites. Most data scientists search Kaggle or Google for a dataset and settle for whatever they find. They then spend most of their time cleaning and augmenting that data instead of looking for better sources. A curated collection changes that: I now use my collection of repositories to find what I am looking for.
In the second part, we will look at healthcare, natural language, neuroscience, physics, social networks, sports, time series, transportation, miscellaneous, and super data repositories.
