SAMPLE BIG DATA CSV FILES

Data Community

Crime in Denver: What are the most frequent crimes?
Rows:
496,402
Post Date:
Sep 9, 2022
Data Description and Source

The data was collected here, and became available here. With this information, we can better understand which crimes are occurring, and where. That lets us prepare for these crimes and try to prevent them. Keep in mind that this information covers only Denver, and crimes likely vary significantly by state. However, we could combine this data with other data to understand how crimes vary.

Source
Fantasy Draft '22 Cheat Sheet: Win your draft using these consolidated rankings!
Rows:
655
Post Date:
Sep 2, 2022
Data Description and Source

Fantasy Pros
• Depth Charts: https://www.fantasypros.com/nfl/depth-charts.php
• Fantasy Football Draft Rankings: https://www.fantasypros.com/nfl/rankings/half-point-ppr-cheatsheets.php
Roto Baller
• Rookie Rankings: https://www.rotoballer.com/updated-nfl-rookies-fantasy-football-rankings-for-2022-drafts-september-1st/1061141
Yahoo
• Fantasy Rankings: https://sports.yahoo.com/fantasy-football-draft-rankings-for-2022-nfl-season-142617825.html
CBS
• https://www.cbssports.com/fantasy/football/rankings/ppr/top200/

Source
Wordle Answers: Cheater, cheater, pumpkin eater!
Rows:
2,309
Post Date:
Sep 1, 2022
Data Description and Source

The New York Times maintains an updated list of the potential five-letter solves. This list can be found here: https://static.nytimes.com/newsgraphics/2022/01/25/wordle-solver/assets/solutions.txt.

Source
League of Legends: Master Rank game data. No noobs allowed!
Rows:
107,125
Post Date:
Aug 19, 2022
Data Description and Source

This dataset contains information on over 100,000 games played in the Master rank. This information includes the match’s game ID, duration, and both teams’ kills and objectives, such as first blood, first tower, first baron, total number of dragon kills, total wards placed and killed, total kills, assists, and deaths, total gold gained, and more. This dataset was collected from Kaggle and can be found here. The data can be used to visualize how games at high ranks play out and to examine the many factors that could go into a team’s win or loss, such as which objectives correlate most strongly with winning.

Source
IMDB Movie Reviews: Understand what to watch next.
Rows:
7,357,888
Post Date:
Aug 4, 2022
Data Description and Source

This dataset includes over seven million TV shows and movies from the IMDB dataset. The data includes the primary title, original title, title type (TV show, movie, etc.), start year, end year, runtime, genre, and more. With this information, you can find the perfect TV show or movie for any audience. It is a great place to find new films, especially if you know broadly what you are looking for. We can also look at this dataset analytically and try to understand how the titles of movies are changing over time. This data was provided by IMDB here.

Source
Motorcycle Data: What is the fastest motorcycle?
Rows:
38,624
Post Date:
Aug 2, 2022
Data Description and Source

This dataset includes bike models, old and new, as well as stats about each model. This dataset was scraped from Bikez.com, and became available here. With this dataset, individuals interested in motorcycles, or the specs surrounding them, can research whatever they want. Whether you are looking for a new bike, or just want to know more, this dataset provides the tools for an extremely thorough look at each motorcycle, and at all bikes as a whole.

Source
Cost of College: The cost of higher education
Rows:
3,203
Post Date:
Jul 26, 2022
Data Description and Source

This dataset was compiled from the National Center for Education Statistics Annual Digest and became available here. Specifically, it comes from Table 330.20: Average undergraduate tuition and fees and room and board rates charged for full-time students in degree-granting postsecondary institutions, by control and level of institution and state or jurisdiction.

This dataset gives us a significantly better understanding of the costs involved in a college degree.

Further analysis we could do with such a dataset includes:
• Finding the state with the lowest average room and board price.
• Comparing our findings in this dataset with another that includes other information about schools in specific states.
• Determining the state with the best return on investment, based on average first-year salary compared to costs.

Source
Anime Analytics: Analyzing the global anime scene
Rows:
24,012
Post Date:
Jul 21, 2022
Data Description and Source

This data was scraped from MAL (MyAnimeList), and then became available here. MyAnimeList is an anime and manga social networking and social cataloging application website run by volunteers. The site provides its users with a list-like system to organize and score anime and manga. With this information, users can find out more about their favorite anime, look for new anime to watch in the future, or even use an analytical lens and try to understand what makes successful shows successful. This dataset includes the title, type, mean rating, number of scoring users, status, number of episodes, start date, end date, source, and so much more.

This dataset was built using the Philadelphia Federal Reserve's State Coincident Indices and the Bry-Boschan Method for business cycle dating. It then became available here. In the tradition of Owyang, Piger, et al., business cycles are calculated on the state level, which provides interesting analysis opportunities for looking at recession timing for different regions or sectors present in different states. With this information, we could look to predict future recessions, or try to understand why they occurred in the past.

Source
Air Quality: What we breathe is important.
Rows:
5,617,325
Post Date:
Jul 19, 2022
Data Description and Source

AQI, or Air Quality Index, is the primary way to measure the current quality of the air. AQI values range from 0-500, with 0 being perfectly healthy and 500 being extremely hazardous. AQI values are derived from moving averages/current values of PM2.5 (particulate matter), PM10, Ozone, Carbon Monoxide, Sulfur Dioxide, and Nitrogen Dioxide levels. This dataset was created using Locational Data from: https://simplemaps.com/data/us-cities and AQI Data from: https://aqs.epa.gov/aqsweb/airdata/download_files.html. The dataset then became available here.

With this dataset, we can look at what areas are the most dangerous, how the air quality changes over time, and how an area’s pollution levels change over time.
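
To make the 0-500 scale easier to read when exploring the data, a small helper can translate an AQI value into a named band. The sketch below assumes the standard US EPA category names and breakpoints; the description above does not say how the dataset itself labels AQI values, and `aqi_category` is a hypothetical helper, not part of the dataset.

```python
from bisect import bisect_left

# Upper bound of each AQI band, paired by position with its category name.
# Assumption: these are the standard US EPA bands, not labels from the dataset.
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300, 500]
AQI_CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def aqi_category(aqi: int) -> str:
    """Return the category name for an AQI value between 0 and 500."""
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI must be between 0 and 500, got {aqi}")
    # bisect_left finds the first band whose upper bound is >= aqi.
    return AQI_CATEGORIES[bisect_left(AQI_UPPER_BOUNDS, aqi)]


if __name__ == "__main__":
    for value in (12, 75, 180, 420):
        print(value, aqi_category(value))
```

Grouping rows by this label, rather than by raw AQI, is one simple way to compare how often different areas fall into the more hazardous bands.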

Source
Health Insurance Prices: Transparency in Coverage Data
Rows:
5,446,583
Post Date:
Jul 15, 2022
Data Description and Source

The file comes from Aetna's Machine Readable Transparency in Coverage website. We downloaded the first Life Insurance file:
• Plan Name: Aetna CVS Bronze: Low-Cost MinuteClinic Visits- Telehealth- Roanoke- Ped Dental
• File Name: 2022-07-01_f42d21fd-3576-4569-b0c7-20253bccc7fe_Aetna-Life-insurance-Company.json.gz
• File Type: In Network Rates
• Plan ID: 38234VA0180009

Source
NFL Player Data: Remember Ike Peterson from '35? Neither do we!
Rows:
17,172
Post Date:
Jul 15, 2022
Data Description and Source

The data was scraped using a Python script, which is available on GitHub: https://github.com/kendallgillies/NFL-Statistics-Scrape. The data then became available here. With this dataset, we can look at the characteristics of current (and past) NFL players to better understand the sport. With this information, we can also predict the positioning or quality of new players. Lastly, this information could be used for fantasy football, by showing how players rank in comparison to other players in the same position.

Source
NHL Stats: Strap on your skates and hockey hair!
Rows:
26,305
Post Date:
Jun 27, 2022
Data Description and Source

Thanks to Kevin Sidwar, who began documenting the still-undocumented NHL stats API that was used to gather this data. The data then became available here.

This dataset provides an in-depth look at the performance of teams in the NHL during the 2020-2021 season. With this information, we can try to predict the outcome of games, or the performance of teams. Team names and other information are held in a separate sheet here.

Source
Bigfoot Sightings: Where's the Squatch?
Rows:
5,467
Post Date:
Jun 15, 2022
Data Description and Source

This dataset includes the type of the report, the ID, class, submission date, headline, year, season, month, state, county, location details, nearest town, nearest road, and other information for each sighting. With all of this, you can get a great sense of where these sightings occur, and more importantly, what is being sighted. This data originated from The Bigfoot Field Researchers Organization, and became available on Kaggle here.

Source
Collectible Sneakers: Sneakerheads and their thriving marketplace
Rows:
99,956
Post Date:
Jun 15, 2022
Data Description and Source

This data became available for the StockX sneaker data contest in 2019 and was sourced here. Unfortunately, this data only includes the sales of Yeezys and Off-White footwear. That being said, we can apply the knowledge we gained from this dataset to a broader one if we need to. This dataset includes data from 9/1/17 to 2/13/19. It includes order date, brand, sneaker name, retail price, sale price, release date, shoe size, and buyer region.

Source
Avocado Prices: Avocad-OHH my, it costs how much?
Rows:
18,249
Post Date:
Jun 15, 2022
Data Description and Source

This data was downloaded from the Hass Avocado Board website in May of 2018 and compiled into a single CSV, then became available here. Here's how the Hass Avocado Board describes the data on their website:

The table below represents weekly 2018 retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU’s) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.

The data includes the following columns:
• Date - The date of the observation
• AveragePrice - The average price of a single avocado
• type - Conventional or organic
• year - The year
• Region - The city or region of the observation
• Total Volume - Total number of avocados sold
• 4046 - Total number of avocados with PLU 4046 sold
• 4225 - Total number of avocados with PLU 4225 sold
• 4770 - Total number of avocados with PLU 4770 sold

With this information, one can better understand the avocado market and the fluctuations within it. This data also details the region in which these avocados are being sold. This allows us to better understand where to get the cheapest avocados.
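
As a rough illustration of the "cheapest avocados" question, the sketch below averages AveragePrice per Region using only the Python standard library. It assumes the CSV header uses the column names listed above; the sample rows are made up for illustration and are not taken from the real dataset, and `mean_price_by_region` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are NOT from the real dataset.
SAMPLE_CSV = """Date,AveragePrice,type,year,Region,Total Volume
2018-01-07,1.10,conventional,2018,Houston,1000
2018-01-14,0.90,conventional,2018,Houston,1200
2018-01-07,1.60,conventional,2018,Boston,800
2018-01-14,1.40,conventional,2018,Boston,900
2018-01-07,2.10,organic,2018,Houston,100
"""


def mean_price_by_region(csv_file, avocado_type="conventional"):
    """Return {region: mean AveragePrice} for one avocado type."""
    prices = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["type"] == avocado_type:
            prices[row["Region"]].append(float(row["AveragePrice"]))
    return {region: sum(vals) / len(vals) for region, vals in prices.items()}


if __name__ == "__main__":
    averages = mean_price_by_region(io.StringIO(SAMPLE_CSV))
    cheapest_region = min(averages, key=averages.get)
    print(cheapest_region, round(averages[cheapest_region], 2))
```

This is an unweighted mean of the weekly prices. Weighting each week by Total Volume would be a reasonable alternative if high-volume weeks should count for more.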

Source
Data Science Jobs: What would you say.. you do here?
Rows:
6,964
Post Date:
Jun 15, 2022
Data Description and Source

This dataset includes position title, company name, job description, number of reviews for the company, and location of job. It was sourced from here. With this data, you can answer questions like:
1. Who gets hired? What kind of talent do employers want when they are hiring a data scientist?
2. Which location has the most opportunities?
3. What skills, tools, degrees or majors do employers want the most for data scientists?
4. What's the difference between data scientist, data engineer and data analyst?
5. Can you develop an efficient classification algorithm to differentiate the three job types above?
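
A question like "Which location has the most opportunities?" reduces to counting rows per location. The sketch below shows one way to do that with the standard library. The column names (`position`, `company`, `location`) are assumptions, since the description lists the fields but not the actual CSV headers, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Made-up rows and assumed header names, for illustration only.
SAMPLE_CSV = """position,company,location
Data Scientist,Acme,"New York, NY"
Data Engineer,Globex,"Seattle, WA"
Data Analyst,Initech,"New York, NY"
"""


def top_locations(csv_file, n=5, column="location"):
    """Return the n most common values in `column` as (value, count) pairs."""
    counts = Counter(
        row[column].strip()
        for row in csv.DictReader(csv_file)
        if row[column].strip()  # skip rows with an empty location
    )
    return counts.most_common(n)


if __name__ == "__main__":
    for location, count in top_locations(io.StringIO(SAMPLE_CSV)):
        print(location, count)
```

Passing a different `column` (for example the assumed `company` header) gives the same kind of ranking for employers.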

Source