Udacity Data Wrangling with SQL Course: How to Extract the bz2 File from Metro Extracts

by Justina Hettinger 8 min read

Introduction

Real-world data rarely comes clean. Using Python and its libraries, you will gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, then clean it. This is called data wrangling.

What Software Do I Need?

You need to be able to work in a Jupyter Notebook on your computer. Please revisit our Jupyter Notebook and Anaconda tutorials earlier in the Nanodegree program for installation instructions.

How to perform analysis on data stored in relational and non-relational database systems?

Begin by using SQL commands, functions, and data-cleaning techniques to join, aggregate, and clean tables, and carry out performance-tuning analysis to support strategic business recommendations. Finally, apply relational database management techniques to normalize data schemas and build the supporting data structures for a social news aggregator.
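To make the join, aggregate, and clean steps concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `posts` tables and their rows are hypothetical, invented only for illustration; they are not taken from the course materials.

```python
import sqlite3

# Hypothetical tables and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, ' alice '), (2, 'Bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

# Clean: strip stray whitespace and normalize case in the name column.
conn.execute("UPDATE users SET name = lower(trim(name))")

# Join and aggregate: count the posts written by each user.
rows = conn.execute("""
    SELECT u.name, COUNT(p.id) AS post_count
    FROM users AS u
    LEFT JOIN posts AS p ON p.user_id = u.id
    GROUP BY u.id
    ORDER BY post_count DESC, u.name
""").fetchall()

print(rows)
```

A LEFT JOIN is used so that users with no posts would still appear with a count of zero; an inner join would silently drop them.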

Who is Ziad from DecodeMTL?

Ziad is a seasoned software developer who loves mentoring and teaching. Currently working as an independent contractor, he previously co-founded and taught full-stack web development at DecodeMTL, Montreal's first web development bootcamp.

Project Details

This project is part of the Data Wrangling course from Udacity. You can choose between two databases for this project: SQL and MongoDB. I chose SQL. The project is broken into steps below.

Running the Code

To see the final code and analysis, use this link: https://github.com/AdkinsWx/OpenStreetMap_Udacity/blob/master/Final_Code.ipynb

Step One - Complete Programming Exercises

  • Make sure all programming exercises are solved correctly in the "Case Study: OpenStreetMap Data" Lesson in the course you have chosen (MongoDB or SQL). This is the last lesson in that section.

Step Two - Review The Rubric and Sample Project

  • The Project Rubric will be used to evaluate your project. It will need to Meet Specifications for all the criteria listed. Here is an example of what your final report could look like if you choose the SQL option: SQL Sample Project

Step Three - Choose Your Map Area

  • Choose any area of the world from https://www.openstreetmap.org, and download an XML OSM dataset. The dataset should be at least 50MB in size (uncompressed). We recommend using one of the following methods to download a dataset: Use the Overpass API to download a custom square area. An explanation of the syntax can be found in the wiki. In general you wil...
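If the dataset you downloaded arrives as a bz2-compressed file (the case named in the title), it has to be decompressed before you can check the uncompressed size. The following is a minimal sketch using Python's standard bz2 module. The function names `extract_bz2` and `meets_minimum` are hypothetical helpers written for this example, not part of the course code.

```python
import bz2
import os
import shutil


def extract_bz2(src_path, dest_path, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file to dest_path and return its size in bytes.

    Copying in chunks keeps memory use low even for large extracts.
    """
    with bz2.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest, chunk_size)
    return os.path.getsize(dest_path)


def meets_minimum(size_bytes, minimum_mb=50):
    """Return True if the uncompressed size reaches the project's 50MB floor."""
    return size_bytes >= minimum_mb * 1024 * 1024
```

Assuming a downloaded file called `my_city.osm.bz2` (a placeholder name), you would call `size = extract_bz2("my_city.osm.bz2", "my_city.osm")` and then `meets_minimum(size)` to confirm the area is large enough. On the command line, `bunzip2 -k my_city.osm.bz2` does the same decompression and keeps the original archive.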

Step Six - Document Your Work

  • Create a document (pdf, html) that directly addresses the following sections from the Project Rubric: Problems encountered in your map; Overview of the data; Other ideas about the datasets. Try to include snippets of code and problematic tags (see SQL Sample Project) and visualizations in your report if they are applicable. Use the following code to take a systematic sample of element…
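The sampling code referred to above is cut off in this text. As a stand-in, here is a minimal sketch of systematic sampling, which keeps every k-th top-level element of an OSM XML file. It is one way to do it, assuming the usual top-level tags `node`, `way`, and `relation`; the function names and the parameter `k` are choices made for this example and may differ from the course's version.

```python
import xml.etree.ElementTree as ET


def iter_top_level(osm_path, tags=("node", "way", "relation")):
    """Yield top-level OSM elements one at a time without loading the whole file."""
    context = ET.iterparse(osm_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag in tags:
            yield elem
            root.clear()  # release elements that have already been processed


def write_sample(osm_path, sample_path, k):
    """Write every k-th top-level element to sample_path; return how many were kept."""
    kept = 0
    with open(sample_path, "wb") as out:
        out.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<osm>\n')
        for i, elem in enumerate(iter_top_level(osm_path)):
            if i % k == 0:
                out.write(ET.tostring(elem, encoding="utf-8"))
                kept += 1
        out.write(b"</osm>\n")
    return kept
```

A larger `k` gives a smaller sample file, which is handy for iterating quickly on cleaning code before running it against the full dataset.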