Big Data in Construction. Extract Data from PDF.

  • 0.0
1.5 hours on-demand video
$ 74.99

Brief Introduction

Convert PDF documents to Text and Graphics. Data Visualisation. Python OCR. Practical Step-by-Step Course for Beginners.

Description

This course is an introduction to #BigData and #MachineLearning with #Python programming for absolute beginners who have no background in programming.

In this course, we will go step by step through the main processes related to the topic "Big data and machine learning", using real data as the example. Since the material turned out to be voluminous, I divided the course into five parts.


This first part is devoted to collecting and extracting data from documents. You will learn how to extract data from PDF documents, drawings and any other documents in PDF format.

⇉ We will work on real data. We will have two datasets of PDF files, which we will convert to text and then to tabular form. We will then visualize the resulting data on the Kaggle platform using Python libraries that present it in graphical form.

⇉ During the course, we will install Python and libraries such as pandas, seaborn, matplotlib and others. We will upload the resulting data to the Kaggle platform, visualize it there in a Jupyter Notebook, and finally upload it to GitHub.


Topics covered in this course:

Lecture 2. Python. Choosing a Python IDE. Anaconda. Install Python.

  • How to convert a PDF to text?

  • Python or Anaconda?

  • What is the best Python IDE for beginners?

  • How do I install VS Code?

  • How do I install Python?

  • How to run Python in VS Code?

  • How do I choose the Python interpreter in VS Code?

Lecture 3. 1st Dataset. PDF files. Tika OCR. Extracting content and metadata.

  • How do I convert a PDF to TXT in Python?

  • How can I iterate over files in a given directory?

  • Install Apache Tika on Windows.

  • How to split a string into a list?

  • Remove blank strings from a list?
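
The questions in this lecture can be sketched in a few lines. This is only an illustration, not the code used in the course: the function names and the sample text are made up, and `pdf_to_text` assumes the third-party `tika` package (`pip install tika`) and a Java runtime are installed. The import sits inside the function so that the rest of the sketch runs without Tika.

```python
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return the text content of one PDF via the tika package (assumed installed)."""
    from tika import parser  # third-party; needs Java and starts a local Tika server

    parsed = parser.from_file(str(pdf_path))  # dict with "content" and "metadata"
    return parsed.get("content") or ""


def list_pdfs(folder):
    """Iterate over a directory and return its PDF files in sorted order."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf")


def clean_lines(text):
    """Split text into a list of lines and drop the blank ones."""
    return [line.strip() for line in text.split("\n") if line.strip()]


# Hypothetical sample standing in for text extracted from a PDF.
sample = "Drawing No. 12\n\n  Date: 03.05.2019 \n\n"
print(clean_lines(sample))  # ['Drawing No. 12', 'Date: 03.05.2019']
```

With Tika available, `[clean_lines(pdf_to_text(p)) for p in list_pdfs("pdfs")]` would give one list of lines per file, where `pdfs` is a placeholder folder name.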

Lecture 4. Regular Expression in Python. Pattern matching in Python.

  • What is a regular expression (with an example)?

  • How to match regular expression in Python?

  • Debug a regular expression in Python?

  • What is the regular expression for date format?

  • How do you check if an array contains a match for a regular expression?

  • Create loop with regular expression.
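
As a rough sketch of these questions, the snippet below matches dates with Python's built-in `re` module. The `DD.MM.YYYY` date format and the sample lines are assumptions for illustration. The dates in the course PDFs may look different.

```python
import re

# Matches dates such as 03.05.2019 (day.month.year); the format is an assumption.
DATE_PATTERN = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b")

# Hypothetical lines standing in for text extracted from a PDF.
lines = ["Drawing No. 12", "Issued: 03.05.2019", "Revised 17.06.2019 by AB"]


def find_dates(lines):
    """Loop over the lines and collect the first date found in each one."""
    dates = []
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:
            dates.append(match.group(0))
    return dates


# Check whether any element of the list contains a match.
has_date = any(DATE_PATTERN.search(line) for line in lines)

print(has_date)           # True
print(find_dates(lines))  # ['03.05.2019', '17.06.2019']
```

To debug a pattern, it helps to test it on one short string first and print `match.groups()` to see what each bracketed group captured.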

Lecture 5. Array and Function in Python. Add data to Array. Create function.

  • How do you add a string to an array?

  • How do you find the index of an element in a list?

  • How can I extract the date from a string?

  • How to declare and add items to an array in Python?

  • How do you write a function in Python?
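
A minimal sketch of these steps, using only the standard library. In Python the "array" is usually a plain list. The `Date:` marker, the `DD.MM.YYYY` format and the function name `extract_date` are assumptions made for this example.

```python
from datetime import datetime


def extract_date(text, marker="Date:"):
    """Return the date that follows `marker` in `text`, or None if there is none."""
    position = text.find(marker)
    if position == -1:
        return None
    tokens = text[position + len(marker):].split()
    if not tokens:
        return None
    # Assumed format: day.month.year, e.g. 03.05.2019
    return datetime.strptime(tokens[0], "%d.%m.%Y").date()


records = []                         # declare an empty list
records.append("Drawing No. 12")     # add strings to it
records.append("Date: 03.05.2019")

index = records.index("Date: 03.05.2019")  # find the position of an element
print(index)                               # 1
print(extract_date(records[index]))        # 2019-05-03
```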

Lecture 6. Pandas DataFrame. Two-dimensional size-mutable, tabular data structure.

  • Install pandas on Python

  • How do I create a pandas DataFrame?

  • Reduce number of columns in a pandas DataFrame

  • Combine column values into a list in a new column

  • How to convert array into DataFrame in Python?

  • How to change column names in pandas DataFrame?

  • Save a DataFrame as CSV table
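
The topics of this lecture might look like the sketch below. It assumes pandas is installed (`pip install pandas`). The rows and column names are invented sample data, not the course dataset.

```python
import pandas as pd

# Hypothetical rows, as they might come out of the extraction step.
rows = [
    ["plan_01.pdf", "03.05.2019", "17.06.2019", "A1"],
    ["plan_02.pdf", "10.05.2019", "01.07.2019", "A3"],
]

# Convert the list of lists into a DataFrame with named columns.
df = pd.DataFrame(rows, columns=["file", "start", "end", "sheet"])

# Reduce the number of columns by selecting only the ones we need.
df = df[["file", "start", "end"]]

# Change column names.
df = df.rename(columns={"start": "issued", "end": "revised"})

# Combine the values of two columns into a list in a new column.
df["dates"] = df[["issued", "revised"]].values.tolist()

# With no path, to_csv returns the CSV as a string;
# pass a file name such as "drawings.csv" to save it to disk instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```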

Lecture 7. Kaggle. Jupyter Notebook. Create an account. Plotting with matplotlib and seaborn.

  • Upload a file to a Kaggle kernel

  • How do you use a Kaggle dataset?

  • Run a Jupyter notebook using Kaggle kernels

  • Convert a CSV to a DataFrame in a Python Jupyter Notebook

  • How to use the functions of Pandas Dataframe?

  • Change the date format of a column in pandas

  • How do I convert a string to datetime Objects in Python?

  • Calculate Difference Between Two Dates in Pandas Dataframe

  • How do I delete a column in pandas DataFrame?

  • Add columns in pandas DataFrame?

  • How do you visualize a dataset?

  • How do you plot a DataFrame in pandas?
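
The pandas steps listed above can be sketched as follows. The CSV content, column names and date format are assumptions for illustration. In a Kaggle notebook the data would be read from an uploaded file rather than from a string. The plotting function needs matplotlib, so it is defined here but not called.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the uploaded file.
csv_text = """file,issued,revised
plan_01.pdf,03.05.2019,17.06.2019
plan_02.pdf,10.05.2019,01.07.2019
"""

# Convert the CSV to a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Convert the date strings to datetime objects (assumed format: day.month.year).
for column in ["issued", "revised"]:
    df[column] = pd.to_datetime(df[column], format="%d.%m.%Y")

# Add a column with the difference between the two dates, in days.
df["days_between"] = (df["revised"] - df["issued"]).dt.days

# Add a column with the date in a different format.
df["revised_iso"] = df["revised"].dt.strftime("%Y-%m-%d")

# Delete a column that is no longer needed.
df = df.drop(columns=["issued"])


def plot_days(frame):
    """Draw a bar chart of days per file (requires matplotlib)."""
    ax = frame.plot(x="file", y="days_between", kind="bar", legend=False)
    ax.set_ylabel("days")
    return ax


print(df["days_between"].tolist())  # [45, 52]
```

Calling `plot_days(df)` in a notebook cell displays the chart. seaborn offers similar plots with different default styling.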

Lecture 8. 2nd Dataset. Task. Data from PDF. Getting data from PDF drawings.

  • Independent Work Tasks

  • Learn to Code - on real data (16 PDF files to chart)

  • A brief overview of the data in the task

Lecture 9. 2nd Dataset. My solution.

  • This is my solution.

  • It may seem very simple and perhaps not the most effective.

Lecture 10. GitHub. Desktop GitHub. Store and manage code

  • What is GitHub and how do you use it?

  • What can I use GitHub for?

  • How do I upload files to GitHub?

  • Install GitHub Desktop

  • How to sync with a remote Git repository?

  • Adding a repository from your local computer to GitHub


⇛ This is a practical course where we will analyse the process of data extraction step by step. You will go through all the steps, from installing Python to data visualization on the Kaggle platform.


When I first got acquainted with the topic “Big Data and Machine Learning” myself, I often ran into problems installing software and errors when installing the various libraries and tools for working with big data.

⇉ It took me a lot of time to find the right solutions, and I would like to save you that time.

⚐ To understand the topic, I had to search for answers to a large number of questions, and the answers I found were often off-target. In this course, you will find answers to the basic questions related to the topic of Big Data and Machine Learning. Part 1: Extract Data from PDF.


Requirements

  • You only need a computer running Windows.
  • You do not need any special programming knowledge or theoretical knowledge of Python.

Knowledge

  • How to convert a PDF to text?
  • How do I install Python?
  • How do you visualize a dataset?
  • What is GitHub and how do you use it?
  • What is a regular expression?
  • How do I install VS Code?
  • How to run Python in VS Code?
  • How do you use a Kaggle dataset?
  • How to install pandas on Python?
  • How do I convert a PDF to TXT in Python?
  • What is the best Python IDE for beginners?
  • How can I iterate over files in a given directory?
  • How to Install Apache Tika on Windows?
  • How to split a string into a list?
  • How do I remove blank strings from a list?
  • How do I choose the Python interpreter in VS Code?
  • How to match regular expression in Python?
  • How can I debug a regular expression in Python?
  • What is the regular expression for date format?
  • How do you check if an array contains a regular expression?
  • How to create loop with regular expression?
  • How do you add a string to an array?
  • How do you find the index of an element in a list?
  • How can I extract the date from a string?
  • How do you write a function in Python?
  • How do I create a pandas DataFrame?
  • How to reduce number of columns in a pandas DataFrame?
  • How to convert array into DataFrame in Python?
  • How to change column names in pandas DataFrame?
  • How do I save a DataFrame as CSV table?
  • How do I upload a file to a Kaggle kernel?
  • How to run a Jupyter notebook using Kaggle kernels?
  • How to convert a CSV to a DataFrame in a Python Jupyter Notebook?
  • How do I change the date format of a column in pandas?
  • How do I convert a string to datetime Objects in Python?
  • How to Calculate Difference Between Two Dates in Pandas Dataframe?
  • How do I delete a column in pandas DataFrame?
  • How do I add columns in pandas DataFrame?
  • How do you plot a DataFrame in pandas?
  • What can I use GitHub for?
  • How do I upload files to GitHub?
  • How to install GitHub Desktop?
  • How to sync with a remote Git repository?
  • How to add a repository from your local computer to GitHub?
English
Available now
Boiko Artem
Udemy
