This is the second project of the OpenClassrooms Python path. The goal is to scrape the Books to Scrape website and extract the following:
- All categories
- All products (books) in each category
- All required details for each product
The program then saves the collected details to CSV files, one file per category.
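As a rough illustration of the one-file-per-category output, here is a minimal sketch using the standard library `csv` module. The function name `write_category_csv` and the column names in `FIELDNAMES` are hypothetical; the fields written by main.py may differ.

```python
import csv
from pathlib import Path

# Hypothetical column names; the project's actual fields may differ.
FIELDNAMES = ["title", "price", "product_page_url"]


def write_category_csv(category, books, out_dir="."):
    """Write one CSV file named after the category and return its path."""
    path = Path(out_dir) / f"{category}.csv"
    # newline="" lets the csv module control line endings on every platform.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(books)  # one row per book
    return path
```

Calling the function once per category yields one file per category, each with a header row followed by one row per book.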
It also downloads the image of each product into a separate folder, organized by the book's category and named after the book it belongs to.
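One way to derive such a file location is sketched below. The helper `image_path`, the `images` root folder, and the character replacement rule are assumptions made for illustration, not necessarily what main.py does.

```python
import re
from pathlib import Path


def image_path(category, title, image_url, root="images"):
    """Build <root>/<category>/<title>.<ext> without touching the disk."""
    # Replace characters that are not allowed in Windows file names.
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    # Reuse the extension from the image URL, falling back to .jpg.
    suffix = Path(image_url).suffix or ".jpg"
    return Path(root) / category / f"{safe_title}{suffix}"
```

Book titles often contain characters such as `:` or `?`, which is why the title is sanitized before it is used as a file name.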
- Python version 3.9.5 or higher must be installed.
- Create the directory in which you want to keep the program.
- Open your terminal.
- Navigate to the folder that contains main.py and requirements.txt.
- Create your Virtual Environment by running the command:
python -m venv .venv
- Activate the Environment by running:
.venv\Scripts\activate.bat
(Windows) or
source .venv/bin/activate
(macOS or Linux)
- Install the requirements by running the command:
pip install -r requirements.txt
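To confirm that the interpreter in the activated environment meets the stated minimum, a small check like the following could be run. The name `meets_minimum` is hypothetical and not part of the project.

```python
import sys

# Minimum version stated in this README.
MINIMUM = (3, 9, 5)


def meets_minimum(version=None):
    """Return True if the given (or running) Python version is recent enough."""
    current = tuple(version or sys.version_info[:3])
    return current >= MINIMUM


if __name__ == "__main__":
    print("Python version OK" if meets_minimum() else "Python 3.9.5 or higher is required")
```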
- Open your terminal
- Navigate to the directory that contains the main.py file
- Activate the Environment by running:
.venv\Scripts\activate.bat
(Windows) or
source .venv/bin/activate
(macOS or Linux)
- Run the command:
python main.py
- While the program runs, it displays the current category and the title of the last book it saved successfully.
- The program will create all files and folders automatically in the program directory.
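The automatic folder creation and the progress output described above could look roughly like this. Both helper names and the exact message format are assumptions for illustration only.

```python
from pathlib import Path


def prepare_category_dir(category, root="."):
    """Create the folder for a category if it does not exist yet."""
    folder = Path(root) / category
    # exist_ok=True makes repeated runs safe; parents=True creates missing parents.
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def progress_line(category, title):
    """Format the status message shown after a book has been saved."""
    return f"[{category}] saved: {title}"
```

Because `mkdir` is called with `exist_ok=True`, running the program a second time does not fail on folders that already exist.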
- Python version 3.9.5
- Beautifulsoup4 version 4.9.3
- Requests version 2.6.0