Spaces: shwetashweta05/Zero_to_Hero_Machine_Learning

shwetashweta05 committed on Dec 11, 2024

Commit ffe927c (verified) · 1 Parent(s): b5361bd

Update pages/6.Data Collection.py

Files changed (1)

pages/6.Data Collection.py CHANGED

@@ -122,4 +122,39 @@ if data_type == "Structured":
                     mime="application/octet-stream",
                 )
-# Add similar sections for "Unstructured" and "Semi-Structured" data types as needed.

+# Add similar sections for "Unstructured" and "Semi-Structured" data types as needed.
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "## Excel Data Format\n\n### What is Excel?\nExcel workbooks store tabular data in one or more sheets and are widely used in business and analytics. The file extensions are `.xls` (legacy format) and `.xlsx`."
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "### How to Read Excel Files\nUse the `pandas` library to read Excel files:"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "import pandas as pd\n\ndf = pd.read_excel(\"example.xlsx\")\nprint(df.head())"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": "### Common Issues\n1. Missing Data\n2. Encoding Problems\n3. File Corruption\n4. Large Files"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
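+   "source": "### How to Overcome Issues\n1. Use data imputation methods for missing data, such as `fillna` with a column median.\n2. `pd.read_excel` has no `encoding` parameter in current pandas, and `.xlsx` files store text as Unicode. Encoding problems usually come from CSV exports, where `pd.read_csv(..., encoding='utf-8')` applies.\n3. Repair or convert corrupted files, for example by re-saving them from Excel.\n4. `pd.read_excel` has no `chunksize` option, so read large files in slices with its `skiprows` and `nrows` arguments."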
+  }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
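
The notebook cells above list missing data and large files as common issues without showing code for either. A minimal sketch of both follows, assuming pandas with an Excel engine such as `openpyxl` is installed. The helper names `impute_missing` and `iter_excel_slices`, the `"unknown"` placeholder, and the default slice size are illustrative assumptions and not part of this commit.

```python
import pandas as pd


def impute_missing(df: pd.DataFrame, text_fill: str = "unknown") -> pd.DataFrame:
    """Return a copy with numeric gaps set to the column median
    and gaps in other columns set to text_fill (hypothetical default)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(text_fill)
    return out


def iter_excel_slices(path, rows_per_slice=10_000, sheet_name=0):
    """Yield successive DataFrames of at most rows_per_slice data rows.

    pd.read_excel has no chunksize argument, so each slice is requested
    with skiprows/nrows. Row 0 (the header) is always kept.
    """
    start = 0
    while True:
        chunk = pd.read_excel(
            path,
            sheet_name=sheet_name,
            skiprows=range(1, start + 1),  # skip data rows already read
            nrows=rows_per_slice,
        )
        if chunk.empty:
            break
        yield chunk
        if len(chunk) < rows_per_slice:
            break  # last, partial slice
        start += rows_per_slice
```

A loop such as `for part in iter_excel_slices("example.xlsx", 5000): cleaned = impute_missing(part)` would process the file piece by piece. Each call reopens the workbook, so this limits the size of each DataFrame rather than the total parse time, and a median computed per slice can differ from the median of the whole column.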