Update pages/6.Data Collection.py
pages/6.Data Collection.py +36 -1
pages/6.Data Collection.py
CHANGED
@@ -122,4 +122,39 @@ if data_type == "Structured":
|
|
122 |
mime="application/octet-stream",
|
123 |
)
|
124 |
|
125 |
-
# Add similar sections for "Unstructured" and "Semi-Structured" data types as needed.
122 |
mime="application/octet-stream",
|
123 |
)
|
124 |
|
125 |
+
# Add similar sections for "Unstructured" and "Semi-Structured" data types as needed.
|
126 |
+
|
127 |
+
{
|
128 |
+
"cells": [
|
129 |
+
{
|
130 |
+
"cell_type": "markdown",
|
131 |
+
"metadata": {},
|
132 |
+
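"source": "## Excel Data Format\n\n### What is Excel?\nExcel is a spreadsheet application whose workbook files store tabular data in one or more sheets. It is common in business and analytics, and its files use the `.xls` (legacy binary) and `.xlsx` (Office Open XML) extensions."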
|
133 |
+
},
|
134 |
+
{
|
135 |
+
"cell_type": "markdown",
|
136 |
+
"metadata": {},
|
137 |
+
"source": "### How to Read Excel Files\nUse the `pandas` library to read Excel files:"
|
138 |
+
},
|
139 |
+
{
|
140 |
+
"cell_type": "code",
|
141 |
+
"execution_count": null,
|
142 |
+
"metadata": {},
|
143 |
+
"outputs": [],
|
144 |
+
"source": "import pandas as pd\n\ndf = pd.read_excel(\"example.xlsx\")\nprint(df.head())"
|
145 |
+
},
|
146 |
+
{
|
147 |
+
"cell_type": "markdown",
|
148 |
+
"metadata": {},
|
149 |
+
"source": "### Common Issues\n1. Missing Data\n2. Encoding Problems\n3. File Corruption\n4. Large Files"
|
150 |
+
},
|
151 |
+
{
|
152 |
+
"cell_type": "markdown",
|
153 |
+
"metadata": {},
|
154 |
+
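"source": "### How to Overcome Issues\n1. Impute missing values, for example with `fillna`, or drop incomplete rows with `dropna`.\n2. `pd.read_excel` does not accept an `encoding` argument in current pandas; if text is garbled, export the sheet to CSV and read it with `pd.read_csv` and `encoding='utf-8'`.\n3. Repair corrupted files in Excel or re-export them from the source system.\n4. `pd.read_excel` has no `chunksize` option; read large files in batches with `skiprows` and `nrows`, or convert them to CSV and use `pd.read_csv` with `chunksize`."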
|
155 |
+
}
|
156 |
+
],
|
157 |
+
"metadata": {},
|
158 |
+
"nbformat": 4,
|
159 |
+
"nbformat_minor": 2
|
160 |
+
}
|
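As a rough illustration of items 1 and 4 in the "How to Overcome Issues" cell, the sketch below shows one way to impute missing values and to read a large workbook in batches. The helper names `fill_missing` and `read_excel_in_chunks`, the sample columns, and the fill strategy (median for numbers, `"unknown"` for text) are assumptions made for this example and are not part of the commit. The batch reader relies on the `skiprows` and `nrows` arguments of `pd.read_excel`, re-opens the file for every batch, and needs an Excel engine such as `openpyxl` installed, so treat it as a starting point only.

```python
import pandas as pd


def fill_missing(df):
    """Return a copy with numeric gaps set to the column median
    and all other gaps set to the string 'unknown' (illustrative policy)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna("unknown")
    return out


def read_excel_in_chunks(path, chunk_rows=1000, sheet_name=0):
    """Yield DataFrames of at most `chunk_rows` data rows from one sheet.

    Hypothetical helper: each batch re-reads the sheet, keeps the header
    row (row 0), and skips the data rows that earlier batches returned.
    """
    start = 0
    while True:
        chunk = pd.read_excel(
            path,
            sheet_name=sheet_name,
            skiprows=range(1, start + 1),  # data rows already consumed
            nrows=chunk_rows,
        )
        if chunk.empty:
            break
        yield chunk
        start += chunk_rows


# Small in-memory example of the imputation step (made-up data).
sample = pd.DataFrame(
    {"units": [10.0, None, 30.0], "region": ["north", None, "south"]}
)
cleaned = fill_missing(sample)
print(cleaned)
```

A batch loop would then look like `for chunk in read_excel_in_chunks("example.xlsx", chunk_rows=500): ...`, with `fill_missing` applied per batch. Note that a per-batch median differs from the median of the whole file, so compute statistics up front when that matters.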