shwetashweta05 committed ffe927c · verified · 1 parent: b5361bd

Update pages/6.Data Collection.py

Files changed (1)
  1. pages/6.Data Collection.py +36 -1
pages/6.Data Collection.py CHANGED
```diff
@@ -122,4 +122,39 @@ if data_type == "Structured":
  mime="application/octet-stream",
  )
 
- # Add similar sections for "Unstructured" and "Semi-Structured" data types as needed.
+ # Add similar sections for "Unstructured" and "Semi-Structured" data types as needed.
+
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": "## Excel Data Format\n\n### What is Excel?\nExcel is a tabular data format commonly used in business and analytics, with extensions `.xls` and `.xlsx`."
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": "### How to Read Excel Files\nUse the `pandas` library to read Excel files:"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "import pandas as pd\n\ndf = pd.read_excel(\"example.xlsx\")\nprint(df.head())"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": "### Common Issues\n1. Missing Data\n2. Encoding Problems\n3. File Corruption\n4. Large Files"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": "### How to Overcome Issues\n1. Use data imputation methods for missing data.\n2. Specify encoding when reading files (`encoding='utf-8'`).\n3. Repair or convert corrupted files.\n4. Process large files in chunks with `pandas`."
+ }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
```
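As committed, the notebook JSON sits in the `.py` page as a bare dictionary expression. JSON `null` is not a Python name, so when Python reaches that expression it raises `NameError: name 'null' is not defined`. A minimal sketch of one way around this, assuming the notebook is kept as JSON text (abridged here to two of the committed cells) and parsed with the standard `json` module, which maps `null` to `None`; `EXCEL_NOTEBOOK_JSON` and `notebook_cells` are hypothetical names:

```python
import json

# Abridged copy of the committed notebook (first and third cells only), kept as
# JSON text. Because this is a raw string, the \n and \" escapes reach json.loads intact.
EXCEL_NOTEBOOK_JSON = r"""
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Excel Data Format\n\n### What is Excel?\nExcel is a tabular data format commonly used in business and analytics, with extensions `.xls` and `.xlsx`."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import pandas as pd\n\ndf = pd.read_excel(\"example.xlsx\")\nprint(df.head())"
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}
"""


def notebook_cells(notebook_json: str) -> list[tuple[str, str]]:
    """Return (cell_type, source) pairs from nbformat-4 JSON text."""
    notebook = json.loads(notebook_json)  # JSON null becomes Python None here
    cells = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # nbformat also allows "source" to be a list of lines; join it if so.
        if isinstance(source, list):
            source = "".join(source)
        cells.append((cell["cell_type"], source))
    return cells
```

If the page is a Streamlit app (an assumption based on the `pages/` layout), markdown cells could then be passed to `st.markdown(source)` and code cells to `st.code(source, language="python")`.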
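The committed "How to Overcome Issues" cell suggests imputation for missing data without naming a method. A minimal sketch, assuming median imputation for numeric columns and most-frequent-value imputation for other columns suits the data; `impute_missing` is a hypothetical helper, not part of pandas:

```python
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric NaNs with the column median and other NaNs with the column mode."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # The median is less sensitive to outliers than the mean.
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:  # an all-missing column has no mode; leave it as-is
                out[col] = out[col].fillna(modes.iloc[0])
    return out
```

Usage: `df = impute_missing(pd.read_excel("example.xlsx"))`. In real data the fill strategy is usually chosen per column rather than by dtype alone.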
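Current versions of `pd.read_excel` take no `encoding` argument, because the `.xlsx` reader handles text decoding itself. The `encoding='utf-8'` advice in the committed cell therefore applies to text exports such as CSV, read with `pd.read_csv`. A minimal sketch, assuming a CSV export that may be either UTF-8 or Windows-1252 (`cp1252`); `read_csv_with_fallback` is a hypothetical helper:

```python
import io

import pandas as pd


def read_csv_with_fallback(raw: bytes, encodings=("utf-8", "cp1252")) -> pd.DataFrame:
    """Try each encoding in order and return the first successful parse."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            # A fresh buffer per attempt, so a failed read leaves nothing half-consumed.
            return pd.read_csv(io.BytesIO(raw), encoding=encoding)
        except UnicodeDecodeError as err:
            last_error = err
    raise last_error
```

Strict UTF-8 is tried first because `cp1252` decodes almost any byte sequence, so trying it first could silently turn UTF-8 text into garbled characters instead of failing.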
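An `.xlsx` workbook is a ZIP package, so a cheap first check for file corruption is whether that container is intact before handing the file to pandas. A minimal sketch using only the standard `zipfile` module; `looks_like_valid_xlsx` is a hypothetical helper that only detects damage (it repairs nothing) and does not apply to legacy `.xls` files, which are not ZIP-based:

```python
import zipfile


def looks_like_valid_xlsx(path_or_file) -> bool:
    """Cheap integrity check for an .xlsx package; detects damage, does not repair it."""
    # A truncated or overwritten workbook usually fails this first test.
    if not zipfile.is_zipfile(path_or_file):
        return False
    try:
        with zipfile.ZipFile(path_or_file) as package:
            # Every Office Open XML package has a [Content_Types].xml part at its root.
            if "[Content_Types].xml" not in package.namelist():
                return False
            # testzip() re-reads each member and returns the first one with a bad CRC, or None.
            return package.testzip() is None
    except zipfile.BadZipFile:
        return False
```

Usage: call `looks_like_valid_xlsx("example.xlsx")` before `pd.read_excel`; a `False` result means the file needs repair or re-export first.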
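Current versions of `pd.read_excel` have no `chunksize` argument (that option belongs to `pd.read_csv`), so processing large Excel files in chunks needs another route. A minimal sketch, assuming the `openpyxl` package is installed, the first row holds the column names, and the workbook is `.xlsx` (openpyxl does not read legacy `.xls`); `read_excel_in_chunks` and `frames_from_rows` are hypothetical helpers. openpyxl's read-only mode streams rows instead of loading the whole sheet into memory:

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, Sequence

import pandas as pd


def frames_from_rows(rows: Iterable[Sequence], chunksize: int) -> Iterator[pd.DataFrame]:
    """Treat the first row as the header and yield DataFrames of up to `chunksize` rows."""
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty sheet: nothing to yield
        return
    while True:
        batch = list(islice(it, chunksize))
        if not batch:
            return
        yield pd.DataFrame(batch, columns=list(header))


def read_excel_in_chunks(
    path: str, chunksize: int = 10_000, sheet_name: Optional[str] = None
) -> Iterator[pd.DataFrame]:
    """Stream an .xlsx sheet as a sequence of DataFrames (assumes openpyxl is installed)."""
    # Imported here so frames_from_rows stays usable without openpyxl.
    from openpyxl import load_workbook

    # data_only=True returns cached formula results instead of formula strings.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook[sheet_name] if sheet_name else workbook.active
        yield from frames_from_rows(sheet.iter_rows(values_only=True), chunksize)
    finally:
        workbook.close()  # read-only workbooks keep the file open until closed
```

Usage: `for chunk in read_excel_in_chunks("example.xlsx", chunksize=50_000): ...` handles one DataFrame at a time, so peak memory depends on the chunk size rather than the sheet size.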