Commit 2ec677d by yuxwu · 2 Parent(s): 51c40fe 4bf718b
README.md CHANGED
@@ -120,64 +120,64 @@ At its core, Solution Space Tree Search consists of three main components:
120
 
121
  By repeatedly applying these steps, AIDE navigates the vast space of possible solutions, progressively refining its approach until it converges on the optimal solution for the given data science problem.
122
 
123
- ![Tree Search Visualization](https://github.com/WecoAI/AIDE_internal/assets/8918572/daea2488-3e1f-4d64-a555-e4d45ed46e4d)
124
 
125
  ## Solution Gallery
126
 
127
  | Domain | Task | Top% | Solution Link | Competition Link |
128
  |:---------------------------------|:------------------------------------------------------------------------|:------|:------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|
129
- | Urban Planning | Forecast city bikeshare system usage | 5% | [link](examples/bike-sharing-demand.py) | [link](https://www.kaggle.com/competitions/bike-sharing-demand/overview) |
130
- | Physics | Predicting Critical Heat Flux | 56% | [link](examples/playground-series-s3e15.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e15/overview) |
131
- | Genomics | Classify bacteria species from genomic data | 0% | [link](examples/tabular-playground-series-feb-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-feb-2022/overview) |
132
- | Agriculture | Predict blueberry yield | 58% | [link](examples/playground-series-s3e14.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e14/overview) |
133
- | Healthcare | Predict disease prognosis | 0% | [link](examples/playground-series-s3e13.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e13/overview) |
134
- | Economics | Predict monthly microbusiness density in a given area | 35% | [link](examples/godaddy-microbusiness-density-forecasting.py) | [link](https://www.kaggle.com/competitions/godaddy-microbusiness-density-forecasting/overview) |
135
- | Cryptography | Decrypt shakespearean text | 91% | [link](examples/ciphertext-challenge-iii.py) | [link](https://www.kaggle.com/competitions/ciphertext-challenge-iii/overview) |
136
- | Data Science Education | Predict passenger survival on Titanic | 78% | [link](examples/tabular-playground-series-apr-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-apr-2021/overview) |
137
- | Software Engineering | Predict defects in c programs given various attributes about the code | 0% | [link](examples/playground-series-s3e23.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e23/overview) |
138
- | Real Estate | Predict the final price of homes | 5% | [link](examples/home-data-for-ml-course.py) | [link](https://www.kaggle.com/competitions/home-data-for-ml-course/overview) |
139
- | Real Estate | Predict house sale price | 36% | [link](examples/house-prices-advanced-regression-techniques.py) | [link](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview) |
140
- | Entertainment Analytics | Predict movie worldwide box office revenue | 62% | [link](examples/tmdb-box-office-prediction.py) | [link](https://www.kaggle.com/competitions/tmdb-box-office-prediction/overview) |
141
- | Entertainment Analytics | Predict scoring probability in next 10 seconds of a rocket league match | 21% | [link](examples/tabular-playground-series-oct-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-oct-2022/overview) |
142
- | Environmental Science | Predict air pollution levels | 12% | [link](examples/tabular-playground-series-jul-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jul-2021/overview) |
143
- | Environmental Science | Classify forest categories using cartographic variables | 55% | [link](examples/forest-cover-type-prediction.py) | [link](https://www.kaggle.com/competitions/forest-cover-type-prediction/overview) |
144
- | Computer Vision | Predict the probability of machine failure | 32% | [link](examples/playground-series-s3e17.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e17/overview) |
145
- | Computer Vision | Identify handwritten digits | 14% | [link](examples/digit-recognizer.py) | [link](https://www.kaggle.com/competitions/digit-recognizer/overview) |
146
- | Manufacturing | Predict missing values in dataset | 70% | [link](examples/tabular-playground-series-jun-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jun-2022/overview) |
147
- | Manufacturing | Predict product failures | 48% | [link](examples/tabular-playground-series-aug-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-aug-2022/overview) |
148
- | Manufacturing | Cluster control data into different control states | 96% | [link](examples/tabular-playground-series-jul-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jul-2022/overview) |
149
- | Natural Language Processing | Classify toxic online comments | 78% | [link](examples/jigsaw-toxic-comment-classification-challenge.py) | [link](https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/overview) |
150
- | Natural Language Processing | Predict passenger transport to an alternate dimension | 59% | [link](examples/spaceship-titanic.py) | [link](https://www.kaggle.com/competitions/spaceship-titanic/overview) |
151
- | Natural Language Processing | Classify sentence sentiment | 42% | [link](examples/sentiment-analysis-on-movie-reviews.py) | [link](https://www.kaggle.com/competitions/sentiment-analysis-on-movie-reviews/overview) |
152
- | Natural Language Processing | Predict whether a tweet is about a real disaster | 48% | [link](examples/nlp-getting-started.py) | [link](https://www.kaggle.com/competitions/nlp-getting-started/overview) |
153
- | Business Analytics | Predict total sales for each product and store in the next month | 87% | [link](examples/competitive-data-science-predict-future-sales.py) | [link](https://www.kaggle.com/competitions/competitive-data-science-predict-future-sales/overview) |
154
- | Business Analytics | Predict book sales for 2021 | 66% | [link](examples/tabular-playground-series-sep-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-sep-2022/overview) |
155
- | Business Analytics | Predict insurance claim amount | 80% | [link](examples/tabular-playground-series-feb-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-feb-2021/overview) |
156
- | Business Analytics | Minimize penalty cost in scheduling families to santa's workshop | 100% | [link](examples/santa-2019-revenge-of-the-accountants.py) | [link](https://www.kaggle.com/competitions/santa-2019-revenge-of-the-accountants/overview) |
157
- | Business Analytics | Predict yearly sales for learning modules | 26% | [link](examples/playground-series-s3e19.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e19/overview) |
158
- | Business Analytics | Binary classification of manufacturing machine state | 60% | [link](examples/tabular-playground-series-may-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-may-2022/overview) |
159
- | Business Analytics | Forecast retail store sales | 36% | [link](examples/tabular-playground-series-jan-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jan-2022/overview) |
160
- | Business Analytics | Predict reservation cancellation | 54% | [link](examples/playground-series-s3e7.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e7/overview) |
161
- | Finance | Predict the probability of an insurance claim | 13% | [link](examples/tabular-playground-series-mar-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-mar-2021/overview) |
162
- | Finance | Predict loan loss | 0% | [link](examples/tabular-playground-series-aug-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-aug-2021/overview) |
163
- | Finance | Predict a continuous target | 42% | [link](examples/tabular-playground-series-jan-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jan-2021/overview) |
164
- | Finance | Predict customer churn | 24% | [link](examples/playground-series-s4e1.py) | [link](https://www.kaggle.com/competitions/playground-series-s4e1/overview) |
165
- | Finance | Predict median house value | 58% | [link](examples/playground-series-s3e1.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e1/overview) |
166
- | Finance | Predict closing price movements for nasdaq listed stocks | 99% | [link](examples/optiver-trading-at-the-close.py) | [link](https://www.kaggle.com/competitions/optiver-trading-at-the-close/overview) |
167
- | Finance | Predict taxi fare | 100% | [link](examples/new-york-city-taxi-fare-prediction.py) | [link](https://www.kaggle.com/competitions/new-york-city-taxi-fare-prediction/overview) |
168
- | Finance | Predict insurance claim probability | 62% | [link](examples/tabular-playground-series-sep-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-sep-2021/overview) |
169
- | Biotech | Predict cat in dat | 66% | [link](examples/cat-in-the-dat-ii.py) | [link](https://www.kaggle.com/competitions/cat-in-the-dat-ii/overview) |
170
- | Biotech | Predict the biological response of molecules | 62% | [link](examples/tabular-playground-series-oct-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-oct-2021/overview) |
171
- | Biotech | Predict medical conditions | 92% | [link](examples/icr-identify-age-related-conditions.py) | [link](https://www.kaggle.com/competitions/icr-identify-age-related-conditions/overview) |
172
- | Biotech | Predict wine quality | 61% | [link](examples/playground-series-s3e5.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e5/overview) |
173
- | Biotech | Predict binary target without overfitting | 98% | [link](examples/dont-overfit-ii.py) | [link](https://www.kaggle.com/competitions/dont-overfit-ii/overview) |
174
- | Biotech | Predict concrete strength | 86% | [link](examples/playground-series-s3e9.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e9/overview) |
175
- | Biotech | Predict crab age | 46% | [link](examples/playground-series-s3e16.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e16/overview) |
176
- | Biotech | Predict enzyme characteristics | 10% | [link](examples/playground-series-s3e18.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e18/overview) |
177
- | Biotech | Classify activity state from sensor data | 51% | [link](examples/tabular-playground-series-apr-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-apr-2022/overview) |
178
- | Biotech | Predict horse health outcomes | 86% | [link](examples/playground-series-s3e22.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e22/overview) |
179
- | Biotech | Predict the mohs hardness of a mineral | 64% | [link](examples/playground-series-s3e25.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e25/overview) |
180
- | Biotech | Predict cirrhosis patient outcomes | 51% | [link](examples/playground-series-s3e26.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e26/overview) |
181
- | Biotech | Predict obesity risk | 62% | [link](examples/playground-series-s4e2.py) | [link](https://www.kaggle.com/competitions/playground-series-s4e2/overview) |
182
- | Biotech | Classify presence of feature in data | 66% | [link](examples/cat-in-the-dat.py) | [link](https://www.kaggle.com/competitions/cat-in-the-dat/overview) |
183
- | Biotech | Predict patient's smoking status | 40% | [link](examples/playground-series-s3e24.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e24/overview) |
 
120
 
121
  By repeatedly applying these steps, AIDE navigates the vast space of possible solutions, progressively refining its approach until it converges on the optimal solution for the given data science problem.
122
 
123
+ ![Tree Search Visualization](https://github.com/WecoAI/aideml/assets/8918572/2401529c-b97e-4029-aed2-c3f376f54c3c)
124
 
125
  ## Solution Gallery
126
 
127
  | Domain | Task | Top% | Solution Link | Competition Link |
128
  |:---------------------------------|:------------------------------------------------------------------------|:------|:------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|
129
+ | Urban Planning | Forecast city bikeshare system usage | 5% | [link](sample_results/bike-sharing-demand.py) | [link](https://www.kaggle.com/competitions/bike-sharing-demand/overview) |
130
+ | Physics | Predicting Critical Heat Flux | 56% | [link](sample_results/playground-series-s3e15.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e15/overview) |
131
+ | Genomics | Classify bacteria species from genomic data | 0% | [link](sample_results/tabular-playground-series-feb-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-feb-2022/overview) |
132
+ | Agriculture | Predict blueberry yield | 58% | [link](sample_results/playground-series-s3e14.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e14/overview) |
133
+ | Healthcare | Predict disease prognosis | 0% | [link](sample_results/playground-series-s3e13.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e13/overview) |
134
+ | Economics | Predict monthly microbusiness density in a given area | 35% | [link](sample_results/godaddy-microbusiness-density-forecasting.py) | [link](https://www.kaggle.com/competitions/godaddy-microbusiness-density-forecasting/overview) |
135
+ | Cryptography | Decrypt shakespearean text | 91% | [link](sample_results/ciphertext-challenge-iii.py) | [link](https://www.kaggle.com/competitions/ciphertext-challenge-iii/overview) |
136
+ | Data Science Education | Predict passenger survival on Titanic | 78% | [link](sample_results/tabular-playground-series-apr-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-apr-2021/overview) |
137
+ | Software Engineering | Predict defects in c programs given various attributes about the code | 0% | [link](sample_results/playground-series-s3e23.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e23/overview) |
138
+ | Real Estate | Predict the final price of homes | 5% | [link](sample_results/home-data-for-ml-course.py) | [link](https://www.kaggle.com/competitions/home-data-for-ml-course/overview) |
139
+ | Real Estate | Predict house sale price | 36% | [link](sample_results/house-prices-advanced-regression-techniques.py) | [link](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview) |
140
+ | Entertainment Analytics | Predict movie worldwide box office revenue | 62% | [link](sample_results/tmdb-box-office-prediction.py) | [link](https://www.kaggle.com/competitions/tmdb-box-office-prediction/overview) |
141
+ | Entertainment Analytics | Predict scoring probability in next 10 seconds of a rocket league match | 21% | [link](sample_results/tabular-playground-series-oct-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-oct-2022/overview) |
142
+ | Environmental Science | Predict air pollution levels | 12% | [link](sample_results/tabular-playground-series-jul-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jul-2021/overview) |
143
+ | Environmental Science | Classify forest categories using cartographic variables | 55% | [link](sample_results/forest-cover-type-prediction.py) | [link](https://www.kaggle.com/competitions/forest-cover-type-prediction/overview) |
144
+ | Computer Vision | Predict the probability of machine failure | 32% | [link](sample_results/playground-series-s3e17.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e17/overview) |
145
+ | Computer Vision | Identify handwritten digits | 14% | [link](sample_results/digit-recognizer.py) | [link](https://www.kaggle.com/competitions/digit-recognizer/overview) |
146
+ | Manufacturing | Predict missing values in dataset | 70% | [link](sample_results/tabular-playground-series-jun-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jun-2022/overview) |
147
+ | Manufacturing | Predict product failures | 48% | [link](sample_results/tabular-playground-series-aug-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-aug-2022/overview) |
148
+ | Manufacturing | Cluster control data into different control states | 96% | [link](sample_results/tabular-playground-series-jul-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jul-2022/overview) |
149
+ | Natural Language Processing | Classify toxic online comments | 78% | [link](sample_results/jigsaw-toxic-comment-classification-challenge.py) | [link](https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/overview) |
150
+ | Natural Language Processing | Predict passenger transport to an alternate dimension | 59% | [link](sample_results/spaceship-titanic.py) | [link](https://www.kaggle.com/competitions/spaceship-titanic/overview) |
151
+ | Natural Language Processing | Classify sentence sentiment | 42% | [link](sample_results/sentiment-analysis-on-movie-reviews.py) | [link](https://www.kaggle.com/competitions/sentiment-analysis-on-movie-reviews/overview) |
152
+ | Natural Language Processing | Predict whether a tweet is about a real disaster | 48% | [link](sample_results/nlp-getting-started.py) | [link](https://www.kaggle.com/competitions/nlp-getting-started/overview) |
153
+ | Business Analytics | Predict total sales for each product and store in the next month | 87% | [link](sample_results/competitive-data-science-predict-future-sales.py) | [link](https://www.kaggle.com/competitions/competitive-data-science-predict-future-sales/overview) |
154
+ | Business Analytics | Predict book sales for 2021 | 66% | [link](sample_results/tabular-playground-series-sep-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-sep-2022/overview) |
155
+ | Business Analytics | Predict insurance claim amount | 80% | [link](sample_results/tabular-playground-series-feb-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-feb-2021/overview) |
156
+ | Business Analytics | Minimize penalty cost in scheduling families to santa's workshop | 100% | [link](sample_results/santa-2019-revenge-of-the-accountants.py) | [link](https://www.kaggle.com/competitions/santa-2019-revenge-of-the-accountants/overview) |
157
+ | Business Analytics | Predict yearly sales for learning modules | 26% | [link](sample_results/playground-series-s3e19.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e19/overview) |
158
+ | Business Analytics | Binary classification of manufacturing machine state | 60% | [link](sample_results/tabular-playground-series-may-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-may-2022/overview) |
159
+ | Business Analytics | Forecast retail store sales | 36% | [link](sample_results/tabular-playground-series-jan-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jan-2022/overview) |
160
+ | Business Analytics | Predict reservation cancellation | 54% | [link](sample_results/playground-series-s3e7.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e7/overview) |
161
+ | Finance | Predict the probability of an insurance claim | 13% | [link](sample_results/tabular-playground-series-mar-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-mar-2021/overview) |
162
+ | Finance | Predict loan loss | 0% | [link](sample_results/tabular-playground-series-aug-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-aug-2021/overview) |
163
+ | Finance | Predict a continuous target | 42% | [link](sample_results/tabular-playground-series-jan-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-jan-2021/overview) |
164
+ | Finance | Predict customer churn | 24% | [link](sample_results/playground-series-s4e1.py) | [link](https://www.kaggle.com/competitions/playground-series-s4e1/overview) |
165
+ | Finance | Predict median house value | 58% | [link](sample_results/playground-series-s3e1.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e1/overview) |
166
+ | Finance | Predict closing price movements for nasdaq listed stocks | 99% | [link](sample_results/optiver-trading-at-the-close.py) | [link](https://www.kaggle.com/competitions/optiver-trading-at-the-close/overview) |
167
+ | Finance | Predict taxi fare | 100% | [link](sample_results/new-york-city-taxi-fare-prediction.py) | [link](https://www.kaggle.com/competitions/new-york-city-taxi-fare-prediction/overview) |
168
+ | Finance | Predict insurance claim probability | 62% | [link](sample_results/tabular-playground-series-sep-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-sep-2021/overview) |
169
+ | Biotech | Predict cat in dat | 66% | [link](sample_results/cat-in-the-dat-ii.py) | [link](https://www.kaggle.com/competitions/cat-in-the-dat-ii/overview) |
170
+ | Biotech | Predict the biological response of molecules | 62% | [link](sample_results/tabular-playground-series-oct-2021.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-oct-2021/overview) |
171
+ | Biotech | Predict medical conditions | 92% | [link](sample_results/icr-identify-age-related-conditions.py) | [link](https://www.kaggle.com/competitions/icr-identify-age-related-conditions/overview) |
172
+ | Biotech | Predict wine quality | 61% | [link](sample_results/playground-series-s3e5.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e5/overview) |
173
+ | Biotech | Predict binary target without overfitting | 98% | [link](sample_results/dont-overfit-ii.py) | [link](https://www.kaggle.com/competitions/dont-overfit-ii/overview) |
174
+ | Biotech | Predict concrete strength | 86% | [link](sample_results/playground-series-s3e9.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e9/overview) |
175
+ | Biotech | Predict crab age | 46% | [link](sample_results/playground-series-s3e16.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e16/overview) |
176
+ | Biotech | Predict enzyme characteristics | 10% | [link](sample_results/playground-series-s3e18.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e18/overview) |
177
+ | Biotech | Classify activity state from sensor data | 51% | [link](sample_results/tabular-playground-series-apr-2022.py) | [link](https://www.kaggle.com/competitions/tabular-playground-series-apr-2022/overview) |
178
+ | Biotech | Predict horse health outcomes | 86% | [link](sample_results/playground-series-s3e22.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e22/overview) |
179
+ | Biotech | Predict the mohs hardness of a mineral | 64% | [link](sample_results/playground-series-s3e25.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e25/overview) |
180
+ | Biotech | Predict cirrhosis patient outcomes | 51% | [link](sample_results/playground-series-s3e26.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e26/overview) |
181
+ | Biotech | Predict obesity risk | 62% | [link](sample_results/playground-series-s4e2.py) | [link](https://www.kaggle.com/competitions/playground-series-s4e2/overview) |
182
+ | Biotech | Classify presence of feature in data | 66% | [link](sample_results/cat-in-the-dat.py) | [link](https://www.kaggle.com/competitions/cat-in-the-dat/overview) |
183
+ | Biotech | Predict patient's smoking status | 40% | [link](sample_results/playground-series-s3e24.py) | [link](https://www.kaggle.com/competitions/playground-series-s3e24/overview) |
aide/example_tasks/bitcoin_price.md ADDED
@@ -0,0 +1,5 @@
1
+ ## Goal
2
+ Build a time series forecasting model for the bitcoin close price.
3
+
4
+ ## Evaluation metric
5
+ Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed price.
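The evaluation metric described in this task file can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the function name `rmse_log` and the plain-list inputs are assumptions made for the example, and it assumes all prices are strictly positive so that the logarithm is defined.

```python
import math


def rmse_log(predicted, observed):
    """RMSE between log(predicted) and log(observed).

    Hypothetical helper for illustration; every value must be > 0.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference of the logs for each (prediction, observation) pair.
    squared_errors = [
        (math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed)
    ]
    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the error is taken on logarithms, a prediction that is off by a given ratio contributes the same amount whether the price is high or low.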
aide/example_tasks/house_prices.md ADDED
@@ -0,0 +1,24 @@
1
+ ## Goal
2
+ It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable.
3
+
4
+ ## Background
5
+ Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
6
+ With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
7
+
8
+ ## Evaluation
9
+ Submissions are evaluated on [Root-Mean-Squared-Error (RMSE)](https://en.wikipedia.org/wiki/Root-mean-square_deviation) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
10
+
11
+ The file should contain a header and have the following format:
12
+
13
+ ```
14
+ Id,SalePrice
15
+ 1461,169000.1
16
+ 1462,187724.1233
17
+ 1463,175221
18
+ etc.
19
+ ```
20
+
21
+ ## Data description
22
+ - **train.csv** - the training set
23
+ - **test.csv** - the test set
24
+ - **data_description.txt** - full description of each column, originally prepared by Dean De Cock but lightly edited to match the column names used here
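The submission format described in this task file (a header row followed by `Id,SalePrice` pairs) could be produced with Python's standard `csv` module. This is a sketch under stated assumptions: `write_submission` is a hypothetical helper, not part of AIDE, and the predictions are assumed to arrive as `(Id, SalePrice)` tuples.

```python
import csv
import io


def write_submission(rows, out):
    """Write (Id, SalePrice) pairs to a text stream in the expected format.

    Hypothetical helper for illustration only.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "SalePrice"])  # header required by the task
    for house_id, price in rows:
        writer.writerow([house_id, price])


# Usage: write to an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
write_submission([(1461, 169000.1), (1462, 187724.1233)], buffer)
```

To write to disk instead, open the target file with `newline=""` as the `csv` documentation recommends, and pass the file object as `out`.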
setup.py CHANGED
@@ -7,14 +7,14 @@ with open("requirements.txt", "r") as f:
7
  requirements = f.read().splitlines()
8
 
9
  setup(
10
- name="aideml",
11
- version="0.1.0",
12
  author="Weco AI",
13
  author_email="[email protected]",
14
  description="Autonomous AI for Data Science and Machine Learning",
15
  long_description=long_description,
16
  long_description_content_type="text/markdown",
17
- url="https://github.com/wecoai/aideml",
18
  packages=find_packages(),
19
  package_data={
20
  "aide": [
@@ -22,6 +22,7 @@ setup(
22
  "utils/viz_templates/*",
23
  "example_tasks/bitcoin_price/*",
24
  "example_tasks/house_prices/*",
 
25
  ]
26
  },
27
  classifiers=[
 
7
  requirements = f.read().splitlines()
8
 
9
  setup(
10
+ name="aide",
11
+ version="0.1.1",
12
  author="Weco AI",
13
  author_email="[email protected]",
14
  description="Autonomous AI for Data Science and Machine Learning",
15
  long_description=long_description,
16
  long_description_content_type="text/markdown",
17
+ url="https://github.com/Wecoai/aideml",
18
  packages=find_packages(),
19
  package_data={
20
  "aide": [
 
22
  "utils/viz_templates/*",
23
  "example_tasks/bitcoin_price/*",
24
  "example_tasks/house_prices/*",
25
+ "example_tasks/*",
26
  ]
27
  },
28
  classifiers=[