-
Notifications
You must be signed in to change notification settings - Fork 0
/
TODO.txt
23 lines (22 loc) · 1.03 KB
/
TODO.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
#remember scripts are in python2
1. reload table A, B, AllTuplePairs
2. Retrain on a sample to get random foreest model
3. Apply RF to AllTuplePairs and save matched tuples to one file and unmatched to another?
4. Write schema mapper to get new table from matched pairs. Think of which value to use for each field.
<<<<<<< HEAD
5. Add unmatched tuples from A and B to theis table, same schema
6. Analysis
A- IMDB, B= numbers M= matched table
merging data
- stars--> split on comma and union
director- merge names-longer? or how else- split on comma and union
title- longer
date of release- month and year- if A val exists, take from A else from B
mpaa- longer one if it exists else anyways it is -1
runtime- remove string. takes the larger number, put -1 if both -1
genres- split on comma and union
gross- change format million, full numbers and -1 and #, comapre value after removing $ sign, take the higher value
=======
5. Add unmatched tuples from A and B to this table, same schema
6. Analysis
>>>>>>> aadfdc59963270062579f481315c727f9de7071c