Github source code: link
Contributor:
Kaggle page: link
We know that all tickets are unique, and that any two people have the same phone, email, or order ID are exactly the same people. Let's view all tickets as different nodes(vertices). We check phone, email, and order ID. If any two ticket are belong to the same person, we construct a undirected edge between them. Finally, we find the graph's connected components. Each unique user has a corresponding connected components.
Assume number of total tickets is n.
O(nlogn)
time.O(n)
time.o(n^2)
time[1].O(nlogn)
time.O(nlogn)
time.The total time complexity is o(n^2)
.
[1] Common ways of finding connected components like Tarjan's algorithm using adjacent list takes not more than O(number of vertices + number of edges). In this case, we use scipy's connected component function. We've tested that it seems not take up to n^2 time and it's fast. If you're interested in the algorithm, you can find more information about how scipy implements this function.