Joins are very common in SQL, but the types most often used are inner join, left join, right join, and outer join. This article introduces semi join and anti join, using PySpark as an example.
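As a rough illustration of the two join types, here is a minimal plain-Python sketch of their semantics. The `customers` and `orders` tables and the helper names `semi_join` and `anti_join` are made up for this example and are not taken from the article. A semi join keeps the left rows that have at least one match on the right, and an anti join keeps the left rows that have none. In both cases only the left table's columns are returned.

```python
# Hypothetical sample data, invented for illustration only.
customers = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 3},
]


def semi_join(left, right, left_key, right_key):
    """Keep left rows with at least one matching right row.

    Unlike an inner join, each left row appears at most once and no
    right-side columns are added.
    """
    right_keys = {row[right_key] for row in right}
    return [row for row in left if row[left_key] in right_keys]


def anti_join(left, right, left_key, right_key):
    """Keep left rows with no matching right row."""
    right_keys = {row[right_key] for row in right}
    return [row for row in left if row[left_key] not in right_keys]


with_orders = semi_join(customers, orders, "id", "customer_id")
without_orders = anti_join(customers, orders, "id", "customer_id")

# Alice has two orders but still appears only once in the semi join.
print([c["name"] for c in with_orders])     # ['Alice', 'Carol']
print([c["name"] for c in without_orders])  # ['Bob']
```

In PySpark, the same results would come from `DataFrame.join` with `how="left_semi"` or `how="left_anti"`, for instance `customers_df.join(orders_df, customers_df.id == orders_df.customer_id, how="left_semi")`, where `customers_df` and `orders_df` are assumed DataFrames holding the data above.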
Python line profiler is a very convenient package that lets you easily see how long each line of code takes to execute. However, a fatal flaw is that it does not support profiling code that runs under multiprocessing, and there has been an open issue about this on GitHub since 2016. Here, I provide a hacky workaround for using line profiler with multiprocessing.