Welcome to 16892 Developer Community-Open, Learning,Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm currently looking to get a table that gets counts for null, not null, distinct values, and all rows for all columns in a given table. This happens to be in Databricks (Apache Spark).

Something that looks like what is shown below.

enter image description here

I know I can do this with something like the sql shown below. Also I can use something like Java or Python, etc., to generate the sql.

The question is:

-Is this the most efficient approach

-Is there a better way to write this query (less typing and/or more efficient)

select 
  1 col_position,
  'location_id' col_name,
  count(*) all_records,
  count(location_id) not_null,
  count(*) - count(location_id) null,
  count(distinct location_id) distinct_values
from
  admin
  
union
select 
  2 col_position,
  'location_zip' col_name,
  count(*) all_records,
  count(location_zip) not_null,
  count(*) - count(location_zip) null,
  count(distinct location_zip) distinct_values
from
  admin
  
union
select 
  3 col_position,
  'provider_npi' col_name,
  count(*) all_records,
  count(provider_npi) not_null,
  count(*) - count(provider_npi) null,
  count(distinct provider_npi) distinct_values
from
  admin
  
order by col_position
;

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
3.6k views
Welcome To Ask or Share your Answers For Others

1 Answer

等待大神解答

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to 16892 Developer Community-Open, Learning and Share
...