apache spark - How to get counts for null, not null, distinct values, all rows for all columns in a sql table?

Question

Welcome To Ask or Share your Answers For Others

apache spark - How to get counts for null, not null, distinct values, all rows for all columns in a sql table?

asked Jan 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm currently looking to get a table that gets counts for null, not null, distinct values, and all rows for all columns in a given table. This happens to be in Databricks (Apache Spark).

Something that looks like what is shown below.

I know I can do this with something like the sql shown below. Also I can use something like Java or Python, etc., to generate the sql.

The question is:

-Is this the most efficient approach

-Is there a better way to write this query (less typing and/or more efficient)

select 
  1 col_position,
  'location_id' col_name,
  count(*) all_records,
  count(location_id) not_null,
  count(*) - count(location_id) null,
  count(distinct location_id) distinct_values
from
  admin
  
union
select 
  2 col_position,
  'location_zip' col_name,
  count(*) all_records,
  count(location_zip) not_null,
  count(*) - count(location_zip) null,
  count(distinct location_zip) distinct_values
from
  admin
  
union
select 
  3 col_position,
  'provider_npi' col_name,
  count(*) all_records,
  count(provider_npi) not_null,
  count(*) - count(provider_npi) null,
  count(distinct provider_npi) distinct_values
from
  admin
  
order by col_position
;

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

3.6k views

1 Answer

深蓝 · Answer 1 · 2021-01-24T00:05:41+0000

answered Jan 24, 2021 by 深蓝 (71.8m points)

等待大神解答

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

Categories

apache spark - How to get counts for null, not null, distinct values, all rows for all columns in a sql table?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags