Discussion:
Help for statistic functions
Nitin Pawar
2018-12-05 10:46:13 UTC
Permalink
Hi,

We have a multistep workflow system and in one of the step we do sum(x)
this step results the column being float-optional for next queries and then
functions start failing if the value is large float number (more than 8
digits)

Is there any setting where we can change this or it needs to be fixed in
code?
error we are getting is

Error: SYSTEM ERROR: SchemaChangeException: Failure while materializing
expression.
Error in expression at index -1. Error: Missing function implementation:
[covar_samp(BIGINT-REQUIRED, FLOAT8-OPTIONAL)]. Full expression: --UNKNOWN
EXPRESSION--.
--
Nitin Pawar
Nitin Pawar
2018-12-06 13:20:37 UTC
Permalink
any help on this ??

just to put some more data on this
if a query has select count(1), sum(b) from c
then we keep getting the error mentioned above as count ends up being
bigint and sum ends being double and it is read as float-optional for large
numbers
Post by Nitin Pawar
Hi,
We have a multistep workflow system and in one of the step we do sum(x)
this step results the column being float-optional for next queries and
then functions start failing if the value is large float number (more than
8 digits)
Is there any setting where we can change this or it needs to be fixed in
code?
error we are getting is
Error: SYSTEM ERROR: SchemaChangeException: Failure while materializing
expression.
[covar_samp(BIGINT-REQUIRED, FLOAT8-OPTIONAL)]. Full expression: --UNKNOWN
EXPRESSION--.
--
Nitin Pawar
--
Nitin Pawar
Anton Gozhiy
2018-12-06 13:44:11 UTC
Permalink
Hi Nitin Pawar,
I was investigating this. Indeed, when one of the parameters has optional
data mode, Drill cannot cast the parameters to the same type, and there is
no "covar_samp" UDF that takes parameters with different types.
To reproduce this, I used a nullable column, bu I'm not sure if it is your
case.
You mentioned that it depends on the float number size.
It would be helpful if you share the whole query and describe what data did
you use.
Post by Nitin Pawar
any help on this ??
just to put some more data on this
if a query has select count(1), sum(b) from c
then we keep getting the error mentioned above as count ends up being
bigint and sum ends being double and it is read as float-optional for large
numbers
Post by Nitin Pawar
Hi,
We have a multistep workflow system and in one of the step we do sum(x)
this step results the column being float-optional for next queries and
then functions start failing if the value is large float number (more
than
Post by Nitin Pawar
8 digits)
Is there any setting where we can change this or it needs to be fixed in
code?
error we are getting is
Error: SYSTEM ERROR: SchemaChangeException: Failure while materializing
expression.
--UNKNOWN
Post by Nitin Pawar
EXPRESSION--.
--
Nitin Pawar
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
***@gmail.com
Nitin Pawar
2018-12-06 15:55:11 UTC
Permalink
Hello Anton,
Thanks for the reply.
I have tried explicit casting as well as with subquery mechanism
I have attached the parquet file along with this email

following is the query
select covar_samp(cast(id_dist as double), cast(num2 as double)) from
dfs.tmp.`/nitin`;
Post by Anton Gozhiy
Hi Nitin Pawar,
I was investigating this. Indeed, when one of the parameters has optional
data mode, Drill cannot cast the parameters to the same type, and there is
no "covar_samp" UDF that takes parameters with different types.
To reproduce this, I used a nullable column, bu I'm not sure if it is your
case.
You mentioned that it depends on the float number size.
It would be helpful if you share the whole query and describe what data did
you use.
Post by Nitin Pawar
any help on this ??
just to put some more data on this
if a query has select count(1), sum(b) from c
then we keep getting the error mentioned above as count ends up being
bigint and sum ends being double and it is read as float-optional for
large
Post by Nitin Pawar
numbers
Post by Nitin Pawar
Hi,
We have a multistep workflow system and in one of the step we do sum(x)
this step results the column being float-optional for next queries and
then functions start failing if the value is large float number (more
than
Post by Nitin Pawar
8 digits)
Is there any setting where we can change this or it needs to be fixed
in
Post by Nitin Pawar
Post by Nitin Pawar
code?
error we are getting is
Error: SYSTEM ERROR: SchemaChangeException: Failure while materializing
expression.
Error in expression at index -1. Error: Missing function
--UNKNOWN
Post by Nitin Pawar
EXPRESSION--.
--
Nitin Pawar
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
--
Nitin Pawar
Anton Gozhiy
2018-12-06 16:32:12 UTC
Permalink
Nitin, I don't see the attachment, maybe due to apache politics. Could you
share it by google drive?
Regarding explicit casting, nullable and non-nullable double are
represented by different types inside Drill and cannot be cast that way.
That may cause the error.
Post by Nitin Pawar
Hello Anton,
Thanks for the reply.
I have tried explicit casting as well as with subquery mechanism
I have attached the parquet file along with this email
following is the query
select covar_samp(cast(id_dist as double), cast(num2 as double)) from
dfs.tmp.`/nitin`;
Post by Anton Gozhiy
Hi Nitin Pawar,
I was investigating this. Indeed, when one of the parameters has optional
data mode, Drill cannot cast the parameters to the same type, and there is
no "covar_samp" UDF that takes parameters with different types.
To reproduce this, I used a nullable column, bu I'm not sure if it is your
case.
You mentioned that it depends on the float number size.
It would be helpful if you share the whole query and describe what data did
you use.
Post by Nitin Pawar
any help on this ??
just to put some more data on this
if a query has select count(1), sum(b) from c
then we keep getting the error mentioned above as count ends up being
bigint and sum ends being double and it is read as float-optional for
large
Post by Nitin Pawar
numbers
Post by Nitin Pawar
Hi,
We have a multistep workflow system and in one of the step we do
sum(x)
Post by Nitin Pawar
Post by Nitin Pawar
this step results the column being float-optional for next queries and
then functions start failing if the value is large float number (more
than
Post by Nitin Pawar
8 digits)
Is there any setting where we can change this or it needs to be fixed
in
Post by Nitin Pawar
Post by Nitin Pawar
code?
error we are getting is
Error: SYSTEM ERROR: SchemaChangeException: Failure while
materializing
Post by Nitin Pawar
Post by Nitin Pawar
expression.
Error in expression at index -1. Error: Missing function
--UNKNOWN
Post by Nitin Pawar
EXPRESSION--.
--
Nitin Pawar
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
***@gmail.com
Nitin Pawar
2018-12-08 05:15:24 UTC
Permalink
here is the link
https://drive.google.com/open?id=1PSJMjIvwNObhGsc9jmD9wPapSPI1jQWb

We have tried with direct fields as well but it keeps failing
Basically, we have a query which creates above parquet file
and the sequential query which fails, I already provided
Post by Anton Gozhiy
Nitin, I don't see the attachment, maybe due to apache politics. Could you
share it by google drive?
Regarding explicit casting, nullable and non-nullable double are
represented by different types inside Drill and cannot be cast that way.
That may cause the error.
Post by Nitin Pawar
Hello Anton,
Thanks for the reply.
I have tried explicit casting as well as with subquery mechanism
I have attached the parquet file along with this email
following is the query
select covar_samp(cast(id_dist as double), cast(num2 as double)) from
dfs.tmp.`/nitin`;
Post by Anton Gozhiy
Hi Nitin Pawar,
I was investigating this. Indeed, when one of the parameters has
optional
Post by Nitin Pawar
Post by Anton Gozhiy
data mode, Drill cannot cast the parameters to the same type, and there
is
Post by Nitin Pawar
Post by Anton Gozhiy
no "covar_samp" UDF that takes parameters with different types.
To reproduce this, I used a nullable column, bu I'm not sure if it is
your
Post by Nitin Pawar
Post by Anton Gozhiy
case.
You mentioned that it depends on the float number size.
It would be helpful if you share the whole query and describe what data did
you use.
Post by Nitin Pawar
any help on this ??
just to put some more data on this
if a query has select count(1), sum(b) from c
then we keep getting the error mentioned above as count ends up being
bigint and sum ends being double and it is read as float-optional for
large
Post by Nitin Pawar
numbers
Post by Nitin Pawar
Hi,
We have a multistep workflow system and in one of the step we do
sum(x)
Post by Nitin Pawar
Post by Nitin Pawar
this step results the column being float-optional for next queries
and
Post by Nitin Pawar
Post by Anton Gozhiy
Post by Nitin Pawar
Post by Nitin Pawar
then functions start failing if the value is large float number
(more
Post by Nitin Pawar
Post by Anton Gozhiy
Post by Nitin Pawar
than
Post by Nitin Pawar
8 digits)
Is there any setting where we can change this or it needs to be
fixed
Post by Nitin Pawar
Post by Anton Gozhiy
in
Post by Nitin Pawar
Post by Nitin Pawar
code?
error we are getting is
Error: SYSTEM ERROR: SchemaChangeException: Failure while
materializing
Post by Nitin Pawar
Post by Nitin Pawar
expression.
Error in expression at index -1. Error: Missing function
--UNKNOWN
Post by Nitin Pawar
EXPRESSION--.
--
Nitin Pawar
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
--
Nitin Pawar
Anton Gozhiy
2018-12-10 16:30:53 UTC
Permalink
This is really a bug, reported it here:
https://issues.apache.org/jira/browse/DRILL-6891.
Thanks for finding this case.
Post by Nitin Pawar
here is the link
https://drive.google.com/open?id=1PSJMjIvwNObhGsc9jmD9wPapSPI1jQWb
We have tried with direct fields as well but it keeps failing
Basically, we have a query which creates above parquet file
and the sequential query which fails, I already provided
Post by Anton Gozhiy
Nitin, I don't see the attachment, maybe due to apache politics. Could
you
Post by Anton Gozhiy
share it by google drive?
Regarding explicit casting, nullable and non-nullable double are
represented by different types inside Drill and cannot be cast that way.
That may cause the error.
Post by Nitin Pawar
Hello Anton,
Thanks for the reply.
I have tried explicit casting as well as with subquery mechanism
I have attached the parquet file along with this email
following is the query
select covar_samp(cast(id_dist as double), cast(num2 as double)) from
dfs.tmp.`/nitin`;
Post by Anton Gozhiy
Hi Nitin Pawar,
I was investigating this. Indeed, when one of the parameters has
optional
Post by Nitin Pawar
Post by Anton Gozhiy
data mode, Drill cannot cast the parameters to the same type, and
there
Post by Anton Gozhiy
is
Post by Nitin Pawar
Post by Anton Gozhiy
no "covar_samp" UDF that takes parameters with different types.
To reproduce this, I used a nullable column, bu I'm not sure if it is
your
Post by Nitin Pawar
Post by Anton Gozhiy
case.
You mentioned that it depends on the float number size.
It would be helpful if you share the whole query and describe what
data
Post by Anton Gozhiy
Post by Nitin Pawar
Post by Anton Gozhiy
did
you use.
Post by Nitin Pawar
any help on this ??
just to put some more data on this
if a query has select count(1), sum(b) from c
then we keep getting the error mentioned above as count ends up
being
Post by Anton Gozhiy
Post by Nitin Pawar
Post by Anton Gozhiy
Post by Nitin Pawar
bigint and sum ends being double and it is read as float-optional
for
Post by Anton Gozhiy
Post by Nitin Pawar
Post by Anton Gozhiy
large
Post by Nitin Pawar
numbers
Post by Nitin Pawar
Hi,
We have a multistep workflow system and in one of the step we do
sum(x)
Post by Nitin Pawar
Post by Nitin Pawar
this step results the column being float-optional for next queries
and
Post by Nitin Pawar
Post by Anton Gozhiy
Post by Nitin Pawar
Post by Nitin Pawar
then functions start failing if the value is large float number
(more
Post by Nitin Pawar
Post by Anton Gozhiy
Post by Nitin Pawar
than
Post by Nitin Pawar
8 digits)
Is there any setting where we can change this or it needs to be
fixed
Post by Nitin Pawar
Post by Anton Gozhiy
in
Post by Nitin Pawar
Post by Nitin Pawar
code?
error we are getting is
Error: SYSTEM ERROR: SchemaChangeException: Failure while
materializing
Post by Nitin Pawar
Post by Nitin Pawar
expression.
Error in expression at index -1. Error: Missing function
--UNKNOWN
Post by Nitin Pawar
EXPRESSION--.
--
Nitin Pawar
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
--
Nitin Pawar
--
Sincerely, Anton Gozhiy
***@gmail.com
Loading...