# find min, max, median and average

Discussion in 'Mathematics and Physics' started by electroRF, Jan 1, 2014.

1. ### electroRF (Member)

Hi,

I ran into this question: write a function that receives an array of n numbers and finds min, max, median and average.

You can of course sort the array in O(n log n) and then get the min, max, median and average.

I want to do it in O(n), though.

I thought of finding the median using Quickselect (the partition step from Quicksort) to find the (n/2)-th element, which takes O(n) on average but O(n^2) in the worst case.

I also know of the median-of-medians algorithm, which finds the median in O(n) worst case.

Is there any better way you know to find the min, max, median and average?
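For min, max and average, a single O(n) pass is enough; only the median needs any cleverness. A minimal Python sketch of the easy part (the function name is mine):

```python
def min_max_avg(a):
    """One O(n) pass over the array: track running min, max and sum."""
    lo = hi = a[0]
    total = 0
    for x in a:
        if x < lo:
            lo = x
        if x > hi:
            hi = x
        total += x
    return lo, hi, total / len(a)
```

Pair this with any O(n) selection method for the median and the whole job stays linear.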

2. ### NorthGuy (Well-Known Member)

Since you worry about the speed at large n, this must be a huge array.

The important question here is whether you're allowed to sort the array in place or must leave it intact, and, if you must leave it intact, whether you have enough memory to allocate a copy of the array to sort. When arrays are huge, memory can be a real problem.

It could be desirable to have the array sorted anyway. In that case, sorting it prior to calling your routine is the best solution regardless of speed.

You cannot select (or create) any particular algorithm without knowing how it is going to be used.

3. ### misterT (Well-Known Member, Most Helpful Member)

If you just combine a "running average" and a "running median" algorithm, you have the solution. Keeping track of min and max is very easy, and so is the average. The median is a bit more complex, but not that difficult at all. Google "running/moving/rolling/streaming median algorithm".



5. ### electroRF (Member)


Thank you Guys.

T, that answered the question, thanks a lot.

NorthGuy, it actually wasn't specified in the question whether you're allowed to sort in place, but T suggested a good solution.

6. ### NorthGuy (Well-Known Member)

QuickSort is way faster than a rolling median and requires much less memory (none if we're allowed to sort in place; 20 to 50% less if we create a copy of the array for sorting).

7. ### misterT (Well-Known Member, Most Helpful Member)

Yes. Maybe the median should be solved using Quickselect (based on Quicksort), which is O(n) on average. I like the rolling median because I mostly deal with real-time data, and each update can be done in O(log n).. but that does not mean it is faster than Quickselect.
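A quick sketch of Quickselect with a random pivot; this version trades the usual in-place partition for clarity, so it costs O(n) extra memory (the function name is mine):

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element (0-based) of a.
    Expected O(n) time; a random pivot makes the O(n^2) worst case unlikely."""
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    if k < len(less):
        return quickselect(less, k)          # answer is in the lower part
    if k < len(less) + len(equal):
        return pivot                         # answer equals the pivot
    return quickselect(greater, k - len(less) - len(equal))
```

Calling it with k = n // 2 gives the (lower) median; production versions partition in place instead of building new lists.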

8. ### electroRF (Member)

The running median uses two heaps, a min-heap and a max-heap, right?
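For reference, the two-heap scheme being asked about can be sketched like this: a max-heap holds the lower half and a min-heap holds the upper half. Python's heapq only provides a min-heap, so the lower half stores negated values (the class name is mine):

```python
import heapq

class RunningMedian:
    """Streaming median via two heaps; each insert is O(log n)."""
    def __init__(self):
        self.lower = []  # max-heap of the lower half (values negated)
        self.upper = []  # min-heap of the upper half

    def add(self, x):
        heapq.heappush(self.lower, -x)
        # move the largest of the lower half up, keeping halves ordered
        heapq.heappush(self.upper, -heapq.heappop(self.lower))
        # rebalance: the lower half holds the extra element on odd counts
        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def median(self):
        if len(self.lower) > len(self.upper):
            return -self.lower[0]
        return (-self.lower[0] + self.upper[0]) / 2
```

The median is always readable in O(1) from the two heap tops.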

10. ### MrAl (Well-Known Member, Most Helpful Member)

Hello there,

Many times a problem can be greatly simplified if there is a limit to the range of data that enters the system. For example, for the color channels of a typical 16-million-color display, the range is only 0 to 255 per channel, so each channel can be stored in a single byte. If it weren't like that, it could take two or more bytes just to store that one channel, or channels would have to share bytes, which makes the whole thing harder to implement.

So I just have to ask whether there is a limit to the range of data that will be accepted as input. If the data only appears as integers from 1 to 100, for example, we can probably do this in very nearly N time without too much trouble, even with a HUGE data set of over 2^32 items.
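Under that assumption (small integer range, e.g. 0 to 255), one counting pass gives all four statistics in O(n + range) time and O(range) memory; a sketch with names of my choosing:

```python
def stats_bounded(a, lo=0, hi=255):
    """Min/max/median/average for integers known to lie in [lo, hi]."""
    counts = [0] * (hi - lo + 1)
    total = 0
    for x in a:                # single pass: histogram + running sum
        counts[x - lo] += 1
        total += x
    mn = next(i for i, c in enumerate(counts) if c) + lo
    mx = hi - next(i for i, c in enumerate(reversed(counts)) if c)
    # walk the histogram to the element of rank n//2 (lower median)
    target = len(a) // 2
    seen = 0
    for i, c in enumerate(counts):
        seen += c
        if seen > target:
            median = i + lo
            break
    return mn, mx, median, total / len(a)
```

When the range is a fixed constant like 256, the histogram scan is O(1) and the whole thing is effectively one linear pass.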

11. ### misterT (Well-Known Member, Most Helpful Member)

I don't see how the datatype (or range) would affect the big-O classification of the algorithm. Of course it will affect the efficiency and overall speed, but not the big-O characteristics. And I think the original question is purely academic: electroRF is studying algorithms in general and therefore posting these questions.. quite interesting questions, in my opinion. electroRF can correct me if I'm wrong, but I don't think he has any specific limits on the dataset.

12. ### NorthGuy (Well-Known Member)

In the simplest case, imagine 0/1 data. You only need to keep a count of 0s and 1s; if there are more 1s than 0s, the median is 1, otherwise 0. That's O(n). You can't do that with arbitrary float data.
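That counting idea, as a one-line sketch (the function name is mine; ties go to 0, matching the lower-median convention):

```python
def binary_median(bits):
    """Median of 0/1 data by counting: O(n) time, O(1) memory."""
    ones = sum(bits)
    return 1 if ones * 2 > len(bits) else 0
```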

13. ### misterT (Well-Known Member, Most Helpful Member)

Still O(n).. I did not say that the datatype would not affect overall speed; I said that the datatype and range do not affect the big-O classification of an algorithm. There is an O(log n)-per-update algorithm for the running median, but that does not mean it is faster than Quickselect, which is O(n), regardless of the datatype or range of the data.

14. ### NorthGuy (Well-Known Member)

It's possible that an O(n) algorithm exists (although I doubt it), but it's certainly different from the simple algorithm I described for binary data.

A running median kept in a sorted buffer is O(n^2) overall: it's essentially insertion sort, since each new element is inserted into already-sorted data.

QuickSelect is the same as QuickSort except that you do not sort the partitions that cannot contain the median. It's still O(n*log(n)) for random data.

If you have specific data you can use data-specific algorithm which may be not applicable for general data.

15. ### NorthGuy (Well-Known Member)


However, there's no corresponding O(n) algorithm for full sorting. The best comparison sort is O(n*log(n)), while with binary data a full sort is O(n) (just count and rewrite).

16. ### MrAl (Well-Known Member, Most Helpful Member)

Hi,

Ultimately these kinds of algorithms have to be implemented in computer code, and thus time becomes an important factor that overshadows any man-made notation used to compare things. In notation alone O(n) may be the same as O(8n), but I for one do not wish to wait 8 times longer for an algorithm to complete just because it happens to be classified the same as one that runs faster.

The main point was simplification through range limiting. If the range of the data can be limited (as in NorthGuy's cute example), then the algorithm often ends up simpler, and simpler usually means faster.