
find min, max, median and average

Discussion in 'Mathematics and Physics' started by electroRF, Jan 1, 2014.

  1. electroRF

    electroRF Member

    Joined:
    Jun 23, 2012
    Messages:
    689
    Likes:
    9
    Location:
    Portugal
    Hi,

    I ran into this question: write a function that receives an array of n numbers and finds the min, max, median and average.

    You can of course sort the array in O(n log n) and then read off the min, max, median and average.

    I want to do it in O(n), though.

    I thought of finding the median with Quickselect (the Quicksort-style partitioning used to find the (n/2)-th element), which takes O(n) on average but O(n^2) in the worst case.

    I also know of the Median of Medians algorithm, which finds the median in O(n) worst case.


    Is there any better way you know to find the min, max, median and average?
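    For what it's worth, the min, max and average part really is a single O(n) pass; only the median needs a selection algorithm. A rough Python sketch of that single pass (my own illustration, not from any particular reference):

```python
def min_max_avg(a):
    """Single O(n) pass over a non-empty array: min, max and average.

    The median is the hard part; it needs a selection algorithm
    (e.g. quickselect or median-of-medians) to stay near O(n).
    """
    lo = hi = a[0]
    total = 0
    for x in a:
        if x < lo:
            lo = x
        if x > hi:
            hi = x
        total += x
    return lo, hi, total / len(a)

print(min_max_avg([3, 1, 4, 1, 5, 9, 2, 6]))  # (1, 9, 3.875)
```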
     
  2. NorthGuy

    NorthGuy Well-Known Member

    Joined:
    Sep 8, 2013
    Messages:
    1,218
    Likes:
    206
    Location:
    Northern Canada
    Since you worry about the speed at large n, this must be a huge array.

    The important question here is whether you're allowed to sort the array in place or must leave it intact, and, if you must leave it intact, whether you have enough memory to allocate a copy of the array to sort. When arrays are huge, memory can be a real problem.

    It could also be desirable to have the array sorted anyway. In that case, sorting it prior to calling your routine is the best solution regardless of speed.

    You cannot select (or create) any particular algorithm without knowing how it is going to be used.
     
    • Like Like x 1
  3. misterT

    misterT Well-Known Member Most Helpful Member

    Joined:
    Apr 19, 2010
    Messages:
    2,697
    Likes:
    368
    Location:
    Finland
    If you just combine a "running average" and a "running median" algorithm, you have the solution. Keeping track of min and max is very easy, and so is the average.
    The median is a bit more complex, but not that difficult at all. Google "running/moving/rolling/streaming median algorithm".
     
    Last edited: Jan 1, 2014
    • Like Like x 1
  5. electroRF

    electroRF Member

    Joined:
    Jun 23, 2012
    Messages:
    689
    Likes:
    9
    Location:
    Portugal

    Thank you Guys.

    T, that answered the question :) thanks a lot

    NorthGuy, actually it wasn't specified in the question if you're allowed to sort in-place, however T suggested a good solution.
     
  6. NorthGuy

    NorthGuy Well-Known Member

    Joined:
    Sep 8, 2013
    Messages:
    1,218
    Likes:
    206
    Location:
    Northern Canada
    QuickSort is way faster than a rolling median and requires much less memory (none if we're allowed to sort in place; 20 to 50% less if we create a copy of the array for sorting).
     
  7. misterT

    misterT Well-Known Member Most Helpful Member

    Joined:
    Apr 19, 2010
    Messages:
    2,697
    Likes:
    368
    Location:
    Finland
    Yes. Maybe the median should be solved using Quickselect (based on Quicksort), which is O(n) on average. I like the rolling median because I mostly deal with real-time data, and each update can be done in O(log n).. but that does not mean it is faster than Quickselect.
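    A rough Python sketch of Quickselect with a random pivot (this version copies the array and partitions with temporary lists for clarity, so it uses extra memory; an in-place version avoids that):

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element (0-based) of sequence a.

    Average O(n); worst case O(n^2). A random pivot makes the
    worst case very unlikely on any fixed input.
    """
    a = list(a)                      # work on a copy
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        pivot = a[random.randint(lo, hi)]
        # three-way partition of the current window around the pivot
        window = a[lo:hi + 1]
        less = [x for x in window if x < pivot]
        equal = [x for x in window if x == pivot]
        greater = [x for x in window if x > pivot]
        a[lo:hi + 1] = less + equal + greater
        if k < lo + len(less):           # median is in the "less" part
            hi = lo + len(less) - 1
        elif k < lo + len(less) + len(equal):
            return pivot                 # k falls on the pivot value
        else:                            # median is in the "greater" part
            lo = lo + len(less) + len(equal)

data = [3, 1, 4, 1, 5, 9, 2, 6, 5]
print(quickselect(data, len(data) // 2))  # 4 (the median of 9 values)
```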
     
  8. electroRF

    electroRF Member

    Joined:
    Jun 23, 2012
    Messages:
    689
    Likes:
    9
    Location:
    Portugal
    The running median uses two heaps - a min-heap and a max-heap - right?
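    Something like this two-heap sketch in Python (using the standard heapq, which is a min-heap, so the lower half is stored negated to act as a max-heap):

```python
import heapq

class RunningMedian:
    """Running median via two heaps: a max-heap for the lower half
    (stored negated) and a min-heap for the upper half.
    Each insert is O(log n); reading the median is O(1)."""

    def __init__(self):
        self.lower = []   # max-heap via negation, holds the smaller half
        self.upper = []   # min-heap, holds the larger half

    def add(self, x):
        if not self.lower or x <= -self.lower[0]:
            heapq.heappush(self.lower, -x)
        else:
            heapq.heappush(self.upper, x)
        # rebalance so the halves differ in size by at most one
        if len(self.lower) > len(self.upper) + 1:
            heapq.heappush(self.upper, -heapq.heappop(self.lower))
        elif len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def median(self):
        # assumes at least one value has been added
        if len(self.lower) > len(self.upper):
            return -self.lower[0]
        return (-self.lower[0] + self.upper[0]) / 2
```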
     
  9. misterT

    misterT Well-Known Member Most Helpful Member

    Joined:
    Apr 19, 2010
    Messages:
    2,697
    Likes:
    368
    Location:
    Finland
  10. MrAl

    MrAl Well-Known Member Most Helpful Member

    Joined:
    Sep 7, 2008
    Messages:
    11,049
    Likes:
    961
    Location:
    NJ
    Hello there,

    Many times a problem can be greatly simplified if there is a limit to the range of data that enters the system. For example, for the color channels in a normal 16-million-color display, the range is only 0 to 255 for each channel, so of course each channel can be stored in a single byte. If it weren't like that, it could take two or more bytes just to store that one channel, or channels would have to share bytes, and that makes the whole thing harder to implement.

    So I just have to ask whether there is a limit to the range of data that will be accepted as input. If the data only appears as integers from 1 to 100, for example, we can probably do this in very nearly O(n) time without too much trouble, even with a HUGE data set of over 2^32 items.
     
  11. misterT

    misterT Well-Known Member Most Helpful Member

    Joined:
    Apr 19, 2010
    Messages:
    2,697
    Likes:
    368
    Location:
    Finland
    I don't see how the datatype (or range) would affect the big-O classification of the algorithm. Of course it will affect the efficiency and overall speed, but not the big-O characteristics. And I think the original question is purely academic: electroRF is studying algorithms in general and posting these questions for that reason.. quite interesting questions, in my opinion. ElectroRF can correct me if I'm wrong, but I don't think he has any specific limits on the data set.
     
  12. NorthGuy

    NorthGuy Well-Known Member

    Joined:
    Sep 8, 2013
    Messages:
    1,218
    Likes:
    206
    Location:
    Northern Canada
    In the simplest case, imagine 0/1 data. You only need to keep counts of the 0s and 1s; if there are more 1s than 0s, the median is 1, otherwise 0. That's O(n). You can't do that with arbitrary float data.
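    The same counting idea extends to any small integer range - say bytes from 0 to 255 - still O(n) plus one pass over the range. A rough Python sketch (my own illustration):

```python
def bounded_median(a, lo=0, hi=255):
    """Median of integers known to lie in [lo, hi], in O(n + k) time
    where k = hi - lo + 1, using a histogram of counts.
    Returns the lower median for even-length input, for simplicity."""
    counts = [0] * (hi - lo + 1)
    for x in a:
        counts[x - lo] += 1
    target = (len(a) - 1) // 2        # index of the (lower) median
    seen = 0
    for v, c in enumerate(counts):
        seen += c
        if seen > target:             # the median falls in this bucket
            return lo + v

print(bounded_median([1, 0, 1, 1, 0], 0, 1))  # 1
```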
     
    • Like Like x 1
  13. misterT

    misterT Well-Known Member Most Helpful Member

    Joined:
    Apr 19, 2010
    Messages:
    2,697
    Likes:
    368
    Location:
    Finland
    Still O(n).. I did not say that the datatype would not affect overall speed. I said that the datatype and range do not affect the big-O classification of an algorithm. There is an O(log n)-per-update algorithm for the running median, but that does not mean it is faster than Quickselect, which is O(n) - regardless of the datatype or range of the data.
     
    Last edited: Jan 3, 2014
    • Like Like x 1
  14. NorthGuy

    NorthGuy Well-Known Member

    Joined:
    Sep 8, 2013
    Messages:
    1,218
    Likes:
    206
    Location:
    Northern Canada
    It's possible that an O(n) algorithm exists (though I doubt it), but it's certainly different from the simple algorithm I described for binary data.

    Running median is O(n^2) overall - it's essentially sorting by insertion.

    QuickSelect is the same as QuickSort except that you do not sort the partitions that cannot contain the median. It's still O(n*log(n)) for random data.

    If you have specific data, you can use a data-specific algorithm which may not be applicable to general data.
     
    • Like Like x 1
  15. NorthGuy

    NorthGuy Well-Known Member

    Joined:
    Sep 8, 2013
    Messages:
    1,218
    Likes:
    206
    Location:
    Northern Canada
    I thought about this, and you're right - QuickSelect finds the median in close to O(n) on random data.

    However, there's no corresponding O(n) algorithm for full sorting. The best general sorting is O(n*log(n)), while with binary data it is O(n).
     
    • Like Like x 1
  16. MrAl

    MrAl Well-Known Member Most Helpful Member

    Joined:
    Sep 7, 2008
    Messages:
    11,049
    Likes:
    961
    Location:
    NJ
    Hi,

    Ultimately these kinds of algorithms have to be implemented in computer code, and thus time becomes an important factor which overshadows any man-made notation used to compare things. In notation alone, O(n) may be the same as O(8n), but I for one do not wish to wait 8 times longer for an algorithm to complete just because it happens to be classified the same as one that runs faster.

    The main point was simplification through range limiting. If the range of the data can be limited (as in NorthGuy's cute example), then the algorithm often ends up simpler, and simpler usually means faster time-wise.

    For example, if we had to add integers we could do this pretty fast, one operation per integer. But if we had to add 4-vectors (with integer elements, for simplicity), we'd have to add each element of every vector, meaning it would take 4 times as long to complete. But if we examined the data and found that every one of these vectors had only one nonzero element, and that element was always in the same dimension, then we could simplify the problem by adding only that one element for each vector and ignoring the other three, and that would reduce the time back to almost what it took to add the single integers. So clearly there is a benefit in thinking about this, even if we cannot formally deem it to be of some lower order.

    BTW, when I talk about algorithms like this I always like to specify whether the array has been sorted BEFORE entering the algorithm, because quoting an order for the algorithm when the data has already been sorted is a little like cheating <chuckle>, unless the data came from a previous operation that required sorting anyway. So if there is a sort involved, because the algorithm needs sorted data to work properly, then I think it should be stated outright.

    So far I have been assuming that the data coming into the algorithm has not yet been sorted, because if it was, there's almost no game. Thus what I have been talking about is based on an input data set that is NOT sorted, and we do not wish to take the time to sort it because no following algorithms will need it that way either. If there were other algorithms that did need a sorted data set, then we'd have to consider the placement of the sort in the sequence of operations, and that would probably change everything.
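    The 4-vector point in rough Python form (hypothetical names, just to illustrate the simplification):

```python
def sum_vectors(vectors):
    """General case: component-wise sum of 4-vectors,
    four additions per vector."""
    total = [0, 0, 0, 0]
    for v in vectors:
        for i in range(4):
            total[i] += v[i]
    return total

def sum_single_component(vectors, i):
    """Simplified case: a scan of the data showed every vector is
    nonzero only in component i, so sum just that component -
    one addition per vector, like plain integers."""
    return sum(v[i] for v in vectors)

vs = [[0, 2, 0, 0], [0, 5, 0, 0], [0, 1, 0, 0]]
print(sum_vectors(vs))             # [0, 8, 0, 0]
print(sum_single_component(vs, 1)) # 8
```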
     
    • Like Like x 1
