PIC Benchmarking

Overclocked · Sep 23, 2014

For an application I have in mind, I need fast processing. I want to be able to sample and process two ADC inputs within a time frame of 1ms, but before I do that, I want to time how fast my microcontroller can do a simple multiplication. I came up with the following:

Code:

Device = 18F2410
Clock = 8

Include "IntOSC.bas"


Dim A As Word
Dim B As Word
Dim C As LongWord

A = 2048
B = 1024

While True

  Low (PORTB.7)  //Make PortB.7 High
  C= A*B

  High (PORTB.7)
   
Wend

I use my oscilloscope to measure how long the pin stays low. I get around 276uS. Is there another way to do this? It seems rather slow to me, but I am running at an 8MHz clock. Any advice would be appreciated. This might be the time to look into those 32-bit PICs I have.

NorthGuy · Sep 23, 2014

The PIC18F datasheet provides an assembler code snippet for 16x16 multiplication which runs in 28 cycles for unsigned or 40 cycles for signed words. At an 8MHz clock one instruction cycle takes four clock periods (0.5us), so 40 cycles is 20us. Anything above this is the overhead of the language that you use.
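The cycle arithmetic above can be checked with a small helper. This is only a sketch that runs on a PC; the function name `cycles_to_us` is made up for illustration, and it assumes the usual PIC16/PIC18 rule of four oscillator clocks per instruction cycle.

```c
#include <stdint.h>

/* Convert an instruction-cycle count to microseconds.
   Assumption: one instruction cycle = 4 oscillator clocks,
   so the instruction rate is Fosc / 4. */
double cycles_to_us(uint32_t cycles, double fosc_hz)
{
    double instr_rate_hz = fosc_hz / 4.0;
    return (double)cycles * 1e6 / instr_rate_hz;
}
```

With this, `cycles_to_us(40, 8e6)` gives the 20us quoted for the signed multiply, and `cycles_to_us(28, 8e6)` gives 14us for the unsigned one.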

jjw · Sep 24, 2014

I tested this with Oshonsoft Basic and a PIC16F88.
It takes 247us with an 8MHz clock.
PIC18Fxxxx parts have an 8x8 hardware multiplier, which your compiler evidently does not use.
Your test values are not good, because a clever compiler can multiply by 1024 using shift operations.
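To illustrate the last point: multiplying by 1024 is the same as shifting left by 10 bits, so a compiler is free to skip the multiply routine entirely. A minimal sketch (function names are hypothetical, and whether a given compiler actually makes this substitution is not something this thread confirms):

```c
#include <stdint.h>

/* What the source code asks for: a real multiplication. */
uint32_t times_1024_mul(uint16_t a)
{
    return (uint32_t)a * 1024u;
}

/* What an optimizer may emit instead: a 10-bit left shift. */
uint32_t times_1024_shift(uint16_t a)
{
    return (uint32_t)a << 10;
}
```

Both return the same value for every 16-bit input, which is why operands that are not powers of two make a fairer multiply benchmark.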

Pommie · Sep 24, 2014

Most of the newer (and some older) 18F can run at 32MHz internal or 40MHz with a crystal. With hardware multiply they should be fast enough.

BTW, what compiler is that?

Mike.

Overclocked · Sep 24, 2014

Pommie, it's Swordfish SE. I use it for all my 8-bit stuff (with the exception of the 12F series). It supports almost all of the 18F parts.

https://www.sfcompiler.co.uk/swordfish/

I did a little more testing. This time I turned on the PLL and increased the clock to 32MHz. Using the same numbers, I got it down to 67uS processing time. Then I added in the UART code (which had been commented out), and at the given baud rate it was slowing the PIC way down, so I upped the baud rate to the max (115200), and my processing time is now around 367uS.

Anyone want to give this a shot in other languages? I'm going to try Great Cow BASIC next, then give XC8 a go (the math part shouldn't be too hard).

NorthGuy · Sep 24, 2014

Overclocked said:
Give XC8 a go (the math part shouldn't be too hard).

XC8 should give you theoretical 40 cycles/multiply or better.

Jon Wilder · Sep 24, 2014

I just coded up a 8x8 multiply w/16-bit result in asm on a PIC16F887 with a 16MHz xtal. Using timer 1 as the timer, I ran it in MPLAB SIM. Total time from start to finish was 11.25uS.

It's definitely the overhead of the language you're using...code optimization and what not. The PIC16F does not have the 8x8 multiplier hardware and even without it, it was able to multiply within 11.25uS. Double this time for an 8MHz clock (22.50uS).
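For readers wondering what a software 8x8 multiply looks like without the hardware multiplier, here is a shift-and-add sketch in C. It is not the asm that was timed above (that code was not posted); it only shows the general technique, and the function name is made up.

```c
#include <stdint.h>

/* Shift-and-add multiply: for each set bit in b, add the
   correspondingly shifted copy of a into the 16-bit product. */
uint16_t mul8x8_shift_add(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend = a;          /* a shifted left by the current bit index */

    for (int bit = 0; bit < 8; bit++) {
        if (b & 1u) {
            product = (uint16_t)(product + addend);
        }
        addend = (uint16_t)(addend << 1);
        b >>= 1;
    }
    return product;
}
```

Eight iterations of test, add and shift is in the same ballpark as the 45 instruction cycles that 11.25uS at 16MHz works out to, though the exact count depends on how the asm was written.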

tumbleweed · Sep 24, 2014

That Swordfish code is doing a full 16x16 multiply to produce the 32-bit result, but it is using a software algorithm to do it.

For a Swordfish implementation that uses the hardware multiplier, check out https://sfcompiler.co.uk/phpBB3/viewtopic.php?f=3&t=1884

The function shown there does the multiply in 8us @ 32MHz, including the function call/return overhead
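The idea behind using an 8x8 hardware multiplier for a 16x16 multiply is to split each operand into bytes and combine four 8x8 partial products. The C sketch below shows that decomposition only; it is not the Swordfish function from the linked post, and the names are made up for illustration.

```c
#include <stdint.h>

/* 16x16 -> 32-bit unsigned multiply built from four 8x8 products,
   the same decomposition an 8x8 hardware multiplier would be used for. */
uint32_t mul16x16_from_8x8(uint16_t a, uint16_t b)
{
    uint8_t al = (uint8_t)(a & 0xFFu);
    uint8_t ah = (uint8_t)(a >> 8);
    uint8_t bl = (uint8_t)(b & 0xFFu);
    uint8_t bh = (uint8_t)(b >> 8);

    uint32_t ll = (uint32_t)al * bl;   /* low  * low,  weight 2^0  */
    uint32_t lh = (uint32_t)al * bh;   /* low  * high, weight 2^8  */
    uint32_t hl = (uint32_t)ah * bl;   /* high * low,  weight 2^8  */
    uint32_t hh = (uint32_t)ah * bh;   /* high * high, weight 2^16 */

    return ll + ((lh + hl) << 8) + (hh << 16);
}
```

On the PIC18 each of the four products is a single hardware multiply, so most of the time goes into adding the partial results together.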

Mike - K8LH · Sep 24, 2014

Jon Wilder said:
I just coded up a 8x8 multiply w/16-bit result in asm on a PIC16F887 with a 16MHz xtal. Using timer 1 as the timer, I ran it in MPLAB SIM. Total time from start to finish was 11.25uS.

It's definitely the overhead of the language you're using...code optimization and what not. The PIC16F does not have the 8x8 multiplier hardware and even without it, it was able to multiply within 11.25uS. Double this time for an 8MHz clock (22.50uS).

please show the code? Oops! Never mind. I didn't notice the "8x8"...

misterT · Sep 24, 2014

Many years ago I tried to benchmark C# and the .NET framework. I found out how to get precise system timing; I think it used some low-level media clock from a Windows media DLL. Anyway, I wrote a program that calculated the sin function thousands of times in a row. I thought I had a good setup there.

The result was just about zero, because the code did not use the result of the sin function anywhere, and the compiler just optimized the whole loop away.

When you benchmark, you need to know what you are benchmarking: the compiler or the processor (and other hardware). There are many traps; it is not easy.

The first idea you had is OK, but it is not enough. In C there is a keyword "volatile" that prevents the compiler from optimizing away accesses to a variable, but then again you need to know what you are doing.
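A minimal sketch of the "volatile" idea, assuming a plain C compiler: the operands are read from volatile variables, so the compiler cannot treat them as constants, and the result is written to a volatile variable, so the multiply cannot be thrown away as unused. All names here are hypothetical.

```c
#include <stdint.h>

/* volatile forces every read and write of these variables to
   actually happen, so the multiply below cannot be folded to a
   constant or removed as dead code. */
volatile uint16_t in_a = 2048;
volatile uint16_t in_b = 1024;
volatile uint32_t result;

void multiply_once(void)
{
    uint16_t a = in_a;             /* real read, not a known constant */
    uint16_t b = in_b;
    result = (uint32_t)a * b;      /* real write, so the result is "used" */
}
```

On the PIC, the pin toggles would go immediately before and after the multiply line, so that only the multiply is inside the measured window.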

EDIT: I would like to know how long your test signal is "High" in your original test described in post #1

EDIT2:
Does this mean that the signal goes high or low? (referring to your first post):
Low (PORTB.7) //Make PortB.7 High

Overclocked · Sep 24, 2014

tumbleweed said:
That Swordfish code is doing a full 16x16 multiply to produce the 32-bit result, but it is using a software algorithm to do it.

For a Swordfish implementation that uses the hardware multiplier, check out https://sfcompiler.co.uk/phpBB3/viewtopic.php?f=3&t=1884

The function shown there does the multiply in 8us @ 32MHz, including the function call/return overhead

I've seen that in the datasheet! I also looked (briefly) at the ASM put out by Swordfish. It's a little funky looking and looks nothing like that.

misterT said:
Many years ago I tried to benchmark C# and the .NET framework. I found out how to get precise system timing; I think it used some low-level media clock from a Windows media DLL. Anyway, I wrote a program that calculated the sin function thousands of times in a row. I thought I had a good setup there.

The result was just about zero, because the code did not use the result of the sin function anywhere, and the compiler just optimized the whole loop away.

When you benchmark, you need to know what you are benchmarking: the compiler or the processor (and other hardware). There are many traps; it is not easy.

The first idea you had is OK, but it is not enough. In C there is a keyword "volatile" that prevents the compiler from optimizing away accesses to a variable, but then again you need to know what you are doing.

EDIT: I would like to know how long your test signal is "High" in your original test described in post #1

EDIT2:
Does this mean that the signal goes high or low? (referring to your first post):
Low (PORTB.7) //Make PortB.7 High

Yes, it goes low to high. I had it go high to low and saw no difference. The high signal is 2uS, which makes sense if the instruction cycle is 1uS. To answer your other questions, I am benchmarking the hardware when running software. I may offload just the analog and transmission part to a smaller 8-bit processor, while leaving the heavy-duty stuff (math, display) to a 32-bit processor, or use two 32-bit processors.

Pommie · Sep 24, 2014

Overclocked said:
I did a little more testing. This time I turned on the PLL and increased the clock to 32MHz. Using the same numbers, I got it down to 67uS processing time. Then I added in the UART code (which had been commented out), and at the given baud rate it was slowing the PIC way down, so I upped the baud rate to the max (115200), and my processing time is now around 367uS.

If you include the transmit time then even with a processor running at 100GHz the result will be the same. You could add a fifo buffer for the RS232 and that will speed it up. A description of exactly what you are trying to achieve would help a great deal. Note, that means your goal, not your solution.
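A FIFO of the kind suggested above is usually a small ring buffer: the main loop drops bytes into it and returns at once, and the UART transmit interrupt pulls them out as the hardware becomes ready. The sketch below shows only the buffer logic in portable C; the names and the 16-byte size are assumptions, and the interrupt hookup is left out.

```c
#include <stdint.h>
#include <stdbool.h>

#define TX_FIFO_SIZE 16u   /* hypothetical size; must be a power of two */

typedef struct {
    uint8_t data[TX_FIFO_SIZE];
    uint8_t head;          /* index of the next free slot (writer side) */
    uint8_t tail;          /* index of the oldest byte (reader side)    */
} tx_fifo_t;

/* Queue one byte; returns false if the buffer is full. */
bool tx_fifo_put(tx_fifo_t *f, uint8_t byte)
{
    uint8_t next = (uint8_t)((f->head + 1u) & (TX_FIFO_SIZE - 1u));
    if (next == f->tail) {
        return false;      /* full: one slot is kept empty on purpose */
    }
    f->data[f->head] = byte;
    f->head = next;
    return true;
}

/* Fetch the oldest byte; returns false if the buffer is empty. */
bool tx_fifo_get(tx_fifo_t *f, uint8_t *byte)
{
    if (f->head == f->tail) {
        return false;      /* empty */
    }
    *byte = f->data[f->tail];
    f->tail = (uint8_t)((f->tail + 1u) & (TX_FIFO_SIZE - 1u));
    return true;
}
```

In a real build `tx_fifo_get` would be called from the transmit interrupt, and `head` and `tail` would need to be declared volatile since they are shared between the main loop and the interrupt.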

Edit, why is this post all underlined and blue?

Mike.

Overclocked · Sep 25, 2014

I'm in the middle of designing a power monitor for my house (just in the "Is it possible?" phase). I've got most of the hardware figured out (I'll be using clamp-type current transformers for measuring current, and an RMS to DC converter to make things easier). I would like to know (in real time) how much power I'm using, rather than waiting a month to get my bill. I figured that sampling a 60Hz wave every 1mS would give enough resolution. I'm following a Microchip application note (which I can't seem to find now).

On the other hand, I have already taken measures to lower my usage, i.e. LED/CFL lighting, plus I'm not home most of the day anyway.

NorthGuy · Sep 25, 2014

Overclocked said:
RMS to DC converter to make things easier.

You cannot get accurate power readings if you use an RMS to DC converter for voltage and current and then multiply the two. This is because of the power factor, which may be seriously different from one, especially if you have lots of LED/CFL lighting.

To get a correct power estimate you need to multiply the voltage and current measurements pairwise and then sum the results (dividing by the number of samples gives the average power). This doesn't look like a huge task for a PIC18F. Or, you can get a power-analyzing IC, which will do the whole job for you. You only need to supply the sensors and read the results digitally.
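The difference between the two approaches can be sketched in C. The function below computes the real power as the mean of the pairwise products, and also the mean squares of voltage and current, whose square roots are the RMS values an RMS to DC converter would report. The type and function names are made up, and doubles are used only to keep the sketch short; a PIC18F would more likely use integer accumulators.

```c
#include <stddef.h>

typedef struct {
    double real_power;   /* mean of v[k] * i[k]                    */
    double v_mean_sq;    /* mean of v[k]^2, i.e. Vrms squared      */
    double i_mean_sq;    /* mean of i[k]^2, i.e. Irms squared      */
} power_stats_t;

/* Accumulate over n simultaneous voltage/current sample pairs. */
power_stats_t power_stats(const double *v, const double *i, size_t n)
{
    power_stats_t s = {0.0, 0.0, 0.0};
    if (n == 0) {
        return s;
    }
    for (size_t k = 0; k < n; k++) {
        s.real_power += v[k] * i[k];
        s.v_mean_sq  += v[k] * v[k];
        s.i_mean_sq  += i[k] * i[k];
    }
    s.real_power /= (double)n;
    s.v_mean_sq  /= (double)n;
    s.i_mean_sq  /= (double)n;
    return s;
}
```

Vrms times Irms (the square root of `v_mean_sq * i_mean_sq`) is the apparent power; the real power equals it only when the current is in phase with the voltage. For a current a quarter cycle out of phase, the pairwise products cancel and the real power is zero, even though both RMS readings are unchanged.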

An even easier solution is to just look at your electric meter. It'll display consumption in real time.

Overclocked · Sep 25, 2014

Yes, I agree the "easy" solution would be to look at my meter, but where's the fun in that! (Also, the meter is outside and it's one of the old-style meters.) I got the idea from a few places: http://openenergymonitor.org/emon/ and http://www.billporter.info/2010/12/19/not-so-tiny-power-meter/

I know I chose the RMS to DC IC for a reason, and for the life of me I can't remember my thinking. I really must get a notebook for these types of things. Anyway, there was a reason for *not* using a dedicated IC, and it was size (QFN package, etc.), but I do have a PID-controlled hotplate that should be able to solder it. I will look into this!

Still, it's fun to figure out all the nooks and crannies of a monitoring system. It's proving to be a good refresher course with XC8.

Overclocked · Sep 25, 2014

My results with XC8 are pretty impressive: 27uS processing time for the same code! I also don't think the PLL works correctly. I'm outputting the clock on RA6 for testing purposes, and I measure 2MHz. 267uS Vs 27uS is a lot of Overhead!

Here is the code I used (fixed: added a while loop)

Code:

/*
* File:  Math Test.c
* Author: chris
*
* Created on September 24, 2014, 6:00 PM
*/
#include <pic18f2410.h>
#define _XTAL_FREQ 32000000
#include <stdio.h>
#include <stdlib.h>
#include <xc.h>
#include <htc.h>
#include <math.h>

#pragma config OSC = INTIO7
#pragma config FCMEN = OFF  // Fail-Safe Clock Monitor Enable bit (Fail-Safe Clock Monitor disabled)
#pragma config IESO = ON
#pragma config WDT = OFF  // Watchdog Timer Enable bit (WDT disabled (control is placed on the SWDTEN bit))
#pragma config MCLRE = OFF

/*
*
*/

unsigned short A;
unsigned short B;
unsigned long C;

void SetupClock (void) //set up clock
{
  OSCCON = 0b01110010;
  OSCTUNE = 0b0100000;

}

void main ()
{
  SetupClock();
  TRISB = 0b00000000;

  PORTBbits.RB7 = 0;
  while (1)
  {
    A = 2048;
    B = 2048;
    C = A*B;

    PORTBbits.RB7 = 1;
  }

}
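One thing worth checking in the listing above: the OSCTUNE literal `0b0100000` has only seven digits, so it equals 0x20 (bit 5), not 0x40 (bit 6). Assuming PLLEN is bit 6 of OSCTUNE on this part (that bit position is an assumption here and should be confirmed against the PIC18F2410 datasheet), the PLL would never be enabled, which would fit the 2MHz seen on RA6 if that pin outputs Fosc/4 of an 8MHz clock. A small host-side check, with made-up names:

```c
#include <stdint.h>

/* Assumption: PLLEN is OSCTUNE<6>. Verify against the datasheet. */
#define OSCTUNE_PLLEN_BIT 6u

/* Returns 1 if the given OSCTUNE value has the PLLEN bit set. */
int pllen_is_set(uint8_t osctune_value)
{
    return (int)((osctune_value >> OSCTUNE_PLLEN_BIT) & 1u);
}
```

Here `pllen_is_set(0x20)` (the value of the seven-digit literal) returns 0, while `pllen_is_set(0x40)` (what `0b01000000` would give) returns 1.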

NorthGuy · Sep 25, 2014

In C, the result of the expression has the same type as the (promoted) operands.

With XC8, where int is 16 bits, C = A*B will produce a 16-bit result: zero, because 2048*2048 does not fit in 16 bits.

The optimizing compiler may see that you use fixed numbers and replace the multiplication with C = 0; or, since C is not used anywhere, it may remove the whole computation.

It is safer to use something like this

C:

A = something;
B = something;
 
while (1) {
 
  A++; // make sure compiler doesn't use constant value
  B++;
 
  PORTBbits.RB7 = 0; // start measurement
  C = (unsigned long)A*B; // make sure you get a 32-bit result (unsigned, so large products cannot overflow)
  PORTBbits.RB7 = 1; // end measurement
 
  if (C > 1000) PORTBbits.RB7 = 1; // make sure C is used somewhere
 
}
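The truncation can be reproduced on a PC, where int is normally 32 bits, by forcing the product back to 16 bits by hand. This is a sketch that emulates a 16-bit-int target rather than XC8 itself, and the function names are made up.

```c
#include <stdint.h>

/* Emulates a target with 16-bit int: the product is computed and
   then cut down to its low 16 bits before being stored. */
uint32_t mul_as_16bit(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * b);
}

/* Widening one operand first keeps the full 32-bit product. */
uint32_t mul_widened(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}
```

For 2048 x 2048 the first function returns 0 and the second returns 4194304, which is the difference between the two versions of the benchmark.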

tumbleweed · Sep 26, 2014

267uS Vs 27uS is a lot of Overhead!

It's not "overhead", it's the fact that Swordfish is doing the 16-bit multiply in software, and isn't using the hardware multiplier.
Unlike C, it'll produce a correct 32-bit result when you multiply two 16-bit numbers.

Did you check out the link I posted in #8? It's considerably faster, and you don't have to switch tools.

Overclocked · Sep 26, 2014

NorthGuy said:
In C, the result of the expression has the same type as the (promoted) operands.

With XC8, where int is 16 bits, C = A*B will produce a 16-bit result: zero, because 2048*2048 does not fit in 16 bits.

The optimizing compiler may see that you use fixed numbers and replace the multiplication with C = 0; or, since C is not used anywhere, it may remove the whole computation.

It is safer to use something like this

Oddly, the compiler wouldn't let me use that code without keeping C defined as an unsigned long; it marked it "wrong". Since I added that in, the time is now around 200uS.

C:

unsigned short A;
unsigned short B;
unsigned long  C;
void SetupClock (void) //set up clock
{
  OSCCON = 0b01110010;
  OSCTUNE = 0b0100000;
 
}

void main ()
{
  SetupClock();
  TRISB = 0b00000000;

  while (1) {
    PORTBbits.RB7 = 0;

    A++;
    B++;
    C = (unsigned long)A*B; // unsigned cast so large products cannot overflow a signed long

    PORTBbits.RB7 = 1;
    if (C > 1000) {
      PORTBbits.RB7 = 1; // make sure C is used somewhere
    }
  }

}

tumbleweed said:
It's not "overhead", it's the fact that Swordfish is doing the 16-bit multiply in software, and isn't using the hardware multiplier.
Unlike C, it'll produce a correct 32-bit result when you multiply two 16-bit numbers.

Did you check out the link I posted in #8? It's considerably faster, and you don't have to switch tools.

Yep, just did, and it works great: 8.65uS! Now to make it into a library.

NorthGuy · Sep 26, 2014

You need to define all the variables in C, or it won't compile. You also included A++/B++ in the measurement in addition to the multiplication, but I don't think it makes a lot of difference.

Looks like the XC8 compiler is not very efficient at all.

They have a $1000 optimized version, but I'm not sure it would fare better.
