The Teensy 4.1 is the latest version of the remarkably popular development platform. It features an ARM Cortex-M7 processor running at 600 MHz on an NXP iMXRT1062 chip, four times as much flash memory as the Teensy 4.0, and two new locations for optionally adding more memory. The Teensy 4.1 has the same size and shape as the Teensy 3.6 (2.4" x 0.7") and provides more I/O capabilities, including an Ethernet PHY, an SD card socket, and a USB host port.
When operating at 600 MHz, the Teensy 4.1 consumes about 100 mA of current, and it supports dynamic clock scaling. On traditional microcontrollers, changing the clock speed causes wrong baud rates and other problems. The Teensy 4.1 hardware and Teensyduino's software support for Arduino timing functions are instead designed to allow the speed to change dynamically. Serial baud rates, audio streaming sample rates, Arduino functions like delay() and millis(), and Teensyduino extensions like IntervalTimer and elapsedMillis all continue to work properly while the CPU changes speed. The Teensy 4.1 also provides a power shut-off feature. With a pushbutton connected to the On/Off pin, the 3.3 V power supply can be completely disabled by holding the button for five seconds and turned back on with a brief press. If a coin cell is connected to VBAT, the Teensy 4.1's RTC continues to keep track of the date and time while the power is off. The Teensy 4.1 can also be overclocked well beyond 600 MHz!
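To see why a clock change breaks serial timing on a conventional microcontroller, consider a minimal sketch of a generic UART whose baud rate is derived from the clock through a fixed divisor. Everything here is an assumption for illustration: the 16x oversampling model, the names baud_divisor and actual_baud, and the idea that the UART is fed directly from the CPU clock. It does not describe the iMXRT1062's actual serial hardware or how Teensyduino compensates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified UART model: baud = clock / (16 * divisor).
// This is NOT the iMXRT1062 register layout; it only illustrates the problem.
constexpr std::uint32_t kOversample = 16;

// Divisor (rounded to nearest) that yields the requested baud rate at clock_hz.
std::uint32_t baud_divisor(std::uint32_t clock_hz, std::uint32_t baud) {
    assert(baud > 0);
    const std::uint32_t step = kOversample * baud;
    return (clock_hz + step / 2) / step;
}

// Baud rate actually produced when a given divisor is used at clock_hz.
double actual_baud(std::uint32_t clock_hz, std::uint32_t divisor) {
    assert(divisor > 0);
    return static_cast<double>(clock_hz) / (kOversample * divisor);
}
```

In this model, baud_divisor(600000000, 115200) returns 326. If the clock then drops to 150 MHz and the divisor is left at 326, actual_baud(150000000, 326) gives roughly 28,758 baud, a quarter of the intended rate. Recomputing the divisor for the new clock (81) brings the rate back to within about 0.5% of 115,200.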
The ARM Cortex-M7 brings many powerful CPU features to a true real-time microcontroller platform. The Cortex-M7 is a dual-issue superscalar processor, meaning the M7 can execute two instructions per clock cycle, at 600 MHz! Of course, executing two at once depends on how the compiler orders instructions and allocates registers. Initial benchmarks have shown that C++ code compiled by Arduino tends to achieve two instructions about 40% to 50% of the time while performing computationally intensive work using integers and pointers. The Cortex-M7 is the first ARM microcontroller to use branch prediction. On the M4, loops and other code that must branch take three clock cycles per branch. On the M7, after a loop has executed a few times, branch prediction removes this overhead, allowing the branch instruction to run in just a single clock cycle.
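The figures above can be turned into a rough back-of-the-envelope estimate. The sketch below is an idealized model, not a benchmark. It assumes that "two instructions 40% to 50% of the time" means that fraction of cycles issues two instructions while the rest issue one, and it ignores stalls, cache misses, and the warm-up iterations before the branch predictor takes effect. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Average instructions per cycle if a fraction `dual_fraction` of cycles
// issues two instructions and the remaining cycles issue one (idealized).
double effective_ipc(double dual_fraction) {
    assert(dual_fraction >= 0.0 && dual_fraction <= 1.0);
    return 1.0 + dual_fraction;
}

// Total cycles spent on the loop-closing branch over `iterations` iterations,
// given a fixed cost per branch in cycles (3 on the M4, 1 on the M7 once
// prediction has warmed up, per the text above).
std::uint64_t branch_cycles(std::uint64_t iterations,
                            std::uint64_t cycles_per_branch) {
    return iterations * cycles_per_branch;
}
```

Under these assumptions, a 40% to 50% dual-issue rate corresponds to about 1.4 to 1.5 instructions per cycle. For a loop of one million iterations, the branch alone costs 3,000,000 cycles at three cycles per branch and 1,000,000 cycles at one cycle per branch, a saving of 2,000,000 cycles.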