Computer Architecture and Organization 2556/2: March 2014

Pipeline and CPU

Processor

In 1989 Intel announced the 80486, a 32-bit CPU, and with it introduced what is called the "pipeline".

A pipeline lets the CPU work on several instructions at the same time: while one instruction is executing, the following ones are already being fetched and decoded, so that one instruction completes in each clock cycle. This mode of operation is called "scalar".

In 1993 Intel launched its fifth-generation CPU, the "Pentium", which has two pipelines built into the CPU. The two work in parallel, independently of each other, so the CPU can execute up to 2 instructions in 1 clock cycle.

This architecture is called "superscalar".
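
To put rough numbers on the difference between no pipeline, a scalar pipeline and a superscalar design, here is a minimal sketch. It assumes an ideal pipeline with no stalls and a depth of five stages chosen for illustration; the function name `ideal_cycles` is hypothetical.

```python
import math

def ideal_cycles(n_instructions: int, n_stages: int, issue_width: int = 1) -> int:
    """Cycles to finish n instructions on an ideal pipeline with no stalls.

    The first group of instructions needs n_stages cycles to travel through
    the pipeline; after that, one group of `issue_width` instructions
    completes in every cycle.
    """
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / issue_width)
    return n_stages + groups - 1

# Assumption: 100 instructions and 5 stages, purely for illustration.
unpipelined = 100 * 5                               # each instruction uses all 5 stages alone
scalar = ideal_cycles(100, 5)                       # one pipeline
superscalar = ideal_cycles(100, 5, issue_width=2)   # two pipelines side by side
print(unpipelined, scalar, superscalar)             # 500 104 54
```

Real processors fall short of these figures because of the hazards that stall a pipeline.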

Components of the CPU

Pipeline

A pipeline is a way of overlapping work: the CPU is divided into smaller units, and each unit is responsible for one part of the job.

Pipelining is most closely associated with RISC architectures and was later adopted in CISC architectures as well.

The pipeline is divided into the following main stages.

Stages of the pipeline

Instruction Fetch stage: receives new instructions, either from main memory or from the instruction cache.

Instruction Decode stage: identifies and decodes the CISC instructions.

Get Operands stage: brings in and holds the data that will be used in execution.

Execute stage: carries out the operation specified by the instruction on the operands it has received.

Write Result stage: once execution has finished, the result is stored in a register or in the cache.
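
The overlap between the five stages can be sketched as a timing table. This is only an illustration of an ideal pipeline with no stalls; the stage abbreviations and the function name `pipeline_table` are assumptions made for this example.

```python
# Abbreviations (assumed): fetch, decode, get operands, execute, write result.
STAGES = ["IF", "ID", "OF", "EX", "WR"]

def pipeline_table(n_instructions: int) -> list[list[str]]:
    """Return one row per instruction; each row names the stage occupied in every cycle."""
    total_cycles = len(STAGES) + n_instructions - 1
    table = []
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i  # instruction i enters the pipeline at cycle i
            row.append(STAGES[stage_index] if 0 <= stage_index < len(STAGES) else "--")
        table.append(row)
    return table

for number, row in enumerate(pipeline_table(4), start=1):
    print(f"I{number}: " + " ".join(row))
```

Each row is shifted one cycle to the right of the row above it, so from the fifth cycle onward one instruction finishes in every cycle.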

Figure: how the pipeline works

CPU

How the CPU works

Before the CPU can process anything, both the instructions and the data must first be loaded into main memory.

How the CPU processes instructions

Once the instructions and data are in memory, the CPU processes them one instruction at a time, in the following 4 steps.

Figure: the steps of CPU operation

• A program is a group of instructions that the computer is to process. Each instruction consists of an operation code, or opcode, such as ADD (addition), SUB (subtraction), MUL (multiplication) or DIV (division), and an operand, which gives the location of the data in memory, such as the symbol A or B.

• An example of one instruction in an assembly language program is ADD A,B, which means: take the data stored in memory at location A and the data stored in memory at location B and add them together. Such an instruction must always be translated into machine language before the CPU can execute it.
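
As a rough sketch of what translating ADD A,B into machine language means, the snippet below packs the opcode and the two operand addresses into one 16-bit word. The encoding (4-bit opcode, two 6-bit addresses), the opcode numbers and the addresses assigned to A and B are all invented for this example and do not correspond to any real instruction set.

```python
OPCODES = {"ADD": 0b0001, "SUB": 0b0010, "MUL": 0b0011, "DIV": 0b0100}  # hypothetical codes
SYMBOLS = {"A": 10, "B": 11}  # hypothetical memory addresses for the symbols A and B

def assemble(line: str) -> int:
    """Translate e.g. 'ADD A,B' into a 16-bit word: 4-bit opcode, two 6-bit addresses."""
    mnemonic, operands = line.split(maxsplit=1)
    first, second = (name.strip() for name in operands.split(","))
    return (OPCODES[mnemonic] << 12) | (SYMBOLS[first] << 6) | SYMBOLS[second]

word = assemble("ADD A,B")
print(f"{word:016b}")  # 0001001010001011
```

The leading 0001 is the opcode for ADD, followed by 001010 (address 10 for A) and 001011 (address 11 for B).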

The steps of CPU operation and how they use registers

Processing steps of the CPU

• Fetch: the control unit (CU) retrieves the instruction to be processed from memory and places it in a register.

• Decode: the instruction is decoded, that is, its meaning is interpreted, so that it can be passed on to the arithmetic logic unit (ALU).

• Execute: the ALU carries out the instruction. Instructions are processed one at a time.

• Store: the result of the processing is stored in memory or in a register.
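
The four steps above can be sketched as a loop over a tiny program. This is a toy model, not a real CPU: instructions are Python tuples rather than machine words, and the convention that the result overwrites the first operand is an assumption of this example.

```python
import operator

# The "ALU": one Python function per opcode (DIV is integer division here).
OPS = {"ADD": operator.add, "SUB": operator.sub,
       "MUL": operator.mul, "DIV": operator.floordiv}

def run(program, memory):
    """Run each (opcode, dest, src) instruction through fetch, decode, execute, store."""
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]                         # fetch
        pc += 1
        opcode, dest, src = instruction                   # decode
        result = OPS[opcode](memory[dest], memory[src])   # execute
        memory[dest] = result                             # store
    return memory

memory = run([("ADD", "A", "B"), ("MUL", "A", "B")], {"A": 2, "B": 3})
print(memory)  # {'A': 15, 'B': 3}
```

Here ADD A,B first sets A to 2 + 3 = 5, and MUL A,B then sets A to 5 * 3 = 15.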

The CPU work cycle, or machine cycle

Machine cycle and the processing of program instructions

One machine-level (machine language) program instruction is processed during one machine cycle:

• Instruction cycle (I-cycle)
  - fetch instruction: the control unit fetches the instruction from RAM
  - decode instruction: the control unit decodes the program instruction, places its operation part in the Instruction Register and its address part in the Address Register

Instruction time

The total time to process each instruction is made up of 2 parts:

• interpreting the instruction (fetch and decode) and carrying it out (execute and store)

• the time spent interpreting the instruction is called instruction time (I-time)

• the time spent carrying it out is called execution time (E-time)

Time to process one instruction (machine cycle)

The combination of I-time and E-time is called the machine cycle.
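
As a small worked example of this sum, assuming made-up timings of 4 ns for I-time and 6 ns for E-time (the function name `machine_cycle_ns` is likewise hypothetical):

```python
def machine_cycle_ns(i_time_ns: float, e_time_ns: float) -> float:
    """Machine cycle = I-time (fetch + decode) + E-time (execute + store)."""
    return i_time_ns + e_time_ns

cycle = machine_cycle_ns(i_time_ns=4.0, e_time_ns=6.0)  # hypothetical timings
print(f"{cycle} ns per instruction, {1e9 / cycle:.0f} instructions per second")
```

With these numbers one machine cycle takes 10 ns, so a CPU that handles instructions strictly one after another would complete 100 million of them per second.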

Units for measuring CPU speed

• Megahertz (MHz): the unit of CPU speed, or clock speed, in microcomputers. 1 MHz is one million clock cycles per second; a machine cycle may span several clock cycles.

• MIPS (Million Instructions Per Second): the unit of CPU speed for mid-sized and larger computers. A CPU rated at 1 MIPS can process one million instructions per second.

• FLOPS (Floating Point Operations Per Second): the unit of CPU speed for supercomputers, which usually measures the ability to perform floating-point arithmetic.
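
The three units are related through the clock rate. The sketch below uses the common textbook relations MIPS = clock rate in MHz / average cycles per instruction, and peak FLOPS = clock rate × floating-point operations per cycle × cores. The cycles-per-instruction and operations-per-cycle values are assumptions picked for illustration, not measurements of any real chip.

```python
def mips(clock_mhz: float, cycles_per_instruction: float) -> float:
    """MIPS = clock rate in MHz / average clock cycles per instruction."""
    return clock_mhz / cycles_per_instruction

def peak_flops(clock_hz: float, flops_per_cycle: int, cores: int = 1) -> float:
    """Theoretical peak FLOPS = clock rate * floating-point ops per cycle * cores."""
    return clock_hz * flops_per_cycle * cores

# Assumption: a 600 MHz CPU averaging 1.5 cycles per instruction.
print(mips(600, 1.5))             # 400.0
# Assumption: a 2.3 GHz dual-core CPU doing 8 floating-point ops per cycle per core.
print(peak_flops(2.3e9, 8, 2))    # 36800000000.0
```

Both results are upper bounds; real programs rarely sustain them.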

CPU processing modes

1. Single processing, or sequential processing: data is processed in sequence because only one CPU is at work. The drawback is that processing is slow.

2. Parallel processing: more than 1 CPU (multiple processors) works on a single job at the same time. The problem is broken down into smaller parts, and each CPU processes one part. Single-CPU and multi-CPU processing can be compared as shown in the figure.
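
The two modes can be contrasted with a small sketch that sums a list either in one pass or by breaking it into chunks handed to several workers. The function names are hypothetical. Threads are used only to keep the example short; in CPython they do not run CPU-bound work on several cores at once, so spreading the work across real CPUs would need something like `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Break the problem down into at most `parts` roughly equal chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def sequential_sum(data):
    """Single processing: one worker handles everything in order."""
    return sum(data)

def parallel_sum(data, workers=4):
    """Parallel processing: one chunk per worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, split(data, workers)))

data = list(range(1000))
print(sequential_sum(data), parallel_sum(data))  # 499500 499500
```

Both calls return the same total; only the way the work is divided differs.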

Reference: http://bell-jung1101.blogspot.com/2011/07/pipeline-cpu.html

Acer Aspire 4755G
Processor / Chipset
- CPU: Intel Core i5 (2nd Gen) i5-2410M / 2.3 GHz
- Max Turbo Speed: 2.9 GHz
- Number of Cores: Dual-Core
- Cache: L3 cache - 3.0 MB
- 64-bit Computing: Yes
- Chipset: Mobile Intel HM65 Express
- Features: Intel Turbo Boost Technology 2.0, Hyper-Threading Technology

Intel Core i5 680 and Core i7 870s CPUs in the pipeline
Rumors have it that Intel is planning on extending its Core i5 and Core i7 CPU lines. According to Digitimes, two new CPUs are on the way. The first is the Core i5 680, a 3.6GHz dual-core part with 256KB of L2 cache per core and 4MB of L3 cache. The CPU is based on Intel's 32nm Clarkdale architecture and features a built-in 45nm integrated graphics processor. This chip will become Intel's fastest dual-core part and is expected to be priced (for a tray of 1,000) at $284. Then there's the Core i7 870s, which is like the i7 860 only clocked at a lower speed of 2.67GHz. This speed drop lowers the chip's TDP from 95W to 82W. The price per thousand parts is expected to be $560. These CPUs, along with those from AMD, mean that there are some interesting times ahead of us.

Obstacles to pipelining (pipeline hazards)

A hazard is a side effect of pipelining: an event that would affect the correctness of the system's results. For example, one instruction writes its result to an operand that another instruction has to read, or a BRANCH instruction jumps to another part of the program, so the instructions that follow it cannot be overlapped with it. There are 3 types:

1. Structural hazards: a conflict that arises when the hardware cannot support a combination of instructions executing at the same time, so they contend for the same resource when they EXECUTE.

2. Data hazards: overlapping instructions depend on each other's data, so an instruction cannot EXECUTE until an earlier one has produced its result.

3. Control hazards: a branch changes the flow of control, so the pipeline cannot tell which instructions to fetch next until the branch is resolved.
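
A data hazard of the read-after-write kind can be spotted by checking whether an instruction reads a register that a recent instruction writes. The sketch below is a simplified illustration: the tuple format, the register names and the `window` of two instructions are assumptions, and real processors detect and resolve hazards in hardware.

```python
def raw_hazards(program, window=2):
    """Find read-after-write hazards.

    Each instruction is (opcode, destination, sources). A hazard is reported
    when an instruction reads a register written by one of the previous
    `window` instructions, which may still be in the pipeline.
    """
    hazards = []
    for i, (_op, _dest, sources) in enumerate(program):
        for j in range(max(0, i - window), i):
            producer_dest = program[j][1]
            if producer_dest in sources:
                hazards.append((j, i, producer_dest))
    return hazards

program = [
    ("ADD", "R1", ("R2", "R3")),  # R1 = R2 + R3
    ("SUB", "R4", ("R1", "R5")),  # reads R1 before the ADD has written it back
    ("MUL", "R6", ("R7", "R8")),  # independent of the other two
]
print(raw_hazards(program))  # [(0, 1, 'R1')]
```

The result says that instruction 1 depends on instruction 0 through register R1, so it would have to wait, or receive the value through forwarding, before it can execute.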

Reference

http://www.cnet.com/laptops/acer-aspire-4755g-6457/4507-3121_7-35268223.html
http://ark.intel.com/th/products/52224/Intel-Core-i5-2410M-Processor-3M-Cache-up-to-2_90-GHz
http://www.zdnet.com/blog/hardware/intel-core-i5-680-and-core-i7-870s-cpus-in-the-pipeline/7748

Samsung Galaxy Mini

- Chipset: Qualcomm MSM7227
- CPU: 600 MHz ARMv6
- GPU: Adreno 200
- Architecture: ARMv6
- Bit width: 32
- Cores designed by ARM Holdings: ARM11

ARM11 Implements ARMv6 ISA
Arm Ltd. has pulled out all of the stops with its new ARM11 micro-architecture, which implements the new ARMv6 instruction set. Whether it's SIMD multimedia acceleration, floating point, compact 16-bit Thumb instructions, or even Java, the new micro-architecture does everything. Delivered as IP, ARM11 can include such options as the vector floating-point coprocessor. ARM11 uses a new eight-stage pipeline that supports out-of-order completion. Initial top speed is 750 MHz via a 0.13-µm process. Next-generation 0.10-µm versions should hit 1-GHz speeds.

This micro-architecture continues the ARM family's use of the AMBA bus and retains the low-power operation that ARM is known for. Power consumption is under 0.4 mW/MHz, including cache controllers. Combined with the multimedia instructions, it makes an excellent choice for portable, wireless multimedia devices. The ARM11 memory subsystem improves task switching. It also reduces bus accesses, thereby lowering power requirements.

New load/store exclusive instructions allow more efficient semaphore implementation, making ARM11 well suited for multiprocessor environments. Enhanced exception handling is provided via new vectored interrupt support.

The ARM11 architecture tolerates unaligned data. In addition, a status bit controls big-endian and little-endian operation, enabling the processor to work well with non-ARM processors and DSPs. Overall, the ARM11 offers a significant architectural advance.
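
To illustrate what big-endian and little-endian operation means, the sketch below lays out the same 32-bit value in both byte orders using Python's built-in integer conversions. The value 0x12345678 is just an example and the snippet does not model the ARM11 status bit itself.

```python
value = 0x12345678

little = value.to_bytes(4, byteorder="little")  # least significant byte first
big = value.to_bytes(4, byteorder="big")        # most significant byte first
print(little.hex(), big.hex())                  # 78563412 12345678

# Reading bytes back with the wrong byte order yields a different number.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))                             # 0x78563412
```

A processor that can switch byte order can exchange data with a device using the other convention without swapping bytes in software.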

ARM Processor Architecture
ARM architecture forms the basis for every ARM processor. Over time, the ARM architecture has evolved to include architectural features to meet the growing demand for new functionality, high performance and the needs of new and emerging markets. There are currently two ARMv8 profiles: the ARMv8-A architecture profile for high performance markets such as mobile and enterprise, and the ARMv8-R architecture profile for embedded applications in automotive and industrial control.
The ARM architecture supports implementations across a wide range of performance points, from very small implementations of ARM processors to very efficient implementations of advanced designs using state of the art micro-architecture techniques, establishing it as the leading architecture in many market segments. Implementation size, performance, and low power consumption are key attributes of the ARM architecture.

ARM developed architecture extensions to provide support for Java acceleration (Jazelle®), security (TrustZone®), SIMD, and Advanced SIMD (NEON™) technologies. The ARMv8 architecture adds a Cryptographic extension as an optional feature.

The ARM architecture is similar to a Reduced Instruction Set Computer (RISC) architecture, as it incorporates these typical RISC architecture features:
- A uniform register file load/store architecture, where data processing operates only on register contents, not directly on memory contents.
- Simple addressing modes, with all load/store addresses determined from register contents and instruction fields only.
Enhancements to a basic RISC architecture enable ARM processors to achieve a good balance of high performance, small code size, low power consumption and small silicon area.
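
The load/store idea can be sketched with a toy interpreter in which only LOAD and STORE touch memory and ADD works purely on registers. The instruction format and register names are invented for this illustration and are not ARM syntax.

```python
def run_load_store(program, memory):
    """Interpret a tiny load/store program; only LOAD and STORE access memory."""
    registers = {}
    for op, *args in program:
        if op == "LOAD":       # LOAD rd, addr: register <- memory
            rd, addr = args
            registers[rd] = memory[addr]
        elif op == "ADD":      # ADD rd, rs1, rs2: operates on registers only
            rd, rs1, rs2 = args
            registers[rd] = registers[rs1] + registers[rs2]
        elif op == "STORE":    # STORE rs, addr: memory <- register
            rs, addr = args
            memory[addr] = registers[rs]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory

program = [
    ("LOAD", "R0", "A"),
    ("LOAD", "R1", "B"),
    ("ADD", "R2", "R0", "R1"),
    ("STORE", "R2", "A"),
]
print(run_load_store(program, {"A": 2, "B": 3}))  # {'A': 5, 'B': 3}
```

Adding two memory values takes four instructions here, where a memory-to-memory ADD A,B would take one; in exchange, each instruction is simple and uniform, which suits pipelining.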
Reference
http://en.wikipedia.org/wiki/Samsung_Galaxy_Mini
http://en.wikipedia.org/wiki/ARM_architecture#ARM_cores
http://www.gsmarena.com/samsung_galaxy_mini_s5570-3725.php
http://www.arm.com/products/processors/instruction-set-architectures/index.php
http://electronicdesign.com/dsps/hardware-directory-arm11-implements-armv6-isa

Sunday, 30 March 2014

Pipeline and CPU

Saturday, 22 March 2014

Architecture Computer acer aspire 4755g And Samsung Galaxy Mini
