copy() 与mmap()使用

September 18th, 2014 by JasonLe's Tech Leave a reply »

之前学了Linux内存管理,知道了内存的虚实地址转换,Linux内存映射这一块有很多应用可以极大的提高文件读写速度,其中提高的方式就是使用mmap的方式,而不是read write的方式。

我们要知道read write的方式需要事先分配一个buffer缓冲区。但是缓冲区非常有讲究,如果分配的过小就意味着os频繁的分配释放,效率比较低。如果buffer分配过大,又会很浪费内存。这也就是mmap()的优势,不仅没有浪费内存,而且速度相当的快。

具体的API解释,大家可以看man page查看.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/time.h>
#include <string.h>  

#define BUFFER_SIZE 1  

void copy_map()
{
    int fin,fout;   //文件描述符
    void *start;
    void *end;
    struct stat sb;
    if((fin = open("file.in",O_RDONLY)) < 0){
        perror("open error");
        exit(EXIT_FAILURE);
    }
    if((fout = open( "file.out",O_RDWR | O_CREAT | O_TRUNC,00600)) < 0 ){
        perror( "write error" );
        exit( EXIT_FAILURE );
    }  

    fstat(fin,&sb);  

    //这块必须给fout一个需求大小的偏移,因为mmap没有扩展空间的能力
    if(lseek(fout,sb.st_size-1,SEEK_SET) < 0 ){
        exit(EXIT_FAILURE);
    }
    if(write(fout, &sb,1) != 1 ){
        exit(EXIT_FAILURE);
    }     

    start = mmap(NULL,sb.st_size,PROT_READ,MAP_PRIVATE,fin,0);
    if(start == MAP_FAILED)
        return;  

    end = mmap(0,(size_t)sb.st_size,PROT_WRITE,MAP_SHARED,fout,0);
    if(end == MAP_FAILED){
        return ;
    }  

    memcpy(end,start,(size_t)sb.st_size);  

    munmap(start,sb.st_size); //关闭映射
    munmap(end,sb.st_size);  

    close(fin);
    close(fout);
    return;
}  

void copy_RW()
{
    int fin,fout;
    int bytes_read,bytes_write;
    char buffer[BUFFER_SIZE];
    char *ptr;
    if((fin = open("file.in",O_RDONLY)) < 0){
        perror("open error");
        exit(EXIT_FAILURE);
    }
    if((fout = open( "file.out",O_RDWR | O_CREAT | O_TRUNC,00700)) < 0 ){
        perror( "write error" );
        exit( EXIT_FAILURE );
    }  

    while(bytes_read=read(fin,buffer,BUFFER_SIZE)){
        if((bytes_read==-1)&&(errno!=EINTR))
            break;
        else if(bytes_read>0){
            ptr=buffer;
            while(bytes_write=write(fout,ptr,bytes_read)){
                if((bytes_write==-1)&&(errno!=EINTR))
                    break;
                else if(bytes_write==bytes_read)
                    break;
                else if(bytes_write>0){
                    ptr+=bytes_write;
                    bytes_read-=bytes_write;
                }
            }
            if(bytes_write==-1)
               break;
         }
    }  

    close(fin);
    close(fout);
    return;
}

附mmap()用法

void *mmap(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset);
If addr is NULL, then the kernel chooses the address at which to create the mapping; this is the most portable  method
       of  creating a new mapping.
The  contents  of  a file mapping (as opposed to an anonymous mapping; see MAP_ANONYMOUS below), are initialized using
       length bytes starting at offset offset in the file (or other object) referred to by the file  descriptor  fd.   offset
       must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE).
The flags argument determines whether updates to the mapping are visible to other processes mapping the  same  region,
       and  whether updates are carried through to the underlying file.
The  prot argument describes the desired memory protection of the mapping (and must not conflict with the open mode of
       the file).  It is either PROT_NONE or the bitwise OR of one or more of the following flags:

       PROT_EXEC  Pages may be executed.

       PROT_READ  Pages may be read.

       PROT_WRITE Pages may be written.

       PROT_NONE  Pages may not be accessed.
 The flags argument determines whether updates to the mapping are visible to other processes mapping the  same  region,
       and  whether updates are carried through to the underlying file.  This behavior is determined by including exactly one
       of the following values in flags:
MAP_SHARED Share this mapping.  Updates to the mapping are visible to other processes that map this file, and are car‐
                  ried  through  to  the underlying file.  The file may not actually be updated until msync(2) or munmap() is
                  called.
MAP_PRIVATE
                  Create a private copy-on-write mapping.  Updates to the mapping are not visible to other processes  mapping
                  the  same file, and are not carried through to the underlying file.  It is unspecified whether changes made
                  to the file after the mmap() call are visible in the mapped region.

不过这里要注意:
MAP_SHARED标志指定了进程对内存区域的修改会影响到映射文件。而当对flags指定MAP_PRIVATE时,进程会为该映射内存区域创建一个私有副本,对该内存区的所有操作都是在这个副本上进行的,此时对内存区域的修改并不会影响到映射文件。

在mmap两个文件以及设置目的文件长度时都需要源文件的长度。设置目的文件通过lseek()即可完成,如果没有设置目的文件的长度,那么将会产生总线错误(引发信号SIGBUS)。

但是内存映射方式并不是完美的,它所映射的文件只能是固定大小,因为文件所映射的内存区域大小在mmap时通过len已经指定。

另外,文件映射的内存区域的大小必须以页大小为单位。比如系统页大小为4096字节,假定映射文件的大小为20字节,那么该页剩余的4076字节全部被填充为0。虽然通过映射地址可以访问并修改剩余字节,但是任何变动都不会在映射文件中反应出来。由此可见,使用内存映射进行大数据量的拷贝比较有效。