| PSP | Personal Software Process Stages | Estimated time (min) | Actual time (min) |
|---|---|---|---|
| Planning | Planning | 50 | 60 |
| Estimate | · Estimate how long the task will take | 50 | 60 |
| Development | Development | 1490 | 1520 |
| Analysis | · Requirements analysis (including learning new technologies) | 500 | 480 |
| Design Spec | · Write the design document | 30 | 40 |
| Design Review | · Design review | 30 | 50 |
| Coding Standard | · Coding standard (define suitable conventions for the current development) | 30 | 35 |
| Design | · Detailed design | 300 | 300 |
| Coding | · Coding | 600 | 620 |
| Code Review | · Code review | 200 | 190 |
| Test | · Testing (self-testing, fixing code, committing changes) | 200 | 215 |
| Reporting | Reporting | 50 | 60 |
| Test Report | · Test report | 40 | 45 |
| Size Measurement | · Measure the workload | 5 | 10 |
| Postmortem & Process Improvement Plan | · Postmortem and process improvement plan | 5 | 5 |
| Total | | 1590 | 1640 |
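import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Class name assumed; the original class header was not shown
public class Crawler {
  public static void main(String[] args) {
      // url: address of the paper listing page, a class-level field not shown in the original
      int i = 0;
      String code = geturlcode(url);
      // Parse the page source into a Document
      Document doc = Jsoup.parse(code);
      // Select the links to the per-paper pages
      Elements ele = doc.select("dt[class=ptitle]");
      File f = new File("result.txt");
      // try-with-resources closes the writer even if a request fails midway
      try (PrintWriter output = new PrintWriter(f)) {
          for (Element link : ele) {
              String test = "http://openaccess.thecvf.com/";
              link = link.child(1);
              test = test + link.attr("href");
              // Fetch the paper page and extract its title and abstract
              String text = gettext(geturlcode(test));
              output.print(i);
              i++;
              output.print("\r\n");
              output.print(text);
          }
      } catch (Exception e) {
          System.out.println("Error writing file");
          return;
      }
      System.out.println("Done");
  }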
  // Extract the title and abstract from a paper page
  public static String gettext(String code)
  {
      Document doc = Jsoup.parse(code);
      String title = doc.select("div[id=papertitle]").text();
      String abstractText = doc.select("div[id=abstract]").text();
      // One record: a title line, an abstract line, then two blank lines
      return "Title: " + title + "\r\n" + "Abstract: " + abstractText + "\r\n" + "\r\n" + "\r\n";
  }
  // Download the source of a web page
  public static String geturlcode(String url) {
      StringBuilder code = new StringBuilder();
      try {
          // Open a connection to the address
          URLConnection urlcon = new URL(url).openConnection();
          // Read the response line by line; the reader and its stream are closed automatically
          try (BufferedReader breader = new BufferedReader(new InputStreamReader(urlcon.getInputStream()))) {
              String temp;
              while ((temp = breader.readLine()) != null) {
                  code.append(temp).append("\r\n");
              }
          }
      } catch (MalformedURLException e) {
          e.printStackTrace();
      } catch (IOException e) {
          e.printStackTrace();
      }
      return code.toString();
  }
}



for(int i=0;i+1<args.length;i+=2){
            switch(args[i]){
                /* -w: whether to count with different weights */
                case "-w":
                    if(Integer.parseInt(args[i+1])==1){
                        System.out.println("Weighted counting enabled.");
                        cntByWeight=true;
                    }else{
                        System.out.println("Weighted counting disabled.");
                        cntByWeight=false;
                    }
                    break;
                /* -i: path of the input file */
                case "-i":
                    inPathname=args[i+1];
                    System.out.println("Input file path: "+inPathname+".");
                    break;
                /* -o: path of the generated output file */
                case "-o":
                    outPathname=args[i+1];
                    System.out.println("Output file path: "+outPathname+".");
                    break;
                /* -m: length of the phrases to count */
                /* With phrase counting enabled, phrase frequencies are reported instead of word frequencies; the total word count is unaffected */
                /* Without -m, phrase counting is off and word frequencies are reported by default */
                case "-m":
                    phraseLength=Integer.parseInt(args[i+1]);
                    System.out.println("Phrase frequency counting enabled, phrase length "+phraseLength+".");
                    break;
                /* -n: number of words to output */
                /* Without -n, the custom output size is off and 10 entries are printed by default */
                case "-n":
                    needNum=Integer.parseInt(args[i+1]);
                    System.out.println("Number of words/phrases to output: "+needNum+".");
                    break;
            }
        }
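The loop above assumes that every flag is followed by a value. A minimal sketch of the same parsing, with the pairing made explicit and unknown flags rejected, could look like this. The `CliOptions` class and the default of 1 for `phraseLength` are our assumptions for illustration; the field names mirror the variables in the snippet above.

```java
/** Hypothetical holder for the five options; not part of the original program. */
final class CliOptions {
    boolean cntByWeight = false;   // -w 1 enables weighted counting
    String inPathname = null;      // -i input file
    String outPathname = null;     // -o output file
    int phraseLength = 1;          // -m phrase length; 1 (single words) is an assumed default
    int needNum = 10;              // -n number of entries to print; default is 10

    static CliOptions parse(String[] args) {
        CliOptions opts = new CliOptions();
        for (int i = 0; i < args.length; i += 2) {
            // Every flag must be followed by a value
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            String value = args[i + 1];
            switch (args[i]) {
                case "-w": opts.cntByWeight = Integer.parseInt(value) == 1; break;
                case "-i": opts.inPathname = value; break;
                case "-o": opts.outPathname = value; break;
                case "-m": opts.phraseLength = Integer.parseInt(value); break;
                case "-n": opts.needNum = Integer.parseInt(value); break;
                default: throw new IllegalArgumentException("Unknown option " + args[i]);
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        CliOptions o = parse(new String[] {"-i", "in.txt", "-o", "out.txt", "-w", "1", "-m", "3"});
        System.out.println(o.inPathname + " " + o.cntByWeight + " " + o.phraseLength + " " + o.needNum);
    }
}
```

Keeping the parsed values in one object also makes the parsing step easy to unit test without touching any files.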
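BufferedReader br = new BufferedReader(new FileReader(file));
                // Each record: an index line, a "Title: " line, an "Abstract: " line, then two blank lines
                while (br.readLine() != null) { // consume the index line
                    try {
                        while ((sb = br.readLine()) != null) {
                            if (sb.length() == 0) {
                                break; // the first blank line ends the record
                            }
                            characters += (sb.length() + 1); // +1 for the line break
                        }
                        characters -= 17; // exclude "Title: " (7 characters) and "Abstract: " (10 characters)
                        br.readLine(); // consume the second blank line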
                    } catch (IOException e) {
                        e.printStackTrace();
                    } finally {
                        try {
                            read.close();
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
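                StringBuilder sb = new StringBuilder();
                while (br.readLine() != null) { // consume the index line
                    String temp = null;
                    try {
                        while ((temp = br.readLine()) != null) {
                            if (temp.length() == 0) {
                                break; // the first blank line ends the record
                            }
                            sb.append(temp);
                            sb.append(" "); // append a space at the end of each line
                        }
                        words -= 2; // do not count the "Title" and "Abstract" labels as words
                        br.readLine(); // consume the second blank line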
                    } catch (IOException e) {
                        e.printStackTrace();
                    } finally {
                        try {
                            read.close();
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
                ……
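The two snippets above rely on the fixed record layout written by the crawler (index line, title line, abstract line, two blank lines). As a rough sketch of the character-counting idea only, the version below recognises the lines by their `Title: ` and `Abstract: ` prefixes instead of by position. The class name `RecordCharCounter` is hypothetical, and this is a simplification rather than the code we submitted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class RecordCharCounter {
    static final String TITLE_PREFIX = "Title: ";       // 7 characters
    static final String ABSTRACT_PREFIX = "Abstract: "; // 10 characters

    /** Counts the characters of title and abstract text, plus one line break per line. */
    static long countChars(Reader source) {
        long characters = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(TITLE_PREFIX)) {
                    characters += line.length() - TITLE_PREFIX.length() + 1;
                } else if (line.startsWith(ABSTRACT_PREFIX)) {
                    characters += line.length() - ABSTRACT_PREFIX.length() + 1;
                }
                // Index lines and blank separator lines contribute nothing
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return characters;
    }

    public static void main(String[] args) {
        String sample = "0\r\nTitle: abc\r\nAbstract: de\r\n\r\n\r\n";
        System.out.println(countChars(new StringReader(sample))); // prints 7
    }
}
```

For the sample record the title contributes 3 + 1 and the abstract 2 + 1, which matches the "length plus one, minus 17" arithmetic of the original loop.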
Code omitted.
Code omitted.
    public static Map<String, String> sortMapByValue(Map<String, String> oriMap) {
        if (oriMap == null || oriMap.isEmpty()) {
            return null;
        }
        Map<String, String> sortedMap = new LinkedHashMap<String, String>();
        List<Map.Entry<String, String>> entryList = new ArrayList<Map.Entry<String, String>>(oriMap.entrySet());
        entryList.sort(new MapValueComparator());
        Iterator<Map.Entry<String, String>> iter = entryList.iterator();
        Map.Entry<String, String> tmpEntry = null;
        while (iter.hasNext()) {
            tmpEntry = iter.next();
            sortedMap.put(tmpEntry.getKey(), tmpEntry.getValue());
        }
        return sortedMap;
    }
public class MapValueComparator implements Comparator<Map.Entry<String, String>> {
    @Override
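    /* Negative: me1 is placed before me2
    * Zero: the two entries compare as equal
    * Positive: me1 is placed after me2
    * Entries are ordered by count, highest first; ties are broken by key in ascending order
    */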
    public int compare(Map.Entry<String, String> me1, Map.Entry<String, String> me2) {
        int flag=0;
        if(Long.parseLong(me2.getValue()) > Long.parseLong(me1.getValue())){
            flag=1;
        }else if(Long.parseLong(me1.getValue()) > Long.parseLong(me2.getValue())){
            flag=-1;
        }else if(Long.parseLong(me1.getValue()) == Long.parseLong(me2.getValue())){
            flag=me1.getKey().compareTo(me2.getKey());
        }
        return flag;
    }
}
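Because the counts are stored as strings, the comparator above has to parse them on every comparison. If the counts were kept as `Integer` values instead (a change we did not make in the submitted code), the same ordering, highest count first and then key ascending, could be sketched with the standard `Map.Entry` comparators. The class name `FrequencySort` is hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class FrequencySort {
    /** Returns a new map ordered by count descending, then by key ascending. */
    static Map<String, Integer> sortByCountThenKey(Map<String, Integer> counts) {
        Map<String, Integer> sorted = new LinkedHashMap<>();
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                      .thenComparing(Map.Entry.comparingByKey()))
              .forEachOrdered(e -> sorted.put(e.getKey(), e.getValue()));
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("b", 2);
        counts.put("a", 2);
        counts.put("c", 5);
        System.out.println(sortByCountThenKey(counts)); // prints {c=5, a=2, b=2}
    }
}
```

Unlike `sortMapByValue`, this sketch returns an empty map rather than `null` for empty input, so callers can iterate the result without a null check.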
    public static String[] replaceNull(String[] str) {
        // Collect the non-empty elements of the array in a StringBuilder, separated by ";"
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length; i++) {
            if ("".equals(str[i])) {
                continue;
            }
            sb.append(str[i]);
            if (i != str.length - 1) {
                sb.append(";");
            }
        }
        // Split the joined string back into an array with String.split
        str = sb.toString().split(";");
        return str;
    }
    public static Map<String, String> countPhraseFrequency(String pathname, long phraseLength,boolean cntByWeight) {
        Map<String, String> map = new HashMap<String, String>();
        boolean flag = false;
        try {
            String encoding = "UTF-8";
            File file = new File(pathname);
            if (file.isFile() && file.exists()) {
                InputStreamReader read = new InputStreamReader(new FileInputStream(file), encoding);
                /* Read the file contents */
                StringBuffer sb = null;
                BufferedReader br1;
                try {
                    br1 = new BufferedReader(new FileReader(file));
                    String temp = br1.readLine();
                    sb = new StringBuffer();
                    while (temp != null) {
                        sb.append(temp);
                        sb.append(" "); // append a space at the end of each line
                        temp = br1.readLine();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                /* The text that was read */
                String info = null;
                if (sb != null) {
                    info = sb.toString();
                }
                /* Split into tokens while keeping each token's trailing separator */
                Pattern p =Pattern.compile("[^a-zA-Z0-9]");
                Matcher m = null;
                String s[] = new String[0];
                if (info != null) {
                    m = p.matcher(info);
                    s = info.split("[^a-zA-Z0-9]");
                }
                if(s.length > 0)
                {
                    int count = 0;
                    while(count < s.length)
                    {
                        if(m.find())
                        {
                            s[count] += m.group();
                        }
                        count++;
                    }
                }
                s = replaceNull(s);
                /* Count phrase frequencies */
                for (int i = 0; i < s.length; i++) {
                    StringBuilder content = new StringBuilder();
                    int cnt=0;
                    for (int j = i; cnt<phraseLength&&j<s.length; j++) {
                        if (s[j].length() >= 4 && !s[j].toLowerCase().equals("title:") && !s[j].toLowerCase().equals("abstract:")) {
                            String temp = s[j].substring(0, 4);
                            temp = temp.replaceAll("[^a-zA-Z]", "");
                            if (temp.length() >= 4) {
                                cnt++;
                                if(cnt==phraseLength){
                                    content.append(s[j].substring(0,s[j].length()-1));
                                }else{
                                    content.append(s[j]);
                                }
                            } else {
                                break;
                            }
                        } else if (s[j].toLowerCase().equals("title:")) {
                            flag = true;
                            break;
                        } else if (s[j].toLowerCase().equals("abstract:")) {
                            flag = false;
                            break;
                        } else if(s[j].matches("[^a-zA-Z0-9]")&&cnt>=1){
                            content.append(s[j]);
                        }else{
                            break;
                        }
                        if (cnt==phraseLength) {
                            String phrase = content.toString();
                            if (flag && cntByWeight) {
                                if (map.containsKey(phrase.toLowerCase())) { // check whether the map already contains this key
                                    map.put(phrase.toLowerCase(), Integer.parseInt(map.get(phrase.toLowerCase())) + 10 + "");
                                } else {
                                    map.put(phrase.toLowerCase(), 10 + "");
                                }
                            } else {
                                if (map.containsKey(phrase.toLowerCase())) { // check whether the map already contains this key
                                    map.put(phrase.toLowerCase(), Integer.parseInt(map.get(phrase.toLowerCase())) + 1 + "");
                                } else {
                                    map.put(phrase.toLowerCase(), 1 + "");
                                }
                            }
                        }
                    }
                }
                /* Sort the map */
                map = sortMapByValue(map);
                read.close();
            } else {
                System.out.println("The specified file was not found");
            }
        } catch (Exception e) {
            System.out.println("Error reading file contents");
            e.printStackTrace();
        }
        return map;
    }
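The core of `countPhraseFrequency` is a sliding window over the tokens, with a weight of 10 for phrases found in a title and 1 otherwise. The sketch below shows only that idea in a simplified form: it joins words with a single space instead of preserving the original separators, and it takes the weight as a parameter. The class `PhraseCounter` and its method names are hypothetical, and `Map.merge` stands in for the `containsKey`/`put` pairs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class PhraseCounter {
    /** A token counts as a word when it starts with at least four letters. */
    static boolean isWord(String token) {
        return token.matches("[a-z]{4}[a-z0-9]*");
    }

    /** Adds `weight` to the count of every run of `phraseLength` consecutive words in `text`. */
    static void countPhrases(String text, int phraseLength, int weight, Map<String, Integer> counts) {
        String[] tokens = text.toLowerCase().split("[^a-z0-9]+");
        for (int start = 0; start + phraseLength <= tokens.length; start++) {
            boolean allWords = true;
            for (int k = 0; k < phraseLength; k++) {
                if (!isWord(tokens[start + k])) {
                    allWords = false;
                    break;
                }
            }
            if (allWords) {
                String phrase = String.join(" ", Arrays.copyOfRange(tokens, start, start + phraseLength));
                counts.merge(phrase, weight, Integer::sum); // insert or add in one step
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        countPhrases("Deep learning models", 2, 10, counts);          // a title, weight 10
        countPhrases("deep learning for vision tasks", 2, 1, counts); // an abstract, weight 1
        System.out.println(counts.get("deep learning")); // prints 11
    }
}
```

In the second call "for" has only three letters, so the windows that contain it are skipped and only "deep learning" and "vision tasks" are counted.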
ExecutorService executor = Executors.newCachedThreadPool();
      // Count characters
      String finalInPathname = inPathname;
      Future<Long> futureChar = executor.submit(() -> lib.countChar(finalInPathname));
      // Count words
      Future<Long> futureWord = executor.submit(() -> lib.countWord(finalInPathname));
      // Count lines
      Future<Long> futureLine = executor.submit(() -> lib.countLines(finalInPathname));
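The snippet above only submits the three tasks. A minimal sketch of the remaining steps, collecting the results with `Future.get()` and shutting the pool down, might look like the following. The `ParallelCounts` class is hypothetical, the three `Callable` parameters stand in for the `lib.countChar`, `lib.countWord` and `lib.countLines` calls, and a fixed pool of three threads is our choice for the sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelCounts {
    /** Runs the three counters concurrently and returns {chars, words, lines}. */
    static long[] runAll(Callable<Long> chars, Callable<Long> words, Callable<Long> lines) {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            Future<Long> futureChar = executor.submit(chars);
            Future<Long> futureWord = executor.submit(words);
            Future<Long> futureLine = executor.submit(lines);
            // get() blocks until the corresponding task has finished
            return new long[] {futureChar.get(), futureWord.get(), futureLine.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A counting task failed", e.getCause());
        } finally {
            executor.shutdown(); // without this the pool's threads can keep the JVM alive
        }
    }

    public static void main(String[] args) {
        long[] r = runAll(() -> 10L, () -> 3L, () -> 2L);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints 10 3 2
    }
}
```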
              info = sb.toString();
was changed to
              if (sb != null) {
                  info = sb.toString();
              }
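The profiler chart shows that BufferedReader's readLine method and String's split and replaceAll methods account for most of the overhead; the most expensive function is the word-frequency counting function.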
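One possible way to reduce the `split` and `replaceAll` cost, which we have not measured, is to compile the regular expression once. `String.split` and `String.replaceAll` compile their pattern on every call when the argument is a character class like ours, whereas a `Pattern` constant is compiled a single time. The sketch below assumes a hypothetical `Tokenizer` helper and is not part of the submitted code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class Tokenizer {
    // Compiled once and reused for every line
    private static final Pattern SEPARATOR = Pattern.compile("[^a-zA-Z0-9]+");

    /** Splits a line on runs of non-alphanumeric characters and lowercases the tokens. */
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        for (String t : SEPARATOR.split(line)) {
            if (!t.isEmpty()) { // a leading separator produces one empty token
                result.add(t.toLowerCase());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hello, world! 42")); // prints [hello, world, 42]
    }
}
```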

The code is as follows:
@Test
public void countChar() {
    System.out.println("count Char1");
    long r=lib.countChar("C:\\Users\\Administrator\\Desktop\\test\\test2.txt");
    assertEquals(1040,r);
    System.out.println("count Char2");
    r=lib.countChar("C:\\Users\\Administrator\\Desktop\\test\\test.txt");
    assertEquals(2914,r);
    System.out.println("count Char3");
    r=lib.countChar("C:\\Users\\Administrator\\Desktop\\test\\test3.txt");
    assertEquals(418,r);
}
@Test
public void countWord() {
    System.out.println("count Word1");
    long w=lib.countWord("C:\\Users\\Administrator\\Desktop\\test\\test2.txt");
    assertEquals(208,w);
    System.out.println("count Word2");
    w=lib.countWord("C:\\Users\\Administrator\\Desktop\\test\\test.txt");
    assertEquals(287,w);
    System.out.println("count Word3");
    w=lib.countWord("C:\\Users\\Administrator\\Desktop\\test\\test3.txt");
    assertEquals(42,w);
}
@Test
public void countPhraseFrequency() {
    System.out.println("count Phrase1");
    String f;
    StringBuilder content = new StringBuilder("");
   int i=1;
    Map<String, String> t=lib.countPhraseFrequency("C:\\Users\\Administrator\\Desktop\\test\\test2.txt",5,true);
    Set<String> keys = t.keySet();
    for (String key : keys) {
        content.append("<").append(key).append(">:").append(t.get(key));
        i++;
        if (i > 5)
            break;
        content.append("\r\n");
    }
    f=content.toString();
    assertEquals("<aaaa aaaa aaaa aaaa aaaa>:192\r\n",f);
    System.out.println("count Phrase2");
    i=1;
    content = new StringBuilder("");
    t=lib.countPhraseFrequency("C:\\Users\\Administrator\\Desktop\\test\\test.txt",5,true);
    keys = t.keySet();
    for (String key : keys) {
        content.append("<").append(key).append(">:").append(t.get(key));
        i++;
        if (i > 5)
            break;
        content.append("\r\n");
    }
    f=content.toString();
    assertEquals("<wild with generative adversarial network>:10\r\n" +
            "<clear high-resolution face from>:2\r\n" +
            "<active perception, goal-driven navigation>:1\r\n" +
            "<agent must first intelligently navigate>:1\r\n" +
            "<challenging dataset wider face demonstrate>:1",f);
    System.out.println("count Phrase3");
    i=1;
    content = new StringBuilder("");
    t=lib.countPhraseFrequency("C:\\Users\\Administrator\\Desktop\\test\\test3.txt",4,true);
    keys = t.keySet();
    for (String key : keys) {
        content.append("<").append(key).append(">:").append(t.get(key));
        i++;
        if (i > 4)
            break;
        content.append("\r\n");
    }
    f=content.toString();
    assertEquals("<active perception, goal-driven>:1\r\n" +
            "<commonsense reasoning, long-term>:1\r\n" +
            "<driven navigation, commonsense reasoning>:1\r\n" +
            "<goal-driven navigation, commonsense>:1",f);
}
@Test
public void countLines() {
    System.out.println("count Lines");
    long l=lib.countLines("C:\\Users\\Administrator\\Desktop\\test\\test2.txt");
    assertEquals(4,l);
    System.out.println("count Lines");
    l=lib.countLines("C:\\Users\\Administrator\\Desktop\\test\\test.txt");
    assertEquals(6,l);
    System.out.println("count Lines");
    l=lib.countLines("C:\\Users\\Administrator\\Desktop\\test\\test3.txt");
    assertEquals(2,l);
}
The tests mainly cover character counting, line counting, word counting, and phrase-frequency sorting; each unit is tested with three samples.
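The tests above read files from absolute paths on one machine, so they fail anywhere else. One way around this, sketched below under our own assumptions, is to have each test write its sample to a temporary file. `TempFixture` is a hypothetical helper, and `countNonBlankLines` is only a stand-in for `lib.countLines`, whose exact rules are not reproduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class TempFixture {
    /** Writes the sample text to a temporary file that is deleted when the JVM exits. */
    static Path write(String content) {
        try {
            Path p = Files.createTempFile("wordcount-test", ".txt");
            p.toFile().deleteOnExit();
            Files.write(p, content.getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in counter (assumed rule): lines with at least one non-whitespace character. */
    static long countNonBlankLines(Path p) {
        try (Stream<String> lines = Files.lines(p, StandardCharsets.UTF_8)) {
            return lines.filter(l -> !l.trim().isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path sample = write("0\r\nTitle: a\r\nAbstract: b\r\n\r\n\r\n");
        System.out.println(countNonBlankLines(sample)); // prints 3
    }
}
```

A test would then pass `sample.toString()` to the function under test in place of the hard-coded `C:\\Users\\...` path.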



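Because we were not familiar with GitHub, our code check-ins were fairly disorganized and our commits fairly casual; we will improve on this over time.
Pairing difficulties
Both of us were going home for the National Day holiday, so we drew up a work plan the evening before. We completed it smoothly during the holiday and ran into almost no difficulties.
We did not communicate with each other enough.
Progress table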
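| Area | Skill | Pre-course assessment | Fifth practice assignment | Post-course assessment |
|---|---|---|---|---|
| Programming | Overall understanding of programming | 3 | 3.3 | 6 |
| Programming | Program comprehension | 4 | 4.2 | 7 |
| Programming | Unit testing | 2 | 2.5 | 5 |
| Programming | Performance analysis | 1 | 1.2 | 5 |
| Software engineering | Requirements analysis | 2 | 2.5 | 6 |
| Software engineering | Personal source control management | 1 | 1.2 | 5 |
| Professional skills | Self-directed learning | 2 | 2.3 | 5 |
| Professional skills | Task planning | 2 | 2.2 | 6 |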
Original: https://www.cnblogs.com/xr81970/p/9778471.html